Encoding Transitions
Learning Architectural Thresholds with Spatiotemporal Transformers


2025
Deep Learning, Isovist Analysis, Spatial Perception
Python, Grasshopper
Published at the 31st International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA), Apr 2026, Hsinchu, Taiwan. [Preprint link]
A boundary is not that at which something stops, but, as the Greeks recognized, that from which something begins its presencing.
– Martin Heidegger
Architectural thresholds are often treated as fixed components in digital workflows, despite being perceptual events that unfold through movement. This paper addresses the problem of how to computationally model these dynamics. The aim is to develop a method that encodes and classifies threshold moments directly from sequential visibility data, asking whether spatiotemporal learning can capture the structural logic of perceptual change. The study contributes a framework that integrates 3D isovist sampling, transition detection, and masked spatiotemporal transformers to treat thresholds as learnable patterns. Using depth-encoded panoramas derived from spherical isovists, a TimeSformer encoder is pretrained on synthetic typologies and applied to diverse architectural case studies. Results show coherent latent organisation and successful transfer to real environments, revealing how thresholds vary across interior and garden contexts. The study concludes that thresholds possess consistent geometric signatures that can be learned computationally, while noting limitations related to path dependence, synthetic bias, and cultural variability.
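The transition-detection step named in the abstract can be illustrated with a minimal sketch. It is not the paper's method. It assumes each step along a path yields a spherical isovist flattened to a list of ray depths, and it flags steps where the mean depth change exceeds a simple statistical cutoff. The function names, the cutoff rule, the parameter `k`, and the toy data are all hypothetical.

```python
from statistics import mean, pstdev


def frame_change(prev, curr):
    """Mean absolute depth difference between two equally sized ray samples."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same number of rays")
    return mean(abs(a - b) for a, b in zip(prev, curr))


def detect_transitions(frames, k=1.5):
    """Return indices i where the change from frame i-1 to frame i is unusually large.

    Assumed rule (not from the paper): a step counts as a transition when its
    change exceeds mean + k * population standard deviation of all changes.
    """
    changes = [frame_change(frames[i - 1], frames[i]) for i in range(1, len(frames))]
    if not changes:
        return []
    cutoff = mean(changes) + k * pstdev(changes)
    return [i + 1 for i, c in enumerate(changes) if c > cutoff]


# Toy walk: six steps in a narrow corridor, then four steps in an open room.
corridor = [1.0, 1.0, 4.0, 1.0]   # hypothetical ray depths in metres
room = [5.0, 6.0, 5.0, 6.0]
frames = [corridor] * 6 + [room] * 4

transitions = detect_transitions(frames)
print(transitions)  # [6]: the first frame inside the room
```

In a pipeline like the one described, windows of frames around each flagged index could then be rendered as depth-encoded panoramas and passed to the spatiotemporal encoder; that step is outside the scope of this sketch.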

















