
City Foundation Models to Solve Urban Challenges
Synopsis
CityFM is a foundation model designed for geospatial data, leveraging OpenStreetMap to create multimodal representations of urban entities. By enhancing geospatial applications with AI-powered insights, CityFM addresses the challenges of urban planning and improves decision-making for smart city initiatives.
Opportunity
The geospatial industry is undergoing a data revolution, with platforms like OpenStreetMap generating vast quantities of diverse information. However, the heterogeneous nature of this data, encompassing nodes, ways and relations, poses significant challenges for traditional machine learning approaches. Sparse annotations, inconsistent tagging and the inability to fully leverage untagged spatial entities hinder progress in addressing urban challenges.
CityFM introduces an innovative framework to overcome these limitations. By employing self-supervised learning and incorporating multiple data modalities, it generates meaningful representations for spatial entities, unlocking valuable insights from both tagged and untagged data. This capability positions CityFM as a transformative solution for urban mobility challenges, urban planning and other geospatial applications. The growing demand for AI-driven geospatial tools in transportation, real estate and retail further underscores its market potential.
Technology
CityFM employs a novel framework for pre-training foundation models on geospatial data, using three key contrastive learning objectives:
- Text-based contrastive objective: Generates embeddings from textual annotations to identify semantic relationships among spatial entities.
- Vision-language contrastive objective: Combines shape, size and textual data to infer functionalities of entities like buildings.
- Road-based contrastive objective: Leverages relational data, particularly public transportation routes, to refine road segment representations and analyse critical infrastructure.
These objectives enable CityFM to deliver actionable insights for downstream tasks, including traffic speed prediction, building functionality classification and region-level analysis. Its flexible, scalable framework allows application to diverse urban contexts, addressing the challenges of urban planning and mobility with precision.
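To make the idea of a contrastive objective concrete, the sketch below implements a symmetric InfoNCE loss, one common formulation of contrastive learning. The source does not say which loss CityFM uses, so this is an illustrative assumption, not the authors' implementation. The function name `info_nce_loss`, the `temperature` value and the toy embeddings are all hypothetical. Each row of `anchors` is paired with the same row of `positives` (for instance, a building's shape embedding and its tag-text embedding). Every other row in the batch acts as a negative.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    anchors, positives: arrays of shape (n, d); row i of each forms a positive pair.
    temperature: hypothetical scaling value, not taken from the source.
    """
    # L2-normalise so the dot product equals cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarity matrix; entry (i, j) compares anchor i with positive j.
    logits = (a @ p.T) / temperature

    def diagonal_cross_entropy(z):
        # Numerically stable log-softmax over each row.
        z = z - z.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        # The matching pair sits on the diagonal.
        return -np.mean(np.diag(log_probs))

    # Average the anchor-to-positive and positive-to-anchor directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for two views of the same eight entities.
    view_a = rng.normal(size=(8, 16))
    view_b = view_a + 0.01 * rng.normal(size=(8, 16))
    print("aligned pairs:   ", info_nce_loss(view_a, view_b))
    print("misaligned pairs:", info_nce_loss(view_a, np.roll(view_b, 1, axis=0)))
```

Minimising this loss pulls the two views of the same entity together and pushes views of different entities apart. In this toy run, correctly aligned pairs give a much lower loss than misaligned ones.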
Figure 1: The CityFM framework, a deep learning model that fuses different data types (e.g., points, polygons, polylines) and different modalities (text, images, positions) to generate comprehensive representations.

Figure 2: Examples of CityFM's multimodal capabilities: it can identify the functionality of a building from its shape and size, locate arterial roads in a city, and identify places with similar functionality from their tags or descriptions.
Applications & Advantages
- Urban planning: Optimise land use and infrastructure development
- Traffic management: Improve traffic flow and public transit efficiency
- Real estate analytics: Enhance property valuation and market analysis
- Retail site selection: Identify strategic locations for new business ventures