Methodology
The Toronto Parks Vitality Atlas is an experimental, data-informed reading of Toronto parks through Jane Jacobs-inspired ideas of urban vitality. The score is not a definitive judgment. It’s a transparent way of noticing patterns.
The lens
Jacobs argued that great urban places emerge from short blocks, mixed primary uses, dense and permeable edges, and the ordinary surveillance of “eyes on the street.” Parks live or die by the same logic: a park surrounded by cafés, schools, homes and small shops behaves differently from one bordered by parking lots, expressways, or blank institutional walls.
We translate that intuition into six measurable proxies. Each proxy is a 0 to 100 sub-score with a plain-English explanation. The overall Vitality Score is a weighted average:
- Edge Activation: 25%
- Connectivity: 20%
- Amenity Diversity: 20%
- Natural Comfort: 15%
- Enclosure / Eyes on Park: 10%
- Border Vacuum Risk: 10% (inverted: high risk reduces vitality)
Weights are configurable via SCORE_WEIGHT_* env vars.
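As a concrete sketch of the blend, the snippet below computes the headline score from the six sub-scores, inverting Border Vacuum Risk and reading optional overrides from the environment. The exact `SCORE_WEIGHT_*` variable names per sub-score are an assumption based on the text, not the pipeline's actual names.

```python
import os

# Default weights from the methodology. Each SCORE_WEIGHT_* env var
# (hypothetical naming, e.g. SCORE_WEIGHT_CONNECTIVITY) overrides one entry.
DEFAULT_WEIGHTS = {
    "edge_activation": 0.25,
    "connectivity": 0.20,
    "amenity_diversity": 0.20,
    "natural_comfort": 0.15,
    "enclosure": 0.10,
    "border_vacuum_risk": 0.10,  # inverted before weighting
}

def load_weights(env=os.environ):
    weights = {
        key: float(env.get(f"SCORE_WEIGHT_{key.upper()}", default))
        for key, default in DEFAULT_WEIGHTS.items()
    }
    total = sum(weights.values())
    # Re-normalise so overridden weights still sum to 1.0.
    return {k: v / total for k, v in weights.items()}

def vitality_score(sub_scores, weights):
    # Border Vacuum Risk is inverted: high risk reduces vitality.
    adjusted = dict(sub_scores)
    adjusted["border_vacuum_risk"] = 100 - adjusted["border_vacuum_risk"]
    return sum(weights[k] * adjusted[k] for k in weights)
```

With all sub-scores at 50 the headline is 50 regardless of weighting, which is why placeholder metrics default to neutral 50.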
The metrics, in plain English
- Amenity Diversity: how many distinct things there are to do in the park (playgrounds, washrooms, fields, rinks…).
- Connectivity: how easy the park is to reach and enter on foot or by transit, via streets, intersections, paths, and entrances.
- Edge Activation: how much active street life (cafés, restaurants, shops, schools) lines the park's edges.
- Border Vacuum Risk: how much of the edge is deadened by highways, rail, parking lots, industrial land, or blank institutional walls.
- Natural Comfort: the canopy, ravine, and water cover that make the park a pleasant place to linger.
- Enclosure / Eyes on Park: whether buildings sit close enough, and at the right scale, to keep casual watch over the park.
Data sources
- City of Toronto Open Data — Parks (Green Space): polygon boundaries, official names, types.
- Parks & Recreation Facilities: inventory of in-park amenities (washrooms, fields, rinks…).
- Toronto Pedestrian Network: sidewalk segments around and through parks; estimated park entrances.
- Toronto Centreline V2: street segments and intersection nodes near park edges; trails and walkways.
- Toronto 3D Massing: building footprints and heights for edge-building counts, frontage density, and tower-in-the-park risk.
- Toronto Treed Area: tree canopy share inside park polygons via stratified-grid sampling.
- Toronto Waterbodies & Rivers: water surface inside parks and nearest-water distance for cooling.
- Ravine & Natural Feature Protection: ravine overlap as a cooling / natural-comfort signal.
- Toronto Street Tree Inventory: tree count and density inside park polygons.
- Neighbourhood Profiles (pending): equity context proxy.
- OpenStreetMap (Overpass API): cafés, restaurants, retail, transit stops, parking, highways, rail.
Data Confidence
Different metrics rest on different evidence. The headline score blends them all, but the per-metric confidence and the table below let you see where the model is on solid ground and where it’s holding a placeholder until more data lands.
| Metric | Status | Basis |
|---|---|---|
| Amenity Diversity | direct | City Parks & Recreation Facilities. Distinct amenity types per park, spatially joined to the Green Spaces polygon. |
| Edge Activation | direct | OSM POIs (cafe / restaurant / shop / school / community / parking / highway / rail / industrial) within 100 m of park edge. Quality varies with OSM coverage. |
| Border Vacuum Risk | direct | OSM landuse, highways, rail, parking within 50 m of park edge. |
| Connectivity | direct | Toronto Centreline V2 (street segments within 25 m, intersection nodes within 100 m, trails / walkways within 50 m) + Pedestrian Network (sidewalk segments within 50 m, path-polygon crossings as estimated entrances) + OSM transit stops within 400 m. Components weighted 35 / 20 / 20 / 15 / 10 (paths / intersections / transit / entrances / superblock penalty). Confidence is tiered: high when all three sources are present near the park, medium when two are, lower when only one is. |
| Natural Comfort | partial | Toronto Treed Area (canopy % via stratified-grid sampling) + Ravine & Natural Feature Protection Area (ravine overlap %) + Waterbodies & Rivers (water % inside park, distance to nearest water) + Street Tree Inventory (tree count + density per ha). Components weighted 35 / 20 / 20 / 15 / 10 (canopy / impervious / green / ravine+water / diversity). Impervious surface is *approximated*. Toronto's authoritative impervious layer ships only as a GeoTIFF raster, which the pipeline can't read without GDAL. |
| Enclosure / Eyes on Park | direct | Toronto 3D Massing (428 k building polygons with footprints + heights). Counts buildings within 25 m and 50 m of park edge, avg edge height (binned into low-rise / mid-rise / tower), frontage density per 100 m of perimeter, blank-edge share, tower-in-the-park count. Components weighted 30 / 25 / 20 / 15 / 10 (frontage / human-scale / mid-rise eyes / blank-edge avoidance / tower-penalty). Held at neutral 50 with low confidence for parks with no nearby buildings (ravines, hydro corridors). |
| Equity Context | placeholder | Requires the Toronto Neighbourhood Profiles join. Surfaced as context only, not in the headline weighting. |
- direct: measured from a primary source we’ve loaded.
- partial: some inputs are loaded; others are still placeholder.
- placeholder: no source data loaded yet. The metric defaults to neutral 50 with low per-metric confidence, so it doesn’t silently inflate or deflate the headline.
Because some dimensions are placeholders, the headline score should be read as a Jacobs-inspired model in motion, not an official ranking. Park detail pages show a confidence value per sub-score so you can read the score at the right level of certainty.
Research-grade methods
| Sub-score | Inputs | Normalization |
|---|---|---|
| Edge Activation | OSM POIs within 100 m of park edge: positives (cafes / restaurants / retail / schools / community / transit / residential), negatives (parking / highway / rail / industrial / blank institutional). | 100·p / (p + 6) − 8·n, clamped to [0, 100] |
| Connectivity | Centreline V2 streets ≤ 25 m, intersection nodes ≤ 100 m, paths/sidewalks ≤ 50 m, OSM transit ≤ 400 m, estimated entrances, edge density per 100 m of perimeter. | 35% paths · 20% intersections · 20% transit · 15% entrances · 10% superblock penalty |
| Amenity Diversity | Distinct amenity types from City Parks & Recreation Facilities, spatially joined to the park polygon (with a 25 m fallback buffer). | 100·d / (d + 6), clamped |
| Natural Comfort | Treed-area canopy %, ravine overlap %, water % + nearest-water distance, street-tree count + density. Effective canopy = max(polygon, density × 0.7). | 35% canopy · 20% impervious · 20% green · 15% ravine+water · 10% diversity |
| Enclosure | 3D Massing buildings ≤ 25 m / 50 m of edge, avg height bin (low-rise < 9 m / mid-rise 9 to 21 m / tower ≥ 40 m), frontage density per 100 m of perimeter. | 30% frontage · 25% human-scale · 20% mid-rise eyes · 15% blank-edge avoidance · 10% tower-penalty |
| Border Vacuum (inverted) | Sum of weighted hostile uses within 50 m of edge: highway 30, rail 18, parking 12, industrial 14, blank-institutional 10. | overall contribution = 100 − risk |
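The Edge Activation and Border Vacuum formulas in the table translate directly into code; the same saturating curve normalises Amenity Diversity. This is a minimal illustration, and treating each hostile use as contributing its weight once when present is an assumption about how the sum works.

```python
def saturating(count, half_max=6):
    # 100·c / (c + 6): 0 items scores 0, 6 items scores 50,
    # and the curve approaches 100 asymptotically.
    return 100 * count / (count + half_max)

def edge_activation(positives, negatives):
    # 100·p / (p + 6) − 8·n, clamped to [0, 100].
    return max(0.0, min(100.0, saturating(positives) - 8 * negatives))

# Hostile-use weights within 50 m of the park edge, from the table.
HOSTILE_WEIGHTS = {
    "highway": 30, "rail": 18, "parking": 12,
    "industrial": 14, "blank_institutional": 10,
}

def border_vacuum_contribution(hostile_uses_present):
    # Sum the weighted hostile uses, cap at 100, then invert:
    # the overall contribution is 100 − risk.
    risk = min(100, sum(HOSTILE_WEIGHTS[u] for u in hostile_uses_present))
    return 100 - risk
```

The half-max of 6 means a park needs roughly a dozen positive edge uses before the curve flattens, which keeps a single café strip from maxing out the sub-score.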
Sampling. Cover-overlap measures (canopy %, ravine %, water %) use stratified-grid point sampling inside each park polygon at adaptive 6 to 30 m steps, capped at 400 sample points per park. Confidence is dampened on parks with fewer than 25 valid samples.
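The stratified-grid sampling above can be sketched without GIS dependencies using a ray-casting point-in-polygon test. The real pipeline presumably uses a geometry library; the step formula here is one assumed way to honour the 6 to 30 m range and the 400-point cap.

```python
import math

def point_in_polygon(x, y, poly):
    # Ray-casting test; poly is a list of (x, y) vertices.
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def grid_samples(poly, min_step=6.0, max_step=30.0, cap=400):
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    # Adaptive step: larger parks get coarser grids so the sample
    # count stays near the cap; clamp to the 6–30 m range.
    step = max(min_step, min(max_step, math.sqrt(width * height / cap)))
    points = []
    y = min(ys) + step / 2
    while y < max(ys):
        x = min(xs) + step / 2
        while x < max(xs):
            if point_in_polygon(x, y, poly):
                points.append((x, y))
            x += step
        y += step
    return points[:cap]  # hard cap as a final guard
```

A cover share such as canopy % is then the fraction of these sample points that also fall inside the canopy layer's polygons.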
Clustering. k-means (k = 8, k-means++ init, deterministic seed) on the five normalised sub-scores. Cluster names are derived from each cluster centroid’s most-distinctive dimensions versus the citywide mean (delta-based, not z-score), with hand-curated overrides for cleanly identifiable patterns.
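The delta-based naming step amounts to ranking each centroid's dimensions by absolute deviation from the citywide mean. A sketch, with hypothetical dimension names:

```python
def distinctive_dimensions(centroid, citywide_mean, top_n=2):
    # Delta-based (not z-score): rank dimensions by |centroid − mean|,
    # keeping the signed delta so the label can say "high" or "low".
    deltas = {k: centroid[k] - citywide_mean[k] for k in centroid}
    ranked = sorted(deltas, key=lambda k: abs(deltas[k]), reverse=True)
    return [(k, deltas[k]) for k in ranked[:top_n]]
```

A centroid far below the mean on natural comfort and far above on edge activation would read as something like "activated but hard-surfaced", before any hand-curated override.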
Confidence tiering. Each sub-score reports its own confidence based on which source layers contributed: measured (≥ 0.7) when the canonical source layer landed and had non-empty results for the park; partial (0.4 to 0.7) when one of multiple expected sources is missing; inferred (< 0.4) when the metric falls back to a placeholder. The headline score weights its sub-scores by their stated weights but does not re-weight by confidence. That is deliberate: a low-confidence reading is visually flagged rather than silently shrunk.
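The tier boundaries map directly onto a small classifier; the inclusive treatment of the 0.7 and 0.4 boundaries follows the ≥ / < signs in the text.

```python
def confidence_tier(confidence):
    # Map a numeric per-metric confidence to the methodology's tiers.
    if confidence >= 0.7:
        return "measured"   # canonical source landed, non-empty results
    if confidence >= 0.4:
        return "partial"    # one of multiple expected sources missing
    return "inferred"       # placeholder fallback
```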
Human Activity Signals
The spatial-form scoring above describes parks that should work. It says nothing about parks that are actually programmed, photographed, walked through, or socially meaningful. The Activity Signals layer tries to pick up that second framing. It is partial by design.
| Sub-score | Direct vs proxy | Sources |
|---|---|---|
| Programming | Direct (event records). | Toronto Festivals & Events JSON feed; future Eventbrite / Meetup APIs (require keys). |
| Social attention | Proxy (mention/photo/review counts). | Wikipedia REST pageviews + manual CSV imports. Optional Flickr / Google Places APIs. |
| Temporal diversity | Optional / manual. | Manual Popular-Times-shaped CSV. We do not scrape Google Maps. |
| Pedestrian / cycling flow | Proxy (counter at distance). | Toronto Permanent Bicycle Counters; future pedestrian counters where available. |
| Cultural significance | Proxy. | Wikipedia article presence + sentiment + tag diversity. |
- Inferred vitality is incomplete. The spatial model says how parks should behave. The activity layer tries to say how they actually do. Both readings are useful; neither is sufficient.
- Social media is biased. Photogenic parks over-index; everyday neighbourhood parks under-index. We label this clearly in the social-attention sub-score.
- Event data over-represents programmed civic use. Saturation and recurrence weighting prevent Nathan Phillips Square from dwarfing the rest, but the underlying feed is still city-curated.
- Popular Times is optional / manual / licensed only. We never scrape Google Maps. If no popular-times data has been imported for a park, the temporal-diversity sub-score is flagged as “unknown”.
- Counters measure movement near parks, not park occupancy. Proximity confidence reflects how close the counter sits to the park edge.
- Confidence is honest. Activity scores computed in sample mode are clamped to 0.25 confidence; real-data scores rise to 0.9 only when all five sub-source families contributed.
Privacy guarantees: see /data-ethics for what we will and will not collect.
Why scores are not bell-curved
Toronto’s parks aren’t normally distributed. Hydro corridors and ravine slivers cluster near zero; iconic neighbourhood parks cluster up high; the long middle is where most of the city lives. If we forced the raw scores onto a bell curve we’d be pretending that asymmetry away, and real structure would be hidden. So we don’t do it.
What we add instead is context. For every park we publish four numbers alongside the raw score:
- Citywide percentile: rank against all 3,273 parks. Useful for absolute orientation.
- Typology percentile: rank within parks of the same primary typology. Prevents unfair comparisons (a Civic Square and a Ravine aren’t doing the same job).
- Cluster percentile: rank within the auto-detected morphological cluster.
- Expected score: median of a similar-park cohort defined by typology + size band + ravine/waterfront status. The performance gap = raw − expected.
The gap is labelled in five buckets (strong over, modest over, typical, modest under, strong under) at ±5 and ±12 thresholds. Each park’s panel publishes its cohort size and a context-confidence (high, medium, low) so the reader can tell whether the gap is well-supported or comes from a tiny cohort.
What this answers: is this park strong for what it is? A modest raw score in a beloved parkette can still rank in the 90th percentile of its typology and read as a strong overperformer. That’s a more useful sentence than “55 / 100” alone.
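The five-bucket labelling can be sketched as follows; whether the exact ±5 and ±12 boundary values fall in the inner or outer bucket is an assumption.

```python
def gap_bucket(raw_score, expected_score):
    # Performance gap = raw − expected, labelled in five buckets
    # at the ±5 and ±12 thresholds from the methodology.
    gap = raw_score - expected_score
    if gap > 12:
        return "strong over"
    if gap > 5:
        return "modest over"
    if gap >= -5:
        return "typical"
    if gap >= -12:
        return "modest under"
    return "strong under"
```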
Why typologies matter
A Civic Square (paved, for events, surrounded by towers) and a Ravine Park (forest, no buildings, off-grid) cannot be compared directly on a single Vitality score; the comparison would be actively misleading. Each park is automatically assigned a typology and a cluster, and the headline rankings on the home page are typology-aware: best-in-class within a family rather than a city-wide leaderboard. The classifier is rule-based and explainable, and each detail page shows exactly which thresholds fired.
Jacobs vs Wilderness: two axes, not one
Behind the headline score sit two distinct frameworks. Urban integration (the average of edge activation, connectivity, and enclosure) measures whether the park is woven into the daily city Jacobs cared about. Natural comfort measures whether the park provides ecological respite. The Insights → Jacobs vs Wilderness page plots every Toronto park on these two axes; the result is bimodal. Most parks are strong on one axis and weak on the other; genuinely balanced parks are rare. We treat that as a finding, not a bug: the city actually has different kinds of parks doing different jobs.
Limitations of algorithmic urbanism
- Typologies are heuristics, not ground truth. A park may straddle types and the secondary read on the detail page is sometimes more accurate than the primary.
- Clustering is unsupervised k-means on five normalised dimensions. The cluster names are descriptive labels assigned by us, not categories the data “knows” about itself.
- Narratives are generated from real metric values via templates. They reference actual numbers but they are still pattern-matching, not understanding.
- The validation feedback we collect is a small signal, not a verdict. It weights the model’s confidence over time but does not override scoring.
The right way to use this site is as a conversation starter about Toronto’s parks, not as a ranking that decides which is “better”.
Limitations
- Measured ≠ truth. We measure proxies. A high-scoring park can still feel uninviting; a low-scoring park can be beloved.
- OSM coverage varies. Café and entrance density depend on volunteer mapping and skews toward downtown.
- No real pedestrian counts. We have no observation data, so “activity” is inferred from surrounding land use.
- Static snapshot. Seasonality, events, and time-of-day effects are not modelled in this MVP.
- Equity context is rough. Neighbourhood-level proxies hide block-level differences.
Privacy & ethics
We use only open, aggregated data sources. No individual movements, no proprietary location traces. Scores describe places, not people. We deliberately do not infer “safety” from policing or enforcement data; that conflation harms communities and was explicitly excluded.
Roadmap
- Street View / computer-vision derived facade and shade scoring.
- Real pedestrian activity (sidewalk counts, anonymised cell-network mobility).
- Event permits and programming density.
- Real-estate proxy data for neighbourhood vibrancy.
- Resident-perception surveys and Jane’s Walk-style observations.
- Seasonal usage modelling.