Dashboard
NYC Commercial Intelligence
Decision support for commercial site selection across all 71 New York City CDTAs. Combine deterministic filters with natural-language soft preferences to surface the neighborhoods that best fit a site's requirements.
CDTAs: 71 (Community District Tabulation Areas)
Boroughs: 5
Numeric features (filterable via /api/feature-ranges)
Step 1
K-Selection & Clustering
Group neighborhoods by the metrics you care about. The cluster count you choose carries over to Ranking.
- Custom NumPy k-means, with elbow and silhouette diagnostics for choosing k
- Pick which features to cluster on
- Borough-aware filtering and a CDTA choropleth
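A minimal sketch of what a from-scratch NumPy k-means with elbow and silhouette diagnostics can look like, assuming the features are already scaled to comparable ranges. It uses k-means++ seeding and keeps the best of several restarts; those choices, and the helper names `kmeans`, `silhouette`, and `scan_k`, are assumptions for illustration, not the dashboard's actual code.

```python
import numpy as np

def _sq_dists(X, C):
    # Pairwise squared Euclidean distances, shape (len(X), len(C))
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centroids, inertia)
    from the restart with the lowest inertia (within-cluster sum of squares)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # k-means++ seeding: each new centroid is drawn with probability proportional to D^2
        C = X[[rng.integers(len(X))]]
        while len(C) < k:
            d2 = _sq_dists(X, C).min(axis=1)
            C = np.vstack([C, X[rng.choice(len(X), p=d2 / d2.sum())]])
        for _ in range(n_iter):
            labels = _sq_dists(X, C).argmin(axis=1)
            # Move each centroid to the mean of its points (keep it if its cluster empties)
            new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                              for j in range(k)])
            if np.allclose(new_C, C):
                break
            C = new_C
        d2 = _sq_dists(X, C)
        labels = d2.argmin(axis=1)
        inertia = d2[np.arange(len(X)), labels].sum()
        if best is None or inertia < best[2]:
            best = (labels, C, inertia)
    return best

def silhouette(X, labels):
    """Mean silhouette coefficient (needs k >= 2); singleton points score 0."""
    D = np.sqrt(_sq_dists(X, X))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance to the rest of its cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other cluster
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s.mean()

def scan_k(X, ks=range(2, 9)):
    # Per k: inertia (plot it to look for the elbow) and mean silhouette (higher is better)
    out = {}
    for k in ks:
        labels, _, inertia = kmeans(X, k)
        out[k] = (inertia, silhouette(X, labels))
    return out
```

Inertia always falls as k grows, so the elbow is read visually; silhouette gives a second opinion that peaks when clusters are both tight and well separated.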
Step 2
Ranking
Apply hard SQL filters, then blend semantic similarity with a competition penalty to rank neighborhoods for your site.
- Hard filters via DuckDB SQL (transparent query)
- Soft preferences via OpenAI embeddings and Supabase pgvector similarity search
- Optional Claude analysis on the filtered set
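One way to keep hard filters transparent is to build the SQL text and its parameters separately, so the exact query can be shown to the user. The sketch below assumes a hypothetical `cdta_features` table with hypothetical column names; only allow-listed columns are interpolated, and all values go through `?` placeholders. The resulting pair could then be passed to DuckDB's Python `execute(query, parameters)`.

```python
# Hypothetical feature columns; identifiers cannot be bound as parameters, so allow-list them
ALLOWED_FEATURES = {"subway_entries", "ped_count", "storefronts", "shootings", "median_income"}

def build_filter_query(ranges, boroughs=None, table="cdta_features"):
    """ranges maps feature -> (min, max); either bound may be None.
    Returns (sql, params) with values bound through ? placeholders."""
    clauses, params = [], []
    for col, (lo, hi) in sorted(ranges.items()):
        if col not in ALLOWED_FEATURES:
            raise ValueError(f"unknown feature: {col}")
        if lo is not None:
            clauses.append(f"{col} >= ?")
            params.append(lo)
        if hi is not None:
            clauses.append(f"{col} <= ?")
            params.append(hi)
    if boroughs:
        # One placeholder per borough, e.g. "borough IN (?, ?)"
        clauses.append(f"borough IN ({', '.join('?' * len(boroughs))})")
        params.extend(boroughs)
    where = " AND ".join(clauses) if clauses else "TRUE"
    return f"SELECT * FROM {table} WHERE {where}", params

sql, params = build_filter_query({"median_income": (50000, None)}, boroughs=["Brooklyn"])
# sql:    SELECT * FROM cdta_features WHERE median_income >= ? AND borough IN (?)
# params: [50000, 'Brooklyn']
```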
How the pipeline fits together
Public datasets → engineered features → semantic profiles → blended ranking.
01
Ingest
MTA subway, DOT pedestrian counts, NYC storefront filings, NYS shooting incidents, ACS demographics.
02
Aggregate
Spatial join into the 2020 CDTA polygons. Missing values are imputed with borough or citywide medians.
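A small sketch of median imputation, assuming the borough median is used first and the citywide median only when a borough has no observed value for that feature. That fallback order is an assumption read from "borough/citywide", and the function name is hypothetical; the spatial join itself is not shown.

```python
import numpy as np

def impute_median(values, boroughs):
    """Fill NaNs with the borough median; fall back to the citywide median
    when every value in a borough is missing (assumed fallback order)."""
    values = np.asarray(values, dtype=float).copy()
    boroughs = np.asarray(boroughs)
    city_median = np.nanmedian(values)  # computed from observed values only
    for b in np.unique(boroughs):
        mask = boroughs == b
        observed = values[mask & ~np.isnan(values)]
        fill = np.median(observed) if observed.size else city_median
        values[mask & np.isnan(values)] = fill
    return values

impute_median([1, 3, np.nan, 10, np.nan, np.nan], ["BK", "BK", "BK", "QN", "QN", "SI"])
# -> [1, 3, 2, 10, 10, 3]: BK and QN use their own medians, SI falls back to the citywide 3
```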
03
Embed
Per-CDTA text profiles → 1536-d OpenAI embeddings, stored in Supabase with HNSW cosine index.
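An HNSW cosine index returns approximate nearest neighbors; the exact computation it approximates is a brute-force cosine ranking like the sketch below. In pgvector, the `<=>` operator returns cosine distance, which is 1 minus cosine similarity. The function name is hypothetical, and the vectors here can be any dimension (1536 in this pipeline).

```python
import numpy as np

def cosine_top_k(query, matrix, k=5):
    """Exact cosine similarity of one query vector against every row of matrix.
    Returns (row indices, similarities), best first."""
    q = query / np.linalg.norm(query)
    M = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = M @ q                      # cosine similarity per CDTA profile
    order = np.argsort(-sims)[:k]     # highest similarity first
    return order, sims[order]
```

At 71 CDTAs an exact scan is cheap; an HNSW index mainly matters once the number of vectors grows large.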
04
Rank
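MinMax-normalize semantic similarity and −competition across the filtered set, then score = α·semantic′ + (1−α)·(−competition)′, where ′ marks a normalized value, so lower competition raises the score.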
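The blended score can be sketched in a few lines of NumPy, assuming one semantic similarity and one competition value per CDTA in the filtered set. The default `alpha=0.5` and the convention of mapping a constant column to zeros are assumptions for illustration.

```python
import numpy as np

def minmax(x):
    # Rescale to [0, 1]; a constant column maps to all zeros (assumed convention)
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def blended_scores(semantic, competition, alpha=0.5):
    """alpha * MinMax(semantic) + (1 - alpha) * MinMax(-competition)."""
    s = minmax(semantic)
    c = minmax(-np.asarray(competition, dtype=float))  # negated: less competition -> higher
    return alpha * s + (1 - alpha) * c

scores = blended_scores([0.9, 0.5, 0.7], [0, 10, 5])
ranking = np.argsort(-scores)  # CDTA indices, best first
```

Normalizing both terms before blending keeps α interpretable as a weight between the two signals rather than a correction for their different scales.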