NYC Commercial Intelligence


Decision support for commercial site selection across all 71 New York City CDTAs. Combine deterministic filters with natural-language soft preferences to surface the right neighborhoods.

CDTAs: 71 (Community District Tabulation Areas)
Boroughs: (value not loaded)
Numeric features: filterable in /api/feature-ranges

Step 1

K-Selection & Clustering

Group neighborhoods by the metrics you care about. Choose your cluster count — it carries over to Ranking.

  • Custom NumPy K-Means with elbow + silhouette
  • Pick which features to cluster on
  • Borough-aware filtering and a CDTA choropleth
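A minimal sketch of what a NumPy K-Means with elbow and silhouette diagnostics could look like. The function names (`kmeans`, `silhouette`, `scan_k`), the random-row initialisation, and the empty-cluster handling are assumptions made for illustration, not the dashboard's actual implementation.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids, inertia)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialise centroids from k distinct rows of X.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and within-cluster sum of squares (inertia).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return labels, centroids, float(inertia)


def silhouette(X, labels):
    """Mean silhouette coefficient; singleton clusters score 0."""
    X = np.asarray(X, dtype=float)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        # Mean distance to the other members of the same cluster.
        a = dist[i, same].sum() / (same.sum() - 1)
        # Smallest mean distance to any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())


def scan_k(X, ks):
    """Inertia (for the elbow plot) and silhouette for each candidate k >= 2."""
    rows = []
    for k in ks:
        labels, _, inertia = kmeans(X, k)
        rows.append((k, inertia, silhouette(X, labels)))
    return rows
```

Plotting inertia against k gives the elbow curve, while the silhouette column offers a second opinion on the same candidates. Features on different scales would normally be standardised before clustering.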

Step 2

Ranking

Apply hard SQL filters, then blend semantic similarity with a competition penalty to rank neighborhoods for your site.

  • Hard filters via DuckDB SQL (transparent query)
  • Soft preferences via OpenAI / Supabase pgvector
  • Optional Claude analysis on the filtered set
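One way the hard filters could be turned into a transparent, parameterised query is sketched below. The table name `cdta_features`, the `borough` column, the example feature names, and the helper `build_filter_query` are all hypothetical. The sketch only builds the SQL text and its parameter list, using `?` placeholders of the kind DuckDB's prepared statements accept, and does not execute anything.

```python
import re

# Only plain identifiers are allowed as table or column names.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_filter_query(ranges, boroughs=None, table="cdta_features"):
    """Return (sql, params) for min/max range filters plus optional boroughs.

    ranges maps a column name to a (low, high) pair; either bound may be None.
    """
    if not _IDENT.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    clauses, params = [], []
    for column, (low, high) in ranges.items():
        if not _IDENT.match(column):
            raise ValueError(f"unsafe column name: {column!r}")
        if low is not None:
            clauses.append(f"{column} >= ?")
            params.append(low)
        if high is not None:
            clauses.append(f"{column} <= ?")
            params.append(high)
    if boroughs:
        placeholders = ", ".join("?" for _ in boroughs)
        clauses.append(f"borough IN ({placeholders})")
        params.extend(boroughs)
    where = " AND ".join(clauses) if clauses else "TRUE"
    return f"SELECT * FROM {table} WHERE {where}", params


# Hypothetical feature names, for illustration only.
sql, params = build_filter_query(
    {"median_rent": (None, 4000), "subway_stations": (2, None)},
    boroughs=["Brooklyn", "Queens"],
)
```

Keeping values out of the SQL string means the exact query text can be shown to the user while the values are still bound safely at execution time.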

How the pipeline fits together

Public datasets → engineered features → semantic profiles → blended ranking.

01 Ingest: MTA subway, DOT pedestrian counts, NYC storefront filings, NYS shooting incidents, ACS demographics.
02 Aggregate: Spatial join into 2020 CDTA polygons. Borough/citywide median imputation for missing rows.
03 Embed: Per-CDTA text profiles → 1536-d OpenAI embeddings, stored in Supabase with an HNSW cosine index.
04 Rank: Min-max scale [semantic, −competition] over the filtered set, then score each CDTA as α·semantic + (1−α)·competition, where both terms are the scaled values and competition enters negated, so heavier competition lowers the score.
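The Rank step can be sketched in a few lines of NumPy. This is an illustrative reading of the formula, not the dashboard's code: the names `min_max` and `blended_scores`, the default `alpha`, and the choice to map a constant column to zeros are assumptions.

```python
import numpy as np


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros (assumption)."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span


def blended_scores(semantic, competition, alpha=0.7):
    """alpha * scaled semantic + (1 - alpha) * scaled negated competition."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    sem = min_max(semantic)
    # Negating before scaling makes the least competitive CDTA score 1.
    comp = min_max(-np.asarray(competition, dtype=float))
    return alpha * sem + (1.0 - alpha) * comp


# Toy inputs for three filtered CDTAs (made-up numbers).
scores = blended_scores([0.2, 0.8, 0.5], [10, 30, 0], alpha=0.5)
ranking = np.argsort(-scores)  # indices from best to worst
```

Because scaling is applied to the filtered set only, the same CDTA can receive a different score when the hard filters change, which is worth keeping in mind when comparing runs.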