
NYC Commercial Intelligence

Decision support for commercial site selection across all 71 New York City CDTAs. Combine deterministic hard filters with natural-language soft preferences to surface the neighborhoods that best fit a site.

CDTAs: 71 Community District Tabulation Areas
Boroughs: 5
Numeric features: filterable via /api/feature-ranges

Step 1

K-Selection & Clustering

Group neighborhoods by the metrics you care about. The cluster count you choose carries over to Ranking.

Custom NumPy k-means, with elbow and silhouette diagnostics for choosing k
Pick which features to cluster on
Borough-aware filtering and a CDTA choropleth
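
The clustering step can be sketched as plain NumPy Lloyd's k-means, reporting inertia for an elbow plot and the mean silhouette coefficient for comparing candidate k. This is a minimal illustration under assumptions, not the dashboard's own code: the function names (`kmeans`, `silhouette`, `k_diagnostics`), random-row initialisation and convergence test are hypothetical, and features are assumed to be standardised already.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means on the rows of X. Returns (labels, centroids, inertia)."""
    rng = np.random.default_rng(seed)
    # Hypothetical init: k distinct rows picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid, shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties out
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # Inertia: total squared distance of points to their assigned centroid (elbow metric).
    inertia = float(d2[np.arange(len(X)), labels].sum())
    return labels, centroids, inertia


def silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distance; singletons score 0."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() <= 1:
            continue
        # a: mean distance to the other members of i's cluster (self-distance is 0).
        a = D[i, same].sum() / (same.sum() - 1)
        # b: mean distance to the nearest other cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())


def k_diagnostics(X, k_values, seed=0):
    """(k, inertia, mean silhouette) for each candidate k."""
    rows = []
    for k in k_values:
        labels, _, inertia = kmeans(X, k, seed=seed)
        rows.append((k, inertia, silhouette(X, labels)))
    return rows
```

One reasonable reading of "elbow + silhouette" is to look for the bend in the inertia curve and prefer the k with the higher silhouette among nearby candidates. The full n×n distance matrix is cheap at 71 CDTAs.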

Step 2

Ranking

Apply hard SQL filters, then blend semantic similarity with a competition penalty to rank neighborhoods for your site.

Hard filters via DuckDB SQL (the generated query is shown)
Soft preferences via OpenAI embeddings and Supabase pgvector
Optional Claude analysis on the filtered set
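
A hard-filter step like this can be sketched as a small query builder that keeps the SQL readable while binding values as parameters. Everything here is hypothetical: the table name `cdta_features`, the `borough` column, the feature names and the `{column: (low, high)}` input shape are assumptions for illustration, not the dashboard's schema.

```python
def build_filter_query(ranges, boroughs=None, allowed_columns=frozenset()):
    """Build a parameterised DuckDB query for hard range filters.

    ranges: {column: (low, high)}; either bound may be None.
    boroughs: optional list of borough names (hypothetical `borough` column).
    Column names cannot be bound as parameters, so they are checked against
    an allow-list (for example, the features listed by /api/feature-ranges).
    """
    clauses, params = [], []
    for col, (low, high) in sorted(ranges.items()):
        if col not in allowed_columns:
            raise ValueError(f"unknown feature: {col}")
        if low is not None:
            clauses.append(f'"{col}" >= ?')
            params.append(low)
        if high is not None:
            clauses.append(f'"{col}" <= ?')
            params.append(high)
    if boroughs:
        placeholders = ", ".join("?" for _ in boroughs)
        clauses.append(f"borough IN ({placeholders})")
        params.extend(boroughs)
    where = " AND ".join(clauses) if clauses else "TRUE"
    # Hypothetical table name; the SQL text itself is what a UI could display.
    sql = f"SELECT * FROM cdta_features WHERE {where}"
    return sql, params
```

The returned pair could be passed to DuckDB's Python API as `con.execute(sql, params)`. Showing `sql` next to the results is one way to keep the filter transparent, and the allow-list matters because placeholders can stand in only for values, not for column names.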

How the pipeline fits together

Public datasets → engineered features → semantic profiles → blended ranking.

Ingest

MTA subway, DOT pedestrian counts, NYC storefront filings, NYS shooting incidents, ACS demographics.

Aggregate

Spatially join records into the 2020 CDTA polygons. Missing values are imputed with the borough median, falling back to the citywide median.
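
The imputation rule can be read as: use the borough median when that borough has observed values, otherwise use the citywide median. A minimal pure-Python sketch of that reading follows. The fallback order, the row-dict layout and the `impute_medians` name are assumptions, and the spatial join itself is not shown.

```python
from statistics import median


def impute_medians(rows, feature):
    """Fill missing (None) values of `feature` in place.

    Borough median first; citywide median when the borough has no observed values.
    Medians are computed from the observed values only, before any filling.
    """
    observed = [r[feature] for r in rows if r[feature] is not None]
    citywide = median(observed) if observed else None
    by_borough = {}
    for r in rows:
        if r[feature] is not None:
            by_borough.setdefault(r["borough"], []).append(r[feature])
    for r in rows:
        if r[feature] is None:
            vals = by_borough.get(r["borough"])
            r[feature] = median(vals) if vals else citywide
    return rows
```

With pandas, the same rule fits in one expression: `df[col].fillna(df.groupby("borough")[col].transform("median")).fillna(df[col].median())`.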

Embed

Per-CDTA text profiles → 1536-dimensional OpenAI embeddings, stored in Supabase with an HNSW cosine index.
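
An HNSW index is an approximate nearest-neighbour structure. The exact ranking it approximates is a brute-force cosine similarity between a query embedding and every CDTA profile embedding, sketched below with NumPy. The function name and inputs are hypothetical, and the query embedding is assumed to come from the same OpenAI model as the profiles.

```python
import numpy as np


def cosine_top_k(query, matrix, ids, k=5):
    """Exact cosine-similarity search over per-CDTA embeddings.

    query: shape (d,); matrix: shape (n, d), one row per CDTA; ids: n labels.
    Returns the k most similar (id, similarity) pairs, best first.
    """
    q = query / np.linalg.norm(query)
    M = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = M @ q  # cosine similarity of each row with the query
    order = np.argsort(-sims)[:k]
    return [(ids[i], float(sims[i])) for i in order]
```

In pgvector, the `<=>` operator returns cosine distance (1 − cosine similarity), and an HNSW index built with `vector_cosine_ops` serves `ORDER BY embedding <=> query` searches. At 71 rows an exact scan like this one is also cheap.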

Rank

On the filtered set, scale s = MinMax(semantic) and c = MinMax(−competition), then score = α·s + (1−α)·c, so fewer competitors raise the score.
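
The blend can be sketched in plain Python as follows. The default `alpha=0.7`, the choice to map a constant column to 0 and the function names are assumptions. The inputs are per-CDTA values for the already-filtered set, in the same order.

```python
def minmax(values):
    """Scale values to [0, 1] over the current set; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: no spread, no contribution
    return [(v - lo) / (hi - lo) for v in values]


def blend_scores(semantic, competition, alpha=0.7):
    """score = alpha * MinMax(semantic) + (1 - alpha) * MinMax(-competition).

    Competition is negated before scaling, so fewer competitors -> higher score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    s = minmax(semantic)
    c = minmax([-x for x in competition])
    return [alpha * si + (1 - alpha) * ci for si, ci in zip(s, c)]
```

Because both terms are rescaled over the filtered set only, a CDTA's score can change when the hard filters change, even if its own features do not.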