I wanted to score healthcare quality for 100+ countries.
Three minutes of searching returned WHO Global Health Observatory data, Numbeo's Healthcare Index, the Healthcare Access and Quality Index from the Global Burden of Disease study, and several pages of expat forum threads where people described their experiences in Portugal, Thailand, and Mexico with equal confidence and completely contradictory conclusions.
The sources disagreed. Some were five years old. Most were country-level only. Several required a commercial license to access programmatically. None answered the question a retiree actually has: will an English-speaking doctor be available when I need one, and what will it cost out of pocket?
In a bigger organization, this is where a researcher hands findings to a Product Manager and they decide together what "good enough" looks like for v1. Solo, I played both roles. The gap between the data that exists and the answer a real person needs is the central problem in building WhereToAdvisor. Finding the data was the easy part. Everything after that was the work.
What the data actually covers
The data sources
| Facet | Description | Data Sources | Update Frequency |
|---|---|---|---|
| Governance | Rule of law, corruption levels, political stability, and democratic freedoms | Transparency International CPI; Freedom House Freedom in the World | Annual |
| Economics | Cost of living, income potential, tax burden, and economic stability | Numbeo Cost of Living Index; Heritage Foundation Economic Freedom Index; World Bank (Gini, GDP PPP) | Quarterly (Numbeo); Annual (others) |
| Safety | Crime rates, personal security, natural disaster risk, and political stability | UNODC Homicide Statistics; World Bank WGI Political Stability; Global Peace Index | Annual |
| Health | Healthcare quality, hospital access, insurance options, and life expectancy | WHO/GBD Healthcare Access & Quality Index; World Bank health expenditure data; IQAir PM2.5 | Every 2–3 years (HAQ); Annual (others) |
| Education | School quality, university access, literacy rates, and STEM investment | OECD PISA (math, reading, science scores for 15-year-olds) | Every 3 years |
| Culture | Language accessibility, expat community size, internet speed, and lifestyle fit | EF English Proficiency Index; Ookla Speedtest Global Index | Annual (EF EPI); Monthly (Ookla) |
| Mobility | Visa options, passport strength, travel freedom, and residency pathways | Henley Passport Index; custom visa pathway dataset compiled from government immigration portals | Quarterly (Henley); Annual (visa pathways) |
| Acceptance & Inclusion | LGBTQ+ protections, racial inclusion, gender equality, and expat friendliness | Social Progress Index (Tolerance & Inclusion); ILGA World + Equaldex LGBTQ+ Rights Index | Annual |
I sequenced the facets by data confidence, not by user importance. That's a deliberate product management call. Start where you can ship clean, learn the normalization process, then take on the harder facets with a working system behind you.
Governance was first because the data is the strongest. Transparency International's Corruption Perceptions Index, the World Justice Project Rule of Law Index, Freedom House's Freedom in the World, Reporters Without Borders Press Freedom Index. Well-maintained, methodologically sound, updated on reliable cycles. If you want to score democratic participation and rule of law, the data is there and it is good.
Economics required a purchase decision. Numbeo is the best city-level cost of living source available, so I bought the commercial API license. The World Bank covers macroeconomics well but does not answer what a remote worker wants to know: what does a two-bedroom apartment cost in Lisbon versus Medellin, and what is the realistic grocery bill for a family of four? The spec said city-level. Numbeo was the only answer. The purchase was straightforward.
Safety is where I discovered a spec gap mid-build. Country-level homicide statistics from UNODC are reliable. The moment I needed to differentiate between Mexico City neighborhoods and the national Mexico rate, the data got sparse. A user evaluating Mexico City does not need Mexico's national homicide rate. They need Roma Norte versus Iztapalapa. I made the call to ship with country-level data and flag it, rather than delay for city-level coverage that does not yet exist cleanly across 100+ destinations. That is an explicit v1.1 item.
Health is where the data gaps are widest. WHO data covers life expectancy, disease burden, and system capacity at a macro level. It tells you almost nothing about wait times, out-of-pocket costs, English-language availability, or whether a specific specialty is accessible without traveling to the capital. I knew this going in. The PM decision was to ship the macro picture honestly rather than pretend the specific picture exists.
Acceptance was the hardest to build. The data exists but is scattered across the World Values Survey, ILGA World's annual report, Gallup World Poll city-level tolerance questions, the Social Progress Index, and the Georgetown Women Peace and Security Index. No single source covers the full picture. Building the acceptance facet meant assembling those pieces and making explicit decisions about how they fit together. Those decisions are the subject of Post 3.
The normalization problem
Every source uses a different scale, a different methodology, and a different update cycle.
Transparency International scores countries 0-100. The World Justice Project uses 0-1. ILGA World produces categorical legal ratings, not numeric scores. The HAQ Index also uses 0-100, but it is calibrated against a different baseline than Numbeo's Healthcare Index. Some sources update monthly. Some update annually. The World Values Survey runs on a multi-year cycle.
Normalizing everything to a consistent 0-100 scale is not a technical problem. It is a product decision with downstream consequences. Every normalization choice introduces bias, and I am the only one accountable for those choices.
One concrete example. The HAQ Index scores countries on healthcare access and quality with a theoretical range of 0-100. In practice, the top score in the current dataset is around 97 (Iceland) and the lowest is around 18. A naive normalization mapping 18-97 to 0-100 awards Iceland a perfect 100 the underlying data never claims, and stretches mid-range differences beyond what the source supports. I cap the top performer at 99, scale everything else proportionally, and document the method. That decision shapes every health score in the product. There is no neutral choice. The methodology page exists to make those choices visible. In a normal product org, this is where you bring in a data scientist and a PM to align before anything ships. I wrote the spec, made the calls, and documented them as I went.
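The capping approach can be sketched as a small function. This is a minimal illustration of the method described above, not the production code; the function name, the edge-case handling, and the sample values (drawn from the ~18-97 range mentioned) are my own.

```python
def normalize_capped(scores: dict[str, float], cap: float = 99.0) -> dict[str, float]:
    """Rescale observed scores so the top performer lands at `cap`
    rather than a 'perfect' 100, with everything else proportional."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # degenerate case: every country scored identically
        return {k: cap / 2 for k in scores}
    return {k: (v - lo) / (hi - lo) * cap for k, v in scores.items()}

# Illustrative values only, spanning the observed ~18-97 HAQ range
haq = {"Iceland": 97.0, "Portugal": 57.5, "Somalia": 18.0}
normalized = normalize_capped(haq)
# The top performer maps to 99.0, not 100; the rest scale proportionally below it
```

The cap is cosmetic arithmetic but a real product signal: no country is displayed as flawless, which keeps the score honest about what the source can support.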
The coverage problem
Popular expat destinations have good data. Bangkok, Lisbon, Medellin, Prague: well-covered across most facets. Move further down the list and the gaps appear. City-level safety data for Tbilisi. English-proficiency survey data for smaller Latin American cities. Rental discrimination reports outside the top 50 expat markets.
The PM instinct here is to either delay shipping until coverage is complete or paper over the gaps with interpolation. Both are wrong. I built transparent coverage indicators instead. Every metric displays its data source and the date last updated. Where data is thin, the product says so. Known gaps, labeled honestly, beat a complete-looking product built on guesswork. Shipping with visible limitations is a credibility decision as much as a product decision.
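One way to make that transparency concrete is to attach provenance to every stored metric rather than bolting it on in the UI. A minimal sketch, with hypothetical field names and an illustrative value, not WhereToAdvisor's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Metric:
    """A scored metric plus the provenance the product surfaces alongside it."""
    value: Optional[float]       # None when no usable data exists yet
    source: str                  # e.g. "UNODC Homicide Statistics"
    last_updated: str            # release date of the underlying source
    resolution: str              # "country" or "city"
    note: Optional[str] = None   # shown to users when coverage is thin

safety_tbilisi = Metric(
    value=71.3,  # illustrative number, not a real score
    source="UNODC Homicide Statistics",
    last_updated="2023",
    resolution="country",
    note="Country-level only; city-level coverage is a v1.1 item",
)
```

Because the coverage caveat lives in the data model, every surface that renders the score renders the limitation with it, instead of depending on each page to remember the disclaimer.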
The commercial license problem
Some of the best data is paywalled.
Numbeo's commercial API license was a required purchase. Global Property Guide, which provides transaction cost and foreign ownership data for the real estate scoring, requires a paid subscription for programmatic access. Ethnologue, the most comprehensive language data source, costs $480 per year, with dataset access priced higher still.
Solo builders face these tradeoffs without a procurement process or a budget owner. I made each call against a simple test: does this data materially change the user's answer, and is there a free alternative good enough to get to MVP?
For language data, the answer was yes. I used Glottolog, the Max Planck Institute's open-access language catalog, combined with the CIA World Factbook archive. The Factbook was discontinued in February 2026, but the Mozilla Data Collective published a machine-readable JSON archive of the final snapshot. The archive covers 260 entities, is public domain, and gets me through MVP. Ethnologue is a v1.1 decision.
Every v1.1 item I named during the build went into the spec as a logged decision, not a forgotten compromise. That discipline is what keeps technical debt from becoming invisible.
What AI helped with, and what it did not
This is the part most AI development posts get wrong.
AI did not make the methodology decisions. Which sources to trust, how to normalize conflicting scales, what to do when two authoritative sources disagree: those required judgment and are documented on the methodology page. That is PM work. It does not compress.
AI handled execution once I made those decisions. Data pipeline code. Normalization functions. Supabase ingestion scripts. Debugging when a source changed its format. Cross-referencing outputs when a score looked wrong. The pipeline queries 18 sources, normalizes the outputs, and loads them into the database. AI wrote the code against a spec I provided. A team would have needed days. I did it in hours.
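One representative pipeline stage, sketched in Python: take a source payload on its native scale, rescale it to 0-100, and shape rows for the database upsert. The names and payload shapes here are simplified assumptions, and the actual Supabase call is left as a comment.

```python
def to_rows(source: str, payload: dict[str, float], scale_max: float) -> list[dict]:
    """Rescale one source's native-scale scores to 0-100 and shape them
    into rows ready for upsert. One stage of a larger pipeline."""
    return [
        {"country": country, "source": source,
         "score": round(value / scale_max * 100, 1)}
        for country, value in payload.items()
    ]

# The World Justice Project publishes on a 0-1 scale; rescale before loading
rows = to_rows("WJP Rule of Law", {"Denmark": 0.90, "Portugal": 0.70}, scale_max=1.0)
# rows is now ready for e.g. supabase.table("scores").upsert(rows).execute()
```

Carrying the source name into every row is what makes the per-metric provenance display possible downstream.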
The spec is what made that possible. Vague direction produces vague output. A detailed spec with documented decisions gives AI something real to execute against. That is not a new idea. It is just more consequential now.
The methodology is the product. AI built the plumbing.
Next in the series:
The Ethics Problem. The same data that helps a mixed-race family find safety could, in principle, steer someone toward the opposite. Here is how the architecture prevents that, and why a policy document is not enough.
For the "why" behind the product, start with Post 1: Why I Built This. The full data source inventory is at wheretoadvisor.com/sources.