Engineering · May 25, 2026

Our Most Important Metric Is 0%, and We're Keeping It That Way

We launched a price tracker six months ago. Our headline trust metric — the share of products with a high-confidence price estimate — is 0%. Here's why we're holding the bar.

We launched a price tracker for Canadian consumer electronics six months ago. Our headline trust metric — the share of products in our catalog whose price estimate is high-confidence — is 0%.

We're keeping it there.

What the Metric Measures

Every item in our catalog gets a worth estimate: a synthesized number representing what we think the item is currently worth, given whatever observations we have. Each estimate carries a confidence score in [0, 1]. The score is a product of four factors:

How many independent retailer observations we have (f_count)
How fresh the freshest observation is (f_recency)
How tightly the observed prices agree (f_agreement)
How trustworthy the source tier is (f_source)
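A minimal sketch of that product in Python. The factor names come from the list above; the function name and the range check are assumptions for illustration, and the curves that turn raw counts, ages, and price spreads into factor values are not shown.

```python
from math import prod


def confidence(f_count: float, f_recency: float,
               f_agreement: float, f_source: float) -> float:
    """Combine four factors, each assumed to lie in [0, 1], into one score."""
    factors = (f_count, f_recency, f_agreement, f_source)
    for f in factors:
        if not 0.0 <= f <= 1.0:
            raise ValueError(f"factor out of range: {f}")
    # Multiplying means one weak factor caps the whole score:
    # the result can never exceed the smallest factor.
    return prod(factors)


# Hypothetical factor values, chosen only to show the capping behaviour.
score = confidence(f_count=0.5, f_recency=1.0, f_agreement=1.0, f_source=1.0)
print(score)  # 0.5
```

The multiplicative form is what makes a thin observation count decisive: perfect recency, agreement, and source quality cannot lift a score above whatever f_count allows.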

Above 0.75, an estimate is high-confidence. We publish it with the strongest framing the site allows: a defensible median, a tight confidence band, structured data emitted as AggregateOffer. Below 0.55, the framing softens. Below 0.35, we publish no estimate at all — the page shows identity and specs only, honestly labelled.
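The thresholds above can be sketched as a small lookup. The three cutoffs are the ones stated in this post; the band names, the label for the unnamed band between 0.55 and 0.75, and the treatment of scores sitting exactly on a cutoff are assumptions made for illustration.

```python
HIGH_CONFIDENCE = 0.75  # "above 0.75" is high-confidence
SOFTENED_BELOW = 0.55   # below this, the framing softens
PUBLISH_FLOOR = 0.35    # below this, no estimate is published


def framing(score: float) -> str:
    """Map a confidence score in [0, 1] to a (hypothetically named) framing."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score < PUBLISH_FLOOR:
        return "none"      # identity and specs only
    if score < SOFTENED_BELOW:
        return "softened"
    if score <= HIGH_CONFIDENCE:
        return "standard"  # assumption: the post does not name this band
    return "high"


print(framing(0.50))  # softened
print(framing(0.80))  # high
```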

Two numbers fall out of this for any vertical we ship:

Worth-tier coverage — the share of products clearing the publishable floor on any source tier
High-confidence coverage — the share clearing 0.75
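As a rough sketch, both numbers are simple shares over the catalog's confidence scores. This assumes the publishable floor is the 0.35 cutoff and that "clearing 0.75" means strictly above it; the function names and the sample scores are made up for illustration and are not real catalog data.

```python
PUBLISH_FLOOR = 0.35
HIGH_CONFIDENCE = 0.75


def worth_tier_coverage(scores: list[float]) -> float:
    """Share of products whose score reaches the publishable floor."""
    if not scores:
        return 0.0
    return sum(s >= PUBLISH_FLOOR for s in scores) / len(scores)


def high_confidence_coverage(scores: list[float]) -> float:
    """Share of products whose score is above the high-confidence bar."""
    if not scores:
        return 0.0
    return sum(s > HIGH_CONFIDENCE for s in scores) / len(scores)


# Illustrative scores only: nothing exceeds 0.50, as with N=2 products.
catalog = [0.50, 0.48, 0.30, 0.41, 0.10]
print(worth_tier_coverage(catalog))       # 0.6
print(high_confidence_coverage(catalog))  # 0.0
```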

The first is encouraging. The second is 0%.

Why 0%

The honest answer is: Canadian electronics retail is too thin.

We integrate four retailers. They don't carry the same SKUs: each covers a different slice of the market, and the overlap between any two of them is small. In practice, even our well-covered products see two retailer observations, not three. A product with N=2 fresh, perfectly agreeing observations tops out at a confidence of about 0.50 under our scoring, well below the 0.75 threshold.

The first time we ran the engine against the live catalog, no product anywhere reached N≥3. Zero. The best-populated tier was tracked (N=2).

This was a fork. We could have recalibrated, lowering the f_count curve and dropping the threshold from 0.75 to 0.55, and high-confidence coverage would have jumped to something pretty for the dashboard. Or we could keep the bar and accept 0%.

We Kept the Bar

The reasoning is statistical, not philosophical: two retailers agreeing tells you they agree, not that the price is right. They might both carry one distributor's price. They might both be running the same vendor-pushed sale. Two sources cannot distinguish agreement from coincidence. Three can.

Lowering high-confidence to mean "two stores" would soften the one number that is supposed to be hard. The whole point of a confidence score is to tell the truth about what we know. A confidence score that makes the dashboard look good has stopped being a confidence score.

0% high-confidence coverage isn't a bug. It's the engine correctly reporting that the data we have today isn't enough to populate a real high-confidence tier in this category. That's information, not a failure mode.

What It Will Take to Get Off 0%

If recalibration is off the table, the only path to non-zero high-confidence coverage is more and better data. The worth engine has a tier ladder beyond live retail:

W1 — live retailer observations (today: four retailers, the thin layer)
W2 — decayed historical retail (today: the same data with older observations)
W3 — secondary-market comps from eBay's official API and domain-specific partnerships (not shipped)
W4 — modelled estimates from siblings and depreciation curves (deferred)
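One way to picture the ladder is as a small ordered table of tiers with a shipping status. This is only a sketch: the class and field names are hypothetical, and marking W2 as live is an assumption based on the "today" note in the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    LIVE = "live"
    NOT_SHIPPED = "not shipped"
    DEFERRED = "deferred"


@dataclass(frozen=True)
class SourceTier:
    code: str
    description: str
    status: Status


# Ordered from strongest evidence (W1) to most modelled (W4).
LADDER = (
    SourceTier("W1", "live retailer observations", Status.LIVE),
    SourceTier("W2", "decayed historical retail", Status.LIVE),  # assumed live
    SourceTier("W3", "secondary-market comps", Status.NOT_SHIPPED),
    SourceTier("W4", "modelled estimates", Status.DEFERRED),
)


def live_tiers(ladder: tuple[SourceTier, ...]) -> list[str]:
    """Return the codes of tiers that can contribute observations today."""
    return [tier.code for tier in ladder if tier.status is Status.LIVE]


print(live_tiers(LADDER))  # ['W1', 'W2']
```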

W3 is where high-confidence coverage becomes real. eBay's official API doesn't return two or three comps for an item — it returns dozens of recent sales. At that density, N≥3 is genuinely the floor of the distribution, not the unreachable ceiling. The well-tracked tier populates honestly.

This is months out. It's gated on phase, capital, partnerships, and the catalog credibility we're building right now. It is not gated on parameter tweaks. We could not have made 0% into 80% with a config change, and you should be deeply suspicious of any engineering team that could.

The Anti-Gaming Principle

The general rule applies beyond pricing.

Every metric you publish to the world — coverage percentage, confidence score, trust signal, freshness indicator — gives you two ways to make the number go up. You can do the underlying work, or you can tune the definition. The second one is cheaper, faster, and usually invisible to the people reading the dashboard.

The discipline isn't refusing to ever tune a definition. Definitions evolve when they're wrong. The discipline is: a tuned definition is only legitimate if it tracks a real property of the data. If the only effect of a parameter change is making your vertical look greener, the change is rejected. Not because it's dishonest in the moment, but because the metric loses its meaning over time, and every soft definition compounds into a dashboard nobody trusts.

For a canonical reference catalog, this matters structurally. The trust property is the moat. The moment a published metric is gameable, every other claim the catalog makes becomes suspect by association. The cheapest thing to lose is also the most expensive thing to rebuild.

Honest Labelling, Everywhere

The 0% number is the catalog-wide manifestation of a principle that runs through every entity page on the site. Each product gets one of five coverage labels:

Well-tracked (N≥3 fresh observations) — comparison page fully populated, median price defensible
Tracked (N=2) — best-price claim caveated
Single-source (N=1) — the price shows, but the "Lowest current" stat tile drops and the emerald comparison-styling goes with it
Historical — last known retail price with a "no current retail availability" amber callout
Encyclopedic only — identity, specs, description; no worth claim until a source populates
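The five labels above can be sketched as a single decision function. The function name and parameters are hypothetical, deciding what counts as a "fresh" observation is assumed to happen upstream, and the sketch assumes N counts fresh observations for every label, which the list states explicitly only for well-tracked.

```python
def coverage_label(fresh_observations: int,
                   has_historical_price: bool = False) -> str:
    """Pick one of the five coverage labels for a product."""
    if fresh_observations < 0:
        raise ValueError("fresh_observations must be non-negative")
    if fresh_observations >= 3:
        return "well-tracked"
    if fresh_observations == 2:
        return "tracked"
    if fresh_observations == 1:
        return "single-source"
    # No current retail observations: fall back to the last known price.
    if has_historical_price:
        return "historical"
    return "encyclopedic only"


print(coverage_label(2))                             # tracked
print(coverage_label(0, has_historical_price=True))  # historical
```

Checking the observation count first means a product with any current price never falls back to the historical label.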

The structured data we emit follows the same partition. AggregateOffer for well-tracked. A single Offer for single-source. Product without offers for historical and encyclopedic. The schema honesty matches the visible-page honesty, because LLM grounding pipelines and human readers should reach the same conclusion about what we do and don't know.
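A hedged sketch of that partition as JSON-LD, built with plain dictionaries. The schema.org types and properties (Product, Offer, AggregateOffer, lowPrice, highPrice, offerCount, priceCurrency) are standard; the function name, the label strings, and the CAD default are assumptions. The paragraph above does not say what a tracked (N=2) product emits, so the sketch refuses to guess.

```python
import json


def structured_data(name: str, label: str, prices: list[float],
                    currency: str = "CAD") -> dict:
    """Build a schema.org Product whose offers match the coverage label."""
    doc = {"@context": "https://schema.org", "@type": "Product", "name": name}
    if label == "well-tracked":
        doc["offers"] = {
            "@type": "AggregateOffer",
            "lowPrice": min(prices),
            "highPrice": max(prices),
            "offerCount": len(prices),
            "priceCurrency": currency,
        }
    elif label == "single-source":
        doc["offers"] = {
            "@type": "Offer",
            "price": prices[0],
            "priceCurrency": currency,
        }
    elif label in ("historical", "encyclopedic only"):
        pass  # Product without offers: no current price claim is made
    else:
        # Includes "tracked": its markup is not specified in this post.
        raise ValueError(f"no markup rule for label: {label}")
    return doc


# Illustrative product and price, not real data.
print(json.dumps(structured_data("Example Headphones", "single-source",
                                 [199.99]), indent=2))
```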

A misleading "best price" claim on a single-source SKU is a trust break, even if every other criterion is satisfied. The label is the spec.

The Harder Thing to Build

Most price comparison sites publish confidence the source data doesn't support. They have to — their business model is the click. A page that says "we have one retailer's price and we don't really know if it's a fair one" doesn't drive conversions, doesn't display well on a comparison grid, doesn't make money on the existing model.

We took a different path. No display ads, no sponsored placements, no paid ranking — not because we hold a moral position, but because a canonical reference can't stay canonical any other way. Wikipedia, Wikidata, OpenStreetMap, MusicBrainz all hold that line. The trust that makes them the citation source erodes the moment they accept the cheaper revenue model. We think the same is true for a canonical registry of physical things — and not just for human readers. For grounded AI systems, source independence is becoming an explicit procurement criterion.

That choice forces a harder thing to build. The honest dashboard is a slower dashboard. The hard confidence threshold is a smaller "covered" number. The published 0% is the cost of admission to being something worth citing in five years.

We think it's the only number worth keeping.

More on the catalog and how to access it: /for-llms.