Use the distribution embedder setting to correct the returned _rankingScores of semantic hits with an affine transformation. Tuning distribution is useful when your chosen embedder consistently rates unrelated documents as “somewhat relevant”, making it hard for downstream code (or users) to tell truly good matches apart from noise.
Changing distribution does not trigger a reindexing operation. This makes it safe to iterate on, unlike most other embedder settings.
When to tune distribution
Different embedding models produce _rankingScore values on different effective ranges. Some models report high scores for nearly every document, while others spread their scores more evenly. Tuning distribution rescales these raw scores so that:
- Very relevant hits land near 1.
- Somewhat relevant hits land near 0.5.
- Irrelevant hits land near 0.
How distribution works
distribution is an optional field compatible with all embedder sources. It is an object with two fields, both numbers between 0 and 1:
| Field | Meaning |
|---|---|
| mean | The semantic score of “somewhat relevant” hits before applying the distribution setting. |
| sigma | The average absolute difference in _rankingScores between “very relevant” hits and “somewhat relevant” hits, and between “somewhat relevant” hits and “irrelevant” hits. |
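
To make the affine transformation concrete, here is a minimal sketch of one mapping that matches the tiers described on this page: a raw score equal to mean lands on 0.5, and scores one sigma above or below land near 1 and 0. The function name rescale_score, the target_mean and target_spread parameters, and the clamping to the range 0 to 1 are assumptions made for illustration. The constants Meilisearch uses internally may differ.

```python
def rescale_score(raw_score, mean, sigma, target_mean=0.5, target_spread=0.5):
    """Illustrative affine rescaling; not Meilisearch's actual implementation.

    Maps a raw score equal to `mean` onto `target_mean`, and a raw distance
    of one `sigma` onto `target_spread`.
    """
    if sigma <= 0:
        raise ValueError("sigma must be greater than 0")
    factor = target_spread / sigma        # slope of the affine map
    offset = target_mean - factor * mean  # intercept so that mean -> target_mean
    shifted = factor * raw_score + offset
    # Assumption: clamp so the result stays a valid score between 0 and 1.
    return min(1.0, max(0.0, shifted))


# Hypothetical embedder whose raw scores cluster between 0.6 and 0.8.
for raw in (0.6, 0.7, 0.8):
    print(raw, "->", round(rescale_score(raw, mean=0.7, sigma=0.1), 3))
```

With a narrow raw range such as this one, a small sigma produces a steep slope, which is what pulls closely packed scores apart.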
Tuning workflow
Configuring distribution requires a certain amount of trial and error. In practice:
- Run representative semantic searches against your index with showRankingScore: true.
- Note the _rankingScores of hits you consider “very relevant”, “somewhat relevant”, and “irrelevant”.
- Record the observed mean (the score of your “somewhat relevant” hits) and sigma (the average distance between your relevance tiers).
- Update your embedder with the new distribution values.
- Re-run the searches and check that top hits now score near 1, borderline hits near 0.5, and poor hits near 0.
- Repeat until the scores match your expectations.
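
The recording step can be sketched in a few lines, assuming you have already sorted the observed _rankingScores into three hand-labelled tiers. The function name estimate_distribution and the sample scores are hypothetical. The calculation follows the field definitions given earlier: mean is the average score of the “somewhat relevant” tier, and sigma is the average absolute gap between neighbouring tiers.

```python
from statistics import mean as average


def estimate_distribution(very_relevant, somewhat_relevant, irrelevant):
    """Estimate starting values for `mean` and `sigma` from labelled raw scores."""
    high = average(very_relevant)
    mid = average(somewhat_relevant)
    low = average(irrelevant)
    # Average absolute distance between neighbouring relevance tiers.
    sigma = (abs(high - mid) + abs(mid - low)) / 2
    return {"mean": mid, "sigma": sigma}


# Hypothetical scores collected from searches run with showRankingScore: true.
observed = estimate_distribution(
    very_relevant=[0.80, 0.84],
    somewhat_relevant=[0.68, 0.72],
    irrelevant=[0.56, 0.60],
)
print(observed)
```

Treat the result as a starting point only. The workflow still expects you to re-run the searches and adjust the values by hand.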
Example configuration
With mean set to 0.7, raw semantic scores around 0.7 are treated as “somewhat relevant”, and scores are spread out so that truly good matches move toward 1 while weak matches drift toward 0.
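
As a rough illustration, the snippet below builds a settings fragment with a mean of 0.7. The embedder name default, the sigma value of 0.3, and the helper distribution_settings are placeholders chosen for this sketch, not values taken from this page. The other embedder fields (such as its source) are omitted because they depend on your setup, and how the payload is sent depends on the client you use.

```python
import json


def distribution_settings(embedder_name, mean, sigma):
    """Build a settings fragment holding only the distribution of one embedder."""
    # Both fields are documented as numbers between 0 and 1.
    for label, value in (("mean", mean), ("sigma", sigma)):
        if not 0 <= value <= 1:
            raise ValueError(f"{label} must be between 0 and 1, got {value}")
    return {embedder_name: {"distribution": {"mean": mean, "sigma": sigma}}}


# "default" and sigma=0.3 are illustrative placeholders.
payload = distribution_settings("default", mean=0.7, sigma=0.3)
print(json.dumps(payload, indent=2))
```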
Next steps
Choose an embedder
Compare embedding providers and pick the right one for your use case.
Custom hybrid ranking
Tune semanticRatio to balance keyword and semantic results.