Rank ParadeDB search results by BM25 relevance only for pure-text searches
over a plain project scope. Numeric searches (the `OR index = N` branch) and
the Favorites view (the `id IN (<subquery>)` scope) keep the default ordering
(unranked, as on main): pdb.score rejects both as unsupported query shapes, and
the contortions previously needed to score them (two-arm numeric merge with
in-memory pagination, a favorites LEFT JOIN) added far more complexity than the
ranking was worth. Neither path was ranked before this PR, so leaving them at
the default order is no regression.

The BM25 relevance ranking added `pdb.score(tasks.id)` to the search SELECT
and ORDER BY. ParadeDB can only compute a score for a pure-ParadeDB query
shape, so two cases produced "pq: Unsupported query shape":
1. A numeric search (e.g. "#17") ORs the ParadeDB `|||` operators with a
plain `"index" = N` equality in the same boolean group. Scoring that mixed
group is unsupported.
2. When favorites are in scope, the `project_id IN (...) OR id IN (<favorites
subquery>)` predicate is unsupported under pdb.score regardless of how the
subquery is combined (OR or UNION). This went unnoticed only because the
ranking tests searched a single project with no favorites.
Both are now handled so each query ParadeDB scores is a supported shape:
- Numeric search runs as two arms: an exact `index = N` arm (no score, ranked
first) and a text `|||` arm scored by pdb.score DESC. The arms are merged in
Go (index matches first, deduped by task id) and paginated in memory; the
count query keeps the combined `OR index = N` predicate (no score), which is
a supported shape, so totalItems stays correct.
- The relevance arms reach favorites through a LEFT JOIN and scope on the
joined column (`rank_favorites.entity_id IS NOT NULL`), a shape ParadeDB
can score, instead of an id-IN-subquery, which it cannot.
Non-numeric (pure text) searches keep the single pdb.score-ordered query.
Non-ParadeDB databases are unchanged (no pdb.score, no ranking).
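
The merge step of the two-arm numeric search described above can be sketched
roughly as follows. This is an illustration only, not the code in this change:
`Task`, `mergeAndPaginate`, and the 1-based `page` / `perPage` parameters are
hypothetical names, and the text arm is assumed to arrive already ordered by
pdb.score DESC.

```go
package main

import "fmt"

// Task is a hypothetical stand-in for the real task model.
type Task struct {
	ID    int64
	Title string
}

// mergeAndPaginate places exact index matches first, appends text matches
// (assumed to be pre-sorted by relevance), drops duplicates by task ID, and
// returns the requested 1-based page. perPage < 1 means "no pagination".
func mergeAndPaginate(indexMatches, textMatches []Task, page, perPage int) []Task {
	seen := make(map[int64]struct{}, len(indexMatches)+len(textMatches))
	merged := make([]Task, 0, len(indexMatches)+len(textMatches))
	for _, arm := range [][]Task{indexMatches, textMatches} {
		for _, t := range arm {
			if _, dup := seen[t.ID]; dup {
				continue // already emitted by the higher-priority arm
			}
			seen[t.ID] = struct{}{}
			merged = append(merged, t)
		}
	}
	if perPage < 1 {
		return merged
	}
	if page < 1 {
		page = 1
	}
	start := (page - 1) * perPage
	if start >= len(merged) {
		return []Task{}
	}
	end := start + perPage
	if end > len(merged) {
		end = len(merged)
	}
	return merged[start:end]
}

func main() {
	index := []Task{{ID: 17, Title: "task with index 17"}}
	text := []Task{{ID: 3, Title: "mentions 17"}, {ID: 17, Title: "task with index 17"}, {ID: 9, Title: "also 17"}}
	for _, t := range mergeAndPaginate(index, text, 1, 2) {
		fmt.Println(t.ID, t.Title)
	}
}
```

Paginating after the merge is what makes the in-memory step necessary: neither
arm alone knows which rows land on a given page once duplicates are removed.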
TestTaskSearchRelevanceRankingNumericIndex covers the numeric case: on
ParadeDB the exact-index task ranks first, then text matches by relevance; on
other databases it only asserts the matches are returned.
Validated against the CI-pinned ParadeDB image (paradedb 0.21.12): the full
pkg/models and pkg/webtests suites pass, including
TestTaskCollection_ReadAll/search_for_task_index and the HTTP search tests.

When ParadeDB is in use and a search is run, results now keep the current
fuzzy/OR matching but are ordered by BM25 relevance so tasks matching all
query words rank above tasks matching only some.
Details:
- ParadeDB exposes the BM25 score via pdb.score(<key_field>); Vikunja's
key_field is id, so we order by pdb.score(tasks.id) DESC, then the
existing order-by (ending in a stable tasks.id tiebreak).
- Gating: relevance ordering only applies when ParadeDB is available, a
search term is present, AND the user did not pass an explicit sort_by.
An explicit user sort still wins; relevance only replaces the default
(id / position) sort.
- DISTINCT requires every ORDER BY expression to appear in the SELECT
list, so pdb.score(tasks.id) is added to the selected columns too (for
both the plain and task_positions-join query shapes). Because xorm's
Distinct() quotes each column and corrupts the function call, the
ranking path uses Select(rawColumns).Distinct() instead.
- ParadeDB-only by nature: pdb.score is invalid SQL on sqlite, mysql and
plain postgres, so those paths are completely unchanged.
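
As a rough sketch of the gating and ordering rules listed above (the
`searchOpts` struct and both function names are hypothetical, not Vikunja's
actual types), the decision reduces to three conditions, and the score is
prepended to whatever order-by already exists:

```go
package main

import "fmt"

// searchOpts is a hypothetical bundle of the inputs the gating depends on.
type searchOpts struct {
	paradeDBAvailable bool   // the database is ParadeDB
	search            string // the user's search term, empty if none
	explicitSortBy    bool   // the user passed sort_by
}

// useRelevanceOrdering reports whether BM25 ordering should replace the
// default sort: ParadeDB available, a search term present, no explicit sort.
func useRelevanceOrdering(o searchOpts) bool {
	return o.paradeDBAvailable && o.search != "" && !o.explicitSortBy
}

// buildOrderBy prepends the score expression to the existing order-by so the
// existing clause (ending in the tasks.id tiebreak) still breaks score ties.
func buildOrderBy(o searchOpts, existing string) string {
	if !useRelevanceOrdering(o) {
		return existing
	}
	if existing == "" {
		return "pdb.score(tasks.id) DESC"
	}
	return "pdb.score(tasks.id) DESC, " + existing
}

func main() {
	ranked := searchOpts{paradeDBAvailable: true, search: "release notes"}
	sorted := searchOpts{paradeDBAvailable: true, search: "release notes", explicitSortBy: true}
	fmt.Println(buildOrderBy(ranked, "tasks.id ASC"))
	fmt.Println(buildOrderBy(sorted, "tasks.due_date DESC, tasks.id ASC"))
}
```

Keeping the existing order-by as the trailing part means results with equal
scores still come back in a deterministic order.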
A test (TestTaskSearchRelevanceRanking) creates a task matching all query
words plus tasks matching only one, then searches a multi-word query. On
ParadeDB it asserts the all-words task ranks first; on other databases it
only asserts the matching tasks are returned, so it stays green across the
whole CI database matrix. The CI ParadeDB matrix entry exercises the
ranking assertion.
Follow-up (not in this change): boosting results where the words appear in
order / in close proximity above plain all-words matches.
Fixes #2690