The BM25 relevance ranking added `pdb.score(tasks.id)` to the search SELECT
and ORDER BY. ParadeDB can only compute a score for a pure-ParadeDB query
shape, so two cases produced "pq: Unsupported query shape":
1. A numeric search (e.g. "#17") ORs the ParadeDB `|||` operators with a
plain `"index" = N` equality in the same boolean group. Scoring that mixed
group is unsupported.
2. When favorites are in scope, the `project_id IN (...) OR id IN (<favorites
subquery>)` predicate is unsupported under pdb.score regardless of how the
subquery is expressed (OR or UNION). This went unnoticed only because the
ranking tests searched a single project with no favorites.
Both are now handled so each query ParadeDB scores is a supported shape:
- Numeric search runs as two arms: an exact `index = N` arm (no score, ranked
first) and a text `|||` arm scored by pdb.score DESC. The arms are merged in
Go (index matches first, deduped by task id) and paginated in memory; the
count query keeps the combined `OR index = N` predicate (no score), which is
a supported shape, so totalItems stays correct.
- The relevance arms reach favorites through a LEFT JOIN and scope on the
joined column (`rank_favorites.entity_id IS NOT NULL`) instead of an
id-IN-subquery; ParadeDB can score the join-based shape.
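The in-Go merge of the two numeric-search arms can be sketched roughly as
below. This is an illustration, not the code in this change: `rankedTask`,
`mergeArms` and `paginate` are hypothetical names, and the sketch assumes
each arm arrives already ordered (index matches as returned, text matches by
pdb.score DESC) and that pages are 1-based.

```go
package main

// rankedTask is a hypothetical stand-in for the task model; only the ID is
// needed for deduplication.
type rankedTask struct {
	ID    int64
	Title string
}

// mergeArms places exact-index matches first, then appends text matches in
// the order they were given, skipping any task ID that was already added.
func mergeArms(indexMatches, textMatches []rankedTask) []rankedTask {
	seen := make(map[int64]struct{}, len(indexMatches)+len(textMatches))
	merged := make([]rankedTask, 0, len(indexMatches)+len(textMatches))
	for _, arm := range [][]rankedTask{indexMatches, textMatches} {
		for _, t := range arm {
			if _, dup := seen[t.ID]; dup {
				continue
			}
			seen[t.ID] = struct{}{}
			merged = append(merged, t)
		}
	}
	return merged
}

// paginate returns the requested 1-based page of the merged slice. It returns
// an empty slice when the page starts past the end, and the input unchanged
// when page or perPage is not positive (assumed to mean "no pagination").
func paginate(tasks []rankedTask, page, perPage int) []rankedTask {
	if page < 1 || perPage < 1 {
		return tasks
	}
	start := (page - 1) * perPage
	if start >= len(tasks) {
		return []rankedTask{}
	}
	end := start + perPage
	if end > len(tasks) {
		end = len(tasks)
	}
	return tasks[start:end]
}
```

Because the page is cut from the merged slice, both arms have to be fetched
in full before paginating; the separate count query is what keeps totalItems
independent of this in-memory step.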
Non-numeric (pure text) searches keep the single pdb.score-ordered query.
Non-ParadeDB databases are unchanged (no pdb.score, no ranking).
TestTaskSearchRelevanceRankingNumericIndex covers the numeric case: on
ParadeDB the exact-index task ranks first, then text matches by relevance; on
other databases it only asserts the matches are returned.
Validated against the CI-pinned ParadeDB image (paradedb 0.21.12): the full
pkg/models and pkg/webtests suites pass, including
TestTaskCollection_ReadAll/search_for_task_index and the HTTP search tests.
When ParadeDB is in use and a search is run, results now keep the current
fuzzy/OR matching but are ordered by BM25 relevance so tasks matching all
query words rank above tasks matching only some.
Details:
- ParadeDB exposes the BM25 score via pdb.score(<key_field>); Vikunja's
key_field is id, so we order by pdb.score(tasks.id) DESC, then the
existing order-by (ending in a stable tasks.id tiebreak).
- Gating: relevance ordering only applies when ParadeDB is available, a
search term is present, AND the user did not pass an explicit sort_by.
An explicit user sort still wins; relevance only replaces the default
(id / position) sort.
- DISTINCT requires every ORDER BY expression to appear in the SELECT
list, so pdb.score(tasks.id) is added to the selected columns too (for
both the plain and task_positions-join query shapes). Because xorm's
Distinct() quotes each column and corrupts the function call, the
ranking path uses Select(rawColumns).Distinct() instead.
- ParadeDB-only by nature: pdb.score is invalid SQL on sqlite, mysql and
plain postgres, so those paths are completely unchanged.
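A minimal sketch of the gating rule described above, under stated
assumptions: `searchOptions`, `useRelevanceOrdering` and `orderByClause` are
hypothetical names for illustration, and the real change builds its query
through xorm rather than by concatenating strings.

```go
package main

// searchOptions is a hypothetical bundle of the three inputs the gating
// rule depends on.
type searchOptions struct {
	paradeDBAvailable bool     // true only when the database is ParadeDB
	search            string   // the user's search term, empty if none
	sortBy            []string // explicit sort_by values from the request
}

// useRelevanceOrdering reports whether BM25 ordering should apply: ParadeDB
// is available, a search term is present, and no explicit sort was given.
func useRelevanceOrdering(o searchOptions) bool {
	return o.paradeDBAvailable && o.search != "" && len(o.sortBy) == 0
}

// orderByClause puts the score expression in front of the existing order-by
// when the gate is open, and otherwise returns the existing order-by as is.
func orderByClause(o searchOptions, existingOrder string) string {
	if useRelevanceOrdering(o) {
		return "pdb.score(tasks.id) DESC, " + existingOrder
	}
	return existingOrder
}
```

Keeping the existing order-by after the score means ties in relevance still
fall back to the stable tasks.id tiebreak.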
A test (TestTaskSearchRelevanceRanking) creates a task matching all query
words plus tasks matching only one, then searches a multi-word query. On
ParadeDB it asserts the all-words task ranks first; on other databases it
only asserts the matching tasks are returned, so it stays green across the
whole CI database matrix. The CI ParadeDB matrix entry exercises the
ranking assertion.
Follow-up (not in this change): boosting results where the words appear in
order / in close proximity above plain all-words matches.
Fixes #2690