Commit Graph

3 Commits

Author SHA1 Message Date
kolaente 116fb1e2e0 fix(search): rank exact task-index match before BM25 text relevance on ParadeDB
The BM25 relevance ranking added `pdb.score(tasks.id)` to the search SELECT
and ORDER BY. ParadeDB can only compute a score for a pure-ParadeDB query
shape, so two cases produced "pq: Unsupported query shape":

1. A numeric search (e.g. "#17") OR's the ParadeDB `|||` operators with a
   plain `"index" = N` equality in the same boolean group. Scoring that mixed
   group is unsupported.
2. When favorites are in scope, the `project_id IN (...) OR id IN (<favorites
   subquery>)` predicate is unsupported under pdb.score regardless of how the
   subquery is expressed (OR or UNION) - it just was never exercised because
   the ranking tests searched a single project with no favorites.

Both are now handled so each query ParadeDB scores is a supported shape:

- Numeric search runs as two arms: an exact `index = N` arm (no score, ranked
  first) and a text `|||` arm scored by pdb.score DESC. The arms are merged in
  Go (index matches first, deduped by task id) and paginated in memory; the
  count query keeps the combined `OR index = N` predicate (no score), which is
  a supported shape, so totalItems stays correct.
- The relevance arms reach favorites through a LEFT JOIN and scope on the
  joined column (`rank_favorites.entity_id IS NOT NULL`) instead of an
  id-IN-subquery, which ParadeDB can score.

Non-numeric (pure text) searches keep the single pdb.score-ordered query.
Non-ParadeDB databases are unchanged (no pdb.score, no ranking).

TestTaskSearchRelevanceRankingNumericIndex covers the numeric case: on
ParadeDB the exact-index task ranks first, then text matches by relevance; on
other databases it only asserts the matches are returned.

Validated against the CI-pinned ParadeDB image (paradedb 0.21.12): the full
pkg/models and pkg/webtests suites pass, including
TestTaskCollection_ReadAll/search_for_task_index and the HTTP search tests.
2026-06-19 22:52:26 +02:00
kolaente 9fb0d86c1b feat(search): rank ParadeDB search results by BM25 relevance (#2690)
When ParadeDB is in use and a search is run, results now keep the current
fuzzy/OR matching but are ordered by BM25 relevance so tasks matching all
query words rank above tasks matching only some.

Details:
- ParadeDB exposes the BM25 score via pdb.score(<key_field>); Vikunja's
  key_field is id, so we order by pdb.score(tasks.id) DESC, then the
  existing order-by (ending in a stable tasks.id tiebreak).
- Gating: relevance ordering only applies when ParadeDB is available, a
  search term is present, AND the user did not pass an explicit sort_by.
  An explicit user sort still wins; relevance only replaces the default
  (id / position) sort.
- DISTINCT requires every ORDER BY expression to appear in the SELECT
  list, so pdb.score(tasks.id) is added to the selected columns too (for
  both the plain and task_positions-join query shapes). Because xorm's
  Distinct() quotes each column and corrupts the function call, the
  ranking path uses Select(rawColumns).Distinct() instead.
- ParadeDB-only by nature: pdb.score is invalid SQL on sqlite, mysql and
  plain postgres, so those paths are completely unchanged.

A test (TestTaskSearchRelevanceRanking) creates a task matching all query
words plus tasks matching only one, then searches a multi-word query. On
ParadeDB it asserts the all-words task ranks first; on other databases it
only asserts the matching tasks are returned, so it stays green across the
whole CI database matrix. The CI ParadeDB matrix entry exercises the
ranking assertion.

Follow-up (not in this change): boosting results where the words appear in
order / in close proximity above plain all-words matches.

Fixes #2690
2026-06-19 20:46:28 +02:00
kolaente 53264d350e
fix(kanban): make bucket query fixed per-view (#1007) 2025-06-25 11:38:24 +00:00