vikunja

Commit Graph


Author

SHA1

Message

Date

kolaente

116fb1e2e0

fix(search): rank exact task-index match before BM25 text relevance on ParadeDB

The BM25 relevance ranking added `pdb.score(tasks.id)` to the search SELECT
and ORDER BY. ParadeDB can only compute a score for a pure-ParadeDB query
shape, so two cases produced "pq: Unsupported query shape":

1. A numeric search (e.g. "#17") ORs the ParadeDB `|||` operators with a
   plain `"index" = N` equality in the same boolean group. Scoring that mixed
   group is unsupported.
2. When favorites are in scope, the `project_id IN (...) OR id IN (<favorites
   subquery>)` predicate is unsupported under pdb.score regardless of how the
   subquery is expressed (OR or UNION). It was simply never exercised because
   the ranking tests searched a single project with no favorites.

Both are now handled so each query ParadeDB scores is a supported shape:

- Numeric search runs as two arms: an exact `index = N` arm (no score, ranked
  first) and a text `|||` arm scored by pdb.score DESC. The arms are merged in
  Go (index matches first, deduped by task id) and paginated in memory; the
  count query keeps the combined `OR index = N` predicate (no score), which is
  a supported shape, so totalItems stays correct.
- The relevance arms reach favorites through a LEFT JOIN and scope on the
  joined column (`rank_favorites.entity_id IS NOT NULL`) instead of an
  id-IN-subquery, which ParadeDB can score.
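
The merge step described above (index matches first, deduped by task id, then paginated in memory) could look roughly like the following. This is a minimal sketch, not the code from the commit: the `Task` struct, `mergeArms`, and `paginate` are hypothetical names, and the two input slices stand in for the results of the exact-index arm and the pdb.score-ordered text arm.

```go
package main

import "fmt"

// Task is a hypothetical, trimmed-down stand-in for a task row.
type Task struct {
	ID    int64
	Index int64
	Title string
}

// mergeArms places exact index matches first, then appends the text matches
// in the order they were given (assumed to be relevance order), skipping any
// task id that has already been emitted.
func mergeArms(indexMatches, textMatches []Task) []Task {
	seen := make(map[int64]struct{}, len(indexMatches)+len(textMatches))
	merged := make([]Task, 0, len(indexMatches)+len(textMatches))
	for _, arm := range [][]Task{indexMatches, textMatches} {
		for _, t := range arm {
			if _, dup := seen[t.ID]; dup {
				continue
			}
			seen[t.ID] = struct{}{}
			merged = append(merged, t)
		}
	}
	return merged
}

// paginate returns the 1-based page of size perPage from the merged slice,
// or nil when the arguments are invalid or the page is out of range.
func paginate(tasks []Task, page, perPage int) []Task {
	if page < 1 || perPage < 1 {
		return nil
	}
	start := (page - 1) * perPage
	if start >= len(tasks) {
		return nil
	}
	end := start + perPage
	if end > len(tasks) {
		end = len(tasks)
	}
	return tasks[start:end]
}

func main() {
	// Hypothetical arm results for a search like "#17".
	indexArm := []Task{{ID: 17, Index: 17, Title: "exact index match"}}
	textArm := []Task{
		{ID: 40, Index: 3, Title: "mentions 17 twice: 17"},
		{ID: 17, Index: 17, Title: "exact index match"}, // also matched by text
		{ID: 41, Index: 4, Title: "mentions 17 once"},
	}

	merged := mergeArms(indexArm, textArm)
	for _, t := range paginate(merged, 1, 2) {
		fmt.Println(t.ID, t.Title)
	}
}
```

Because pagination happens after the merge, both arms have to be fetched in full before the page is cut, which is why the total count comes from a separate, unscored query.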

Non-numeric (pure text) searches keep the single pdb.score-ordered query.
Non-ParadeDB databases are unchanged (no pdb.score, no ranking).

TestTaskSearchRelevanceRankingNumericIndex covers the numeric case: on
ParadeDB the exact-index task ranks first, then text matches by relevance; on
other databases it only asserts the matches are returned.

Validated against the CI-pinned ParadeDB image (paradedb 0.21.12): the full
pkg/models and pkg/webtests suites pass, including
TestTaskCollection_ReadAll/search_for_task_index and the HTTP search tests.

2026-06-19 22:52:26 +02:00

kolaente

9fb0d86c1b

feat(search): rank ParadeDB search results by BM25 relevance (#2690)

When ParadeDB is in use and a search is run, results now keep the current
fuzzy/OR matching but are ordered by BM25 relevance so tasks matching all
query words rank above tasks matching only some.

Details:
- ParadeDB exposes the BM25 score via pdb.score(<key_field>); Vikunja's
  key_field is id, so we order by pdb.score(tasks.id) DESC, then the
  existing order-by (ending in a stable tasks.id tiebreak).
- Gating: relevance ordering only applies when ParadeDB is available, a
  search term is present, AND the user did not pass an explicit sort_by.
  An explicit user sort still wins; relevance only replaces the default
  (id / position) sort.
- DISTINCT requires every ORDER BY expression to appear in the SELECT
  list, so pdb.score(tasks.id) is added to the selected columns too (for
  both the plain and task_positions-join query shapes). Because xorm's
  Distinct() quotes each column and corrupts the function call, the
  ranking path uses Select(rawColumns).Distinct() instead.
- ParadeDB-only by nature: pdb.score is invalid SQL on sqlite, mysql and
  plain postgres, so those paths are completely unchanged.
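
The gating rule and the resulting ORDER BY can be illustrated with a short sketch. It assumes a hypothetical `searchOpts` struct and helper names that are not taken from the Vikunja code base; only the `pdb.score(tasks.id) DESC` expression and the three gating conditions come from the message above.

```go
package main

import "fmt"

// searchOpts is a hypothetical bundle of the three inputs the gate depends on.
type searchOpts struct {
	paradeDBAvailable bool
	search            string
	explicitSortBy    []string // empty when the user passed no sort_by
}

// useRelevanceOrder reports whether BM25 ordering should apply: ParadeDB must
// be available, a search term must be present, and the user must not have
// requested an explicit sort.
func useRelevanceOrder(o searchOpts) bool {
	return o.paradeDBAvailable && o.search != "" && len(o.explicitSortBy) == 0
}

// orderByClause prepends the score expression to the existing ORDER BY when
// the gate is open, and otherwise returns the existing clause unchanged.
func orderByClause(o searchOpts, existing string) string {
	if !useRelevanceOrder(o) {
		return existing
	}
	return "pdb.score(tasks.id) DESC, " + existing
}

func main() {
	// Default sort with a search term on ParadeDB: relevance comes first.
	fmt.Println(orderByClause(searchOpts{paradeDBAvailable: true, search: "write report"}, "tasks.id ASC"))

	// Explicit user sort: the clause is left as the user requested it.
	fmt.Println(orderByClause(searchOpts{
		paradeDBAvailable: true,
		search:            "write report",
		explicitSortBy:    []string{"due_date"},
	}, "tasks.due_date ASC, tasks.id ASC"))
}
```

Keeping the existing clause as the tail preserves the stable `tasks.id` tiebreak for rows with equal scores.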

A test (TestTaskSearchRelevanceRanking) creates a task matching all query
words plus tasks matching only one, then searches a multi-word query. On
ParadeDB it asserts the all-words task ranks first; on other databases it
only asserts the matching tasks are returned, so it stays green across the
whole CI database matrix. The CI ParadeDB matrix entry exercises the
ranking assertion.
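
The database-dependent assertion can be pictured as a small check like the one below. This is only a sketch of the pattern, assuming a hypothetical `checkRanking` helper and plain id slices; the actual test in the commit may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

// checkRanking verifies that every expected task id is present in the result
// and, only when running on ParadeDB, that the all-words task ranks first.
func checkRanking(gotIDs []int64, allWordsID int64, partialIDs []int64, isParadeDB bool) error {
	if len(gotIDs) == 0 {
		return errors.New("search returned no tasks")
	}
	present := make(map[int64]bool, len(gotIDs))
	for _, id := range gotIDs {
		present[id] = true
	}
	if !present[allWordsID] {
		return fmt.Errorf("task %d missing from results", allWordsID)
	}
	for _, id := range partialIDs {
		if !present[id] {
			return fmt.Errorf("task %d missing from results", id)
		}
	}
	// The ordering assertion is skipped on databases without pdb.score.
	if isParadeDB && gotIDs[0] != allWordsID {
		return fmt.Errorf("expected task %d first, got %d", allWordsID, gotIDs[0])
	}
	return nil
}

func main() {
	// Hypothetical result order where a partial match comes first.
	got := []int64{2, 1, 3}
	fmt.Println("other databases:", checkRanking(got, 1, []int64{2, 3}, false))
	fmt.Println("ParadeDB:", checkRanking(got, 1, []int64{2, 3}, true))
}
```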

Follow-up (not in this change): boosting results where the words appear in
order / in close proximity above plain all-words matches.

Fixes #2690

2026-06-19 20:46:28 +02:00

kolaente

53264d350e

fix(kanban): make bucket query fixed per-view (#1007)

2025-06-25 11:38:24 +00:00

3 Commits