Rank ParadeDB search results by BM25 relevance only for pure-text searches
over a plain project scope. Numeric searches (the `OR index = N` branch) and
the Favorites view (the `id IN (<subquery>)` scope) keep the default ordering
(unranked, as on main): pdb.score rejects both as unsupported query shapes, and
the contortions previously needed to score them (two-arm numeric merge with
in-memory pagination, a favorites LEFT JOIN) added far more complexity than the
ranking was worth. Neither path was ranked before this PR, so leaving them at
the default order is no regression.
The BM25 relevance ranking added `pdb.score(tasks.id)` to the search SELECT
and ORDER BY. ParadeDB can only compute a score for a pure-ParadeDB query
shape, so two cases produced "pq: Unsupported query shape":
1. A numeric search (e.g. "#17") OR's the ParadeDB `|||` operators with a
plain `"index" = N` equality in the same boolean group. Scoring that mixed
group is unsupported.
2. When favorites are in scope, the `project_id IN (...) OR id IN (<favorites
subquery>)` predicate is unsupported under pdb.score regardless of how the
subquery is expressed (OR or UNION) - it just was never exercised because
the ranking tests searched a single project with no favorites.
Both are now handled so each query ParadeDB scores is a supported shape:
- Numeric search runs as two arms: an exact `index = N` arm (no score, ranked
first) and a text `|||` arm scored by pdb.score DESC. The arms are merged in
Go (index matches first, deduped by task id) and paginated in memory; the
count query keeps the combined `OR index = N` predicate (no score), which is
a supported shape, so totalItems stays correct.
- The relevance arms reach favorites through a LEFT JOIN and scope on the
joined column (`rank_favorites.entity_id IS NOT NULL`) instead of an
id-IN-subquery, which ParadeDB can score.
Non-numeric (pure text) searches keep the single pdb.score-ordered query.
Non-ParadeDB databases are unchanged (no pdb.score, no ranking).
TestTaskSearchRelevanceRankingNumericIndex covers the numeric case: on
ParadeDB the exact-index task ranks first, then text matches by relevance; on
other databases it only asserts the matches are returned.
Validated against the CI-pinned ParadeDB image (paradedb 0.21.12): the full
pkg/models and pkg/webtests suites pass, including
TestTaskCollection_ReadAll/search_for_task_index and the HTTP search tests.
When ParadeDB is in use and a search is run, results now keep the current
fuzzy/OR matching but are ordered by BM25 relevance so tasks matching all
query words rank above tasks matching only some.
Details:
- ParadeDB exposes the BM25 score via pdb.score(<key_field>); Vikunja's
key_field is id, so we order by pdb.score(tasks.id) DESC, then the
existing order-by (ending in a stable tasks.id tiebreak).
- Gating: relevance ordering only applies when ParadeDB is available, a
search term is present, AND the user did not pass an explicit sort_by.
An explicit user sort still wins; relevance only replaces the default
(id / position) sort.
- DISTINCT requires every ORDER BY expression to appear in the SELECT
list, so pdb.score(tasks.id) is added to the selected columns too (for
both the plain and task_positions-join query shapes). Because xorm's
Distinct() quotes each column and corrupts the function call, the
ranking path uses Select(rawColumns).Distinct() instead.
- ParadeDB-only by nature: pdb.score is invalid SQL on sqlite, mysql and
plain postgres, so those paths are completely unchanged.
A test (TestTaskSearchRelevanceRanking) creates a task matching all query
words plus tasks matching only one, then searches a multi-word query. On
ParadeDB it asserts the all-words task ranks first; on other databases it
only asserts the matching tasks are returned, so it stays green across the
whole CI database matrix. The CI ParadeDB matrix entry exercises the
ranking assertion.
Follow-up (not in this change): boosting results where the words appear in
order / in close proximity above plain all-words matches.
Fixes#2690
When expand[]=subtasks is used, the LEFT JOIN on task_relations filters
out same-project subtasks. If a parent task was deleted, the JOIN
produces NULL for parent_tasks.id/project_id, and since NULL != value
evaluates to NULL (not TRUE) in SQL, these tasks were incorrectly
excluded.
Add builder.IsNull{"parent_tasks.id"} to the OR condition so tasks
whose parent was deleted are still included in results.
Typesense was an optional external search backend. This commit fully
removes the integration, leaving the database searcher as the only
search implementation.
Changes:
- Delete pkg/models/typesense.go (core integration)
- Delete pkg/cmd/index.go (CLI command for indexing)
- Simplify task search to always use database searcher
- Remove Typesense event listeners for task sync
- Remove TypesenseSync model registration
- Remove Typesense config keys and defaults
- Remove Typesense doctor health check
- Remove Typesense initialization from startup
- Clean up benchmark test
- Add migration to drop typesense_sync table
- Remove golangci-lint suppression for typesense.go
- Remove typesense-go dependency
Restrict the AND-joined sub-table filter merging to range comparators
(>, >=, <, <=) only. Equality and negative comparators (=, !=, in,
not in) must remain as separate EXISTS/NOT EXISTS subqueries because
each matching value lives in its own row.
Merging equality filters like `labels = 4 && labels = 5` into a single
EXISTS would produce an unsatisfiable condition (no single row has
label_id=4 AND label_id=5). Merging negative filters like
`labels != 4 && labels != 5` into NOT EXISTS(label_id IN 4 AND
label_id IN 5) would be trivially true.
Also fix the join tracking to use the first filter's join type
(how the group connects to the previous element) instead of the last.
When multiple AND-joined filter conditions target the same sub-table
(e.g., reminders > X && reminders < Y), they are now combined into
a single EXISTS subquery so that all conditions must be satisfied by
the same row. Previously, each condition generated a separate EXISTS
subquery that could match different rows, causing false positives.
Fixes#2245
This fixes a bug where tasks which were filtered out by their label would still be shown. That was caused by the way the filter query was translated to sql under the hood.
Resolves https://github.com/go-vikunja/vikunja/issues/394
This fixes two closely-related bugs:
1. When loading tasks from a bucket of a saved filter, the saved filter query would override the user-supplied filter, which would cause to only tasks matching the saved filter query to be returned.
2. When a filter query for a bucket was specified, the function would only check if one of the top level filters was a filter for tasks in a specific bucket. That means a filter like "bucket_id = 42 && labels = foo" would return the expected result, while a filter like "labels = foo && (bucket_id = 42 && priority = 1)" would fail with an error 500 because the task_buckets table was not joined to the sql query. The fix from the first bug caused such filter queries.
Mysql cannot handle year values < 1. That means filtering for a date value like 0000-01-01 won't work with mysql. Additionally, dates like 0001-01-01 could under some circumstances not work either when the date in combination with the time zone would resolve to something like 0000-12-31 - for example when the server is located (and configured) in UTC, but the user running the query is in New York. This could be observed by setting the time zone manually using the filter_timezone query parameter.
Resolves https://vikunja.sentry.io/share/issue/42bce92c15354c109eb1e6488b6a542b/
Resolves https://vikunja.sentry.io/share/issue/ef81451b0c7b43f1bff2d3a86ba393bb/
This change fixes a bug where Typesense would try to sort by the project view of a saved filter. The view position is not indexed in Typesense, hence filtering fails. Because sorting by position is not a feature in saved filters, I've removed the logic for sorting saved filters with Typesense.
This change fixes a bug where the project view to use for joining was empty, since Typesense only supports 3 sorting parameters. When using more than that, the logic to fetch the view ID parameter would not return the correct parameter, but the logic building the order by statement would. That led to inconsistencies where the task position was included in the order by statement, but the table would not be joined, failing the query.