Overview
A long time a go, there was a full text search plugin for Tuleap. It was eventually removed from the code base because it became un-manageable for two main reasons:
- the initial architecture tradeoff were not the good ones for permissions (it's not possible to actually store permissions for documents in the engine).
- the hard dependency on ElasticSearch at a stage where it evolved really fast was not manageable.
- ElasticSearch itself proved to be a tricky beast and unqualified teams for OnPrem deployments made it a nightmare to manage.
The current strategy is quite different:
- permissions should not be stored with indexation but applied afterward by Tuleap
- The tradeoff here is that it might be slower to get the results and it will not be possible to give the number of matched elements (on this point, keep in mind the google doesn't give it either)
- as a first step, the target will be to mainly index tracker's texts (string & texteara fields as well as follow-up comments)
- the indexation engine should be designed for replacement with MySQL itself as primary target (then it might be replaced/enhanced with external engine like ElasticSearch or Algolia for instance).
- for instance we certainly don't want to manage ourselves ElasticSearch instances for Tuleap Cloud.
There are now 2 backends:
Search on artifacts
From a functional stand-point, search on artifacts would be accessible via the "Switch to" button.
A mock-up of the interaction and how the existing behavior of "filter" is handled is accessible in codepen:
https://cdpn.io/pen/debug/OJxpRMY/0266ad34436604c43b0b0ecfaf768406
REST route
Search is done via a REST route.
As no permissions are associated with the indexed data, the REST route must filter the data to ensure that permissions are applied. There are 2 situations:
- fields: need to take into account permissions on it self field (sic!), artifact (permission on artifact), tracker (esp. with permissions relative to artifact creator & assignee), project (visibility) and platform (anonymous, restricted, ...).
- follow-up comments: same as field but without the field level
Indexing
By default, indexing is done in a dedicated MySQL table designed for search using FullTextSearch index.
Index is done in an asynchronous way (need of redis for message queue) after events:
- Artifact Create
- Artifact Update
- Artifact Delete
- Tracker Delete
- Field Delete
- Project Delete
- Follow-up Update
Table format (proposal): Search (type: string, content: TEXT, metadata: JSON) with metadata like {art_id: INT, field_id: INT} for fields or {art_id: INT} for follow-up.
Create initial index
A dedicated tuleap command is added to manage the initial index of the platform. Maybe we should design this command to take slice of artifacts to allow progressive creation of the index on very large platform. That said, on platforms with 1M+ artifacts, it's unlikely that an admin will be behind their desk to watch and launch such a command so maybe we need to design something with daily root or weekly to automatically index the platform over the course of a week for instance.
Get better results
This section covers the thoughts about search experience and what is desirable in the future. It focuses on meilisearch backend because trying to improve db backend doesn't make much sense.
Filtering out automated test exec
Possible strategies:
- do not index test_execution artifacts that have automated_tests not empty
- do not index test_execution at all (but we would lack of test results)
Take project and tracker shortname into account
Given I have a tracker of type bug
in a project called garden
And issue name dependency 2022
When I search for garden bug 2022
Then I should be able to find this bug
Another way to approach this would be to introduce special filters like in:project_name
, in:bug
, 'status:closed'