Overview

A long time ago, there was a full text search plugin for Tuleap. It was eventually removed from the code base because it became unmanageable, for two main reasons:

the initial architecture trade-offs were the wrong ones for permissions (it was not possible to store document permissions in the engine).
the hard dependency on ElasticSearch, at a time when it was evolving very fast, was not manageable.
- ElasticSearch itself proved to be a tricky beast, and on-premises deployments run by teams lacking the necessary skills made it a nightmare to manage.

The current strategy is quite different:

permissions should not be stored with the indexed data but applied afterward by Tuleap
- The trade-off is that results may be slower to retrieve and it will not be possible to give the number of matched elements (on this point, keep in mind that Google doesn't give an exact count either)
as a first step, the target is mainly to index tracker texts (string and textarea fields, as well as follow-up comments)
the indexing engine should be designed to be replaceable, with MySQL itself as the primary target (it might later be replaced or enhanced with an external engine such as ElasticSearch or Algolia).
- for instance, we certainly don't want to manage ElasticSearch instances ourselves for Tuleap Cloud.
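Because permissions are applied after the engine has answered, a page of results has to be filled by pulling candidates in batches and discarding those the user cannot see. This explains both the slowdown and the missing total count. Below is a minimal sketch of that loop. It assumes a hypothetical `search_engine(query, offset, limit)` callable that returns candidate IDs in relevance order, and a hypothetical `can_see(item)` permission check. Neither is an existing Tuleap API.

```python
from typing import Callable, List, Sequence, Tuple


def paginate_with_post_filter(
    search_engine: Callable[[str, int, int], Sequence[int]],  # hypothetical engine call
    can_see: Callable[[int], bool],                           # hypothetical permission check
    query: str,
    page_size: int,
    batch_size: int = 50,
    offset: int = 0,
) -> Tuple[List[int], int]:
    """Return up to page_size visible hits and the engine offset to resume from.

    The engine knows nothing about permissions, so hits are filtered here.
    The total number of visible matches is never computed.
    """
    page: List[int] = []
    while len(page) < page_size:
        candidates = search_engine(query, offset, batch_size)
        if not candidates:
            break  # the engine has no more results
        for position, item in enumerate(candidates):
            if can_see(item):
                page.append(item)
                if len(page) == page_size:
                    # resume right after the last candidate consumed
                    return page, offset + position + 1
        offset += len(candidates)
    return page, offset
```

The returned offset acts as a cursor for the next page. This is also why only "next page" navigation is cheap: jumping to page N would require filtering every page before it.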

There are now two backends:

one based on MySQL
one based on Meilisearch

Search on artifacts

From a functional standpoint, search on artifacts would be accessible via the "Switch to" button.

A mock-up of the interaction, showing how the existing "filter" behavior is handled, is available on CodePen: https://cdpn.io/pen/debug/OJxpRMY/0266ad34436604c43b0b0ecfaf768406

REST route

Search is done via a REST route.

Since no permissions are stored with the indexed data, the REST route must filter the results to ensure that permissions are applied. There are two situations:

fields: permissions must be checked on the field itself (sic!), the artifact (artifact-level permissions), the tracker (especially permissions related to the artifact creator and assignee), the project (visibility) and the platform (anonymous, restricted, ...).
follow-up comments: same as fields, but without the field level.
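These checks can be expressed as a chain that runs from the broadest scope to the narrowest. A single refusal hides the hit. Below is a minimal sketch. The `Hit` shape and the `PermissionChecks` predicates are hypothetical placeholders for Tuleap's real permission services.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Hit:
    """One search result. field_id is None for a follow-up comment (hypothetical shape)."""
    project_id: int
    tracker_id: int
    artifact_id: int
    field_id: Optional[int] = None


@dataclass
class PermissionChecks:
    """Hypothetical per-level checks, already bound to the current user."""
    platform: Callable[[], bool]          # anonymous / restricted access to the platform
    project: Callable[[int], bool]        # project visibility
    tracker: Callable[[int, int], bool]   # tracker perms, incl. creator/assignee rules
    artifact: Callable[[int], bool]       # artifact-level permissions
    field: Callable[[int], bool]          # field-level permissions


def is_visible(hit: Hit, checks: PermissionChecks) -> bool:
    # cheapest and broadest checks first; any refusal hides the hit
    if not checks.platform():
        return False
    if not checks.project(hit.project_id):
        return False
    if not checks.tracker(hit.tracker_id, hit.artifact_id):
        return False
    if not checks.artifact(hit.artifact_id):
        return False
    if hit.field_id is None:
        return True  # follow-up comments have no field level
    return checks.field(hit.field_id)
```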

Indexing

By default, indexing is done in a dedicated MySQL table designed for search, using a FULLTEXT index.

Indexing is done asynchronously (Redis is needed for the message queue) after the following events:

Artifact Create
Artifact Update
Artifact Delete
Tracker Delete
Field Delete
Project Delete
Follow-up Update
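The events above fall into two groups. Deletions remove every entry in a given scope. Creations and updates re-read the artifact's texts and replace its entries. Below is a minimal sketch of a queue consumer. It uses an in-memory `deque` as a stand-in for the Redis queue. The event names, the `Entry` shape and the `load_texts` callback are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional


@dataclass(frozen=True)
class Entry:
    project_id: int
    tracker_id: int
    artifact_id: int
    field_id: Optional[int]  # None for a follow-up comment
    content: str


@dataclass(frozen=True)
class IndexEvent:
    kind: str       # hypothetical event names, see below
    target_id: int  # id of the deleted scope, or of the artifact to re-index


# Deletion events: which Entry attribute the target id refers to.
DELETE_SCOPES = {
    "artifact_delete": "artifact_id",
    "tracker_delete": "tracker_id",
    "field_delete": "field_id",
    "project_delete": "project_id",
}
# Events after which the artifact texts are read again and re-indexed.
REINDEX = {"artifact_create", "artifact_update", "followup_update"}


def process_queue(
    queue: Deque[IndexEvent],
    index: List[Entry],
    load_texts: Callable[[int], List[Entry]],  # hypothetical: texts of one artifact
) -> List[Entry]:
    """Drain the queue (stand-in for a Redis consumer) and return the updated index."""
    while queue:
        event = queue.popleft()
        if event.kind in DELETE_SCOPES:
            attr = DELETE_SCOPES[event.kind]
            index = [e for e in index if getattr(e, attr) != event.target_id]
        elif event.kind in REINDEX:
            index = [e for e in index if e.artifact_id != event.target_id]
            index.extend(load_texts(event.target_id))
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return index
```

Re-indexing the whole artifact on every update keeps the consumer idempotent. If the same event is processed twice, the index ends up in the same state.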

Table format (proposal): Search (type: string, content: TEXT, metadata: JSON), with metadata such as {art_id: INT, field_id: INT} for fields or {art_id: INT} for follow-up comments.
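The proposed row format can be sketched as follows. The `type` values ("field", "follow_up") and the helper names are hypothetical. The SQL in the comments is an illustrative MySQL rendering of the proposal, not Tuleap's actual schema.

```python
import json
from dataclasses import dataclass
from typing import Dict, Tuple

# Illustrative MySQL DDL for the proposal (names are assumptions):
#   CREATE TABLE search (type VARCHAR(255) NOT NULL, content TEXT NOT NULL,
#                        metadata JSON NOT NULL, FULLTEXT (content)) ENGINE=InnoDB;
# Query side: SELECT metadata FROM search WHERE MATCH(content) AGAINST(%s)


@dataclass(frozen=True)
class SearchRow:
    type: str                  # hypothetical values: "field" or "follow_up"
    content: str               # TEXT column covered by the FULLTEXT index
    metadata: Dict[str, int]   # stored in the JSON column

    def to_db_values(self) -> Tuple[str, str, str]:
        """Values for an INSERT into the proposed Search table."""
        return (self.type, self.content, json.dumps(self.metadata, sort_keys=True))


def field_row(art_id: int, field_id: int, text: str) -> SearchRow:
    return SearchRow("field", text, {"art_id": art_id, "field_id": field_id})


def follow_up_row(art_id: int, text: str) -> SearchRow:
    return SearchRow("follow_up", text, {"art_id": art_id})
```

Only `content` is searched. The JSON metadata is what the REST route reads back to run the permission checks on each hit.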

Create initial index

A dedicated Tuleap command is added to build the initial index of the platform. This command could be designed to process slices of artifacts, so that the index can be created progressively on very large platforms. That said, on platforms with 1M+ artifacts, an admin is unlikely to sit at their desk watching such a command run. We may therefore need a daily (or weekly) scheduled job that automatically indexes the platform over the course of, for instance, a week.
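The slicing idea can be sketched as a resumable step: each run indexes the next batch of artifact IDs above a persisted checkpoint. Below is a minimal sketch. `fetch_ids_after` (assumed to return IDs in ascending order) and `index_artifact` are hypothetical callbacks, and storing the checkpoint is left to the caller.

```python
from typing import Callable, List, Optional


def index_next_slice(
    fetch_ids_after: Callable[[int, int], List[int]],  # hypothetical: ascending ids > checkpoint
    index_artifact: Callable[[int], None],             # hypothetical: indexes one artifact
    checkpoint: int,
    slice_size: int,
) -> Optional[int]:
    """Index one slice of artifacts whose id is greater than checkpoint.

    Returns the new checkpoint to persist, or None once everything is indexed.
    Designed to be called repeatedly, e.g. by a scheduled job.
    """
    ids = fetch_ids_after(checkpoint, slice_size)
    if not ids:
        return None
    for artifact_id in ids:
        index_artifact(artifact_id)
    return ids[-1]
```

With a daily job, choosing `slice_size` at roughly one seventh of the artifact count would spread the work over about a week. Artifacts created meanwhile are covered by the asynchronous indexing anyway.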

Get better results

This section gathers thoughts about the search experience and what is desirable in the future. It focuses on the Meilisearch backend, because trying to improve the database backend doesn't make much sense.

Filtering out automated test executions

Possible strategies:

do not index test_execution artifacts whose automated_tests field is not empty
do not index test_execution artifacts at all (but test results would then be missing from search)
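Both strategies amount to a predicate evaluated before an artifact is sent to the index. Below is a minimal sketch. The tracker type and field name come from the text above. The function, its parameters and the strategy names are hypothetical.

```python
from typing import Mapping, Optional


def should_index(
    tracker_type: str,
    field_values: Mapping[str, Optional[str]],
    strategy: str = "skip_automated",
) -> bool:
    """Decide whether an artifact goes to the search index (hypothetical helper).

    "skip_automated": drop test executions whose automated_tests is not empty.
    "skip_all": drop every test execution (test results are no longer searchable).
    """
    if tracker_type != "test_execution":
        return True
    if strategy == "skip_all":
        return False
    if strategy == "skip_automated":
        automated = field_values.get("automated_tests") or ""
        return automated.strip() == ""
    raise ValueError(f"unknown strategy: {strategy}")
```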

Take project and tracker shortnames into account

Given I have a tracker of type bug in a project called garden
And an issue named "dependency 2022"
When I search for "garden bug 2022"
Then I should be able to find this bug
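One way to satisfy this scenario is to copy the project and tracker shortnames into each document sent to the engine, so that every query term can match some attribute. This assumes Tuleap builds the documents before pushing them. The attribute names below are hypothetical. `naive_match` is only a stand-in for the engine (Meilisearch searches all attributes by default), meant to show why the extra attributes help.

```python
from typing import Dict, Set


def build_document(
    artifact_id: int, project_shortname: str, tracker_shortname: str, title: str
) -> Dict[str, object]:
    # hypothetical document shape: project/tracker names become searchable text
    return {
        "id": artifact_id,
        "project": project_shortname,
        "tracker": tracker_shortname,
        "title": title,
    }


def naive_match(document: Dict[str, object], query: str) -> bool:
    """Every query term must appear in some string attribute (stand-in for the engine)."""
    words: Set[str] = set()
    for value in document.values():
        if isinstance(value, str):
            words.update(value.lower().split())
    return all(term in words for term in query.lower().split())
```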

Another approach would be to introduce special filters such as in:project_name, in:bug or status:closed.
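Supporting such filters starts with separating them from the free text before querying the engine. Below is a minimal sketch. The `in:` and `status:` prefixes come from the text above, while the parser itself is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# a filter is a single token such as "in:garden" or "status:closed"
FILTER_RE = re.compile(r"^(in|status):(\S+)$")


def parse_query(query: str) -> Tuple[str, Dict[str, List[str]]]:
    """Split a query into free text and filters (hypothetical syntax)."""
    terms: List[str] = []
    filters: Dict[str, List[str]] = {}
    for token in query.split():
        match = FILTER_RE.match(token)
        if match:
            filters.setdefault(match.group(1), []).append(match.group(2))
        else:
            terms.append(token)
    return " ".join(terms), filters
```

Whether `in:bug` refers to a project or a tracker is not decided here. The backend would have to resolve it, for instance by matching the value against both project and tracker shortnames.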

Progress

Start date: empty

End date: empty

Status: Closed

Details

Artifact ID: #24164

Submitted By: Manuel Vacelet (vaceletm)

Last Modified On: 2024-09-26 10:50

Submitted On: 2021-11-22 14:01
