Overview
The goal of this epic is to have REST API format that will better suit with consumption by a BI tool (eg. Qlik, Power BI, etc).
The following routes will be covered:
-
GET /tracker_reports/{id}/artifacts
-
GET /trackers/{id}/artifacts
-
GET /artifacts
Format & approaches
Initially CSV was identified as the best tradeoff as an output format, despite its limitations, because it was easy to injest by most BI tools without any end users manipulations.
After deeper thoughts, it will not be possible to generate a CSV output that doesn't require end users manipulations. At the very least, end users will have to deal with pagination as the output cannot be sent without any (risk of server overload, request truncation, timeout, etc). Moreover CSV as a data format is already by itself problematic:
- It's exposed to the issues related to encoding (charset, BOM & co).
- It as issues with date time values, separator and quotations.
- The topic is well covered in the RFC of the format itself
With this in mind:
- Given that most BI tools are able to injest JSON (at least Qlik and PowerBI)
- But the JSON exposed by Tuleap APIs is too complex for average BI users to parse
- The current proposal is to expose a simplified json output, easier to parse.
How to parse JSON with Qlik: https://www.youtube.com/watch?v=8xkwFjDjO84
Json
The identified routes, get a new optional parameter (eg. ouput_format
) that can be either nested
(default) or flat
(new, BI easy). The output representation is then:
[
{"field_name_int": 0, "field_name_array": ["A", "B"], "field_name_c": ...},
...
]
The logic:
- One line per artifact
- Each field value is made of
- one key with field name (not id)
- one value with field value
- When field value is :
- a simple type (string, int, float), the value is output directly
- has multiple values (list fields), the value is itself an array with the simple values within
- cannot be represented with a single value (eg. a Text field that comes with a format to be properly managed) have a special treatement
Special fields
This section describe the fields that requires a special treatement for further discussions
- Text
- Text field values have 3 possible formats (text, markdown & HTML). For any sensible treatment, it's required for consumer to know the format.
- As format is reduced to a single value, it's not possible to specify the format so there are two options
- The approach for CSV or XLSX output: transform the output to text only (remove HTML markup).
- Output the raw value and let the consumer figure out the format.
Right now option 1 is in place for experimentation
CSV
Note: From a pure HTTP point of view we should request an Accept: text/csv
header to the request (as we do for XML) but we have no guaranty that tools will provide such customization to their end users. Plus, people who will do the BI integration are likely to be less tech-savvy and the management of HTTP headers would be too complex.
When format=csv
is used, the query result will be sent with "Content-type: text/csv" with UTF8 encoding.
The separator to be used will be the one chosen in user preferences.
Format
The output will be paginated with the same rules than with JSON output.
The header of the CSV document will be the name of the columns made of field's name (not label). Eg:
artifact_id, start_date, submitted_by, ...
The content will be adapted to CSV with following rules:
- String, Text, Float, Int, Computed, Artifact ID, Per tracker ID: literal value (eg.
5
, 2.0
, Error when doing...
)
- Date, Date + Time, Submitted on, Last update date: date according to CSV preferences
- Select box, multi select box, radio button, check box, open list, shared fields:
- static: literal value (eg.
UX & UI
)
- users: login name (eg.
jdoe
)
- groups: group name (eg.
developers
)
Field not supported will not be part of CSV (without warning):
- Permissions on artifacts
- Artifact links
- Cross references
- File upload
- Burnup
- Burndown
- Encrypted fields
Open point
How do we represent lists to make them useful ?
Given a multi select box Category
with "Database", "Web UI", "REST API" as possible values and an artifact whose categories are "Web UI" & "Database":
- Output one line with sub separator :
artifact_id, Category
1234, Web UI|Database
artifact_id, category
1234, Web UI
1234, Database
The first representation is more inline with Tuleap output but we don't know how tools will be able to consume this kind of output.
The second representation is more "natural" in term of CSV but would be more complex. Example with the previous artifact with Language "FR, EN, DE":
artifact_id, category, language
1234, Web UI, FR
1234, Web UI, EN
1234, Web UI, DE
1234, Database, FR
1234, Database, EN
1234, Database, DE