      epic #32665 Improved document generation
    DocGen

    Overview

    At the time of writing (Tuleap 15.8) document generation covers, in OOXML (docx) format:

    • tracker reports
    • test plan
    • test campaign

    The desired evolutions are of two kinds:

    • Allow users to select which artifact fields are part of the export (to reduce the size of documents)
    • Allow users to produce documents that look like their company's official documents
    • The combination of both

    The trickiest part is the second point. A couple of options were evaluated; if you are curious, a dedicated section below analyzes them. The outcomes of this investigation are:

    • Docx export will be done without styling
    • Styled documents will be in PDF
    • Both will be able to leverage "selection of fields to include in produced document"
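
    As an illustration of "selection of fields to include in produced document", here is a minimal sketch. It assumes artifacts are available as plain dictionaries keyed by field name; the `select_fields` helper and the sample data are hypothetical and do not reflect the actual Tuleap data model.

```python
from typing import Any


def select_fields(artifact: dict[str, Any], selected: list[str]) -> dict[str, Any]:
    """Keep only the selected fields, in the order they were selected.

    Fields that do not exist on the artifact are silently skipped.
    """
    return {name: artifact[name] for name in selected if name in artifact}


# Hypothetical sample data, for illustration only.
artifacts = [
    {"id": 101, "title": "Login fails", "status": "Open", "description": "Long text..."},
    {"id": 102, "title": "Crash on export", "status": "Closed", "description": "Long text..."},
]
selected_fields = ["id", "title", "status"]

# The same filtered rows could feed either the docx or the PDF export.
export_rows = [select_fields(artifact, selected_fields) for artifact in artifacts]
```

    Applying the selection before rendering keeps it independent of the output format, which is what would let both exports share it.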

    Documents that look like a company's official documents

    Option 1: dotx template

    Using a dotx (docx template) is not feasible: copying the styles doesn't produce a usable result.

    Option 2: docx with custom template language

    The alternative could be to have a "parametrized docx", that is, a regular docx (built with the company template) with a custom markup language to fill the document. This strategy is currently used by end users, based on the Tuleap REST API.

    The downsides of this approach are:

    • It's not intended to be used by regular users. Development skills are required to craft the source document (because of the markup language)
    • The "Developer Experience" will be terrible: it's not feasible to ensure that a submitted document is "correct" (from a language point of view), so development will be a game of whack-a-mole: upload -> test result -> correct
    • We need to be able to parse docx documents reliably
    • There are security risks associated with putting a templating language in the hands of end users: DoS, code execution.
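
    To make the last two concerns more concrete, here is a minimal sketch of a deliberately restricted placeholder language. It works on plain strings only (docx parsing is out of scope here), and the `{{ name }}` syntax, `find_unknown_placeholders` and `render` are assumptions made for illustration, not an existing Tuleap or library API.

```python
import re

# A placeholder is a bare identifier between double braces, e.g. {{ title }}.
# Anything else (expressions, function calls) never matches, so it stays
# literal text and is never evaluated.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def find_unknown_placeholders(template: str, allowed: set[str]) -> list[str]:
    """Return the placeholder names used in the template that are not allowed."""
    return sorted({name for name in PLACEHOLDER.findall(template) if name not in allowed})


def render(template: str, values: dict[str, str]) -> str:
    """Replace each placeholder by its value, with lookup only (no evaluation)."""
    unknown = find_unknown_placeholders(template, set(values))
    if unknown:
        raise ValueError(f"Unknown placeholders: {', '.join(unknown)}")
    # Substitution is a single pass: braces found inside the inserted values
    # are not expanded again, so data cannot inject new placeholders.
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)
```

    A check such as `find_unknown_placeholders` could run at upload time to shorten the upload -> test result -> correct loop, but it only catches unknown names, not every way a template can be "incorrect".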

    Alternatives evaluated:

    • In the PHP ecosystem, tests have been done with the PhpOffice lib: https://gerrit.tuleap.net/c/tuleap/+/30462. That works for basic transformations but it's unlikely to survive real data and complex documents.
    • In the Java ecosystem, we evaluated M2Doc, a solution for generating templated docx based on Eclipse EMF
      • Pro: they already handle the docx parsing
      • Pro: the templating language is very powerful
      • Con: Java-based solution
      • Con: the data to be filled in the document must be produced as "EMF" (disclaimer: I don't really know what the impact of this is)
      • Con: not clear, but it seems that it requires Eclipse at some point to generate a template (https://www.m2doc.org/ref-doc/nightly/index.html#generating-a-document)
    • In the NodeJS ecosystem, we evaluated docx-template.
      • Pro: it's already used by some Tuleap users.
      • Pro: reliable and maintained.
      • Con: written in bold in their documentation: "Templates can contain arbitrary javascript code. Beware of code injection risks!". While this is not an issue for local development, where the developer owns both the template and the code being written, it's a major problem in the Tuleap context, as code could be injected either in the template or in Tuleap data. That being said, it seems that R4J uses the same library (as far as the syntax is concerned) but they don't mention the possibility of security issues.
    • In Python, python-docx
    • In the dotnet ecosystem
      • Con: it requires a Windows environment

    Option 3: PDF

    As an alternative to generating a docx with specific styling, we evaluated the possibility of generating a PDF instead. It turns out that it's not simple, but there is a (tricky) way.

    What doesn't work:

    • PDF generation libraries. We evaluated many; they all fall short when it comes to producing a document that is not ugly.
    • PDF generation services. They produce the best results but we cannot rely on an external service (on-premises, air-gapped installations, etc.).

    At the current state of investigations, the best option would be to leverage the browser's print capabilities to print to PDF.

    What we should aim to cover is:

    • a cover page that looks like an official document of the company (incl. images)
    • a table of contents (being clickable and/or having page numbers is not mandatory)
    • styling/colors that match the company's design guidelines
    • headers and/or footers with images and the ability to include custom text like "Company private", etc.
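
    As a rough sketch of the print-to-PDF idea, the code below builds a single HTML page with a cover, a clickable table of contents and a footer, styled with print-oriented CSS. The `build_printable_html` function, the CSS class names and the page layout are assumptions for illustration. How browsers honor `@page`, `break-after` and fixed-position elements when printing varies, so the output would need to be checked in the targeted browsers.

```python
from html import escape

# Print-oriented CSS (illustrative values). The fixed-position footer is
# commonly repeated on every printed page, but this depends on the browser.
PRINT_CSS = """
@page { size: A4; margin: 25mm 20mm; }
.cover { break-after: page; text-align: center; }
.page-footer { position: fixed; bottom: 0; font-size: 9pt; }
"""


def build_printable_html(title, company, footer_text, sections):
    """Return one HTML page: cover, table of contents, sections and footer.

    `sections` is a list of (heading, text) pairs; all text is HTML-escaped.
    """
    toc = "".join(
        f'<li><a href="#s{i}">{escape(heading)}</a></li>'
        for i, (heading, _) in enumerate(sections)
    )
    body = "".join(
        f'<h2 id="s{i}">{escape(heading)}</h2><p>{escape(text)}</p>'
        for i, (heading, text) in enumerate(sections)
    )
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title><style>{PRINT_CSS}</style></head><body>"
        f'<section class="cover"><h1>{escape(title)}</h1><p>{escape(company)}</p></section>'
        f"<nav><ol>{toc}</ol></nav>"
        f"{body}"
        f'<div class="page-footer">{escape(footer_text)}</div>'
        "</body></html>"
    )
```

    The resulting page would then be printed to PDF from the browser; company images (logo on the cover, header or footer) could be added as regular `<img>` elements in the same way.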

    Field selection for document export

    Progress
    Empty
    Empty
    Closed
    Details
    #32665
    Manuel Vacelet (vaceletm)
    2024-09-26 11:02
    2023-06-27 10:10

    Follow-ups

    • Status changed from Planned to Closed

    General improvement of generated documents is not yet planned. However:

    • The ability to export to PDF with a company template will be introduced for artidoc with art #38630. If it proves useful, it will be propagated to other usages of DocGen.
    • The ability to select fields to be included might overlap with artidoc's ability to include fields (no story yet). This needs to be further investigated.

    • Start date cleared
    • End date cleared
    • Start date changed from 2024-04-29 to 2024-06-24
    • End date changed from 2024-07-17 to 2024-09-11
    • Start date changed from 2024-04-04 to 2024-04-29
    • End date changed from 2024-06-19 to 2024-07-17
    • Start date changed from 2024-02-01 to 2024-04-04
    • End date changed from 2024-03-27 to 2024-06-19
    • Start date changed from 2023-11-09 to 2024-02-01
    • End date changed from 2024-01-03 to 2024-03-27
    • Start date changed from 2023-08-17 to 2023-11-09