•  
      request #9748 Trackerv5 CSV import format documented nowhere
    Infos
    #9748
    Alexandre Ferrieux (ferrieux)
    2017-01-09 14:41
    2016-12-18 17:58
    10025
    Details
    Trackerv5 CSV import format documented nowhere
    The documentation about trackers (v5) claims that "The CSV format that is accepted as import input is accessible over the CSV file submission screen. ". What does "accessible over" mean ? In that page I can only see a button to select a file, and one to upload it.

    I would happily learn by example, on an hypothetical CSV-export... if only such a thing existed !
    Trackers
    9.2
    Empty
    • [ ] enhancement
    • [ ] internal improvement
    Empty
    Stage
    Empty
    Closed
    2016-12-19
    Attachments
    Empty
    References
    References list is empty

    Follow-ups

    User avatar

    Just use proc_open and let unix do the rest:Assuming you have N files (like deliveries/attachments) and 1 stream on stdin:

      proc_open("zip - file1.bin file2.bin ... fileN.bin -",...)

    Then feed the input pipe with your stream (typically the XML) and plug the output pipe to the PHP process's own sdout: you'll end up streaming ZIP.

    User avatar

    You are right, this can work. But I don't know if there is a lib that allow to stream tar/cpio/zip content on stdout in php.

    User avatar
    last edited by: Alexandre Ferrieux (ferrieux) 2017-01-09 12:08
    Re the multiple components, you can output as tar or cpio, or use separate invocations of the command with flags.

    Re the SSH access, well, it's not an alien setup (since you document it). And it is better than nothing: I'd rather have the feature soon, SSH only, rather than next year in REST.

    Re disk space: it is a trivial matter of quota (if a given user fills his quota, that's his problem). And he doesn't even need to touch the local disk if everything goes over stdout
    :
    ssh user@tuleap-server export_script -tar MyProject \| gzip > /local/disk/myproject.tgz
    User avatar

    Yes but the current CLI requires to be run on the server and it's not sure the project admin will be granted SSH access.

    Not to say that it would be really easy to run out of disk space if things are not carefuly controlled.

    + dumping everything on stdout is not an option as binaries (attachments, SCM repos, file deliveries & co) are not part of the XML archive.

    User avatar
    Yes but if you start with just a cli exposure (not REST), there's no risk of connection timeout: the process (running with mortal rights, chroot, quota, nice, ionice, you name it) may take hours to complete, writing its XML to stdout. No async machinery needed ;-)
    User avatar

    That sounds a good first step.

    We nevertheless need to deal with the export request async (SystemEvents in Tuleap WoW). While duming a simple tracker is nearly instantaneous, duming a whole project with lots of data can take several hours.

    User avatar

    OK, there's a clear asymmetry import/export. What about the following:

    • let mortal admins export (as a backup, or as a bridge to other tools)
    • keep import restricted to site admins

    This way, site admins keep the sharp knife, and project admins manage their backup archives (which are 99% useless until a disaster occurs); when disaster occurs on a given project,  man-to-man discussion happens between its admin and the site admins, who can estimate the usurpation risks by parsing the XML (eg by enumerating the userids)

     

    User avatar

    Now you seem to hesitate: let's discuss the questions you seem to find tricky:

    • file size: what is the problem ?
    • duration of import: again what is the problem ?

    The current export is able to import almost anything (tracker, frs, docman, mediawiki, svn, git, etc). So the archive might be huge (we have dealt with archive of 10GB). When everything is run on the server there is no issue as it can take "forever" without bothering about connexion lost.

    And as site admin control how many imports are done in parallel, they also control the load on the server (importing a large archive is very resource consuming, it's better to run it during quiet periods).

    Now, when run from client side, we have to deal with the upload of the archive + time processing (async) + when it will be done (import scheduling) + reporting to end user.

    • permissions: as said, import/export should be restricted to project admin. Do you have further restrictions in mind ?

    Import is designed to deal with import from the external world. One of the critical operation is the management of the users. As of today, when importing, you can create on the fly the missing users (users described in the archive but that doesn't exist in the target server). This is a very sensitive operation and should not be placed in the hand of "mere mortals" (esp. in context with an LDAP backing up auth).

    At core, the import would allow any ppl granted to create anything in the name of anyone. For instance, as a project admin you can create a document in the name of your CEO saying he wants to fire 50% of the company.

    As of today we don't have to deal with those problems as the import is in the hand of people that can already fiddle in the database.

    So I guess I'll need to RTFS ...will do that :)

    Yeah + questions on mailing list are the way to go.

    User avatar

    Sure, I was't even thinking about CSV ;-)

    You're right, the way to go is to make XML import/export available to project admins. Since you already have the code (to be run in cli with site-admin rights), indeed it is just a matter of deciding about permissions and exposing it in (either in REST, or in cli with project-admin rights).

    Now you seem to hesitate: let's discuss the questions you seem to find tricky:

    • file size: what is the problem ?
    • duration of import: again what is the problem ?
    • permissions: as said, import/export should be restricted to project admin. Do you have further restrictions in mind ?

    Re contributions: thanks for the pointers. But the PNG is not really a design documentation, right ;-)

    So I guess I'll need to RTFS ...will do that :)

     

    User avatar

    Well, "history not importable" may be the current state of the code, but it's surely doable !

    History is not importable with CSV, you are right.

    One of the reason the follow-ups are not managed in tv5 yet is that CSV is very poor to describe history and the tricks that were developed for tv3 where barely protable to tv5 in a clean manner. Here we are talking about history but, as far as I know, there is no ways to deal with attachment in CSV (except maybe encoding in base64 100MB files but I don't consider this as a solution). So in any case CSV cannot be a trustabled loosless import/export.

    This is why we developped the XML import/export.

    Maybe the solution would be to make XML import/export available to end users (or at least project administrators). There are a serie of questions to answer on this topic (like management of the file size, duration of import, management of permissions).

    Until you have a lossless import/export

    We do have it, it's the XML format (I know I keep repeating it).

    It's been used in production since 1 year for real projects migration (even cross-server Tv3 -> Tv5). We have even migrated almost 1000 thousands of projects from another software into Tuleap (again for production use).

    Now you said you'd review contributions. I am a developer, and willing to help. Can you point me to starting material, like a design document with the data model (database schemas) ?

    Great !

    To start a contribution, the process is described here and you can have an old but still accurate view of top level tracker architecture here. Unfortunately we don't have a database schema but the architecture doc translate more or less into the SQL data model.

     

    User avatar
    Well, "history not importable" may be the current state of the code, but it's surely doable !

    Until you have a lossless import/export, every Tuleap deployed today is at risk. Global database or machine backups won't help in case of massively multiproject instances like mine: if project A wants a rollback, you need to also roll back projects B to Z (since they are on the same database), which simply won't happen.

    Now you said you'd review contributions. I am a developer, and willing to help. Can you point me to starting material, like a design document with the data model (database schemas) ?
    User avatar

    Very simple: give a working import/export that does not lose information.

    The CSV export/import looses information, because the history of an artifact is not exported (only the current state) and an history is not importable with this feature (was not possible in TV3)

    This way you'll allow people to backup or migrate a TV5 (with followups) from one instance or machine to another (currently this is impossible).

    This is possible with the import_project_xml script: http://tuleap-documentation.readthedocs.io/en/latest/administration-guide/project-export-import.html#trackers. But you previously said that you cannot use this option and I understand that because of the constraints.

    If you want to convert a TV3 into a TV5 with all the content (including the comments) on the same platform, there is an option during TV5 creation to import a TV3 wih all its data.

    In any case we will update this request when we will have more to provide about the comment export/import.

    User avatar
    Very simple: give a working import/export that does not lose information.
    This way you'll allow people to backup or migrate a TV5 (with followups) from one instance or machine to another (currently this is impossible).
    And at the same time you'll satisfy anybody with tricky requests (that would need lots of work to fulfill with the APIs)
    User avatar

    Leave the TV5 half-featured while already deprecating TV3

    TV3 are already deprecated, and since a long time. For the "half featured", TV5 has more features than TV3 never provides to the end users and is able to do everything the TV3 is capable of (expect the comments import)

    what are your plans

    We have to check with other integrators but I'm personnaly for the REST API improvement. Then we will develop this feature when we will have time to do this. But we will be happy to review an external contribution to add this feature.

    Give some warnings in the docs, saying that TV5 is everything but complete ?

    The real question here is "what is complete" ?

    User avatar
    So.. what are your plans ? Improve the APIs ? Finish the missing TV5 export ? Leave the TV5 half-featured while already deprecating TV3 ? Give some warnings in the docs, saying that TV5 is everything but complete ?
    User avatar
    OK, thanks for shedding light on the design (apologies if that was documented somewhere; please give a link if so)

    So, it looks like the only route is "extend the APIs" or "make import/export work". Your call.
    User avatar

    How hard would that be ? Shouldn't it be simpler than endless extensions of the APIs ?

    Very hard and very dangerous.

    First of all because there is no "one DB per project" (that's madness, mediawiki does that and on some instance we have databases that have 120'000 tables!). Then the data model is not meant to be me modified by anyone. Just for the tracker there are 70+ tables, project admins are more likely to shoot themeselves in their feets than doing good if we were able to grant them access.

    The database is not meant to be used by anything else than Tuleap code base. Hence the API access to provide testable & high level abstraction on underlying details (be it database, file systems, cache layers & co).

     

    User avatar
    I'm "also" looking for anything that works, right :) Since currently neither import nor direct MySQL access are options, anything will do.
    So, yes, an extension of the REST API would be perfect.
    But I guess that's some development work. Contrast this with the perfectly generic MySQL access...
    The show stopper here is the absence of a per-project grant (ie giving project admin full power over his/her project instead of all projects).
    How hard would that be ? Shouldn't it be simpler than endless extensions of the APIs ?
    User avatar

    I'll update the doc in that sense.

    I have a few questions:

    followup update is out of reach of (1) APIs (there does exist a "ArtifactID/changeset" node but it is read-only)

    I'm not sure to get it. You are also looking for a REST route to update a given follow-up comment, am I correct ?

     

    User avatar
    The kind of documentation that would have saved me some time:

    - tracker import/export only exists at the report level. There is an asymmetry in the GUI, since these two features are presented in two completely unrelate places: Export in the Report view ; Import CSV at the Admin level in the tracker view.

    - followup update is out of reach of (1) APIs (there does exist a "ArtifactID/changeset" node but it is read-only) and (2) GUI-based import/export

    - anything not covered by APIs and GUI (like followup update) must be done in MySQL.

    - unfortunately, MySQL access is restricted to a single "all powers" user, there is no notion of per-project MySQL access
    User avatar

    I agree that documentation is sparse on the subject and I tried to do my best to give you the insights (label names + strings). You also got an example by yourself:

    aid,submitted_on,status_id,submitted_by,severity,summary,assigned_to,description,attachment
    300243,"05/28/2016 16:18","On hold",afyf6286,Low,"Test Ticket",,"This is a test ticket ; never mind",dl1rtt.png

    But I cannot give an example on something that doesnt exist: there are no ways to export or import follow-up comments in tracker v5 CSV. This could be implemented of course but that doesn't exist.

    Could you give me an example of the kind of documentation you expect to find and I'll do my best to complete ?

    User avatar
    OK, so , to summarize, you have an "Import CSV" feature that nobody can use, short of format documentation, and you're unable to produce a simple example. So my only option is to reverse-engineer the code, right ?
    User avatar

     As you can see, no trace of the followups.

    Yes. As already wrote below, it's not implemented yet.

     

    User avatar
    Here is what I get from Export-Report / "export all columns", for an artifact with 5 followups with rich text. As you can see, no trace of the followups.

    aid,submitted_on,status_id,submitted_by,severity,summary,assigned_to,description,attachment
    300243,"05/28/2016 16:18","On hold",afyf6286,Low,"Test Ticket",,"This is a test ticket ; never mind",dl1rtt.png
    User avatar

    "start from THE export" ? How could I do since there is no such thing (a CSV export of the artifacts, not the report) ???

    I'm probably missing something as "CSV export of the artifacts" you wish, to me is exactly "export report" feature (you get a CSV file with artifacts fields)

    (though the push to "migrate to v5" may be a bit premature)

    I don't understand what you imply here.

    can you please just give an example of a small CSV file whose effect, when imported, is to set the date of a given followup comment ?

    As stated 2 comments below, there is no such thing in tracker v5 yet.

    User avatar
    "start from THE export" ? How could I do since there is no such thing (a CSV export of the artifacts, not the report) ???

    As I understand this is work in progress, and documentation is a low prio (though the push to "migrate to v5" may be a bit premature), can you please just give an example of a small CSV file whose effect, when imported, is to set the date of a given followup comment ?
    User avatar

    In that case, my question is simple: where can I see documentation for the CSV format expected at this very place ?

    As said, there is no documentation of format yet.

    The closest thing ATM is to start from the export.

    Basically, the columns should named after fields name (not the labels) and the input are strings. Some fields cannot be submitted (like ID or last update date).

    User avatar
    Oh, you are mentioning the full project export/import, which is yet another thing. It's not an option in my case since our site hosts hundreds of projects, and since the export/import must be run in shell as codendiadm (as opposed to my username), I would need to ask the site admins (as opposed to project admin, which is me) to run it for me every time I need to touch the tracker metadata.

    So, back to my initial question: "CSV import" in a tracker v5 does exist, right ? With the label "Import artifacts in the tracker from a CSV file".

    In that case, my question is simple: where can I see documentation for the CSV format expected at this very place ?
    User avatar

    In tracker v3 it was possible to export in CSV the follow-up comments but not the state changes.

    If you are looking for a comprehensive export of the artifact you can have a look to XML export format

    User avatar
    Sorry, I'm not talking about reports, but about the whole contents of the trackers, including followup comments and state changes, along with their metadata (autor and date). This existed for TV3, but not for TV5, unless I'm mistaken.
    User avatar

    I would happily learn by example, on an hypothetical CSV-export... if only such a thing existed !

    Go on report page "Export" and either export all report's columns or just the matching one

    Contributions for a better documentation are welcomed: https://github.com/Enalean/tuleap-documentation-en/blob/master/languages/en/user-guide/tracker.rst


    • Status changed from New to Closed
    • Close date set to 2016-12-19
    User avatar
    • Original Submission
      Something went wrong, the follow up content couldn't be loaded
      Only formatting have been changed, you should switch to markup to see the changes