      request #35118 MediaWiki migration doesn't keep authors
    Infos
    #35118
    Manuel Vacelet (vaceletm)
    2023-12-15 14:35
    2023-11-13 11:49
    36728
    Details
    MediaWiki migration doesn't keep authors

    Example of history after migration. Authors are no longer visible

    2606-image.png

    Mediawiki Standalone
    Empty
    Empty
    • [ ] enhancement
    • [ ] internal improvement
    Robert Vogel (rvogel)
    Stage
    Empty
    Closed
    2023-12-15
    Attachments
    References

    Follow-ups

    User avatar

    Never mind, I got mixed up with the various patches. I was referring to the previous state, when the history of migrated instances was no longer visible. I double-checked and it works now. I'm closing this point.


    • Status changed from Verified to Closed
    • Close date set to 2023-12-15
    User avatar
    Robert Vogel (rvogel)2023-12-15 11:31

    Sorry, I am not sure if I understood your last question. Could you please rephrase?

    User avatar

    Ok perfect.

    One last question: is there anything we can do for MW instances that were migrated before the fix?

    User avatar
    Robert Vogel (rvogel)2023-12-08 14:31

    Thanks for checking.

    Yes, it is expected to use the Tuleap user_id for new users. We decided on this pretty early in the project. The tuleap_user_mapping table was specifically created to maintain the legacy usernames and avoid duplication. IIRC, using the string value (Vaceletm; unix name?) was discouraged in the new implementation.

    User avatar

    We pulled version e8b2e2d2f44cd2d64edc5afd5e9f3deb08fa48f1 from the REL_139 branch and I confirm that:

    • history of authors is now correctly migrated
    • users are no longer duplicated when they browse migrated mediawiki

    While looking at the user table of an instance after this new migration I noticed some small differences:

    • user_name is a string (e.g. Vaceletm) when the user already existed before the migration
    • new users (who didn't use MW prior to the migration) are created with their Tuleap user_id as user_name

    Is that expected ?

    User avatar
    Robert Vogel (rvogel)2023-12-01 15:28

    You are right of course. I didn't consider the type hint. Only looked at the doc-block. facepalm

    This also explains the initial issue now. The mapping provider never returned a User object, even if the DB lookup found an entry. Therefore a new user got created and we got the duplication issue.

    I believe the recent patches have solved this already, but I'll have another look.

    User avatar
    Robert Vogel (rvogel)2023-11-24 10:43

    Well, it can't really: https://github.com/wikimedia/mediawiki-extensions-TuleapIntegration/blob/a2e1df60e9e39f1c4fcdcec3f296a570515fca9e/src/UserMappingProvider.php#L29-L55

    I know that the method description is wrong here, so I created a patch: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/TuleapIntegration/+/977125/1/src/UserMappingProvider.php

    If the issue occurs with the patched version, this means that neither the tuleap_user_mapping table nor TuleapResourceOwner::getId provided a proper value. This makes me think that TuleapResourceOwner::getId returns an empty string, which is very odd, as we retrieve this value directly from the Tuleap REST API as the value of sub in:

    1. https://github.com/wikimedia/mediawiki-extensions-TuleapIntegration/blob/master/src/Provider/Tuleap.php#L169
    2. https://github.com/wikimedia/mediawiki-extensions-TuleapIntegration/blob/a2e1df60e9e39f1c4fcdcec3f296a570515fca9e/src/TuleapResourceOwner.php#L27

    Or maybe the value of sub is an invalid value for a username in MediaWiki.

    User avatar
    Thomas Gerbet (tgerbet)2023-11-24 09:15

    No, it is with the patch applied. It seems that $this->userMappingProvider->provideUserForId( $owner->getId() ); can return a string.

    User avatar
    Robert Vogel (rvogel)2023-11-24 08:13

    I found this issue while doing my investigations for this ticket. Just to be sure about this: The call-stack you have shared here is from before the patch you mentioned, isn't it? Because the patch is meant to prevent just this.

    While UserFactory::newFromName can theoretically return null, this should never be the case for a value coming from TuleapResourceOwner::getId.

    User avatar
    Thomas Gerbet (tgerbet)2023-11-23 17:25

    For information, while testing our upgrade to MediaWiki 1.39 I stumbled upon a type issue in a commit you linked to this ticket: https://github.com/wikimedia/mediawiki-extensions-TuleapIntegration/commit/a2e1df60e9e39f1c4fcdcec3f296a570515fca9e

    It seems that the first access by a user to a MediaWiki instance that has been migrated to MediaWiki Standalone crashes with the following errors:

    2023-11-23 15:49:00 web plugin_mediawiki_220-mw: [f7bee404b73d8e8e492018b2] /mediawiki/_oauth/Special:TuleapLogin/callback?state=9d40c3551afc9b8a5684fcbb4a3b4310&code=tlp-oauth2-ac1-14.b7c4655e030d3b0c186769395bb41a6aa8a72c72bdb91b85fbe65be23bd7a5bf   Error: Call to a member function setRealName() on string
    #0 /usr/share/mediawiki-tuleap-flavor/current/extensions/TuleapIntegration/src/Special/TuleapLogin.php(111): TuleapIntegration\Special\TuleapLogin->setUser()
    #1 /usr/share/mediawiki-tuleap-flavor/current/extensions/TuleapIntegration/src/Special/TuleapLogin.php(75): TuleapIntegration\Special\TuleapLogin->callback()
    #2 /usr/share/mediawiki-tuleap-flavor/current/includes/specialpage/SpecialPage.php(701): TuleapIntegration\Special\TuleapLogin->execute()
    #3 /usr/share/mediawiki-tuleap-flavor/current/includes/specialpage/SpecialPageFactory.php(1428): SpecialPage->run()
    #4 /usr/share/mediawiki-tuleap-flavor/current/includes/MediaWiki.php(316): MediaWiki\SpecialPage\SpecialPageFactory->executePath()
    #5 /usr/share/mediawiki-tuleap-flavor/current/includes/MediaWiki.php(904): MediaWiki->performRequest()
    #6 /usr/share/mediawiki-tuleap-flavor/current/includes/MediaWiki.php(562): MediaWiki->main()
    #7 /usr/share/mediawiki-tuleap-flavor/current/index.php(50): MediaWiki->run()
    #8 /usr/share/mediawiki-tuleap-flavor/current/index.php(46): wfIndexMain()
    #9 {main}
    
    User avatar

    Yes, I got the result with the following query:

    select * from tuleap_mediawiki.mw_235_tuleap_user_mapping;
    

    Knowing that the user table is tuleap_mediawiki.mw_235_user, for instance.

    User avatar
    Robert Vogel (rvogel)2023-11-23 09:26

    Does the tuleap_user_mapping table have the same prefix as the other tables? So e.g. if your user table is actually called 987_user, then your tuleap_user_mapping table needs to be called 987_tuleap_user_mapping.

    User avatar
    Robert Vogel (rvogel)2023-11-22 17:20

    Okay, so the values seem good. Then we need to investigate why you haven't been logged in as Er270077 and Vaceletm3 but as 133 and 102. Unfortunately, there are no logs we could check in this case. I'll give feedback as soon as possible.

    User avatar

    Here is the content of tuleap_user_mapping:

    tum_user_id | tum_user_name
    ------------+--------------
            133 | Er270077
           1139 | Fred
            102 | Vaceletm3

    User avatar
    Robert Vogel (rvogel)2023-11-22 08:40

    Sorry for the delay.

    I have discussed this with @dsavuljesku and this is the result: We believe that there may be something wrong in the tuleap_user_mapping table.

    In general, the migration process will not alter the user table at all. There will be no renaming of the users or anything like this. Instead, Extension:TuleapIntegration, which is responsible for the automated login process, will use the information from tuleap_user_mapping to determine the wiki username [1].

    This means, in your case, the tuleap_user_mapping table should look something like this:

    tum_user_id | tum_user_name
    ------------+--------------
            133 | ER270077
            102 | Vaceletm3
    

    By this, Extension:TuleapIntegration should log you in as Vaceletm3 rather than creating the new user 102 (it should also update the user_real_name and user_mail fields of the existing user).

    As it did not log you in as Vaceletm3, but created a new user 102 that means that it couldn't find a proper entry in tuleap_user_mapping [2].

    Can you please share the contents of the tuleap_user_mapping table?

    [1] https://github.com/wikimedia/mediawiki-extensions-TuleapIntegration/blob/1.0.4/src/Special/TuleapLogin.php#L131C1-L134
    [2] https://github.com/wikimedia/mediawiki-extensions-TuleapIntegration/blob/1.0.4/src/UserMappingProvider.php#L31-L42
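
The lookup described in this comment can be modelled in a few lines. This is only a Python sketch of the behaviour as described here, not the extension's PHP code: the function name `resolve_wiki_username` and the dictionary standing in for tuleap_user_mapping are hypothetical.

```python
def resolve_wiki_username(tuleap_user_id, user_mapping):
    """Return the wiki username for a Tuleap user id.

    user_mapping is a hypothetical stand-in for tuleap_user_mapping:
    it maps tum_user_id to tum_user_name.
    """
    legacy_name = user_mapping.get(tuleap_user_id)
    if legacy_name:
        # A mapping entry exists: reuse the pre-migration wiki account.
        return legacy_name
    # No usable entry: the Tuleap user id itself becomes the username.
    return str(tuleap_user_id)


# Values taken from the expected table in this comment.
EXPECTED_MAPPING = {133: "ER270077", 102: "Vaceletm3"}
```

With `EXPECTED_MAPPING`, id 102 resolves to Vaceletm3. An id that has no entry resolves to its own number, which is the symptom reported in this request (a new user named 102).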

    User avatar

    I don't think it's enough here because it creates duplicated users. In my last comment ("bad news"), user 1 and user 8 are actually the same user on the Tuleap side. User 8 was created after migration.

    And, actually, users 5, 6 and 7 are the same. 5 & 6 were automatically created by MW 1.23 because there was a change of case in the username. 7 was created because user "custom Eric" browsed MW after migration to 1.35.
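
One way to spot the case-only duplicates is to group user names after lowercasing them. The sketch below is an illustration under assumptions: it loads the rows quoted in the "bad news" comment into an in-memory SQLite table called `user` (the real table is mw_235_user in the instance's own database), and the helper names are hypothetical.

```python
import sqlite3

# Rows copied from the mw_235_user listing quoted in this thread.
USER_ROWS = [
    (1, "Vaceletm3", ""),
    (2, "Fred", ""),
    (3, "Jean 135", ""),
    (4, "JEAN 135", ""),
    (5, "Er270077", ""),
    (6, "ER270077", ""),
    (7, "133", "custom Eric"),
    (8, "102", "Manuel VACELET"),
]


def build_user_table(rows):
    """Create an in-memory stand-in for the wiki user table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user ("
        "user_id INTEGER PRIMARY KEY, user_name TEXT, user_real_name TEXT)"
    )
    conn.executemany("INSERT INTO user VALUES (?, ?, ?)", rows)
    return conn


def find_case_duplicates(conn):
    """Map each lowercased user_name to the user_ids sharing it (2 or more)."""
    groups = {}
    query = "SELECT user_id, LOWER(user_name) FROM user ORDER BY user_id"
    for user_id, folded_name in conn.execute(query):
        groups.setdefault(folded_name, []).append(user_id)
    return {name: ids for name, ids in groups.items() if len(ids) > 1}
```

On these rows the helper reports the pairs 3/4 and 5/6. It does not catch users 7 and 8, because their user_name is a Tuleap id rather than a case variant of an existing name.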

    User avatar
    Robert Vogel (rvogel)2023-11-14 16:33

    Okay this is really only an issue in the display of the username. It is part of the extension TuleapIntegration. We replace all usernames with "real names". Unfortunately the code does not check if the "real name" field is empty :|

    We'll provide a patch.
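
The missing check amounts to falling back to the username when the real name is empty. A minimal Python sketch of that idea, not the actual patch (the function name `display_name` is hypothetical):

```python
def display_name(user_name, user_real_name):
    """Prefer the real name, but fall back to the username when it is empty."""
    real_name = (user_real_name or "").strip()
    return real_name if real_name else user_name
```

For example, `display_name("Vaceletm3", "")` gives back Vaceletm3 instead of a blank author, while `display_name("102", "Manuel VACELET")` still shows the real name.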

    User avatar

    Bad news, there is a mix-up somewhere:

    SELECT user_id, user_name, user_real_name FROM tuleap_mediawiki.mw_235_user;
    
    user_id | user_name | user_real_name
    --------+-----------+---------------
          1 | Vaceletm3 |
          2 | Fred      |
          3 | Jean 135  |
          4 | JEAN 135  |
          5 | Er270077  |
          6 | ER270077  |
          7 | 133       | custom Eric
          8 | 102       | Manuel VACELET

    Users n°7 & 8 were created after migration. It means that the migration didn't properly convert user names & ids.

    User avatar

    Good news, the data seems to be there.

    SELECT * FROM tuleap_mediawiki.mw_235_actor WHERE actor_id IN (2, 1, 3, 4);
    
    actor_id | actor_user | actor_name
    ---------+------------+-----------
           1 |          1 | Vaceletm3
           2 |          2 | Fred
           3 |          3 | Jean 135
           4 |          4 | JEAN 135

    SELECT user_name, user_real_name FROM tuleap_mediawiki.mw_235_user WHERE user_id IN (2, 1, 3, 4)
    
    user_name | user_real_name
    ----------+---------------
    Vaceletm3 |
    Fred      |
    Jean 135  |
    JEAN 135  |

    As the project was migrated, I no longer have access to the original database but I can reproduce the scenario if needed.

    User avatar
    Robert Vogel (rvogel)2023-11-14 08:31

    On the updated database, please execute

    1. SELECT page_id FROM page WHERE page_title = "<pagename_with_underscores_instead_of_spaces>"; -> Get the ID of a page with this issue
    2. SELECT rev_id FROM revision WHERE rev_page = <numeric_page_id_from_step_1>; -> Get all the IDs of the revisions of that page. HINT: the rev_actor column should have 0 everywhere. That's why we perform step 3
    3. SELECT revactor_actor FROM revision_actor_temp WHERE revactor_rev IN ( <CSV_list_of_revision_ids_from_step_2> );
    4. SELECT * FROM actor WHERE actor_id IN ( <CSV_list_of_actor_ids_from_step_3> );
    5. SELECT user_name, user_real_name FROM user WHERE user_id IN ( <CSV_list_of_user_ids_from_step_4> );

    On the original database, please execute

    6. SELECT rev_user FROM revision WHERE rev_page = <numeric_page_id_from_step_1>; -> Must be the same ID before and after the migration anyways
    7. SELECT user_name, user_real_name FROM user WHERE user_id IN ( <CSV_list_of_user_ids_from_step_6> );

    Please share results of step 4 and share/compare results of steps 5 and 7.
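
The checks on the updated database (steps 1 to 5) can also be folded into a single query. The sketch below is only an illustration: it runs against an in-memory SQLite database with made-up rows (the page title `Main_Page` and the revision IDs are hypothetical) and uses the unprefixed table names from the steps. On a real instance the tables carry the instance prefix, so the query text would need adapting.

```python
import sqlite3

# Minimal stand-in schema: only the columns named in the steps above.
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_actor INTEGER);
CREATE TABLE revision_actor_temp (revactor_rev INTEGER, revactor_actor INTEGER);
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, actor_user INTEGER, actor_name TEXT);
CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT, user_real_name TEXT);
"""

# Steps 1 to 5 as one query: page -> revisions -> actors -> users.
AUTHORS_OF_PAGE = """
SELECT r.rev_id, a.actor_id, a.actor_name, u.user_name, u.user_real_name
FROM page AS p
JOIN revision AS r ON r.rev_page = p.page_id
JOIN revision_actor_temp AS t ON t.revactor_rev = r.rev_id
JOIN actor AS a ON a.actor_id = t.revactor_actor
LEFT JOIN user AS u ON u.user_id = a.actor_user
WHERE p.page_title = ?
ORDER BY r.rev_id
"""


def authors_of_page(conn, page_title):
    """Return one row per revision of the page, with its actor and user."""
    return conn.execute(AUTHORS_OF_PAGE, (page_title,)).fetchall()


def demo_connection():
    """Build an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO page VALUES (1, 'Main_Page')")
    # rev_actor is 0 everywhere, as hinted in step 2.
    conn.executemany("INSERT INTO revision VALUES (?, ?, ?)", [(10, 1, 0), (11, 1, 0)])
    conn.executemany("INSERT INTO revision_actor_temp VALUES (?, ?)", [(10, 1), (11, 2)])
    conn.executemany("INSERT INTO actor VALUES (?, ?, ?)", [(1, 1, "Vaceletm3"), (2, 2, "Fred")])
    conn.executemany("INSERT INTO user VALUES (?, ?, ?)", [(1, "Vaceletm3", ""), (2, "Fred", "")])
    return conn
```

Calling `authors_of_page(demo_connection(), "Main_Page")` lists each revision next to its actor and user. A revision whose actor row is missing would simply not appear, which is one quick way to see whether the user<->revision connection is broken.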

    User avatar
    Robert Vogel (rvogel)2023-11-14 08:00

    Actually this looks a little bit odd. The place where the username should be is just blank. If the user<->revision connection was really broken here, I'd expect an IP address instead of the username.

    I'll create a list of SQL queries to shed some light on this.