request #35108 Duplicated user entry in MediaWiki database
    Manuel Vacelet (vaceletm)
    2023-12-13 10:26
    2023-11-09 17:43

    When attempting to run the migration from 1.23 to 1.35, we hit an issue while filling the tuleap_user_mapping table because the source user table contains duplicate entries. Example:

    | 5476 | ER270077 | Er270077 |
    | 5476 | ER270077 | ER270077 |

    Is it safe to remove one of the two entries before proceeding?

    Mediawiki Standalone
    • [ ] enhancement
    • [ ] internal improvement
    Robert Vogel (rvogel)


    Robert Vogel (rvogel) 2023-12-08 15:10

    When you perform a user rename on Tuleap, the integer ids of the user should stay the same, shouldn't they?

    We use the Tuleap User-ID provided in the "sub" field [1][2] of the OAuth2 flow to create a MediaWiki User object [3] and persist it into the database. Only for legacy users, we fall back to usernames on the tuleap_user_mapping table [4].

    Let's say ldap_user_5 has an internal id of 1337. This id stays the same after the user is renamed to LDAP_user_5. I'd have expected that, e.g., when logging in as LDAP_user_5, the "sub" would still be 1337. In tuleap_user_mapping there should be an entry that maps 1337 -> ldap user 5. This way Extension:TuleapIntegration should log the user in as (4, Ldap user 5).

    So the scenario you describe would imply that after renaming users in Tuleap the value in the "sub" field changes.

    [1] https://github.com/wikimedia/mediawiki-extensions-TuleapIntegration/blob/b7493742bff5edf7433076f6bb3ebce4c2915338/src/TuleapResourceOwner.php#L27
    [2] https://github.com/wikimedia/mediawiki-extensions-TuleapIntegration/blob/b7493742bff5edf7433076f6bb3ebce4c2915338/src/TuleapResourceOwner.php#L45
    [3] https://github.com/wikimedia/mediawiki-extensions-TuleapIntegration/blob/b7493742bff5edf7433076f6bb3ebce4c2915338/src/Special/TuleapLogin.php#L142
    [4] https://github.com/wikimedia/mediawiki-extensions-TuleapIntegration/blob/b7493742bff5edf7433076f6bb3ebce4c2915338/src/Special/TuleapLogin.php#L139
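    To make the expectation above concrete, here is a minimal sketch of a lookup keyed on the "sub" value. It is not the code of Extension:TuleapIntegration: the dictionaries and the function name resolve_mw_user are hypothetical stand-ins for the tuleap_user_mapping and user tables, and the ids are the ones from the example.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for tuleap_user_mapping: Tuleap id ("sub") -> MediaWiki user_name.
TULEAP_USER_MAPPING = {1337: "Ldap user 5"}

# Hypothetical stand-in for the MediaWiki user table: user_name -> user_id.
MW_USERS = {"Ldap user 5": 4}


def resolve_mw_user(sub: int, tuleap_username: str) -> Optional[Tuple[int, str]]:
    """Return (user_id, user_name) for a login, or None if nothing is mapped.

    tuleap_username is deliberately ignored: the point of the sketch is that
    the lookup depends only on the stable integer id, not on the current name.
    """
    mapped_name = TULEAP_USER_MAPPING.get(sub)
    if mapped_name is None or mapped_name not in MW_USERS:
        return None
    return MW_USERS[mapped_name], mapped_name


# Same "sub" before and after the rename, so the same MediaWiki user is found.
before_rename = resolve_mw_user(1337, "ldap_user_5")
after_rename = resolve_mw_user(1337, "LDAP_user_5")
```

    If the "sub" really is stable, both calls resolve to the same existing user and no new account is needed.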

    last edited by: Manuel Vacelet (vaceletm) 2023-12-06 15:51

    I'm back on this now that the user migration bug is fixed.

    With the fix I referenced below, the migration succeeds. However, I'm facing another issue and I'm unsure what can be done about it. I'm sharing it to have another brain on the subject :)


    • a MediaWiki 1.23 instance with Tuleap users ldap_user_5 & ldap_user_6
    • That created 2 MW users (user_id, username): (4, Ldap user 5) & (5, Ldap user 6)
    • Tuleap rename: LDAP_user_5 & ldap_USER_6 (I know, sorry...)
    • Those 2 users go back to MW, which creates two new users (user_id, user_name): (6, Ldap USER 6) & (7, LDAP user 5)
    • Migration to MediaWiki 1.39
    • Those 2 users go back to MW, which again creates two new users (user_id, user_name, real_name): (8, 137, LDAP user 6) & (9, 134, LDAP user 5)
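    The second batch of users in the steps above is consistent with how MediaWiki canonicalizes user names: underscores become spaces and only the first letter is upper-cased, so a case-only rename yields a different user_name. The sketch below is a simplified approximation of that rule, not MediaWiki's actual code; mw_canonical_name is a hypothetical helper.

```python
def mw_canonical_name(tuleap_login: str) -> str:
    """Approximate MediaWiki user name canonicalization (simplified).

    Underscores are turned into spaces and the first character is
    upper-cased; the case of every other character is preserved.
    """
    name = tuleap_login.replace("_", " ").strip()
    return name[:1].upper() + name[1:]


# Logins before and after the Tuleap rename from the scenario above.
original_names = [mw_canonical_name(n) for n in ("ldap_user_5", "ldap_user_6")]
renamed_names = [mw_canonical_name(n) for n in ("LDAP_user_5", "ldap_USER_6")]
```

    Because the comparison is case-sensitive after the first letter, the renamed logins no longer match the rows created before the rename.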

    The database at this point:

    | user_id | user_name   | user_real_name |
    | 1       | Vaceletm    | Manuel VACELET |
    | 2       | Jean        |                |
    | 3       | Fred        |                |
    | 4       | Ldap user 5 |                |
    | 5       | Ldap user 6 |                |
    | 6       | Ldap USER 6 |                |
    | 7       | LDAP user 5 |                |
    | 8       | 137         | LDAP user 6    |
    | 9       | 134         | LDAP user 5    |

    At least, the display somehow makes sense:

    But I'm wondering why the last 2 users were re-created?

    Robert Vogel (rvogel) 2023-11-14 08:08

    That's probably a valid solution, even though I don't know how this will play out during the migration. The user_id field is usually the unique identifier in the user table and a foreign key to various other tables. During the update some of those foreign keys may be moved (e.g. in the case of the revision table: revision.rev_user -> actor.actor_user). I cannot tell how the migration scripts will behave in this situation.

    User avatar

    In the end I chose not to fiddle with the source database. I "just" deduplicate the entries when filling the tuleap_user_mapping table. See gerrit #29762.

    However I discovered a bug that is way more problematic: page history authors are not kept during migration (see art #35118).
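    For illustration only, deduplicating while filling the mapping could look like the sketch below. This is not the content of the gerrit change: the function name deduplicate_by_user_id and the choice to keep the first row seen for each user_id are assumptions, and the sample rows are the ones from the request description.

```python
from typing import Dict, Iterable, List, Tuple

UserRow = Tuple[int, str, str]  # (user_id, user_name, user_real_name)


def deduplicate_by_user_id(rows: Iterable[UserRow]) -> List[UserRow]:
    """Keep one row per user_id (the first one encountered).

    The source table is left untouched; only the rows fed into the
    mapping step are filtered.
    """
    first_seen: Dict[int, UserRow] = {}
    for row in rows:
        user_id = row[0]
        # setdefault stores the row only if this user_id is not present yet.
        first_seen.setdefault(user_id, row)
    return list(first_seen.values())


source_rows = [
    (5476, "ER270077", "Er270077"),
    (5476, "ER270077", "ER270077"),
]
mapping_input = deduplicate_by_user_id(source_rows)
```

    Keeping the first row is arbitrary here; since the duplicates share user_id and user_name, either row gives the same mapping entry.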

    Robert Vogel (rvogel) 2023-11-10 15:26

    I am assuming that your example shows columns user_id, user_name and user_real_name in this order. In other words: the two rows match in user_id and user_name and differ in user_real_name.

    If this is the case, then yes, you can just drop one of them.

    If there is a difference in the user_name, it may be worth checking other tables as well. MediaWiki stores the user_name as redundant information (to speed up certain queries) in some other tables. But usually the more important information is the user_id, so it should still be safe to remove one of the rows.