Jonathan Palm (palm)2018-08-03 09:41 Adding better error checking to getLdapGroupMembers would be the least one could do. Beyond that, an alternate LDAP search that does not silence errors as "false" results could also be useful, and not just for this particular problem. What do you think, @vaceletm?
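To make the idea of a search that does not silence errors concrete, here is a minimal Python sketch. It is not the plugin's code: `strict_search` and `LdapSearchError` are hypothetical names, and the wrapped `search` callable is assumed to follow the convention described above of returning `False` on failure.

```python
class LdapSearchError(RuntimeError):
    """Hypothetical error type for a search that could not be performed."""


def strict_search(search, *args):
    """Call `search` and raise instead of passing a silent False on.

    `search` is assumed to return False on failure and a (possibly
    empty) list of entries on success.
    """
    result = search(*args)
    if result is False:
        raise LdapSearchError("LDAP search failed for arguments %r" % (args,))
    return result
```

With a wrapper like this, an empty list can only mean "the search succeeded and found nothing", so callers no longer have to guess.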
Jonathan Palm (palm)2018-08-01 14:17 I've investigated the robustness of the LDAP plugin further. So far, I've not seen any obvious place outside of LDAP_GroupManager where the sync should fail due to LDAP unavailability. setGroupDn is likely the source of the errors: the only place where the plugin checks whether it can find the group is when getGroupDn is called with a null group DN. setGroupDn seems to circumvent this check completely, with no validation of the DN it is given. As I mentioned earlier, after that initial, circumventable check there is no distinction between an empty group, a non-existent group, and a disconnected LDAP server.
Jonathan Palm (palm)2018-07-26 13:52 The failsafes seem pretty good throughout the LDAP plugin. However, I've found one very suspicious method in LDAP_GroupManager: diffDbAndDirectory makes a call to getLdapGroupMembersIds. If that particular search fails for any reason, the method returns an empty list of users. This could well be the culprit, as it would lead to rapid deletion of group members. I'll continue my investigation and keep you updated.
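The suspected failure mode can be reproduced in a few lines. This is a Python stand-in, not the plugin's PHP: `diff_db_and_directory` and `silenced_lookup` are hypothetical names that only mimic the behaviour described above, where a failed search is reported as an empty member list.

```python
def silenced_lookup(search_ok, members):
    """Mimic a lookup that reports a failed search as an empty list."""
    return list(members) if search_ok else []


def diff_db_and_directory(db_member_ids, ldap_member_ids):
    """Compute who to add and who to remove so the DB matches the directory."""
    db = set(db_member_ids)
    ldap = set(ldap_member_ids)
    return {"to_add": ldap - db, "to_remove": db - ldap}


# A failed search looks exactly like an empty group, so every
# current member ends up scheduled for removal.
plan = diff_db_and_directory([1, 2, 3], silenced_lookup(False, [1, 2, 3]))
```

Here `plan["to_remove"]` contains all three members even though nothing changed in the directory.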
Jonathan Palm (palm)2018-07-26 11:19 Absolutely. I've started investigating. I'll get back to you when I've found where the sync fails.
Manuel Vacelet (vaceletm)2018-07-26 08:53 IMHO there are 2 things: (1) Having a DB transaction for this process is a good idea, but it's risky because the transaction might be long. With 50'000 users to check, you might have a sync that lasts 10 minutes or even one hour. Hence the risk of having a user updated in the meantime is higher, and the transaction would be rolled back. (2) Better handling of LDAP unavailability and errors, with a plan for recovery. To me it's not clear enough as of today what's causing those issues. I'd like to get a clear picture of what the situation is, what is observed at the code level, and why the current code doesn't deal with it cleanly. Could you investigate that?
Jonathan Palm (palm)2018-07-25 13:36 Another potential solution would be to simply not remove any users if the LDAP server is disconnected during sync. It might be simpler than implementing full transactions, as the whole sync wouldn't need to be rescheduled and/or rolled back.
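A rough Python sketch of that "do nothing when disconnected" rule follows. It assumes, purely for illustration, that the caller passes `None` when the directory could not be queried; `plan_group_sync` is a hypothetical helper, not a function of the plugin.

```python
def plan_group_sync(db_member_ids, ldap_member_ids):
    """Return (to_add, to_remove) for one group.

    `ldap_member_ids` is None when the directory could not be queried
    (an assumption of this sketch). In that case nothing is changed,
    so an outage never removes members.
    """
    if ldap_member_ids is None:
        return set(), set()
    db = set(db_member_ids)
    ldap = set(ldap_member_ids)
    return ldap - db, db - ldap
```

An empty list still means "the group really is empty", so legitimate removals keep working.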
Jonathan Palm (palm)2018-07-24 09:38 Isn't $sys_ldap_threshold_users_suspension for user sync only? I cannot find any similar feature for syncing groups. I am not quite sure what the best solution for handling a disconnection from LDAP would be. I am just thinking that a database "transaction" would be rather simple, as the uncommitted changes made by the sync would be "rolled back" if it became disconnected while syncing.
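The transaction idea could look roughly like the following Python sketch. It uses the standard `sqlite3` module as a stand-in database; the `group_member` table, the `LdapUnavailable` exception and the `fetch_member_ids` callable are all assumptions made for illustration and do not reflect the actual schema or plugin code.

```python
import sqlite3


class LdapUnavailable(Exception):
    """Hypothetical error raised when the directory cannot be reached."""


def sync_group(conn, group_id, fetch_member_ids):
    """Replace the stored members of one group inside a single transaction.

    Returns True if the changes were committed, False if the directory
    became unavailable and everything was rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "DELETE FROM group_member WHERE group_id = ?", (group_id,)
            )
            for user_id in fetch_member_ids():
                conn.execute(
                    "INSERT INTO group_member (group_id, user_id) VALUES (?, ?)",
                    (group_id, user_id),
                )
    except LdapUnavailable:
        return False
    return True
```

If `fetch_member_ids` raises `LdapUnavailable` halfway through, the earlier `DELETE` is undone and the group keeps its previous members.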
Manuel Vacelet (vaceletm)2018-07-24 08:46 I don't really know if the term "transaction" is the right one, but better handling of LDAP queries is a good idea. It's worth noting that there is already a parameter that manages a related behaviour:
// Threshold for users to be suspended
// On beyond of this value expressed in percentage no users will be suspended
$sys_ldap_threshold_users_suspension
Wouldn't that be enough in your case?
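The threshold semantics quoted in that configuration comment can be sketched as below. This is a Python illustration under stated assumptions, not the plugin's implementation: `users_to_suspend` is a hypothetical helper, and whether the real check uses a strict or non-strict comparison is not stated in the thread.

```python
def users_to_suspend(candidates, total_users, threshold_percent):
    """Return the users to suspend, or nobody if the threshold is exceeded.

    If the share of users about to be suspended is above
    `threshold_percent`, the run is treated as suspicious and no one
    is suspended. The strict '>' comparison is an assumption.
    """
    if total_users == 0:
        return []
    percent = 100.0 * len(candidates) / total_users
    if percent > threshold_percent:
        return []
    return list(candidates)
```

The same kind of guard could in principle be applied to group member removals, which is the gap raised in the reply.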