Warning

This is a design page. It was used to design and discuss the initial implementation of the change. However, the state of this document does not necessarily correspond to the current state of the implementation since we do not keep this document up to date with further changes and bug fixes.

Improve SSSD Performance with a timestamp cache

Related ticket(s):

At the moment SSSD doesn’t perform well in large environments. Most of the use-cases we’ve had reported revolved around logins of users who are members of large groups or a large amount of groups. Another reported use-case was the time it takes to resolve a large group.

While workarounds are available for some of the issues (such as using ignore_group_members for resolution of large groups), our goal is to be able to perform well without these workarounds.

  • User who is a member of a large amount of AD groups logs in to a GNU/Linux server that is a member of the AD domain.

  • User who is a member of a large amount of AD or IPA groups logs in to a GNU/Linux server that is a member of an IPA domain with a trust relationship to an AD domain

  • Administrator of a GNU/Linux server runs “ls -l” in a directory where files are owned by a large group. An example would be group called “students” in an university setup

During performance analysis with systemtap, we found out that the biggest delay happens when SSSD writes an entry to the cache, especially for large group entries. This is also confirmed by empirical evidence from our users, where most deployments were OK with SSSD performance once the cache was moved to tmpfs or even when ignore_group_members option was enabled.

We can’t skip cache writes completely, even if no attributes changed, because we store also the expiration timestamps in the cache. Also, even if a single attribute (like the timestamp) changes, ldb would need to unpack the whole entry, change the record, pack it back and then write the whole blob.

In order to mitigate the costly cache writes, we should avoid writing the whole cache entry on every cache update, but only write the entries if something actually changed.

To avoid this, we will split the monolithic ldb cache representing the sysdb cache into two ldb files. One would contain the entry itself and would be fully synchronous. The other (new one) would only contain the timestamps and would be open using the LDB_FLG_NOSYNC to avoid synchronous cache writes.

This would have two advantages:

  1. If we detect that the entry hasn’t changed on the LDAP server at all, we could avoid writing into the main ldb cache which would still be costly. We would use the value of the modifyTimestamp attribute of the LDAP entry to see if the entry had changed or not.

  2. The writes to the new async ldb cache would be much faster, because the entry is smaller and because the writes wouldn’t call fsync() due to using the async flag, but rather rely on the underlying filesystem to sync the data to the disk.

On SSSD shutdown, we would write a canary to both the timestamp cache and the main sysdb cache, denoting graceful shutdown. On SSSD startup, if the canary wasn’t found or if the values differ, we would just ditch the timestamp cache, which would result in refresh and write of the entry on the next lookup.

The basic idea is to use a combination of the operational modifyTimestamp attribute and checking the entry itself to see if the entry changed at all and if not, avoid writing to the cache.

Checking the value of modifyTimestamp would be enough for group entries, which should be the first iteration of this work. For checking if other entries (mostly users) have changed, we need to compare the value of the attributes in the cache with what we are about to store in the cache.

Therefore, these enhancements are proposed for the 1.14 versions, sorted by the importance as observed with systemtap testing:

  • only write the cache entry if the modifyTimestamp of the original entry had changed. If it hasn’t changed, only the timestamps would be written to the timestamp cache

  • if the modifyTimestamp had changed, compare the attributes of the cache entry with the attributes we are about to write. If there are no differences, only write to the timestamp cache

  • refactor the nested group processing to make sure expensive lookups (such as lookups of all members of the group, there can potentially be thousands of these) are only performed once and intermediate results are stored in-memory.

  • attempt to shortcut parsing the attributes of the entry returned from LDAP sooner. The idea behind this is that if the modifyTimestamp did not change, we can reuse the entry we already cached.

Minor enhancements in later versions might include:

  • using syncrepl in the server mode for HBAC rules and external groups in refreshAndPersistMode. This would provide performance benefit for legacy clients that rely on server’s HBAC rules for access control.

  • using syncrepl in the server mode for external groups in refreshAndPersistMode. This would mainly simplify the external groups handling, rather than improve performance

  • A lot of time is spent looking up attributes in the sysdb_attrs array. This is something we might want to optimize after we’re done with the cache writes.

  • We might even consider offering syncrepl in refreshOnly mode as an client-side option for enumeration. However, this would have to be an opt-in because every refresh causes the server to walk the changelog since the last refresh operation. Enabling this option on all clients would trash the server performance.

The sysdb_ctx already contains a handle of the main sysdb cache. We would add another ldb file that only contains the timestamp and the DN of an entry. This ldb file would be opened in the nosync mode. Attributes used for lookups, like dataExpireTimestamp must be indexed in this database as well.

When storing a user or a group to sysdb using functions like sysdb_store_user, we first check the difference between modifyTimestamp attributes. If there are no differences, only the timestamp attributes, such as lastUpdate or dataExpireTimestamp would be updated in the timestamp cache. We need to do this check in the lower-level sysdb calls to make sure this enhancement also works for users and groups retrieved through the extop plugin.

If the value of modifyTimestamp differs, we proceed to checking the diff between values in the cache and the values read from LDAP.

Details about shortcut of attribute parsing will be added to this design page later.

Currently no configuration changes are expected. We might add some if we decide to implement on-demand syncrepl.

If the entries on the server did not change (except timestamps), then actions like user and group lookups and logins should be considerably faster.

The SSSD should also correctly detect when the entries in fact did change on the server. In this case, a full cache write will be performed.