Original content by Scott.Lindsey and Mike.Hovis
This document discusses the configuration, processes, and best practices for pruning old data from your IdentityIQ environment in order to maintain performance and control database disk space usage.
IdentityIQ should be configured to store only as much historical data as your business requirements for retaining identity and certification data demand, in order to optimize database size and overall system performance.
The following objects have the most effect on database size and system performance:
| Object Type | Description | Impact |
|---|---|---|
| Identity Snapshots | Stores a copy of the governance identity data for historical purposes. Snapshots are created during certification generation, and by the "Maintain identity histories" option in the Identity Refresh task. The number of identity snapshots can affect the time to generate new certifications, the time to render identities in the UI, and the size of the database. It is recommended that you define an expiration policy for identity snapshots that matches your certification archive policy. | Performance: High<br>Table Space: High |
| Certifications | As certifications are generated, large amounts of historical data are captured and stored in the certification tables along with identity snapshot, history, and entitlement snapshot data. After the certification cycle is complete and audit reports are generated, the certification data becomes stale. It is recommended that you keep at least one previous cycle of data for comparison purposes during the next certification generation. You may choose to keep live certification data in the system for one or more complete certification cycles to provide historical context for current certifications. Older certifications can be archived and deleted to improve certification performance and reduce disk space. | Performance: Medium<br>Table Space: High |
| Task Results | Every task execution saves a task result. Task results include report results, which in many cases consume a large portion of the used disk space. Depending on the TaskDefinition configuration, the previous result may be renamed or deleted. When results are renamed to keep history, a large amount of task result data can accumulate, especially for tasks scheduled daily or hourly. It may be important to keep this task data for some period of time for debugging and monitoring purposes. Depending on business and operational requirements, task results should be cleaned up as often as necessary to reduce disk space. | Performance: Low<br>Table Space: Medium |
| Identities | Identities accumulate over time as new systems are aggregated into IdentityIQ and users complete their life cycle at a company. As the number of identities increases, the time to search, refresh, and maintain them increases. This document does not discuss strategies for optimizing the execution of those daily tasks; it focuses on pruning the dead weight from your identity warehouse. | Performance: Medium<br>Table Space: Low |
| Requests | Each time an email is sent, a new Request object is created. For large certification cycles or daily automated tasks, the request queue can become very large. Request objects are also created for future-dated workflow events and other object expiration events. Although completed requests have no effect on system performance, they do consume disk space. | Performance: Low<br>Table Space: Low |
For each of these object types, it is important to define the number of days before objects are deleted or archived, in order to optimize the amount of data retained, maintain system performance, and control the size of the data footprint. The system can also be configured to capture more or less granular data based on your business requirements. Figure 1 and Figure 2 below show the settings available in the System Setup > IdentityIQ Configuration section. Refer to the product documentation for detailed explanations.
WARNING: Enabling these options can result in deleted data
| Setting and Type | Recommended Retention Policies | Task Configuration |
|---|---|---|
| Days before snapshot deletion | At a minimum, set this to match your shortest certification cycle; at a maximum, match your certification archive days. | Perform Maintenance option: Prune Identity Snapshots |
| Days before task result deletion | Set this to match the longest schedule of tasks whose previous results you need to see. For some customers this is 60 days, where their longest schedule of interest is every 30 days. This option applies to both task results and report results. | Perform Maintenance option: Prune Task Results |
| Days before certifications are archived | When this option is enabled (not set to 0), completed certifications are serialized into a single Character Large Object (CLOB) and persisted to the spt_certification_archive table. The certification objects are then deleted from the live tables and no longer appear in the UI or are available for reporting. Treat archived certifications as permanently removed, even though they are not yet deleted. A recommended practice is to archive certifications older than the last two cycles, and to extract certification data into a data warehouse or report before archiving. | Perform Maintenance option: Archive and Prune Certifications |
| Days before certification archive deletion | To complete the cleanup of old certifications, this option causes the archive object to be completely deleted. Use this option with caution, as the data is not recoverable. | Perform Maintenance option: Archive and Prune Certifications |
| Snapshot Frequency in Days | In general, set this to the highest acceptable value in order to reduce the amount of snapshot data stored. This option is used only if the "Maintain identity histories" option is enabled in the Identity Refresh task. | Refresh Identity Cube: Maintain identity histories |
| Index History Granularity | Determines the increments at which history is stored. For example, if set to "Week", snapshots are preserved on a weekly basis. Set this to the largest granular window acceptable to the user, to reduce the number of updates to the snapshot data. | Refresh Identity Cube: Maintain identity histories |
The Perform Maintenance task controls which objects are pruned, and when. It is recommended that you configure a dedicated task, executed on its own schedule, to prune and archive objects. For example, you might create a separate task for each object type.
WARNING: Enabling the prune options will result in deleted data
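For illustration, a dedicated pruning task could be captured as a TaskDefinition like the sketch below. This is an assumption-heavy sketch: the executor class, argument keys, and parent reference shown here are illustrative and must be verified against the out-of-box Perform Maintenance TaskDefinition in your IdentityIQ version.

```xml
<!-- Sketch only: a dedicated maintenance task that performs pruning/archiving.
     The executor, parent name, and argument keys are assumptions; confirm them
     against the stock "Perform Maintenance" TaskDefinition in your installation. -->
<TaskDefinition name="Nightly Pruning" type="System"
                executor="sailpoint.task.Housekeeper">
  <Parent>
    <Reference class="sailpoint.object.TaskDefinition" name="Perform Maintenance"/>
  </Parent>
  <Attributes>
    <Map>
      <entry key="pruneIdentitySnapshots" value="true"/>
      <entry key="pruneTaskResults" value="true"/>
      <entry key="archiveCertifications" value="true"/>
      <entry key="pruneCertifications" value="true"/>
    </Map>
  </Attributes>
</TaskDefinition>
```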
The Identity Refresh task has options that control the amount of historical data retained for identities. For example, you can configure the task to maintain identity histories; this creates a snapshot of each identity that is refreshed when the task runs.
You can control the amount of data retained by this task through the Snapshot Frequency in Days and Index History Granularity settings described above.
There are several strategies for preserving the certification, identity, and snapshot data for historical purposes. Choose a strategy that best fits your operating and business needs.
The following SQL and HQL (querying only) examples can be used to remove large data sets directly at the database layer, reducing the overall time needed to prune old data. Note that these are samples and may need adjusting based on your database type and IdentityIQ version. Any reference to time fields must be expressed in milliseconds since the epoch; you can generate these values by appending three digits for milliseconds to the output of an epoch time converter such as http://www.epochconverter.com/
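As a small worked example of the epoch-millisecond conversion described above, the snippet below computes a cutoff value without a web converter (the cutoff date 2014-01-01 UTC is arbitrary and chosen only for illustration):

```python
# Convert a human-readable cutoff date to epoch milliseconds for use in
# SQL/HQL pruning criteria. The date below is an arbitrary example;
# adjust the date (and time zone) to your own retention cutoff.
from datetime import datetime, timezone

cutoff = datetime(2014, 1, 1, tzinfo=timezone.utc)
epoch_millis = int(cutoff.timestamp() * 1000)
print(epoch_millis)  # prints 1388534400000
```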
The IdentityIQ console can be used to run both SQL and HQL queries. Note that returned time fields are translated into readable dates, but query criteria must still be given in epoch time. For the names of the various columns and properties, refer to the hbm.xml mapping files within the identityiq.jar file.
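For illustration, a console session might look like the following. The `sql` and `hql` console commands shown here are assumptions based on typical IdentityIQ console usage; verify the exact commands with `help` in your version:

```text
> iiq console
iiq> sql select count(*) from spt_identity_snapshot
iiq> hql from IdentitySnapshot where created < 1388534400000
```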
For SQL, you must know which tables are involved (such as spt_identity for Identity). For HQL, all you need to know is the name of the type of object (such as Identity). Here is a list of the table names involved with the largest datasets:
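As an illustrative mapping (verify these names against the hbm.xml files for your version), the object types above typically correspond to these tables:

```text
Identity              -> spt_identity
IdentitySnapshot      -> spt_identity_snapshot
Certification         -> spt_certification
CertificationArchive  -> spt_certification_archive
TaskResult            -> spt_task_result
Request               -> spt_request
```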
SQL Queries (note that timestamps are in epoch format and must be converted for queries):
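Below is a hedged sketch of the kind of query involved. Table and column names should be verified against your schema, and any DELETE should be preceded by a count and a database backup; the cutoff value 1388534400000 is 2014-01-01 00:00:00 UTC in epoch milliseconds and is only an example:

```sql
-- Count identity snapshots older than the cutoff before deleting anything.
SELECT COUNT(*) FROM spt_identity_snapshot WHERE created < 1388534400000;

-- Delete old identity snapshots (irreversible; take a backup first).
DELETE FROM spt_identity_snapshot WHERE created < 1388534400000;

-- Delete old task results older than the cutoff.
DELETE FROM spt_task_result WHERE created < 1388534400000;
```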
BTW, I know the configuration of the "Other Object Expiration" items and others is based on business requirements. But sometimes we can't get a business requirement for those. Do we have any best practice for that case?
Thanks,
Bill
Hi All,
Where can we configure the Prune Requests age? Currently my environment is holding only 7 days of requests.
Identity Requests are pruned by the Perform Identity Request Maintenance task. You can specify how long to retain them in that task.
If you have follow up questions, please ask them in the IdentityIQ Forums. You can put a link back to this article if you want, but more people follow the forums and you are likely to get more answers/participation there than here (plus it muddies the articles to have long threads of Q/A that are not specifically about clarifying doc content).
Hi,
Great article. It was written before sailpoint.object.ProvisioningTransaction was introduced to IdentityIQ. Those objects contain good data but have a really high impact on database size, and possibly on performance. I think they should be added here.
Regards
Martin
Agreed with @martin_dehn, ProvisioningTransaction should be added in this documentation.