Memory management in Intella may seem like a simple task. "All you have to do is move the slider all the way across to the maximum to use all of the memory that Intella will allow, right?" Well, that would help in some cases, but memory management in Intella is a bit more complicated than that. This post explains how memory is used by the many components in Intella and why we can't set optimal settings automatically for every hardware and software configuration.
UPDATE July 2019 - New information regarding setting memory and crawler settings
Note that from version 2.3 (due for release at approximately the end of July), we have added more memory controls to the user interface (UI) of both Intella and Connect. These allow the user to modify all of the memory and crawler settings from within the UI. There is no longer a requirement to edit the Intella.l4j.ini file, which is located in the Intella/Connect installation directory, to change the memory settings.
The text below discusses editing the Intella.l4j.ini file to modify the memory and crawler settings. If you are running version 2.2.x or earlier, this information is still relevant. We have also added information below on how to modify the memory and crawler settings when using version 2.3.x and above.
Memory used by Intella
First we need to understand how memory is used by the different components in Intella and the processes we use Intella for. There are three different processes where memory is automatically assigned by Intella:
- Case Manager
- Intella main process
- Crawlers and exporting processes
Case Manager memory
The Case Manager memory setting controls how much memory is allocated to the Case Manager process. This setting is for the Case Manager only and does not affect the processing of data. It is fixed at 256MB by default and there is usually no need to change it, so we have not provided any controls in the user interface to adjust it. That said, the setting can be changed manually if required. The only reason someone would want to change it is for exporting and importing cases: we have seen a case where we needed to increase the Case Manager memory in order to export a case to an ICF file. For the most part, though, this setting does not need to be changed.
The Case Manager process usually only lives for a few seconds. After you have selected a case and clicked the 'Open' button, the process is terminated.
Main process memory
The Main process is started by the Case Manager when you open a case and controls everything you can see in Intella, except for indexing and exporting. It is usually the process that requires the most memory, which is why we added the memory slider to the Case Editor window. This allows the user to easily adjust this memory setting. Below is the table which shows the default memory allocation made by Intella based on the amount of RAM that is in a system:
Crawlers and exporting processes
The memory setting for the Crawler processes is calculated automatically based on the amount of RAM minus the memory used for the main process, and the number of crawlers that will be used. By default Intella calculates the number of crawlers based on the number of CPU cores in the system. However, this number is capped at 4 as assigning more crawlers without other considerations can adversely affect performance.
When the amount of memory per crawler is set automatically by Intella, it is capped at a maximum of 2GB per crawler. Again, this is a setting that usually does not need any changes, but it can be changed manually if required. The crawlers' job is only to extract and collect information; they do not index the data right away. The indexing takes place later, in the post-processing steps, which are done in the Main process.
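As a rough illustration of the automatic defaults described above, the Python sketch below combines the two caps mentioned in the text (at most 4 crawlers, at most 2GB per crawler). The function name and the even split of the remaining RAM across crawlers are assumptions made for this example; the exact formula Intella uses is not documented here.

```python
def default_crawler_settings(total_ram_gb, main_process_gb, cpu_cores):
    """Illustrative model of the automatic defaults; not Intella's actual code."""
    max_auto_crawlers = 4      # default cap on the crawler count (from the text)
    max_auto_crawler_gb = 2.0  # automatic per-crawler memory cap (from the text)

    # Crawler count follows the CPU core count, up to the default cap.
    crawlers = max(1, min(cpu_cores, max_auto_crawlers))

    # Memory left once the Main process allocation is subtracted.
    remaining_gb = max(0.0, total_ram_gb - main_process_gb)

    # Assumption: the remaining memory is split evenly across the crawlers,
    # then limited by the 2GB automatic cap.
    per_crawler_gb = min(remaining_gb / crawlers, max_auto_crawler_gb)
    return crawlers, per_crawler_gb


# A 12-core machine with 128GB of RAM and 15GB for the Main process.
print(default_crawler_settings(128, 15, 12))  # (4, 2.0)
```

On large machines both caps apply, which is why the defaults stay at 4 crawlers of 2GB each regardless of how much RAM is installed.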
Note: The settings for the crawlers also control these other processes:
- PDF converter used by the Preview tab.
- Load file import (TIFF to PDF conversion).
- OCR import (text extraction).
- Outlook and Notes validation.
When do I optimize for better performance?
Now that we have a bit of an understanding of how memory and crawlers are used in Intella, we can look at some examples of getting the best performance from the hardware resources you have and for the dataset that you are working on.
Note that it is possible through manual memory allocation to assign all of the memory on a system to Intella processes. This can leave the system in an unstable state. We recommend not assigning more than 75% of a system's total memory to Intella processes. The last section of this document provides an example of assigning memory in Intella. In this example the assigned memory stays within around 50% of the total memory of the system.
Because different amounts of memory can be assigned to different processes, you may want to first work out what processes you are performing. For example, there is a difference between indexing and investigating a case. When indexing a data source, quite a bit of the memory needs to be reserved for the crawler processes, especially when a lot of crawlers are being used on high end machines.
During the crawling phase (step 1 of indexing) the Main process actually requires very little memory, but for the post-processing phases (steps 2-9 of indexing), the Main process requires considerably more.
When investigating a case that has already been indexed, almost all available memory should be assigned to the Main process. Note that this is proportionate to the case size; for example, there is no need to assign 128GB of memory to a 40GB case.
As mentioned earlier, memory is automatically assigned to the different processes used by Intella. This is mostly done based on the hardware resources that the system has. We are cautious not to over-assign memory and crawlers, as doing so can actually inhibit performance. That said, the user can manually adjust these memory and crawler settings to better suit their hardware specifications and the data they are indexing.
This leads us to the million-dollar question, "What are the best memory and crawler settings to use?" Well, the answer is not that straightforward, and it really depends on the type of data that is being indexed. For example, if you are indexing a single large PST file, you would probably not see much of a difference in performance if you manually increased the memory allocations and crawlers. In this case the default memory settings and number of crawlers will be enough to provide the best performance, so memory settings should only be changed to troubleshoot indexing errors.
On the other hand, if your dataset contains a lot of loose files or a disk image, then increasing the number of crawlers can provide better performance. In addition, increasing the service memory heaps (the memory available to each crawler) can help to resolve out-of-memory errors or crawler crashes that can occur with specific large items.
In summary, when indexing, increasing the number of crawlers for datasets that contain a large number of loose files can provide better performance. If some of the loose files are large files then increasing the memory assigned to the crawlers can help with out of memory errors and crawler crash errors. In addition, when importing OCR text, assigning more crawlers can increase performance.
When investigating a case, almost all available memory should be assigned to the Main process. Also, when exporting to formats such as PDF or PST, increasing the crawler memory might help with performance.
Optimizing for best performance
The next question is, "How do I change these settings?" Before we get on to that, note that increasing the number of crawlers and the memory for each crawler without there being a need for it may actually hurt performance, as there will be less memory for the Windows file system cache to work with.
Changing the Main process memory
To change the memory for the Main process, the user can adjust the slider in the Case Editor window for that specific case. The slider will only allow you to use up to half the memory in the system. This is a safeguard against crashing the system by assigning too much memory to the Main process and not leaving enough for Windows to operate.
Changing the number of crawlers
The number of crawlers is automatically set based on the number of CPU cores you have on your system. That said, we limit this by default to a maximum of 4 crawlers. The maximum number of crawlers can be increased if you have a high end system with many cores. For example, if you have a system with 12 cores, the maximum number of crawlers can be set to 6 (2 cores per crawler).
For Intella/Connect version 2.3.x and up, this setting can be adjusted in the Case Editor window. Click on the Advanced button to show the advanced settings. The Crawler count option can be changed from Auto to Manual, then you can set the desired number of crawlers for the case.
For Intella/Connect version 2.2.x or lower, this change needs to be made manually. First close Intella, then manually edit the Intella.l4j.ini file located in the install directory. The line to look for and edit is '# -Dintella.crawlersCount=4'. First remove the leading '#' (this symbol comments out the line, which disables it), then change the crawlersCount to 6 and save the file. Note that this only specifies an upper limit; the actual number of crawlers used by Intella will also depend on the evidence data.
Increasing the memory for each crawler
Another memory setting is the amount of memory assigned to each crawler. As mentioned earlier, this is a setting that normally does not need to be changed. However, if you are getting 'out of memory' errors or crawler crash errors during the crawling/indexing phase, then you may need to assign more memory to the crawlers.
For Intella/Connect version 2.3.x and up, this setting can be adjusted in the Case Editor window. Click on the Advanced button to show the advanced settings. The Service memory allocation option can be changed from Auto to Manual, then you can set the desired amount of memory for each crawler using the slider.
For Intella/Connect version 2.2.x or lower, this change needs to be made manually. First close Intella, then manually edit the Intella.l4j.ini file located in the install directory. The line to look for and edit is '# -Dintella.serviceMaxHeap=800M'. First remove the leading '#' then change the serviceMaxHeap to what you would want each crawler to use (e.g. 2g). Once done, save the file. From this point, Intella will use 2GB of memory for each crawler when indexing and exporting data in this case.
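For version 2.2.x or lower, the two manual edits to Intella.l4j.ini described above (crawlersCount and serviceMaxHeap) amount to the same text change: remove the leading '#' and set a new value. The Python sketch below shows that change applied to a text string. The function name set_jvm_option is hypothetical, and the sample text contains only the two lines quoted in this article; a real Intella.l4j.ini will contain other lines that are not shown here. Close Intella and keep a backup copy of the file before changing it.

```python
import re


def set_jvm_option(ini_text, name, value):
    """Uncomment (if needed) and set a '-D<name>=<value>' line in ini text.

    Hypothetical helper for illustration; raises ValueError when the
    option line is not found rather than guessing where to add it.
    """
    # Match the option line whether or not it starts with '#'.
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*-D" + re.escape(name) + r"=[^\r\n]*",
        re.MULTILINE,
    )
    new_line = f"-D{name}={value}"
    new_text, replaced = pattern.subn(lambda m: new_line, ini_text, count=1)
    if replaced == 0:
        raise ValueError(f"option -D{name} not found")
    return new_text


# Sample text holding only the two lines quoted in the article.
sample = "# -Dintella.crawlersCount=4\n# -Dintella.serviceMaxHeap=800M\n"
updated = set_jvm_option(sample, "intella.crawlersCount", "6")
updated = set_jvm_option(updated, "intella.serviceMaxHeap", "2g")
print(updated)
```

The helper works on text rather than on a file path so that the result can be reviewed before it is written back to the install directory.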
Be mindful when allocating memory to the crawlers. You must ensure that there is enough memory for all of the crawlers, the Main process, and the other applications that are running on the computer system. We have had customers report performance issues after modifying this setting. In one example, someone increased the crawler memory to over half of the physical memory. This reduced the number of crawlers to one. In turn, indexing a large dataset with one crawler caused the user substantial performance issues (this effectively makes Intella indexing a single-threaded process).
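One way to picture why an oversized crawler heap collapses the crawler count is to ask how many crawlers of that size fit into the RAM left after the Main process. The Python sketch below is only an illustrative model with a hypothetical function name and made-up machine sizes; it is not Intella's documented logic.

```python
def crawlers_that_fit(total_ram_gb, main_process_gb, per_crawler_gb, requested):
    """Illustrative model: crawlers of a given size that fit in the leftover RAM."""
    if per_crawler_gb <= 0:
        raise ValueError("per_crawler_gb must be positive")
    available_gb = total_ram_gb - main_process_gb
    fit = int(available_gb // per_crawler_gb)
    # Never more than requested, and assume at least one crawler always runs.
    return max(1, min(requested, fit))


# Hypothetical 16GB machine with 4GB assigned to the Main process.
print(crawlers_that_fit(16, 4, 2, 4))  # 4
print(crawlers_that_fit(16, 4, 9, 4))  # 1 (9GB is over half of physical memory)
```

Under this model, any per-crawler value above half of the physical memory can never leave room for a second crawler, which matches the behaviour the customer saw.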
The instructions above for changing the memory and crawlers in the Case Editor window are for Intella Desktop. For Connect, the new memory settings for each case can be seen and edited in the case list of the Admin dashboard. Simply set the setting to Manual, then enter the memory to be used (in MB) and the number of crawlers to be used.
Intella can use memory that isn't directly assigned to it. Intella will work much faster if the Windows file system cache has sufficient free memory available for caching evidence files and case index files. Be careful not to starve the operating system of memory by assigning too much memory to the different Intella processes.
As you can see, the configuration settings for Intella are not straightforward and are dependent on a number of factors such as the hardware you have and the type of data you are processing. It is near impossible to specify a set of settings that will work optimally for all hardware specifications and data types.
That said, we can give an example of a setup for a given scenario. If you have a 12 core machine with 128GB of RAM, and you need to index a 500GB E01 disk image which contains a lot of text-heavy documents, we can suggest a memory configuration that should provide good performance. We know that the data set most likely contains a lot of text, so the following settings would better suit this type of dataset:
- Increase the Main process memory from the default (15GB) to 30GB.
- Increase the number of crawlers from 4 to 6.
- Increase the crawler memory from 2GB to 3GB.
With these settings, the total memory usage by Intella will be 30GB + 6 x 3GB = 48GB. If we add 5-10GB for the file system cache, the total memory usage will be 53-58GB out of a total of 128GB. We still have plenty of free memory for the operating system and other programs on the system.
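The arithmetic above can be checked with a short Python sketch. The function name and the returned field names are hypothetical; the 5-10GB file system cache allowance and the 75% guideline are taken from this article.

```python
def memory_budget(total_ram_gb, main_gb, crawlers, per_crawler_gb, cache_gb=(5, 10)):
    """Total up the memory assigned to Intella and compare it with physical RAM."""
    intella_gb = main_gb + crawlers * per_crawler_gb
    return {
        "intella_gb": intella_gb,
        # Add the low and high ends of the file system cache allowance.
        "total_low_gb": intella_gb + cache_gb[0],
        "total_high_gb": intella_gb + cache_gb[1],
        "intella_share": intella_gb / total_ram_gb,
        # Guideline from the article: at most 75% of RAM for Intella processes.
        "within_75_percent": intella_gb <= 0.75 * total_ram_gb,
    }


budget = memory_budget(total_ram_gb=128, main_gb=30, crawlers=6, per_crawler_gb=3)
print(budget["intella_gb"])                             # 48
print(budget["total_low_gb"], budget["total_high_gb"])  # 53 58
print(budget["within_75_percent"])                      # True
```

Running the same check with your own RAM size and planned settings is a quick way to confirm that a manual configuration stays under the 75% guideline before indexing starts.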
Visit our forum post to see customer questions and answers related to this topic.
Updated Jan 2022