Control Heap Use with the Resource Manager

Resource manager behavior is closely tied to the triggering of GC activities, the use of concurrent GCs in the JVM, and the number of parallel GC threads used for concurrency.

The recommendations provided here for using the manager assume you have a solid understanding of your Java VM's heap management and garbage collection service.

For the members where you want to implement the resource manager functionality:
  1. Configure GemFire for heap LRU management.
  2. Set the JVM GC tuning parameters to handle heap and garbage collection in conjunction with the GemFire manager.
  3. Monitor and tune heap LRU configurations and your GC configurations.
  4. Before going into production, run your system tests with application behavior and data loads that approximate your target systems so you can tune as well as possible for production needs.
  5. In production, keep monitoring and tuning to meet changing needs.

Configure GemFire for Heap LRU Management

The configuration terms used here are cache.xml elements and attributes, but you can also configure these settings through gfsh or through the com.gemstone.gemfire.cache.control.ResourceManager and Region APIs.
  1. When starting up your server, set initial-heap and max-heap to the same value.
  2. Set the resource-manager critical-heap-percentage threshold. This should be as close to 100 as possible while still low enough that the manager's response can prevent the member from hanging or getting an OutOfMemoryError. The threshold is zero (no threshold) by default.
    Note: When you set this threshold, it also enables a query monitoring feature that prevents most out-of-memory exceptions when executing queries or creating indexes. See Monitoring Low Memory When Querying.
  3. Set the resource-manager eviction-heap-percentage threshold to a value lower than the critical threshold. This should be as high as possible while still low enough to prevent your member from reaching the critical threshold. The threshold is zero (no threshold) by default.
  4. Decide which regions will participate in heap eviction and set their eviction-attributes to lru-heap-percentage. See Eviction. The regions you configure for eviction should have enough data activity for the evictions to be useful and should contain data your application can afford to delete or offload to disk.
gfsh Example:
gfsh>start server --name=server1 --initial-heap=30m --max-heap=30m \
--critical-heap-percentage=80 --eviction-heap-percentage=60
cache.xml Example:
   <region refid="REPLICATE_HEAP_LRU" />
   <resource-manager critical-heap-percentage="80" eviction-heap-percentage="60"/>
Note: The resource-manager specification must appear after the region declarations in your cache.xml file.
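Steps 2 and 3 imply a simple consistency rule: both percentages lie between 0 and 100, zero means "no threshold", and the eviction threshold must sit below the critical threshold. The following is a minimal plain-Java sketch that checks this before the values go into cache.xml or gfsh. The class `HeapThresholdCheck` and its `validate` method are hypothetical illustrations, not part of the GemFire API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of the GemFire API) that sanity-checks
 * resource-manager heap thresholds before they are used in cache.xml or gfsh.
 * A value of 0 means "no threshold", matching the GemFire default.
 */
final class HeapThresholdCheck {

    private HeapThresholdCheck() {}

    /** Returns a list of problems; an empty list means the pair looks consistent. */
    static List<String> validate(float criticalPct, float evictionPct) {
        List<String> problems = new ArrayList<>();
        if (criticalPct < 0 || criticalPct > 100) {
            problems.add("critical-heap-percentage must be between 0 and 100");
        }
        if (evictionPct < 0 || evictionPct > 100) {
            problems.add("eviction-heap-percentage must be between 0 and 100");
        }
        // Eviction only helps if it begins before the member becomes critical.
        if (criticalPct > 0 && evictionPct > 0 && evictionPct >= criticalPct) {
            problems.add("eviction-heap-percentage (" + evictionPct
                    + ") must be lower than critical-heap-percentage (" + criticalPct + ")");
        }
        return problems;
    }
}
```

For example, `validate(80f, 60f)` returns an empty list. `validate(80f, 90f)` reports one problem, because eviction would start only after the member is already critical.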

Set the JVM's GC Tuning Parameters

See your JVM documentation for all JVM-specific settings that can be used to improve GC response.

At a minimum, do the following:
  1. Set the initial and maximum heap switches, -Xms and -Xmx, to the same value.
  2. Configure your JVM to use the concurrent mark-sweep (CMS) garbage collector.
  3. If your JVM allows, configure it to initiate concurrent mark-sweep collection when heap use is at least 10 percentage points lower than your resource manager eviction-heap-percentage. The collector must already be working while GemFire is evicting, or the evictions will not result in more free memory. For example, if eviction-heap-percentage is set to 65, set garbage collection to start when heap use is no higher than 55%.
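| JVM         | Concurrent mark-sweep switch flag | CMS initiation (begin at heap % N)   |
|-------------|-----------------------------------|--------------------------------------|
| Sun HotSpot | -XX:+UseConcMarkSweepGC           | -XX:CMSInitiatingOccupancyFraction=N |
| JRockit     | -Xgc:gencon                       | -XXgcTrigger:N                       |
| IBM         | -Xgcpolicy:gencon                 | N/A                                  |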
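For the Sun HotSpot row, the rule in step 3 reduces to simple arithmetic: the CMS start point is the eviction percentage minus at least 10 points. An eviction threshold of 65, for instance, gives a start point of 55. Below is a minimal plain-Java sketch of that calculation. The class `CmsFlags` is hypothetical, and the fixed 10-point gap is taken from the guidance above, not from any JVM default. Note that HotSpot deprecated CMS in JDK 9 and removed it in JDK 14, so these switches apply only to older HotSpot releases.

```java
import java.util.List;

/**
 * Hypothetical helper that derives Sun HotSpot CMS switches from a GemFire
 * eviction-heap-percentage, starting concurrent mark-sweep a fixed number
 * of percentage points below the eviction threshold.
 */
final class CmsFlags {

    /** Minimum gap, in percentage points, between CMS start and eviction. */
    static final int MIN_GAP = 10;

    private CmsFlags() {}

    /** Highest CMS initiating occupancy that still respects the gap. */
    static int initiatingOccupancy(int evictionPct) {
        if (evictionPct <= MIN_GAP || evictionPct > 100) {
            throw new IllegalArgumentException(
                    "eviction-heap-percentage out of range: " + evictionPct);
        }
        return evictionPct - MIN_GAP;
    }

    /** The two HotSpot switches from the table: enable CMS and set its start point. */
    static List<String> hotSpotFlags(int evictionPct) {
        return List.of(
                "-XX:+UseConcMarkSweepGC",
                "-XX:CMSInitiatingOccupancyFraction=" + initiatingOccupancy(evictionPct));
    }
}
```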

For the gfsh start server command, pass these settings with the --J switch, for example --J=-XX:+UseConcMarkSweepGC.
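Each JVM option needs its own --J= prefix, which is easy to get wrong when several GC flags are involved. The following minimal sketch assembles a gfsh start server command string from the heap settings and a list of JVM options. It uses only the gfsh options shown on this page. The class `GfshStartServer` is a hypothetical illustration, and the sketch only builds the string; it does not run gfsh.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper that assembles a gfsh "start server" command line,
 * prefixing each JVM option with --J= so gfsh forwards it to the server JVM.
 */
final class GfshStartServer {

    private GfshStartServer() {}

    static String command(String name, String heap, int criticalPct, int evictionPct,
                          List<String> jvmOptions) {
        List<String> parts = new ArrayList<>();
        parts.add("start server");
        parts.add("--name=" + name);
        // Same value for initial and max heap, as recommended above.
        parts.add("--initial-heap=" + heap);
        parts.add("--max-heap=" + heap);
        parts.add("--critical-heap-percentage=" + criticalPct);
        parts.add("--eviction-heap-percentage=" + evictionPct);
        for (String option : jvmOptions) {
            parts.add("--J=" + option); // e.g. --J=-XX:+UseConcMarkSweepGC
        }
        return String.join(" ", parts);
    }
}
```

For example, `command("server1", "30m", 80, 60, List.of("-XX:+UseConcMarkSweepGC"))` returns `start server --name=server1 --initial-heap=30m --max-heap=30m --critical-heap-percentage=80 --eviction-heap-percentage=60 --J=-XX:+UseConcMarkSweepGC`.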

The following example sets these JVM options for an application. JVM options must appear before the main class name:
$ java -Xms30m -Xmx30m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=60 app.MyApplication
Note: Pivotal recommends that you do not use the -XX:+UseCompressedStrings and -XX:+UseStringCache JVM configuration properties when starting up servers. These JVM options can cause issues with data corruption and compatibility.
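When launching the application programmatically, the same ordering rule applies: the java launcher treats everything after the main class name as arguments to the application, not to the JVM. The following minimal sketch builds the argument list in the correct order, so it could be passed to `java.lang.ProcessBuilder`. The class `JavaLaunchArgs` is hypothetical. The fixed flag set mirrors the HotSpot example above and is an assumption, not a recommendation for every JVM.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper that builds a java launcher argument list. JVM options
 * must come before the main class name; anything after it is passed to the
 * application's main method instead of the JVM.
 */
final class JavaLaunchArgs {

    private JavaLaunchArgs() {}

    static List<String> build(String heap, int cmsStartPct, String mainClass, String... appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xms" + heap); // -Xms/-Xmx take the size directly, e.g. -Xms30m
        cmd.add("-Xmx" + heap);
        cmd.add("-XX:+UseConcMarkSweepGC");
        cmd.add("-XX:CMSInitiatingOccupancyFraction=" + cmsStartPct);
        cmd.add(mainClass);     // everything after this goes to the application
        for (String arg : appArgs) {
            cmd.add(arg);
        }
        return cmd;
    }
}
```

A caller could start the process with `new ProcessBuilder(JavaLaunchArgs.build("30m", 60, "app.MyApplication")).start()`.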

Monitor and Tune Heap LRU Configurations

In tuning the resource manager, your central focus should be keeping the member below the critical threshold. The critical threshold is provided to avoid member hangs and crashes, but because the manager throws exceptions for distributed updates while a member is critical, time spent above the critical threshold hurts the entire distributed system. To stay below critical, tune so that GemFire eviction and the JVM's GC respond adequately when the eviction threshold is reached.
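Tuning is easier to reason about with the three heap zones made explicit: below the eviction threshold, between eviction and critical (the manager evicts from heap-LRU regions), and at or above critical (distributed updates fail with exceptions). The following minimal sketch classifies a heap-use percentage into those zones. The enum `HeapZone` is a hypothetical model for illustration, not the GemFire implementation, and it treats a threshold of 0 as "not set".

```java
/**
 * Hypothetical model (not the GemFire implementation) of the heap zones
 * described above: normal, evicting, and critical.
 */
enum HeapZone {
    NORMAL, EVICTION, CRITICAL;

    /** Classifies a heap-use percentage; a threshold of 0 means "not set". */
    static HeapZone of(double heapUsedPct, double evictionPct, double criticalPct) {
        if (criticalPct > 0 && heapUsedPct >= criticalPct) {
            return CRITICAL; // distributed updates are rejected with exceptions here
        }
        if (evictionPct > 0 && heapUsedPct >= evictionPct) {
            return EVICTION; // manager evicts from regions configured for heap LRU
        }
        return NORMAL;
    }
}
```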

Use the statistics provided by your JVM to make sure your memory and GC settings are sufficient for your needs.

The GemFire ResourceManagerStats provide information about memory use, the manager thresholds, and eviction activities.

If you are spiking above the critical threshold on a regular basis, try lowering the eviction threshold. If you never go near critical, you might raise the eviction threshold to gain more usable memory without the overhead of unneeded evictions or GC cycles.
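The advice above can be expressed as a rough rule over a series of heap-use samples. The following sketch is a hypothetical heuristic, not a GemFire feature. The 1% "regular spike" rate and the 5-point "near critical" margin are illustrative assumptions that you would adjust to your own system. The sample values would come from your JVM or GemFire statistics; how you extract them is left open.

```java
/**
 * Hypothetical tuning heuristic for the advice above. The 1% spike rate and
 * the 5-point "near critical" margin are illustrative assumptions, not
 * GemFire defaults.
 */
final class EvictionTuningAdvice {

    static final double SPIKE_RATE = 0.01;
    static final double NEAR_MARGIN = 5.0;

    private EvictionTuningAdvice() {}

    /** heapUsedPct holds heap-use samples as percentages of the heap. */
    static String advise(double[] heapUsedPct, double criticalPct) {
        if (heapUsedPct.length == 0) {
            return "no data";
        }
        int atOrAboveCritical = 0;
        double peak = 0;
        for (double sample : heapUsedPct) {
            if (sample >= criticalPct) {
                atOrAboveCritical++;
            }
            peak = Math.max(peak, sample);
        }
        if ((double) atOrAboveCritical / heapUsedPct.length > SPIKE_RATE) {
            return "lower eviction threshold";            // regularly reaching critical
        }
        if (peak < criticalPct - NEAR_MARGIN) {
            return "consider raising eviction threshold"; // never near critical
        }
        return "keep current settings";
    }
}
```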

The settings that will work well for your system depend on a number of factors, including these:
  • The size of the data objects you store in the cache. Very large data objects can be evicted and garbage collected relatively quickly. The same amount of space in use by many small objects takes more processing effort to clear and might require lower thresholds to allow eviction and GC activities to keep up.
  • Application behavior. Applications that quickly put a lot of data into the cache can more easily overrun the eviction and GC capabilities. Eviction and GC can keep up more easily with applications that add data more slowly, which might allow you to set your thresholds higher than in a more volatile system.
  • Your choice of JVM. Each JVM has its own GC behavior, which affects how efficiently the collector can operate, how quickly it kicks in when needed, and other factors.

In this sample statistics chart in VSD, the manager's evictions and the JVM's GC efforts are good enough to keep heap use very close to the eviction threshold. The eviction threshold could be raised closer to the critical threshold, allowing the member to keep more data in tenured memory without the risk of overwhelming the JVM. This chart also shows the blocks of time when the manager was running cache evictions.

In this next chart, the manager's evictions appear to start at the right time, but the concurrent mark-sweep GC is not starting soon enough to keep memory use in check. The collector might not be configured to start early enough; it should start just before the eviction threshold is reached. Alternatively, there might be some other issue with the garbage collection service.
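The symptom in the second chart can be checked directly from sampled statistics: did the collector become active no later than the first sample in which heap use reached the eviction threshold? The following minimal sketch performs that check. It assumes two arrays sampled at the same instants, one holding heap-use percentages and one indicating whether a concurrent GC cycle was running. That sampling format, and the class `GcStartCheck`, are hypothetical.

```java
/**
 * Hypothetical check for the second chart's symptom: did the concurrent
 * collector start no later than the first sample in which heap use reached
 * the eviction threshold? Both arrays are assumed to be sampled at the same
 * instants.
 */
final class GcStartCheck {

    private GcStartCheck() {}

    /**
     * Returns true if GC was active at or before the first sample whose heap use
     * reached evictionPct, or if heap use never reached evictionPct.
     */
    static boolean gcStartedInTime(double[] heapUsedPct, boolean[] gcActive, double evictionPct) {
        if (heapUsedPct.length != gcActive.length) {
            throw new IllegalArgumentException("sample arrays must have the same length");
        }
        for (int i = 0; i < heapUsedPct.length; i++) {
            if (gcActive[i]) {
                return true;  // collector running no later than the threshold crossing
            }
            if (heapUsedPct[i] >= evictionPct) {
                return false; // threshold reached while the collector was still idle
            }
        }
        return true;          // eviction threshold never reached
    }
}
```

A `false` result points toward a CMS initiation setting that is too high, matching the diagnosis for the second chart.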