LATEST VERSION: 9.0.4 - CHANGELOG
Pivotal GemFire® v9.0

Implementing Durable Client/Server Messaging

Use durable messaging for subscriptions that you need maintained for your clients even when your clients are down or disconnected. You can configure any of your event subscriptions as durable. Events for durable queries and subscriptions are saved in queue when the client is disconnected and played back when the client reconnects. Other queries and subscriptions are removed from the queue.

Use durable messaging for client/server installations that use event subscriptions.

These are the high-level tasks described in this topic:

  1. Configure your client as durable
  2. Decide which subscriptions should be durable and configure accordingly
  3. Program your client to manage durable messaging for disconnect, reconnect, and event handling

Configure the Client as Durable

Use one of the following methods:

  • gemfire.properties file:

    durable-client-id=31 
    durable-client-timeout=200 
    
  • Java:

    Properties props = new Properties(); 
    props.setProperty("durable-client-id", "31"); 
    props.setProperty("durable-client-timeout", "" + 200); 
    CacheFactory cf = new CacheFactory(props);
    

The durable-client-id indicates that the client is durable and gives the server an identifier to correlate the client to its durable messages. For a non-durable client, this id is an empty string. The ID can be any number that is unique among the clients attached to servers in the same distributed system.

The durable-client-timeout tells the server how long to wait for client reconnect. When this timeout is reached, the server stops storing to the client’s message queue and discards any stored messages. The default is 300 seconds. This is a tuning parameter. If you change it, take into account the normal activity of your application, the average size of your messages, and the level of risk you can handle, both in lost messages and in the servers’ capacity to store enqueued messages. Assuming that no messages are being removed from the queue, how long can the server run before the queue reaches the maximum capacity? How many durable clients can the server handle? To assist with tuning, use the Geode message queue statistics for durable clients through the disconnect and reconnect cycles.

Configure Durable Subscriptions and Continuous Queries

The register interest and query creation methods all have an optional boolean parameter for indicating durability. By default all are non-durable.

// Durable registration
// Define keySpecification, interestResultPolicy, durability 
exampleRegion.registerInterest(keySpecification, interestResultPolicySpecification, true);

// Durable CQ
// Define cqName, queryString, cqAttributes, durability
CqQuery myCq = queryService.newCq(cqName, queryString, cqAttributes, true);

Save only critical messages while the client is disconnected by only indicating durability for critical subscriptions and CQs. When the client is connected to its servers, it receives messages for all keys and queries reqistered. When the client is disconnected, non-durable interest registrations and CQs are discontinued but all messages already in the queue for them remain there.

Note: For a single durable client ID, you must maintain the same durability of your registrations and queries between client runs.

Program the Client to Manage Durable Messaging

Program your durable client to be durable-messaging aware when it disconnects, reconnects, and handles events from the server.

  1. Disconnect with a request to keep your queues active by using Pool.close or ClientCache.close with the boolean keepalive parameter.

    clientCache.close(true);
    

    To be retained during client down time, durable continuous queries (CQs) must be executing at the time of disconnect.

  2. Program your durable client’s reconnection to:

    1. If desired, detect whether the previously registered subscription queue is available upon durable client reconnection and the count of pending events in the queue. Based on the results, you can then decide whether to receive the remaining events or close the cache if the number is too large.

      For example, for a client with only the default pool created:

      int pendingEvents = cache.getDefaultPool().getPendingEventCount();
      
      if (pendingEvents == -2) { // client connected for the first time  … // continue
      } 
      else if (pendingEvents == -1) { // client reconnected but after the timeout period  
      … // handle possible data loss
      } 
      else { // pendingEvents >= 0  
      … // decide to invoke readyForEvents() or ClientCache::close(false)/pool.destroy()
      }
      

      For a client with multiple pools:

      int pendingEvents = 0;
      
      int pendingEvents1 = PoolManager.find(“pool1”).getPendingEventCount();
      
      pendingEvents += (pendingEvents1 > 0) ? pendingEvents1 : 0;
      
      int pendingEvents2 = PoolManager.find(“pool2”).getPendingEventCount();
      
      pendingEvents += (pendingEvents2 > 0) ? pendingEvents2 : 0;
      
      // process individual pool counts separately.
      

      The getPendingEventCount API can return the following possible values:

      • A value representing a count of events pending at the server. Note that this count is an approximate value based on the time the durable client pool connected or reconnected to the server. Any number of invocations will return the same value.
      • A zero value if there are no events pending at server for this client pool
      • A negative value indicates that no queue is available at the server for the client pool.
        • -1 indicates that the client pool has reconnected to the server after its durable-client-timeout period has elapsed. The pool’s subscription queue has been removed possibly causing data loss.
        • A value of -2 indicates that this client pool has connected to server for the first time.
    2. Connect, initialize the client cache, regions, and any cache listeners, and create and execute any durable continuous queries.

    3. Run all interest registration calls.

      Note: Registering interest with InterestResultPolicy.KEYS_VALUES initializes the client cache with the current values of specified keys. If concurrency checking is enabled for the region, any earlier (older) region events that are replayed to the client are ignored and are not sent to configured listeners. If your client must process all replayed events for a region, register with InterestResultPolicy.KEYS or InterestResultPolicy.NONE when reconnecting. Or, disable concurrency checking for the region in the client cache. See Consistency for Region Updates.

    4. Call ClientCache.readyForEvents so the server will replay stored events. If the ready message is sent earlier, the client may lose events.

    ClientCache clientCache = ClientCacheFactory.create(); 
    // Here, create regions, listeners that are not defined in the cache.xml . . .
    // Here, run all register interest calls before doing anything else
    clientCache.readyForEvents(); 
    
  3. When you program your durable client CacheListener:

    1. Implement the callback methods to behave properly when stored events are replayed. The durable client’s CacheListener must be able to handle having events played after the fact. Generally listeners receive events very close to when they happen, but the durable client may receive events that occurred minutes before and are not relevant to current cache state.
    2. Consider whether to use the CacheListener callback method, afterRegionLive, which is provided specifically for the end of durable event replay. You can use it to perform application-specific operations before resuming normal event handling. If you do not wish to use this callback, and your listener is an instance of CacheListener (instead of a CacheListenerAdapter) implement afterRegionLive as an empty method.

Initial Operation

The initial startup of a durable client is similar to the startup of any other client, except that it specifically calls the ClientCache.readyForEvents method when all regions and listeners on the client are ready to process messages from the server.

Disconnection

While the client and servers are disconnected, their operation varies depending on the circumstances.

  • Normal disconnect. When a client closes its connection, the servers stop sending messages to the client and release its connection. If the client requests it, the servers maintain the queues and durable interest list information until the client reconnects or times out. The non-durable interest lists are discarded. The servers continue to queue up incoming messages for entries on the durable interest list. All messages that were in the queue when the client disconnected remain in the queue. If the client requests not to have its subscriptions maintained, or if there are no durable subscriptions, the servers unregister the client and do the same cleanup as for a non-durable client.
  • Abnormal disconnect. If the client crashes or loses its connections to all servers, the servers automatically maintain its message queue and durable subscriptions until it reconnects or times out.
  • Client disconnected but operational. If the client operates while it is disconnected, it gets what data it can from the local client cache. Since updates are not allowed, the data can become stale. An UnconnectedException occurs if an update is attempted.
  • Client stays disconnected past timeout period. The servers track how long to keep a durable subscription queue alive based on the durable-client-timeout setting. If the client remains disconnected longer than the timeout, the servers unregister the client and do the same cleanup that is performed for a non-durable client. The servers also log an alert. When a timed-out client reconnects, the servers treat it as a new client making its initial connection.

Reconnection

During initialization, the client cache is not blocked from doing operations, so you might be receiving old stored events from the server at the same time that your client cache is being updated by much more current events. These are the things that can act on the cache concurrently:

  • Results returned by the server in response to the client’s interest registrations.
  • Client cache operations by the application.
  • Callbacks triggered by replaying old events from the queue

Geode handles the conflicts between the application and interest registrations so they do not create cache update conflicts. But you must program your event handlers so they don’t conflict with current operations. This is true for all event handlers, but it is especially important for those used in durable clients. Your handlers may receive events well after the fact and you must ensure your programming takes that into account.

This figure shows the three concurrent procedures during the initialization process. The application begins operations immediately on the client (step 1), while the client’s cache ready message (also step 1) triggers a series of queue operations on the servers (starting with step 2 on the primary server). At the same time, the client registers interest (step 2 on the client) and receives a response from the server. Message B2 applies to an entry in Region A, so the cache listener handles B2’s event. Because B2 comes before the marker, the client does not apply the update to the cache.

Durable client reconnection.

Durable Event Replay

When a durable client reconnects before the timeout period, the servers replay the events that were stored while the client was gone and then resume normal event messaging to the client. To avoid overwriting current entries with old data, the stored events are not applied to the client cache. Stored events are distinguished from new normal events by a marker that is sent to the client once all old events are replayed.

  1. All servers with a queue for this client place a marker in their queue when the client reconnects.
  2. The primary server sends the queued messages to the client up to the marker.
  3. The client receives the messages but does not apply the usual automatic updates to its cache. If cache listeners are installed, they handle the events.
  4. The client receives the marker message indicating that all past events have been played back.
  5. The server sends the current list of live regions.
  6. For every CacheListener in each live region on the client, the marker event triggers the afterRegionLive callback. After the callback, the client begins normal processing of events from the server and applies the updates to its cache.

Even when a new client starts up for the first time, the client cache ready markers are inserted in the queues. If messages start coming into the new queues before the servers insert the marker, those messages are considered as having happened while the client was disconnected, and their events are replayed the same as in the reconnect case.

Application Operations During Interest Registration

Application operations take precedence over interest registration responses. The client can perform operations while it is receiving its interest registration responses. When adding register interest responses to the client cache, the following rules are applied:

  • If the entry already exists in the cache with a valid value, it is not updated.
  • If the entry is invalid, and the register interest response is valid, the valid value is put into the cache.
  • If an entry is marked destroyed, it is not updated. Destroyed entries are removed from the system after the register interest response is completed.
  • If the interest response does not contain any results, because all of those keys are absent from the server’s cache, the client’s cache can start out empty. If the queue contains old messages related to those keys, the events are still replayed in the client’s cache.