Pivotal GemFire® v9.1

Rolling Upgrade

A rolling upgrade allows you to keep your existing distributed system running while individual system members are upgraded.

A rolling upgrade eliminates system downtime. You upgrade one member at a time, and each upgraded member can communicate with other members that are still running the earlier version of GemFire.

Supported Versions for Rolling Upgrade

A rolling upgrade can bring servers or peers running GemFire 8.0, 8.1, or 8.2 up to the most recent 8.2 version.

A rolling upgrade can also bring servers or peers running an earlier version of GemFire 9 up to the most recent version of 9.

Rolling upgrades apply to the peer members or cache servers within a distributed system. Rolling upgrades may also be applied within each site of a multi-site (WAN) deployment. See Version Compatibilities for more details on how different versions of GemFire can interoperate.

A rolling upgrade is possible only for systems in which all partitioned regions have full redundancy. Check the redundancy state of all your regions before you begin the rolling upgrade and before stopping any members. See Checking Redundancy in Partitioned Regions for details. If a rolling upgrade is not possible for your system, follow the procedure in Upgrade with Cluster Downtime.
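
For example, a minimal check, assuming a partitioned region named /example_region (a placeholder), is to look at the numBucketsWithoutRedundancy entry in the partition category of show metrics output; a value of 0 indicates that the region currently has full redundancy:

    gfsh>show metrics --region=/example_region --categories=partition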

Guidelines for Planning a Rolling Upgrade

  • Schedule your upgrade during a period of low user activity for your system and network.
  • Important: After all locators have been upgraded, do not start or restart any processes that are running the older version of the software. The older process either will not be allowed to join the distributed system or, if allowed to join, can potentially cause a deadlock.
  • When you perform a rolling upgrade, your online cluster will have a mix of members running different versions of GemFire. During this period, do not execute region operations such as region creation or region destruction. Also, do not rebalance regions unless startup-recovery-delay is set to -1.
  • Do not modify region attributes or data either via gfsh or cache.xml configuration during the upgrade process.
  • Region rebalancing affects the restart process. If your partitioned region is configured with startup-recovery-delay=-1, you must perform a rebalance on the region after you restart each member. If rebalancing occurs automatically, as it will when startup-recovery-delay is set to a value other than -1, make sure that the rebalance completes between server restarts. If startup-recovery-delay is set to a high value, you may need to wait extra time for the region to recover redundancy, because the rebalance must complete before new servers are restarted. The partitioned region attribute startup-recovery-delay is described in Configure Member Join Redundancy Recovery for a Partitioned Region. (See the example following this list.)
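
To confirm how startup-recovery-delay is configured before you plan server restarts, one option is to inspect the region's attributes with describe region (the region name below is a placeholder; the output lists non-default attribute settings):

    gfsh>describe region --name=/example_region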

Before Doing a Rolling Upgrade

  • Verify that all members that you wish to upgrade belong to the same distributed system cluster. The following gfsh command outputs a list of cluster members:

    gfsh>list members
    
  • Make a backup copy of your persistent data stores prior to the upgrade. The discussion at Creating Backups for System Recovery and Operational Management explains the process, and the backup disk-store command reference page describes how to make the backup with the gfsh backup disk-store command.
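
    For example, a minimal invocation, assuming a writable backup directory of /backups/pre_upgrade (a placeholder) on each member's host:

    gfsh>backup disk-store --dir=/backups/pre_upgrade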

Rolling Upgrade Procedure

Here is a step-by-step procedure for performing a rolling upgrade.

Upgrade Locators One at a Time

  1. On the locator you wish to upgrade, install the new version of the software (alongside the older version of the software).

    See Windows/Unix/Linux—Install Pivotal GemFire from a ZIP or tar.gz File for example installation procedures.

  2. Open two terminal consoles on the machine that hosts the locator you are upgrading. In the first console, start a gfsh prompt (from GemFire’s older installation) and connect to the currently running locator. For example:

    gfsh>connect --locator=locator_hostname_or_ip_address[port]
    
  3. In the first console, export the locator’s configuration files to a backup directory. For example:

    gfsh>export config --member=locator_name --dir=locator_config_dir
    
  4. In the second console, modify the GEMFIRE environment variable to point to the new installation of GemFire. Make sure your PATH variable points to the new installation.
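
    For example, on Linux with a Bourne-style shell (the installation path below is a placeholder for your actual new GemFire installation directory):

    export GEMFIRE=/opt/pivotal/gemfire/new_install
    export PATH=$GEMFIRE/bin:$PATH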

  5. In the first console, stop the locator that you are upgrading. For example:

    gfsh>stop locator --name=locator_name
    
  6. In the second console, start gfsh from the new GemFire installation. Verify that you are running the newer version with

    gfsh>version
    
  7. In the second console, restart your locator with the configuration files you exported in step 3. For example:

    gfsh>start locator --name=locator_name --dir=locator_config_dir
    
  8. Confirm that the locator has started up and joined the cluster properly. For example, look in the locator log for a message similar to the following:

    [info 2014/05/05 10:03:29.206 PDT frodo <vm_1_thr_1_frodo> tid=0x1a]
    DistributionManager frodo(locator1:21869:locator)<v16>:28242 started on frodo[15001].
    There were 2 other DMs. others: [frodo(server2:21617)<v4>:14973(version:GFE 7.1),
    frodo(server1:21069)<v1>:60929(version:GFE 7.1)] (locator)
    
  9. After upgrading the first locator, connect to this locator to ensure it becomes the new JMX Manager. For example:

    gfsh>connect --locator=locator_hostname_or_ip_address[port]
    

Repeat this procedure for all locators in the cluster.

Upgrade Servers One at a Time

When upgrading servers, do not start or restart any processes running the older version of Pivotal GemFire. The older process either will not be allowed to join the distributed system or, if allowed to join, can potentially cause a deadlock. Processes that are rejected produce an error message similar to the following:

    Rejecting the attempt of a member using an older version of the product
    to join the distributed system

  1. On the server you wish to upgrade, install the new version of the software (alongside the older version of the software).
  2. Open two terminal consoles on the server that you are upgrading. In the first console, start a gfsh prompt and connect to one of the (already upgraded) locators.

    gfsh>connect --locator=locator_hostname_or_ip_address[port]
    
  3. Export the server’s configuration files to a backup directory.

    gfsh>export config --member=server_name --dir=server_config_dir
    
  4. If desired, create a backup snapshot of the server’s in-memory region data.

    gfsh>export data --member=server_name --region=region_name --file=my_region_snapshot.gfd
    
  5. In the second console, modify the GEMFIRE environment variable to point to the new installation of GemFire. Make sure that your PATH points to the new installation.

  6. In the first console, stop the server that you are upgrading.

    gfsh>stop server --name=server_name
    
  7. In the second console, start gfsh from the new GemFire installation and restart your server. For example:

    gfsh>start server --name=server_name --dir=server_config_dir
    

    Because you provide the directory of exported configuration files when you start the upgraded server, the restarted server uses the same configuration as it did when running the previous version.

  8. Confirm that the server has started up, joined the cluster properly and is communicating with the other members. For example, look in the server logs for a message similar to the following:

    [info 2017/06/19 13:41:27.095 PDT bridgegemfire1_trout_18148 
    <vm_0_thr_0_bridge1_trout_18148> tid=0x17]
    Starting DistributionManager trout
    (bridgegemfire1_trout_18148:18148)<ec><v2>:10004. (took 683 ms)
    
    [info 2017/06/19 13:41:27.098 PDT bridgegemfire1_trout_18148 
    <vm_0_thr_0_bridge1_trout_18148> tid=0x17] 
    Initial (distribution manager)
    view = View[trout(locatorgemfire2_trout_18319:18319)<ec><v0>:10000|2]
    members: 
    [trout(locatorgemfire2_trout_18319:18319)<ec><v0>:10000{lead}, 
    trout(locatorgemfire1_trout_18305:18305)<ec><v1>:10001,
    trout(bridgegemfire4_trout_18162:18162)<ec><v2>:10003,
    trout(bridgegemfire2_trout_18152:18152)<ec><v2>:10002,
    trout(bridgegemfire1_trout_18148:18148)<ec><v2>:10004,
    trout(bridgegemfire3_trout_18158:18158)<ec><v2>:10005]
    
  9. Check the server log for any severe error messages. You should debug these issues before proceeding with the next server upgrade.
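
    For example, assuming the default gfsh layout in which a server named server_name writes its log to server_name/server_name.log (adjust the path to your environment), a quick scan for problems might be:

    grep -iE "severe|error" server_name/server_name.log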

  10. If you restarted a member with partitioned regions, verify that the member is providing redundancy buckets after the upgrade. See Checking Redundancy in Partitioned Regions for instructions. Note that the number of buckets without redundancy will change as the server recovers, so you need to wait until this statistic either reaches zero or stops changing before proceeding with the upgrade. If you have startup-recovery-delay=-1 configured for your partitioned region, you will need to perform a rebalance after you start up each member. Make sure that the manually started rebalance completes before starting up a new member.
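
    For example, assuming a partitioned region named /example_region (a placeholder), a manual rebalance limited to that region might look like this:

    gfsh>rebalance --include-region=/example_region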

Repeat this procedure for all servers in the cluster.

After Upgrading the Servers

  • As desired, upgrade GemFire clients. You can only do this after you have completed the upgrade on all locator and server members in the cluster.

Checking Member Versions in a Mixed Cluster

During a rolling upgrade, you can check the current GemFire version of all members in the cluster by looking at the server or locator logs.

When an upgraded member reconnects to the distributed system, it logs all the members it can see as well as the GemFire version of those members. For example, an upgraded locator will now detect GemFire members running the older version of GemFire (in this case, the version being upgraded, GFE 8.0.0):

    [info 2013/06/03 10:03:29.206 PDT frodo <vm_1_thr_1_frodo> tid=0x1a]
    DistributionManager frodo(locator1:21869:locator)<v16>:28242 started on frodo[15001].
    There were 2 other DMs. others: [frodo(server2:21617)<v4>:14973(version:GFE 8.0.0),
    frodo(server1:21069)<v1>:60929(version:GFE 8.0.0)] (locator)

After some members have been upgraded, non-upgraded members will log the following message when they receive a new membership view:

    Membership: received new view [frodo(locator1:20786)<v0>:32240|4]
    [frodo(locator1:20786)<v0>:32240/51878, frodo(server1:21069)<v1>:60929/46949,
    frodo(server2:21617)<v4>(version:UNKNOWN[ordinal=23]):14973/33919]

Non-upgraded members identify members that have been upgraded to the next version with version:UNKNOWN.