Pivotal GemFire® v9.0

Apache Lucene® Integration

Apache Lucene® is a widely-used Java full-text search engine. This section describes how the system integrates with Apache Lucene. We assume that the reader is familiar with Apache Lucene’s indexing and search functionalities.

The Apache Lucene integration:

  • enables users to create Lucene indexes on data stored in Geode
  • provides high availability of indexes using Geode’s HA capabilities to store the indexes in memory
  • optionally stores indexes on disk
  • updates the indexes asynchronously to minimize the impact on write latency
  • provides scalability by partitioning index data
  • colocates indexes with data
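One practical consequence of asynchronous index updates is that a search issued immediately after a write may not yet see the new entry. The following is a toy model of that behavior, not the GemFire implementation or API: the class AsyncIndexSketch, its flush() helper, and the whitespace tokenizer are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy model only: NOT the GemFire implementation or API.
class AsyncIndexSketch {
    private final Map<String, String> region = new ConcurrentHashMap<>();
    // term -> keys of the entries whose value contains that term
    private final Map<String, Set<String>> index = new ConcurrentHashMap<>();
    // A single daemon worker thread applies index updates in submission order.
    private final ExecutorService indexer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "toy-indexer");
        t.setDaemon(true);
        return t;
    });

    // The write returns once the data is stored; the index update is only queued.
    void put(String key, String value) {
        region.put(key, value);
        indexer.execute(() -> indexEntry(key, value));
    }

    private void indexEntry(String key, String value) {
        for (String term : value.toLowerCase().split("\\s+")) {
            index.computeIfAbsent(term, t -> ConcurrentHashMap.newKeySet()).add(key);
        }
    }

    // Hypothetical helper: blocks until every queued index update has been applied.
    void flush() {
        try {
            indexer.submit(() -> { }).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    Set<String> search(String term) {
        return index.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        AsyncIndexSketch sketch = new AsyncIndexSketch();
        sketch.put("k1", "12 Main Street");
        sketch.flush(); // without this, the search could run before indexing finishes
        System.out.println(sketch.search("main"));
    }
}
```

In this sketch put() never waits for the index, which is why write latency stays low, and a caller that needs read-your-writes behavior has to wait for the queue to drain before searching.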

For more details, see the Javadocs for the classes and interfaces that implement Apache Lucene indexes and searches, including LuceneService, LuceneQueryFactory, LuceneQuery, and LuceneResultStruct.

Using the Apache Lucene Integration

You can create Apache Lucene indexes through a Java API, through the gfsh command-line utility, or by means of the cache.xml configuration file.

To use the Apache Lucene integration, you need two pieces of information:

  1. The name of the region to be indexed or searched
  2. The names of the fields you wish to index

Key Points

  • Only top-level fields of objects stored in the region can be indexed.
  • Apache Lucene indexes are supported only on partitioned regions.
  • A single index supports a single region. Indexes do not support multiple regions.
  • Heterogeneous objects in a single region are supported.
  • Join queries between regions are not supported.
  • Nested objects are not supported.
  • The index needs to be created before the region is created.
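To make the "top-level fields" and "heterogeneous objects" points concrete, here is a small illustration in plain Java. It is not GemFire's serializer. The classes Customer, Order, and Address and the helper indexableFields are hypothetical, and the lookup by reflection is only an assumption about how a field-name-based index might treat each object.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: NOT GemFire's serializer. All class names are hypothetical.
class TopLevelFieldsSketch {
    static class Address {
        String street = "12 Main Street";
    }

    static class Customer {
        String name = "Ann";             // top-level field
        Address address = new Address(); // nested object
    }

    static class Order {
        String customer = "Ann";         // top-level field
        String tags = "rush gift";       // top-level field
    }

    // Collects the named fields that are declared directly on the object's class.
    static Map<String, Object> indexableFields(Object value, String... fieldNames) {
        Map<String, Object> found = new TreeMap<>();
        for (String name : fieldNames) {
            try {
                Field field = value.getClass().getDeclaredField(name);
                field.setAccessible(true);
                found.put(name, field.get(value));
            } catch (NoSuchFieldException e) {
                // Not a top-level field of this class (for example "address.street").
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String[] indexed = {"name", "customer", "address.street"};
        System.out.println(indexableFields(new Customer(), indexed));
        System.out.println(indexableFields(new Order(), indexed));
    }
}
```

Under these assumptions, a Customer contributes only its name field and an Order only its customer field, while the dotted name address.street matches nothing because it refers to a field of a nested object.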

Java API Example

// Get LuceneService
LuceneService luceneService = LuceneServiceProvider.get(cache);

// Create Index on fields with default analyzer:
luceneService.createIndex(indexName, regionName, "field1", "field2", "field3");

// Create the region only after the index has been defined
Region region = cache.createRegionFactory(RegionShortcut.PARTITION).create(regionName);

Search Example

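// Search the index for "Main Street", using "address" as the default field
LuceneQuery<String, Person> query = luceneService.createLuceneQueryFactory()
  .create(indexName, regionName, "Main Street", "address");

// Return only the matching values (findValues() can throw LuceneQueryException)
Collection<Person> results = query.findValues();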

Gfsh API

The gfsh command-line utility supports four Apache Lucene actions:

create lucene index
Create a Lucene index that can be used to execute queries.
describe lucene index
Display the description of Lucene indexes created for all members.
list lucene indexes [--with-stats]
Display the list of Lucene indexes created for all members. The optional --with-stats qualifier shows activity on the indexes.
search lucene
Search a Lucene index.

Gfsh command-line examples:

// List Index
gfsh>list lucene indexes [--with-stats]

// Create Index
gfsh>create lucene index --name=indexName --region=/orders --field=customer,tags

// Create Index, specifying a custom analyzer for the second field
// Note: "null" in the first analyzer position means "use the default analyzer for the first field"
gfsh>create lucene index --name=indexName --region=/orders --field=customer,tags --analyzer=null,

// Execute Lucene query
gfsh>search lucene --name=indexName --region=/orders --queryStrings="John*" --defaultField=field1 --limit=100

XML Configuration


    <region name="region" refid="PARTITION">
        <lucene:index name="index">
          <lucene:field name="a" analyzer="org.apache.lucene.analysis.core.KeywordAnalyzer"/>
          <lucene:field name="b" analyzer="org.apache.lucene.analysis.core.SimpleAnalyzer"/>
          <lucene:field name="c" analyzer="org.apache.lucene.analysis.standard.ClassicAnalyzer"/>