How to Index String Fields for Efficient Sorting and Faceting

To sort or facet on a string field, you must index the field as the Atlas Search token type. You can then perform the following actions:

Use the sort option in your query to sort the results by the indexed field. To learn more, see Sort Atlas Search Results.
Use the facet collector in your query to group the results by the indexed field. To learn more, see How to Use Facets with Atlas Search.
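As a rough sketch of the two actions above, the following Python helpers build aggregation pipelines as plain dictionaries. The helper names, the index name `default`, and the use of the exists operator as the query are assumptions made for illustration; the field you pass in must already be indexed as the token type.

```python
def sort_pipeline(field, index="default", limit=5):
    """Build a $search pipeline that sorts results ascending by `field`."""
    return [
        {
            "$search": {
                "index": index,                 # assumed index name
                "exists": {"path": field},      # match documents that have the field
                "sort": {field: 1},             # 1 = ascending, -1 = descending
            }
        },
        {"$limit": limit},
        {"$project": {"_id": 0, field: 1}},
    ]


def facet_pipeline(field, index="default", num_buckets=10):
    """Build a $searchMeta pipeline that groups results by `field`."""
    return [
        {
            "$searchMeta": {
                "index": index,
                "facet": {
                    "operator": {"exists": {"path": field}},
                    "facets": {
                        # the facet name "<field>Facet" is a hypothetical label
                        f"{field}Facet": {
                            "type": "string",
                            "path": field,
                            "numBuckets": num_buckets,
                        }
                    },
                },
            }
        }
    ]
```

With a driver such as PyMongo, you would pass either pipeline to `collection.aggregate(...)` on the collection that owns the index.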

Additionally, you must index string fields as the token type to use the equals, in, and range operators on them. To learn more, see the documentation for each operator.
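The following sketch shows what `$search` stages for these three operators might look like when built in Python. The function names and the index name `default` are assumptions for illustration, and the paths you pass in must be indexed as the token type.

```python
def equals_stage(path, value, index="default"):
    """$search stage that matches documents where `path` equals `value`."""
    return {"$search": {"index": index, "equals": {"path": path, "value": value}}}


def in_stage(path, values, index="default"):
    """$search stage that matches documents where `path` is any of `values`."""
    return {"$search": {"index": index, "in": {"path": path, "value": list(values)}}}


def range_stage(path, gte, lt, index="default"):
    """$search stage that matches documents where gte <= `path` < lt."""
    return {"$search": {"index": index, "range": {"path": path, "gte": gte, "lt": lt}}}
```

Each function returns a single stage, so you can place it at the start of a pipeline and append stages such as `$limit` or `$project` after it.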

Note

You can also use the token type to index string fields for pre-filtering your data for $vectorSearch queries. To learn more, see Atlas Vector Search Overview.

`token` Type Limitations

When you index a field as the token type, you must also index that field as the string type to query its text value using operators such as text and phrase. For the following operators, you don't need to also index the field as the string type to query the text value in the field:

Note

We do not recommend indexing a field with both the stringFacet (deprecated) and token types, unless you require a different normalizer for the token type for a use case other than faceting. With this index definition, Atlas Search uses the stringFacet definition for faceting.

Review the Behavior of the `token` Type

When you index a field as the token type, Atlas Search indexes the entire string as a single token (searchable term) and stores it in columnar storage for efficient filter and sort operations. You can use a normalizer to transform the token. By default, the normalizer is set to none, so Atlas Search indexes strings in their original form.

The major difference between the Atlas Search string and token types is that Atlas Search creates one or more tokens for fields indexed as string type whereas Atlas Search creates only a single token for fields indexed as the token type.

If a string being indexed as a token field type exceeds 8181 characters, Atlas Search truncates it to 8181 characters before indexing.
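To make the behavior described above concrete, here is a small Python simulation. It is only an illustration, not how Atlas Search is implemented: the function names are hypothetical, the whitespace split is a crude stand-in for a real analyzer, Python counts characters as Unicode code points, and truncating before normalizing is an assumption.

```python
MAX_TOKEN_CHARS = 8181  # truncation limit stated above


def as_token(value, normalizer="none"):
    """Simulate a token field: one term, truncated, optionally lowercased."""
    if normalizer not in ("none", "lowercase"):
        raise ValueError(f"unsupported normalizer: {normalizer!r}")
    truncated = value[:MAX_TOKEN_CHARS]  # assumption: truncate first
    return truncated.lower() if normalizer == "lowercase" else truncated


def as_string_terms(value):
    """Simulate a string field: an analyzer may produce several terms."""
    return value.lower().split()  # crude stand-in for an analyzer


print(as_token("The Godfather"))               # The Godfather
print(as_token("The Godfather", "lowercase"))  # the godfather
print(as_string_terms("The Godfather"))        # ['the', 'godfather']
print(len(as_token("x" * 9000)))               # 8181
```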

Define the Index for the `token` Type

To define the index for the token type, choose your preferred configuration method in the Atlas UI and then select the database and collection.

Click Refine Your Index to configure your index.
In the Field Mappings section, click Add Field to open the Add Field Mapping window.
Click Customized Configuration.
Select the field to index from the Field Name dropdown.
Note
You can't index fields that contain the dollar ($) sign at the start of the field name.
Click the Data Type dropdown and select Token.
(Optional) Expand and configure the Token Properties for the field. To learn more, see Configure token Field Properties.
Click Add.

The following is the JSON syntax for the token type. Replace the default index definition with the following. To learn more about the fields, see Field Properties.

{
  "mappings": {
    "dynamic": true|false,
    "fields": {
      "<field-name>": {
        "type": "token",
        "normalizer": "lowercase | none"
      }
    }
  }
}

Configure `token` Field Properties

The Atlas Search token type takes the following parameters:

Option	Type	Necessity	Description	Default
`type`	string	Required	Human-readable label that identifies this field type. Value must be `token`.
`normalizer`	string	Optional	Type of transformation to perform on the field value. Value can be one of the following: `lowercase` - to transform text values in string fields to lowercase. `none` - to not perform any transformation. If you don't set this option explicitly, it defaults to `none`.	`none`
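If you generate index definitions from code, a helper like the following sketch can enforce the options in the table before you paste or submit the JSON. The function name and its checks are hypothetical conveniences, not part of any Atlas API; the dollar-sign check mirrors the field name restriction noted earlier.

```python
import json

ALLOWED_NORMALIZERS = {"lowercase", "none"}


def token_index_definition(field_name, normalizer="none", dynamic=False):
    """Return an index definition that maps `field_name` as the token type."""
    if field_name.startswith("$"):
        raise ValueError("field names that start with '$' can't be indexed")
    if normalizer not in ALLOWED_NORMALIZERS:
        raise ValueError(f"normalizer must be one of {sorted(ALLOWED_NORMALIZERS)}")
    return {
        "mappings": {
            "dynamic": dynamic,
            "fields": {
                field_name: {"type": "token", "normalizer": normalizer},
            },
        }
    }


# Print the definition as JSON, ready for the JSON Editor.
print(json.dumps(token_index_definition("title", "lowercase"), indent=2))
```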

Try an Example for the `token` Type

The following index definition example uses the sample_mflix.movies collection. If you have the sample data already loaded on your cluster, you can use the Visual Editor or JSON Editor in the Atlas UI to configure the index. After you select your preferred configuration method, select the database and collection, and refine your index to add field mappings.

The following index definition indexes string values in the title field as Atlas Search token type and converts the field value to lowercase, which allows you to do the following:

Perform case-insensitive sort, as specified by the normalizer, on the title field.
Run exact match queries on the title field using the following operators:

In the Add Field Mapping window, select title from the Field Name dropdown.
Click the Data Type dropdown and select Token.
Expand Token Properties and select lowercase from the Normalizer dropdown.
Click Add.

Replace the default index definition with the following index definition.

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "token",
        "normalizer": "lowercase"
      }
    }
  }
}
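The effect of the lowercase normalizer on sort order can be approximated in plain Python. This sketch assumes that un-normalized tokens compare by code point, as Python strings do; the function name and sample values are made up for illustration, and the actual ordering is determined by Atlas Search.

```python
def sort_like_token(values, normalizer="none"):
    """Approximate ascending sort order on a token field."""
    key = str.lower if normalizer == "lowercase" else None
    return sorted(values, key=key)


titles = ["apple", "Banana", "cherry"]

# Without normalization, uppercase letters sort before lowercase ones.
print(sort_like_token(titles))               # ['Banana', 'apple', 'cherry']

# With the lowercase normalizer, case no longer affects the order.
print(sort_like_token(titles, "lowercase"))  # ['apple', 'Banana', 'cherry']
```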

The following index definition indexes the genres field as string and token types to return the following:

Search results for queries using Atlas Search operators like text, phrase, and other operators that perform text search on the genres field.
Sorted results for queries using the $search sort option on the genres field.
Exact matches for queries using Atlas Search operators like equals, in, and range.

In the Add Field Mapping window, select genres from the Field Name dropdown.
Click the Data Type dropdown and select Token.
Click Add.
Repeat step 1 and select String from the Data Type dropdown.
Review the default setting for String Properties and click Add.

Replace the default index definition with the following index definition.

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "genres": [{
        "type": "string"
      },
      {
        "type": "token"
      }]
    }
  }
}
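With both types on the genres field, a single query can mix text search, exact matching, and sorting. The following is a minimal sketch of such a pipeline; the function name, the index name `default`, and the sample values are assumptions for illustration.

```python
def genres_pipeline(term, exact, index="default", limit=5):
    """Text search and exact match on genres, sorted by genres."""
    return [
        {
            "$search": {
                "index": index,
                "compound": {
                    # text search relies on the string type
                    "must": [{"text": {"query": term, "path": "genres"}}],
                    # exact match relies on the token type
                    "filter": [{"equals": {"path": "genres", "value": exact}}],
                },
                # sorting relies on the token type
                "sort": {"genres": 1},
            }
        },
        {"$limit": limit},
        {"$project": {"_id": 0, "title": 1, "genres": 1}},
    ]


pipeline = genres_pipeline("comedy", "Drama")
```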
