Filter Strategies

Describes Philter's filter strategies.

A filter strategy defines how Philter manipulates the sensitive information it identifies: whether the information is redacted, replaced, encrypted, or transformed in some other way.

In a filter profile you specify the types of sensitive information that should be filtered. How Philter replaces each type of sensitive information is specific to each type. For instance, zip codes can be truncated based on the leading digits or zip code population while phone numbers are redacted. These replacements are performed by filter strategies.

A sample filter profile containing a filter strategy is shown below. In this example, email addresses will be redacted.

{
  "name": "email-address",
  "identifiers": {
    "emailAddress": {
      "emailAddressFilterStrategies": [
        {
          "strategy": "REDACT",
          "redactionFormat": "{{{REDACTED-%t}}}"
        }
      ]
    }
  }
}

Filter Strategies

The filter strategies are described below. Each filter type can specify zero or more filter strategies. When no filter strategies are given, Philter will default to REDACT for that filter type. When multiple filter strategies are given for a single filter type, the filter strategies will be applied in order as they are listed in the filter profile.

The REDACT Filter Strategy

The REDACT filter strategy replaces sensitive information with a given redaction format. You can put variables in the redaction format that Philter will replace when performing the redaction.

The available redaction variables are:

Redaction Variable | Description
%t | Replaced with the type of sensitive information, so that you can tell what type of sensitive information was identified and redacted.
%l | Replaced with the given classification for the type of sensitive information.
%v | Replaced with the original value of the sensitive text. With %v you can annotate sensitive information instead of masking or removing it.

To redact sensitive information by replacing it with the type of sensitive information, the redaction format would be REDACTED-%t.
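The variable substitution described above can be sketched in a few lines of Python. This is only an illustration, not Philter's implementation: the function name apply_redaction_format and the type labels passed to it ("email-address", "ssn") are assumptions made for the example.

```python
import re


def apply_redaction_format(redaction_format, filter_type, classification, value):
    """Substitute the %t, %l, and %v variables in a redaction format.

    Hypothetical helper for illustration; not part of Philter's API.
    """
    replacements = {"%t": filter_type, "%l": classification, "%v": value}
    # A single regex pass avoids re-substituting text that was just inserted.
    return re.sub(r"%[tlv]", lambda m: replacements[m.group(0)], redaction_format)


# Mask the value and keep only its type.
print(apply_redaction_format("{{{REDACTED-%t}}}", "email-address", "", "jane@example.com"))

# Annotate the value instead of removing it by including %v.
print(apply_redaction_format("<%t>%v</%t>", "ssn", "", "123-45-6789"))
```

The second call shows why %v is useful: the original text is kept and wrapped with its type rather than masked.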

The CRYPTO_REPLACE Filter Strategy

The CRYPTO_REPLACE filter strategy replaces each identified piece of sensitive information by encrypting it using the AES encryption algorithm. To use this filter strategy, the filter profile must include the details of the encryption key as shown below:

{
  "name": "sample-profile",
  "crypto": {
    "key": "....",
    "iv": "...."
  },
  ...

In the snippet of a filter profile shown above, a crypto element is defined with a key and an initialization vector (iv). Both items are required to encrypt the sensitive information. To generate a key and iv, run the following command:

openssl enc -e -aes-256-cbc -a -salt -P

You will be prompted to enter an encryption password. Once entered, the values of the key and iv will be shown. Copy and paste those values into the filter profile as shown above.
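If you would rather not copy the values by hand, a small script can pull them out of the command's output. The sketch below assumes the output consists of name=value lines for salt, key, and iv; the function name parse_openssl_params is hypothetical and the sample values are all-zero placeholders, not real key material.

```python
def parse_openssl_params(output):
    """Extract the key and iv from name=value lines (assumed output layout)."""
    params = {}
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            # Names may carry padding before the equals sign, so strip both sides.
            params[name.strip()] = value.strip()
    return {"key": params["key"], "iv": params["iv"]}


# Placeholder output with dummy all-zero values, for illustration only.
sample_output = "salt=0000000000000000\nkey=" + "00" * 32 + "\niv =" + "00" * 16

crypto = parse_openssl_params(sample_output)
# AES-256 uses a 32-byte key (64 hex characters); CBC uses a 16-byte iv (32 hex characters).
print(len(crypto["key"]), len(crypto["iv"]))
```

The resulting dictionary has the same two properties as the crypto element of the filter profile.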

The HASH_SHA256_REPLACE Filter Strategy

The HASH_SHA256_REPLACE filter strategy replaces sensitive information with the SHA256 hash value of the sensitive information. To append a random salt value to each value prior to hashing, set the salt property to true. The salt value used will be returned in the explain response from Philter's API.
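To make the effect of the salt property concrete, here is a minimal Python sketch of hashing with and without an appended salt. It is not Philter's code: the function name, the 16-byte salt length, and the hexadecimal encoding of the result are assumptions chosen for the example.

```python
import hashlib
import secrets


def hash_sha256_replace(value, salt=False):
    """Return (hex digest, salt used); the salt is "" when salting is off.

    Hypothetical helper; salt length and hex output are assumptions.
    """
    salt_value = secrets.token_hex(16) if salt else ""
    # The salt is appended to the value before hashing, as described above.
    digest = hashlib.sha256((value + salt_value).encode("utf-8")).hexdigest()
    return digest, salt_value


unsalted, _ = hash_sha256_replace("123-45-6789")
salted, used_salt = hash_sha256_replace("123-45-6789", salt=True)
print(unsalted)
print(salted, used_salt)
```

Without a salt, the same input always produces the same hash, so equal values remain linkable across documents. With a random salt, each replacement differs, which is why the salt has to be reported back if the hash is ever to be verified.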

The RANDOM_REPLACE Filter Strategy

The RANDOM_REPLACE filter strategy replaces the identified text with a fake value of the same type. For example, an SSN will be replaced by random text in the format ###-##-####, such as 123-45-6789, and an email address will be replaced with a randomly generated email address. This strategy is available to all filter types.
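As an illustration of a same-type replacement, the following Python sketch generates random text in the ###-##-#### format. The function name random_ssn is hypothetical, and the sketch makes no claim about how Philter generates its values.

```python
import random


def random_ssn(rng=random):
    """Return random digits laid out as ###-##-#### (illustrative only)."""
    digits = [str(rng.randint(0, 9)) for _ in range(9)]
    return "{}{}{}-{}{}-{}{}{}{}".format(*digits)


print(random_ssn())
```

The replacement keeps the shape of the original value, so downstream systems that expect that format continue to work.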

The STATIC_REPLACE Filter Strategy

The STATIC_REPLACE filter strategy replaces the identified text with a given static value. This strategy is available to all filter types.

The TRUNCATE Filter Strategy

Available only to zip code filters, the TRUNCATE filter strategy truncates zip codes to a chosen number of leading digits. Set truncateDigits to the number of leading digits to keep. For example, if truncateDigits is 2, the zip code 90210 will be truncated to 90***.
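The truncation rule can be expressed as a short Python sketch. The function name truncate_zip is hypothetical; the behavior follows the 90210 example above, where the removed digits are replaced with asterisks.

```python
def truncate_zip(zip_code, truncate_digits):
    """Keep the leading digits and mask the remainder with asterisks."""
    kept = zip_code[:truncate_digits]
    return kept + "*" * (len(zip_code) - len(kept))


print(truncate_zip("90210", 2))  # prints 90***
```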

The ZERO_LEADING Filter Strategy

Available only to zip code filters, the ZERO_LEADING filter strategy changes the first 3 digits of a zip code to 0. For example, the zip code 90210 will be changed to 00010.
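The same transformation as a minimal Python sketch, assuming a zip code of at least 3 digits. The function name zero_leading is hypothetical.

```python
def zero_leading(zip_code):
    """Replace the first 3 digits with zeros and keep the rest unchanged."""
    return "000" + zip_code[3:]


print(zero_leading("90210"))  # prints 00010
```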

Filter Strategy Conditions

A filter strategy can be applied only when the sensitive information meets one or more conditions. For example, to replace only dates of 11/05/2010, use the condition token == "11/05/2010". The conditions that can be applied vary with the type of sensitive information. For instance, zip codes can have conditions based on their population. Refer to each specific filter type for the conditions available.

The following is an example filter profile for credit cards that contains a condition to only redact credit card numbers that start with the digits 3000:

{
  "name": "default",
  "identifiers": {
    "creditCard": {
      "creditCardFilterStrategies": [
        {
          "condition": "token startswith \"3000\"",
          "strategy": "REDACT",
          "redactionFormat": "{{{REDACTED-%t}}}"
        }
      ]
    }
  }
}

Combining Conditions

Conditions can be joined through the use of the and keyword. When conditions are joined, each condition must be satisfied for the identified text to be filtered. If any of the conditions are not satisfied the identified text will not be filtered. Below is an example joined condition:

token != "123-45-6789" and context == "my-context"

This condition requires that the identified text (the token) not be equal to 123-45-6789 and the context be equal to my-context. Both of these conditions must be satisfied for the identified text to be filtered.

Conversely, conditions can be OR'd through the use of multiple filter strategies. For example, if we want to OR a condition on the token and a condition on the context, we would use two filter strategies:

"ssnFilterStrategies": [
  {
    "condition": "token != \"123-45-6789\"",
    "strategy": "REDACT",
    "redactionFormat": "{{{REDACTED-%t}}}"
  },
  {
    "condition": "context == \"my-context\"",
    "strategy": "REDACT",
    "redactionFormat": "{{{REDACTED-%t}}}"
  }
]
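The AND and OR semantics described in this section can be modeled with a short Python sketch. It is a simplified model rather than Philter's evaluation logic: conditions are written as Python functions instead of being parsed from condition strings, the function name select_strategy is hypothetical, and the rule that the first matching strategy is used is an assumption based on strategies being applied in the order listed.

```python
def select_strategy(token, context, strategies):
    """Return the first strategy whose conditions all hold, or None.

    Conditions inside one strategy are AND'd; listing several
    strategies gives OR, since any one of them may match.
    """
    for strategy in strategies:
        if all(condition(token, context) for condition in strategy["conditions"]):
            return strategy
    return None


# Mirrors the two SSN filter strategies above, one condition each.
strategies = [
    {"strategy": "REDACT", "conditions": [lambda token, context: token != "123-45-6789"]},
    {"strategy": "REDACT", "conditions": [lambda token, context: context == "my-context"]},
]

# Neither condition holds, so the token is left unfiltered.
print(select_strategy("123-45-6789", "other-context", strategies))

# The second strategy matches on the context alone.
print(select_strategy("123-45-6789", "my-context", strategies)["strategy"])
```

Placing both conditions in the conditions list of a single strategy would instead model the joined condition token != "123-45-6789" and context == "my-context".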