Philter can identify many predefined types of sensitive information. Each type, or filter, can be enabled or disabled separately from the other types in a filter profile.
Philter uses several methods to identify people's names.
Type | Description |
First Names | Identifies common first names |
Surnames | Identifies common surnames |
Person's Names (NER) | Identifies full names using natural language processing analysis |
Type | Description |
Ages | Identifies ages such as |
Bitcoin Addresses | Identifies Bitcoin addresses such as |
Cities | Identifies common cities |
Counties | Identifies common counties |
Credit Card Numbers | Identifies VISA, American Express, MasterCard, and Discover credit card numbers |
Dates | Identifies dates in many formats such as May 22, 1999 |
Driver's License Numbers | Identifies driver's license numbers for all 50 US states |
Email Addresses | Identifies email addresses |
Hospitals and Hospital Abbreviations | Identifies common hospital names and their abbreviations |
IBAN Codes | Identifies international bank account numbers |
IP Addresses | Identifies IPv4 and IPv6 addresses |
MAC Addresses | Identifies network MAC addresses |
Passport Numbers | Identifies US passport numbers |
Phone Numbers | Identifies phone numbers and phone number extensions |
Sections | Identifies sections in text denoted by |
SSNs and TINs | Identifies US SSNs and TINs |
States and State Abbreviations | Identifies US state names and abbreviations |
Tracking Numbers | Identifies UPS, FedEx, and USPS tracking numbers |
URLs | Identifies URLs |
VINs | Identifies vehicle identification numbers |
Zip Codes | Identifies US zip codes |
In addition to the predefined types of sensitive information listed in the table above, you can define your own types. Through custom identifiers and dictionaries, Philter can identify many other types of information that may be sensitive in your use case. For example, if you have patient identifiers that follow a pattern of AA-00000, you can define a custom identifier for this sensitive information.
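To make the AA-00000 example concrete, the pattern can be read as two uppercase letters, a hyphen, and five digits. The following is a minimal Python sketch of matching that shape with a regular expression. It is not Philter's configuration syntax or API; the names `PATIENT_ID` and `find_patient_ids` are hypothetical and exist only for illustration.

```python
import re

# Hypothetical pattern for identifiers shaped like AA-00000:
# two uppercase ASCII letters, a hyphen, then exactly five digits.
# The word boundaries keep the pattern from matching inside longer tokens.
PATIENT_ID = re.compile(r"\b[A-Z]{2}-[0-9]{5}\b")


def find_patient_ids(text: str) -> list[str]:
    """Return every substring of text that has the AA-00000 shape."""
    return PATIENT_ID.findall(text)


if __name__ == "__main__":
    sample = "Patient AB-12345 was seen today; reference XY-0001 is unrelated."
    print(find_patient_ids(sample))
```

The word boundaries are a design choice for this sketch: without them, a longer token such as AB-123456 would produce a partial match on its first five digits.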
Philter can be configured to identify sensitive information based on custom dictionaries. When a term in the dictionary is found in the text, Philter treats the term as sensitive information and applies the given replacement strategy.
Custom dictionaries support fuzziness to accommodate misspellings. The replacement strategy for a custom dictionary has a sensitivityLevel that controls the amount of allowed fuzziness.
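One common way to implement this kind of fuzziness is to accept a word when its edit distance to a dictionary term is within some limit. The Python sketch below illustrates that idea only; it does not describe how Philter performs fuzzy matching internally. The `max_distance` parameter is a hypothetical stand-in for the effect of sensitivityLevel (the actual values and their meaning are not given here), and the function names are assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def find_dictionary_terms(text, dictionary, max_distance):
    """Return (word, term) pairs where a word in text is within
    max_distance edits of a single-word dictionary term.

    max_distance is a hypothetical knob for this sketch; 0 means
    exact matches only (ignoring case and trailing punctuation).
    """
    hits = []
    for token in text.split():
        word = token.strip(".,;:!?").lower()
        for term in dictionary:
            if edit_distance(word, term.lower()) <= max_distance:
                hits.append((word, term))
    return hits


if __name__ == "__main__":
    note = "Prescribed amoxicilin twice daily."
    print(find_dictionary_terms(note, ["amoxicillin"], max_distance=1))
```

A larger limit catches more misspellings but also raises the chance of flagging unrelated words, which is the trade-off a fuzziness setting controls. This sketch only handles single-word terms; multi-word dictionary entries would need a different tokenization.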
Type | Description |
Custom Dictionaries | Identifies sensitive information based on dictionary values |
Custom Identifiers | Identifies custom alphanumeric identifiers that may be used for medical record numbers, patient identifiers, account numbers, or other specific identifiers |