Custom NLP Models
Describes how to train and use a custom NLP model with Philter.
Using your own custom model gives you full control over how Philter identifies named entities in text, while still letting you take advantage of Philter's other features such as filter profiles, redaction, and realistic value replacement.
Using a custom NLP model is usually not necessary with Philter. We strongly recommend confirming that a custom NLP model will provide benefits that outweigh the effort required to train it before proceeding.
Important: Training your own NLP model for Philter may require extensive knowledge and abilities in areas such as machine learning, neural networks, and distributed computing.
Philter identifies named entities in text through the use of a trained model. The model can identify things in the text, such as people's names, that do not follow a well-defined pattern and cannot easily be looked up in a dictionary. Philter's NLP model is interchangeable, and we offer multiple models that you can choose from to better tailor Philter to your use case and your domain.
However, there are times when using our models may not be sufficient, such as when your use-case does not exactly match our available models or you want to try to get better performance by training a model on text very similar to your input text. In those cases you can train a custom NLP model for use with Philter.

Training a Custom NLP Model

Philter is indifferent to the technologies and methods you choose to train your custom model. You can use any framework you like, such as Apache OpenNLP, spaCy, Stanford CoreNLP, or your own custom framework. Follow the framework's documentation for training a model.

Using Your Model

Once your model has been trained and you are satisfied with its performance, you must expose it through a simple HTTP service interface before Philter can use it. This service handles communication between Philter and your model. The interface has two methods, summarized below and described in detail later on this page.
Processes the text and returns the named entities.
Gets the status of the model service, for example whether the model is still loading or is ready for inference.
Once your model is available behind the HTTP interface described above, you are ready to use the model with Philter. On the Philter virtual machine, set the PHILTER_NER_ENDPOINT environment variable to the location of the running HTTP service. We recommend setting this environment variable in /etc/environment. If your HTTP service is running on the same host as Philter on port 8888, the environment variable would be set as:
export PHILTER_NER_ENDPOINT=http://localhost:8888/
Now restart the Philter service and stop and disable the philter-ner service.
sudo systemctl restart philter.service
sudo systemctl stop philter-ner.service
sudo systemctl disable philter-ner.service
When a filter profile containing an NER filter is applied, Philter sends requests to your HTTP service, which runs model inference and returns the identified named entities.
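Before restarting Philter, it can help to confirm that the service behind PHILTER_NER_ENDPOINT is reachable and reports that it is ready. The following is a minimal sketch only. The status path passed to status_url is a hypothetical placeholder, and treating an HTTP 200 response as "ready" is an assumption; adjust both to match the interface your service actually implements.

```python
import time
import urllib.request


def status_url(endpoint, status_path):
    """Join the base endpoint and a status path without doubling slashes."""
    return endpoint.rstrip("/") + "/" + status_path.lstrip("/")


def wait_until_ready(url, timeout=60.0, interval=2.0):
    """Poll url until it answers with HTTP 200 or the timeout elapses.

    Assumption: a 200 response means the model is ready for inference.
    Returns True if the service became ready, False otherwise.
    """
    # An empty ProxyHandler disables any proxy configured in the environment,
    # so requests to a local service are sent directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers refused connections, timeouts, and HTTP error statuses
            # (urllib raises these as subclasses of OSError).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_until_ready(status_url("http://localhost:8888/", "/status")) would poll the service from the example above, with "/status" standing in for whatever path your service exposes for its status method.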

Philter NER HTTP Service Interface

Your NLP model must be exposed by a service implementing the following HTTP API. The base URL http://localhost:8888 is only an example. Your service can run anywhere (on the same host as Philter, on a different host, etc.) and on any port, as long as it is accessible from Philter.
An HTTP service makes the NLP model accessible to Philter.
Extract named entities from input text
Get the status
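To make the shape of such a service concrete, the following is a minimal sketch using only the Python standard library. It is not Philter's specification: the paths /process and /status, the plain-text request body, and the JSON response fields are all hypothetical placeholders, and find_entities is a stand-in for real model inference. Replace each of these with what the interface and your model actually require.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical paths and payloads, chosen only for illustration.
PROCESS_PATH = "/process"
STATUS_PATH = "/status"

# Set once the model has finished loading; the status method reports it.
MODEL_READY = threading.Event()


def find_entities(text):
    """Placeholder for model inference: flags capitalized tokens."""
    entities = []
    offset = 0
    for token in text.split():
        start = text.index(token, offset)
        end = start + len(token)
        offset = end
        if token[0].isupper():
            entities.append({"text": token, "start": start, "end": end})
    return entities


MODEL_READY.set()  # a real service would set this after loading its model


class NerHandler(BaseHTTPRequestHandler):
    def _send_json(self, code, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Status method: reports whether the model is loading or ready.
        if self.path == STATUS_PATH:
            state = "ready" if MODEL_READY.is_set() else "loading"
            self._send_json(200, {"status": state})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        # Process method: reads the text and returns the entities found.
        if self.path != PROCESS_PATH:
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        self._send_json(200, {"entities": find_entities(text)})

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the service lightweight


def make_server(host="localhost", port=8888):
    """Create the server; call serve_forever() on the result to run it."""
    return ThreadingHTTPServer((host, port), NerHandler)
```

Calling make_server().serve_forever() would start the sketch on the host and port used in the example above.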

Recommendations and Best Practices

You have complete freedom to train your custom NLP model using whatever tools and processes you choose. However, in our experience there are a few things that can help you be successful.
The first recommendation is to package your service in a Docker container. Doing so gives you a self-contained image that can be deployed and run virtually anywhere. It simplifies dependency management and protects you from dependency version changes.
The second recommendation is to make your HTTP service as lightweight as possible. Avoid any unnecessary code or features that could negatively impact the speed of your model inference.
Lastly, thoroughly evaluate your model before deploying it to Philter so that you have a realistic expectation of its performance.
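One common way to evaluate a named-entity model is to compare its predicted entity spans against hand-labeled spans on held-out text and compute precision, recall, and F1. The sketch below assumes exact span matching and represents each entity as a (start, end, label) tuple; both choices are illustrative assumptions, not something Philter requires.

```python
def evaluate(gold, predicted):
    """Compute entity-level precision, recall, and F1 with exact matching.

    gold and predicted are lists with one entry per document; each entry
    is a collection of (start, end, label) tuples.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same documents")

    true_pos = false_pos = false_neg = 0
    for gold_doc, pred_doc in zip(gold, predicted):
        gold_set, pred_set = set(gold_doc), set(pred_doc)
        true_pos += len(gold_set & pred_set)   # found and correct
        false_pos += len(pred_set - gold_set)  # found but not labeled
        false_neg += len(gold_set - pred_set)  # labeled but missed

    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

For redaction, recall often deserves particular attention, since a missed entity is text that is left unredacted.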