Extract entities (e.g., people, organizations, products) and data about them (e.g., sentiment, relationships) from raw text
The Natural Language API is a pre-trained classifier, named entity recognition model, sentence tokenizer, and sentiment analyzer rolled into a single service.
In layman's terms, the Natural Language API lets you understand any piece of free-form raw text programmatically.
To see what is possible with NLP, try entering a sample of text in the Natural Language API demo.
The "Graph" tab shows a knowledge graph extracted from the text. The graph is enhanced with facts from the Diffbot KG. Blue edges represent facts extracted from the text, while grey edges represent facts retrieved from the Diffbot KG. Double-click on a node to expand it with more facts from the KG.
The "Entities" tab shows the extracted entities sorted by salience. Click an entity to see its link to the Diffbot KG and to highlight all mentions of that entity in the text.
The "Facts" tab shows extracted facts whose property is predefined in our schema. Hover over a fact to see the part of the text where it was found. See the "Documentation" tab for the list of properties our schema currently supports.
The "Open Facts" tab shows open-domain facts whose property is extracted from the text alone. Because these facts do not follow a particular schema, they allow new properties to be discovered.
The data shown in the demo is a simple visualization of the JSON output returned by the Natural Language API.
Quickstart
Try this Google Colab notebook for a simple quickstart using Python and pandas.
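If you prefer plain Python without the notebook, the sketch below builds (but does not send) a POST request for a batch of documents using only the standard library. The endpoint URL `https://nl.diffbot.com/v1/`, the `fields` and `token` query parameters, and the `content`/`lang` keys in the JSON body are assumptions not stated on this page; verify them against the API reference before relying on them. `build_nl_request` is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: endpoint and parameter names are illustrative; check the API reference.
NL_ENDPOINT = "https://nl.diffbot.com/v1/"


def build_nl_request(token, documents, fields=("entities", "sentiment"), lang="en"):
    """Build (without sending) a POST request for a list of raw-text documents.

    Each document becomes a JSON object with an assumed `content` key and a
    language code; the requested features are passed as a comma-separated list.
    """
    query = urllib.parse.urlencode({"fields": ",".join(fields), "token": token})
    body = json.dumps([{"content": text, "lang": lang} for text in documents])
    return urllib.request.Request(
        f"{NL_ENDPOINT}?{query}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_nl_request("YOUR_TOKEN", ["Steve Jobs founded Apple Inc."])

# To actually call the API (requires a valid token and network access):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect or log the payload before spending credits.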
Features & Terminology
- Entity. Anything in the real world. Examples: Apple Inc, Steve Jobs.
- Entity Type. The class of an entity. Examples: organization, person. The list of entity types we support can be found here.
- Fact. A fact defines a relationship between entities (Apple Inc; founder; Steve Jobs) or an entity and a literal (Apple Inc; number of employees; 137,000).
- Property. A property defines the relationship type (founder, number of employees) of a fact. The list of properties we support can be found here.
- Open Fact. Unlike a regular fact, an open fact does not follow a pre-defined list of properties. An open fact's property is extracted directly from the text. This enables new properties to be discovered. NOTE: This feature is currently disabled as we work to improve its capabilities.
- Sentiment of a document. This value represents the overall sentiment of the text. It ranges from -1.0 (very negative) to 1.0 (very positive). Sentiment around 0.0 is considered neutral.
- Sentiment of an entity. This value represents the sentiment of the text towards an entity. Example: the sentence "I love Apple products, but the iMac Pro is too pricey." is positive towards Apple and negative towards the iMac Pro.
- Salience. This value helps answer the question "What is this text mainly about?". A salience of 1.0 means the entity is the main topic of the document, while a salience of 0.0 means the entity is not needed to understand the document.
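The terms above can be modelled with a few small data types, which makes it easy to ask "what is this text mainly about?" and "how does the text feel about each entity?". This is a minimal sketch: the class and field names (`Entity`, `Fact`, `salience`, `sentiment`, `prop`) are hypothetical and are not the API's JSON schema, the salience and sentiment numbers are made up for the example sentence, and the neutral cut-off of ±0.25 is an assumption, since the page only says that sentiment "around 0.0" is neutral.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    entity_type: str   # e.g. "organization", "person"
    salience: float    # 0.0 (not needed to understand the text) .. 1.0 (main topic)
    sentiment: float   # -1.0 (very negative) .. 1.0 (very positive)


@dataclass
class Fact:
    subject: str       # an entity
    prop: str          # the property, e.g. "founder"
    value: str         # another entity or a literal such as "137,000"


NEUTRAL_BAND = 0.25  # ASSUMPTION: illustrative cut-off for "around 0.0"


def sentiment_label(score):
    """Map a score in [-1.0, 1.0] to a coarse label."""
    if score > NEUTRAL_BAND:
        return "positive"
    if score < -NEUTRAL_BAND:
        return "negative"
    return "neutral"


def main_topics(entities, top_n=1):
    """Answer "what is this text mainly about?" by ranking entities on salience."""
    return sorted(entities, key=lambda e: e.salience, reverse=True)[:top_n]


# Hypothetical scores for "I love Apple products, but the iMac Pro is too pricey."
entities = [
    Entity("Apple Inc", "organization", salience=0.8, sentiment=0.6),
    Entity("iMac Pro", "product", salience=0.7, sentiment=-0.5),
]
facts = [
    Fact("Apple Inc", "founder", "Steve Jobs"),                 # entity-to-entity
    Fact("Apple Inc", "number of employees", "137,000"),        # entity-to-literal
]

for e in main_topics(entities, top_n=2):
    print(e.name, sentiment_label(e.sentiment))
# Apple Inc positive
# iMac Pro negative
```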
Supported Languages
NLP feature support varies by language.
Feature | Languages Supported |
---|---|
Sentiment | Over 100 languages. You can view the list here. |
Entity | English (en), French (fr), Spanish (es), Chinese (zh), German (de), Russian (ru), Japanese (ja), Dutch (nl), Polish (pl), Norwegian (no), Danish (da), Swedish (sv), Italian (it) |
Salience | English (en), French (fr), Spanish (es), Chinese (zh), German (de), Russian (ru), Japanese (ja), Dutch (nl), Polish (pl), Norwegian (no), Danish (da), Swedish (sv), Italian (it) |
All Others (Facts, Open Facts, etc.) | English (en) only. |
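Before sending a document, it can help to check which features the table above says are available for its language. The lookup below is a hypothetical helper (`is_supported` is not part of the API) that encodes the table; because the table defers Sentiment to an external list of over 100 languages, the sketch returns `None` for Sentiment rather than guessing.

```python
# Languages listed for Entity and Salience in the table above.
ENTITY_SALIENCE_LANGS = {
    "en", "fr", "es", "zh", "de", "ru", "ja", "nl", "pl", "no", "da", "sv", "it",
}


def is_supported(feature, lang):
    """Return True/False per the table, or None when the table points elsewhere."""
    feature = feature.lower()
    if feature == "sentiment":
        return None  # "Over 100 languages": consult the linked list
    if feature in ("entity", "salience"):
        return lang in ENTITY_SALIENCE_LANGS
    # Facts, Open Facts, and all other features: English only
    return lang == "en"


print([f for f in ("Entity", "Salience", "Facts") if is_supported(f, "de")])
# ['Entity', 'Salience']
```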
Credit Usage
Each document consumes 1 credit up to 10,000 characters. Additional blocks of 10,000 characters consume 1 credit each.
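The credit rule can be turned into a quick estimate before submitting a batch. This sketch assumes that a partial trailing block of characters is billed as a full block, that an empty document still costs 1 credit, and that "characters" means Python string length; the page does not state these details explicitly. The function names are hypothetical.

```python
import math

CHARS_PER_CREDIT = 10_000


def credits_for_document(text):
    """1 credit for the first 10,000 characters, plus 1 per additional block.

    ASSUMPTION: partial blocks round up and every document costs at least 1 credit.
    """
    return max(1, math.ceil(len(text) / CHARS_PER_CREDIT))


def credits_for_request(documents):
    """Total estimated credits for a list of documents."""
    return sum(credits_for_document(doc) for doc in documents)


print(credits_for_document("a" * 25_000))  # 3
```

Under these assumptions, a 10,000-character document costs 1 credit, a 10,001-character document costs 2, and a 25,000-character document costs 3.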
Limits
Maximum of 100,000 characters per document and 1,000,000 total characters per API request.
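To stay within both limits when processing many documents, you can pack them greedily into requests. The sketch below is a hypothetical client-side helper, not part of the API: it rejects any document over 100,000 characters (splitting long texts is left to the caller) and starts a new request whenever adding the next document would push the request past 1,000,000 characters.

```python
MAX_DOC_CHARS = 100_000
MAX_REQUEST_CHARS = 1_000_000


def batch_documents(documents):
    """Greedily group documents into requests that respect both limits."""
    batches, current, current_chars = [], [], 0
    for doc in documents:
        if len(doc) > MAX_DOC_CHARS:
            raise ValueError(
                f"document of {len(doc)} characters exceeds {MAX_DOC_CHARS}"
            )
        # Start a new request if this document would overflow the current one.
        if current and current_chars + len(doc) > MAX_REQUEST_CHARS:
            batches.append(current)
            current, current_chars = [], 0
        current.append(doc)
        current_chars += len(doc)
    if current:
        batches.append(current)
    return batches


print([len(b) for b in batch_documents(["x" * 100_000] * 25)])  # [10, 10, 5]
```

Greedy packing keeps documents in their original order, which makes it simple to match results back to inputs.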