You can now filter your Custom API rules with search.

We added a 'Graph view' to the Diffbot Knowledge Graph for Person and Organization profile data.

Word Count Parameter

by Jerome Choo

We've added a new Article API parameter, &wordcount, that returns the word count of the article text extracted as part of a Crawlbot or Bulk API job.

We've rolled out an update to the Knowledge Graph: entity identifiers now begin with an 'E' for every entity, regardless of entity type. Previously, Diffbot IDs began with a letter representing the entity type, e.g. 'P' for Person, 'O' for Organization, 'C' for Corporation. The new format lets us extend the Diffbot Knowledge Graph to a wider and richer range of entity types over time.

Please note: all KG queries will continue to work as before, and you do not need to change the way you search the graph. Below are examples of the new entity ID format you will see in results:

Example: IBM
"type": "Corporation", "Organization",
"diffbotUri": "http://diffbot.com/entity/EPdsrDmLiMQCskvBLp_dloQ",
"id": "EPdsrDmLiMQCskvBLp_dloQ"

Example: IBM Board Member & CEO, Virginia 'Ginni' Rometty
"type": "Person",
"diffbotUri": "http://diffbot.com/entity/EFCXA8DGjPMq5oTjl9RESEw",
"id": "EFCXA8DGjPMq5oTjl9RESEw"

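In both examples, the last path segment of diffbotUri is the same string as id. The minimal Python sketch below relies on that observation (an assumption, not a documented guarantee) to recover an ID from a URI. Any code that used to infer the entity type from the first letter of the ID should read the "type" field instead, since every new ID starts with 'E'. The helper names entity_id_from_uri and has_new_id_prefix are hypothetical and not part of any Diffbot client.

```python
from urllib.parse import urlparse


def entity_id_from_uri(diffbot_uri: str) -> str:
    """Return the last path segment of a diffbotUri.

    Assumption: this segment equals the entity's "id" field, as it does
    in the IBM and Rometty examples.
    """
    path = urlparse(diffbot_uri).path  # e.g. "/entity/EPdsrDmLiMQCskvBLp_dloQ"
    return path.rstrip("/").rsplit("/", 1)[-1]


def has_new_id_prefix(entity_id: str) -> bool:
    """Heuristic check for the new type-independent 'E' prefix.

    The prefix no longer says anything about the entity type; use the
    entity's "type" field for that.
    """
    return entity_id.startswith("E")


ibm = {
    "diffbotUri": "http://diffbot.com/entity/EPdsrDmLiMQCskvBLp_dloQ",
    "id": "EPdsrDmLiMQCskvBLp_dloQ",
}
assert entity_id_from_uri(ibm["diffbotUri"]) == ibm["id"]
assert has_new_id_prefix(ibm["id"])
```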
Integer-typed year and quarter fields for quarterlyRevenues and yearlyRevenues are now live:

The field yearlyRevenues.year (previously a string such as "2019") is now an integer representing the year (2019).
The field quarterlyRevenues.quarter (previously a string such as "Q1-2019") is split into two integer fields: quarterlyRevenues.quarter, representing the quarter (1), and quarterlyRevenues.year, representing the year (2019).
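Clients that cached data in the old string format may want to convert it to the new integer fields. A minimal Python sketch, assuming legacy quarter strings always follow the "Q<quarter>-<year>" pattern of the "Q1-2019" example; split_legacy_quarter and legacy_year_to_int are hypothetical helpers, not part of the Diffbot API.

```python
import re
from typing import Tuple

# Assumed legacy pattern, based on the "Q1-2019" example.
_LEGACY_QUARTER = re.compile(r"^Q([1-4])-(\d{4})$")


def split_legacy_quarter(value: str) -> Tuple[int, int]:
    """Convert a legacy string such as "Q1-2019" into (quarter, year) integers."""
    match = _LEGACY_QUARTER.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized quarter string: {value!r}")
    return int(match.group(1)), int(match.group(2))


def legacy_year_to_int(value: str) -> int:
    """Convert a legacy year string such as "2019" into the integer 2019."""
    return int(value.strip())


quarter, year = split_legacy_quarter("Q1-2019")
print(quarter, year)
```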

Example search for organizations with a quarterly revenue value greater than 10000000000:
https://app.diffbot.com/search/?query=type%3AOrganization+quarterlyRevenues.revenue.value%3E10000000000
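The search link above is the URL-encoded form of the DQL query type:Organization quarterlyRevenues.revenue.value>10000000000. A small Python sketch that builds such links from any DQL string, assuming the dashboard search page accepts the query in the query parameter exactly as in this example; search_url is a hypothetical helper.

```python
from urllib.parse import quote_plus

# Search page URL copied from the example link.
SEARCH_PAGE = "https://app.diffbot.com/search/"


def search_url(dql: str) -> str:
    """Build a dashboard search link for a DQL query string."""
    # quote_plus percent-encodes ':' and '>' and turns spaces into '+'.
    return f"{SEARCH_PAGE}?query={quote_plus(dql)}"


print(search_url("type:Organization quarterlyRevenues.revenue.value>10000000000"))
```

Because the revenue value is now a plain number, the comparison operator in the query works without any string parsing on the client side.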

Example: Apple, Inc.

"quarterlyRevenues": [{
    "revenue": {"currency": "USD", "value": 91818999808},
    "isCurrent": false,
    "year": 2020,
    "filingDate": {"str": "d2020-01-29", "precision": 3, "timestamp": 1580256000000},
    "revenueDate": {"str": "d2019-12-31", "precision": 3, "timestamp": 1577750400000},
    "quarter": 1
}]

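With year and quarter returned as integers, a quarterlyRevenues entry can be consumed directly. The Python sketch below embeds the Apple, Inc. record, wrapped in a top-level object so it parses as standalone JSON. It assumes "timestamp" is milliseconds since the Unix epoch in UTC, which matches the "str" dates in this sample; the utc_date helper is hypothetical, and the "precision" field is left uninterpreted.

```python
import json
from datetime import datetime, timezone

# Apple, Inc. sample record, wrapped in {...} so it is valid JSON on its own.
SAMPLE = """
{"quarterlyRevenues": [{"revenue": {"currency": "USD", "value": 91818999808},
  "isCurrent": false, "year": 2020,
  "filingDate": {"str": "d2020-01-29", "precision": 3, "timestamp": 1580256000000},
  "revenueDate": {"str": "d2019-12-31", "precision": 3, "timestamp": 1577750400000},
  "quarter": 1}]}
"""


def utc_date(date_field: dict) -> str:
    """Convert a date object's millisecond timestamp to an ISO date (UTC)."""
    seconds = date_field["timestamp"] / 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date().isoformat()


for entry in json.loads(SAMPLE)["quarterlyRevenues"]:
    # year and quarter are integers, so they can be used without parsing.
    label = f"Q{entry['quarter']} {entry['year']}"
    print(label, entry["revenue"]["value"], utc_date(entry["revenueDate"]))
```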
Excel Add-In v1.8.0.0

  • Enhance V3 - we added an option to configure the number of matches returned per input, e.g. with an organization name and homepage as input, return 1 match or 3 matches.
  • Enhance V3 - we added an option to configure the match threshold: you can now specify a minimum threshold an entity match must meet to be included in the output.
  • You can now view a Diffbot entity regardless of the Excel Add-In status.
  • Saved Searches - you can now save and load DQL queries from the Add-In. These queries are synchronized with the Diffbot Dashboard.

Enhance API Updates

by Jerome Choo

EnhanceAPI version v3

As part of API version v3, the following functionality is now available:

  1. Fetch more than one result for an enhance query with the &size=n query parameter:
    https://kg.diffbot.com/kg/v3/enhance_endpoint?token=DIFFBOT-TOKEN&type=organization&url=www.ibm.com&size=3
  2. Set the score threshold for matches with the &threshold=d query parameter:
    https://kg.diffbot.com/kg/v3/enhance_endpoint?token=DIFFBOT-TOKEN&type=organization&name=DummyName&url=www.ibm.com&threshold=0.1&size=3
  3. Preview the first n records of a bulk job with the &head=n query parameter:
    https://kg.diffbot.com/kg/v3/enhance_endpoint/bulk/BULKJOB-ID?token=DIFFBOT-TOKEN&head=10
  4. The bulk job response format has changed from JSON to JSONL (Content-Type: application/json-lines), with each enhanced record on a separate line. Together with the head parameter, this lets clients read large bulk job results record by record instead of as one large JSON document.
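The v3 parameters above can be assembled programmatically, and a JSONL response can be parsed one line at a time. A minimal Python sketch, assuming the endpoint and parameter names shown in the example URLs; enhance_url, bulk_preview_url and iter_jsonl are hypothetical helpers, not an official Diffbot client, and size and threshold are only sent when given.

```python
import json
from typing import Iterable, Iterator, Optional
from urllib.parse import urlencode

# Endpoint copied from the example URLs.
ENHANCE = "https://kg.diffbot.com/kg/v3/enhance_endpoint"


def enhance_url(token: str, entity_type: str, size: Optional[int] = None,
                threshold: Optional[float] = None, **fields: str) -> str:
    """Build an Enhance v3 query URL.

    fields holds the input attributes, e.g. name="DummyName", url="www.ibm.com".
    """
    params: dict = {"token": token, "type": entity_type, **fields}
    if size is not None:
        params["size"] = size  # number of results to return
    if threshold is not None:
        params["threshold"] = threshold  # score threshold for matches
    return f"{ENHANCE}?{urlencode(params)}"


def bulk_preview_url(job_id: str, token: str, head: int) -> str:
    """Build a URL that previews the first `head` records of a bulk job."""
    return f"{ENHANCE}/bulk/{job_id}?{urlencode({'token': token, 'head': head})}"


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL response."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


print(enhance_url("DIFFBOT-TOKEN", "organization", size=3, url="www.ibm.com"))
print(bulk_preview_url("BULKJOB-ID", "DIFFBOT-TOKEN", 10))
```

Because iter_jsonl consumes any iterable of lines, the body of an HTTP response opened with urllib.request.urlopen could be decoded line by line and fed to it, so only one record needs to be in memory at a time.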

The API spec is available at docs.diffbot.com

Bulkjob retention

The bulk job retention period has been increased from 7 days to 30 days.

Deprecation of API versions v1 and v2

API versions v1 and v2 are now deprecated and will be removed in a future version of the product. Please plan to migrate to v3.

September 2020

by Jerome Choo

Industries/Category Updates

We've improved the mapping of Organization entities to NAICS classifications (93% of industries covered) and updated the structure and label set of our industry categories.

Main updates (267 industries):

Added Retailers as a main industry sector (+30 sub-industries)
Grouped the new industry category list by major industry group
Differentiated some specific cases, such as manufacturers versus retailers.

For example:

  Furniture Companies --> (1) Furniture Manufacturers, (2) Furniture Retailers
  Jewelry Companies --> (1) Jewelry Manufacturers, (2) Jewelry Retailers
  Toy Companies --> (1) Toy Manufacturers, (2) Toy And Video Game Retailers
  Vehicle Parts --> (1) Vehicle Parts Manufacturers, (2) Automotive Part Retailers
  ...
  Motor Vehicle Manufacturers --> (1) Motor Vehicle Manufacturers, (2) Vehicle Retailers And Dealership
  Sporting Goods Manufacturers --> (1) Sporting Goods Manufacturers, (2) Sporting Goods Retailers
  ...
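Saved filters that used an old category label may need to be expanded into the new labels. A minimal Python sketch, assuming you want to keep both the manufacturer and the retailer replacement; INDUSTRY_SPLITS is a hypothetical, partial table copied only from the examples listed above (the elided "..." entries are not filled in), and expand_industry_filter is an illustrative helper, not a Diffbot API.

```python
from typing import Dict, List

# Partial old -> new label mapping, copied from the listed examples only.
INDUSTRY_SPLITS: Dict[str, List[str]] = {
    "Furniture Companies": ["Furniture Manufacturers", "Furniture Retailers"],
    "Jewelry Companies": ["Jewelry Manufacturers", "Jewelry Retailers"],
    "Toy Companies": ["Toy Manufacturers", "Toy And Video Game Retailers"],
    "Vehicle Parts": ["Vehicle Parts Manufacturers", "Automotive Part Retailers"],
    "Motor Vehicle Manufacturers": [
        "Motor Vehicle Manufacturers",
        "Vehicle Retailers And Dealership",
    ],
    "Sporting Goods Manufacturers": [
        "Sporting Goods Manufacturers",
        "Sporting Goods Retailers",
    ],
}


def expand_industry_filter(labels: List[str]) -> List[str]:
    """Replace each old label with its new labels; unknown labels pass through.

    Order is preserved and duplicates are dropped.
    """
    expanded: List[str] = []
    for label in labels:
        for new_label in INDUSTRY_SPLITS.get(label, [label]):
            if new_label not in expanded:
                expanded.append(new_label)
    return expanded


print(expand_industry_filter(["Toy Companies", "Retailers"]))
```

If a filter should match only manufacturers (or only retailers), pick the single relevant label instead of expanding to both.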

The Diffbot Account API returns the usage for the supplied token and a list of its child tokens, but it previously did not return a usage array for each child token. We've added support for a new parameter, childUsage, that returns the usage of each child token.

Excel Add-In v1.6

by Jerome Choo

A new Excel Add-In version (1.6.x) is now available in production, both on office.com and on desktop.

What's new?

  • Added NAICS codes, zip codes, and phone numbers to the Organization output.
  • Minor design changes and bug fixes.
  • New 'Help' tab, tutorials, docs, and getting started videos.