September 2019

by Jerome Choo
  • Added support for the inclusion of RocketReach email contact data (in addition to LeadIQ).
  • Added support for extracting the headquarters address from the headquarters building entity.
  • Began improving Record Linking for Organizations, with an emphasis on subsidiary data accuracy.
  • Improved coverage of Org and Person data records with a focus on: 'educated at', 'member of', 'owner of', and 'position held' data fields.
  • Improved role classification by separating CEO and Director roles.
  • Enhanced and extended the Visual Query Builder Tool in the Developer Dashboard.

August 2019

by Jerome Choo
  • Size is now supported in facet queries for articles.
  • Enabled access to crawls and bulk jobs created on child tokens from the app.diffbot.com UI when logged in under the parent token.
  • Enabled the cloning of a crawl from a crawl job page from the app.diffbot.com UI.
  • Made significant improvements to the performance of the app.diffbot.com UI.
  • Added location inference to the Natural Language API.
  • Improved how the importance score is generated for spam profiles.
  • Improved deduplication on Organization Founders.
  • Certain fields no longer link to the same DiffbotURI. For example, a parent entity and its subsidiary cannot share a unique identifier, so Google and Alphabet must have distinct IDs.
  • Removed bad descriptions from the allDescriptions field.
  • Improved age calculation/inference logic.

July 2019

by Jerome Choo
  • Added support for multiple Headquarters locations for Organizations.
  • Added support for multiple stock exchange/symbol pairs.
  • Improved extraction of city from neighborhoods.
  • Added support for displaying English tags for non-English taggers.
  • Trained a Dutch Entitylinker.
  • Improved the RawDataSentinels that support Organization data ingest, including subsidiary data.
  • Improved sub-record linking between Organizations and Founders.
  • Now force extraction of the headquarters address from the HQ building entity.
  • Now ensure countries are always classified as administrative areas.
  • Populated missing addresses in the location field for 81Mil organizations.
  • Improved the error message returned for mismatched quotes in DQL queries.
  • Enabled users to stop or pause a crawl between crawl rounds from the Dashboard.
  • The assignment of a customAPI to a crawl job now persists.
  • Set the article title in the field.
  • Now rank person images for Person profiles.
  • In the DKG, faceting on the parent key of an enum now expands to .normalizedValue.
  • Now cache Person and Organization images, including logos.

June 2019

by Jerome Choo
  • Committed to delivering 100% accuracy of core facts for 'Fortune 1000' company entity profiles in the Diffbot KnowledgeGraph (DKG): name, headquarters location, website, CEO, founders, logo, isPublic, parent organization, year founded, stock ticker symbol and exchanges, Twitter handle, and size attributes (employee count and annual top-line revenue).
  • Enhanced isPublic field population in the DKG.
  • Enhanced stock ticker symbol extraction in the DKG.
  • Fixed rules for assigning min and max employees to an Organization in the DKG.
  • Enriched 3Mil organizations that previously had no revenue data in the DKG.
  • Improved selection of location for Organization.location in the DKG.
  • Improved evaluation of postal codes when an address has no street address in the DKG.
  • Enhanced age calculation/inference in the DKG.
  • Improved candidate selection for email addresses and phone numbers in the DKG.
  • Added support for > and < for date/time fields in DQL.
  • Querying on a DiffbotURI is now strict by default in DQL.
  • Added support for type:Post (discussions) to DQL.
  • Added contextual links to the documentation from the Crawlbot UI.

May 2019

by Jerome Choo
  • We addressed missing revenues for over 80Mil company entities in the Diffbot KnowledgeGraph (DKG).
  • Improved DKG entity postal code assignments.
  • Improved DKG entity Stock Exchange assignments.
  • We removed cookie disclaimer text from DKG entity descriptions.
  • We improved Organization entity classification in the DKG.
  • We added the ability to facet on Organization name tokens in DQL.
  • We expanded currency support in the Diffbot extraction APIs to include all European currencies, in addition to the euro.
  • We improved DQL error messages.
  • We lifted the limit on facet pagination.
  • Organization size attributes are now supported in facets.
  • We normalized Organization entity importance in the DKG to a score between 1 and 100.

April 2019

by Jerome Choo
  • Improved Organization data quality (e.g., sub-record linking of CEOs and Founders) in the Diffbot KnowledgeGraph (DKG).
  • Added a dedicated process to parse subsidiary entities in the DKG.
  • Added support for multiple Person/Organization descriptions in the DKG.
  • Fixed date/timestamp conversion bugs in DQL.
  • Optimized revenue.value and revenue.currency extractions for Organization profile data in the DKG.
  • Added support for pagination of facets in DQL.
  • Added support for querying by tags for type:Image in DQL.
  • Added facet count to the Diffbot KnowledgeGraph Search API response.

February 2019

by Jerome Choo
  • Extended Diffbot KnowledgeGraph coverage to entities located or residing in Asia.
  • Added support for the strict operator to DQL.

December 2018

by Jerome Choo
  • Improved date/time extraction and timezone support in the Diffbot extraction APIs.
  • Added support for the 'has:' operator to DQL for Articles and Products.

October 2018

by Jerome Choo
  • Added DQL support for type:Product has:breadcrumb.name.
  • Added support for computation of total investment when individual investments have different currencies (Organization Profile).
  • Added support for the SVG image file type for Entity images.
  • Added indexing of Entity description fields.
  • Improved tokenization for Chinese/Japanese tagging.
  • Added hit count for facets.

August 2018

by Jerome Choo
  • Launched the Diffbot Knowledge Graph including a new developer Dashboard, embedded ontology documentation, and an OpenAPI spec.