March 25th, 2016

by Jerome Choo

Updated Crawlbot seed behavior so that if a non-www subdomain is the only seed URL, crawling is restricted to that subdomain.
Significant updates to the beta normalizedSpecs field in the Product API. See the Product API documentation for details.
Added the field parentUrlDocId to Crawlbot and Bulk Processing JSON objects. This field can be used to match objects to URLs in the Crawlbot or Bulk Processing URL Report.
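The parentUrlDocId field lends itself to a simple join. The sketch below groups extracted objects by parentUrlDocId and attaches them to URL Report rows. It assumes each report row has already been loaded as a dict with hypothetical `url` and `docId` keys, so adjust those names to match your actual URL Report export.

```python
from collections import defaultdict

def match_objects_to_report(objects, report_rows, report_id_key="docId"):
    """Map each URL Report row's URL to the extracted objects that came from it.

    `report_id_key` and the row's "url" key are hypothetical column names;
    substitute whatever your URL Report export actually uses.
    """
    # Index extracted objects by the document ID of the URL they came from.
    by_doc_id = defaultdict(list)
    for obj in objects:
        doc_id = obj.get("parentUrlDocId")
        if doc_id is not None:
            by_doc_id[doc_id].append(obj)

    # .get() on the defaultdict avoids inserting empty entries for unmatched rows.
    return {
        row["url"]: by_doc_id.get(row[report_id_key], [])
        for row in report_rows
    }
```

Rows with no matching objects map to an empty list, which makes URLs that produced nothing easy to spot.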

March 10th, 2016

by Jerome Choo

Added the originalType field to extracted objects when using the Analyze API's fallback argument.
Fixed an issue with our Semantria integration that could lead to errant timeout responses.
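A minimal sketch of an Analyze API request that uses the fallback argument, plus a helper that picks out objects carrying originalType. The endpoint URL, the `fallback=article` value, and the `objects` response key are assumptions based on v3 API conventions; confirm them against the current Analyze API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed v3 Analyze endpoint; verify against the current Diffbot docs.
ANALYZE_ENDPOINT = "https://api.diffbot.com/v3/analyze"

def build_analyze_url(token: str, page_url: str, fallback: str = "article") -> str:
    """Build an Analyze request URL that falls back to another extraction API."""
    query = urlencode({"token": token, "url": page_url, "fallback": fallback})
    return f"{ANALYZE_ENDPOINT}?{query}"

def fallback_objects(response: dict) -> list:
    """Return extracted objects that include an originalType field.

    Assumes the response lists extracted objects under an "objects" key.
    """
    return [obj for obj in response.get("objects", []) if "originalType" in obj]
```

The request itself can then be sent with any HTTP client; only URL construction and response filtering are shown here.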

February 25th, 2016

by Jerome Choo

Fixed an issue in the Article API where inline JavaScript and CSS from unsupported video players could be returned in the html field.
Discussion API: Improved extraction from single-post (no reply) conversations.
Improvements to video extraction within the Video API.

January 29th, 2016

by Jerome Choo

Added beta fields quantityPrices, priceRange and multiplePrices to the Product API.
Improved availability detection and extraction in the Product API.
Improved offerPrice detection in the Product API to reduce the chance of returning an incorrect value for unavailable products or items without a visible price.

January 26th, 2016

by Jerome Choo

Significant speed improvements to the Global Index.

January 21st, 2016

by Jerome Choo

Released an official endpoint for Custom API management. Please see the documentation for information on programmatic management of custom rules and APIs.
Improved video extraction in the Article API to include new providers and HTML5 <video> elements.
max:date queries in the Search and Global Index APIs are now inclusive of the date specified.
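To illustrate what an inclusive max:date bound means, the sketch below compares a document timestamp against a maximum date at day granularity. It is not Diffbot's implementation: the function names are hypothetical, and the exclusive variant only shows one way documents dated on the specified day could be dropped.

```python
from datetime import date, datetime, time

def within_max_date(doc_timestamp: datetime, max_date: date) -> bool:
    """Inclusive bound: any timestamp falling on max_date itself passes."""
    return doc_timestamp.date() <= max_date

def within_max_date_exclusive(doc_timestamp: datetime, max_date: date) -> bool:
    """Exclusive bound, for contrast: only timestamps before midnight of max_date pass."""
    return doc_timestamp < datetime.combine(max_date, time.min)
```

With a cutoff of 2016-01-21, a document timestamped 2016-01-21 15:30 passes the inclusive check but fails the exclusive one.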

January 14th, 2016

by Jerome Choo

Improved specification extraction in the Product API.
Fixed an issue where the estimatedDate field (Article API) would sometimes not be correctly computed.

January 7th, 2016

by Jerome Choo

Fixed an issue where the <base> element could be incorrectly used to calculate relative paths.
Added initial functionality to categorize articles in the Article API based on article text content. If you would like to test this beta feature, contact us.
Improved handling of media sources without a specified protocol (e.g. src="//www.youtube.com..."). Media element URLs now match the protocol of the analyzed page.
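Both URL fixes above come down to relative-URL resolution. This sketch, built on Python's `urllib.parse`, resolves relative media paths against the document base (the `<base href>` when present, otherwise the page URL) and gives protocol-relative sources the analyzed page's scheme. The function names are hypothetical, and the sketch omits HTML details such as using only the first `<base>` element that has an href.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit

def document_base(page_url: str, base_href: Optional[str]) -> str:
    """A <base href> (itself resolved against the page URL) replaces the page URL as the base."""
    return urljoin(page_url, base_href) if base_href else page_url

def resolve_media_src(page_url: str, base_href: Optional[str], src: str) -> str:
    """Resolve a media src attribute to an absolute URL."""
    if src.startswith("//"):
        # Protocol-relative source: reuse the analyzed page's scheme.
        return f"{urlsplit(page_url).scheme}:{src}"
    # Ordinary relative path: resolve against the document base.
    return urljoin(document_base(page_url, base_href), src)
```

For example, `img/a.png` on `https://example.com/blog/post.html` resolves to `https://example.com/blog/img/a.png`, but to `https://example.com/assets/img/a.png` when the page declares `<base href="/assets/">`.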

December 21st, 2015

by Jerome Choo

Crawlbot and Bulk jobs pending deletion (per your Diffbot plan) are now identified in the Crawlbot and Bulk interfaces.
The API Toolkit now uses Diffbot's custom rendering engine for live web page previews. This should reduce inaccuracies when creating custom rules.

December 18th, 2015

by Jerome Choo

Fixed an issue where plain text POSTed to the Article API did not receive text analysis (tags, sentiment, language detection).
Improved Crawlbot behavior on Ajax-heavy sites so that pages with the exact same HTML source are no longer deduplicated.
Fixed an issue within the Crawlbot and Bulk interfaces where the "Last 500" URL Report was incorrectly returning the first 500.
Improved author detection within the Article API.