February 27th, 2018

by Jerome Choo

URL Report downloads are now sorted newest-first.
Crawlbot now indexes the seed URL of each extracted object in the fromSeedUrl field.
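
A minimal Python sketch of one way the new fromSeedUrl field could be used: grouping a crawl's extracted objects by the seed they came from. It assumes each object has already been parsed from the crawl's JSON output into a dict. The pageUrl field and all URLs are illustrative, and group_by_seed is a hypothetical helper.

```python
from collections import defaultdict

def group_by_seed(objects):
    """Group extracted objects by the seed URL recorded in fromSeedUrl.

    Objects without a fromSeedUrl value (for example, from crawls run
    before the field existed) are collected under the key None.
    """
    groups = defaultdict(list)
    for obj in objects:
        groups[obj.get("fromSeedUrl")].append(obj)
    return dict(groups)

# Hypothetical records; only fromSeedUrl comes from the changelog entry.
records = [
    {"pageUrl": "https://example.com/a", "fromSeedUrl": "https://example.com/"},
    {"pageUrl": "https://example.org/x", "fromSeedUrl": "https://example.org/"},
    {"pageUrl": "https://example.com/b", "fromSeedUrl": "https://example.com/"},
]

for seed, items in group_by_seed(records).items():
    print(seed, len(items))
```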

January 5th, 2018

by Jerome Choo

Crawlbot API: Added the useCanonical argument, which allows canonical URL deduplication to be disabled on specific crawls.
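
A sketch of a crawl request that turns off canonical URL deduplication for a single crawl. The endpoint, the token, name and seeds parameters, and the value useCanonical=0 are assumptions based on typical Crawlbot usage and are not confirmed by this entry; check the Crawlbot API reference before relying on them. build_crawl_request is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the current Crawlbot API reference.
CRAWL_ENDPOINT = "https://api.diffbot.com/v3/crawl"

def build_crawl_request(token, name, seeds, use_canonical=True):
    """Return a Crawlbot request URL (hypothetical helper).

    When use_canonical is False, useCanonical=0 is added so that pages
    are not deduplicated by canonical URL for this crawl only.
    """
    params = {"token": token, "name": name, "seeds": " ".join(seeds)}
    if not use_canonical:
        params["useCanonical"] = 0  # assumed "disable" value
    return CRAWL_ENDPOINT + "?" + urlencode(params)

url = build_crawl_request(
    "YOUR_TOKEN", "my-crawl", ["https://example.com/"], use_canonical=False
)
print(url)
```

Omitting the argument leaves the default behavior (canonical deduplication on) untouched, so existing crawl setups do not change.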

November 10th, 2017

by Jerome Choo

October 30th, 2017

by Jerome Choo

Custom API fields that use the attribute filter now return the attribute values of all elements matching the selector, not just the first match.
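
To illustrate the difference, the sketch below uses Python's standard html.parser to collect every src attribute of matching img elements, then shows the single value the previous first-match behavior would have kept. It is not how the Custom API is implemented; AttributeCollector and attribute_values are hypothetical helpers.

```python
from html.parser import HTMLParser

class AttributeCollector(HTMLParser):
    """Collect every value of `attr` on elements named `tag`."""

    def __init__(self, tag, attr):
        super().__init__()
        self.tag, self.attr = tag, attr
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            for name, value in attrs:
                if name == self.attr and value is not None:
                    self.values.append(value)

def attribute_values(html, tag, attr):
    parser = AttributeCollector(tag, attr)
    parser.feed(html)
    parser.close()
    return parser.values

page = '<img src="/a.jpg"><p>text</p><img src="/b.jpg"><img alt="no src">'
all_matches = attribute_values(page, "img", "src")
first_match = all_matches[:1]  # what a first-match-only rule would keep
print(all_matches)  # ['/a.jpg', '/b.jpg']
print(first_match)  # ['/a.jpg']
```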

October 25th, 2017

by Jerome Choo

Crawlbot and Bulk Service data retrieval no longer requires access to port :18100. Data downloads are also now HTTPS-only.
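
If you stored download URLs from before this change, a sketch like the following could rewrite them to HTTPS and drop the :18100 port. The host and URL shape are hypothetical, and normalize_download_url is a hypothetical helper: it changes only the scheme and the legacy port, keeps the path and query, and drops any user:password part of the host.

```python
from urllib.parse import urlsplit, urlunsplit

LEGACY_PORT = 18100  # port that downloads no longer require

def normalize_download_url(url):
    """Rewrite a saved download URL to HTTPS without the legacy port."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    netloc = host
    # Keep any explicit port other than the legacy one.
    if parts.port is not None and parts.port != LEGACY_PORT:
        netloc = f"{host}:{parts.port}"
    return urlunsplit(("https", netloc, parts.path, parts.query, parts.fragment))

print(normalize_download_url("http://api.example.com:18100/data.json?format=csv"))
```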

October 16th, 2017

by Jerome Choo

Fixed a rare issue where custom rules could be accidentally deleted.
Significant performance improvements in the Search API.
Improved crawling performance and site coverage in the Global Index.
Improved ability to identify, analyze and return background images in all extraction APIs.

August 31st, 2017

by Jerome Choo

Fixed an issue in the Video API where the url value retained HTML escaping (such as &amp;) when it was present in the original page source.
Fixed a rare crawling issue that resulted in "Bad IP" status messages for individual pages.
Fixed an issue where empty <video> elements could be returned in the Article API.
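
For reference, HTML escaping in a URL appears as &amp; in place of &. If you kept Video API results from before this fix, Python's html.unescape can reverse it, as in this sketch; clean_video_url is a hypothetical helper. Apply it only to values you know were escaped: html.unescape also decodes some entities that lack a trailing semicolon, so an unescaped query such as ?a=1&copy=2 would be altered.

```python
import html

def clean_video_url(raw):
    """Undo HTML entity escaping (e.g. &amp;) left in a URL from page source."""
    return html.unescape(raw)

escaped = "https://video.example.com/play?id=42&amp;autoplay=1"
print(clean_video_url(escaped))  # https://video.example.com/play?id=42&autoplay=1
```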

August 15th, 2017

by Jerome Choo

Fixed an issue in the Global Index in which complicated Boolean (OR) queries would return no results.

August 8th, 2017

by Jerome Choo

May 22nd, 2017

by Jerome Choo

Many improvements to brand detection in the Product API.
Resolved an issue where humanLanguage could be misidentified on some Spanish-language pages.