2018-02-27

by Jerome Choo
  • URL Report downloads are now sorted in newest-first order.
  • Crawlbot now indexes the seed URL of each extracted object in the fromSeedUrl field.
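
As a rough illustration of how the fromSeedUrl field might be used, the sketch below groups extracted objects by the seed URL they came from. The sample records, the pageUrl key, and the group_by_seed helper are hypothetical; only the fromSeedUrl field name comes from the entry above.

```python
from collections import defaultdict


def group_by_seed(objects):
    """Group extracted objects by their fromSeedUrl value (hypothetical helper)."""
    groups = defaultdict(list)
    for obj in objects:
        # Objects that lack the field are collected under the key None.
        groups[obj.get("fromSeedUrl")].append(obj)
    return dict(groups)


# Made-up sample records, for illustration only.
sample = [
    {"pageUrl": "https://example.com/a", "fromSeedUrl": "https://example.com"},
    {"pageUrl": "https://example.org/b", "fromSeedUrl": "https://example.org"},
    {"pageUrl": "https://example.com/c", "fromSeedUrl": "https://example.com"},
]

grouped = group_by_seed(sample)
for seed, items in grouped.items():
    print(seed, len(items))
```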

2018-01-05

by Jerome Choo
  • Crawlbot API: Added the useCanonical argument to allow disabling of canonical URL deduplication on specific crawls.
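
A minimal sketch of how a request that disables canonical deduplication might be assembled. The endpoint URL, the token, name, and seeds parameters, and the value 0 for useCanonical are assumptions made for illustration; only the useCanonical argument name comes from the entry above, so check the Crawlbot API documentation before relying on any of it. The code only builds the form body and sends nothing.

```python
from urllib.parse import urlencode

# Assumed endpoint, not confirmed by this changelog.
CRAWL_ENDPOINT = "https://api.diffbot.com/v3/crawl"


def build_crawl_params(token, name, seeds, use_canonical=True):
    """Return a URL-encoded parameter string for a crawl request (hypothetical helper)."""
    params = {
        "token": token,            # assumed parameter name
        "name": name,              # assumed parameter name
        "seeds": " ".join(seeds),  # assumed: space-separated seed URLs
    }
    if not use_canonical:
        # Assumption: a value of 0 disables canonical URL deduplication.
        params["useCanonical"] = 0
    return urlencode(params)


body = build_crawl_params(
    "YOUR_TOKEN", "example-crawl", ["https://example.com"], use_canonical=False
)
print(CRAWL_ENDPOINT)
print(body)
```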

2017-11-10

by Jerome Choo
  • Significant improvements to Video API site support.

2017-10-30

by Jerome Choo
  • Custom API fields using the attribute filter will now return all matching selector values, not just the first attribute match.

2017-10-25

by Jerome Choo
  • Crawlbot and Bulk Service data retrieval no longer requires access to port :18100. Data downloads are also now HTTPS-only.

2017-10-16

by Jerome Choo
  • Fixed a rare issue where custom rules could be accidentally deleted.
  • Significant performance improvements in the Search API.
  • Improved crawling performance and site coverage in the Global Index.
  • Improved ability to identify, analyze and return background images in all extraction APIs.

2017-08-31

by Jerome Choo
  • Fixed an issue in the Video API where the url value would retain HTML escaping if present within the original page source.
  • Fixed a rare crawling issue that occasionally resulted in "Bad IP" status messages for individual pages.
  • Fixed an issue where empty <video> elements could be returned in the Article API.
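
To make the first fix in this entry concrete, the snippet below shows what retained HTML escaping in a URL looks like and how it can be undone with Python's standard library. The URL is made up, and this illustrates the symptom only, not how the Video API handles it internally.

```python
import html

# Made-up URL as it might appear inside page source, with "&" escaped as "&amp;".
escaped_url = "https://example.com/watch?v=abc&amp;t=10"

# html.unescape converts HTML character references back to plain characters.
clean_url = html.unescape(escaped_url)

print(clean_url)
```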

2017-08-15

by Jerome Choo
  • Fixed an issue in the Global Index in which complicated Boolean (OR) queries would return no results.

2017-08-08

by Jerome Choo
  • Improved date normalization to include Hijri and Jalali dates.
  • Fixed support for Unicode characters in API Toolkit rules.

2017-05-22

by Jerome Choo
  • Many improvements to brand detection in the Product API.
  • Resolved an issue where humanLanguage could be misidentified on some Spanish-language pages.