We are at risk of getting blocked from this website, so we've temporarily limited access.
{
"errorCode": 500,
"error": "Site has received too many requests. Please try again later."
}
This error appears when too many Diffbot users are requesting pages from the same website at once. That volume puts us at risk of being permanently blocked, so we deliberately slow down requests to that site for everyone.
It's not ideal, but it's better than being blacklisted entirely. This error is most common with well-known sites such as Walmart, Amazon, and BestBuy. These websites are generally very aggressive about blocking automated visitors like Diffbot, so we need to be extra careful about how often we access them.
There are a few ways to get around this. The easiest is to wait out the hold: give it a few minutes and try again. If that doesn't work, try one of the solutions below.
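If you call Extract from a script, you can automate the "wait and try again" step. Below is a minimal sketch using only the Python standard library. It assumes the rate-limit response carries the JSON payload shown above (`errorCode` 500 with a "too many requests" message). It also assumes the body can be read even when the HTTP status is an error. The function names and the 60-second starting delay are illustrative choices, not part of Diffbot's API.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def is_rate_limited(body: dict) -> bool:
    """Return True if the body matches the rate-limit payload (assumed shape)."""
    return (body.get("errorCode") == 500
            and "too many requests" in str(body.get("error", "")).lower())


def http_get_json(url: str) -> dict:
    """GET a URL and parse its JSON body, including error responses."""
    try:
        with urllib.request.urlopen(url, timeout=60) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumption: error responses also contain a JSON body like the one above.
        return json.load(err)


def fetch_with_backoff(url: str,
                       fetch: Callable[[str], dict] = http_get_json,
                       sleep: Callable[[float], None] = time.sleep,
                       attempts: int = 4,
                       base_delay: float = 60.0) -> dict:
    """Retry while rate-limited, doubling the wait after each failed attempt."""
    body: dict = {}
    for attempt in range(attempts):
        body = fetch(url)
        if not is_rate_limited(body):
            return body
        if attempt < attempts - 1:
            # Waits 60 s, 120 s, 240 s, ... with the default settings.
            sleep(base_delay * 2 ** attempt)
    return body  # still rate-limited after the final attempt
```

With the defaults, a script waits about seven minutes in total before giving up. That roughly matches the "give it a few minutes" advice above. Doubling the delay keeps your retries from adding to the traffic that triggered the hold in the first place.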
Solution #1: Use a whitelisted user agent
If you have a friendly relationship with the website you're trying to extract from, we recommend reaching out to their web team and asking for a whitelisted user agent. Once that user agent is added to Custom Extract API, your access will be limited only by whatever terms your partnership with the website sets.
Solution #2: Extract from HTML directly
Another way to bypass this error is to retrieve the website's HTML yourself and send that HTML to Extract API for processing. Diffbot then never needs to access the website directly.
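Here is a minimal sketch of preparing such a request with the Python standard library. It assumes the Extract endpoint accepts a POST whose body is the raw HTML with `Content-Type: text/html`. It also assumes you still pass the page's original address as the `url` parameter. Check the Extract API reference for the exact requirements before relying on this. The `/v3/article` endpoint matches the example URL later in this article. The function name is hypothetical.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.diffbot.com/v3/article"


def build_html_extract_request(token: str, page_url: str, html: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST that hands HTML you fetched yourself to Extract.

    Assumption: Extract accepts a text/html POST body and still expects the
    original page URL in the `url` query parameter.
    """
    query = urllib.parse.urlencode({"token": token, "url": page_url})
    return urllib.request.Request(
        f"{API_BASE}?{query}",
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html"},
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen(req)` and parse the JSON response. How you obtain the HTML in the first place is up to you. It could come from a browser session, a partner feed, or a proxy.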
Solution #3: Use a proxy service
One last option is to use a proxy service like Bright Data. Proxies are servers purpose-built for accessing websites, and they may have features that help you work around access limitations. You can then either forward your proxy credentials to Extract so it fetches pages through your proxy on your behalf, or follow Solution #2 above.
To forward your proxy credentials to Extract, pass the proxy IP address and proxyAuth credentials as parameters in your Extract call, like so:
https://api.diffbot.com/v3/article?token=TOKEN&proxy=0.0.0.0&proxyAuth=username:password&url=https%3A%2F%2Fdocs.diffbot.com%2Fdocs
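If you build this URL in code, let a library handle the percent-encoding. Otherwise a password containing characters such as `&` or `#` would break the query string. The sketch below uses only the `token`, `proxy`, `proxyAuth`, and `url` parameters from the example above. The function name and its arguments are illustrative.

```python
from urllib.parse import urlencode


def build_proxied_extract_url(token: str, page_url: str, proxy_ip: str,
                              username: str, password: str,
                              endpoint: str = "https://api.diffbot.com/v3/article") -> str:
    """Build an Extract call that routes the fetch through your own proxy."""
    params = {
        "token": token,
        "proxy": proxy_ip,
        "proxyAuth": f"{username}:{password}",
        "url": page_url,
    }
    # urlencode percent-encodes reserved characters (":" becomes %3A, "/" becomes %2F),
    # so the credentials and the target URL each stay a single query value.
    return f"{endpoint}?{urlencode(params)}"


print(build_proxied_extract_url("TOKEN", "https://docs.diffbot.com/docs",
                                "0.0.0.0", "username", "password"))
# https://api.diffbot.com/v3/article?token=TOKEN&proxy=0.0.0.0&proxyAuth=username%3Apassword&url=https%3A%2F%2Fdocs.diffbot.com%2Fdocs
```

The only difference from the hand-written URL above is that the colon in `proxyAuth` is encoded as `%3A`. It decodes back to the same `username:password` value.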
Still Not Working?
Let us help! Share your troubleshooting attempts with us at [email protected].