When exporting data from collections via DQL, you have always had the option of specifying ONLY the fields you want returned in the JSON output by using the '&filter=' param, for example by adding &filter=id%20name%20homepageUri to the query URL: [ https://kg.diffbot.com/kg/v3/dql?type=query&token=TOKEN&query=type%3AOrganization+types%3A%22Company%22&size=25&filter=id%20name%20homepageUri ]. But this approach can be unwieldy if you have a long list of attributes to include, or if you only want to exclude a few attributes per entity in the output.

Now you can exclude the fields you do not want returned when exporting data by adding the '&filterExclude=' param, for example &filterExclude=subsidiaries%20technologies%20customers, as in [ https://kg.diffbot.com/kg/v3/dql?type=query&token=TOKEN&query=type%3AOrganization+types%3A%22Company%22&size=25&filterExclude=subsidiaries%20technologies%20customers ].
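The difference between the two params can be sketched as a small Python URL builder. This is a minimal sketch that assumes only the endpoint and parameter names shown in the example URLs above. The build_dql_url helper is hypothetical, and TOKEN is a placeholder for your own token. Field names are joined with spaces and encoded as %20, as in the examples.

```python
from urllib.parse import urlencode, quote

# Endpoint taken from the example URLs in this changelog.
DQL_ENDPOINT = "https://kg.diffbot.com/kg/v3/dql"


def build_dql_url(token, query, size=25, include=None, exclude=None):
    """Hypothetical helper that builds a DQL export URL.

    include -> '&filter=' (return ONLY these fields)
    exclude -> '&filterExclude=' (return all fields except these)
    """
    params = {"type": "query", "token": token, "query": query, "size": size}
    if include:
        params["filter"] = " ".join(include)
    if exclude:
        params["filterExclude"] = " ".join(exclude)
    # quote (instead of the default quote_plus) encodes spaces as %20,
    # matching the example URLs.
    return DQL_ENDPOINT + "?" + urlencode(params, quote_via=quote)


query = 'type:Organization types:"Company"'
print(build_dql_url("TOKEN", query, include=["id", "name", "homepageUri"]))
print(build_dql_url("TOKEN", query, exclude=["subsidiaries", "technologies", "customers"]))
```

Use include when you only want a handful of fields. Use exclude when you want almost everything except a few large attributes.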

Invite a User

by Kris Negulescu

We have updated the dashboard to better support managing your team. You can now invite a user to share your primary (parent) token so they can manage shared bulk extraction tasks, bulk enrichment tasks, and crawl jobs. Alternatively, designate a child token for that user so they can access Diffbot services independently using your account budget. Check out the new features in your Dashboard UI here: https://app.diffbot.com/diffbot-users/invite/.

Diffbot GraphRAG LLM

by Kris Negulescu

Recently, large language models (LLMs) have been trained on more and more data, driving up both their parameter counts and the computing power they need. But what if, instead of feeding the model more data, we deliberately trained it to rely less on its pretraining data and more on its ability to find external knowledge?

To test this idea, we fine-tuned Llama 3.3 70B to be an expert tool user of a real-time Knowledge Graph API, providing the first open-source implementation of a GraphRAG system that outperforms Google Gemini and ChatGPT. To learn more, see: https://github.com/diffbot/diffbot-llm-inference/.

type:CompanyReport

by Kris Negulescu

Company Reports are now available in the Knowledge Graph as type:CompanyReport, and in company profiles in LeadGraph. Two primary types of reports are available:

  • SEC Filings
  • Documents found on a company’s website, like earnings call transcripts, annual reports, ESG reports, etc.

Initial coverage focuses on the top 1,000 publicly traded companies in the United States.
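Because CompanyReport is exposed as a Knowledge Graph type, it should be queryable through DQL the same way as the Organization examples earlier in this changelog. This is a minimal sketch that assumes the same DQL endpoint and parameters. The company_report_query_url helper is hypothetical, and TOKEN is a placeholder for your own token.

```python
from urllib.parse import urlencode, quote

# Assumption: CompanyReport uses the same DQL endpoint and params as other types.
DQL_ENDPOINT = "https://kg.diffbot.com/kg/v3/dql"


def company_report_query_url(token, size=10):
    """Hypothetical helper: a DQL query URL for type:CompanyReport entities."""
    params = {
        "type": "query",
        "token": token,
        "query": "type:CompanyReport",
        "size": size,
    }
    # quote encodes ':' as %3A, matching the style of the DQL example URLs.
    return DQL_ENDPOINT + "?" + urlencode(params, quote_via=quote)


print(company_report_query_url("TOKEN"))
```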

Please note: there will be a scheduled maintenance window, including approximately 50 minutes of downtime, on Thursday, November 14th, from 10 am to 2 pm PST.

The Diffbot DevOps team will be using this time to upgrade some of the underlying infrastructure supporting global crawls. To ensure the rapid restoration of uptime and stability of the platform, we are performing this upgrade during ordinary business hours.

During this maintenance window, all updates to the Diffbot Knowledge Graph will be paused. Access to Organization and Person data will continue. Access to Article data will be limited: you will not be able to download new article data from the graph for your sources, nor will you be able to access article data crawled more than 5 months ago. All other graph data types, including Products & Events, will be inaccessible.

Other Diffbot services will remain operational, including Crawlbot, all Extraction APIs, the Bulk Extract API, the Natural Language API, and the Enhance APIs.

EventAPI Enhancements

by Kris Negulescu

We have:

  • Added categories for events.
  • Improved rule handling for event location, date, title, and description.
  • Improved extraction of titles, images, start and end dates, and timezones.
  • Added support for extracting locations from maps as well as from text.

News on LeadGraph is a new feature that allows anyone to monitor breaking news for key risk and opportunity signals.

Monitor key business events such as:

  • New Products
  • Partnerships
  • Mergers & Acquisitions
  • Executive Hires
  • Funding
  • Private Equity
  • Layoffs

For more personalized monitoring, you can also curate keyword and company lists. Unlike the Diffbot APIs, LeadGraph is currently available by trial only. Reach out to us for access.

In July, we improved the normalization of HTML5 videos and enhanced support for video controls and for audio elements in podcasts.