The Video API automatically extracts detailed video information, including most metadata, thumbnail images, the direct video URL, and embed code, from nearly any video page or video platform on the web.
Test drive Video API without a trial token at diffbot.com/testdrive.
Response
The Video API returns data in JSON format.
Each response includes a request object (which returns request-specific metadata) and an objects array, which includes the extracted information for all objects on a submitted page.
Objects in the Video API's objects array include the following fields:
Field | Description |
---|---|
type | Type of object (always video). |
pageUrl | URL of submitted page / page from which the video is extracted. |
resolvedPageUrl | Returned if the pageUrl redirects to another URL. |
title | Title of the video. |
text | Text description, if available, of the video. |
url | Direct link to source video file, if available. |
html | Embeddable HTML of the video (if available), typically an IFRAME or VIDEO object. |
embedUrl | Embeddable URL, if available. |
author | Video uploader or creator, if available. |
date | Date of extracted video, normalized in most cases to RFC 1123 (HTTP/1.1). |
duration | Duration of the video, in seconds. |
viewCount | Number of video views, if available on the page. |
naturalHeight | Raw video height, if available, in pixels. |
naturalWidth | Raw video width, if available, in pixels. |
images | Array of images, if present within the video. |
↳url | Fully resolved link to image. If the image SRC is encoded as base64 data, the complete data URI will be returned. |
↳title | Description or caption of the image. |
mime | MIME type, if available, as specified by the video's "Content-Type". |
humanLanguage | Returns the (spoken/human) language of the submitted page, using two-letter ISO 639-1 nomenclature. |
diffbotUri | Unique object ID. The diffbotUri is generated from the values of various Video fields and uniquely identifies the object. This can be used for deduplication. |
Optional fields, available using fields= argument | |
links | Returns a top-level object (links) containing all hyperlinks found on the page. |
meta | Returns a top-level object (meta) containing the full contents of page meta tags, including sub-arrays for OpenGraph tags, Twitter Card metadata, schema.org microdata, and, if available, oEmbed metadata. |
querystring | Returns any key/value pairs present in the URL querystring. Items without a discrete value will be returned as true. |
breadcrumb | Returns a top-level array (breadcrumb) of URLs and link text from page breadcrumbs. |
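Two of the fields above lend themselves to a short illustration: date (normalized in most cases to RFC 1123) and diffbotUri (usable for deduplication). The following is a minimal Python sketch, not an official client. The helper names `parse_video_date` and `dedupe_by_diffbot_uri` are hypothetical, and the sketch assumes the video objects have already been decoded from JSON into dictionaries.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_video_date(value: str) -> Optional[datetime]:
    """Parse an RFC 1123 date string; return None if it is not in that form.

    The table says dates are normalized "in most cases", so callers should
    expect an occasional value that does not parse.
    """
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None


def dedupe_by_diffbot_uri(videos: list) -> list:
    """Keep the first object seen for each diffbotUri, preserving order.

    Objects without a diffbotUri are always kept (hypothetical policy).
    """
    seen = set()
    unique = []
    for video in videos:
        uri = video.get("diffbotUri")
        if uri is not None and uri in seen:
            continue
        seen.add(uri)
        unique.append(video)
    return unique
```

For example, `parse_video_date("Sun, 24 May 2020 07:00:00 GMT")` yields a timezone-aware datetime in UTC, which is easier to compare and sort than the raw string.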
The following is an example response for a successfully extracted YouTube video.
{
  "request": {
    "pageUrl": "https://www.youtube.com/watch?v=hFZFjoX2cGg",
    "api": "video",
    "version": 3
  },
  "objects": [
    {
      "date": "Sun, 24 May 2020 07:00:00 GMT",
      "images": [
        {
          "diffbotUri": "image|3|231854607",
          "title": "Backyard Squirrel Maze 1.0- Ninja Warrior Course",
          "url": "https://i.ytimg.com/vi/hFZFjoX2cGg/maxresdefault.jpg",
          "primary": true
        }
      ],
      "author": "Mark Rober",
      "mime": "video/mp4",
      "naturalHeight": 720,
      "diffbotUri": "video|3|1870173316",
      "type": "video",
      "title": "Backyard Squirrel Maze 1.0- Ninja Warrior Course",
      "url": "https://rr6---sn-5uaeznks.googlevideo.com/videoplayback?expire=1649742262&ei=Vr1UYoLhFpDm8wTUt76wAQ&ip=23.229.39.25&id=o-AOUTPxsk0l8eAnvqC6G9PtYnkFK4S-lYoQ1G-mt8W40U&itag=399&aitags=133%2C134%2C135%2C136%2C137%2C160%2C242%2C243%2C244%2C247%2C248%2C278%2C394%2C395%2C396%2C397%2C398%2C399&source=youtube&requiressl=yes&mh=s4&mm=31%2C29&mn=sn-5uaeznks%2Csn-5ualdnl7&ms=au%2Crdu&mv=u&mvi=6&pl=23&spc=4ocVC0JfKoZvX0FSPklAcHZ7LBTF&vprv=1&mime=video%2Fmp4&ns=ZSXKPHKFPvaGHi2sm3qErYcG&gir=yes&clen=246339629&dur=1220.093&lmt=1637823031873234&mt=1649720229&fvip=3&keepalive=yes&fexp=24001373%2C24007246&c=WEB&txp=5531432&n=b2VCd9pxXsFyoV_m&sparams=expire%2Cei%2Cip%2Cid%2Caitags%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Cgir%2Cclen%2Cdur%2Clmt&sig=AOq0QJ8wRgIhAKiKoHk2DfvaKA3Adu0CkNNF_lOavT4Mk0CYkQvbG0wpAiEAgiAc4h_1GqjzjGXgdiKdgdWZXWM0M9ubY-ns1qhGyns%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl&lsig=AG3C_xAwRQIhAM5DsCeq2RzEblNQ3qRq-3xEydeAidszCKAg8PvahEK1AiAzkjRXQQmDItpKWMxLq1TTb_32NWjPxiTDIVeOrBvxLw%3D%3D",
      "naturalWidth": 1280,
      "duration": 1220,
      "provider": "YouTube",
      "humanLanguage": "en",
      "html": "<iframe src=\"https://www.youtube.com/embed/hFZFjoX2cGg\" frameborder=\"0\" allowfullscreen></iframe>",
      "pageUrl": "https://www.youtube.com/watch?v=hFZFjoX2cGg",
      "text": "Squirrels were stealing my bird seed so I solved the problem with mechanical engineering :)\n\nHere is an explanation of the illusion dish thing!- https://demos.smu.ca/index.php/demos/optics/69-mirage-mirror\n\nHere is a link to the illusion dish (not sponsored :) https://www.amazon.com/dp/B0718XCG7F/ref=cm_sw_em_r_mt_dp_U_D9PYEbC85X14F\n\n*MUSIC*\n0:02 - Arrow (Instrumental) - Andrew Applepie http://andrewapplepie.com/\n0:27 - Kalimba Jam - Blue Wednesday https://soundcloud.com/bluewednesday/\n3:21 - Zambo - Devil in Disguise https://danijel-zambo.bandcamp.com/track/devil-in-disguise-2 \n3:47 - Cereal Killa - Blue Wednesday https://soundcloud.com/bluewednesday/\n5:28 - J. Thompson - Real Quick Lovin' https://www.amazon.com/Real-Quick-Lovin/dp/B0010YGE3W\n5:39 - New Shoes - Blue Wednesday https://soundcloud.com/bluewednesday/\n7:51- Chi- Ponder- https://www.prodbyponder.com/5-free-beats32029775\n8:31 - Marimba Idea - Blue Wednesday https://soundcloud.com/bluewednesday/ \n9:25 - Josef Falkenskold - Tiny Tumble https://www.epidemicsound.com/artists/josef-falkenskold\n19:07 - Nik- Ponder- https://www.prodbyponder.com/5-free-beats32029775",
      "viewCount": 92199455
    }
  ]
}
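As an illustration of how a response like the one above might be consumed, here is a minimal Python sketch that walks the objects array and pulls out a few fields. The function name `summarize_videos` and the keys of the summary dictionaries are hypothetical, and `SAMPLE_RESPONSE` is a trimmed copy of the example with long values omitted. The `primary` flag on images appears in the example response but not in the field table, so the sketch treats it as optional.

```python
import json

# Trimmed copy of the example response (long values omitted for space).
SAMPLE_RESPONSE = """
{
  "request": {"pageUrl": "https://www.youtube.com/watch?v=hFZFjoX2cGg", "api": "video", "version": 3},
  "objects": [
    {
      "type": "video",
      "title": "Backyard Squirrel Maze 1.0- Ninja Warrior Course",
      "author": "Mark Rober",
      "duration": 1220,
      "viewCount": 92199455,
      "images": [
        {"url": "https://i.ytimg.com/vi/hFZFjoX2cGg/maxresdefault.jpg", "primary": true}
      ]
    }
  ]
}
"""


def summarize_videos(response_text: str) -> list:
    """Return one small summary dict per video object in a response body."""
    data = json.loads(response_text)
    summaries = []
    for obj in data.get("objects", []):
        if obj.get("type") != "video":
            continue
        images = obj.get("images", [])
        # Prefer the image flagged "primary"; otherwise fall back to the first.
        thumb = next(
            (img for img in images if img.get("primary")),
            images[0] if images else None,
        )
        summaries.append({
            "title": obj.get("title"),
            "author": obj.get("author"),
            # Most fields are documented as "if available", so use .get().
            "duration_seconds": obj.get("duration"),
            "view_count": obj.get("viewCount"),
            "thumbnail": thumb.get("url") if thumb else None,
        })
    return summaries


if __name__ == "__main__":
    for summary in summarize_videos(SAMPLE_RESPONSE):
        print(summary)
```

Reading every field with `.get()` reflects the "if available" wording in the field table: a missing field becomes `None` rather than raising a `KeyError`.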
Optional Fields
Video API may also return some optional fields if they are specified (comma-delimited) in the &fields= argument.
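A minimal Python sketch of building a request URL with optional fields follows. The endpoint URL and the `token` and `url` parameter names are not stated in this section and are assumptions here; only the comma-delimited `fields` argument comes from the text above. The function name `build_video_request_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: this endpoint and the "token"/"url" parameter names are not
# given in this section; check them against your account documentation.
VIDEO_API_ENDPOINT = "https://api.diffbot.com/v3/video"


def build_video_request_url(token: str, page_url: str, fields=()) -> str:
    """Build a GET URL, joining any optional field names with commas."""
    params = {"token": token, "url": page_url}
    if fields:
        params["fields"] = ",".join(fields)
    # urlencode percent-encodes the values, including the page URL itself.
    return VIDEO_API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_video_request_url(
        "YOUR_TOKEN",
        "https://www.youtube.com/watch?v=hFZFjoX2cGg",
        fields=("links", "meta"),
    ))
```

The comma between field names is percent-encoded as `%2C` by `urlencode`; this is ordinary query-string encoding and decodes back to `links,meta` on the receiving side.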
Already have the source HTML? POST it to Video API.
Video API supports a POST option that allows you to upload HTML or plain text for extraction. See Extract Content Not Available Online.
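The POST option could be sketched in Python roughly as follows. This section does not specify the request details, so the endpoint, the `token` and `url` query parameters, and the `Content-Type: text/html` header are all assumptions to be checked against the linked guide. The function name `build_post_request` is hypothetical. The sketch only constructs the request object and sends nothing until it is passed to `urllib.request.urlopen`.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: endpoint and parameter names are not given in this section.
VIDEO_API_ENDPOINT = "https://api.diffbot.com/v3/video"


def build_post_request(token: str, page_url: str, html: str) -> Request:
    """Build (but do not send) a POST request carrying the page's HTML."""
    query = urlencode({"token": token, "url": page_url})
    return Request(
        VIDEO_API_ENDPOINT + "?" + query,
        data=html.encode("utf-8"),
        # Assumption: the service reads the body type from Content-Type.
        headers={"Content-Type": "text/html"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_post_request(
        "YOUR_TOKEN",
        "https://example.com/video-page",
        "<html><body><video src='clip.mp4'></video></body></html>",
    )
    print(req.get_method(), req.full_url)
    # To send: urllib.request.urlopen(req), then decode the JSON response body.
```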