[youtube-dl] Download files from youtube.com or other video platforms

... Is this some kind of browser timeout rather than anything else?

Quite possibly. The lighttpd server can time out a request if it takes too long to complete. For instance, an immediate transcoding operation can show some amount of progress and then refresh as if it had finished. This can happen even if the server is still being fed with data to send to the browser. The associated ffmpeg can carry on running and may, depending on the requested operation, complete in the background before the system gets powered off (or the heat death of the universe, for those who run 24/7).
 
Has anyone tried downloading a Channel 4 video recently?
I have tried going to the Ch4 web page (in this instance https://www.channel4.com/programmes/renovation-nation/on-demand/72643-020) and using that as the URL but it errors out as follows: -

15:09:56 - Starting immediate download of https://www.channel4.com/programmes/renovation-nation/on-demand/72643-020 Options --playlist-start 20
15:11:26 - [generic] 72643-020: Requesting header
15:11:28 - [generic] 72643-020: Downloading webpage
15:11:28 - [generic] 72643-020: Extracting information
15:11:49 - Caught error: WARNING: Falling back on generic information extractor. ERROR: Unsupported URL: https://www.channel4.com/programmes/renovation-nation/on-demand/72643-020

So I tried digging down in the HTML code for the direct video URL, but it looks like they use an HTML5 blob, as it wouldn't download that URL either.
blob:https://www.channel4.com/73584a79-60b3-4e9d-90fb-5dffa0271509

This gives the following error (after removing the blob: off the front): -
15:28:27 - Starting immediate download of https://www.channel4.com/73584a79-60b3-4e9d-90fb-5dffa0271509 Options
15:29:54 - [generic] 73584a79-60b3-4e9d-90fb-5dffa0271509: Requesting header
15:29:55 - [generic] 73584a79-60b3-4e9d-90fb-5dffa0271509: Downloading webpage
15:29:56 - Caught error: WARNING: Could not send HEAD request to https://www.channel4.com/73584a79-60b3-4e9d-90fb-5dffa0271509: HTTP Error 404: Not Found ERROR: Unable to download webpage: HTTP Error 404: Not Found (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

I also tried using the verbose flag producing ...
15:33:12 - Starting immediate download of https://www.channel4.com/73584a79-60b3-4e9d-90fb-5dffa0271509 Options --verbose
15:34:43 - [generic] 73584a79-60b3-4e9d-90fb-5dffa0271509: Requesting header
15:34:44 - [generic] 73584a79-60b3-4e9d-90fb-5dffa0271509: Downloading webpage
15:34:46 - Caught error: [debug] System config: [u'--restrict-filenames', u'--prefer-ffmpeg', u'-f', u'best[height<=?1080][fps<=?60]', u'-o', u'/mnt/hd2/My Video/%(title)s.%(ext)s']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'--newline', u'--verbose', u'https://www.channel4.com/73584a79-60b3-4e9d-90fb-5dffa0271509']
[debug] Encodings: locale ASCII, fs ASCII, out None, pref ASCII
[debug] youtube-dl version 2022.09.03
[debug] Python version 2.7.1 (CPython) - Linux-2.6.18-7.1-7405b0-smp-with-libc0
[debug] exe versions: ffmpeg 4.1, ffprobe 4.1
[debug] Proxy map: {}
WARNING: Could not send HEAD request to https://www.channel4.com/73584a79-60b3-4e9d-90fb-5dffa0271509: HTTP Error 404: Not Found
ERROR: Unable to download webpage: HTTP Error 404: Not Found (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/extractor/common.py", line 634, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/YoutubeDL.py", line 2300, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/mod/lib/python2.7/urllib2.py", line 398, in open
    response = meth(req, response)
  File "/mod/lib/python2.7/urllib2.py", line 511, in http_response
    'http', request, response, code, msg, hdrs)
  File "/mod/lib/python2.7/urllib2.py", line 436, in error
    return self._call_chain(*args)
  File "/mod/lib/python2.7/urllib2.py", line 370, in _call_chain
    result = func(*args)
  File "/mod/lib/python2.7/urllib2.py", line 519, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
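For context, a blob: URL is an object URL created by script inside the browser. It names data held in that browser session rather than a file on the server, which is consistent with the 404 once the prefix is stripped. A minimal sketch of spotting such a URL before handing it to a downloader (the helper name is made up for illustration and is not part of youtube-dl):

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2.7

def is_blob_url(url):
    """Return True if the URL uses the browser-only blob: scheme."""
    return urlsplit(url).scheme == "blob"

# The URL copied from the page source in the post above.
print(is_blob_url("blob:https://www.channel4.com/73584a79-60b3-4e9d-90fb-5dffa0271509"))
```

A URL that passes this check cannot be fetched from outside the browser, with or without the prefix.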
 
C4 and C5 use DRM and are not supported by youtube-dl. I think the same applies to UKTV (Dave, Yesterday, Drama, etc) and Sky (Pick, Challenge, etc).

If you were able to download a video it would not be watchable, but would be very useful for filling up a disk (and not much else). Of course you could take the view that eventually a magic spell will be found that turns your pseudo-random data into a viewable file.

Depending on which content is involved and what changes were made to the streaming site that day, Discovery (Quest, HGTV, etc) might be accessible.

See also this page of relevant earlier posts.
 
ITVX launched yesterday and, at least for now, ITV shows are not available either.

Existing Hub URLs redirect to the ITVX /watch/ page which has a different structure.

If the same show was shown in Scotland, the equivalent STV page should still be handled, as STV uses a different video platform.
 
Just tried a download from iplayer, which was working ok about a month ago, but got this error:
Code:
00:56:25 - Caught error: Traceback (most recent call last):
  File "/mod/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/mod/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/mnt/hd2/mod/lib/python2.7/dist-packages/youtube-dl/__main__.py", line 16, in <module>
    import youtube_dl
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/__init__.py", line 15, in <module>
    from .options import (
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/options.py", line 8, in <module>
    from .downloader.external import list_external_downloaders
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/downloader/__init__.py", line 18, in <module>
    from .hls import HlsFD
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/downloader/hls.py", line 12, in <module>
    from .external import FFmpegFD
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/downloader/external.py", line 14, in <module>
    from ..postprocessor.ffmpeg import FFmpegPostProcessor, EXT_TO_OUT_FORMATS
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/postprocessor/__init__.py", line 3, in <module>
    from .embedthumbnail import EmbedThumbnailPP
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/postprocessor/embedthumbnail.py", line 8, in <module>
    from .ffmpeg import FFmpegPostProcessor
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/postprocessor/ffmpeg.py", line 12, in <module>
    from ..utils import (
ImportError: cannot import name process_communicate_or_kill
All my packages are showing as up to date.
 
Somehow the utils.py in your installation is out of date.

At a gross level, you could force-reinstall the youtube-dl package (v.2022.09.03-2) using the WebIf Diagnostics page or the opkg command in a shell session.

Or check that you have this
Code:
# ls -l /mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/utils.py*
-rw------- 1 root root 173371 Dec 31  1999 /mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/utils.py
-rw------- 1 root root 201711 Oct 14 14:04 /mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/utils.pyc
#
where the date of the .pyc file is some date later than the installation. I suppose there's a reason why the .py files get timestamped at the end of the last millennium rather than the date of the release.

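What I suspect is that somehow you have an old utils.pyc (Python byte code) file. When importing a module, Python 2 uses module.pyc if the source modification time recorded inside it matches that of module.py; otherwise it recompiles module.py and uses that, unless compilation is disabled. If the .py timestamp never changes between releases, a .pyc left over from an older version could still pass that check.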
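A minimal sketch of that timestamp check, assuming the Python 2.7 .pyc layout (a 4-byte magic number followed by the source mtime as a 4-byte little-endian integer). The function name is made up, and the sketch deliberately skips the magic-number test that the real importer also performs:

```python
import os
import struct

def pyc_matches_source(py_path, pyc_path):
    """True if the mtime recorded in a Python 2.7 .pyc equals the .py file's mtime."""
    with open(pyc_path, "rb") as f:
        header = f.read(8)  # 4 bytes magic + 4 bytes recorded source mtime
    if len(header) < 8:
        return False  # truncated file, cannot be a valid .pyc
    recorded = struct.unpack("<I", header[4:8])[0]
    actual = int(os.stat(py_path).st_mtime) & 0xFFFFFFFF
    return recorded == actual
```

Note that the check looks only at the recorded timestamp, not at the file contents, so it cannot tell two different utils.py versions apart if both carry the same modification time.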
 
Thanks /df, that's sorted it.
My utils.py matched your example but my utils.pyc was "199719 Apr 27 2022", a different size to yours so clearly a different version. Not sure how that came about. I renamed it to force the use of the .py one and my download is now downloading away happily, and a new utils.pyc has been created, size matching yours, as would be expected.
 
I suppose there's a reason why the .py files get timestamped at the end of the last millennium rather than the date of the release.
It's what the real release did, so I copied it (having patched the relevant files). I guess the moral of that story is don't copy other people's bizarre practices when they seem stupid. Especially as there is no real current release and our package is now just being generated via a GitHub download with contemporaneous timestamps.
 
Exactly the sort of reason why there's no current release ...

The Makefile code from the last release does this: touch -t 200001010101 zip/youtube_dl/*.py

That syntax is sensitive to TZ. The actual resulting modification time is 1999-12-31 18:01:00.000000000 +0000.
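The arithmetic can be checked with a short sketch. It assumes the release was built in a UTC+7 zone, which is only inferred from the seven-hour difference above, and it uses Python 3 on a desktop machine rather than the box's Python 2.7. The function name is made up:

```python
from datetime import datetime, timedelta, timezone

def touch_stamp_to_utc(stamp, utc_offset_hours):
    """Read a `touch -t` CCYYMMDDhhmm stamp as local time at the given offset; return UTC."""
    local = datetime.strptime(stamp, "%Y%m%d%H%M").replace(
        tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc)

fmt = "%Y-%m-%d %H:%M:%S %z"
print(touch_stamp_to_utc("200001010101", 7).strftime(fmt))  # 1999-12-31 18:01:00 +0000
print(touch_stamp_to_utc("200001010101", 0).strftime(fmt))  # 2000-01-01 01:01:00 +0000
```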

So first, why doesn't it use UTC like any sensible person would for a globally significant time?

Then, why doesn't it extract the version from youtube_dl/version.py with sed and use that instead?
Code:
TZ=UTC touch -t "$(sed -rn -e "/^__version__[[:space:]]/{s/^.+[= ]'//;s/[.']//g;p;q}" youtube_dl/version.py)0101" ...
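The same idea as a Python sketch, for anyone who prefers not to decode the sed expression. It assumes version.py holds a line of the form __version__ = 'YYYY.MM.DD'; the function name is made up and this is not what the Makefile does:

```python
import calendar
import re

def version_to_epoch(version_py_text):
    """Map __version__ = 'YYYY.MM.DD' to the epoch time of YYYY-MM-DD 01:01 UTC."""
    m = re.search(r"^__version__\s*=\s*'(\d{4})\.(\d{2})\.(\d{2})",
                  version_py_text, re.M)
    if not m:
        raise ValueError("no __version__ line found")
    year, month, day = (int(g) for g in m.groups())
    # timegm interprets the tuple as UTC, so the result does not depend on TZ.
    return calendar.timegm((year, month, day, 1, 1, 0, 0, 0, 0))
```

The result could then be applied to each source file with os.utime(path, (epoch, epoch)), giving every release a distinct, TZ-independent timestamp.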
 
May I ask, should youtube-dl work with ITVX? I have the latest version and just use the GUI to start a download, with no options set. It sits at “Requesting header” on all my tests.
Thank you.
 
Try replacing extractor/itv.py with this one. It may require a recent (say, the latest) GitHub master version, since it uses some utility functions only recently pulled from the yt-dlp downstream project.
 
I packaged it and tried it, but it doesn't work:
Code:
humax# youtube -v https://www.itv.com/watch/news/the-latest-itv-news-headlines-as-football-great-pele-dies-aged-82/6js5d0f
[debug] System config: [u'--restrict-filenames', u'--prefer-ffmpeg', u'-f', u'best[height<=?1080][fps<=?60]', u'-o', u'/mnt/hd2/My Video/%(title)s.%(ext)s']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://www.itv.com/watch/news/the-latest-itv-news-headlines-as-football-great-pele-dies-aged-82/6js5d0f']
[debug] Encodings: locale ASCII, fs ASCII, out ASCII, pref ASCII
[debug] youtube-dl version 2022.11.13
[debug] Python version 2.7.1 (CPython) - Linux-2.6.18-7.1-7405b0-smp-with-libc0
[debug] exe versions: ffmpeg 4.1, ffprobe 4.1
[debug] Proxy map: {}
[generic] 6js5d0f: Requesting header
WARNING: Could not send HEAD request to https://www.itv.com/watch/news/the-latest-itv-news-headlines-as-football-great-pele-dies-aged-82/6js5d0f: <urlopen error The read operation timed out>
[generic] 6js5d0f: Downloading webpage
ERROR: Unable to download webpage: <urlopen error The read operation timed out> (caused by URLError(SSLError('The read operation timed out',),))
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/extractor/common.py", line 635, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/YoutubeDL.py", line 2300, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/mod/lib/python2.7/urllib2.py", line 392, in open
    response = self._open(req, data)
  File "/mod/lib/python2.7/urllib2.py", line 410, in _open
    '_open', req)
  File "/mod/lib/python2.7/urllib2.py", line 370, in _call_chain
    result = func(*args)
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/utils.py", line 2789, in https_open
    req, **kwargs)
  File "/mod/lib/python2.7/urllib2.py", line 1161, in do_open
    raise URLError(err)
 
Thanks, that's a pattern that hasn't previously been used (having "slug" text in the penultimate path component instead of a series ID). However, after fixing that and various other issues to get reasonable results for the news page, I find that the Vera test page is now giving us media links that 403. So, more work needed.
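To illustrate the URL shape in question, here is a hypothetical pattern that takes the last path component as the ID and tolerates one or more slug components before it. It is a sketch only and is not the regex used in the attached itv.py:

```python
import re

# Hypothetical pattern: /watch/, one or more slug components, then the ID.
ITVX_WATCH_URL = re.compile(
    r"https?://(?:www\.)?itv\.com/watch/(?:[\w-]+/)+(?P<id>\w+)/?(?:[?#]|$)")

def extract_itvx_id(url):
    """Return the final path component of an ITVX /watch/ URL, or None."""
    m = ITVX_WATCH_URL.match(url)
    return m.group("id") if m else None

print(extract_itvx_id(
    "https://www.itv.com/watch/news/"
    "the-latest-itv-news-headlines-as-football-great-pele-dies-aged-82/6js5d0f"))
```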
 
This is now passing tests for me, but without any explanation of why the 403s started and stopped.
 

Attachments: itv.py.zip (3.7 KB)
I just replaced the itv.py file, everything else the same - different result, but still no go:
Code:
...
[debug] Proxy map: {}
[ITV] 6js5d0f: Downloading webpage
[ITV] 6js5d0f: Downloading JSON metadata
ERROR: Unable to download JSON metadata: <urlopen error [Errno 1] _ssl.c:499: error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol> (caused by URLError(SSLError(1, '_ssl.c:499: error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol'),))
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/extractor/common.py", line 635, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/YoutubeDL.py", line 2300, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/mod/lib/python2.7/urllib2.py", line 392, in open
    response = self._open(req, data)
  File "/mod/lib/python2.7/urllib2.py", line 410, in _open
    '_open', req)
  File "/mod/lib/python2.7/urllib2.py", line 370, in _call_chain
    result = func(*args)
  File "/mod/lib/python2.7/dist-packages/youtube-dl/youtube_dl/utils.py", line 2789, in https_open
    req, **kwargs)
  File "/mod/lib/python2.7/urllib2.py", line 1161, in do_open
    raise URLError(err)
 
Without going into detail tonight, it looks like the FakeHTTP patch that re-implements yt-dl's web requests using wget (for more modern SSL/TLS support) isn't being included.

1. Add the module extractor/fakehttp.py (dependency on wget) from one of the earlier versions if not present.
2. Edit the extractor/itv.py file:
Code:
-from .common import InfoExtractor
+from .fakehttp import FakeHTTP as InfoExtractor
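For readers unfamiliar with the approach, the general idea of delegating a web request to wget can be sketched as follows. This is not the actual fakehttp.py, whose contents are not shown in this thread; both function names are made up, and the sketch assumes a wget on the PATH that accepts the -q, -T and -O options:

```python
import subprocess

def build_wget_command(url, timeout=30):
    """Build a wget command line that writes the response body to stdout."""
    return ["wget", "-q", "-T", str(timeout), "-O", "-", url]

def fetch_with_wget(url, timeout=30):
    """Fetch url via the system wget, so TLS is handled by wget rather than Python."""
    return subprocess.check_output(build_wget_command(url, timeout))
```

The point of the design is that the box's old Python SSL module never touches the connection, so the TLS support is whatever the installed wget provides.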
 