... when downloading from iplayer what option(s) do I need to use to get subtitles please? (I've looked in the wiki and noted the comments a page or two back but am still unclear). ...
You are right to be unclear, because it's not trivial. Actually, people and gnus everywhere are baffled by the procedure.
The following needs to be done (other combinations may be possible, as in the link quoted above):
- use the yt-dl options --write-sub (to force the subtitle file to be written) and --convert-subs srt (to get the file in a format that the Humax player has a chance of understanding); you might also try --sub-format srt/best to get srt format subtitles if they are available;
- edit the resulting UTF-8 .srt file and save it in olde worlde ANSI text format, making the new subtitle file name match the media file name up to the extension (which means removing the .en-GB); the libiconv package includes the iconv command that can do the re-encoding;
- strip out HTML-style tags that are left over from the ttml->srt conversion, which can be done with an editor that supports regular expressions, or on-box with sed.
The character encodings supported by the Humax player (what the manual calls subtitle languages) are displayed as you press the Sub button with a valid .srt file in place:
Latin 1
Latin 2 (Central/Eastern Europe)
Latin 4 (Northern Europe/Baltic)
Cyrillic
Greek
Turkish
It has been suggested that UTF-7 may be an acceptable encoding, but it's not one of the above. The level of encoding is an option in UTF-7: characters like <> can be sent as-is or encoded using an escape sequence beginning with +; in the latter case the Humax player won't recognise a Latin 1 .srt.
We'll be going with Latin 1 (ISO-8859-1), which is near enough what Windows calls ANSI.
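As a rough illustration of what the re-encoding does (a sketch only; to_latin1 is a name made up here, and it assumes an iconv that knows the LATIN1 alias): an accented character that takes two bytes in UTF-8 becomes a single byte in Latin 1. Characters with no Latin 1 equivalent, such as the musical notes that turn up around song lyrics, may make iconv complain, so check how your build treats them rather than assume.

```shell
#!/bin/sh
# to_latin1 is a hypothetical helper: re-encode stdin from UTF-8 to Latin 1
to_latin1() {
    iconv -f UTF-8 -t LATIN1
}

# "café" written with octal escapes so that this script is itself pure ASCII
printf 'caf\303\251' | wc -c              # counts 5 bytes as UTF-8
printf 'caf\303\251' | to_latin1 | wc -c  # counts 4 bytes as Latin 1
```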
Code:
#!/bin/sh
# Usage: fixsttl media_file
# for a media file, convert its .locale.srt, if any, to plain text .srt
# the locale can be overridden: STTL_LANG=da-DK, etc
STTL_LANG=${STTL_LANG:-en-GB}
main() {
    local ext froot srt
    [ -n "$1" ] || return
    # any other extensions?
    for ext in mp4 mpg mkv; do
        froot=${1%.$ext}
        [ "$1" != "$froot" ] && break
    done
    # give up unless one of the extensions matched
    [ "$1" != "$froot" ] || return
    srt=${froot}.${STTL_LANG}.srt
    [ -r "$srt" ] || return
    # *.en-GB.srt -> *.srt
    iconv -f UTF-8 -t LATIN1 "$srt" |
        # strip <tags> and </tags>
        sed -r -e 's@<[/a-zA-Z]+( [^>]*)?>@@g' > "${froot}.srt" &&
        { rm -f -- "$srt"; return 0; }
}
main "$@"
Updated: preferred pattern is test_that_must_be_true || return, so if the test fails the return value is false by default.
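To see why that pattern behaves as described, here is a minimal sketch (need_arg is a made-up function, used only for illustration). A bare return passes on the status of the last command run, which in this pattern is the failed test. By contrast, something like [ -z "$1" ] && return would leave the function returning true on the early exit, because the last command run was then a test that succeeded.

```shell
#!/bin/sh
# need_arg is a hypothetical function, used only to show the pattern
need_arg() {
    # a bare "return" passes on the status of the last command run,
    # which here is the failed test
    [ -n "$1" ] || return
    echo "got: $1"
}

need_arg ""  || echo "no argument: status was non-zero"
need_arg foo && echo "argument given: status was zero"
```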
Ob. instructions: use the File Editor in WebIf>Diagnostics, or your preferred editor in a telnet or WebShell session, to create /mod/bin/fixsttl, make it executable, and add --exec "fixsttl {}" to your yt-dl or qtube options. You must also have installed libiconv.
Example:
Code:
# youtube --write-sub --sub-format srt/best --convert-subs srt --exec "fixsttl {}" 'https://www.bbc.co.uk/iplayer/episode/m000n7g4/top-of-the-pops-04011990'
[bbc.co.uk] m000n7g4: Downloading video page
[bbc.co.uk] m000n7g4: Downloading playlist JSON
[bbc.co.uk] m000n7g3: Downloading media selection XML
[bbc.co.uk] m000n7g3: Downloading captions
...
[bbc.co.uk] m000n7g3: Downloading m3u8 information
...
[bbc.co.uk] m000n7g3: Downloading MPD manifest
...
[bbc.co.uk] m000n7g3: Downloading m3u8 information
...
[bbc.co.uk] m000n7g3: Downloading MPD manifest
...
[bbc.co.uk] m000n7g3: Downloading m3u8 information
[bbc.co.uk] m000n7g3: Downloading m3u8 information
...
[bbc.co.uk] m000n7g3: Downloading media selection XML
[bbc.co.uk] m000n7g3: Downloading captions
...
[bbc.co.uk] m000n7g3: Downloading MPD manifest
...
[info] Writing video subtitles to: /media/drive1/Video/Top_of_the_Pops_04_01_1990.en-GB.ttml
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 225
[download] Destination: /media/drive1/Video/Top_of_the_Pops_04_01_1990.mp4
[download] 100% of 367.73MiB in 06:48
[ffmpeg] Fixing malformed AAC bitstream in "/media/drive1/Video/Top_of_the_Pops_04_01_1990.mp4"
[ffmpeg] Converting subtitles
WARNING: You have requested to convert dfxp (TTML) subtitles into another format, which results in style information loss
Deleting original file /media/drive1/Video/Top_of_the_Pops_04_01_1990.en-GB.ttml (pass -k to keep)
[exec] Executing command: fixsttl /media/drive1/Video/Top_of_the_Pops_04_01_1990.mp4
#
And see the dreadfully grainy image.
Now what we need is for the BBC to show some programme about web page coding to see what happens to a line like "You shouldn't use the <marquee> tag but you can create a scrolling text effect with CSS Level 3 instead".
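No need to wait for the BBC, as the sed expression can be tried on such a line directly. A quick sketch (strip_tags is just a made-up wrapper around the same expression used in fixsttl): the tag name is silently swallowed, leaving a double space behind, while a bare < that isn't followed by a letter or a slash is left alone.

```shell
#!/bin/sh
# strip_tags is a hypothetical wrapper around the sed expression from fixsttl
strip_tags() {
    sed -r -e 's@<[/a-zA-Z]+( [^>]*)?>@@g'
}

printf '%s\n' "You shouldn't use the <marquee> tag but you can create a scrolling text effect with CSS Level 3 instead" | strip_tags
# prints: You shouldn't use the  tag but you can create a scrolling text effect with CSS Level 3 instead

printf '%s\n' '<font color="#ffff00">Hello</font> 2 < 3' | strip_tags
# prints: Hello 2 < 3
```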