Quick guide to Extract DVB-Subtitles from TS HD file and convert to SRT in minutes

Back in May '17, in post #11, I couldn't understand "Why". I still don't.
Why would you want to do this other than the "Because I can" approach?
Can someone tell me please?
 
Tesseract is a general-purpose OCR engine that has long been the default choice for Linux, where it can be used from the console or incorporated into applications, such as scanning software. So far as I know it does not have a dedicated GUI in either Linux or Windows. So yes, it is a 'stock' engine.

The last files I tried this extraction process on were the final episodes of 'Black Earth Rising' and 'Strangers'. The first is largely set in Rwanda and, apart from a lot of very unfamiliar names, it also contains French, including some doubtful terms like 'genocidaire'. The second is set in Hong Kong, so it is full of Chinese proper nouns. Some of these resemble English words or acronyms: the name 'Xo' confused the spell checker, which wanted to render it as 'X0' or 'XO'. There were very few actual character recognition errors: those that did occur mostly involved double l (ll) or m, which in some circumstances can be detected as hi, nn or hn. Arguably, it might be better just to turn off the spell checking: this would pretty much eliminate the uncertainty caused by foreign names being spell checked, at the expense of letting a small number of recognition errors through.
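For anyone who wants a middle road between full spell checking and none, here is a rough Python sketch of the idea. It is not what any of the subtitle tools actually do: it simply tries to undo the ll/m misreadings mentioned above (hi, nn, hn) and accepts a change only if the result is a known word, while leaving anything on a list of proper nouns alone. The names `fix_word` and `fix_line`, the word list and the names list are all made up for illustration; a real run would need a proper dictionary.

```python
import re

# Fragments the OCR tends to produce, and what was probably on screen,
# based on the errors described in the post above (an assumption, not a
# complete list of Tesseract confusions).
MISREADINGS = ("hi", "nn", "hn")
INTENDED = ("ll", "m")


def candidates(word):
    """Yield words formed by undoing one suspected misreading."""
    for wrong in MISREADINGS:
        start = word.find(wrong)
        while start != -1:
            for right in INTENDED:
                yield word[:start] + right + word[start + len(wrong):]
            start = word.find(wrong, start + 1)


def fix_word(word, dictionary, names):
    """Return a corrected word, or the word unchanged if unsure."""
    # Proper nouns and known words are never touched.
    if word in names or word.lower() in dictionary:
        return word
    for cand in candidates(word):
        if cand.lower() in dictionary:
            return cand
    # Unknown word with no plausible fix: leave it alone (e.g. foreign names).
    return word


def fix_line(line, dictionary, names):
    """Apply fix_word to every alphabetic word in one subtitle line."""
    return re.sub(r"[A-Za-z]+",
                  lambda m: fix_word(m.group(0), dictionary, names),
                  line)


if __name__ == "__main__":
    words = {"hello", "summer", "is", "here"}  # stand-in for a real word list
    names = {"Xo"}                             # proper nouns to protect
    print(fix_line("Hehio Xo, sunnmer is here", words, names))
```

Because unknown words are only changed when a substitution produces a dictionary word, a name like 'Xo' is never "corrected", and srt timing lines are untouched since they contain no letters.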
 
@Trev: why do it?

1) My ageing ears have increasing difficulty with a) American TV and b) BBC audio. I like to keep subtitles on.

2) DVB subtitles are over-large, hideous and distracting. I can adjust srt subs to be inconspicuous in a small font at the bottom of the screen, where I can ignore them until I need them.

3) I use my Hummy purely as a recorder (it's in a bedroom). I clean up all recordings and store them on my NAS in mkv or mp4 format for use throughout the house, mostly viewed using a dedicated PC or video streamer. Once a recording is edited to remove unwanted material, including ads, DVB subs are lost anyway.

4) Sometimes I can't get a specific set of subs online. Particularly the case with non-drama programming.

5) I have time to play, it's a hobby and I'm probably a little bit OCD...

We all organise our viewing in our own way. Personally, I'm at a complete loss to understand why anyone would want to be bothered to download iPlayer or YouTube material using a Hummy or to decrypt programmes off-box.
 
Arguably, it might be better just to turn off the spell checking: this would pretty much eliminate the uncertainty caused by foreign names being spell checked, at the expense of letting a small number of recognition errors through.
That's exactly what I was arguing.
 
We all organise our viewing in our own way. Personally, I'm at a complete loss to understand why anyone would want to be bothered to download iPlayer or YouTube material using a Hummy or to decrypt programmes off-box.
I'd certainly agree with the first point(s).
Even though I've written the Windows version of the off-box decryption (following af123's work on this), I can't see why people are using it in place of other methods. Especially when the people using it seem to have the custom firmware. It's not as though I can even use it with my 2000T (at present). Before you ask why I wrote it - I refer you to your point 5. "I have time to play, it's a hobby and I'm probably a little bit OCD..." :D
 