Quick guide to Extract DVB-Subtitles from TS HD file and convert to SRT in minutes

Trev

The Dumb One
Back in May '17, in post #11, I couldn't understand "why". I still don't.
Why would you want to do this other than the "Because I can" approach?
Can someone tell me please?
 

fenlander

Active Member
Tesseract is a general-purpose OCR engine that has long been the default choice for Linux, where it can be used from the console or incorporated into applications, such as scanning software. So far as I know it does not have a dedicated GUI in either Linux or Windows. So yes, it is a 'stock' engine.

The last files I tried this extraction process on were the final episodes of 'Black Earth Rising' and 'Strangers'. The first is largely set in Rwanda and, apart from a lot of very unfamiliar names, it also contains French, including some doubtful terms like 'genocidaire'. The second is set in Hong Kong, so it is full of Chinese proper nouns. Some of these resemble English words or acronyms, so the name 'Xo' confused the spell checker, which wanted to render it as 'X0' or 'XO'. There were very few actual character recognition errors: those that did occur mostly involved double l ('ll') or 'm', which in some circumstances can be detected as 'hi', 'nn' or 'hn'. Arguably, it might be better just to turn off the spell checking: this would pretty much eliminate the uncertainty caused by foreign names being spell checked, at the expense of letting a small number of recognition errors through.
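
For anyone who wants a middle way between full spell checking and none, here is a rough Python sketch of the idea: leave names alone and only undo the specific misreadings mentioned above. It is not what the extraction tool actually does. The word list, the `keep` set of names and the function names are all made up for illustration, and the confusion table only covers the 'hi'/'nn'/'hn' cases.

```python
# Hypothetical post-OCR cleanup: only undo known misreadings, never "correct" names.
# Misreadings mentioned above: 'll' or 'm' coming out as 'hi', 'nn' or 'hn'.
CONFUSIONS = {"hi": ("ll", "m"), "nn": ("ll", "m"), "hn": ("ll", "m")}


def candidates(word):
    """Yield variants of word with one confusable pair swapped back."""
    for i in range(len(word) - 1):
        pair = word[i:i + 2]
        for fix in CONFUSIONS.get(pair, ()):
            yield word[:i] + fix + word[i + 2:]


def correct(word, dictionary, keep):
    """Return word unchanged unless a single swap turns it into a known word.

    dictionary: set of lowercase known words (assumed supplied by the user).
    keep: set of proper nouns that must never be altered (also an assumption).
    """
    if word in keep or word.lower() in dictionary:
        return word
    for cand in candidates(word):
        if cand.lower() in dictionary:
            return cand
    # Unknown word with no plausible fix: probably a foreign name, leave it.
    return word


if __name__ == "__main__":
    words = {"hello", "time", "small", "this"}
    names = {"Xo"}
    for w in ["hehio", "tinne", "snnall", "this", "Xo", "Kigali"]:
        print(w, "->", correct(w, words, names))
```

The point of the design is that an unknown word is never replaced with a "nearest" dictionary word; it is only changed when undoing one known misreading produces a real word, so names like 'Xo' survive untouched.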
 

fenlander

Active Member
@Trev: why do it?

1) My ageing ears have increasing difficulty with a) American TV and b) BBC audio. I like to keep subtitles on.

2) DVB subtitles are over-large, hideous and distracting. I can adjust srt subs to be inconspicuous in a small font at the bottom of the screen, where I can ignore them until I need them.

3) I use my Hummy purely as a recorder (it's in a bedroom). I clean up all recordings and store them on my NAS in mkv or mp4 format for use throughout the house, mostly viewed using a dedicated PC or video streamer. Once a recording is edited to remove unwanted material, including ads, DVB subs are lost anyway.

4) Sometimes I can't get a specific set of subs online. Particularly the case with non-drama programming.

5) I have time to play, it's a hobby and I'm probably a little bit OCD...

We all organise our viewing in our own way. Personally, I'm at a complete loss to understand why anyone would want to be bothered to download iPlayer or YouTube material using a Hummy or to decrypt programmes off-box.
 

Black Hole

May contain traces of nut
fenlander said: "Arguably, it might be better just to turn off the spell checking: this would pretty much eliminate the uncertainty caused by foreign names being spell checked, at the expense of letting a small number of recognition errors through."
That's exactly what I was arguing.
 

EEPhil

Number 28
fenlander said: "We all organise our viewing in our own way. Personally, I'm at a complete loss to understand why anyone would want to be bothered to download iPlayer or YouTube material using a Hummy or to decrypt programmes off-box."
I'd certainly agree with the first point(s).
Even though I've written the Windows version of the off-box decryption (following af123's work on this), I can't see why people are using it in place of other methods, especially when the people using it seem to have the custom firmware. It's not as though I can even use it with my 2000T (at present). Before you ask why I wrote it, I refer you to your point 5: "I have time to play, it's a hobby and I'm probably a little bit OCD..." :D
 