Extended characters in EPG

af123

Administrator
Staff member
I've just pushed an updated epg package to the repository, which now properly handles extended characters (accented letters, but also things like £ signs) within EPG names and synopses.



This improved data will also flow through to the EPG data uploaded to RS.

Thanks to Moftot for inspiring me to look at this again and for helping with testing the implementation.
 

martxw

Active Member
Is the HDR-FOX T2 actually capable of recording programmes with extended characters in their title?
I've been trying to test TV Diary, and scheduled recordings for 3 programmes with £ in their title. None has recorded properly.
The Red Ring shows that it's trying to record, but I end up with an HMT file and empty NTS and TS files.
Code:
humax# ls -l Handbags*
-rw-r--r-- 1 root root 2072 Mar 12 00:23 Handbags Under ??100_20170311_2300.hmt
-rw-r--r-- 1 root root 0 Mar 11 23:00 Handbags Under ??100_20170311_2300.nts
-rw-r--r-- 1 root root 0 Mar 11 23:00 Handbags Under ??100_20170311_2300.ts
The WebUI dropdown status shows it playing (rather than recording) "Handbags Under \xc2\xa3100_20170311_2300".
The TV Diary log shows
Code:
Error getting details of humaxtv file /mnt/hd2/My Video/Handbags Under \xc2\xa3100_20170311_2300.ts. Ignoring it. [could not read "/mnt/hd2/My Video/Handbags Under \xc2\xa3100_20170311_2300.ts": No such file or directory]
So far they've all been on QVC Style. I'll try recording "The House That £100k Built" on both BBC 2 & BBC 2 HD later on.

If the Humax itself is failing to record I don't think the CF would be responsible.
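
On the TV Diary error above: the path in the log carries the £ as the literal text \xc2\xa3 rather than as the two bytes 0xC2 0xA3, which would explain the "No such file or directory". A minimal Python sketch of turning that escaped form back into real bytes is below. It assumes every non-ASCII byte is escaped as \xNN exactly as in the log line, and unescape_hex is a made-up name for illustration, not part of TV Diary or the webif.

```python
import re

# Matches a literal backslash, an 'x', and two hex digits, e.g. \xc2
_HEX_ESCAPE = re.compile(rb"\\x([0-9a-fA-F]{2})")


def unescape_hex(text: str) -> bytes:
    """Turn literal \\xNN sequences back into the raw bytes they stand for."""
    return _HEX_ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 16)]),
        text.encode("utf-8"),
    )


# The escaped name as it appears in the log line (hypothetical input).
escaped = r"Handbags Under \xc2\xa3100_20170311_2300.ts"
raw = unescape_hex(escaped)
print(raw.decode("utf-8"))  # Handbags Under £100_20170311_2300.ts
```

The result is a byte string, so it can be handed to file APIs as-is or decoded as UTF-8 for display.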
 
af123

Administrator
Staff member
I don't know. I've set the House that... for both channels here too (via standard remote control method).

The schedule database contains:

Code:
SD: \x10i7The House That \xa3100k Built
HD: \x15The House That \xc2\xa3100k Built

So the SD channel is in ISO6937 and HD is UTF-8.
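
A small Python sketch of telling the two apart by the leading marker. It assumes, based only on the two samples above, that \x15 introduces UTF-8 text and \x10i7 introduces ISO6937 text; split_marker is a hypothetical helper, and anything else is reported as unknown rather than guessed at.

```python
def split_marker(raw: bytes) -> tuple[str, bytes]:
    """Return (encoding label, remaining bytes) for a schedule title."""
    if raw.startswith(b"\x15"):
        return "UTF-8", raw[1:]
    if raw.startswith(b"\x10i7"):
        return "ISO6937", raw[3:]
    # No marker seen in the samples above; make no assumption.
    return "unknown", raw


sd = b"\x10i7The House That \xa3100k Built"
hd = b"\x15The House That \xc2\xa3100k Built"
print(split_marker(sd)[0])  # ISO6937
print(split_marker(hd)[0])  # UTF-8
```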

The SD channel does not display properly in the webif schedule - I've just pushed a fix for that: https://github.com/hummypkg/webif/commit/a5fa57709929d7eef44e8f6fd637ba0d170399a3

There is more work to do to properly handle extended characters though. Let's see what 1600 brings.
 

hairy_mutley

Active Member
I have never had any problem with recording The House That £100k Built (and others in the £100k series) in HD, only with the way that WebIf/CF features handle the name afterwards.
 
af123

Administrator
Staff member
Both are recording fine here. Both have UTF-8 filenames on disk too.
Code:
humax# ls -bd The\ House\ That\ £100k\ Built/
The\ House\ That\ \302\243100k\ Built//
humax# ls -b The\ House\ That\ £100k\ Built/
The\ House\ That\ \302\243100k\ Built_20170312_1559(1).hmt
The\ House\ That\ \302\243100k\ Built_20170312_1559(1).nts
The\ House\ That\ \302\243100k\ Built_20170312_1559(1).ts
The\ House\ That\ \302\243100k\ Built_20170312_1559.hmt
The\ House\ That\ \302\243100k\ Built_20170312_1559.nts
The\ House\ That\ \302\243100k\ Built_20170312_1559.ts
 
af123

Administrator
Staff member
hairy_mutley said: "only with the way that WebIf/CF features handle the name afterwards."
That will finally be fixed in the near future!

For package developers, there are some new tools in the custom firmware that aid conversion from ISO6937 to UTF-8. There are now Jim and SQLite3 extensions that add an xconv() function that does the conversion for you.

Example showing both:

Code:
#!/mod/bin/jimsh

source /mod/webif/lib/setup
require hexdump

package require sqlite3
package require xconv

# "home made pâté" in ISO 6937: each accent byte comes before its base letter
set s "home made p\303at\302e"
hexdump $s

# Via Jim 'xconv' extension:

set t [xconv $s]
hexdump $t

# Via SQLite3 'xconv' extension
set db [sqlite3.open ":memory:"]
$db extension /mod/lib/sql/libqxconv.so
set ret [$db query {select xconv('%s') as xconv} $s]
lassign [lindex $ret 0] x t

hexdump $t

Which produces:

Code:
0000: 68 6f 6d 65 20 6d 61 64 65 20 70 c3 61 74 c2 65  home made p.at.e
0000: 68 6f 6d 65 20 6d 61 64 65 20 70 c3 a2 74 c3 a9  home made p..t..
0000: 68 6f 6d 65 20 6d 61 64 65 20 70 c3 a2 74 c3 a9  home made p..t..

The extension can be loaded from the sqlite3 shell too, so you could run something like 'update table set column = xconv(column);' to perform a batch update. A string that is already UTF-8 is left unchanged.

Code:
sqlite3 -cmd '.load /mod/lib/sql/libqxconv.so' :memory: "select hex(xconv(x'70c36174c265'))"
70C3A274C3A9
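
For anyone curious what the conversion involves, here is a rough Python sketch. It is not the xconv implementation. It covers only a handful of ISO 6937 accent bytes plus the £ sign, substitutes U+FFFD for anything else outside ASCII, and assumes that "already UTF-8" can be detected by checking whether the bytes decode cleanly as UTF-8. The name to_utf8 is made up for illustration.

```python
import unicodedata

# Partial table: ISO 6937 non-spacing accent byte -> Unicode combining mark.
DIACRITICS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC8: "\u0308",  # diaeresis
    0xCB: "\u0327",  # cedilla
}
# Partial table of directly mapped symbols.
SYMBOLS = {0xA3: "\u00a3"}  # pound sign


def to_utf8(raw: bytes) -> bytes:
    """Convert ISO 6937 bytes to UTF-8, leaving valid UTF-8 untouched."""
    try:
        raw.decode("utf-8")
        return raw  # decodes cleanly, so treat it as UTF-8 already
    except UnicodeDecodeError:
        pass
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b < 0x80:
            out.append(chr(b))
        elif b in DIACRITICS and i + 1 < len(raw) and raw[i + 1] < 0x80:
            # ISO 6937 puts the accent before the letter; Unicode
            # combining marks go after it, so swap the order.
            out.append(chr(raw[i + 1]) + DIACRITICS[b])
            i += 1
        else:
            out.append(SYMBOLS.get(b, "\ufffd"))
        i += 1
    # NFC folds letter + combining mark into one precomposed character.
    return unicodedata.normalize("NFC", "".join(out)).encode("utf-8")


print(to_utf8(b"p\xc3at\xc2e").hex())  # 70c3a274c3a9
```

The printed hex matches the sqlite3 output above for the same input bytes.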
 

martxw

Active Member
That recorded fine for me too.
I need to fix tvdiary_status to look for extended character file paths. But when I watched it live, viewing the /mnt/hd2/Tsr/0.ts file, it was spotted fine.
The synopsis for the SD file still has a blob instead of £, which is the same in the WebUI browse section.

I've scheduled a whole lot more shopping channel shows to see if they keep failing, or whether there was just a bad patch.
 

martxw

Active Member
When the Humax is live viewing the Tsr/0.ts file tvdiary_status knows it's live viewing, so it gets the programme details from the EPG for the current channel.
Nothing actually comes from the 0.ts file.
This does lead to a bug where, if you pause live viewing, TV Diary thinks you're watching the following program, since it's based on time.
 
af123

Administrator
Staff member
martxw said: "The synopsis for the SD file still has a blob instead of £, which is the same in the WebUI browse section."
That's what I fixed in the earlier commit. It will be in the next version. I've also since fixed the 'status' command to handle programmes with extended characters in the filename.
 

martxw

Active Member
Thanks, I'll check that. So I think I just need to fix my handling of filenames.

And I think I can guess why it failed to record. QVC Style is scheduled 24 hours a day, but it's currently showing a card saying it'll be back at 1pm. All my attempted recordings were morning or late evening. So they probably weren't broadcasting, just like the Humax suggested.
 
af123

Administrator
Staff member
Webif 1.4.1 is now out in beta with all of the character set fixes so far. There's an updated rs package with its own fixes too.
I've scheduled "Handbags" for 2AM via RS..
 
af123

Administrator
Staff member
Failed to record...

Code:
humax# hmt Handbags\ Under\ £100_20170313_0200. | grep Status:
Recording Status: Zero length (Failed: Appears not to have been broadcast)
 

martxw

Active Member
Ditto. But since I fixed recognising paths with extended characters, TV Diary showed it as playing the program that wasn't recorded. If the file is growing I infer it's being recorded. If it's not growing then I infer it's being played. But in this case it's zero sized and not growing, so I have to add a special case to spot programmes not being broadcast.
Released 0.0.3-9 to fix this.
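
The rule described above, sketched in Python. The function name, the idea of comparing two size samples, and the returned labels are all made up for illustration; this is not the actual tvdiary_status code.

```python
def classify_file(size_before: int, size_after: int) -> str:
    """Guess what the box is doing with a .ts file from two size samples."""
    if size_after > size_before:
        return "recording"      # file is growing
    if size_after == 0:
        return "not broadcast"  # zero length and not growing
    return "playing"            # has content but is not growing


print(classify_file(0, 0))        # not broadcast
print(classify_file(1000, 5000))  # recording
print(classify_file(5000, 5000))  # playing
```

The zero-length check has to come before the "playing" fallback, otherwise a recording that never started looks like playback.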
 
af123

Administrator
Staff member
I've pulled your 'subst' change into the next webif release too, thanks.
 

martxw

Active Member
After recording loads of programmes about handbags, fiancées, fiancés and cafés, I'm confident this is sorted now with the latest webif 1.4.1-2 and tvdiary 0.0.3-11. Cheers.
Still waiting for the first TV show to have an emoji in its title or synopsis:)
 
af123

Administrator
Staff member
Likewise, thanks for the help.
Fancy apostrophes or quotes are the other thing - not immediately obvious, probably due to cutting and pasting from Word with autocorrect enabled.
Just set an episode of "Shed and Buried" to record - the synopsis includes three of the things. I don't expect problems but it occurred to me that I hadn't specifically tested these.

Redruth: The boys are travelling to Redruth, where they're confronted with Roy’s sheds. There’s everything from a model canary-yellow Lamborghini to Harry Potter’s Ford Anglia! (S1 Ep8)

Inconsistent though: the first apostrophe, in "they're", is standard ASCII.
 

Black Hole

May contain traces of nut
You mean a proper apostrophe as opposed to the others being a close-quote character? It's a pity there is a fashion for using close-quote instead of apostrophe, but I admit it looks better on the page.
 
af123

Administrator
Staff member
I go to all this effort to make things display properly and the Humax software gets it wrong!
 