I've carried out further tests and captured the results in the attached updated spreadsheet. I hope that you can read it and that it's self-explanatory. There are three types of entry: those produced directly by the system, which are included for reference; those modified by different versions of the custom software over the years; and those generated by Raydon's utilities. Please let me know if you have any questions.
I'll try to answer as many of your points as I can; apologies for the resultant length of the post.
I'm curious how the hmt utility (which provides a decode of the .hmt file in human-readable form) handles these variations, in particular whether it mimics what's seen on-screen. It should, and if it doesn't it needs updating.
If the firmware history includes an alteration of the start index for the display title from 017F to 0180, how does the new firmware interoperate with old recordings? If the start index is actually variable, how is it indicated? If the index has changed, have the utilities (such as rename) been updated in accordance?
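One way to see what a given file actually holds at the two candidate offsets is to dump a few bytes from each. This is only a sketch: peek is a hypothetical helper name, recording.hmt is a placeholder, and it assumes dd, od and tr are available on the box.

```shell
# Hypothetical helper: print COUNT bytes of FILE, starting at OFFSET, as hex.
# $1 = file, $2 = offset (decimal or 0x-prefixed hex), $3 = byte count
peek() {
    dd if="$1" bs=1 skip="$(($2))" count="$3" 2>/dev/null | od -An -tx1 | tr -d ' \n'
    echo
}

# Usage sketch (recording.hmt is a placeholder file name):
# peek recording.hmt 0x17F 4
# peek recording.hmt 0x180 4
```

Comparing the two dumps across old and new recordings would show whether the field really moves or whether one of the tools is simply off by one.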
The display function of hmt seems to handle them without difficulty. The string encoding problem is relatively invisible, unless one goes outside the ASCII character set. However, both bottletop and /df are looking into this. I haven't yet done any tests on using hmt to modify the contents of the file.
The note that I attached to Raydon's hmt map in the Wiki many years ago suggests that the offset 017F/0180 problem went back even further than that. It also suggests that af123 might have changed the hmt utility to accommodate the ambiguity.
As a test, I moved the file name in an original recording from offset 0180 to 0178 (sic). The Webif rename routine seemed quite happy to display the correct name, rename it, and store it back at 0178. I speculated earlier that it might just be working on the path and file name fields as a concatenated string, but I don't know.
There may have been slightly different HMT versions in different OEM FW versions.
All the differences that I have found arise from various aspects of the custom firmware. The only anomaly I've ever encountered in hmt files produced by the Humax firmware itself is in the encoding of the text fields in the iPlate. For some reason, SD files use ISO 6937, whilst HD recordings use UTF-8. Raydon noted this in his map of the hmt format - but...
But what happens if ITitle contains characters that 6937 can't cope with? Sidecar gets it wrong, that's what. Raydon's .HMT format analysis is incomplete (shock horror, he's never wrong, except when he is!) and so is his utility.
In fact, Raydon developed two mechanisms for generating hmt files: sidecar as part of the Webif, and AVHDR-T2 as an offline utility. They disagree with each other both with regard to the offset 017F/0180 problem and in the use of encoding standards (examples attached). I apologise for not being able to persuade him to look into this subject. But it didn't seem to affect anybody else at the time. It's only the sort problem that has caused me to look at it again.
Why does crop have any input to this?
And the nicesplice stuff is also wrong, using the wrong offsets (0x17F/0x180).
I thought at first that it was crop that was causing the 017F/0180 problem, but it also happens with rename without crop. And history shows that nicesplice is also involved somewhere. Perhaps someone who understands the structure of the custom software better than I do will know if there's a common source to the problem.
Tests with hmt +settitle=x
Code:
hmt "+settitle=$(printf 'title string with funny characters')" "$ts_name"
Let's look at Title (0x29A) and ITitle (0x516).
- x is ISO6937 containing "é" == 0xc265, no tag: Title displays e for é, ITitle correct.
- x is ISO6937 containing "é" == 0xc265, tag 0x106937: Title gets "i7" prefix and e for é, ITitle correct.
- x is UTF-8 containing "é" == 0xc3a9, no tag: Title displays OK, ITitle corrupted.
- x is UTF-8 containing "é" == 0xc3a9, tag \x15: Title displays OK, ITitle displays OK.
- x is ISO8859-1,15 containing "é" == 0xe9, no tag: Title loses é, ITitle has Ø for é.
- x is ISO8859-15 containing "é" == 0xe9, tag \x0b: Title loses é, ITitle correct.
- x is ISO8859-1 containing "é" == 0xe9, tag 0x100001: Title loses é, ITitle missing.
- x is UCS-2 containing "é" == 0x00e9, tag \x0b: Title loses é, ITitle corrupted.
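For anyone wanting to reproduce these cases, the test strings can be built byte by byte with printf octal escapes and checked with od before they go anywhere near hmt. This is a sketch only; hex is a hypothetical helper name, and the byte values are the ones quoted in the list above.

```shell
# Print the bytes produced by a printf format string as lowercase hex.
# The argument is used as the printf FORMAT, so it must not contain '%'.
hex() {
    printf "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

hex '\303\251'            # é in UTF-8                      -> c3a9
hex '\302e'               # é in ISO 6937 (acute, then 'e') -> c265
hex '\351'                # é in ISO 8859-1 / 8859-15       -> e9
hex '\025\303\251'        # UTF-8 é behind the 0x15 tag     -> 15c3a9
hex '\020\151\067\302e'   # ISO 6937 é behind tag 0x106937  -> 106937c265
```

Two things fall out of this. The second and third bytes of the 0x106937 tag are ASCII "i" and "7", which would account for the "i7" prefix if the Title display simply skips the leading 0x10. And any value containing a zero byte (the 0x100001 tag, or UCS-2 0x00e9) cannot survive being passed as a command-line argument, since arguments are NUL-terminated strings; that may be a factor in the "missing" and "corrupted" results for those two cases.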
The HD Fox-T2 CFW3.13 settop program sets the Title with no tag and the ITitle and Synopsis with tag 0x106937.
In the Glums test archive, the Title and Synopsis are untagged and the ITitle has tag 0x15. These may have been processed by WebIf in some way. Also, WTF is going on in the pathname field (0x80) where the é is encoded as 0xcc82e28098?
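To check which convention a given field follows, the leading bytes can be matched against the tags seen in these tests. The following is a rough sketch: classify_tag is a hypothetical name, it takes the field's first bytes as a lowercase hex string, and it only knows the four tags mentioned in this thread.

```shell
# Hypothetical helper: name the encoding tag at the start of a text field.
# $1 = the field's leading bytes as a lowercase hex string, e.g. "15c3a9"
classify_tag() {
    case "$1" in
        106937*) echo "iso6937" ;;     # three-byte tag 0x10 0x69 0x37
        100001*) echo "iso8859-1" ;;   # three-byte tag 0x10 0x00 0x01
        15*)     echo "utf8" ;;        # one-byte tag 0x15
        0b*)     echo "iso8859-15" ;;  # one-byte tag 0x0b
        *)       echo "untagged" ;;    # anything else, including plain ASCII
    esac
}

classify_tag 15c3a9      # prints: utf8
classify_tag 106937c265  # prints: iso6937
classify_tag 436166      # prints: untagged
```

Since printable ASCII starts at 0x20, an untagged title cannot be mistaken for a tagged one by this check.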
So apparently the only ways to have "funny characters" in the Title and ITitle using the same encoding for both are:
- UTF-8 with no tag in Title, with 0x15 tag in ITitle
- UTF-8 with 0x15 tag in Title and ITitle.
Neither is what the system creates by default.
To make sorting work properly, perhaps we should omit the tag in Title when updating a .hmt.
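As a sketch of the two UTF-8 variants, the title string can be built with or without the 0x15 tag before it is handed to hmt. The helper names utf8_tagged and utf8_untagged are hypothetical, and ts_name is assumed to name an existing recording as in the command above.

```shell
# Hypothetical helpers: emit a UTF-8 title with or without the 0x15 tag.
utf8_tagged()   { printf '\025%s' "$1"; }
utf8_untagged() { printf '%s' "$1"; }

# Build the title from explicit bytes so that the result does not depend on
# the terminal or editor encoding: "Caf" followed by UTF-8 é (0xC3 0xA9).
new_title=$(printf 'Caf\303\251')

# Assumption: ts_name names an existing recording.
# hmt "+settitle=$(utf8_tagged "$new_title")" "$ts_name"
# hmt "+settitle=$(utf8_untagged "$new_title")" "$ts_name"
```

In the tests above, +settitle was given a single string and both Title and ITitle were inspected afterwards, so this sketch cannot give the two fields different tags; whether hmt offers a way to set them separately is not something I have checked.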
In passing, the box OS theoretically understands only ASCII, but it can't tell that from UTF-8. The telnet executable knows how to display UTF-8. Things like vi's cursor can get confused by multibyte character encodings.
I hope that the attached gives you a clear view of what the system does by default - specifically which fields are prefix-less, which have a utf8 prefix, and which have an ISO 6937 prefix. It also shows the ways in which some of our custom packages differ from this convention. Sorry, but I can't answer your question about the pathname!
This is all quite dispiriting.
Indeed so. But it's also actually quite an encouraging sign. I hope that the attached spreadsheet gives at least some tentative reassurance that a) we know what encoding the base system uses in all the relevant text fields; b) apart from the sort problem with the title field (offset 029A), we know that both the underlying and the custom software are quite tolerant of errors. That gives me some confidence that these various problems might be remedied without undue difficulty, and without undue compatibility issues.
But that depends on whether we think it's a worthwhile goal.