Thanks, that's interesting - I hadn't seen that logic before.
I've seen this happen three times: with Whitechapel (ITV HD) and with two documentary series on C4 HD, Stephen Hawking's Brave New World and Tony Robinson's Gods and Monsters. Comparing the synopsis field of these programmes with that of BBC series, all of which dedup renames correctly, it seems that ITV and C4 do not use the field in the same structured way as the BBC. With the current logic, dedup is unlikely to be able to rename many ITV/C4 series correctly. My linking of the problem to cropping was clearly wrong: as I only crop ITV/C4 programmes, these were the only ones where the glitch showed up, and I picked on the wrong cause.
I guess there are now three options:
- persuade the errant TV channels to do things properly, like the BBC (unlikely to succeed...)
- accept that some channels do not provide information in a format that dedup can use, or
- find some alternative logic based on the programme name and the broadcast date/time in order to construct series information
I can anticipate some of the problems the last of these might throw up! Anyway, I'll start paying more attention to the way non-BBC channels use the synopsis field and see if anything occurs to me.
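To make the third option a bit more concrete, here is a rough Python sketch of what "name plus broadcast date/time" logic might look like. It is not how dedup works; the function name `number_by_broadcast`, the label format and the sample recordings are all invented for illustration. It simply groups recordings by programme name and numbers each group in broadcast order.

```python
from collections import defaultdict
from datetime import datetime


def number_by_broadcast(recordings):
    """Map each (name, start) pair to a label numbered in broadcast order."""
    by_name = defaultdict(list)
    for name, start in recordings:
        # Case and surrounding spaces are ignored when grouping by name.
        by_name[name.strip().lower()].append((start, name))

    labels = {}
    for items in by_name.values():
        # Sorting the (start, name) tuples puts the earliest broadcast first.
        for index, (start, name) in enumerate(sorted(items), start=1):
            labels[(name, start)] = f"{name} - E{index:02d} ({start:%Y-%m-%d})"
    return labels


# Invented recordings, purely for illustration.
recordings = [
    ("Example Series", datetime(2012, 1, 8, 21, 0)),
    ("Example Series", datetime(2012, 1, 1, 21, 0)),
    ("Another Series", datetime(2012, 1, 3, 20, 0)),
]

for label in sorted(number_by_broadcast(recordings).values()):
    print(label)
```

The obvious weaknesses are that the numbers only reflect the order of what was actually recorded: a missed episode or a recorded repeat would shift everything after it, and two different series sharing a name would be merged.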
EDIT: It does occur to me that when used on 'non-compliant' channels there is a definite risk of dedup mis-identifying duplicates (i.e. where programmes within a series all share the same synopsis). Users need to be alert before hitting the 'Process Folder' button. I've never actually done this, but if you process a folder, are files flagged as duplicates automatically deleted?
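As a rough illustration of the kind of check a user could run first, the Python sketch below only reports recordings that share a programme name and synopsis, without deleting anything. It is an assumption-laden sketch, not dedup's actual behaviour: the function name `suspect_duplicates`, the `(filename, name, synopsis)` tuples and the sample data are all hypothetical.

```python
from collections import defaultdict


def suspect_duplicates(recordings):
    """Return groups of filenames that share a programme name and synopsis."""
    groups = defaultdict(list)
    for filename, name, synopsis in recordings:
        # Normalise case and whitespace so trivial differences do not matter.
        key = (name.strip().lower(), " ".join(synopsis.split()).lower())
        groups[key].append(filename)
    # Only groups with more than one file are worth a manual look.
    return [sorted(files) for files in groups.values() if len(files) > 1]


# Invented data: two episodes of one series that share a generic synopsis.
sample = [
    ("ep1.ts", "Example Series", "Crime drama series."),
    ("ep2.ts", "Example Series", "Crime  drama series."),
    ("doc.ts", "Another Series", "Documentary about rivers."),
]

for group in suspect_duplicates(sample):
    print("Check before deleting:", ", ".join(group))
```

On a channel where every episode carries the same synopsis, a report like this would list whole series as suspects, which is exactly the case where automatic deletion would be dangerous.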
I'm a little uneasy about doing all this by parsing the synopsis field unless there is some certainty that there actually are rules about how this field is authored, even within the BBC. Could a change of staff result in a change of grammar - semi-colon instead of colon or whatever? Clearly, if there are rules, they apply only to a given broadcaster, not across the industry.
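If parsing does stay, one way to soften the dependence on exact punctuation might be to accept more than one separator. The Python sketch below assumes, purely for illustration, a synopsis of the form "episode title: description"; that layout, the function name `split_synopsis` and the sample strings are my assumptions, not a documented rule of any broadcaster or of dedup.

```python
import re

# Accept a colon or a semi-colon, as long as whitespace follows it,
# so that a time such as 9:00 is not treated as a separator.
SEPARATOR = re.compile(r"\s*[:;]\s+")


def split_synopsis(synopsis):
    """Split a synopsis into (title, description); title is None if absent."""
    parts = SEPARATOR.split(synopsis, maxsplit=1)
    if len(parts) == 2 and parts[0]:
        return parts[0], parts[1]
    return None, synopsis


# Invented synopses, purely for illustration.
print(split_synopsis("The First Case: A detective arrives in town."))
print(split_synopsis("The First Case; A detective arrives in town."))
print(split_synopsis("A detective arrives in town at 9:00."))
```

This only trades one guess for a slightly wider one: a synopsis that happens to contain a colon early on for other reasons would still be split in the wrong place, so it does not remove the underlying worry about unwritten rules.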