Special Characters, sweeper rename &#39; instead of '

free30

Member
Hello
I'm using sweeper and TVDB to rename recordings and remove doubles.
This has worked for a long time without issue, thank you!
But now renaming files with 'special characters' like ' does not work correctly.
So a programme called Quagmire's Mom is shown as Quagmire's Mom in the WebIf, but in the TV browser it's called Quagmire&#39;s Mom.
This is also creating many doubles: in the past the files were named correctly, and now a double is kept because the names don't match.

Thanks for any advice.
 
It's special characters you have a problem with?

Something is leaving the apostrophe in an HTML-encoded form (&#39;) which is displayed as ' in the WebIf but not on screen.

Either it's sent like that by TheTVDB.com, or some processing in the CF side is changing it. However the Jim cgi module that would be the prime suspect in the latter case only changes <>&" and the only recent change to sweeper and webif that could have affected TVDB doesn't seem to be capable of changing the treatment of HTML entities.

If characters encoded as HTML entities are now being received, maybe the HTML processing in /mod/webif/plugin/sweeper/save.jim can be adapted to handle a slightly wider range of them, something like this. This should probably go somewhere in the tvdb::_extract method:
Code:
# Transform some chars that might be sent as HTML entities. 
# Initially &<>, but add some others (sent by TheTVDB.com?).
# Entity names are case-sensitive but HTML5 adds AMP, etc;
# other syntaxes (eg &#xhhhh;) aren't. So use -nocase at the risk
# of transforming an illegal &APOS;, eg.
set data [string map -nocase {
        &amp; &
        &lt; <
        &gt; >
        &apos; "'"
        &#x0027; "'"
        &#39; "'"
        &#37; %
        &#43; +
        &#32; " "
        &#34; "\""
        &quot; "\""
        &#x0022; "\""
        &#63; "?"
        &#38; "&"
        &#35; "#"
 } $data ]
Or this code for proc HtmlDecodeEntity could be adapted.

@af123?
 
Thanks that sounds great.
I guess I am not the only one with this problem.
Is anyone able to implement this? I don't know where to start.
 
Not sure I have the answer, but on reflection maybe the right solution is a class method decodeCharEntities for xml.class that can be called from tvdb.class. I can run that up, I expect.
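
To illustrate the idea (this is not the test patch, just a rough sketch of what such a decoder might look like), here is a stand-alone proc. The name decodeCharEntities is taken from the suggestion above, but the body, the regular expression and the short list of named entities are my own assumptions. It relies on regexp -indices -start, lassign and scan without an output variable, which I believe behave the same way in Jim as in standard Tcl, but I haven't checked that on the box.

```tcl
# Hypothetical sketch: decode decimal (&#39;), hex (&#x27;) and a few
# named character entities in a single left-to-right pass.
proc decodeCharEntities {text} {
    set out ""
    set pos 0
    # Assumed set of entities; anything else is left untouched.
    set re {&(#[0-9]+|#[xX][0-9a-fA-F]+|amp|lt|gt|quot|apos);}
    while {[regexp -indices -start $pos $re $text whole body]} {
        lassign $whole ws we
        lassign $body bs be
        # Copy the text between the previous match and this one.
        append out [string range $text $pos [expr {$ws - 1}]]
        set name [string range $text $bs $be]
        switch -glob -- $name {
            {#[xX]*} { append out [format %c [scan [string range $name 2 end] %x]] }
            {#*}     { append out [format %c [scan [string range $name 1 end] %d]] }
            amp      { append out & }
            lt       { append out < }
            gt       { append out > }
            quot     { append out \" }
            apos     { append out ' }
        }
        set pos [expr {$we + 1}]
    }
    # Copy whatever follows the last entity.
    append out [string range $text $pos end]
    return $out
}

puts [decodeCharEntities "Quagmire&#39;s Mom"]
```

Because each entity is consumed once and the scan then continues after it, a doubly-encoded string such as &amp;#39; comes out as &#39; rather than being decoded twice, which seems the safer behaviour for episode titles.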

BUMP @af123

Test patch here.
 
Thanks for the suggestions. I hope it gets picked up, as I can't test your patch without clear instructions.
 
If you are able to use the File Editor in WebIf>Diagnostics, then you could use these instructions, and this would be helpful to confirm that the patch fixes the problem:
  1. Use the File Editor to open the file /mod/webif/lib/xml.class and save a copy, say /mod/webif/lib/xml.class.org ("org" = original)
  2. In another tab or window of your web browser, load the text of the new version; select the entire text and copy it.
  3. In the File Editor tab/window, select the entire text of the original xml.class; paste the new text to replace it.
  4. Save the result as /mod/webif/lib/xml.class.
  5. In case it's important, restart the system in order to clear any caches (or at the command line, service lighttpd restart).
  6. Test your problem TVDB shows; it will be necessary to use Change>Clear Series Information and then re-associate the series folder with TheTVDB.
If you think this has all gone horribly wrong, you can restore the "org" version of the changed file, or just reinstall the webif package using the WebIf>Diagnostics>Force reinstall function.
 
Ok, So I followed your fantastically clear instructions, thank you.
I waited for a show to come along to test it, but it looks like it is not working. I thought maybe Webif got updated and removed it but no.
So thanks, but even with these changes I still get shows renamed incorrectly, like "A Picture&#39;s Worth a Thousand".
 
Bother.

My attempt to reproduce is hampered by the absence of the 2 shows you mention from my thetvdb.com search results. A series URL from thetvdb.com would be helpful.

The TVDB interface caches data assiduously: in memory, while WebIf is running on the server, and persistently in the directory /mod/var/tvdb. The series reset in step #6 is definitely required, and in case, as it appears, that doesn't clear the in-memory cache, it would be prudent to repeat step 5 as well. The on-disk cached episode data is refreshed if it's older than 24 hours, and replaced if the TheTVDB.com download is accessible.

If the TVDB field in the Media Details pop-up says "Found episode using cached values" when first opened, it hasn't cleared the cache properly (although refreshing the episode data should be enough).

You might first try running the tvdbreset diagnostic, using Webif>Diagnostics>Run Diagnostic. This clears the on-disk cache, and you shouldn't lose anything if the original series data is still available from TheTVDB.com. In fact you may get better data if the site has been updated; there doesn't seem to be any other way to get the TVDB interface to update its cached data from the site.
 
Hello, I tried looking at this again.
I have followed your instructions completely and I am still getting the same problem.

I am having the problem on The Simpsons, for instance, and many others. Family Guy and American Dad seem to use a lot of special characters in the titles and are good for testing.
Any advice would be great, as it's a shame that something that was working so well no longer is.

 
The changes (post #4) that should have fixed this are in webif-1.4.8-10 and sweeper-2.2.3, so please upgrade to those versions (at least), or reinstall them.

If you could identify a particular show that fails, that would help to focus debugging.

Do you mean 'was working so well' before I tried it with these shows that have special characters in the series data, or 'was working so well' before some software update? If the latter, more information, please.
 
I reinstalled everything to try to get WebIf 1.4.8-10, but I am still on 1.4.8-8. Nothing I do seems to change this, although xml.class does seem to be updated.
I'm using sweeper 2.2.3.

Family Guy - Quagmire's Mom - S13 E10 is an example of an episode not being correctly named.

Yes it did work before. The last time I can see a file named correctly was Feb 2020.


I have still not been able to test yet after this full reset I have undertaken. I still await a program that needs renaming.
I will let you know if I've flushed the problem. Thanks again for all your help.
 
Web-If 1.4.8-8 is the newest version in the standard package area; to get anything newer you need the opkg-beta package installed.
 
Wasn't my post clear enough for you?, you seem to have stopped reading half way through the sentence again
 
You edited it after I'd read it and before I'd responded to it.
Snipe, snipe, snipe, all the time. You really get on my nerves. As does your bizarre punctuation.
 
My last edit to that post was made at 4:35, your post was around 6:20 almost 2 hours after my edit - explain
 
At any rate webif-1.4.8-3 contains the relevant change.

Looking back to February, the only relevant change was that the TVDB integration had to be reworked as a result of site changes. Perhaps that is the problem area, although that was in webif-1.4.7 described by Gitea as "1 year ago" (it has most of a code line to render the date FFS, but that's how it formats it).

For debugging purposes, I need a show that I can record and then sweep. I don't suppose "Family Guy - Quagmire's Mom" is being transmitted any time soon?
 
"The Next Step - It's a..." is on CBBC and CBBC HD.
"You've Been Framed! A-Z of..." on ITV2 and ITV2+1.
" Radio 1's Residency - %" on Radio 1 (% varies).
 