Special Characters, sweeper rename &#39; instead of '

Thanks. It's all working for me now.

I'm not sure why. Maybe I needed to flush the cache after clearing series information and before re-selecting it.

Thank you for your help and patience.
 
That's weird because I was just about to post that I'd reproduced the problem. Of course, as @Luke pointed out, there are lots of series whose name contains a quote.

I recorded "Britain's Greatest Bridges" (C5) as a series. In the series folder I selected "World's Greatest Bridges" series/season 1 (also known as "Britain's Greatest Bridges") from theTVDB.com. Looking at the series title field displayed in the directory banner, it appears that the text is doubly encoded: the title whose raw text is "World&#39;s Greatest Bridges", and which should therefore display as "World's Greatest Bridges", gets transformed into "World&amp;#39;s Greatest Bridges" and hence displays as "World&#39;s Greatest Bridges". This is more difficult to diagnose than it should be, because Mozilla's Developer Tools insist on presenting the decoded text when showing the page HTML in the Inspector tab.
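
The double-encoding effect can be reproduced outside the WebIf. The following is only a minimal sketch in Python, not the WebIf's Jim code: `html.escape` stands in for `cgi_quote_html` (an assumption about how that function behaves), and `html.unescape` stands in for the browser's decoding of the page.

```python
import html

# Text as stored in the series database: already entity-encoded.
raw = "World&#39;s Greatest Bridges"

# Decoding once gives the title that the browser ought to show.
intended = html.unescape(raw)

# Escaping text that is already encoded turns "&" into "&amp;".
doubly_encoded = html.escape(raw, quote=False)

# The browser decodes only one layer, so the entity itself becomes visible.
displayed = html.unescape(doubly_encoded)

print(intended)
print(doubly_encoded)
print(displayed)
```

Either decoding the stored text before it is escaped, or not escaping it a second time, avoids the visible entity; the patch below takes the second route.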

The series name comes from the /mod/var/tvdb/series.db database file, which in turn is created by running /mod/webif/lib/bin/tvdb (a binary program) on the unzipped series data fetched from theTVDB.com. I would say that this is the point at which the data retrieved from theTVDB.com should be decoded, in which case the subsequent re-encoding of the name field would be correct, but this program belongs, I guess, to @af123.

So meanwhile, the following change prevents the double encoding, and thus corrects the display of the series name:
Code:
--- /mod/webif/html/browse/tvdb/banner.jim.org
+++ /mod/webif/html/browse/tvdb/banner.jim
@@ -15,7 +15,7 @@
        puts "
            <span id=tvdbseriesname title=\"Series ID: $seriesid\">
                <a target=_blank href=\"/db/?db=$seriesid.db\">
-                       [cgi_quote_html [$v get name]]
+                      [$v get name]
                </a>
            </span>
        "
 
OK.
So it worked for one night. Now it's back to naming things incorrectly.

I'm going to try your fix. So... I'm changing /mod/webif/html/browse/tvdb/banner.jim
Finding the code
Code:
[cgi_quote_html [$v get name]]

and changing it to
Code:
[$v get name]]

Hope that's right.
Now I'm going to deselect series and clear cache before re-selecting it again.

Thanks so much for the help.
 
You don't want the final ] (which is closing the call to cgi_quote_html), but otherwise, yes
 
Maybe you applied a WebIf update that undid the change?

I scheduled "Schitt's Creek" as a series (never previously scheduled), picked up last night's show (s2e7), synced the folder with theTVDB, and had no "&#39;" issues.
 
I am trying to get this working and following your instructions again.

After removing the series selection and clearing the cache and rebooting, I have noticed it is still able to rename the files. So I can not be clearing the cache properly.
I've tried restarting, I've tried running 'service lighttpd restart' in maintenance mode.

Can you offer any more help? It looks like this is why the problem persists. It only went away before when I removed WebIf, which I'd rather avoid doing again.
 
You can try running the diagnostic named "tvdbreset". This just deletes all the cached series data that has been downloaded from TheTVDB.com (not data that has already been incorporated into series folders).

Then use the "Change Series" button in the series folder display to re-associate the folder with a series from TheTVDB.com.

I just did this with "America's Untold Story" and had success.

What would be nice is a way to edit the "s?/e??" in the episode description without going to Sweeper or manually setting it with hmt +setseries= ...
 
You can use the Rename function on Browse Opt+ to change the synopsis and other fields
Attachment: 1609437081464.png
 
Indeed, but in the "Media Details" display you are shown something like "Episode: s?e??/?? name of episode" (where the ?s may have been instantiated as eg "s1e02/4" - is Sweeper the only way of setting the values apart from hmt?) and this isn't brought out into the Rename function.
 
Is this another instance where it would be useful to give Sweeper an "execute arbitrary command with tokens" action?
 
It would be quite useful to add an episode change option to the Rename function (as well as the arbitrary command option for Sweeper)

Episode is not a standard hmt field but one that @af123 squeezed into unused space
 
I had a look at the rename function.

The custom fields get set if they are 0 at the point where the file is listed in a Webif folder display (and so after a rename). Actually, the ones that I wanted to change would have been changed if the episode recogniser had known about "part m of n" as well as the many variants of "Ep. m/n" that it has already been taught. Manually setting the values has a limitation that 0 values will get reassigned from the synopsis text unless that is edited at the same time. Anyhow this is my WIP.
 
I am sorry but I am still having the same problem.

I think I have failed to clear the cache, because it carried on naming the files even when I removed the selected series.

I have tried running 'tvdbreset' and 'service lighttpd restart', and restarting.
Now I've even tried removing the WebIf, then re-selected the series and made the change above to '/mod/webif/html/browse/tvdb/banner.jim', yet it is still naming things incorrectly.

Any advice? It looks like I'm the only one with problems now...
 
Have you tried, as a test, scheduling a previously untouched series with a single quote in its title (examples that are being shown daily include "Schitt's Creek", "How It's Made") for record? Once the first episode has been recorded, sync it with TheTVDB.com and report back.
 
Thank you for your help.

It's still happening. I just set a new series and it did the same thing. See attached image.

I can confirm I'm using the updated
/mod/webif/lib/xml.class
and
/mod/webif/html/browse/tvdb/banner.jim

Web interface version: 1.4.8-8
Custom firmware version: 3.13 (build 4028)
Humax Version: 1.02.32 (kernel HDR_CFW_3.13)
Loader Version: a7.33


Any other advice?
 

Attachment: test.jpg

OK, the test I did was just with the WebIf file browser and not with Sweeper.

Could you repeat that test, ie:
  1. schedule a series with a ' in the name (examples above), ideally one that you haven't previously scheduled;
  2. after one episode has been recorded, sync the folder with theTVDB;
  3. open the "Media Details" dialogue (OK on the show in WebIf "Browse Media Files");
  4. check that the series name has been populated correctly with ' rather than &#39;.
If that works, please post your Sweeper rules (which are obscured in the image posted above) so that I can test them. You can copy-and-paste them as text from Sweeper "View Config".
 
I have not yet found a new series to try it on.
I did the test on American Dad in a new folder.
As you can see in the attached images, the episode name only uses &#39; when Sweeper is used.
The Sweeper rules are the de-dup ones built in to WebIf.

This example is odd as it also uses the wrong episode name, something I haven't noticed before.
In image 1 the synopsis says 'Shell Game: ......', which is the episode name, but for some reason it uses 'Portrait of Francine's Genitals' and then Sweeper uses &#39;.
 

Attachments: 1.jpg, 2.jpg, 3.jpg

Good: the episode name in Media Details is initially correct. Somehow it's getting changed incorrectly in Sweeper.

Most programme metadata, such as that shown in Media Details, is saved in the accompanying .hmt file. Unusually, the episode name is recomputed each time using a procedure like this:
  • Remove any common prefixes ("New series", etc) from the programme synopsis, and also anything after and including the first ":" .
  • Trim the result to 40 characters and make this the episode name (at this point we may also have some further metadata like series/episode number).
  • If the programme folder has been synced with theTVDB, try to match the programme against each series whose id is linked to the folder.
  • Use the EpisodeName from matching theTVDB <episode> content; otherwise use the previously guessed episode name.
This is what shows in the episode name field of Media Details.
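
The steps above can be sketched roughly as follows. This is an illustration in Python rather than the WebIf's actual Jim code: the function names, the `PREFIXES` list and the `tvdb_episode_name` argument are all hypothetical, and the matching against theTVDB data is reduced to "a matched name was found, or it wasn't".

```python
# Hypothetical prefix list; the description only gives "New series" as an example.
PREFIXES = ("New series", "Brand new series")


def guess_episode_name(synopsis, max_len=40):
    """Guess an episode name from the programme synopsis."""
    text = synopsis.strip()
    # Remove one common prefix, plus any punctuation that follows it.
    for prefix in PREFIXES:
        if text.lower().startswith(prefix.lower()):
            text = text[len(prefix):].lstrip(" .:-")
            break
    # Drop everything from the first ":" onwards.
    text = text.split(":", 1)[0]
    # Trim the result to at most max_len characters.
    return text[:max_len].strip()


def episode_name(synopsis, tvdb_episode_name=None):
    """Prefer the EpisodeName matched from theTVDB, else fall back to the guess."""
    return tvdb_episode_name or guess_episode_name(synopsis)


print(guess_episode_name("New series. Shell Game: Roger and Steve go on a trip."))
print(episode_name("Shell Game: Roger and Steve go on a trip.",
                   "Portrait of Francine's Genitals"))
```

The second call shows why a synced folder can end up with a name that differs from the synopsis: the name from theTVDB wins whenever a match is found.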

But when the same value is computed by Sweeper the undecoded XML entity is being shown. As far as I can see no web interfaces are used in running Sweeper's automatic processing, so there are no opportunities for incorrect coding-decoding. So the observed behaviour is consistent with the use of the old xml.class (which didn't decode numerically coded XML entities), or some defective version, as this would cause the entity to be shown decoded in the WebIf "Media Details" and not in Sweeper.
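
To illustrate the difference between a decoder that knows only the named XML entities and one that also handles numeric character references, here is a small Python sketch. It is not the code of xml.class; the function names are made up for the example, and the "named only" variant is just an assumption about what an older decoder might have done.

```python
import re

# The five entities predefined by XML.
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}


def decode_named_only(text):
    """Decode only the predefined named entities; numeric ones are left alone."""
    return re.sub(r"&(amp|lt|gt|quot|apos);",
                  lambda m: NAMED[m.group(1)], text)


def decode_entities(text):
    """Decode named entities and decimal/hex character references in one pass."""
    def repl(match):
        body = match.group(1)
        if body[:2] in ("#x", "#X"):
            return chr(int(body[2:], 16))   # hexadecimal reference, e.g. &#x27;
        if body.startswith("#"):
            return chr(int(body[1:]))       # decimal reference, e.g. &#39;
        return NAMED.get(body, match.group(0))  # unknown names stay as they are
    return re.sub(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z]+);", repl, text)


name = "Portrait of Francine&#39;s Genitals"
print(decode_named_only(name))
print(decode_entities(name))
```

With the first function the entity survives into the episode name, which is the symptom seen in the Sweeper-renamed files; a browser would hide the problem by decoding the entity itself when the same text is shown in a web page.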

Here's the XML text that I just fetched from theTVDB for the relevant episode.
Code:
<Episode>
	<id>5775508</id>
	<Combined_episodenumber>4</Combined_episodenumber>
	<Combined_season>14</Combined_season>
	<DVD_chapter>0</DVD_chapter><DVD_discid></DVD_discid>
	<DVD_episodenumber>4</DVD_episodenumber><DVD_season>13</DVD_season>
	<Director>|Rodney Clouden|</Director>
	<EpImgFlag></EpImgFlag>
	<EpisodeName>Portrait of Francine&#39;s Genitals</EpisodeName>
	<EpisodeNumber>4</EpisodeNumber>
	<FirstAired>2016-11-28</FirstAired><GuestStars></GuestStars>
	<IMDB_ID>tt5780926</IMDB_ID><Language>en</Language>
	<Overview>Stan is embarrassed when a painting of Francine&#39;s genitals, done by a famous artist, is unveiled at the museum. Steve turns to helping people after masturbation is ruined by his mom&#39;s portrait.</Overview>
	<ProductionCode>BAJN04</ProductionCode>
	<Rating>8.8</Rating><RatingCount>104</RatingCount>
	<SeasonNumber>14</SeasonNumber>
	<Writer>|Steve Hely|</Writer>
	<absolute_number>216</absolute_number><airsafter_season>0</airsafter_season>
	<airsbefore_episode>0</airsbefore_episode><airsbefore_season>0</airsbefore_season>
	<filename>episodes/73141/5775508.jpg</filename><lastupdated>1586482773</lastupdated>
	<seasonid>673132</seasonid>
	<seriesid>73141</seriesid>
	<thumb_added>2019-11-13 11:17:27</thumb_added>
	<thumb_height>360</thumb_height><thumb_width>640</thumb_width>
</Episode>
Similar text should be found in your /mod/var/tvdb/73141.xml, which you could inspect, eg with the WebIf Diagnostics>File Editor. To get the Sweeper episode name wrong, this must have been incorrectly parsed to extract "Portrait of Francine&#39;s Genitals" instead of "Portrait of Francine's Genitals".

Let's see that in action. In a telnet or webshell session, type jimsh and Enter, to start the Jim interpreter. Then enter the 4 commands shown after the initial '.' (Jim's prompt; don't type that):
Code:
# jimsh
Welcome to Jim version 0.79
. source /mod/webif/lib/setup
jscss
. require ts.class
. set tt [ts fetch {/media/My Video/American Dad!/American Dad!_20210111_2358.ts}]
<reference.<ts_____>.00000000000000000000>
. $tt episode_name
Portrait of Francine's Genitals
. exit
#
If the result of the command $tt episode_name isn't as shown, that would be very interesting.
As for the wrong episode name: the data from theTVDB.com (as quoted above) disagrees with the transmitted EPG data, whose synopsis gives 'Shell Game'. One would have to watch the show to know which is correct. The matcher (as described above) believes the series/episode number ahead of the guessed episode name, but you can manually override the episode selection from the TVDB data with the "Change" button.
 
Thanks again for your help.

1. Opening /mod/var/tvdb/73141.xml
This was too big to open in WebIF as were most of the .xml files. I managed to open one of them and it had the incorrectly encoded data in. See attached image 1.

2. Running commands in the Jim interpreter.
As you can see below in image 3, the result was incorrectly encoded. It took me a while, as I had to rename the file back after I had allowed Sweeper to rename it before; I'm not sure if this would change the result. I will try to find a fresh file to try it on.
 

Attachments: 1.jpg, 2.jpg, 3.jpg