Prompted by recent discussions in other threads, I've been working on updating the web interface to use data from TheTVDB.com, initially to improve episode name detection. They provide an API, which I've hooked into with appropriate caching, and the latest published version of the web interface, 1.2.2-3, has test support for this, although it is not enabled by default.
What you will see with webif 1.2.2-3 is a new field in the pop-up window that appears when you click on a recording in the web interface:
Without TVDB integration, this field shows the series and episode number if it has been able to extract them from the synopsis. A lot of the providers have begun to add this information, which is helpful; however, they each use a slightly different format and aren't even consistent within it, which isn't.
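To illustrate why the varying formats are a nuisance, here is a minimal Python sketch of pulling series and episode numbers out of a synopsis. The three patterns (`S2 Ep3`/`S02E03`, `Series 2, episode 3` and `(3/6)`) are assumed examples chosen for illustration, not the list the web interface actually checks, and `parse_series_episode` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Hypothetical example formats only; real providers vary more than this.
# Each pattern has an "episode" group and, where available, a "series" group.
PATTERNS = [
    # "S2 Ep3", "S02E03", "s2 ep 3"
    re.compile(r"\bS(?P<series>\d{1,2})\s*,?\s*Ep?\s*(?P<episode>\d{1,3})\b", re.I),
    # "Series 2, episode 3"
    re.compile(r"\bSeries\s+(?P<series>\d{1,2})\s*,?\s*episode\s+(?P<episode>\d{1,3})\b", re.I),
    # "(3/6)": episode 3 of 6, series unknown
    re.compile(r"\((?P<episode>\d{1,3})/\d{1,3}\)"),
]


def parse_series_episode(synopsis: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (series, episode); either may be None if not found."""
    for pattern in PATTERNS:
        m = pattern.search(synopsis)
        if m:
            groups = m.groupdict()
            series = groups.get("series")  # absent for the "(n/m)" form
            return (int(series) if series else None, int(groups["episode"]))
    return (None, None)
```

The patterns are tried in order and the first hit wins, so a synopsis that happens to contain two formats is resolved by whichever pattern is listed first.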
It also shows the episode name that it extracted from the synopsis (which is the one used by dedup). In this case it hasn't been very successful, because the episode name isn't there to be found.
If I now enable the test TVDB integration using the tvdb/on diagnostic:
Much better for this case. The debug area shows how the episode was found, along with the synopsis from TheTVDB.com. The number after the series name in the Episode line (82459) is clickable and takes you to the database viewer, showing the episode database that has been cached for this series. All of the cached data is stored in /mod/var/tvdb/.
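The caching idea can be sketched as a per-series file cache with an expiry time. This is a minimal Python illustration, not how the web interface stores its data: the one-JSON-file-per-series layout, the seven-day `max_age` and the `fetch` callable (standing in for the real TheTVDB.com request) are all assumptions.

```python
import json
import os
import time
from typing import Any, Callable, Dict


def load_series(series_id: int, cache_dir: str,
                fetch: Callable[[int], Dict[str, Any]],
                max_age: float = 7 * 24 * 3600) -> Dict[str, Any]:
    """Return episode data for a series, using an on-disk JSON cache.

    `fetch` is a hypothetical callable that performs the real API request;
    it is only called when there is no fresh cached copy.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{series_id}.json")
    try:
        if time.time() - os.path.getmtime(path) < max_age:
            with open(path, encoding="utf-8") as f:
                return json.load(f)
    except (OSError, ValueError):
        pass  # missing, unreadable or corrupt cache file: fetch again
    data = fetch(series_id)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(data, f)
    os.replace(tmp, path)  # swap in the new file in one step
    return data
```

Writing to a temporary file and renaming it means a reader never sees a half-written cache entry.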
Here's an example where it was able to extract the episode name and then use that to find the series and episode information:
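A lookup by episode name might look something like the following minimal Python sketch. It assumes the cached episode list is a list of dicts with hypothetical `season`, `episode` and `name` keys, and it uses a simple normalised exact match, which is an illustrative choice rather than the web interface's actual comparison.

```python
import re
from typing import Dict, List, Optional


def normalise(title: str) -> str:
    # Lower-case, drop punctuation and collapse whitespace so that
    # "The Fox's Earth" and "the foxs earth" compare equal.
    title = re.sub(r"[^\w\s]", "", title.lower())
    return " ".join(title.split())


def find_by_name(name: str, episodes: List[Dict]) -> Optional[Dict]:
    """Return the first cached episode whose name matches, else None."""
    target = normalise(name)
    for ep in episodes:
        if normalise(ep["name"]) == target:
            return ep
    return None
```

Once a match is found, its `season` and `episode` fields supply the series and episode information that the synopsis lacked.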
And another where it had to try to find the episode using phrase matching between the broadcast and TVDB synopses:
This is the most problematic case, and it is where I need testing and help with the algorithm. There are often shared phrases between the two synopses, such as "hunt for the missing remains of what is believed" in the example above, but not always. This method currently works very well for some series and fails completely for others.
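For anyone wanting to experiment, one common way to score shared phrases is word n-gram overlap. The Python sketch below is a generic technique, not the web interface's current algorithm; the `overview` key, the 4-word phrase length and the `min_shared` threshold are assumptions to tune.

```python
import re
from typing import Dict, List, Optional, Set, Tuple


def ngrams(text: str, n: int = 4) -> Set[Tuple[str, ...]]:
    """Set of all n-word phrases in the text, lower-cased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def best_phrase_match(synopsis: str, episodes: List[Dict], n: int = 4,
                      min_shared: int = 1) -> Optional[Dict]:
    """Return the episode whose overview shares the most n-word phrases
    with the broadcast synopsis, or None if none reach min_shared."""
    wanted = ngrams(synopsis, n)
    best, best_score = None, 0
    for ep in episodes:
        score = len(wanted & ngrams(ep.get("overview", ""), n))
        if score > best_score:
            best, best_score = ep, score
    return best if best_score >= min_shared else None
```

A longer phrase length reduces false matches on stock wording but misses episodes whose synopses have been paraphrased, which may be one reason results vary so much between series.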
I'll write up details of the current algorithm in the next post, but if anyone wants to enable this and have a play, please do! It isn't currently used by dedup or sweeper, only by this pop-up box in the web interface.
There is one more case: the episode number can be determined but the series cannot. When this happens, the system uses the episode number to narrow down the list of possibilities before applying phrase matching.
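The narrowing step could be sketched like this in Python. The `episode` key and the pluggable `match` callable (whatever phrase-matching routine is in use) are assumptions, and returning a lone candidate without phrase matching is an illustrative design choice, not necessarily what the web interface does.

```python
from typing import Callable, Dict, List, Optional

Matcher = Callable[[str, List[Dict]], Optional[Dict]]


def narrow_then_match(episode_no: int, synopsis: str,
                      episodes: List[Dict], match: Matcher) -> Optional[Dict]:
    """Filter by episode number across all series, then phrase-match."""
    # Keep only episodes with this number, whichever series they are in.
    candidates = [ep for ep in episodes if ep.get("episode") == episode_no]
    if not candidates:
        return None            # the number doesn't exist in the cached data
    if len(candidates) == 1:
        return candidates[0]   # unique: no phrase matching needed
    return match(synopsis, candidates)
```

Narrowing first means the phrase matcher only has to separate, say, episode 3 of each series rather than every episode ever broadcast, which should cut down on false matches.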