[dedup] command line de-duplication

af123

Administrator
Staff member
#1
I've updated the dedup (command line de-duplication) package. This was one of the very early packages and, for most people, has long been superseded by the web interface de-duplication. It has bothered me for a while that the two used different logic, so this update unifies them: both now use the same backend modules and will stay in step.

I use the command line tool to automatically batch process recordings as they are completed - I'm planning to roll that up into an auto-dedup package when I get some time.
 
#2
Hi af123,

Thanks for the update. I thought I'd mention a couple of modifications I've made to my running version of this, in case they prove useful.

1. Another common prefix to remove: 'CBBC.' This often appears on episodes of Shaun the Sheep recorded in the morning, which are shown at the same time on the CBBC channel.

2. I added another line to process.jim to remove question marks from the file name as well as the other special characters, since rsync does not generally like question marks and usually fails to transfer the files.

Code:
    # Escape special characters to create the filename.
    regsub -all -- {[\/ &]} $syn "_" fn
    regsub -all -- {[?]} $fn "" fn
I may not have done this in the best way, but it works as far as I can tell. Adding the question mark to the character class in the first line would replace it with an underscore, which looks odd when it is the last character in the name.

I also still use a modified copy of the old bash script version (/mod/bin/dedup) periodically from a crontab. I pass it a second parameter (in addition to '-yes') telling it which folder to process. If I can also figure out how to get it to remove question marks I'll be onto a winner, but it's a steep learning curve!
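For the shell script side, the same two-step clean-up can be sketched with `tr`. This is only an illustration under assumptions: `sanitise_name` and `sanitise_one_step` are hypothetical helpers, not functions from /mod/bin/dedup, and the programme title is just sample input.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the dedup package.
# Step 1 maps "/", space and "&" to "_"; step 2 deletes "?" outright.
sanitise_name() {
    printf '%s' "$1" | tr '/ &' '___' | tr -d '?'
}

# For comparison: folding "?" into the first set leaves a trailing "_".
sanitise_one_step() {
    printf '%s' "$1" | tr '/ &?' '____'
}

sanitise_name "Who Do You Think You Are?"; echo
# prints: Who_Do_You_Think_You_Are
sanitise_one_step "Who Do You Think You Are?"; echo
# prints: Who_Do_You_Think_You_Are_
```

Deleting the question mark in a separate step is what avoids the odd trailing underscore.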

Cheers
 
af123

Administrator
Staff member
#3
Thanks for that. I'll add your changes in the next version. I expect the prefix list will grow over time and need changing whenever there is a staff change at any channel!
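A growing prefix list can be handled by looping over the prefixes. The following is only a sketch in shell, not the package's actual code: `strip_prefixes` and the sample title are made up for illustration, and 'CBBC.' is the only prefix taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: strip_prefixes NAME PREFIX...
# Removes each listed prefix from the start of NAME, if present.
strip_prefixes() {
    name=$1
    shift
    for p in "$@"; do
        case $name in
            "$p"*)
                name=${name#"$p"}   # drop the prefix itself
                name=${name# }      # and one following space, if any
                ;;
        esac
    done
    printf '%s\n' "$name"
}

strip_prefixes "CBBC. Shaun the Sheep" "CBBC."
# prints: Shaun the Sheep
```

Passing the prefixes as separate arguments means new ones can be added without touching the function, and a prefix may contain spaces.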

This new script will already take directory names as arguments - it just defaults to the current directory if none are provided, so you should be able to use this in your cron entries if you want. The only issue at the moment is that it will rename things which are still recording (as will the shell script version) - I'll fix that in the next update though.
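The "directories as arguments, else the current directory" behaviour follows a common pattern, sketched here in shell. This is not the dedup source: `process_dir` is a hypothetical stand-in for the real work, and the paths are placeholders.

```shell
#!/bin/sh
# Hypothetical stand-in for the real de-duplication step.
process_dir() {
    printf 'would process: %s\n' "$1"
}

run() {
    # With no arguments, fall back to the current directory.
    [ "$#" -gt 0 ] || set -- .
    for dir in "$@"; do
        process_dir "$dir"
    done
}

run                          # no arguments: uses "."
run "/path/one" "/path/two"  # placeholder paths
```

A cron entry would then simply pass the folder to process as an argument.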
 
af123

Administrator
Staff member
#4
The webif update I just pushed out includes the dedup changes: the new CBBC prefix and removal of ? characters from filenames.