[dedup] command line de-duplication

Discussion in 'HD/HDR-FOX T2 Customised Firmware' started by af123, Apr 12, 2012.

  1. af123

    af123 Administrator Staff Member

    I've updated the dedup (command line de-duplication) package. This was one of the very early packages which has long been superseded by the web interface de-duplication for most people. It's bothered me for a while that the logic used by the two was different so this update unifies that. They both now use the same backend modules for the logic so will stay in step.

    I use the command line tool to automatically batch process recordings as they are completed - I'm planning to roll that up into an auto-dedup package when I get some time.
     
  2. rpb424

    rpb424 Member

    Hi af123,

    Thanks for the update. I thought I'd mention a couple of modifications I've made to my running version of this, in case they prove useful.

    1. Another common prefix to remove - 'CBBC.', this often appears on episodes of Shaun the Sheep recorded in the morning that are also shown at the same time on the CBBC channel.

    2. I added another line to process.jim to remove question marks from the file name as well as the other special characters, since rsync does not generally like question marks and usually fails to transfer the files.

    Code:
        # Escape special characters to create the filename.
        regsub -all -- {[\/ &]} $syn "_" fn
        regsub -all -- {[?]} $fn "" fn
    I may not have done this in the best possible way, but it works as far as I can tell. Adding the question mark to the character class in the first line results in it being replaced by an underscore, which looks odd when it is the last character in the name.

    I also still periodically run a modified copy of the old bash script version (/mod/bin/dedup) from a crontab. I pass it a second parameter (in addition to '-yes') telling it which folder to process. If I can also figure out how to get it to remove question marks I'll be onto a winner, but it's a steep learning curve!
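    For the shell script, a minimal sketch of the same two substitutions might look like the following. The function name sanitise_name is hypothetical and not part of the dedup package, and it assumes the character class in the Jim code above matches '/', space and '&'. It sticks to tr, which should be available in most shells.

```shell
# Hypothetical helper (not part of the dedup package): mirrors the two
# regsub calls from process.jim using plain tr.
sanitise_name() {
    # First tr: replace each '/', space and '&' with an underscore.
    # Second tr: delete every '?' outright instead of replacing it.
    printf '%s' "$1" | tr '/ &' '___' | tr -d '?'
}

# Example: prints Tom___Jerry (each replaced character becomes its own "_")
sanitise_name 'Tom & Jerry?'
echo
```

    Deleting the question mark in a separate step, rather than adding it to the first set, avoids a trailing underscore when the name ends in '?'.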

    Cheers
     
  3. af123

    af123 Administrator Staff Member

    Thanks for that, I'll add in your changes to the next version. I expect the prefix list will grow over time and need changing whenever there is a staff change at any channel!

    This new script will already take directory names as arguments - it just defaults to the current directory if none are provided, so you should be able to use this in your cron entries if you want. The only issue at the moment is that it will rename things which are still recording (as will the shell script version) - I'll fix that in the next update though.
     
  4. af123

    af123 Administrator Staff Member

    The webif update I just pushed out includes the updates to dedup: the new CBBC prefix and the removal of ? characters from filenames.