Is the dedup process working properly?

#1
Hi All,

I used the de-duplicate method for the first time on a folder of 57 files (The A Team). The results said that 45 were duplicates. However, when I copied the URLs from the browser into Excel, extracted the hyperlink screen tips and checked them for duplicates, I only found 23. Before I put my faith in the dedup process, please could someone confirm that it isn't going to delete all 45 files, but only about half of them (which would be closer to the 23 I found manually)?

One thing I noticed is that the dedup results only show 40 characters of the description. The description is the key thing that determines whether programs are duplicates, and it often contains the series and episode number. Does dedup only look at the first 40 characters of the program description, or does it look at the entire description?
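To illustrate the concern above, here is a minimal Python sketch of how comparing only a 40-character prefix could inflate a duplicate count. It is not the dedup code itself; whether dedup actually compares truncated text is exactly the open question. The function name `count_duplicates` and the sample descriptions are made up for illustration.

```python
from collections import Counter

def count_duplicates(descriptions, limit=None):
    """Count entries that repeat an earlier entry.

    limit=None compares the full text; limit=40 compares only the
    first 40 characters (a hypothetical truncated comparison).
    """
    keys = [d if limit is None else d[:limit] for d in descriptions]
    counts = Counter(keys)
    # Every occurrence beyond the first of each key is a duplicate.
    return sum(n - 1 for n in counts.values())

# Made-up descriptions: same opening text, episode number after char 40.
descriptions = [
    "Action series following four ex-soldiers. S1 Ep1: first story.",
    "Action series following four ex-soldiers. S1 Ep2: second story.",
    "Action series following four ex-soldiers. S1 Ep2: second story.",
]

print(count_duplicates(descriptions, limit=40))  # truncated comparison
print(count_duplicates(descriptions))            # full comparison
```

With these samples the truncated comparison reports two duplicates and the full comparison reports one, because the episode number only appears after the 40th character.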

thanks

Rodp

Pic pasted below. Hopefully it comes through OK and shows the cutoff at 40 characters.
upload_2017-11-1_22-52-58.png

upload_2017-11-1_22-58-43.png
 

Ezra Pound

Well-Known Member
#2
I'm guessing that there is a 40-character limit for a file name, so the 'proposed file name' has been truncated. It's just a question of which 40 it picks.
 

rodp

Member
#3
I see. Is there a way to make sure it picks out "Sx Ep xx" style text? A bit of regex?

I'm not sure it's finding the correct duplicates at the moment, as it's not looking at the whole text. I've attached the full info in a spreadsheet showing the difference between what dedup finds and what I found manually. I've put an x by the ones that are duplicates.
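As a rough idea of what "a bit of regex" could look like, here is a small Python sketch that pulls a series and episode number out of a description. The pattern and the `episode_key` helper are assumptions for illustration only. They are not part of dedup, and the exact text formats used in the real descriptions may differ.

```python
import re

# Assumed formats: "S1 Ep3", "S01E03", "Series 1, Episode 3".
EPISODE_RE = re.compile(
    r"\bS(?:eries)?\s*(\d+)\s*,?\s*Ep?(?:isode)?\.?\s*(\d+)\b",
    re.IGNORECASE,
)

def episode_key(description):
    """Return (series, episode) as integers, or None if nothing matches."""
    match = EPISODE_RE.search(description)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# Made-up sample descriptions.
print(episode_key("Action adventure. S3 Ep12. The team head south."))
print(episode_key("Series 2, Episode 10: a made-up description."))
print(episode_key("No numbering in this one."))
```

Two recordings could then be treated as the same episode when their keys match, regardless of where in the description the numbering appears.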

Thanks

Rodp
 

af123

Administrator
Staff member
#4
If you use theTVDB integration, it will do a much better job. Just tell it that it's The A-Team.

upload_2017-11-2_14-15-55.png
 

rodp

Member
#5
Thanks af123, that definitely helps. Is there a way to keep the label 'The A-Team' in the name but still add the other details when you do a dedup?

Thanks

Rodp
 
OP
#6
Hi All, just wanted to pick this up again. Will de-duplicate (with TVDB set up) find duplicate entries and choose the longest one to keep, or does it just work on a first come, first served basis?

I ask because I just noticed my Humax had hung and so wasn't recording. It was just at the start of a program, so I quickly cycled the power and manually set the program recording (via the webif remote). I've missed a couple of minutes from the beginning. However, the same episode might be on later in the week, so I'd like to know whether dedup would keep the longer of the two.
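For clarity, the "keep the longest" behaviour being asked about could be sketched in Python like this. It only illustrates the policy; nothing in this thread confirms that dedup works this way. The `pick_longest` function, the tuple layout and the sample names and durations are all hypothetical.

```python
from collections import defaultdict

def pick_longest(recordings):
    """Group recordings by episode key and keep the longest in each group.

    recordings: iterable of (name, episode_key, duration_seconds) tuples.
    Returns (keep, drop) as lists of names.
    """
    groups = defaultdict(list)
    for rec in recordings:
        groups[rec[1]].append(rec)

    keep, drop = [], []
    for recs in groups.values():
        best = max(recs, key=lambda r: r[2])  # longest duration wins
        keep.append(best[0])
        drop.extend(r[0] for r in recs if r is not best)
    return keep, drop

# Made-up data: the first recording missed the opening two minutes.
recordings = [
    ("ep2_monday", (1, 2), 2880),
    ("ep2_thursday", (1, 2), 3000),
    ("ep3_friday", (1, 3), 3000),
]

keep, drop = pick_longest(recordings)
print("keep:", keep)
print("drop:", drop)
```

A first come, first served policy would instead keep `ep2_monday` here, which is the shortened recording.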

Thanks

Rodp
 