[DetectAds] Announcing DetectAds version 2

I had a recording that failed to process (chase run) earlier:
Code:
05/04/2016 18:58:01 RM(4607)-   DETECTADS: Checking /media/My Video/Stargate SG-1/Stargate SG-1_20160405_1858.ts (Pick) for channel exclusion
05/04/2016 18:58:01 RM(4607)-   DETECTADS: Started /media/My Video/Stargate SG-1/Stargate SG-1_20160405_1858.ts for chaserun advert detection
05/04/2016 18:58:01 DA(4639)- ==DETECTADS Chase Run: /media/My Video/Stargate SG-1/Stargate SG-1_20160405_1858.ts
05/04/2016 19:00:01 - CG(6131)- System power on attempted
05/04/2016 20:28:01 DA(18183)- ==DETECTADS Chase Run: /media/My Video/Stargate SG-1/Stargate SG-1_20160405_1858.ts
From the log it seems that, after the initial attempt (there was no error message in Web-If), the file was queued for a chase run again about half an hour after the recording had finished. The unit then crashed and went into standby about another half hour later. After switching the unit back on, the recording was decrypted by the Web-If autoprocess. After I removed it from the detectads queue and started the analysis manually, the analysis ran successfully on the decrypted file. There were no orphaned files in '/mod/tmp'.
 
I have just uploaded new versions of DetectAds and ChaseGet that attempt to improve the diagnostics when ffmpeg detects problems in the input file. They also try to prevent the same file being processed twice simultaneously and to stop multiple retry attempts.

M.E.T., can you turn on debug logging (Settings - General Settings) so that we can see more detail of what is going on before it hangs? There should have been no need to remove the file from the queue and run it manually.
 
I've updated the detectads and chaseget packages and turned on debug logging. Two recordings were fine; a third, which is still ongoing, has stopped writing to the '-dec.ts' file. The log file is repeating entries like those below. I will leave it running to see what happens.
Code:
06/04/2016 23:32:06 - CG(25854)- duration: 8643 elapsed: 5469 remaining: 3174
06/04/2016 23:32:06 - CG(25854)-   /mod/tmp/Oblivion_20160406_2200-inp.ts - Not yet indexed, trying helper.
06/04/2016 23:32:09 - CG(25854)-   /mod/tmp/Oblivion_20160406_2200-inp.ts - Can't use helper. retry
06/04/2016 23:32:24 - CG(25854)- duration: 8643 elapsed: 5487 remaining: 3156
06/04/2016 23:32:24 - CG(25854)-   /mod/tmp/Oblivion_20160406_2200-inp.ts - Not yet indexed, trying helper.
06/04/2016 23:32:27 - CG(25854)-   /mod/tmp/Oblivion_20160406_2200-inp.ts - Can't use helper. retry
 
Was auto decrypt processing another recording (check auto.log) while you were getting the retry messages?
Did they eventually stop and detectads complete?
 
Auto decrypt was not processing another recording while the detectads run was ongoing and the retry messages were being logged.
To add to the above post (#226), for the first half hour or so there were two recordings ongoing and both were being processed by Detectads. The half-hour recording was processed correctly, but the longer recording (according to the detectads log) seems to have stopped writing to the '-dec' file just before the shorter recording completed. After that point, the log just consists of retry messages.
When the longer recording finished, nothing happened, with Detectads still posting retry messages to the log file. This limbo situation carried on for about another hour and fifteen minutes, but with the box fully functional. At this point the box locked up and needed to be switched off and back on again (rear switch) to get it working. After doing this, Detectads then worked as expected. The partially written '-dec' file was deleted and the recording was requeued for chaserun processing (the original recording in 'My Video' was not processed by auto decrypt). After the 'chaserun' processing of the recording, the '-inp' files and the original (in 'My Video') were deleted and the processed recording was moved and renamed as expected. I've attached the detectads log below (it is quite long).
I'm fine with the occasional processing failure. In the above situation, the best outcome would have been for detectads to stop trying to write to the '-dec' file much earlier (at the latest by the end of the recording), then clean up the partially written files and restart the analysis automatically, without the 'limbo' period and without locking up the unit.
 

Attachments

  • detectads-log.txt
    400.6 KB
Eureka, please upgrade to chaseget 0.1.1-4

The key in the log was the line
Code:
06/04/2016 22:34:51 - CG(25854)-   /mod/tmp/Oblivion_20160406_2200-inp.ts - database is locked
That error caused the dlnahelper lock to be left held, so subsequent calls could never acquire it.
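As a general illustration only (this is not the actual chaseget code; acquire/release below are trivial stand-ins), the defensive pattern is to make sure the lock is released on every exit path, even when the guarded script throws an error such as "database is locked":
Code:
# General pattern only, not the real fix: the lock is always released on the
# way out, whether the guarded script succeeded or raised an error.
proc acquire {name} { puts "taking lock $name" }
proc release {name} { puts "releasing lock $name" }

proc with_lock {name script} {
    acquire $name
    set rc [catch {uplevel 1 $script} result]
    release $name                       ;# runs on success and on error
    return -code $rc $result
}

# Even though the body fails, the lock is still released:
catch { with_lock dlnahelper { error "database is locked" } } msg
puts "guarded script failed with: $msg"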

It is a bug that af123 and I thought we had fixed back in July last year but obviously not comprehensively enough.
I don't know why you are experiencing the database locked condition more frequently than others.

It might be worth trying to rebuild the DLNA index (after you have tested that this fix works for you).
The system crash is possibly due to a memory leak that accumulates to a critical point with all the retries, but I am guessing; I haven't any experience of monitoring memory use on Linux systems.
 
Thanks for investigating this. I will rebuild the DLNA index too: I haven't done that for a while. I have several units and for some reason this one runs into problems more often than the others, though I can't find anything wrong with the hardware, and fix-disk reports no problems. If the DLNA database gets locked, would briefly putting the unit into standby get it working again? The unit would not shut down completely, but the DLNA server would be turned off until the second on/off signal was sent.
 

I was a bit unclear in my problem description before; there are actually two different locks involved here.
  • The DLNA database lock, which the SQLite database uses to ensure updates are applied consistently. This should only be held for a very short time whilst an update is applied, so it is normally very rare to actually get the 'locked' response to a DB query; that is why I thought your database might be running a bit slower than usual - not that it is permanently locked. (A minimal retry sketch follows this list.)
  • The DLNA Helper lock, which is part of the custom firmware and is used to serialize decryption processing; it is held for an extended period whilst decryption takes place. The temporary failure of the database access meant this lock was never released, which caused the problem.
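The usual way round a transient 'database is locked' reply is to retry the query after a short pause rather than treat it as fatal. A minimal plain-Tcl sketch (not the actual chaseget code; the sqlite3 Tcl package and the database path are assumptions for illustration):
Code:
# Sketch only: retry briefly when SQLite reports "database is locked".
package require sqlite3
sqlite3 db /mod/tmp/example.db          ;# illustrative path, not the real DLNA database

proc try_eval {sql {attempts 5}} {
    for {set i 0} {$i < $attempts} {incr i} {
        if {![catch {db eval $sql} result]} {
            return $result              ;# success
        }
        if {![string match "*database is locked*" $result]} {
            error $result               ;# a different failure - give up at once
        }
        exec sleep 1                    ;# locked: wait briefly and retry
    }
    error "database still locked after $attempts attempts"
}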
 
The system crash is possibly due to a memory leak that is accumulating to a critical point with all the retries but I am guessing and haven't any experience in monitoring storage use on Linux sytems.
Check using the "free" command periodically. You'll soon spot if something is leaking. I think it's unlikely.
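If it helps, here is a tiny plain-Tcl sketch (the log path is just an example) that appends a timestamped snapshot of free once a minute, so a leak shows up as a steadily shrinking free figure:
Code:
# Illustrative only: log memory usage once a minute to spot a slow leak.
set log /mod/tmp/memwatch.log           ;# path is just an example
while {1} {
    set fh [open $log a]
    puts $fh "==== [string trim [exec date]]"
    puts $fh [exec free]
    close $fh
    exec sleep 60
}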
 
I am repeatedly getting this message
Code:
10   08/04/2016 13:08:02 DA(3737)- Cannot acquire exclusive lock /media/My Video/4Alan/_series_ts/Sanctuary/Sanctuary_20160407_2000.ts, terminating.
9    08/04/2016 13:08:01 DA(3737)- ==DETECTADS Chase Run: /media/My Video/4Alan/_series_ts/Sanctuary/Sanctuary_20160407_2000.ts
I have restarted the box (several times), removed the episode from the queue (allowing following DA jobs to run), put it back on the queue and it locks again.
I have just reset the DLNA database, and have seen the error again since.
Any suggestions?
 
This is yet another lock, totally unrelated to the DLNA database, that is intended to prevent two occurrences of detectads processing the same file.
It is possible that the hashing used to create a lock number is not producing a unique value, resulting in a false conflict.
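For illustration only (this is not the real detectads code), this is the sort of thing that can go wrong when a path is boiled down to a numeric key: with a simple multiply-and-add hash, two different file names can end up with the same key and so look like a conflict.
Code:
# Illustrative hash only - reduce a path to a numeric lock key.
proc lockkey {path {buckets 65536}} {
    set h 0
    foreach ch [split $path ""] {
        set h [expr {($h * 31 + [scan $ch %c]) % $buckets}]
    }
    return $h
}

# Two different recordings, same key - a false "already locked" conflict.
puts [lockkey "/media/My Video/Example_Aa.ts"]
puts [lockkey "/media/My Video/Example_BB.ts"]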

Could you post the output from the following commands:
Code:
 lsof -i
 pgrep -lf detectads
 pgrep -lf chaseget
 pgrep -lf ffmpeg
 
Could you post the output from the following commands:
Thanks, here goes...
Code:
humax# lsof -i
COMMAND   PID   USER   FD   TYPE DEVICE SIZE NODE NAME
dnsmasq   156 nobody    4u  IPv4    376       UDP *:domain
dnsmasq   156 nobody    5u  IPv4    377       TCP *:domain (LISTEN)
utelnetd  163   root    3u  IPv4    383       TCP *:telnet (LISTEN)
utelnetd  163   root    4u  IPv4 165379       TCP humax:telnet->192.168.0.3:57153 (ESTABLISHED)
humaxtv   288   root   30u  IPv4   3197       TCP *:50001 (LISTEN)
humaxtv   288   root   32u  IPv4   3199       UDP humax:57129
humaxtv   288   root   33u  IPv4   3200       UDP *:1900
humaxtv   288   root   34u  IPv4   3203       TCP *:9000 (LISTEN)
humaxtv   288   root   35u  IPv4   3204       TCP *:9001 (LISTEN)
humaxtv   288   root   40u  IPv4   3210       TCP *:9955 (LISTEN)
lighttpd 1580   root    5u  IPv4   2046       TCP *:www (LISTEN)
lighttpd 1583   root    5u  IPv4   2046       TCP *:www (LISTEN)
lighttpd 1592   root    5u  IPv4   2046       TCP *:www (LISTEN)
portmap  1637 daemon    4u  IPv4   2119       UDP *:sunrpc
portmap  1637 daemon    5u  IPv4   2120       TCP *:sunrpc (LISTEN)
mdnsd    2016   root    6u  IPv4   2854       UDP *:5353
mdnsd    2016   root    7u  IPv4   2855       UDP *:hostmon
betaftpd 2040   root    0u  IPv4   2865       TCP *:ftp (LISTEN)
smbd     2351   root   10u  IPv4   3089       TCP *:netbios-ssn (LISTEN)
nmbd     2417   root    7u  IPv4   3142       UDP *:netbios-ns
nmbd     2417   root    8u  IPv4   3143       UDP *:netbios-dgm
nmbd     2417   root    9u  IPv4   3144       UDP humax:netbios-ns
nmbd     2417   root   10u  IPv4   3145       UDP humax:netbios-dgm
humax# pgrep -lf detectads
humax# pgrep -lf chaseget
humax# pgrep -lf ffmpeg
humax#
 
Is it still failing if you add it to the queue now?
What happens if you rename the program slightly?

I will try to recreate the issue, but I haven't been able to using your file name, so I am not sure what is going on.
 
It was still in the queue and failing.
Had some trouble removing it from the queue: the DetectAds dialog gave me a greyed-out 'remove from queue' button (no 'add to queue' button); it seems you can't refresh the page, you have to leave it and return to it. I managed to remove it.
Having added it again, it reported the lock conflict again.
Removed it from the queue, renamed it, added it to queue... it is now running.
Solves my problem, hope it gives you some clues.
 
Processing completed OK.

While I am here, going back to the subject of the corrupted recordings I had when recording two channels, watching one and running detectads in the background.
I have been running with a cpulimit of 30 in the hope that it would help. My verdict is that it does not; in some ways it just makes things worse, because detectads takes longer to run, so the problem manifests itself for longer.
I may be wrong, but this suggests to me that the problem comes more from hard disk bandwidth than from CPU.
(2 HD channels recording, 1 HD channel playing, plus detectads playing and recording at faster than real time)
As a result, I currently run ALL detectads jobs by queuing them manually when the system will not be recording
- initially I would quickly load the sweeper detectads rule at "My Video" and run the rule, but that isn't effective now that the global has been removed from the rule (due to an existing problem with global).

So, it would be nice if ... :)
- detectads only ran when nothing was recording
- the problem could be fixed with the global sweeper rule
 

I now know what the problem is with my locking scheme, but at the moment I have no idea how to fix it :( AFAIK there is currently no exclusive file locking from JIM/TCL.

I think you are right that cpulimit has limited effect when recording HD programs in parallel - any value less than about 45% slows processing to less than real time.

The detectads queue already contains a target start time (used for requeuing); I could make that available via the command line so you could use it to schedule detectads to run when the system is idle.
 
I now know what the problem is with my locking scheme but at the moment I have no idea how to fix it :( AFAIK there is currently no exclusive file locking from JIM/TCL.

Bit of a hack, but you could do a mkdir/rmdir?

i.e. make a dir with a name encoded from the filename, using that dir's existence as the lock. A dir is easier than a file, as mkdir can then be both the taking and the testing of the lock.
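Something along those lines might look like this from Jim/Tcl (a rough sketch; the lock directory naming is purely illustrative). /bin/mkdir without -p is atomic and fails if the directory already exists, so creating it both tests and takes the lock in one step:
Code:
# Sketch of a mkdir-based exclusive lock for Jim/Tcl.
proc acquire_lock {file} {
    set dir "/mod/tmp/.dalock_[string map {/ _ { } _} $file]"
    if {[catch {exec mkdir $dir}]} {
        return ""                       ;# another process already holds the lock
    }
    return $dir
}

proc release_lock {dir} {
    if {$dir ne ""} { catch {exec rmdir $dir} }
}

set lock [acquire_lock "/media/My Video/Example_20160408_2000.ts"]
if {$lock eq ""} {
    puts "Cannot acquire exclusive lock, terminating."
} else {
    # ... run the advert detection here ...
    release_lock $lock
}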
 