record, replay & [detect-ads] overload

hairy_mutley

I have recently started running [detect-ads] (0.2.2-6) on selected folders using sweeper rules. However, it seems that this is causing overloading of the processor and/or the hdd.
This is the scenario; recording on 2 channels and sweeper decrypt & detect-ads (no cropping) seems to be ok.
Throw in a chase play and there is a problem. The play periodically stops, drops out and breaks up.
Later analysis reveals that the part of the recording that was being played was actually OK; it was just the simultaneous 2 * record, play and detect-ads that caused the playback to be corrupt (and, not surprisingly, both of the recordings being made at the same time were also corrupt).
Looking at the sysmon processor load history, it appears that the decode & detect-ads produced a processor load that was periodically near 100% for about half an hour (to process a 1 hour programme).
Apparently the high processor load combined with the disk throughput for 2 records, 1 play and 1 double speed read-write was too much for the system to cope with.

This presents a problem: much as I would like to run detect-ads, it is not feasible to let it run like this. I cannot imagine that using chase processing would be any better (unless someone knows otherwise).
Am I the only person seeing this problem, or is there something different about my configuration? While recording and playing, I would be quite happy to slow or suspend decode & detect-ads if this were possible. Any suggestions?
 
Is the disk properly aligned? Try the 4kalign diagnostic. I don't use detectads routinely but I have never seen it cause this problem (although the 100% CPU is expected)
 
Are your disk accesses retrying? This would push the load up.
How would I check?

Is the disk properly aligned? Try the 4kalign diagnostic. I don't use detectads routinely but I have never seen it cause this problem (although the 100% CPU is expected)
This is an original 500G disk, I don't think this applies. I think that when the 4kalign diagnostic was introduced, I ran it and it said as much. I can re-run this evening just to make sure.
 
All you can do is run the disk checker and see what it says.

I ran the disk checker within the last month, it reported no problems. SMART status was OK as well. In any case, unless there was a problem with a large proportion of the disk, I would expect retries to be a passing problem that corrected itself when record or replay passed the bad area.
However, I will try to do a new disk check just in case anything has changed.
 
CPU usage in DetectAds has always been a potential concern. Since the original version it has used a queue to serialize processing, and even when running in chaseget mode with two simultaneous recordings only one recording is actually being processed at any time. So chaseget should not have a larger impact, and perhaps less, since the work is spread over the duration of the recording.
However there has never been any attempt to take into account what else is going on at the same time and I think it could be quite hard to do.

I guess from the time taken that you are processing HD files, I always record in SD since I have a poor signal and I can't really notice any significant difference in picture quality.

The CPU hog in the process is ffmpeg which extracts the audio component of the file. I have considered trying to write my own audio extractor to avoid the need to use ffmpeg but have never got a round tuit (been working on another new package instead)
I don't know if the new version 3 ffmpeg is any better - performance isn't listed as an improvement in the blurb :(

A quick google last night suggested using cpulimit with ffmpeg so I might download that package and see what difference it makes.
 
... even when running in chaseget mode with two simultaneous recordings only one recording is actually being processed at any time. So chaseget should not have a larger impact, and perhaps less, since the work is spread over the duration of the recording...
I had overlooked the benefit that chaseget offers in slowing the processing. However, as you say, a second simultaneous recording will be background processed later so still has the potential to cause problems. I might give it a try if I can do so without too much risk of annoying the wife further!
cpulimit sounds interesting, hope that it shows some benefits.
 
However, as you say, a second simultaneous recording will be background processed later so still has the potential to cause problems.
That's not quite what I said.
The two chaseget processes run simultaneously. When one process has retrieved all the available video for the recording it is processing, it sleeps for 30 seconds to allow more video to accumulate, which allows the other process to get the lock and process the available material for its recording. So if you look at the task list there will be two copies of detectads, chaseget, ffmpeg, silence and nsplice in the system, but only one set will be active and the other dormant, with a periodic switch between the sets.
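
The alternation described above can be illustrated with a small shell sketch. This is not the detectads source: the lock directory path, the function names and the use of mkdir as the lock are all assumptions made purely for illustration.

```shell
#!/bin/sh
# Sketch only: NOT the detectads source. Two workers share one lock so that
# only one of them does heavy work at a time.
LOCK_DIR=${LOCK_DIR:-/tmp/chase-sketch.lock}   # hypothetical lock path

# mkdir is atomic: it succeeds for exactly one caller while the lock is free.
try_lock() { mkdir "$LOCK_DIR" 2>/dev/null; }
unlock()   { rmdir "$LOCK_DIR" 2>/dev/null; }

# chase_loop NAME ROUNDS PAUSE
# Each round: take the lock, "process" whatever has accumulated, release the
# lock, then sleep so the other worker gets a turn.
chase_loop() {
    name=$1; rounds=$2; pause=$3
    done_rounds=0
    while [ "$done_rounds" -lt "$rounds" ]; do
        if try_lock; then
            echo "$name: processing available video"
            unlock
            done_rounds=$((done_rounds + 1))
        fi
        sleep "$pause"
    done
}
```

Running "chase_loop rec1 3 30 & chase_loop rec2 3 30 & wait" would show the two workers taking turns, with each one idle for 30 seconds after its burst of work.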
 
Seem to have found a file that detectads cannot process. Overnight it processed several files before getting to this one, then it locked up the box (on but unresponsive), had to power it down. Powered it up again (about 6:30), but having come home from work, I find it locked again. The detectads log shows nothing after reporting "22/03/2016 06:28:03 DA(3109)- ==DETECTADS Chase Run:" this morning. Sysmon is showing no data after 8:30. I have removed the detectads queue file to prevent it from attempting to process the file again.
Any suggestions for diagnostics?
 
I have seen this before. During the initial chaserun, the original recording (in 'My Video', for example) is copied in sections to a file with '-inp' appended to the filename in the folder '/mod/tmp'. This is processed to give a decrypted '-dec' file in the same location (with sidecars). At the end of the run, the original in 'My Video' is deleted, the '-dec' file is moved to 'My Video' and renamed, and the '-inp' file is deleted. Sometimes it goes wrong during the recording and the process stops writing to the '-dec' file. Then when the recording has finished, the process files ('-inp' and '-dec') are orphaned in '/mod/tmp' and detectads tries to chaserun the already completed recording in My Video. Perhaps because of the orphaned files in '/mod/tmp' this fails and the unit ultimately locks up.
I would navigate to '/mod/tmp' and delete any orphaned process files. If the original in My Video, for example, has been decrypted I would then manually queue it for background processing.
Ideally, when the above happens, any orphaned files in '/mod/tmp' would be deleted automatically. Also, I don't know why it tries to chaserun an already completed recording. In this eventuality, I'd have thought that it would be better to allow the original recording to be decrypted by the regular Web-If process, then post-processed by detectads.
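
For anyone wanting to check for leftovers from a telnet session, a minimal sketch follows. It only lists candidates and deletes nothing. The glob patterns are an assumption about the exact file names (the description above only says that '-inp' and '-dec' are appended), and list_orphans is a hypothetical helper name.

```shell
#!/bin/sh
# list_orphans [DIR]: print leftover chaserun work files, one per line.
# The '*-inp*' and '*-dec*' patterns are an assumption about the exact names.
list_orphans() {
    dir=${1:-/mod/tmp}
    find "$dir" -maxdepth 1 -type f \( -name '*-inp*' -o -name '*-dec*' \) | sort
}
```

Call it as "list_orphans" (or "list_orphans /some/other/dir") and review the output before removing anything. While a chaserun is in progress its work files will also appear in the list, so they are only orphans when nothing is being recorded or processed.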
 
Thanks for the information and I will look into automatic cleanup of orphaned files

Sometimes, for reasons unknown, decryption fails but succeeds if the file is processed later. To handle this scenario detectads creates a queue entry for later so that if detectads fails the file is already on the queue for reprocessing, if detectads succeeds (normal case) the queue entry is removed.
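
The "queue first, remove on success" idea can be sketched as below. This is an illustration of the pattern only, not the detectads code: the queue file path, its one-name-per-line format and the function names are all hypothetical.

```shell
#!/bin/sh
# Sketch only: a plain text file with one entry per line stands in for the
# real queue, whose format is not described in this thread.
QUEUE_FILE=${QUEUE_FILE:-/tmp/queue-sketch.txt}   # hypothetical path

enqueue() { echo "$1" >> "$QUEUE_FILE"; }

# Remove every line that exactly matches the given entry.
dequeue() {
    grep -vxF -- "$1" "$QUEUE_FILE" > "$QUEUE_FILE.new" || true
    mv "$QUEUE_FILE.new" "$QUEUE_FILE"
}

# process_with_retry FILE COMMAND...
# Queue the file before running COMMAND on it; drop the entry only if the
# command succeeds, so a failure (or a crash) leaves it queued for later.
process_with_retry() {
    file=$1; shift
    enqueue "$file"
    if "$@" "$file"; then
        dequeue "$file"
    fi
}
```

The point of queuing before processing is that nothing has to run after a failure for the retry to happen; the entry is already there.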

Chaserun processing is used on completed files if they haven't been decrypted. I don't have autodecrypt turned on because I don't have any need for it.
 
I have compiled the cpulimit program and created a cpulimit package in the repository.

I haven't yet integrated it into detectads but you can still try it out without any code change.

From a telnet session enter
Code:
cpulimit -l 50 -e ffmpeg
where the -l value is the cpu % and -e the process name.

Unfortunately this simplistic method won't work for chaseget with simultaneous recordings: it only limits the first program instance found, so for the second instance you would need to specify the process id.
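
One way to cover every running instance is to start a separate cpulimit for each process id. The sketch below is untested on the box and makes assumptions: that the packaged cpulimit accepts -p for a process id (as upstream cpulimit does), that pidof is available, and that limit_each and DRY_RUN are hypothetical names introduced here.

```shell
#!/bin/sh
# limit_each LIMIT PID...: start one cpulimit per process id.
# Set DRY_RUN=1 to print the commands instead of running them.
limit_each() {
    limit=$1
    shift
    for pid in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "cpulimit -l $limit -p $pid"
        else
            # Run in the background so each instance gets its own limiter.
            cpulimit -l "$limit" -p "$pid" &
        fi
    done
}
```

For example, "limit_each 50 $(pidof ffmpeg)" would attach a 50% limit to each ffmpeg that is running at that moment. Instances started later would need another call.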

Let me know if it seems to reduce the occurrence of dropouts and other faults when the box is busy.

Of course it does slow things down a bit
Some limited testing with a 30-minute 1GB HD recording (CPU limit - time taken, min:sec):
25% - 19:09
50% - 10:09
75% - 8:04
100% - 7:59
 
What % CPU does ffmpeg use when left to its own devices? 75% and 100% are essentially the same in your table with respect to processing time, and limiting to 50% still only makes it take 25% longer to process. You probably wouldn't notice the effect of a 50% limit in practice based on your data but it may make the system much less prone to lock-ups.
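
The percentages can be checked from the quoted timings with a few lines of shell, assuming the times are minutes:seconds. The helper names to_seconds and extra_percent are made up for this sketch, and the result is truncated to a whole percent.

```shell
#!/bin/sh
# to_seconds MM:SS: convert a min:sec time to seconds.
# Leading zeros are stripped so that "09" is not read as an octal number.
to_seconds() {
    m=${1%%:*}; s=${1##*:}
    m=${m#0}; s=${s#0}
    echo $(( ${m:-0} * 60 + ${s:-0} ))
}

# extra_percent BASE TIME: how much longer TIME is than BASE, in whole percent.
extra_percent() {
    base=$(to_seconds "$1")
    t=$(to_seconds "$2")
    echo $(( (t - base) * 100 / base ))
}

# Extra time relative to the unlimited run (7:59) for each limit in the table.
for t in 19:09 10:09 8:04; do
    echo "$t takes $(extra_percent 7:59 "$t")% longer than 7:59"
done
```

On the quoted figures this gives 139%, 27% and 1% extra time for the 25%, 50% and 75% limits respectively, which matches the "about 25% longer" estimate for the 50% limit.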
 
Left to its own devices, ffmpeg seems to fluctuate between 50 and 80%, so I am not surprised that 75% has little effect.

I will probably make the % a detectads setting - chaseget users will be able to use a lower percentage and still have detection complete close to the end of recording.
 
Just for information as I am not sure what we can do about it, but I am finding that cpulimit is reporting "Segmentation fault".
Summarising the last few runs.
It seems that when it finds an ffmpeg to attach to, it reports its process id. When detectads moves on to the next file, cpulimit reports the new id. This it did for several hours covering 7 process id switches. When detectads finishes, cpulimit reports "No process found" every 2 or 3 seconds. I have seen it do between 20 and 50 of these before it segfaults. I have re-started it several times, but each time (in this current run of trials) it seems that with "No process found" it eventually segfaults.
 