Bookmark changes between 5.1 and Stereo audio?

florca

Member
Hi - This is well beyond my knowledge of the .ts / .m2ts stream format, but how difficult would it be to detect and bookmark changes between 5.1 and Stereo audio on the BBC / Ch4 / Ch5 HD services? In part this is because each change generates a loud squeal from my aging amplifier, but if easily detected might it be a faster (and more accurate?) way than the current DetectAds silence detection to pinpoint programme start / end on BBC, and ad breaks on Ch4 / Ch5 for a lot of films and drama?
I've also found that Emby gets confused by material with changes in the number of audio channels, so for 5.1 stuff I want to keep I spend quite a bit of time detecting and editing out the Stereo sections with VideoRedo (which also doesn't have a quick way to pinpoint these transitions), so bookmarks could speed this up a lot...
Best wishes and thanks for such a useful set of enhancements for my two Humax HDRs :thumbsup:
Phil
 
ffprobe (which is supplied with ffmpeg) can give you this :-
Code:
Humax1# ffprobe -i "KAISER CHIEFS DD5_1_20110410_0147.ts" -show_streams -select_streams a:0


Input #0, mpegts, from 'KAISER CHIEFS DD5_1_20110410_0147.ts':
  Duration: 00:04:19.81, start: 22892.639744, bitrate: 11571 kb/s
  Program 17472
    Metadata:
      service_name    : BBC HD
      service_provider: BBC
    Stream #0:0[0x65]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(tv, bt709, top first), 1440x1080 [SAR 4:3 DAR 16:9], 25 fps, 50 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x66](eng): Audio: aac_latm (LC) ([17][0][0][0] / 0x0011), 48000 Hz, 5.1, fltp
    Stream #0:2[0x6a](eng): Audio: aac_latm (HE-AACv2) ([17][0][0][0] / 0x0011), 48000 Hz, stereo, fltp (visual impaired) (descriptions) (dependent)
    Stream #0:3[0x69](eng): Subtitle: dvb_subtitle ([6][0][0][0] / 0x0006)
    Stream #0:4[0x96]: Unknown: none ([5][0][0][0] / 0x0005)
    Stream #0:5[0x6e]: Unknown: none ([11][0][0][0] / 0x000B)
    Stream #0:6[0x6f]: Unknown: none ([11][0][0][0] / 0x000B)
    Stream #0:7[0x82]: Unknown: none ([11][0][0][0] / 0x000B)
  Program 17540
    Metadata:
      service_name    : BBC One HD
      service_provider: BBC
  Program 17604
    Metadata:
      service_name    : ITV1 HD
      service_provider: ITV
  Program 17664
    Metadata:
      service_name    : Channel 4 HD
      service_provider: CHANNEL FOUR
[STREAM]
index=1
codec_name=aac_latm
codec_long_name=AAC LATM (Advanced Audio Coding LATM syntax)
profile=LC
codec_type=audio
codec_time_base=1/48000
codec_tag_string=[17][0][0][0]
codec_tag=0x0011
sample_fmt=fltp
sample_rate=48000
channels=6
channel_layout=5.1
bits_per_sample=0
id=0x66
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/90000
start_pts=2060337577
start_time=22892.639744
duration_ts=23299200
duration=258.880000
bit_rate=N/A
max_bit_rate=N/A
bits_per_raw_sample=N/A
nb_frames=N/A
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=0
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0

But I can't see anywhere that a change in stream type would be recorded.
 
What's "Emby"?

In part this is because each change generates a loud squeal from my aging amplifier
I take it you are so wedded to 5.1 you wouldn't consider turning off surround and just using stereo output.

DetectAds silence detection to pinpoint programme start / end on BBC
Theoretically, you don't need to do that - the AR flag should be sufficient to mark the start of a BBC programme, and there are no ad breaks to detect.

if easily detected might it be a faster (and more accurate?) way than the current DetectAds silence detection
...but not universally available. Silence detection works(ish) for all services - HiDef or StDef, with or without multi-channel audio - and is close to real-time so how fast do you need it?

However, all that aside, a process to detect soundtrack format changes would need to:
  1. Decrypt the stream;
  2. Extract the audio track descriptor;
  3. Recognise format changes;
  4. Log the Δt for each format change;
  5. Add bookmarks to the .hmt after the recording has completed.
Steps 2 and 3 are currently unknowns. Little doubt ffmpeg could be used to perform step 2, but then some knowledge (or observation) would be required to resolve step 3.

Bookmarks only have a resolution of 1s - is that fine enough for what you have in mind?
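With only 1 s resolution, each Δt would have to be rounded to whole seconds before it could become a bookmark. A minimal sketch, assuming the transition times are already available as elapsed seconds, one per line. The function name "round_to_bookmarks" is hypothetical, and nothing here touches the .hmt format itself:

```shell
#!/bin/bash
# Hypothetical helper: round elapsed transition times (seconds, one per line on
# stdin, e.g. 18.453333) to the nearest whole second, the resolution that
# Humax bookmarks support. Assumes the times are non-negative.
round_to_bookmarks() {
  awk '{ printf "%d\n", $1 + 0.5 }'
}
```

Rounding to the nearest second means a bookmark can land up to half a second either side of the real transition, which is the margin to weigh against the 1 s question above.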

You could kick the process off by experimenting with ffmpeg and a .ts, to see whether you can extract something that looks like an "audio track descriptor" we can examine and further extract Δt from.
 
ffprobe (which is supplied with ffmpeg) can give you this :-
Yes... I've tried some experiments with "ffprobe -read_intervals" which will confirm whether there is multi-channel audio present at (say..) 5 mins into a recording but there doesn't seem to be a straightforward way to detect and pinpoint transitions using it or ffmpeg (or vlc, which I've been using to re-mux without transcoding to mp4 / ADTS audio).
What's "Emby"?
Similar but I think better than Plex - It's a multi-client library application which I have running on a headless Pi4b server where I store all of the media I want to keep. The presentation and metadata support is excellent and it has a very good native client for my LG TV. I use the Humax (only one in service at the moment) for recording as it's far more reliable than anything else I've found, but edit and move anything I want to keep long-term onto the Emby server. More info at emby.media (not allowed to post external links yet... )
the AR flag should be sufficient to mark the start of a BBC programme
The AR flag is great and works really well on the Humax to capture the broad start and end, but usually starts at the beginning of the ident with Stereo audio, which then changes after a few seconds or so to 5.1 (with the aforementioned squeal) for multi-channel content. To be honest the BBC content is easily edited, but from my observations with VideoRedo edits using DetectAds bookmark timings, the 5.1/Stereo/5.1 transitions provide slightly more accurate cut-points than the Silence detection and also give a consistent audio stream.
 
I think I may have stumbled across something of interest. I was playing around with the silence detect filter in ffmpeg using :-
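ffmpeg -hide_banner -vn -i "KAISER CHIEFS DD5_1_20110410_0147.ts" -af silencedetect=noise=-30dB:d=0.5 -f null - > silence.txt 2>&1
and got this from a DD5.1 file (screen dump, as the output didn't find its way into the text file: silencedetect reports on stderr, and with the redirections in the order "2>&1 > silence.txt" stderr still goes to the screen, so the order above is needed to capture it) :-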

[Screenshot: silence.jpg]
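Once the stderr output is captured in a file, the silencedetect results appear as "silence_start:" and "silence_end:" lines. A minimal sketch that pulls them out as start,end pairs in seconds; the function name "silence_pairs" is hypothetical, and any other lines in the log (such as whatever reported the channel change) are simply skipped:

```shell
#!/bin/bash
# Hypothetical helper: turn captured ffmpeg silencedetect output (on stdin)
# into "start,end" pairs in seconds, one silence per line. Lines that don't
# contain silence_start:/silence_end: are ignored.
silence_pairs() {
  awk '
    /silence_start:/ {
      for (i = 1; i <= NF; i++) if ($i == "silence_start:") start = $(i + 1)
    }
    /silence_end:/ {
      for (i = 1; i <= NF; i++) if ($i == "silence_end:") print start "," $(i + 1)
    }'
}
```

For example, "silence_pairs < silence.txt" would list each detected silence on its own line.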
 
That's interesting! I'll try this out on some more samples, but from your example it's not obvious how to derive the position where the transition occurs?
 
That's interesting! I'll try this out on some more samples, but from your example it's not obvious how to derive the position where the transition occurs?
@njm, the original implementer of detectads, did look into the ffmpeg silencedetect filter way back in 2014 but used the silence program instead.

I haven't changed the basic detection method since taking over development of the detectads package though I did update silence to support the chaserun processing.
I have wondered whether silence could be changed to use one of the faster to extract sound formats but I am not familiar enough with the details of the various file formats to know how practical that would be.
 
This isn't about an alternative way to detect silences; it's about detecting audio format switching.

Media players commonly get upset by stream format changes: they tend to assume the format assigned at the start of the file is maintained throughout the file (not an unreasonable assumption for a media file, not so good if the source is a live stream).

It's no good looking at the media info either, as it is likely to report only the characteristics at the start of the file. What's needed is to look at the info embedded throughout the stream, or in an accompanying control stream (if there is one).
 
from your example it's not obvious how to derive the position where the transition occurs?
In my example there is a time stamp either side of the line in question, so it appeared between 246.215 seconds and 246.741 seconds from the start of the file. However, I suppose the timestamps may not always be there.
MymsMan : the original implementer of detectads, did look into the ffmpeg silencedetect filter way back in 2014 but used the silence program instead
Apart from any other problems there might be, I think the main problem is how long it takes to run this on the Humax. It's very slow; in fact I didn't wait for it to complete, and used a desktop PC instead.
 
I think the main problem is how long it takes to run this on the Humax, it's very slow
I use Avidemux for video editing and, with a selfish hat on, I can live with changes of codec either side of breaks for commercials (just edit out the ads).

The big problem has always been when there is a switch from stereo to 5.1 near the start of a programme. Usually this switch comes after the AR start and, if the recording begins in stereo, Avidemux and others treat the whole programme as stereo. The only easy workaround I have found is to crop the original so that it starts in 5.1. If there is a way to check, say, just the first couple of minutes after AR start (or padding) and bookmark it for cropping, that would be a huge help.
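Given a per-frame list of "pts,channels" lines like the ones ffprobe produces with "-show_frames -show_entries frame=pkt_pts_time,channels" (as used later in this thread), the crop point is just the first 6-channel frame. ffprobe's "-read_intervals" option (e.g. "%+120") should restrict the scan to the first two minutes. A minimal sketch assuming that CSV format; the function name "first_51_offset" is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: read "pts,channels" lines on stdin and print the offset
# in seconds from the first frame to the first 6-channel (5.1) frame.
# Prints nothing if no 5.1 frame is found.
first_51_offset() {
  awk -F, '
    NR == 1 { first = $1 }
    $2 == 6 { printf "%.6f\n", $1 - first; exit }'
}
```

The printed offset is measured from the first frame in the file, so it could be used directly as a crop or bookmark position.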
 
A bit more web searching turned up this, which looks like it might show a way to quickly parse the file for audio channel changes. Will try some more tests...
Code:
www.reddit.com/r/ffmpeg/comments/g140t0/correcting_stream_data_for_aac_51_audio/
 
I've extracted the interesting/relevant bit from the above:

www.reddit.com/r/ffmpeg/comments/g140t0/correcting_stream_data_for_aac_51_audio/ said:
Firstly, if you demux the AAC audio from the MKV file to a *.latm file with FFMPEG, it keeps the LATM frame headers for the AAC packet in the file.

Code:
ffmpeg -i incorrect-audio-file.mkv -c:a copy incorrect-audio.latm

Then, looking at the created LATM file in a hex editor, and comparing the bytes with the AAC LATM/LOAS spec in the ISO standard (ISO14496-3-2009 - can be found on Google), after a bit of figuring out I worked out the problem.

If the first bit of the fourth byte of each LATM packet is 0, then that packet contains 'AudioSpecificConfig' data. This 'AudioSpecificConfig' data stores the audio type, the sample rate, and the number of channels in the stream. I worked out that this data immediately follows the header in these packets, so occurs in bytes 6 and 7 of the LATM packet, arranged as -

Audio Object (5 bits)
Sample Rate Frequency Index (4 bits)
Number of channels (4 bits)
Other data (that we're not worried about for this) (3 bits)

In the problem audio streams, I saw the following data -

Audio object: 00010 (0x2 - AAC-LC - this is correct)
Sample rate frequency index: 0011 (0x3 - 48kHz - this is also correct)
Number of channels: 0010 (2 - this is why it's reporting as stereo!)

The implication is that a change in the "number of channels" field in the AudioSpecificConfig data can be used as a marker.
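The quoted bit layout can be checked with plain shell arithmetic. A minimal sketch, assuming the two bytes have already been located (the "bytes 6 and 7" position is the Reddit poster's finding and isn't verified here). The field called "number of channels" in the quote is the AudioSpecificConfig channelConfiguration, where 2 is stereo and 6 is 5.1. The function name "decode_asc" is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: decode the first 16 bits of an AAC AudioSpecificConfig.
# Arguments: the two bytes as hex without the 0x prefix, e.g. "11 90".
# Layout (most significant bit first): audioObjectType (5 bits),
# samplingFrequencyIndex (4 bits), channelConfiguration (4 bits),
# remaining 3 bits ignored.
# Prints "object_type,frequency_index,channel_config".
decode_asc() {
  word=$(( (0x$1 << 8) | 0x$2 ))
  aot=$(( (word >> 11) & 0x1F ))   # top 5 bits
  sfi=$(( (word >> 7) & 0x0F ))    # next 4 bits
  chan=$(( (word >> 3) & 0x0F ))   # next 4 bits
  echo "$aot,$sfi,$chan"
}
```

"decode_asc 11 90" gives "2,3,2" (AAC-LC, 48 kHz, stereo), the same values as in the quote, while "decode_asc 11 b0" gives "2,3,6", i.e. 5.1.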
 
OK, after a fair bit of learning about ffprobe options I have a way forward which I think will work quite nicely with a bit of scripting. It's not very Humax-centric (all of the testing has been done on a Pi4B) but should be adaptable to any platform with ffmpeg / ffprobe.

Using the following command on a decrypted Humax .m2ts (.ts) file "S01E05 - Episode 5.m2ts" from Channel4 HD with a mix of Stereo and 5.1 content, selected at a PTS timestamp of 2.45 seconds into the programme where I know there's a transition in the next few frames...
Code:
ffprobe -hide_banner -read_intervals 2.45%+#20 -select_streams a:0 -i "S01E05 - Episode 5.m2ts" -show_frames -show_entries frame=pkt_pts_time,channels -of csv=nk=1:p=0  1>adata.txt 2>ainfo.txt
will generate the following adata.txt file:
Code:
2.447556,2
2.468889,2
2.490222,2
2.511556,2
2.532889,2
2.554222,2
2.575556,2
2.596889,2
2.618222,2
2.639556,2
2.660889,2
2.682222,6
2.703556,6
2.724889,6
2.746222,6
2.767556,6
2.788889,6
2.810222,6
2.831556,6
2.852889,6
So I know that the 5.1 (6-channel) content starts at PTS timestamp = 2.682222 seconds.
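Rather than reading down the list by eye, the transition can be picked out by comparing each frame's channel count with the previous one. A minimal sketch over the same "pts,channels" output; the function name "channel_transitions" is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: read "pts,channels" lines (from a file given as $1, or
# stdin) and print one line per change in channel count:
# "pts_of_first_new_frame,old_channels,new_channels".
channel_transitions() {
  awk -F, '
    NR > 1 && $2 != prev { print $1 "," prev "," $2 }
    { prev = $2 }' "${1:--}"
}
```

Run as "channel_transitions adata.txt", the output above would reduce to the single line "2.682222,2,6".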

The PTS timestamp for the first (2-channel) frame in the file is:
Code:
-15.771111,2
which means I have to add 15.771111 seconds to any PTS channel-transition value used for cut points.
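Put another way, the cut point is the transition PTS minus the PTS of the first frame; with a negative first PTS, that amounts to adding 15.771111. A minimal sketch using awk, since shell arithmetic is integer-only; the function name "pts_to_elapsed" is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: elapsed seconds from the first frame.
# $1 = PTS of the transition frame, $2 = PTS of the first frame in the file.
pts_to_elapsed() {
  awk -v t="$1" -v f="$2" 'BEGIN { printf "%.6f\n", t - f }'
}
```

"pts_to_elapsed 2.682222 -15.771111" prints 18.453333, the -ss value used for the first segment.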

Now I have a way to identify the transition points :)

Running the ffprobe command across the whole input file:
Code:
ffprobe -hide_banner -select_streams a:0 -i "S01E05 - Episode 5.m2ts" -show_frames -show_entries frame=pkt_pts_time,channels -of csv=nk=1:p=0  1>adata1.txt 2>ainfo.txt
took c. 30 seconds and generated a data file of 2.2MB for a one hour programme, which I manually searched for transition points (in this case Start, End and three ad-breaks) to generate the following cut-and-splice script
(NB = 2.682222 + 15.771111 = 18.453333 to identify the start of the first segment):
Code:
ffmpeg -i "S01E05 - Episode 5.m2ts" -c:v copy -c:a copy -ss 18.453333 -to 926.400001 -async 1 cut1.m2ts 1>aout.txt 2>aerr.txt
ffmpeg -i "S01E05 - Episode 5.m2ts" -c:v copy -c:a copy -ss 1170.453333 -to 1798.400001 -async 1 cut2.m2ts 1>>aout.txt 2>>aerr.txt
ffmpeg -i "S01E05 - Episode 5.m2ts" -c:v copy -c:a copy -ss 2042.410667 -to 2399.424 -async 1 cut3.m2ts 1>>aout.txt 2>>aerr.txt
ffmpeg -i "S01E05 - Episode 5.m2ts" -c:v copy -c:a copy -ss 2643.413333 -to 3213.44 -async 1 cut4.m2ts 1>>aout.txt 2>>aerr.txt
ffmpeg -i "concat:cut1.m2ts|cut2.m2ts|cut3.m2ts|cut4.m2ts" -c copy result.m2ts 1>>aout.txt 2>>aerr.txt
which took 2 minutes and produced a clean .m2ts file with consistent 5.1 audio and cut points exactly surrounding the ad-breaks :thumbsup:
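The manual search of the full-scan file could be replaced by a single pass that prints the start and end (elapsed seconds) of every 6-channel run, ready for -ss / -to. A minimal sketch assuming the "pts,channels" CSV above. The function name "six_channel_segments" is hypothetical, and as a design choice each run ends at the first frame after it. A run that reaches the end of the file is printed with an empty end, so -to can be left off:

```shell
#!/bin/bash
# Hypothetical helper: list the 5.1 (6-channel) runs in a "pts,channels" CSV
# (file given as $1, or stdin) as "start,end" in seconds elapsed from the
# first frame. Each run ends at the first frame that follows it; a run that
# reaches end-of-file gets an empty end field.
six_channel_segments() {
  awk -F, '
    NR == 1 { first = $1 }
    { t = $1 - first }
    $2 == 6 && !in6 { start = t; in6 = 1 }
    $2 != 6 && in6  { printf "%.6f,%.6f\n", start, t; in6 = 0 }
    END { if (in6) printf "%.6f,\n", start }' "${1:--}"
}
```

Each output row would then become one "ffmpeg -i ... -c:v copy -c:a copy -ss START -to END" line, as in the cut-and-splice script above.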

The final stage for my purposes is to re-mux to .mp4, which works better with my Emby setup. ffmpeg won't do this (it refuses to remux LATM audio to ADTS in an MP4 container) but vlc will, so this command generates a clean .mp4 file with 5.1 ADTS audio and (copied) h.264 1920x1080 video.
Code:
cvlc --no-repeat --no-audio --no-video --no-sout-spu --no-loop -I dummy "result.m2ts" --play-and-exit --sout='#transcode{}:std{access=file,mux=mp4,dst="result.mp4"}'

So... that's proved the principle, now to do a bit of scripting to automate the process...
 
After quite a few "learning opportunities" I have a basic BASH script which works on both my Raspberry Pi and the Humax to remove the Stereo sections from a mixed 5.1 and Stereo .ts (.m2ts) Humax recording. It's my first ever BASH script (everything else has been hacked Perl, an arcane SNA network management language called NCL and, 40 years ago, IBM 370 Assembler), so there are probably a lot of stylistic howlers, but it seems to work OK for me at any rate. For my use it's designed to be called from another script which converts to .mp4 and adds metadata and chapter marks, but on its own it will simply overwrite the original .ts file (in Humax .m2ts format so it can be re-Sidecar'd if needed).

For a one hour Channel4 HD 5.1 programme with three Stereo ad-breaks and Stereo Start & End sections it took 28 mins to complete on the Humax and 4 mins on a Pi4B.
So far I've only run it from within the same directory as the target file, so not sure how it will respond to working across directories, but thought I'd publish now as an early beta in case useful...

* Beware that this will overwrite the original file without retaining a backup, so only test with material you don't value - or take a backup of it first! *

Non-abusive feedback welcome...

Bash:
#!/bin/bash
#------------------------------------------------------------------------------------------------------------
# BASH script to determine whether a Humax .ts media file contains 5.1 audio and if so remove any Stereo section(s)
# Input parameters: $1 - Name of .ts media file, with or without the .ts extension
# Output: the same .ts file, cut and spliced to remove any Stereo sections if they exist. Exits with file untouched if no 5.1 audio at +240 seconds
#                  $basename.log file containing STDERR output from ffmpeg / ffprobe and other progress info
#         
# Calls / depends on: ffmpeg / ffprobe to analyse and edit the programme
#                     sed / grep / uniq Linux commands
#                     BASH
#---------------------------------------------------------------------------------------------------------------

# Derive basename of input file
root="${1%.*}"

# Check we have the required files and exit with message to .log file if not found!

[ ! -f "$root".ts ] && \
echo "Required file "$root".ts missing!" >>"$root".log && \
exit 5;

# Look at the number of channels four minutes into the programme and exit if less than three channels of audio

channels=$(ffprobe -hide_banner -loglevel warning -read_intervals 240%+#1 -select_streams a:0 -i "$root".ts -show_frames -show_entries frame=channels -of csv=nk=1:p=0 2>>"$root".log);

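# An empty result (e.g. ffprobe failed, or the recording is shorter than 4 minutes) counts as 0 channels
if (( ${channels:-0} < 3 )); then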

echo "File only has "$channels" audio channels at +4 minutes - exiting with no action"
exit 1

fi ;

echo "File has "$channels" audio channels at +4 minutes - processing file to remove any Stereo sections"
date
ls -l "$root".ts

#
#  We have multi-channel audio!
#
#  Initialise a log file to capture progress and ffmpeg errors


date >"$root".log
echo "File has "$channels" audio channels at +4 minutes - processing file to remove any Stereo sections" >>"$root".log
ls -l "$root".* >>"$root".log


# Initialise the bash file for cutting and splicing work

echo "#!/bin/bash" >"$root".sh

# Generate a .pts file with a list of the number of audio channels in each frame

ffprobe -hide_banner -loglevel warning -select_streams a:0 -i "$root".ts -show_frames -show_entries frame=pkt_pts_time,channels -of csv=nk=1:p=0  1>"$root".pts 2>>"$root".log;

# Convert comma delimiters to spaces to feed into uniq
sed 's/,/ /g' "$root".pts | \
# Skip the PTS field and grab only the lines with the first occurrence of each channel indicator in a segment
uniq -f 1 | \
# Put the commas back to create a match file for grep
sed 's/ /,/g' >"$root".pty
# Match each change in channel number and couple with the prior line to get end points. Ensure whole-line matching!
grep -w -F -B 1 -f "$root".pty "$root".pts >"$root".ptt
# Remove the group separators (--no-group-separator option not available in Humax BusyBox)
grep , "$root".ptt >"$root".ptu
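# We always have 6 decimal places, so remove the decimal point to allow integer arithmetic, then strip
# leading zeros (keeping any minus sign) so that bash doesn't treat values such as 0682222 as octal
sed -e 's/\.//' -e 's/^\(-\{0,1\}\)0*\([0-9]\)/\1\2/' "$root".ptu >"$root".ptw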

# Convert the PTS timestamps to real elapsed seconds

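: >"$root".ptz
fpts=""
while IFS=, read -r pts chan; do
  # The first frame defines time zero
  [ -z "$fpts" ] && fpts="$pts"
  rpts=$((pts - fpts))
  # Zero-pad to at least 7 digits so the decimal point can go back in before the last 6
  printf '%07d,%s\n' "$rpts" "$chan" >>"$root".ptz
done < "$root".ptw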

# Put the decimal points back
sed 's/........$/.&/g' "$root".ptz >"$root".ptx

# Generate the Cut and Splice file

phase="Starts"
concat=concat:
count6=0
count2=0

while IFS=, read -r pts chan; do
  [ $chan -eq 2 ] && ((++count2));
  if [ $phase == "Starts" ]; then
  spts="$pts"
  schan="$chan"
  phase="Ends"
  else
  [ $chan -eq 6 ] && ((++count6)) \
  && echo "ffmpeg -hide_banner -loglevel error -stats -i \""$root".ts\" -c:v copy -c:a copy -ss "$spts" -to "$pts" -async 1 \""$root""$count6".m2ts\" 1>>\""$root".log\" 2>>\""$root".log\"" >>"$root".sh \
  && concat=""$concat"\""$root""$count6".m2ts\"\\|"
  phase="Starts"
  fi
done < "$root".ptx


# If there are no Stereo sections then just clean up and exit, leaving file untouched

  if [ $count2 -eq 0 ]; then
  echo "No Stereo sections found - exiting with no changes to file. See \""$root".log\" for more information"
  echo "Copy of "$root".ptx - the derived list of sections" >>"$root".log
  cat "$root".ptx >>"$root".log
  echo "Copy of "$root".sh - the ffmpeg cut-and-splice list" >>"$root".log
  cat "$root".sh >>"$root".log
  echo "No Stereo sections found - exiting with no changes to file" >>"$root".log
  rm "$root".pt* 
  rm "$root".sh
  exit 0
  fi
 
# If input ends on a 6 channel segment then include this to end
 
  if [ $schan -eq 6 ]; then
  ((++count6))
  echo "ffmpeg -hide_banner -loglevel error -stats -i \""$root".ts\" -c:v copy -c:a copy -ss "$spts" -async 1 \""$root""$count6".m2ts\" 1>>\""$root".log\" 2>>\""$root".log\"" >>"$root".sh
  concat=""$concat"\""$root""$count6".m2ts\"\\|"
  fi

# Concatenate all the segments together...

echo "ffmpeg -hide_banner -loglevel error -stats -i "${concat%??}" -y -c copy \""$root".m2ts\" 1>>\""$root".log\" 2>>\""$root".log\"" >>"$root".sh;

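# And go do the edit, only overwriting the original .ts file with the spliced .m2ts version
# if the cut-and-splice script succeeded

if bash "$root".sh && [ -f "$root".m2ts ]; then
  mv -f "$root".m2ts "$root".ts
else
  echo "Cut-and-splice failed - original \""$root".ts\" left untouched" | tee -a "$root".log
fi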

# Clean up the work files: the segment files are numbered 1 to $count6

for ((i = 1; i <= count6; i++)); do
  rm -f "$root$i".m2ts
done

# Write the .sh and .ptx files to log first so that we can confirm what's been asked for....

echo "Copy of "$root".ptx - the derived list of sections" >>"$root".log
cat "$root".ptx >>"$root".log
echo "Copy of "$root".sh - the ffmpeg cut-and-splice list" >>"$root".log
cat "$root".sh >>"$root".log
rm "$root".sh
rm "$root".pt*
# and exit
date >>"$root".log
ls -l "$root".* >>"$root".log
echo "Exiting cleanly - "$root".ts has been cut into "$count6" sections with 5.1 audio and spliced" >>"$root".log
date
ls -l "$root".ts
echo "Exiting cleanly - \""$root".ts\" has been cut into "$count6" sections with 5.1 audio and spliced. See \""$root".log\" for more information"
exit 0
 
Very interesting.

I am wondering about the possibility of using the ffmpeg copy and concat technique as a replacement for nicesplice on the Humax.

Your script could be considered as two logically separate portions:
  1. Detection of time stamps for cut points (in your case by audio change, but it could be Ad breaks found using silence detection, or manual bookmarks)
  2. Applying the detected cut points to the recording (currently nicesplice on the Humax)
Some of the biggest complaints against nicesplice are audio and visual noise at cut points and failure to maintain the time stamps in the output stream which can lead to playback problems on some players. ffmpeg should be better at smoothing over the joints.
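Kept separate from detection, the "apply" half only needs a list of start,end pairs, whatever produced them. A minimal dry-run sketch that prints (rather than runs) the stream-copy and concat commands used earlier in this thread. The function name and the seg1.m2ts, seg2.m2ts... segment names are hypothetical, and a filename containing "|" would break the concat: list:

```shell
#!/bin/bash
# Hypothetical helper: print (not run) ffmpeg stream-copy commands that cut
# each "start,end" pair (read on stdin, seconds) out of $1, then a final
# concat into $2. An empty end field means "to end of file".
print_cut_commands() {
  src=$1 dst=$2 n=0 list=""
  while IFS=, read -r start end; do
    n=$((n + 1))
    to=""
    [ -n "$end" ] && to=" -to $end"
    echo "ffmpeg -i \"$src\" -c:v copy -c:a copy -ss $start$to seg$n.m2ts"
    list="$list${list:+|}seg$n.m2ts"
  done
  if [ "$n" -gt 0 ]; then
    echo "ffmpeg -i \"concat:$list\" -c copy \"$dst\""
  fi
}
```

The output could be reviewed, or written to a .sh file and run, much as the audio-change script does.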

A challenge will be to get it to run in chaserun mode whilst still recording, but provided we can pipe stdin into ffprobe and ffmpeg concat it should theoretically be possible. The next challenge will then be whether it can run faster than recording and how much strain it puts on the Humax.
 
then the next challenge will be whether it can run faster than recording and how much strain it puts on the humax
I could try adding some more date (and top?) commands to the script to break it down a bit more scientifically, but my sense is that the copy / concat part with ffmpeg was relatively quick; the part that took a surprising length of time on the Humax was the full ffprobe scan to generate the audio transition point list. I must admit that one of my challenges was that piping via stdin seems rather more problematic on the Humax than the Pi, so the slightly crazy collection of .pt* work files came about from sorting out pipes which worked fine on the Pi but were rejected on the Humax - certainly a lesson that all BASH implementations are not created equal!
Does /dev/stdin exist on the Humax for example, or if not then what's it called??
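One way to answer that on any box is to test for the device node and fall back to ffmpeg's pipe protocol, where "pipe:0" reads from file descriptor 0 (stdin). A minimal sketch; the function name "stdin_input_name" is hypothetical and it only reports which input name to use:

```shell
#!/bin/bash
# Hypothetical helper: print an input name ffmpeg/ffprobe can read stdin from.
# Prefers /dev/stdin if the device node exists, otherwise falls back to the
# ffmpeg pipe protocol (pipe:0 = file descriptor 0).
stdin_input_name() {
  if [ -e /dev/stdin ]; then
    echo /dev/stdin
  else
    echo pipe:0
  fi
}
```

A piped command could then use -i "$(stdin_input_name)" in place of a filename.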

The good thing is that the ffmpeg copy / concat does, as you say, generate smooth transitions, and using the audio channel change as a marker results in pinpoint cut points around Ch4HD adverts in the samples I've tried :)
 