Description
I've been testing Editly to concatenate various home videos with transitions and subtitles. Overall it works well, but I found that the audio gradually drifts out of sync as the video goes on. #117 suggests using detached-audio layers as a workaround, and that works, but it is cumbersome because I basically have to list each video clip twice.
Then I started comparing the audio clips that Editly extracts into the temp directory. The first thing I noticed was that the length of the concatenated audio file depended on whether keepSourceAudio was enabled. With it enabled, the file was shorter, in some cases by several seconds; with it disabled, the length was correct (i.e. it matched the length of the final video).
Then I looked at the individual clips and found that in some cases the extracted audio clip was shorter than the silence clip generated for the same video clip when keepSourceAudio is disabled. This only happened with some video files, and only when no cutTo was specified. It turns out that in those files the audio track is shorter than the video track. For example:
$ mediainfo --output=JSON SAM_4235.AVI | jq -r '.media.track[] | ."@type", .Duration'
General
20.767
Video
20.767
Audio
20.758
$ ffmpeg -nostdin -i SAM_4235.AVI -t 20.767 -sample_fmt s32 -ar 48000 -map a:0 -c:a flac -y test.flac 2>/dev/null
$ mediainfo --Output="General;%Duration%" test.flac
20758
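A possible way to find the affected clips in bulk instead of checking files one by one, sketched under some assumptions: it swaps in ffprobe (which ships with ffmpeg) for mediainfo, it compares only the first video and first audio stream, and the function names gap_ms, stream_duration and audio_gap are made up for illustration. Some containers report a stream duration of N/A, in which case the printed gap is meaningless.

```shell
#!/bin/sh
# Pure helper: how much shorter (in ms) the second duration is than the first.
# Both arguments are durations in seconds, e.g. gap_ms 20.767 20.758
gap_ms() {
  awk -v v="$1" -v a="$2" 'BEGIN { printf "%.0f\n", (v - a) * 1000 }'
}

# Duration in seconds of the first stream of type $1 ("v" or "a") in file $2.
# Assumes ffprobe is on PATH; some containers print "N/A" here.
stream_duration() {
  ffprobe -v error -select_streams "$1:0" -show_entries stream=duration \
    -of default=noprint_wrappers=1:nokey=1 "$2"
}

# Hypothetical usage: audio_gap FILE... prints "<gap_ms> <file>" per file.
audio_gap() {
  for f in "$@"; do
    printf '%s %s\n' \
      "$(gap_ms "$(stream_duration v "$f")" "$(stream_duration a "$f")")" "$f"
  done
}
```

For example, audio_gap *.AVI | sort -n would list the files with the largest gap last.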
It's a small difference (9 ms in this example), but over dozens of files it adds up. It also explains why the problem doesn't occur when keepSourceAudio is disabled: the silence files are generated at exactly the length of each video clip, so the concatenated audio file also comes out at the correct length.
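To put rough numbers on "it adds up": the example clip loses 9 ms (20.767 s - 20.758 s), so 50 such clips would leave the audio about 0.45 s short by the end, and the several-second differences seen with keepSourceAudio enabled suggest that some clips have much larger gaps. A hypothetical helper that sums the per-clip gaps, assuming one "video_seconds audio_seconds" pair per input line:

```shell
#!/bin/sh
# Hypothetical helper: total drift (ms) at the end of the concatenated audio.
# Reads lines of "video_seconds audio_seconds" and sums (video - audio).
total_drift_ms() {
  awk '{ total += ($1 - $2) * 1000 } END { printf "%.0f\n", total }'
}
```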
A couple of possible fixes I tested:
- Pad the extracted audio to the length of the video:
$ ffmpeg -nostdin -i SAM_4235.AVI -t 20.767 -sample_fmt s32 -ar 48000 -map a:0 -c:a flac -af apad -shortest -y test.flac 2>/dev/null
$ mediainfo --Output="General;%Duration%" test.flac
20767
This works, but I'm not sure how it might affect files where the audio track is longer than the video track.
- Always generate the silence clips and merge them with the extracted audio clips:
$ ffmpeg -nostdin -f lavfi -i anullsrc=channel_layout=stereo:sample_rate=44100 -sample_fmt s32 -ar 48000 -t 20.767 -c:a flac -y silence.flac 2>/dev/null
$ mediainfo --Output="General;%Duration%" silence.flac
20767
$ ffmpeg -nostdin -i SAM_4235.AVI -i silence.flac -filter_complex "[0:1][1:0] amix=inputs=2:duration=longest[a]" -t 20.767 -sample_fmt s32 -ar 48000 -map "[a]" -c:a flac -y test.flac 2>/dev/null
$ mediainfo --Output="General;%Duration%" test.flac
20767
This also works, but I'm not sure how it might affect files with more than one audio stream. Alternatively, the silence and extracted audio can be merged as a separate step.
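On the open question about the apad fix (audio longer than video): in that command -t comes after -i, so it is an output option and caps the result at 20.767 s either way; shorter audio gets padded by apad and longer audio gets cut. Since only one output stream is mapped, -shortest most likely has no effect there, and it is -t that stops the otherwise endless apad output. A sketch of that step as a reusable function; fit_audio and the DRY_RUN switch are hypothetical, and 0:a:0 (first audio stream of the first input) is used as an explicit map.

```shell
#!/bin/sh
# Hypothetical helper for the apad fix.
# Usage: fit_audio SRC DURATION_SECONDS OUT.flac
# apad appends silence indefinitely; -t (an output option here) then cuts the
# result at DURATION, so short audio is padded and long audio is trimmed.
# Set DRY_RUN=1 to print the command instead of running it.
fit_audio() {
  set -- ffmpeg -nostdin -i "$1" -map 0:a:0 -af apad -t "$2" \
    -sample_fmt s32 -ar 48000 -c:a flac -y "$3"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

The duration argument would come from the video stream, e.g. the 20.767 reported by mediainfo above.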
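Two things are worth checking in the silence-merging fix. First, [0:1] means "stream index 1 of input 0", which is only the audio track if the file happens to be laid out that way; [0:a:0] always selects the first audio stream and ignores any others, which addresses the multi-stream concern. Second, amix scales its inputs by default, so mixing the real audio with silence probably roughly halves its volume; the normalize=0 option turns that off, but it is assumed here that the FFmpeg build is recent enough to have it. The sketch also generates the silence inline with -f lavfi instead of writing silence.flac first; mix_with_silence and DRY_RUN are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper for the "mix with silence" fix as a single ffmpeg call.
# Usage: mix_with_silence SRC DURATION_SECONDS OUT.flac
# Set DRY_RUN=1 to print the command instead of running it.
mix_with_silence() {
  # [0:a:0] = first audio stream of the clip, whatever its stream index.
  # -t before the lavfi -i limits the otherwise endless anullsrc input.
  # normalize=0 keeps amix from scaling the real audio down (newer FFmpeg).
  set -- ffmpeg -nostdin -i "$1" \
    -f lavfi -t "$2" -i anullsrc=channel_layout=stereo:sample_rate=48000 \
    -filter_complex "[0:a:0][1:a:0]amix=inputs=2:duration=longest:normalize=0[a]" \
    -map "[a]" -t "$2" -sample_fmt s32 -ar 48000 -c:a flac -y "$3"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```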
Another possible solution would be to generate a silent track as long as the final video and mix each extracted audio clip into it at the correct position. There may well be a better option I haven't thought of; I just trial-and-errored my way through this with different ffmpeg flags.
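The full-length silent track idea could be expressed as one filtergraph: a silent base track as long as the whole video, each clip's audio trimmed to its video duration (atrim) and delayed to its start offset (adelay), all mixed with amix using duration=first so the output is exactly as long as the base. Below is a sketch of a hypothetical helper that builds that -filter_complex string from the clip durations. It assumes clips play back to back (Editly's transitions overlap clips, which would shift the offsets), stereo 48 kHz output, durations in seconds with millisecond precision, and an FFmpeg build with amix's normalize option; it has not been tested against real Editly output.

```shell
#!/bin/sh
# Hypothetical helper: prints a -filter_complex string that places the first
# audio stream of inputs 0, 1, 2, ... onto a silent base track.
# Usage: build_timeline_filter DURATION_0 DURATION_1 ...
build_timeline_filter() {
  local total offset_ms=0 i=0 chains="" labels="" dur dur_ms
  # Length of the base track: the sum of all clip durations.
  total=$(printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.3f", s }')
  for dur in "$@"; do
    # Trim each clip to its video length, then delay it to its start offset
    # (adelay takes milliseconds, one value per channel; stereo assumed).
    chains="${chains}[${i}:a:0]atrim=duration=${dur},adelay=${offset_ms}|${offset_ms}[a${i}];"
    labels="${labels}[a${i}]"
    dur_ms=$(awk -v d="$dur" 'BEGIN { printf "%.0f", d * 1000 }')
    offset_ms=$((offset_ms + dur_ms))
    i=$((i + 1))
  done
  # duration=first: output length follows the silent base track exactly.
  # normalize=0: stop amix from scaling the inputs down.
  printf 'anullsrc=channel_layout=stereo:sample_rate=48000,atrim=duration=%s[base];%s[base]%samix=inputs=%d:duration=first:normalize=0[out]\n' \
    "$total" "$chains" "$labels" "$((i + 1))"
}
```

The resulting string would be passed to ffmpeg with one -i per clip, in the same order as the durations, together with -map "[out]".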