Audio goes out of sync when source files have mismatched audio track lengths #327

Open
IndrekHaav opened this issue Apr 14, 2025 · 0 comments

I've been testing out Editly to concatenate various home videos, with transitions and subtitles. Overall it's working well, but I discovered that the audio gradually drifts out of sync over the course of the output. #117 suggests using detached-audio layers as a workaround, and that works, but it's cumbersome because I basically have to list each video clip twice.

Then I started comparing the audio clips extracted into the temp directory when Editly runs. The first thing I noticed was that the length of the concatenated audio file differed depending on whether keepSourceAudio was enabled. With it enabled, the file was shorter, in some cases by several seconds; with it disabled, the length was correct (i.e. it matched the length of the final video).

Then I looked at the individual clips and found that in some cases the extracted audio clip was shorter than the silence clip generated for the same video clip (when disabling keepSourceAudio). This only happened with some video files, and only if there was no cutTo specified. Turns out, in those files the audio track is shorter than the video track. For example:

$ mediainfo --output=JSON SAM_4235.AVI | jq -r '.media.track[] | ."@type", .Duration'
General
20.767
Video
20.767
Audio
20.758
$ ffmpeg -nostdin -i SAM_4235.AVI -t 20.767 -sample_fmt s32 -ar 48000 -map a:0 -c:a flac -y test.flac 2>/dev/null
$ mediainfo --Output="General;%Duration%" test.flac
20758

It's a small difference, but over dozens of files it adds up. It also explains why the problem doesn't occur when keepSourceAudio is disabled - the silence files are generated to the exact same length as the video clip, so the concatenated audio file is also the correct length.
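
To put a rough number on "adds up": a back-of-the-envelope sketch, assuming every clip loses the same 9 ms as the example above (real files will vary). `drift_ms` is a hypothetical helper name, not anything from Editly.

```shell
# Hypothetical helper: cumulative audio shortfall in milliseconds when
# every clip's audio track is short by the same amount.
# $1 = video duration (s), $2 = audio duration (s), $3 = number of clips
drift_ms() {
  awk -v v="$1" -v a="$2" -v n="$3" \
    'BEGIN { printf "%.0f\n", (v - a) * 1000 * n }'
}

# 50 clips like SAM_4235.AVI (assumed clip count, for illustration only)
drift_ms 20.767 20.758 50   # prints 450, i.e. about half a second
```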

A couple of possible fixes I tested:

  1. Pad the extracted audio to the length of the video:

    $ ffmpeg -nostdin -i SAM_4235.AVI -t 20.767 -sample_fmt s32 -ar 48000 -map a:0 -c:a flac -af apad -shortest -y test.flac 2>/dev/null
    $ mediainfo --Output="General;%Duration%" test.flac
    20767

    This works, but I'm not sure how it might affect files where the audio track is longer than the video track.

  2. Always generate the silence clips and merge them with the extracted audio clips:

    $ ffmpeg -nostdin -f lavfi -i anullsrc=channel_layout=stereo:sample_rate=44100 -sample_fmt s32 -ar 48000 -t 20.767 -c:a flac -y silence.flac 2>/dev/null
    $ mediainfo --Output="General;%Duration%" silence.flac
    20767
    $ ffmpeg -nostdin -i SAM_4235.AVI -i silence.flac -filter_complex "[0:1][1:0] amix=inputs=2:duration=longest[a]" -t 20.767 -sample_fmt s32 -ar 48000 -map "[a]" -c:a flac -y test.flac 2>/dev/null
    $ mediainfo --Output="General;%Duration%" test.flac
    20767

    This also works, but I'm not sure how it might affect files with more than one audio stream. Alternatively, the silence and extracted audio can be merged as a separate step.
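
For the open question in option 1 (audio longer than video), one variant that may be worth trying is to pad to an explicit total duration and let `-t` trim any excess. This is an untested sketch, not Editly's code: `pad_audio_to_video` is a hypothetical helper name, it assumes ffprobe reports a duration for the first video stream, and it assumes the installed ffmpeg's `apad` filter supports the `whole_dur` option.

```shell
# Hypothetical helper, not part of Editly: extract the first audio stream,
# padded or trimmed to the length of the first video stream.
# $1 = source video, $2 = output FLAC file
pad_audio_to_video() {
  src=$1
  dst=$2
  # Video stream duration in seconds (assumes the container reports one).
  dur=$(ffprobe -v error -select_streams v:0 \
    -show_entries stream=duration \
    -of default=noprint_wrappers=1:nokey=1 "$src") || return 1
  # apad appends silence until the audio reaches whole_dur; -t then cuts
  # the output at the same point, so longer audio is trimmed instead.
  ffmpeg -nostdin -i "$src" -map a:0 \
    -af "apad=whole_dur=$dur" -t "$dur" \
    -sample_fmt s32 -ar 48000 -c:a flac -y "$dst" 2>/dev/null
}

# Usage: pad_audio_to_video SAM_4235.AVI test.flac
```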

I guess another possible solution would be to generate a silence clip that's the full length of the final video and merge the extracted audio clips into it at the correct positions. Or something else I'm not thinking of; I just trial-and-errored my way through this with different ffmpeg flags.
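
The "merge at the correct positions" idea needs a start offset for each clip. Here is a rough sketch of that bookkeeping, assuming plain back-to-back concatenation (transitions overlap neighbouring clips, so their durations would have to be subtracted). `clip_offsets_ms` is a hypothetical helper, and the second and third durations are made-up values.

```shell
# Hypothetical helper: print the start offset (ms) of each clip on the
# final timeline, given the video durations in seconds as arguments.
clip_offsets_ms() {
  awk 'BEGIN {
    t = 0
    for (i = 1; i < ARGC; i++) {
      printf "%.0f\n", t * 1000   # offset of clip i
      t += ARGV[i]                # advance by the video length, not the audio length
    }
  }' "$@"
}

clip_offsets_ms 20.767 15.5 8.25
# prints:
# 0
# 20767
# 36267
```

Offsets in milliseconds could then be fed to something like ffmpeg's `adelay` filter, which takes its delays in milliseconds, before mixing each clip onto the full-length silence.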
