Selective Lipdubbing for Single Actors

Overview

In some cases, you may want to lipdub only a specific part of a video. For example, in a personalized video you might want to replace just the moment where the speaker greets the viewer by name. A full lipdub is unnecessary here, since only the name changes.

We've extended the existing LipDub capabilities to support this use case:

  1. In your source audio file, replace the regions you want to change with the new audio. Note: the edited audio must have the same total duration as the video.
  2. Upload the edited audio as you would for a full lipdub.
  3. In the generate request, provide the start and end times of each replaced region (the timecode_ranges field), and the system lipdubs only those regions.
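Before uploading, it can help to confirm that the edited audio is as long as the video. The bash sketch below compares two durations in seconds and treats anything within one frame as a match. The durations_match helper and the one-frame tolerance are assumptions for illustration, not part of the LipDub API; the ffprobe command in the comment is one way to read the durations.

```shell
# Sketch: check that the edited audio and the video have the same duration.
# Durations can be read with ffprobe (assumes ffmpeg/ffprobe is installed), e.g.:
#   ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 input.mp4
# The one-frame tolerance is an assumption, not a documented LipDub limit.

# durations_match AUDIO_SECONDS VIDEO_SECONDS FPS
# Exits 0 if the two durations differ by less than one frame, 1 otherwise.
durations_match() {
  awk -v a="$1" -v v="$2" -v fps="$3" 'BEGIN {
    diff = a - v
    if (diff < 0) diff = -diff   # absolute difference in seconds
    exit !(diff < 1 / fps)       # one frame lasts 1/fps seconds
  }'
}

if durations_match 62.48 62.50 25; then
  echo "audio and video durations match"
else
  echo "duration mismatch: re-export the edited audio" >&2
fi
```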

Example Usage

curl --request POST \
     --url https://api.lipdub.ai/v1/shots/shot_id/generate \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --data '
{
  "audio_id": "audio_123",
  "output_filename": "my_lipdub.mp4",
  "timecode_ranges": [[0, 10], [20, 30]]
}
'

Provide each timecode range as a [start, end] pair. You can specify as many ranges as you need, as long as none extends past the end of the video. Times can be given in seconds or as SMPTE timecodes (HH:MM:SS:FF), but use the same format for every value in a request.
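If your editing tool reports SMPTE timecodes but you want to send seconds, convert every value to one unit before building the request. The bash sketch below assumes non-drop-frame timecodes in HH:MM:SS:FF form and a known frame rate; smpte_to_seconds and range_is_valid are hypothetical helpers, not LipDub API functions.

```shell
# Sketch: convert non-drop-frame SMPTE timecodes (HH:MM:SS:FF) to seconds and
# check that a [start, end] range lies inside the video.
# Drop-frame timecodes (e.g. 29.97 fps with ';' separators) are NOT handled here.

# smpte_to_seconds TIMECODE FPS  -> prints seconds with 3 decimals
smpte_to_seconds() {
  awk -v tc="$1" -v fps="$2" 'BEGIN {
    n = split(tc, p, ":")
    if (n != 4) exit 1                      # expect exactly HH:MM:SS:FF
    printf "%.3f\n", p[1] * 3600 + p[2] * 60 + p[3] + p[4] / fps
  }'
}

# range_is_valid START END VIDEO_SECONDS  (all values in seconds)
# Exits 0 if 0 <= START < END <= VIDEO_SECONDS.
range_is_valid() {
  awk -v s="$1" -v e="$2" -v d="$3" 'BEGIN {
    exit !(s + 0 >= 0 && s + 0 < e + 0 && e + 0 <= d + 0)
  }'
}

start=$(smpte_to_seconds 00:00:20:00 25)   # 20.000
end=$(smpte_to_seconds 00:00:30:12 25)     # 30 s + 12/25 s = 30.480
if range_is_valid "$start" "$end" 60; then
  echo "[$start, $end]"
fi
```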

Requirements

Add a 4-frame buffer to the start and end timecodes

Because only the regions you specify are lipdubbed, pad each range with a 4-frame buffer: move the start time 4 frames earlier and the end time 4 frames later. This margin helps the lipdubbed segments blend seamlessly with the original video.
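The buffer is a fixed number of frames, so its length in seconds depends on the frame rate: at 25 fps, 4 frames is 4 / 25 = 0.16 s. The bash sketch below widens a range given in seconds and clamps it to the video bounds. The pad_range helper and the clamping behaviour are assumptions, so check how the API treats ranges that touch the start or end of the video.

```shell
# Sketch: widen a [start, end] range (in seconds) by a 4-frame buffer on each side.
# The clamping to [0, VIDEO_SECONDS] is an assumption, not documented API behaviour.

BUFFER_FRAMES=4

# pad_range START END FPS VIDEO_SECONDS  -> prints "[padded_start, padded_end]"
pad_range() {
  awk -v s="$1" -v e="$2" -v fps="$3" -v d="$4" -v n="$BUFFER_FRAMES" 'BEGIN {
    pad = n / fps                # 4 frames at 25 fps = 0.16 s
    s -= pad; if (s < 0) s = 0   # do not start before the first frame
    e += pad; if (e > d) e = d   # do not run past the end of the video
    printf "[%.3f, %.3f]\n", s, e
  }'
}

pad_range 20 30 25 60   # prints [19.840, 30.160]
pad_range 0 10 25 60    # prints [0.000, 10.160]
```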

Best Practices

Train using Premium or Ultra Model

For the smoothest transitions between lipdubbed regions and the original video, we recommend training with the Premium or Ultra model. These models maintain texture fidelity where the lipdubbed regions meet the original footage.

Replaced regions should match the original regions in length

Because only the regions you specify are lipdubbed, each new audio segment must be exactly as long as the segment it replaces. If it is longer or shorter, all of the audio after it shifts relative to the picture, and the parts of the video that aren’t lipdubbed may look out of sync.
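A quick way to catch a length mismatch is to express it in frames. The bash sketch below compares the duration of a replacement clip with the [start, end] region it replaces; length_delta_frames is a hypothetical helper, and rounding to the nearest whole frame is an assumption.

```shell
# Sketch: how many frames longer (positive) or shorter (negative) a replacement
# clip is than the original region it replaces, rounded to the nearest frame.

# length_delta_frames ORIG_START ORIG_END NEW_CLIP_SECONDS FPS
length_delta_frames() {
  awk -v s="$1" -v e="$2" -v c="$3" -v fps="$4" 'BEGIN {
    delta = (c - (e - s)) * fps   # difference in frames (may be fractional)
    # round half away from zero so small float errors do not change the result
    printf "%d\n", (delta >= 0) ? int(delta + 0.5) : -int(-delta + 0.5)
  }'
}

# Original region is 20-30 s (10 s); the new clip is 10.2 s long at 25 fps.
delta=$(length_delta_frames 20 30 10.2 25)
if [ "$delta" -ne 0 ]; then
  echo "replacement clip is off by $delta frame(s); trim or pad it" >&2
fi
```

A non-zero result means the clip should be trimmed or padded before it is spliced into the source audio.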

Normalize Audio Levels

For best results, first normalize the audio levels and then isolate the vocals from the background noise.
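One way to normalize levels is ffmpeg's loudnorm filter. The bash sketch below assumes ffmpeg is installed and uses common example loudness targets that are not LipDub requirements; the normalize_audio wrapper and its DRY_RUN switch are hypothetical. Vocal isolation is tool-specific and is not shown.

```shell
# Sketch: normalize loudness with ffmpeg's loudnorm filter (assumes ffmpeg is installed).
# Targets: -16 LUFS integrated loudness, -1.5 dBTP true peak, loudness range 11.
# These are example values, not LipDub requirements.

# normalize_audio INPUT OUTPUT
# With DRY_RUN=1 the ffmpeg command is printed instead of executed.
normalize_audio() {
  local cmd=(ffmpeg -y -i "$1" -af "loudnorm=I=-16:TP=-1.5:LRA=11" "$2")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

DRY_RUN=1 normalize_audio edited_audio.wav edited_audio_norm.wav
```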