Frame-accurate sync · Async REST + webhooks · Up to 4K output · Idempotent retries
Powering lip-sync workflows for 900+ engineering teams

Lip Sync API.

Lip-sync as a service. Drop a face URL, drop an audio URL, get a frame-accurate lip-synced MP4 back, with sub-minute latency and async webhooks.

Sample request
POST /v1/lipsync
Authorization: Bearer sk_live_***

{
  "face_url": "https://cdn.example.com/face.jpg",
  "audio_url": "https://cdn.example.com/voice.mp3",
  "format": "16-9",
  "resolution": "1080p"
}

→ 202 Accepted
{ "job_id": "ls_8a2...", "webhook": "..." }
Async REST · sub-minute latency · idempotent
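
A minimal Python sketch of the sample request above, using only the standard library. The host `https://api.example.com`, the `Content-Type` header, and the `build_lipsync_request` helper are assumptions for illustration; only the path, the Bearer header, and the four body fields come from the sample.

```python
import json
import urllib.request

# Assumption: the real API host is not stated on this page; placeholder only.
API_BASE = "https://api.example.com"


def build_lipsync_request(api_key, face_url, audio_url,
                          fmt="16-9", resolution="1080p"):
    """Build (but do not send) a POST /v1/lipsync request."""
    body = {
        "face_url": face_url,
        "audio_url": audio_url,
        "format": fmt,
        "resolution": resolution,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/v1/lipsync",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: a JSON content type is expected for the body.
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_lipsync_request(
    "sk_live_***",
    "https://cdn.example.com/face.jpg",
    "https://cdn.example.com/voice.mp3",
)
# To submit, pass req to urllib.request.urlopen and read the JSON reply,
# which per the sample should be a 202 carrying a job_id.
```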
118k lip-sync renders served in the last 30 days
Endpoints

Lip-sync any face to any audio.

Endpoint
112 params
POST /v1/lipsync with { face_url, audio_url }. Single photo + audio → MP4 with photoreal lip movement. Up to 4K.
Auth: Bearer API key
Limits: Tiered per plan
How it works

From request to lip-synced render in 3 simple steps.

Step 1
Get an API key

Sign up and generate a live key in the dashboard. Sandbox keys are available for development.

Step 2
POST face + audio URLs

Send signed URLs for the face image or video and for the audio. The API extracts the face model and the phoneme timing.

Step 3
Receive webhook

On completion, we POST a signed MP4 URL to your webhook. Alternatively, poll the job endpoint until the job is done.
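
For the polling alternative, a loop with capped exponential backoff is one reasonable shape. This is a sketch under assumptions: the page does not document the job endpoint's path or response fields, so `fetch_status` is a caller-supplied function and the `"status"` values `"done"` and `"failed"` are hypothetical.

```python
import time


def poll_job(fetch_status, job_id, max_wait_s=600.0, first_delay_s=2.0,
             max_delay_s=30.0, sleep=time.sleep):
    """Poll until the job reaches a terminal state or max_wait_s would be exceeded.

    fetch_status(job_id) must return a dict. Assumption: it carries a
    "status" key whose terminal values are "done" or "failed"; the real
    field names are not given on this page.
    """
    waited = 0.0
    delay = first_delay_s
    while True:
        job = fetch_status(job_id)
        if job.get("status") in ("done", "failed"):
            return job
        if waited + delay > max_wait_s:
            raise TimeoutError(f"job {job_id} not finished after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)  # double the wait, up to the cap
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting; in production the default `time.sleep` applies.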

Docs

How to dub 100k videos via API without breaking the bank?

From auth to webhook handler with code samples in TypeScript and Python.

I dubbed 10,000 product demos via API in 4 hours (walkthrough).

Who it's for

Built for engineering teams.

Localization

Localization platforms

Offer video dubbing as a service via API. Frame-accurate, multilingual, at scale.

L&D

L&D platforms

Re-sync narrator audio across course updates. Same instructor, new lines, no re-shoot.

Media

Media & news

Auto-dub news clips for international audiences. Same anchor, every language, every clip.

SaaS

Personalized media SaaS

Generate personalized lip-synced videos at scale: sales outreach, onboarding, transactional.

Comparison

DIY lip-sync vs ClipNova Lip Sync API.

Building it yourself means months of ML infra. ClipNova ships it behind a single REST endpoint.

Feature | Lip Sync API | DIY infra
Setup | One API key, one endpoint | Stand up GPU farm + models
Time to first sync | Minutes | Months of ML work
Quality | Frame-accurate, photoreal | Hire ML team to match
Idempotency | Built in | Build yourself
Compliance | SOC 2 + EU residency | Audit yourself
Use Cases

See what teams build with it.

Production deployments across categories.

Video dubbing at scale.

A media company dubs 10,000 news clips per week into 8 languages. Same anchor, same energy, every language.

  • Batch endpoints
  • 8 languages per pass
  • Anchor consistency preserved
  • Webhook on each clip
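
One way to picture the fan-out in this use case is a request body per clip and language pair. The sketch below assumes one POST /v1/lipsync job per pair; the shape of the batch endpoints is not described on this page, and the `build_dub_jobs` helper and the per-clip `audio_urls` map are hypothetical.

```python
from itertools import product


def build_dub_jobs(clips, languages, resolution="1080p"):
    """Return one request body per (clip, language) pair.

    clips: list of dicts with "face_url" and an "audio_urls" map keyed by
    language code (a hypothetical layout for this sketch).
    """
    jobs = []
    for clip, lang in product(clips, languages):
        audio_url = clip["audio_urls"].get(lang)
        if audio_url is None:
            continue  # no dubbed audio track for this language yet
        jobs.append({
            "face_url": clip["face_url"],
            "audio_url": audio_url,
            "format": "16-9",  # value taken from the sample request
            "resolution": resolution,
        })
    return jobs
```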

Personalized sales outreach.

A SaaS sends every prospect a lip-synced video pitch from the founder, personalized to their company.

  • Per-prospect rendering
  • Founder face + cloned voice
  • Sub-minute latency
  • Audit logs

L&D narrator updates.

An LMS pushes script updates to existing lessons. Same narrator, new lines, no re-shoot, just a re-sync.

  • Audio-only updates
  • Visual continuity preserved
  • Version control
  • Bulk endpoints
FAQs

Frequently asked.

What is the Lip Sync API?
A REST endpoint that takes a face URL and an audio URL, and returns a frame-accurate lip-synced MP4. Designed for high-volume, programmatic use.
What inputs are accepted?
Face: JPG, PNG, MP4, MOV. Audio: MP3, WAV, M4A. Both are passed as signed URLs (or uploaded via the /uploads endpoint).
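
A client-side pre-check can catch unsupported inputs before a job is submitted. This sketch assumes the file type can be read from the URL path's extension, which may not hold for every signed URL, and the helper name is hypothetical. The extension sets mirror the formats listed above; whether variants such as `.jpeg` are accepted is not stated, so they are left out.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

FACE_EXTENSIONS = {".jpg", ".png", ".mp4", ".mov"}
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def has_accepted_extension(url, accepted):
    """True if the URL path ends in one of the accepted extensions."""
    path = urlparse(url).path  # ignores the query string of a signed URL
    return PurePosixPath(path).suffix.lower() in accepted
```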
How does quality compare to the ClipNova UI?
Same model, same quality. The API is the same engine that powers the UI tool.
Latency?
Under 2 minutes for a 60-second source video. 4K renders take 4 to 6 minutes.
Idempotency?
Yes. Every request accepts an Idempotency-Key header. Safe to retry.
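
A retry is only safe if every attempt carries the same key. The sketch below generates the key once and reuses it; `send` is a caller-supplied function that performs the POST, and treating `ConnectionError` as the retryable failure is an assumption of this example, not documented API behavior.

```python
import uuid


def submit_with_retries(send, body, max_attempts=3):
    """Call send(body, headers) until it succeeds, reusing one key.

    Assumption: send raises ConnectionError on a transient failure.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    headers = {"Idempotency-Key": str(uuid.uuid4())}  # generated once
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(body, headers)
        except ConnectionError as exc:
            last_error = exc  # retry with the same key and the same body
    raise last_error
```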
Webhook reliability?
Signed payloads, retried with exponential backoff for 24h, full logs in dashboard.
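
Signed payloads are only useful if the receiver checks the signature. The page does not say how payloads are signed, so this sketch assumes a common scheme: a hex-encoded HMAC-SHA256 of the raw request body under a shared secret, delivered in a header. The function name and the scheme itself are assumptions; check the API docs for the actual one.

```python
import hashlib
import hmac


def verify_webhook(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Check a hex HMAC-SHA256 signature over the raw body.

    Assumption: this signing scheme is illustrative, not the documented one.
    """
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Compare as bytes, in constant time, to avoid leaking timing information.
    return hmac.compare_digest(expected.encode("utf-8"),
                               signature_hex.encode("utf-8"))
```

Verify against the raw bytes as received, before any JSON parsing, since re-serializing the payload can change the bytes and break the check. Because deliveries are retried, the handler should also tolerate receiving the same event more than once.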
Compliance?
SOC 2 Type II. EU data residency available on enterprise. No training on user inputs.
Pricing?
Per second of output. Volume discounts at 50k+ renders / month. Free sandbox tier for development.
View complete API docs

Find detailed reference for every endpoint, parameter and webhook

or check our OpenAPI spec optimized for LLMs
ClipNova

The fastest way to lip-sync via API.

Get an API key

Free sandbox tier