Wan 2.2 Animate
Wan 2.2 Animate is a practical way to animate a character image using a reference video. It mirrors the performer’s expressions and movement and can also replace the original person in the video with the animated character. Lighting and color tone are adjusted through a relighting module so the replacement fits the scene. The system builds on the Wan family of video models and adds design choices for animation and replacement tasks.
The method distinguishes between the parts of the input that guide the process and the parts that are generated. Skeleton signals control body motion, while facial features extracted from the source image are used to reenact expressions. This combination gives consistent identity and controllable motion. For replacement, an auxiliary Relighting LoRA applies scene lighting while preserving the character's appearance. Together these choices yield stable, high-quality results across different scenes and subjects.
If your goal is to turn a static drawing, mascot, avatar, or product character into a moving subject that follows a real reference video, this approach is a strong fit. If you need to replace a person in a shot with a designed character while keeping the same camera, timing, and scene colors, the same pipeline applies with the replacement path. Both flows use the same core representation so you can move between them with minimal changes.
At a Glance
Primary Task
Animate a character from a single image using a reference video, or replace a person in a video with the animated character.
Key Inputs
Character image, reference video, optional masks or region hints for where generation should occur.
Control Signals
Spatially aligned skeleton motion for body pose and movement; implicit facial features for expression reenactment.
Replacement Mode
Relighting LoRA matches environmental lighting and color tone while preserving appearance consistency.
How It Works
Wan 2.2 Animate separates the problem into guidance signals and generation regions. The input paradigm marks where generation is allowed and which conditions steer motion and identity. Because animation and replacement share one symbolic representation, the same system handles both tasks. Body motion comes from skeleton signals aligned to the video frames. Expressions are reenacted using features derived from the source image. The combination ensures coherent motion and a stable character identity across frames.
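To make the separation concrete, here is a minimal sketch of how such a unified input container could look. Every name and tensor shape below is an illustrative assumption, not the model's actual interface; the point is that reference conditions (character image, skeleton, face features) stay separate from the mask that marks where generation happens.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class AnimateInputs:
    """Illustrative container for the unified input paradigm.

    Field names and shapes are assumptions for exposition,
    not the model's real interface."""
    character_image: np.ndarray            # (H, W, 3) identity reference
    skeleton_frames: np.ndarray            # (T, K, 2) keypoints aligned to the reference video
    face_features: np.ndarray              # (D,) implicit features driving expressions
    generation_mask: Optional[np.ndarray]  # (T, H, W) 1 = generate, 0 = keep original pixels

def animation_task(image, skeleton, face):
    # Pure animation: the whole frame is generated, so no mask is needed.
    return AnimateInputs(image, skeleton, face, generation_mask=None)

def replacement_task(image, skeleton, face, person_mask):
    # Replacement: only the masked region is regenerated; the
    # surrounding footage passes through untouched.
    return AnimateInputs(image, skeleton, face, generation_mask=person_mask)
```

With one representation, switching between the two flows is just a change in whether a mask is supplied, which matches how the rest of this page treats the two tasks.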
For environmental fit during replacement, an auxiliary Relighting LoRA adjusts the rendered character to the scene. This keeps the identity consistent while applying the scene’s lighting and color tone. This step matters when scenes contain shadows, highlights, or colored illumination. It reduces mismatches between the character and background, so the character looks like part of the shot.
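The page does not spell out the adapter's internals, but a Relighting LoRA presumably follows the standard LoRA recipe: freeze the base weights and train only a low-rank update. The sketch below shows that generic mechanism, not Wan's actual module; it also illustrates one reason such adapters can shift lighting and tone without disturbing identity, since the frozen base weights never change.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA adapter: output = W x + (alpha / r) * B A x.

    The base weight W stays frozen, which is what helps preserve the
    character's learned appearance; only the low-rank update trains.
    This is the standard LoRA recipe, not Wan's actual module."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze W
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```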
Under the hood, Wan 2.2 introduces a Mixture-of-Experts (MoE) design with separate experts for different noise levels during denoising. A high-noise expert handles the early steps of generation, and a low-noise expert handles later steps. The switch point is set by a signal-to-noise ratio (SNR) threshold. This lets the model establish structure early and refine details later, aiding convergence and quality. A smaller dense model option (TI2V-5B) and a high-compression VAE make practical runs possible on a single consumer GPU for shorter clips.
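The exact threshold and noise parameterization are internal to Wan 2.2, but the routing logic can be sketched as follows, assuming the common diffusion definition SNR = alpha_t^2 / sigma_t^2:

```python
def select_expert(alpha_t: float, sigma_t: float, snr_threshold: float,
                  high_noise_expert, low_noise_expert):
    """Route one denoising step to an expert by noise level.

    Assumes the common diffusion definition SNR = alpha_t^2 / sigma_t^2;
    the threshold value Wan 2.2 actually uses is internal to the model."""
    snr = (alpha_t ** 2) / (sigma_t ** 2)
    # Early steps: heavy noise, low SNR -> high-noise expert shapes structure.
    # Later steps: light noise, high SNR -> low-noise expert refines detail.
    return high_noise_expert if snr < snr_threshold else low_noise_expert
```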
What You Can Build
Character Animation from a Single Image
Feed a character portrait and a reference video. The output follows head and body motion while preserving the character’s identity. Expressions and timing follow the performer. This is useful for avatars, game character previews, and motion studies.
Character Replacement in Live Footage
Replace the original person in a shot with your designed character. The relighting module adapts to the scene’s illumination. This helps with previs, concept clips, and content where a stylized character needs to act in a real scene.
Expression Studies
Using face features from the source image, the system reenacts expressions from the reference. This helps create consistent talking sequences or expressive reaction shots without manual keyframing.
Pose-Driven Motion Clips
Skeleton guidance enables pose-accurate motion for action beats, dance snippets, and body-language studies. This is suitable for testing motion ideas or producing short clips aligned to choreography or stunt references.
Core Ideas
Unified Inputs
Different tasks share one representation that separates reference conditions (skeleton, image features) from generation regions. This keeps the workflow simple when switching between animation and replacement.
Motion and Expression
Skeleton signals control pose and movement. Face features from the source image drive expressions. The mix gives controllability and identity consistency.
Scene Relighting
An auxiliary Relighting LoRA applies scene lighting and color tone while preserving the appearance of the character during replacement.
Practical Guidance
Source Image Quality: Use a clear character image with neutral lighting. Avoid heavy filters. Frontal or near-frontal framing helps with expression consistency.
Reference Video Choice: Pick clips with stable motion and visible pose changes. Side profiles give the model less facial information, which can weaken expression guidance. Avoid excessive motion blur.
Masks and Regions: When doing replacement, specify where generation should occur. This reduces artifacts around edges and keeps the background intact (see the mask sketch after these tips).
Length and Resolution: Start with short clips to validate identity and motion. Increase resolution once the motion looks right. For long scenes, break them into shots.
Lighting Consistency: In replacement mode, check highlights and shadow direction. The relighting module adapts tone, but input footage without extreme exposure swings tends to work better.
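As mentioned under Masks and Regions, a common trick is to dilate the person segmentation slightly so the generated region covers boundary pixels such as hair and motion edges. The helper below is an illustrative sketch using SciPy; the mask source is whatever segmentation tool you prefer, and the dilation radius is an assumed starting value to tune per shot.

```python
import numpy as np
from scipy import ndimage

def generation_mask(person_mask: np.ndarray, dilate_px: int = 8) -> np.ndarray:
    """Expand a binary person segmentation so the generated region covers
    boundary pixels (hair, motion edges), reducing halo artifacts.

    `person_mask` can come from any segmentation tool; `dilate_px`
    is an assumed default to tune per shot."""
    structure = np.ones((2 * dilate_px + 1, 2 * dilate_px + 1), dtype=bool)
    return ndimage.binary_dilation(person_mask.astype(bool), structure=structure)
```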
Feature Highlights
High-Control Motion
Skeleton-based control gives reliable pose following for actions, walks, and expressive gestures. You can steer the outcome using a suitable reference clip.
Expression Reenactment
Implicit facial features guide expressions, lip movement, and subtle cues taken from the source image and the performer’s video.
Relighting for Replacement
The auxiliary module adapts the character to the scene’s lighting and color tone, helping the replacement fit the shot.
Model Options
An MoE design with experts for different denoising stages, plus a compact TI2V-5B option with a high-compression VAE for more accessible runs.
Step-by-Step: From Image to Animated Clip
- Prepare inputs: Choose a character image and a reference video with clear motion. Keep the character’s face visible when expressions matter.
- Mark regions (optional): For replacement, define where generation should occur. For pure animation, the full frame may be generated.
- Configure motion control: Use skeleton guidance aligned to the reference frames.
- Run generation: Start with moderate resolution. Review identity, motion, and expression.
- Refine and scale: Adjust inputs, then raise resolution and length as needed.
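The steps above can be mirrored in a short driver. Every function name, resolution, and frame count below is a placeholder assumption rather than the real Wan 2.2 Animate API; the stubs only mark where your pose estimator, feature extractor, and generator plug in.

```python
import numpy as np

# Hypothetical stubs: these are NOT the real Wan 2.2 Animate entry points.
def extract_skeleton(video: np.ndarray) -> np.ndarray:
    """(T, K, 2) keypoints from your pose estimator of choice."""
    raise NotImplementedError

def extract_face_features(image: np.ndarray) -> np.ndarray:
    """Implicit expression/identity features from the character image."""
    raise NotImplementedError

def animate(image, skeleton, face, mask=None,
            resolution=(480, 832), num_frames=48):
    """Stand-in for the generation call; settings are assumed examples."""
    raise NotImplementedError

def quick_pass(character_image, reference_video, person_mask=None):
    skeleton = extract_skeleton(reference_video)
    face = extract_face_features(character_image)
    # Moderate settings first: validate identity, motion, and expression,
    # then rerun with higher resolution and more frames.
    return animate(character_image, skeleton, face,
                   mask=person_mask)  # None = pure animation (full frame)
```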
Pros and Considerations
Pros
- Unified design for animation and replacement tasks.
- Skeleton motion and facial features provide control and identity consistency.
- Relighting helps match the scene during replacement.
- Model choices for different hardware and timelines.
Considerations
- Lower-quality source images can reduce identity stability.
- Heavy motion blur or rapid camera moves can lower control accuracy.
- Replacement works best with reasonable exposure and consistent scene lighting.