In this comprehensive tutorial, we'll build a complete ComfyUI Wan 2.1 image-to-video workflow from scratch. This guide focuses on the native ComfyUI implementation that works seamlessly with the classic KSampler, making video generation accessible and straightforward for creators at any level.
Prerequisites & Set-up Requirements
Before starting with Wan 2.1, you must update ComfyUI to the latest version. Without this update, the Wan 2.1 nodes won't be available regardless of which workflow you're trying to use.
The workflow requires several specific models that need to be placed in the correct directories within your ComfyUI installation (a quick verification sketch follows the lists below):
Required Models:
- Wan 2.1 image-to-video model (14B FP8 e4m3fn version recommended for better performance)
- UMT5-XXL text encoder (similar to Flux's T5-XXL but supports multiple languages)
- Wan 2.1 VAE
- CLIP Vision H model
Directory Structure:
- Main models go in: ComfyUI/models/diffusion_models/
- Text encoders go in: ComfyUI/models/text_encoders/
- VAE files go in: ComfyUI/models/vae/
- CLIP vision models go in: ComfyUI/models/clip_vision/
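Before launching ComfyUI, you can sanity-check this layout with a short script. This is a minimal sketch: the filenames below are assumptions based on common release naming, so substitute whichever versions you actually downloaded.

```python
# Minimal check that the Wan 2.1 files sit in the directories ComfyUI scans.
from pathlib import Path

COMFYUI_ROOT = Path("ComfyUI")  # adjust to your installation path

expected = {
    "models/diffusion_models": "wan2.1_i2v_14B_fp8_e4m3fn.safetensors",   # assumed filename
    "models/text_encoders":    "umt5_xxl_fp8_e4m3fn_scaled.safetensors",  # assumed filename
    "models/vae":              "wan_2.1_vae.safetensors",                 # assumed filename
    "models/clip_vision":      "clip_vision_h.safetensors",               # assumed filename
}

for subdir, name in expected.items():
    path = COMFYUI_ROOT / subdir / name
    print(f"[{'OK' if path.exists() else 'MISSING'}] {path}")
```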
Building the Core Workflow
Start with the basic foundation by loading your main model. The workflow begins similarly to traditional image generation setups but includes specific video preprocessing components.
Basic Node Setup:
1. Load the Wan 2.1 main model
2. Add positive and negative prompt inputs (colored green and red respectively)
3. Connect the VAE decoder and VAE loader
4. Add the CLIP loader using the UMT5-XXL model
The key difference from standard workflows is replacing the empty latent image with the "Image to Video" preprocessor node. This node takes the conditioning, VAE, and CLIP vision output from the left side of your workflow and prepares the latent for the KSampler (the graph sketch after the component list below shows the wiring).
Video-Specific Components:
- Image to Video preprocessor node
- CLIP Vision Encode node
- CLIP Vision Loader (using CLIP Vision H)
- Image input for your starting frame
- Video Combine node for final output
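To make the wiring concrete, here is a sketch of the same graph in ComfyUI's API (JSON) format, written as a Python dict. The node class names match recent native ComfyUI releases, but treat the exact input names, filenames, and default values as assumptions to verify against your installation; the decoded frames would finally feed a Video Combine node (from the Video Helper Suite custom nodes) or a native save node.

```python
# Sketch of the native Wan 2.1 image-to-video graph in ComfyUI API format.
# Each entry is node_id -> {class_type, inputs}; ["id", n] references output n of node id.
workflow = {
    "1": {"class_type": "UNETLoader",          # main model from models/diffusion_models/
          "inputs": {"unet_name": "wan2.1_i2v_14B_fp8_e4m3fn.safetensors",   # assumed filename
                     "weight_dtype": "default"}},
    "2": {"class_type": "CLIPLoader",          # UMT5-XXL text encoder
          "inputs": {"clip_name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",  # assumed filename
                     "type": "wan"}},
    "3": {"class_type": "VAELoader",
          "inputs": {"vae_name": "wan_2.1_vae.safetensors"}},     # assumed filename
    "4": {"class_type": "CLIPVisionLoader",
          "inputs": {"clip_name": "clip_vision_h.safetensors"}},  # assumed filename
    "5": {"class_type": "LoadImage",           # your starting frame
          "inputs": {"image": "start_frame.png"}},
    "6": {"class_type": "CLIPVisionEncode",
          "inputs": {"clip_vision": ["4", 0], "image": ["5", 0]}},
    "7": {"class_type": "CLIPTextEncode",      # positive prompt (green)
          "inputs": {"clip": ["2", 0], "text": "Static camera shot of ..."}},
    "8": {"class_type": "CLIPTextEncode",      # negative prompt (red)
          "inputs": {"clip": ["2", 0], "text": "blurry, distorted"}},
    "9": {"class_type": "WanImageToVideo",     # replaces the empty latent image
          "inputs": {"positive": ["7", 0], "negative": ["8", 0], "vae": ["3", 0],
                     "clip_vision_output": ["6", 0], "start_image": ["5", 0],
                     "width": 832, "height": 480, "length": 65, "batch_size": 1}},
    "10": {"class_type": "KSampler",
           "inputs": {"model": ["1", 0], "positive": ["9", 0], "negative": ["9", 1],
                      "latent_image": ["9", 2], "seed": 42, "steps": 20, "cfg": 6.0,
                      "sampler_name": "ddim", "scheduler": "normal", "denoise": 1.0}},
    "11": {"class_type": "VAEDecode",          # frames out; send these to Video Combine
           "inputs": {"samples": ["10", 0], "vae": ["3", 0]}},
}
```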
Understanding the New Text Encoder
The UMT5-XXL text encoder functions like a large language model, allowing you to write prompts in natural sentences rather than comma-separated tags. This encoder supports multiple languages, including Chinese characters, and brings back the negative prompting capability that wasn't available in some other recent models.
You can describe your video using conversational language while still maintaining precise control over the generation process.
Configuring Video Parameters
The workflow includes several important video-specific settings (a small duration helper follows the frame settings):
Frame Settings:
- Default frame count: 65 frames
- Frame rate: typically 16 FPS (adjustable)
- Resolution: match your input image dimensions when possible
- Batch size: number of video variations to generate
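Frame count, frame rate, and clip length interact, so a tiny helper makes the trade-off visible. The 4k+1 frame-count rule below reflects how Wan's temporal compression is commonly described; treat it as an assumption to verify against your node's tooltip.

```python
# Relate frame count and FPS to clip duration, and flag invalid Wan frame counts.
def clip_info(frames: int, fps: int = 16) -> str:
    valid = (frames - 1) % 4 == 0  # Wan latents are said to pack 4 frames per step, hence 4k+1
    return f"{frames} frames @ {fps} fps = {frames / fps:.2f}s (valid length: {valid})"

print(clip_info(65))   # the default: just over 4 seconds
print(clip_info(129))  # ~8 seconds -- near the range where morphing can appear
```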
Sampler Configuration:
- DDIM sampler recommended for video (less stiff than Euler); example settings follow this list
- Standard sampling steps apply
- CFG scale works similarly to image generation
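As a reference point, the KSampler block from the graph sketch above might carry settings like these; the values are illustrative starting points rather than canonical defaults.

```python
# Illustrative KSampler inputs for video generation (starting points, not canon).
ksampler_inputs = {
    "seed": 42,
    "steps": 20,             # standard image-workflow step counts carry over
    "cfg": 6.0,              # CFG behaves much as it does for still images
    "sampler_name": "ddim",  # DDIM, per the recommendation above
    "scheduler": "normal",
    "denoise": 1.0,          # full denoise: the video is generated from scratch
}
```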
Prompting Techniques for Video Generation
Effective video prompting differs from static image prompting. Focus on describing camera movement, subject behavior, and environmental factors; a small prompt-builder sketch follows the lists below.
Camera Control Examples:
- "Static camera shot" - minimal camera movement
- "Camera pans to close-up" - directed camera movement
- "Tracking shot" - follows subject movement
- "Crash zoom" - rapid zoom to specific element
Subject Direction:
- Describe what you want the subject to do
- Mention environmental interactions
- Specify if elements should remain still
- Detail any repetitive motions like "pulsating" or "breathing"
Environmental Details:
- Describe atmospheric effects like bubbles, smoke, or particles
- Mention lighting changes or effects
- Include background activity or movement
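One way to keep the three ingredient groups straight is to assemble prompts from labeled parts. A minimal sketch; the phrasing is illustrative, not a required template.

```python
# Compose a video prompt from camera, subject, and environment descriptions.
def build_prompt(camera: str, subject: str, environment: str) -> str:
    return f"{camera}. {subject}. {environment}."

positive = build_prompt(
    camera="Static camera shot, the frame remaining still",
    subject="a diver floats gently, exhaling slowly",
    environment="streams of bubbles rise through shafts of sunlight",
)
print(positive)  # feed this into the positive CLIPTextEncode node
```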
Optimizing Performance and Quality
The FP8 (e4m3fn) model version provides a good balance between quality and system requirements. The 14B parameter model at 6.74GB offers reasonable performance on most modern systems while maintaining good output quality.
Performance Tips:
- Start with lower frame counts for testing (see the API sketch after this list)
- Use the FP8 quantized versions for better performance
- Match input image resolution to avoid unnecessary processing
- Consider batch processing for multiple variations
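To act on the first tip, you can queue a shortened test render through ComfyUI's HTTP API (port 8188 by default) before committing to a full-length run. This sketch assumes the `workflow` dict from the graph sketch earlier, where node "9" is the Image to Video node.

```python
# Queue a short test render via ComfyUI's /prompt endpoint, overriding the frame count.
import json
import urllib.request

def queue_test(workflow: dict, frames: int) -> None:
    wf = json.loads(json.dumps(workflow))  # cheap deep copy so the original stays intact
    wf["9"]["inputs"]["length"] = frames   # fewer frames = faster sanity check
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=json.dumps({"prompt": wf}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())        # returns the queued prompt id

# queue_test(workflow, frames=17)  # ~1 second at 16 FPS (17 = 4*4 + 1)
```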
Quality Considerations:
- Videos longer than 6-8 seconds may experience morphing issues
- Keep important subjects within frame boundaries
- Use consistent lighting in source images
- Consider the relationship between frame rate and video length
Troubleshooting Common Issues
When elements don't behave as expected, refine your prompts with more specific language. If the camera moves too much despite "static" instructions, add "remaining still" or similar stabilizing phrases.
For missing environmental effects like bubbles or particles, double-check your spelling and try alternative descriptive terms. The model responds well to detailed descriptions of what should be happening in the scene.
Advanced Workflow Customization
The basic workflow can be extended with additional preprocessing nodes, upscaling components, or multiple video outputs. LoRA integration is possible but works better when applied to source images rather than directly in the video generation process.
Consider creating keyframe sequences outside the video generator and using this workflow to animate between them for longer-form content.
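A rough sketch of that idea: render a segment, keep its final frame, and feed it back as the start image for the next segment. The directory layout and the `render_segment` helper are hypothetical stand-ins for your own output setup.

```python
# Chain short segments into longer-form content by reusing each segment's last frame.
from pathlib import Path

def next_start_frame(frames_dir: str) -> str:
    frames = sorted(Path(frames_dir).glob("*.png"))
    return str(frames[-1])  # the last rendered frame seeds the next segment

# start = "keyframe_01.png"
# for i in range(3):                        # three ~4 s segments instead of one long clip
#     render_segment(start, out=f"output/segment_{i}")  # hypothetical render helper
#     start = next_start_frame(f"output/segment_{i}")
```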
Conclusion
ComfyUI Wan 2.1 represents a significant advancement in open-source video generation, offering capabilities that rival closed-source alternatives. The native integration with ComfyUI's node system makes it accessible while providing the flexibility to customize workflows for specific needs.
The combination of natural language prompting, negative prompt support, and the familiar KSampler interface creates a powerful yet approachable video generation system. As open-source video generation continues evolving rapidly, mastering these foundational techniques positions creators to take advantage of future developments.
Level up your team's AI usage: collaborate with Promptus. Be a creator at https://www.promptus.ai