
A hands-on guide to Wan 2.2 in Promptus with ComfyUI, covering text, image, and video inputs, parameter cheat sheets, onboarding tips, and practical FAQs to help you create professional AI videos faster.
CosyFlows are curated ComfyUI workflows that hide the plumbing but keep the creative control. Under the hood, Wan 2.2 is a latent video diffusion system: it (1) compresses frames into a compact latent space with a video VAE, (2) iteratively denoises that latent, all frames together, with a diffusion transformer guided by your prompt(s) and reference frames, then (3) decodes the result to video and optionally post-processes it (debanding, slight sharpening, frame pacing).
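To make step (1) concrete, the sketch below shows how clip dimensions shrink into latent dimensions. It assumes the 4× temporal and 16×16 spatial compression published for the open-source Wan 2.2 5B VAE (other Wan models use different ratios) and a causal VAE that encodes the first frame on its own. The function name is illustrative, not a Promptus or ComfyUI API.

```python
def latent_shape(frames: int, height: int, width: int,
                 t_ratio: int = 4, s_ratio: int = 16) -> tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width) for a clip.

    Assumes a causal video VAE: the first frame becomes one latent frame and
    every following group of `t_ratio` frames becomes one more latent frame.
    """
    if frames < 1 or (frames - 1) % t_ratio != 0:
        raise ValueError(f"frame count must be {t_ratio}n+1, got {frames}")
    if height % s_ratio or width % s_ratio:
        raise ValueError(f"height and width must be multiples of {s_ratio}")
    return ((frames - 1) // t_ratio + 1, height // s_ratio, width // s_ratio)


# A 5 s clip at 24 fps (121 frames) rendered at 1280x704:
print(latent_shape(121, 704, 1280))  # (31, 44, 80)
```

The denoiser works on that much smaller tensor, which is why longer or larger clips cost more: every extra second or pixel row grows the latent it must process.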
Two ready-to-run flows:
- (cosy) Wan 2.2 5B Video Generation – fast iteration, great detail at 720p (and draft 1080p). Use for ideation, social, reels, marketing snippets.
- (cosy) Wan 2.2 14B First–Last Frame to Video – maximum fidelity at 1080p, optimized for keyframe-driven storytelling (smooth transitions between two designed frames).
Promptus hosts the compute, so you don’t install nodes or models; you focus on prompts, references, and a few high-impact knobs.
Three creative modes
1. Text → Video
Best fit: 5B for quick ideation; 14B only if you also provide keyframes.
Input: A descriptive prompt, optionally with a negative prompt.
Mechanics: The model synthesizes a temporally consistent scene. Your words steer content, camera, mood, and motion.
Key controls to dial:
- Duration (s): Shorter = crisper motion & fewer artifacts (e.g., 3–6s).
- FPS: 24–30 for natural motion; higher = smoother but costlier.
- CFG / Guidance: Higher = sticks closer to the prompt; too high may oversaturate colors or “lock in” odd details. Start around 5–7.
- Steps / Sampler: More steps = more detail/coherence (diminishing returns past a point).
- Seed: Lock it to make comparable variations; change to explore.
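Duration and FPS together set the number of frames the model must generate, and that count drives both cost and artifact risk. A minimal sketch, assuming (as in the open-source Wan releases) that valid frame counts have the form 4n+1 because the VAE compresses time by 4; Promptus may round this for you, and `frame_count` is a hypothetical helper.

```python
def frame_count(duration_s: float, fps: int, t_ratio: int = 4) -> int:
    """Nearest valid frame count of the form t_ratio * n + 1."""
    raw = duration_s * fps                      # frames you asked for
    n = max(0, round((raw - 1) / t_ratio))      # snap to the nearest 4n+1 grid point
    return t_ratio * n + 1


for seconds in (3, 5, 6):
    print(seconds, "s @ 24 fps ->", frame_count(seconds, 24), "frames")
# 73, 121 and 145 frames respectively
```

Doubling either duration or FPS roughly doubles the frames, which is why short clips at 24–30 fps are the sweet spot for iteration.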
Prompt formula (works great): [Subject] + [Action/Motion] + [Camera] + [Lighting] + [Style/Medium] + [Era/Lens] + [Color/Grade] + [Mood/Adjectives] + quality tags (e.g., film grain, high detail) + NEGATIVE: [unwanted stuff]
Example:
“golden retriever splashing through shallow lake, handheld medium shot, backlit sunset, cinematic color grade, gentle bokeh, natural film grain, warm tones — NEGATIVE: text overlays, watermark, motion blur, double faces”
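If you generate many variations, it can help to keep the formula's slots as data and assemble the prompt programmatically. The sketch below is a hypothetical helper (`PromptSpec` is not part of Promptus or ComfyUI): it joins non-empty slots in the formula's order and keeps the negative prompt as a plain list of unwanted things.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container mirroring the prompt-formula slots."""
    subject: str
    action: str = ""
    camera: str = ""
    lighting: str = ""
    style: str = ""
    era_lens: str = ""
    grade: str = ""
    mood: str = ""
    quality: list[str] = field(default_factory=list)
    negative: list[str] = field(default_factory=list)

    def positive(self) -> str:
        # Follow the formula's order; skip empty slots so no stray commas appear.
        slots = [self.subject, self.action, self.camera, self.lighting,
                 self.style, self.era_lens, self.grade, self.mood, *self.quality]
        return ", ".join(s for s in slots if s)

    def negative_prompt(self) -> str:
        # A negative prompt lists the unwanted things themselves.
        return ", ".join(self.negative)


spec = PromptSpec(
    subject="golden retriever",
    action="splashing through shallow lake",
    camera="handheld medium shot",
    lighting="backlit sunset",
    era_lens="gentle bokeh",
    grade="cinematic color grade, warm tones",
    quality=["natural film grain"],
    negative=["text overlays", "watermark", "motion blur", "double faces"],
)
print(spec.positive())
print("NEGATIVE:", spec.negative_prompt())
```

Changing one slot at a time (with the seed locked) makes it easy to see which words actually move the result.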
2. Image → Video
Best fit: 5B; 14B if you treat the still as the “first frame” of a keyframed shot.
Input: One reference image (style/subject).
Mechanics: The image anchors structure & style; diffusion invents plausible motion around it.
Key controls to dial:
- Init Strength / Denoise Strength (often just called “strength”):
  - Lower (~0.3–0.45) = preserve more of your image (gentle parallax, breathing, small camera moves).
  - Higher (~0.5–0.65) = allow new content/motion, at the risk of drifting off-style.
- Motion Presets (if available) or simple prompt verbs: “slow dolly-in”, “subtle breeze”, “light camera sway”.
- Duration / FPS as above.
Tip: Add a motion description (“soft wind moving grass; camera dolly left 10%”) so the model adds believable dynamics rather than hallucinating large actions.
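One way to build intuition for the strength ranges above: in a flow-matching model (the open-source Wan releases are trained with flow matching), the init image's latent is blended with noise at the starting noise level, and the sampler denoises from there. This sketch assumes a plain linear schedule with no timestep shift and treats strength as that starting noise level; real samplers, including whatever Promptus runs, warp the schedule, so read the percentages as intuition rather than exact ratios. `noised_start` is a hypothetical helper.

```python
import random


def noised_start(latent: list[float], strength: float, seed: int = 0) -> list[float]:
    """Blend an init latent with Gaussian noise at noise level `strength`.

    Linear (rectified-flow style) interpolation: x_t = (1 - t) * x0 + t * noise,
    with t = strength. strength=0 returns the input; strength=1 is pure noise.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    rng = random.Random(seed)  # seeded, so the same seed gives the same noise
    return [(1.0 - strength) * x + strength * rng.gauss(0.0, 1.0) for x in latent]


for s in (0.35, 0.45, 0.6):
    print(f"strength {s}: about {1 - s:.0%} of the init signal kept at the start")
```

The lower the strength, the more of your image survives into the starting point, which is why ~0.3–0.45 preserves identity while ~0.5–0.65 lets new motion and content in.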
3. Video → Video
Best fit: 5B for speed; 14B for higher-resolution polish, working from two keyframes or short chunks of the source.
Input: A source clip (live-action, 3D render, or a plain draft).
Mechanics: The model stylizes or modifies the input while preserving core motion.
Key controls to dial:
- Denoise Strength:
  - 0.35–0.5 = keep structure/motion, add style (best for brand consistency).
  - 0.5–0.65 = allow bigger redesigns, at the cost of fidelity to the original.
- Style Prompt: Be explicit about medium (cel animation, oil paint, photoreal), grade, lens, era.
- Negative Prompt: list what you don't want, e.g. “text, extra logos, heavy blur, jitter”.
- Frame Rate Match: Matching your source fps reduces jitter.
Pro move: Feed a clean, contrasty source with consistent exposure. Garbage in = flicker out.
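When a source clip is longer than one generation, it can be processed as short overlapping chunks whose seams are cross-faded afterwards, all indexed at the source frame rate so no frames are dropped or duplicated. The sketch below plans those windows. `plan_chunks`, the 81-frame chunk (a 4n+1 length) and the 8-frame overlap are illustrative assumptions, not Promptus settings.

```python
def plan_chunks(total_frames: int, chunk: int = 81, overlap: int = 8) -> list[tuple[int, int]]:
    """Split frames [0, total_frames) into overlapping half-open (start, end) windows.

    Every window is `chunk` frames long (clips shorter than one chunk stay whole),
    and consecutive windows share at least `overlap` frames for blending seams.
    """
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be in [0, chunk)")
    if total_frames <= chunk:
        return [(0, total_frames)]
    step = chunk - overlap
    windows = []
    start = 0
    while start + chunk < total_frames:
        windows.append((start, start + chunk))
        start += step
    # Pin the final window to the clip's end so it keeps the full chunk length.
    windows.append((total_frames - chunk, total_frames))
    return windows


print(plan_chunks(200))
```

Pinning the last window to the end trades a larger final overlap for keeping every chunk the same length, so each one can be generated with identical settings.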
14B First–Last Frame
This flow shines when you upload two art-directed frames (first & last) and describe the transition:
- Frames: 1920×1080 PNG/JPG with consistent grade (white balance, contrast).
- Transition Prompt: Describe what changes over time (lighting, weather, pose, camera path).
- Frame Rate: 24fps is a great baseline for cinematic pacing.
- Duration: 3–8s sequences tend to look most “premium” and coherent.
Example brief:
First frame: “Forest at dawn” → Last frame: “Same forest at dusk”
Prompt: “cool dawn light brightens through the day, then warms to golden hour at dusk; slow crane-up, leaves rustle lightly; cinematic”
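The guidelines for this flow translate into a few checks you could run before spending compute. This is a hypothetical pre-flight sketch: `KeyframeJob` and `preflight` are illustrative names, not Promptus APIs, and grade matching (white balance, contrast) is left out because it needs image analysis rather than metadata.

```python
from dataclasses import dataclass


@dataclass
class KeyframeJob:
    """Hypothetical description of a first-last frame job."""
    first_size: tuple[int, int]   # (width, height) of the first keyframe
    last_size: tuple[int, int]    # (width, height) of the last keyframe
    fps: int = 24
    duration_s: float = 5.0
    transition_prompt: str = ""


def preflight(job: KeyframeJob) -> list[str]:
    """Return human-readable warnings based on the guidelines above."""
    warnings = []
    for name, size in (("first", job.first_size), ("last", job.last_size)):
        if size != (1920, 1080):
            warnings.append(f"{name} frame is {size[0]}x{size[1]}, expected 1920x1080")
    if job.first_size != job.last_size:
        warnings.append("keyframes differ in size")
    if not 3 <= job.duration_s <= 8:
        warnings.append(f"duration {job.duration_s}s is outside the 3-8 s range")
    if not job.transition_prompt.strip():
        warnings.append("describe what changes between the two frames")
    return warnings


job = KeyframeJob((1920, 1080), (1920, 1080), duration_s=6,
                  transition_prompt="dawn light warms to golden hour; slow crane-up")
print(preflight(job) or "ready to render")
```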
High-impact parameter cheatsheet
- Prompt / Negative Prompt → semantic steering & guardrails.
- Seed → repeatability; lock to compare tweaks apples-to-apples.
- CFG (Guidance Scale) → prompt adherence vs. freedom (start 5–7).
- Steps → detail/coherence (start mid-range; go higher if frames look mushy).
- Strength (init/denoise) → how much to deviate from input image/video.
- Duration & FPS → total frames; affects motion smoothness and artifact risk.
- Resolution → 720p for drafts; 1080p for finals (esp. 14B).
- Motion/Camera Hints → dolly, pan, tilt, zoom, parallax—small numbers feel real.
Proven recipes
- Draft fast, finish premium:
5B @ 720p (short clips) → iterate prompts → lock seed → 14B @ 1080p with tight transition description (if using keyframes).
- Cohesive brand style (video→video):
Keep denoise ~0.4–0.5, a strong style prompt (“clean commercial look, soft key light, neutral backgrounds”), and a negative prompt (“extra logos, vignette”).
- Image→video parallax loop:
One hero still + prompt “subtle camera push-in, shallow depth of field, gentle hair movement”—strength ~0.4 to preserve identity.
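The “Draft fast, finish premium” recipe amounts to carrying one settings bundle forward while swapping only the model and resolution. A minimal sketch with a hypothetical `RenderSettings` record; the field names and the 6.0 CFG default (middle of the suggested 5–7 range) are illustrative assumptions, not Promptus parameters.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RenderSettings:
    """Hypothetical, immutable bundle of the knobs this guide talks about."""
    model: str            # "5B" or "14B"
    resolution: str       # "720p" or "1080p"
    seed: int
    prompt: str = ""
    cfg: float = 6.0      # middle of the suggested 5-7 starting range
    duration_s: float = 4.0


def promote_to_final(draft: RenderSettings) -> RenderSettings:
    """Keep prompt, seed and guidance; switch to 14B at 1080p."""
    return replace(draft, model="14B", resolution="1080p")


draft = RenderSettings(model="5B", resolution="720p", seed=1234,
                       prompt="forest at dawn, slow crane-up, cinematic")
final = promote_to_final(draft)
print(final)
```

Freezing the dataclass means the draft record is never mutated, so you can keep it alongside the final for comparison. Keep in mind that a seed reproduces a take only with the same model and settings; the 14B render will differ from the 5B draft even with the same seed, so the seed mainly keeps your later 14B iterations comparable with each other.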
Troubleshooting
- Flicker/“texture swim”: Shorten duration; increase steps slightly; add “flicker, unstable textures” to the negative prompt; reduce strength.
- Faces/hands drift: Tighten prompt (“single subject, clean face geometry”), reduce strength, raise steps; try a new seed.
- Motion too wild: Lower strength; add explicit “slow” camera/action verbs; drop FPS or duration.
- Blurry frames: Increase steps a bit; try another sampler; ensure 1080p on 14B for finals.
- Color/exposure pops (first–last): Match grading between keyframes; describe lighting evolution clearly.
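If you script your iterations, the numeric fixes above can live in a small lookup table so each retry nudges settings consistently. The symptom names and step sizes below are hypothetical choices for illustration; prompt, negative-prompt and seed changes from the list are left as manual edits.

```python
# Hypothetical symptom -> numeric adjustment table distilled from the list above.
FIXES = {
    "flicker":     {"duration_s": -1.0, "steps": +4, "strength": -0.05},
    "face_drift":  {"strength": -0.05, "steps": +4},
    "wild_motion": {"strength": -0.05, "fps": -6},
    "blurry":      {"steps": +6},
}


def apply_fix(settings: dict, symptom: str) -> dict:
    """Return a copy of `settings` nudged according to FIXES[symptom].

    Only keys already present in `settings` are changed; values never go below 0.
    """
    updated = dict(settings)
    for key, delta in FIXES[symptom].items():
        if key in updated:
            updated[key] = max(0, round(updated[key] + delta, 2))
    return updated


current = {"steps": 30, "strength": 0.5, "duration_s": 5.0, "fps": 24}
print(apply_fix(current, "flicker"))
```

Small, fixed nudges keep each retry comparable to the last one, which matters most when the seed is locked.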
Quick start checklist
- Pick Flow: 5B for drafts / 14B for keyframed finals.
- Write the prompt: subject + action + camera + lighting + style + negative.
- Set basics: 720p/1080p, duration, fps, steps ~mid, CFG ~5–7, lock seed.
- (Image/Video inputs?) Set strength ~0.4–0.55 depending on how much change you want.
- Generate → Review: If off-style, lower strength or add more style words; if off-prompt, raise CFG slightly.
- Finalize: Re-run best take at 1080p (14B) with matched grades and explicit transition description.
When to choose which
- 5B Text→Video: Ideation, social cuts, rapid A/B prompts, storyboarding.
- 5B Image→Video: Photographic parallax “alive stills,” gentle motion logos/packshots.
- 5B Video→Video: Consistent stylization of recorded footage or CG playblasts.
- 14B First–Last: Hero transitions, brand reveal sequences, cinematic micro-stories.