How do I produce AI video ads without burning the budget?

If you want this run for you instead of reading about it, Dynamite Growth is where engagements get scoped.

Author: Gian Gomez, founder of Dynamite Growth

Published May 26, 2026 · 10 min read

Six rules I extracted from a 10-spot Seedance 2 and GPT Image 2 sprint. Three locked render constraints, one audio register, one continuation workflow, and one comedic format that survived production.

I spent a week stress-testing Seedance 2 and GPT Image 2 across a 10-spot ad slate. The output was a working production stack. The useful output was the six rules I broke first, paid for in garbled phone screens, malformed currency, phantom hands between characters, and a comedic register that read as amateur for the first three renders. None of those failures show up in a tutorial. They show up when you have a shipping deadline and a credit meter that does not stop.

This article is the short version of what survived the sprint. Three locked render rules that prevent the most expensive failure modes. One audio rule that flips the register from UGC to commercial. One workflow rule that ends the practice of regenerating first frames for every shot. One comedic format that the deadpan-narrator tradition already proved out. If you are producing AI video at scale, these are the rules I wish someone had handed me on day one.

Why do phone screens always glitch in AI render?

Direct answer: AI text rendering on phone screens fails reliably across both GPT Image 2 and Seedance 2. UI text, form fields, app icons, button labels all glitch with garbled letters or misshapen elements. The rule is to never show the screen.

The fix has a hierarchy. Best case, the phone is a magic-trigger object, not a screen-display object. Show only the back of the phone. In one of my spots the phone became a bottom-edge cash dispenser. The screen was never in frame and the gag carried itself. Second best, the phone is held in hand with the screen facing the character and the back of the phone facing the camera. Third, phone face-down on a table. Less dynamic but safe.

Whichever path you choose, negative-prompt the failure explicitly. No phone screen visible. No UI shapes, form fields, or buttons. No text rendered on the phone screen. The model will try to hallucinate a UI if you give it room, and the result is a frame you cannot use in a paid ad.
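A minimal sketch of how those negative clauses could be kept in one place and appended to every phone shot. The helper name and the sample scene are hypothetical, and nothing here calls a Seedance or GPT Image 2 API; it only builds the prompt text.

```python
# Hypothetical helper: builds prompt text only, no model API is called.
PHONE_NEGATIVES = [
    "No phone screen visible.",
    "No UI shapes, form fields, or buttons.",
    "No text rendered on the phone screen.",
]


def with_phone_negatives(scene_prompt: str) -> str:
    """Append the phone-screen negative clauses to a scene description."""
    return scene_prompt.rstrip() + " " + " ".join(PHONE_NEGATIVES)


# Illustrative scene description, not taken from a real spot.
prompt = with_phone_negatives(
    "A woman holds a phone with its back facing the camera."
)
print(prompt)
```

Keeping the clauses in a single list means every shot that includes a phone gets the same wording, so a forgotten negative does not cost a render pass.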

The reason this rule is locked is that it took three full render passes in one spot before I gave up trying to coax a clean UI out of the model. Phone face-down worked. Phone in hand with screen away worked. Phone as a dispenser worked beautifully. Phone with a visible screen never worked once across either model in either version.

Why does US currency render malformed?

Direct answer: photoreal US currency in AI renders produces malformed presidential faces, garbled serial numbers, and off-pattern Federal Reserve design. The model cannot render a clean dollar bill. The fix is to spec stylized non-photoreal bills from the prompt.

Write the prompt as generic green rectangular paper-bill designs with vague bill-like markings. No presidential face detail. No readable serial numbers. No recognizable US Treasury layout. The model can render that cleanly. The result reads as cash without crossing the uncanny line where the viewer notices the face is wrong.

If the stylized bills still malform, fall back to stylized golden coins or abstract green confetti. I used both fallbacks in different spots. Coins for the magic-realism dispenser. Confetti for the celebratory close-out. Both read clean. Both avoided the dollar-bill failure mode entirely.
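The fallback order above can be written down as a small lookup, sketched here under assumptions: the function name and the scene labels are hypothetical, and the scene-to-fallback mapping simply mirrors the two uses described (coins for the dispenser, confetti for the close-out) rather than any fixed rule. Defaulting unknown scenes to coins is also an assumption.

```python
# Hypothetical prompt fragments; wording follows the stylized-bill spec above.
STYLIZED_BILLS = (
    "generic green rectangular paper-bill designs with vague bill-like markings, "
    "no presidential face detail, no readable serial numbers, "
    "no recognizable US Treasury layout"
)

# Assumed mapping: mirrors the two fallbacks described in the text.
FALLBACKS = {
    "dispenser": "stylized golden coins",
    "celebration": "abstract green confetti",
}


def cash_asset_prompt(scene: str, bills_malformed: bool = False) -> str:
    """Return the cash description to use, stepping down if bills malformed."""
    if not bills_malformed:
        return STYLIZED_BILLS
    # Unknown scenes default to coins (an assumption, not a tested rule).
    return FALLBACKS.get(scene, "stylized golden coins")


print(cash_asset_prompt("dispenser"))
print(cash_asset_prompt("celebration", bills_malformed=True))
```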

Same principle applies to any branded paper or official document on screen. Government forms, contracts, credit cards with readable numbers. Anything with fine-grained text on a printed surface fails the same way. Stylize the asset or hide the detail.

Why do multi-character scenes fail?

Direct answer: multi-character AI render is the highest-risk shot in the stack. Hand-touching interactions produce phantom limbs, face-shift mid-render, and character-bleed between figures. The fix is to physically separate the characters or keep one of them off-frame.

The mitigation hierarchy goes safest to riskiest. Glass-pane separation is the safest pattern. Two characters separated by a window or a glass partition. No hand contact, no physics risk between the figures. I used this in a spot where one character stood at a kitchen window and the other stood outside. The scene worked first render.

Off-frame voice is the second safest. Only one character on camera. The other character is voice-only. I used this in a bank-versus-broker spot where the banker's arm was visible at the edge of frame but the face and body stayed off-camera. The viewer reads two characters because the audio sells it. The render risk drops to one face.

Multi-character stampede with depth layering is manageable but requires discipline. Front six to eight figures detailed. Back layer stylized silhouettes. No hand interactions between figures. I used this in a cartoon spot with 50 lenders stampeding through a door. The front layer carried the gag. The back layer carried the volume.

The hard rule is to avoid two characters in close proximity with hand-touching. High-fives, fist bumps, shoulder pats, hand-passing-objects. The model cannot resolve the physics cleanly. The render produces phantom hands, twisted wrists, or a third arm. Cut the contact out of the script before you render. Re-block the scene to avoid touch.
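One way to catch contact before it reaches a render is a plain text scan of the script. This is a rough sketch: the term list is illustrative and incomplete, the function name is hypothetical, and a keyword match is no substitute for reading the blocking yourself.

```python
# Illustrative, incomplete list of touch interactions to flag.
CONTACT_TERMS = [
    "high-five",
    "high five",
    "fist bump",
    "shoulder pat",
    "hands over",
]


def flag_contact_lines(script: str) -> list[tuple[int, str]]:
    """Return (line number, text) for script lines that mention contact."""
    flagged = []
    for number, line in enumerate(script.splitlines(), start=1):
        lowered = line.lower()
        if any(term in lowered for term in CONTACT_TERMS):
            flagged.append((number, line.strip()))
    return flagged


# Made-up script lines for demonstration.
script = """Banker waves through the window.
Owner gives the broker a high-five.
Narrator: Funding without the stalking."""

flagged = flag_contact_lines(script)
print(flagged)
```

Any flagged line is a candidate for re-blocking with glass-pane separation or an off-frame voice.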

What audio register actually works for AI video?

Direct answer: the audio register depends on the format. UGC peer-energy spots need the captured-real-world signature with proximity warmth and audible breath. Narrative-comedy with a brand narrator over a character punchline needs smooth radio voice. The two are not interchangeable.

I learned this the wrong way around. The first spot was a narrative-comedy piece with a brand-narrator voiceover and a character punchline. I used the UGC audio profile we had locked from selfie testimonials. The render came back amateur. The vocal fry on the tails, the breath on the mic, the slightly-too-warm proximity all worked beautifully on a phone selfie. On a narrative-comedy spot with a polished visual, they read as a bad podcast guest, not as a professional narrator.

The fix is to spec smooth radio voice for the narrator format. Polished FM-DJ quality. Light studio booth reverb. Clean recording. Minimal mouth sounds. Minimal vocal fry. Minimal audible breath. Reference vibes that work include Dollar Shave Club deadpan-smooth, Liev Schreiber narration, George Clooney voice-for-hire, polished NPR-evolution Audie Cornish. The model can hit any of those if you name them in the prompt.

The format-dependent rule is simple. Selfie testimonial, peer-to-peer, UGC framing? Stay with the captured-real-world signature. Narrative-comedy with a brand-narrator? Smooth radio. The two registers signal different things to the viewer. Mixing them up makes the polished visual look amateur or the casual visual look corporate.
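The format-to-register rule can be sketched as a lookup so the audio spec is chosen from the format, not from habit. The dictionary keys and the function name are hypothetical labels; the descriptor text is lifted from the two profiles described above.

```python
# Descriptor text follows the two registers described in this section.
AUDIO_REGISTERS = {
    "ugc": "captured real-world signature, proximity warmth, audible breath",
    "narrator": (
        "smooth radio voice, polished FM-DJ quality, light studio booth reverb, "
        "clean recording, minimal mouth sounds, minimal vocal fry, "
        "minimal audible breath"
    ),
}

# Hypothetical format labels mapped to a register.
FORMAT_TO_REGISTER = {
    "selfie testimonial": "ugc",
    "peer-to-peer": "ugc",
    "narrative comedy": "narrator",
}


def audio_spec(spot_format: str) -> str:
    """Return the audio descriptor for a spot format; KeyError if unknown."""
    register = FORMAT_TO_REGISTER[spot_format.lower()]
    return AUDIO_REGISTERS[register]


print(audio_spec("Narrative comedy"))
```

Raising on an unknown format is deliberate in this sketch: a new format should force a decision about register instead of silently inheriting the last one.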

How do you keep continuity across multi-shot pieces?

Direct answer: stop generating separate first-frame images for every shot. Use Seedance Option 2 continuation. Upload the prior shot as the start-state reference. The model pixel-locks to its last frame.

The naive workflow is to generate one GPT Image 2 first frame per shot. Render each shot independently. Cut them together in post. The result is a visible jump between every cut. Lighting shifts, character clothing changes, the background loses a prop. The viewer reads the cuts as glitches, not as edits.

The continuation workflow fixes this. Render shot one with a generated first-frame image. Render shot two by uploading the full shot-one video as the start-state reference. Seedance pixel-locks to the last frame of shot one and begins shot two from that exact state. The cut becomes invisible. The narrative reads continuous.

For shots with major visual transitions, like a door breaking down or a character entering a new room, upload both the prior shot and a generated end-frame image as the target. The model interpolates the motion between them. The transition reads natural instead of as two unrelated clips spliced together.

The other benefit is cost. Only shot one needs a generated first frame. Every shot after it inherits from the previous render. On a 5-shot piece that is four GPT Image 2 generations saved. Across a 10-spot slate the saving compounds.
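The arithmetic is easy to check with a small planning sketch. The function names are hypothetical, no platform API is called, and the 10-spot total assumes every spot has five shots, which is an illustrative figure and not a number from the sprint.

```python
def plan_shots(shot_count: int) -> list[dict]:
    """List each shot with the start-state source it would use."""
    plan = []
    for shot in range(1, shot_count + 1):
        if shot == 1:
            start = "generated first-frame image"
        else:
            # Continuation: the prior shot's video is the start-state reference.
            start = f"video of shot {shot - 1}"
        plan.append({"shot": shot, "start_state": start})
    return plan


def first_frames_saved(shot_count: int) -> int:
    """First-frame generations avoided versus one generated frame per shot."""
    return max(shot_count - 1, 0)


for step in plan_shots(5):
    print(step)

saved_per_piece = first_frames_saved(5)
# Assumption for illustration: a 10-spot slate where every spot has 5 shots.
saved_per_slate = sum(first_frames_saved(n) for n in [5] * 10)
print(saved_per_piece, saved_per_slate)
```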

What comedic format works for finance and B2B?

Direct answer: the absurdist-delivery format. The delivery mechanic itself is the joke. An unexpected object or character brings the outcome. Deadpan reaction. Brand narrator bookends the visual gag. The joke carries 80 percent of the spot. The brand lands the words.

The format is distinct from situational comedy. Situational comedy needs a setup, a development, and a punchline across time. The absurdist-delivery format collapses all three into a single visual mechanic. The mechanic is the unexpected delivery method for an expected outcome.

Three of my spots used the format and all three worked. A phone that ejected cash like a slot machine. A stampede of lenders breaking down a door. A bank that stalked a small business owner like a clingy ex outside the kitchen window. Each one rendered as a single visual gag. Each one carried its meaning without a single line of dialogue from the hero character.

The template structure is simple. 15 to 30 seconds. Single shot or short multi-shot. Magic-realism delivery mechanic. Deadpan character reaction, never theatrical. Brand narrator voiceover bookends. The visual gag carries the spot. The words land the brand.

The format is generalizable beyond financial services. Any brand with a clear differentiator that can be rendered as an absurd delivery mechanic can use it. A lender marketplace becomes a stampede of lenders. A funding speed claim becomes a cash-dispensing phone. The trick is to find the differentiator that can be visualized literally and then render it at scale.

FAQ

Which model do you start with for AI video?

Seedance 2 for the video render. GPT Image 2 for first-frame generation when continuation does not apply. I default to GPT Image 2 because the prompt obedience on first frames is the cleanest of any model in the stack right now. If a first frame fails three times in a row, I switch tools. The three-strike rule is non-negotiable. Iterating the prompt past three failures is a sign you have a tool-fit problem, not a prompt-fit problem.
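The three-strike rule can be sketched as a capped retry loop. Everything here is hypothetical scaffolding: `render` stands in for whatever call produces a first frame and is assumed to return `None` on an unusable result, and the stub below exists only to exercise the loop.

```python
from typing import Callable, Optional


def first_frame_with_strikes(
    render: Callable[[str], Optional[str]],
    prompt_variants: list[str],
    max_strikes: int = 3,
) -> tuple[Optional[str], int]:
    """Try at most max_strikes prompts; return (frame or None, attempts used)."""
    attempts = 0
    for prompt in prompt_variants[:max_strikes]:
        attempts += 1
        frame = render(prompt)
        if frame is not None:
            return frame, attempts
    return None, attempts


def always_fails(prompt: str) -> Optional[str]:
    """Stub renderer that never produces a usable frame."""
    return None


frame, attempts = first_frame_with_strikes(
    always_fails, ["variant 1", "variant 2", "variant 3", "variant 4"]
)
if frame is None:
    print(f"{attempts} strikes: switch tools instead of rewriting the prompt")
```

The fourth variant is never tried, which is the point: the cap turns the decision to switch tools into a default instead of a judgment call made mid-deadline.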

How long does a 30-second AI video spot take to produce?

From locked script to renderable file, two to four hours for a single-shot spot. Six to eight hours for a multi-shot narrative piece. The variance comes from how many render passes the first frame needs and how many continuation iterations the subsequent shots require. The first spot in a new format takes longer. Spots after the format is locked take half the time because the prompts inherit.

How do you stop the credit meter from running away?

Pre-resolve the failure modes before you render. The six rules above are the failure modes I burned credits learning. Negative-prompt the known glitches. Stylize the assets that will malform. Re-block the scenes that will produce phantom limbs. Choose the audio register that matches the format. Use continuation instead of regenerating first frames. Pre-resolution costs minutes. Render-and-recover costs credits.

What does this not work for?

Tutorials, demos, anything that needs to show a real UI on a real device. The model cannot render readable text on a screen yet. If the spot requires UI to be visible and legible, shoot it live. The AI stack is excellent for magic-realism, narrative comedy, character-driven dialogue, and any spot where the visual gag carries the meaning. It is not yet excellent for product demonstration.

Are these rules going to last?

The render-constraint rules will hold until the underlying models fix the failure modes. Phone screens, currency, and multi-character physics are open research problems across every video model in the market. The workflow rule about continuation will hold for as long as the platform exposes a continuation primitive. The format rule about absurdist-delivery is older than AI video and will outlive it. Comedy formats compound across eras. Render constraints are a moving target.

If you want to see how I run paid acquisition against AI-produced creative inside a current account, the agency surface is at dynamitegrowth.co.

About the author

Gian Gomez.

Founder, Dynamite Growth · Miami

AI-leveraged solo operator running paid acquisition and funnels for B2B high-ticket clients out of Miami. Eight years in sales and marketing, $50M+ generated across roles, including founding Prodigy Power and operating as employee #1 at Andy Elliott’s sales education company. The receipts are the work, not the prompts.