When we set out to create a gamified, comic book-inspired, sci-fi soft skills learning experience for our partner’s three-year technical training program, the promise of generative AI was alluring: rapid content creation, lower production costs, and unlimited creative possibilities. The reality? More nuanced… but entirely achievable.
The Workflow in Brief
(Video demo below: See how our AI-assisted production pipeline brought this sci-fi soft skills world to life.)
Our production pipeline followed these stages:
- Story & Prompt Generation → ChatGPT created the narrative, character descriptions, and scene prompts for images and video
- Visual Creation & Editing → Freepik’s AI suite generated and retouched images
- Video Generation → Freepik and Kling.ai produced video from starting frames
- Audio Integration → Layered in voiceover, lip sync, and generated background music
- Assembly & Publishing → Brought everything together in Storyline and published
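Conceptually, the stages above behave like a linear pipeline with a producer review gate after every step. Here is a minimal sketch of that shape; all stage names and the `approve` callable are hypothetical stand-ins, not the actual tool integrations:

```python
from typing import Callable, Dict, List, Tuple

Assets = Dict[str, str]           # e.g. {"script": ..., "frames": ..., "audio": ...}
Stage = Callable[[Assets], Assets]

def run_pipeline(stages: List[Tuple[str, Stage]], assets: Assets,
                 approve: Callable[[str, Assets], bool]) -> Assets:
    """Run each stage in order; a producer must approve the output
    before the next stage runs (regenerate on rejection)."""
    for name, stage in stages:
        result = stage(assets)
        while not approve(name, result):  # human-in-the-loop gate
            result = stage(assets)        # regenerate and re-review
        assets = result
    return assets

# Stub stages standing in for ChatGPT, Freepik, Kling.ai, audio, and Storyline.
stages: List[Tuple[str, Stage]] = [
    ("story",   lambda a: {**a, "script": "scene prompts"}),
    ("images",  lambda a: {**a, "frames": "starting frames"}),
    ("video",   lambda a: {**a, "video": "clips"}),
    ("audio",   lambda a: {**a, "audio": "voiceover + music"}),
    ("publish", lambda a: {**a, "course": "storyline package"}),
]

final = run_pipeline(stages, {}, approve=lambda name, a: True)
```

The point of the sketch is the `approve` gate: every stage output passes through a human check, which is exactly where the "simple on paper" workflow gains its hidden cost.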
Simple on paper… more complex in execution.
What We Learned the Hard Way
Perhaps our biggest workflow revelation: story consistency lives or dies in a single conversation thread. Initially, we attempted to generate story chunks across multiple ChatGPT sessions. The result? Tonal whiplash and endless continuity issues, like previously unheard-of characters suddenly replacing main characters mid-story, and even more confounding head scratchers — all courtesy of AI’s inherent randomness. Once we front-loaded our critical story parameters, values, and main-events outline and kept the entire narrative development within one extended conversation, consistency and character voice improved dramatically.
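To make the single-thread lesson concrete, here is a minimal sketch of keeping every story chunk in one running message history, so each new generation sees the full prior context. The `complete` callable stands in for whatever chat-completion client you use, and all prompt text is hypothetical:

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

class StoryThread:
    """Accumulates every prompt and reply in one message history,
    mimicking a single extended conversation rather than fresh sessions."""

    def __init__(self, system_prompt: str,
                 complete: Callable[[List[Message]], str]):
        # `complete` wraps your LLM client; here it is just a stub.
        self.complete = complete
        self.messages: List[Message] = [
            {"role": "system", "content": system_prompt}
        ]

    def generate_chunk(self, instruction: str) -> str:
        self.messages.append({"role": "user", "content": instruction})
        reply = self.complete(self.messages)
        # Append the reply so later chunks build on everything so far.
        self.messages.append({"role": "assistant", "content": reply})
        return reply

# Seed the thread with critical story parameters up front.
thread = StoryThread(
    system_prompt=("You are writing a sci-fi soft-skills comic. "
                   "Characters, tone, and main-events outline go here."),
    complete=lambda msgs: f"[chunk drawing on {len(msgs)} prior messages]",
)

chapter1 = thread.generate_chunk("Write Chapter 1: the crew meets.")
chapter2 = thread.generate_chunk("Write Chapter 2: the first conflict.")
```

The opposite failure mode — starting a fresh `StoryThread` per chunk — is what produced our tonal whiplash: each session sees only its own instruction, never the accumulated story.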
The visual generation presented its own challenges. “AI face” is real — especially prevalent in female characters, where we battled an uncanny sameness across different supposed individuals. Multiple characters in a single scene often looked like siblings, regardless of our prompts. Custom-trained character LoRAs on small datasets locked characters into repetitive facial expressions and clothing choices, fighting our detailed prompt specifications at every turn.
Then there were the notorious AI artifacts: the occasional “creepypasta” moment (unsettling distortions that shouldn’t exist), persistent hand anatomy issues, and lip-sync problems that multiplied with each additional character in a scene. Each required a producer’s touch — from quick retouching to complete regeneration, with the occasional moment of staring at a seven-fingered hand wondering where it all went wrong. If you still think AI is going to take all our jobs, I challenge you to generate a video of a man juggling.
Lessons From Building Our Tool Stack
We evaluated several platforms across different production needs:
Voice & Audio
- ElevenLabs delivered emotive character intonations that brought scenarios to life, with its voice changer tool letting us record a line with the level of feeling we wanted and apply it to any character voice.
- Soundraw.io generated a playlist of background music that enhanced the narrative atmosphere.
Image Generation & Editing
- Freepik’s AI suite produced the strongest results for image creation, editing, and retouching.
- Adobe’s Firefly yielded less consistent, lower-quality results in comparison, despite Adobe’s standing as the media industry behemoth.
- Custom character LoRAs added some consistency, though imperfectly, highlighting the importance of training with a broad data set that includes a range of facial expressions and clothing styles.
- ChatGPT’s DALL-E proved valuable for extending an icon set purchased from Adobe Stock — with careful prompting to maintain stylistic elements, it successfully generated additional icons matching the original set’s aesthetic.
Video Generation
- Kling.ai offered the most robust video generation capabilities, with extensive customization options and multiple models to explore.
- Freepik also offered video capabilities alongside its image tools, which allowed for a more streamlined production approach.
Conversation Simulation
- HeyGen’s dual avatar feature seemed promising for realistic dialogue but felt robotic and landed in uncanny valley territory — too distracting for conversational scenario-based training requiring any sense of realism.
Each tool excelled in its lane, but none were set-and-forget solutions. Success required understanding each platform’s strengths and limitations.
Real Talk on AI Production
Here’s what the headlines miss: AI doesn’t automate content creation, it transforms it. Recent discussions about “AI slop” flooding the internet aren’t unfounded. Without constant producer oversight, quality deteriorates fast. Every generated asset required evaluation, many needed revision, and some demanded outright rejection and regeneration.
But here’s the upside: with the right tools, a thoughtful workflow, and a producer’s critical eye, we created engaging scenario-based training that would have been financially prohibitive through traditional production methods. Our budget constraints would have relegated us to text-based scenarios or static slideshows. Instead, learners experienced cinematic storytelling that bridged soft skills gaps through immersive narrative.
What does this require? Time. Human oversight and decision-making. Critical thinking skills. This isn’t “input one prompt and BAM — Here’s your scenario-based training.” It’s iterative, requires creative problem-solving, and demands someone who understands both the learning objectives and the technology’s limitations.
For L&D professionals exploring simulated conversations and story-based learning: AI tools are ready. They’re powerful. But they’re collaborators, not replacements. Budget accordingly — not just financially, but in terms of production time and human expertise. These tools expand what’s possible on constrained budgets, they just don’t do it on autopilot.