Just four months after its original posting, so much has already changed in the world of AI that Tristia has added some current tools to keep things relevant.

When we set out to create a digital comic book-inspired, sci-fi soft skills learning experience, complete with gamification, for our partner’s three-year technical training program, the promise of generative AI was alluring: rapid content creation, lower production costs, and unlimited creative possibilities. The reality? More nuanced… but entirely achievable.

Update (February 2026): Since this project, tools like Freepik Spaces, Nano Banana 2, and Midjourney V7’s Omni Reference have addressed several challenges described here. I’ve added notes throughout on how these might change your approach.

The Workflow in Brief

(Video demo below: See how our AI-assisted production pipeline brought this sci-fi soft skills world to life.)


Our production pipeline followed these stages:

  1. Story & Prompt Generation → ChatGPT created the narrative, character descriptions, and scene prompts for images and video
  2. Visual Creation & Editing → Freepik’s AI suite generated and retouched images
  3. Video Generation → Freepik and Kling.ai produced video from starting frames
  4. Audio Integration → Layered in voiceover, lip sync, and generated background music
  5. Assembly & Publishing → Brought everything together in Storyline and published

Simple on paper… more complex in execution.

What We Learned the Hard Way

Perhaps our biggest workflow revelation: story consistency lives or dies in a single conversation thread. Initially, we attempted to generate story chunks across multiple ChatGPT sessions. The result? Tonal whiplash and endless continuity issues, like a previously unheard-of character suddenly replacing one of our main characters, plus even more confounding head-scratchers, all courtesy of AI’s inherent randomness. Starting from our critical story parameters, values, and main-events outline, and then keeping the entire narrative development within one extended conversation, dramatically improved consistency and character voice.
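
We ran this inside the ChatGPT app, but the same principle carries over if you ever script the story generation. Here’s a minimal sketch with the OpenAI Python SDK, assuming a hypothetical story_bible.txt that holds the parameters, values, and main-events outline; one growing messages list is the “single conversation.”

```python
# Minimal sketch: one persistent message list = one conversation thread.
# "story_bible.txt" and the chapter titles are placeholders, not our actual files.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The story bible (parameters, values, main events) goes in once, up front.
messages = [{"role": "system", "content": open("story_bible.txt").read()}]

for chapter in ["Chapter 1: Arrival", "Chapter 2: The standoff"]:
    messages.append({"role": "user", "content": f"Write {chapter}."})
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    text = reply.choices[0].message.content
    print(text[:200])  # the human review pass happens here in real life
    # Append the model's own output so later chapters see everything before them.
    messages.append({"role": "assistant", "content": text})
```

Splitting that loop across separate scripts, or separate chat sessions, is exactly what produced the tonal whiplash.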

Node-based tools like Freepik Spaces (November 2025) now let you build story-to-image pipelines where context carries through the workflow. Your character reference connects to your scene prompts, which connect to your video generation, all in one visual map. This doesn’t eliminate the single-conversation principle for narrative development, but it reduces the consistency drift from tool-hopping.

Visual generation presented its own challenges. “AI face” is real, and especially prevalent in female characters, where we battled an uncanny sameness across different supposed individuals. Multiple characters in a single scene often looked like siblings, regardless of our prompts. Character LoRAs custom-trained on small datasets locked characters into repetitive facial expressions and clothing choices, fighting our detailed prompt specifications at every turn.

Nano Banana 2 (the November 2025 successor to Google’s Gemini 2.5 Flash Image) specifically targets this problem. It maintains “visual memory” across editing sessions with conversational prompting like “same character, different outfit.” Early testing shows stronger consistency than custom LoRAs trained on small datasets. Midjourney’s Omni Reference (--oref, May 2025) also handles this better than the old --cref parameter.
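
To make “conversational prompting” concrete, here’s a minimal sketch using Google’s google-genai Python SDK, holding one chat session so each edit builds on the previous image. The model ID and the prompts are my assumptions; substitute whatever identifier Google currently publishes for the Nano Banana models.

```python
# Minimal sketch of conversational, character-consistent editing in one chat
# session. The model ID is an assumption -- check Google's current model list.
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()  # reads GEMINI_API_KEY from the environment
chat = client.chats.create(model="gemini-2.5-flash-image")

def save_images(response, filename):
    """Save any image bytes returned alongside the text parts."""
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save(filename)

first = chat.send_message(
    "A woman engineer in her 40s, digital comic book style, distinct "
    "asymmetrical features, standing in a starship corridor."
)
save_images(first, "engineer_base.png")

# The edit leans on the session's visual memory -- no LoRA, no re-describing.
second = chat.send_message("Same character, different outfit: dress uniform.")
save_images(second, "engineer_uniform.png")
```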

Then there were the notorious AI artifacts: the occasional “creepypasta” moment (unsettling distortions that shouldn’t exist), persistent hand anatomy issues, and lip-sync problems that multiplied with each additional character in a scene. Each required a producer’s touch — from quick retouching to complete regeneration, with the occasional moment of staring at a seven-fingered hand wondering where it all went wrong. If you still think AI is going to take all our jobs, I challenge you to generate a video of a man juggling.

Lessons From Building Our Tool Stack

We evaluated several platforms across different production needs:

Voice & Audio

  • ElevenLabs delivered emotive character intonations that brought scenarios to life, with its voice changer tool allowing us to record the level of feeling we wanted in a line and apply it to any character voice.
ElevenLabs v3 (June 2025) now supports audio tags for emotional control ([whispers], [sighs], [laughs]) directly in your text. Their Text to Dialogue API generates multi-speaker conversations with automatic transitions and interruptions, which could replace the voice changer workaround for scenario dialogue. Ultimately, though, it’s a matter of playing with the options and discovering which works best for your content, workflow, and timeframe; see the sketch after this list.
  • Soundraw.io generated a playlist of background music that enhanced the narrative atmosphere.
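
To show what those audio tags look like in practice, here’s a minimal sketch against ElevenLabs’ REST text-to-speech endpoint. The API key and voice ID are placeholders, and eleven_v3 as the model ID is my assumption; confirm against their current docs.

```python
# Minimal sketch: v3 audio tags ride along inside the text itself.
# API key and voice ID are placeholders.
import requests

API_KEY = "your-elevenlabs-api-key"
VOICE_ID = "your-voice-id"

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY},
    json={
        # Bracketed tags steer delivery line by line -- no voice changer pass.
        "text": "[sighs] Fine, we'll try it your way. [whispers] But log everything.",
        "model_id": "eleven_v3",  # assumption -- confirm the current v3 model ID
    },
)
resp.raise_for_status()

with open("line_read.mp3", "wb") as f:
    f.write(resp.content)  # the endpoint returns raw audio bytes
```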

Image Generation & Editing

  • Freepik’s AI suite produced the strongest results for image creation, editing and retouching.
  • Adobe’s Firefly yielded less consistent, lower-quality outcomes in comparison, despite Adobe’s standing as the media-industry behemoth.
  • Custom character LoRAs added some consistency, though imperfectly, highlighting the importance of training on a broad dataset that includes a range of facial expressions and clothing styles.
  • ChatGPT’s DALL-E proved valuable for extending an icon set purchased from Adobe Stock: with careful prompting to maintain stylistic elements, it successfully generated additional icons matching the original set’s aesthetic (see the sketch below).
Freepik Spaces now consolidates multiple models (Flux, Mystic, Ideogram, Runway) into one node-based canvas with repeatable workflows. Nano Banana 2 offers conversational editing with sub-10-second generation and native 2K resolution. Midjourney V7’s Omni Reference handles both character and object consistency through one system.
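
The icon-extension trick came down to pinning the style description and varying only the subject. Here’s a minimal sketch with the OpenAI Python SDK; the style string is illustrative, not our actual Adobe Stock set’s description.

```python
# Minimal sketch: lock the style text, swap the subject, review against the set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Transcribe your purchased set's look into an explicit, reusable style block.
STYLE = (
    "flat vector icon, uniform 2px rounded outline, single accent color, "
    "plain white background, no text, centered subject"
)

for subject in ["satellite dish", "airlock door", "mission patch"]:
    result = client.images.generate(
        model="dall-e-3",
        prompt=f"A {subject} icon. Style: {STYLE}. Match an existing icon set.",
        size="1024x1024",
    )
    print(subject, "->", result.data[0].url)  # download and compare by eye
```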

Video Generation

  • Kling.ai offered the most robust video generation capabilities, with extensive customization options and multiple models to explore.
  • Freepik also offered video capabilities alongside its image tools, which allowed for a more streamlined production approach.

Conversation Simulation

  • HeyGen’s dual avatar feature seemed promising for realistic dialogue but felt robotic and landed in uncanny valley territory — too distracting for conversational scenario-based training requiring any sense of realism.

Each tool excelled in its lane, but none were set-and-forget solutions. Success required understanding each platform’s strengths and limitations.

Real Talk on AI Production

Here’s what the headlines miss: AI doesn’t automate content creation; it transforms it. Recent discussions about “AI slop” flooding the internet aren’t unfounded. Without constant producer oversight, quality deteriorates fast. Every generated asset required evaluation, many needed revision, and some demanded outright rejection and regeneration.

But here’s the upside: with the right tools, a thoughtful workflow, and a producer’s critical eye, we created engaging scenario-based training that would have been financially prohibitive through traditional production methods. Our budget constraints would have relegated us to text-based scenarios or static slideshows. Instead, learners experienced cinematic storytelling that bridged soft skills gaps through immersive narrative.

What does this require? Time. Human oversight and decision-making. Critical thinking skills. This isn’t “input one prompt and BAM — Here’s your scenario-based training.” It’s iterative, requires creative problem-solving, and demands someone who understands both the learning objectives and the technology’s limitations.

The tools keep improving. Freepik’s collaborative canvas means distributed teams can work on the same pipeline. Nano Banana’s conversational editing lowers the prompt-engineering barrier. Midjourney’s Omni Reference reduces the character drift that used to require manual cleanup. None of this changes the fundamental equation: faster generation and better consistency mean you can iterate more quickly, not that you can skip the iteration or human oversight.

For L&D professionals exploring simulated conversations and story-based learning: AI tools are ready. They’re powerful. But they’re collaborators, not replacements. Budget accordingly, not just financially but also in production time and human expertise. These tools expand what’s possible on constrained budgets; they just don’t do it on autopilot.