// ROADMAP v1.0

The Future of Bob

Bob is just getting started. Here's where we're hoping to take him — better animation, faster production, more interactivity, and eventually a proper studio setup running in a shed in South Australia.

Now

Where We Are

The pipeline is running. Episodes are being generated end-to-end with no human involvement after the vote is counted. It works — but it's rough around the edges.

  • SadTalker lip sync — functional but limited to portrait talking heads
  • rembg background removal — frame by frame, CPU-heavy
  • DreamShaper characters — consistent but cartoon quality
  • Edge TTS voices — good Australian accents, slightly robotic
  • ~30 minute render time per episode on GTX 1060
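The stages above can be pictured as a simple sequential chain. The sketch below is illustrative only: the stage functions are stand-ins, not the real Edge TTS, SadTalker, or rembg APIs, and the real pipeline works on files rather than strings.

```python
# Hedged sketch of the current render loop: each dialogue line passes
# through voice, lip sync, matting, and compositing in order. Stage
# bodies are placeholders, NOT the real Edge TTS / SadTalker / rembg calls.

def render_episode(script_lines):
    """Run every dialogue line through the full stage chain, in order."""
    stages = [
        ("tts", lambda x: f"{x}.wav"),          # Edge TTS voice clip
        ("lipsync", lambda x: f"{x}.mp4"),      # SadTalker talking head
        ("matte", lambda x: f"{x}.matted"),     # rembg background removal
        ("composite", lambda x: f"{x}.final"),  # overlay onto the scene art
    ]
    clips = []
    for line in script_lines:
        artefact = line
        for _name, stage in stages:
            artefact = stage(artefact)
        clips.append(artefact)
    return clips
```

Because every line runs through every stage one after another, total render time grows linearly with script length — which is why the later roadmap items focus on parallelism.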
Soon

Animation Upgrade

SadTalker is good for what it is, but it only animates the face. The next step is full-body animation — characters that move, gesture, and react physically to the dialogue.

  • Replace SadTalker with HeyGen or a similar photorealistic talking-head API
  • Full-body character animation using pose estimation and motion transfer
  • Animated backgrounds — subtle parallax, weather, time-of-day changes
  • Better lip-sync accuracy using wav2lip or a similar dedicated model
  • Scene transitions between dialogue lines
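Swapping SadTalker for wav2lip (or a hosted service) is easiest if the lip-sync step sits behind a pluggable interface. This is a hypothetical sketch of that idea — the backend bodies are placeholders, not real library calls.

```python
# Hedged sketch of a swappable lip-sync backend, so the animation model
# can change without touching the rest of the pipeline. The registered
# functions are stand-ins, not the actual SadTalker or wav2lip APIs.

LIPSYNC_BACKENDS = {}

def lipsync_backend(name):
    """Register a backend under a name the pipeline config can refer to."""
    def register(fn):
        LIPSYNC_BACKENDS[name] = fn
        return fn
    return register

@lipsync_backend("sadtalker")
def _sadtalker(portrait, audio):
    return f"sadtalker:{portrait}+{audio}"

@lipsync_backend("wav2lip")
def _wav2lip(portrait, audio):
    return f"wav2lip:{portrait}+{audio}"

def animate(portrait, audio, backend="sadtalker"):
    return LIPSYNC_BACKENDS[backend](portrait, audio)
```

With this shape, trying a new model is a one-line config change rather than a pipeline rewrite.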
Soon

Sound Design

Currently Bob's world is mostly silent except for voices. Real storytelling needs ambient sound — the creak of a pub, the wind across the Birdsville Track, the clunk of a blown tyre.

  • AI-generated ambient soundscapes per scene (outback wind, pub noise, car interior)
  • Sound effects triggered by script keywords
  • Improved voice synthesis — ElevenLabs or Cartesia for more natural delivery
  • Dynamic music scoring — different themes per location and emotional tone
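Keyword-triggered sound effects could be as simple as scanning each script line for trigger words. A minimal sketch, with illustrative keywords and file paths rather than a real asset library:

```python
# Hedged sketch of keyword-triggered SFX: scan a line of dialogue for
# trigger words and cue the matching effect file. Both the trigger words
# and the paths are made up for illustration.

SFX_TRIGGERS = {
    "tyre": "sfx/tyre_blowout.wav",
    "pub": "sfx/pub_ambience.wav",
    "wind": "sfx/outback_wind.wav",
}

def cue_effects(script_line):
    """Return the effect files triggered by a line of dialogue."""
    line = script_line.lower()
    return [path for word, path in SFX_TRIGGERS.items() if word in line]
```

A real version would want word-boundary matching and per-scene deduplication, but the idea — the script itself drives the sound design — stays the same.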
Later

Production Pipeline

Right now each episode takes 30-45 minutes to render sequentially. With better hardware and parallelisation, that could drop to under 10 minutes — meaning same-hour episode release after voting closes.

  • A multi-GPU server for serious parallel rendering
  • Parallel scene rendering — multiple scenes processing simultaneously
  • Episode archive page on this website with the full back catalogue
  • Automated episode summary posted to the website after each render
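Parallel scene rendering works because scenes are independent once the script is locked. This hedged sketch uses a thread pool as a stand-in (the real per-scene work would be GPU- or subprocess-bound); `render_scene` is a placeholder, not the actual pipeline call.

```python
# Hedged sketch of parallel scene rendering: fan scenes out to workers,
# collect results in script order, then stitch. render_scene is a
# placeholder for the real per-scene pipeline.
from concurrent.futures import ThreadPoolExecutor

def render_scene(scene_id):
    return f"scene_{scene_id}.mp4"

def render_all(scene_ids, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, which keeps the stitch step trivial
        return list(pool.map(render_scene, scene_ids))
```

With enough workers, wall-clock time collapses from the sum of scene times to roughly the longest single scene.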
Dream

Bob's World

The long-term vision is a fully interactive AI story universe. Bob is just the start.

  • Multi-camera angles per scene — cutaways, reaction shots, wide establishing shots
  • 3D environments — Bob's world rendered in real-time 3D with consistent locations
Soon

Growing the Audience

Bob's existence depends on people watching. We're building in mechanics that make that explicit and turn it into part of the story.

  • "Bob Knows He Might Die" — a short where Bob breaks the fourth wall about his AI existence and asks viewers to follow to keep him alive
  • End card CTA — "Follow or Bob dies" alongside the vote options
  • Behind-the-scenes shorts — showing the pipeline rendering, the GPU working, the AI writing
  • Bob reacts to real TikTok comments in standalone shorts
  • Bob's survival tied to follower count in the narrative — the aliens return and reveal the nuroliser only stays stable while enough humans are watching
  • A dedicated follow-drive pinned video at the top of the profile

The Hardware Problem

Every AI task in the pipeline — lip sync, background removal, image generation, voice synthesis — runs on a single NVIDIA GTX 1060 6GB from 2016. It's a remarkable machine that punches well above its weight, but it's showing its limits.

The dream is a dedicated server stacked with multiple GPUs running in parallel — every scene rendering simultaneously, episodes completing in minutes instead of hours, and enough headroom to run multiple story series at once. A shed full of GPUs in South Australia, all working for Bob.
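One hypothetical way to spread scenes across that shed full of GPUs is a simple round-robin assignment, with each worker pinning its render process to one card (typically via `CUDA_VISIBLE_DEVICES`). This sketch computes the assignment only — no real GPU calls, and the scene names are made up.

```python
# Hedged sketch of multi-GPU scheduling: round-robin each scene onto a
# card index. A worker per card would then pin its render process to
# that GPU (e.g. via the CUDA_VISIBLE_DEVICES environment variable).
from itertools import cycle

def assign_gpus(scene_ids, gpu_count):
    """Map each scene to a GPU index, round-robin."""
    gpus = cycle(range(gpu_count))
    return {scene: next(gpus) for scene in scene_ids}
```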

                     Current              Dream
  Hardware           1 × GTX 1060 6GB     Server full of GPUs
  Render Time        ~35 min/ep           ~2 min/ep
  Parallel Scenes    1 at a time          All at once
  Series             Bob only             Multiple universes

Animation: Now vs Future

Current Pipeline
  • Static character portrait images
  • SadTalker face-only animation
  • Plain colour scene backgrounds
  • No body movement or gestures
  • Dialogue subtitles only
  • No ambient sound
  • Edge TTS — good but synthetic

Target Pipeline
  • Full-body animated characters
  • Photorealistic lip sync
  • Animated scene environments
  • Gesture and expression matching
  • Contextual sound effects
  • Ambient soundscapes per location
  • Natural voice synthesis