// TECHNICAL DOCUMENTATION v1.0

How Bob Works

A fully automated AI pipeline running on self-hosted hardware in South Australia. Here's everything under the hood.

The Pipeline

01 Audience Votes TikTok + YouTube + Facebook + API
Viewers comment their vote on TikTok, YouTube and Facebook. OpenClaw polls the YouTube and Facebook APIs every 5 minutes to tally votes; TikTok votes are tallied manually. First option to 100 votes wins; otherwise the leader at 3am AEST triggers the next episode automatically.
02 AI Writes the Episode Claude Sonnet via Anthropic API
OpenClaw calls the Claude API with the full series bible, the complete story so far from a running tracker, and the winning vote. Claude returns a JSON script with 10–15 lines of dialogue, character expressions, backgrounds and 3 new vote options.
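The returned script might look something like the following. Field names here are illustrative assumptions, not the real schema; the sanity checks mirror the constraints described above.

```python
import json

# Hypothetical shape of the JSON script Claude returns (field names assumed)
raw = """{
  "title": "Bob Takes the Wrong Turn",
  "scenes": [
    {"character": "bob", "expression": "confused",
     "background": "outback_road", "line": "This doesn't look like the servo."}
  ],
  "vote_options": ["BIRDSVILLE", "COOBER PEDY", "TURN BACK"]
}"""

def validate_script(text: str) -> dict:
    """Parse and sanity-check a script before rendering begins."""
    script = json.loads(text)
    assert script["scenes"], "script must contain dialogue"
    assert len(script["scenes"]) <= 15, "episodes run 10-15 lines"
    assert len(script["vote_options"]) == 3, "exactly 3 new vote options"
    return script
```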
03 Character Generation ComfyUI + DreamShaper 8 + ControlNet
Any new character referenced in the script is automatically generated using ComfyUI with the DreamShaper model and OpenPose ControlNet. Bob's real photo is used as the pose reference so all characters maintain consistent facial positioning. Images are saved for reuse in future episodes.
04 Voice Synthesis Microsoft Edge TTS
Each line of dialogue is converted to audio using Edge TTS Australian voices — en-AU-WilliamNeural for Bob and male characters, en-AU-NatashaNeural for female characters, and American voices for the aliens. Audio is resampled to 16kHz mono WAV for SadTalker compatibility.
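Per line of dialogue, that works out to two CLI calls, roughly as below. This is a sketch of the invocations, not the pipeline's actual wrapper code.

```python
def tts_cmd(text: str, voice: str, out_mp3: str) -> list[str]:
    """One edge-tts CLI call per line of dialogue."""
    return ["edge-tts", "--voice", voice, "--text", text,
            "--write-media", out_mp3]

def resample_cmd(in_mp3: str, out_wav: str) -> list[str]:
    """ffmpeg pass producing the 16kHz mono WAV SadTalker expects."""
    return ["ffmpeg", "-y", "-i", in_mp3, "-ar", "16000", "-ac", "1", out_wav]
```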
05 Lip Sync Animation SadTalker + GTX 1060 6GB
SadTalker animates each character image to match the audio using a 3D morphable face model. It extracts facial landmarks, generates expression coefficients from the audio mel spectrogram, renders a talking head video and composites it back onto the original face using seamless cloning.
06 Background Removal rembg + onnxruntime-gpu + CUDA
The SadTalker output has a plain background that needs removing before compositing. rembg runs the U2-Net neural network on each frame to generate an alpha mask, producing transparent PNG frames. This runs on the GTX 1060 via CUDA for acceleration.
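The frame-by-frame pass can be expressed as one rembg CLI call per extracted PNG. A sketch, assuming frames land in one directory and the real pipeline may call the Python API instead of the CLI:

```python
from pathlib import Path

def rembg_cmds(frames_dir: str, out_dir: str) -> list[list[str]]:
    """One rembg call per frame; `i` is single-image mode and -m u2net
    selects the U2-Net model (accelerated via onnxruntime-gpu when present)."""
    cmds = []
    for frame in sorted(Path(frames_dir).glob("*.png")):
        out = Path(out_dir) / frame.name
        cmds.append(["rembg", "i", "-m", "u2net", str(frame), str(out)])
    return cmds
```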
07 Scene Compositing FFmpeg
FFmpeg overlays the transparent character frames over the background image, scales to 1080×1920 (TikTok portrait format), adds subtitle text using drawtext, mixes in the audio and encodes to H.264/AAC MP4. The final episode is assembled by concatenating all scenes with a dynamic intro card and endcard.
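A simplified version of that FFmpeg invocation, built as a command list. The filtergraph is a sketch (real subtitle text would need drawtext escaping, and the actual graph is more elaborate):

```python
def composite_cmd(character_frames: str, background: str, audio: str,
                  subtitle: str, out_mp4: str) -> list[str]:
    """Overlay transparent character frames on the background, scale to
    1080x1920 portrait, burn in a subtitle, mix audio, encode H.264/AAC."""
    vf = (
        "[1:v]scale=1080:1920[bg];"
        "[bg][0:v]overlay=(W-w)/2:H-h[comp];"
        f"[comp]drawtext=text='{subtitle}':fontcolor=white:fontsize=48:"
        "x=(w-text_w)/2:y=h-200[out]"
    )
    return ["ffmpeg", "-y",
            "-framerate", "25", "-i", character_frames,  # e.g. frame_%04d.png
            "-loop", "1", "-i", background,
            "-i", audio,
            "-filter_complex", vf,
            "-map", "[out]", "-map", "2:a", "-shortest",
            "-c:v", "libx264", "-pix_fmt", "yuv420p", "-c:a", "aac",
            out_mp4]
```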

Hardware

Render Node: Super (Ubuntu 24.04, custom build)
GPU: NVIDIA GTX 1060 (6GB VRAM, Pascal architecture)
RAM: 8GB system RAM (4GB swap configured)
Orchestration: OpenClaw (Docker on Unraid NAS)
Web Server: Allium (Plesk, 1500+ days uptime)
Internet: Starlink (CGNAT, self-hosted via NPM)

Render Times Per Scene

Voice Synthesis (Edge TTS): ~5 sec
Lip Sync (SadTalker): ~45 sec
Background Removal (rembg + CUDA): ~60 sec
Scene Compositing (FFmpeg): ~10 sec
Total per episode (~13 scenes): ~25 min
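The per-scene figures reproduce the episode total almost exactly: 13 scenes at roughly 120 seconds each comes to 26 minutes, matching the quoted ~25 min once rounding is allowed for.

```python
per_scene = {"tts": 5, "sadtalker": 45, "rembg": 60, "ffmpeg": 10}  # seconds
scenes = 13

# 13 scenes x 120 s/scene = 1560 s = 26 minutes per episode
total_min = scenes * sum(per_scene.values()) / 60
print(total_min)  # 26.0
```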

The Full Stack

[AI]
Claude Sonnet
Story Writer
Anthropic's Claude writes every episode from scratch using the series bible and story tracker. It invents characters, dialogue, plot twists and vote options.
[IMG]
ComfyUI + DreamShaper 8
Character & Background Generator
Stable Diffusion workflow for generating new character portraits and outback backgrounds on demand. ControlNet with OpenPose keeps face positions consistent.
[LIP]
SadTalker
Lip Sync Engine
3D face model animation that drives character portraits to speak in sync with generated audio. Runs entirely on local GPU hardware.
[TTS]
Microsoft Edge TTS
Voice Synthesis
High-quality Australian neural voices. en-AU-WilliamNeural for Bob. Runs via the edge-tts Python CLI — free, no API key required.
[BG]
rembg
Background Removal
U2-Net neural network running via onnxruntime-gpu to remove SadTalker backgrounds frame by frame, enabling character compositing over scene backgrounds.
[VID]
FFmpeg
Video Assembly
Handles all compositing, scaling, subtitle rendering, audio mixing, scene concatenation and final encoding to H.264/AAC for TikTok.
[AGT]
OpenClaw
AI Agent Orchestrator
Self-hosted AI agent platform running in Docker on an Unraid NAS. Orchestrates the entire pipeline via SSH — from vote polling to episode rendering to notifications.
[FB]
Facebook Graph API
Video Upload & Vote Polling
Handles automatic Reels upload to the Bob in Australia Facebook Page and polls public comments every 5 minutes to tally votes alongside YouTube. TikTok votes are counted manually.
[YT]
YouTube Data API v3
Video Upload & Vote Polling
Handles automatic episode upload as unlisted videos with comments enabled, and polls YouTube comments every 5 minutes to tally votes. First option to 100 votes triggers the next episode.
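Once comments are fetched from either platform, tallying reduces to matching option names against comment text. A minimal sketch, assuming case-insensitive matching and one vote per comment:

```python
import re
from collections import Counter

def tally_votes(comments: list[str], options: list[str]) -> Counter:
    """Count one vote per comment for whichever option it names.
    First option found in the text (case-insensitive) gets the vote."""
    tally = Counter({opt: 0 for opt in options})
    for text in comments:
        for opt in options:
            if re.search(re.escape(opt), text, re.IGNORECASE):
                tally[opt] += 1
                break
    return tally
```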
[PY]
Python 3
Pipeline Glue
Every step of the pipeline is orchestrated in Python — from calling the Claude API to triggering ComfyUI workflows, managing the story tracker JSON, syncing files via SCP, and writing trigger files between machines.
[NAS]
Unraid NAS
Orchestration Host
Self-hosted NAS running Docker containers including OpenClaw. Hosts the persistent pipeline volume, story tracker, cast images and episode trigger files. Boots containers automatically on startup.
[DB]
PHP + SQLite
Website Backend
The bobinaustralia.com.au website runs on PHP with SQLite for community suggestions voting, contact form submissions, and future feature voting. No external database required.
[WEB]
Nginx + Cloudflare
Web Server & CDN
Nginx via Plesk serves the website from a Linux server in Australia. Cloudflare sits in front for DDoS protection, SSL termination, email obfuscation and global CDN caching.
[KEY]
Google OAuth 2.0
YouTube Authentication
OAuth token-based authentication for the YouTube Data API. Refresh tokens stored locally on the render server — no manual re-authentication required between episodes.

Geek FAQ

Does Bob actually know what's happening to him?
No. Claude writes each episode fresh from the series bible and a story tracker JSON file. It has no persistent memory — just the text summary of what's happened so far. Every decision is made in a single API call.
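That single API call can be sketched as one request payload: everything Claude "knows" travels inside it, with no memory between calls. Model name and field layout here are placeholders, not the pipeline's real values.

```python
def build_episode_request(series_bible: str, story_so_far: str,
                          winner: str) -> dict:
    """One self-contained Anthropic Messages API payload per episode.
    The model string is a placeholder for whichever Sonnet version is live."""
    return {
        "model": "claude-sonnet",
        "max_tokens": 4096,
        "system": series_bible,               # full series bible every call
        "messages": [{
            "role": "user",
            "content": (
                f"Story so far:\n{story_so_far}\n\n"
                f"The audience voted: {winner}\n"
                "Write the next episode as JSON."
            ),
        }],
    }
```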
How are new characters generated?
When Claude writes a new character into an episode, it includes a description field. The pipeline detects the missing image, starts ComfyUI, submits a txt2img workflow with ControlNet using Bob's real photo as a pose reference, and saves the output PNG for future episodes.
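The missing-image check is straightforward if portraits are stored one PNG per character. A sketch under that assumption (the real naming scheme may differ):

```python
from pathlib import Path

def missing_characters(script: dict, cast_dir: str) -> list[str]:
    """Characters referenced in the script with no saved portrait yet.
    Assumes one <character>.png per cast member in `cast_dir`."""
    cast = Path(cast_dir)
    names = {scene["character"] for scene in script["scenes"]}
    return sorted(n for n in names if not (cast / f"{n}.png").exists())
```

Each name returned triggers a ComfyUI txt2img job; the output PNG lands in the cast directory so the character never needs regenerating.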
What stops people from voting multiple times?
Nothing; they're public comments. Each comment counts as one vote regardless of who posted it. If someone really wants BIRDSVILLE to win, they can comment 100 times. That's kind of the point.
How long does a full episode take to render?
On the GTX 1060, approximately 25–40 minutes for a 13-scene episode. SadTalker is the slowest step at ~45 seconds per scene, followed by rembg background removal at ~60 seconds per scene running on CUDA.
Is this running in the cloud?
No. Everything runs on physical hardware in South Australia — an Unraid NAS for orchestration and a Linux box with a GTX 1060 for rendering. The only cloud calls are to the Anthropic API for writing and Microsoft Edge TTS for voices.
What happens if the render fails mid-episode?
The pipeline has a resume feature — it checks for existing scene_N.mp4 files and skips already-completed scenes. A failed run can be restarted and will pick up from where it left off.
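The resume check amounts to scanning the episode directory for completed scene files. A minimal sketch, assuming the scene_N.mp4 naming from the answer above:

```python
import re
from pathlib import Path

def scenes_to_render(episode_dir: str, total_scenes: int) -> list[int]:
    """Resume support: skip any scene_N.mp4 that already rendered."""
    done = {
        int(m.group(1))
        for p in Path(episode_dir).glob("scene_*.mp4")
        if (m := re.fullmatch(r"scene_(\d+)\.mp4", p.name))
    }
    return [n for n in range(1, total_scenes + 1) if n not in done]
```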
Will Bob ever find out about the $5 million?
Yes — when he finds an ATM and checks his balance. Tap and go doesn't show the balance so it has to be a proper ATM. When that happens is entirely up to the audience votes.