// TECHNICAL DOCUMENTATION v1.0

How Bob Works

A fully automated AI pipeline running on self-hosted hardware in South Australia. Here's everything under the hood.

The Pipeline

01 Audience Votes TikTok + YouTube + Facebook + API
Viewers comment their vote on TikTok, YouTube and Facebook. OpenClaw polls the YouTube and Facebook APIs every 5 minutes to tally votes; TikTok votes are tallied manually. First option to 100 votes wins; otherwise the leader at 3am AEST triggers the next episode automatically.
02 AI Writes the Episode Claude Sonnet via Anthropic API
OpenClaw calls the Claude API with the full series bible, the complete story so far from a running tracker, and the winning vote. Claude returns a JSON script with 10–15 lines of dialogue, character expressions, backgrounds and 3 new vote options.
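The returned script might look something like the following. Field names here are illustrative assumptions, not the real schema; the sanity checks mirror the constraints described above.

```python
import json

# Hypothetical shape of the JSON script Claude returns (field names assumed)
raw = """{
  "title": "Bob Takes the Wrong Turn",
  "scenes": [
    {"character": "bob", "expression": "confused",
     "background": "outback_road", "line": "This doesn't look like the servo."}
  ],
  "vote_options": ["BIRDSVILLE", "COOBER PEDY", "TURN BACK"]
}"""

def validate_script(text: str) -> dict:
    """Parse and sanity-check a script before rendering begins."""
    script = json.loads(text)
    assert script["scenes"], "script must contain dialogue"
    assert len(script["scenes"]) <= 15, "episodes run 10-15 lines"
    assert len(script["vote_options"]) == 3, "exactly 3 new vote options"
    return script
```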
03 Character Generation ComfyUI + DreamShaper 8 + ControlNet
Any new character referenced in the script is automatically generated using ComfyUI with the DreamShaper model and OpenPose ControlNet. Bob's real photo is used as the pose reference so all characters maintain consistent facial positioning. Images are saved for reuse in future episodes.
04 Voice Synthesis Microsoft Edge TTS
Each line of dialogue is converted to audio using Edge TTS Australian voices — en-AU-WilliamNeural for Bob and male characters, en-AU-NatashaNeural for female characters, and American voices for the aliens. Audio is resampled to 16kHz mono WAV for SadTalker compatibility.
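Per line of dialogue, that works out to two CLI calls, roughly as below. This is a sketch of the invocations, not the pipeline's actual wrapper code.

```python
def tts_cmd(text: str, voice: str, out_mp3: str) -> list[str]:
    """One edge-tts CLI call per line of dialogue."""
    return ["edge-tts", "--voice", voice, "--text", text,
            "--write-media", out_mp3]

def resample_cmd(in_mp3: str, out_wav: str) -> list[str]:
    """ffmpeg pass producing the 16kHz mono WAV SadTalker expects."""
    return ["ffmpeg", "-y", "-i", in_mp3, "-ar", "16000", "-ac", "1", out_wav]
```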
05 Lip Sync Animation SadTalker + GTX 1060 6GB
SadTalker animates each character image to match the audio using a 3D morphable face model. It extracts facial landmarks, generates expression coefficients from the audio mel spectrogram, renders a talking head video and composites it back onto the original face using seamless cloning.
06 Background Removal rembg + onnxruntime-gpu + CUDA
The SadTalker output has a plain background that needs removing before compositing. rembg runs the U2-Net neural network on each frame to generate an alpha mask, producing transparent PNG frames. This runs on the GTX 1060 via CUDA for acceleration.
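The frame-by-frame pass can be expressed as one rembg CLI call per extracted PNG. A sketch, assuming frames land in one directory and the real pipeline may call the Python API instead of the CLI:

```python
from pathlib import Path

def rembg_cmds(frames_dir: str, out_dir: str) -> list[list[str]]:
    """One rembg call per frame; `i` is single-image mode and -m u2net
    selects the U2-Net model (accelerated via onnxruntime-gpu when present)."""
    cmds = []
    for frame in sorted(Path(frames_dir).glob("*.png")):
        out = Path(out_dir) / frame.name
        cmds.append(["rembg", "i", "-m", "u2net", str(frame), str(out)])
    return cmds
```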
07 Scene Compositing FFmpeg
FFmpeg overlays the transparent character frames over the background image, scales to 1080×1920 (TikTok portrait format), adds subtitle text using drawtext, mixes in the audio and encodes to H.264/AAC MP4. The final episode is assembled by concatenating all scenes with a dynamic intro card and endcard.
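A simplified version of that FFmpeg invocation, built as a command list. The filtergraph is a sketch (real subtitle text would need drawtext escaping, and the actual graph is more elaborate):

```python
def composite_cmd(character_frames: str, background: str, audio: str,
                  subtitle: str, out_mp4: str) -> list[str]:
    """Overlay transparent character frames on the background, scale to
    1080x1920 portrait, burn in a subtitle, mix audio, encode H.264/AAC."""
    vf = (
        "[1:v]scale=1080:1920[bg];"
        "[bg][0:v]overlay=(W-w)/2:H-h[comp];"
        f"[comp]drawtext=text='{subtitle}':fontcolor=white:fontsize=48:"
        "x=(w-text_w)/2:y=h-200[out]"
    )
    return ["ffmpeg", "-y",
            "-framerate", "25", "-i", character_frames,  # e.g. frame_%04d.png
            "-loop", "1", "-i", background,
            "-i", audio,
            "-filter_complex", vf,
            "-map", "[out]", "-map", "2:a", "-shortest",
            "-c:v", "libx264", "-pix_fmt", "yuv420p", "-c:a", "aac",
            out_mp4]
```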

Hardware

Render Node: Super (Ubuntu 24.04, custom build)
GPU: NVIDIA GTX 1060 (6GB VRAM, Pascal architecture)
RAM: 8GB system RAM (4GB swap configured)
Orchestration: OpenClaw (Docker on Unraid NAS)
Web Server: Allium (Plesk, 1500+ days uptime)
Internet: Starlink (CGNAT, self-hosted via NPM)

Render Times Per Scene

Voice Synthesis (Edge TTS): ~5 sec
Lip Sync (SadTalker): ~45 sec
Background Removal (rembg + CUDA): ~60 sec
Scene Compositing (FFmpeg): ~10 sec
Total per episode (~13 scenes): ~25 min
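The per-scene figures reproduce the episode total almost exactly: 13 scenes at roughly 120 seconds each comes to 26 minutes, matching the quoted ~25 min once rounding is allowed for.

```python
per_scene = {"tts": 5, "sadtalker": 45, "rembg": 60, "ffmpeg": 10}  # seconds
scenes = 13

# 13 scenes x 120 s/scene = 1560 s = 26 minutes per episode
total_min = scenes * sum(per_scene.values()) / 60
print(total_min)  # 26.0
```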

The Full Stack

[AI]
Claude Sonnet
Story Writer
Anthropic's Claude writes every episode from scratch using the series bible and story tracker. It invents characters, dialogue, plot twists and vote options.
[IMG]
ComfyUI + DreamShaper 8
Character & Background Generator
Stable Diffusion workflow for generating new character portraits and outback backgrounds on demand. ControlNet with OpenPose keeps face positions consistent.
[LIP]
SadTalker
Lip Sync Engine
3D face model animation that drives character portraits to speak in sync with generated audio. Runs entirely on local GPU hardware.
[TTS]
Microsoft Edge TTS
Voice Synthesis
High-quality Australian neural voices. en-AU-WilliamNeural for Bob. Runs via the edge-tts Python CLI — free, no API key required.
[BG]
rembg
Background Removal
U2-Net neural network running via onnxruntime-gpu to remove SadTalker backgrounds frame by frame, enabling character compositing over scene backgrounds.
[VID]
FFmpeg
Video Assembly
Handles all compositing, scaling, subtitle rendering, audio mixing, scene concatenation and final encoding to H.264/AAC for TikTok.
[AGT]
OpenClaw
AI Agent Orchestrator
Self-hosted AI agent platform running in Docker on an Unraid NAS. Orchestrates the entire pipeline via SSH — from vote polling to episode rendering to notifications.
[FB]
Facebook Graph API
Video Upload & Vote Polling
Handles automatic Reels upload to the Bob in Australia Facebook Page and polls public comments every 5 minutes to tally votes alongside YouTube. TikTok votes are counted manually.
[YT]
YouTube Data API v3
Video Upload & Vote Polling
Handles automatic episode upload as unlisted videos with comments enabled, and polls YouTube comments every 5 minutes to tally votes. First option to 100 votes triggers the next episode.
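Once comments are fetched from either platform, tallying reduces to matching option names against comment text. A minimal sketch, assuming case-insensitive matching and one vote per comment:

```python
import re
from collections import Counter

def tally_votes(comments: list[str], options: list[str]) -> Counter:
    """Count one vote per comment for whichever option it names.
    First option found in the text (case-insensitive) gets the vote."""
    tally = Counter({opt: 0 for opt in options})
    for text in comments:
        for opt in options:
            if re.search(re.escape(opt), text, re.IGNORECASE):
                tally[opt] += 1
                break
    return tally
```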
[PY]
Python 3
Pipeline Glue
Every step of the pipeline is orchestrated in Python — from calling the Claude API to triggering ComfyUI workflows, managing the story tracker JSON, syncing files via SCP, and writing trigger files between machines.
[NAS]
Unraid NAS
Orchestration Host
Self-hosted NAS running Docker containers including OpenClaw. Hosts the persistent pipeline volume, story tracker, cast images and episode trigger files. Boots containers automatically on startup.
[DB]
PHP + SQLite
Website Backend
The bobinaustralia.com.au website runs on PHP with SQLite for community suggestions voting, contact form submissions, and future feature voting. No external database required.
[WEB]
Nginx + Cloudflare
Web Server & CDN
Nginx via Plesk serves the website from a Linux server in Australia. Cloudflare sits in front for DDoS protection, SSL termination, email obfuscation and global CDN caching.
[KEY]
Google OAuth 2.0
YouTube Authentication
OAuth token-based authentication for the YouTube Data API. Refresh tokens stored locally on the render server — no manual re-authentication required between episodes.

Geek FAQ

Does Bob actually know what's happening to him?
No. Claude writes each episode fresh from the series bible and a story tracker JSON file. It has no persistent memory — just the text summary of what's happened so far. Every decision is made in a single API call.
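That single API call can be sketched as one request payload: everything Claude "knows" travels inside it, with no memory between calls. Model name and field layout here are placeholders, not the pipeline's real values.

```python
def build_episode_request(series_bible: str, story_so_far: str,
                          winner: str) -> dict:
    """One self-contained Anthropic Messages API payload per episode.
    The model string is a placeholder for whichever Sonnet version is live."""
    return {
        "model": "claude-sonnet",
        "max_tokens": 4096,
        "system": series_bible,               # full series bible every call
        "messages": [{
            "role": "user",
            "content": (
                f"Story so far:\n{story_so_far}\n\n"
                f"The audience voted: {winner}\n"
                "Write the next episode as JSON."
            ),
        }],
    }
```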
How are new characters generated?
When Claude writes a new character into an episode, it includes a description field. The pipeline detects the missing image, starts ComfyUI, submits a txt2img workflow with ControlNet using Bob's real photo as a pose reference, and saves the output PNG for future episodes.
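The missing-image check is straightforward if portraits are stored one PNG per character. A sketch under that assumption (the real naming scheme may differ):

```python
from pathlib import Path

def missing_characters(script: dict, cast_dir: str) -> list[str]:
    """Characters referenced in the script with no saved portrait yet.
    Assumes one <character>.png per cast member in `cast_dir`."""
    cast = Path(cast_dir)
    names = {scene["character"] for scene in script["scenes"]}
    return sorted(n for n in names if not (cast / f"{n}.png").exists())
```

Each name returned triggers a ComfyUI txt2img job; the output PNG lands in the cast directory so the character never needs regenerating.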
What stops people from voting multiple times?
Nothing; they're public comments. Each comment counts as one vote regardless of who posted it. If someone really wants BIRDSVILLE to win, they can comment 100 times. That's kind of the point.
How long does a full episode take to render?
On the GTX 1060, approximately 25–40 minutes for a 13-scene episode. SadTalker is the slowest step at ~45 seconds per scene, followed by rembg background removal at ~60 seconds per scene running on CUDA.
Is this running in the cloud?
No. Everything runs on physical hardware in South Australia — an Unraid NAS for orchestration and a Linux box with a GTX 1060 for rendering. The only cloud calls are to the Anthropic API for writing and Microsoft Edge TTS for voices.
What happens if the render fails mid-episode?
The pipeline has a resume feature — it checks for existing scene_N.mp4 files and skips already-completed scenes. A failed run can be restarted and will pick up from where it left off.
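The resume check amounts to scanning the episode directory for completed scene files. A minimal sketch, assuming the scene_N.mp4 naming from the answer above:

```python
import re
from pathlib import Path

def scenes_to_render(episode_dir: str, total_scenes: int) -> list[int]:
    """Resume support: skip any scene_N.mp4 that already rendered."""
    done = {
        int(m.group(1))
        for p in Path(episode_dir).glob("scene_*.mp4")
        if (m := re.fullmatch(r"scene_(\d+)\.mp4", p.name))
    }
    return [n for n in range(1, total_scenes + 1) if n not in done]
```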
Will Bob ever find out about the $5 million?
Yes — when he finds an ATM and checks his balance. Tap and go doesn't show the balance so it has to be a proper ATM. When that happens is entirely up to the audience votes.