Controlling My Dev Server Through WhatsApp (Because Why Not)

I have a problem: I run autonomous AI workers on a remote server, but checking on them requires SSH-ing in from my phone, which is… not fun.

Solution? Build a WhatsApp bridge so I can text “status” and get a reply. Like texting your employees, except they’re not real and they don’t judge my 2am check-ins.

The Vision

The goal is simple: send WhatsApp messages to orchestrate the dev server.

Me: "status"
Bridge: "Server up, 2 workers active, costs looking fine"

Me: "progress trota"
Bridge: "Phase 6 analytics in progress, HR zones implemented"

Me: "/ask what's left on the roadmap?"
Bridge: *forwards to LLM, returns answer*

Eventually, voice notes too. Because typing “check on all projects” is apparently too much effort.

Baileys vs. Cloud API

WhatsApp has an official Cloud API for businesses. I’m not using it. Here’s why:

The Cloud API Route

Requires business account registration
Needs a second phone number (or porting your main one)
Costs money per message (not much, but still)
Subject to Meta’s delightful terms of service
As of January 2026, Meta’s terms bar general-purpose AI chatbots from the Cloud API (yep)

The Baileys Route

Open-source TypeScript library that talks to WhatsApp over a WebSocket, the same way WhatsApp Web does
No business registration, no extra phone number
Link via QR code like you would with WhatsApp Web
Completely free
Self-hosted on my VM
Lighter than whatsapp-web.js (no Chromium dependency, ~50MB RAM)

The tradeoff? Ban risk.

The Ban Risk Thing

Let’s be clear: using Baileys violates WhatsApp’s terms of service. They don’t want you automating their platform unless you’re paying for Cloud API.

Mitigation Strategy

I’m not exactly running a spam operation here, so the actual risk is manageable:

Use a burner SIM — Not my main number. If it gets banned, no big deal.
Single user, low volume — It’s just me sending <50 messages/day.
Human-like delays — Random 1-3 second delays before replying.
Exponential backoff — Don’t hammer their servers on reconnect.
Fallback plan — If it gets banned, switch to Telegram Bot API (official, free, zero risk).
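
The two timing tricks in that list are easy to get subtly wrong, so here is a minimal sketch of how they could look. Every name and constant here (humanDelayMs, backoffMs, the 60-second cap) is hypothetical and not taken from the actual bridge; the send callback stands in for whatever really delivers the message.

```typescript
// Hypothetical helpers: names and constants are illustrative only.

/** Random delay in ms in the range [minMs, maxMs). `rand` is injectable for testing. */
function humanDelayMs(
  minMs = 1000,
  maxMs = 3000,
  rand: () => number = Math.random,
): number {
  return minMs + rand() * (maxMs - minMs);
}

/** Exponential backoff: baseMs * 2^attempt, capped at maxMs. */
function backoffMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

/** Promise-based sleep. */
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

/** Wait a human-like interval, then call the (assumed) send function. */
async function replyWithDelay(send: () => Promise<void>): Promise<void> {
  await sleep(humanDelayMs());
  await send();
}
```

On reconnect, the idea would be to wait backoffMs(0), then backoffMs(1), and so on, resetting the attempt counter once the connection stays up.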

The gamble: it’s a personal use case with minimal message volume. WhatsApp probably doesn’t care enough to ban me when they’re busy dealing with actual spammers sending thousands of messages per day.

Famous last words? We’ll see.

The Architecture

Phone (WhatsApp)
    ↓
WhatsApp Servers (WebSocket)
    ↓
Baileys Library (on dev-server)
    ↓
Command Parser
    ↓
Execute: tmux ls / ps aux / read PROGRESS.md / invoke LLM
    ↓
Reply via Baileys
    ↓
Phone

The bridge runs as a PM2 service. It maintains a persistent WebSocket connection to WhatsApp’s servers, same as WhatsApp Web does in your browser.
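
The Command Parser box is the only part of that diagram with real logic in it. A rough sketch of what it might look like, assuming the commands from this post; the Command type and parseCommand are names made up for illustration, not the bridge's actual code.

```typescript
// Hypothetical shapes: the real parser is not shown in this post.
type Command =
  | { kind: "status" }
  | { kind: "workers" }
  | { kind: "costs" }
  | { kind: "progress"; project: string }
  | { kind: "ask"; question: string }
  | { kind: "unknown"; text: string };

/** Turn a raw WhatsApp message into a typed command. */
function parseCommand(raw: string): Command {
  const text = raw.trim();

  // "/ask <question>" keeps the question's original casing.
  if (text.toLowerCase().startsWith("/ask ")) {
    return { kind: "ask", question: text.slice("/ask ".length).trim() };
  }

  const words = text.toLowerCase().split(/\s+/);
  const verb = words[0] ?? "";
  const arg = words[1];

  // Bare one-word commands.
  if (words.length === 1) {
    if (verb === "status") return { kind: "status" };
    if (verb === "workers") return { kind: "workers" };
    if (verb === "costs") return { kind: "costs" };
  }

  // "progress <project>" takes exactly one argument.
  if (words.length === 2 && verb === "progress" && arg !== undefined) {
    return { kind: "progress", project: arg };
  }

  // Anything else is reported back rather than executed.
  return { kind: "unknown", text };
}
```

Anything that doesn't match falls through to unknown, so arbitrary text never reaches a shell.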

Implementation (Phase 1 MVP)

Phase 1 is the “prove it works” stage:

Completed:

✅ Baileys connection with QR code pairing
✅ Persistent auth state (survives restarts)
✅ Whitelist-only message handling (security first)
✅ Command parser with simple routing
✅ Commands: status, workers, costs, progress <project>
✅ PM2 process management
✅ Health check endpoint
✅ Human-like response delays
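
The whitelist check is the piece that matters most for security, so here is a sketch of how it could work. It assumes sender IDs arrive as strings like 15551234567@s.whatsapp.net, optionally with a :device suffix before the @, and that group chats end in @g.us. The function names and the phone number are placeholders, not the bridge's real code.

```typescript
// Hypothetical whitelist check: the JID format is an assumption (see lead-in).

/** Extract the phone number from a personal-chat JID, or null for anything else. */
function senderNumber(jid: string): string | null {
  const match = /^(\d+)(?::\d+)?@s\.whatsapp\.net$/.exec(jid);
  return match?.[1] ?? null;
}

/** True only when the sender's number is explicitly whitelisted. */
function isAllowedSender(jid: string, allowed: ReadonlySet<string>): boolean {
  const num = senderNumber(jid);
  return num !== null && allowed.has(num);
}

// Example: a single whitelisted number (placeholder value).
const exampleWhitelist: ReadonlySet<string> = new Set(["15551234567"]);
```

Messages from any other sender, including groups, would simply be dropped without a reply.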

Example Commands:

# Check server health
"status" → CPU, memory, disk, uptime

# List worker sessions
"workers" → Active tmux sessions with "worker" in the name

# Quick cost summary
"costs" → GCP instance cost estimates

# Project progress
"progress trota" → Reads ~/projects/trota/worker/PROGRESS.md
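
Two of these boil down to small pure functions. The sketch below assumes the default tmux ls output format (session name, a colon, then details) and the ~/projects/<project>/worker/PROGRESS.md layout shown above. The function names and the project-name character rule are my own assumptions for illustration.

```typescript
// Hypothetical helpers for the "workers" and "progress" commands.

/** Pick session names containing "worker" out of raw `tmux ls` output. */
function workerSessions(tmuxLsOutput: string): string[] {
  return tmuxLsOutput
    .split("\n")
    .map((line) => (line.split(":")[0] ?? "").trim())
    .filter((sessionName) => sessionName.includes("worker"));
}

/**
 * Build the PROGRESS.md path for a project, or null if the name looks unsafe.
 * Restricting names to letters, digits, "_" and "-" blocks "../" tricks.
 */
function progressPath(project: string, home: string): string | null {
  if (!/^[a-z0-9_-]+$/i.test(project)) return null;
  return `${home}/projects/${project}/worker/PROGRESS.md`;
}
```

The name check is there because the project name comes straight from a chat message and ends up in a file path.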

Simple. Effective. No frills.

What’s Next (Phase 2)

Voice note transcription (Whisper)
/ask commands that forward to the LLM
Rich message formatting (bold, code blocks)
Telegram fallback bot (if WhatsApp bans me)

What’s After That (Phase 3)

Worker dispatch from phone: “start worker on trota phase 7”
Scheduled status pings (morning reports)
Screenshot/progress photo attachments

Why This Matters

The orchestrator architecture is all about removing friction. I can already SSH in and run commands. But doing that from a phone:

Is slow
Requires switching apps
Breaks flow if I’m doing something else

WhatsApp is already on my phone. It’s always open. I can check on workers while waiting in line at the grocery store. Or at 2am when I randomly wonder if a worker finished.

It’s the difference between “I should check on that later” and actually checking on it.

The Bigger Picture

This isn’t just about WhatsApp. It’s about building a command-and-control interface for the orchestrator.

Right now I have:

SSH (full access, heavy)
WhatsApp bridge (quick status checks, lightweight)

Future interfaces:

Voice (dictate tasks, get updates)
Web dashboard (visual monitoring)
Email (daily/weekly reports)

Each interface serves a different context. SSH for deep work. WhatsApp for quick checks. Voice for brainstorming while walking. Email for passive awareness.

The orchestrator doesn’t care how you talk to it. It just works.

Final Thoughts

Will my WhatsApp account get banned? Maybe. Probably not, but maybe.

Is this a questionable use of technology? Absolutely.

Is it cool? Yes.

Will I switch to Telegram if needed? Also yes.

Now if you’ll excuse me, I need to test this thing. Time to send “status” and see what happens.

Update: It works. My phone buzzed with a server report. I feel like a supervillain.