Features Guide¶

A deep dive into OtterBot's capabilities -- from voice interaction and 3D visualization to web search, browser automation, and project management.

Voice (TTS & STT)¶

OtterBot supports both text-to-speech (reading responses aloud) and speech-to-text (voice input). Both can run locally without any API keys.

Text-to-Speech Providers¶

Kokoro¶

Local

High-quality local TTS using the Kokoro 82M ONNX model. Runs entirely on CPU inside the container. Default voice: af_heart.

Edge TTS¶

Free Cloud

Microsoft's neural TTS voices. Free, no API key needed. Wide variety of voices and languages available.

OpenAI-Compatible¶

API Key

Any provider with an OpenAI-compatible TTS endpoint (/v1/audio/speech). Configure the base URL and optional API key.

Deepgram¶

API Key

Deepgram's Aura neural TTS voices. Requires a Deepgram API key. Voices include asteria, luna, stella, athena, hera, orion, arcas, perseus, angus, orpheus, helios, and zeus.

TTS Configuration¶

Setting Key	Description	Default
`tts:enabled`	Enable or disable TTS	`false`
`tts:active_provider`	Active provider: `kokoro`, `edge-tts`, `openai-compatible`, or `deepgram`	`kokoro`
`tts:voice`	Voice name	`af_heart`
`tts:speed`	Playback speed (0.5 -- 2.0)	`1`
`tts:openai-compatible:base_url`	Base URL for OpenAI-compatible provider	--
`tts:openai-compatible:api_key`	API key (optional)	--
`tts:deepgram:api_key`	Deepgram API key	--
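
As a rough illustration of how these settings fit together, the sketch below models the table as a typed object and builds the HTTP request for the OpenAI-compatible provider. The setting keys and defaults come from the table above. The helper names, the `model` argument, and the assumption that the base URL is given without a trailing `/v1` are illustrative, not OtterBot's actual implementation. The request body fields (`model`, `input`, `voice`, `speed`) follow the OpenAI speech API.

```typescript
// Setting keys and defaults mirror the table above; the helpers are hypothetical.
type TtsProvider = "kokoro" | "edge-tts" | "openai-compatible" | "deepgram";

interface TtsSettings {
  "tts:enabled": boolean;
  "tts:active_provider": TtsProvider;
  "tts:voice": string;
  "tts:speed": number;
  "tts:openai-compatible:base_url"?: string;
  "tts:openai-compatible:api_key"?: string;
  "tts:deepgram:api_key"?: string;
}

const TTS_DEFAULTS: TtsSettings = {
  "tts:enabled": false,
  "tts:active_provider": "kokoro",
  "tts:voice": "af_heart",
  "tts:speed": 1,
};

// Keep the playback speed inside the documented 0.5 to 2.0 range.
function clampTtsSpeed(speed: number): number {
  return Math.min(2.0, Math.max(0.5, speed));
}

// Describe (but do not send) a request to an OpenAI-compatible TTS endpoint.
// Assumption: the configured base URL does not already end in /v1.
function buildSpeechRequest(settings: TtsSettings, text: string, model: string) {
  const baseUrl = settings["tts:openai-compatible:base_url"];
  if (!baseUrl) throw new Error("tts:openai-compatible:base_url is not set");
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  const apiKey = settings["tts:openai-compatible:api_key"];
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`; // the key is optional
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/audio/speech`,
    method: "POST" as const,
    headers,
    body: JSON.stringify({
      model,
      input: text,
      voice: settings["tts:voice"],
      speed: clampTtsSpeed(settings["tts:speed"]),
    }),
  };
}
```

The returned object can be passed to `fetch(request.url, request)` once a provider is reachable.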

Speech-to-Text Providers¶

Whisper Local¶

Local

OpenAI Whisper ONNX model running locally. Default model: onnx-community/whisper-base. Configurable model ID.

OpenAI-Compatible¶

API Key

Any provider with an OpenAI-compatible transcription endpoint (/v1/audio/transcriptions).

Browser Web Speech API¶

Built-in

Uses the browser's native Web Speech API for speech recognition. No setup or API key required — works directly in supported browsers (Chrome, Edge).

Deepgram¶

API Key

Deepgram's real-time speech recognition API. Requires a Deepgram API key.

STT Configuration¶

Setting Key	Description	Default
`stt:enabled`	Enable or disable STT	`false`
`stt:active_provider`	Active provider: `whisper-local`, `openai-compatible`, `browser`, or `deepgram`	`whisper-local`
`stt:whisper:model_id`	HuggingFace model ID for local Whisper	`onnx-community/whisper-base`
`stt:openai-compatible:base_url`	Base URL for OpenAI-compatible provider	--
`stt:openai-compatible:api_key`	API key (optional)	--
`stt:deepgram:api_key`	Deepgram API key	--
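
Before switching `stt:active_provider`, it can help to check that the provider's settings are filled in. The sketch below is one way to do that. The list of required keys per provider is an assumption inferred from the provider descriptions above, and `missingSttSettings` is a hypothetical helper.

```typescript
type SttProvider = "whisper-local" | "openai-compatible" | "browser" | "deepgram";

// Assumed requirements: local and browser providers need nothing,
// remote providers need a base URL or an API key.
const STT_REQUIRED_KEYS: Record<SttProvider, string[]> = {
  "whisper-local": [],
  "openai-compatible": ["stt:openai-compatible:base_url"],
  "browser": [],
  "deepgram": ["stt:deepgram:api_key"],
};

// Return the setting keys that are still empty for the chosen provider.
function missingSttSettings(
  provider: SttProvider,
  settings: Record<string, string | undefined>,
): string[] {
  return STT_REQUIRED_KEYS[provider].filter((key) => !settings[key]);
}
```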

3D Live View¶

The Live View renders your agents as animated 3D characters in a virtual scene using Three.js and React Three Fiber. Each agent can have its own character model with idle and action animations.

Model Packs¶

Character models are GLB files organized into model packs. OtterBot auto-discovers packs from the assets/workers/ directory:

assets/workers/
+-- MyCharacter/
|  +-- characters/
|  |  +-- character.glb        # 3D character model
|  +-- Animations/
|  |  +-- gltf/
|  |     +-- Rig_Medium/        # Medium rig animations
|  |     +-- Rig_Large/         # Large rig animations
|  +-- artwork.png              # Thumbnail

Each model pack includes idle and action animations. Agents assigned a model pack will appear as that character in the Live View, with animations reflecting their current status.
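
The layout above can be expressed as a small path helper, which is handy when scripting pack validation. This is a sketch only: `modelPackPaths` is a hypothetical function, and the paths are copied from the directory tree rather than from OtterBot's loader.

```typescript
interface ModelPackPaths {
  character: string;
  animations: { medium: string; large: string };
  thumbnail: string;
}

// Build the expected file locations for a pack under assets/workers/.
function modelPackPaths(packName: string, root: string = "assets/workers"): ModelPackPaths {
  const base = `${root}/${packName}`;
  return {
    character: `${base}/characters/character.glb`,
    animations: {
      medium: `${base}/Animations/gltf/Rig_Medium`,
      large: `${base}/Animations/gltf/Rig_Large`,
    },
    thumbnail: `${base}/artwork.png`,
  };
}
```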

Environment Packs¶

Environment models (GLTF) are loaded from assets/environments/ and define the 3D scene backdrop. Multiple environment assets can be placed in a scene.

Scene Configurations¶

JSON files in assets/scenes/ define camera position, lighting, and layout. The Room Builder (below) lets you create and edit scene configurations visually.

Web Search Providers¶

Agents with the web_search tool can search the web. OtterBot supports four search providers:

DuckDuckGo¶

Free -- No API Key

Default provider. Searches via DuckDuckGo HTML interface. No configuration needed.

Brave Search¶

API Key

Brave's independent search index. Requires a Brave Search API key. Config: search:brave:api_key.

Tavily¶

API Key

AI-optimized search API designed for LLM agents. Requires a Tavily API key. Config: search:tavily:api_key.

SearXNG¶

Self-Hosted

Privacy-respecting metasearch engine. Connect to your own SearXNG instance. Config: search:searxng:base_url.

Set the active provider with the search:active_provider setting. All providers return results in a unified format with title, URL, and snippet.
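
The unified format can be pictured as a small normalization step. The sketch below assumes a loose raw shape in which providers name the summary field differently (`description`, `content`, or `snippet`). That raw shape and the function names are illustrative and not taken from OtterBot's source.

```typescript
// The unified result format described above.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Hypothetical raw hit: field names vary between providers.
interface RawSearchHit {
  title?: string;
  url?: string;
  description?: string;
  content?: string;
  snippet?: string;
}

function normalizeSearchHit(hit: RawSearchHit): SearchResult | undefined {
  if (!hit.url) return undefined; // a result without a URL is not usable
  return {
    title: (hit.title ?? hit.url).trim(),
    url: hit.url,
    snippet: (hit.snippet ?? hit.description ?? hit.content ?? "").trim(),
  };
}

// Normalize a list of hits and drop the unusable ones.
function normalizeSearchHits(hits: RawSearchHit[]): SearchResult[] {
  return hits
    .map(normalizeSearchHit)
    .filter((result): result is SearchResult => result !== undefined);
}
```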

Browser Automation¶

Agents with the web_browse tool can navigate web pages, extract content, and interact with websites using Playwright. A browser pool manages Chromium instances efficiently.

Navigate to URLs and extract page content as text
Managed browser pool prevents resource exhaustion
Runs headless Chromium inside the Docker container
Browser binaries path configurable via PLAYWRIGHT_BROWSERS_PATH
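
To show how a managed pool avoids resource exhaustion, here is a deliberately simplified, synchronous sketch. It caps the number of instances and reuses released ones. `BoundedPool` is a hypothetical class. A real browser pool would create Playwright browser instances asynchronously and would likely queue callers instead of returning `undefined`.

```typescript
class BoundedPool<T> {
  private idle: T[] = [];
  private created = 0;
  private readonly create: () => T;
  private readonly maxSize: number;

  constructor(create: () => T, maxSize: number) {
    this.create = create;
    this.maxSize = maxSize;
  }

  // Reuse an idle instance, create one if under the cap, else report exhaustion.
  tryAcquire(): T | undefined {
    const reused = this.idle.pop();
    if (reused !== undefined) return reused;
    if (this.created >= this.maxSize) return undefined;
    this.created += 1;
    return this.create();
  }

  // Return an instance to the pool so the next caller can reuse it.
  release(item: T): void {
    this.idle.push(item);
  }
}
```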

Info

Researcher agents have both web_search and web_browse -- search for results, then browse specific pages for detailed information.

Project Management & Kanban¶

OtterBot organizes complex work into Projects, each with a Kanban board for task tracking. Projects are created by the COO when you give it a multi-step goal.

Project Lifecycle¶

Creation -- COO creates a project with a name and description
Charter Gathering -- COO writes a project charter (specification/requirements)
Charter Finalized -- Charter is locked, Team Lead is spawned
Execution -- Team Lead creates Kanban tasks and assigns workers
Completion -- Project status set to completed, failed, or cancelled
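
The lifecycle above reads naturally as a state machine. The sketch below encodes it as a transition table. The phase identifiers are made up for illustration (only `completed`, `failed`, and `cancelled` appear in the text above), and it assumes a project can only end from the execution phase.

```typescript
type ProjectPhase =
  | "creation"
  | "charter_gathering"
  | "charter_finalized"
  | "execution"
  | "completed"
  | "failed"
  | "cancelled";

// Allowed next phases for each phase; terminal phases have none.
const PROJECT_TRANSITIONS: Record<ProjectPhase, ProjectPhase[]> = {
  creation: ["charter_gathering"],
  charter_gathering: ["charter_finalized"],
  charter_finalized: ["execution"],
  execution: ["completed", "failed", "cancelled"],
  completed: [],
  failed: [],
  cancelled: [],
};

function canAdvanceProject(from: ProjectPhase, to: ProjectPhase): boolean {
  return PROJECT_TRANSITIONS[from].indexOf(to) !== -1;
}
```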

Kanban Board¶

Each project has a Kanban board with three columns:

Column	Description
`backlog`	Tasks planned but not yet started
`in_progress`	Tasks currently being worked on by an agent
`done`	Completed tasks with optional completion reports

Tasks can have labels, blocking dependencies, and agent assignments. Workers automatically move tasks to in_progress when they start and to done when they finish. The frontend displays the Kanban board in real time via Socket.IO events.
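
Blocking dependencies determine when a task may leave the backlog. The following sketch shows one plausible rule: a task is blocked until every task it depends on is in `done`. The `KanbanTask` shape and both helpers are assumptions for illustration. Only the three column names come from the table above.

```typescript
type KanbanColumn = "backlog" | "in_progress" | "done";

interface KanbanTask {
  id: string;
  title: string;
  column: KanbanColumn;
  labels: string[];
  blockedBy: string[]; // ids of tasks that must be done first
  assignee?: string;
}

// A task is blocked while any dependency is unfinished.
// Assumption: a dependency that cannot be found counts as unfinished.
function isTaskBlocked(task: KanbanTask, board: KanbanTask[]): boolean {
  return task.blockedBy.some((depId) => {
    const dep = board.find((candidate) => candidate.id === depId);
    return dep === undefined || dep.column !== "done";
  });
}

// Move a backlog task to in_progress and record the worker that took it.
function startTask(task: KanbanTask, board: KanbanTask[], agentId: string): KanbanTask {
  if (task.column !== "backlog") throw new Error(`Task ${task.id} is not in the backlog`);
  if (isTaskBlocked(task, board)) throw new Error(`Task ${task.id} is blocked`);
  return { ...task, column: "in_progress", assignee: agentId };
}
```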

Desktop Environment¶

OtterBot includes an optional full XFCE desktop environment running inside the container, accessible via noVNC in your browser. This gives agents (and you) access to a complete Linux desktop with GUI applications.

What's Included¶

XFCE4 -- Lightweight desktop environment
File manager -- Browse the workspace filesystem
Terminal -- Full shell access
Web browser -- Chromium for agents' browser automation
noVNC -- Browser-based VNC client (no VNC client needed)

Configuration¶

Variable	Description	Default
`ENABLE_DESKTOP`	Enable or disable the virtual desktop	`true`
`DESKTOP_RESOLUTION`	Screen resolution (WxHxDepth)	`1280x720x24`
`VNC_PORT`	Internal VNC server port	`5900`
`SUDO_MODE`	Privilege level: `restricted` (safe) or `full`	`restricted`

The desktop status can be checked via the GET /api/desktop/status endpoint. noVNC files are served directly from the server at /novnc/*.
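
The `DESKTOP_RESOLUTION` value packs width, height, and color depth into one string. A minimal parser for that `WxHxDepth` format might look like this. The function name is hypothetical, and the default is the one from the table above.

```typescript
interface DesktopResolution {
  width: number;
  height: number;
  depth: number;
}

// Parse a "WxHxDepth" string such as "1280x720x24".
function parseDesktopResolution(value: string = "1280x720x24"): DesktopResolution {
  const match = /^(\d+)x(\d+)x(\d+)$/.exec(value.trim());
  if (!match) throw new Error(`Invalid DESKTOP_RESOLUTION: ${value}`);
  return {
    width: Number(match[1]),
    height: Number(match[2]),
    depth: Number(match[3]),
  };
}
```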

Room Builder¶

The Room Builder is a visual 3D scene editor in the web UI. It lets you place environment assets, position the camera, and configure the Live View scene without editing JSON files.

Drag and drop environment assets from available packs
Position, rotate, and scale objects in 3D space
Configure lighting and camera angles
Save scene configurations for the Live View

Package Management¶

Agents with the install_package tool can install software packages inside the container. Two package types are supported:

apt packages -- System-level packages (e.g., build tools, libraries)
npm packages -- Node.js packages installed globally to /otterbot/tools

Installed packages can be listed via the GET /api/packages REST endpoint. The NPM_CONFIG_PREFIX environment variable controls where npm global packages are installed.

Tip

Ephemeral by default: Packages installed inside the container are lost when the container stops. Mount a volume to persist installed tools across sessions.

Admin Assistant¶

The Admin Assistant is a persistent personal productivity agent that handles everyday tasks alongside the COO. It reports directly to the CEO (you) and manages:

Todos -- Create, list, update, and complete personal todo items with priorities and due dates
Email -- Read, search, send, reply, archive, and label Gmail messages
Calendar -- Create, update, delete, and list Google Calendar events across multiple calendars

The Admin Assistant runs as its own agent role (admin_assistant) with a dedicated system prompt optimized for productivity tasks. It can save memories to remember your preferences across sessions.

Gmail Integration¶

OtterBot integrates with Gmail via Google OAuth, giving the Admin Assistant full email capabilities.

Capability	Description
Read messages	Fetch and read email messages by ID or search query
Send messages	Compose and send new emails
Reply	Reply to existing email threads
Archive	Archive messages to remove them from the inbox
Label	Apply or remove Gmail labels for organization
List labels	Retrieve all available Gmail labels

Gmail access requires configuring Google OAuth credentials in the Settings UI. All email operations go through Google's official API.

Google Calendar¶

The Admin Assistant can manage your Google Calendar, creating and organizing events through natural language requests.

List calendars -- See all available calendars on your account
List events -- Query events by date range or calendar
Create events -- Schedule new events with title, time, location, and description
Update events -- Modify existing event details
Delete events -- Remove events from your calendar

Calendar access shares the same Google OAuth credentials as Gmail.

Custom Tools¶

OtterBot lets you create custom JavaScript tools that agents can use at runtime. Tools are sandboxed using isolated-vm for security.

Create tools manually -- Write a tool with a name, description, JSON schema for parameters, and JavaScript implementation
AI-generate tools -- Describe what you need and OtterBot will generate the tool code for you
Sandboxed execution -- Tools run in an isolated V8 context with controlled access to fetch and environment variables
Tool examples -- Browse built-in example tools for inspiration

Custom tools appear in the /api/tools/available endpoint and can be assigned to any agent template.
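
A manually written tool consists of the four parts listed above. The sketch below shows what such a definition could look like and how required parameters from the JSON schema might be checked before the sandbox runs the code. The `CustomToolDraft` shape, its field names, and the example tool are hypothetical. OtterBot's actual storage format may differ.

```typescript
interface ToolParameterSchema {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required?: string[];
}

// Hypothetical shape: name, description, JSON schema, and JavaScript source.
interface CustomToolDraft {
  toolName: string;
  description: string;
  parameters: ToolParameterSchema;
  code: string;
}

const forecastTool: CustomToolDraft = {
  toolName: "city_forecast",
  description: "Return a short forecast line for a city",
  parameters: {
    type: "object",
    properties: { city: { type: "string", description: "City name" } },
    required: ["city"],
  },
  code: 'return "Forecast for " + args.city;',
};

// List required parameters that the caller did not supply.
function missingRequiredArgs(tool: CustomToolDraft, args: Record<string, unknown>): string[] {
  return (tool.parameters.required ?? []).filter((key) => args[key] === undefined);
}
```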

Skills System¶

Skills are reusable markdown prompt fragments that can be attached to agents to give them specialized knowledge or behavior patterns.

Create skills -- Write markdown content that gets injected into agent system prompts
Clone skills -- Duplicate existing skills as a starting point
Import/export -- Share skills between OtterBot instances as JSON
Scan -- Auto-discover skills from a directory structure

Memory & Soul¶

OtterBot's memory system gives agents persistent knowledge that carries across conversations.

Episodic Memories¶

Agents can save important observations, decisions, and learnings as episodic memories. These are stored with vector embeddings and retrieved via semantic search when relevant to the current conversation.

Memories are automatically extracted from significant conversations
A memory compactor consolidates similar memories to prevent bloat
Memories can be listed, saved, and managed via the API and Socket.IO
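
Semantic retrieval over embeddings usually comes down to ranking by vector similarity. The sketch below ranks stored memories by cosine similarity to a query embedding. It is a generic illustration with hypothetical names and tiny hand-written vectors. How OtterBot computes embeddings and scores memories internally is not specified here.

```typescript
interface EpisodicMemory {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the `limit` memories most similar to the query embedding.
function recallMemories(
  queryEmbedding: number[],
  memories: EpisodicMemory[],
  limit: number = 5,
): EpisodicMemory[] {
  return memories
    .map((memory) => ({ memory, score: cosineSimilarity(queryEmbedding, memory.embedding) }))
    .sort((p, q) => q.score - p.score)
    .slice(0, limit)
    .map((scored) => scored.memory);
}
```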

Soul Documents¶

Soul documents define an agent's identity, values, and long-term behavioral guidelines. They provide a stable foundation that persists regardless of conversation context.

Create and edit soul documents through the API
A soul advisor injects relevant identity context into agent prompts
Useful for defining company culture, communication style, or domain expertise

Coding Agents¶

OtterBot can delegate coding tasks to external coding agent CLIs, running them in managed PTY (pseudo-terminal) sessions.

OpenCode¶

Open Source

Open-source coding assistant. Configure the CLI path in Settings.

Claude Code¶

API Key

Anthropic's CLI coding agent. Requires Claude Code CLI installed and an Anthropic API key.

Codex¶

API Key

OpenAI's CLI coding agent. Requires Codex CLI installed and an OpenAI API key.

Gemini CLI¶

API Key or OAuth

Google's CLI coding agent. Requires Gemini CLI installed and a Google API key, or OAuth via gemini login.

PTY sessions -- Each coding agent runs in its own terminal, streamed to the UI in real time
Permission handling -- File write and shell command permissions are relayed to the user for approval
File diffs -- Changes made by the coding agent are captured and displayed
Session management -- Start, monitor, and stop coding agent sessions via Socket.IO

Messaging Integrations¶

OtterBot can bridge conversations to fifteen external messaging platforms, letting you chat with your agents from anywhere.

Discord¶

Bot Token

Full Discord bot integration. Configure a bot token and channel in Settings.

Slack¶

App Token

Slack workspace app via Bolt framework. Requires app and bot tokens. Supports threaded conversations.

Matrix¶

Self-Hosted

Decentralized chat protocol with end-to-end encryption (E2EE) support. Connect to any Matrix homeserver.

IRC¶

Open Protocol

Classic IRC networks. Configure server, channel, and nickname.

Microsoft Teams¶

Webhook

Teams channel integration via incoming/outgoing webhooks.

Telegram¶

Bot Token

Telegram bot integration. Create a bot via BotFather and configure the token in Settings.

WhatsApp¶

Bridge

WhatsApp messaging bridge integration.

Signal¶

Bridge

Signal messenger bridge for secure messaging.

Mattermost¶

Webhook/Bot

Mattermost team chat integration.

Nextcloud Talk¶

Integration

Nextcloud Talk chat integration.

Tlon¶

Integration

Tlon (Urbit) communication platform integration.

Bluesky¶

Bot Token

AT Protocol integration for the Bluesky social network. Configure a bot token in Settings. Supports pairing approval flow. New in v0.16.0.

Google Chat¶

Google Workspace

Google Workspace integration with webhook support. Pair your Google Chat space through the Settings UI.

Mastodon¶

ActivityPub

Fediverse integration via the ActivityPub protocol. Connect to any Mastodon-compatible instance. Supports pairing approval flow.

Email (IMAP/SMTP)¶

Generic Email

Generic email bridge via IMAP/SMTP. Separate from the Gmail OAuth integration — connect to any email provider that supports IMAP and SMTP. Configure server addresses, ports, and credentials in Settings.

Each bridge relays messages bidirectionally between the external platform and the COO. Configure credentials and channels through the Settings UI.

Scheduled Tasks¶

OtterBot supports custom recurring tasks that run automatically on configurable intervals. Use these for:

Daily standup summaries
Periodic monitoring or health checks
Automated report generation
Regular data collection tasks

Scheduled tasks are managed through the API with configurable intervals and can be enabled or disabled individually.
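
Interval-based scheduling can be reduced to a simple "is it due yet" check. The sketch below illustrates that check. The `ScheduledTask` shape and `dueScheduledTasks` are hypothetical and do not reflect the API's actual field names.

```typescript
interface ScheduledTask {
  id: string;
  intervalMs: number;
  enabled: boolean;
  lastRunAt?: number; // epoch milliseconds of the last run, if any
}

// A task is due when it is enabled and has never run,
// or when at least one full interval has passed since the last run.
function dueScheduledTasks(tasks: ScheduledTask[], now: number): ScheduledTask[] {
  return tasks.filter(
    (task) =>
      task.enabled &&
      (task.lastRunAt === undefined || now - task.lastRunAt >= task.intervalMs),
  );
}
```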

Usage Analytics¶

OtterBot tracks token usage and costs across all LLM interactions, giving you visibility into resource consumption.

View	Description
Summary	Total tokens and estimated costs across all models
Recent	Recent usage entries with timestamps
By Model	Token usage broken down by LLM model
By Agent	Token usage broken down by agent instance
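
The "By Model" and "By Agent" views are both group-and-sum aggregations over the same usage entries. The sketch below shows that aggregation. The `UsageEntry` fields are assumed for illustration and may not match the stored records.

```typescript
interface UsageEntry {
  model: string;
  agentId: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

interface UsageTotals {
  tokens: number;
  costUsd: number;
}

// Sum tokens and estimated cost per model or per agent.
function summarizeUsage(
  entries: UsageEntry[],
  groupBy: "model" | "agentId",
): Map<string, UsageTotals> {
  const totals = new Map<string, UsageTotals>();
  for (const entry of entries) {
    const key = entry[groupBy];
    const current = totals.get(key) ?? { tokens: 0, costUsd: 0 };
    totals.set(key, {
      tokens: current.tokens + entry.inputTokens + entry.outputTokens,
      costUsd: current.costUsd + entry.costUsd,
    });
  }
  return totals;
}
```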

Backup & Restore¶

OtterBot supports full database backup and restore through the Settings UI and API.

Backup -- Download a complete backup of the encrypted database
Restore -- Upload a previous backup to restore all data (settings, conversations, agents, projects)

GitHub Integration¶

OtterBot includes built-in GitHub integration for managing repositories and monitoring activity.

SSH key management -- Generate, import, test, and delete SSH keys for Git authentication. Per-key usage selector (auth, signing, or both).
Multi-account support -- CRUD for multiple GitHub accounts, each with its own credentials and SSH keys. Per-project account assignment with a resolution chain: project-specific account → default account → legacy configuration.
Issue monitoring -- Track GitHub issues and discussions. Automatically create OtterBot projects from GitHub issues.
Fork mode -- Fork repositories, create cross-fork pull requests, and sync upstream changes. Useful for contributing to repositories you don't have push access to.
Commit signing -- Per-project toggle for SSH-based commit signing.
GitHub CLI -- The gh CLI is pre-installed for agents to use.
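
The account resolution chain described under multi-account support is a first-match lookup. A minimal sketch, assuming each source either yields an account or nothing (all type and function names are hypothetical):

```typescript
interface GitAccount {
  id: string;
  username: string;
}

interface AccountSources {
  projectAccount?: GitAccount; // assigned to this project
  defaultAccount?: GitAccount; // instance-wide default
  legacyConfig?: GitAccount;   // older single-account configuration
}

interface ResolvedAccount {
  account: GitAccount;
  source: "project" | "default" | "legacy";
}

// Walk the chain in priority order and return the first account found.
function resolveGitHubAccount(sources: AccountSources): ResolvedAccount | undefined {
  if (sources.projectAccount) return { account: sources.projectAccount, source: "project" };
  if (sources.defaultAccount) return { account: sources.defaultAccount, source: "default" };
  if (sources.legacyConfig) return { account: sources.legacyConfig, source: "legacy" };
  return undefined;
}
```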

Gitea Integration¶

OtterBot supports Gitea as an alternative to GitHub for self-hosted Git workflows.

Account management -- CRUD for Gitea accounts with per-project assignment
Issue monitoring -- Track Gitea issues and automatically create OtterBot projects
PR monitoring -- Monitor pull request activity on Gitea repositories

Specialist Agents¶

Specialist Agents are OtterBot's primary extension point for connecting to external data sources. Each specialist is an autonomous agent with its own isolated knowledge store, data ingestion pipeline, configuration, and custom tools.

Info

Specialists were formerly called "modules." The internal implementation still uses module naming for backward compatibility — defineSpecialist() and defineModule() are aliases.

Capabilities¶

Capability	Description
Knowledge Store	Isolated SQLite database with hybrid FTS5 + vector search per specialist
Data Ingestion	Automated polling on configurable intervals and/or webhook listeners
Custom Tools	Specialist-specific tools available to the specialist's agent
AI Agent	Optional reasoning layer — an LLM agent that can query and synthesize from the knowledge store
Config Schema	Typed settings (string, number, boolean, secret, select) managed through the Settings UI
Migrations	Versioned database schema migrations for custom tables

Knowledge Store¶

Each specialist gets its own SQLite database with hybrid search combining:

FTS5 full-text search for keyword matching (BM25 ranking)
Vector embeddings for semantic similarity (cosine similarity)
Reciprocal rank fusion to merge both result sets

// Available via ctx.knowledge in all handlers
ctx.knowledge.upsert("doc-1", "Document content", { url: "..." });
const results = await ctx.knowledge.search("query", 10);
const doc = ctx.knowledge.get("doc-1");
const count = ctx.knowledge.count();
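
For readers unfamiliar with reciprocal rank fusion, the sketch below shows the standard formula: each document scores the sum of 1 / (k + rank) over every ranked list in which it appears. It is a standalone illustration. The constant `k = 60` is a common default and an assumption here, not a documented OtterBot setting.

```typescript
// Merge several ranked lists of document ids into one ranking.
// Ranks are 1-based, so the top item of a list contributes 1 / (k + 1).
function reciprocalRankFusion(rankings: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((x, y) => y[1] - x[1])
    .map(([id]) => id);
}

// Example: keyword (FTS5) ranking and vector ranking for the same query.
const fusedIds = reciprocalRankFusion([
  ["a", "b", "c"], // keyword matches, best first
  ["b", "c", "d"], // semantic matches, best first
]);
// fusedIds is ["b", "c", "a", "d"]: documents found by both searches rise to the top.
```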

Triggers & Data Pipeline¶

Specialists ingest data through two trigger types:

Poll Trigger¶

Runs on a configurable interval. Items returned from the onPoll handler are automatically upserted into the knowledge store.

triggers: [
  { type: "poll", intervalMs: 300_000, minIntervalMs: 60_000 }  // 5 min poll, 1 min minimum
]

An onFullSync handler can also be defined for comprehensive reindexing of all data.
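
The trigger declares both a default interval and a minimum. One plausible reading, sketched below, is that a user-configured interval is honored but never allowed to drop below `minIntervalMs`. This clamping behavior and the `effectivePollInterval` helper are assumptions, not confirmed behavior.

```typescript
interface PollTrigger {
  type: "poll";
  intervalMs: number;
  minIntervalMs?: number;
}

// Use the configured interval if given, else the trigger default,
// and raise it to the declared minimum when it is too small.
function effectivePollInterval(trigger: PollTrigger, configuredMs?: number): number {
  const requested = configuredMs ?? trigger.intervalMs;
  return Math.max(requested, trigger.minIntervalMs ?? 0);
}
```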

Webhook Trigger¶

Registers a webhook endpoint at POST /api/modules/:moduleId/webhook. Supports GitHub signature verification (X-Hub-Signature-256) and secret-based auth.

triggers: [
  { type: "webhook", path: "/webhook" }
]
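
GitHub signs each webhook delivery with an HMAC-SHA256 of the raw request body and sends it as `sha256=<hex>` in the `X-Hub-Signature-256` header. The sketch below shows how such a signature can be checked with Node's built-in `crypto` module. `verifyGitHubSignature` is a hypothetical standalone function. OtterBot performs this verification for you, so this is only meant to clarify what the check involves.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify the X-Hub-Signature-256 header against the raw request body.
function verifyGitHubSignature(
  rawBody: string,
  secret: string,
  signatureHeader: string | undefined,
): boolean {
  if (!signatureHeader || signatureHeader.indexOf("sha256=") !== 0) return false;
  const expected =
    "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  const received = Buffer.from(signatureHeader, "utf8");
  const wanted = Buffer.from(expected, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === wanted.length && timingSafeEqual(received, wanted);
}
```

The comparison must run on the raw body exactly as received. Re-serializing parsed JSON can change whitespace and break the signature.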

Custom Tools¶

Specialists can expose tools to their agent. Every specialist agent also automatically receives a knowledge_search tool for hybrid search over its knowledge store.

tools: [
  {
    name: "search_discussions",
    description: "Search discussions with filters",
    parameters: {
      query: { type: "string", description: "Search text" },
      category: { type: "string", description: "Filter by category" },
    },
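    async execute(args, ctx) {
      // Query the knowledge store; raw DB access is also available
      // via ctx.knowledge.db.prepare(sql) for structured filters.
      const results = await ctx.knowledge.search(args.query, 10);
      return JSON.stringify(results);
    },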
  },
]

Agent Configuration¶

Specialists can declare an AI agent that reasons over the indexed knowledge:

agent: {
  defaultName: "Discussions Agent",
  defaultPrompt: "You are a GitHub Discussions specialist...",
  defaultModel: "claude-sonnet-4-5-20250929",  // optional
  defaultProvider: "anthropic",                  // optional
}

Agent behavior is controlled by a posting mode:

Mode	Behavior
`respond`	Always respond to queries (default)
`lurk`	Index only — respond only to direct queries from COO/CEO
`new_chats`	Respond to new conversations, then lurk
`permission`	Ask COO for permission before responding
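
The four posting modes amount to a small decision function. The sketch below is one interpretation of the table: `new_chats` answers the first message of a conversation and afterwards behaves like `lurk`. The `IncomingQuery` flags and the decision labels are hypothetical.

```typescript
type PostingMode = "respond" | "lurk" | "new_chats" | "permission";
type PostingDecision = "respond" | "stay_silent" | "ask_coo";

interface IncomingQuery {
  isDirectFromCooOrCeo: boolean;
  isNewConversation: boolean;
}

function decidePosting(mode: PostingMode, query: IncomingQuery): PostingDecision {
  switch (mode) {
    case "respond":
      return "respond";
    case "lurk":
      return query.isDirectFromCooOrCeo ? "respond" : "stay_silent";
    case "new_chats":
      // Answer new conversations, then fall back to lurk behavior.
      return query.isNewConversation || query.isDirectFromCooOrCeo
        ? "respond"
        : "stay_silent";
    case "permission":
      return "ask_coo";
  }
}
```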

Installation¶

Specialists can be installed from three sources:

Source	Description
Git	Cloned from a GitHub repository, built with `pnpm install && pnpm build`
npm	Installed from an npm registry
Local	Symlinked from a local path (for development)

Install via the REST API, the COO's module_install tool, or the Settings UI. Multiple instances of the same specialist type can be installed with different configurations.

Management¶

Enable/disable -- Toggle specialists without uninstalling
Configure -- Update settings through the UI (config schema fields appear automatically)
Reload -- Unload and reload a specialist to pick up changes
Query -- The COO can query any specialist's knowledge via the module_query tool

Defining a Specialist¶

Create a package with an entry point that exports a specialist definition:

import { defineSpecialist } from "@otterbot/shared";

export default defineSpecialist({
  manifest: {
    id: "my-specialist",
    name: "My Specialist",
    version: "1.0.0",
    description: "Monitors an external data source",
  },
  configSchema: { /* typed settings */ },
  agent: { /* optional AI agent config */ },
  tools: [ /* custom tools */ ],
  triggers: [ /* poll and/or webhook triggers */ ],
  migrations: [ /* database schema versions */ ],
  onPoll: async (ctx) => { /* fetch and return items */ },
  onWebhook: async (req, ctx) => { /* handle webhook events */ },
  onQuery: async (query, ctx) => { /* custom search logic */ },
  onLoad: async (ctx) => { /* startup logic */ },
  onUnload: async (ctx) => { /* cleanup logic */ },
});

Example: GitHub Discussions¶

The built-in github-discussions specialist demonstrates the full system:

Polls GitHub's GraphQL API every 5 minutes for new discussions
Indexes discussions and comments into its knowledge store with structured metadata
Handles webhooks from GitHub for real-time updates on discussion/comment events
Exposes a search_discussions tool for structured filtering by category, author, and answered status
Provides an AI agent that synthesizes answers with discussion numbers and URLs

MCP Integration¶

OtterBot supports the Model Context Protocol (MCP), letting you connect external MCP servers that provide additional tools for your agents.

Server Management¶

Add MCP servers — Configure stdio or SSE-based MCP servers with command, args, and environment variables
Start/stop servers — Manage MCP server lifecycles at runtime
Tool discovery — Automatically discover available tools from connected MCP servers
Tool filtering — Select which discovered tools to make available to agents
Security — Command validation, URL validation, and secret masking built in

Transport Types¶

Transport	Description
`stdio`	Runs the MCP server as a local child process (command + args)
`sse`	Connects to a remote MCP server via Server-Sent Events (HTTPS required)

MCP servers are configured through the Settings UI or REST API. Discovered tools appear alongside built-in tools and can be assigned to agent templates.
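
The two transports take different configuration, and the `sse` transport requires HTTPS. The sketch below models that as a discriminated union with a small validator. The `McpServerConfig` field names and the error messages are assumptions for illustration. The validation OtterBot actually performs is more extensive.

```typescript
type McpServerConfig =
  | { transport: "stdio"; command: string; args?: string[]; env?: Record<string, string> }
  | { transport: "sse"; url: string };

// Return a list of problems; an empty list means the config looks usable.
function validateMcpServer(config: McpServerConfig): string[] {
  const errors: string[] = [];
  if (config.transport === "stdio") {
    if (config.command.trim() === "") errors.push("command must not be empty");
  } else {
    let parsed: URL | undefined;
    try {
      parsed = new URL(config.url);
    } catch {
      errors.push("url is not valid");
    }
    if (parsed && parsed.protocol !== "https:") errors.push("sse transport requires HTTPS");
  }
  return errors;
}
```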

Demo Recording¶

OtterBot can record video demos of running web applications with optional voiceover narration using the demo_record tool and the Demo Recorder agent template.

Playwright video capture -- Records browser sessions navigating your web application
TTS voiceover -- Optional text-to-speech narration overlaid on the recording
Dev server management -- Start and stop development servers as part of the recording flow
FFmpeg post-processing -- Produces YouTube-ready MP4 videos with configurable resolution (720p or 1080p)
Scripted demos -- Run a JSON demo script for repeatable, automated recordings

3D Visibility Toggle¶

Each project has a show3d setting that controls whether the project's agents appear in the 3D Live View. Toggle it via the project:set-show3d Socket.IO event to keep the Live View focused on active work.

Code Review Pipeline¶

OtterBot includes an automated code review pipeline that orchestrates multi-stage review and implementation workflows for pull requests.

Pipeline Stages¶

Stage	Description
`Triage`	LLM-based classification of the issue or PR
`Coder`	Creates a feature branch, implements the solution, and commits changes
`Security Reviewer`	Audits the implementation for vulnerabilities and security risks. Can kick back to the Coder for fixes
`Tester`	Writes and runs tests to validate the implementation
`Code Reviewer`	Reviews code quality and correctness, then creates the pull request

The pipeline integrates with the Merge Queue for end-to-end automation: issues flow through triage and implementation, PRs are opened automatically, and validated changes are queued for merge.
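
The stage order and the Security Reviewer's kick-back can be sketched as a tiny routing function. Stage names are taken from the table above. The `StageVerdict` type, the `"done"` marker, and the rule that only the Security Reviewer may kick back are simplifying assumptions.

```typescript
const PIPELINE_STAGES = [
  "Triage",
  "Coder",
  "Security Reviewer",
  "Tester",
  "Code Reviewer",
] as const;

type PipelineStage = (typeof PIPELINE_STAGES)[number];
type StageVerdict = "pass" | "kick_back";

// Decide which stage runs next, or "done" once the Code Reviewer has passed.
function nextPipelineStage(current: PipelineStage, verdict: StageVerdict): PipelineStage | "done" {
  if (verdict === "kick_back") {
    if (current !== "Security Reviewer") {
      throw new Error(`${current} cannot kick back in this sketch`);
    }
    return "Coder"; // the Coder fixes the reported issues
  }
  const next = PIPELINE_STAGES[PIPELINE_STAGES.indexOf(current) + 1];
  return next ?? "done";
}
```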

Merge Queue¶

The merge queue automates the process of merging approved pull requests safely and sequentially.

Merge Flow¶

Queued — PR is approved and added to the merge queue
Rebasing — Automatically rebased onto the target branch
Re-review — Optional pipeline check after rebase to verify changes still pass
Merging — PR is merged after all checks pass

Features include automatic conflict detection, sequential processing to prevent race conditions, position-based queue reordering, and real-time status updates via Socket.IO events.
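
The merge flow and position-based reordering can be illustrated with two small functions. The state identifiers (including `conflict` and `merged`), the `QueuedPr` shape, and the option names below are hypothetical. They mirror the flow described above rather than OtterBot's internal types.

```typescript
type MergeState = "queued" | "rebasing" | "re_review" | "merging" | "merged" | "conflict";

interface QueuedPr {
  prNumber: number;
  state: MergeState;
}

// Advance one step through the merge flow.
function nextMergeState(
  state: MergeState,
  options: { hasConflict: boolean; reReviewEnabled: boolean },
): MergeState {
  switch (state) {
    case "queued":
      return "rebasing";
    case "rebasing":
      if (options.hasConflict) return "conflict";
      return options.reReviewEnabled ? "re_review" : "merging";
    case "re_review":
      return "merging";
    case "merging":
      return "merged";
    case "merged":
    case "conflict":
      return state; // terminal in this sketch
  }
}

// Move a PR to a new zero-based position, keeping the others in order.
function reorderMergeQueue(queue: QueuedPr[], prNumber: number, newPosition: number): QueuedPr[] {
  const entry = queue.find((candidate) => candidate.prNumber === prNumber);
  if (!entry) return queue.slice();
  const rest = queue.filter((candidate) => candidate !== entry);
  const clamped = Math.max(0, Math.min(newPosition, rest.length));
  return rest.slice(0, clamped).concat([entry], rest.slice(clamped));
}
```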

World Layout¶

The World Layout system manages zones for 3D scene organization in the Live View.

Create zones -- Define named areas in the 3D world with position and size
Manage zones -- Update or delete zones via the API
Agent placement -- Agents can be positioned within zones for spatial organization
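
Placing an agent in a zone is essentially a bounds check on the ground plane. The sketch below assumes a zone's position is its center and its size is a width and depth. Both conventions, along with the type and function names, are illustrative guesses rather than the API's documented schema.

```typescript
interface WorldZone {
  id: string;
  label: string;
  center: { x: number; z: number };      // assumed: position marks the zone center
  size: { width: number; depth: number };
}

// True when the point lies inside the zone's rectangle (edges included).
function zoneContains(zone: WorldZone, point: { x: number; z: number }): boolean {
  return (
    Math.abs(point.x - zone.center.x) <= zone.size.width / 2 &&
    Math.abs(point.z - zone.center.z) <= zone.size.depth / 2
  );
}

// Find the first zone that contains the given point, if any.
function findZoneAt(zones: WorldZone[], point: { x: number; z: number }): WorldZone | undefined {
  return zones.find((zone) => zoneContains(zone, point));
}
```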