Skip to content

Features Guide

A deep dive into OtterBot's capabilities -- from voice interaction and 3D visualization to web search, browser automation, and project management.

Voice (TTS & STT)

OtterBot supports both text-to-speech (reading responses aloud) and speech-to-text (voice input). Both can run locally without any API keys.

Text-to-Speech Providers

Kokoro

Local

High-quality local TTS using the Kokoro 82M ONNX model. Runs entirely on CPU inside the container. Default voice: af_heart.

Edge TTS

Free Cloud

Microsoft's neural TTS voices. Free, no API key needed. Wide variety of voices and languages available.

OpenAI-Compatible

API Key

Any provider with an OpenAI-compatible TTS endpoint (/v1/audio/speech). Configure the base URL and optional API key.

Deepgram

API Key

Deepgram's Aura neural TTS voices. Requires a Deepgram API key. Voices include asteria, luna, stella, athena, hera, orion, arcas, perseus, angus, orpheus, helios, and zeus.

TTS Configuration

Setting Key Description Default
tts:enabled Enable or disable TTS false
tts:active_provider Active provider: kokoro, edge-tts, openai-compatible, or deepgram kokoro
tts:voice Voice name af_heart
tts:speed Playback speed (0.5 -- 2.0) 1
tts:openai-compatible:base_url Base URL for OpenAI-compatible provider --
tts:openai-compatible:api_key API key (optional) --
tts:deepgram:api_key Deepgram API key --

Speech-to-Text Providers

Whisper Local

Local

OpenAI Whisper ONNX model running locally. Default model: onnx-community/whisper-base. Configurable model ID.

OpenAI-Compatible

API Key

Any provider with an OpenAI-compatible transcription endpoint (/v1/audio/transcriptions).

Browser Web Speech API

Built-in

Uses the browser's native Web Speech API for speech recognition. No setup or API key required — works directly in supported browsers (Chrome, Edge).

Deepgram

API Key

Deepgram's real-time speech recognition API. Requires a Deepgram API key.

STT Configuration

Setting Key Description Default
stt:enabled Enable or disable STT false
stt:active_provider Active provider: whisper-local, openai-compatible, browser, or deepgram whisper-local
stt:whisper:model_id HuggingFace model ID for local Whisper onnx-community/whisper-base
stt:openai-compatible:base_url Base URL for OpenAI-compatible provider --
stt:openai-compatible:api_key API key (optional) --
stt:deepgram:api_key Deepgram API key --

3D Live View

The Live View renders your agents as animated 3D characters in a virtual scene using Three.js and React Three Fiber. Each agent can have its own character model with idle and action animations.

Model Packs

Character models are GLB files organized into model packs. OtterBot auto-discovers packs from the assets/workers/ directory:

assets/workers/
+-- MyCharacter/
|  +-- characters/
|  |  +-- character.glb        # 3D character model
|  +-- Animations/
|  |  +-- gltf/
|  |     +-- Rig_Medium/        # Medium rig animations
|  |     +-- Rig_Large/         # Large rig animations
|  +-- artwork.png              # Thumbnail

Each model pack includes idle and action animations. Agents assigned a model pack will appear as that character in the Live View, with animations reflecting their current status.

Environment Packs

Environment models (GLTF) are loaded from assets/environments/ and define the 3D scene backdrop. Multiple environment assets can be placed in a scene.

Scene Configurations

JSON files in assets/scenes/ define camera position, lighting, and layout. The Room Builder (below) lets you create and edit scene configurations visually.

Web Search Providers

Agents with the web_search tool can search the web. OtterBot supports four search providers:

DuckDuckGo

Free -- No API Key

Default provider. Searches via DuckDuckGo HTML interface. No configuration needed.

API Key

Brave's independent search index. Requires a Brave Search API key. Config: search:brave:api_key.

Tavily

API Key

AI-optimized search API designed for LLM agents. Requires a Tavily API key. Config: search:tavily:api_key.

SearXNG

Self-Hosted

Privacy-respecting metasearch engine. Connect to your own SearXNG instance. Config: search:searxng:base_url.

Set the active provider with the search:active_provider setting. All providers return results in a unified format with title, URL, and snippet.

Browser Automation

Agents with the web_browse tool can navigate web pages, extract content, and interact with websites using Playwright. A browser pool manages Chromium instances efficiently.

  • Navigate to URLs and extract page content as text
  • Managed browser pool prevents resource exhaustion
  • Runs headless Chromium inside the Docker container
  • Browser binaries path configurable via PLAYWRIGHT_BROWSERS_PATH

Info

Researcher agents have both web_search and web_browse -- search for results, then browse specific pages for detailed information.

Project Management & Kanban

OtterBot organizes complex work into Projects, each with a Kanban board for task tracking. Projects are created by the COO when you give it a multi-step goal.

Project Lifecycle

  • Creation -- COO creates a project with a name and description
  • Charter Gathering -- COO writes a project charter (specification/requirements)
  • Charter Finalized -- Charter is locked, Team Lead is spawned
  • Execution -- Team Lead creates Kanban tasks and assigns workers
  • Completion -- Project status set to completed, failed, or cancelled

Kanban Board

Each project has a Kanban board with three columns:

Column Description
backlog Tasks planned but not yet started
in_progress Tasks currently being worked on by an agent
done Completed tasks with optional completion reports

Tasks can have labels, blocking dependencies, and agent assignments. Workers automatically move tasks to in_progress when they start and done when they finish. The frontend displays the Kanban board in real-time via Socket.IO events.

Desktop Environment

OtterBot includes an optional full XFCE desktop environment running inside the container, accessible via noVNC in your browser. This gives agents (and you) access to a complete Linux desktop with GUI applications.

What's Included

  • XFCE4 -- Lightweight desktop environment
  • File manager -- Browse the workspace filesystem
  • Terminal -- Full shell access
  • Web browser -- Chromium for agents' browser automation
  • noVNC -- Browser-based VNC client (no VNC client needed)

Configuration

Variable Description Default
ENABLE_DESKTOP Enable or disable the virtual desktop true
DESKTOP_RESOLUTION Screen resolution (WxHxDepth) 1280x720x24
VNC_PORT Internal VNC server port 5900
SUDO_MODE Privilege level: restricted (safe) or full restricted

The desktop status can be checked via the GET /api/desktop/status endpoint. noVNC files are served directly from the server at /novnc/*.

Room Builder

The Room Builder is a visual 3D scene editor in the web UI. It lets you place environment assets, position the camera, and configure the Live View scene without editing JSON files.

  • Drag and drop environment assets from available packs
  • Position, rotate, and scale objects in 3D space
  • Configure lighting and camera angles
  • Save scene configurations for the Live View

Package Management

Agents with the install_package tool can install software packages inside the container. This supports both:

  • apt packages -- System-level packages (e.g., build tools, libraries)
  • npm packages -- Node.js packages installed globally to /otterbot/tools

Installed packages can be listed via the GET /api/packages REST endpoint. The NPM_CONFIG_PREFIX environment variable controls where npm global packages are installed.

Tip

Ephemeral by default: Packages installed inside the container are lost when the container stops. Mount a volume to persist installed tools across sessions.

Admin Assistant

The Admin Assistant is a persistent personal productivity agent that handles everyday tasks alongside the COO. It reports directly to the CEO (you) and manages:

  • Todos -- Create, list, update, and complete personal todo items with priorities and due dates
  • Email -- Read, search, send, reply, archive, and label Gmail messages
  • Calendar -- Create, update, delete, and list Google Calendar events across multiple calendars

The Admin Assistant runs as its own agent role (admin_assistant) with a dedicated system prompt optimized for productivity tasks. It can save memories to remember your preferences across sessions.

Gmail Integration

OtterBot integrates with Gmail via Google OAuth, giving the Admin Assistant full email capabilities.

Capability Description
Read messages Fetch and read email messages by ID or search query
Send messages Compose and send new emails
Reply Reply to existing email threads
Archive Archive messages to remove them from the inbox
Label Apply or remove Gmail labels for organization
List labels Retrieve all available Gmail labels

Gmail access requires configuring Google OAuth credentials in the Settings UI. All email operations go through Google's official API.

Google Calendar

The Admin Assistant can manage your Google Calendar, creating and organizing events through natural language requests.

  • List calendars -- See all available calendars on your account
  • List events -- Query events by date range or calendar
  • Create events -- Schedule new events with title, time, location, and description
  • Update events -- Modify existing event details
  • Delete events -- Remove events from your calendar

Calendar access shares the same Google OAuth credentials as Gmail.

Custom Tools

OtterBot lets you create custom JavaScript tools that agents can use at runtime. Tools are sandboxed using isolated-vm for security.

  • Create tools manually -- Write a tool with a name, description, JSON schema for parameters, and JavaScript implementation
  • AI-generate tools -- Describe what you need and OtterBot will generate the tool code for you
  • Sandboxed execution -- Tools run in an isolated V8 context with controlled access to fetch and environment variables
  • Tool examples -- Browse built-in example tools for inspiration

Custom tools appear in the /api/tools/available endpoint and can be assigned to any agent template.

Skills System

Skills are reusable markdown prompt fragments that can be attached to agents to give them specialized knowledge or behavior patterns.

  • Create skills -- Write markdown content that gets injected into agent system prompts
  • Clone skills -- Duplicate existing skills as a starting point
  • Import/export -- Share skills between OtterBot instances as JSON
  • Scan -- Auto-discover skills from a directory structure

Memory & Soul

OtterBot's memory system gives agents persistent knowledge that carries across conversations.

Episodic Memories

Agents can save important observations, decisions, and learnings as episodic memories. These are stored with vector embeddings and retrieved via semantic search when relevant to the current conversation.

  • Memories are automatically extracted from significant conversations
  • A memory compactor consolidates similar memories to prevent bloat
  • Memories can be listed, saved, and managed via the API and Socket.IO

Soul Documents

Soul documents define an agent's identity, values, and long-term behavioral guidelines. They provide a stable foundation that persists regardless of conversation context.

  • Create and edit soul documents through the API
  • A soul advisor injects relevant identity context into agent prompts
  • Useful for defining company culture, communication style, or domain expertise

Coding Agents

OtterBot can delegate coding tasks to external coding agent CLIs, running them in managed PTY (pseudo-terminal) sessions.

OpenCode

Open Source

Open-source coding assistant. Configure the CLI path in Settings.

Claude Code

API Key

Anthropic's CLI coding agent. Requires Claude Code CLI installed and an Anthropic API key.

Codex

API Key

OpenAI's CLI coding agent. Requires Codex CLI installed and an OpenAI API key.

Gemini CLI

API Key or OAuth

Google's CLI coding agent. Requires Gemini CLI installed and a Google API key, or OAuth via gemini login.

  • PTY sessions -- Each coding agent runs in its own terminal, streamed to the UI in real-time
  • Permission handling -- File write and shell command permissions are relayed to the user for approval
  • File diffs -- Changes made by the coding agent are captured and displayed
  • Session management -- Start, monitor, and stop coding agent sessions via Socket.IO

Messaging Integrations

OtterBot can bridge conversations to fifteen external messaging platforms, letting you chat with your agents from anywhere.

Discord

Bot Token

Full Discord bot integration. Configure a bot token and channel in Settings.

Slack

App Token

Slack workspace app via Bolt framework. Requires app and bot tokens. Supports threaded conversations.

Matrix

Self-Hosted

Decentralized chat protocol with end-to-end encryption (E2EE) support. Connect to any Matrix homeserver.

IRC

Open Protocol

Classic IRC networks. Configure server, channel, and nickname.

Microsoft Teams

Webhook

Teams channel integration via incoming/outgoing webhooks.

Telegram

Bot Token

Telegram bot integration. Create a bot via BotFather and configure the token in Settings.

WhatsApp

Bridge

WhatsApp messaging bridge integration.

Signal

Bridge

Signal messenger bridge for secure messaging.

Mattermost

Webhook/Bot

Mattermost team chat integration.

Nextcloud Talk

Integration

Nextcloud Talk chat integration.

Tlon

Integration

Tlon (Urbit) communication platform integration.

Bluesky

Bot Token

AT Protocol integration for the Bluesky social network. Configure a bot token in Settings. Supports pairing approval flow. New in v0.16.0.

Google Chat

Google Workspace

Google Workspace integration with webhook support. Pair your Google Chat space through the Settings UI.

Mastodon

ActivityPub

Fediverse integration via the ActivityPub protocol. Connect to any Mastodon-compatible instance. Supports pairing approval flow.

Email (IMAP/SMTP)

Generic Email

Generic email bridge via IMAP/SMTP. Separate from the Gmail OAuth integration — connect to any email provider that supports IMAP and SMTP. Configure server addresses, ports, and credentials in Settings.

Each bridge relays messages bidirectionally between the external platform and the COO. Configure credentials and channels through the Settings UI.

Scheduled Tasks

OtterBot supports custom recurring tasks that run automatically on configurable intervals. Use these for:

  • Daily standup summaries
  • Periodic monitoring or health checks
  • Automated report generation
  • Regular data collection tasks

Scheduled tasks are managed through the API with configurable intervals and can be enabled or disabled individually.

Usage Analytics

OtterBot tracks token usage and costs across all LLM interactions, giving you visibility into resource consumption.

View Description
Summary Total tokens and estimated costs across all models
Recent Recent usage entries with timestamps
By Model Token usage broken down by LLM model
By Agent Token usage broken down by agent instance

Backup & Restore

OtterBot supports full database backup and restore through the Settings UI and API.

  • Backup -- Download a complete backup of the encrypted database
  • Restore -- Upload a previous backup to restore all data (settings, conversations, agents, projects)

GitHub Integration

OtterBot includes built-in GitHub integration for managing repositories and monitoring activity.

  • SSH key management -- Generate, import, test, and delete SSH keys for Git authentication. Per-key usage selector (auth, signing, or both).
  • Multi-account support -- CRUD for multiple GitHub accounts, each with their own credentials and SSH keys. Per-project account assignment with a resolution chain: project-specific account → default account → legacy configuration.
  • Issue monitoring -- Track GitHub issues and discussions. Automatically create OtterBot projects from GitHub issues.
  • Fork mode -- Fork repositories, create cross-fork pull requests, and sync upstream changes. Useful for contributing to repositories you don't have push access to.
  • Commit signing -- Per-project toggle for SSH-based commit signing.
  • GitHub CLI -- The gh CLI is pre-installed for agents to use

Gitea Integration

OtterBot supports Gitea as an alternative to GitHub for self-hosted Git workflows.

  • Account management -- CRUD for Gitea accounts with per-project assignment
  • Issue monitoring -- Track Gitea issues and automatically create OtterBot projects
  • PR monitoring -- Monitor pull request activity on Gitea repositories

Specialist Agents

Specialist Agents are OtterBot's primary extension point for connecting to external data sources. Each specialist is an autonomous agent with its own isolated knowledge store, data ingestion pipeline, configuration, and custom tools.

Info

Specialists were formerly called "modules." The internal implementation still uses module naming for backward compatibility — defineSpecialist() and defineModule() are aliases.

Capabilities

Capability Description
Knowledge Store Isolated SQLite database with hybrid FTS5 + vector search per specialist
Data Ingestion Automated polling on configurable intervals and/or webhook listeners
Custom Tools Specialist-specific tools available to the specialist's agent
AI Agent Optional reasoning layer — an LLM agent that can query and synthesize from the knowledge store
Config Schema Typed settings (string, number, boolean, secret, select) managed through the Settings UI
Migrations Versioned database schema migrations for custom tables

Knowledge Store

Each specialist gets its own SQLite database with hybrid search combining:

  • FTS5 full-text search for keyword matching (BM25 ranking)
  • Vector embeddings for semantic similarity (cosine similarity)
  • Reciprocal rank fusion to merge both result sets
// Available via ctx.knowledge in all handlers
ctx.knowledge.upsert("doc-1", "Document content", { url: "..." });
const results = await ctx.knowledge.search("query", 10);
const doc = ctx.knowledge.get("doc-1");
const count = ctx.knowledge.count();

Triggers & Data Pipeline

Specialists ingest data through two trigger types:

Poll Trigger

Runs on a configurable interval. Items returned from the onPoll handler are automatically upserted into the knowledge store.

triggers: [
  { type: "poll", intervalMs: 300_000, minIntervalMs: 60_000 }  // 5 min poll, 1 min minimum
]

A onFullSync handler can also be defined for comprehensive reindexing of all data.

Webhook Trigger

Registers a webhook endpoint at POST /api/modules/:moduleId/webhook. Supports GitHub signature verification (X-Hub-Signature-256) and secret-based auth.

triggers: [
  { type: "webhook", path: "/webhook" }
]

Custom Tools

Specialists can expose tools to their agent. Every specialist agent also automatically receives a knowledge_search tool for hybrid search over its knowledge store.

tools: [
  {
    name: "search_discussions",
    description: "Search discussions with filters",
    parameters: {
      query: { type: "string", description: "Search text" },
      category: { type: "string", description: "Filter by category" },
    },
    async execute(args, ctx) {
      // Can use raw DB access: ctx.knowledge.db.prepare(sql)
      return JSON.stringify(results);
    },
  },
]

Agent Configuration

Specialists can declare an AI agent that reasons over the indexed knowledge:

agent: {
  defaultName: "Discussions Agent",
  defaultPrompt: "You are a GitHub Discussions specialist...",
  defaultModel: "claude-sonnet-4-5-20250929",  // optional
  defaultProvider: "anthropic",                  // optional
}

Agent behavior is controlled by a posting mode:

Mode Behavior
respond Always respond to queries (default)
lurk Index only — respond only to direct queries from COO/CEO
new_chats Respond to new conversations, then lurk
permission Ask COO for permission before responding

Installation

Specialists can be installed from three sources:

Source Description
Git Cloned from a GitHub repository, built with pnpm install && pnpm build
npm Installed from an npm registry
Local Symlinked from a local path (for development)

Install via the REST API, the COO's module_install tool, or the Settings UI. Multiple instances of the same specialist type can be installed with different configurations.

Management

  • Enable/disable -- Toggle specialists without uninstalling
  • Configure -- Update settings through the UI (config schema fields appear automatically)
  • Reload -- Unload and reload a specialist to pick up changes
  • Query -- The COO can query any specialist's knowledge via the module_query tool

Defining a Specialist

Create a package with an entry point that exports a specialist definition:

import { defineSpecialist } from "@otterbot/shared";

export default defineSpecialist({
  manifest: {
    id: "my-specialist",
    name: "My Specialist",
    version: "1.0.0",
    description: "Monitors an external data source",
  },
  configSchema: { /* typed settings */ },
  agent: { /* optional AI agent config */ },
  tools: [ /* custom tools */ ],
  triggers: [ /* poll and/or webhook triggers */ ],
  migrations: [ /* database schema versions */ ],
  onPoll: async (ctx) => { /* fetch and return items */ },
  onWebhook: async (req, ctx) => { /* handle webhook events */ },
  onQuery: async (query, ctx) => { /* custom search logic */ },
  onLoad: async (ctx) => { /* startup logic */ },
  onUnload: async (ctx) => { /* cleanup logic */ },
});

Example: GitHub Discussions

The built-in github-discussions specialist demonstrates the full system:

  • Polls GitHub's GraphQL API every 5 minutes for new discussions
  • Indexes discussions and comments into its knowledge store with structured metadata
  • Handles webhooks from GitHub for real-time updates on discussion/comment events
  • Exposes a search_discussions tool for structured filtering by category, author, and answered status
  • Provides an AI agent that synthesizes answers with discussion numbers and URLs

MCP Integration

OtterBot supports the Model Context Protocol (MCP), letting you connect external MCP servers that provide additional tools for your agents.

Server Management

  • Add MCP servers — Configure stdio or SSE-based MCP servers with command, args, and environment variables
  • Start/stop servers — Manage MCP server lifecycles at runtime
  • Tool discovery — Automatically discover available tools from connected MCP servers
  • Tool filtering — Select which discovered tools to make available to agents
  • Security — Command validation, URL validation, and secret masking built in

Transport Types

Transport Description
stdio Runs the MCP server as a local child process (command + args)
sse Connects to a remote MCP server via Server-Sent Events (HTTPS required)

MCP servers are configured through the Settings UI or REST API. Discovered tools appear alongside built-in tools and can be assigned to agent templates.

Demo Recording

OtterBot can record video demos of running web applications with optional voiceover narration using the demo_record tool and the Demo Recorder agent template.

  • Playwright video capture -- Records browser sessions navigating your web application
  • TTS voiceover -- Optional text-to-speech narration overlaid on the recording
  • Dev server management -- Start and stop development servers as part of the recording flow
  • FFmpeg post-processing -- Produces YouTube-ready MP4 videos with configurable resolution (720p or 1080p)
  • Scripted demos -- Run a JSON demo script for repeatable, automated recordings

3D Visibility Toggle

Each project has a show3d setting that controls whether the project's agents appear in the 3D Live View. Toggle it via the project:set-show3d Socket.IO event to keep the Live View focused on active work.

Code Review Pipeline

OtterBot includes an automated code review pipeline that orchestrates multi-stage review and implementation workflows for pull requests.

Pipeline Stages

Stage Description
Triage LLM-based classification of the issue or PR
Coder Creates a feature branch, implements the solution, and commits changes
Security Reviewer Audits the implementation for vulnerabilities and security risks. Can kick back to the Coder for fixes
Tester Writes and runs tests to validate the implementation
Code Reviewer Reviews code quality and correctness, then creates the pull request

The pipeline integrates with the Merge Queue for end-to-end automation: issues flow through triage and implementation, PRs are opened automatically, and validated changes are queued for merge.

Merge Queue

The merge queue automates the process of merging approved pull requests safely and sequentially.

Merge Flow

  • Queued — PR is approved and added to the merge queue
  • Rebasing — Automatically rebased onto the target branch
  • Re-review — Optional pipeline check after rebase to verify changes still pass
  • Merging — PR is merged after all checks pass

Features include automatic conflict detection, sequential processing to prevent race conditions, position-based queue reordering, and real-time status updates via Socket.IO events.

World Layout

The World Layout system manages zones for 3D scene organization in the Live View.

  • Create zones -- Define named areas in the 3D world with position and size
  • Manage zones -- Update or delete zones via the API
  • Agent placement -- Agents can be positioned within zones for spatial organization