---
title: Give Hermes Agent a Voice with ElevenLabs
summary: >-
  Hermes Agent ships with no voice by default. This guide adds one with
  ElevenLabs — Text to Speech for its replies and Speech to Text (Scribe) for
  transcribing what you say — both as simple provider config in Hermes.
author: ElevenLabs Developers
authorUrl: 'https://x.com/ElevenLabsDevs'
category: Integrations
difficulty: Beginner
readingTime: 5
date: '2026-06-25'
tags:
  - voice
  - text-to-speech
  - speech-to-text
  - elevenlabs
  - scribe
  - config
integrations:
  - Hermes Agent
  - ElevenLabs
  - Telegram
  - Discord
  - WhatsApp
  - Slack
  - Signal
---

## Why give Hermes a voice

Hermes Agent runs in your terminal, in messaging apps, and on your phone. By default it has no voice. This guide shows how to add one: ElevenLabs **Text to Speech** for its replies, and **Speech to Text** (Scribe) for transcribing what you say. Both are provider config in Hermes, with no custom scripts required.

The end result: you speak, Hermes hears you with Scribe, thinks, and answers back in your chosen ElevenLabs voice.

## Setup

Get an API key from the ElevenLabs dashboard and add it to `~/.hermes/.env`:

```shell
ELEVENLABS_API_KEY=your_key_here
```
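
To confirm the key landed where Hermes expects it, a small check like the following can help. This is a minimal sketch, assuming the file uses plain `NAME=value` lines; the helper names `read_env_value` and `describe_key` are hypothetical and not part of Hermes.

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_path: Path, name: str) -> Optional[str]:
    """Return the value of `name` from a dotenv-style file, or None if absent."""
    if not env_path.is_file():
        return None
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        if key == name:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("'\"")
    return None


def describe_key(value: Optional[str]) -> str:
    """Summarise the key state without printing the secret itself."""
    if not value:
        return "missing"
    if value == "your_key_here":
        return "placeholder"
    return f"set ({len(value)} characters)"


if __name__ == "__main__":
    env_file = Path.home() / ".hermes" / ".env"
    value = read_env_value(env_file, "ELEVENLABS_API_KEY")
    print("ELEVENLABS_API_KEY:", describe_key(value))
```

The script reports only whether the key is missing, still the placeholder, or set, so it is safe to run in a shared terminal.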

If the ElevenLabs dependency is missing, install the premium TTS extra into the Hermes environment:

```shell
pip install "hermes-agent[tts-premium]"
```

## Easy setup (let Hermes do it)

Hermes is built to use your machine, so the quickest way to turn on ElevenLabs Text to Speech and Speech to Text is to ask Hermes to configure them for you. It has built-in skills for this, and the result is usually reliable:

```plaintext
Set ElevenLabs as the voice mode for both TTS and STT. I have already added the API key to ~/.hermes/.env.
```

The manual steps below do the same thing — they're worth reading because they show how Hermes configuration works under the hood.

## Text to Speech (manual)

Run the setup wizard and pick ElevenLabs at the voice step:

```shell
hermes setup
```

Or edit `~/.hermes/config.yaml` directly:

```yaml
tts:
  provider: "elevenlabs"
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB"  # any voice from your library
    model_id: "eleven_flash_v2_5"     # ~75ms, built for real-time
```

`voice_id` selects the voice: choose one from the voice library or use a clone. `model_id` selects the model: `eleven_flash_v2_5` is a good choice for live conversation (~75 ms), while `eleven_multilingual_v2` is a good general-purpose default. Hermes chooses the audio format from the output path.
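
To preview a `voice_id` and `model_id` pair before committing it to config, you can call the ElevenLabs Text to Speech REST endpoint directly. The following is a minimal sketch using only the standard library, not how Hermes itself makes the call; the function names `build_tts_request` and `synthesize_to_file` are hypothetical, and it assumes the API's default MP3 output.

```python
import json
import os
import urllib.parse
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(api_key, voice_id, model_id, text):
    """Build (but do not send) a POST request to the Text to Speech endpoint."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    voice = urllib.parse.quote(voice_id, safe="")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice}",
        data=body,
        method="POST",
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(request, out_path):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return len(audio)


if __name__ == "__main__":
    req = build_tts_request(
        os.environ.get("ELEVENLABS_API_KEY", ""),
        "pNInz6obpgDQGcFmaJgB",
        "eleven_flash_v2_5",
        "Hello from Hermes.",
    )
    print(req.get_method(), req.full_url)
    # Uncomment to send the request (this uses ElevenLabs credits):
    # synthesize_to_file(req, "preview.mp3")
```

Swapping the `model_id` argument between `eleven_flash_v2_5` and `eleven_multilingual_v2` lets you compare the two models on the same sentence.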

Restart Hermes after changing config. In the gateway, use:

```plaintext
/restart
```

In the CLI, exit and relaunch Hermes. Then enable voice output with:

```plaintext
/voice on
/voice tts
```

## Speech to Text (manual)

ElevenLabs **Scribe** is a built-in Hermes STT provider. You do not need to create a custom transcription script or register a command provider.

Add this to `~/.hermes/config.yaml`:

```yaml
stt:
  enabled: true
  provider: elevenlabs
  elevenlabs:
    model_id: scribe_v2
    language_code: ""        # optional; leave blank for auto-detect
    tag_audio_events: false
    diarize: false
```

That is enough. Hermes writes incoming audio to a temporary file, sends it to the ElevenLabs `/v1/speech-to-text` endpoint, and uses the returned transcript. Voice messages on **Telegram, Discord, WhatsApp, Slack, and Signal** will use Scribe once the gateway has restarted.
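
The flow described above (audio bytes in, multipart upload, transcript out) can be sketched against the ElevenLabs Speech to Text endpoint as follows. This is an illustration under stated assumptions, not Hermes' actual implementation: the helper names are hypothetical, the form fields mirror the config keys shown earlier, and the transcript is read from the `text` field of the JSON response.

```python
import json
import urllib.request
import uuid

STT_URL = "https://api.elevenlabs.io/v1/speech-to-text"


def build_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_stt_request(api_key, audio_name, audio_bytes, model_id="scribe_v2",
                      language_code="", tag_audio_events=False, diarize=False):
    """Build (but do not send) a transcription request."""
    fields = {
        "model_id": model_id,
        "tag_audio_events": str(tag_audio_events).lower(),
        "diarize": str(diarize).lower(),
    }
    if language_code:  # blank means: let the API auto-detect the language
        fields["language_code"] = language_code
    body, content_type = build_multipart(fields, audio_name, audio_bytes)
    return urllib.request.Request(
        STT_URL,
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": content_type},
    )


def transcribe(request):
    """Send the request and return the transcript text."""
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["text"]
```

To try it on a recording, read the file's bytes, pass them to `build_stt_request` along with your API key, and hand the result to `transcribe`. Nothing is sent until `transcribe` is called.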

To force a language, set `language_code` to an ISO 639-1 or ISO 639-3 code, for example `eng` for English:

```yaml
stt:
  enabled: true
  provider: elevenlabs
  elevenlabs:
    model_id: scribe_v2
    language_code: eng
```

For names, product terms, and libraries that Scribe commonly mishears, check the ElevenLabs Speech to Text docs for the latest prompting and model options supported by the API.

## Done

Speak, and Hermes hears you with Scribe, thinks, and answers in your ElevenLabs voice. Change the voice at any time by setting a new `voice_id` in `~/.hermes/config.yaml` and restarting Hermes.
