I run three Hermes Agent gateways on a machine in my house, each wired to a different Telegram bot. They manage my blog, watch my NVR cameras, deploy Cloudflare Workers for a market-data app, and handle the ordinary work of keeping a dozen services running.
Me working on my telegram bots circa 3000 BC
The architecture
Hermes Agent gateway architecture diagram
tl;dr: DeepSeek v4 Flash (API, 1M) + Qwen 3.6 27B (local, 90k). /model to switch. Three isolated profiles as systemd user services, each with its own Telegram bot.
Three Hermes gateway instances run as systemd user services. Each one has its own profile — its own config, sessions, memory, skill set, and Telegram bot token — so they are fully independent and can run different models simultaneously.
My daily driver is DeepSeek v4 Flash served through the API — fast enough for conversation, large context window (1M tokens), and cheap enough that I do not think about cost. When I want to test or iterate on something that does not need an API call, I switch to Qwen 3.6 27B running on a local machine at 192.168.0.78:4000 through an OpenAI-compatible proxy. Switching models is a single command — /model in the chat — and the provider abstraction means I never think about which backend is handling the request.
The three profiles exist because different contexts want different isolation. My main gateway handles general work: coding, writing, research. The second profile handles operational tasks — managing downloads, checking on the Frigate NVR. The third runs more experimental setups. They all share the same host machine, same skill library, same memory persistence, but they never interfere with each other.
The skills pipeline
Hermes Agent skills pipeline as a Greek fresco
tl;dr: ~100 skills accumulated over months. Markdown files with YAML frontmatter. Curator auto-archives stale ones, backs up before pruning. Shared across all profiles.
The most transformative part of Hermes is the skill system. Every time I solve something that took more than a few turns — a tricky debugging session, a deployment workflow, a configuration puzzle — I save the approach as a skill. Skills are just markdown files with structured frontmatter that the agent loads next time it encounters a similar task.
After months of this, I have accumulated over a hundred skills. The curator — a built-in background process — handles the lifecycle: it tracks which skills get used, marks stale ones for archiving, and keeps backups in case something needs to be restored. I do not manage the skill list. The system manages itself.
This is the part that changes how the agent actually works. Fresh out of the box, a new Hermes install is a capable general assistant. After a hundred skills, it is an assistant that knows your specific setup: which port your local model runs on, how your blog is deployed, the naming conventions you prefer, the things you always want it to avoid. The skills accumulate context that no generic prompt engineering can provide.
The practical loop
Hermes Agent practical loop diagram
Hermes Agent practical loop as a Greek fresco
tl;dr: 5-step loop: outcome → constraints → inspect → diff → tighten. Applies to blog posts, server ops, market data apps, multi-agent delegate_task with 3 concurrent subagents, cron-jobs.
The loop I keep coming back to has not changed much from the basics:
- Give it the outcome.
- State the constraints.
- Let it inspect the project.
- Review the diff.
- Tighten the brief and repeat.
What has changed is the scope. That loop now applies to:
Content and the blog. Writing and editing posts like this one, managing metadata, regenerating the RSS feed, deploying to Vercel, checking that the build passes, fixing broken links — all done through Hermes on Telegram. I dictate voice messages, it handles the transcription, reads the existing post, makes edits, and verifies the result. The back-and-forth happens in chat.
System administration. When Frigate crashes (which it does), a watchdog script detects it and Hermes gets notified. I ask it to check logs, identify the failure mode, and propose a fix.
Market data application. The Rust Terminal project — a Cloudflare Workers app backed by D1 — serves 6,000+ SCMM market items with price tracking, trade events, and intel. Hermes built the initial deployment, migrated the database from SQLite to D1, wrote the cache layer, handles the GitHub workflow, and monitors the Cloudflare cron jobs that refresh trade data.
Code maintenance across projects. Small fixes, refactors, dependency updates, failing build triage. Same loop, different repos.
Multi-agent workflows. When a task is too large or has parallel workstreams, I use delegate_task to spawn subagents. They run in isolated contexts with their own terminal sessions, do the work, and return summaries. Three can run simultaneously. For longer-running tasks, I schedule cron jobs — self-contained prompts that run on a timer and deliver results back to Telegram.
Profiles and isolation
Hermes Agent isolated profiles as Greek temples
tl;dr: 6 isolated dimensions per profile: config, sessions, memory, skills, cron, bot token. Filesystem-level under ~/.hermes/profiles/. The experimental bot can't see blog creds.
The profile system is worth calling out because it solves a problem I was not expecting to have: once you have an agent that can touch real infrastructure, you do not necessarily want every request to have access to everything.
Each profile has its own:
- Config — model, provider, tool enablement, timeouts
- Session store — conversations stay in their own profile
- Memory — user profile and persistent notes are per-profile
- Skills — the skill library is shared across profiles, so knowledge accumulates everywhere
- Cron jobs — scheduled tasks are profile-scoped
- Telegram bot token — each gateway answers on its own bot
This means my secondary bot never sees my blog credentials, and the experimental bot cannot accidentally touch production infrastructure. The isolation is enforced at the filesystem level — each profile lives in its own directory under ~/.hermes/profiles/.
Where it fits in my day
tl;dr: Content (voice memo → draft → deploy), server ops (Frigate crash logs), market data (Rust Terminal, 6k+ SCMM items, D1, CF Workers cron).
The use cases that have become routine:
Content work. Turning a voice memo into a drafted post, editing, updating metadata, checking the build, deploying. The feedback loop makes writing lower friction, so I do more of it.
Operations. Checking service status, reading logs, diagnosing failures, summarising what changed overnight. I used to SSH into machines for this. Now I ask in a Telegram chat.
Market tracking. The Rust Terminal runs on Cloudflare Workers with a D1 database. Trade events stream in, prices refresh every minute, and the item database loads progressively. Hermes handles the deployment cycle and can answer questions about the data model or fix API issues.
System maintenance. Watchdogs for Frigate send alerts through Hermes.
Personal workflow. Drafting, note-taking, organising — the same loop applied to the boring middle of any project.
The boundaries
Hermes Agent boundaries and review as a Greek fresco
tl;dr: Diff review before every ship. .env for creds (never in chat). No auto-approve on destructive commands. Profile isolation as containment — one rogue skill can't touch others.
Hermes has enough access to be useful and enough supervision to stay safe. I review diffs before they ship. I check what changed before I accept it. I keep credentials in the .env file, not in conversation. I never let it approve its own destructive commands.
That is not distrust. It is the operating model that makes the rest work. The agent gets real leverage because it can touch real things. I stay in control because I see everything before it matters.
The same profile isolation that keeps my bots separate also means that if one profile's config gets corrupted or a skill goes rogue, the others are unaffected. Profiles are cheap to create and cheap to delete. I treat them the way I treat Docker containers — disposable, isolated, and purpose-built.
- OWASP Agentic AI threats and mitigations
- NIST Generative AI Profile
- Model Context Protocol authorization
Why it works
tl;dr: You end up with an assistant that knows your infra, your writing style, your common failure modes, and your preferences — because every fix you've made before is saved as a skill.
The reason this setup stuck is not that Hermes produces flawless work. It does not. The reason is that the system makes useful work cheaper to start, cheaper to revise, and cheaper to finish.
A blog post is no longer a whole ceremony. A server issue is no longer something I investigate alone. A deployment is no longer a checklist I run manually. A new project does not start from zero context — it starts with a hundred skills worth of accumulated knowledge about how I like things done.
The skills pipeline means the system gets better over time, not worse. Every hard-won fix becomes reusable knowledge. Every corrected mistake becomes a guardrail. The curator handles the cleanup automatically, so the collection does not rot.
That compounding effect is what makes the whole thing worth running. Fresh out of the box, any agent can write code. After months of use, an agent with persistent skills and memory knows your specific infrastructure, your preferences, your common failure modes, and your writing style. It has context that no generic model prompt can provide.
Hermes is one of the first tools I have used where that compounding actually happens.
What's next
Hermes Agent What's Next
tl;dr: Voice-first interaction, real-time data dashboards, on-device models. The pattern is set — now it scales.
The current setup works. The question is where to push next.
Voice as a primary channel. I already dictate voice memos that Hermes transcribes into posts. The next step is making voice the default input for operations — checking server status, triaging issues, asking questions while away from a keyboard. The transcription quality is already there. The habit is forming.
Real-time dashboards. Cron jobs deliver summaries, but what I actually want is a live view — trade events as they happen, server metrics that refresh without being asked. The agent can push data to a web view that stays open on a tablet. The plumbing exists. The front-end is the missing piece.
On-device models getting better. The local Qwen instance at 90k context handles a lot, but it is not fast enough for interactive use yet. As local models improve — better speed, larger context, smaller footprint — the balance shifts. Less API dependency, more privacy, lower latency. That trajectory is accelerating.
None of this changes the core loop. The loop — outcome, constraints, inspect, diff, tighten — stays the same. The scope just keeps widening.




