Structured logging to OpenSearch with session IDs and an MCP server for AI-assisted debugging
The problem: debugging across three runtimes
Picture the scene: a pro taps "Create Estimate" in the app. Nothing happens. The error report says "something went wrong." You have three separate log streams — Flutter (Dart), Spring Boot (Java), Python — with no correlation between them. The failure could be anywhere in the chain.
That was our debugging experience before we built this. HomeGuild runs three distinct runtimes in production. Flutter powers the mobile apps. Vulcan (Spring Boot) handles the GraphQL API layer. Minerva (Python/LangGraph) runs the AI agents. When something breaks across that chain, console logging is useless. Worse, our hosting on DigitalOcean scrapes stdout for its own dashboard, so anything you write to console shows up twice — once in your log aggregator and once in DigitalOcean's UI. Duplicate noise makes finding actual problems harder, not easier.
We needed structured, searchable, correlated logs across all three runtimes. And we needed them to not touch the console at all.
Architecture: direct-to-OpenSearch
The core decision was simple: logs go directly to OpenSearch via HTTP POST. They never hit stdout. This eliminates the duplicate log problem entirely and gives us a single, queryable log store across all three runtimes.
The pipeline:
- Flutter (Dart): StructuredLogger → OpenSearchLogger → HTTP POST
- Spring Boot (Java): StructuredLogger → OpenSearchClient → HTTP POST
- Python (Minerva): Structured JSON logger → HTTP POST
Every runtime follows the same pattern: structured log entries are queued in memory, then flushed in batches — every 10 seconds or every 100 entries, whichever comes first. All logging is fire-and-forget. A failed log shipment never blocks application execution. Logging should have zero performance impact on user experience.
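The queue-and-flush pattern is small enough to sketch. What follows is a minimal Python illustration, not the production implementation: the class name `BatchingLogShipper`, the injected `send` callable, and the helper `to_bulk_body` are all hypothetical, and the use of OpenSearch's `_bulk` NDJSON format is an assumption (the post only says "HTTP POST").

```python
import json
import threading
import time
from typing import Callable, Dict, List

Entry = Dict[str, object]


class BatchingLogShipper:
    """Queue entries in memory; a background worker flushes them in batches."""

    def __init__(self, send: Callable[[List[Entry]], None],
                 max_batch: int = 100, interval_s: float = 10.0) -> None:
        self._send = send              # hypothetical transport, e.g. an HTTP POST
        self._max_batch = max_batch    # flush when this many entries are queued...
        self._interval_s = interval_s  # ...or when this many seconds have passed
        self._queue: List[Entry] = []
        self._lock = threading.Lock()
        self._wake = threading.Event()
        self._stopped = False
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, level: str, message: str, **fields: object) -> None:
        """Append an entry and return immediately; no I/O on the caller's thread."""
        entry = {"timestamp": time.time(), "level": level, "message": message, **fields}
        with self._lock:
            self._queue.append(entry)
            if len(self._queue) >= self._max_batch:
                self._wake.set()  # ask the worker to flush now

    def flush(self) -> None:
        with self._lock:
            batch, self._queue = self._queue, []
        if not batch:
            return
        try:
            self._send(batch)
        except Exception:
            # Fire-and-forget: a failed shipment is dropped, never raised.
            pass

    def _run(self) -> None:
        while not self._stopped:
            # Returns early if log() signalled a full batch, else after the interval.
            self._wake.wait(self._interval_s)
            self._wake.clear()
            self.flush()

    def close(self) -> None:
        """Stop the worker and ship whatever is still queued."""
        self._stopped = True
        self._wake.set()
        self._worker.join()
        self.flush()


def to_bulk_body(index: str, batch: List[Entry]) -> str:
    """Render a batch as NDJSON for OpenSearch's _bulk endpoint (assumed transport)."""
    lines = []
    for entry in batch:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(entry))
    return "\n".join(lines) + "\n"
```

The size-triggered flush is handed to the worker thread rather than run inside `log()`, so a slow or failing OpenSearch endpoint cannot stall the code that is doing the logging.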
Standard Dart logging pattern:
import 'package:logging/logging.dart';

class EstimateService {
  static final Logger _logger = Logger('EstimateService');

  // _api (not shown) is assumed to be this service's API client.
  Future<Estimate> createEstimate(EstimateInput input) async {
    try {
      _logger.info('Creating estimate for customer ${input.customerId}');
      final estimate = await _api.createEstimate(input);
      _logger.info('Estimate created: ${estimate.id}');
      return estimate;
    } catch (e, stackTrace) {
      // severe() takes the message, the error object, and the stack trace.
      _logger.severe('Estimate creation failed: $e', e, stackTrace);
      rethrow;
    }
  }
}
That single _logger.severe() call ships a structured log entry to OpenSearch and creates a Sentry error event with the full stack trace. One line, two destinations.
Log level routing:
| Level | Destination | Use case |
|---|---|---|
| fine() | Local only | Debug detail: API URLs, parameter values |
| info() | OpenSearch only | Operational events |
| warning() | OpenSearch + Sentry breadcrumb | Issues that block functionality |
| severe() | OpenSearch + Sentry event | Exceptions with stack traces |
Only severe creates a Sentry event. Warnings attach as breadcrumbs — context without per-event cost. Moving from warning-level Sentry events to severe-only cut our bill significantly while improving signal quality.
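The routing table can be expressed as a lookup plus a dispatcher. This is a Python sketch of the idea only: the `Sink` enum, `sinks_for`, and `dispatch` are hypothetical names, and sending unknown levels to the local sink is an assumption the post does not state.

```python
from enum import Enum
from typing import Callable, Dict, FrozenSet


class Sink(Enum):
    LOCAL = "local"
    OPENSEARCH = "opensearch"
    SENTRY_BREADCRUMB = "sentry_breadcrumb"
    SENTRY_EVENT = "sentry_event"


# One row per level, mirroring the routing table.
ROUTES: Dict[str, FrozenSet[Sink]] = {
    "fine": frozenset({Sink.LOCAL}),
    "info": frozenset({Sink.OPENSEARCH}),
    "warning": frozenset({Sink.OPENSEARCH, Sink.SENTRY_BREADCRUMB}),
    "severe": frozenset({Sink.OPENSEARCH, Sink.SENTRY_EVENT}),
}


def sinks_for(level: str) -> FrozenSet[Sink]:
    """Return the destinations for a level (assumption: unknown levels stay local)."""
    return ROUTES.get(level.lower(), frozenset({Sink.LOCAL}))


def dispatch(level: str, entry: dict,
             handlers: Dict[Sink, Callable[[dict], None]]) -> None:
    """Hand the entry to every registered handler the level routes to."""
    for sink in sinks_for(level):
        handler = handlers.get(sink)
        if handler is not None:
            handler(entry)
```

Keeping the routing in one table means the cost decision (only severe becomes a Sentry event) lives in a single place instead of being scattered across call sites.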
The session ID: the single most valuable field
Every log line across every runtime is tagged with a sessionId — a UUID generated at app launch in Flutter or at request start in the backend services.
One field, one query:
sessionId: "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
Returns every log entry from that user's session, in chronological order, across Flutter, Vulcan, and Minerva. Button tap, API call, database query, agent invocation, response — all in one view.
The session ID flows through the system via standard mechanisms:
- Flutter generates the UUID at app launch, attaches it to every log entry
- HTTP requests carry it in a custom header (X-Session-Id)
- Vulcan extracts it from the header, uses it for its own logs, forwards it downstream
- Minerva receives it via gRPC metadata, tags its agent execution logs
Implementation cost: maybe an hour across all three runtimes. If you take one thing from this post, add a session ID to your log entries.
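For a backend service, the propagation steps above come down to three small pieces: adopt the incoming ID, stamp it on every log record, and forward it on outgoing calls. The Python sketch below uses only the standard library. The function names `begin_request` and `outgoing_headers` and the class `SessionIdFilter` are hypothetical, and a real service would wire them into its web framework's middleware.

```python
import contextvars
import logging
import uuid
from typing import Dict, Mapping

SESSION_HEADER = "X-Session-Id"

# Holds the session ID for the current request, per thread or async task.
_session_id = contextvars.ContextVar("session_id", default="")


def begin_request(headers: Mapping[str, str]) -> str:
    """Adopt the caller's session ID, or mint a UUID at request start if absent."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    session_id = lowered.get(SESSION_HEADER.lower()) or str(uuid.uuid4())
    _session_id.set(session_id)
    return session_id


def outgoing_headers() -> Dict[str, str]:
    """Headers to attach to downstream calls so the ID keeps flowing."""
    return {SESSION_HEADER: _session_id.get()}


class SessionIdFilter(logging.Filter):
    """Stamp every log record with the current session ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.sessionId = _session_id.get()
        return True  # never drop the record, only annotate it
```

Attaching the filter to a handler with `handler.addFilter(SessionIdFilter())` makes `sessionId` available to whatever formatter or shipper serializes the record. For the gRPC hop, the same value would travel as a metadata entry. gRPC metadata keys must be lowercase, so a key such as `x-session-id` is a plausible choice, though the exact key name here is an assumption.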
The MCP server: querying logs from Claude
We built an MCP server that exposes OpenSearch queries as tools. From inside a Claude conversation — via Claude Code or any MCP-compatible client — we can ask Claude to search production logs directly.
Us: "Show me the logs for the session where estimate creation failed yesterday around 3pm"
Claude: queries OpenSearch → "Session a1b2c3d4... had an estimate creation failure at 3:12pm. Null pointer in LineItemCalculator.computeTotal(): taxRate was null because the tax item lookup returned an empty array. The pro's account has no tax items configured."
The MCP server exposes three tools:
- search_logs: full-text and structured queries with time range filtering
- get_session: all log entries for a session ID, ordered chronologically
- get_errors: recent severe and warning entries, optionally filtered by service
What makes this genuinely valuable is cross-context validation. Claude has access to both the codebase (via file reading) and runtime behavior (via the log MCP). It can correlate what the code says it should do with what it actually did — catching an entire class of bugs invisible when you look at only one or the other.
Before this, debugging meant: open OpenSearch, write a query, scan results, form a hypothesis, repeat. Now we describe the problem and Claude runs the loop with access to both logs and codebase simultaneously.
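Each of the three tools is, underneath, a thin wrapper that turns arguments into an OpenSearch query body. The Python sketch below shows what those bodies might look like. It leaves out the MCP wiring and the HTTP call, and the field names (`sessionId`, `level`, `service`, `timestamp`, `message`) and their mappings are assumptions about the index, not the actual schema.

```python
from typing import Any, Dict, List, Optional

Query = Dict[str, Any]


def get_session_query(session_id: str, size: int = 500) -> Query:
    """All entries for one session, oldest first."""
    return {
        "size": size,
        # term is an exact match; assumes sessionId is mapped as a keyword field.
        "query": {"term": {"sessionId": session_id}},
        "sort": [{"timestamp": {"order": "asc"}}],
    }


def get_errors_query(service: Optional[str] = None, size: int = 50) -> Query:
    """Recent severe and warning entries, newest first, optionally for one service."""
    filters: List[Query] = [{"terms": {"level": ["severe", "warning"]}}]
    if service is not None:
        filters.append({"term": {"service": service}})
    return {
        "size": size,
        "query": {"bool": {"filter": filters}},
        "sort": [{"timestamp": {"order": "desc"}}],
    }


def search_logs_query(text: str, start: str, end: str, size: int = 100) -> Query:
    """Full-text search on the message, restricted to a time range."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"simple_query_string": {"query": text, "fields": ["message"]}}],
                # start and end are ISO-8601 timestamps or date math such as "now-1d".
                "filter": [{"range": {"timestamp": {"gte": start, "lte": end}}}],
            }
        },
        "sort": [{"timestamp": {"order": "desc"}}],
    }
```

Keeping the query construction in plain functions like these makes the tools easy to unit test without a running cluster, and keeps the surface exposed to the model narrow: it supplies a session ID, a service name, or a search string, never raw query DSL.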
What we learned
Structured logging compounds. Every new feature gets correlated, searchable logging automatically. Zero additional work per service.
Session IDs are cheap to add and disproportionately valuable later. About an hour of implementation buys a single query that returns a user's whole session across all three runtimes.
Direct-to-OpenSearch eliminates noise. No duplicate logs, no parsing inconsistencies. The logs in OpenSearch are exactly what we shipped.
The MCP server turned debugging into conversation. The compounding value is cross-context validation — code and runtime behavior in the same reasoning loop.
The full infrastructure, from the Dart StructuredLogger to the OpenSearch MCP server, is maybe 2,000 lines across all three runtimes. It is one of the highest-leverage investments we have made in the platform.