Add GenAI semantic-convention attributes to OpenTelemetryMiddleware#2970
Conversation
Set gen_ai.operation.name (execute_tool), gen_ai.tool.name, and
gen_ai.prompt.name per the MCP OTel semantic conventions, plus error.type
and rpc.response.status_code on failures (tool_error for a tools/call
result carrying is_error). Span name now follows {mcp.method.name} {target}.
There was a problem hiding this comment.
LGTM — telemetry-only additions following the MCP OTel semconv spec, well covered by new tests; the inline docstring nit is worth a quick follow-up but doesn't block.
Extended reasoning...
Overview
The PR is confined to src/mcp/server/_otel.py and its test file. It adds GenAI semantic-convention attributes (gen_ai.operation.name, gen_ai.tool.name, gen_ai.prompt.name, mcp.protocol.version, error.type, rpc.response.status_code) to the existing OpenTelemetryMiddleware and renames spans to the spec's {method} {target} format. No protocol, dispatch, or transport behavior changes — all exceptions are still re-raised and the handler result is returned unchanged; the middleware only annotates spans.
Security risks
None of substance. The PR explicitly excludes the sensitive opt-in attributes (tool arguments/results, prompt variables), so no client payload data is added to spans beyond the tool/prompt name, which is already in the span name. The ValidationError path keeps the sanitized status message. ctx.protocol_version is a required field on ServerRequestContext, so the new attribute is always a string.
Level of scrutiny
Moderate-to-low. This is observability-only code in an opt-in middleware that was added very recently (so the span-name format change is unlikely to break existing dashboards), and it tracks an external semconv spec. The structural match against CallToolResult / isError / is_error dict shapes is straightforward and tested for all three shapes.
Other factors
Test coverage is thorough: new tests for tool-error detection (model and both dict spellings), prompt-name attribute, gen_ai attribute exclusion on custom methods, and error.type assertions on all three failure paths. The single finding from the bug-hunting pass is a docstring overstatement (it claims rpc.response.status_code is set on all failures when only the MCPError branch sets it) — a documentation nit, not a behavioral defect, already flagged inline.
Align the ValidationError branch with the MCPError branch and the C# SDK: set error.type and rpc.response.status_code to the numeric JSON-RPC code (INVALID_PARAMS) as a string, not the exception class name.
There was a problem hiding this comment.
1 issue found across 2 files
Reply with feedback, questions, or to request a fix.
Fix all with cubic | Re-trigger cubic
Gate gen_ai.operation.name on the method, not the tool-name target, so a malformed tools/call still records execute_tool (cubic review). Tighten the docstring to describe the per-branch error.type/rpc.response.status_code behavior accurately (claude review).
There was a problem hiding this comment.
Additional findings (outside current diff — PR may have been updated during review):
-
🔴
src/mcp/server/_otel.py:72-77— The new{"is_error": True}match arm marks the span as a tool error (error.type="tool_error",StatusCode.ERROR), but the runner's result coercion (serialize_server_resultwithby_name=False,extra="ignore") silently drops the snake_caseis_errorkey, so the client actually receives a successful result with noisErrorfield — the telemetry contradicts the wire response. Consider dropping the snake_case arm (andtest_tool_error_snake_case_dict_result_sets_error_type), or fixing the runner coercion to acceptby_nameso the snake_case shape really does reach the client as an error.Extended reasoning...
What the bug is
The post-result match in
OpenTelemetryMiddlewaretreats three shapes as a tool error:CallToolResult(is_error=True),{"isError": True}, and{"is_error": True}(src/mcp/server/_otel.py:74). The PR description justifies the dict arms with "result coercion happens after the middleware runs", i.e. the assumption that the runner will later coerce these dict shapes into anisError: trueresult on the wire. That assumption holds for the model and the camelCase dict, but not for the snake_case shape — its error flag never reaches the client, so the span reports a tool failure for a request whose wire response is a success.The code path
After the middleware chain returns,
ServerRunner._on_requestpasses the raw dict through_dump_result(returns dicts unchanged, src/mcp/server/runner.py:148-158) and thenserialize_server_result(method, version, result)(src/mcp/server/runner.py:295). That function validates against the per-version wire model withadapter.validate_python(data, by_name=False)and dumps withby_alias=True, exclude_none=True(src/mcp/types/methods.py:615-617). The wireCallToolResultdeclaresis_erroronly under the aliasisErrorwithextra="ignore"(src/mcp/types/v2025_11_25/init.py:3144-3155). AlthoughWireModelsetspopulate_by_name=True, the explicitby_name=Falseat the call site overrides it, so the snake_case keyis_erroris treated as an extra and silently discarded.Step-by-step proof (reproduced under the repo's pydantic 2.12.5):
- A
tools/callhandler returns{"content": [], "is_error": True}. - The middleware's match arm
{"is_error": True}fires → span getserror.type="tool_error"andStatusCode.ERROR. - The runner calls
serialize_server_result("tools/call", "2025-11-25", {"content": [], "is_error": True}). validate_python(..., by_name=False)does not recognizeis_erroras the field (alias-only matching), andextra="ignore"drops it →is_error=Noneon the model.dump_python(by_alias=True, exclude_none=True)produces{"content": []}— noisErrorkey, which the spec defines as success.- The client sees a successful tool call; the exported span reports
tool_error/ERROR.
Why existing tests don't catch it
The PR's own
test_tool_error_snake_case_dict_result_sets_error_typeonly asserts the span attributes and never inspects the response body, so it passes while encoding the wrong assumption. By contrast, the camelCase test's sibling (test_emits_server_span_with_method_and_target) does assert the wire body for the success case.Impact
Operators see false-positive tool failures (error spans, error rates, alerts) for handlers that return snake_case dicts — failures that never happened on the wire. (Viewed the other way, the deeper pre-existing issue is that the runner silently drops a handler's snake_case error flag; either way, the new arm bakes in an assumption about coercion that does not hold.)
How to fix
Either drop the
{"is_error": True}arm and its test — only theCallToolResultmodel and the camelCaseisErrordict survive coercion as wire errors — or change the runner's result coercion to validate withby_name=Trueso the snake_case shape genuinely reaches the client asisError: true(in which case the arm and the telemetry would be correct). - A
|
This pull request is included in pre-release v2.0.0a3 |
Adds the GenAI semantic-convention attributes from the MCP OTel spans spec (
mcp.common.attributes) toOpenTelemetryMiddleware:gen_ai.operation.name=execute_toolfortools/call(per the spec, set only for tool calls).gen_ai.tool.namefortools/call,gen_ai.prompt.nameforprompts/get.error.typeandrpc.response.status_codeset to the JSON-RPC error code on failures;error.type=tool_errorfor atools/callresult carryingis_error(spec:CallToolResultwithisErrortrue).mcp.protocol.version.The span name now follows the spec's
{mcp.method.name} {target}format (dropping the priorMCP handleprefix).The
tools/callresult is matched against both theCallToolResultmodel and the dict shapes (isError/is_error) handlers may return, since result coercion happens after the middleware runs.The sensitive
opt_inattributes (gen_ai.tool.call.arguments/result,gen_ai.prompt.variable.*) are intentionally left out; they can be added later behind an opt-in flag.AI Disclaimer
This PR was developed with the assistance of either Claude or Codex. I've reviewed and verified the changes.