Add GenAI semantic-convention attributes to OpenTelemetryMiddleware by Kludex · Pull Request #2970 · modelcontextprotocol/python-sdk

Kludex · 2026-06-25T13:47:54Z

Adds the GenAI semantic-convention attributes from the MCP OTel spans spec (mcp.common.attributes) to OpenTelemetryMiddleware:

gen_ai.operation.name = execute_tool for tools/call (per the spec, set only for tool calls).
gen_ai.tool.name for tools/call, gen_ai.prompt.name for prompts/get.
error.type and rpc.response.status_code set to the JSON-RPC error code on failures; error.type = tool_error for a tools/call result carrying is_error (spec: CallToolResult with isError true).
mcp.protocol.version.

The span name now follows the spec's {mcp.method.name} {target} format (dropping the prior MCP handle prefix).

The tools/call result is matched against both the CallToolResult model and the dict shapes (isError/is_error) handlers may return, since result coercion happens after the middleware runs.

The sensitive opt_in attributes (gen_ai.tool.call.arguments/result, gen_ai.prompt.variable.*) are intentionally left out; they can be added later behind an opt-in flag.

AI Disclaimer

This PR was developed with the assistance of either Claude or Codex. I've reviewed and verified the changes.

Set gen_ai.operation.name (execute_tool), gen_ai.tool.name, and gen_ai.prompt.name per the MCP OTel semantic conventions, plus error.type and rpc.response.status_code on failures (tool_error for a tools/call result carrying is_error). Span name now follows {mcp.method.name} {target}.

claude

LGTM — telemetry-only additions following the MCP OTel semconv spec, well covered by new tests; the inline docstring nit is worth a quick follow-up but doesn't block.

Extended reasoning...

Overview

The PR is confined to src/mcp/server/_otel.py and its test file. It adds GenAI semantic-convention attributes (gen_ai.operation.name, gen_ai.tool.name, gen_ai.prompt.name, mcp.protocol.version, error.type, rpc.response.status_code) to the existing OpenTelemetryMiddleware and renames spans to the spec's {method} {target} format. No protocol, dispatch, or transport behavior changes — all exceptions are still re-raised and the handler result is returned unchanged; the middleware only annotates spans.

Security risks

None of substance. The PR explicitly excludes the sensitive opt-in attributes (tool arguments/results, prompt variables), so no client payload data is added to spans beyond the tool/prompt name, which is already in the span name. The ValidationError path keeps the sanitized status message. ctx.protocol_version is a required field on ServerRequestContext, so the new attribute is always a string.

Level of scrutiny

Moderate-to-low. This is observability-only code in an opt-in middleware that was added very recently (so the span-name format change is unlikely to break existing dashboards), and it tracks an external semconv spec. The structural match against CallToolResult / isError / is_error dict shapes is straightforward and tested for all three shapes.

Other factors

Test coverage is thorough: new tests for tool-error detection (model and both dict spellings), prompt-name attribute, gen_ai attribute exclusion on custom methods, and error.type assertions on all three failure paths. The single finding from the bug-hunting pass is a docstring overstatement (it claims rpc.response.status_code is set on all failures when only the MCPError branch sets it) — a documentation nit, not a behavioral defect, already flagged inline.

Align the ValidationError branch with the MCPError branch and the C# SDK: set error.type and rpc.response.status_code to the numeric JSON-RPC code (INVALID_PARAMS) as a string, not the exception class name.

cubic-dev-ai

1 issue found across 2 files

_{Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic}

Gate gen_ai.operation.name on the method, not the tool-name target, so a malformed tools/call still records execute_tool (cubic review). Tighten the docstring to describe the per-branch error.type/rpc.response.status_code behavior accurately (claude review).

claude

Additional findings (outside current diff — PR may have been updated during review):

🔴 src/mcp/server/_otel.py:72-77 — The new {"is_error": True} match arm marks the span as a tool error (error.type="tool_error", StatusCode.ERROR), but the runner's result coercion (serialize_server_result with by_name=False, extra="ignore") silently drops the snake_case is_error key, so the client actually receives a successful result with no isError field — the telemetry contradicts the wire response. Consider dropping the snake_case arm (and test_tool_error_snake_case_dict_result_sets_error_type), or fixing the runner coercion to accept by_name so the snake_case shape really does reach the client as an error.
Extended reasoning...

What the bug is

The post-result match in OpenTelemetryMiddleware treats three shapes as a tool error: CallToolResult(is_error=True), {"isError": True}, and {"is_error": True} (src/mcp/server/_otel.py:74). The PR description justifies the dict arms with "result coercion happens after the middleware runs", i.e. the assumption that the runner will later coerce these dict shapes into an isError: true result on the wire. That assumption holds for the model and the camelCase dict, but not for the snake_case shape — its error flag never reaches the client, so the span reports a tool failure for a request whose wire response is a success.

The code path

After the middleware chain returns, ServerRunner._on_request passes the raw dict through _dump_result (returns dicts unchanged, src/mcp/server/runner.py:148-158) and then serialize_server_result(method, version, result) (src/mcp/server/runner.py:295). That function validates against the per-version wire model with adapter.validate_python(data, by_name=False) and dumps with by_alias=True, exclude_none=True (src/mcp/types/methods.py:615-617). The wire CallToolResult declares is_error only under the alias isError with extra="ignore" (src/mcp/types/v2025_11_25/init.py:3144-3155). Although WireModel sets populate_by_name=True, the explicit by_name=False at the call site overrides it, so the snake_case key is_error is treated as an extra and silently discarded.

Step-by-step proof (reproduced under the repo's pydantic 2.12.5):
1. A tools/call handler returns {"content": [], "is_error": True}.
2. The middleware's match arm {"is_error": True} fires → span gets error.type="tool_error" and StatusCode.ERROR.
3. The runner calls serialize_server_result("tools/call", "2025-11-25", {"content": [], "is_error": True}).
4. validate_python(..., by_name=False) does not recognize is_error as the field (alias-only matching), and extra="ignore" drops it → is_error=None on the model.
5. dump_python(by_alias=True, exclude_none=True) produces {"content": []} — no isError key, which the spec defines as success.
6. The client sees a successful tool call; the exported span reports tool_error / ERROR.
Why existing tests don't catch it

The PR's own test_tool_error_snake_case_dict_result_sets_error_type only asserts the span attributes and never inspects the response body, so it passes while encoding the wrong assumption. By contrast, the camelCase test's sibling (test_emits_server_span_with_method_and_target) does assert the wire body for the success case.

Impact

Operators see false-positive tool failures (error spans, error rates, alerts) for handlers that return snake_case dicts — failures that never happened on the wire. (Viewed the other way, the deeper pre-existing issue is that the runner silently drops a handler's snake_case error flag; either way, the new arm bakes in an assumption about coercion that does not hold.)

How to fix

Either drop the {"is_error": True} arm and its test — only the CallToolResult model and the camelCase isError dict survive coercion as wire errors — or change the runner's result coercion to validate with by_name=True so the snake_case shape genuinely reaches the client as isError: true (in which case the arm and the telemetry would be correct).

github-actions · 2026-06-26T08:43:12Z

This pull request is included in pre-release v2.0.0a3

claude Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread src/mcp/server/_otel.py

Emit JSON-RPC error code for error.type on validation failures

2c05e4b

Align the ValidationError branch with the MCPError branch and the C# SDK: set error.type and rpc.response.status_code to the numeric JSON-RPC code (INVALID_PARAMS) as a string, not the exception class name.

cubic-dev-ai Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread src/mcp/server/_otel.py Outdated

Kludex added 2 commits June 25, 2026 16:38

remove docstring

513b9bc

maxisbey approved these changes Jun 25, 2026

View reviewed changes

maxisbey enabled auto-merge (squash) June 25, 2026 14:41

maxisbey merged commit 1b1abf6 into main Jun 25, 2026
32 checks passed

maxisbey deleted the otel-genai-attributes branch June 25, 2026 14:43

claude Bot reviewed Jun 25, 2026

View reviewed changes

Kludex mentioned this pull request Jun 25, 2026

Stop flagging snake_case is_error results as tool errors in OTel span #2971

Merged

maxisbey mentioned this pull request Jun 26, 2026

Implement SEP-414: Document OpenTelemetry Trace Context Propagation Conventions #2790

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add GenAI semantic-convention attributes to OpenTelemetryMiddleware#2970

Add GenAI semantic-convention attributes to OpenTelemetryMiddleware#2970
maxisbey merged 4 commits into
mainfrom
otel-genai-attributes

Kludex commented Jun 25, 2026

Uh oh!

claude Bot left a comment

Uh oh!

Uh oh!

cubic-dev-ai Bot left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

claude Bot left a comment

Uh oh!

github-actions Bot commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Kludex commented Jun 25, 2026

AI Disclaimer

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

Uh oh!

cubic-dev-ai Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cubic-dev-ai Bot left a comment •

edited

Loading