Skip to content

Add GenAI semantic-convention attributes to OpenTelemetryMiddleware#2970

Merged
maxisbey merged 4 commits into
mainfrom
otel-genai-attributes
Jun 25, 2026
Merged

Add GenAI semantic-convention attributes to OpenTelemetryMiddleware#2970
maxisbey merged 4 commits into
mainfrom
otel-genai-attributes

Conversation

@Kludex

@Kludex Kludex commented Jun 25, 2026

Copy link
Copy Markdown
Member

Adds the GenAI semantic-convention attributes from the MCP OTel spans spec (mcp.common.attributes) to OpenTelemetryMiddleware:

  • gen_ai.operation.name = execute_tool for tools/call (per the spec, set only for tool calls).
  • gen_ai.tool.name for tools/call, gen_ai.prompt.name for prompts/get.
  • error.type and rpc.response.status_code set to the JSON-RPC error code on failures; error.type = tool_error for a tools/call result carrying is_error (spec: CallToolResult with isError true).
  • mcp.protocol.version.

The span name now follows the spec's {mcp.method.name} {target} format (dropping the prior MCP handle prefix).

The tools/call result is matched against both the CallToolResult model and the dict shapes (isError/is_error) handlers may return, since result coercion happens after the middleware runs.

The sensitive opt_in attributes (gen_ai.tool.call.arguments/result, gen_ai.prompt.variable.*) are intentionally left out; they can be added later behind an opt-in flag.

AI Disclaimer

This PR was developed with the assistance of either Claude or Codex. I've reviewed and verified the changes.

Set gen_ai.operation.name (execute_tool), gen_ai.tool.name, and
gen_ai.prompt.name per the MCP OTel semantic conventions, plus error.type
and rpc.response.status_code on failures (tool_error for a tools/call
result carrying is_error). Span name now follows {mcp.method.name} {target}.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — telemetry-only additions following the MCP OTel semconv spec, well covered by new tests; the inline docstring nit is worth a quick follow-up but doesn't block.

Extended reasoning...

Overview

The PR is confined to src/mcp/server/_otel.py and its test file. It adds GenAI semantic-convention attributes (gen_ai.operation.name, gen_ai.tool.name, gen_ai.prompt.name, mcp.protocol.version, error.type, rpc.response.status_code) to the existing OpenTelemetryMiddleware and renames spans to the spec's {method} {target} format. No protocol, dispatch, or transport behavior changes — all exceptions are still re-raised and the handler result is returned unchanged; the middleware only annotates spans.

Security risks

None of substance. The PR explicitly excludes the sensitive opt-in attributes (tool arguments/results, prompt variables), so no client payload data is added to spans beyond the tool/prompt name, which is already in the span name. The ValidationError path keeps the sanitized status message. ctx.protocol_version is a required field on ServerRequestContext, so the new attribute is always a string.

Level of scrutiny

Moderate-to-low. This is observability-only code in an opt-in middleware that was added very recently (so the span-name format change is unlikely to break existing dashboards), and it tracks an external semconv spec. The structural match against CallToolResult / isError / is_error dict shapes is straightforward and tested for all three shapes.

Other factors

Test coverage is thorough: new tests for tool-error detection (model and both dict spellings), prompt-name attribute, gen_ai attribute exclusion on custom methods, and error.type assertions on all three failure paths. The single finding from the bug-hunting pass is a docstring overstatement (it claims rpc.response.status_code is set on all failures when only the MCPError branch sets it) — a documentation nit, not a behavioral defect, already flagged inline.

Comment thread src/mcp/server/_otel.py
Align the ValidationError branch with the MCPError branch and the C# SDK:
set error.type and rpc.response.status_code to the numeric JSON-RPC code
(INVALID_PARAMS) as a string, not the exception class name.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files

Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic

Comment thread src/mcp/server/_otel.py Outdated
Kludex added 2 commits June 25, 2026 16:38
Gate gen_ai.operation.name on the method, not the tool-name target, so a
malformed tools/call still records execute_tool (cubic review). Tighten the
docstring to describe the per-branch error.type/rpc.response.status_code
behavior accurately (claude review).
@maxisbey maxisbey enabled auto-merge (squash) June 25, 2026 14:41
@maxisbey maxisbey merged commit 1b1abf6 into main Jun 25, 2026
32 checks passed
@maxisbey maxisbey deleted the otel-genai-attributes branch June 25, 2026 14:43

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🔴 src/mcp/server/_otel.py:72-77 — The new {"is_error": True} match arm marks the span as a tool error (error.type="tool_error", StatusCode.ERROR), but the runner's result coercion (serialize_server_result with by_name=False, extra="ignore") silently drops the snake_case is_error key, so the client actually receives a successful result with no isError field — the telemetry contradicts the wire response. Consider dropping the snake_case arm (and test_tool_error_snake_case_dict_result_sets_error_type), or fixing the runner coercion to accept by_name so the snake_case shape really does reach the client as an error.

    Extended reasoning...

    What the bug is

    The post-result match in OpenTelemetryMiddleware treats three shapes as a tool error: CallToolResult(is_error=True), {"isError": True}, and {"is_error": True} (src/mcp/server/_otel.py:74). The PR description justifies the dict arms with "result coercion happens after the middleware runs", i.e. the assumption that the runner will later coerce these dict shapes into an isError: true result on the wire. That assumption holds for the model and the camelCase dict, but not for the snake_case shape — its error flag never reaches the client, so the span reports a tool failure for a request whose wire response is a success.

    The code path

    After the middleware chain returns, ServerRunner._on_request passes the raw dict through _dump_result (returns dicts unchanged, src/mcp/server/runner.py:148-158) and then serialize_server_result(method, version, result) (src/mcp/server/runner.py:295). That function validates against the per-version wire model with adapter.validate_python(data, by_name=False) and dumps with by_alias=True, exclude_none=True (src/mcp/types/methods.py:615-617). The wire CallToolResult declares is_error only under the alias isError with extra="ignore" (src/mcp/types/v2025_11_25/init.py:3144-3155). Although WireModel sets populate_by_name=True, the explicit by_name=False at the call site overrides it, so the snake_case key is_error is treated as an extra and silently discarded.

    Step-by-step proof (reproduced under the repo's pydantic 2.12.5):

    1. A tools/call handler returns {"content": [], "is_error": True}.
    2. The middleware's match arm {"is_error": True} fires → span gets error.type="tool_error" and StatusCode.ERROR.
    3. The runner calls serialize_server_result("tools/call", "2025-11-25", {"content": [], "is_error": True}).
    4. validate_python(..., by_name=False) does not recognize is_error as the field (alias-only matching), and extra="ignore" drops it → is_error=None on the model.
    5. dump_python(by_alias=True, exclude_none=True) produces {"content": []} — no isError key, which the spec defines as success.
    6. The client sees a successful tool call; the exported span reports tool_error / ERROR.

    Why existing tests don't catch it

    The PR's own test_tool_error_snake_case_dict_result_sets_error_type only asserts the span attributes and never inspects the response body, so it passes while encoding the wrong assumption. By contrast, the camelCase test's sibling (test_emits_server_span_with_method_and_target) does assert the wire body for the success case.

    Impact

    Operators see false-positive tool failures (error spans, error rates, alerts) for handlers that return snake_case dicts — failures that never happened on the wire. (Viewed the other way, the deeper pre-existing issue is that the runner silently drops a handler's snake_case error flag; either way, the new arm bakes in an assumption about coercion that does not hold.)

    How to fix

    Either drop the {"is_error": True} arm and its test — only the CallToolResult model and the camelCase isError dict survive coercion as wire errors — or change the runner's result coercion to validate with by_name=True so the snake_case shape genuinely reaches the client as isError: true (in which case the arm and the telemetry would be correct).

@github-actions

Copy link
Copy Markdown
Contributor

This pull request is included in pre-release v2.0.0a3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants