Skip to main content

SDKs Overview

OpenSearch AgentHealth provides official SDKs for instrumenting your AI agents with observability and evaluation capabilities.

Full-featured SDK for Python applications. Supports all major agent frameworks.

Python SDK Documentation →

Native support for Node.js and browser environments.

JavaScript SDK Documentation →

pip install opensearch-agentops
from opensearch_agentops import AgentHealth

# Initialize the client once at startup; all traced calls export spans
# to the configured OTEL Collector endpoint.
agentops = AgentHealth(
    endpoint="http://localhost:4317",  # OTEL Collector
    service_name="my-agent"
)

# Instrument your agent: the decorator wraps each call in a span.
@agentops.trace
def run_agent(prompt: str):
    # Your agent logic here
    response = my_llm.generate(prompt)
    return response

# Automatic tracing captures:
# - LLM calls with token usage
# - Tool invocations
# - Execution timing
Terminal window
npm install @opensearch-project/agentops
import { AgentHealth } from '@opensearch-project/agentops';

// Initialize the client; spans are exported to the OTEL Collector endpoint.
const agentops = new AgentHealth({
  endpoint: 'http://localhost:4317',
  serviceName: 'my-agent'
});

// Instrument your agent: trace() opens a span named 'agent_run'
// around the async callback and passes the span in for enrichment.
const result = await agentops.trace('agent_run', async (span) => {
  // Your agent logic here
  const response = await myLLM.generate(prompt);
  // Add custom attributes (OTEL GenAI semantic convention keys)
  span.setAttribute('gen_ai.request.model', 'claude-sonnet-4');
  return response;
});

The SDKs automatically capture:

| Data | Description | OTEL Attribute |
|------|-------------|----------------|
| Model Info | LLM provider and model ID | `gen_ai.system`, `gen_ai.request.model` |
| Token Usage | Input/output token counts | `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens` |
| Tool Calls | Function invocations | `gen_ai.tool.name` |
| Latency | Execution timing | Span duration |
| Errors | Exceptions and failures | `exception.*` |

SDKs support multiple agent frameworks through adapters:

# Framework adapters wrap a specific agent runtime behind a common
# interface so the SDK can drive and trace it uniformly.
from opensearch_agentops.adapters import (
    LangGraphAdapter,
    StrandsAdapter,
    ClaudeCodeAdapter,
    CustomAdapter
)

# LangGraph integration
adapter = LangGraphAdapter(
    endpoint="http://localhost:3000",  # presumably the LangGraph server URL — confirm
    streaming=True
)

# Strands integration (AWS Bedrock)
adapter = StrandsAdapter(
    agent_id="my-strands-agent",
    region="us-west-2"  # AWS region the Strands agent runs in
)

# Custom agent: wrap any callable
adapter = CustomAdapter(
    execute_fn=my_agent_function
)

Compare different agent configurations:

# Define agent configurations: each dict pairs a framework with a model,
# and every config is run against the same test case.
configs = [
    {"agent": "langgraph", "model": "claude-sonnet-4"},
    {"agent": "langgraph", "model": "gpt-4o"},
    {"agent": "strands", "model": "claude-sonnet-4"},
]

# Run comparison
comparison = agentops.compare_agents(
    test_case_id="tc-database-timeout",
    configs=configs
)

# Analyze results
print(f"Best performing: {comparison.best_config}")
# Plain string: no placeholders, so no f-prefix needed (ruff F541)
print("Accuracy comparison:")
for result in comparison.results:
    print(f" {result.config}: {result.accuracy}/100")

Run evaluations programmatically:

# Create a test case: the prompt and context the agent sees, plus the
# expected outcomes the LLM judge scores the response against.
test_case = agentops.create_test_case(
    name="Database Timeout Diagnosis",
    initial_prompt="Why is my API returning 503 errors?",
    context=[
        {"type": "log", "content": "Connection timeout errors..."},
        {"type": "metric", "content": {"pool_size": [45, 48, 50, 50]}}
    ],
    expected_outcomes=[
        "Identify connection pool exhaustion",
        "Recommend increasing pool size"
    ]
)

# Create a benchmark grouping one or more test cases
benchmark = agentops.create_benchmark(
    name="RCA Suite",
    test_case_ids=[test_case.id]
)

# Run evaluation with a specific agent/model pairing
run = agentops.run_benchmark(
    benchmark_id=benchmark.id,
    agent="langgraph",
    model="claude-sonnet-4"
)

# Get aggregate results
print(f"Pass Rate: {run.metrics.pass_rate}%")
print(f"Avg Accuracy: {run.metrics.avg_accuracy}")

# Access individual per-test-case results
for result in run.results:
    print(f"{result.test_case_name}: {result.pass_fail_status}")
    print(f" Accuracy: {result.accuracy}")
    print(f" Reasoning: {result.llm_judge_reasoning}")
# --- LangGraph --------------------------------------------------------
from opensearch_agentops.integrations import LangGraphInstrumentation

# Auto-instrument LangGraph
LangGraphInstrumentation.instrument()

# Your LangGraph code is now traced automatically
from langgraph.prebuilt import create_react_agent

agent = create_react_agent(model, tools)
result = agent.invoke({"messages": [("user", prompt)]})

# --- Strands ----------------------------------------------------------
from opensearch_agentops.integrations import StrandsInstrumentation

# Auto-instrument Strands
StrandsInstrumentation.instrument()

# Strands agents are now traced
from strands import Agent

agent = Agent(model="claude-sonnet-4")
result = agent.run(prompt)

# --- OpenAI -----------------------------------------------------------
from opensearch_agentops.integrations import OpenAIInstrumentation

# Auto-instrument OpenAI
OpenAIInstrumentation.instrument()

# OpenAI calls are now traced
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

# --- Anthropic --------------------------------------------------------
from opensearch_agentops.integrations import AnthropicInstrumentation

# Auto-instrument Anthropic
AnthropicInstrumentation.instrument()

# Anthropic calls are now traced
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": prompt}]
)
Terminal window
# Required
# OTLP endpoint the SDK exports telemetry to (the OTEL Collector)
AGENTOPS_ENDPOINT=http://localhost:4317

# Optional
AGENTOPS_SERVICE_NAME=my-agent
AGENTOPS_ENVIRONMENT=production
AGENTOPS_DEBUG=false

# LLM Judge
AGENTOPS_JUDGE_MODEL=claude-sonnet-4
AGENTOPS_JUDGE_PROVIDER=bedrock # or openai, anthropic

# OpenSearch Storage
# NOTE: admin/admin are the local-dev defaults — change for production
OPENSEARCH_URL=http://localhost:9200
OPENSEARCH_USERNAME=admin
OPENSEARCH_PASSWORD=admin
from opensearch_agentops import AgentHealth, Config

# Programmatic configuration: takes precedence over environment variables
# (TODO confirm precedence against SDK reference).
config = Config(
    endpoint="http://localhost:4317",
    service_name="my-agent",
    environment="production",
    # Sampling
    trace_sample_rate=1.0,  # 100% sampling
    # Batching
    batch_size=100,         # spans per export batch
    flush_interval=5000,    # 5 seconds (value is in milliseconds)
    # Judge settings
    judge_model="claude-sonnet-4",
    judge_provider="bedrock"
)
agentops = AgentHealth(config=config)
┌─────────────────────────────────────────────────────┐
│                  Your Application                   │
├─────────────────────────────────────────────────────┤
│                   AgentHealth SDK                   │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐     │
│  │  Tracers   │  │  Adapters  │  │ Evaluators │     │
│  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘     │
│        │               │               │            │
│        └───────────────┴───────────────┘            │
│                        │                            │
│              ┌─────────┴─────────┐                  │
│              │   OTEL Exporter   │                  │
│              └─────────┬─────────┘                  │
└────────────────────────┼────────────────────────────┘
                         │ OTLP gRPC/HTTP
                         ▼
              ┌──────────────────────┐
              │    OTEL Collector    │
              └──────────┬───────────┘
                         │
           ┌─────────────┼─────────────┐
           ▼             ▼             ▼
     ┌──────────┐  ┌──────────┐  ┌──────────┐
     │OpenSearch│  │Prometheus│  │  Jaeger  │
     └──────────┘  └──────────┘  └──────────┘

Complete Python SDK reference →

Complete JavaScript SDK reference →

Common issues and solutions →