Create and manage traces for LLM applications, capturing inputs, outputs, and intermediate steps
Log events, inputs, outputs, and metadata within a trace
Evaluate LLM outputs using various metrics like relevance, faithfulness, and context precision
Launch the Opik dashboard to visualize traces and evaluation results
Opik is a comprehensive platform for debugging, evaluating, and monitoring LLM applications, RAG systems, and agentic workflows. It combines detailed tracing, automated evaluations, and production-ready dashboards so you can understand how your AI systems behave, identify issues, and improve results.
To install Opik, you can use pip:
pip install opik
For JavaScript/TypeScript applications, you can install it via npm:
npm install @comet-ml/opik
After installation, initialize Opik in your application:

from opik import OpikTracer

# Initialize the tracer
tracer = OpikTracer()

# Create a trace for your LLM application; the context manager
# ends the trace automatically when the block exits
with tracer.trace("my_llm_app") as trace:
    # Your LLM application code here ("llm" stands in for your model client)
    response = llm.generate("What is the capital of France?")

    # Log the response
    trace.log(
        type="llm_response",
        content=response,
        metadata={"model": "gpt-4"}
    )
The same flow in JavaScript/TypeScript, where the trace must be ended explicitly:

import { OpikTracer } from '@comet-ml/opik';

// Initialize the tracer
const tracer = new OpikTracer();

// Create a trace for your LLM application
const trace = tracer.startTrace("my_llm_app");
try {
  // Your LLM application code here ("llm" stands in for your model client)
  const response = await llm.generate("What is the capital of France?");

  // Log the response
  trace.log({
    type: "llm_response",
    content: response,
    metadata: { model: "gpt-4" }
  });
} finally {
  // End the trace even if the call above throws
  await trace.end();
}
Opik provides detailed tracing capabilities that allow you to track the flow of information through your LLM applications. You can log inputs, outputs, and intermediate steps to understand how your system is processing data.
with tracer.trace("rag_system") as trace:
# Log user query
trace.log(type="user_query", content="What is quantum computing?")
# Log document retrieval
docs = retriever.get_relevant_documents("What is quantum computing?")
trace.log(type="retrieved_documents", content=docs)
# Log LLM response
response = llm.generate_with_context(docs, "What is quantum computing?")
trace.log(type="llm_response", content=response)
Opik includes automated evaluation tools to assess the quality of your LLM outputs:
from opik.evaluation import evaluate_rag

# Evaluate a RAG trace against three built-in metrics
results = evaluate_rag(
    trace_id="trace_123",
    metrics=["relevance", "faithfulness", "context_precision"]
)

print(f"Relevance score: {results['relevance']}")
print(f"Faithfulness score: {results['faithfulness']}")
Opik provides production-ready dashboards to visualize your system's performance. You can access these dashboards by running:
opik dashboard
This will start a local server where you can view your traces, evaluation results, and system metrics.
You can configure Opik using environment variables or by passing parameters to the OpikTracer constructor:
tracer = OpikTracer(
    api_key="your_api_key",   # Optional: for cloud storage
    project_name="my_llm_project",
    log_level="INFO"
)
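The environment-variable route keeps credentials out of code. A minimal sketch; the variable names below simply mirror the constructor parameters and are assumptions, so check them against your Opik version:

import os

# Assumed variable names mirroring the constructor parameters
os.environ["OPIK_API_KEY"] = "your_api_key"
os.environ["OPIK_PROJECT_NAME"] = "my_llm_project"

tracer = OpikTracer()  # picks up configuration from the environment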
For more advanced configuration, you can create a configuration file:
# opik.yaml
api_key: your_api_key
project_name: my_llm_project
log_level: INFO
storage:
  type: local   # or "cloud"
  path: ./opik_data
You can define custom evaluation metrics:
from opik.evaluation import register_metric

@register_metric
def my_custom_metric(trace, **kwargs):
    # Implement your custom metric logic; this toy example scores
    # non-empty responses as 1.0 (it assumes trace.output holds the logged response)
    score = 1.0 if trace.output else 0.0
    return score
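Once registered, the metric should be addressable by name in an evaluation run. A sketch, assuming registered names join the built-in metric list:

results = evaluate_rag(
    trace_id="trace_123",
    metrics=["relevance", "my_custom_metric"]
)
print(f"Custom score: {results['my_custom_metric']}")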
Opik integrates seamlessly with popular LLM frameworks like LangChain, LlamaIndex, and DSPy:
# LangChain integration
from langchain.chains import LLMChain
from opik.integrations.langchain import OpikCallbackHandler

# Attach the handler so chain and LLM calls are traced automatically;
# "llm" and "prompt" are your existing LangChain objects
callback = OpikCallbackHandler()
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[callback])
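With the callback attached, running the chain produces a trace without any explicit trace.log calls (chain.run is the classic LangChain invocation; newer versions prefer chain.invoke):

response = chain.run("What is the capital of France?")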
For more detailed information and advanced usage, refer to the official documentation.