MCP Azure OpenAI Web Browsing is a minimal implementation that connects Azure OpenAI capabilities with web browser automation through the Model Context Protocol (MCP). It leverages Playwright for browser control and provides a bridge that converts MCP server responses to OpenAI function calling format, enabling AI-powered web interactions. This tool allows AI models to navigate websites, interact with web elements, and perform automated tasks through a standardized protocol. The implementation includes both server and client components, making it easy to integrate into existing AI applications that need web automation capabilities.
MCP Azure OpenAI Web Browsing provides a Model Context Protocol server that enables AI models to control web browsers through Azure OpenAI. The implementation uses FastMCP for the server component and Playwright for browser automation, creating a powerful tool for AI-driven web interactions.
Clone the repository:
git clone https://github.com/kimtth/mcp-aoai-web-browsing.git
cd mcp-aoai-web-browsing
Set up environment variables:
.env.template
to .env
AZURE_OPEN_AI_ENDPOINT=your_endpoint
AZURE_OPEN_AI_API_KEY=your_api_key
AZURE_OPEN_AI_DEPLOYMENT_MODEL=your_model
AZURE_OPEN_AI_API_VERSION=your_api_version
Install dependencies using uv
(recommended):
pip install uv
uv sync
Launch the application:
python chatgui.py
The MCP server provides several Playwright-based tools for web automation:
playwright_navigate(url, timeout=30000, wait_until="load")
: Navigate to a specified URLplaywright_go_back()
: Navigate back in browser historyplaywright_go_forward()
: Navigate forward in browser historyplaywright_reload()
: Reload the current pageplaywright_click(selector)
: Click on an element matching the selectorplaywright_fill(selector, value)
: Fill a form field with textplaywright_press(selector, key)
: Press a key on an elementplaywright_get_text(selector)
: Get text content from an elementplaywright_get_attribute(selector, name)
: Get attribute value from an elementplaywright_extract_selectors(content)
: Extract selectors from page contentplaywright_screenshot()
: Take a screenshot of the current pageplaywright_get_page_content()
: Get the HTML content of the current pageplaywright_close()
: Close the browserOnce the application is running, you can interact with it through the provided GUI. Enter prompts that instruct the AI to perform web tasks, such as:
The AI will use the Playwright tools to execute these commands in a controlled browser environment.
The implementation consists of three main components:
This architecture ensures a stable connection between the AI model and the browser automation tools, making it easy to extend with additional capabilities.