Query Google Search, scrape the top N URLs from the results, and return their cleaned content as Markdown
The RAG Web Browser provides AI assistants with the ability to search the web and extract content from websites. It acts as a bridge between large language models and the internet, similar to web search capabilities in ChatGPT. This tool enables AI agents to perform Google searches, scrape content from multiple URLs, and return cleaned content in Markdown format for easy consumption by LLMs.
The RAG Web Browser MCP server connects AI assistants to the web, allowing them to search for information and extract content from websites. It works by communicating with the RAG Web Browser Actor on the Apify platform, which runs in standby mode to provide fast responses.
To use the RAG Web Browser MCP server, you'll need to:
Clone the repository:
git clone https://github.com/apify/mcp-server-rag-web-browser.git
cd mcp-server-rag-web-browser
Install dependencies:
npm install
Create a .env file based on the .env.example template:
APIFY_TOKEN=your_apify_token
You'll need an Apify token, which you can obtain by creating an account at apify.com.
Start the server:
npm start
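A common stumbling block in the steps above is starting the server while .env still holds the template placeholder. The following is a minimal sketch of a pre-flight check, assuming the .env file uses plain KEY=VALUE lines as shown. The helper names parse_env and has_real_token are hypothetical and not part of the repository.

```python
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_real_token(env: dict) -> bool:
    """True when APIFY_TOKEN is set and is not the template placeholder."""
    token = env.get("APIFY_TOKEN", "")
    return bool(token) and token != "your_apify_token"


if __name__ == "__main__":
    env_path = Path(".env")  # assumes you run this from the cloned repository root
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    if has_real_token(env):
        print("APIFY_TOKEN looks set")
    else:
        print("APIFY_TOKEN is missing or still the placeholder")
```

Running this before npm start only confirms that a token is present; it does not verify that the token is valid on the Apify platform.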
Add the server to your MCP-compatible client (like Claude Desktop) using the following configuration:
{
  "mcpServers": {
    "rag-web-browser": {
      "command": "npm",
      "args": [
        "--prefix",
        "PATH_TO_YOUR_PROJECT_DIRECTORY",
        "start"
      ]
    }
  }
}
Replace PATH_TO_YOUR_PROJECT_DIRECTORY with the actual path to where you cloned the repository.
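Hand-editing the path into JSON is easy to get wrong, especially on Windows, where backslashes must be escaped. As a sketch, the configuration above can be generated with the path filled in. The function name build_client_config and the example path are hypothetical.

```python
import json


def build_client_config(project_dir: str) -> dict:
    """Return the client configuration with the project path substituted."""
    return {
        "mcpServers": {
            "rag-web-browser": {
                "command": "npm",
                "args": ["--prefix", project_dir, "start"],
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical location; substitute the directory you cloned into.
    config = build_client_config("/home/me/mcp-server-rag-web-browser")
    # json.dumps handles quoting and escaping of the path for you.
    print(json.dumps(config, indent=2))
```

If your client already has an mcpServers section, merge the rag-web-browser entry into it rather than replacing the whole file.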
Once installed, you can use the RAG Web Browser through your AI assistant. The primary functionality is accessed through the search tool, which allows you to search the web or fetch a specific URL.
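In normal use the AI assistant issues the tool call for you, but it can help to see roughly what travels over the wire. MCP clients invoke tools with a JSON-RPC tools/call request. The sketch below builds such a request for the search tool. It assumes the tool takes its search term or URL in an argument named query, which is not stated above, so check the repository before relying on it.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need an id; a simple counter suffices here


def search_request(query: str) -> dict:
    """Build an MCP tools/call request for the search tool.

    The argument name "query" is an assumption, not confirmed by this document.
    """
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": "search", "arguments": {"query": query}},
    }


# A web search and a single-URL fetch differ only in what is passed as the query.
print(json.dumps(search_request("web browser for RAG pipelines"), indent=2))
print(json.dumps(search_request("https://www.example.com"), indent=2))
```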
When using the search tool, you can customize its behavior with these parameters:
maxResults: Control how many search results to process (default: 1)
scrapingTool: Choose between 'browser-playwright' for JavaScript-heavy sites or 'raw-http' for faster, simpler sites
outputFormats: Select the format for returned content (markdown, text, or html)
requestTimeoutSecs: Set a timeout for requests to prevent long-running operations
If you encounter issues or need more detailed information, visit the repository on GitHub.
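To make the search tool parameters concrete, here is a minimal sketch that assembles and validates an arguments object before it is sent. The allowed values come from the parameter descriptions in this document. The query argument name, the list type for outputFormats, and the helper name build_search_arguments are assumptions for illustration.

```python
SCRAPING_TOOLS = {"browser-playwright", "raw-http"}
OUTPUT_FORMATS = {"markdown", "text", "html"}


def build_search_arguments(query, max_results=1, scraping_tool=None,
                           output_formats=None, request_timeout_secs=None):
    """Return an arguments dict for the search tool, rejecting unknown values."""
    if max_results < 1:
        raise ValueError("maxResults must be at least 1")
    args = {"query": query, "maxResults": max_results}  # "query" is an assumed name

    if scraping_tool is not None:
        if scraping_tool not in SCRAPING_TOOLS:
            raise ValueError(f"unknown scrapingTool: {scraping_tool!r}")
        args["scrapingTool"] = scraping_tool

    if output_formats is not None:
        unknown = set(output_formats) - OUTPUT_FORMATS
        if unknown:
            raise ValueError(f"unknown outputFormats: {sorted(unknown)}")
        args["outputFormats"] = list(output_formats)  # assumed to be a list

    if request_timeout_secs is not None:
        if request_timeout_secs <= 0:
            raise ValueError("requestTimeoutSecs must be positive")
        args["requestTimeoutSecs"] = request_timeout_secs

    return args


# Example: scrape the top three results quickly over plain HTTP, as Markdown.
print(build_search_arguments("apify rag web browser", max_results=3,
                             scraping_tool="raw-http",
                             output_formats=["markdown"],
                             request_timeout_secs=30))
```

Optional parameters are omitted from the result unless set, so the server's own defaults apply to anything you leave out.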