Hugging Face Dataset Viewer MCP Server

Data Science Tools | Python
Browse and analyze datasets hosted on the Hugging Face Hub

Available Tools

validate

Check if a dataset exists and is accessible

Parameters: dataset, auth_token

get_info

Get detailed information about a dataset

Parameters: dataset, auth_token

get_rows

Get paginated contents of a dataset

Parameters: dataset, config, split, page, auth_token

get_first_rows

Get first rows from a dataset split

Parameters: dataset, config, split, auth_token

get_statistics

Get statistics about a dataset split

Parameters: dataset, config, split, auth_token

search_dataset

Search for text within a dataset

Parameters: dataset, config, split, query, auth_token

filter

Filter rows using SQL-like conditions

Parameters: dataset, config, split, where, orderby, page, auth_token

get_parquet

Download entire dataset in Parquet format

Parameters: dataset, auth_token

The Hugging Face Dataset Viewer MCP server lets an MCP client explore, search, and analyze datasets hosted on the Hugging Face Hub. Its tools check that a dataset exists and is accessible, retrieve dataset information, page through rows, search for text, and filter rows with SQL-like conditions. Tools accept a dataset configuration and split where relevant, plus an optional auth_token for private datasets. The server also reports statistics for a dataset split and can download an entire dataset in Parquet format.
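As an illustration of how a client might invoke one of the tools listed above, the sketch below builds a Model Context Protocol `tools/call` request for get_rows. The helper name build_tool_call, the dataset, config, and split values, and the page number are illustrative assumptions; in particular, whether pages start at 0 or 1 is not stated in this listing.

```python
import json


def build_tool_call(name, arguments, request_id=1):
    """Build an MCP `tools/call` JSON-RPC request as a plain dict.

    `build_tool_call` is a hypothetical helper, not part of the server.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names follow the get_rows parameter list; the values are examples.
request = build_tool_call(
    "get_rows",
    {
        "dataset": "stanfordnlp/imdb",
        "config": "plain_text",
        "split": "train",
        "page": 0,  # assumed first page; the page base is not documented here
    },
)

print(json.dumps(request, indent=2))
```

In practice an MCP client such as Claude Desktop constructs this request itself; the sketch only shows which arguments the tool expects.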

Installation

Prerequisites

  • Python 3.12 or higher
  • uv - Fast Python package installer and resolver

Setup Instructions

  1. Clone the repository:
     git clone https://github.com/privetin/dataset-viewer.git
     cd dataset-viewer

  2. Create and activate a virtual environment:
     # Create virtual environment
     uv venv

     # Activate virtual environment
     # On Unix:
     source .venv/bin/activate
     # On Windows:
     .venv\Scripts\activate

  3. Install in development mode:
     uv add -e .

Configuration

Environment Variables

To access private datasets, set the HUGGINGFACE_TOKEN environment variable to your Hugging Face API token.

Claude Desktop Integration

Add the MCP server configuration to your Claude Desktop config file:

  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "dataset-viewer": {
      "command": "uv",
      "args": [
        "--directory",
        "PATH_TO_YOUR_DATASET_VIEWER_DIRECTORY",
        "run",
        "dataset-viewer"
      ]
    }
  }
}

Replace PATH_TO_YOUR_DATASET_VIEWER_DIRECTORY with the actual path to where you cloned the repository.
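If the config file already lists other servers, the new entry has to be merged in rather than pasted over them. A minimal sketch of that merge follows, assuming the config has already been loaded into a dict; add_dataset_viewer and the "other-server" entry are hypothetical names used only for illustration.

```python
import json


def add_dataset_viewer(config, repo_dir):
    """Return a copy of `config` with the dataset-viewer entry added.

    Existing entries under "mcpServers" are preserved, and the input
    dict is left unmodified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["dataset-viewer"] = {
        "command": "uv",
        "args": ["--directory", str(repo_dir), "run", "dataset-viewer"],
    }
    merged["mcpServers"] = servers
    return merged


# A made-up existing config with one unrelated server.
existing = {
    "mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}
}

# Placeholder path; use the directory where the repository was cloned.
updated = add_dataset_viewer(existing, "/path/to/dataset-viewer")
print(json.dumps(updated, indent=2))
```

Writing the result back to claude_desktop_config.json is left out on purpose, so the sketch can be run without touching a real config file.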

Usage

Once the server is installed and configured, an MCP client can call its tools to work with Hugging Face datasets. The server uses the dataset:// URI scheme for accessing datasets.

Basic Operations

  • Validate a dataset's existence and accessibility
  • Get detailed information about datasets
  • Browse dataset contents with pagination
  • View statistics and analyze dataset characteristics
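Browsing with pagination means a page number has to be translated into a window of rows. The sketch below shows one plausible mapping, assuming zero-based pages of 100 rows each; both the page size and the page base are assumptions, since this listing does not document them, and the function names are hypothetical.

```python
PAGE_SIZE = 100  # assumed rows per page; not stated in the source


def page_to_window(page, page_size=PAGE_SIZE):
    """Map a zero-based page number to an (offset, length) row window."""
    if page < 0:
        raise ValueError("page must be non-negative")
    return page * page_size, page_size


def iter_pages(num_rows, page_size=PAGE_SIZE):
    """Yield (page, offset, length) triples that together cover num_rows rows."""
    page = 0
    while page * page_size < num_rows:
        offset = page * page_size
        # The last page may be shorter than page_size.
        yield page, offset, min(page_size, num_rows - offset)
        page += 1


for page, offset, length in iter_pages(250):
    print(f"page {page}: rows {offset}..{offset + length - 1}")
```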

Working with Private Datasets

For private datasets, provide an authentication token either through the HUGGINGFACE_TOKEN environment variable or through the auth_token parameter of the relevant tools.
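The two token sources can be combined with a simple precedence rule. The sketch below assumes an explicit auth_token wins over the HUGGINGFACE_TOKEN environment variable; that ordering is an assumption, not something this listing confirms, and resolve_token and auth_headers are hypothetical helpers. The Bearer header format is the one the Hugging Face Hub accepts for API tokens.

```python
import os


def resolve_token(auth_token=None, env=os.environ):
    """Pick the token to use: explicit argument first, then HUGGINGFACE_TOKEN.

    Returns None when neither source provides a non-empty token.
    The precedence order is an assumption made for this sketch.
    """
    return auth_token or env.get("HUGGINGFACE_TOKEN") or None


def auth_headers(token):
    """Build HTTP headers for an authenticated request, or none for public data."""
    return {"Authorization": f"Bearer {token}"} if token else {}


# Passing a plain dict as `env` keeps the example independent of the real environment.
print(auth_headers(resolve_token(env={})))
print(auth_headers(resolve_token("hf_example", env={"HUGGINGFACE_TOKEN": "hf_env"})))
```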

Advanced Features

  • Search for specific text within datasets
  • Filter rows using SQL-like conditions
  • Sort results using ORDER BY clauses
  • Download entire datasets in Parquet format for offline analysis
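To make the filter and sort options concrete, the sketch below builds a request URL for the public Hugging Face dataset viewer API, whose /filter endpoint accepts where and orderby parameters. Whether this MCP server calls that endpoint internally is an assumption, and filter_url, the dataset, and the clauses shown are illustrative only.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://datasets-server.huggingface.co"  # public dataset viewer API


def filter_url(dataset, config, split, where, orderby=None, offset=0, length=100):
    """Build a /filter URL with a WHERE condition and an optional ORDER BY clause."""
    params = {
        "dataset": dataset,
        "config": config,
        "split": split,
        "where": where,
        "offset": offset,
        "length": length,
    }
    if orderby:
        params["orderby"] = orderby
    # urlencode percent-encodes quotes, spaces, and operators in the clauses.
    return f"{BASE_URL}/filter?{urlencode(params)}"


url = filter_url(
    "stanfordnlp/imdb",
    "plain_text",
    "train",
    where='"label" = 1',    # column names in double quotes
    orderby='"text" ASC',   # illustrative sort clause
)
print(url)

# Decoding the query string recovers the original clauses.
query = parse_qs(urlsplit(url).query)
print(query["where"][0], "|", query["orderby"][0])
```

Building the URL separately from sending it keeps the encoding step easy to inspect before any network request is made.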

About Model Context Protocol

Model Context Protocol (MCP) allows AI models to access external tools and services, extending their capabilities beyond their training data.