Parse a SQL query string into an expression tree
Convert SQL from one dialect to another
Format a SQL query with proper indentation and spacing
Apply optimization rules to a SQL query
Execute SQL queries against Python data structures
SQLGlot is a no-dependency SQL parser and transpiler that can translate between 24 different SQL dialects, including DuckDB, Presto/Trino, Spark/Databricks, Snowflake, and BigQuery. It reads a wide variety of SQL inputs and outputs syntactically and semantically correct SQL in the targeted dialects. It is written purely in Python, reports syntax errors found while parsing, and can be customized with user-defined dialects.
Beyond transpiling, SQLGlot lets you parse, format, optimize, and execute SQL queries. It is particularly useful for data engineers and analysts who work with multiple database systems and need to convert queries between them.
You can install SQLGlot using pip:
pip install sqlglot
For development or to contribute to the project, you can clone the repository and install it in development mode:
git clone https://github.com/tobymao/sqlglot.git
cd sqlglot
pip install -e .
import sqlglot
# Parse a SQL query
# (the table is named "users" here because "table" is a reserved word in most dialects)
expression = sqlglot.parse_one("SELECT * FROM users WHERE id = 1")
import sqlglot
# Dialects are selected by name, so no dialect modules need to be imported.
# transpile() returns a list with one SQL string per input statement.
# Convert MySQL to PostgreSQL
postgres_sql = sqlglot.transpile("SELECT * FROM users LIMIT 10", read="mysql", write="postgres")[0]
# Convert Snowflake to BigQuery
bigquery_sql = sqlglot.transpile(
    "SELECT * FROM users QUALIFY ROW_NUMBER() OVER (PARTITION BY id) = 1",
    read="snowflake",
    write="bigquery",
)[0]
import sqlglot
# Format a SQL query: pretty=True makes the generator emit indented, multi-line SQL
formatted_sql = sqlglot.transpile("SELECT id,name FROM users WHERE status='active'", pretty=True)[0]
from sqlglot import exp
# Build a SQL query programmatically
query = (
    exp.select("id", "name")
    .from_("users")
    .where(exp.column("status").eq("active"))
)
# Convert to SQL string
sql_string = query.sql()
import sqlglot
# Parse a query
expression = sqlglot.parse_one("SELECT a, b, SUM(c) FROM orders GROUP BY a, b")
# Get tables referenced in the query (find_all returns a generator, so materialize it)
tables = list(expression.find_all(sqlglot.exp.Table))
# Get columns referenced in the query
columns = list(expression.find_all(sqlglot.exp.Column))
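You can extend SQLGlot with custom dialects by subclassing Dialect. No registration call is needed: the subclass is registered automatically under its lowercased class name.
from sqlglot.dialects.dialect import Dialect
class MyDialect(Dialect):
    # Override the nested Tokenizer, Parser, or Generator classes
    # here to define custom syntax and behaviors
    pass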
SQLGlot raises a ParseError with a descriptive message when it encounters invalid syntax:
import sqlglot
import sqlglot.errors
try:
    sqlglot.parse_one("SELECT * FROM")
except sqlglot.errors.ParseError as e:
    print(f"Parse error: {e}")
When transpiling, you can configure various options:
import sqlglot
sqlglot.transpile(
    "SELECT * FROM users",
    read="mysql",
    write="postgres",
    pretty=True,  # Format the output
    identify=True,  # Quote identifiers
    error_level=sqlglot.ErrorLevel.IGNORE,  # An ErrorLevel member: IGNORE, WARN, RAISE, or IMMEDIATE
)
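SQLGlot includes a Python executor that runs queries over in-memory rows, so it can be combined with data processing frameworks like Pandas by exporting a DataFrame as a list of row dictionaries:
import pandas as pd
from sqlglot.executor import execute
# Execute SQL on rows taken from a pandas DataFrame
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
# The executor expects each table as a list of {column: value} dictionaries
tables = {"my_table": df.to_dict("records")}
result = execute(
    "SELECT a, b FROM my_table WHERE a > 1",
    tables=tables,
)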
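SQLGlot is written purely in Python and is designed to be efficient. For very large SQL files or high-throughput applications, note that sqlglot.parse and sqlglot.transpile accept a string containing several semicolon-separated statements, so a whole script can be processed in one call instead of statement by statement.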