ClickHouse
Connect to ClickHouse for real-time analytics
ClickHouse is a column-oriented database management system designed for online analytical processing (OLAP). It excels at real-time analytics over massive datasets. Scoop connects directly to ClickHouse, enabling AI-powered analysis on your high-volume data.
Connection Details
| Setting | Value |
|---|---|
| HTTP Port | 8123 |
| Native Port | 9000 |
| Connection Type | HTTP or Native |
| SSL | Recommended |
Prerequisites
Before connecting Scoop to ClickHouse:
- ClickHouse server (version 21.8 or later recommended)
- Read-only user with SELECT permissions
- Network access from Scoop's IP addresses
Create a Read-Only User
Connect to your ClickHouse server and run:
-- Create the read-only user
CREATE USER scoop_user IDENTIFIED BY 'your_secure_password';
-- Grant SELECT on your database
GRANT SELECT ON your_database.* TO scoop_user;
-- Or grant access to specific tables
GRANT SELECT ON your_database.your_table TO scoop_user;Replace your_database and the password with your actual values.
Whitelist Scoop IP Addresses
For ClickHouse Cloud
- Go to your ClickHouse Cloud service
- Navigate to Settings > IP Access List
- Add Scoop's IP addresses
For Self-Hosted ClickHouse
- Edit your
users.xmlorusers.d/*.xmlconfiguration - Add IP restrictions if needed
- Configure your firewall to allow port 8123 (HTTP) or 9000 (native)
Contact Scoop support for the current list of IP addresses to whitelist.
Connect in Scoop
- Navigate to Datasets and click New Dataset
- Select Database as your source
- Choose ClickHouse from the database type dropdown
- Enter your connection details:
- Server Host: Your ClickHouse hostname
- Port: 8123 (HTTP) or 8443 (HTTPS)
- Database: Your database name (default:
default) - Username: scoop_user
- Password: The user's password
- Click Test Connection to verify
- Enter your SQL query and proceed
Connection URL Format
If using a JDBC URL directly:
HTTP connection:
jdbc:clickhouse://hostname:8123/database_name
With SSL:
jdbc:clickhouse://hostname:8443/database_name?ssl=true
Query Optimization
ClickHouse performs best with queries that:
- Filter on partition keys to minimize data scanning
- Use appropriate aggregate functions (ClickHouse has many optimized functions)
- Avoid large JOINs when possible (ClickHouse is optimized for denormalized data)
-- Good: Filter on partitioned column (e.g., date)
SELECT * FROM events WHERE date >= '2024-01-01'
-- Better: Use ClickHouse-specific functions
SELECT toStartOfMonth(date) as month, count()
FROM events
GROUP BY monthBest Practices
- Use MergeTree engine family for best performance
- Partition by date for time-series data
- Use LowCardinality for columns with few distinct values
- Batch inserts rather than single-row inserts
Troubleshooting
Connection Refused
- Verify ClickHouse is running and listening on the correct port
- Check that your IP is allowed in the configuration
- Ensure the port is open in your firewall
Authentication Failed
- Verify the username and password are correct
- Check user configuration in
users.xml - Ensure the user is not restricted to localhost only
Permission Denied
- Verify GRANT statements were executed
- Check current permissions:
SHOW GRANTS FOR scoop_user - Ensure the user has access to the database
Updated about 5 hours ago