ClickHouse

Connect to ClickHouse for real-time analytics

ClickHouse is a column-oriented database management system designed for online analytical processing (OLAP). It excels at real-time analytics over massive datasets. Scoop connects directly to ClickHouse, enabling AI-powered analysis on your high-volume data.

Connection Details

SettingValue
HTTP Port8123
Native Port9000
Connection TypeHTTP or Native
SSLRecommended

Prerequisites

Before connecting Scoop to ClickHouse:

  1. ClickHouse server (version 21.8 or later recommended)
  2. Read-only user with SELECT permissions
  3. Network access from Scoop's IP addresses

Create a Read-Only User

Connect to your ClickHouse server and run:

-- Create the read-only user
CREATE USER scoop_user IDENTIFIED BY 'your_secure_password';

-- Grant SELECT on your database
GRANT SELECT ON your_database.* TO scoop_user;

-- Or grant access to specific tables
GRANT SELECT ON your_database.your_table TO scoop_user;

Replace your_database and the password with your actual values.

Whitelist Scoop IP Addresses

For ClickHouse Cloud

  1. Go to your ClickHouse Cloud service
  2. Navigate to Settings > IP Access List
  3. Add Scoop's IP addresses

For Self-Hosted ClickHouse

  1. Edit your users.xml or users.d/*.xml configuration
  2. Add IP restrictions if needed
  3. Configure your firewall to allow port 8123 (HTTP) or 9000 (native)

Contact Scoop support for the current list of IP addresses to whitelist.

Connect in Scoop

  1. Navigate to Datasets and click New Dataset
  2. Select Database as your source
  3. Choose ClickHouse from the database type dropdown
  4. Enter your connection details:
    • Server Host: Your ClickHouse hostname
    • Port: 8123 (HTTP) or 8443 (HTTPS)
    • Database: Your database name (default: default)
    • Username: scoop_user
    • Password: The user's password
  5. Click Test Connection to verify
  6. Enter your SQL query and proceed

Connection URL Format

If using a JDBC URL directly:

HTTP connection:

jdbc:clickhouse://hostname:8123/database_name

With SSL:

jdbc:clickhouse://hostname:8443/database_name?ssl=true

Query Optimization

ClickHouse performs best with queries that:

  • Filter on partition keys to minimize data scanning
  • Use appropriate aggregate functions (ClickHouse has many optimized functions)
  • Avoid large JOINs when possible (ClickHouse is optimized for denormalized data)
-- Good: Filter on partitioned column (e.g., date)
SELECT * FROM events WHERE date >= '2024-01-01'

-- Better: Use ClickHouse-specific functions
SELECT toStartOfMonth(date) as month, count()
FROM events
GROUP BY month

Best Practices

  • Use MergeTree engine family for best performance
  • Partition by date for time-series data
  • Use LowCardinality for columns with few distinct values
  • Batch inserts rather than single-row inserts

Troubleshooting

Connection Refused

  • Verify ClickHouse is running and listening on the correct port
  • Check that your IP is allowed in the configuration
  • Ensure the port is open in your firewall

Authentication Failed

  • Verify the username and password are correct
  • Check user configuration in users.xml
  • Ensure the user is not restricted to localhost only

Permission Denied

  • Verify GRANT statements were executed
  • Check current permissions: SHOW GRANTS FOR scoop_user
  • Ensure the user has access to the database