Greenplum
Connect to Greenplum for large-scale analytics
Greenplum is a massively parallel processing (MPP) database built on PostgreSQL, designed for large-scale analytics and data warehousing. Scoop connects directly to Greenplum using PostgreSQL compatibility, enabling AI-powered analysis on your data warehouse.
Connection Details
| Setting | Value |
|---|---|
| Default Port | 5432 |
| Connection Type | TCP/IP (PostgreSQL protocol) |
| SSL | Recommended |
Prerequisites
Before connecting Scoop to Greenplum:
- Greenplum cluster (version 6.x or later recommended)
- Read-only user with SELECT permissions
- Network access from Scoop's IP addresses
Create a Read-Only User
Connect to your Greenplum master node and run:
-- Create the read-only user
CREATE USER scoop_user WITH PASSWORD 'your_secure_password';
-- Grant connection access
GRANT CONNECT ON DATABASE your_database TO scoop_user;
-- Grant schema usage
GRANT USAGE ON SCHEMA public TO scoop_user;
-- Grant SELECT on all existing tables
GRANT SELECT ON ALL TABLES IN SCHEMA public TO scoop_user;
-- Grant SELECT on future tables (recommended)
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO scoop_user;Replace your_database and the password with your actual values.
Whitelist Scoop IP Addresses
For VMware Greenplum
Configure your network security to allow connections from Scoop's IP addresses to the master node on port 5432.
For Greenplum on Cloud
Follow your cloud provider's instructions for configuring network access (security groups, firewall rules, etc.).
Also configure pg_hba.conf on the master to allow remote connections:
host your_database scoop_user SCOOP_IP/32 md5
Contact Scoop support for the current list of IP addresses to whitelist.
Connect in Scoop
- Navigate to Datasets and click New Dataset
- Select Database as your source
- Choose Greenplum from the database type dropdown
- Enter your connection details:
- Server Host: Your Greenplum master hostname
- Port: 5432 (or your custom port)
- Database: Your database name
- Username: scoop_user
- Password: The user's password
- Click Test Connection to verify
- Enter your SQL query and proceed
Connection URL Format
Greenplum uses the PostgreSQL JDBC driver:
jdbc:postgresql://master-hostname:5432/database_name
With SSL:
jdbc:postgresql://master-hostname:5432/database_name?ssl=true&sslmode=require
Query Optimization
Greenplum performs best when queries leverage its distributed architecture:
- Use distribution keys wisely to minimize data movement
- Filter on partitioned columns when possible
- Avoid cross-segment operations that require data redistribution
-- Good: Query uses distribution key in join
SELECT f.*, d.name
FROM fact_sales f
JOIN dim_customer d ON f.customer_id = d.customer_id
WHERE f.sale_date >= '2024-01-01'
-- Check query plan for performance
EXPLAIN ANALYZE SELECT ...Best Practices
- Analyze tables regularly to keep statistics current
- Use appropriate distribution (hash, random, or replicated)
- Partition large tables by date or other logical divisions
- Use columnar storage (append-optimized) for analytical workloads
Troubleshooting
Connection Refused
- Verify you're connecting to the master node (not segment nodes)
- Check that Greenplum is running:
gpstate -s - Ensure
pg_hba.confallows remote connections - Verify port 5432 is open in your firewall
Authentication Failed
- Verify the username and password are correct
- Check
pg_hba.confauthentication method - Ensure the user exists:
SELECT usename FROM pg_user;
Permission Denied
- Verify GRANT statements were executed on the correct database
- Check current permissions:
\dp table_namein psql - Ensure USAGE is granted on the schema
Updated about 5 hours ago