Greenplum

Connect to Greenplum for large-scale analytics

Greenplum is a massively parallel processing (MPP) database built on PostgreSQL, designed for large-scale analytics and data warehousing. Scoop connects directly to Greenplum using PostgreSQL compatibility, enabling AI-powered analysis on your data warehouse.

Connection Details

SettingValue
Default Port5432
Connection TypeTCP/IP (PostgreSQL protocol)
SSLRecommended

Prerequisites

Before connecting Scoop to Greenplum:

  1. Greenplum cluster (version 6.x or later recommended)
  2. Read-only user with SELECT permissions
  3. Network access from Scoop's IP addresses

Create a Read-Only User

Connect to your Greenplum master node and run:

-- Create the read-only user
CREATE USER scoop_user WITH PASSWORD 'your_secure_password';

-- Grant connection access
GRANT CONNECT ON DATABASE your_database TO scoop_user;

-- Grant schema usage
GRANT USAGE ON SCHEMA public TO scoop_user;

-- Grant SELECT on all existing tables
GRANT SELECT ON ALL TABLES IN SCHEMA public TO scoop_user;

-- Grant SELECT on future tables (recommended)
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO scoop_user;

Replace your_database and the password with your actual values.

Whitelist Scoop IP Addresses

For VMware Greenplum

Configure your network security to allow connections from Scoop's IP addresses to the master node on port 5432.

For Greenplum on Cloud

Follow your cloud provider's instructions for configuring network access (security groups, firewall rules, etc.).

Also configure pg_hba.conf on the master to allow remote connections:

host    your_database    scoop_user    SCOOP_IP/32    md5

Contact Scoop support for the current list of IP addresses to whitelist.

Connect in Scoop

  1. Navigate to Datasets and click New Dataset
  2. Select Database as your source
  3. Choose Greenplum from the database type dropdown
  4. Enter your connection details:
    • Server Host: Your Greenplum master hostname
    • Port: 5432 (or your custom port)
    • Database: Your database name
    • Username: scoop_user
    • Password: The user's password
  5. Click Test Connection to verify
  6. Enter your SQL query and proceed

Connection URL Format

Greenplum uses the PostgreSQL JDBC driver:

jdbc:postgresql://master-hostname:5432/database_name

With SSL:

jdbc:postgresql://master-hostname:5432/database_name?ssl=true&sslmode=require

Query Optimization

Greenplum performs best when queries leverage its distributed architecture:

  • Use distribution keys wisely to minimize data movement
  • Filter on partitioned columns when possible
  • Avoid cross-segment operations that require data redistribution
-- Good: Query uses distribution key in join
SELECT f.*, d.name
FROM fact_sales f
JOIN dim_customer d ON f.customer_id = d.customer_id
WHERE f.sale_date >= '2024-01-01'

-- Check query plan for performance
EXPLAIN ANALYZE SELECT ...

Best Practices

  • Analyze tables regularly to keep statistics current
  • Use appropriate distribution (hash, random, or replicated)
  • Partition large tables by date or other logical divisions
  • Use columnar storage (append-optimized) for analytical workloads

Troubleshooting

Connection Refused

  • Verify you're connecting to the master node (not segment nodes)
  • Check that Greenplum is running: gpstate -s
  • Ensure pg_hba.conf allows remote connections
  • Verify port 5432 is open in your firewall

Authentication Failed

  • Verify the username and password are correct
  • Check pg_hba.conf authentication method
  • Ensure the user exists: SELECT usename FROM pg_user;

Permission Denied

  • Verify GRANT statements were executed on the correct database
  • Check current permissions: \dp table_name in psql
  • Ensure USAGE is granted on the schema