# Working with Datasets in Scoop for Slack

# Dataset Mastery: Your Gateway to Insights

Master the art of dataset management to unlock the full power of Scoop's analytics capabilities.

## Understanding the Dataset Ecosystem

### 🌐 Three Types of Datasets

**1. Organization Datasets** 🏢

- Company-wide data sources
- Live connections to business systems
- Automatic refresh schedules
- Shared across teams
- Examples: CRM, ERP, Marketing platforms

**2. Personal Datasets** 👤

- Your uploaded files
- Private by default
- Full control over sharing
- Perfect for ad-hoc analysis
- Examples: Excel reports, CSV exports

**3. Channel Datasets** 📣

- Auto-mapped to specific channels
- Context-aware selection
- Team-aligned data
- Admin configured
- Examples: Sales data in #sales

!\[Screenshot: Dataset selector showing different dataset types]

## Navigating Datasets

### 🎯 Quick Selection Commands

**See Available Datasets**

```
@Scoop show datasets
@Scoop list all data sources
@Scoop what data can I analyze?
```

**Switch Datasets**

```
@Scoop use sales dataset
@Scoop switch to marketing data
@Scoop change to customer analytics
```

**Check Current Dataset**

```
@Scoop current dataset
@Scoop what am I analyzing?
@Scoop status
```

!\[Screenshot: Dataset selection dropdown interface]

### 📊 Understanding Dataset Cards

Each dataset displays rich metadata:

```
📊 Customer Analytics Dataset
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Type: 🏢 Organization Dataset
Source: Salesforce + Support System
Records: 45,832 customers
Updated: 2 hours ago (Live sync)
Quality: 98% complete

Key Metrics:
• Total Revenue: $45.2M
• Active Customers: 3,421
• Avg Customer Value: $13,200
• Churn Rate: 12%

Top Tables:
• accounts (customer master)
• opportunities (sales pipeline)
• cases (support tickets)
• activities (engagement log)

[📥 Use This Dataset] [ℹ️ More Info]
```

## Organization Datasets Deep Dive

### 🔗 Connected Systems

**CRM Platforms**

- Salesforce: Accounts, Opportunities, Leads
- HubSpot: Contacts, Deals, Activities
- Pipedrive: Deals, Organizations, People
- Microsoft Dynamics: Customers, Sales

**Support Systems**

- Zendesk: Tickets, Satisfaction, Agents
- Intercom: Conversations, Users, Tags
- Freshdesk: Tickets, Contacts, Groups

**Marketing Tools**

- Google Analytics: Traffic, Conversions
- Marketo: Campaigns, Leads, Programs
- Mailchimp: Campaigns, Subscribers

**Financial Systems**

- QuickBooks: Invoices, Customers
- Stripe: Payments, Subscriptions
- NetSuite: Transactions, Accounts

### 🔄 Data Freshness

```
Dataset: Sales Pipeline
Last Sync: 10 minutes ago
Next Sync: In 20 minutes
Sync Status: ✅ Healthy

Recent Changes:
• 12 new opportunities
• 34 updated stages
• 5 closed deals

[🔄 Refresh Now] [⚙️ Sync Settings]
```

### 🔐 Permission Model

**Access Levels:**

- **Full Access**: All data, no restrictions
- **Department**: Your team's data only
- **Role-Based**: Based on Slack groups
- **Custom**: Admin-defined rules

**Security Features:**

- Row-level security
- Column masking for PII
- Audit trail of access
- Compliance controls

## Personal Dataset Management

### 📤 Creating Personal Datasets

**From File Upload:**

```
You: [Uploading quarterly_review.xlsx]

Scoop: 📊 Creating personal dataset...

✅ "Q4 Review Data" ready for analysis

This dataset includes:
• 15,420 records
• 12 analysis-ready columns
• Date range: Oct-Dec 2024

What would you like to explore?
```

**From Analysis Results:**

```
You: Save this filtered view as a dataset

Scoop: 💾 Saved as "High-Value Customers"

This personal dataset contains:
• 342 customers
• Filtered: LTV > $50,000
• All original columns preserved

[📊 Switch to New Dataset] [🔙 Keep Current]
```

!\[Screenshot: Personal dataset created from uploaded file]

### 🗂️ Organizing Personal Datasets

**Naming Best Practices:**

```
✅ Good Names:
• "2024_Q4_Sales_Analysis"
• "Customer_Segmentation_Dec"
• "Marketing_Campaign_Results"

❌ Avoid:
• "data"
• "test"
• "final_final_v2"
```

**Dataset Actions:**

```
@Scoop rename dataset to "Executive Dashboard Data"
@Scoop add description "Monthly KPIs for board meeting"
@Scoop tag dataset with #finance #monthly
@Scoop delete old datasets
```

### 🔄 Dataset Lifecycle

```
Personal Dataset: Marketing Leads
Created: Dec 1, 2024
Last Used: Dec 15, 2024
Size: 2.4 MB

⚠️ This dataset hasn't been used in 14 days

Options:
[📊 Use Dataset] [🔄 Update Data] [🗑️ Delete] [📤 Share]
```

## Channel-Mapped Datasets

### 🎯 Automatic Context

**How Mapping Works:**

```
#sales-team → CRM Dataset
#marketing → Campaign Dataset
#support → Ticket Dataset
#finance → Revenue Dataset
#product → Usage Dataset
```

**Smart Detection:**

- Channel name analysis
- Member role detection
- Historical query patterns
- Admin preferences

### 🔧 Configuration

**For Admins:**

```
@Scoop map dataset "Enterprise CRM" to #enterprise-sales
@Scoop set default dataset for #analytics
@Scoop show channel mappings
```

**Override When Needed:**

```
In #sales channel:
@Scoop use marketing dataset for this query
@Scoop temporarily switch to support data
```

## Advanced Dataset Features

### 🔍 Dataset Discovery

```
You: Which dataset has customer churn data?

Scoop: 🔍 Found churn data in 3 datasets:

1. Customer Analytics (Primary)
   • churn_flag column
   • churn_date column
   • churn_reason field

2. Sales CRM
   • account_status (includes "Churned")
   • close_lost_reason

3. Support Tickets
   • cancellation_tickets table

Recommend using Customer Analytics for comprehensive analysis.

[Use Customer Analytics] [Compare All Three]
```

### 🔄 Dataset Relationships

```
You: Show relationships in this dataset

Scoop: 🔗 Analyzing data structure...

Key Relationships Found:
• Customer ID links to multiple orders
• Product categories have hierarchies
• Time-based patterns detected

Understanding these helps you:
- Ask more sophisticated questions
- Navigate data relationships
- Build better queries

[Explore Relationships] [View Schema]
```

### 📊 Dataset Profiling

```
@Scoop profile this dataset

Dataset Profile: Sales Pipeline
━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Table Structure:
• 8 tables, 127 total columns
• Primary table: opportunities (45K rows)
• Most joined: accounts (92% queries)

Data Quality:
• Completeness: 94%
• Missing: phone (45%), website (23%)
• Duplicates: 0.2% (78 records)

Usage Patterns:
• Top queries: Revenue, Pipeline, Forecast
• Peak usage: Mon 9-11 AM
• Power users: Sarah, Mike, Lisa

Optimization Suggestions:
• Add index on close_date
• Archive records > 2 years
• Standardize industry field
```

### 🎨 Custom Views

**Create Filtered Views:**

```
You: Create a view of enterprise accounts only

Scoop: 🎨 Creating custom view...

View: "Enterprise Accounts"
Filters:
- revenue > $1M
- employees > 500
- type = "Enterprise"

Results: 342 accounts (8% of total)

This view is now available as a virtual dataset.

[Use Enterprise View] [Edit Filters]
```

**Saved Calculations:**

```
You: Add customer health score to this dataset

Scoop: 🧮 Adding calculated field...

Health Score = (Usage × 0.4) + (NPS × 0.3) + (Support × 0.2) + (Payment × 0.1)

✅ Field added and available for all queries
```

## Dataset Best Practices

### 🎯 Choosing the Right Dataset

**Match Dataset to Question:**

```
Revenue questions → Financial dataset
Customer behavior → CRM dataset
Campaign performance → Marketing dataset
Product usage → Analytics dataset
```

**Start Broad, Then Narrow:**

```
1. Use comprehensive dataset
2. Explore available fields
3. Create filtered view if needed
4. Save as personal dataset for reuse
```

### 🔍 Dataset Exploration

**First Time with Dataset:**

```
"describe this dataset"
"show me all tables"
"what are the key metrics?"
"show sample records"
"what questions can I answer?"
```

**Understanding Relationships:**

```
"how are tables connected?"
"show me the data model"
"what are the primary keys?"
"explain the relationships"
```

### ⚡ Performance Tips

**Query Optimization:**

- Start with recent data
- Use dataset-specific filters
- Leverage pre-aggregated fields
- Avoid SELECT * patterns

**Large Dataset Strategies:**

```
"sample 1000 records first"
"analyze last 30 days only"
"use the summary table"
"what's the optimized query?"
```

## Troubleshooting Datasets

### 🚨 Common Issues

**"Dataset not found"**

```
Solutions:
1. Check exact dataset name
2. Verify permissions
3. Confirm workspace access
4. Try partial name match
```

**"No data returned"**

```
Debug steps:
@Scoop check dataset status
@Scoop when was last refresh?
@Scoop show sample data
@Scoop verify my filters
```

**"Access denied"**

```
Options:
• Request access from admin
• Check Slack group membership
• Verify organization settings
• Use personal dataset instead
```

### 🔧 Dataset Maintenance

**For Admins:**

```
@Scoop audit dataset usage
@Scoop clean unused datasets
@Scoop optimize slow datasets
@Scoop check sync health
```

**For Users:**

```
@Scoop report data issue
@Scoop suggest new dataset
@Scoop request refresh
@Scoop why is data missing?
```

## Advanced Workflows

### 🔄 Dataset Comparison

```
You: Compare sales in CRM vs Finance dataset

Scoop: 📊 Comparing datasets...

CRM Sales Dataset:
• Total: $12.4M
• Deals: 1,234
• Period: Real-time

Finance Dataset:
• Total: $11.9M
• Invoices: 1,198
• Period: Month-end close

Variance: $500K (4%)
Missing in Finance: 36 deals (pending invoice)

[View Details] [Reconcile] [Export Differences]
```

### 🎭 Dataset Personas

**Create Role-Based Views:**

```
Executive View:
- High-level KPIs only
- Trends and forecasts
- Exception reporting

Analyst View:
- All detailed data
- Custom calculations
- Raw access

Sales View:
- Territory filtered
- Real-time pipeline
- Commission relevant
```

### 🤖 Smart Dataset Selection

```
You: Analyze customer satisfaction

Scoop: 🤔 Found relevant data in multiple datasets:

1. Support Tickets (CSAT scores)
2. NPS Survey Results
3. Product Usage (engagement)
4. CRM (renewal data)

Would you like to:
[Analyze Support CSAT] [Combine All Sources] [Compare Datasets]
```

## Next Steps

Ready to become a dataset power user?

1. 📤 **[Upload Your First File](uploading-and-analyzing-files.md)** - Create personal datasets
2. 📊 **[Master Visualizations](visualizations-exports.md)** - Beautiful charts from any dataset
3. 🤖 **[ML on Datasets](machine-learning-analytics.md)** - Advanced analytics
4. 🚀 **[Advanced Features](advanced-features.md)** - Deep reasoning capabilities

---

**Pro tip**: The right dataset makes all the difference. Spend 30 seconds choosing the correct dataset and save 30 minutes of analysis time! 🎯
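
For readers who want to sanity-check the health score from the Saved Calculations example outside Slack, the weighted sum can be sketched in Python. This is only an illustration, not Scoop's implementation: the function name `health_score`, the lowercase component keys, and the assumption that every component is already on a common 0-100 scale are all hypothetical. Only the four weights come from the example in this guide.

```python
# Weights copied from the Saved Calculations example; all names below are
# illustrative assumptions, not part of any Scoop API.
WEIGHTS = {"usage": 0.4, "nps": 0.3, "support": 0.2, "payment": 0.1}


def health_score(components: dict[str, float]) -> float:
    """Return the weighted sum of the four component scores.

    Assumes each component has already been normalized to the same
    scale (for example 0-100) before being passed in.
    """
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        # Fail loudly rather than silently scoring a partial record.
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Hypothetical customer with made-up component scores.
example = {"usage": 80.0, "nps": 60.0, "support": 90.0, "payment": 100.0}
score = health_score(example)
print(round(score, 1))  # prints 78.0
```

Because the weights sum to 1.0, the result stays on the same scale as the inputs, which makes scores comparable across customers as long as the components are normalized consistently.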