# Working with Datasets in Scoop for Slack

# Dataset Mastery: Your Gateway to Insights

Master the art of dataset management to unlock the full power of Scoop's analytics capabilities.

## Understanding the Dataset Ecosystem

### 🌐 Three Types of Datasets

**1. Organization Datasets** 🏢
- Company-wide data sources
- Live connections to business systems
- Automatic refresh schedules
- Shared across teams
- Examples: CRM, ERP, Marketing platforms

**2. Personal Datasets** 👤
- Your uploaded files
- Private by default
- Full control over sharing
- Perfect for ad-hoc analysis
- Examples: Excel reports, CSV exports

**3. Channel Datasets** 📣
- Auto-mapped to specific channels
- Context-aware selection
- Team-aligned data
- Admin configured
- Examples: Sales data in #sales

*Screenshot: Dataset selector showing different dataset types*

## Navigating Datasets

### 🎯 Quick Selection Commands

**See Available Datasets**
```
@Scoop show datasets
@Scoop list all data sources
@Scoop what data can I analyze?
```

**Switch Datasets**
```
@Scoop use sales dataset
@Scoop switch to marketing data
@Scoop change to customer analytics
```

**Check Current Dataset**
```
@Scoop current dataset
@Scoop what am I analyzing?
@Scoop status
```

*Screenshot: Dataset selection dropdown interface*

### 📊 Understanding Dataset Cards

Each dataset card shows the dataset's type, source, size, freshness, quality, key metrics, and top tables:

```
📊 Customer Analytics Dataset
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Type: 🏢 Organization Dataset
Source: Salesforce + Support System
Records: 45,832 customers
Updated: 2 hours ago (Live sync)
Quality: 98% complete

Key Metrics:
• Total Revenue: $45.2M
• Active Customers: 3,421
• Avg Customer Value: $13,200
• Churn Rate: 12%

Top Tables:
• accounts (customer master)
• opportunities (sales pipeline)
• cases (support tickets)
• activities (engagement log)

[📥 Use This Dataset] [ℹ️ More Info]
```

## Organization Datasets Deep Dive

### 🔗 Connected Systems

**CRM Platforms**
- Salesforce: Accounts, Opportunities, Leads
- HubSpot: Contacts, Deals, Activities
- Pipedrive: Deals, Organizations, People
- Microsoft Dynamics: Customers, Sales

**Support Systems**
- Zendesk: Tickets, Satisfaction, Agents
- Intercom: Conversations, Users, Tags
- Freshdesk: Tickets, Contacts, Groups

**Marketing Tools**
- Google Analytics: Traffic, Conversions
- Marketo: Campaigns, Leads, Programs
- Mailchimp: Campaigns, Subscribers

**Financial Systems**
- QuickBooks: Invoices, Customers
- Stripe: Payments, Subscriptions
- NetSuite: Transactions, Accounts

### 🔄 Data Freshness

```
Dataset: Sales Pipeline
Last Sync: 10 minutes ago
Next Sync: In 20 minutes
Sync Status: ✅ Healthy

Recent Changes:
• 12 new opportunities
• 34 updated stages
• 5 closed deals

[🔄 Refresh Now] [⚙️ Sync Settings]
```

### 🔐 Permission Model

**Access Levels:**
- **Full Access**: All data, no restrictions
- **Department**: Your team's data only
- **Role-Based**: Determined by Slack group membership
- **Custom**: Admin-defined rules

**Security Features:**
- Row-level security
- Column masking for PII
- Audit trail of access
- Compliance controls
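
To show roughly how row-level security and column masking combine, here is a minimal Python sketch. It is not Scoop's implementation: the access-level strings, the `department` field, the `PII_COLUMNS` set, and the `visible_rows` helper are all hypothetical, and skipping masking for Full Access is an assumption.

```python
# Illustrative sketch only; names and rules below are assumptions, not Scoop's API.
PII_COLUMNS = {"email", "phone"}  # hypothetical set of columns treated as PII

def visible_rows(rows: list[dict], access_level: str, department: str) -> list[dict]:
    """Apply row-level filtering, then mask PII columns for restricted users."""
    if access_level == "full":
        return [dict(r) for r in rows]  # all rows, nothing masked
    if access_level == "department":
        allowed = [r for r in rows if r.get("department") == department]
    else:
        allowed = []  # unknown access levels see nothing in this sketch
    return [
        {k: ("***" if k in PII_COLUMNS else v) for k, v in r.items()}
        for r in allowed
    ]

rows = [
    {"account": "Acme", "department": "sales", "email": "a@example.com"},
    {"account": "Globex", "department": "support", "email": "g@example.com"},
]
print(visible_rows(rows, "department", "sales"))
```

A department-level user sees only their own department's rows, with the `email` value replaced by `***`.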

## Personal Dataset Management

### 📤 Creating Personal Datasets

**From File Upload:**
```
You: [Uploading quarterly_review.xlsx]

Scoop: 📊 Creating personal dataset...
✅ "Q4 Review Data" ready for analysis

This dataset includes:
• 15,420 records
• 12 analysis-ready columns
• Date range: Oct-Dec 2024

What would you like to explore?
```

**From Analysis Results:**
```
You: Save this filtered view as a dataset

Scoop: 💾 Saved as "High-Value Customers"
This personal dataset contains:
• 342 customers
• Filtered: LTV > $50,000
• All original columns preserved

[📊 Switch to New Dataset] [🔙 Keep Current]
```

*Screenshot: Personal dataset created from uploaded file*

### 🗂️ Organizing Personal Datasets

**Naming Best Practices:**
```
✅ Good Names:
• "2024_Q4_Sales_Analysis"
• "Customer_Segmentation_Dec"
• "Marketing_Campaign_Results"

❌ Avoid:
• "data"
• "test"
• "final_final_v2"
```

**Dataset Actions:**
```
@Scoop rename dataset to "Executive Dashboard Data"
@Scoop add description "Monthly KPIs for board meeting"
@Scoop tag dataset with #finance #monthly
@Scoop delete old datasets
```

### 🔄 Dataset Lifecycle

```
Personal Dataset: Marketing Leads
Created: Dec 1, 2024
Last Used: Dec 15, 2024
Size: 2.4 MB

⚠️ This dataset hasn't been used in 14 days

Options:
[📊 Use Dataset] [🔄 Update Data] [🗑️ Delete] [📤 Share]
```
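
The "hasn't been used in 14 days" warning boils down to a date comparison. The sketch below is an assumption about how such a check could work, not Scoop's actual logic; `STALE_AFTER_DAYS` and `is_stale` are hypothetical names, and the 14-day threshold is simply taken from the example above.

```python
from datetime import date

# Hypothetical threshold, taken from the example warning above.
STALE_AFTER_DAYS = 14

def days_unused(last_used: date, today: date) -> int:
    """Whole days elapsed since the dataset was last used."""
    return (today - last_used).days

def is_stale(last_used: date, today: date, threshold: int = STALE_AFTER_DAYS) -> bool:
    """True once the dataset has gone unused for at least `threshold` days."""
    return days_unused(last_used, today) >= threshold

print(days_unused(date(2024, 12, 15), date(2024, 12, 29)))  # 14
print(is_stale(date(2024, 12, 15), date(2024, 12, 29)))     # True
```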

## Channel-Mapped Datasets

### 🎯 Automatic Context

**How Mapping Works:**
```
#sales-team → CRM Dataset
#marketing → Campaign Dataset
#support → Ticket Dataset
#finance → Revenue Dataset
#product → Usage Dataset
```

**Smart Detection:**
- Channel name analysis
- Member role detection
- Historical query patterns
- Admin preferences

### 🔧 Configuration

**For Admins:**
```
@Scoop map dataset "Enterprise CRM" to #enterprise-sales
@Scoop set default dataset for #analytics
@Scoop show channel mappings
```

**Override When Needed:**
```
In #sales channel:
@Scoop use marketing dataset for this query
@Scoop temporarily switch to support data
```
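
Conceptually, channel mapping is a lookup with an optional per-query override. The Python sketch below illustrates that idea under stated assumptions: the `CHANNEL_DEFAULTS` table mirrors the mapping example above, while `resolve_dataset` is a hypothetical helper and not part of Scoop.

```python
from typing import Optional

# Defaults copied from the mapping example above; the structure is an assumption.
CHANNEL_DEFAULTS = {
    "#sales-team": "CRM Dataset",
    "#marketing": "Campaign Dataset",
    "#support": "Ticket Dataset",
    "#finance": "Revenue Dataset",
    "#product": "Usage Dataset",
}

def resolve_dataset(channel: str, override: Optional[str] = None) -> Optional[str]:
    """Pick the dataset for a query: an explicit override wins over the channel default."""
    if override:
        return override
    return CHANNEL_DEFAULTS.get(channel)  # None when the channel has no mapping

print(resolve_dataset("#sales-team"))                      # CRM Dataset
print(resolve_dataset("#sales-team", "Campaign Dataset"))  # Campaign Dataset
```

Returning `None` for an unmapped channel leaves it to the caller to ask the user which dataset to use.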

## Advanced Dataset Features

### 🔍 Dataset Discovery

```
You: Which dataset has customer churn data?

Scoop: 🔍 Found churn data in 3 datasets:

1. Customer Analytics (Primary)
   • churn_flag column
   • churn_date column
   • churn_reason field

2. Sales CRM
   • account_status (includes "Churned")
   • close_lost_reason

3. Support Tickets
   • cancellation_tickets table

Recommend using Customer Analytics for comprehensive analysis.

[Use Customer Analytics] [Compare All Three]
```

### 🔄 Dataset Relationships

```
You: Show relationships in this dataset

Scoop: 🔗 Analyzing data structure...

Key Relationships Found:
• Customer ID links to multiple orders
• Product categories have hierarchies
• Time-based patterns detected

Understanding these helps you:
- Ask more sophisticated questions
- Navigate data relationships
- Build better queries

[Explore Relationships] [View Schema]
```

### 📊 Dataset Profiling

```
@Scoop profile this dataset

Dataset Profile: Sales Pipeline
━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Table Structure:
• 8 tables, 127 total columns
• Primary table: opportunities (45K rows)
• Most joined: accounts (92% of queries)

Data Quality:
• Completeness: 94%
• Missing: phone (45%), website (23%)
• Duplicates: 0.2% (78 records)

Usage Patterns:
• Top queries: Revenue, Pipeline, Forecast
• Peak usage: Mon 9-11 AM
• Power users: Sarah, Mike, Lisa

Optimization Suggestions:
• Add index on close_date
• Archive records older than 2 years
• Standardize industry field
```
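
If you want to sanity-check data quality figures like the ones above on your own export, the following Python sketch shows one way to compute a missing-value rate and a duplicate count. It is only an illustration: the helper names, the sample rows, and the definition of "missing" (absent, `None`, or empty string) are assumptions, not how Scoop profiles data.

```python
from collections import Counter

def missing_rate(records: list[dict], column: str) -> float:
    """Share of records where `column` is absent, None, or an empty string."""
    if not records:
        return 0.0
    empty = sum(1 for r in records if r.get(column) in (None, ""))
    return empty / len(records)

def duplicate_count(records: list[dict], key: str) -> int:
    """Number of extra records that repeat an already-seen key value."""
    counts = Counter(r.get(key) for r in records)
    return sum(n - 1 for n in counts.values() if n > 1)

# Hypothetical sample rows for demonstration.
rows = [
    {"id": 1, "phone": "555-0100", "website": "a.example"},
    {"id": 2, "phone": None, "website": ""},
    {"id": 2, "phone": "", "website": "b.example"},
    {"id": 3, "phone": "555-0101", "website": "c.example"},
]
print(missing_rate(rows, "phone"))    # 0.5
print(missing_rate(rows, "website"))  # 0.25
print(duplicate_count(rows, "id"))    # 1
```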

### 🎨 Custom Views

**Create Filtered Views:**
```
You: Create a view of enterprise accounts only

Scoop: 🎨 Creating custom view...

View: "Enterprise Accounts"
Filters: 
- revenue > $1M
- employees > 500
- type = "Enterprise"

Results: 342 accounts (8% of total)

This view is now available as a virtual dataset.
[Use Enterprise View] [Edit Filters]
```
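
The three filters in the example view can be read as a single predicate. Here is a minimal Python sketch, assuming the filters are combined with AND (the example does not say so explicitly); the `Account` class and `is_enterprise` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Account:
    name: str
    revenue: float   # annual revenue in USD
    employees: int
    type: str

def is_enterprise(account: Account) -> bool:
    """All three filters from the example view must hold (AND is an assumption)."""
    return (
        account.revenue > 1_000_000
        and account.employees > 500
        and account.type == "Enterprise"
    )

# Hypothetical sample accounts.
accounts = [
    Account("Acme", 5_000_000, 1200, "Enterprise"),
    Account("Initech", 2_000_000, 300, "Enterprise"),
    Account("Hooli", 800_000, 900, "Mid-Market"),
]
enterprise_view = [a.name for a in accounts if is_enterprise(a)]
print(enterprise_view)  # ['Acme']
```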

**Saved Calculations:**
```
You: Add customer health score to this dataset

Scoop: 🧮 Adding calculated field...

Health Score = 
  (Usage × 0.4) + 
  (NPS × 0.3) + 
  (Support × 0.2) + 
  (Payment × 0.1)

✅ Field added and available for all queries
```
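
The weighted formula above can be written out as a short Python sketch. This is an illustration rather than Scoop's implementation: the `health_score` function is hypothetical, and it assumes each component has already been normalized to the same 0-100 scale.

```python
# Weights come from the formula above; the 0-100 input scale is an assumption.
WEIGHTS = {"usage": 0.4, "nps": 0.3, "support": 0.2, "payment": 0.1}

def health_score(components: dict[str, float]) -> float:
    """Return the weighted sum of the four component scores."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

score = health_score({"usage": 80, "nps": 60, "support": 90, "payment": 100})
print(round(score, 1))  # 78.0
```

Because the weights sum to 1.0, the score stays on the same scale as its inputs.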

## Dataset Best Practices

### 🎯 Choosing the Right Dataset

**Match Dataset to Question:**
```
Revenue questions → Financial dataset
Customer behavior → CRM dataset
Campaign performance → Marketing dataset
Product usage → Analytics dataset
```

**Start Broad, Then Narrow:**
```
1. Use comprehensive dataset
2. Explore available fields
3. Create filtered view if needed
4. Save as personal dataset for reuse
```

### 🔍 Dataset Exploration

**First Time with Dataset:**
```
"describe this dataset"
"show me all tables"
"what are the key metrics?"
"show sample records"
"what questions can I answer?"
```

**Understanding Relationships:**
```
"how are tables connected?"
"show me the data model"
"what are the primary keys?"
"explain the relationships"
```

### ⚡ Performance Tips

**Query Optimization:**
- Start with recent data
- Use dataset-specific filters
- Leverage pre-aggregated fields
- Avoid SELECT * patterns

**Large Dataset Strategies:**
```
"sample 1000 records first"
"analyze last 30 days only"
"use the summary table"
"what's the optimized query?"
```
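
The first two strategies, sampling and limiting the date range, are easy to reproduce on any exported data. The Python sketch below is a hedged illustration; `sample_records`, `last_n_days`, and the `date` field name are hypothetical and do not describe how Scoop executes these requests.

```python
import random
from datetime import date, timedelta

def sample_records(records: list, n: int = 1000, seed: int = 0) -> list:
    """Return up to `n` records chosen at random; a fixed seed keeps it reproducible."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))

def last_n_days(records: list[dict], today: date, days: int = 30,
                field: str = "date") -> list[dict]:
    """Keep only records whose `field` falls within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return [r for r in records if r[field] >= cutoff]

# Hypothetical data for demonstration.
print(len(sample_records(list(range(5000)))))  # 1000
recent = last_n_days(
    [{"date": date(2024, 12, 1)}, {"date": date(2024, 10, 1)}],
    today=date(2024, 12, 15),
)
print(recent)
```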

## Troubleshooting Datasets

### 🚨 Common Issues

**"Dataset not found"**
```
Solutions:
1. Check exact dataset name
2. Verify permissions
3. Confirm workspace access
4. Try partial name match
```

**"No data returned"**
```
Debug steps:
@Scoop check dataset status
@Scoop when was last refresh?
@Scoop show sample data
@Scoop verify my filters
```

**"Access denied"**
```
Options:
• Request access from admin
• Check Slack group membership
• Verify organization settings
• Use personal dataset instead
```

### 🔧 Dataset Maintenance

**For Admins:**
```
@Scoop audit dataset usage
@Scoop clean unused datasets
@Scoop optimize slow datasets
@Scoop check sync health
```

**For Users:**
```
@Scoop report data issue
@Scoop suggest new dataset
@Scoop request refresh
@Scoop why is data missing?
```

## Advanced Workflows

### 🔄 Dataset Comparison

```
You: Compare sales in CRM vs Finance dataset

Scoop: 📊 Comparing datasets...

CRM Sales Dataset:
• Total: $12.4M
• Deals: 1,234
• Period: Real-time

Finance Dataset:
• Total: $11.9M
• Invoices: 1,198
• Period: Month-end close

Variance: $500K (4%)
Missing in Finance: 36 deals (pending invoice)

[View Details] [Reconcile] [Export Differences]
```
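
The numbers in this comparison follow from simple arithmetic and a set difference. The sketch below reproduces them in Python under two assumptions: the percentage is taken relative to the CRM total (the example does not state the base), and deals are matched to invoices by a shared ID. Both helper names are hypothetical.

```python
def compare_totals(crm_total: float, finance_total: float) -> tuple[float, float]:
    """Return (absolute variance, variance as a percentage of the CRM total)."""
    variance = crm_total - finance_total
    return variance, variance / crm_total * 100

def missing_ids(crm_deal_ids: set[str], invoiced_deal_ids: set[str]) -> set[str]:
    """Deals present in the CRM that have no matching invoice."""
    return crm_deal_ids - invoiced_deal_ids

variance, pct = compare_totals(12_400_000, 11_900_000)
print(f"Variance: ${variance:,.0f} ({pct:.0f}%)")  # Variance: $500,000 (4%)
print(1_234 - 1_198)  # 36 deals without an invoice, matching the example
```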

### 🎭 Dataset Personas

**Create Role-Based Views:**
```
Executive View:
- High-level KPIs only
- Trends and forecasts
- Exception reporting

Analyst View:
- All detailed data
- Custom calculations
- Raw access

Sales View:
- Territory-filtered data
- Real-time pipeline
- Commission-relevant fields
```

### 🤖 Smart Dataset Selection

```
You: Analyze customer satisfaction

Scoop: 🤔 Found relevant data in multiple datasets:

1. Support Tickets (CSAT scores)
2. NPS Survey Results  
3. Product Usage (engagement)
4. CRM (renewal data)

Would you like to:
[Analyze Support CSAT] [Combine All Sources] [Compare Datasets]
```

## Next Steps

Ready to become a dataset power user?

1. 📤 **[Upload Your First File](uploading-and-analyzing-files.md)** - Create personal datasets
2. 📊 **[Master Visualizations](visualizations-exports.md)** - Beautiful charts from any dataset
3. 🤖 **[ML on Datasets](machine-learning-analytics.md)** - Advanced analytics
4. 🚀 **[Advanced Features](advanced-features.md)** - Deep reasoning capabilities

---

**Pro tip**: The right dataset makes all the difference. Spend 30 seconds choosing the correct dataset and save 30 minutes of analysis time! 🎯