Blending Two Datasets

Combine data from multiple sources into unified datasets

Data blending combines data from two different datasets—often from entirely different applications—into a single unified dataset. This enables analysis that would be impossible when data lives in separate systems.

Why Blend Data?

Use CaseSource 1Source 2Result
Customer 360CRM dealsSupport ticketsComplete customer view
Sales + MarketingPipeline dataCampaign dataAttribution analysis
Finance + OperationsBudget dataActual spendVariance analysis
Product + RevenueUsage metricsBilling dataRevenue per user

How Blending Works

┌────────────────┐     ┌────────────────┐
│   Dataset A    │     │   Dataset B    │
│  (e.g., CRM)   │     │ (e.g., Support)│
└───────┬────────┘     └───────┬────────┘
        │                      │
        ▼                      ▼
┌────────────────┐     ┌────────────────┐
│   Query A      │     │   Query B      │
│ (filter, cols) │     │ (filter, cols) │
└───────┬────────┘     └───────┬────────┘
        │                      │
        └──────────┬───────────┘
                   ▼
         ┌─────────────────┐
         │ Blend Condition │
         │  (JOIN logic)   │
         └────────┬────────┘
                  ▼
         ┌─────────────────┐
         │ Output Formulas │
         │ (new columns)   │
         └────────┬────────┘
                  ▼
         ┌─────────────────┐
         │ Blended Dataset │
         └─────────────────┘

Key Concepts

ConceptDescription
Source queriesDefine what data from each dataset to include
Blend conditionSpecifies which rows from A match which rows from B
Output formulasCalculate the values for your new blended dataset

The Blending Process

Step 1: Create New Dataset

Select Existing Scoop Data when creating a new dataset:

Name your blended dataset:

Select Blended to combine two datasets:

💡

Aggregated Option: If you only need to filter/aggregate a single dataset, select "Aggregated" instead.

Step 2: Configure First Query

Define what data to pull from your first dataset:

See Dataset Queries for query configuration details.

Step 3: Configure Second Query

Define what data to pull from your second dataset:

Step 4: Define Blend Condition

The blend condition determines which rows from Dataset A match which rows from Dataset B:

How Blend Conditions Work

Scoop evaluates every row in Query A against every row in Query B. When the condition is TRUE, an output record is created.

Common blend conditions:

TypeFormula ExampleUse Case
Exact match=A2=B2Match on ID columns
Text cleanup=TRIM(UPPER(A2))=TRIM(UPPER(B2))Handle formatting differences
Date match=A2=B2Match on date columns
Range match=AND(A2>=B2, A2<=C2)Value falls in range

Match Examples

Simple ID Match:

=CustomerID_A = CustomerID_B

Match with Cleanup:

=TRIM(UPPER(Account_Name_CRM)) = TRIM(UPPER(Company_Support))

Date Period Match:

=AND(Transaction_Date >= Period_Start, Transaction_Date <= Period_End)

Step 5: Define Output Formulas

Create columns for your blended output using spreadsheet formulas:

Output formula examples:

Column NameFormulaDescription
Customer_ID=A2Carry forward from Query A
Total_Value=D2+E5Sum values from both sources
Support_Ratio=E5/D2Calculate ratio across sources
Combined_Name=CONCATENATE(A2," - ",B5)Combine text fields

Step 6: Preview Results

Click the Preview Data tab to verify your blend produces expected results:

Check for:

  • Expected number of output rows
  • Correct values in calculated columns
  • No unexpected NULLs or errors

Step 7: Configure Auto-Refresh

OptionBehavior
Reprocess on Source A updateBlend refreshes when first dataset updates
Reprocess on Source B updateBlend refreshes when second dataset updates
Both checkedBlend always reflects latest data

Advanced Blending

Using the Full Spreadsheet

Click the spreadsheet icon in the upper left to open the full Google Sheets workbook. Here you can:

  • Create helper calculations
  • Add lookup tables
  • Build complex multi-step formulas
  • Document your logic

Aggregation Before Blending

Sometimes you need to aggregate data before blending (e.g., sum transactions to monthly totals):

  1. In the Query configuration, enable Aggregate by selected attributes
  2. Select grouping columns and aggregation period
  3. Numbers will be summed, giving you summarized data to blend

Handling Many-to-Many Relationships

When one record in A matches multiple records in B:

Example: One customer, multiple support tickets

Customer (A)Tickets (B)Output Rows
Acme CorpTicket 1Acme Corp + Ticket 1
Acme CorpTicket 2Acme Corp + Ticket 2
Acme CorpTicket 3Acme Corp + Ticket 3

If you want one row per customer, aggregate Query B first to get ticket counts.

Common Blend Scenarios

CRM + Support Data

Goal: Add support ticket counts to CRM accounts

Query A (CRM): Accounts with ID, Name, Revenue Query B (Support): Tickets aggregated by Account, with Count

Blend Condition: =Account_ID_CRM = Account_ID_Support

Output: Account ID, Name, Revenue, Ticket Count, Revenue per Ticket

Budget vs. Actual

Goal: Compare planned budget to actual spending

Query A (Budget): Department, Category, Budgeted Amount Query B (Finance): Department, Category, Actual Spend

Blend Condition: =AND(Dept_A=Dept_B, Cat_A=Cat_B)

Output: Department, Category, Budget, Actual, Variance

Product Usage + Revenue

Goal: Calculate revenue per active user

Query A (Product): Account ID, Active Users, Features Used Query B (Billing): Account ID, Monthly Revenue

Blend Condition: =Account_A = Account_B

Output: Account, Users, Revenue, Revenue per User

Troubleshooting

No Output Rows

Symptom: Blend produces zero results.

Causes:

  • Blend condition never evaluates to TRUE
  • Key columns have different formats (case, spacing, data type)
  • Filters too restrictive in source queries

Solutions:

  1. Preview each source query separately
  2. Check sample key values match exactly
  3. Use TRIM/UPPER to normalize text comparisons

Too Many Output Rows

Symptom: Blend produces far more rows than expected.

Causes:

  • Many-to-many relationship without aggregation
  • Blend condition too loose

Solutions:

  1. Aggregate one or both sources
  2. Add additional match criteria to blend condition

Missing Data

Symptom: Some expected matches are missing.

Causes:

  • Key values don't match (typos, formatting)
  • Date ranges don't overlap
  • Filters excluding needed records

Solutions:

  1. Check source data for the missing records
  2. Verify blend condition handles edge cases
  3. Adjust filters to include more data

Performance Issues

Symptom: Blend takes very long to process.

Causes:

  • Large datasets without aggregation
  • Complex blend conditions

Solutions:

  1. Aggregate data before blending
  2. Filter to necessary time periods
  3. Simplify blend conditions

Best Practices

Design for Maintainability

  • Document blend logic in the spreadsheet
  • Use clear, consistent naming
  • Test with sample data before full processing

Optimize Performance

  • Aggregate when possible to reduce row counts
  • Filter source queries to necessary data
  • Avoid unnecessary columns in output

Ensure Data Quality

  • Validate key columns match formats
  • Check for duplicates in source data
  • Preview results before deploying

Related Topics