Blending Two Datasets

Spreadsheet Data Preparation to Blend Two Datasets

One of the most powerful aspects of Scoop is it's ability to blend data from two different datasets, and accordingly from two different source applications. Blending data is similar to adding calculated columns in that it uses a sheetlet template to allow you to create formulas that calculate records in your new dataset. However there are a couple of key differences:

  • Instead of the source being a single source report, you can specify two inputs
  • While each input is tied to a source dataset, you can aggregate and filter from those datasets using queries

So, to blend two datasets the process is:

  1. Specify a query for each of your source datasets the indicates which columns you want to use, how the data should be filtered and whether it should be aggregated
  2. Create a blending condition in the sheetlet template that specifies when an output record should be created - i.e. a condition, that when true, results in an output record
  3. Create formulas for your resulting dataset by specifying column names and formulas for the output records
  4. Processing the dataset

To create a blended dataset select the "Existing Scoop Data" option for creating a new dataset, since you will be creating a dataset that is based on data that is already in Scoop in other datasets.

Specify a name and optionally a description for your new dataset.

From here, you can select blended to blend two different queries together.

Because Scoop can aggregate and filter data in a query (see Dataset Queries), sometimes it may be useful just to do that on a single source. In which case, you can select "Aggregated" as an option instead. In that case you will only have to specify a single dataset query and can construct an aggregated dataset as a result. For the purposes here, we will blend two queries.

You can then construct the two source queries for your new dataset. See Dataset Queries for more information how to aggregate and filter using the dataset query screen. First, specify the first query from your datasets.

Then, specify the second query.

Now, one the blending page, you can specify a blending condition as below. Basically, Scoop evaluates all the records in the first query against all the records in the second query. Where the blending condition is true, an output record is created. This allows you to create a variety of ways to link sources together. Most commonly you will simply test if two rows, generally some sort of identifier, are equal. However, you may need to do some additional calculations to clean up some fields in order to make them match.

Finally, just like when adding calculated columns, there is an output record row. That row is calculated every time the blending condition is true so that you can create precisely the output that you would like in your dataset.

As with calculated columns, there is a full spreadsheet that is behind this sheetlet template. If you would like to open it up you can also click on the green spreadsheet icon in the upper left. That will open up the underlying spreadsheet in Google sheets where you can create side calculations if you like. After you specify your blend you can click on the Preview Data tab to make sure that you are generating the expected results.

Finally, when you are happy with your blended data conditions, you can name and optionally add a description to your new dataset.

You can also optionally select one of the checkboxes at the bottom. When checked, Scoop will trigger the reprocessing of this dataset when one or both of the source datasets is updated. In this way you can ensure that this dataset is always kept up-to-date.