Joins, Blends and Relationships in Tableau

Image

There are several ways to combine data depending on our requirements and the structure of the data.

Relationships

Relationships are a dynamic, flexible way to combine data from multiple tables for analysis. We recommend using relationships as your first approach to combining your data because it makes data preparation and analysis easier and more intuitive.

Characteristics:

Relationships -

  • Are displayed as flexible noodles between logical tables.
  • Requires you to select matching fields between two logical tables.
  • Do not require you to select join types.
  • Make all row and column data from related tables potentially available in the data source.
  • Maintain each table's level of detail in the data source and during the analysis.
  • Create independent domains at multiple levels of detail. Tables aren't merged in the data source.
  • Create the appropriate joins automatically based on the fields in use during analysis.
  • Do not duplicate aggregate values (when Performance Options are set to Many-to-Many).
  • Keep unmatched measure values (when Performance Options are set to Some Records Match.

Cross-Database Relationships:

Tableau allows table combining in different data sources using relationships, despite limitations from the database side on which platforms are compatible. Cross-database relationships require a multi-connection data source, i.e., you create a new connection to each database before you combine the tables.

Joins

Joins are a more static way to combine data. Joins must be defined between physical tables before analysis and can't be changed without impacting all sheets using that data source. Joined tables are always merged into a single table. As a result, sometimes joined data lacks unmatched values or duplicates aggregated values.

Characteristics:

Joins - 

  • Are displayed with Venn diagram icons between physical tables.
  • Require you to select join types and join clauses.
  • Joined physical tables are merged into a single logical table with a fixed combination of data.
  • May drop unmatched measure values.
  • May duplicate aggregate values when fields are at different levels of detail.
  • Support scenarios that require a single table of data, such as extract filters and aggregation.

Data Model

When you create a data source, it has two layers. The top layer is the logical layer of the data source. You combine data between tables in the logical layer using relationships. The second layer is the physical layer of the data source. You combine data between tables at the physical layer using joins.

Cross-Database Joins:

Tableau allows joins from tables in different data sources despite limitations from the database side on which platforms are compatible. Cross-database joins require a multi-connection data source—you create a new connection to each database before joining the tables.

Once you've connected to the first data source, use the Add option in the data pane to add another connection.

Cross-Database Joins

Note: If the connector you want is unavailable from the Connect list when trying to add another one, cross-database joins are not supported for that combination of sources. This includes connections to cube data (e.g., Microsoft Analysis Services), most extract-only data (e.g., Google Analytics and OData), and published Tableau Server data sources.

This creates a second connection rather than an entirely different data source. You can switch between the two (or more) connections while on the data source tab.

Cross-Database Joins

Once you move to a worksheet and begin analysis, the data source functions as a single, combined data source. This is in contrast to the two independent data sources that can be toggled on a worksheet.

Cross-Database Joins

Note: Typically, joining tables from the same database yields better performance. This is because querying data stored on the same database takes less time and leverages the native capabilities of the database to perform the join.

Blends:

Blends, unlike relationships or joins, never truly combine the data. Instead, blends query each data source independently, the results are aggregated to the appropriate level, and then the results are presented visually in the view. Because of this, blends can handle different levels of detail and work with published data sources. Blends are also established individually on every sheet and can never be published because there is no true "blended data source", just blended results from multiple data sources in a visualization.

Data blending is particularly useful when the blended relationship — linking fields — needs to vary on a sheet-by-sheet basis or when combining published data sources.

When to Use:

Relationships:

  • When relating tables, the fields that define the relationships must have the same data type. Changing the data type on the Data Source page does not change this requirement. Tableau will still use the data type in the underlying database for queries.
  • You can't define relationships based on geographic fields.
  • Circular relationships aren't supported in the data model.
  • You can't define relationships between the published data sources.

Blends:

  • It should be used when analyzing the data available from different sources.
  • Ensure that there is a common field called "Linking field" while performing data blending.
  • Data blending performs like a left join operation and does not accept or perform any other type of join.
  • Since data comes from different sources, both sets do not need the same level of detail. Data blending operation is independent of granularity.

Joins:

  • It is primarily used when merging data sets from the same source.
  • Data has to be maintained at the same level of granularity.
  • Joins data at a row level.

Note: Before version 2020.2, data blending was often the best way to handle data sources at different levels of detail. These can now be combined with relationships. Relationships have fewer technical limitations than data blending and are the recommended way of combining data when possible. Blending is only encouraged when it is the best method for your data or relationships are not available.

Data Blending Vs. Cross-database joins:

Data blending is useful under the following conditions:

  • You want to combine data from different databases that are not supported by cross-database joins - Cross-database joins do not support connections to cubes (for example, Oracle Essbase) or to some extract-only connections (for example, Google Analytics). In this case, set up individual data sources for the data you want to analyze, and then use data blending to combine the data sources on a single sheet.
  • Data is at different levels of detail - Sometimes one data set captures data using various levels of detail, i.e., greater or lesser granularity than the other data set. For example, if you are analyzing transactional and quota data. Transactional data might capture all transactions. However, quota data might aggregate transactions at the quarter level. Because the transactional values are captured at different levels of detail in each data set, you should use data blending to combine the data.

Note: Use relationships wherever it's possible. Use Joins if relationships are not possible. Use blending only if there is no other way.

When to Substitute Joining for Blending

1. Data needs cleaning

If your tables do not match correctly after a join, you should set up the data sources for each table, make any necessary customizations ( renaming columns, changing column data types, creating groups, using calculations, etc.), and then use data blending to combine the data.

2. Joins cause duplicate data

Duplicate data after a join is a symptom of data at different levels of detail. If you notice duplicate data, instead of creating a join, use data blending to blend on a common dimension instead.

3. You have lots of data

Typically, joins are recommended for combining data from the same database. Joins are handled by the database, which allows the joins to leverage some of the database's native capabilities.

However, when working with large data sets, joins can strain the database and significantly affect performance. In this case, data blending might help. Because Tableau handles combining the data after it is aggregated, there is less to combine. When there is less data to combine, generally, performance improves.

When to Substitute Joining for Blending

  • Some data blending limitations exist around non-additive aggregates, such as MEDIAN and RAWSQLAGG.
  • Data Blending compromises the speed of query in high granularity.
  • When you try to sort by a calculated field that uses blended data, the calculated Field is not listed in the Field drop-down list of the Sort dialog box.
  • Cube data sources can only be used as the primary data source for blending data in Tableau. They cannot be used as secondary data sources.

Written by:

Shruthi R

BI Developer

LinkedIn

Related Post

Leave a Reply