Azure SQL Data Warehouse

Azure SQL Data Warehouse is a cloud data warehousing service, available on Azure as a Platform as a Service (PaaS) offering. It is also a Massively Parallel Processing (MPP) system: a distributed system in which multiple computers, called nodes, work together to answer your queries. Because Azure SQL Data Warehouse is spread across multiple machines, it depends on distributed storage as well as distributed compute.

Azure SQL Data Warehouse works well with the analytical workloads typical of Big Data scenarios. It can store large volumes of data and perform query analysis and ad-hoc reporting across large datasets. It can also consolidate disparate data into a single location and shape, model, transform, and aggregate that data using simple SQL constructs.

Azure SQL Data Warehouse is a Massively Parallel Processing (MPP) system. An MPP system is composed of multiple machines rather than one large machine. Usually each machine holds a slice of the data in the database, and when a query comes in, its execution is distributed across the machines, and their partial results are combined into the query results you require. When you need to perform query analysis against large datasets, an MPP system scales very well as the amount of data increases.
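To make the idea of slicing data and distributing a query concrete, here is a minimal sketch in plain Python. It is not how Azure SQL Data Warehouse is implemented; the function names (`distribute`, `local_totals`, `merge_totals`), the sample `sales` rows, and the choice of three nodes are all assumptions made for illustration.

```python
from collections import defaultdict

def distribute(rows, key, node_count):
    """Give each node a slice of the data by hashing a distribution key."""
    slices = [[] for _ in range(node_count)]
    for row in rows:
        slices[hash(row[key]) % node_count].append(row)
    return slices

def local_totals(slice_rows):
    """Work done on one node: aggregate only the rows in its own slice."""
    totals = defaultdict(int)
    for row in slice_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

def merge_totals(partials):
    """Combine the partial results from every node into the final answer."""
    merged = defaultdict(int)
    for partial in partials:
        for region, amount in partial.items():
            merged[region] += amount
    return dict(merged)

# Hypothetical sample data, standing in for a large table.
sales = [
    {"order_id": 1, "region": "east", "amount": 10},
    {"order_id": 2, "region": "west", "amount": 5},
    {"order_id": 3, "region": "east", "amount": 30},
    {"order_id": 4, "region": "west", "amount": 20},
]

slices = distribute(sales, "order_id", node_count=3)
result = merge_totals(local_totals(s) for s in slices)
print(result)
```

Each node only ever touches its own slice, so adding nodes shrinks the work per node; the final merge step handles a small number of partial results rather than the full dataset.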

Data Warehouse Unit (DWU)
Azure SQL Data Warehouse expresses compute in Data Warehouse Units (DWUs). A DWU is a measure of the underlying compute power of the database.

How the System Orchestrates Queries
Control Node and Compute Nodes
The Control Node is a special category of node that receives all the connections and orchestrates the queries. It sends the work for each query to the Compute Nodes and receives the results back. It is therefore always in communication with the Compute Nodes that are part of the MPP system.

The Compute Nodes, on the other hand, read data from the distributions, perform all the processing, and scale with the DWUs. This is an important distinction to grasp: the Control Node stays the same regardless of how many DWUs you add, whereas you get more Compute Nodes as your DWUs increase.
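The effect of adding Compute Nodes can be sketched as a simple assignment problem: a fixed set of distributions is shared out among however many Compute Nodes exist. The function `assign_distributions` is hypothetical, and the default of 60 distributions is an assumption of this sketch rather than something stated above; the round-robin assignment is likewise only an illustration.

```python
def assign_distributions(compute_node_count, distribution_count=60):
    """Spread a fixed set of distributions evenly across Compute Nodes.

    distribution_count=60 is an assumed value used for illustration.
    """
    if compute_node_count < 1:
        raise ValueError("need at least one Compute Node")
    assignment = {node: [] for node in range(compute_node_count)}
    for distribution in range(distribution_count):
        # Round-robin: distribution i goes to node i mod node count.
        assignment[distribution % compute_node_count].append(distribution)
    return assignment

# More Compute Nodes means fewer distributions for each one to read.
for nodes in (1, 2, 6):
    per_node = [len(d) for d in assign_distributions(nodes).values()]
    print(f"{nodes} Compute Node(s): distributions per node = {per_node}")
```

The Control Node does not appear in the assignment at all, which mirrors the point above: scaling changes how the data is shared among Compute Nodes, not the node that orchestrates them.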

Methods for Loading Data
