A data warehouse is a data management system created to facilitate and assist business intelligence (BI) activities, particularly analytics. Data warehouses frequently include significant volumes of historical data and are used only to conduct queries and analysis. The data in a data warehouse typically originates from a variety of sources, including transaction apps and application log files.
What is a data warehouse?
A data warehouse (DW or DWH), also known as an enterprise data warehouse (ETW), is a data storage accumulated from a large variety of heterogeneous sources, aimed at helping business owners drive decision-making processes, by transforming data to information and as such it is mainly used for reporting and data analysis tasks.
Data warehouse main components
- Integration layer (data load) – Extraction, loading and transformation of data into the warehouse.
- Data manager – Balances and prepares the data to be consumed, as well as environment maintenance tasks such as index management, backup operations and aggregation.
- Analytical layer (compute and query) – Processing backend resources and internal query optimization tools, orchestrating the data analysis process.
- End-User tools – Either manual or automated techniques used by end-users to research data and generate insights.
On-prem vs. cloud data warehouse
On-premises data warehouses provide:
- Technological stack - control and maintenance
- Maximized resources - for speed and performance
- Regulatory compliance – as part of the organization perimeter
Cloud data warehouses provide:
- Scalability and cost efficiency – automatic scaling according to business needs along with cost prediction, including both storage facilities and compute technology
- Cloud eco-system – native integration with security and analytics offerings
- Built-in governance – physical and logical security controls, adhering to the highest levels with continuous improvement as the cloud providers invest across their offerings
- System uptime and availability – reliability and availability of 99.9% are common offering as part of the service provider SLA
Common cloud data warehouse solutions
Almost every cloud provider has a cloud data warehouse offering Some common examples are Snowflake, Google BiqQuery, Amazon RedShift, Azure Synapse, and IBM db2.
Data lakes and data warehouses
Both are built to store vast amounts of data. However, in data lakes the data is usually in raw format and prepared for consumption, whereas in data warehouses the data is pre-processed and filtered to support specific analytics purposes.