03 Dec Enterprise Data Warehousing Architectures: On-Premise Vs Cloud
Digital Transformation is the buzzword across all forms of enterprises. Enterprises of any type thrive on data and associated digital reports which drive the business towards better business decisions. Scattered and unorganized data leads to chaos and confusion, and will not help in any way towards better business results. There comes the demand for a strong and robust Data Warehousing Architecture.
We will dissect this concept of enterprise data management and then deep dive into the main two types of Enterprise Data Warehousing Architecture(EDWA). This article discusses in-depth about the processes and benefits of Traditional Enterprise Data Warehousing Architecture and EDWA based on Cloud. Also, this article throws light on who would benefit from which type of Enterprise Data Warehousing Architecture.
Data Warehouse – Revealing the Concept
Enterprise Data Warehouse is the backbone of Business Intelligence. This component is mainly used for Intelligent Reporting and In-depth data analysis. Data warehouses are central repositories that store and station integrated data that comes from several disparate sources. This has designated spaces to store historical and current data which are then analyzed to create accessible reports for entire enterprise teams.
Few key benefits of EDW include:
- Improved data quality in the form of consistent codes and descriptions, flagging and fixing bad data
- Provides a single unique data model regardless of the source of data
- Restructuring the data to make it useful for business users. This type of data delivers excellent query performance for complex analytic queries
- Adds value to operational business applications like customer relationship management (CRM) systems
Now let us have a look at On-Premise DW architecture
Traditional data warehouses are built based on some of the following established ideas and design principles. Let us discuss this very famous Three-Tier-Architecture.
Three-Tier Data Warehouse Architecture
This structure contains the following tiers:
- Bottom Tier – this contains the database server which is used to extract data from various sources, such as from transactional databases that are used for front end applications.
- Middle Tier – this station is an OLAP server that transforms data into a structure that is perfectly suitable for complex querying and analysis. OLAP server normally works in 2 ways: a) either as an extended relational database management system which maps operations on multidimensional data to standard relational operations, or b) by making use of a multidimensional OLAP model which directly executes multidimensional data and operations.
- Top Tier – this is the client layer that holds the tools used for high-level data analysis, querying and reporting and data mining.
Data Warehousing Models
We will discuss 3 common data warehouse models, virtual warehouse, data mart, and enterprise data warehouse.
- The virtual database allows users to efficaciously access all data from the data warehouse. Here data is actually stored in a set of separate databases and can be queried together.
- The Datamart model is mainly useful for business-line specific reporting and further analysis. Here the data is aggregated from a wide range of source systems relevant to a particular business area.
- Enterprise data warehouse model contains aggregated data from all business units and it spans across the entire organization.
Loading Data into a Warehouse
Data contained in the warehouse is uploaded from many different operational systems, for example, marketing/sales. This stored data passes through an operational data store and often requires data cleansing to get it ready for additional operations. This is done to ensure data quality before data is used for reporting in the data warehouse. Main 2 approaches used towards building a data warehouse system are:
- ETL – Extract, Transform, Load
- E-LT – Extract, Load, Transform
ETL extracts data from a pool of data sources, mainly transactional databases. This data is then stored in a temporary staging database. Then it is off to the transform process which structures and converts the data into a prescribed format for the target data warehouse system. This structured data is then loaded into the data warehouse and is ready for analysis.
ELT loads data immediately into the single, centralized repository after being extracted from the source data pools. This data is then transformed inside the warehouse system using business intelligence tools and analytics.
Structuring a Traditional Data Warehouse
There are 2 ways to structure a data warehouse: Star schema and Snowflake schema
Star schema boasts of a centralized data repository that is stationed in a fact table. The fact table is being split into a series of denormalized dimension tables by this schema. The fact table consists of aggregated data that are used for reporting purposes. The dimension table actually depicts the stored data. Denormalized designs are simple mainly because data is grouped. The fact table makes use of only one link to join to each dimension table. And this simple design makes it easier to write complex queries.
Snowflake schema normalizes data by efficiently organizing data. In this way, every data dependencies are well defined and each table consists of minimal redundancies. Single dimension tables branch out into disparate dimension tables. Snowflake schema actually uses very less disk space and maintains data integrity. But queries are quite complex and this makes it a little difficult to access required data since there are different joins.
Cloud Data Warehouse Architecture
Cloud Technology is paving its way towards creating modern data warehouse architecture. Cloud-based warehouse architecture is one way to efficiently utilize data warehousing resources. Organizations that are looking for modernizing their operations often optimize their transition from on-premises to cloud-based data warehouses. For this, specialized cloud management solutions that are designed exclusively to manage the movement of data in the cloud are deployed.
Let us discuss this architecture used by the few most popular cloud-based warehouses, Amazon Redshift and Microsoft Azure
Amazon Redshift
Components of Amazon Redshift are as shown in this illustration:
- Client applications: Amazon Redshift is based on industry-standard PostgreSQL. It integrates with several ETL and data loading tools and many other Business Intelligence Reporting, data mining and analytics tools.
- Connections: Amazon Redshift interacts constantly with client applications via industry-standard JDBC and ODBC drivers for PostgreSQL.
- Clusters: These are the core infrastructure components of an Amazon Redshift data warehouse and are composed of compute nodes.
- Leader node: This handles communications with client programs and computes nodes, also it develops execution plans to conduct database operations. Also, it compiles code for individual elements and assigns them to compute nodes.
- Compute nodes: This executes compiled nodes and sends results to the leader node for aggregation.
- Node Slices: Compute node is divided into slices, each slice has a portion of the node’s memory and disk space where it processes the workload assigned to that node.
- Internal Network: Amazon redshift can provide high-speed network communication between leader node and compute nodes by utilizing high bandwidth connections, close proximity and custom communication protocols.
- Databases: Cluster contains databases. SQL client interacts with the leader node and coordinates query execution with compute nodes.
Azure Synapse
This is a unique analytics service that combines enterprise data warehousing and big data analytics. You can query data on your terms by making use of serverless on-demand or provisioned resources at scale.
Main highlights of Azure Synapse:
- Infinite Scale: With this kind of boundless scaling capabilities you can deliver useful insights from all your data across numerous data warehouses and big data analytics systems with fathomless speed.
- Unsurpassed Security: With the right azure consulting services, you can secure your precious data with most sophisticated security and privacy features in the market, for instance, column and row-level security and dynamic data masking.
- Substantial Insights: Augment the process of locating insights from your entire data and apply machine learning models to your entire intelligent apps.
- Integrated experience: Considerably reduce project development time with a unified experience in order to develop end-to-end analytics packages.
Traditional data warehouse tools are still in demand and are deployed at several large organizations that are capable of handling huge infrastructure costs, including maintenance and operational costs. However, the latest cloud-based tools permit enterprises to plan and set up a data warehouse in a matter of a few days with almost nil upfront investment. Cloud-based EDW are highly scalable and have greater query performance and storage capacity. These cloud-based solutions are perfect for those enterprises which are mainly startups or those who are not willing to go for huge upfront investments. Apart from this, many progressive large enterprises are already moving from on-premise to cloud to save on costs and to utilize scalable storage facilities which would give them a better time-to-market advantage.