When we are interested in integrating data from multiple operational systems into one centric data warehouse, the most common method is to use ETL. ‘E’ stands for Extracting the data from the source systems, ‘T’ stands for Transforming the extracted data to a robust analytic data model, and finally the ‘L’ stands for Loading the data model into a data warehouse. After doing that, all data in the data warehouse is ready for analytics queries, and for dashboards & reports of BI tools such as Power BI, Tableau, Pyramid, Qlik, and Quick Sight.

Significant development in cloud implementations

In recent years there has been a significant development in cloud implementations, and more and more companies understand the benefits of transferring their systems to the cloud. There is also a lot of news in the field of BI!

Traditional data warehouse servers

Before using the cloud, traditional data warehouse servers such as Oracle or SQL Server were limited in resources depending on the physical server where they were installed. In order not to harm the end users due to loads on the data warehouse server, the correct method is to perform the data transformation on dedicated servers with ETL tools such as SSIS, DataStage, Informatica, etc., and only after the transformation is finished, to load the final data to the data warehouse as the last step.

New options have been introduced for implementing the data warehouse

Since new options have been introduced for implementing the data warehouse with scalable tools in the cloud, such as Snowflake, Synapse, Redshift, Google Bigquery – the resource limit we had has been practically eliminated, and therefore instead of going through the traditional ETL, it is now possible to change the order of operations and perform an ELT, i.e. perform the transformation phase after loading the raw data into the data warehouse, and using the resources of the data warehouse server to perform transformations.

Well, ok, what is the big deal?! ELT or ETL.

What exactly is the news if the bottom line is the same result? Well, there are many advantages of working in the ELT method, and here are a few of them:

  • There is no doubt that the costs for performing transformations in the data warehouse are cheaper than using a dedicated server for this. Even though today there are SAAS tools such as Data Factory, Rivery, AWS Glue, etc., it will still be cheaper to perform the transformation in the data warehouse.
  • Faster development start time! It is possible to start developing “Extract” and “Load” even before we have finished designing the data model in its finalform. It is enough to know which tables are relevant to the model, and one can already start developing. After the design of the model is finished, the only task that will remain is to perform the “T” step. And it’s simple too! Because it is written in SQL, which is almost every person in the data arena is familiar with.
  • Since the raw data is in the data warehouse (as opposed to ETL which only keeps the final data model), there is a greater readiness to develop machine learning, since all the raw data is concentrated in one central place and ready for retrieval.
  • Carrying out the transformation in the scalable data warehouse enables a better readiness to deal with Big Data.
  • Transformation of unstructured data is easier and simpler. Today there is excellent support for retrieving unstructured data in tools like Snowflake, which was more complex when using traditional ETL tools.
    There are other project and functional advantages – but if you are curious – I will leave you with the task of looking for them by yourself.

To view more articles

ETL or ELT?

More details nogamy-arrow-icon
ETL or ELT

Data Warehouse - why and for who ?

More details nogamy-arrow-icon
nogamy-arrow-icon nogamy-arrow-icon

Let's transform your data to valuable insights