That’s a very basic question for a BI DW professional but it carries its own importance. Let’s discuss it.
ETL stands for Extract Transform and Load.
Extract: In a Data Warehouse environment data comes from various sources like, flat file, Operational database. Extract in ETL is nothing but extracting data from various sources in order to process it further.
Transform: Data Extracted from various sources often requires cleaning, validation to make it load ready for Data Warehouse.
e.g. Generally Data stored in OLTP data base is not Data Warehouse ready, it needs cleaning. If we consider Customer information systems, Customers often skips pin code on entry form or they enter something junk value. Its ok for OLTP but not for Data Warehouse. Its very important dimension in Data Warehouse. So we need to clean the pin code before loading it in database.
Load: Loading is the final step in ETL. It means loading the clean and filtered data in to Data Warehouse.
ETL is a very important step in overall Business Intelligence project implementation. ETL can also be called as Data Integration.
ETL can be performed using commercial ETL software’s like Informatica, Data Stage or can be performed by home grown ETL tool written in some high level language.
However using commercial tool is more popular and cost effective considering scalability, administration and speed of development.