ETL tools are used to:
Extract data from homogeneous or heterogeneous data sources.
Transform the data into the proper format or structure for querying and analysis.
Load it into the final target (a database or, more specifically, an operational data store, data mart, or data warehouse).
In ETL tools, the three phases usually execute in parallel. Because data extraction takes time, a transformation process works on the data already received while more data is still being pulled, and prepares it for loading. As soon as some data is ready to be loaded into the target, loading kicks off without waiting for the previous phases to complete.
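The overlap between the three phases can be sketched with threads and queues. This is only a minimal illustration, not how any particular tool in this list is implemented: the function names, the record fields (`name`, `amount`), and the queue size are all assumptions made for the example.

```python
import queue
import threading

SENTINEL = object()  # marks the end of a stream (hypothetical convention)


def extract(source, out_q):
    """Pull records from the source one at a time and pass them on."""
    for record in source:
        out_q.put(record)
    out_q.put(SENTINEL)


def transform(in_q, out_q):
    """Clean each record as soon as it arrives, without waiting for extract to finish."""
    while True:
        record = in_q.get()
        if record is SENTINEL:
            out_q.put(SENTINEL)
            break
        out_q.put({
            "name": record["name"].strip().title(),
            "amount": float(record["amount"]),
        })


def load(in_q, target):
    """Write each transformed row to the target as soon as it is ready."""
    while True:
        row = in_q.get()
        if row is SENTINEL:
            break
        target.append(row)


def run_pipeline(source):
    """Run the three phases concurrently and return the loaded rows."""
    raw_q = queue.Queue(maxsize=100)    # bounded, so a fast extract cannot run far ahead
    clean_q = queue.Queue(maxsize=100)
    target = []                         # stands in for a database or warehouse table
    stages = [
        threading.Thread(target=extract, args=(source, raw_q)),
        threading.Thread(target=transform, args=(raw_q, clean_q)),
        threading.Thread(target=load, args=(clean_q, target)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return target
```

Calling `run_pipeline` with a list of dictionaries returns the cleaned rows in their original order, because each phase runs in a single thread and the queues are first-in, first-out.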
Here is the list of 10 open source ETL tools.
Talend provides multiple solutions for data integration, in both open source and commercial editions. Talend offers an Eclipse-based interface, a drag-and-drop design flow, and broad connectivity, with more than 400 pre-configured application connectors that bridge databases, mainframes, file systems, web services, packaged enterprise applications, data warehouses, OLAP applications, Software-as-a-Service, cloud-based applications, and more.
Scriptella is an open source ETL (Extract-Transform-Load) and script execution tool written in Java. Its primary focus is simplicity. You don’t have to study yet another complex XML-based language; instead, you use SQL (or another scripting language suitable for the data source) to perform the required transformations. Scriptella is licensed under the Apache License, Version 2.0.
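To give a feel for what a transformation expressed in plain SQL looks like, here is a small sketch. It does not use Scriptella or its script format; it assumes Python's standard `sqlite3` module with an in-memory database, and the table names (`staging`, `sales`) and the function name are hypothetical.

```python
import sqlite3


def transform_with_sql(rows):
    """Load raw rows into a staging table, then transform them with one SQL statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (name TEXT, amount TEXT)")  # raw, untyped input
    conn.execute("CREATE TABLE sales (name TEXT, amount REAL)")    # cleaned target
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)

    # The transformation itself is ordinary SQL: trim and upper-case names,
    # cast amounts to numbers, and drop rows with no amount.
    conn.execute(
        """
        INSERT INTO sales (name, amount)
        SELECT UPPER(TRIM(name)), CAST(amount AS REAL)
        FROM staging
        WHERE amount IS NOT NULL
        """
    )
    result = conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
    conn.close()
    return result
```

The point of the sketch is that the cleaning rules live in the `INSERT ... SELECT` statement rather than in a tool-specific language.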
KETL is a premier open source ETL tool. The data integration platform is built on a portable, Java-based architecture with an open, XML-based configuration and job language. KETL's features compete successfully with the major commercial products available today. Highlights include:
Support for integration of security and data management tools
Proven scalability across multiple servers and CPUs and any volume of data
No need for additional third-party scheduling, dependency, and notification tools
Jasper ETL is easy to deploy and outperforms many proprietary ETL software systems. It is used to extract data from a transactional system and create a consolidated data warehouse or data mart for reporting and analysis.
GeoKettle is a powerful, metadata-driven spatial ETL tool dedicated to integrating different spatial data sources in order to build and update geospatial data warehouses. GeoKettle enables the extraction of data from data sources; the transformation of that data to correct errors, cleanse it, change its structure, and make it compliant with defined standards; and the loading of the transformed data into a target database management system (DBMS) in OLTP or OLAP/SOLAP mode, a GIS file, or a geospatial web service.
The CloverETL Open Source Engine can be embedded in any application, including commercial ones. The Open Source Engine lacks a number of components found in the full engine, and CloverETL does not provide support for it.
HPCC Systems is an open source platform for Big Data analysis with a data refinery engine called Thor. Thor cleans, links, transforms, and analyzes Big Data. It supports ETL (Extraction, Transformation, and Loading) functions such as ingesting unstructured and structured data, data profiling, data hygiene, and data linking out of the box. Data processed by Thor can be accessed by a large number of users concurrently and in real time through Roxie, a data delivery engine that provides highly concurrent, low-latency, real-time query capability.
Jedox is an open source BI solution for performance management, including planning, analysis, reporting, and ETL. The open core consists of an in-memory OLAP server, an ETL server, and OLAP client libraries. Jedox ETL supports the Jedox OLAP server as both a source and a target system and is specifically designed to meet the challenges of OLAP analysis. It is built for working with cubes and dimensions: it can generate frequently needed time hierarchies and transform the relational model of source systems into an OLAP model.
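As an illustration of what generating a time hierarchy involves, the sketch below builds the parent-child pairs of a Year > Quarter > Month > Day dimension. It does not use Jedox ETL or any Jedox API; the function name and the element naming scheme (`2024-Q1`, `2024-03`) are assumptions chosen for the example.

```python
from datetime import date, timedelta


def time_hierarchy(start, end):
    """Return (parent, child) pairs for every day from start to end, inclusive."""
    edges = []
    seen = set()
    day = start
    while day <= end:
        year = str(day.year)
        quarter = f"{year}-Q{(day.month - 1) // 3 + 1}"
        month = f"{day.year}-{day.month:02d}"
        # Each day contributes three levels; higher levels are emitted only once.
        for edge in ((year, quarter), (quarter, month), (month, day.isoformat())):
            if edge not in seen:
                seen.add(edge)
                edges.append(edge)
        day += timedelta(days=1)
    return edges
```

A list of pairs like this is one common way to describe a dimension hierarchy before it is loaded into an OLAP model.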
Apatar is an open source Extract, Transform, and Load (ETL) project. Its modular architecture delivers (1) a visual job designer and mapping, (2) connectivity to all major data sources, and (3) flexible deployment options (GUI, server engine with JVM, or embedded).
This list is compiled by TechRoba.