In our Data Warehouse projects we use an internally developed framework.
It helps avoid many common early mistakes and makes it easier to expand the development team, because everyone works to the same standards.
The DWH Framework includes:
- Automated process logs.
- Automated Data Quality (DQ) checks. DQ violations can be critical (they stop the ETL) or non-critical.
- Customizable email notifications in case of an error or a Data Quality rule violation.
- Built-in SCD support.
- Automated generation of durable and surrogate keys for Dimensions.
- Well-established data flow during the ETL process. All data passes through several predefined steps (stages), which simplifies the development of individual steps. Every step can be launched individually and repeatedly. The design of the steps and the automated key generation help resolve cross-dependencies between data marts.
- Built-in data deduplication at the final step (the load step).
- Support for daily incremental loads.
- Ability to split the load by company entity via filters and parameters.
- A set of template objects to speed up ETL development.
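The split between critical and non-critical Data Quality violations can be illustrated with a minimal Python sketch. It is not the framework's actual code: the names `Severity`, `DQRule`, `CriticalDQViolation` and `run_dq_checks` are hypothetical, and rows are assumed to be plain dictionaries.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List


class Severity(Enum):
    CRITICAL = "critical"          # a violation stops the ETL run
    NON_CRITICAL = "non_critical"  # a violation is only reported


@dataclass(frozen=True)
class DQRule:
    """Hypothetical rule definition: a name, a severity and a row-level test."""
    name: str
    severity: Severity
    predicate: Callable[[dict], bool]  # returns True when a row passes


class CriticalDQViolation(Exception):
    """Raised to stop the ETL when a critical rule is violated."""


def run_dq_checks(rows: Iterable[dict], rules: Iterable[DQRule]) -> List[str]:
    """Apply every rule to every row.

    Non-critical violations are collected and returned as messages
    (for example, to be sent in a notification). The first critical
    violation raises an exception so the caller can halt the ETL.
    """
    rows = list(rows)
    warnings: List[str] = []
    for rule in rules:
        failed = sum(1 for row in rows if not rule.predicate(row))
        if failed == 0:
            continue
        message = f"{rule.name}: {failed} row(s) failed"
        if rule.severity is Severity.CRITICAL:
            raise CriticalDQViolation(message)
        warnings.append(message)
    return warnings
```

In this sketch the caller decides what to do with the returned warnings and with the exception, which keeps the check itself independent of logging and email delivery.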
Applied with discipline, this approach limits the growth of complexity and of reliance on the underlying systems, including where they are supplemented by historical data or other systems. It allows future systems and changes to be integrated and developed without significantly increasing complexity or resource use.
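Two of the features listed above, deduplication at the load step and automated key generation, can be sketched in Python as follows. This is an illustration under stated assumptions, not the framework's implementation: the functions `deduplicate` and `assign_durable_keys`, the `durable_key` column and the in-memory `key_map` are all hypothetical, and only a stable key per natural key is shown (a per-version surrogate key for SCD history would be allocated separately).

```python
from typing import Dict, Iterable, List, Sequence, Tuple

NaturalKey = Tuple[object, ...]


def deduplicate(rows: Iterable[dict], key_columns: Sequence[str]) -> List[dict]:
    """Keep the last row seen for each natural key."""
    latest: Dict[NaturalKey, dict] = {}
    for row in rows:
        latest[tuple(row[c] for c in key_columns)] = row
    return list(latest.values())


def assign_durable_keys(
    rows: Iterable[dict],
    key_columns: Sequence[str],
    key_map: Dict[NaturalKey, int],
) -> List[dict]:
    """Attach a stable integer key to each row.

    A natural key already present in key_map reuses its integer, and a
    new natural key gets the next free one. Because the mapping is
    reused, running the step again on the same input yields the same keys.
    """
    next_key = max(key_map.values(), default=0) + 1
    result: List[dict] = []
    for row in rows:
        natural_key = tuple(row[c] for c in key_columns)
        if natural_key not in key_map:
            key_map[natural_key] = next_key
            next_key += 1
        result.append({**row, "durable_key": key_map[natural_key]})
    return result
```

Keeping the key mapping outside the function is what makes the step safe to launch repeatedly in this sketch: a rerun looks up existing keys instead of allocating new ones.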