Developing a data lakehouse for a South African government-sector training authority : implementing quality control for incremental extract-load-transform pipelines in the ingestion layer

dc.contributor.author: Govender, Priyanka [en_US]
dc.contributor.author: Naicker, Nalindren [en_US]
dc.contributor.author: Patel, Sulaiman Saleem [en_US]
dc.contributor.author: Joseph, Seena [en_US]
dc.contributor.author: Moonsamy, Devraj [en_US]
dc.contributor.author: Akinola, Ayotuyi Tosin [en_US]
dc.contributor.author: Madamshetty, Lavanya [en_US]
dc.contributor.author: Govender, Thamotharan Prinavin [en_US]
dc.contributor.editor: Ogunleye, Olalekan Samuel
dc.date.accessioned: 2025-03-03T11:16:18Z
dc.date.available: 2025-03-03T11:16:18Z
dc.date.issued: 2024-12
dc.date.updated: 2025-01-08T19:47:24Z
dc.description.abstract: The Durban University of Technology is undertaking a project to develop a data lakehouse system for a South African government-sector training authority. This system is considered critical to enhance the monitoring and evaluation capabilities of the training authority and ensure service delivery. Ensuring the quality of data ingested into the lakehouse is critical, as poor data quality deteriorates the efficiency of the lakehouse solution. This chapter studies quality control for ingestion-layer pipelines to propose a data quality framework. Metrics considered for data quality were completeness, accuracy, integrity, correctness, and timeliness. The framework was evaluated by practically applying it to a sample semi-structured dataset to gauge its effectiveness. Recommendations for future work include expanded integration, such as incorporating data from more varied sources and implementing incremental data ingestion triggers. [en_US]
dc.description.availability: Copyright: 2024. IGI Global. Due to copyright restrictions, only the abstract is available. For access to the full text item, please consult the publisher's website. The definitive version of the work is published in: Machine learning and data science techniques for effective government service delivery. Hershey, Pa.: IGI Global, 157-184. doi:10.4018/978-1-6684-9716-6.ch006 [en_US]
dc.format.extent: 28 p [en_US]
dc.identifier.citation: Govender, P. et al. 2024. Developing a data lakehouse for a South African government-sector training authority: implementing quality control for incremental extract-load-transform pipelines in the ingestion layer. In: Ogunleye, Olalekan Samuel. Machine learning and data science techniques for effective government service delivery. Hershey, Pa.: IGI Global, 157-184. doi:10.4018/978-1-6684-9716-6.ch006 [en_US]
dc.identifier.doi: 10.4018/978-1-6684-9716-6.ch006
dc.identifier.isbn: 9781668497166
dc.identifier.isbn: 9781668497180
dc.identifier.uri: https://hdl.handle.net/10321/5830
dc.language.iso: en [en_US]
dc.publisher: IGI Global [en_US]
dc.publisher.uri: https://doi.org/10.4018/978-1-6684-9716-6.ch006 [en_US]
dc.title: Developing a data lakehouse for a South African government-sector training authority : implementing quality control for incremental extract-load-transform pipelines in the ingestion layer [en_US]
dc.type: Book chapter [en_US]
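
The abstract lists completeness and timeliness among the chapter's data quality metrics. Because this record holds only the abstract, it does not say how the chapter computes them. The sketch below is a minimal illustration and is not the authors' framework. It assumes that an incremental batch arrives as a list of JSON-like Python dicts. The field names (`learner_id`, `programme_code`, `enrolment_date`, `updated_at`), the one-day freshness window, and the helpers `completeness`, `timeliness` and `quality_gate` are all hypothetical. The code scores one batch and reports whether it clears per-metric thresholds before loading.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, Mapping, Optional, Sequence, Tuple

# Hypothetical schema for illustration only; the chapter's actual fields are not given here.
REQUIRED_FIELDS = ("learner_id", "programme_code", "enrolment_date")
TIMESTAMP_FIELD = "updated_at"
MAX_AGE = timedelta(days=1)  # assumed freshness window for an incremental batch


def completeness(records: Sequence[Mapping[str, Any]], required: Iterable[str]) -> float:
    """Fraction of required field slots that are present and not None or an empty string."""
    required = tuple(required)
    total = len(records) * len(required)
    if total == 0:
        return 1.0  # an empty batch has nothing missing
    filled = sum(
        1 for rec in records for field in required if rec.get(field) not in (None, "")
    )
    return filled / total


def _parse_ts(value: Any) -> Optional[datetime]:
    """Parse a datetime or ISO-8601 string; naive values are assumed to be UTC."""
    if isinstance(value, datetime):
        ts = value
    elif isinstance(value, str):
        try:
            ts = datetime.fromisoformat(value)
        except ValueError:
            return None  # unparseable timestamps count as not timely
    else:
        return None
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts


def timeliness(records: Sequence[Mapping[str, Any]], ts_field: str,
               now: datetime, max_age: timedelta) -> float:
    """Fraction of records whose timestamp lies within max_age before `now`."""
    if not records:
        return 1.0
    fresh = 0
    for rec in records:
        ts = _parse_ts(rec.get(ts_field))
        # Future-dated records are treated as suspect rather than fresh.
        if ts is not None and timedelta(0) <= now - ts <= max_age:
            fresh += 1
    return fresh / len(records)


def quality_gate(batch: Sequence[Mapping[str, Any]], now: datetime,
                 thresholds: Mapping[str, float]) -> Tuple[Dict[str, float], bool]:
    """Score a batch and report whether every metric meets its minimum threshold."""
    scores = {
        "completeness": completeness(batch, REQUIRED_FIELDS),
        "timeliness": timeliness(batch, TIMESTAMP_FIELD, now, MAX_AGE),
    }
    passed = all(scores[name] >= minimum for name, minimum in thresholds.items())
    return scores, passed


now = datetime(2024, 12, 1, 12, 0, tzinfo=timezone.utc)
batch = [
    {"learner_id": "L001", "programme_code": "P10", "enrolment_date": "2024-02-01",
     "updated_at": "2024-12-01T08:00:00+00:00"},
    {"learner_id": "L002", "programme_code": None, "enrolment_date": "2024-03-15",
     "updated_at": "2024-11-20T08:00:00+00:00"},
]
scores, passed = quality_gate(batch, now, {"completeness": 0.95, "timeliness": 0.9})
# Completeness is 5/6 and timeliness is 0.5, so passed is False and the batch is held back.
print(scores, passed)
```

The gate scores the batch as a whole, not record by record, so it stays cheap enough to run on every incremental load. A real pipeline might quarantine a failing batch instead of discarding it. The abstract does not say how the chapter handles failures, and it names other metrics (accuracy, integrity, correctness) that would need rules specific to the source data.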

Files