CEU Electronic Theses and Dissertations, 2021
| Author | Njeru, Anthony Kimathi |
|---|---|
| Title | The Development of a Data Flow Validation Process |
| Summary | Data Flow Validation (also known as data cleaning or data scrubbing) plays a major role in data management. However, data cleaning is considered tedious, especially given the quantity and variety of data received from the financial environment. Data cleaning is arguably the most essential aspect of data analysis: it is the act of removing inaccurate elements to produce data that is accurate, comprehensible and reliable. Data cannot always be used in its original state and must be prepared so that analysis yields accurate results for decision making. In the past, data cleaning was manual and therefore time-consuming. Innovation has since produced efficient software that makes data cleaning fast and easy. This capstone project designs a Data Flow Validation Process for a startup client. The process has been customized for the startup, whose mission is to automate auditing and risk analysis procedures with machine learning and artificial intelligence technology. Since data is the fuel for machine learning, management needs to ensure that the data fed to its algorithms is free of inconsistencies, typos and structural errors. The proposal explains the elements that define ‘dirty’ data, details the process of cleaning financial data and describes the observable outcomes after implementation of the process on the startup's platform. |
| Supervisor | Ipacs, Laura |
| Department | Economics MSc |
| Full text | https://www.etd.ceu.edu/2021/njeru_anthony.pdf |