CEU eTD Collection (2021); Szuts, Attila: Matching Identical Products using textual data

CEU Electronic Theses and Dissertations, 2021
Author Szuts, Attila
Title Matching Identical Products using textual data
Summary Product matching is a novel and expanding field in data science and e-commerce. In this project I investigate applications of product matching using textual data in procurement analytics. The project has two objectives: to develop a codebase for item matching that can be used by my client’s existing AI ecosystem, and to build a framework for validating the models’ performance for future improvements. I will use purchase order data to find identical transactions with two approaches. The first approach (“classic”) uses traditional text similarity metrics such as Levenshtein distance and tf-idf; it has already been implemented by my client and therefore serves as the baseline against which the second approach is evaluated. The second approach (“embedding”) uses Google’s Universal Sentence Encoder, a deep neural network that embeds textual data into a 512-dimensional vector space. I will obtain item similarity by calculating Euclidean distances between vectors. Finally, I will cluster items based on their similarity/distance; these clusters will be the final output of the production code. As a final step, I will validate the results and optimize the clustering algorithm.
Supervisor Gyorgy Bogel
Department Economics MSc
Full text https://www.etd.ceu.edu/2021/szuts_attila.pdf
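
The pipeline outlined in the summary (an edit-distance baseline, Euclidean distances between embedding vectors, then clustering by distance) can be illustrated with a minimal sketch. This is not the thesis code: the function names, the toy 3-dimensional vectors standing in for 512-dimensional Universal Sentence Encoder embeddings, and the threshold-based single-linkage grouping are all assumptions made for illustration. The record does not say which clustering algorithm the thesis uses.

```python
import math

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def euclidean(u, v) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def cluster_by_threshold(vectors, threshold):
    """Hypothetical grouping rule: items whose distance is at most
    `threshold` end up in the same cluster (single linkage via union-find).
    Returns one integer cluster label per input vector."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if euclidean(vectors[i], vectors[j]) <= threshold:
                parent[find(j)] = find(i)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(vectors))]

# Toy stand-ins for sentence embeddings (assumed values, not real model output).
toy_embeddings = [
    [0.10, 0.20, 0.30],  # e.g. "hex bolt M8 x 40"
    [0.11, 0.19, 0.31],  # e.g. "bolt hex M8x40"
    [0.90, 0.80, 0.10],  # e.g. "printer toner black"
]
cluster_labels = cluster_by_threshold(toy_embeddings, threshold=0.05)
```

With these toy vectors the first two items fall into one cluster and the third into another. In practice the threshold would have to be tuned against a validation set, which is the role the summary assigns to the validation framework.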

© 2007-2021, Central European University