CEU Electronic Theses and Dissertations, 2021
Author | Ticu, Cosmin Catalin |
---|---|
Title | The Austrian Start-Up Incubator Ecosystem: A Web Scraping, AWS ML & Text Analytics Competitor Analysis on Digital Content |
Summary | This project is part of a larger effort by CEU’s iLab to run a full-scale competitor analysis of the Austrian start-up ecosystem as the iLab expands to the university’s Vienna campus. The goal of this capstone is to conduct an exploratory data analysis (EDA) of the proprietary visual and textual content that Austrian start-up incubators (with a particular focus on Vienna) publish on their own websites, in order to identify formal patterns of association and common topics. The need for this project stems from the client’s lack of explicit knowledge about the SEO and content creation efforts of its Austrian competitors and from the client’s need to diversify its own content creation efforts. The aim is to move from relying entirely on tacit knowledge, industry experience and networking acumen to a more data-driven approach to content creation. The steps taken to ensure a thorough EDA were: defining the methodology and strategy, gathering the data through web scraping, munging and augmenting the data with AWS machine learning services, and analyzing the processed data with the R programming language to produce visualization artefacts. The findings of this study echo the tacit content creation knowledge that “plagues” the entire start-up incubator market. By comparing all 819 articles available in the identified sample of 14 Austrian start-up incubators, this project found that the digital content produced is extremely similar throughout in terms of sentiment, key words, phrases, and image entities. Elements that set individual incubators apart from the group were rare. The only distinctions that became apparent from the analysis were between specialized companies (here understood as those with a narrow market focus, such as agriculture) and non-specialized companies, and between content on general entrepreneurship topics and content on green circular economics topics. 
The articles’ contents were largely found to be positive and anticipatory. Most of the recommendations from this study call for further research and analysis with the AWS suite of software as well as increased data gathering efforts. The strongest recommendation is that the iLab should use the keywords, sentiments and topics identified in this study to conduct A/B testing between content produced according to tacit knowledge (the current standard) and content produced according to the explicit knowledge stemming from this study’s findings. Lastly, the study strongly calls for preserving its open-ended nature so that future CEU students and public domain data analysts can engage with the challenges posed by this text and image analysis. |
Supervisor | Koren, Miklos |
Department | Economics MSc |
Full text | https://www.etd.ceu.edu/2021/ticu_cosmin-catalin.pdf |
© 2007-2021, Central European University