Simple Models, Complex Worlds
Abstract
Using a simplified model of a real-life phenomenon instead of experimenting on the object of study itself can offer several advantages, such as greater interpretability and lower resource requirements. In this study, we explore two simplified computer models, each addressing a problem in a different field.
In the first part of this study, we focus on large language models (LLMs). Modern language models are trained on vast amounts of often only lightly filtered data scraped from the Internet. Given the growing volume of machine-generated text online, it is highly likely that new language models will be trained, at least in part, on the outputs of earlier ones. To analyse how repeated training on datasets that include AI-generated content might affect LLMs, we simulate this process over multiple iterations using a relatively small GPT-2-based LLM on several datasets. We then score the outputs on multiple attributes, including quality, overall text diversity, emotion, toxicity, and the perceived identity of the author, using both established metrics and fine-tuned classifier models.
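The recursive-training loop described above can be illustrated, very loosely, with a toy stand-in for an LLM. The sketch below is not the study's GPT-2 pipeline: it assumes a unigram (bag-of-words) model over a hypothetical Zipf-distributed vocabulary, refits that model by maximum likelihood on its own samples in every generation, and tracks two simple diversity measures (the number of distinct tokens and Shannon entropy). All function names and parameter values are illustrative assumptions.

```python
import math
import random
from collections import Counter


def entropy(probs):
    """Shannon entropy (in bits) of a token distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def fit_unigram(tokens):
    """Maximum-likelihood unigram 'model': relative token frequencies."""
    counts = Counter(tokens)
    total = len(tokens)
    return {tok: c / total for tok, c in counts.items()}


def sample(model, n, rng):
    """Generate n tokens from the current model."""
    toks = list(model)
    weights = [model[t] for t in toks]
    return rng.choices(toks, weights=weights, k=n)


def recursive_training(vocab_size=1000, n_tokens=2000, generations=10, seed=0):
    """Hypothetical toy: each generation is trained only on the previous one's output."""
    rng = random.Random(seed)
    # Generation 0: a Zipf-like stand-in for "human-written" data.
    raw = {f"w{i}": 1 / (i + 1) for i in range(vocab_size)}
    z = sum(raw.values())
    model = {t: w / z for t, w in raw.items()}
    history = []
    for g in range(generations):
        history.append((g, len(model), entropy(model)))
        # Train the next model purely on text sampled from the current one.
        model = fit_unigram(sample(model, n_tokens, rng))
    history.append((generations, len(model), entropy(model)))
    return history


for gen, support, h in recursive_training():
    print(f"gen {gen}: {support} distinct tokens, entropy {h:.2f} bits")
```

Because a token that is never sampled receives zero probability in the refitted model, the number of distinct tokens can only stay the same or shrink from one generation to the next; this is one simple mechanism by which diversity can erode under recursive training.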
In the second part, we propose a minimal social model for studying the conversation surrounding instances of sexual violence, and the process of reaching social consensus in such cases, from an epistemic perspective. The model accounts for several social factors, such as disparities in institutional power between agents and a victim's possible reluctance to come forward, and simulates communication and opinion propagation between the agents. This design allows us to compare epistemic strategies and evaluate which of them produce fairer (less biased) outcomes.
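To make the idea of opinion propagation under power disparities concrete, the following sketch uses a standard DeGroot-style weighted-averaging model rather than the model proposed in this study. It assumes a hypothetical scenario: each agent holds a credence that the reported incident occurred (victim 1.0, accused 0.0, three bystanders 0.5), the accused has five times the institutional power of everyone else, and a victim who does not come forward simply takes no part in the conversation. Agents repeatedly move toward a pooled belief that is weighted either by power or equally; all names and numbers are illustrative.

```python
def run_degroot(beliefs, trust, alpha=0.5, steps=200):
    """Repeatedly move each agent's belief toward the trust-weighted pooled belief."""
    total = sum(trust)
    weights = [t / total for t in trust]
    x = list(beliefs)
    for _ in range(steps):
        pooled = sum(w * b for w, b in zip(weights, x))
        # alpha is self-confidence; (1 - alpha) is openness to the group view.
        x = [alpha * b + (1 - alpha) * pooled for b in x]
    return x


def consensus(victim_speaks, power_weighted, accused_power=5.0):
    """Final group credence under one hypothetical epistemic strategy."""
    # Hypothetical agents: (name, initial credence, institutional power).
    agents = [("victim", 1.0, 1.0), ("accused", 0.0, accused_power)]
    agents += [(f"bystander{i}", 0.5, 1.0) for i in range(3)]
    if not victim_speaks:
        # A reluctant victim never shares their testimony.
        agents = [a for a in agents if a[0] != "victim"]
    beliefs = [b for _, b, _ in agents]
    # Strategy: trust agents in proportion to power, or trust everyone equally.
    trust = [p if power_weighted else 1.0 for _, _, p in agents]
    final = run_degroot(beliefs, trust)
    return sum(final) / len(final)


for speaks in (True, False):
    for weighted in (False, True):
        print(f"victim speaks={speaks}, power-weighted={weighted}: "
              f"consensus={consensus(speaks, weighted):.3f}")
```

Under this update rule the trust-weighted mean of the beliefs never changes, so the group converges to the weighted average of the initial beliefs: about 0.28 when trust follows power versus 0.5 with equal weights (0.1875 and 0.375, respectively, if the victim stays silent). Comparing such outcomes across strategies is one simple way to quantify bias.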
Finally, we compare the two models and their limitations, and discuss key differences between modelling in these two fields as well as issues unique to each. We find that both models yield meaningful results in their respective domains and are limited by their minimal design; however, the effect of a smaller size or lower complexity differs between the two fields.

