Online grooming, in which a sexual predator approaches minors online with the goal of sexually abusing them, is a serious problem in today's world of social media. In this work, we present two approaches to detecting sexual predators in chats. We use the currently available datasets for Sexual Predator Detection (SPD) and critically analyze their strengths and weaknesses. Using dictionary-based and transformer-based methods, we compare the writing styles of predators and non-predators in order to shed light on their differences. Finally, we present our two detection approaches, one of which improves on the current state-of-the-art score by 7.7%. Both approaches are based on BERT models that take additional features of the chats as inputs.
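To illustrate what a dictionary-based writing-style comparison involves in general, here is a minimal sketch. It is not the code in src/. The category names and word lists (`LEXICON`) and the helpers `category_rates` and `compare_groups` are hypothetical. The sketch assumes that style is summarized as the share of a group's tokens that fall into each word category, in the spirit of LIWC-style dictionaries.

```python
import re
from collections import Counter

# Hypothetical, tiny category lexicon for illustration only; a real analysis
# would rely on an established psycholinguistic dictionary instead.
LEXICON = {
    "first_person": {"i", "me", "my", "we", "us", "our"},
    "second_person": {"you", "your", "yours"},
    "question_words": {"what", "where", "when", "who", "why", "how"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def category_rates(messages):
    """Return, per category, the fraction of all tokens in `messages` that belong to it."""
    tokens = [tok for msg in messages for tok in tokenize(msg)]
    if not tokens:
        return {cat: 0.0 for cat in LEXICON}
    counts = Counter()
    for tok in tokens:
        for cat, words in LEXICON.items():
            if tok in words:
                counts[cat] += 1
    return {cat: counts[cat] / len(tokens) for cat in LEXICON}


def compare_groups(group_a, group_b):
    """Per-category difference in token rates (group_a minus group_b)."""
    rates_a = category_rates(group_a)
    rates_b = category_rates(group_b)
    return {cat: rates_a[cat] - rates_b[cat] for cat in LEXICON}
```

For example, `category_rates(["Where are you?", "I miss you"])` finds 6 tokens, two of which are "you", so the `second_person` rate is 2/6. A positive value from `compare_groups` means that the first group uses that category more often than the second.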
Paper.pdf documents our approach, and Poster.pdf is a poster version of it. src/ contains all the code. data/ is not included because the private dataset we used requires permission to access.
This project was developed in the context of the course "Computational Semantics for NLP" at ETH Zurich. Together with my teammates Saahiti Prayaga and Philippe Schläpfer, I attempted to use NLP to tackle a pressing societal problem.