On Challenges in Defending Against Code Stylometry

Talk by Konrad Rieck


Location: TU Wien, FAV Hörsaal 1 Helmut Veith (Favoritenstr. 9-11, ground floor, Room HEEG02)

Date & Time: 2024-06-24; 16:00 - 17:00

Abstract: Source code often contains subtle stylistic patterns that can be used to identify its developer, an approach known as code stylometry. While a series of studies has shown that code stylometry can single out one programmer among hundreds of others, defenses against this approach have so far received little attention. In this talk, we address this research gap from two perspectives. First, we introduce a method for automatically imitating programming styles through semantics-preserving transformations. This method allows us to mislead attribution and thereby protect developers’ privacy. Second, however, we prove that true anonymity cannot be achieved in this way and that stylistic patterns remain in source code under realistic conditions. Our results thus underscore the need to raise awareness and to pursue further research on protecting developers’ privacy.

Bio: Konrad Rieck is a Professor of Computer Science at TU Berlin, where he heads the Chair of Machine Learning and Security within the Berlin Institute for the Foundations of Learning and Data. Additionally, he is a Guest Professor at TU Wien. Previously, Konrad worked at TU Braunschweig, the University of Göttingen, and the Fraunhofer Institute FIRST. His research interests revolve around computer security and machine learning. His group develops novel methods for detecting computer attacks, analyzing malicious software, and discovering security vulnerabilities. Moreover, the group explores the security and privacy of learning algorithms. Konrad is also interested in efficient algorithms for analyzing structured data, such as strings, trees, and graphs. His Erdős number is 3 (Müller → Jagota → Erdős) and his Bacon number is ∞. He is a very distant academic relative of Carl Friedrich Gauß, although this doesn’t help when solving math problems.