Anonymization of sensitive information in financial document Swiss Python Summit 2025

Anonymization of sensitive information in financial document
2025-10-17 14:05–14:35, Aula 4.101

Data is the fossil fuel of the machine learning world: essential for developing high-quality models, but in limited supply. Yet institutions handling sensitive documents, such as financial, medical, or legal records, often cannot fully leverage their own data due to stringent privacy, compliance, and security requirements, which makes training such models difficult.

A promising solution is to replace the personally identifiable information (PII) with realistic synthetic stand-ins, whilst leaving the rest of the document intact.

In this talk, we will discuss open-source tools and models that can be self-hosted to anonymize documents. We will go over the various approaches to Named Entity Recognition (NER) for identifying sensitive entities, and the use of diffusion models to inpaint anonymized content.
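
As a rough, text-only illustration of the detect-then-replace idea described above (not the pipeline presented in the talk), the sketch below uses two regular expressions as a stand-in for an NER model and swaps each detected entity for a synthetic value of similar shape. The names `PATTERNS`, `find_entities`, `make_stand_in`, and `anonymize` are hypothetical, and the patterns are deliberately simplistic. The diffusion-based inpainting step for document images is not covered here.

```python
import random
import re

# Assumption: simple regexes stand in for the output of a real NER model.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"),
}


def find_entities(text):
    """Return (start, end, label) spans for every match, sorted by position."""
    spans = []
    for label, pattern in PATTERNS.items():
        spans.extend((m.start(), m.end(), label) for m in pattern.finditer(text))
    return sorted(spans)


def make_stand_in(value, label, rng):
    """Build a synthetic replacement that keeps the rough shape of the original."""
    if label == "EMAIL":
        return f"user{rng.randrange(1000, 10000)}@example.com"
    # IBAN: keep the country code and spacing, randomize everything else.
    # The result will not carry a valid checksum; this is only a sketch.
    head, tail = value[:2], value[2:]
    return head + "".join(
        str(rng.randrange(10)) if ch.isalnum() else ch for ch in tail
    )


def anonymize(text, seed=0):
    """Replace detected entities; the same original always maps to the same stand-in."""
    rng = random.Random(seed)
    mapping = {}
    pieces, cursor = [], 0
    for start, end, label in find_entities(text):
        if start < cursor:  # skip spans that overlap an earlier replacement
            continue
        original = text[start:end]
        if original not in mapping:
            mapping[original] = make_stand_in(original, label, rng)
        pieces.append(text[cursor:start])
        pieces.append(mapping[original])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), mapping


if __name__ == "__main__":
    sample = "Pay CH93 0076 2011 6238 5295 7 to anna.muster@bank.example."
    anonymized, mapping = anonymize(sample)
    print(anonymized)
```

Reusing one stand-in per original value keeps repeated references consistent, so the surrounding text still reads coherently after replacement. A real system would swap the regexes for a trained NER model and would need to handle entity types such as names and addresses that patterns alone cannot capture.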

Piotr Gryko

Dr Piotr Gryko studied experimental physics at University College London. His PhD at Imperial College London focused on using biomaterials to self-assemble inorganic materials, blurring the boundaries between biological systems and machines.

With 12 years of experience writing software, he now focuses on AI engineering.
