Swiss Python Summit 2025

09:00
30min
When Close Enough is Good Enough
Tim Head

Sometimes an approximate answer is good enough. How do you check for duplicates, count unique users, or track item popularity when your dataset won’t fit in memory? Enter probabilistic data structures like Bloom filters, Count-Min Sketches, and HyperLogLog! This talk introduces these powerful tools, demonstrates simple implementations in Python, and gives you ideas on when to use them.

Walk away ready to apply these techniques in your own projects: no advanced math required.
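
A minimal sketch of two of these structures, assuming string items and SHA-256 as the hash function. The sizes, the salting scheme, and the class and helper names are illustrative choices, not taken from the talk. A Bloom filter answers "have I seen this?" with no false negatives but occasional false positives; a Count-Min Sketch estimates frequencies in fixed memory and can only overcount.

```python
import hashlib


def _hash(item: str, salt: int) -> int:
    """Derive a 64-bit integer hash of item; different salts give different hashes."""
    digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class BloomFilter:
    """Set membership with no false negatives and a small false-positive rate."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        return [_hash(item, i) % self.size for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # False means "definitely never added"; True means "probably added".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


class CountMinSketch:
    """Approximate frequency counts in fixed memory; estimates never undercount."""

    def __init__(self, width: int = 2000, depth: int = 5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][_hash(item, row) % self.width] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate a cell, so the minimum over rows is the best guess.
        return min(self.table[row][_hash(item, row) % self.width] for row in range(self.depth))


seen = BloomFilter()
for user in ["alice", "bob"]:
    seen.add(user)
print("alice" in seen)  # True

views = CountMinSketch()
for page in ["home", "home", "about"]:
    views.add(page)
print(views.estimate("home"))  # 2, or slightly more if collisions occur
```

Memory stays fixed however many items are added; in practice the bit-array size, width, depth, and number of hashes are chosen from the expected volume and the acceptable error. HyperLogLog, for counting distinct items, is not shown here.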

Day 1 - Python, the programming language you love
Aula 4.101
09:30
30min
Code review in the era of collaborative development
Andrii Soldatenko

Code review is a central part of every developer's daily work. This talk was motivated by a quote:
“The most important superpower of a developer is complaining about everybody else's code.” In this talk, I’ll explain my approach to better code review.

Sometimes it’s hard to convince a colleague whether or not to change a few lines of code. In this talk, I'll share best practices from my software engineering experience for efficient and honest code review: how to build a culture of great code review, how to use automated tools to take over repetitive review comments and suggestions, and how to write or reuse coding style guides so your team spends less time arguing about naming conventions and styles.
What should be automated during code review, and what should not? The key role of reusable patterns is to avoid confusing colleagues.
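
As a toy illustration of turning a repetitive review comment into an automated check, here is a sketch using only the standard library's ast module. The snake_case rule and the naming_violations helper are hypothetical; in practice a linter run in CI would normally do this job.

```python
import ast
import re

# Allows optional leading/trailing underscores, e.g. _helper or __init__.
SNAKE_CASE = re.compile(r"^_{0,2}[a-z][a-z0-9_]*_{0,2}$")


def naming_violations(source: str) -> list[str]:
    """Return one review-style message per function whose name is not snake_case."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not SNAKE_CASE.match(node.name):
                problems.append(
                    f"line {node.lineno}: function '{node.name}' should be snake_case"
                )
    return problems


sample = "def getUser():\n    return 1\n"
for message in naming_violations(sample):
    print(message)  # line 1: function 'getUser' should be snake_case
```

Once a rule like this runs automatically, reviewers no longer need to leave the same comment by hand and can focus on design and correctness.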

Day 1 - Python, the programming language you love
Aula 4.101
11:00
30min
Bytecode and .pyc files
Konrad Gawda

Bytecode, the internal instruction language used by the interpreter, is something most Python developers have probably heard of, but few have dug into. This talk explains the idea behind bytecode and how it works.
We will see how to extract bytecode from functions with the dis module and from .pyc files (and what __pycache__ directories are for). Then we'll go the other way around and look at building new functions from raw bytes at runtime.
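
A short sketch of all three steps using only the standard library. It assumes CPython 3.7 or later, where a .pyc file starts with a 16-byte header (magic number, flags, and source metadata) followed by a marshalled code object. The exact instructions that dis prints differ between Python versions, and the demo_mod module and add_copy name are made up for illustration.

```python
import dis
import importlib.util
import marshal
import pathlib
import py_compile
import tempfile
import types


def add(a, b):
    return a + b


# 1. Inspect the bytecode of a live function with the dis module.
dis.dis(add)

# 2. Compile a small module to a .pyc file and load its code object back.
src = pathlib.Path(tempfile.mkdtemp()) / "demo_mod.py"
src.write_text("def greet():\n    return 'hi'\n")
pyc_path = pathlib.Path(py_compile.compile(str(src)))  # written under __pycache__/
data = pyc_path.read_bytes()
assert data[:4] == importlib.util.MAGIC_NUMBER  # interpreter-specific magic number
module_code = marshal.loads(data[16:])  # skip the 16-byte header
namespace = {}
exec(module_code, namespace)
print(namespace["greet"]())  # hi

# 3. Build a new function at runtime from raw (marshalled) code bytes.
raw = marshal.dumps(add.__code__)
add_copy = types.FunctionType(marshal.loads(raw), globals(), "add_copy")
print(add_copy(2, 3))  # 5
```

Marshalled code objects are only valid for the interpreter version that produced them, which is why the magic number is checked before loading.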

Day 1 - Python, the programming language you love
Aula 4.101
12:00
30min
Functional Python: Saving Christmas with itertools & friends
Edoardo Baldi

Are you writing nested loops when solving coding challenges? Discover how Python's functional programming toolbox can transform your problem-solving approach.

We'll explore functional programming principles through the lens of Advent of Code puzzles, learning to think in streams of data rather than step-by-step instructions. We’ll cover some essential parts of the itertools, functools, and operator modules, aiming to write more expressive, debuggable code.

Starting with pure functions and lazy evaluation, we'll build up to solving real AoC problems using techniques like:

  • itertools.pairwise() for sequence comparisons
  • functools.reduce() for data aggregation
  • operator.itemgetter() for elegant sorting
  • Generator expressions for memory-efficient processing

Through some puzzles from various years, we’ll see how functional approaches often lead to more concise solutions that closely mirror the problem description. We'll compare imperative vs functional solutions, highlighting pros and cons of both approaches.

Whether you're preparing for coding interviews, tackling AoC, or just want to expand your Python toolkit, you'll leave with a couple more ideas for writing cleaner, more Pythonic code—no external dependencies required.

Day 1 - Python, the programming language you love
Aula 4.101
14:00
30min
Building Resilient Python Apps for Unreliable Networks
Emeka Onyebuchi

In many parts of the world, especially across Africa, software cannot assume a stable internet connection. From rural communities to field agents working in transit or enforcement, the reality is simple: offline is the default, and sync is a luxury.

In this talk, we’ll explore how to build offline-first applications with Python: apps that work gracefully when the network doesn’t. Drawing on real-world scenarios from civic and infrastructure projects in Nigeria, I’ll walk through techniques to queue, cache, and sync data locally using tools like SQLite, Redis, Celery, and FastAPI. We’ll explore design patterns that prevent data loss, improve user experience, and simplify reconciliation once connectivity is restored.
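
One common offline-first pattern is a local "outbox": every write goes into SQLite first, and a sync step pushes pending rows when the network is available. The sketch below uses only the standard library's sqlite3; the table layout, the function names, and the send callback are hypothetical and stand in for the Redis, Celery, and FastAPI pieces mentioned above.

```python
import json
import sqlite3
from typing import Callable


def open_outbox(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local outbox table; use a file path on a real device."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " synced INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, record: dict) -> None:
    """Persist a record locally first; this works with or without a network."""
    with conn:  # commits the transaction on success
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(record),))


def sync(conn: sqlite3.Connection, send: Callable[[dict], bool]) -> int:
    """Push pending records in insertion order, stopping at the first failure.

    Returns how many records were sent in this attempt.
    """
    pending = conn.execute(
        "SELECT id, payload FROM outbox WHERE synced = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, payload in pending:
        try:
            ok = send(json.loads(payload))
        except OSError:  # e.g. ConnectionError: treat as "still offline"
            ok = False
        if not ok:
            break  # keep the remaining rows for the next attempt
        with conn:
            conn.execute("UPDATE outbox SET synced = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent


def offline_send(record: dict) -> bool:
    raise ConnectionError("no network")


conn = open_outbox()
enqueue(conn, {"meter": "A1", "reading": 42})
print(sync(conn, offline_send))  # 0: the record stays queued
print(sync(conn, lambda record: True))  # 1: delivered once connectivity returns
```

Because a row is only marked as synced after send succeeds, a crash in between can cause a resend, so the receiving API should handle duplicates idempotently (for example, by a client-generated ID).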

Whether you’re building field data tools, mobile dashboards, or lightweight IoT integrations, this session will equip you with the mindset and technical building blocks to ensure your Python applications stay resilient — no matter the network conditions.

Day 1 - Python, the programming language you love
Aula 4.101
16:00
30min
Using Python's array API standard for ESA's Euclid mission
Saransh Chopra

Over the years, the lack of a built-in n-dimensional array type in Python has led to numerous array libraries, each specializing in its own niche while keeping some interoperability with the others. NumPy has become the de facto array library for Python, and other array libraries try to keep their APIs close to NumPy's. However, this is often infeasible, and libraries deviate out of necessity. To let Python's array libraries interoperate consistently, the Consortium for Python Data API Standards has formalised an Array API standard for libraries that offer array creation and manipulation operations.

The Array API standard lets users write code once and run it on arrays from any standard-conforming library. In this talk, we will explore the need for such standardisation and discuss its salient features in detail. We will focus primarily on using the standard to make specific parts of the code for the European Space Agency's Euclid mission GPU- and autodiff-compatible. Beyond cosmology, we will also look at a few other examples, mostly drawn from my experience working with and on several Python array libraries for scientific computing. Attendees can expect to leave with an understanding of both the software engineering and the research sides of the array API standard.
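
A minimal sketch of the idea, assuming NumPy as the concrete library: the function asks the array for its namespace via __array_namespace__() and then calls only functions defined by the standard, so in principle the same code could run on other conforming libraries (for example GPU arrays). The NumPy fallback for versions without __array_namespace__ is an assumption of this sketch; the array-api-compat package offers a more complete array_namespace helper for this purpose.

```python
import numpy as np


def get_namespace(x):
    """Return the array API namespace for x.

    Standard-conforming arrays expose it via __array_namespace__();
    falling back to NumPy for older versions is this sketch's assumption.
    """
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np


def row_norms(x):
    """Euclidean norm of each row, using only functions from the standard."""
    xp = get_namespace(x)
    return xp.sqrt(xp.sum(x * x, axis=-1))


x = np.asarray([[3.0, 4.0], [6.0, 8.0]])
print(row_norms(x))
```

Nothing in row_norms mentions NumPy directly; the library is chosen by whatever array is passed in.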

Day 1 - Python, the programming language you love
Aula 4.101
09:00
30min
Docling: Get your documents ready for generative AI
Peter Staar, Michele Dolfi, Panos Vagenas, Nikos Livathinos

Docling is an open-source Python package that simplifies document processing by parsing diverse formats — including advanced PDF understanding — and integrating seamlessly with the generative AI ecosystem. It supports a wide range of input types such as PDFs, DOCX, XLSX, HTML, and images, offering rich parsing capabilities including reading order, table structure, code, and formulas. Docling provides a unified and expressive DoclingDocument format, enabling easy export to Markdown, HTML, and lossless JSON. It offers plug-and-play integrations with popular frameworks like LangChain, LlamaIndex, Crew AI, and Haystack, along with strong local execution support for sensitive data and air-gapped environments. As a Python package, Docling is pip-installable and comes with a clean, intuitive API for both programmatic and CLI-based workflows, making it easy to embed into any data pipeline or AI stack. Its modular design also supports extension and customization for enterprise use cases.

We also introduce SmolDocling, an ultra-compact 256M parameter vision-language model for end-to-end document conversion. SmolDocling generates a novel markup format called DocTags that captures the full content, structure, and spatial layout of a page, and offers accurate reproduction of document features such as tables, equations, charts, and code across a wide variety of formats — all while matching the performance of models up to 27× larger.

Day 2 - Data Science & More
Aula 4.101
09:30
30min
AI Coding Agents and how to code them
Alex Shershebnev

AI agents are the next big thing everyone is talking about. They are expected to revolutionize various industries by automating routine tasks and mission-critical business workflows, enhancing productivity, and freeing humans to focus on creative and strategic work. Of course, you can apply them to your everyday coding tasks as well.
In this talk, we’ll go over what these agents bring to the coding world, and why they can deliver on the promise of smarter coding where the current generation of coding assistants falls short. We will then dive into a quick live coding session where I’ll show what such agents can do in practice and how you can start using them in your daily work right after the talk. We’ll finish with some remarks on what programming might look like in the near future as these agents become part of everyday development.

Day 2 - Data Science & More
Aula 4.101
11:00
30min
Machine Learning - "To Do" or "NOT To Do"
Daksh Gupta

AI and machine learning are fascinating.

We’re not only inclined to use them in every data-driven problem domain, we also tend to believe that machine learning is the only solution at our disposal. However, this is not always the case.

Even though machine learning can solve a wide variety of problems, some of them can also be solved with plain data structures and (mathematical) algorithms, which need no training data and give exact, deterministic results.

For example, algorithms like k-nearest neighbours (KNN) seem the obvious choice for problems involving finding and manipulating nearest data points, but such problems can also be solved with spatial triangulation concepts implemented as data structures, some of which are available in SciPy's spatial module (scipy.spatial).
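
A small sketch of two scipy.spatial structures this points to, assuming NumPy and SciPy are installed: a KD-tree answers nearest-neighbour queries exactly without any training, and a Delaunay triangulation locates which triangle of a point set contains a query point. The coordinates are made up for illustration.

```python
import numpy as np
from scipy.spatial import Delaunay, KDTree

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]])

# Nearest neighbour via a KD-tree: no training data, exact answer.
tree = KDTree(points)
dist, idx = tree.query([0.9, 0.9])  # distance to and index of the closest point
print(idx, dist)

# Triangulation: which triangle (simplex) contains each query point?
tri = Delaunay(points)
simplex = tri.find_simplex(np.array([[0.25, 0.25], [10.0, 10.0]]))
print(simplex)  # -1 marks a point outside the convex hull
```

The same queries return the same answers every time, and the structures can be rebuilt instantly when the data changes.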

Similarly, depending on the number of decision points, simple decision tables can be used instead of supervised decision trees.
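
A decision table is an explicit mapping from every combination of conditions to an outcome. The shipping-fee policy and threshold below are hypothetical, invented for illustration; the point is that the table can be checked exhaustively and always gives the same answer.

```python
from itertools import product

LARGE_ORDER_THRESHOLD = 50.0  # hypothetical threshold in CHF

# Hypothetical policy: (is_member, is_large_order) -> shipping fee in CHF
SHIPPING_FEES = {
    (True, True): 0.0,
    (True, False): 2.5,
    (False, True): 5.0,
    (False, False): 7.5,
}

# Unlike a trained model, the table can be checked for completeness up front.
assert set(SHIPPING_FEES) == set(product([True, False], repeat=2))


def shipping_fee(is_member: bool, order_total: float) -> float:
    """Deterministic lookup: the same inputs always give the same fee."""
    return SHIPPING_FEES[(is_member, order_total >= LARGE_ORDER_THRESHOLD)]


print(shipping_fee(True, 80.0), shipping_fee(False, 10.0))  # 0.0 7.5
```

With only a few boolean decision points the table stays small; as the number of conditions grows, the table size grows exponentially, which is where learned models start to pay off.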

This talk is all about understanding the opportunities and constraints of using machine learning compared to using data structures and algorithms.

To demonstrate the point, I’ll use SciPy spatial data structures as well as decision tables to show working systems that perform as well as (if not better than) machine-learning-based systems.

By the end of this talk, you’ll know enough to make an informed decision about when to use machine learning.

The presentation and the code will be shared via GitHub Pages after the session.

Day 2 - Data Science & More
Aula 4.101
11:30
30min
Causal ML for Smarter Advertising Campaigns with Python
Francesco Conti

Traditionally, marketing campaign analysis relies on simple metrics like the number of purchases made after a contact, or conversions following a promotion. While these numbers tell us what happened, they don’t reveal why it happened or if the campaign truly made a difference.

Such analysis can’t distinguish between customers who would have acted anyway and those who were genuinely influenced by the campaign. The key question is: did the campaign actually cause the desired effect?

In this practical and beginner-friendly session, we’ll explore how Causal Machine Learning provides the missing piece in campaign evaluation and targeting.

Starting from real-world scenarios, we’ll dive into:

  • Why causality matters more than correlation when evaluating ad performance.
  • How to estimate the true impact of a campaign using uplift modeling and treatment effect estimation in just a few lines of code.
  • How to target users who are not just likely to interact with ads, but whose behavior can be influenced by the campaign (for example, to reduce churn or boost engagement).
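
The simplest form of uplift estimation, assuming a randomized campaign where contact was assigned at random: within each customer segment, compare the conversion rate of contacted customers with that of customers who were not contacted. This stratified difference is a deliberately basic stand-in, not the specific models used in the talk; the segment names and numbers are synthetic.

```python
from collections import defaultdict


def uplift_by_segment(rows):
    """Estimate per-segment uplift from a randomized campaign.

    rows: iterable of (segment, treated, converted) tuples.
    Uplift = conversion rate if contacted minus conversion rate if not,
    computed separately within each segment.
    """
    # Per segment: [treated_n, treated_conversions, control_n, control_conversions]
    stats = defaultdict(lambda: [0, 0, 0, 0])
    for segment, treated, converted in rows:
        s = stats[segment]
        offset = 0 if treated else 2
        s[offset] += 1
        s[offset + 1] += int(converted)
    return {
        seg: s[1] / s[0] - s[3] / s[2]
        for seg, s in stats.items()
        if s[0] and s[2]  # both groups are needed for a comparison
    }


# Synthetic campaign: 10 contacted and 10 control customers per segment.
rows = (
    [("loyal", True, i < 8) for i in range(10)]
    + [("loyal", False, i < 8) for i in range(10)]
    + [("at_risk", True, i < 5) for i in range(10)]
    + [("at_risk", False, i < 2) for i in range(10)]
)
uplift = uplift_by_segment(rows)
print(uplift)  # loyal: 0.0, at_risk: about 0.3
```

Here the "loyal" segment converts equally well with or without contact, so contacting it spends budget without changing behaviour, while the "at_risk" segment gains about 30 percentage points and is the better target, even though its raw conversion rate is lower.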

The session will be hands-on with Python, with clear examples drawn from marketing applications.

Take-away:
Participants will gain a practical understanding of how to think causally in digital marketing, learning key techniques to measure impact and target campaigns more intelligently, moving from predictive to truly prescriptive analytics.

Day 2 - Data Science & More
Aula 4.101
12:00
30min
Agentic Cyber Defense with External Threat Intelligence
Jyoti Yadav

This talk will detail how to integrate external threat intelligence data into an autonomous agentic AI system for proactive cybersecurity. Using real-world datasets, including open-source threat feeds, security logs, or OSINT, you will learn how to build a data ingestion pipeline, train models with Python, and deploy agents that autonomously detect and mitigate cyber threats. This case study will provide practical insights into data preprocessing, feature engineering, and the challenges posed by adversarial conditions.
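
A minimal sketch of just the ingestion step, assuming a hypothetical CSV feed with an indicator column of IP addresses: parse the feed into a set of indicators, skipping malformed entries, then flag log lines that mention a known-bad address. Plain indicator matching is a much simpler, swapped-in technique compared with the trained models and agents the talk describes; the function names are hypothetical and the addresses come from reserved documentation ranges.

```python
import csv
import io
import ipaddress
import re
from typing import Iterable, Iterator

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def load_ioc_ips(feed_csv: str) -> set:
    """Parse a hypothetical CSV feed with an 'indicator' column into IP objects."""
    iocs = set()
    for row in csv.DictReader(io.StringIO(feed_csv)):
        try:
            iocs.add(ipaddress.ip_address(row["indicator"].strip()))
        except ValueError:
            continue  # skip malformed indicators instead of failing the whole feed
    return iocs


def flag_log_lines(lines: Iterable[str], iocs: set) -> Iterator[tuple[int, str]]:
    """Yield (line_number, address) for each log line mentioning a known indicator."""
    for lineno, line in enumerate(lines, 1):
        for candidate in IPV4.findall(line):
            try:
                if ipaddress.ip_address(candidate) in iocs:
                    yield lineno, candidate
            except ValueError:
                pass  # e.g. "999.1.1.1" looks like an IP but is not valid


feed = "indicator,source\n203.0.113.7,feed-a\nnot-an-ip,feed-b\n"
logs = ["GET /index from 198.51.100.2", "failed login from 203.0.113.7"]
hits = list(flag_log_lines(logs, load_ioc_ips(feed)))
print(hits)  # [(2, '203.0.113.7')]
```

Normalizing indicators through the ipaddress module, rather than comparing raw strings, is what makes malformed feed entries easy to reject.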

Day 2 - Data Science & More
Aula 4.101
14:00
30min
Anonymization of sensitive information in financial documents
Piotr Gryko

Data is the fossil fuel of the machine learning world: essential for developing high-quality models, but in limited supply. Yet institutions handling sensitive documents, such as financial, medical, or legal records, often cannot fully leverage their own data due to stringent privacy, compliance, and security requirements, which makes training high-quality models difficult.

A promising solution is to replace the personally identifiable information (PII) with realistic synthetic stand-ins, while leaving the rest of the document intact.

In this talk, we will discuss using open-source tools and models that can be self-hosted to anonymize documents. We will go over various approaches to Named Entity Recognition (NER) for identifying sensitive entities, and the use of diffusion models to inpaint anonymized content.
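
A minimal sketch of the replace-with-stand-ins idea, using a regular expression for email addresses as a deliberately simple substitute for an NER model. Each distinct address is mapped consistently to a synthetic one, so repeated mentions stay linked and the rest of the text is untouched. The pattern, the pseudonymize_emails helper, and the personN@example.com naming are assumptions for illustration; the diffusion-based inpainting step is not shown.

```python
import re

# Simplified email pattern; a real system would use NER models and more entity types.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def pseudonymize_emails(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email address with a consistent synthetic stand-in.

    Returns the rewritten text and the original-to-stand-in mapping
    (keep the mapping secret, or discard it for irreversible anonymization).
    """
    mapping: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        original = match.group()
        if original not in mapping:
            mapping[original] = f"person{len(mapping) + 1}@example.com"
        return mapping[original]

    return EMAIL.sub(replace, text), mapping


text = "Contact alice@bank.ch or bob@corp.com; cc alice@bank.ch."
anonymized, _ = pseudonymize_emails(text)
print(anonymized)
# Contact person1@example.com or person2@example.com; cc person1@example.com.
```

Consistent replacement matters for training data: a model can still learn that two mentions refer to the same entity without ever seeing the real value.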

Day 2 - Data Science & More
Aula 4.101
14:30
30min
AI-Powered Software Testing with Multi-Agent Systems
Sneha Mavuri, Koti Vellanki

Traditional software testing struggles to keep pace with rapidly evolving applications—resulting in brittle test cases, time-consuming maintenance, and poor bug detection. This session introduces a smarter, adaptive approach using AI-powered Multi-Agent Systems that automate and continuously improve testing workflows.

We’ll explore how Multi-Agent Retrieval-Augmented Generation (RAG) transforms testing by dynamically generating test cases, adjusting to app changes in real-time, and detecting bugs with greater accuracy. Each agent has a specialized role—retrieving context, generating tests, and analyzing results—working together as a self-learning testing team.

The session will include a live walkthrough of a Python-based pipeline using PyTest, Selenium, LangChain, and ML models to:
• Automate UI and regression testing with minimal manual intervention
• Generate intelligent, context-aware test cases from code and API specs
• Use anomaly detection to flag subtle bugs based on test logs
• Continuously evolve test logic as the app evolves

By leveraging AI agents, teams can reduce manual QA efforts, improve test coverage, and increase reliability across fast-moving software projects. Whether you're a QA engineer, developer, or test automation architect, this talk will give you practical tools and ideas to build scalable, AI-driven QA systems using Python.

Day 2 - Data Science & More
Aula 4.101
16:00
30min
Machine learning for Swiss democracy
Vita Midori

Demokratis.ch is a non-profit project working to modernise the consultation procedure—a key democratic process that allows Swiss citizens to provide feedback on proposed laws and amendments. Today, the process is slow and cumbersome for everyone involved: it requires studying lengthy PDFs, writing formal letters, and even synthesising legal arguments by copy-pasting into Excel. There’s a huge opportunity to streamline this process and make this democratic tool more accessible and inclusive.

In this talk, I’ll share how we’re tackling this challenge with machine learning: building data processing pipelines, extracting features from endless PDFs, embedding and classifying text, designing and evaluating models—and ultimately deploying them in production. Because the data comes from the federal administration and 26 different cantons, it’s often heterogeneous and in varying formats. Data quality, in general, presents many challenges for both training and evaluation. Spoiler: PDF is a pretty terrible format for machines…

Our approach is practical and pragmatic, and our code is open source, so you’re welcome to explore our solutions or even contribute yourself!

Day 2 - Data Science & More
Aula 4.101