2025-10-16 –, Aula 4.101
Sometimes an approximate answer is good enough. How do you check for duplicates, count unique users, or track item popularity when your dataset won’t fit in memory? Enter probabilistic data structures such as Bloom filters, Count-Min Sketches, and HyperLogLog! This talk introduces these powerful tools, demonstrates simple implementations in Python, and gives you ideas on when to use them.
Walk away ready to apply these techniques in your own projects - no advanced math required.
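To give a flavour of how small these structures can be, here is a minimal Bloom filter sketch for the duplicate-checking case. It is an illustration written for this abstract, not the implementation shown in the talk: the class name `BloomFilter`, the defaults `num_bits=1024` and `num_hashes=3`, and the choice of salted SHA-256 as the hash family are all assumptions.

```python
import hashlib


class BloomFilter:
    """Approximate set: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per "bit" keeps the sketch readable; a real
        # implementation would pack bits to save memory.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Simulate several hash functions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # Present only if every position for this item is set.
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for user in ["alice", "bob", "carol"]:
    seen.add(user)

print("alice" in seen)    # True: added items are always reported as present
print("mallory" in seen)  # most likely False, but a false positive is possible
```

The memory used is fixed by `num_bits` no matter how many items are added; the price is a false-positive rate that grows as the filter fills up.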
I am a maintainer of the scikit-learn machine-learning library. In the past I've worked on building and running mybinder.org and JupyterHub.
I am employed by NVIDIA.