Time:Tuesday, Dec.10, 10:00--11:00 Venue: Room 308, ITCS
1. 主讲人介绍
Mikkel Thorup (born 1965) has a D.Phil. from Oxford University from 1993. From 1993 to 1998 he was at the University of Copenhagen. From 1998 to 2013 he was at AT&T Labs-Research. Since 2013 he has been back as Professor at the University of Copenhagen. He is currently a VILLUM Investigator heading Center for Basic Algorithms Research Copenhagen (BARC).Mikkel Thorup is a Fellow of the ACM and of AT&T, and a Member of the Royal Danish Academy of Sciences and Letters. He is co-winner of the 2011 MAA Robbins Award in mathematics and winner of the 2015 Villum Kann Rasmussen Award for Technical and Scientific Research, which is Denmark's biggest individual prize for research. More recently he was co-winner of the 2021 AMS-MOS Fulkerson Prize and an ACM STOC 20-year test of time award. His main work is in algorithms and data structures, where he has worked on both upper and lower bounds. Recently one of his main focusses has been on hash functions unifying theory and practice. Mikkel prefers to seek his mathematical inspiration in nature, combining the quest with his hobbies of bird watching and mushroom picking.
2. 讲座介绍
Title: Hash functions bridging the gap from theory to practice
Abstract:
Randomized algorithms are often enjoyed for their simplicity, but the
hash functions employed to yield the desired probabilistic guarantees
are often too complicated to be practical. Hash functions are used
everywhere in computing, e.g., hash tables, sketching, dimensionality
reduction, sampling, and estimation. Many of these applications are
relevant to Machine Learning, where we are often interested in
similarity between high dimensional objects. Reducing the dimensionality
is key to efficient processing. Abstractly, we like to think of hashing
as fully-random hashing, assigning independent hash values to every
possible key, but essentially this requires us to store the hash values
for all keys, which is unrealistic for most key universes, e.g., 64-bit
keys. In practice we have to settle for implementable hash functions,
and often practitioners settle for implementations that are too simple
in that the algorithms ends up working only for sufficiently random
input. However, the real world is full of structured/non-random input.
The issue is severe, for simplistic hash functions will often work very
well in tests with random input. Moreover, the issue is often that error
events that should never happen in practice, happen with way too high
probability. This does not show in a few test, but will show up over
time when you put the system into production. Over the last decade there
has been major developments in simple to implement tabulation based
hash functions offering strong theoretical guarantees, so as to support
fundamental properties such as Chernoff bounds, Sparse
Johnson-Lindenstrauss transforms, and fully-random hashing on a given
set w.h.p. etc. I will discuss some of the principles of these
developments and offer insights on how far we can bridge from theory
(assuming fully-random hash functions) to practice (needing something
that can actually implemented efficiently).
编审:唐志皓 江波