Random commentary about Machine Learning, BigData, Spark, Deep Learning, C++, STL, Boost, Perl, Python, Algorithms, Problem Solving and Web Search
Saturday, February 5, 2011
Find a good sample of a query log
You have a stream of queries that is effectively infinite, so there is no way to hold all of it in memory. The query frequencies follow a power law. Find a good strategy to sample the stream so that both the head and the tail of the power law are well represented.
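One possible line of attack, offered as a sketch rather than as the intended answer, is to keep two bounded samples side by side. A classic reservoir sample over query occurrences is uniform over the stream, so it is dominated by the head. A bottom-k hash sample over distinct queries ignores frequency, so rare tail queries have the same chance of being kept as popular ones. The class names, the sample size `k`, and the choice of SHA-1 as the hash are all assumptions made for illustration.

```python
import hashlib
import heapq
import random


class ReservoirSample:
    """Uniform sample of k stream occurrences (Algorithm R).

    Frequent queries dominate, so this represents the head.
    """

    def __init__(self, k, seed=None):
        self.k = k
        self.seen = 0
        self.items = []
        self.rng = random.Random(seed)

    def add(self, query):
        self.seen += 1
        if len(self.items) < self.k:
            self.items.append(query)
        else:
            # Keep the new item with probability k / seen.
            j = self.rng.randrange(self.seen)
            if j < self.k:
                self.items[j] = query


class DistinctSample:
    """Bottom-k sample: the k distinct queries with the smallest hash.

    The hash does not depend on frequency, so head and tail queries
    are equally likely to be retained.
    """

    def __init__(self, k):
        self.k = k
        self.heap = []        # max-heap on hash, stored as (-hash, query)
        self.members = set()  # queries currently in the heap

    @staticmethod
    def _hash(query):
        digest = hashlib.sha1(query.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, query):
        if query in self.members:
            return
        h = self._hash(query)
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (-h, query))
            self.members.add(query)
        elif h < -self.heap[0][0]:
            # New hash beats the current largest: evict that entry.
            _, evicted = heapq.heapreplace(self.heap, (-h, query))
            self.members.discard(evicted)
            self.members.add(query)

    def queries(self):
        return [q for _, q in self.heap]


if __name__ == "__main__":
    # Hypothetical Zipf-like stream, used only to exercise the two samplers.
    rng = random.Random(0)
    vocab = ["q%d" % i for i in range(1000)]
    weights = [1.0 / (i + 1) for i in range(1000)]
    head, tail = ReservoirSample(20, seed=1), DistinctSample(20)
    for q in rng.choices(vocab, weights=weights, k=20000):
        head.add(q)
        tail.add(q)
    print(sorted(set(head.items)))
    print(sorted(tail.queries()))
```

Both structures use O(k) memory regardless of stream length. How to combine the two samples, and whether a frequency-aware scheme would serve better, depends on what "good" representativeness should mean, which the puzzle leaves open.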