Monday, August 31, 2009

Free the H-1Bs, Free the Economy

"I have a suggestion for our President on how to boost economic growth without spending a penny: Free the H-1B's."

I agree.

Saturday, August 29, 2009

I joined Microsoft's new Bing Search Technology Centre (STC) in Europe

It’s official. I decided to join Microsoft's new Search Technology Centre (STC) in Europe. I work on Bing Search technology and will be leading all the engineering development for UX and verticals in Europe.

My office is at the London site of STC Europe, which is located close to Carnaby Street, right in the centre of Soho, an area full of artists, music, and creativity. The environment and all the people around are having a galvanizing effect on me. I can feel the energy and new ideas flooding in.

STC Europe has three sites: London, Munich and Paris. In addition, it has a strong connection with STC Asia in Beijing, the India Development Center, the new STC centre under development in Silicon Valley, and, obviously, the headquarters in Redmond. All in all, this gives me more opportunity to travel, work with smart people, and improve my skills.

After a week here, I like how open the environment is to experimenting with new stuff. You simply say, "Hey, I have this new idea for search," and you get the resources to experiment with it. If it works, it goes online. Search is all about continuous improvement and evolution, isn't it?

Friday, August 28, 2009

The Daily Beast -- Five who are changing the face of the Internet.

The Daily Beast received a nomination for "Five who are changing the face of the Internet" from Newsweek.

I am proud of the former group that I led when I was at Ask.com. They contributed to delivering the algorithmic News Search experience for the Daily Beast, together with the other R&D center in NJ (if you search on the Daily Beast, the service is powered by Ask.com).

Wednesday, August 26, 2009

Pointers and Smart Pointers

I love it when people impose the use of smart pointers. Well, if you come from C you know that a pointer is nothing but a memory address. If you come from Java, it's another religion. I like you, but I am on the wild and spicy side.

If you are into C++, then you must use smart pointers. Smart pointers are all about resource management. You want to be on the wild but safe side of life. So when you allocate something, you may want to be sure that it will be deallocated at the right time. Talking about sustainability.

It's easy: a smart pointer's destructor takes responsibility for freeing the memory. Since the language automatically calls the destructor when the object goes out of scope, you are on the wild but safe side of life. It's all about RAII.

There are a bunch of smart pointers and you should know them all.
  1. std::auto_ptr and boost::scoped_ptr. Here the destructor will actually free the memory for you.
  2. boost::shared_ptr. Here the destructor decrements a reference count, and when the count reaches zero it frees the memory. Very useful if you have a shared resource, and you are on the wild and open side.
Another important aspect is that you can transfer ownership of the allocated object, if the semantics of the pointer allow it. For instance, returning a std::auto_ptr from a function will transfer ownership of the object to the caller. Useful, no?
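
To make the two flavours concrete, here is a minimal sketch. It is an illustration under assumptions, not code from any real project: Resource, make_resource and the two live_after_* helpers are hypothetical names, and it uses std::unique_ptr and std::shared_ptr (C++11) in place of std::auto_ptr and boost::shared_ptr, since std::auto_ptr was later deprecated and then removed in C++17.

```cpp
#include <cassert>
#include <memory>

// Hypothetical resource type used only for this illustration.
// It counts live instances so the effect of each destructor is observable.
struct Resource {
    static int live;          // number of Resource objects currently alive
    Resource()  { ++live; }
    ~Resource() { --live; }
};
int Resource::live = 0;

// Returning a std::unique_ptr by value transfers ownership to the caller,
// the same idea the post describes for std::auto_ptr.
std::unique_ptr<Resource> make_resource() {
    return std::unique_ptr<Resource>(new Resource());
}

// Exclusive ownership: the destructor frees the object at the end of scope.
int live_after_scoped_use() {
    {
        std::unique_ptr<Resource> p = make_resource();
        assert(Resource::live == 1);
    }   // p goes out of scope here and its destructor deletes the Resource
    return Resource::live;
}

// Shared ownership: the object is freed when the last owner goes away.
int live_after_shared_use() {
    std::shared_ptr<Resource> a(new Resource());
    {
        std::shared_ptr<Resource> b = a;   // reference count is now 2
        assert(a.use_count() == 2);
    }                                      // b is destroyed, count back to 1
    assert(Resource::live == 1);           // still alive: a owns it
    a.reset();                             // count reaches 0, Resource freed
    return Resource::live;
}
```

Both helpers return 0: every Resource has been freed by a destructor, with no explicit delete anywhere.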

Monday, August 24, 2009

Book review: Large Scale C++ Software design

Large-Scale C++ Software Design is a must-read book if you are in the software industry. Sometimes you may want to move down from design patterns to the low-level physical organization of C++ projects. I believe this is the very first book dealing with this important aspect of software, which is too frequently ignored in favour of more "abstract" aspects.

On the negative side, the book is too redundant and could have been shorter. The most interesting chapter is number 5. Go directly there and start reading from that point.

Saturday, August 22, 2009

Remember Cuil? Now It’s a Real-Time Search Engine

Cuil does not look good. There was a lot of hype when they launched, and now they have moved to real-time news search, but it does not seem better than OneRiot.

"Okay, so is this a competitor to Twitter Search? Maybe a little, but really it’s more like OneRiot in terms of real-time search. And to be honest, OneRiot blows Cuil out of the water in this vertical."

Thursday, August 20, 2009

I no longer work for Ask.com

It's official. After a long period at Ask.com, it's now time to work on something different.

More than 4 years ago, I started with a small team of people that soon expanded into the first Ask.com European R&D Center. Pisa was selected as the location for its excellent quality of life and its good concentration of software engineers and academic researchers. Our office was magnificent, as you can see from this collection of pictures (1, 2, 3).

Different Pisa teams worked on various projects:
  • Image Search, co-lead -- "Ask new image search is a step ahead in a notoriously tricky area. With the quality of its image search results, combined with the new Zoom query refinement feature, I'll be using it as my default image search service going forward", SearchEngineWatch

  • News and Blog search, co-lead -- "Ask.com has a pretty original approach to the old-time, old-school, traditional maybe, view of news.", Techcrunch

  • Video News Search, lead -- "it's interesting that they've managed to integrate the video playing right into the main page since I doubt all the source videos are the same format (Flash, aspx, etc.)", Niraj user comment
  • DailyBeast, Tina Brown's news site, co-lead -- "How did IAC/Tina Brown's new Daily Beast do in its first month? Pretty well: The company says it attracted 2.3 million unique monthly visitors and served up 11.4 million page views. A great start for any publishing startup", Alleyinsider
  • Core Web Search Infrastructure; Pisa was involved in the design and implementation of the middleware software providing the base of all the Ask.com search products. A number of people in Pisa worked on this project.
  • RealTime Fresh Web Ranking, lead; injecting and ranking fresh news, video and blogs into Web search results in realtime.
  • Frontend Platform for the UK; Pisa was involved in the Jeeves rebranding in the UK. A number of people in Pisa worked on this project.
  • A bunch of Search Patents, "Ask.com has been working hard since then at making itself a more useful resource for timely news information, and has started incorporating multimedia into that mix.", Seobythesea
Note that the above list includes some of the projects carried out by people that I hired in Pisa.

I posted different blogs pointing out some differentiating aspects of our technology:
Working at Ask.com was an amazing experience since the environment is very productive, and Ask.com recognizes your hard work. In 2007 I received the IAC Emerging Leader Award, and in 2006 the IAC Horizon Award, for being a top performer above and beyond expectations.

I want to thank the Pisa team for the impressive work we carried out together. I also want to thank all the people from other offices worldwide (Edison, NJ; Oakland & Campbell, CA; London, UK; Dublin, Ireland; Hangzhou, China). In no particular order: Kelly, Jim, Apostolos, Tomasz, Yufan, Yihan, Doug, Rona, Navid, Chuck, Tuoc, Nitin, Alex, Eric, Andy, Erik, Miguel, Dominic, Steve, Padriac, Peter, James, Michael, Juanita, John, Mary, Kurt, Brendan, Michelle, Danica, Amy, Cassie and so many other people that it is difficult to mention them all in a small blog post.

I am fortunate to have worked with many bright, talented teams. I learned a lot from them.

Wednesday, August 19, 2009

Tuesday, August 18, 2009

Off-Topic: Ryanair business model

Ryanair: How a Small Irish Airline Conquered Europe. I definitely recommend reading this book. If you are a manager, you will find a lot of insights here for running a company when the market is already under a monopoly. Ryanair took the Southwest business model of low-cost, no-frills flights and adapted it to the European market. This market was much more protective of larger companies, with different cultures and ways of doing business in different countries. There is no free lunch here: all the aspects of Ryanair's growth are discussed. They were about to run out of money so many times, and they behaved very negatively towards trade unions. Anyway, they were able to make low-cost flights a commodity, like taking a train or a car. And even better than that. If you plan to run a company, or if you want to have a look at the cost-saving kingdom, or if you plan to be an ass-hole once in your life, then this is your book.

Sunday, August 16, 2009

What is Apple doing with this big-ass data center?

Going search, Going social, Going Apps, Going something completely new?

Saturday, August 15, 2009

Google losing share

Quoting Mashable:

"Numbers released by Nielsen tell a similar story: while Google grew from June to July, it still lost market share to its competitors – from 66.1% in June to 64.8% in July, a 1.3 percentage point drop. However, a closer look at the numbers reveals that Bing wasn’t the primary culprit – it was Yahoo which stole Google’s market share."

Friday, August 14, 2009

Latent Space Domain Transfer between High Dimensional Overlapping Distributions

This paper combines different techniques for learning across different knowledge domains. SVM Regression is used to fill in missing values, and SVD is used to reduce dimensions. A theoretical bound for the two combined techniques is provided.

Tuesday, August 11, 2009

Knol: is not looking goog(d)

Hmm, I had a similar idea, and Google released it before I did. I guess I was wrong.

Quoting marketpilgrim: "It’s been a little over a year since Google launched Google Knol. Now it appears the service may not make it to its 2nd birthday."

Monday, August 10, 2009

How do you define a query session?

How can you identify a query session? Smart Miner: A New Framework for Mining Large Scale Web Usage Data suggests using three major components: 1. temporal visit constraints; 2. the links among pages; and 3. maximal visit paths, computed using an Apriori-like algorithm. I suggest reading the paper if you want to see reasonable ideas for identifying query sessions.
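
Of the three components, the temporal constraint is the easiest to picture. Here is a minimal sketch of that piece alone, under stated assumptions: split_sessions and Timestamp are hypothetical names, and the 30-minute gap is a placeholder value, not a figure taken from the paper. It is not the paper's algorithm, and the link structure and maximal visit paths are not covered.

```cpp
#include <cstddef>
#include <vector>

// A visit is reduced to its timestamp in seconds; user and URL are omitted
// because this sketch only looks at time.
using Timestamp = long;

// Splits one user's time-ordered visits into sessions. A new session starts
// whenever the gap from the previous visit exceeds max_gap_seconds.
// The 30-minute default is an assumption made for this illustration.
std::vector<std::vector<Timestamp>> split_sessions(
        const std::vector<Timestamp>& visits,
        long max_gap_seconds = 30 * 60) {
    std::vector<std::vector<Timestamp>> sessions;
    for (std::size_t i = 0; i < visits.size(); ++i) {
        // The first visit always opens a session; later ones open a new
        // session only after a long enough pause.
        bool start_new = sessions.empty() ||
                         visits[i] - visits[i - 1] > max_gap_seconds;
        if (start_new) sessions.push_back(std::vector<Timestamp>());
        sessions.back().push_back(visits[i]);
    }
    return sessions;
}
```

For example, visits at seconds 0, 60, 120, 5000 and 5100 fall into two sessions, because the only gap longer than 30 minutes is the one between 120 and 5000.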

What I don't like is the experimental part. A site with 1.5K unique users and 5K pages cannot be considered a large Web site...

Sunday, August 9, 2009

Book Review: Building Search Applications


Building Search Applications: Lucene, LingPipe, and Gate is a pretty good introduction to Information Retrieval with a lot of pragmatic examples based on Lucene, Gate and LingPipe. I recommend adding it to your library if you like Lucene and Nutch or if you need to maintain or create a medium-scale search application.

Saturday, August 8, 2009

Friday, August 7, 2009

OffTopic: Dear Kara, I love your blog

Dear Kara, I love your blog. Every day, I cannot start working without checking it.

Thanks
Antonio.

Real Time Query Expansion -- Query Logs vs News (update)

After about 12 hours I checked two real-time search engines (namely Twitter and OneRiot). Neither of them offers a real-time query expansion service (yet).
Both of them show "trending topics", which are not related to the particular query submitted by the user.



Thursday, August 6, 2009

Real Time Query Expansion -- Query Logs vs News

Many search engines offer a related query suggestion service. For instance, when you search for "Obama" the search engine can suggest the query "Is Obama Muslim?". This happens because both queries have been submitted very frequently by different users in the same search session. In information retrieval this process is called Query Expansion. A common approach is to extract correlations between query terms by analyzing user logs.
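
The log-based idea can be sketched in a few lines. This is a toy illustration under assumptions, not how any production engine does it: Session, CooccurrenceCounts, count_cooccurrences and best_suggestion are hypothetical names, and raw session co-occurrence counts stand in for whatever correlation measure a real system would use.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// The queries issued by one user in one search session.
using Session = std::vector<std::string>;
// counts[a][b] = number of sessions containing both query a and query b.
using CooccurrenceCounts = std::map<std::string, std::map<std::string, int>>;

// Counts how many sessions each pair of distinct queries shares.
CooccurrenceCounts count_cooccurrences(const std::vector<Session>& sessions) {
    CooccurrenceCounts counts;
    for (const Session& s : sessions) {
        // Deduplicate so a repeated query counts once per session.
        std::vector<std::string> distinct(s);
        std::sort(distinct.begin(), distinct.end());
        distinct.erase(std::unique(distinct.begin(), distinct.end()),
                       distinct.end());
        for (const std::string& a : distinct)
            for (const std::string& b : distinct)
                if (a != b) ++counts[a][b];
    }
    return counts;
}

// Returns the query that most often shares a session with `query`,
// or an empty string if the log holds no evidence for it.
std::string best_suggestion(const CooccurrenceCounts& counts,
                            const std::string& query) {
    auto it = counts.find(query);
    if (it == counts.end()) return "";
    std::string best;
    int best_count = 0;
    for (const auto& entry : it->second) {
        if (entry.second > best_count) {
            best = entry.first;
            best_count = entry.second;
        }
    }
    return best;
}
```

With a log where "obama" and "obama speech" share two sessions, best_suggestion returns "obama speech" for "obama". A query that never appeared in the log gets an empty suggestion.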

The query-log-based approach shows its limits when you deal with real-time events. In this case, there might be no time to accumulate past queries since the events are happening right now. To handle query expansion for real-time search, a new idea is to extract fresh correlations from news events.

For instance, Sonia Sotomayor has just been confirmed to the high court.



Judd Gregg is one of the supporters



And the algorithm nailed the correlation



Now compare the query suggestion provided by Google, where no correlations are provided since the event is too recent.



And compare with the query suggestions provided by Bing, where related searches based on query logs are shown



I believe that leveraging both past query logs and real-time news events can provide a more complete and up-to-date query expansion service, since you get the best of both worlds.



(PS: In addition, please note that both Bing and Ask are showing a related fresh video, while Google is not)

Wednesday, August 5, 2009

Tagommenders: Connecting Users to Items through Tags

Can tags be used to improve the performance of recommendation systems? This paper investigates the idea, comparing different signals derived from tags, and implicit or explicit ratings.
It offers a bunch of interesting metrics, but the MovieLens dataset is too small and it's not easy to understand how they will behave at large scale.

Tuesday, August 4, 2009

Bing's market share

"Bing's share of the search market grew another percentage point in July, indicating that some of those initial users may be sticking around for the long haul. Google, on the other hand, fell by nearly the same amount, and now faces the combined forces of Microsoft and Yahoo in the race for search market share." arstechnica


Monday, August 3, 2009

Learning to Rank (a good tutorial)

A very good and up-to-date tutorial on "Learning to rank", from Microsoft Research @ WWW09.
This extends my previous favourite tutorial on the subject [pt1][pt2]

Sunday, August 2, 2009

WebIR a classical tutorial

A complete WebIR tutorial from WebBar 2004, a bit outdated but still valid. The major topic missing there is Learning to Rank -- which was introduced in 2005.

Saturday, August 1, 2009

How fast do Flickr photos propagate in the social network?

The paper "A Measurement-driven Analysis of Information Propagation in the Flickr Social Network" provides an answer: the propagation is quite slow, and each published photo "jumps" no more than 1-2 hops in the Flickr social network.

I wonder what the result would be for a similar study applied to a more newsworthy social network such as Facebook or Twitter.