Wednesday, March 30, 2011

A feature I like in Google Search (for power users)

Search "Swine Flu" with temporal information turned on. You clearly see that they tagged offline temporal entities, since those are highlighted in the snippets

Monday, March 28, 2011

Specialize swap

in C++ how do you specialize swap for your own Class? and what about a template?

Sunday, March 27, 2011

A feature suggestion for Wikipedia or Facebook: Social Wikipedia, or wikipedia with Subjects

This is not a new idea. Three years ago, I was considering to leave my CTO position in Ask.com for creating a new start up around a new idea of social content creation. This posting will describe a revision of those original intuitions.

I am a big fan of Wikipedia, which is, in my opinion, the natural complement of Web search. In Wikipedia, hundreds of thousands of editors contribute to create content organized in millions of different encyclopaedic voices which are available in multiple languages. The key strength of Wikipedia is that anyone can start creating content but this content will be continuously revised by other editors. As a consequence, this never ending process guarantees very high quality. Indeed, On 14 December 2005, the scientific journal Nature reported that, within 42 randomly selected general science articles, there were 162 mistakes in Wikipedia versus 123 in Britannica.

My idea was to give a clear identity to the editors, associate an explicit profile and an explicit network of relations, so that it would have been possible to identify experts around different topics in a clear and direct way.

Adopting a more recent jargon introduced in my recent posting "from objects to subjects", I can describe the idea in a more synthetic way.  Encyclopaedic voices in Wikipedia are a particular type of objects and the editors are the subjects who are creating, editing and discussing those objects. It should be possible to rank those subjects, match on the fly different subjects according to the objects they annotate, identify expertise in subjects and so on and so forth.

You may wonder what was the reason why I decided to not create a new startup. Well, while I was considering the idea Google announced Knol, which is probably one of the less successful initiative started by Google because Wikipedia was already very strong and there was no idea of Social graph at that time.

Anyway, three years ago was probably too early but now it could be the right time to have a Social Wikipedia == Social Graph + Wikipedia.

What do you think?

Saturday, March 26, 2011

Friday, March 25, 2011

A puzzle with switches

There is a lock which is an N by N grid of switches. Each switch can be in one of two states (on/off). The lock is unlocked if all the switches are on. The lock is built in such a way that, if you toggle some switch, all the switches in its row and its column toggle too

Give an algorithm which, given a starting configuration, will tell you whether the lock can be unlocked by a sequence of switch toggles

Thursday, March 24, 2011

A question of classes

One frequent element of discussion during my interviews is about the different types of classes you can define in C++. In this sense, C++ Coding Standard provides a valuable summary:

  1. Value classes have a public constructor and a public destructor and a copy constructor with value semantics. Also, they have no virtual function, no virtual destructor and so you cannot inherit from them. The idea is that you use value classes for holding other objects and the typical example are STL containers;
  2. Base classes are used as base of a hierarchy. Therefore you can have a virtual public destructor, or in very rare cases a protected non virtual destructor (if you don't want to use polymorphism as in std::unary_function;
  3. Trait classes are template classes with information about types, where you define a certain numbers of typedefs and you never instantiate explicitly the class with a constructor;
  4. Policy classes represents pluggable behaviour and it is usually instantiated as a base or a member;
  5. Exception classes mix a value and reference semantics since they throw the exception by value and capture it by reference. Quite frequently they have virtual functions.


Tuesday, March 22, 2011

A feature suggestion for Facebook: one login, multiple social graphs stored in a central place

I really like the idea of having one single login for multiple sites and Facebook connect is a real killer application which rapidly became a de-facto standard over Internet because it simplifies our life. In short, I believe that having one single login is cool because I am myself regardless of the place where I am.

Anyway, having a single global social graph is not exactly the same thing because the friends I have for my hobbies (acquaria, ham radio, and programming) are different from my professional friends, who are in turn different from my gym friends, who are in turn different from my Italian friends, who are in turn different from my U.K. friends and so on and so forth. Also, I use other community sites powered by vbulletin and other similar software tools which encourage users to build local social graphs (see the friend list feature) for each different community site.

In this case, having multiple social graphs hosted in one central place but with different access rights for being shared would be probably much better than having one single social graph hosted in one central place.So facebook, how about being the place in internet where everyone can store his own set of social graphs and decide how, if and when they are connected?

Saturday, March 19, 2011

Smart pointers

Smart pointers must be part of the background of every programmer. Surprisingly enough this is not the case in several situations and many candidates who are not aware of them.

The intuition is very simple: when you assign an area of memory to a (naked) pointer, you must remember to deallocate that memory when you are no longer using it. This is the typical situation which can create leaks because it is easy to forget to deallocate an area of memory in a complex program.Smart pointers deallocates the area of memory when they are no longer used, in the destructor. Since an area can be referred by many smart pointers we should have a shared counter where we store the number of active references. Here you have the code. For a more professional use, please refer boost.

Thursday, March 17, 2011

Find single element

You are given an array A that contains integers. Every integer occurs 3 times in A leaving one integer that appears only once. Fastest way to find that single integer

A rant against CSS

So I should be considered a backend guy. The problem is that I don't know what is the different from backend and frontend and the two things are actualy the same one.

Anyway, this is a rant. Who was that crazy guy that suggested to abandon the HTML table and adopt divs? Cmon it's embarrassingly hard to create a table with autofit and div!!!

Virtual constructor

In c++ we don't have such a thing as a virtual constructor. Anyway, given a derived class B : public C would you call a virtual function in the constructor of B?

Wednesday, March 16, 2011

QT clustering

QT clustering is a very simple clustering algorithm, where you don't need to provide the desired number of clusters (like in K-Means). Instead, you give a maximum distance allowed. In one variant, you consider the diameter of the clusters already formed. In another variant you consider a single linkage distance. Here you have the code

Tuesday, March 15, 2011

A feature suggestion for Facebook: associate objects to subjects, Web page owners can annotate their pages with their Facebook profile

In my post From objects to subjects: Next possible scenarios for Social search, i claimed that social search is a complete paradigm shit, where you no longer interacts with just objects (pages, images, news and so on and so forth), but you start to interact with subjects that are making concrete actions on objects (create, comments, likes, annotations, shares, and so on and so forth).

Now, one technical problem is how to associate in a reliable way the owner of an object with the subject that created it. I started to think about the problem early this morning while running in Pisa under a light rain. The run was amazing good, and I felt very good after 7 kms.

My first thought was that I can simplify the problem if I just focus on Web pages (a type of object), because almost all the other objects (images, videos, audios) are generally embedded in web pages.

My second though was that all the Web pages belongs to a domain. Therefore, I can check whether the subject claiming the ownership has email access to that domain by sending an email and asking for an explicit acknowledgement. After all, Facebook adopts a similar solution for checking the rights when you want to declare your (work) affiliation. This solution may work but has the problem of discriminating the ownership for multiple users belonging to the same domain. It works for affiliation, it does not work for ownership.

My third thought was that the owner of a page has, by definition, edit rights on it. Any owner can edit her own pages. Therefore, she can embed a special tag with an unique id generated for checking the ownership. Whoever wants to check the ownership can, therefore, check the presence of this tag and verify the ownership.

Now, if you think about it you can ask yourself if you have seen this elsewhere. In a sense, Facebook is already adopting a similar principle for their Like buttom which must be generated ad-hoc for each specific page. Therefore, Facebook can check the ownership of a page once the like button is embedded in the page, because only the owner can edit the page.

Apparently, Facebook is not exposing this feature because the Like button is "only" used by other people for express their preference on a page (an object), instead of checking the ownership of the object itself.

Anyway, this could be a good feature: Giving the opportunity to all the Web page owners to annotate their pages with their facebook profile. In my lingo associate any object with the subject that create it.

PS: here it is the like button generated for this particular URL. I certify the ownership of the page by embedding the button

Monday, March 14, 2011

The beaty of lazy workers -- doing more with less

I usually sleep 4-5 hours per night, and the less I sleep the more I do. Don't be surprised. It is all about the quality of your activities and the respect you have for your body and your mind.

I am a big fan of lazy working which means do more with less but make sure you conclude your activities just when needed and not before. In computer science, we have this beautiful concept of Lazy evaluation, also known as "call by need" which delays the evaluation of an expression until its value is actually required. The intuition is very simple you start an activity very very early, but you don't conclude it immediately. No, you defer it until you are forced to conclude it by a real need.

Let me make a toy example to clarify. Suppose you need to compute 2+3, you write the sum but you don't carry out the summation. Instead, you defer it. And you keep deferring until you need to consume that result for another operation. And so on and so forth. It's a very simple principle but it is powerful. Because it allows to do more with less. Why? because you can start multiple activities, and you keep deferring them until you really need to consume their results as input to new activities.

The magical aspect is that you can carry out many activities in parallel. If you are good in managing your time, you will start to observe a mutual boosting effect where you reduce the time needed to conclude one single activity because you have all the other pending and waiting on-hold.

In a sense, this is similar to a runner who is not just running. No, she trains her body taking care of different muscles of her body in parallel starting to improve each muscle, then stopping that particular activity and starting a new one, deferring the conclusion until there is a real need.

So next time someone will ask you: "Are you focused? Are you carrying out your assigned activity?" you may answer: "No, I am a lazy worker. I do more with less"

Sunday, March 13, 2011

From objects to subjects: Next possible scenarios for Social search

We are living an age of amazing transformation for Web search, which was very hard to forecast until two or three years ago. In a sense, search is a teenager and it is about to become an adult. A teenager plays with objects, an adult interacts with other subjects.

Search was born in 1995 circa with Lycos, an open source project from CMU. Do you remember "pursuit"? It was a mix of perl script and c code which offered (almost) everything a modern engine is offering including crawling, index, search, dynamic caption generation, and textual ranking. Well, the Web was kind of small at that time -- say a less than 10 millions of pages. So, when the Web became larger we saw the raise of Altavista, which added the ability to scale to an index of hundred of millions of pages. Anyway, the quality of search was still poor because the ranking was purely a textual match. Everything changed in 1999 when Google started to consider the Web as a graph with connections among pages. Pagerank was an idea already used in the academic world, where a paper is important if it is frequently cited by other (important) papers, but the wonderful intuition of Google was to apply the same principle to the Web. In doing so, they created a new type of science. Also, Google had the superb merit of understanding how to make money out of search, when everyone else was considering it as a commodity with no real business opportunities. They introduced an auction system and created an ecosystem with users, publishers & advertisers, and a new science (of making money). In a glimpse Google became a verb, a monopoly, and an incentive to reduce our attitude to learn and reason ("People don't think, they Google!", said my friend Alessio Signorini). Nowadays, Bing and Yahoo! have the merit of fighting the monopoly and using massive machine learning for systematically improving the quality of results, which are in addition more and more visual.

So in 1995 we had almost everything, with the only exception of scale (altavista), a good ranking (google), money (google), machine learning & visual ux (bing). If you think about it in 16 years we saw many changes but no one was dramatic (maybe the only one was to start making money, meaning that the search kiddie understood how to walk on her own legs). More or less, no change was dramatic. Why? Well the Web was still the same. It was a collection of objects (pages, images, videos, news and so on and so forth). Surely, we moved from from say a couple of millions of objects to hundreds of billions of objects (or trillions? ;-) and therefore the data mining problems to solve are fascinating and ever changing but still we are talking about objects.

Now, two years ago something changed. The web is no longer a collection of objects, but we are starting to see people, subjects, organized in a social network. Facebook counts more than 600 millions of subjects, connected in fascinating structure and very active in making instantaneous comments about the world, sharing objects, and liking or disliking actives carried out by other subjects. Twitter offers similar perspective.

Think about it: this is a complete change of search paradigm! You no longer search just objects, you can know start to interact with subjects.


Phase I

If you look carefully, this change is already starting to happen. Today, if you search Web pages (a type of object) in Bing the results are enriched with information coming from Facebook about the subject that produced an action related to that page (or that object). This action can be either sharing that particular link or liking that particular page. In this case, Bing offers a picture of the person -- the subject -- who carried out that particular action. Let's call this phase one.

Phase II

What can happen in the future? The limit is just in our imagination. If you think in terms of objects and subjects, you understand that, in phase I, search is annotating objects with subjects making actions on objects. But it is still a search about objects. Now what is the next step? How about a search directly mixing objects and subjects? In other words, can i submit a query and get as result the best subject, the best expert about that query? My friend Tony is my expert for handcrafted lights, and my friend Max doesn't know this. Can Max get Tony as a result of a query for {handcrafted lights}, in addition to the most relevant sites out of there? Think about it, this could easily be a new type of science where we learn the expertise of subjects by observing what types of objects they produce and what types of object they annotate? Once we have learned this information can we propagate it to the graph of interconnection among subjects. Sometime, we can learn expertise about friends that we were not aware of (my friend Ian is an expert in cycling and I was not aware of it). Sometime we can introduce friends that don't know each others once we discover their expertize. Sometime we can produce a set of subjects (e.g. persons) as result of a query.

Also, we clearly need a robust and consistent model for ranking subjects. We probably have many persona, including producers, early adopters, influential, and the mass. For each topic, we can follow the temporal evolution of likes and create an evolving ranking.

So, phase II would be about moving from a Web of Objects to a Web of Objects and Subjects.

Is this the evolution of the adolescent search into a more mature subject?

Phase III

Now let's assume that we have a Web of Subjects and Objects. How can this impact the art of making money? Can we sell the expertise of subjects and transform the ads space?

In my idea, once you discover expertise by observing public behaviors you can match different subjects and allow them to compete in selling their expertise with an auction based mechanism similar to the one adopted for ads.

Sometime, two matching subjects agree to exchange expertise just for free because they are friends or because they enjoy helping each others. Sometime, when the expertise is more professional and become a Service offered by Subjects, then several experts can compete for offering their expertise at the best price. In this case, the matchmaker takes a little margin for the cost of this transaction. As an example, my friend Ralf is an expert of non-bayesan prediction, and my friend Bret may be not aware of it. So Bret my want to talk with Ralf about his expertise.

So, phase III would be about moving from a Web of Objects to a Web of Objects, Subjects, and Services

Conclusions

Web is evolving from a collection of objects (pages, news, images, videos, and so on and so forth) into a collection of objects and subjects interacting each others and eventually offering services. In my view, Social search is all about this shift in paradigm.

Now, let's step back for a little while and let's ask to ourselves is this a generalization of a model seen elsewhere? I just said: "Pagerank was an idea already used in the academic world, where a paper is important if it is frequently cited by other (important) papers, but the wonderful intuition of Google was to apply the same principle to the Web".

Think about it: in academic world a paper, an object, is important if it is frequently cited by other (important) papers or by other (important) authors, the subjects. This is the natural evolution of PageRank. In addition, an author, a subject, is important if other authors are frequently citing her. More about this point. We can learn the expertise of an author by observing the papers she publishes, We can match authors even if they don't know each others once we learn about their expertise, We can learn about new expertise of authors we already know, and so on and so forth.

In short, the paradigm shift exists elsewhere and we just need to apply it to Social Search. Google focused on the objects (the papers, the pages), Social search is about objects and subjects (the authors and their expertise).

Disclaimer

The views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of Bing, Microsoft. Examples of analysis performed within this article are only examples. Assumptions made within the analysis are not reflective of the position of Bing, Microsoft.

I wrote this article during 1.5 hour flight while listening Greatest, Duran Duran

Saturday, March 12, 2011

Special queue

Implement a queue in which push_rear(), pop_front() and get_min() are all O(1) operations.

Friday, March 11, 2011

Japan earthquake (Bing UK and FR)

My thoughts and prayers with all our Japanese friends, colleagues and citizens today.

When those events happen our role is to report relevant information as fast as we can so that users can be informed and, hopefully, reduce that sense of fear and uncertainty generated by lack of first hand communications.

During last months we worked hard to improve the freshness and quality of our search algorithms by leveraging new clustering algorithms which work in multiple countries and multiple language. This is a visual comparison of our results in two countries and two different languages. Thanks to all the teams (Beijing, Bellevue, and London) who made this possible working together for a global market and a better service offered to world wide users.

France









United Kingdom








Thursday, March 10, 2011

News is the ultimate frontier for personalization

My friend Greg was the pioneer in this area with Findory. Then, I really felt in love with paper.li, now Linked.In is entering the arena. They have the people sharing news, and they know the professional category to which that particular user is belonging to. So this is a little and nice innovation.

Wednesday, March 9, 2011

News answers in Fr and UK

When freshness is important (again ...)




Sort 10 Tbs of data

Assume you have 100 tbs of data made up of fixed length records. How would you sort it?

Monday, March 7, 2011

Would you keep you index in internal memory or external memory

Consider this a classical question for IR and the source of infinite discussions for the community. What is the most reasonable answer?

Sunday, March 6, 2011

Prime numbers that are pairs

A couple of pairs prime numbers is made up of two prime numbers that have distance 2 (e.g. (2,5), (5, 7), (11, 13)). How would you generate all the couples of pair prime numbers?

Thursday, March 3, 2011

Build a caption for a document

Given a query Q made up of p keywords Q = k1, k2... kp and a text T find the portion of text which mimize the distance among words. What is the complexity?

Note that this algo is used by search engines to build the snippet provided as an answer to Q and for highlighting word.

Wednesday, March 2, 2011

News Clustering is live in US and FR

News ClusteringV2 is live in U.S.. This is a result of a wonderful collaboration among Beijing, Bellevue and London. London provided the core clustering algorithm, which is language agnostic and scales to multiple markets.


Tuesday, March 1, 2011

What happens when there is no collaboration: no good recipes

A couple of days ago, Google launched a nice recipe search engine where you can search for specific ingredients and get a selection of recipes. This product is very similar to the recipe engine developed by Ask.com three years ago as a result of a collaboration of three offices Pisa, in Europe, Piscataway in New Jersey and Oakland in California.

Unfortunately the first recipe engine suffered a lot of internal fights among different groups aiming at getting the ownership of the product. As a consequence, the quality of the product suffered because the original team was fed up about the political implications of those fights. That was a good lesson learned for the future and something I wanted to share with the readers of this blog.

Competition is outside and it is huge. So tell me: why there are inside fights? Don't get me wrong, competition is always good provided that the rules are fair and that the judges are not part of the competition. Otherwise it becomes just destructive.