Monday, February 28, 2011
Sunday, February 27, 2011
A good book to read if you are into data mining: "Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions"
A couple of years ago, I became fascinated by the simplicity of the AdaBoost algorithm and the amazing performance it offers for mining quite different sets of data. AdaBoost is based on a very simple intuition: you start with very simple, naive classifiers and then you improve their performance with boosting techniques. After that, I started to read papers about Random Forests, Bagging, MART, and many other ensemble methodologies, but I felt the lack of a unifying approach to, and description of, all those techniques.
Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions offers this view by describing the ISLE framework in a concise yet detailed exposition. Great work!
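To make the AdaBoost intuition concrete, here is a minimal sketch in plain Python. It is not taken from the book: the weak learners are assumed to be one-feature threshold "stumps", labels are assumed to be +1 or -1, and the function names (`train_stump`, `adaboost`, `predict`) are my own. Each round fits the stump with the lowest weighted error, then increases the weight of the examples it got wrong.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    # A stump answers `polarity` on one side of the threshold, `-polarity` on the other.
    return polarity if row[feature] <= threshold else -polarity

def train_stump(X, y, w):
    # Exhaustively pick the stump with the lowest weighted error.
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost(X, y, rounds=3):
    n = len(X)
    w = [1.0 / n] * n          # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, polarity = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak learner
        ensemble.append((alpha, feature, threshold, polarity))
        # Misclassified examples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, feature, threshold, polarity))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    # Weighted majority vote of all stumps.
    score = sum(alpha * stump_predict(row, f, t, p) for alpha, f, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate: + + - - + +
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])
```

On this toy set a single stump always misclassifies at least two points, while the weighted vote of three stumps fits all six.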
Saturday, February 26, 2011
Friday, February 25, 2011
Given two arrays A and B, each of size N, whose elements are either 1 or 0, find an interval (p, q), inclusive, such that the sum of the elements of A in this interval equals the sum of the elements of B in the same interval.
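One possible linear-time sketch follows. The puzzle as stated does not say which interval to return, so I am assuming we want the longest one; the function name is my own. The idea is to track the running difference between the prefix sums of A and B: whenever the same difference shows up at two positions, the elements in between contribute equal sums to both arrays.

```python
def longest_equal_sum_interval(a, b):
    """Return (p, q), inclusive, with sum(a[p..q]) == sum(b[p..q]), or None.

    Assumption: among all valid intervals we want the longest one.
    """
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    first_seen = {0: -1}   # prefix difference -> earliest index where it occurred
    diff = 0
    best = None
    for i, (x, y) in enumerate(zip(a, b)):
        diff += x - y
        if diff in first_seen:
            p = first_seen[diff] + 1
            if best is None or i - p > best[1] - best[0]:
                best = (p, i)
        else:
            first_seen[diff] = i
    return best

a = [0, 1, 0, 0, 0, 0]
b = [1, 0, 1, 0, 0, 1]
print(longest_equal_sum_interval(a, b))  # (1, 4): both sums are 1 on that interval
```

The dictionary stores only the first occurrence of each difference, which is what makes the reported interval the longest one ending at each position.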
Thursday, February 24, 2011
Wednesday, February 23, 2011
Tuesday, February 22, 2011
Monday, February 21, 2011
We are finally live in 8 markets and 5 languages with geo-located aggregation, ranking, and filtering. This is a really amazing result achieved by my team.
Sunday, February 20, 2011
Saturday, February 19, 2011
Friday, February 18, 2011
Thursday, February 17, 2011
Wednesday, February 16, 2011
Tuesday, February 15, 2011
But... wait a second. Could this be a new way of expanding my social network? I'd like to be friends with someone who shares my musical interests. Facebook suggests new friendships because you and the potential new friend(s) have one or more friends in common.
So what about suggesting new friendships because you and the potential new friend(s) have similar musical interests? This would be an interesting new signal for making new friendships.
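As a rough illustration of such a signal, here is a minimal sketch that scores the overlap between two people's favourite artists with Jaccard similarity and suggests non-friends above a threshold. Everything here is hypothetical: the data layout, the function names, and the 0.3 threshold are my assumptions, not how Facebook or anyone else actually does it.

```python
def jaccard(a, b):
    # Size of the intersection divided by size of the union; 0.0 for two empty sets.
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggest_friends(user, tastes, friends, min_score=0.3):
    """Suggest people with similar musical tastes who are not yet friends.

    tastes:  dict mapping a person to the set of artists they like (hypothetical data).
    friends: dict mapping a person to the set of their current friends.
    """
    existing = friends.get(user, set())
    scored = []
    for other, artists in tastes.items():
        if other == user or other in existing:
            continue
        score = jaccard(tastes[user], artists)
        if score >= min_score:
            scored.append((score, other))
    # Highest similarity first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]

tastes = {
    "me": {"Miles Davis", "Radiohead", "Bach"},
    "ann": {"Radiohead", "Bach", "Bjork"},
    "bob": {"Metallica"},
    "cat": {"Miles Davis", "Radiohead", "Bach"},
}
friends = {"me": {"cat"}}
print(suggest_friends("me", tastes, friends))  # ['ann']
```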
I wonder if this principle can be applied in other contexts as well...
Hey YouTube, I have an idea for you...
Monday, February 14, 2011
Sunday, February 13, 2011
Saturday, February 12, 2011
This feature is useful, but its main problem is that all the references are public. People tend to form cliques, and quite soon they start trading mutual praise, very often exaggerating their own capabilities. Therefore, it is sometimes very hard to distinguish between a good reference and a bad (or exaggerated) one. How to solve this problem?
If you think about how "References" work in real life, you soon understand that we have a different mechanism in place. When I want to hire someone, I ask the candidate to provide the names of three confidential referees. Then, I contact the referees directly, and their authoritativeness, together with the confidentiality of the communication, allows me to judge the quality of the candidate with no risk of listening to exaggerated bragging.
So LinkedIn, how about having a similar feature? I would pay for it. Give me a way to ask for confidential referees and a way to communicate with them privately.
Friday, February 11, 2011
Not sure whether there is already a similar social service out there, but I would like to have a way to instantaneously communicate with the people around my geo-location who are willing to listen.
I believe that this would be an interesting feature for establishing new friendships around a place (e.g. a pub, a bistro, a disco, and so on), or for getting help when I am in a new area. Probably, there should also be a way to tag annoying people and to switch this feature on and off.
Does anyone know whether such a social service already exists?
Thursday, February 10, 2011
In reality, people have multiple interests (or multiple colours). I could be very interested in following all the messages from my friend Alice about the latest cool technological gadgets, but not very interested in getting all her messages about recipes. Meanwhile, Mandy may have a lot of interest in recipes and zero interest in Alice's posts about gadgets. Alice has multiple interests and they should be expressed by way of multiple colours. How to solve this problem?
How about adopting a social solution? Let anyone posting a message annotate it with one or more tags ("recipes", "gadgets", and so on). Let anyone receiving a message express a like for a specific tag. This combination can be used by Facebook's ranking algorithms as a powerful signal of my willingness to receive more messages about a specific topic (and fewer messages about a topic that I never liked).
How to create new topics? Well, again, a social solution. First, suggest a list of popular topics "as I type", because I am lazy and I want to reuse meaningful tags already produced by my social graph. Second, simply allow the creation of new tags when no satisfying match is found. Each tag is a new colour and this world can therefore become a full HD colour TV.
How to select the topics that I like? Well, again, a social solution. I should simply be allowed to "Like" that tag, and this is an explicit signal to Facebook.
So, Facebook, can I have this feature?
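To show how tag likes could feed a ranking signal, here is a small sketch. It is purely illustrative: the `TagPreferences` class, the smoothed like ratio, and the message layout are my assumptions, not anything Facebook exposes. Each tag gets an affinity of (likes + 1) / (views + 2), so unseen tags start at a neutral 0.5 and messages are ordered by the average affinity of their tags.

```python
from collections import Counter

class TagPreferences:
    """Per-reader record of how often each tag was seen and liked (hypothetical model)."""

    def __init__(self):
        self.seen = Counter()
        self.liked = Counter()

    def record(self, tags, liked):
        # Called once per message shown to the reader.
        for tag in tags:
            self.seen[tag] += 1
            if liked:
                self.liked[tag] += 1

    def affinity(self, tag):
        # Smoothed like ratio: unseen tags score 0.5, never-liked tags drift toward 0.
        return (self.liked[tag] + 1) / (self.seen[tag] + 2)

    def score(self, tags):
        if not tags:
            return 0.5  # untagged messages are treated as neutral
        return sum(self.affinity(tag) for tag in tags) / len(tags)

def rank_messages(prefs, messages):
    # Messages about liked topics float to the top.
    return sorted(messages, key=lambda m: prefs.score(m["tags"]), reverse=True)

prefs = TagPreferences()
for _ in range(3):
    prefs.record(["gadgets"], liked=True)
    prefs.record(["recipes"], liked=False)

inbox = [
    {"text": "New lasagna recipe", "tags": ["recipes"]},
    {"text": "Cool new phone", "tags": ["gadgets"]},
    {"text": "Trip to Lisbon", "tags": ["travel"]},
]
for message in rank_messages(prefs, inbox):
    print(message["text"])
```

With these toy numbers the gadget post ranks first, the never-seen travel tag sits in the middle, and the recipe post drops to the bottom.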
Wednesday, February 9, 2011
Tuesday, February 8, 2011
Monday, February 7, 2011
Sunday, February 6, 2011
PS: In IR these are the skip lists. Can you devise a quadratic algorithm? How about a linear one?
Saturday, February 5, 2011
Friday, February 4, 2011
Thursday, February 3, 2011
My reaction is mostly one of surprise. I am surprised that Google wants this issue discussed in the press. I am surprised that Google wants this aired in the court of public opinion.
Google is trying to draw a line on what use of behavior data is acceptable. Google clearly thinks they are on the right side of that line, and I do too, but I'm not sure the average searcher would agree. And that is why Google is playing a dangerous game here, one that could backfire on them badly.
Let's take a look at what Google Fellow Amit Singhal said:
This experiment confirms our suspicion that Bing is using some combination of:
- Internet Explorer 8, which can send data to Microsoft via its Suggested Sites feature
- the Bing Toolbar, which can send data via Microsoft’s Customer Experience Improvement Program
or possibly some other means to send data to Bing on what people search for on Google and the Google search results they click.
Of course, what Amit does not mention here is that the widely installed Google Toolbar and the fairly popular Google Chrome web browser send very similar data back to Google, data about every page someone visits and every click they make. Moreover, Google tracks almost every web search and every click after a web search made by web users around the world, since almost every web search is done on Google.
By raising this issue, Google very publicly is trying to draw a particular line on how toolbar and web browsing data should be used, and that may be a dangerous thing for Google to do. The average searcher, for example, may want that line drawn somewhere other than where Google might expect it to be drawn -- they may want it drawn at not using any toolbar/Chrome data for any purposes, or even not using any kind of behavior data at all -- and, if that line is drawn somewhere other than where Google wants it, Google could be hurt badly. That is why I am surprised that Google is coming out so strong here.
As for the particular issue of whether this is copying or not, I don't have much to say on that, but I think the most thought-provoking piece I have seen related to that question is John Langford's post, "User preferences for search engines". John argues that searchers own their browsing behavior and can reveal what they do across the web to whoever they want to. Whether you agree or not with that, it is worth reading John's thoughts on it and considering what you think might be the alternative.