Friday, May 30, 2008

h-Index is a new citation impact measure

Recently, in information science, perhaps due to the proliferation of easy-to-use bibliometric tools, there has been renewed interest in understanding and measuring the impact of research publications and authors.

I learned from Dannyfest and Don Norman about a new tool appropriately called Publish or Perish. It is available as a free download for both Windows and Linux. Alternatively, you might instead use a different, web-based tool called QuadSearch. Publish or Perish calculates a bunch of citation impact metrics, and one of them is the h-index, which has attracted considerable interest in information science lately.

It is also sometimes referred to as the Hirsch Index or Hirsch number.

A scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np - h) papers have no more than h citations each. As an example, I used the tool to calculate my own h-index and found that it is about 19, which means that 19 of my papers have at least 19 citations each, and none of my other papers has more than 19 citations.
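To make the definition concrete, here is a minimal Python sketch, assuming you already have a plain list of per-paper citation counts. The function name h_index and the sample numbers are hypothetical; this is not how Publish or Perish computes it internally, just a direct reading of the definition.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    # Sort citation counts from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    # The paper at 1-based rank r qualifies only if it has at least r citations.
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            # Counts only decrease from here while ranks increase, so stop.
            break
    return h


# Hypothetical citation counts for seven papers.
example = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example))  # 3
```

Here three papers (25, 8, and 5 citations) have at least 3 citations each, and the remaining four have no more than 3, so h = 3. Sorting first turns the check into a single pass that can stop at the first paper whose citation count falls below its rank.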

For comparison, Stu and Danny have h-indexes in the high 30s (39 and 36, respectively). Johan, Peter, and Jock Mackinlay are in the low 30s (33, 31, and 30, respectively). Victoria Bellotti and Mark Stefik are in the low 20s (24 and 23, respectively). You may be interested in downloading the tool and trying it out to "measure" yourself and your friends. :) Maybe it will inspire you to do greater and more impactful research! Or maybe it will just boost or deflate your ego! Use with caution!

Hirsch suggests that, for physicists, a value for h of about 10-12 might be a useful guideline for tenure decisions at major research universities. A value of about 18 could mean a full professorship, 15–20 could mean a fellowship in the American Physical Society, and 45 or higher could mean membership in the United States National Academy of Sciences. Clearly, the numbers need to be adjusted for computer science and related fields, because of the different nature of our publications (more conference papers, etc.).

PS: BTW, you should take care in choosing the fields on the left-hand side so that you avoid picking up bad citations. This is one of the best tools I have seen (even better than the SCI [Science Citation Index] one, because it appears to be more comprehensive).

Tuesday, May 6, 2008

Microsoft Live Mesh syncs folders and files

A central concept in computing these days is syncing among the myriad devices and storage locations that a user might own. This is a good idea because users want to get away from having to keep track of where their data lives; they simply want the information they need to be available at their fingertips. I have been using Foldershare, which was acquired by Microsoft, for a number of years. Now, based on the same technology, Microsoft has released a new service called Live Mesh, which syncs not just folders and files but also applications. Startups in the Valley are also developing these types of services, including SugarSync, which I beta-tested, and Dropbox.

Of these, only Foldershare and SugarSync currently support the Mac.

I believe this is an important step forward for end users, as it relieves them of having to worry about where their data is located; they can simply rely on background services to make the data available. This is the concept of content-based computing and networking that Van Jacobson here at PARC has talked about.

Monday, May 5, 2008

Service excellence can be defined as what a business chooses not to do well

Harvard Business Review (Apr 2008 issue) had a great article on service design: pick a niche, make sure your system doesn't just scratch an itch, and then serve that market by optimizing for that niche. I think there are a lot of interesting lessons here about service design. The article argues that in service design you must pay attention to four elements, and that they are often driven by the idea that "Service excellence can be defined as what a business chooses not to do well."

(1) The offering. The key here is to find out what customers want out of their service experience. If customers want longer store hours, you may be able to offer them in exchange for higher product prices. The article cites Walmart as an example: it traded away sales help for lower prices.

(2) The funding mechanism. The focus here is on understanding who is going to pay for the service. The article uses Progressive Insurance as an example: Progressive sends vans to assess the damage at the accident scene. This turns out to reduce its fraud rate as well, and that is how it pays for the improved customer service experience.

(3) The employee management system. Hiring employees of a certain type might mean you have to trade off other attributes. For example, Commerce Bank focuses on providing a great teller experience, so it doesn't select for the smartest straight-A students but instead selects for people with great attitudes.

(4) The customer management system. The key here is to design a way to modify customer behavior. Discounts and late fees are one (instrumental) way to modify behavior; another is to use normative means such as reputation, shame, and pride to motivate customers to do the right thing. The article cites Zipcar, the car-sharing service, as an example here.

The sidebar on how incumbents in a business react to more focused entrants was really educational, because it reflects how big companies sometimes react to threats to their current lines of business.

Hadoop at the heart of Yahoo Search

Hadoop running in production on the Yahoo! Search Webmap

"The Webmap build starts with every Web page crawled by Yahoo! and produces a database of all known Web pages and sites on the internet and a vast array of data about every page and site. This derived data feeds the Machine Learned Ranking algorithms at the heart of Yahoo! Search. Some Webmap size data:
- Number of links between pages in the index: roughly 1 trillion links
- Size of output: over 300 TB, compressed!
- Number of cores used to run a single Map-Reduce job: over 10,000
- Raw disk used in the production cluster: over 5 Petabytes"


Doug Cutting, the creator of Hadoop, used to work at PARC, and we have been using Hadoop heavily in our research as well.
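The Webmap statistics above come from Map-Reduce jobs. As a toy, single-process illustration of the map/shuffle/reduce pattern, here is a Python sketch that counts inbound links per page from crawled link lists. It is not Yahoo!'s Webmap code and does not use the Hadoop API; the page names and the functions map_phase, shuffle, reduce_phase, and inlink_counts are hypothetical, and a real Hadoop job would distribute these phases across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical crawl output: each page mapped to the pages it links to.
CRAWL: Dict[str, List[str]] = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example", "b.example"],
}


def map_phase(page: str, outlinks: List[str]) -> Iterator[Tuple[str, int]]:
    # Emit (target, 1) for every outgoing link of the input page.
    for target in outlinks:
        yield target, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(target: str, counts: List[int]) -> Tuple[str, int]:
    # Sum the 1s to get the number of inbound links for one page.
    return target, sum(counts)


def inlink_counts(crawl: Dict[str, List[str]]) -> Dict[str, int]:
    intermediate = (
        pair for page, links in crawl.items() for pair in map_phase(page, links)
    )
    grouped = shuffle(intermediate)
    return dict(reduce_phase(t, c) for t, c in sorted(grouped.items()))


print(inlink_counts(CRAWL))
# {'a.example': 1, 'b.example': 2, 'c.example': 2}
```

Because each map call only looks at one page and each reduce call only looks at one target, both phases can be split across many cores; only the shuffle needs to move data between them.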

Friday, May 2, 2008

Early work on Social Bookmarking

I just dug out this early work on social bookmarking and social indexing:

Wittenburg, K., D. Das, L. Stead, and W. Hill (1995). Group Asynchronous
Browsing on the World Wide Web. In Proceedings of the Fourth International
World Wide Web Conference, Boston, MA, December 11-14, 1995, pp. 51-62.
http://www.w3.org/Conferences/WWW4/Papers/98/

Key points:

- The obvious move for providing access to personal or general subject-oriented indices is to manually or automatically collect them into a database and then provide query or browse capabilities over this database.

- we have created a server that collects and merges bookmark/hotlist files of participating users and then can serve (subsets) of these merged bookmark files to either standard HTML client browsers or to a client built with the multiscale visualization tool Pad++.

- we have included one general purpose subject guide in our initial experiments as well, namely, Yahoo [18], whose role we will subsequently explain. Such a database combined with a World Wide Web server, which we call a Group Asynchronous Browsing (GAB) server, can then provide access to a merged subject tree structure in various ways. This collection of tools is intended to address the issue of how to utilize the browsing activities of others to discover resources, some of which themselves may be guides to further World Wide Web resources.

- The essential point to note with respect to information discovery is that, starting from some particular resource, new resources that have a good chance of being similar to it may be discovered by navigating "up" to any of the subject headings that include this starting resource and then navigating "down" from those subject headings to other, potentially unknown, resources.

- One of our goals then is to explore World Wide Web services that might be based on such merged subject trees.
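To make the "up, then down" discovery idea concrete, here is a minimal Python sketch. It is not the GAB server's implementation: it assumes, hypothetically, that each user's bookmark file has already been parsed into a mapping from a subject heading (written as a path such as "HCI/Visualization") to a list of URLs, and it only goes up one level, to the headings that directly contain the starting resource. The names merge_bookmarks and related, and all URLs, are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical parsed bookmark file: {subject heading: [URLs]}.
Bookmarks = Dict[str, List[str]]


def merge_bookmarks(files: List[Bookmarks]) -> Dict[str, Set[str]]:
    """Merge several users' bookmark files into one heading -> URLs index."""
    merged: Dict[str, Set[str]] = defaultdict(set)
    for bookmarks in files:
        for heading, urls in bookmarks.items():
            merged[heading].update(urls)
    return merged


def related(merged: Dict[str, Set[str]], start: str) -> Set[str]:
    """Go 'up' to every heading that contains start, then 'down' to its other URLs."""
    found: Set[str] = set()
    for heading, urls in merged.items():
        if start in urls:
            found |= urls
    found.discard(start)
    return found


alice = {"HCI/Visualization": ["pad.example", "vis.example"]}
bob = {"HCI/Visualization": ["pad.example", "treemap.example"],
       "Search": ["yahoo.example"]}
merged = merge_bookmarks([alice, bob])
print(sorted(related(merged, "vis.example")))
# ['pad.example', 'treemap.example']
```

Alice never bookmarked treemap.example, but because Bob filed it under the same heading as a page she did bookmark, the merged tree surfaces it for her. This is the kind of discovery through others' browsing activity that the paper describes.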

Teresa by Sergio Endrigo

I was in Vernazza, Italy, where a restaurant owner introduced me to this beautiful song.

Teresa

quando ti ho dato quella rosa [when I gave you that rose]
rosa rossa, [red rose]
mi hai detto
prima di te io non ho amato mai.
[you told me
before you, I had never loved anyone]

Teresa
quando ti ho dato il primo bacio
sulla bocca,
[when I gave you the first kiss
on the lips]
mi hai detto
adesso cosa penserai di me.
[you told me
now what will you think of me]

Teresa
non sono mica nato ieri
[I wasn't born yesterday, after all]
per te non sono stato il primo
nemmeno l'ultimo
[for you I wasn't the first, nor even the last]
lo sai, lo so.
[you know it, I know it.]

Teresa
di te non penso proprio niente,
proprio niente,
mi basta,
[I think nothing at all about you,
nothing at all,
it's enough for me]

restare un poco accanto a te, a te
[to stay a little while beside you, beside you]
amare come sai tu non sa nessuna
[no other woman knows how to love the way you do]
non devo perdonarti niente
[I have nothing to forgive you for]
mi basta quello che mi dai
[what you give me is enough for me]

Teresa, Teresa