Citizendium Blog

November 17, 2007

CZ and what’s wrong with Google

Filed under: Internet, Other projects — Larry Sanger @ 9:41 am

It is interesting that, if you do the Google search for “biology,” you won’t find the fine Citizendium article on the subject.  It is actually better than the #1 result (the Wikipedia article), and it is certainly of more interest and use for people who do the “biology” search than many items in the top 100.

This alone demonstrates that there is something wrong with Google.  But what, precisely?  What is Google doing wrong, and how could they do better?

12 Comments »

  1. With all due respect Larry, this isn’t Google’s fault necessarily. Its search result rankings are automated, mostly, according to rules which a lot of people know about, like PageRank. The problem - given that you’ve done a decent job at search engine optimisation of your content - is that not many external sites link to you, and uncounted millions of sites link to Wikipedia.

    One other point: Google for biology citizendium, and the Biology entry in Citizendium comes up third. Maybe this is a special case, because it’s your first approved article?

    Comment by Paul Montgomery — November 17, 2007 @ 10:25 am

  2. Clearly, you’re missing my point, Paul. Of course I know that not many external sites link to us, and of course I also know that that’s why our Google rank isn’t high. Surely you aren’t assuming (this would be silly) that simply because a lot of websites link to a given page, that is what makes the page worthwhile. I certainly don’t make that silly assumption myself; our content is in fact higher-quality or more relevant than much of what people find in the top 100 results for “biology”. Should this be regarded as a flaw in Google’s system? Nobody ever said Google’s system was perfect, of course. But shouldn’t they be doing something to correct this flaw? If so, and now to my actual question, what?

    Comment by Larry Sanger — November 17, 2007 @ 1:19 pm

  3. I guess it’s a matter of what Google thinks is worthwhile. Maybe they’re looking for certain kinds of outbound links as well. Maybe their natural language parser decided the article was too scholarly. Google tries to provide search results for all kinds of users, and maybe a majority of users who search for biology aren’t looking for an encyclopaedic article… or if they are, Wikipedia fits the bill and there’s no room for CZ.

    Perhaps a better question is why Wikipedia articles are almost always in the top 10 of every Google result where a page is relevant. It’s quite uncanny. I don’t know if there is some special weighting going on, but I would certainly like to see CZ articles like the Biology one fill the “encyclopaedic article slot”, if such a thing exists, in Google’s search results.

    I would suggest asking some Google people directly.

    Comment by Paul Montgomery — November 17, 2007 @ 8:52 pm

  4. Once cz becomes more popular I wouldn’t be surprised if the google algorithm were changed to give more relevance to cz than wp.
    In the meanwhile, how difficult would it be to write a firefox extension or a bookmarklet that checks if a citizendium entry exists and puts it on top of the results?

    Comment by Andrea Moro — November 18, 2007 @ 2:52 am

  5. Clearly a terrible and even horrific problem that came sickeningly clear to me one day some time back when I discovered Wikipedia and other trendy sites got top billing over and above an article by a world renowned scholar who had written a simply magisterial general article on a certain historical subject. The article was quite buried. But if you were Google, how would you change it? I can’t think of any way to do it except for many humans to make many, many subjective decisions about the actual importance of sites.

    Comment by Stephen Ewen — November 18, 2007 @ 3:17 am

  6. Wikimedia’s never lifted a finger to get in good with Google. Indeed, up to about 2005, Wikipedia’s Google results were awful - even searching on a piece of text from Wikipedia, you’d get a pile of mirror sites first and Wikipedia on about page three!

    Googlemancy is generally a futile pursuit. I can only imagine Wikipedia’s high results are because people keep linking to it and linking to it and linking to it. So this would suggest:

    1. Get on with writing an excellent information site.
    2. Encourage people to link to stuff they like from it.
    3. See 1. Ignore the outside world. Wikipedia being top 10 hampers actually writing the encyclopedia.

    Comment by David Gerard — November 18, 2007 @ 4:43 am

  7. David, again, you’re missing the point. Of course I know that we can raise our Google rank by writing excellent material and getting people to link to it. As one of the original architects of Wikipedia’s Google success, I think I have a fair grasp of this elementary point. In fact, your reaction illustrates a problem I’m concerned to point out: people just take the propriety of Google’s algorithms for granted, when the algorithms are evidently still crude at best, even if they are still as good as we can do for now. The point is that it seems humanity can do better. The question is how.

    Comment by Larry Sanger — November 18, 2007 @ 4:54 am

  8. Yes, sorry, I see your point :-) I believe Jimmy Wales is talking up his Wikia search engine project with visible search algorithms for the reasons you outline. Could be interesting if it goes anywhere.

    Meanwhile, here’s a related thing that may be of interest.

    Comment by David Gerard — November 18, 2007 @ 5:59 am

  9. Andrea is on track. Current search algorithms rely on statistical analysis, so popularity is what we get in the top results when searching (incoming links, CTR on similar queries, personal search history, etc.), which is a good enough metric for relevancy.

    What is needed is context (how does a search engine know what search mood you are in?) and feedback (trusted? sources to define the relevancy of search results). There is no one-size-fits-all solution, but social searching could bring additional data to analyze, especially if anyone can annotate (tag) the search results to add more metadata.

    Comment by Alberto Saavedra — November 18, 2007 @ 4:41 pm

  10. Take a commercial license and we’ll hit number 1.

    Comment by tom — November 21, 2007 @ 11:51 am

  11. Google needs a ranking algorithm based on “subjective” criteria. Their PageRank algorithm tries to define objective criteria to classify web pages, based on the number of other pages that link to them. This models websites as having an objective reality as defined by static linkages in hyperspace.

    A subjective approach would allow you to rank search results as defined by the biases of different user groups, for example the results that are meaningful to “biologists with more than 10 years’ experience in research.” A search which specifies that it is intended for high school students would return different results. This would require that every user who does a Google search accurately record their search-result rankings, plus the metadata necessary to classify their “search bias.” This would be an extension of wiki-like behaviour to the search-engine domain.

    So every single web search could sort its results an infinite number of ways.

    Comment by Hasan Murtaza — December 2, 2007 @ 3:48 pm

  12. When CZ has an article that is finished and much better than Wikipedia, I suggest you do a campaign in which you get bloggers to blog about this particular facet of Google Search and in turn link to the finished CZ article. See if you can improve the ranking of the CZ article and in turn make a point and draw attention to the CZ project. That would be good marketing.

    Comment by Caylie Pasteur — December 13, 2007 @ 2:12 am
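
Several comments above come down to the same mechanism: link-based ranking rewards pages that many other pages point to, regardless of how good the content is. The following is only a minimal sketch of that idea, a textbook-style PageRank power iteration, and not a description of how Google actually ranks pages. The toy link graph, the page names, and the damping value of 0.85 are all assumptions made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook-style PageRank by power iteration.

    links maps each page name to the list of pages it links to.
    Returns a dict mapping page name to its score (scores sum to 1).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped score evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three blogs link to one article, one links to the other.
toy_web = {
    "blog_a": ["wikipedia_biology"],
    "blog_b": ["wikipedia_biology"],
    "blog_c": ["wikipedia_biology", "citizendium_biology"],
    "wikipedia_biology": [],
    "citizendium_biology": [],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the article with more inbound links ends up with the higher score, whatever the quality of either article. That is the behaviour the post questions, and it is also why the suggestion in comment 12 (getting bloggers to link to a finished article) would be expected to move the ranking.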
