Citizendium Blog

July 24, 2007

A Citizendium Web Directory?

Filed under: Project growth, Web 2.0, Theory — Larry Sanger @ 6:54 am

As you might know, we’re deep in development of an expansion of the Citizendium’s scope.  The basic idea is that we’re going to have different kinds of reference material on subpages of encyclopedia article pages.

Link lists are one sort of subpage we’re talking about.  Here are some policy guidelines I wrote up today for link pages.

I think that ultimately, a really successful Citizendium collection of Links pages would constitute a very useful Web directory, with three advantages over previous directories: (1) links would be chosen for quality first and foremost, and ultimately vetted by experts, (2) it would be very easy to prevent “link spam” given the way our community is set up, and (3) the links would be filed under an ever-expanding, very fine-grained category scheme. Search engines could use the resulting directory, via various algorithms, to drive the selection of higher-quality search results. One hopes that the highest-quality results would then be more likely to float to the top.

This is what I argued in a keynote delivered at the Consortium of Liberal Arts Colleges Annual Meeting, Reed College, June 13, 2007: What should we do about Internet “cruft”? Toward knowledge-rich websites

Here are some excerpts.

But hardly anybody would say that search is a “solved problem.” Some people complain a lot about the vast quantities of garbage online—especially educators, librarians, and journalists.  These are the traditional gatekeepers of information. Many of them are dismayed that so much biased, misleading, and outright wrong information is available so easily. Many also are troubled by the amount of “junk information”: they find too much that is trifling, unimportant, and insubstantial, the mental equivalent of junk food.

Of course, you might think, how could we do anything about Internet cruft, anyway? You can’t stop it; that’s like trying to stop the tide. You might not be able to stop the existence of cruft, but there are certainly ways to avoid seeing it.

But I’m getting ahead of myself. The first step in deciding what to do about cruft is to ask whether there really is a problem in the first place—and if so, what precisely the problem is.

After all, a lot of people would say there’s no problem at all. I can hear you saying—and I’d agree with you—that the existence of so much wrong and lightweight information on the net is an inevitable and not particularly troubling byproduct of a wonderful new development, that everyone is now empowered to reach a worldwide audience. You might as well complain that people say all sorts of false things in the privacy of their own homes. That’s just a consequence of freedom of speech.

I think there is a problem, but the problem isn’t with the expanding scope for free speech that the Internet makes possible. That, I think, is a very positive thing. So what is the problem about “Internet cruft”?

So Internet cruft is not a problem merely because it exists, but because of the use to which we put it. Often, a lot of what you and I might call cruft isn’t meant to serve as a definitive presentation of some information. When somebody makes a YouTube video of the funny noises her cat makes, she isn’t acting as a serious documentary producer. She’s just some random person who wants to share a funny video. Usually, what we see online is just the humble perspective of one person, or of a group of amateurs having fun together—and there’s nothing wrong with that, in itself.

The problem arises when we sit down to use our search engine of choice and actually look for facts, to get a balanced and well-informed perspective on things. The problem is about finding good information, not about the mere existence of bad information.

Thing is, there’s tons of good information online. So maybe we can put the problem this way: why is good information so hard to find?

At this point, you might well be wondering if there really is any problem about Internet cruft at all. So let me tell you, finally, what I really think.

It’s this. The problem with Internet cruft isn’t that it exists, or that the existence of cruft makes it harder to find reliable information. The problem, rather, is simply that there isn’t as much fantastic information online as there could be. The problem with Internet cruft is that it keeps us thinking small. I want us to think big.

Let me explain. Right now, to rise to the top of the search rankings, all you have to do is be popular. And to be popular, all you have to do is be minimally credible, and have some interesting information. People will link to your website, which will cause it to rise in the rankings, and then, if your website seems to have quick answers to a quick Web search, people will click on it.

To be popular, therefore, all you have to have is quick, credible answers. There’s no pressure to be maximally useful in addition.

Let’s take the “Lion” search as an example again. Most of the informational articles there are rather short, and have a few pictures of lions. But imagine, if you will, the ideal website about lions. There is a big long meaty encyclopedia article. There’s a shorter and simpler article for children. There’s an annotated bibliography. There’s an annotated set of Web links. There’s plenty of immediately-accessible multimedia. There is a gallery with thousands of free lion pictures. There are recordings of lion roars. There are educational and other videos all collected together. There are maps. There are links to scientific studies. There is educational material.

That’s what I mean by a “maximally useful” website. I want the whole enchilada all in one place.

So now my question is: how could search engines reward knowledge-rich websites—or, how could we help them do so?

Well, I don’t propose that anyone catalog every website that’s out there by hand, since that’s impossible. But what we can do is create a list of websites that are of unquestionably high quality. We can then use data about what those websites link to, and what websites those websites in turn link to, to seed an otherwise human-free search engine. Actually, I don’t presume to have an interesting opinion about how a good search engine might make use of data about which websites really are of high quality. But a bit more about that later.
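To make the seeding idea a little more concrete, here is a minimal sketch in Python. It assumes the link data is already available as a simple mapping from each site to the sites it links to. The function name `hops_from_seeds`, the `max_hops` parameter, and the example site names are all hypothetical, chosen only for illustration. It records how many link hops separate each site from the hand-picked list, which is one possible signal a search engine might use, not a claim about how any real engine works.

```python
from collections import deque


def hops_from_seeds(links, seeds, max_hops=2):
    """Breadth-first walk outward from the hand-picked seed sites.

    links: dict mapping a site to the sites it links to (assumed input format).
    seeds: the sites judged to be of high quality.
    Returns a dict mapping each reachable site to its hop count from a seed.
    """
    distance = {site: 0 for site in seeds}
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        # Do not follow links any further once the hop limit is reached.
        if distance[site] == max_hops:
            continue
        for target in links.get(site, ()):
            if target not in distance:
                distance[target] = distance[site] + 1
                queue.append(target)
    return distance


# Hypothetical link data: a seed site, what it links to, and so on.
example_links = {
    "seed.example": ["textbook.example"],
    "textbook.example": ["essay.example"],
    "essay.example": ["far-away.example"],
}
reached = hops_from_seeds(example_links, ["seed.example"])
```

With the default limit of two hops, the walk covers the seed sites, the sites they link to, and the sites those in turn link to, and stops there.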

So how then should we create this list of websites that are “of unquestionably high quality”?

I’ve got a proposal. Before I explain it, I should give some credit to British columnist Simon Stuart who, in a short piece a few months ago, argued: “the web needs quality control.” So, he said, we should set up an “Internet-Wide Accuracy Commission,” or “iWac.” This would produce “a list of approved websites,” and even give approved websites a logo they could stick on their site, like the Good Housekeeping Seal of Approval.

Well, I don’t exactly want to propose all that, but I do want to propose something like it.

Right now, Citizendium is just an encyclopedia project, but we have added some other kinds of data, such as images, lists, and bibliographies. In the coming months, I hope to kick off a lot of ancillary projects. I hope to announce actual projects, with fleshed-out policies and project leaders, devoted to bibliographies; information catalogs (in other words, almanac-type lists); galleries; debate guides; and possibly other things too.

Well, one of these ancillary projects will be a Web directory. …

On Citizendium, people are already adding links to credible websites at the bottom of articles, but we haven’t been doing this systematically or according to any clear rules. Well, we’re going to move what links we have now to their own separate links pages, and we’re going to start asking contributors to make more extensive sets of links, divided into sections, about every aspect of a topic. In short, we’re going to start something that is more of a serious, and seriously useful, Web directory.

For example, on the links page attached to our Biology article, we’ll have links to introductory articles; free textbooks; Biology image databases; general Biology encyclopedias; important and interesting essays about Biology in general; and, essentially, every type of information about Biology in general on which it’s possible to have credible websites.

So you’ll be able to quickly click from an encyclopedia article to various other genres of information on the same topic. If you’re looking at an article about Biology, then, you’ll be able to click on “links” and come to a whole page devoted to the best Web content about Biology in general.

That’s the proposal in outline. What I haven’t answered yet, however, is why the world needs another Web directory when, as I said, the Web directories that exist, like Yahoo! and DMOZ, don’t really add much to what Google and other sophisticated search engines offer.

There are two truly exciting advantages of this proposal. First, we might build, perhaps for the first time ever, a free and enormous general collection of credible, expert-approved links. The Web is full of link lists compiled by amateurs. It could use one that is managed by actual experts.

Second, because Citizendium requires real names and identities, we have had virtually no vandalism. The likelihood of link spam is very low, because we have a policy against self-promotion. You can ask someone else to put up a link to your website, but you can’t put one up yourself.

Given these two advantages—that the directory will be managed by experts, and that it will have a low rate of link spam—I think we can expect the signal-to-noise ratio to be very high. Our directory should have a very low incidence of cruft.

But still, you might wonder, even so, what would the directory be good for? It would be silly to think that Citizendium’s Web directory might replace Google. Our directory will never be nearly as complete as Google’s.

Well, imagine what it would be like if we had one million encyclopedia articles (considering that Wikipedia has nearly two million, we think this is possible within some years) and ten links per topic, on average. That would be ten million links, compiled collaboratively under the guidance of experts.

Imagine, then, a search engine taking this free information and re-ranking its search results based on whether a website appeared in the Citizendium Web directory. This would, I think, solve the problem I described earlier. If search engines were to use the data we collect to re-rank their results, they would in effect be rewarding knowledge-rich websites.
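As a rough illustration of that kind of re-ranking, here is a small Python sketch. It assumes the search engine already has a relevance score for each result and that the directory can be reduced to a set of host names. The names `rerank`, `host_of`, and `boost`, the size of the bonus, and the example URLs are all invented for this sketch, not anything a real search engine or the Citizendium exposes.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def rerank(results, directory_hosts, boost=0.5):
    """Add a fixed bonus to results whose host appears in the directory.

    results: list of (url, score) pairs from the underlying search engine.
    directory_hosts: set of host names listed in the directory (assumed format).
    Returns a new list of (url, adjusted_score), highest score first.
    """
    adjusted = []
    for url, score in results:
        bonus = boost if host_of(url) in directory_hosts else 0.0
        adjusted.append((url, score + bonus))
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


# Hypothetical results: a popular page outranks a knowledge-rich one at first.
results = [
    ("http://popular.example/lions", 0.9),
    ("http://www.rich.example/lion", 0.6),
]
reranked = rerank(results, {"rich.example"})
```

In this toy case the knowledge-rich site moves ahead of the merely popular one. A flat bonus is only the simplest choice; a real engine would presumably weigh the directory signal against many others.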

If a website is actually credible and useful enough that the Citizendium Web directory links to it, then that simple data can be used in all sorts of interesting ways. Remember also that it isn’t just a URL that is associated with a particular topic. In addition, we’ll file different links under their proper data type—such as essays, images, or textbooks—and the links will (I hope) be annotated.
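One way to picture the data behind each link is as a small record. The Python sketch below is only an assumption about what such a record might hold, based on the description above: a URL, the topic it is filed under, its data type, and an annotation. The class name `LinkEntry`, the field names, and the example URLs are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkEntry:
    url: str
    topic: str            # the article the links page belongs to
    kind: str             # data type, e.g. "essay", "image", "textbook"
    annotation: str = ""  # short human-written note about the link


def group_by_kind(entries, topic):
    """Collect one topic's links into sections keyed by data type."""
    grouped = defaultdict(list)
    for entry in entries:
        if entry.topic == topic:
            grouped[entry.kind].append(entry)
    return dict(grouped)


# Hypothetical entries for illustration only.
entries = [
    LinkEntry("http://textbook.example/bio", "Biology", "textbook", "Free intro text"),
    LinkEntry("http://images.example/cells", "Biology", "image"),
    LinkEntry("http://essays.example/life", "Biology", "essay", "Survey essay"),
    LinkEntry("http://images.example/lions", "Lion", "image"),
]
biology_sections = group_by_kind(entries, "Biology")
```

Because every entry carries its topic and data type, anyone reusing the data could ask for, say, only the textbooks on a given topic rather than a flat list of URLs.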

Here’s the whole thing.

2 Comments »

  1. Hi Larry,

    you write “As you might know, we’re deep in development of an expansion of the Citizendium’s scope.”. I have an honest question for you: Wouldn’t it be wise to take care in fulfilling the promise of the original scope of Citizendium first?

    Mathias

    Comment by Mathias Schindler — July 24, 2007 @ 8:33 am

  2. This is a fair question, Mathias. Without going into great detail, my answer is: first of all, this is the originally announced scope of Citizendium. It has always been my plan to expand the scope of Citizendium beyond just an encyclopedia relatively early on. The encyclopedia is doing well enough that, with a few tweaks such as automated application approval, my anxieties will be more or less put to rest as far as that goes. We are hosting many other kinds of content already, with pages specifically devoted to “catalogs” of this and that, bibliographies, galleries, etc. The subpage project merely puts our official stamp of approval on that varied non-encyclopedic behavior, and organizes it neatly onto subpages.

    Besides, I think that when we are ready to actually announce our various subprojects, in a month or two, we will get a fresh flood of interest in the entire project. The encyclopedia project will only benefit, I think. I have the impression that many people are interested in relatively narrowly-defined tasks, and creating an opportunity for them to work on other sorts of content will not so much distract them from writing encyclopedia articles as it will attract other people, who are interested in that other content and not encyclopedia articles.

    Sad that I never had this idea for Wikipedia. The more experience I have with it, the more I am persuaded it’s the right move. Actually, I think I did have the idea, although not in this precise form (as far as I recall); but I constantly rejected it precisely because I thought, as you do, that it was crucial that we focus. But I’ve come to believe (as you can see in a Forum discussion) that the various subpage projects are actually entirely continuous with the encyclopedia projects, so it isn’t as much of a stretch as you might think.

    There are a couple of subpage projects, particularly the news summary project and the debate guide project, that I don’t propose to start right away. Those projects would require extra time that the other ones wouldn’t.

    Comment by Larry Sanger — July 24, 2007 @ 9:27 am
