The following describes an idea I had a few weeks ago while attending an entrepreneurs’ conference in Paris. I have little desire and even less time (between Citizendium and WatchKnow) to pursue it myself, so I commend it to anyone who is interested: I want to kick it out the door and see if it survives on its own. I am told that the idea somewhat resembles the moribund PICS project. I view it as an interesting possible alternative to the too-influential search behemoths, Google and Yahoo, as well as their various would-be Web 2.0 competitors: it would make Web search entirely distributed, decentralized, and less subject to the control of any single interest. By the way, I circulated the idea among a set of very distinguished Internet thinkers and was graced with some interesting replies; suffice it to say that quite a few very smart people think it is worth thinking about, at the very least. “The case for syndicated Web ratings,” below, captures why I am so excited about the idea.
Should there be a universal standard, like RSS, that enables people to rate (and otherwise describe) websites — and to syndicate that data? If there were such a standard and such syndicated data, search engines could seed their results in creative ways using the data. That’s the basic idea.
Ultimately, such a standard could greatly decentralize the power of Internet search. How? Well, imagine five kinds of tools.
(1) Tools and data types for the ratings themselves:
(a) “Rating toolbars,” like StumbleUpon’s, would allow you to recommend and rate the website you are looking at. In addition, you could write a description, add tags, and rate the site on specific dimensions like length, accuracy, grade level, and “family-friendliness.” The toolbar would then publish a “feed” of your ratings wherever you choose. The only required data for an individual rating would be a URL and an up-or-down vote. (The sketch after item (5) below shows one way such a record might look.)
(b) It could also be possible to rate another person’s or entity’s feed (a meta-rating), and even a feed of feeds (a meta-meta-rating).
(c) A feed could also carry metadata about the person doing the rating, listing facts like education level, age, ethnicity, political views, or whatever the rater feels is relevant.
(2) Social bookmarking services, such as Digg, del.icio.us, StumbleUpon, as well as websites like Mahalo and Wikia Search, would be encouraged to publish their data using the standard (or at least allow their users to publish their own work easily). Mapping from existing attributes used by, e.g., del.icio.us to a well-designed standard would seem to be easy.
(3) Various “Web rating registrars” collect many feeds in one central location. Most registrars are absolutely open; a few are carefully edited. Moreover, most registrars, based on internal statistical analysis of ratings and/or of meta- and meta-meta-ratings, offer a service that labels certain feeds as recommending porn, spam, or virus-infested webpages — a sort of distributed blacklist both of websites and of feeds.
(4) Search engines then use the data aggregated by the registrar(s). Given the quantity and variety of data published in the aggregated feeds, it becomes possible to weight and filter search results not just with Google-style PageRank algorithms but also by criteria like the following (one simple weighting scheme is sketched in the code after item (5) below):
(a) quality according to generally trusted sources; or quality according to your peer group; or quality according to academic and academic-endorsed sources; etc.
(b) whether the page contains porn, spam, or viruses.
(c) webpage type (e.g., one attribute might allow us to search just those pages that are marked as movie reviews).
(d) education level of the resource (e.g., suitability for young children, or for post-graduate work).
(5) Making distributed rating into a Digg-type game. Once new pages on the Web had received some minimum number of ratings, you can easily imagine “fresh meat” websites that would enable and encourage people to rate them even more, letting users rate the newest, most popular material coming online about their particular interests. This would work a little like Digg or Reddit, except that the inputs would come not from individual users “Digging” a story but from countless decentralized feeds rating a fresh page for the first time.
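To make the data flow concrete, here is a minimal sketch in Python. Everything in it (the field names, the trust weights, the scoring formula) is a hypothetical illustration, not a proposal for the standard itself. It shows a rating record with the two required fields, a registrar-style aggregation of many feeds into per-URL scores, and a simple re-weighting of search results:

# Hypothetical sketch only: field names, trust weights, and the scoring
# formula below illustrate the idea and are not part of any standard.
from dataclasses import dataclass, field
from collections import defaultdict

@dataclass
class Rating:
    url: str                                        # required: the page or domain rated
    thumbs_up: bool                                 # required: the up-or-down vote
    tags: list = field(default_factory=list)        # optional tags, e.g. ["porn"]
    dimensions: dict = field(default_factory=dict)  # optional, e.g. {"accuracy": 4}

@dataclass
class Feed:
    publisher: str        # who publishes this feed of ratings
    rater_metadata: dict  # optional facts about the rater (age, education, ...)
    ratings: list         # list of Rating objects

def aggregate(feeds, trust):
    """Registrar-style aggregation: fold many feeds into a per-URL score,
    weighting each feed by how much the aggregator trusts its publisher."""
    scores = defaultdict(float)
    for feed in feeds:
        weight = trust.get(feed.publisher, 0.0)  # unknown feeds count for nothing
        for r in feed.ratings:
            scores[r.url] += weight * (1.0 if r.thumbs_up else -1.0)
    return scores

def rerank(results, scores, alpha=0.3):
    """Blend the engine's own relevance score with the syndicated rating
    score; alpha sets how much the ratings matter."""
    return sorted(
        results,
        key=lambda hit: (1 - alpha) * hit["relevance"]
                        + alpha * scores.get(hit["url"], 0.0),
        reverse=True)

The point of the trust dictionary is that a search engine need not treat all feeds equally; how such trust might be assigned is taken up under “How gaming the system could be combatted” below.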
The case for syndicated Web ratings
At first glance, at least, the case for syndicated Web ratings is surprisingly, even startlingly, compelling.
Improves poor search engine results. Probably the most common complaint about search engine results is that, while often relevant and useful, they do not always place the highest-quality material front and center. The best is often buried deep. The system is not broken, but it could use improvement. If there were enough syndicated Web rating data, and effective mechanisms were in place to combat gaming of the rating system (e.g., statistical analysis of ratings, meta-ratings, and “certified” rating providers), the resulting data could be used by search engines to deliver far higher-quality results. This would also subtly encourage people to create higher-quality Web pages, i.e., pages that are more likely to be highly rated. (Cf. this paper.)
Decentralizes search power. Not only would the system be open, it would be fully distributed and decentralized, like the Blogosphere. If well-constructed, a syndicated Web rating system would place the most powerful, important dataset for making the Web searchable directly in the hands of Internet users. This could essentially “level the playing field” and could be profoundly disruptive to Google et al.
Many more people would be involved in vetting the Web. There are huge numbers of people using Digg, del.icio.us, and StumbleUpon, as well as newer services like Mahalo and Wikia Search. But their users are contributing just to those search/bookmarking services; their work does not benefit the search results we use daily on Google, Yahoo!, MSN, and Ask.com. How many more people would take the time to recommend and rate Web pages if they knew their data would be distributed across the Web and would help the proper placement of websites they know and love? It could be an order of magnitude or more: suddenly, we would all have a direct vote about search results.
Speeds up recognition of good new websites. Websites would not have to wait months or even years for their quality to be recognized, as they do now. Right now, Google dominates search, and Google’s rankings are effectively, but still somewhat lamely, determined by a mysterious proprietary algorithm involving the most-linked-to and most-clicked-on websites. Since it often takes really excellent pages months or even years to receive the number of links they “deserve” — if they ever do receive them — it takes that long for them to rise up in Google’s search results. By contrast, if we could seed search results with massive amounts of website rating data, a really excellent new website might be placed at the top of the rankings almost immediately.
Could be used to tailor search to the individual user. With data about education level, a search engine could, on request, return only those pages appropriate for a 5-7 year old — or for post-doctoral researchers. Moreover, with data included in the feed about the rater, we could see, for any given search, what the top-rated websites were for our peer group (a brief sketch follows). How teenage girls rate a news article might differ greatly from how 40-year-old men rate it — and this would be useful data for both groups to have. With data about pornography contributed by trusted sources, the user could opt to have a search guaranteed to omit pornography. In general, the adoption of the standard could improve the flexibility and power of Internet search. And because it would be an open standard, it would become possible to use the standard (and later versions of it) to organize all manner of distributed Internet rating, description, and organization projects, possibly more effectively than proprietary products have done. For example, the system could foster an open project to create a free, more powerful search alternative to proprietary “walled garden” services for children and education. (See item (4) under “The idea” above.)
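As a sketch of how such peer-group tailoring might work, reusing the hypothetical Feed objects and aggregate() function from the earlier sketch (the metadata field names, like age_bracket, are invented here for illustration):

def feeds_for_peer_group(feeds, **criteria):
    """Keep only feeds whose rater metadata matches the requested peer
    group. The metadata is self-reported, so an engine would treat it
    as a hint, not a fact."""
    return [f for f in feeds
            if all(f.rater_metadata.get(k) == v for k, v in criteria.items())]

# A search engine could then aggregate and rank using only those feeds:
# teen_scores = aggregate(feeds_for_peer_group(feeds, age_bracket="13-19"), trust)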
Could be a way to combat Web abuse. In particular, a syndicated Web rating system could be used as a neutral, universally distributed protocol for publishing and sharing data about which websites are considered sources of viruses, spam, porn, and criminal activity. These problems, long considered among the most serious on the Internet, might be best attacked by widely distributing, decentralizing, and only then organizing the means to attack them.
If this analysis is correct, the idea could be deeply disruptive — but positively so.
How gaming the system could be combatted
What stops people from posting multiple feeds, all of them favorable to their own websites? Indeed, won’t gaming be far worse in this case than under the present system? At present, if you want to game search results, you must go to the trouble of creating interlinking “dummy” websites, spamming blogs with links, and so forth — all of which makes gaming relatively difficult. But this system would make it possible to influence search results directly. This might be why no such system has been created yet. That, anyway, is what a critic might say.
The solution is that most search engines will not be so silly as to aggregate the ratings in any simple way, or treat all feeds (or individual ratings) equally. First, it will be possible to “certify” and rate feeds; second, there will be internal indicators of abuse that search engine coders will be able to analyze and exploit.
It is entirely possible that a search engine will not use a feed if it is not in some way adequately “endorsed.” Endorsement might be via networks of certified feeds, which have a distributed protocol allowing network members to vet other feeds.
The internal indicators of abuse might prove to be more powerful, however. If a certain website is often described as “porn,” for example, and if it is recommended by a feed as non-porn, the feed registrar might discard that particular feed. More generally, ratings and descriptions will be mutually reinforcing in a variety of ways, so that it will be possible to devise algorithms to detect abuse automatically.
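Here is a crude sketch of one such internal indicator, using the porn example just given and the hypothetical Rating/Feed types from the earlier sketch. The thresholds and the reuse of the tags field are, again, invented purely for illustration; real registrars would surely use subtler statistics:

from collections import defaultdict

def consensus_porn_urls(feeds, threshold=0.8, min_votes=5):
    """URLs that a large majority of raters have tagged as porn."""
    votes = defaultdict(lambda: [0, 0])  # url -> [porn votes, total votes]
    for feed in feeds:
        for r in feed.ratings:
            votes[r.url][0] += 1 if "porn" in r.tags else 0
            votes[r.url][1] += 1
    return {url for url, (porn, total) in votes.items()
            if total >= min_votes and porn / total >= threshold}

def suspect_feeds(feeds, porn_urls):
    """Feeds that recommend, as non-porn, pages the consensus calls porn.
    A registrar might discard or down-weight these feeds."""
    return [f for f in feeds
            if any(r.url in porn_urls and r.thumbs_up and "porn" not in r.tags
                   for r in f.ratings)]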
But probably the most effective way to combat system-gaming will be a combination of certifying feeds and internal data analysis. While it might be easier to post a feed in bad faith than to create a web of supporting websites, the data in the system itself will be far richer, making possible more powerful and creative solutions to the gaming problem.
Indeed, it seems entirely possible that, using syndicated Web ratings, we could engineer systems that are virtually perfect in their elimination of virus-ridden websites, porn, really bad blogs, and other cruft. At bottom, the combination of transparent, rich data and the fact that most Internet users act in good faith might mean the disappearance of such cruft from our search results.
What if the system succeeds?
“But wait,” you might say, “I don’t like the idea that cruft will disappear from search results. There is something comforting about cruft being in our search results. That means that any schlub like me can get the ear of the whole world. Even if this Web rating system is distributed and decentralized, it is not really egalitarian. Wouldn’t it mean the effective silencing of people who are unjustly regarded as ‘not good enough,’ or not mainstream enough, to be rated highly?”
The short answer is: no, and in fact the effect might be precisely the opposite: it would probably empower regular folks even more than the current search system does. Since meta-tagging would enable us to label our feeds in various ways, we could search for results that are important and relevant to our peers. Moreover, a syndicated Web rating system would allow us to pluck undiscovered talents out of the obscurity that Google’s popularity-based algorithm places them in.
Besides, if the new system has undesirable results, no doubt Google, or a Google-like system that does not use syndicated ratings, will still exist and still be heavily used.
This project should be developed openly
This effort should be developed openly in the free-for-all way that characterizes much open source development. This is absolutely required, in fact, because otherwise there will likely not be adequate adoption of the standard. The standard should be propagated by an open, neutral consortium, not any single entity, and certainly not any for-profit business. No single interest should have control over a standard that could be so consequential.
I have no interest in leading the effort, or even participating very much in it, except as a user. I am merely putting the idea out there and hoping that others, who have more experience writing standards and working with syndication, will be motivated to create the components of the system. My main concern is that the standard itself be adopted according to an open, democratic process, and not be unduly influenced by any single interest.
Questions and answers
Shouldn’t we discuss this idea and make sure it really is a good one before we rush off headlong to implement it?
Yes. I hope the discussion will happen in the Blogosphere, on Slashdot, and elsewhere as well. I asked for comments on SharedKnowing, for what it’s worth. It’s a Big Idea that would affect everyone online deeply, and so it needs a huge amount of vetting and exploration.
How would I create a Web ratings feed? I wouldn’t want to write XML by hand.
If the idea has legs, people will create free software that writes the XML for you and posts it automatically (i.e., syndicates it for anyone’s use). It is easy to imagine people writing toolbars like the StumbleUpon toolbar, which would let you rate websites and provide other information about them, information that would then be syndicated automatically. (A brief sketch of such a generator follows.)
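For instance, the “publish” step of such a toolbar might be as simple as the following Python sketch. The element names (rating, url, vote, and so on) match the hypothetical markup shown under “What might the markup for a rating feed look like?” below and are not part of any real standard:

import xml.etree.ElementTree as ET

def rating_to_xml(url, thumbs_up, keywords=None, description=None):
    """Serialize one rating as a hypothetical <rating> element; a toolbar
    would append elements like this to the user's published feed."""
    rating = ET.Element("rating")
    ET.SubElement(rating, "url").text = url
    ET.SubElement(rating, "vote").text = "up" if thumbs_up else "down"
    if keywords:
        ET.SubElement(rating, "keywords").text = " ".join(keywords)
    if description:
        ET.SubElement(rating, "description").text = description
    return ET.tostring(rating, encoding="unicode")

print(rating_to_xml("http://www.citizendium.org/", True,
                    ["encyclopedia", "wiki"],
                    "A new wiki encyclopedia project."))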
Where would the feeds be posted?
Think of this by analogy with blog feeds. One could post one’s ratings feed anywhere online, where it could be found by webcrawlers. But one could also register the feed with various feed registrars (in the same way you register a blog feed with Technorati), or post it directly to the registrars.
What might the markup for a rating feed look like?
We would define a markup schema that allows people to declare whether they think a Web page, or a whole domain or subdomain, is high quality or garbage, and to describe and evaluate it on any number of features. Just for example (we need not use these exact tags or features), we might write something like this:
<keywords>encyclopedia reference wiki free open content collaboration</keywords>
<description>A new wiki encyclopedia project inviting everyone to participate under their own real names, and making a special, low-key role for experts.</description>
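A complete entry might then wrap such elements together with the two required ones. Everything below (the tag names, the dimension attributes, and the guessed URL) is purely illustrative:

<rating>
  <url>http://www.citizendium.org/</url>
  <vote>up</vote>
  <!-- the <keywords> and <description> elements shown above would go here,
       along with any optional dimension ratings, for example: -->
  <dimension name="accuracy">4</dimension>
  <dimension name="grade-level">adult</dimension>
</rating>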
What elements should be required by the standard?
It seems that search engines could be improved with just two officially “required” elements: the URL and a “yes or no” overall rating. This would allow, e.g., digg.com users to publish their ratings. It is possible that after further discussion we will decide that certain other elements might be needed. Of course, feed aggregators and registrars, and search engines, might require various additional pieces of information.
There is a difference between “high quality” and usefulness. Some academic papers, for instance, might be very high quality, but useful for only a very small number of people. How can this be taken into account in the ratings standard?

One natural answer: since the standard would allow rating on any number of optional dimensions, “quality” and “usefulness” could simply be separate dimensions; and with metadata about the rater included in the feed, search engines could interpret usefulness relative to a particular audience rather than in the abstract.
How could the system get started?
You might say this is an interesting idea, but how can it get started? Probably, if it happens at all, entrepreneurs will make it happen. The system would involve, in fact, at least four different new business types, namely (1), (3), (4), and (5) under “The idea” above, and existing social bookmarking websites might be persuaded to drive it forward as well.
Zittrain’s The Future of the Internet indicates that something like this is the natural next step. There is a natural progression of search “generativity”:
- The Yahoo! directory — proprietary, centralized directory
- Google — proprietary, centralized search
- Mahalo and Wikia Search — free, centralized search enhanced by human input
- Syndicated Web ratings — free, decentralized search enhanced by human input (with data support for dynamically created tagging and directory systems)
In short, this may be the prototypical “idea whose time has come.” If enough people are interested, the support for a truly distributed project like this will quickly appear. But if people aren’t that excited about it, it will die a perhaps well-deserved death.
But another reason to be optimistic that the standard, once published, would be rapidly adopted and used is the simple fact that so many people are already engaged in rating and recommending websites, even though their ratings benefit only the other users of those particular services. How many more of us would take the time to rate and describe websites if we knew the work would positively affect the results of all competing Web search services? In other words, what if we knew that our vote would count? We’d vote!
Shouldn’t we simply pressure social bookmarking websites to work on a standard and use it to publish their data?
It couldn’t hurt. If we should target any websites for such pressuring, it should be those that are already sympathetic to the ideals of the open source community. Go to work on them. Of course, many will prefer to ignore this idea, because it is profoundly disruptive.
I support this idea and I want to make it happen. What should I do?
Here are some things you could do:
- Write about it. Debate about it. Help build the critical mass of people interested in the idea.
- Join forums that are discussing the idea, and work toward a shared understanding of what the standard should look like.
- If there is support for the idea, eventually someone will set up a wiki to work on the standard. Then you could help work on the standard.
- Start writing software, or adapting your current software — preferably, free software — to do the things listed under “The idea” above. Then announce your software and get other people working on it. Standards are often developed alongside applications that use them.
The idea is loose…and it’s up to you and the innovation commons in general to make it happen, if it’s going to happen.