Brief post on Wikipedia’s relationship with Google. Note this is a personal-capacity, knowledge-may-be-limited, not-a-real-programmer thing.
How does Wikipedia get its Google hits?
Genuinely? Good, pertinent content. On the things people usually search for we’re pretty good. I can’t comment on Google’s search algorithm,[1] but I’d imagine the general quality of our stuff has a lot to do with it. It’s a self-fulfilling prophecy that gathers dirt as it rolls: people look for information. If it’s about something prominent, Wikipedia covers it. If Wikipedia does not cover it, a subset of [people] will contribute to it. If [subset] contributes to it, Wikipedia covers it – or covers it with higher quality. If Wikipedia gets a reputation for covering things to a high standard, people are more likely to trust Wikipedia links, which means they’re more likely to click on them/link to them, and if they find a problem a subset… you get the picture.
No, except in the sense that we do SEO right. We have deep links, we have a ton of content, we have a trusted brand for pretty much any general knowledge subject. This is what SEO is meant to be about, and in that respect we do it well. But it isn’t something we intentionally went into – deep linking, large amounts of content and a trusted brand are positive and necessary offshoots of putting the sum of the world’s knowledge before the eyes (and pens) of humanity, but exist in our site model for that purpose and that purpose only – and it isn’t SEO in the “black hat” sense of the word.[2]
Amusingly, Wikipedia’s external links automatically have rel=“nofollow” added… meaning that people engaging in grey or black hat SEO work who seek to capitalise on the aforementioned prominence of Wikipedia are wasting their time and energy. Still, we appreciate the contributions!
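For the curious, here is a rough sketch of what "nofollow" looks like from a crawler's side. It uses Python's standard `html.parser` to split links into those a well-behaved spider would count and those it would skip. The sample markup, the `LinkRelParser` class and the `followed_links` helper are all made up for illustration; this is not how any real search engine is implemented.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collects (href, is_nofollow) pairs for every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel is a space-separated list of tokens, e.g. "nofollow noopener"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def followed_links(html):
    """Return the hrefs that do NOT carry rel="nofollow"."""
    parser = LinkRelParser()
    parser.feed(html)
    return [href for href, nofollow in parser.links if not nofollow]


# Hypothetical snippet: one internal link, one external link marked nofollow.
SAMPLE = """
<p>See <a href="/wiki/Search_engine">search engine</a> and
<a rel="nofollow" class="external text" href="https://example.org/">the official site</a>.</p>
"""

print(followed_links(SAMPLE))
```

In this sample the external link is the one that gets dropped, which is the whole point: a spammer's link sits on the page but passes on no ranking credit.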
If organisations are interested in taking advantage of Wikipedia’s prominence to boost their own traffic, they should probably take advantage of Wikipedia’s model: find the article about your org or person or thing, make it of high quality, build it into the web of internal links that is Wikipedia, and include a pertinent external link to your site/forum/whatever. It’s far more useful for us, far more useful for you, far more useful for the reader, and it saves you from embarrassing media stories and permanent monitoring on one of the most prominent web properties on earth.
Okay, so not SEO. Uhm. What about some kind of agreement with Google? They give you money, right?
This is complicated, but the short answer is “no to the first question, yes to the second”. The long answer:
- Yes, Google has given us money. But it seems confusing that they’d give us money and then also bias their search results towards us – kind of a tit-for-tit-for-no-tat thing there. And the fact that they’ve given us money certainly doesn’t bias us towards them – if we found being biased or under the influence of donors acceptable we’d just have set up advertising long ago. Some calculations I’ve seen have us making 10 times our annual budget if we put up ads, and all the people in Fundraising could retire happy.
- But, ignoring that – what kind of influence would 2 million dollars even buy? The answer is “not much”. Those 2 million dollars are less than 5 percent of the money we hope to raise this coming year for our movement-wide budgets – money we hope to raise from millions upon millions of small donors, just as we’ve done every year. And, just as we’ve done every year, we hope that those small donors will be the vast majority of donors. Don’t get me wrong, it’s great to have people show up and offer you an anonymous briefcase of money,[3] but Wikipedia is the world’s tool. It should be owned by humanity as well as used by it.
- Having said all that: we do have a “special arrangement” with Google. But it’s absolutely nothing to do with search prominence. The mechanism most search engines use to update their cached copies of pages is web spiders: automated bots that squirrel around the internet nabbing copies of pages and building them into one big, constantly updated database the search engine can refer to. The problem with doing this for the Wikimedia projects is, well… self-explanatory. We’re encyclopaedias, archives, quote books and dictionaries that anyone can edit, and our flagship project has 10 million pages, including disambiguation pages and redirects. It goes up to 26 million if you factor in non-articlespace pages. Other projects aren’t as big, but we’ve got around 800 of the things – they don’t have to be very big individually to be a colossal pain in the arse for web spiders. Then you factor in the sheer volume of linking, the fact that our projects are editable by almost anyone, and the fact that updates could come to any page at any time and be changed back within a second, and you get what is known as a headache. Google’s solution to this, as I understand it, has just been to set up a monitor at the recent changes feed, which is a constantly-updated feed showing the most recent revisions to every page, as a way of knowing what to point their spiders at. This is pretty much unique to Wikipedia (I had a devil of a time working on some features because of how they run things) but it shouldn’t bias search results, and it exists exclusively, as far as I’m aware, to solve their headache rather than up the stress of our network engineers with oodles of pageviews.
- Last but not least: if Google are biasing their rankings towards us, they’re doing a pretty shitty job of it. For the organisation that redefined search for the 21st century they’re having their arses handed to them by Bing.
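To make the recent changes idea from the list above a bit more concrete, here is a minimal sketch of watching a feed and working out which pages are worth re-fetching. It builds a query against the public MediaWiki API (`action=query&list=recentchanges`) and then boils a response down to one entry per page. To be loud about the assumptions: I have no idea what Google actually runs, the sample response is invented, and `recent_changes_url` and `pages_to_recrawl` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def recent_changes_url(limit=50):
    """Build a MediaWiki API URL that lists the most recent changes."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|ids|timestamp",
        "rclimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def pages_to_recrawl(response_text):
    """Map each changed page title to its newest revision id.

    A page edited (and reverted) five times in a minute only needs
    fetching once, so we keep a single entry per title.
    """
    changes = json.loads(response_text)["query"]["recentchanges"]
    latest = {}
    for change in changes:
        title = change["title"]
        if title not in latest or change["revid"] > latest[title]:
            latest[title] = change["revid"]
    return latest


# Invented response in the shape the API returns; not real revision data.
SAMPLE_RESPONSE = json.dumps({
    "query": {
        "recentchanges": [
            {"title": "Bing", "revid": 103, "timestamp": "2012-06-01T12:00:05Z"},
            {"title": "Google", "revid": 102, "timestamp": "2012-06-01T12:00:03Z"},
            {"title": "Bing", "revid": 101, "timestamp": "2012-06-01T12:00:01Z"},
        ]
    }
})

print(recent_changes_url(limit=10))
print(pages_to_recrawl(SAMPLE_RESPONSE))
```

The sketch deliberately stops short of making a network request. Fetching the URL and repeating on a timer would be the live version, but the interesting part is the collapsing step: the spider is told where to look instead of guessing across millions of pages.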
Ah. So it really is just decent content and inter-page linking.
Pretty much. If you’re interested in getting involved in the only site on the internet that works in practice but not in theory, and you’ve got an employer/company/whatever that thinks improving Wikipedia’s coverage of them would be beneficial, there’s a quick-and-dirty guide here. Happy editing!
Update: Liam Wyatt, awesome gentleman that he is, has also pointed out that Google helped fund Wikidata – a project that makes semantic web people squee, but not so much readers. Directly, anyway.
[1] because I know absolutely fuck-all about it. I’m the Enwiki Whisperer, not the Oracle of Delphi.
[2] As an aside: if you are reading this and you are one of the prize eejits who emails the 5th largest web property on the intertubes to ask if we want to give you money to increase our search ranking, I have your money here. It’s the thing shaped like a sock with a snooker ball in it.
[3] if anyone has one of those briefcases lying around, I am happy to take it off your hands.