When I was down at my place of work this week we had a discussion over lunch about how many Web ‘sites’ there are on the Internet. This question is a lot trickier to answer than you might think. There are a lot of things that need to be considered and factored in.
One huge problem is the definition of a Web site.
If it were as easy as “each registered www DNS name is a Web site” then it would be pretty simple to work out how many Web sites there are; or would it? Depending on whose data you believe, at the end of 2009 the number of registered DNS Web addresses ranged between 155 million and 200 million. So, splitting the difference, for the purposes of this exercise, let us agree that there are somewhere around 177.5 million registered www DNS names in the global DNS.
But many of these names are either dead ends or abandoned. This means that they do not link to a working Web site/page, or if they do, the Web site was long ago abandoned by its creator—but it might still exist out there in the Web universe because the hosting company has not yet taken it down.
How many sites have you come across where the last entry was sometime in 1999, or 2004, or at least two or three years ago? This often happens to me, but then I do search around the Web a lot.
So how many are dead ends and how many are abandoned? What a question. How would anyone ever be able to work this out? Well, some Web stats sites have tried to estimate the answer. Basically they have decided that about half of all registered DNS Web addresses are most likely either dead ends, stale, or abandoned.
So, discounting 50 percent and taking our 177.5 million starting point, that leaves us with about 88.75 million active and working Web sites around the world.
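For anyone who wants to play with the assumptions, here is a minimal sketch of the DNS step in Python. All figures are in millions. The 50 percent dead-or-abandoned share is the Web stats sites' rough estimate, not a measured value, and the function name is just my own illustration.

```python
# All figures are in millions, taken from the estimates quoted above.
DNS_LOW = 155.0          # lowest estimate of registered www DNS names, end of 2009
DNS_HIGH = 200.0         # highest estimate
DEAD_OR_ABANDONED = 0.5  # assumed share that is dead, stale, or abandoned


def active_dns_sites(low: float, high: float, dead_share: float) -> float:
    """Split the difference between two estimates, then discount the dead share."""
    midpoint = (low + high) / 2  # 177.5 for the figures above
    return midpoint * (1 - dead_share)


print(active_dns_sites(DNS_LOW, DNS_HIGH, DEAD_OR_ABANDONED))  # 88.75
```

Change `DEAD_OR_ABANDONED` and the rest of the estimate moves with it.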
Or does it? That was just too easy.
The answer is no. This is nowhere near the final answer. What about Web sites hosted by a hosting site such as SquareSpace, WordPress, Tumblr, Blogger, BlogSpot, Yahoo, FatCow, iPage, justhost, HUB, eziHosting, GoDaddy, and thousands of others, that use internal DNS-style tables to map into an internally hosted Web site? Sites such as (found at random):
- Tiffany Blues at tumblr (here).
- Susan’s Musings at WordPress (here).
- One Cool Thing A Day at SquareSpace (here).
- Stocking Girls at BlogSpot (here).
- Blue Sky Photography at WordPress (here).
- Just Boobs at WordPress (here).
They don’t have a global DNS entry. Do we count these as Web site front pages? And if we do count them then how on earth do we work out how many of them there are?
One interesting statistic I found about hosted sites, mainly ‘blogs’, is that 80 percent are abandoned within a month of being created.
There are lots of different statistics for the number of blog Web sites on the Web. I guess the answer is that nobody really knows how many blog Web sites there are, let alone how many of them are ‘active’. Wikipedia says there were 112 million blog Web sites in 2008.
BlogPulse, which indexes and tracks blog Web sites specifically, reckons there are 126 million active blog Web sites. This number seems reasonably likely based on Wikipedia’s 112 million in 2008 and factoring in that 80 percent of new blog sites are abandoned within a month of being created.
So if we add our 88.75 million working www DNS entries to our 126 million blog Web sites, we now have 214.75 million ‘front pages’ we can link to.
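The running total is just a sum over the sources counted so far. Here is a small sketch that keeps each source labelled, so it is obvious where every number comes from. The dictionary keys and the function name are my own, made up for illustration. The figures (in millions) are the ones above.

```python
from typing import Dict


def total_front_pages(sources: Dict[str, float]) -> float:
    """Add up the estimated 'front pages' from each labelled source (in millions)."""
    return sum(sources.values())


sources = {
    "working www DNS names": 88.75,  # half of the 177.5 million midpoint
    "active blog sites": 126.0,      # BlogPulse figure
}

print(total_front_pages(sources))  # 214.75
```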
But that’s the easy part. Now we get to the tricky bit. What about front page links into sites like Facebook, MySpace, FourSquare, Yahoo, Live, Bebo, Flickr, LinkedIn, Twitter, PeekYou, and hundreds of other less well-known ‘social media’ sites all over the globe? None of these are in the global DNS either. Should these be counted as front page links? Probably so. And how many of these are there? Following are some randomly selected examples:
- Pet Shop Boys page at Facebook (here).
- Beach Boys page at Facebook (here).
- Katy Perry page at Facebook (here).
- Networking in High Heels page at Facebook (here).
- Leo LaPorte’s page at FourSquare (here).
- John C Dvorak page at PeekYou (here).
Social pages on Facebook and all the other social sites suffer from the same problems as hosted blog Web sites. They can become stale or get abandoned.
The trick with these social ‘sites’ is that they don’t actually exist until someone goes to the link which activates the page builder engine. When someone goes to the link the page is constructed programmatically on-the-fly by a Web site engine using data and instructions stored in the database. All pages of the ‘site’ only exist temporarily as they are accessed and displayed on the user’s screen. Any updates made are made to the database data, not to an actual Web site. Every time you go forwards or backwards through the ‘site’ the required page is reconstructed again, and again, from the database data.
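To make the “page doesn’t exist until you ask for it” idea a bit more concrete, here is a toy sketch in Python. It is not how Facebook or any other social site actually works. The in-memory `DATABASE` dictionary, the account names, and the functions are all hypothetical stand-ins for a real database and page engine. The point is only that a page is built fresh on every request, and an update changes the stored data, never a page file.

```python
from html import escape
from string import Template

# Hypothetical in-memory "database": one record per user account.
DATABASE = {
    "alice": {"name": "Alice", "posts": ["Back from holiday", "New photos up"]},
    "bob": {"name": "Bob", "posts": []},  # an account with no activity
}

PAGE_TEMPLATE = Template("<h1>$name</h1>\n<ul>\n$items\n</ul>")


def build_page(account: str) -> str:
    """Construct the page on demand from stored data; nothing is saved afterwards."""
    record = DATABASE.get(account)
    if record is None:
        return "<h1>Page not found</h1>"
    items = "\n".join(f"<li>{escape(post)}</li>" for post in record["posts"])
    return PAGE_TEMPLATE.substitute(name=escape(record["name"]), items=items)


def update_name(account: str, new_name: str) -> None:
    """An 'update' changes the database record, not any page."""
    DATABASE[account]["name"] = new_name


print(build_page("alice"))
update_name("alice", "Alice B")
print(build_page("alice"))  # rebuilt from scratch, now showing the new name
```

Nothing in the sketch ever writes a page to disk. Every call to `build_page` is a fresh reconstruction, which is exactly why these ‘sites’ are so hard to count.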
It is hard to count Web sites that don’t exist :-)
Facebook recently celebrated its 500 millionth user joining up. I posted an item about it (here). In this item I pointed out that while Facebook may well have 500 million unique user accounts allocated in its database, what they do not publish is how many of these are stale (no activity for more than six months) or abandoned (no activity after the first month of being created).
In my posting on this I suggested that I would be surprised if even 300 million of the 500 million Facebook pages registered in their database were ‘active’. Looking back, based on the 80 percent abandonment rate for Web blog sites, I think that this was a very optimistic estimate and I am now revising it down by another 50 million to 250 million active pages.
If we then add 50 million to this number to take into account all the other social sites besides Facebook, we end up with 300 million.
Each of these user accounts is a potential ‘starting page’ Web site. When someone goes to one of these, be it the user themselves, or someone else, the Web engine will instantly build the required site so it can be displayed. Hence each account is a potential Web starting page.
Adding this 300 million to our previous 214.75 million, we now have 514.75 million ‘front pages’ across the Web that we can link to. Rounding up to the nearest 25 million (why not?) gives us 525 million active ‘front pages’ on the Web.
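Here is the social-site part and the final sum as a short Python sketch, all in millions. The 250 million active Facebook pages and the extra 50 million for other social sites are my own guesses, not published figures. Rounding up to a multiple of 25 million is what takes 514.75 to 525. The `round_up` helper is just an illustration.

```python
import math

PREVIOUS_TOTAL = 214.75             # working www DNS names plus active blog sites
FACEBOOK_ACTIVE = 300.0 - 50.0      # my guessed active Facebook pages, revised down
OTHER_SOCIAL = 50.0                 # guessed allowance for all other social sites


def round_up(value: float, step: float) -> float:
    """Round value up to the next multiple of step."""
    return math.ceil(value / step) * step


social = FACEBOOK_ACTIVE + OTHER_SOCIAL  # 300.0
total = PREVIOUS_TOTAL + social          # 514.75
print(total, round_up(total, 25.0))      # 514.75 525.0
```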
So that’s my figure then—525 million.
Note that I am not counting Web sites on networks internal to companies and businesses, their so-called intranets. This 525 million relates only to the public Internet.
This number should not be confused with the answer to the question “How many Web pages are there?” The answer to that question is going to be billions, if not trillions. The number I have come up with here is the number of ‘front pages’, or, if you prefer, top-of-site pages, or starting point Web pages.
525 million seems like a good number. I am happy with it.