Hit Stats Rant



by your sysadmin

here's a rant about web stats and why i'm likely to look at you with withering scorn if you ask me how to get yours.

people ask me "how do i find out who's hitting my site?" i say, "ask them." they say, "don't you have some wonderful esoteric piece of technology that can do it for me?" i say, "not really." here's why:

hits are not a meaningful measure of anything. every file the server sends counts as a hit, so if you have images on your page, each one counts separately. a typical front door page with 4 images actually generates 5 hits in the log (1 for the page itself, 4 for the images) every time someone looks at it.
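to make the arithmetic concrete, here's a rough sketch that counts hits versus actual page requests. it assumes your server writes something like NCSA common log format, and that anything ending in a typical image suffix is an image. the log lines, host name, and file names are all made up for illustration.

```python
import re

# hypothetical log lines (invented for this example): one visitor
# loading a front door page that has 4 images on it
SAMPLE_LOG = [
    'pc1.example.com - - [10/Oct/1995:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    'pc1.example.com - - [10/Oct/1995:13:55:37 -0700] "GET /img/logo.gif HTTP/1.0" 200 4120',
    'pc1.example.com - - [10/Oct/1995:13:55:37 -0700] "GET /img/new.gif HTTP/1.0" 200 310',
    'pc1.example.com - - [10/Oct/1995:13:55:38 -0700] "GET /img/bar.gif HTTP/1.0" 200 512',
    'pc1.example.com - - [10/Oct/1995:13:55:38 -0700] "GET /img/photo.jpg HTTP/1.0" 200 9801',
]

# pulls the requested path out of the quoted request field
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+)')

# assumption: these suffixes are what "an image" looks like on your site
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".xbm")


def count_hits(log_lines):
    """return (total hits, hits that were not images)."""
    hits = 0
    pages = 0
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a request line we recognize
        hits += 1
        if not match.group(1).lower().endswith(IMAGE_SUFFIXES):
            pages += 1
    return hits, pages


hits, pages = count_hits(SAMPLE_LOG)
print(hits, "hits for", pages, "page view")
```

one person looked at one page, and the log says 5. filtering out images gets you closer to "pages requested", but as the rest of this rant explains, that number still isn't people.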

did i say every time? well, not really. see, netscape and some other browsers have a cache system in place. caches are really cool - they store stuff for you locally so that you don't have to reload it over the network all the time. it's a great idea - reading off your hard disk is MUCH faster than communicating over a network. however it also makes your hits count even more meaningless, because someone with a caching browser can come back to a page and generate few or no new hits at all. meanwhile, some people have non-caching browsers or browsers with small (or disabled) caches. this means that every time they visit your front door page, you rack up 5 more hits. so if they just go one link in and then back up, that's at least 6 more hits: 1 for the second page and 5 for the front door all over again (more if the second page has images).
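here's a toy model of that, if you want to see the numbers. it assumes an idealized cache that keeps everything forever and never checks back with the server, which real browsers don't quite do, so treat it as a sketch. the page and image names are invented.

```python
def hits_for_visit(path, images, caching):
    """count the hits the server logs for one browsing session.

    path    -- pages in the order the visitor looks at them
    images  -- dict mapping a page to the list of images on it
    caching -- True for an idealized cache that never forgets anything
    """
    cache = set()
    hits = 0
    for page in path:
        for resource in [page] + images.get(page, []):
            if caching and resource in cache:
                continue  # served from local disk, the server never sees it
            hits += 1
            cache.add(resource)
    return hits


# hypothetical site: a front door with 4 images, a second page with none
IMAGES = {"front.html": ["a.gif", "b.gif", "c.gif", "d.gif"]}

# the visitor goes one link in and then back up
PATH = ["front.html", "second.html", "front.html"]

print("no cache:", hits_for_visit(PATH, IMAGES, caching=False))
print("cache:   ", hits_for_visit(PATH, IMAGES, caching=True))
```

same person, same three page views, and the hit count nearly doubles depending on nothing but their browser settings.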

(let's not even get started on "server push" - the netscape feature that gives you those cutesy animations... you don't even want to THINK about how many hits that generates!)

if you get 4000 hits one day and 3000 the next, that doesn't necessarily mean that fewer people were browsing your site - they might have just been visiting pages that had fewer images. or maybe more of them were using caching browsers with bigger caches.

the next thing to try is finding out who's hitting you based on the machine they're coming from. this is also useless because one user can show up as many machines, and many users can show up as one. for instance, all the big service providers - AOL, prodigy, that lot - they've probably got a dozen or so machines actually talking on the net, but how many distinct users does that represent? hundreds, thousands... who can say? then there are people who use TIA and SLiRP... they don't even have an IP address of their own, so everything they do looks like it's coming from the shell machine they dialed into.
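a small sketch of why counting hosts doesn't count people. the "person" column below is ground truth that your server never gets to see; it's only here so you can compare. every name and host is hypothetical.

```python
# hypothetical ground truth: (person, host name that shows up in your log)
VISITS = [
    ("alice", "proxy1.bigprovider.example"),
    ("bob",   "proxy1.bigprovider.example"),
    ("carol", "proxy1.bigprovider.example"),
    ("dave",  "proxy1.bigprovider.example"),
    ("dave",  "proxy2.bigprovider.example"),  # same person, different proxy
    ("erin",  "shell.isp.example"),           # TIA/SLiRP-style shell user
    ("frank", "shell.isp.example"),           # another user on the same shell box
]


def distinct_counts(visits):
    """return (number of distinct people, number of distinct hosts)."""
    people = {person for person, _ in visits}
    hosts = {host for _, host in visits}
    return len(people), len(hosts)


people, hosts = distinct_counts(VISITS)
print(people, "people, but the log only shows", hosts, "hosts")
```

the log would tell you 3 visitors when there were 6, and there's nothing in the log itself that lets you correct for it.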

all this adds up to: meaningless data. and as computer dweebs are so fond of saying, "garbage in, garbage out."