Interpreting Web Statistics
The Basics: Server logs and log analysis
Every time a file is requested from a Web site, the server keeps a record of it. This information gets stored in log files that contain not only a record of which pages were requested at which times, but also a good bit of information about the computers and systems that make the requests.
One visitor viewing a single document will create a log entry like this:
This shows what page was requested and when, where the visitor came from, and even what browser and OS they used. The DU web server records hundreds of thousands of these entries each month. So there’s not much of interest you can learn just by looking at the raw log files.
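As a rough illustration only, the sketch below splits one log entry into the fields described above. It assumes the server writes the widely used Apache "combined" log format, which may not match the DU configuration. The sample line, the address, the paths, and the name parse_entry are all hypothetical.

```python
import re

# Pattern for the Apache "combined" log format (an assumption about the server setup).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical entry, invented for this example.
SAMPLE_LINE = (
    '192.0.2.10 - - [12/Mar/2007:14:05:31 -0700] '
    '"GET /admission/index.html HTTP/1.1" 200 5120 '
    '"http://www.example.com/search" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1)"'
)


def parse_entry(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


entry = parse_entry(SAMPLE_LINE)
print(entry["path"])      # page requested
print(entry["time"])      # when it was requested
print(entry["referrer"])  # where the visitor came from
print(entry["agent"])     # browser and OS, as reported by the browser
```

Every field comes back as text, so a log analyzer still has to convert timestamps and sizes before it can do any arithmetic on them.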
That’s where “log analysis” software makes a contribution. With a good log analyzer, we can read patterns in the visits. The software uses the visitor’s internet address, called the IP address, to track successive clicks on the Web server. We can learn some things about where visitors enter and exit the server. But as with all statistics, assumptions and misinterpretations can easily lead to misinformation.
So before you use these metrics to score points with the boss, take the time to understand how the numbers are derived. It’s also useful to factor metrics into a broader usability assessment of your site when considering changes.
Shortcomings in all log analysis
As you look at the statistics, it’s important to understand that they could easily be off by significant percentages.
The inaccuracy isn’t because AWstats is an inferior product, and it’s not the result of how our system is set up. According to Doug Linder, who wrote “Interpreting WWW statistics”, the inaccuracy of the numbers is simply a byproduct of the way the Web functions. Even the most technically advanced log analysis provides only a general idea of the amount and nature of the traffic on a Web server. There are several reasons why this is true.
- Caching: Most browsers are set up by default to save once-requested Web files to your computer for a period of time, typically 30 days. Then when you make a second request for that same file, the browser can retrieve it from your own hard drive rather than making a request over the Internet. This saves a lot of load time, but it also distorts Web stats, because the server never sees the repeat request. For this reason, visits should be assumed to be under-reported, but there’s no way to know by how much. Caching also affects entry and exit numbers. Imagine a visit that starts with a cached version of the DU home page, but then quickly requests a page never before visited. According to server logs, this second page counts as the visitor’s entry to the site.
- Internet addressing: Server logs track host addresses, not people. (A host, or IP number, identifies a computer’s internet address.) The problem is that many networks assign addresses dynamically (i.e., a computer gets a new address each time it connects to the network). Additionally, sometimes many users connect from the same host: either from the same company or ISP, or from the same cache server. As a result, log analysis can never accurately count visitors.
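The addressing problem can be seen in a small sketch. The people, addresses, and request list below are invented for illustration; the point is only that counting distinct hosts and counting distinct people give different answers.

```python
# Each tuple is (person, host address) for one request. All values are hypothetical.
requests = [
    ("alice", "192.0.2.10"),    # Alice connects once...
    ("alice", "192.0.2.77"),    # ...then reconnects and gets a new dynamic address
    ("bob", "198.51.100.1"),    # Bob, Carol, and Dave share one company proxy
    ("carol", "198.51.100.1"),
    ("dave", "198.51.100.1"),
]

# What we would like to know, but the log never records.
actual_people = len({person for person, _ in requests})

# What the log analyzer can actually count.
hosts_in_log = len({host for _, host in requests})

print(actual_people)  # 4
print(hosts_in_log)   # 3
```

Here the dynamic address inflates the count by one while the shared proxy hides two people, so the errors partly cancel. In real traffic there is no way to know which effect dominates.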
Key issues in DU log analysis
The central Web server only logs what the visitor does on that one server. With 30-some Web servers across campus, single-server metrics provide only part of the picture. For example, if a visitor clicks a link for Campus Calendars, that’s not included in the log for the central Web server because the calendar system runs on its own server. If the next click is for more information on a departmental page, the log for the central server registers that page as the point of entry for a new visit. Obviously, a holistic view of where a visitor enters the DU Web, how long they stay on DU pages, and where they exit cannot be derived from these metrics.
In spite of these limitations, the log analysis can still offer the savvy Web manager a valuable tool in the pursuit of usability.
Appropriate use of DU log analysis
Compare the same numbers over time. This is the apples-to-apples theory of statistics. While it may be misleading to say there are exactly X number of visitors entering the DU Web on your home page, it is realistic to say there are X more this month than last month. The caveat here is that the numbers must originate from the same software.
Trends in data over time can provide useful feedback. But they can still be misleading if you don’t apply some critical thinking of your own. Always look for those hidden variables, like changes made on another site linking to yours.
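A month-to-month comparison of this kind can be sketched as a simple percent change. The function name percent_change and the monthly figures are hypothetical, and both numbers are assumed to come from the same log analyzer.

```python
def percent_change(previous, current):
    """Relative change from previous to current, in percent."""
    if previous == 0:
        raise ValueError("previous must be nonzero to compute a percent change")
    return (current - previous) / previous * 100


# Hypothetical entry counts for one page, taken from the same software each month.
monthly_entries = {"2007-01": 1200, "2007-02": 1500}

change = percent_change(monthly_entries["2007-01"], monthly_entries["2007-02"])
print(f"{change:+.1f}%")  # +25.0%
```

The absolute counts may both be off, but as long as they are off in the same way, the relative change is still a useful signal.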
Test hypotheses. Let’s say you have a hunch that the wording on a menu item might be sending visitors down the wrong path. Look at the metrics for the page behind that link. Are the exit numbers significantly larger than those for other pages? If so, your next step might be usability research with test subjects. Or you might try rewording the item and checking the numbers after another week.
Interpret sizeable differences. Did this month’s news story draw more traffic than last month’s? Is the entry number only slightly smaller than the viewed total for a particular page? That could indicate that visitors bookmark this URL, or it could mean significant links from other servers. If you are reorganizing your site, these are critical considerations for usability.
Look for unforeseen issues. When you browse the statistics for your site, think about the numbers you see. Are there any glaring differences? If the average size of one page stands out from the others, you might want to explore the reasons and/or take action to reduce the load time.
Glossary
Unique Visitor
AWstats, like other log analyzers, counts a unique visitor as a host that has
made at least one hit on one page of the DU Web server during the current period
reported. If this host makes several visits during this period, it is counted
only once.
“Host” here is defined as a computer’s network or internet address, typically referred to as an IP (Internet Protocol) number. Because many networks assign addresses dynamically (i.e., a new number is assigned to a computer each time it connects to the network), this statistic should never be interpreted as an accurate count of people.
Visits
Number of visits made by all visitors.
Think "session" here. When a unique IP accesses many pages on the same server with less than an hour between consecutive requests, all of the "pages" are included in one visit. A gap of an hour or more starts a new visit. Therefore you see multiple pages per visit and multiple visits per unique visitor.
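The grouping of requests into visits can be sketched as follows. This is not AWstats code; it is a minimal illustration that assumes a one-hour cutoff, treats a gap of exactly one hour as the start of a new visit, and uses invented hosts and timestamps. The name count_visits is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a gap of one hour or more between requests starts a new visit.
VISIT_GAP = timedelta(hours=1)


def count_visits(requests):
    """Take (host, datetime) pairs and return {host: number_of_visits}."""
    times_by_host = defaultdict(list)
    for host, when in requests:
        times_by_host[host].append(when)

    visits = {}
    for host, times in times_by_host.items():
        times.sort()
        count = 1  # the first request always opens a visit
        for earlier, later in zip(times, times[1:]):
            if later - earlier >= VISIT_GAP:
                count += 1
        visits[host] = count
    return visits


# Hypothetical requests for one day.
requests = [
    ("192.0.2.10", datetime(2007, 3, 12, 9, 0)),
    ("192.0.2.10", datetime(2007, 3, 12, 9, 20)),   # same visit (20-minute gap)
    ("192.0.2.10", datetime(2007, 3, 12, 14, 0)),   # new visit (gap over an hour)
    ("198.51.100.7", datetime(2007, 3, 12, 10, 0)),
]

visits = count_visits(requests)
print(len(visits))           # unique visitors (hosts): 2
print(sum(visits.values()))  # visits: 3
```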
Pages
Number of "pages" logged.
Pages include HTML, PDF, Word documents, PowerPoint, and other such files. The count does not include images, style sheets, Flash, or other page elements.
Hits
Hits lump everything together -- all files requested from the server. This
includes all images, style sheets, and other supporting files, as well as
files that are "Pages". So the number here will be significantly
larger than the page count.
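The difference between the two counts can be sketched by classifying requests by file extension. The extension list, the paths, and the name is_page are assumptions made for this example; the analyzer's real configuration may differ, and requests without a file extension (such as a bare directory URL) are not handled here.

```python
import os

# Assumed set of "page" extensions; a real analyzer's configuration may differ.
PAGE_EXTENSIONS = {".html", ".htm", ".pdf", ".doc", ".ppt"}


def is_page(path):
    """Return True if the requested file counts as a page rather than a supporting file."""
    extension = os.path.splitext(path)[1].lower()
    return extension in PAGE_EXTENSIONS


# Hypothetical requests generated by a visitor viewing two documents.
requested = [
    "/index.html",
    "/images/logo.gif",
    "/css/main.css",
    "/docs/catalog.pdf",
    "/intro.swf",
]

hits = len(requested)                          # every file requested
pages = sum(is_page(path) for path in requested)  # only the documents

print(hits, pages)  # 5 2
```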
Bandwidth
Total number of bytes downloaded for all files.
Entry Page
First page viewed on the central Web server (agora) by a visitor during a visit.
This should not be interpreted as the first page visited on the DU Web site,
as there are many other Web servers on campus from which this visit might
have started.
Exit Page
Last page viewed by a visitor during a visit. Remember that any click to another
server on the DU network will register as an exit. With 30-some Web servers
on campus, a high percentage of exits may have more to do with the type of
links on a page than with the failure of a page to hold the visitor.
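Once requests are grouped into visits, entry and exit counts are just tallies of the first and last page of each visit. The sketch below assumes the visits are already available as ordered lists of pages on one server. The paths and the name entry_and_exit_counts are hypothetical.

```python
from collections import Counter


def entry_and_exit_counts(visits):
    """Tally the first and last page of each visit. Empty visits are skipped."""
    entries = Counter()
    exits = Counter()
    for pages in visits:
        if not pages:
            continue
        entries[pages[0]] += 1
        exits[pages[-1]] += 1
    return entries, exits


# Hypothetical visits, each an ordered list of pages viewed on one server.
visits = [
    ["/index.html", "/admission/index.html", "/admission/apply.html"],
    ["/admission/index.html"],            # one-page visit: entry and exit are the same
    ["/index.html", "/events/index.html"],  # the next click may have gone to another server
]

entries, exits = entry_and_exit_counts(visits)
print(entries["/index.html"])           # 2
print(exits["/admission/index.html"])   # 1
print(exits["/events/index.html"])      # 1
```

Nothing in the data distinguishes a visitor who left the site from one who followed a link to another campus server, which is why a high exit count on a page full of such links is not necessarily a problem.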
Session Duration
Time a visitor spent on the central DU server for each visit.
Some visit durations are "unknown" because they can't always be calculated.
Here are a couple of reasons for this:
- Visit was not finished when the automated "update" ran (Updating happens
four times each day).
- Visit started in the last hour (after 23:00) of the last day of a month.
HTTP Status Codes
HTTP status codes are returned by web servers to indicate the status of a request.
Only the codes used to tell the browser that a document CANNOT be viewed
are listed here. These codes are not included in the viewed pages count.



