Introduction

Imagine running a traditional bricks and mortar high street store and not knowing how much you were selling, or how many customers you were getting through the doors. Inconceivable, isn't it? For decades, the high street retail world has been using data gathered about its customers to target marketing messages and ensure that the most popular product lines are always kept in stock. Only the means of gathering the information has changed over the years, and today these databases of customer information are amongst the most prized possessions of the retail giants.

If you think of your Web site as an electronic store front or advertising hoarding, you can see how similar information would help you make your site more effective. Whether or not you are actually selling anything direct to the end user, you need to ensure that you are attracting the right sort of visitors to your site, and that the on-line experience they have while they are there is nothing but positive. Amazingly, the means to do that is readily available to everyone who runs a Web site, whether they host the site themselves or have someone else do it for them. The key is the Web server log file, which contains a wealth of information if you know how to get at it.

Analysing the data in the server logs can provide vital information such as who has visited the site, what the peak demand is, which are your busiest days, what are the most popular downloads, and how many people give up before completing that user registration form you spent days developing. It may quickly become apparent that you are spending 70 per cent of your time developing a portion of the site that is only of interest to 5 per cent of your visitors. Valuable information indeed. Anyone who has ever looked at the log file produced by their Web server will realise that this information is not easily gleaned from the mass of seemingly random URLs and IP addresses contained therein.
This is where Web traffic analysis software comes in. This software, often inexpensive and easy to use, distils the information in the Web logs and presents it in easy-to-read reports and graphs. A clear Web strategy is required to get the most out of your Web site, and traffic analysis can help you get it right. For example, what do you know about your demographics? In-depth analysis can tell you in general terms what nationalities frequent your site the most; by what route they enter and exit your site; which pages they visit while they are there and how long they spend on each page; what they were searching for in Yahoo just before they entered your site; and which search engines or banner ads are generating the most traffic for you.

If you can determine which area of your site is the most popular, you can ensure that navigation in that area is made as smooth as possible. Or perhaps you can attempt to improve the content in other areas and then link to them from the popular pages. Log analysis can also tell you how well your site is handling traffic. It is critical to pinpoint your site's peak hours and days if you want your server to handle peak demand without turning people away. All of this information can be provided by Web traffic analysis software.

As part of this competitive analysis, we looked at the latest versions of WebTrends Log Analyser from WebTrends Corporation, and WebSuxess from German-based Exody.

WebTrends Log Analyser is one of the best known Web log analysis products on the market today, offering a wealth of features, excellent reporting and simplicity of use. A Professional version is also available offering additional features at extra cost. Installation is as straightforward as it is possible to be, and the documentation (provided as hard copy in the box, which is a welcome change these days) is excellent.
Configuration

Actually, for most people, reference to the documentation will rarely be necessary, since operation of WebTrends is fairly intuitive if you are willing to accept the default settings. It is only when you need to delve behind the scenes to tailor how WebTrends reports on your log data that things can get difficult. The first step is to create a profile via the Profile Wizard, which determines the basic settings for a given batch of log data. Clearly, the first requirement is to define the location of the log file (or files, since profiles can consolidate data from multiple log files if required).

Unfortunately, WebTrends almost falls at the first hurdle here, since it is not particularly easy to specify a log file location. Logs can be retrieved from a local drive, ODBC database, or via FTP or HTTP. If you use FTP (a popular choice for retrieving log files unless you have your Web server drive mapped as a network drive, which is unlikely) then you have to enter the full directory path to the log files from the root. Even though there is a Browse button available, it does not allow you to browse through directories, only to select files that are in a directory you specify. This is much more convoluted than the WebSuxess approach, which acts like a full FTP client when browsing for your log files and makes life much simpler.

WebTrends is also not as flexible as WebSuxess in its handling of multiple log files. Wild cards and date macros can be specified in the log file location in order to consolidate information from a range of logs into a single profile, but all those logs must reside in the same physical location. This may not affect too many people, but it caused us a problem in our tests. We were rolling over our log files and moving them from our Web server to a local network drive on a regular basis.
Thus, if we wanted to report on a range of log files up to the current day, we would need to consolidate a number of logs from the local network drive plus the current log on the Web server itself. This is simply not possible with WebTrends: the files need to be retrieved all from the network or all via FTP, and a mixture is not allowed.

By default, the log file format is automatically detected, though the format can be specified from a large number of choices in a drop-down menu if required (including Microsoft IIS and Site Server, Netscape, Apache, CERN, NCSA, O'Reilly, Lotus Domino, Oracle, Open Market, IBM, and Novell, amongst others). Extended log formats are supported to provide additional information such as referrer and browser. Also specified are the DNS resolution mode, the home page URL, and any filters.

DNS resolution mode specifies whether WebTrends should use the data from the log file only or attempt to resolve the DNS names. The latter option can provide more useful analysis, but can also slow down the reporting process. However, it is possible to make use of a built-in DNS cache in order to reduce lookup times on subsequent reports.

Filters can be created to specifically include or exclude individual pages based on return code, directory, file, referrer, entry page, date and time, URL parameter, authenticated user name, user address or country, multi-homed domain, advertising, browser, or cookie. Selecting any of these brings up a tab list entry where you can enter the parameters for the filter. For instance, exclude by user address and you can remove all log entries from a particular domain or IP address from the analysis (you would probably want to exclude hits from addresses on your own internal network, for example). Very simple to do. However, you do need to spend some time to get these right if you want the most accurate results. For instance, WebTrends does not automatically exclude such objects as FAVICON.ICO or ROBOTS.TXT from the analysis.
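To make the idea of an exclude filter concrete, here is a minimal sketch in Python. It is not how WebTrends implements its filters; it simply assumes log lines in Common or Combined Log Format, and the excluded paths, the internal network range and the function name `keep_entry` are all hypothetical choices made for illustration.

```python
import ipaddress
import re

# Matches the start of a Common/Combined Log Format line:
# host ident user [timestamp] "METHOD path PROTOCOL" status ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

# Hypothetical filter settings, chosen only for illustration.
EXCLUDED_PATHS = {"/favicon.ico", "/robots.txt"}
INTERNAL_NETWORK = ipaddress.ip_network("192.168.0.0/16")


def keep_entry(line):
    """Return True if the log line should be counted in the analysis."""
    match = LOG_LINE.match(line)
    if match is None:
        return False  # unparseable lines are dropped
    # Ignore any query string and compare paths case-insensitively.
    path = match.group("path").split("?", 1)[0].lower()
    if path in EXCLUDED_PATHS:
        return False
    try:
        if ipaddress.ip_address(match.group("host")) in INTERNAL_NETWORK:
            return False
    except ValueError:
        pass  # host is a resolved name rather than an IP address; keep it
    return True
```

Applied to every line before counting, a check like this would stop requests for FAVICON.ICO and ROBOTS.TXT, and hits from the assumed internal network, from inflating the page figures.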
The FastTrends database is a useful feature that allows you to improve the performance of report production by saving the analysis data against a particular profile so that reports can be run again and again. This also provides the opportunity to use the Real Time Analysis option (although this is not really real time - just an automated scheduling option). Here, log files are analysed at regular intervals via the Scheduler, and any new activity is stored in the FastTrends database. Then, whenever new reports are run on that profile, it always has the latest information locally. Note that this does not prevent the entire log file from being downloaded each time a report is run; it only removes the need for earlier parts of the log file (those which are already stored in the FastTrends database) to be re-parsed.

Log file download is controlled via a separate setting. It is possible to force WebTrends to download the log file every time a report is run against it, or it can download once per session, allowing multiple reports to be run against the same log file (discarding it when the session ends). This is useful when doing multiple what-if analyses, although the report generation is slow enough to make this frustrating even without having to download the log file each time. WebSuxess is better at what-if scenarios with its interactive GUI reporting interface. Alternatively, it is possible to have WebTrends determine if any changes have been made to the log file, and only download it when new activity is detected, although on a busy Web site this can have the same effect as downloading every time.

Reporting and Analysis

Reports can be run on demand via the WebTrends interface, or automated and run at regular intervals via the report scheduler. This runs in the background either as a regular program or as an NT Service, and all that is needed to schedule an event is to specify the analysis profile, a start time and date, a repeat frequency, and the output options.
It is also possible to create pre-processing and post-processing options for comprehensive scripting support (such as running a batch file to archive and delete old logs after analysis). A comprehensive scheduler log provides a detailed history of the day's events, and the scheduler also provides built-in browser based remote access capabilities. This allows users to create and monitor scheduled events from a remote location via a standard Web browser. Whether run from the normal GUI interface or via the scheduler, WebTrends reports are extensive, easy to read and highly customisable. There are a huge number of standard report sections available, including:
Help Cards can also be included in each report. These briefly explain the terminology used in each section of the report, providing helpful suggestions for using the information contained in the report. The aforementioned sections are combined in various ways to make up the pre-defined reports, with all of them appearing in the rather comprehensive Complete Summary.
Custom reports can be created by simply checking and unchecking the various components in the hierarchical tree display to include or exclude the various sections. The tree can be expanded to show the components that make up each report section too, allowing you to remove or include individual graphs and tables if required. It is also possible to specify the type of graph and the number of elements to be incorporated into each graph and table (e.g. top 20, top 50, and so on). The default language can be selected, as can the report style, which is chosen from a wide range of available colour and layout themes. The style wizard provides the means to define your own set of colours, fonts, tables and logos used in HTML reports.

WebTrends will normally report on the entire log file, though it is a simple matter to select a specific date range, or to choose from a number of easy-to-use pre-defined date ranges such as yesterday, last week, this month, and so on. The output format defaults to HTML, but other supported formats include plain text, Word document, Excel spreadsheet or comma-delimited file. The two Microsoft formats require the appropriate applications to be installed on the WebTrends machine in order to perform the conversion.

The final customisation option is the distribution method. The normal option would be to save as a file (or group of files, in the case of the HTML report) on a hard drive, but it is also possible to e-mail the report or publish it directly to a Web or FTP server via FTP. Once all these options have been selected, the customised report can be saved for reuse either via the GUI or the report scheduler. Unfortunately, actually running the report is the most painful part of using WebTrends because it is so slow.
It seems to take forever to create each report even when you are not downloading the log file, although if you make use of the DNS cache and the FastTrends database, the times of repeated runs can at least be kept to a minimum. The reports can also be quite large. As HTML output, the Complete Summary report comprises 59 objects and takes up 1.74MB, although the size can be reduced if you reduce the number of sections or eliminate the graphics.
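The benefit of a DNS cache on repeated runs can be illustrated with a short Python sketch. This is not the cache built into either product; the class name `DnsCache` and its injectable `resolver` argument are assumptions made so that the idea can be tried without network access.

```python
import socket


class DnsCache:
    """Remember reverse-DNS results so each address is looked up only once."""

    def __init__(self, resolver=None):
        # The resolver is injectable so the cache can be exercised offline.
        self._resolver = resolver or self._reverse_lookup
        self._names = {}

    @staticmethod
    def _reverse_lookup(ip):
        try:
            return socket.gethostbyaddr(ip)[0]
        except OSError:
            return ip  # fall back to the raw address when no name is found

    def resolve(self, ip):
        """Return the cached name for ip, resolving it on first use only."""
        if ip not in self._names:
            self._names[ip] = self._resolver(ip)
        return self._names[ip]
```

Because a busy log contains the same addresses many times over, only the first occurrence of each address pays the cost of a lookup. A real tool would presumably also persist the cache between runs, which this sketch does not attempt.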
Contact: WebTrends Corporation

WebSuxess is a Web log analysis product from German-based Exody which enables an organisation to monitor e-commerce success by analysing and reporting on the behaviour of visitors to its Web site. Exody also produces a complementary product called ShopSuxess for professional analysis of on-line shopping behaviour. ShopSuxess precisely analyses strengths and weaknesses of on-line shops and provides valuable suggestions to improve the shop and its "Return on Internet." As with WebTrends, installation is extremely straightforward and there is some excellent hard copy documentation provided in the box - full marks to both vendors for this move.

Configuration and running of WebSuxess is very straightforward. Although it does not take the wizard approach of WebTrends, the basic idea of creating a profile and running reports against it is the same. The profile definition is accomplished via eight dialogue screens, each of which has an excellent help screen associated with it. The General dialogue allows you to name the profile and set various parameters governing how the log file data is processed, whilst the Internet Address/Server Name dialogue allows you to define the root Web address of the site (so WebSuxess can determine which are internal and which are external links).
One check box determines if the raw host data is to be used from the log file or if DNS lookups are to be performed on IP addresses. A DNS cache will store DNS names so that lookups do not have to be performed again on subsequent runs. Other check boxes provide support for virtual servers (multiple Web servers in the same log file) and BDA presence (where an Internet presence is distributed across several servers, resulting in multiple log files for the same site).

The Log File dialogue provides the means to specify any number of log files to be included in a single profile. These can be located on local or network hard drives, or can be retrieved via FTP at the time the profile is run. The nice thing about WebSuxess is that it is possible to combine local and FTP files in the same profile (thus allowing a single profile to cover archived log files stored locally right up to the most current data which is still on the remote Web server). It also provides an excellent FTP browser, so that inexperienced users can browse a remote FTP server to locate the log file(s) they require.

WebSuxess can handle Combined logs, or separate Access, Referrer and Agent logs if required. A range of different log file formats is supported, including Apache, Lotus Domino, IIS, and Netscape, amongst others, and WebSuxess also provides detailed analysis of streaming media logs from NetShow, Windows Media Server and RealServer. The appropriate log formats are detected automatically, and compressed log files (.ZIP and .GZ) can be handled without unzipping first.

Another nice feature of WebSuxess is the ability to compute visitor paths for accurate navigation analysis. This tracks a visitor's movements through a Web site page by page, allowing the Web administrator to determine if site navigation requires tuning.
This option can be performance- and memory-intensive with large log files, and so can be disabled if required.

The Scheduler tab enables you to run an unattended analysis by selecting a start date and time, a repeat cycle and a target destination for the finished report (a local file or a remote FTP directory). Unfortunately, there is no background scheduler service available - WebSuxess itself must be running for the scheduler to be active. The scheduler log file is also hidden away in the WebSuxess directory rather than being accessible from the program itself.

The Interval dialogue enables the analysis to be restricted to a portion of the log files determined by a start and end date. Dates are selected from a neat little drop-down calendar display, but it would be nice to see some plain English pre-defined ranges such as Yesterday, Today, This Week, This Month and so on, as provided by WebTrends.

File Type Filters allow certain file types (e.g. JPG or GIF files) to be excluded from the main analysis so they are not counted as hits. More general purpose Filters can also be defined, to specifically include or exclude pages based on various selection criteria such as error code, page ID, visitor, HTML parameter or ad clicks (referrers).

The final dialogue tab in the profile is an extremely powerful analysis feature of WebSuxess. Page Comments provide an editorial view of the Web site based on how you would like to see your pages grouped. For instance, you could add a comment of Download Documents to all *.DOC and *.PDF files on your Web site (you only need specify the wild card, and WebSuxess applies the comment as needed). It is then possible to group some of the WebSuxess reports by these comments, totalling the hits and other statistics to give a more succinct, higher level view.

Unlike WebTrends, each profile represents a fixed relationship between a set of report parameters and a specific download of a log file.
So whereas you create a profile called NSS Group in WebTrends and then run reports against it time and time again (each new report usually triggering a log file download unless told otherwise), a profile in WebSuxess causes a single download and parse of the log file, following which any number of reports and queries can be run against the data retrieved. If you want to trigger the download of a more current version of the log file then it is necessary to create a new profile (which is merely the work of a couple of mouse clicks, since it can be based on an existing profile). Whilst this may seem a little strange, the net effect is basically similar to the FastTrends feature of WebTrends, since once the data has been downloaded it is stored permanently, allowing subsequent reports to be run against it as many times as required. The differences between WebSuxess and WebTrends here are thus more to do with terminology than functionality. Once older profiles are no longer required, they can be deleted, and any associated log files (only downloaded copies, not the originals, of course) are removed automatically.

Reports can be run on demand via the WebSuxess interface, or automated and run at regular intervals via the report scheduler, and without a doubt, the reporting and analysis is the real strong suit of WebSuxess. For a start, it is not necessary to go all the way and produce an HTML report. Instead, WebSuxess provides a superb interactive graphical interface as the first stage of report production. Select a profile and click on OK and WebSuxess retrieves the log file (if it has not already done so; existing profiles will use the previously stored data) and parses it to produce an interactive on-screen report directly in the WebSuxess interface. As with WebTrends, the WebSuxess reports are extensive, easy to read and highly customisable. There are also a huge number of standard report sections available, including:
At first glance it may appear that there are fewer reports available in WebSuxess than WebTrends, but this is not actually the case. Each report view can be sorted on any column in the report instantly in ascending or descending order - simply by clicking on the column heading. So, for instance, instead of having fixed reports for Top 20 Page Views or Least Requested Page Views, WebSuxess reports on the entire log file (not just the Top 20 or 50) and a click of the mouse switches from most popular to least popular pages. Each view also has an accompanying graph or chart, and the style of graph can be selected from a drop down menu (bar, 3D bar, line, ribbon, pie, and so on). Right clicking on the column headings also allows the columns to be removed or included by clicking on the appropriate check boxes, and the order of the columns can be changed by simply dragging and dropping them. Custom grouping of the pages can be achieved using Page Comments, and dynamic temporary (i.e. not saved with the profile) filters can be applied at any time by simply typing the search criteria in the Filter box and clicking on the Filter button. Right clicking on any page entry on screen brings up a menu from which the page can be quickly and easily included or excluded from the profile, or have a Page Comment assigned for inclusion in a custom group.
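The grouping that Page Comments provide can be approximated in a few lines of Python. This is only a sketch of the idea, not WebSuxess's implementation: the rule list, the "(no comment)" default and the function names are hypothetical, and the wild card matching is borrowed from Python's standard `fnmatch` module.

```python
from collections import Counter
from fnmatch import fnmatch

# Hypothetical comment rules: (wild card pattern, comment). First match wins.
PAGE_COMMENTS = [
    ("*.doc", "Download Documents"),
    ("*.pdf", "Download Documents"),
]


def comment_for(path, rules=PAGE_COMMENTS, default="(no comment)"):
    """Return the comment of the first rule whose wild card matches the path."""
    lowered = path.lower()  # patterns are lower case, so match case-insensitively
    for pattern, comment in rules:
        if fnmatch(lowered, pattern):
            return comment
    return default


def hits_by_comment(paths):
    """Total the hits per comment for a sequence of requested paths."""
    return Counter(comment_for(path) for path in paths)
```

Given one entry per hit, `hits_by_comment` returns a total for each comment, which is the kind of higher level view that grouping by comment is meant to give, independent of where the files physically live on the server.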
In other words, an almost infinite number of on-screen reports can be created in various orders and with various graphics, and each report is created instantly, since it is not necessary to re-parse the log file after each change (unlike WebTrends). This makes WebSuxess the most flexible Web log analysis tool we have seen.

The only report section we would deem to be missing from the analysis is something similar to the WebTrends Technical Statistics report, which lists total hits, successful hits, failed hits, cached hits, client errors and Page Not Found errors. It is possible to recreate such a report by clever use of the filters in the profile settings, but this would then necessitate running two reports to acquire all the necessary information. It would be better to have this data readily available from the main report if possible. The same could be said of the Most Downloaded File Types report in WebTrends, which gives a useful indication of which of your *.DOC or *.PDF files are most popular, and the Dynamic Pages and Forms report, which shows which of your interactive pages are accessed the most frequently. Again, these can be duplicated quite easily by filtering in WebSuxess, but a standard section in the HTML report showing the same information would be nice - perhaps we could see these in a future version please, Exody?

The attraction of the what-if analyses that can be effected with WebSuxess is increased by the fact that for any changes in the report format, re-parsing of the log file is not necessary. When re-parsing is necessary (say when a group of pages has been excluded from the profile and you need to re-parse to reflect the new page hits without those pages) then it is incredibly fast. This provides the means to fine-tune a profile by repeatedly making minor changes and examining the results before finally committing them to a finished profile. This is simply not the case with WebTrends.
Given that it is necessary to re-run a report (and thus re-parse the log file) every time any change is made either to the profile content or the report format, what-if analysis is impractical with WebTrends. The FastTrends database can improve matters if many reports are to be run on the same set of data, but the initial run when the FastTrends database is populated is even slower than a normal WebTrends report run. Even with this feature enabled, it is significantly slower than WebSuxess on subsequent runs.
Of course, once the analysis has been completed most people will still want to have the results presented in some form of permanent report, since the only way to view the initial set of statistics is to run WebSuxess itself. Naturally, WebSuxess allows you to create a permanent HTML report, but once again it takes things one step further. The HTML pages are generated to present the data in a similar manner to the interactive analysis screen, with a spreadsheet-style display in the bottom half of the screen and a corresponding graphic in the top half. However, rather than make each report section a static entity like the WebTrends reports, WebSuxess allows the end user to sort on any column by simply clicking on the column header within the browser. As the report is re-sorted in the required order, a new graphic is displayed depicting the corresponding data for the column selected. This is an extremely powerful facility, and extends the advanced analysis capabilities right down to the end user.

Of course, all these different views on the same data make the WebSuxess reports rather large in size. There are a number of pre-defined reports in the system, and if you select the Complete Summary, which includes every section and allows sorting on every column, then the resulting report comprises 287 objects and takes up a whopping 7.19MB. If you are hosting a number of sites as a service provider then running reports of this size for each of your clients would soon eat up your available disk space. Likewise, publishing the finished report via FTP is no small matter. However, it is possible to customise these reports by creating a new report template and selecting each section required and which columns within those sections should be capable of being sorted dynamically.
We created our own template that included every available report section, but just allowed sorting on a single column (either page views or visits depending on the section), and this report comprised just 92 objects and 900KB, which is smaller than the fixed reports offered by WebTrends. Thus you have the capability to treat each Web site under your control differently, perhaps providing fixed reports for the Intranet sites, and a complete summary with all the sorting options for the main corporate Web site.

One area which is not customisable in WebSuxess is the report style. It is not possible to change the layout, colour schemes or fonts as you can with WebTrends. You are thus stuck with the format provided by Exody which, although perfectly clear and easy to read, obviously does not allow you to impose a corporate style. All in all, however, WebSuxess provides an excellent combination of power and flexibility, whilst remaining simple to use.

Contact: Exody E-Business Intelligence UK Ltd

In testing, we used a 150MB log file covering a period of 116 days and with over one million hits, 400,000 page views and 100,000 visits. The host machine was a Pentium III 700MHz with 256MB RAM running Windows 2000 Professional. During testing, the log file was installed on the local hard disk and we disabled all features that would require reference to the Internet, which would have made the results inconsistent. We thus disabled DNS resolution in both products, and disabled the page title retrieval capability in WebTrends. Bear in mind, however, that in a real world situation you may have to cope with both the initial FTP transfer and external DNS lookups during the report run, both of which will lengthen the time it takes to produce a finished report from scratch.

We performed one initial run with a default profile (apart from the settings mentioned above) and compared the results. There were significant differences between the two products initially due to differences in default configurations. For instance, WebTrends does not exclude FAVICON.ICO or ROBOTS.TXT by default, which skewed the page hit figures excessively. Also, the two products determined visits in different ways. The log file did not contain cookie details (which WebTrends can use to determine visitor stats), so both products had to guess what constitutes a visit by noting the amount of time elapsed without the same IP address requesting objects from the site (this is the normal mode of operation of Web log analysis products). Once a pre-determined amount of time has elapsed, the visit is deemed to be terminated, and this timeout value is initially different in each product, resulting in WebTrends reporting more visitors than WebSuxess (erring on the side of popularity, obviously).
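The timeout-based guess at what constitutes a visit can be sketched as follows. This is a generic illustration in Python, not the algorithm used by either product; the function name `count_visits` and the 30-minute default timeout are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def count_visits(hits, timeout=timedelta(minutes=30)):
    """Count visits from (ip, timestamp) pairs using an inactivity timeout.

    A new visit starts whenever an address has been silent for longer than
    `timeout`. The 30-minute default is an assumption for illustration only.
    """
    last_seen = {}
    visits = 0
    # Process hits in time order so gaps are measured between consecutive requests.
    for ip, when in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(ip)
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[ip] = when
    return visits


hits = [
    ("203.0.113.5", datetime(2000, 10, 10, 10, 0)),
    ("203.0.113.9", datetime(2000, 10, 10, 10, 5)),
    ("203.0.113.5", datetime(2000, 10, 10, 10, 20)),
    ("203.0.113.5", datetime(2000, 10, 10, 11, 0)),
]
print(count_visits(hits))                                 # 3
print(count_visits(hits, timeout=timedelta(minutes=15)))  # 4
```

The same four hits yield three visits with a 30-minute timeout and four with a 15-minute timeout, which shows how two tools reading an identical log can report different visit totals until their timeout values are aligned.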
Once the above settings had been tweaked and the reports re-run, we found that the results were within a few hundred page impressions and visits of each other, and we deemed that both products were reporting on essentially the same subset of data in essentially the same way.

We then set a report running and timed it from the point we clicked on OK to the point we had the finished results available to view on our screen. With WebTrends, we had to allow it to parse the log file and create the full HTML report. With WebSuxess, we merely had to allow it to parse the log file and generate the report on-screen within the application. Initially, we ran the test without FastTrends enabled in WebTrends, since that provides the fairest comparison of actual speed of log file parsing. We then re-ran the report with FastTrends enabled and timed the initial run where the database had to be populated. Finally, we ran more reports in WebTrends with the FastTrends database populated. Each report was run ten times and an average time taken. The results were as follows:
If HTML reports are produced at regular intervals overnight and stored in a central location for subsequent study, then such performance differences are not really a problem. However, if your aim is to sit in front of your PC slicing and dicing your log file data in various ways to obtain a real handle on the usefulness of your Web site, then the speed of the WebSuxess product is clearly a significant advantage.

In the final analysis (no pun intended!) there is not a huge difference between the two products in terms of overall functionality, and both will provide a wealth of information that is vitally important in gauging the effectiveness (or lack of it) of your Web site. Each product has its strengths and weaknesses.

WebTrends has some useful features such as cookie support, which allows it to track visitor access more accurately (providing those visitors do not have cookies turned off in their browsers, of course). Without cookies, visitor sessions are guessed based on inactivity times, and are approximate at best (though not, in our experience, wildly inaccurate). It also appears to provide a slightly more informative means of reporting on ad views and clicks, together with an intranet management capability that allows reports of internal Web servers to be segregated by department or region, and a more feature-rich report scheduler.

On the other hand, WebSuxess also has a number of unique features that may make it a sure thing for certain organisations. One of the main ones is the streaming video analysis capability, allowing it to analyse the use of RealVideo, RealAudio and NetShow streaming content on the Web site. The analysis includes all relevant information, such as visitor usage of streaming content, transmission information and media players used.
Other interesting features include visitor path analysis, which allows it to recreate the exact path each visitor takes through your Web site from page to page, and page comments, which allow a more logical editorial view to be imposed on the Web site for analysis purposes, regardless of the underlying physical directory structure.

In general, the WebTrends HTML report is slightly easier to read for the novice, with a straightforward, if completely static, presentation of Top 20 this and least popular that, each table accompanied by a clear graphic representation and some useful hints on how to interpret the results. In contrast, the WebSuxess HTML report can be unwieldy at first glance, presenting much more information in a dynamic manner that allows it to be re-sorted and re-presented in a multitude of different ways. Unfortunately, the graphics are not always as clear or as easy to read as the WebTrends counterparts. The whole report is thus not quite as easy for the beginner to get to grips with, but it obviously provides much more scope, flexibility and power for the seasoned user who needs to be able to extract more information from his or her Web log reports.

Then again, of course, there is also the wonderful interactive graphical interface in WebSuxess that provides the means to slice and dice your Web log statistics in hundreds of different ways, drilling down, sorting and filtering data on the fly before committing the final analysis to HTML file or paper. If you have a specific question that needs answering regarding a particular visitor or HTML page, then WebSuxess provides the most flexible (and the quickest) environment in which to achieve that.

With similar levels of functionality on offer, for some it could come down to pricing. Although this will vary with time and geographic region, at the time of writing the UK pricing gives WebSuxess the advantage of an approximate 12 per cent saving over WebTrends.
Whichever product you prefer, it is clear that you should be evaluating one of these if you want to gain an insight into how your Web site is working (or not working!) for you.

To see a sample of a WebTrends v 6.0 report, click here. To see a sample of a WebSuxess V4 report, click here (note that all column headings are live hyperlinks, allowing you to re-sort each report as you please).