Analyzing your internet application's log files II - configuring your reports

ArticleCategory:

Applications

AuthorImage:

[Photo of Egon]

TranslationInfo:

original in en Egon Willighagen

AboutTheAuthor:

Besides working for the LinuxFocus project, Egon worked for the Stichting Logreport Foundation until November 1st of this year. One of the foundation's goals is to write log analysis software released under the GPL.

Abstract:

This article is the second in a series about using Lire to analyze log files of internet server applications. This article shows you how you can customize the generated reports. The first article in this series explained how Lire is installed.

ArticleIllustration:

[illustration]

ArticleBody:

Introduction

This article is based on lire-20011017, the latest release of Lire. Note that configuration has changed a lot since the previous release, so the first article in this series is, basically, already outdated. The general idea of lr_config, however, has not changed.

New features in this release include, among others, two new super services (FTP and firewall), many new reports (more than 68 in total), new output formats (XHTML and RTF) and lots of bug fixes. The most important change in this release, however, is in the engine: the report generation process has been completely rewritten to make use of XML technology.

This article will introduce one of the XML formats that are now used in Lire, and how this is used to specify reports. It will not be a tutorial on how to make new reports, but it will show you how you can change the predefined reports at a low level. But first, this article will explain how you can tell Lire which reports it should generate and how parameters for these reports can be set.

Selecting Reports

Each super service has a number of reports available, which extract information from the log for you (`email' is a super service, for example, and the `postfix' and `sendmail' services belong to it). The WWW super service, for example, has 31 reports. Not all reports are interesting to everyone; some are very specific. By default most reports are selected, but it is useful to customize the selection.

The reports that are included in the generated output are listed in the file <prefix>/etc/lire/<superservice>.cfg (assuming Lire is installed in the directory <prefix>). For example, the configuration file for the FTP super service looks like:

# Report configuration for the FTP super service

# Top X reports
top-remote-host hosts_to_show=10
#top-files files_to_show=10
top-files-in files_to_show=10
top-files-out files_to_show=10
top-users users_to_show=10

# By day reports
bytes-by-day

# Transfers by X reports
transfers-by-direction
transfers-by-type

The FTP super service thus has eight reports defined, and all but one are selected. The "top-files" report is deselected by means of the "#" character. Removing the "#" character selects the report again.

Note that not all lines starting with "#" are reports. In this configuration file the lines "Report configuration for the FTP super service", "Top X reports", "By day reports" and "Transfers by X reports" are comments. Similar comments can be expected in the other configuration files.
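The selection rule described above can be sketched in a few lines of Python. This is only an illustration of how the file reads, not code from Lire; the function name selected_reports and the embedded copy of the FTP configuration are assumptions made for the example.

```python
# Sketch only: read a Lire-style <superservice>.cfg and list the selected reports.
# The function name and the sample text are illustrative, not part of Lire.

FTP_CFG = """\
# Report configuration for the FTP super service

# Top X reports
top-remote-host hosts_to_show=10
#top-files files_to_show=10
top-files-in files_to_show=10
top-files-out files_to_show=10
top-users users_to_show=10

# By day reports
bytes-by-day

# Transfers by X reports
transfers-by-direction
transfers-by-type
"""


def selected_reports(cfg_text):
    """Return the names of the selected reports, in file order."""
    names = []
    for line in cfg_text.splitlines():
        line = line.strip()
        # Blank lines and lines starting with "#" (plain comments as well
        # as deselected reports) do not select anything.
        if not line or line.startswith("#"):
            continue
        # The report name is the first word; parameters may follow it.
        names.append(line.split()[0])
    return names


reports = selected_reports(FTP_CFG)
print(len(reports), "selected:", reports)
```

With the FTP file shown above, seven of the eight reports come back and "top-files" is absent.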

Sorting Reports

Ordering is very simple. The order in which the report lines appear in the configuration file is the order in which the reports appear in the output, so rearranging the lines reorders the output. In the FTP example above, transfers-by-type will be the last report in the output.

Customizing Reports

Many reports can be partly customized through the configuration files described above. For example, consider this DNS super service configuration:

# Report configuration for the DNS super service

# Top reports
top-requesting-hosts hosts_to_show=10
top-requesting-hosts-by-method hosts_to_show=10 method='recurs'
top-requesting-hosts-by-method hosts_to_show=10 method='nonrec'
top-requested-names names_to_show=10
top-requested-names-by-method names_to_show=10 method='recurs'
top-requested-names-by-method names_to_show=10 method='nonrec'
requesttype-distribution
requesttype-distribution-by-method method='recurs'
requesttype-distribution-by-method method='nonrec'

# By Day reports
requests-by-period period=1d
requests-by-period-by-method period=1d method='recurs'
requests-by-period-by-method period=1d method='nonrec'

# By Hour reports
requests-by-period period=1h
requests-by-period-by-method period=1h method='recurs'
requests-by-period-by-method period=1h method='nonrec'

All fifteen reports are selected. In addition, for the reports that give a Top X output, the number X can be set. With the above configuration the report top-requesting-hosts will give a Top 10.

These reports are generated from only eight report specifications. The use of parameters (period, method, hosts_to_show, and names_to_show) makes this possible. This is one of the powerful new features of the XML-based engine.

Important: all variable settings must be placed on the same line as the report name!
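To make the relation between report lines and report specifications concrete, here is a small Python sketch that splits each line into a report name and its parameters. It assumes shell-like quoting for the values; how Lire itself parses these lines is not shown in this article, and the function name parse_report_line is hypothetical.

```python
import shlex

# The fifteen report lines from the DNS configuration shown above.
DNS_CFG = """\
top-requesting-hosts hosts_to_show=10
top-requesting-hosts-by-method hosts_to_show=10 method='recurs'
top-requesting-hosts-by-method hosts_to_show=10 method='nonrec'
top-requested-names names_to_show=10
top-requested-names-by-method names_to_show=10 method='recurs'
top-requested-names-by-method names_to_show=10 method='nonrec'
requesttype-distribution
requesttype-distribution-by-method method='recurs'
requesttype-distribution-by-method method='nonrec'
requests-by-period period=1d
requests-by-period-by-method period=1d method='recurs'
requests-by-period-by-method period=1d method='nonrec'
requests-by-period period=1h
requests-by-period-by-method period=1h method='recurs'
requests-by-period-by-method period=1h method='nonrec'
"""


def parse_report_line(line):
    """Split one line into (report name, {parameter: value}).

    Assumption: values are quoted the way a shell would quote them.
    """
    tokens = shlex.split(line)
    params = {}
    for token in tokens[1:]:
        key, _, value = token.partition("=")
        params[key] = value
    return tokens[0], params


parsed = [parse_report_line(line) for line in DNS_CFG.splitlines()]
spec_names = {name for name, _ in parsed}
print(len(parsed), "reports from", len(spec_names), "specifications")
print(parsed[1])
```

Because a report name and all of its settings are read from a single line, a setting placed on the next line would be taken for a separate report.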

A more exotic example is taken from the WWW super service configuration file:

top-referers-by-page referer_to_show=5 page_to_show=10 referer_exclusion='^-$'

In this example a Perl regular expression is used as the value of the referer_exclusion variable. The expression matches every referer that consists of just "-". Such referers appear in the log file when, for example, the client user typed the URL of your web page directly. (When users reach your page by clicking a link on another page, that other page is given in the referer field.) All referers that match "-" are excluded from the analysis.
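The effect of the anchors in '^-$' can be tried out quickly. The sketch below uses Python's re module rather than Perl, which behaves the same way for this simple pattern; the sample referers are made up for the illustration.

```python
import re

# Same pattern as in the configuration line above.
referer_exclusion = re.compile(r"^-$")

# Made-up referer values, as they might appear in a web server log.
referers = [
    "-",
    "http://www.example.org/links.html",
    "--",
    "http://www.example.org/a-b.html",
]

# Keep only the referers that the exclusion pattern does not match.
kept = [r for r in referers if not referer_exclusion.search(r)]
print(kept)
```

Only the referer that is exactly "-" is dropped. Without the "^" and "$" anchors the pattern would also exclude every referer that merely contains a hyphen.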

Low Level Customization of Reports

This release starts a completely new branch of Lire. The report generation and specification process has been completely rewritten to make use of XML technology. Reports are specified in XML, while variables are set in plain ASCII configuration files. A report specification used to be a Perl script that had to know both the input format and the output format. With the new XML format the implementation is separated from the specification, so one no longer needs to know the input and output formats, only the information that can be processed.

Thus, when customizing reports at a low level, you need to know a bit of XML. Here is an example report taken from the <prefix>/share/lire/reports/firewall directory:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE lire:report-spec PUBLIC
"-//LogReport.ORG//DTD Lire Report Specification Markup Language V1.0//EN"
"http://www.logreport.org/LRSML/1.0/lrsml.dtd">
<lire:report-spec xmlns:lire="http://www.logreport.org/LRSML/"
superservice="firewall" id="bytesperfrom" charttype="bars">

<lire:title>Top Bytes per From-IP Report</lire:title>
<lire:description>
<para>
This report lists the IP addresses sending the highest data volume.
</para>
</lire:description>

<lire:param-spec>
<lire:param name="ips_to_show" type="int" default="10">
<lire:description>
<para>This parameter controls the number of sending IP addresses to
display in the report.
</para>
</lire:description>
</lire:param>
</lire:param-spec>

<lire:display-spec>
<lire:title>Volume per sending IP, Top $ips_to_show</lire:title>
</lire:display-spec>

<lire:report-calc-spec>
<lire:group sort="-rcvd_volume" limit="$ips_to_show">
<lire:field name="from_ip"/>
<lire:sum name="rcvd_volume" field="length"/>
</lire:group>
</lire:report-calc-spec>

</lire:report-spec>

The lire Namespace

The first thing you should notice is that almost every XML element in this report starts with lire:. This prefix assigns the element to the lire namespace. Every element in the lire namespace is defined in the XML DTD http://www.logreport.org/LRSML/1.0/lrsml.dtd, which can be browsed at http://www.logreport.org/pub/docs/dtd/lrsml/.

All other elements are supposed to belong to the DocBook XML 4.2 DTD, such as the <para> element on the tenth line of the example.
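One way to see the split between the two vocabularies is to let an XML parser resolve the prefix. The sketch below uses Python's standard xml.etree.ElementTree on a shortened copy of the firewall specification; it is an illustration and not part of Lire.

```python
import xml.etree.ElementTree as ET

LIRE_NS = "http://www.logreport.org/LRSML/"

# Shortened copy of the firewall report specification shown earlier.
SPEC = """\
<lire:report-spec xmlns:lire="http://www.logreport.org/LRSML/"
 superservice="firewall" id="bytesperfrom" charttype="bars">
 <lire:title>Top Bytes per From-IP Report</lire:title>
 <lire:description>
  <para>
   This report lists the IP addresses sending the highest data volume.
  </para>
 </lire:description>
</lire:report-spec>
"""

root = ET.fromstring(SPEC)

lire_tags = []   # elements in the lire namespace
other_tags = []  # elements without a namespace (DocBook content)
for element in root.iter():
    # ElementTree expands "lire:title" to "{namespace URI}title".
    if element.tag.startswith("{" + LIRE_NS + "}"):
        lire_tags.append(element.tag.split("}", 1)[1])
    else:
        other_tags.append(element.tag)

print("lire:", lire_tags)
print("other:", other_tags)
```

The parser identifies the namespace by its URI, not by the literal prefix "lire", so the prefix is only a shorthand inside the file.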

Changing the title that appears in the Lire generated reports

If you want to change the title that appears in the report, you need to change the content of the <lire:title> element inside <lire:display-spec>. Keep in mind that strings starting with "$" are Perl variables whose names correspond to the parameters specified in the <lire:param-spec> section.

The tricky part is that you have to pick the correct <lire:title> element. You need the one that is a child of the <lire:display-spec> node, which contains the information that is displayed in the output report. The first <lire:title> element contains the report title that is used in the documentation of the Lire software.
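The difference between the two titles, and the way a "$" variable ends up in the displayed title, can be illustrated with a short Python sketch. It works on a shortened copy of the firewall specification and mimics the variable substitution with string.Template; Lire itself does this in Perl, so treat the substitution step as an approximation.

```python
import xml.etree.ElementTree as ET
from string import Template

NS = {"lire": "http://www.logreport.org/LRSML/"}

# Shortened copy of the firewall report specification shown earlier.
SPEC = """\
<lire:report-spec xmlns:lire="http://www.logreport.org/LRSML/"
 superservice="firewall" id="bytesperfrom" charttype="bars">
 <lire:title>Top Bytes per From-IP Report</lire:title>
 <lire:param-spec>
  <lire:param name="ips_to_show" type="int" default="10"/>
 </lire:param-spec>
 <lire:display-spec>
  <lire:title>Volume per sending IP, Top $ips_to_show</lire:title>
 </lire:display-spec>
</lire:report-spec>
"""

root = ET.fromstring(SPEC)

# Title used in the Lire documentation: a direct child of the root element.
doc_title = root.find("lire:title", NS).text

# Title shown in the generated report: the one inside <lire:display-spec>.
display_title = root.find("lire:display-spec/lire:title", NS).text

# Collect the parameter defaults and fill them into the display title.
defaults = {
    param.get("name"): param.get("default")
    for param in root.findall("lire:param-spec/lire:param", NS)
}
shown = Template(display_title).substitute(defaults)

print(doc_title)
print(shown)
```

Editing the text of the second title changes what appears in the report, while the first title is left alone.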

The next example shows a fragment of the requests-by-result WWW report specification. One can see that the <lire:display-spec> now outputs not only a title but also some further explanation. Note that none of the content within the <lire:description> element uses the lire namespace; it is therefore DocBook content.

<lire:display-spec>
<lire:title>Requests By HTTP Result</lire:title>

<lire:description>
<para>
The most common HTTP status codes are given below:
<variablelist>

<varlistentry>
<term>200</term>
<listitem>
<para>OK (The request has succeeded.)</para>
</listitem>
</varlistentry>

<!-- rest is cut out -->
</variablelist>
</para>
</lire:description>
</lire:display-spec>

The report output will look something like (only top part shown):

Requests By HTTP Result

The most common HTTP status codes are given below:

200 OK (The request has succeeded.)

201 Created (The request has been fulfilled and resulted in a new resource being created.)

206 Partial Content (The server has fulfilled the

Changing the type of image for a report

Most reports have graphics associated with the data. These images are generated from the data and the report specification also defines the format in which the image is plotted. Take for example the following snippet from the FTP transfers-by-type report.

<lire:report-spec xmlns:lire="http://www.logreport.org/LRSML/" superservice="ftp" id="transfers-by-type" charttype="pie">

For this report the data is visualized with a pie chart as can be seen from the @charttype attribute in the above code. The result looks like:

[charttype pie]

By changing the chart type to bars as in 'charttype="bars"' the output changes to:

[charttype bars]
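Changing the attribute by hand in a text editor is the simplest route. For completeness, here is a hedged Python sketch that makes the same change with xml.etree.ElementTree on the opening tag shown above, written as an empty element so that the snippet is well-formed on its own.

```python
import xml.etree.ElementTree as ET

LIRE_NS = "http://www.logreport.org/LRSML/"

# Keep the "lire:" prefix when the element is written out again.
ET.register_namespace("lire", LIRE_NS)

# The opening tag from the FTP transfers-by-type report, closed as an
# empty element for the purpose of this illustration.
SPEC = (
    '<lire:report-spec xmlns:lire="http://www.logreport.org/LRSML/" '
    'superservice="ftp" id="transfers-by-type" charttype="pie"/>'
)

root = ET.fromstring(SPEC)
old_type = root.get("charttype")

# Switch the chart from a pie chart to a bar chart.
root.set("charttype", "bars")

result = ET.tostring(root, encoding="unicode")
print(old_type, "->", root.get("charttype"))
print(result)
```

Be aware that ElementTree does not preserve the DOCTYPE declaration when it rewrites a complete file, which is one more reason to make this particular change by hand.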

Note that the report title contains a bug: the report is about the transfer type, not the file type. This bug has already been reported.

Specifics

More specific information about the XML language used for report specification can be found at the LogReport web site. You will see that the language is quite extensive; for now I suggest that you use the report specifications that come with the distribution as your main guide.

Elements that are used in those reports but have not been covered in this article deal with parameter specification (<lire:param-spec>) and the calculation of the output data (<lire:report-calc-spec>). The latter in particular has many options and requires prior knowledge of the internal format (called DLF) in which the log data is stored. This will be covered in a future article.

Roundup

This article introduced the XML-based report generation engine and explained how you can customize the reports you get. More information can be found at the LogReport web site: http://www.logreport.org/.

If you want to get in touch with the LogReport team, you can join IRC. The developers can often be found in the #logreport channel on the OpenProjects.org IRC network. Questions, comments, and support requests are welcome. If you prefer email, you can reach the team on the public mailing list [email protected].