LinuxFocus January 1998: PG2CGI Access to a Database from the Web

Luis Colorado
About the Author:

Luis Colorado works as administrator of UNIX systems and Internet access for Telef�nica Sistemas S.A. in Spain. He graduated in Physics by the Universidad Complutense of Madrid and has developed several open source UNIX utilities.

Write to the Author

Contents:
Introduction
Motivation
M4
How does it work
Transfering information from user to CGI
Download

PG2CGI Access to a Database from the Web

Abstract:

We describe a new program that allows one accessing a database from HTML pages. The design goals of the program were to control data access, flexibility in the configuration and independence from layout formats. This program resulted from a number of ideas emerge during my readings of several articles about M4 that appeared recently in the magazines Linux Journal and LinuxFocus (www.linuxfocus.org) during the last 12 months.

Introduction

This article describes the rationale and ideas that lead me to develop a package for accesing a database from the web. It is not my intention to write a manual on the PG2CGI package (a reference guide is already included in the distribution of the software, the URL for this package is given later), rather my goal is to give a brief introduction and to hopefully stimulate readers to use it and send their feedback.

Motivation

I wrote this program as a response to a series of articles that appeared in the Linux Journal and in LinuxFocus concerning the use of M4 to manage HTML sources. These articles demonstrated the advantages and potential of M4, as a standalone tool for the maintenance of webpages and the dynamic generation of content.

On the other hand the availability of numerous web servers and databases contrasts with the lack of utilites that can interface between both environments (most of these interface applications are commercial or have serious deficiencies with regards to the formats that can be handled).

The following software package unifies the database and web environments. It was designed to satisfy these requirements:

It must be easy to configure. The configuration of the system is controlled by a single configuration file.
Allow the security of the site to be controlled. The package supports a flexible method for selecting a configuration and it can be easily configured to control the access to a website.
It is a well known fact that HTML is a formatting language for formatting documents that can be display on a web browser. A common characteristic of a good text generator is a decent language for formatting text with sufficient control over the arrangement of the paragraphs and text items in the final typesetted page. However in most formatting languages there is always a set of macros that allows to handle floating elements in the document (that is elements like tables, images etc. that adjust dynamically on the page during runtime).

Most of the utilities I know for interfacing HTML and databases generate the information on the output text by extending the set of HTML tags; these new tags then take the role of `macros'. However this approach has a serious drawback: most of the time we have to modify the elements within the native HTML tags , and this is a major limitation. Often times these utilities have to `reinvent' tags just in order to allow a certain effect within an existing HTML tag. Mi program uses M4 in depth in order to accomplish macro replacements wherever needed. The program per se does not generate any formatted text and that is why it is independent of the final output format language used. The program generates a set of macros as a result of a query and later on M4 generates an HTML document by using this set of macros and a given HTML template.

The utility presented here satisfies these requirements at the expense of a small penalty in efficiency (it has to invoke M4 several times in the course of execution) but the results are satisfactory in most cases (take into account that in most cases the database query already takes more time than the dynamic generation of HTML text)

M4

M4 is a macro processing tool developed a long time ago. Our software makes heavy use of this macro procesor:

M4 was originally developed to link the CGI (Common Gateway Interface) with the PostgreSQL database. Soon after people realized it was general enough for other interfaces besides CGI (for example a gateway between a LDAP server and other databases like Informix and Oracle).

Later the interface was standarized incorporating all the necessary functionality and the PostgreSQL module was rewritten according to the new interface. Now it is possible to develop new drivers that allow the program to link with other database managers.

It is used to replace the macros that pass the information from the left to the right side of the selection rules in the configuration file (see the software documentation for more information)
It is also used to implement all the definitions given by the template that generates the final HTML output file.
Our software also uses M4 internally (several times)to to execute replacements of internal variables that are passed to the drivers responsible for launching database queries.

The repeated used of the macro processor can mean a lost of efficiency, although our tests with the GNU M4 utility have been quite satisfactory.

Regular Expressions:

Our utility makes heavy use of regular expressions to verify the configuration rules to be applied. Regular expressions are prefered over simple comparisons because they give us better functionality. Regular expressions have among other advantages:

the possiblity to group a number of possibilities in a single configuration rule.
there is no significant loss of performance because the current libraries (GNU) for regular expressions are sufficiently lean and efficient to justify their use.
the libraries included allow grouping regular sub-expressions (using parenthesis) and identifying sub-strings from the original expression. This functionality is ideal to parse information needed to feed the query drivers.

PG2CGI makes use of regular expressions when verifying the configuration rule to be used.

The syntax of expressions and parsing the data for the drivers passed in the regular expressions can be easily done thanks to grouping multiple sub-chains in a single regular expression.

An example: let us supposed the client has to passed some informartion in the query_string, and that such data must fit to the following syntax:

  FIELD=value

Furthermore the query_string has to satisfy this format and there should not be any additional information in the string.

Enforcing this syntax can be easily accomplised by writing the following term in the selection rule:

  QUERY_STRING: "^FIELD=[^&]*$";

The previous line warrants that the rule will be invoked only if the QUERY_STRING fits the proper syntax. No only that, but if we write parenthesis around the value in the expression above then the program will allow to parse the corresponding value:

  QUERY_STRING: "^FIELD=([^&]*)$";

In the process the program also transforms the escape sequences of the form %xx introduced by the navigator

How Does it Work?

This is the moment to describe how the program works. When it is first launched it receives variables from the web server environment and it then configures itself according to these variables. The environment variables are all that the program has to findout: How about the client? Where is the origen of the request? What is the request? What type of information (MIME type) is supported by the client? etc. PG2CGI then selects the rule to be used based on the left-hand side of the rule. At the present time there are three types of rules:

Environment variables fit a regular expression indicated in the configuration file. For example:
```
  QUERY_STRING:    "^FIELD=([^&]*)$"; 
```
Environment variables do not fit the regular expression in the configuration file. This type of terms let us reject a rule provided some condition occurs in the environment variables. For example:
```
  ! HTTP_ADDRESS: "^194\.142\.12\.";
	
```
(This example would invalidate the rule if the request came from 194.142.12.xxx)
Environment variables whose value is optional (it is not determinant to select the rule). If the value is present it is parsed and passed along. For example:
```
  [ QUERY_STRING: "USER=([^&]*)" ];
```
This term allows us to pass the value of the user if the client provides it, but it does not invalidate the rule if the client does not specify it.

With all this machinery we can start building rules. Next we will group a number of terms that specify a valid left-hand side for a rule. The terms are place between "{}" brackets.

The left and right hand side of a rule are delimited using curly brackets "{}" and are separated by the symbol `->'.

The right-hand side contain terms with the same syntax: a variable name, the character `:', a string of characters and the terminator `;'. All the terms on the right-hand side are asignments of variables to given values to be performed by M4:

Next run M4 with the delimiters exchanged (replace ` by <<< and ' by >>>) and the comments disabled so that characters like # do not interfere in the substitution string. The resulting chain is assigned as value to the named environment variable.
PG2CGI uses three reserved environment variables:

M4FILE.
this variable contains the filename for M4's output template.

DRIVER.
points to the driver to be used.

TESTMODE.
if TRUE it indicates the output should bu produced in the debug mode, then PG2CGI does not call M4 in the last stage and the result produced in TEXT/PLAIN format. The resulting file is the input statements M4 would have taken in the normal running mode.

Other variables may be used either by the template file or the corresponding driver.

How Does the Information from the User Reach the CGI?

Very easily. The groups created from the regular expressions in the left hand side terms of each rule are transformed into variables (the names used for these new variables have the form `term__match_<j>', where refers to the ordinal number of the term within the rule (so the first term gets 1, the second 2, etc..) and the <j> refers to the order of the group counting from the left of the regular expression. Then if the query_string passed by the client is:

`NAME=JOSE&FAMNAME1=DE+LA+FUENTE&FAMNAME2=LOPEZ'

and the rule declared in the website is :

     QUERY_STRING: "NAME=([^&]*)";
     QUERY_STRING: "FAMNAME1=([^&]*)";
     QUERY_STRING: "FAMNAME2=([^&]*)";

the result woud be::

     term_0_match_0  <- "NAME=JOSE";
     term_0_match_1  <- "JOSE";
     term_1_match_0  <- "FAMNAME1=DE LA FUENTE";  
           (noticed the replacement of the + characters by ` ')
     term_1_match_1  <- "DE LA FUENTE";
     term_2_match_0  <- "FAMNAME2=LOPEZ";
     term_2_match_1  <- "LOPEZ";

Drivers:

We will not describe in this article the use of the drivers, all that is already explained in the documentation included with the distribution of PG2CGI. The interested reader is encouraged to look at the reference manual included.

For the moment there is only one driver, for the connection to database of the POSTGRESQL. However the author has in plans writing another driver for LDAP type databases.

An Example:

Let us now examine the sources for a full example.

Consider the table of announcements from the database of annnouncements at slug.ctv.es (follow the AVISO A LOS NAVEGANTES link) It is a very simple example that uses two templates to examine individual records from a table or make a full listing.

/etc/html2sql.cfg

{
  PATH_INFO: "^/avisos/?$"; # Selects by PATH_INFO
  [SERVER_ADMIN: ".*"];     # Obtains information from SERVER_ADMIN. Optional
} -> {
  DRIVER:   "POSTGRESQL";
  PGTTY:    "/dev/console";  # Sends logs to the console
  PGDATABASE:     "postgres";
# We make a query (always write the oid that
# will use it internally in the template file to
# link the individual records).
  PGQUERY:  "select oid,ct,titulo,texto,mt"
      " from avisos"
      " where (dt is NULL or dt > 'now')"
      " order by mt desc";
# File containing the template 
  M4FILE:    "/usr/local/etc/httpd/plantillas_m4/avisos.m4";
  WEBMASTER: "term_1_match_0";
  #TESTMODE: "TRUE";
}

# The following selection rule allow to select an announcement (aviso)
# with a known OID (primary key). The information is included in the
# variable  PATH_INFO of the CGI.

{
  PATH_INFO: "^/avisos/([0-9]+)/?$";
  SERVER_ADMIN: ".*";
} -> {
  DRIVER:   "POSTGRESQL";
  PGTTY:    "/dev/console";  # as before, write logs to the console
  PGDATABASE:     "postgres";
  OID:      "term_0_match_1"; # assign an OID

# Once again the selection is important. We include the OID
# at the beginning of the field just in case we wish to write
# a hyperlink to delete this record.

  PGQUERY:  "select oid,ct,titulo,texto,mt,dt,autor"
            " from avisos"
            " where (dt is NULL or dt > 'now') and oid=OID";
# The template is different now
  M4FILE:    "/usr/local/etc/httpd/plantillas_m4/avisos_oid.m4";
  WEBMASTER: "term_1_match_0";
  #TESTMODE: "TRUE";
}

/usr/local/etc/httpd/plantillas_m4/avisos.m4

define(<<<for>>>, <<<dnl ifelse(eval((<<<$2>>>) <= (<<<$3>>>)), 1, <<<define(<<<$1>>>,<<<$2>>>)$4<<<>>>dnl for(<<<$1>>>,eval(<<<$2>>>+1),<<<$3>>>, <<<$4>>>)dnl >>>)dnl >>>)dnl divert(0)dnl Mime-Version: 1.0 Content-type: text/html <HTML> <BODY BGCOLOR="#ffffff"> <CENTER> <H1>AVISOS A LOS NAVEGANTES QUE PASAN POR SLUG</h1> </center> Nota: Esta sección ha sido creada para dar a conocer cualquier noticia de interés relacionada con <A HREF="http://slug.ctv.es/">SLUG</a>, <A HREF="http://LuCAS.ctv.es/">LuCAS</a>, <A HREF="http://www.HispaLinux.ctv.es/">HispaLinux</a> y en general, cualquier servicio prestado por slug.ctv.es. <CENTER><HR WIDTH=100></center> ifelse(PGRES_RESULTSTATUS, <<<PGRES_TUPLES_OK>>>,<<<dnl ifelse(PGRES_NTUPLES, 0,<<<dnl  No hay avisos. >>>,<<<dnl /* PGRES_NTUPLES != 0 )( */ <CENTER>  <TABLE> <TR> <TH></th> <TH ALIGN="LEFT">Fecha-Hora </th> <TH ALIGN="LEFT">Asunto </th> </tr> for(<<>>,0,eval(PGRES_NTUPLES-1),<<<dnl <TR> <TD> <A HREF="/cgi-bin/pg2cgi/avisos/cell(i,0)"> <IMG SRC="/icons/burst.gif"> </a> </td> <TD>cell(i,1) </td> <TD>cell(i,2) </td> </tr> >>>)dnl /* for */ </table> </center> >>>)dnl /* PGRES_NTUPLES */ >>>,<<<dnl /* ifelse PGRES_RESULTSTATUS )(*/ Error en el resultado: PGRES_RESULTSTATUS Mensaje del servidor: PGRES_ERRORMSG >>>)dnl <CENTER><HR WIDTH=100></center> <A HREF="mailto:WEBMASTER?subject=TABLON DE ANUNCIOS"><CODE>WEBMASTER</code></a> </body> </html>

/usr/local/etc/httpd/plantillas_m4/avisos_oid.m4

divert(-1) $Id: generic_list.m4,v 1.1 1998/07/06 17:13:33 luis Exp $ define(<<<cell>>>, <<<PGRES_CELL_$1_$2>>>) define(<<<field>>>, <<<PGRES_FNAME_$1>>>) define(<<<for>>>, <<<dnl ifelse(eval((<<<$2>>>) <= (<<<$3>>>)), 1, <<<define(<<<$1>>>,<<<$2>>>)$4<<<>>>dnl for(<<<$1>>>,eval(<<<$2>>>+1),<<<$3>>>, <<<$4>>>)dnl >>>)dnl >>>)dnl divert(0)dnl Mime-Version: 1.0 Content-type: text/html <HTML> <BODY BGCOLOR="#ffffff"> <CENTER> <H1>AVISO OID</h1> </center> <CENTER><HR WIDTH=100></center> ifelse(PGRES_RESULTSTATUS, <<<PGRES_TUPLES_OK>>>,<<<dnl ifelse(PGRES_NTUPLES, 0,<<<dnl  No existe el aviso OID, o ha caducado. >>>,<<<dnl /* PGRES_NTUPLES != 0 )( */ <CENTER>  <TABLE> <TR VALIGN="BASELINE"> <TD ALIGN="RIGHT" NOWRAP>Fecha de entrada: </td> <TD width="80%">cell(0,1)</td> </tr><TR VALIGN="BASELINE"> <TD ALIGN="RIGHT" NOWRAP>Fecha última modif: </td> <TD>cell(0,4)</td> </tr><TR VALIGN="BASELINE"> <TD ALIGN="RIGHT" NOWRAP>Fecha eliminación: </td> <TD WIDTH=*>cell(0,5)</td> </tr><TR VALIGN="BASELINE"> <TD ALIGN="RIGHT" NOWRAP>Autor: </td> <TD><a href="mailto:cell(0,6)?subject=[TABLON-SLUG] cell(0,2)">cell(0,6)</a></td> </tr><TR VALIGN="BASELINE"> <TD ALIGN="RIGHT" NOWRAP>Asunto: </td> <TD>cell(0,2)</td> </tr><TR> <TD COLSPAN=2 BGCOLOR="#c0ffff">cell(0,3)</td> </tr> </table> </center> >>>)dnl /* PGRES_NTUPLES */ >>>,<<<dnl /* ifelse PGRES_RESULTSTATUS )(*/ Error en el resultado: PGRES_RESULTSTATUS Mensaje del servidor: PGRES_ERRORMSG >>>)dnl

<CENTER><HR WIDTH=100></center> <A HREF="mailto:WEBMASTER?subject=TABLON DE ANUNCIOS"><CODE>WEBMASTER</code></a> </body> </html>

The results may be viewed in these URLs:
http://slug.ctv.es/cgi-bin/pg2cgi/avisos/
or
http://slug.ctv.es/cgi-bin/pg2cgi/avisos/20384

Download

The PG2CGI program can be dowloaded from the following URLs :

http://slug.ctv.es/~luis/utils/pg2cgi.tar.gz
http://slug.ctv.es/~luis/utils/pg2cgi.README
ftp://slug.ctv.es/pub/slug/luis/pg2cgi.tar.gz
ftp://slug.ctv.es/pub/slug/luis/pg2cgi.README

Translated by Miguel A Sepulveda