"Linux Gazette...making Linux just a little more fun!"
Defining a Linux-based Production System
Introduction
In a previous article ("Thoughts About Linux", LG, October) I browsed upon
several topics to use Linux in business, not only for networking and communication,
but also for real production systems. There are several conditions which
need to be fulfilled, but it should be possible to define a basic database
system, which is rapidly deployed and has everything that is needed in
such a system.
The past two months I have been increasing my knowledge about Linux
and available tools on Linux. There are several points which need further
elaboration, but I have a fairly good idea of what is needed.
Goal
The goal is to have a look at the parts which are needed to implement a
reliable production database system, together with the tools needed to
provide for (rapid) (inhouse) development, but for a fairly lower cost
than is needed with traditional platforms. It must be reliable, in the
sense that all necessary procedures for a minimum downtime are available,
with the emphasis that a little downtime can be tolerated.
I do need to place a remark here. I worked on several projects, where
people tried to save time by asking for rapid development, or trying to
save money by reusing parts which lay around, or by converting systems.
What happened in all cases was that both money and time were lost, the
main reason being not understanding all aspects of the problems.
This is a mistake I don't want to make. I think that I now have enough
experience to show a way to achieve the above defined goal, describing
a Linux-based production platform which has lower deployment and exploitation
costs.
Basic recommendations
These are general guidelines. The first part in creating and exploiting
a successful production system is in constraining the amount of tools that
are needed on the platform. This leads to the second part of success, understanding
and knowing your tools. Experience is still the most valuable tool, but
depending on the amount and complexity of tools, much time can be wasted
trying to get to know the tools that are at hand. Good handbooks with clear
examples and thorough cross-references are a great help, as are courses
on the subjects that matter.
Hardware
At the moment I won't go very deep into hardware. The base platform should
be a standard PC with an IDE harddrive on a PCI controller which is fast
back-to-back capable. I tested the basic data rate of a Compaq Despro system
(166 MHz, Triton II chipset) and got a raw data-speed (unbuffered, using
write()) of 2.5 MB (megabytes)/s. I suppose that for a small entry platform
this is fairly reasonable. Further tests should be developed to test the
loading of the machine under production circumstances.
The most important part , however, is that all machines running Linux
with production data, should be equipped with a UPS. This is because the
e2fs file system (as most Un*x filesystems) is very vulnerable in the case
of an unexpected system shutdown. For this reason, a tape drive is also
indispensable, with good backup and restore procedures which must be run
from day 0.
Production tools
Our main engine is the database management system. For production purposes,
the following features must be available :
-
Fast query capability
-
Batch job entry
-
Printing
-
Telecommunication
-
Transaction monitoring
-
Journaling
-
User interfacing
Fast query capability
This feature is especially necessary for interactive applications. Your
clients shouldn't wait half a minute before the requested operation is
fulfilled. This capability can be enhanced by buffering, faster CPU's,
faster disk-drives, faster bus-systems and RAID.
Batch job entry
This is a very valuable production tool. There are much jobs which depend
on the daily production changes, but which need much processing time afterwards.
These are mostly run at night, always on the same points of time, daily,
weekly, monthly or yearly.
Printing
Printing is a very important action in every production system and should
be looked after from the design phase. This is because many companies have
several documents that are official. Not only printing on laser- or inkjet
printers should be supported, but also printing with heavy duty printing
equipment for invoices, multi-sheet paper, etc.
Telecommunication
Telecommunication is not only about the Internet. There are still many
systems out there that work with direct lines between them. The main reason
is that this gives the people who are responsible for the services, a much
greater degree of control over access and implementation. In addition to
TCP/IP; e-mail and fax, support for X.25 should also be an option in this
area.
People should also have control over the messages and/or faxes they
send. A queue of messages should be available, where everybody can see
all messages globally (dest, time, etc) and where they have access to their
own messages.
Transaction monitoring
With transaction monitoring, I mean the ability to rollback pending updates
on database tables. This feature is especially needed when one modification
concerns several tables. These modifications must all be committed at the
same time, or be rolled back into the previous state.
Journaling
This capability is needed to repair files and filesystems which got corrupted
due to a system failure. After restarting the system, a special program
is used to undo all changes which couldn't be committed. In this sense,
journaling stands very close to transaction monitoring.
User interfacing
This is a tricky part, because it is part development and part production.
On the production side, the interface system should give users rapid access
to their applications and also partition all applications between departments.
Most production systems I have seen do this with menu's. There are several
reasons. The main reason is that most production systems still work with
character-based applications. There are many GUI's out there, but production
systems will still be solely character based (except for graphics and printing,
but I consider these niche markets), even on a GUI. The second reson is
that a production system usually has lots and lots of large and little
program's. You just can't iconify them all and put them in maps. Then you
would only have a graphical menu, with all icons adding more to confusion
than clarity.
What's available ?
Note : When I name or specify products, I will only specify those with
which I am already familiar. I presume that any one of you will have his/her
own choices. They serve as basic examples, and do not imply any preferences
on my side.
The only database system on Linux I personally know for the moment,
is PostgreSQL. It supports SQL and transaction monitoring. Is it fast ?
I don't know. One should have a backup of a complete production database,
which can then be used to test against the real production machine, with
interactive, background and batch jobs running like they do in the real
world.
For batch jobs, crond should always be started. In addition to this,
the at and batch commands can be used to have a more structured
implementation of business processes.
For printing, I know (and use) the standard Linux tools lpd, Ghostscript
and TeTeX. There might be a catch however. In some places you need to merge
documents with data. The main reason for this is that a wordprocessing
package offers more control over the format and contents of the document,
instead of printing the document with a simple reporting application. On
my current workplace, a migration to HP is busy. The solution there is
WordPerfect. In the past, I have used this solution under DOS, to automatically
start WP and produce a merged document. Is this possible too with StarOffice
?
Are there other print solutions which offer more interactive control
over the printing process than lpd ? Users should have more easy access
to their printjobs and the printing process.
Telecommunication is a real strong point of Linux. I won't enumerate
them all. Even if it doesn't support X.25, it is still possible to use
direct dial-up lines using SLIP or PPP.
Journaling is the weakest point of Linux. I have worked with the following
filesystems : FAT, HPFS, NTFS, e2fs, the Novell filesystem and the filesystem
of the WANG VS minicomputer system. With all these systems, I have had
power-failures or crashes, but the only file-system that gives trouble
after this is e2fs. In all other cases, a filesystem check repairs all
damage and the computer continues. On WANG VS, index-sequential files are
available. When a crash occurs, the physical integrity of an indexed file
can be compromised. To defend against this, there are two solutions. The
first is reorganizing the file. This is copying the file on a record-by-record
basis. This rebuilds the complete file and its indices, and inserts or
deletes which were not committed are rejected. The second option is using
the built in transaction system. A file can flagged as belonging to a database.
Every modification to these files is logged until the transaction is completely
committed. After a failure has occurred, the files can be restored in their
original states using the existing journals. This is a matter of minutes.
I think that the only filesystem on PC which offers comparable functionality
is that of Novell.
The e2fs file system check works, but it does offer not enough explanation.
When there is a really bad crash, the filesystem is just a mess.
Development tools
I will describe here the kind of tools that I needed when I was maintaining
a production database in a previous job. The main theme here is that programmers
in a production environment should be productive. This means that they
should be offered a complete package, with all tools and documentation
necessary to start immediately (or in a relatively short time). This means
that for every package there should be a short, practical tutorial available.
I will divide this section into two parts, the first being necessary
tools, the second being optional tools. Also necessary for development
is a methodology. This methodology should be equal through all delivered
tools. The easiest way to do this is through an integrated development
environment.
Compulsory development tools
Which tools are the minimum needed to start coding without much hassle
? I found these tools to be invaluable on several platforms :
-
An integrated development environment
-
A powerful editor
-
An interactive screen development package
-
A data dictionary
-
A high-level language with DBMS preprocessor support
-
A scripting language
Integrated development environment
Your IDE should give access to all your tools in an easy and consistent
manner. It should be highly customisable, but be delivered with in a configuration
which gives direct access to all installed tools.
Editor
If you have a real good editor, it can act as an integrated development
environment. Features which enhance productivity are powerful search-and-replace
capabilities and macro features (even a simple record-only macro feature
is better than no macro features). Syntax colouring is nice, but one can
live easy without it. Syntax completion can be nice, but you have to learn
to live with it. Besides, the editor cannot know which parts of statement
you don't need, so ultimately you will have more clutter in your source,
or you waste your time erasing unnecessary source code.
Screen development
This is an area where big savings can be done. For powerful screen development
you need the following parts in the development package :
-
Standard screens which are drawn upon information in the data dictionary
-
Easy passing of variables between screens and applications
-
A standard way of activating a screen in an application
The savings are on several places. If you create a new screen, then you
should immediately get a default screen with all fields from the requested
table. After this, only some repositioning and formatting to local business
standards needs to be done. I worked with two such systems, FoxPro and
WANG PACE, and the savings are tremendous in all parts of the software
cycle (prototyping, implementation and maintenance).
Data dictionary
A data dictionary is a powerful tool, from which much information can be
extracted for all parts of the development process. The screen builder
and the HL preprocessor should be able to extract their information from
the data dictionary. The ability to define field- and record-checking functions
in the data dictionary instead of the application program, eliminates the
need to propagate changes in this code through all applications. With the
proper reports, one should also be able to look at different angles into
the structure of the database.
High level language with DBMS preprocessor support
You can't do complete development without a high-level language. There
are always functions needed which can't be implemented through the database
system. To make development easier, it should be possible to embed database
control statements in the source program. The compiler should be invoked
automatically.
Scripting language
A scripting language is very useful in several aspects. Preparing batch
jobs is part of it. I also found out that a business system consists of
several reusable pieces, which can be easily strung together using a scripting
language. Also, the overall steering and maintenance of the system can
be greatly simplified.
Optional development tools
These are tools that were avalailable on several platforms, which can come
in handy, but aren't necessarily usable to deliver production environment
applications. I found out that these are little used.
-
Interactive query system
-
Report editor
Interactive query system
This is often designed to be used by people which are not programmers.
Experience has thaught me however that people who are not programmers in
a business, don't have the time to learn these tools. It is a useful tool
for a programmer to test queries and views, but it isn't really useful
as a production tool. Only in some cases, for real quick and dirty work,
is it worth using.
Report editor
This is even a more overestimated tool. I shared thoughts about this with
other programmers, and our conclusion was : bosses always ask reports which
are much more complicated than a simple report editor can handle. It would
be far better to use a programming language specifically designed for reporting
(any one know of such a thing ? Any experiences with Perl for extraction
and reporting ?).
What's available ?
Note : I will direct my attention only at the compulsory development tools.
The rest of the environment will be centered around the features of PostgreSQL.
As an integrated development environment, EMACS is probably the first
which comes to mind. It integrates even with my second subject, a powerful
editor. Is it even at all possible to draw a line between the two ? Is
EMACS a powerful editor which serves as a development environment, or is
it a development environment which is tightly integrated with its editor?
The data dictionary, the screen development package and the DBMS preprocessor
are more thightly bound than other parts of the package. The screen editor
and the DBMS preprocessor should get their information from the data dictionary,
and the DBMS HL statements should also provide for interaction with screens.
It should be both possible to develop screens for X-windows, as well for
character-based terminals.
In the field of high level languages, there are several options, but
a business oriented language is still missing. Yes, I am talking COBOL
here, although an xBase dialect is also great for such applications. I
have programmed for eight years in several languages, only the last two
year in COBOL, and it IS better for expressing business programs than C/C++.
If anyone would ask me now to write business programs in C/C++, I think
the first thing I would do was write a preprocessor so that I could express
my programs with COBOL-like syntax.
I don't know how ADA goes for business programs, but a combination
of GNAT, with a provision to copy embedded SQL statements to the translated
C-source and then through the SQL preprocessor would maybe work.
I only had a small look at Perl, and from Tcl and Python I know absolutely
nothing, but while interactive languages are fine for interactive programs,
you should also bear in mind that some programs must process much data,
and that therefore access to a native code compiler is essential.
There is another point in which only COBOL is good. This is in financial
mathematics. This is due to the use of packed decimal numbers up to 18
digits long where the decimal point can be in any place. You should have
compiler support for that too. On the x86 platform this capability exists
in the numerical processor, which is capable of loading and storing 18
digit packed decimal numbers. Computations are carried out in the internal
80-bit floating point format of the co-processor.
When you have a Linux system, the first scripting language you run into
is probably that of the bash shell. This should be sufficient for most
purposes, although my experiences with scripting languages is that they
benefit greatly from statements for simple interaction (prompting and simple
field entry).
What should be delivered ?
As I said before, this list doesn't present any endorsement from me towards
any of these products or programs. This list should be expanded with all
products which fit in either one of these categories, so all hints all
wellcome.
Another weak point in some areas of Linux is documentation. For a production
environment, the Linux documentation project is probably a must, preprinted
from the Postscript sources. For the commercial products, good documentation
is also not a problem. For other parts of Linux tools, the books from O'Reilly
& Associates are very valuable. HOWTO's are NOT suited for a production
environment, but since they are more about implementation, they are suitable
for the people who put the system together. The catch is this : when a
system is delivered, all necessary documentation should be prepared and
delivered too. I worked with several on-line documentation systems, but
when in a production environment, nothing beats printed documentation.
Production system |
DBMS
- Fast query/update
- Transaction processing
- Journaling |
postgreSQL
mySQL
mSQL
Adabas
c-tree Plus/Faircom Server
... |
Communication |
ppp
slip
efax
... |
Batch job entry |
crond
at
batch |
Printing |
lpd |
User interfacing |
? |
Development system |
IDE |
EMACS |
Editor |
EMACS
vi |
Screen development |
Depends on DBMS |
Data dictionary |
Depends on DBMS |
Application language |
C
C++
Cobol ?
Perl
Tcl(/Tk)
Python
Java |
Scripting language |
bash |
Summary
I am still trying to drag Linux into business. If you want to do business
using Linux, you should be able to deliver a complete system to the customer.
In this article I outlined the components of such a system and some weaknesses
which should be overcome. As a result, I created a table enumerating the
needed components for such a system.
This table is absolutely not finished. I welcome all
references to programs and products to update this table. It should be
possible to publish an update once a month. What I also should do, is extend
the table with references to available documentation.
Another part which needs more attention is developing tests to assess
the power of the database system, ie. what can be expected in terms of
throughput and response under several load scenarios.
Copyright © 1999, Jurgen Defurne
Published in Issue 36 of Linux Gazette, January 1999