...making Linux just a little more fun!
Object Caml is an ML type of language. For the non-gurus: it's a functional language that can also be programmed in a non-functional and object-oriented way.
This language is really easy to learn. It's powerful and keeps impressing me with its speed. Programs written in this language are almost always stable by default. No segmentation faults, only occasional unending loops for the programmers that still hang on to program their own loops. It is really not needed to write most loops, since the libraries contain standard functions that are good enough in 99% of the cases. So try to use those functions: It really pays off in terms of stability of your programs, and, unless you have intimate knowledge of the inner works of this language, they tend to be better optimised.
The language can be obtained from the website caml.inria.fr. Here, they provide RPMs for the RedHat 7.2/8.0/9 and Mandrake 8.0 distributions. Also MS Windows binaries are available, but not all Unix library functions will work there, for some mysterious reason. The source tarball does compile flawlessly for me. It just has a somewhat unusual makefile layout:
# ./configure # make world; make opt; make install
The normal libraries include many usable data-structures like balanced trees, hash tables, and streams. Their version of header files (.mli files) contain all the basic documentation you need, and those are directly converted into HTML and published on the Web in their OCaml manual. This manual is not very usable to study this language, so I'll try to explain here some of the basic language constructions. This is just to give you an impression of the power of this language.
Now some real life examples. I wrote a program to help administrating a computer. It is a subset of a normal file finder, but is a command line tool and very fast. It helps locating large, not-recently-used files to be deleted from the system. It crawls through the directory tree and show the contents in different layouts.
Every module in OCaml has its own namespace. Specific definitions can be found by adding the module name, with the first character an upper-case character. You can also change the namespace of the current program to include a total module. Normally, only the standard module 'pervasives.mli' is included in the default namespace. The example program 'show.ml' starts with:
open Basics open Unix open Unix.LargeFile
This includes my own set of 'basics' functions and 2 standard libraries: 'Unix' and 'Unix.LargeFile'. A module normally consists of 2 files. The first file for exporting definitions 'module.mli' (like the C .h file), and the second one for actual code (the 'module.ml' file). The program uses the function 'string_sub' that provides a foolproof version of the 'String.sub' standard function (from the string.mli module). The basics.mli file contains the lines:
val string_sub: string -> int -> int -> string (** Get the sub string from a [string] from position [from] with [length]. This is the same function as String.sub, but it will never raise an exception. And a negative [from] value is counted from the right side of the string. *)
This gives the definition of this function and the description. There is an automatic documentation generator (ocamldoc) that reads .mli files and writes .html files as basic interface documentation. Normal comments start with (* but the documentation generator only writes comments that start with (** to the .html files. This document contains links to the documentation of the used modules. This documentation is really helpful to start programming ocaml. The .mli files are all included in the distribution, but the complete manual and a book can be downloaded from the Web site caml.inria.fr
The function is followed by its type. It wants 3 parameters and provides a string. Normally we need to write 'Basics.string_sub' to use this function. But after the 'open Basics' instruction just 'string_sub' is enough.
Now, back to the main program again. The first function is 'gettype'. It will try to return the type of a file. The file type is defined as the part of the filename following the last '.'. When there is no dot, the type is unknown and returned empty.
let gettype file = try let pos = String.rindex file '.' in String.sub file (pos+1) (String.length file-pos-1) with Not_found -> "" ;;
This function only uses standard functions. First, it catches the Not_found exception in the 'try' 'with Not_found -> ""' code. All other exceptions will be passed to the caller to be handled, and can possibly stop the main program. The local variable pos get is filled with the result of the function rindex. This function is also the reason to catch the exception; otherwise, the main program might stop on the first found file with no '.' in it. Local variables can be declared everywhere inside ocaml with 'let <variable> = <value> in <code>'. After the completion of the given code, the variable is out of scope and will be forgotten. The data will be passed to the garbage collector to be removed from memory. Function calls do normally use brackets. The function call to 'String.sub' gets 3 parameters the string 'file' the integer '(pos+1)' and the integer '(String.length file-pos-1)'. The last parameter calls the function 'String.length' with a single parameter 'file'. So, the functions are eager for their parameters; brackets are needed only when the parameters are filled with calculations.
Also '(+)' and '(-)' are functions of the pervasives module. It is very easy to define your own operators; just add brackets around their definition, and they are ready.
The next routine 'filesize' in the example code is far longer, but largely introduces sub-functions and 'if <bool-expr> then <expr> else <expr>' statements. This function creates a string from an int64 number for human readable file and directory sizes. The types of parameters are normally not given; they are determined by ocaml through their usage. When something is not clear, the compiler or interpreter will complain about it before executing the code.
let filesize s = let tostr f = if f>9.9 then string_of_int (int_of_float (f +. 0.5)) else let res = string_of_float (floor (f *. 10.0 +. 0.5) /. 10.0) in if String.length res=2 then res ^ "0" else res in let bytes = Int64.to_float s in if bytes > 512.0 then let kb = bytes /. 1024.0 in if kb > 512.0 then let mb = kb /. 1024.0 in if mb > 512.0 then let gb = mb /. 1024.0 in tostr gb ^ " Gb" else tostr mb ^ " Mb" else tostr kb ^ " kb" else Int64.to_string s ;;
The ocaml standard library has a set of conversion functions. These functions normally follow the form of 'int_of_float' and 'string_of_float'. Specific types like 'Int64' use shorthand notations like 'Int64.to_float'. String concatenations are done with the operation '(^)'. Normally, functions are defined for only one specific type, so there are new sets of arithmetic functions for floats like '(+.)', '(*.)' and '(/.)'. The 'tostr' sub-function has some extra calculation to change something like '5. Gb' into the nicer form of '5.0 Gb'.
The next function, 'converttime', converts a string into a float. OCaml uses floats for date for 2 reasons. The first is to prevent possible Year 2k problems, and can also be used for less than one-second time measurements. The function accepts English acronyms for month names. So let's introduce the list and the pair to create a translation of acronyms into numbers.
let month = [("jan", 0); ("feb", 1); ("mar", 2); ("apr", 3); ("may", 4); ("jun", 5); ("jul", 6); ("aug", 7); ("sep", 8); ("oct", 9); ("nov", 10); ("dec", 11)] ;;
This list is totally static, and can be used easily by the standard function List.assoc to convert a string into the corresponding number.
let converttime str = try begin match if str>"a" && str<"z" then ( int_of_string (string_sub str (String.rindex str ' '+1) 99), List.assoc (string_sub str 0 3) month, 1 ) else ( int_of_string (string_sub str 0 ( try String.index str '-' with Not_found -> 99 )), ( try let pos=String.index str '-'+1 in int_of_string (string_sub str pos ( try String.index_from str pos '-'-pos with err -> 99 ))-1 with err -> 0 ), ( try let pos=String.index str '-'+1 in int_of_string (string_sub str (String.index_from str pos '-'+1) 99) with err -> 1 ) ) with (yr,mn,md) -> (* print_string ("Last access before: "^ string_of_int (if yr<50 then yr+2000 else if yr<100 then yr+1900 else yr)^"-"^ string_of_int (mn+1)^"-"^ string_of_int md^"\n"); *) fst (mktime { tm_sec = 0; tm_min = 0; tm_hour = 0; tm_mday = md; tm_mon = mn; tm_year = if yr<50 then yr+100 else if yr<100 then yr else yr-1900; tm_wday = 0; tm_yday = 0; tm_isdst = false }) end with err -> print_string ("Cannot decipher this date string '" ^ str ^ "'\n"); max_float ;;
The new operation in this function is the 'match <expr> with <template> -> expr'. This is one of the most versatile instructions of ocaml. It can be used to examine the contents of variables and get the needed information out of it. This function creates the triplet (year, month, day-of-month) out of 2 different date notations. To debug this function the 'print_string' instruction is included but commented out to prevent clutter in the output of the program. Normally there is some logging mechanism to make the extra messages optional for the user. The 'print_string' shows the ISO notation of the given date; it creates a 4-digits year and gives a month number with January=1 instead of the internal Unix use of January=0.
This function also shows the use of 'try <expr> with err -> <expr>' that caches every possible exception and fills the variable 'err' with the details of the exception. This function can raise quite a lot of different exceptions, and frankly I am not very interested in the details. The routine just complains to the user about the given date string and gets over it. It returns the maximal possible float to include every filename.
The main standard function is the 'Unix.mktime' function. It wants to get a record filled with numbers about the current time. This function returns a pair with the needed float and a normalized record. With the pervasives function fst returns just the first parameter of the pair.
The ';' before the 'max_float' indicates that the expression results in a float, but the instructions before the ';' are calculated first. This is the first non-functional instruction inside the example code. OCaml is not strictly functional, but has the full power of other functional languages.
Now is the time for a real data structure that is dynamically build and can be used in a lot of different ways.
type entrytype = | Dir of entry list (* directory with a list of files *) | File of string (* a file inside a directory *) and entry = { mutable e_name: string; (* name of a file or directory *) e_type: entrytype; (* what type is this together with type related information *) e_atime: float; (* last access time *) e_size: int64; (* size of the file or size of all the matching files in the directory *) }
The 'and' statement is used to glue the two definitions together. They are created at the same time so that 'entrytype' can include 'entry' and vice-versa. 'entrytype' can consist one of 2 things: a directory with a list of entries or a file with its type. The directory entry has a mutable name. This is can be used later on to change a filename info the full path to that file.
As with ANSI C, the operators for Boolean algebra are '(&&)' and '(||)'.
let rec dirwrite el depth sortfn = List.iter ( fun e -> match e.e_type with | Dir lst -> if e.e_size <> Int64.of_int 0 then begin print_string ((String.make (depth*2) ' ') ^ "Directory " ^ e.e_name ^ " = (" ^ filesize e.e_size ^ ")\n"); dirwrite lst (depth+1) sortfn end | File string -> print_string ((String.make (depth*2) ' ') ^ e.e_name ^ " (" ^ filesize e.e_size ^ ")\n") ) (List.sort sortfn el) ;;
Here is the recursive ('rec') function 'dirwrite' that traverses a given tree 'el' and writes the result to the standard output. The parameter 'depth' indicates the amount of spaces to write a tree like structure of filenames. The function sorts all the lists with the given function 'sortfn'. The new language structure here is 'fun <parm-1> ... <parm-n> -> <expr>'. This construction creates a function without a name. The parameters of this function like construction can be used like a template to match pairs.
This function suppresses directories that are 0 bytes in size to reduce clutter.
(* List of global variables *) let min_size = ref (Int64.of_int 0) and (* minimum size of a file in bytes *) last_access = ref max_float and (* last access time in seconds since 1970 *) has_type = ref "" and (* type of file to show or empty to show all *) name_match = ref "" and (* regular expression to match the filename with; empty is show all *) name_regexp = ref (Str.regexp "") and (* pre-calculated regular expression *) no_symlinks = ref false (* don't follow symbolic links to directories *) ;;
This is a list of variables that can be changed due to the 'ref <expr>' construction. Normally definitions are just a label to their contents. These definitions are pointers to the memory and can be read by '!<variable>' and written by '<variable> := <expr>'. The parameters given to the program can make changes to the way the files are read.
let rec dirread path = let list = ref [] and size = ref (Int64.of_int 0) in try let dh = opendir path in while true do let file = readdir dh in if file<>".." && file<>"." && file<>"CVS" && String.sub file 0 1 <> "." then let s=stat (path^"/"^file) in if s.st_kind = S_DIR && (not !no_symlinks || (lstat (path^"/"^file)).st_kind <> S_LNK) then let dir = dirread (path^"/"^file) in list := { e_name = file; e_type = Dir (fst dir); e_atime = s.st_atime; e_size = snd dir } :: !list; size := Int64.add !size (snd dir) else if (!has_type = "" || gettype file = !has_type) && s.st_size > !min_size && s.st_atime < !last_access && (!name_match = "" || Str.string_match !name_regexp file 0) then begin list := { e_name = file; e_type = File (gettype file); e_atime = s.st_atime; e_size = s.st_size; } :: !list; size := Int64.add !size s.st_size end done; (!list, !size) with | End_of_file -> (!list, !size) | Unix_error (EACCES, err, parm) -> (!list, !size) ;;
The following functions are introduced in the function 'dirread':
let rec flat el path = List.fold_right ( fun e ls -> match e.e_type with | Dir lst -> flat lst (path ^ "/" ^ e.e_name) @ ls | File string -> e.e_name <- (path ^ "/" ^ e.e_name); e :: ls ) el [] ;;
This neat little routine 'flat' hits the tree 'el' flat on the ground. It takes every file from every branch and creates a single list of all the encountered files. This is done with one of the most versatile standard routines inside ocaml: the 'List.fold_right' routine. This routine introduces a state machine (scarab) that crawls over a list and operates on every encountered element. It produces a new structure (droppings) as a result -- in this case, a flattened list.
The construction '<record-field> <- <expr>' changes the contents of a mutable record field. Without mutable fields, you can mutate records only by creating a new one with lots of fields inherited from the old one. This is a shortcut for that.
let name_order a b = compare a.e_name b.e_name ;; let type_order a b = let typea = match a.e_type with Dir ls -> "dir" | File tp -> tp and typeb = match b.e_type with Dir ls -> "dir" | File tp -> tp in if compare typea typeb = 0 then compare a.e_name b.e_name else compare typea typeb ;; let atime_order a b = compare a.e_atime b.e_atime ;;
A set of sorting functions to use inside 'dirwrite'. The function 'compare' results in the widely used values of -1 for lower than, 0 for equal and +1 for higher than.
let dir = ref "." and sort = ref name_order and show = ref 0 in Arg.parse [ ("-t",Arg.Unit (fun () -> sort := type_order), "Sort by type and filename"); ("-l",Arg.Unit (fun () -> sort := atime_order), "Sort by last access time"); ("-n",Arg.Unit (fun () -> show := 1), "List filenames"); ("-b",Arg.Unit (fun () -> show := 2), "List both filenames and sizes"); ("-s",Arg.Unit (fun () -> no_symlinks := true), "Don't follow symbolic links"); ("--before",Arg.String (fun s -> last_access := converttime s), "Last access older than give date (format 'yyyy-mm-dd' or 'mmm yyyy')"); ("--size",Arg.Int (fun i -> min_size := Int64.mul (Int64.of_int i) (Int64.of_int (1024*1024)) ), "File size bigger than size in Mbytes"); ("--type",Arg.String (fun s -> has_type := s), "File is specific type"); ("--name",Arg.String (fun s -> name_match := s; name_regexp := Str.regexp (s ^ "$") ), "Filename matches regular expression") ] (fun d -> dir := d) "show [DIR]"; let res = dirread !dir in if !show=0 then begin dirwrite (fst res) 0 !sort; print_string ("Total size " ^ filesize (snd res) ^ "\n") end else List.iter (fun e -> print_endline (e.e_name ^ if !show=2 then " ("^filesize e.e_size^")" else "") ) (List.sort !sort (flat (fst res) !dir)) ;;
And here is the main routine. It calls the Arg.parse routine to parse the parameters given to the program. But this is too much un-GNU for me. I wrote my own version of it that follows the GNU coding standards a bit more than the default one (Gnuarg). The other version is a bit more complicated so I will include only the sources that use it.
The code can be obtained from here. Just unpack it somewhere with 'tar -xzf show.tar.gz' and move into the source directory with 'cd show/src'. There is also a Makefile that compiles to machine code and installs everything. But Makefiles are too rough for sour eyes to show in this article. The nitty-gritty details are there in the source. The general compile form is.
ocamlopt -o show unix.cmxa str.cmxa basics.cmx show.ml
The only non-standard libraries in use here are unix.cmxa and str.cmxa.
make su make install exit show --help show -s ~ --size 3 --before "apr 2003"
That concludes this example program.
Developer at a small technology firm in the Netherlands called V&S bv.
(www.v-s.nl)
We sell firewall, anti-virus and spam boxes based on the Linux OS.
Using more and more the OCaml language to write my applications.
Busy writing a lightweight http server with an internal scripting language
(camlserv.sourceforge.net,
source code here)
Interested in writing AI based computer games. Always trying writing
one, nothing ready yet.