Deividson on Databases: Stored Procedures
Stored Procedures
Stored Procedures are subroutines that are stored inside the database. They allow you to select and manipulate data, and, with the use of control structures and loops, perform complex computations and return the calculated result to the client. This saves considerable amounts of client/server communication.
PostgreSQL allows Stored Procedures to be written in several different Procedural Languages, including Perl, Python, TCL, and pgSQL - the PostgreSQL internal procedure language. User-defined Procedural Languages can also be used, and several of these languages are easily downloadable, e.g. PL/Java.
In this article, we will be using PL/pgSQL. PL/pgSQL is very similar to normal SQL, but adds many more features to it, like control structures and user-defined data types and functions.
Example 1: The Basic Stored Procedure
Let's get started with a very basic stored procedure that returns "Hello World!" - not very useful, I know, but it will get us started with the basic syntax of PL/pgSQL. Here's the code:
create or replace function hello() RETURNS text AS $$ DECLARE hello text; begin hello := 'Hello World!'; return hello; END; $$ LANGUAGE plpgsql;
Here's what it does:
create or replace function hello() RETURNS text AS $$
Creates the function called hello
which receives no parameters and
returns text. You must always define what the function returns; use
VOID if you don't need to return anything.
DECLARE
Opens the variable declarations block.
hello text;
Declares a variable called "hello" of type "text". To define multiple variables, use ";" as the separator. You can use any of standard types used in tables, like integer and float, and even user-defined types or domains.
BEGIN
Starts the actual function code.
hello := 'Hello World!';
Pupulates the variable "hello" with 'Hello World!'. Note that you have to use single quotes for string/text values.
return hello;
Returns our value.
END;
Ends the function.
$$ LANGUAGE plpgsql;
Defines what language we used - 'plpgsql' in this case. To call that function, you use the following SQL code.
select * from hello();
The output will be a text field called "hello", with the value of "Hello World!".
Example 2: Populating a Table with Test Data
This is another use of a Stored Procedure ('SP' from now on) - generating test data for your tables. Let's use last month's article as an example - we used a SP to generate 500K rows of data for one of our tables. Here's the code:
create or replace function test_data_computer() RETURNS integer AS $$ DECLARE count integer; sql text; begin count = 1; LOOP sql = 'insert into computer(computer_id, computer_ram, cpu_id, video_id) values'; sql = sql || '('|| count ||', ' || random()*1024 || ', ' || (random()*49999)+1 || ', ' || (random()*49999)+1 || ')'; EXECUTE sql; count = count + 1; EXIT WHEN count > 500000; END LOOP; return count; END; $$ LANGUAGE plpgsql;
It starts much like our previous example, but this time we declare 2 variables instead of one. Things become different at line 8, where we introduce the LOOP statement. The loop is a basic repeating structure: it repeats the code inside indefinitely, until it finds a EXIT or EXIT WHEN clause.
Lines 9 and 10 are used to generate the SQL code to include a simple record in our tables. The double pipes ("||") is the concatenation operator. Random() generates a random float number between 0 and 1 (so "random()*49999)+1" will generate a random number between 1 and 50000).
Line 11 executes the SQL code stored inside the sql variable, adding the registry to the table.
Lines 12 and 13 are used to control the flow of the LOOP, and if omitted will make the loop an infinite one. "EXIT WHEN count > 500000;" makes the loop stop when the condition is met (when "count" goes over 500000 in this case.)
Line 14 closes the LOOP block, making the function go back to line 8, executing everything that is inside the loop again (and again, and again).
Line 15 returns the number of added registries (plus one in this case).
Example 3: Calculations and Date/Time Handling
Let's make up a scenario for this one. Imagine that you are building a system for a doctor, and one of the bits of data he wants is exactly how much time he spends with his patients (NOT just idling in the office.) Even more, he wants to be able to select the data for a given date or date interval, and he wants the option of selecting the records of either a single patient or all of them. Complex scenario, right? Well, we can solve it all with a single SP. These are the tables our database will have:
create table patient ( patient_id serial primary key, patient_name text ); create table visits ( v_id serial primary key, patient_id integer references patient, v_date date, v_time_start time, v_time_end time );
One for the patients, another one to store the visits, with the date, start, and end time. Let's now populate the tables with some data:
insert into patient (patient_name) values ('Deividson'); insert into patient (patient_name) values ('John'); insert into patient (patient_name) values ('Benjamin'); insert into visits (patient_id, v_date, v_time_start, v_time_end) values (1, '10/04/2008', '08:00', '09:00'); insert into visits (patient_id, v_date, v_time_start, v_time_end) values (1, '14/04/2008', '13:00', '13:45'); insert into visits (patient_id, v_date, v_time_start, v_time_end) values (1, '18/04/2008', '10:00', '10:15'); insert into visits (patient_id, v_date, v_time_start, v_time_end) values (2, '11/04/2008', '14:00', '15:00'); insert into visits (patient_id, v_date, v_time_start, v_time_end) values (2, '12/04/2008', '14:00', '15:45'); insert into visits (patient_id, v_date, v_time_start, v_time_end) values (2, '17/04/2008', '14:00', '15:15'); insert into visits (patient_id, v_date, v_time_start, v_time_end) values (3, '15/04/2008', '08:00', '12:00');
Three patients, seven records - enough to test our SP. Here is the code:
CREATE OR REPLACE FUNCTION total( date1 date, date2 date, patient integer ) RETURNS interval AS $$ DECLARE total interval; rec record; sql text; BEGIN total = '00:00:00'::time; sql = 'select * from visits'; if date1 is not null OR patient is not null then sql = sql || ' where '; end if; if patient is not null then sql = sql || '(patient_id = ' || patient || ')'; end if; if date2 is not null AND date1 is not null then if patient is not null then sql = sql || ' AND '; end if; sql = sql || '(v_date between ''' || date1 || ''' and ''' || date2 || ''')'; else if date1 is not null then if patient is not null then sql = sql || ' AND '; end if; sql = sql || '(v_date = ''' || date1 || ''')'; end if; end if; for rec in EXECUTE sql loop total = total + (rec.v_time_end - rec.v_time_start); end loop; return total; END; $$ LANGUAGE plpgsql;
Wow! Big one this time, eh? Let's take a look at it. The start of the code is pretty similar to the other examples, but we have 3 variables this time. 'total' will store the total time to return to the client, and 'rec' (of type record) is a variable that will hold the result of the query we will run.
On line 8, we start the variable total with the value of 00:00:00 - the "::" is a typecast - ":: time" means the string we're passing ("00:00:00") needs to be turned into a time.
From line 9 all the way down to line 31, all we are doing is creating the SQL statement that will select the data we want. Here, we use another type of structure - the IF. IFs are basic flow-control structures, and its syntax is (as in most programming languages):
IF (condition) THEN (commands) [ELSE (commands)] END IF;
The condition can be any logical comparison ( <, > , =, IS NULL, or IS NOT NULL), and you can combine multiple conditions using the logical operators (AND, OR, etc). If the condition is true, then the execution will continue with the commands inside the THEN clause, or if it's false, execution will move to the commands in the ELSE (if it exists), or to after the END IF.
It's in those IFs that we create the conditions (single date, date interval, single patient, etc.)
On line 33, we execute the SQL code we generated, looping over each of the records of the result. We then add the duration of each visit in the 'total' variable, and return the result when there are no more records available.
We can call this SP in one of several different ways, each way selecting a different data set and giving us a different result:
# All the records, from all patients select * from total(NULL, NULL, NULL);
# All the records, from patient #3 only select * from total(NULL, NULL, 3);
# Records from '14/04/2008', all patients select * from total('14/04/2008', NULL, NULL);
# Records from '14/04/2008', patient #1 only select * from total('14/04/2008', NULL, 1);
# Records from '14/04/2008' through '17/04/2008', all patients select * from total('14/04/2008', '17/04/2008', NULL);
# Records from '14/04/2008' through '17/04/2008', patient #2 only. select * from total('14/04/2008', '17/04/2008', 2);
Conclusion
Stored Procedures are powerful and flexible, and can be a very good way to help you pre-select and pre-process data, as well as allowing you to manipulate data and run code directly on the server.
PostgreSQL offers a comprehensive manual on their site, including a chapter about PL/pgSQL. You can find it here: http://www.postgresql.org/docs/8.3/static/plpgsql.html
That's it for Stored Procedures - see you next month, when we'll discuss Triggers!
Talkback: Discuss this article with The Answer Gang
Deividson was born in União da Vitória, PR, Brazil, on 14/04/1984. He became interested in computing when he was still a kid, and started to code when he was 12 years old. He is a graduate in Information Systems and is finishing his specialization in Networks and Web Development. He codes in several languages, including C/C++/C#, PHP, Visual Basic, Object Pascal and others.
Deividson works in Porto União's Town Hall as a Computer Technician, and specializes in Web and Desktop system development, and Database/Network Maintenance.