
Connect To Your Database With Emacs

The recent Emacs post made me think of a useful SQL trick I recently learned that I wanted to share. Emacs refers to different SQL engines as ‘products’, e.g. MySQL, SQLite, and so on. You can connect to an SQL database through Emacs but you have to call a different command based on the product.

An easier way to do this, as of version 24, is to use the keyboard shortcut C-u C-c TAB when viewing an SQL document. This will prompt you first for the product, and then for any connection information. So if you are working with an SQLite database you can press C-u C-c TAB, type sqlite, and then the filename of the database; that will open an interactive window connected to the database. Or if you are using PostgreSQL then Emacs will ask for a username, port, etc. The supported products are:

  • db2
  • informix
  • ingres
  • interbase
  • linter
  • ms
  • mysql
  • oracle
  • postgres
  • solid
  • sqlite
  • sybase

I don’t even know what some of those are…

Anyways, sometimes I find this faster than opening a terminal if I need to run some test queries or copy output into a document. (This is very useful with SQLite’s .mode html command, which will print the results of queries in <table> form.)
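
For example, once connected to an SQLite database this way, you can flip the output mode right in the interactive buffer (a minimal illustration; the inventory table is hypothetical):

sqlite> .mode html
sqlite> SELECT name, qty FROM inventory;

Each result row comes back wrapped in <TR> and <TD> tags, ready to paste into an HTML document.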


ejmr
南無妙法蓮華經

This article was originally posted on “The List” by EJMR.


PHP + PostgreSQL App Performance Example

One of our projects this week was to improve the performance of a payment processing system for a client. The system performed well under limited data sets, but as the data set grew larger the response time increased exponentially. In reviewing the application we found several design flaws that were significantly impacting performance. This article outlines some of the issues we found.

Overview

The take-away from this article is that you should know the details of the implementation from ALL angles. If you are a code junkie, make sure you review and understand the database angle. The same goes in reverse: if you are a DB guru, take the time to understand the code. Nobody can do an effective job of application design & implementation without understanding the big picture.

In this case there was way too much emphasis on putting all the business intelligence into the database. While there is nothing wrong with that, and in fact it is often a preferred architecture, it was not well thought out and thus not well implemented in this case. One of the bigger mistakes was putting the business intelligence into simple views rather than using proper stored procedure designs.

Bottom line: sometimes the better solution is to put SOME intelligence in the data engine and move some of the extraction/looping and other processing logic to the code side. ESPECIALLY in a case like this one, where we know that the client will not, at least any time soon, be accessing the data from any applications outside the server-side PHP application that is implemented for them. Thus we know we could put all the intelligence in the code, though that makes for a LOT more work if/when they decide to introduce things like mobile apps.

Lesson 1: Don’t Put Multiple Sub-Selects In A View

This is a simplified example from a simple view that was built in Postgres for the client.

SELECT a,b,
to_char(c_date, (select settings.value from settings where settings.id='date_format')) as c_date_str,
to_char(d_date, (select settings.value from settings where settings.id='date_format')) as d_date_str,
to_char(e_date, (select settings.value from settings where settings.id='date_format')) as e_date_str
...

This view is using a settings table which holds the date format.  The client can use the web interface to change the date format, which is stored in the settings table.   That is a good web app design.

Doing multiple queries to retrieve that date format in a single view is a bad design.   In the simple example we have above we end up hitting the database, and thus doing disk I/O, no less than FOUR TIMES for a single query.  That is going to be slow.

There are myriad better options here; these are the two I would consider:

  • Move the sub-select into a stored procedure and turn it into a function. An intelligent design of that procedure will retain the initial data fetch in a global variable that is tested on each call, blocking future data I/O requests. Data I/O is now 2 calls v. 4+ for the view. (See the sketch after this list.)
  • Return the raw data and allow the code to format the strings. The code can easily fetch the date format, apply the equivalent PHP formatting call ONCE, and apply it to all the raw data. This also cuts down the data I/O. Using raw data also increases the chances for the PostgreSQL engine to optimize the query via its internal heuristics engine.
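
Here is a minimal sketch of the first option, using the settings table from the example above. PostgreSQL SQL functions do not have true global variables, so this version approximates the cached-fetch idea by marking the function STABLE, which tells the planner the result will not change within a statement and lets it collapse many repeated evaluations:

CREATE OR REPLACE FUNCTION get_date_format() RETURNS text AS $$
    -- one settings lookup instead of one sub-select per date column
    SELECT value FROM settings WHERE id = 'date_format';
$$ LANGUAGE sql STABLE;

The view can then call to_char(c_date, get_date_format()) for each date column instead of repeating the sub-select.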
In our application improvement, the path taken was to avoid this view whenever possible. As it turns out, this view is so complex and convoluted that there are often multiple shortcuts that get to just the few data elements that are needed. Constructing new queries retrieved the necessary data without all the view complexities and data overload.

In this case the view is so complex that it hampers performance throughout the application while providing limited benefit. The long term solution will be to break the view into a subset of stored procedures. For the few cases where the larger complex view is actually viable we will see improved performance via an intelligent series of cascading stored procedures or code-side logic.

Lesson 2: Use Parameter Binding

Any modern database and its related interfaces will support parameter binding. If your database does not support this and you are building enterprise-class applications, it is time to select a new data engine. PostgreSQL has supported this for years. Nearly all of the data interfaces, including PHP’s MDB2 interface, have also supported parameter binding for years. With parameter binding you will get a significant performance boost when iterating over data, especially in a nested loop fashion.
In our example the code was doing something similar to this, simplified for instructional purposes:
$qry1 = 'SELECT v_id,a,b,c,d FROM my_first_view WHERE NOT paid';
$result = $database->db()->query($qry1);
$dataset1 = $result->fetchAll(MDB2_FETCHMODE_ASSOC); 
$datastack = $dataset1;
$qry2 = 'SELECT v_id,e,f,g FROM my_second_view WHERE NOT paid';
$result = $database->db()->query($qry2);
$dataset2 = $result->fetchAll(MDB2_FETCHMODE_ASSOC);
foreach ($dataset1 as $data1) {
    foreach ($dataset2 as $data2) {
        if ($data1['v_id'] == $data2['v_id']) { 
             $datastack['subdata'][] = $data2; 
        }
    }
}

There are several significant performance issues here. To start with, there is significant memory consumption, as we need to collect ALL the data from the database into memory. We then collate two complete data sets in memory to create a single data element. There are much more efficient ways to do this without fetching all the data into memory first.

The better option would be to fetch the data from dataset1 on a row-by-row basis and push the data onto the stack one record at a time.  The inner loop for dataset2 should then select a subset of data that is specifically for the matching v_id from the outer dataset1 loop.   This is where parameter binding comes in.

Here is the same loop in untested, simplified code, using parameter binding. In our real-world example this one simple change increased performance more than 50x, because the database can be much more intelligent about how it selects subsets of data, and the PHP overhead, both in memory and in stack management, is significantly reduced:

// give us parameter binding in MDB2 please
$database->db()->loadModule('Extended'); 

// setup our queries
$qry1 = 'SELECT v_id,a,b,c,d FROM my_first_view WHERE NOT paid';
$qry2 = 'SELECT v_id,e,f,g FROM my_second_view WHERE v_id = ?';
// get the "outer" data
// since we know we will use all the "outer" data, just put it
// directly on the data stack, cutting this memory consumption in half
$result = $database->db()->query($qry1);
$datastack = $result->fetchAll(MDB2_FETCHMODE_ASSOC);

// still loop through outer data to drive the 2nd query
foreach ($datastack as $data1) {
    // Fetch the data2 subset of matching data as
    // a named array, only getting those records that
    // match on v_id... in essence an SQL JOIN done
    // via code
    $dataset2 = $database->db()->extended()->getAll(
        $qry2,
        null,
        array($data1['v_id']),
        array('integer'),
        MDB2_FETCHMODE_ASSOC
        );
    // Now we attach each of the matching elements in
    // the "inner" data set to the data stack, attaching
    // it under the related v_id 
    //
    foreach ($dataset2 as $data2) {
             $datastack['v_id'][$data1['v_id']]['subdata'][] = $data2; 
    }
}

This can be further refined and improved per our discussion above, but you get the idea.  I’ll leave it to you to figure out how to further refine the process.

You may be asking “why didn’t you just do a simple JOIN in the database engine?” Good question. The real-world example is much more complex than this; some of the data elements and views in play make that solution complex to maintain, cause the database engine to trip up on optimization, and make it actually SLOWER in our real-world case. Here we are simplifying to illustrate the general concept only.

Summary

A couple of simple real-world examples of improving performance have been illustrated here. When refactoring a real-world application there are often complex interactions that need to be researched & evaluated. The solutions are rarely simple and can often be approached in several ways. The options shown here are not necessarily the best solutions, but they were the quickest to implement while providing a significant performance increase for the client.

Finding the balance between results and costs is always a challenge from the business perspective.    From the technical perspective a similar balance is often necessary between database and code intelligence.  Knowing your environment as a whole will help guide you down the proper path toward improving application performance.

 


PostgreSQL – Exclusive Locking On Triggers

We have been working on a client’s project and dealing with an especially complex view in a PostgreSQL database. The view has a few tables linked together as well as several sub queries, a case statement, and a number of elements that do aggregate functions to build a single record of the view.

After updating the admin panel to allow them to use the database & admin page for processing orders (versus an email + Excel system) we ran into several issues. The first was a significant degradation in performance whenever the system was under moderate load. This is important to note, as our baseline release testing did not uncover the problem. The logic in the app was syntactically and functionally correct. However, the application misbehaved under load, causing a bottleneck and essentially shutting down the app.

The Table Trigger Loop

We implemented a series of update triggers and stored procedures to propagate certain data elements between tables. Part of the design allowed for two tables to update each other. The UI allowed an admin user to update Table A, which called a trigger to push a status field update into Table B (the legacy data). In other places Table B *could* be updated first, so we had a trigger to update Table A. The problem is that the triggers would loop: Table A triggers an update in Table B, which triggers an update in Table A, and so on.

The initial design called for turning off the triggers on the other table within the stored procedure. This requires an ALTER TABLE command. ALTER TABLE forces an exclusive lock on the entire table. NOTHING else can touch the table during this command; it is one of the few (if not the only) truly exclusive locks in the locking hierarchy.

This really slows down bulk updates. “Lock entire table”… insert… “unlock entire table”.
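
To make that concrete, here is a hedged sketch of what each update effectively had to do (the table, trigger, and column names are hypothetical):

-- exclusive lock: blocks ALL other access to table_a until done
ALTER TABLE table_a DISABLE TRIGGER push_status_to_b;
UPDATE table_a SET status = 'paid' WHERE order_id = 42;
ALTER TABLE table_a ENABLE TRIGGER push_status_to_b;  -- exclusive lock again

Every update pays for two table-wide exclusive locks, which is exactly the pattern described above.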

Master/Slave Tables Is A Better Solution

A better solution was to take out the ALTER TABLE commands completely and remove the trigger by which Table B updates Table A. We made an application logic decision to make Table A the master and Table B the slave, and to eliminate (and/or log) any place where Table B gets updated first. That should NEVER happen in the new app, but some legacy code may still do it.

This is much faster as it generates no exclusive DB locks.

Entire Application Assessment v. Technical Complexity

In this case the application design and implementation, as well as the technical syntax at both the application and DB layers, all work together to improve performance. Knowing the application and coming up with an overall design solution is often as effective as fancy technological solutions. Sometimes less complexity or technical “mojo” is the right answer to a complex problem.


SQL: Using IN / NOT IN Operators

Here is an explanation I recently shared with someone about how to use NOT IN to select a subset of data from a one-to-many relationship.   It is a simple example that covers the basic use of NOT IN (and IN) operators in SQL.  These operators can be fairly powerful for performing quick queries on complex data relationships where a programmatic solution may otherwise be required.   This is not always the best solution but it can be a useful shortcut when used appropriately.

Syntax in various SQL dialects can vary, so I am using a generic SQL dialect based on my PostgreSQL knowledge.

The Problem

You have 2 tables you are linking together.   I forget the names so I’ll call them Nurses and RolesOfNurses.    For the purpose of this explanation let’s assume you are linking them by an ID field for the Nurse.

Your Nurse data may look like this:

ID: 1  Name: Karen
ID: 2  Name: Sharon

And your RolesOfNurses data:

NurseID: 1  RoleID: 10
NurseID: 1  RoleID: 555
NurseID: 2  RoleID: 10

Normally with a one-to-many (1:M) relationship this RolesOfNurses table is just a middle-man, so there would be another table called “Roles” that has the RoleID and a common name to make human-readable output (like reports) possible. So role #10 may be “nurse” and the specialized role #555 might be “Can Give Meds”.
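
For reference, here is a sketch of how those three tables might be declared (names taken from the discussion above; the exact column types are assumptions):

create table nurses (
    id   integer primary key,
    name text
);

create table roles (
    roleid integer primary key,
    name   text   -- e.g. 10 = 'nurse', 555 = 'Can Give Meds'
);

create table rolesofnurses (
    nurseid integer references nurses(id),
    roleid  integer references roles(roleid)
);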

In your select statement you want to find “all nurses that do not have the ‘Can Give Meds’ role”. In database-like talk you want “all nurses that have role #10 but not role #555”. So your instinct is to join the two tables and filter by where role == 10 and role != 555. But that is wrong. The reason is that joining tables creates one unique row for every combination of data in the joined tables.

Breaking It Down : Select Nurses and Roles

Take it from the beginning.  Let’s just say “I want a list of all the roles nurses have”.  That is a simple join:

select * from nurses, rolesofnurses where nurses.id = rolesofnurses.nurseid;

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 1, Name: Karen, NurseID: 1, RoleID: 555
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Next: Filter The Data On Role # 10 Only

This data set is the basis of your advanced query, but before we get to the “dual and” let’s start with a single where clause to filter our data. I’ve added the parentheses only to make the where clause groupings easier to read. The first where clause links the two tables together to give you the above data set. The second where clause filters that data to show just the rows with the nurse role (role #10).

select * from nurses, rolesofnurses where (nurses.id = rolesofnurses.nurseid) and (rolesofnurses.roleid = 10);

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Then : Add The Second Role Filter

Now add your 2nd where clause so you can have “all nurses that don’t have the Can Give Meds role” filter.

select * from nurses, rolesofnurses where (nurses.id = rolesofnurses.nurseid) and (rolesofnurses.roleid = 10) and (rolesofnurses.roleid != 555);

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Notice that this is NOT the desired result!

Why? Look at the data.  All of your roles match BOTH of those filters.   The roles are all 10 AND they are NOT 555.  The first instinct is wrong. Personally I blame the fact that SQL, as powerful as it is, does not process English-based query syntax the same way that our brains do. Many people will make the same mistake. Sadly, the solution is a bit more complex to explain, but I’ll try to make it simple.

Fixing The Query: Step 1 – Build A List Of IDs

I’m going to start with the 2nd part of your filter, all the IDs of nurses that have role #555.  Note I said “IDs of nurses” here, that is important.  I don’t want the full set of nurse data like we have been doing up until now.  I simply want the list of ID numbers of nurses that have role #555.

select nurseid from rolesofnurses where roleid = 555;
NurseID: 1

This is the ONLY NurseID in our data set that has role # 555.

The Correct Query : Using NOT IN

Now let’s go and get our data we want by telling SQL “give me all the nurses that have role # 10 but are NOT in the list of nurses with role # 555”:

select * from nurses, rolesofnurses where (nurses.id = rolesofnurses.nurseid) and (rolesofnurses.roleid = 10) and (nurses.id NOT IN (select nurseid from rolesofnurses where roleid = 555));

And we get the desired result:

ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Explaining NOT IN

The trick here is the <blah> NOT IN <SQL QUERY> syntax. Most (not all) SQL dialects support this, and it is an important advanced SQL construct. It allows you to filter data based on the results of another select statement. The trick with the NOT IN operator is to make the <SQL QUERY> part return a single-column list of the values you want to filter on. In our case the <SQL QUERY> returns the entire list of nurse IDs that have role #555. We use NOT IN to tell it to only accept the record if the Nurse ID is NOT IN the list of nurse IDs that have role #555, or in English, “pick all nurses that do not have role #555”. In our example it was a single record, but in a larger data set it could return an entire list of records.

Notes About IN/NOT IN Usage

For a few hundred records this construct is OK. At some point you need to start looking into temp tables and other methods, as the <blah> NOT IN <SQL QUERY> construct can consume a lot of memory on the server. You need to be intelligent about what kind of resources your server has and how large the potential data set is for the second part of the IN query, because most DB engines will load the entire list of ID #s into RAM.
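
For example, one common alternative in PostgreSQL-flavored SQL is a NOT EXISTS sub-query, which lets the engine anti-join row by row instead of materializing the full ID list first (a sketch using the tables above):

select n.*
from nurses n
join rolesofnurses r on (r.nurseid = n.id and r.roleid = 10)
where not exists (
    select 1 from rolesofnurses r2
    where r2.nurseid = n.id and r2.roleid = 555
);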

Another note: you can create complex constructs with IN the same way as with NOT IN, for example “give me a list of all nurses that have role #10 AND role #555”.

select * from nurses, rolesofnurses where (nurses.id = rolesofnurses.nurseid) and (rolesofnurses.roleid = 10) and (nurses.id IN (select nurseid from rolesofnurses where roleid = 555));

BTW, I’ve used uppercase for emphasis only. SQL keywords are not case sensitive.


Restaurant Apps with The Table Tap

Cyber Sprocket Labs started a fun and exciting relationship that is the perfect intersection of two of our favorite things: programming and beer. While we’ve worked on many exciting projects over the years, this is definitely in the top two. None of our esteemed programmers ever thought they’d be working on an application that facilitates the delivery of beer directly to the consumer. Yet that is exactly what they are doing.

The Table Tap provides a unique and innovative new service to bars and restaurants anywhere in the world. This service puts full-service beer taps within the consumer’s reach, literally. A computer-controlled interface with up to 4 beer taps is installed directly in the table. A quick swipe of an RFID card activates the taps and allows the customer to pour their own beer, as much or as little as they’d like.

Cyber Sprocket has been instrumental in helping Jeff Libby bring his concept to the next level. By providing technical support both during and after installation, we have helped him speed up his deployment cycle, increasing revenue. We have also provided extensive programming services to update the business manager, hostess, and system administrator interfaces. During our first few months on the project we have also been tasked with several new additions to the software, the newest of which is enabling direct table-to-table chat using the system’s built-in color LCD displays.

Like we said, a very fun and exciting project that has taken our technology services to places we never expected.   Thanks Jeff, we look forward to a long and prosperous relationship as your one-of-a-kind solution takes off!

Technical Overview

Services Provided

  • Web Application Programming
  • Database Design
  • Database Maintenance
  • Network Support
  • Device Interface Programming
  • System Configuration and Installation
  • Technical Support
  • Customer Support

Platform Details


PostgreSQL Cheat Sheet

PostgreSQL is one of our favorite database engines for a variety of reasons.  Here is our cheat sheet to help you get online and get around Postgres with minimal effort.

Database Access Security

Database security is handled primarily in two places: at the system service level, via a file called pg_hba.conf, and within the database metadata itself. The pg_hba.conf file controls what level of credentials is needed based on the IP address the requesting connection is coming from. The metadata within the engine itself generally controls user-level access once the user is connected and approved at the system level.

Systemwide Configuration via pg_hba.conf

This file matches IP addresses against a set of rules to determine how much data you need to provide before getting access to the database engine. It includes the IP address, the username trying to connect, and what type of validation is needed.

The data comes in a series of tab-separated columns, including the following (see the sample entries after this list):

  • Connection Type
    • local = from local tty connection
    • host = from an internet connection
  • Database = which database is the user trying to connect to?
  • User = which user they are connecting as.
  • IP Address = what address are they coming from?
  • Method = how shall we authenticate them?
    • md5 = let them in if the password matches
    • ident sameuser = let them in if their system login name matches the database user they are trying to connect as (no password required)
    • trust = let them in as long as the IP address matches; no password required
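
A few typical entries, shown as an illustrative sample only (the databases, users, and addresses here are hypothetical and should be adjusted to your site):

# TYPE  DATABASE  USER      ADDRESS          METHOD
local   all       all                        md5
host    mydb      webuser   192.168.1.0/24   md5
host    all       postgres  127.0.0.1/32     trust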

Finding pg_hba.conf

The pg_hba.conf file lives in various places depending on the flavor of Linux.

  • Red Hat, CentOS, Fedora = /var/lib/pgsql/data/
  • Ubuntu = /etc/postgresql/<version>/main/

Command Line Access

You can do a lot of maintenance operations or test queries using the command line interpreter. The command line in PostgreSQL is accessed via the psql command. The most often used parameters with psql are those that connect as a user other than your login user, prompt for your password, and give the name of the database to connect to.

Example:

# psql -U other_name -W other_db

Command Line Shortcuts

From the command line there are a variety of shortcuts to help you navigate around the database engine or see what is going on.  Here are a few of the most useful:

  • List Databases: \l
  • List (display) Tables : \dt
  • List Columns in a Table: \d <tablename>

Creating A Database

Here is how you create a new database that is owned by a specific user. This assumes a “clean slate” install. You will need the postgres user login credentials and/or root access. You will be creating a PostgreSQL user and password, and will change the system-level postgres daemon security settings to allow access with the password regardless of which user you log in as.

# login as the postgres user, or 'su postgres' if you are root

# psql
psql> create user lance with password 'cleveland';
psql> create database myfirstdb with owner lance;
psql> \q
# vi /var/lib/pgsql/data/pg_hba.conf

While in pg_hba.conf change this line:

local   all         all                              ident

to this:

local   all         all                              md5

Backing Up / Dumping Your Data

Data dumps are a quick way to put the schema, data, or a combination of both out into a file that can be used to re-create the database on other systems or just back it up to a remote location. The PostgreSQL command for this is pg_dump. It takes the same parameters as the command line access.

Simple Data Dump

Example:

# pg_dump -U myname -W the_db_name > dump_thedb_2010_0704_001.sql

Clean Data Dump

This is the format to use if you want to ensure the entire database is dropped & re-created when loading on a new system.

Example:

# pg_dump --clean --create -U myname -W the_db_name > dump_thedb_2010_0704_001.sql

Reloading Dumped Data

To reload such a script into a (freshly created) database named the_db_name :

# psql -d the_db_name -f dump_thedb_2010_0704_001.sql

If the clean data dump method was used, you will want to log in as postgres and let the SQL script create the database:

# su postgres

# psql -f dump_thedb_2010_0704_001.sql

Summary

There are plenty more tips, tricks, and shortcuts.  These will get you started.  We’ll add more as time allows.


Analyzing Queries in Postgres

Yesterday I was curious if there was a good way to write my IDS query using a single ‘select’. All of us had the right idea of using a sub-query to limit the results from the join, although there were different ideas about how to use a sub-query. Richard’s solution was the most succinct.

However, there is an important follow-up lesson that I want to talk about. What I asked yesterday is representative of something we all seem to ask ourselves consistently: “What is the most clever way I can do this?” When it comes to database queries I believe we have a tendency to see how much we can accomplish in one swoop. At the heart of this tendency is the assumption that more queries equals less efficiency.

Please strive to constantly challenge this assertion. Our thoughts about software should be continuously reinforced by demonstrable evidence.

In the case of SQL, this means query analysis.

There are various tools out there for measuring the performance of your queries. In Postgres you can get a pretty good idea about database performance by using the query analyzer. The syntax is pretty simple:

explain analyze [query...];

An actual example:

explain analyze select * from users;
QUERY PLAN
------------------------------------------------------------
Seq Scan on users  (cost=0.00..18.30 rows=830 width=68)
(actual time=0.013..0.021 rows=3 loops=1)
Total runtime: 0.078 ms

What we are seeing here are the details of the query plan that Postgres uses for this query. Let’s break down the information we see here.

Seq Scan on users
This means Postgres is performing a “sequential scan” on the ‘users’ table. In other words, it’s reading row by row. This is the type of plan that Postgres is using. If it were reading an index, you would see that reflected here, for example.
cost=0.00..18.30
The first number is the predicted start-up time for this query, meaning the amount of time before we can get output. Since this is a sequential scan with no sorting or grouping or anything, Postgres can start returning data immediately.

The second number is the estimated total cost. That is, the amount of time it will take if all rows are returned. Take note that this really means all rows. If our query had a limit, it would not impact the total cost, and in reality our query would run faster than the estimated total cost.

The values of these two numbers are somewhat arbitrary. See section 18.6.2 of the manual for a description of the values used to compute these numbers. But keep in mind that in general ‘1.0’ equals one disk fetch. So for this query we could be looking at around eighteen disk fetches.

rows=830
This is the estimated number of rows output. Again, this number only reflects the estimate if the query plan is executed to completion; things like a limit may cause it to stop sooner. It also does not necessarily mean the number of rows processed; the number of rows actually processed will usually be higher than the number output. Note that in my case the estimate was wildly off: if you look at the actual results, you will see only three rows were returned. When you have very, very small data sets this value is typically irrelevant.
width=68
The estimated size of each row, in bytes.

For more complex queries you are going to see more information. Much to my own surprise, I do not have a complex enough database on my computer right now to generate anything but a sequential scan. And I can’t get online, and I’m about to go to bed, so…

If you analyze queries with an index then you will likely see that appear in the analyzer. But not always! Even if you have an index, there are situations where it is faster to avoid it. Sometimes you will also see what appear to be multiple plans. Sometimes Postgres will break your query down into a series of plans, and you will see the individual costs of each. For instance, it may use an index to determine the location of rows based on certain criteria, and then a sequential scan to grab them en masse.
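
If you want to see how much a given plan choice matters on your own data, Postgres also lets you discourage specific plan types per session. A hedged sketch for diagnostics only (the users table is from the example above; the id column is assumed):

begin;
set enable_seqscan = off;  -- planner now strongly avoids sequential scans
explain analyze select * from users where id = 42;
rollback;                  -- rolling back also undoes the setting change

Comparing the plans and timings with and without the setting can show you why the planner made the choice it did.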

Internally, Postgres represents these query plans as nodes on a tree, which is reflected in the output of ‘explain analyze’: the plans and costs at the top include the costs of their children below.

You can run ‘explain analyze’ with inserts, updates, and deletes as well. But since those are destructive operations you should run them in a transaction. This is because ‘explain analyze’ will actually execute your query—it does not merely examine it and make a guess at its results; it measures real data. So if your query is a destructive one, you need to do this:

begin;
explain analyze insert ...;
rollback;

That will let you get real numbers without making permanent changes to your data.

Keep your eyes peeled for chances to use this. It usually does not take much effort to copy a query into Postgres and run the analyzer on it. Take note of the numbers you see and what types of queries produce what types of costs. Gradually this will help you gain a better understanding of how SQL queries translate to actual performance.


Data Translation Service with IDS

Cyber Sprocket Labs first came across Inlet Data Systems (IDS) in January 2005 via a simple Guru posting asking for assistance “converting data from one proprietary format to another”. The objective was to create an automated process by which a client could upload a file that would be automatically processed by the system and placed into a customer inbox for later retrieval in their required format.

The System Today

Over the years this system has grown into a fully automated, fully customizable web portal for sharing and translating data between clients. The system now supports over two dozen input & output file formats from a variety of sources. New file layouts can be created and managed via a 100% web-driven interface. The administrator can select from several base file types such as CSV, XML, printed report layouts, fixed length, and multi-record fixed-length formats. A simple data-mapping application allows the administrator to select which fields are relevant and how the input data structure maps to an output file format.

Users of the system have a simple interface for logging into the secured system, uploading files, and monitoring the progress of the translation. The sender simply uploads a file to the proper folder and the system does the rest. The data recipient can review files that are in the queue to be processed, download translated files, or merge multiple translated files into a single data set. Behind the scenes the administrator can monitor the entire process, manage users, manage files, and invoke manual processing as requested by the client.

Half A Decade of Service

Five years later, Spahr Systems, LLC (parent of IDS) has renewed their commitment to using Cyber Sprocket Labs as their sole-source provider of software development services. We are proud to have worked with Doug Spahr at Spahr Systems for the better part of the past decade & are looking forward to a long and mutually beneficial relationship moving into the decade to come. Thanks Doug!

Technical Overview

Services Provided

  • Web Application Programming
  • Interface Design, functionality over flair
  • System Architecture
  • Linux system tuning and configuration

Platform Details


PostgreSQL Stored Procedures Part 1

Stored procedures are a great way to enforce business and data access logic in a complex environment. Often with larger projects you will have multiple applications architected by a varied set of professionals all interacting with the same data sets.   Stored procedures are a tried and true way to regain some control over those data sets and ensure standards compliance.

The following paragraphs will show the basic syntax and some simple examples for using stored procedures in PostgreSQL.

Syntax

CREATE [ OR REPLACE ] FUNCTION
name ( [ [ argmode ] [ argname ] argtype [, ...] ] )
[ RETURNS rettype ]
{ LANGUAGE langname
| IMMUTABLE | STABLE | VOLATILE
| CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT
| [ EXTERNAL ] SECURITY INVOKER | [ EXTERNAL ] SECURITY DEFINER
| COST execution_cost
| ROWS result_rows
| SET configuration_parameter { TO value | = value | FROM CURRENT }
| AS 'definition'
| AS 'obj_file', 'link_symbol'
} ...
[ WITH ( attribute [, ...] ) ]

Examples (generic)

Here are some trivial examples to help you get started; for more information and examples, see Section 34.3 of the PostgreSQL manual. The first adds two integers with a plain SQL function:

CREATE FUNCTION add(integer, integer) RETURNS integer
    AS 'select $1 + $2;'
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;

Increment an integer, making use of an argument name, in PL/pgSQL:

CREATE OR REPLACE FUNCTION increment(i integer) RETURNS integer AS $$
        BEGIN
                RETURN i + 1;
        END;
$$ LANGUAGE plpgsql;
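
Once defined, these functions are called like any other SQL expression. A quick sanity check from psql:

SELECT add(1, 2);       -- returns 3
SELECT increment(41);   -- returns 42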

Return a record containing multiple output parameters:

CREATE FUNCTION dup(in int, out f1 int, out f2 text)
    AS $$ SELECT $1, CAST($1 AS text) || ' is text' $$
    LANGUAGE SQL;

SELECT * FROM dup(42);

You can do the same thing more verbosely with an explicitly named composite type:

CREATE TYPE dup_result AS (f1 int, f2 text);

CREATE FUNCTION dup(int) RETURNS dup_result
    AS $$ SELECT $1, CAST($1 AS text) || ' is text' $$
    LANGUAGE SQL;

SELECT * FROM dup(42);

Resources

http://www.postgresql.org/docs/8.3/static/sql-createfunction.html


PostgreSQL Introduction

Intro

Possibly the most robust freeware DB on earth. Some argue it is better than MySQL, even though MySQL has a much larger following.

From the makers

PostgreSQL is a powerful, open source relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness. It runs on all major operating systems, including Linux, UNIX (AIX, BSD, HP-UX, SGI IRIX, Mac OS X, Solaris, Tru64), and Windows. It is fully ACID compliant, has full support for foreign keys, joins, views, triggers, and stored procedures (in multiple languages). It includes most SQL92 and SQL99 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR, VARCHAR, DATE, INTERVAL, and TIMESTAMP. It also supports storage of binary large objects, including pictures, sounds, or video. It has native programming interfaces for C/C++, Java, .Net, Perl, Python, Ruby, Tcl, ODBC, among others, and exceptional documentation.

Quick Tricks

Dumping The DB To Disc

A quick backup method:

# pg_dumpall --username=postgres > postgres-backup-all

Removing Duplicate Records

This is a fast way to find duplicate records in a table and delete them.

Note: Replace the various table and field names with yours.

Example:

--
-- Create a function which will return the max OID from a table
-- based on the unique id. In this case, it is hash
--
create function maxoidhash(int8)
        returns int as
        'select max(oid) from codes where hash = $1'
        language 'sql';
--
-- Now delete
--
delete from codes where oid < maxoidhash(hash) ; 

--
-- Drop the function
--
drop function maxoidhash(int8);
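
Note that this recipe relies on the table having OIDs, which older PostgreSQL releases added by default but modern ones do not. On a table without OIDs, a sketch of the equivalent cleanup using the system column ctid:

--
-- Keep one row per hash value and delete the physically
-- earlier duplicates
--
delete from codes a
using codes b
where a.hash = b.hash
  and a.ctid < b.ctid;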

Listing Databases

Enter \l from the psql command prompt.

root@host [/home/modelloc/public_html/cgi-install]# psql -U modellocate -d template1
Password:
Welcome to psql 7.4.17, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

template1=> \l
              List of databases
        Name        |    Owner    | Encoding
--------------------+-------------+-----------
 alutask_webcollab  | alutask     | SQL_ASCII
 cyberspr_webcollab | cyberspr    | SQL_ASCII
 fvideo             | fvideo      | LATIN1
 fvideo_application | fvideo      | SQL_ASCII
 lance_emailendar   | lance       | SQL_ASCII
 phpbb              | modellocate | SQL_ASCII
 storeml            | modellocate | SQL_ASCII
 template0          | postgres    | SQL_ASCII
 template1          | postgres    | SQL_ASCII
(9 rows)

template1=>

Windows and PostgreSQL

  • What versions of Windows does PostgreSQL run on?

PostgreSQL is supported on Windows 2000, XP and 2003. At the time of writing, it has only been tested on 32 bit systems.

The most recent version of this document can be viewed at http://pginstaller.projects.postgresql.org/faq/FAQ_windows.html.

The FAQ for compiling PostgreSQL on native Windows is at http://www.postgresql.org/files/documentation/faqs/FAQ_MINGW.html.

Notable Changes

A list of differences between versions that should merit special attention.

Since v7.3

  • Prior to PostgreSQL 7.3, serial implied UNIQUE. This is no longer automatic. If you wish a serial column to be in a unique constraint or a primary key, it must now be specified, same as with any other data type.[1]

Links


Postgres::Handler

Our local updated perl docs. These enhanced notes will be published on our next push to CPAN.

Name

Postgres::Handler – Builds upon DBD::Pg for advanced CGI web apps

Description

Postgres::Handler builds upon the foundation set by DBI and DBD::Pg to create a superset of methods for tying together some of the basic interface concepts of DB management when used in a web server environment. Postgres::Handler is meant to build upon the strengths of DBD::Pg and DBI and add common usability features for a variety of Internet applications.

Postgres::Handler encapsulates error message handling, information message handling, and simple caching of requests through a complete iteration of a server CGI request. You will also find some key elements that hook the CGI class to the DBI class to simplify data IO to & from web forms and dynamic pages.

Synopsis

 # Instantiate Object
 #
 use Postgres::Handler;
 my $DB = Postgres::Handler->new(dbname=>'products',dbuser=>'postgres',dbpass=>'pgpassword');

 # Retrieve Data & List Records
 #
 $DB->PrepLEX('SELECT * FROM products');
 while ($item=$DB->GetRecord()) {
     print "$item->{PROD_ID}t$item->{PROD_TITLE}t$item->{PROD_QTY}n";
 }

 # Add / Update Record based on CGI Form
 # assuming objCGI is an instantiated CGI object
 # if the CGI param 'prod_id' is set we update
 # if it is not set we add
 #
 my %cgimap;
 foreach ('prod_id','prod_title','prod_qty') { $cgimap{$_} = $_; }
 $DB->AddUpdate( CGI=>$objCGI     , CGIKEY=>'prod_id',
                 TABLE=>'products', DBKEY=>'prod_id',
                 hrCGIMAP=>\%cgimap
                );

Requires

CGI::Carp

CGI::Util

Class::Struct

DBD::Pg 1.43 or greater (fixes a bug when fetching Postgres varchar[] array data)

DBI

Data Access Methods

new()

Create a new Postgres::Handler object.

Parameters
 dbname => name of the database to connect to
 dbuser => postgres user
 dbpass => password for that user
Notes
Set the ‘errortype’ data element to ‘simple’ for short error messages.

 $self->data('errortype') = 'simple';

data()

Get/set the data hash – this is where data fields are stored for the active record.

dbh()

Returns the database handle for the DB connection.

dbpass()

Get/set postgres user’s password.

dbname()

Get/set database name. Simple string name of the database.

dbuser()

Get/set postgres username.

sth()

Returns the statement handle for the active record selection.

Public Methods

AddUpdate()

Adds a new record or updates an existing record in the database depending on whether or not a specific CGI parameter has been set.

Useful for processing a posted form that contains form fields that match data fields. Pre-populate the form field that contains the database key field and an update occurs. Set it to blank and a new record is added.

Your database key field should have a default value that is unique and should be set as type ‘PRIMARY KEY’. We always use serial primary key to auto-increment our keys when adding new records.
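
In PostgreSQL terms that means a table declared something like this (a sketch reusing the products table from the synopsis):

 CREATE TABLE products (
     prod_id    SERIAL PRIMARY KEY,  -- auto-incremented key that AddUpdate keys on
     prod_title TEXT,
     prod_qty   INTEGER
 );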

Notes
If a key is provided but it doesn’t match anything in the existing data then the update fails, UNLESS… CHECKKEY => 1 is set, in which case it will attempt to add the record.

Your CGI->DB key hash reference should look something like this: %mymap = ( tablefld_name => ‘form_name’, tablefld_ssn => ‘htmlform_ssn’ ); and is passed as a reference, \%mymap, in the hrCGIMAP parameter to this function.

Even better, name your CGI form fields the same thing as your Postgres DB field names. Then you can skip the map altogether and just provide the CGISTART variable. All fields that start with the CGISTART string will be mapped. Want to map every field? Set CGISTART = ‘.’.

Parameters (Required)
 CGI       => a CGI object from the CGI:: module

 DBKEY     => the name of the key field within the table
              defaults to Postgres::Handler Object Property <table>!PGHkeyfld
              must be provided
                                  - or -
                             the <table>!PGHkeyfld option must have
              been setup when creating a new Postgres::Handler object

 TABLE     => the name of the table to play with

 CGISTART or hrCGIMAP must be set (see below)
Parameters (Optional)
 CGISTART  => map all CGI object fields starting with this string
              into equivalently named database fields
                                  only used when hrCGIMAP is not set

 CGIKEY    => the CGI parameter name that stores the data key
              defaults to DBKEY

 CHECKKEY  => set to 1 to perform ADD if the DBKEY is not found in the
              database.

 DBSTAMP   => the name of the timestamp field within the table
              defaults to Postgres::Handler Object Property <table>!PGHtimestamp

 DONTSTAMP => set to 1 to stop timestamping
              timestamp field must be set

 hrCGIMAP  => a reference to a hash that contains CGI params as keys and
              DB field names as values

 MD5      => the name of the md5 encrypted field within the table
              defaults to Postgres::Handler Object Property <table>!PGHmd5

 REQUIRED  => array reference pointing to array that holds list of CGI
              params that must contain data

 VERBOSE   => set to 1 to set lastinfo() = full command string
              otherwise returns 'INSERT' or 'UPDATE' on successful execution

 BOOLEANS  => array reference pointing to the array that holds the list
              of database field booleans that we want to force to false
                                  if not set by the equivalently named CGI field

 RTNSEQ    => set to a sequence name and AddUpdate will return the value of this
              sequence for the newly added record.  Useful for getting keys back
                                  from new records.
Action
 Either adds or updates a record in the specified table.

 Record is added if CGI data key [1] is blank or if CHECKKEY is set
 and the value of the key is not already in the database.

 Record is updated if CGI data key [2] contains a value.
Returns
 1 for success, get message with lastinfo()
 0 for failure, get message with lasterror()

DoLE()

Do DBH Command and log any errors to the log file.

Parameters (positional only)
 [0] = SQL command
 [1] = Die on error
 [2] = return error on 0 records affected
 [3] = quiet mode (don't log via carp)
Returns
 1 for success
 0 for failure, get message with lasterror

Field()

Retrieve a field from the specified table.

Parameters (required)
 DATA     => Which data item to return, must be of form "table!field"

 KEY      => The table key to lookup in the database
               Used to determine if our current record is still valid.
               Also used as default for WHERE, key value is searched for
               in the PGHkeyfld that has been set for the Postgres::Handler object.
Parameters (optional)
 WHERE   => Use this where clause to select the record instead of the key

 FORCE   => Force Reload

 HTML    => Return HTML Quoted Strings
Returns
 The value of the field.

 Returns 0 and lasterror() is set to a value if an error occurs
               lasterror() is blank if there was no error
Example
 my $objPGDATA = new Postgres::Handler::HTML ('mytable!PGHkeyfld' => 'id');
 my $lookupID = '01123';
 my $data = $objPGDATA->Field(DATA=>'mytable!prod_title', KEY=>$lookupID);

 my $lookupSKU = 'SKU-MYITEM-LG';
 my $data = $objPGDATA->Field(DATA=>'mytable!prod_title', WHERE=>"sku='$lookupSKU'");

GetRecord()

Retrieves the record in a hash reference with uppercase field names.

Parameters (positional or named)
 [0] or -name     select from the named statement handle,
                  if not set defaults to the last active statement handle

 [1] or -rtype    type of structure to return data in
        'HASHREF' (default) - Returns a hashref via fetchrow_hashref('NAME_uc')
        'ARRAY' - Returns an array via fetchrow_array()
        'ITEM' - Returns a scalar via fetchrow()

 [2] or -finish   set to '1' to close the named statement handle after returning the data
Returns
The hashref or array or scalar on success. 0 for failure, get message with lasterror.

lasterror()

Retrieve the latest error produced by a Postgres::Handler object.

Returns
The error message.

lastinfo()

Retrieve the latest info message produced by a Postgres::Handler object.

Returns
The info message.

nsth()

Retrieve a named statement handle.

Returns
The handle, as requested.

PrepLE()

Prepare an SQL statement and returns the statement handle, log errors if any.

Parameters (positional or named)
        [0] or -cmd      - required - the statement
        [1] or -exec     - execute flag (PREPLE) or die flag (PREPLEX)
        [2] or -die      - die flag     (PREPLE) or null     (PREPLEX)
        [3] or -param    - single parameter passed to execute
        [4] or -name     - store the statement handle under this name
        [5] or -aparam   - an array reference of multiple values to bind to the prepared statement
Returns
1 for success

PrepLEX()

Same as PrepLE but also executes the SQL statement

Parameters (positional or named)
        [0] or -cmd      - required - the statement
        [1] or -die      - die flag     (PREPLE) or null     (PREPLEX)
        [2] or -param    - single parameter passed to execute
        [3] or -name     - store the statement handle under this name
Returns
1 for success

Quote()

Quote a parameter for SQL processing via the DBI::quote() function. Sets the data handle if necessary.

Semi-Public Methods

Using these methods without understanding the implications of playing with their values can wreak havoc on the code. Use with caution…

SetDH()

 Internal function to set data handles
 Returns Data Handle

 If you don't want the postgres username and password
 littering your perl code, create a subclass that
 overrides SetDH with DB specific connection info.

SetMethodParms()

 Allows for either named or positional parameters in
 a method call AND allows the method to be called as EITHER
 an instantiated object OR as a direct class call.
Parameters
 [0] - self, the instantiated object
 [1] - the class we are looking to instantiate if necessary
 [2] - reference to hash that will get our named parameters
 [3] - an array of the names of named parameters
       IN THE ORDER that the positional parameters are expected to appear
 [4] - extra parameters, positional or otherwise
Action
 Populates the hash referred to in param [2] with keys & values
Returns
 An object of type class, newly instantiated if necessary.
Example
 sub MyMethod() {
        my $self = shift;
        my %options;
                $self = SetMethodParms($self, 'MYCLASS::SUBCLASS', \%options, ['PARM1','PARM2','PARM3'], @_ );
        print $options{PARM1} if ($options{PARM2} ne '');
        print $options{PARM3};
 }

CGIMap()

 Prepare a hash reference for mapping CGI parms to DB fields
 typically used with AddUpdate() from Postgres::Handler.
Parameters
 hrCGIMAP  - reference to hash that contains the map
 CGI       - the CGI object
 CGISTART  - map all fields starting with this text
 CGIKEY    - the cgi key field
 BOOLEANS  - reference to a list of boolean fields
Example
 @boolist = qw(form_field1 form_field2);
 $item->CGIMap(CGI => $objCGI, hrCGIMAP=>\%cgimap, CGISTART=>'cont_', CGIKEY=>'cont_id', BOOLEANS=>\@boolist);

Notes

Parameters (Named v. Positional)

Some methods allow for parameters to be passed in via both positional and named formats. If you decide to use named parameters with these “bi-modal” methods you must prefix the parameter with a hyphen.

Positional Example
 use Postgres::Handler;
 my $DB = Postgres::Handler->new(dbname=>'products',dbuser=>'postgres',dbpass=>'pgpassword');
 $DB->PrepLEX('SELECT * FROM products');
Named Example
 use Postgres::Handler;
 my $DB = Postgres::Handler->new(dbname=>'products',dbuser=>'postgres',dbpass=>'pgpassword');
 $DB->PrepLEX(  -cmd    =>      'SELECT * FROM products'        );

Examples

Short Program
 # Instantiate Object
 #
 use Postgres::Handler;
 my $DB = Postgres::Handler->new(dbname=>'products',dbuser=>'postgres',dbpass=>'pgpassword');

 # Retrieve Data & List Records
 #
 $DB->PrepLEX('SELECT * FROM products');
 while ($item=$DB->GetRecord()) {
        print "$item->{PROD_ID}\t$item->{PROD_TITLE}\t$item->{PROD_QTY}\n";
 }

 # Add / Update Record based on CGI Form
 # assuming objCGI is an instatiated CGI object
 # if the CGI param 'prod_id' is set we update
 # if it is not set we add
 #
 my %cgimap;
 foreach ('prod_id','prod_title','prod_qty') { $cgimap{$_} = $_; }
 $DB->AddUpdate( CGI=>$objCGI     , CGIKEY=>'prod_id',
                 TABLE=>'products', DBKEY=>'prod_id',
                 hrCGIMAP=>\%cgimap
               );
AddUpdate Example
 # <form method="post" action="/thisapp.pl">
 # <input type="submit" name="submit" value="submit">
 # <input type="hidden" name="chat_id" value="">
 # <input type="text" name="chat_text" value="">
 # </form>

 use Postgres::Handler;
 use CGI;
 my $DB = Postgres::Handler->new(dbname=>'products',dbuser=>'postgres',dbpass=>'pgpassword');
 my $CGI = new CGI;
 my $AOK = $DB->AddUpdate(
           CGI     => $CGI,
           DBKEY   => 'chat_id',
           TABLE   => 'chatter',
           CGISTART=> 'chat_'
           );
 print ($AOK?'Awesome!':'Fail!');

Author

Cyber Sprocket Labs (CSL) is an advanced internet technology consulting firm based in Charleston, South Carolina. We provide custom software, database, and consulting services for small to mid-sized businesses.

For more information visit our website at www.cybersprocket.com

Copyright

(c) 2008, Cyber Sprocket Labs

This script is covered by the GNU GENERAL PUBLIC LICENSE.


Revision History

 v2.3 - May 2008
      Documentation cleanup.

 v2.2 - Apr 2006
      Fixed problem with SetDH database handle management

 v2.1 - Mar 2006
      Added RTNSEQ feature to AddUpdate so we can get back the key of a newly added record

 v2.0 - Feb 2006
      Moved CGI::Carp outside of the package to prevent perl -w warnings

 v1.9 - Feb 2006
      Updated Field() to prevent SIGV error when a WHERE clause causes an error on the statement
      Field() now returns 0 with lasterror() set to a value on a failed execute,
      and returns the field value with lasterror() blank if execution is OK

 v1.8 - Jan 2006
      Bug fix on PrepLE and PrepLEX for perl -w compatibility
      Added DoLE param to return error status (0) if the command affects 0 records '0E0'
      Added DoLE param to keep quiet on errors (do not log to syslog via carp)
      Documentation updates

 v1.5 - Nov 2005
      Fixed @BOOLEANS on AddUpdate to force 'f' setting instead of NULL if blank or 0

 v1.5 - Oct 2005
      Fixed return value error on AddUpdate()

 v1.4 - Aug 2005
      Minor patches

 v1.3 - Jul 17 2005
      Minor patches
      Now requires DBD::Pg version 1.43 or greater

 v1.2 - Jun 10 2005
      GetRecord() mods, added 'ITEM'
      Test file fix in distribution
      Created yml file for added requisites on CPAN

 v1.1 - Jun 9 2005
      pod updates
      Field() cache bug fix
      GetRecord() expanded, added finish option
      Moved from root "PGHandler" namespace to better-suited "Postgres::Handler"

 v0.9 - May 2 2005
      pod updates
      AddUpdate() updated, CGIKEY optional - defaults to DBKEY
      AddUpdate() updated, BOOLEANS feature added
      GetRecord() updated, added check for sth active before executing
      Field() fixed hr cache bug and data bug and trap non-set hr issue

 v0.8 - Apr 26 2005
      Fixed GetRecord() (again) - needed to check $DBI::errstr not $err

 v0.7 - Apr 25 2005
      Added error check on ->Field to ensure hashref returned from getrecord
      Added CGIMAP method
      Invoke CGIMAP from within AddUpdate if missing map
      Fixed GetRecord return system

 v0.5 - Apr/2005
      Added DBI error trap on DoLE function
      Added named statement handles for multiple/nested PrepLE(X) capability
      Added VERBOSE mode to AddUpdate
      Added NAME to retrieved named statements via GetRecord
      Updated Field to use named statement handles

 v0.4 - Apr/2005
      Fixed some stuff

 v0.3 - Apr/2005
      Added REQUIRED optional parameter to AddUpdate
      Improved documentation
      Quoted DBKEY on add/update to handle non-numeric keys

 v0.2 - Mar/2005
      Added error messages to object
      Fixed issues with Class::Struct and the object properties
      Updated AddUpdate to use named parameters (hash) for clarity

 v0.1 - Dec/2004
      Initial private release

CPAN

You can find the published version of Postgres::Handler under the Cyber Sprocket Labs account on CPAN.