Posted on

Connect To Your Database With Emacs

The recent Emacs post made me think of a useful SQL trick I recently learned that I wanted to share. Emacs refers to different SQL engines as ‘products’, e.g. MySQL, SQLite, and so on. You can connect to an SQL database through Emacs but you have a call a different command based on the product.

An easier way to do this, as of version 24, is to use the keyboard shortcut C-u C-c TAB when viewing an SQL document. This will prompt you first for the product, and then for any connection information. So if you are working with an SQLite database you can press C-u C-c TAB, type sqlite, and then the filename of the database; that will open an interactive window connected to the database. Or if you are using PostgreSQL then Emacs will ask for a username, port, etc. The supported products are:

  • db2
  • informix
  • ingres
  • interbase
  • linter
  • ms
  • mysql
  • oracle
  • postgres
  • solid
  • sqlite
  • sybase

I don’t even know what some of those are…

Anyways, sometimes I find this faster than opening a terminal if I need to run some test queries or copy output into a document. (This is very useful with SQLite’s .mode html command, which will print the results of queries in <table> form.)


ejmr
南無妙法蓮華經

This article was originally posted on “The List” by EJMR.

Posted on

SQL: Using IN / NOT IN Operators

Here is an explanation I recently shared with someone about how to use NOT IN to select a subset of data from a one-to-many relationship.   It is a simple example that covers the basic use of NOT IN (and IN) operators in SQL.  These operators can be fairly powerful for performing quick queries on complex data relationships where a programmatic solution may otherwise be required.   This is not always the best solution but it can be a useful shortcut when used appropriately.

Syntax in various SQL dialects can vary, so I am using a generic SQL dialect based on my PostgreSQL knowledge.

The Problem

You have 2 tables you are linking together.   I forget the names so I’ll call them Nurses and RolesOfNurses.    For the purpose of this explanation let’s assume you are linking them by an ID field for the Nurse.

Your Nurse data may look like this:

ID: 1   Name: Karen
ID: 2  Name: Sharon

And your RolesOfNurses data:

Nurse ID: 1 RoleID: 10
Nurse ID: 1 RoleID: 555 
NurseID: 2 RoleID: 10

Normally with a one-to-many (1:M) relationship this RolesOfNurses table is just a middle-man so there would be another table called “Roles” that may have the RoleID and an common name to make human readable output (like reports) possible. So role #10 may be “nurse” and the specialized role #555 might be “Can Give Meds”.

In your select statement you want to find “All nurses that do not have the Can Give Meds” role. In database-like talk you want “All nurses that have role #10 but not role #555”. So your instinct is to join the two tables and filter by where role == 10 and role != 555. But that is wrong. The reason is that joining tables creates one unique row for every combination of data in the joined tables.

Breaking It Down : Select Nurses and Roles

Take it from the beginning.  Let’s just say “I want a list of all the roles nurses have”.  That is a simple join:

select * from nurses where nurse.id = rolesofnurses.nurseid;

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 1, Name: Karen, NurseID: 1, RoleID: 555
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Next: Filter The Data On Role # 10 Only

This data set is the basis of your advanced query, but before we get to the “dual and” let’s start with a single where clause to filter our data.  I’ve added the parenthesis only to make the where clause groupings easier to read.  The first where clause links the two tables together to give you the above data set.  The 2nd where clause filters that data to just show nurses.

select * from nurses where (nurse.id = rolesofnurses.nurseid) and (roleofnurses.roleid = 10);

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Then : Add The Second Role Filter

Now add your 2nd where clause so you can have “all nurses that don’t have the Can Give Meds role” filter.

select * from nurses where (nurse.id = rolesofnurses.nurseid) and (roleofnurses.roleid = 10) and (roleofnurses.roleid != 555);

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Notice that this is NOT the desired result!

Why? Look at the data.  All of your roles match BOTH of those filters.   The roles are all 10 AND they are NOT 555.  The first instinct is wrong. Personally I blame the fact that SQL, as powerful as it is, does not process English-based query syntax the same way that our brains do. Many people will make the same mistake. Sadly, the solution is a bit more complex to explain, but I’ll try to make it simple.

Fixing The Query: Step 1 – Build A List Of IDs

I’m going to start with the 2nd part of your filter, all the IDs of nurses that have role #555.  Note I said “IDs of nurses” here, that is important.  I don’t want the full set of nurse data like we have been doing up until now.  I simply want the list of ID numbers of nurses that have role #555.

select nurseid from rolesofnurses where roledid = 555;
NurseID: 1

This is the ONLY NurseID in our data set that has role # 555.

The Correct Query : Using NOT IN

Now let’s go and get our data we want by telling SQL “give me all the nurses that have role # 10 but are NOT in the list of nurses with role # 555”:

select * from nurses where (nurse.id = rolesofnurses.nurseid) and (roleofnurses.roleid = 10) and (nurse.id NOT IN (select nurseid from rolesofnurses where roledid = 555) );

And we get the desired result:

ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Explaining NOT IN

The trick here is the <blah> NOT IN <SQL QUERY> syntax.   Most (not all) SQL dialects support this, and it is an important advanced SQL construct. It allows you to filter data based on the results of another select statement. The trick with the NOT IN operator is to make the <SQL QUERY> part return a list that consists of a single elements you want to filter on. In our case the <SQL QUERY> returns the entire list of nurses that have role #555. We use NOT IN to tell it to only accept the record if the Nurse ID is NOT IN the list of nurse IDs that have role #555 or in English “pick all nurses that do not have role #555”. In our example it was a single record, but in a larger data set it could return an entire list of records.

Notes About IN/NOT IN Usage

For a few hundred records this construct is OK.  At some point you need to start looking into temp tables and other methods as using the <blah> NOT IN <SQL QUERY> construct can consume a lot of memory on the server. You need to be intelligent about what kind of resources your server has and how large the potential data set is for the second part of the IN query because most DB engines will load the entire list of ID #s into RAM.

Another note, you can create complex constructs with IN the same way as NOT IN, for example “give me a list of all nurses that have role #10 AND role #555”.

select * from nurses where (nurse.id = rolesofnurses.nurseid) and (roleofnurses.roleid = 10) and (nurse.id IN (select nurseid from rolesofnurses where roledid = 555) );

BTW, I’ve used uppercase for emphasis only.   SQL is not case sensitive.

Posted on

Analyzing Queries in Postgres

Yesterday I was curious if there was a good way to write my IDS query using a single ‘select’. All of us had the right idea of using a sub-query to limit the results from the join, although there were different ideas about how to use a sub-query. Richard’s solution was the most succinct.

However, there is an important follow-up lesson that I want to talk about. What I asked yesterday is representative of something we all seem to ask ourselves consistently: “What is the most clever way I can do this?” When it comes to database queries I believe we have a tendency to see how much we can accomplish in one swoop. At the heart of this tendency is the assumption that more queries equals less efficiency.

Please strive to constantly challenge this assertion. Our thoughts about software should be continuously reinforced by demonstrable evidence.

In the case of SQL, this means query analysis.

There are various tools out there for measuring the performance of your queries. In Postgres you can get a pretty good idea about database performance by using the query analyzer. The syntax is pretty simple:

explain analyze [query...];

An actual example:

explain analyze select * from users;
QUERY PLAN
------------------------------------------------------------
Seq Scan on users  (cost=0.00..18.30 rows=830 width=68)
(actual time=0.013..0.021 rows=3 loops=1)
Total runtime: 0.078 ms

What we are seeing here are the details of the query plan that Postgres uses for this query. Let’s break down the information we see here.

Seq Scan on users
This means Postgres is performing a “sequential scan” on the ‘users’ table. In other words, it’s reading row by row. This is the type of plan that Postgres is using. If it were reading an index, you would see that reflected here, for example.
cost=0.00..18.30
The first number is the predicted start-up time for this query, meaning the amount of time before we can get output. Since this is a sequential scan with no sorting or grouping or anything, Postgres can start returning data immediately.

The second number is the estimated total cost. That is, the amount of time it will take if all rows are returned. Take note that this really means all rows. If our query had a limit, it would not impact the total cost, and in reality our query would run faster than the estimated total cost.

The values of these two numbers are somewhat arbitrary. See section 18.6.2 of the manual for a description of the values used to compute these numbers. But keep in mind that in general ‘1.0’ equals one disk fetch. So for this query we could be looking at around eighteen disk fetches.

rows=830
This is the estimated number of rows output. Again, this number only reflects the estimate if the query plan is executed to completion. Things like a limit may cause it to stop sooner. It also does not necessarily mean the number of rows processed. Usually this number will always be lower than the number of rows actually processed. Note that in my case the value was wildly off. If you look at the actual results, you will see only three rows were returned. When you have very, very small data sets this value is typically irrelevant.
width=68
The estimated size of each row, in bytes.

For more complex queries you are going to see more information. Much to my own surprise, I do not have a complex enough database on my computer right now to generate anything but a sequential scan. And I can’t get online, and I’m about to go to bed, so…

If you analyze queries with an index then you will likely see that appear in the analyzer. But not always! Even if you have an index, there are situations where it is faster to avoid it. Sometimes you will also see what appear to be multiple plans. Sometimes Postgres will break your query down into a series of plans, and you will see the individual costs of each. For instance, it may use an index to determine the location of rows based on certain criteria, and then a sequential scan to grab them in mass.

Internally, Postgres represents these query plans as nodes on a tree, which is reflected in the output of ‘explain analyze’. Meaning that the plans and costs at the top include the costs of their children below.

You can run ‘explain analyze’ with inserts, updates, and deletes as well. But since those are destructive operations you should run them in a transaction. This is because ‘explain analyze’ will actually execute your query—it does not merely examine it and make a guess at its results; it measures real data. So if your query is a destructive one, you need to do this:

begin;
explain analyze insert ...;
rollback;

That will let you get real numbers without making permanent changes to your data.

Keep your eyes peeled for chances to use this. It usually does not take much effort to copy a query into Postgres and run the analyzer on it. Take note of the numbers you see and what types of queries produce what types of costs. Gradually this will help you gain a better understanding of how SQL queries translate to actual performance.

Posted on

SQL Movement Commands

Most programming modes in Emacs support two commands, beginning-of-defun and end-of-defun.  These move you to the beginning and end of the nearest function, and by default are bound to C-M-a and C-M-e, respectively.

However, sql-mode does not have anything similar.  I have the above two commands bound to C-up and C-down for quickly moving around functions, but I find myself often hitting them when viewing an SQL file to move through tables—which doesn’t work.  Fortunately that functionality is pretty easy to add.

For my fellow Emacs users, here is some code I whipped up that you can use to easily move between tables and views.

(defconst sql-beginning-of-table-or-view-regexp
"^\\s-*create\\s-+\\(?:table\\|view\\)"
"Regular expression for matching the start of an SQL table or view definition.")

(defun sql-prev-table-or-view ()
"Move backwards to the beginning of the nearest table or view
from point."
(interactive)
(re-search-backward sql-beginning-of-table-or-view-regexp
nil 'noerror))

(defun sql-next-table-or-view ()
"Move forwards to the beginning of the nearest table or view from point."
(interactive)
(re-search-forward sql-beginning-of-table-or-view-regexp
nil 'noerror))

Then you can add a hook for sql-mode to bind those commands to keys whenever sql-mode is first loaded. I know you guys already know about using global-set-key, but you can also create key bindings specific for modes. Each mode in Emacs has an associated “mode map”, for example, sql-mode has the sql-mode-map, php-mode has the php-mode-map, et cetera. You can use the function define-key to create a binding for a specific mode map. Like I said, I have C-up and C-down globally bound to move functions, but I use define-key like this to make the keys move by tables/views is sql-mode:

(global-set-key (kbd "<C-up>") 'beginning-of-defun)
(global-set-key (kbd "<C-down>") 'end-of-defun)

(add-hook 'sql-mode-hook
(lambda ()
(define-key sql-mode-map (kbd "<C-up>")
'sql-prev-table-or-view)
(define-key sql-mode-map (kbd "<C-down>")
'sql-next-table-or-view)))

I put my global key settings in there just for comparison. The call to define-key is almost identical, except the first argument is the mode map.

While I’m on the subject, there are two similar commands for text-mode that have no default binding, which I find come in handy:

(add-hook 'text-mode-hook
(lambda ()
(define-key text-mode-map (kbd "<C-down>")
'forward-paragraph)
(define-key text-mode-map (kbd "<C-up>")
'backward-paragraph)))

So there’s some hopefully useful commands for SQL work, and for configuring key bindings in general.

Posted on

PostgreSQL Stored Procedures – Part 2

We need your help!


Cyber Sprocket is looking to qualify for a small business grant so we can continue our development efforts. We are working on a custom application builder platform so you can build custom mobile apps for your business. If we reach our 250-person goal have a better chance of being selected.

It is free and takes less than 2 minutes!

Go to www.missionsmallbusiness.com.
Click on the “Login and Vote” button.
Put “Cyber Sprocket” in the search box and click search.
When our name comes up click on the vote button.

 

And now on to our article…

In a follow up to our initial Stored Procedures article, here we present some more complex examples based on real-world work we’ve completed on a PostgreSQL 8.3 database.

Fetching A Newly Inserted ID

Often times you will want to insert a record into a table that has a default ID string.   There is any easy method for doing this with the INSERT…RETURNING syntax.

In this stored procedure example below we add a new group name to our groups table and get back the auto-generated ID (primary key) in the newid variable.   We then use that to insert a record in our group members table AND return the newly created ID back from the stored procedure, making it easy for the calling program to catch the new ID:

DECLARE
    newid BIGINT;
BEGIN
    INSERT INTO groups (group_name) values ($1) RETURNING group_id INTO newid;
    INSERT INTO group_members (group_id, avatar_id,groupmember_admin) values (newid,$2,'t');
    return newid;
END

Select Loop With String Building

This first example will select a set of records in the child table of a one-to-many relationship (the many part) and build up semi-colon separated string comprised of the name field for each child record.  The name field contains the human-readable name of on item a person is wearing.

sp_what_am_i_wearing(<id>)

This stored procedure extracts all of the items a person is wearing as a string separated by semi-colons by simply passing in the user ID.

definition

CREATE OR REPLACE FUNCTION sp_what_am_i_wearing(id bigint) RETURNS text AS $$
    DECLARE
        retval text = '';
        row_data UserRow%ROWTYPE;
    BEGIN
        FOR row_data IN SELECT * FROM MyObjects WHERE id = $1
           LOOP
              retval := retval || row_data.object_name || ';';
          END LOOP;
          return retval;
      END;
$$ LANGUAGE plpgsql;

Usage and Output

game_data=# select sp_what_am_i_wearing(2);
avatariswearing
-----------------------------------------------------------------
hat_01;shirt_01;pants_22;shoes_32;

Insert/Update Loop With String Decomposition

This second example is the “setter” companion to our prior example’s “getter” functionality.  This version breaks down a human-readable string with names separated by colons and then updates the child table in the one-to-many relationship.  We not only have to add records to that table for any new objects we might be wearing but also remove any pre-existing records from our prior wardrobe.

First we select any object that is not in our character string but IS in the child objects table, loop through those records, and delte them.  In essence we are deleting anything we are no longer wearing.  This may be ALL of the items we were wearing previously or just a single item if we only changed something like our hat or shoes.

In the second part of the procedure we loop through all the objects in the string as we break it down based on the semicolon, and only select those items that are NOT left remaining in the child table of our one-to-many relationship.  Remember, that child table now contains only those items we were wearing before and did not change.  If that child set is empty we’ll end up adding new records for EVERY object in our string.   We fetch the ID of the objects we are wearing from the objects table and then insert our user ID, the object ID, and a flag in the child table saying that we are wearing the various objects.

sp_set_what_i_wear(<id>,<wearing_string>)

This stored procedure stores all of the items a user is wearing, breaking down a string separated by semi-colons and using that to add or delete objects from the avatar objects table.

Definition

CREATE OR REPLACE FUNCTION sp_set_what_i_wear(id bigint, wearing text) RETURNS void AS $$
    DECLARE
       ao_id BIGINT;
       obj_id BIGINT;
    BEGIN
       -- Delete what I used to be wearing, but am no longer wearing
       --
       FOR ao_id IN SELECT object_id FROM ObjectList
              WHERE id=$1 AND name NOT IN (SELECT regexp_split_to_table($2,E';'))
       LOOP
              DELETE FROM MyObjects WHERE object_id = ao_id;
       END LOOP;

       -- Fetch the object IDs of things I am now wearing, but don't show up in my object list
       -- Insert the object IDs into that child table, called "MyObjects" that links me to stuff I wear
       --
       FOR obj_id IN SELECT object_id FROM Objects
              WHERE name IN (SELECT regexp_split_to_table($2,E';'))
                     AND name NOT IN (SELECT name FROM ObjectList WHERE id=$1)
       LOOP
              INSERT INTO MyObjects (id,object_id,onperson) VALUES ($1,obj_id,'t');
       END LOOP;
END;
$$ LANGUAGE plpgsql;

Usage and Output

Note: current return type is VOID.

It may be wise to set a return value showing the items that were in the set string but don’t exist in the DB.  This can be done with a simple select from objects not in select from regexp_split_to_table.  This is a performance hit, so it was left out for this revision.


game_data=# select sp_set_what_i_wear(2,'hat_01;shirt_10;pants_11;shoes_33;');
sp_set_avatar_is_wearing
--------------------------
(1 row)

Posted on

PostgreSQL Stored Procedures Part 1

Stored procedures are a great way to enforce business and data access logic in a complex environment. Often with larger projects you will have multiple applications architected by a varied set of professionals all interacting with the same data sets.   Stored procedures are a tried and true way to regain some control over those data sets and ensure standards compliance.

The following paragraphs will show the basic syntax and some simple examples for using stored procedures in PostgreSQL.

Syntax

CREATE [ OR REPLACE ] FUNCTION
name ( [ [ argmode ] [ argname ] argtype [, ...] ] )
[ RETURNS rettype ]
{ LANGUAGE langname
| IMMUTABLE | STABLE | VOLATILE
| CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT
| [ EXTERNAL ] SECURITY INVOKER | [ EXTERNAL ] SECURITY DEFINER
| COST execution_cost
| ROWS result_rows
| SET configuration_parameter { TO value | = value | FROM CURRENT }
| AS 'definition'
| AS 'obj_file', 'link_symbol'
} ...
[ WITH ( attribute [, ...] ) ]

Examples (generic)

Here are some trivial examples to help you get started. For more information and examples, see Section 34.3.

CREATE FUNCTION add(integer, integer) RETURNS integer
    AS 'select $1 + $2;'
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;

Increment an integer, making use of an argument name, in PL/pgSQL:

CREATE OR REPLACE FUNCTION increment(i integer) RETURNS integer AS $$
        BEGIN
                RETURN i + 1;
       END;
$$ LANGUAGE plpgsql;

Return a record containing multiple output parameters:

CREATE FUNCTION dup(in int, out f1 int, out f2 text)
    AS $$ SELECT $1, CAST($1 AS text) || ' is text' $$
    LANGUAGE SQL;

SELECT * FROM dup(42);

You can do the same thing more verbosely with an explicitly named composite type:

CREATE TYPE dup_result AS (f1 int, f2 text);

CREATE FUNCTION dup(int) RETURNS dup_result
    AS $$ SELECT $1, CAST($1 AS text) || ' is text' $$
    LANGUAGE SQL;

SELECT * FROM dup(42);

Resources

http://www.postgresql.org/docs/8.3/static/sql-createfunction.html

Posted on

On Delete Cascade

ON DELETE CASCADE

In a relational database where would you use the following construct? What does it do?

 … ON DELETE CASCADE

Answer

ON DELETE CASCADE is used when creating a table that has a field with a FOREIGN KEY. The purpose it to ensure that when a row in the parent table is deleted all of the related rows in the CHILD table are deleted.

Example

 CREATE TABLE cities (
       city     varchar(80) primary key
 );

 CREATE TABLE weather_history (
       city      varchar(80) references cities(city) ON DELETE CASCADE,
       temp_lo   int,
       temp_hi   int,
       date      date
  );

The parent table is cities, the child is weather_history. We normally load it up so that we can get reports like this:

 New Orleans
    Jan 1 , 60f 70f
    Jan 2 , 61f 74f
 Charleston
    Jan 1 , 35f 50f
    Jan 2 , 45f 65f

If New Orleans gets washed into the ocean due to a hurricane, we can now get rid of it like this:

 DELETE FROM cities where city='New Orleans';

The ON DELETE CASCADE ensures all the related records in weather_history are removed, preventing orphans (a bunch of records in weather history that have no city attached to them)