
WordPress dbDelta Better In 4.2, Not Yet Perfect


There have been a LOT of changes in WordPress Core, including updates to the dbDelta() function in the upgrade module. What is dbDelta()? dbDelta() is the primary function WordPress uses to determine whether the structure of data tables needs to be altered between releases. It works great for upgrading core WordPress data tables. Where it has problems is with developer-supplied SQL statements used to create secondary tables in plugins and themes. Store Locator Plus, for example, uses a secondary table to store basic location data.

Unfortunately dbDelta has always had issues (see the running commentary on Trac ticket 10404), especially when it comes to indices on a MySQL table. Most well-designed data tables will have primary and secondary indexes to improve query performance. dbDelta has a hard time figuring out when an index already exists and has not been modified. If you did not craft your CREATE TABLE command EXACTLY the way dbDelta wants to see it, dbDelta will create new indexes. This can cause performance degradation and unnecessary overhead in the MySQL tables when a data column ends up with 10, 20, 30+ copies of the same index, one created every time the plugin or theme is updated.

Many plugins and themes suffer from this malady. Over two years ago I dug into dbDelta and learned the syntax secrets to get WordPress to stop creating duplicate indexes. Today I found a site that was back to its old tricks running the latest version of Store Locator Plus on WordPress 4.2.2. As it turns out, the rules for dbDelta have changed, and thus the syntax for creating a "dbDelta friendly" SQL statement has changed as well. Because dbDelta is a behind-the-scenes feature, changes to it are never broadcast to the general public, so when something goes awry it can go undetected for a while.

Turns out the changes are not new. They go back to at least WordPress 4.1. The last time I dug into this issue was WordPress 3.8. Guess it is time for some PHPUnit testing of dbDelta within my own plugin suite. If things are running properly, the dbDelta call for an existing Store Locator Plus 4.2 installation should yield NO QUERY UPDATES when activating or installing a newer version of the plugin.
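Here is a rough sketch of the kind of idempotence check I have in mind. It assumes a WordPress-loaded PHPUnit environment (the standard WP test suite) and a hypothetical helper that returns the plugin's CREATE TABLE statement; it is not the actual SLP test code.

class DbDelta_Idempotence_Test extends WP_UnitTestCase {
    public function test_dbdelta_makes_no_changes_on_second_run() {
        // dbDelta() lives in the upgrade module.
        require_once ABSPATH . 'wp-admin/includes/upgrade.php';

        // Hypothetical helper returning the plugin's CREATE TABLE statement.
        $sql = slp_get_locations_create_sql();

        // First pass creates or upgrades the table.
        dbDelta( $sql );

        // Second pass should find nothing to change if the SQL is dbDelta friendly.
        $changes = dbDelta( $sql );
        $this->assertEmpty( $changes, 'dbDelta still wants to alter an unchanged table.' );
    }
}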

WP 4.1 dbDelta Results

Source SQL:

dbDelta SQL: CREATE TABLE wp_store_locator ( sl_id mediumint(8) unsigned NOT NULL auto_increment, sl_store varchar(255) NULL, sl_address varchar(255) NULL, sl_address2 varchar(255) NULL, sl_city varchar(255) NULL, sl_state varchar(255) NULL, sl_zip varchar(255) NULL, sl_country varchar(255) NULL, sl_latitude varchar(255) NULL, sl_longitude varchar(255) NULL, sl_tags mediumtext NULL, sl_description text NULL, sl_email varchar(255) NULL, sl_url varchar(255) NULL, sl_hours varchar(255) NULL, sl_phone varchar(255) NULL, sl_fax varchar(255) NULL, sl_image varchar(255) NULL, sl_private varchar(1) NULL, sl_neat_title varchar(255) NULL, sl_linked_postid int NULL, sl_pages_url varchar(255) NULL, sl_pages_on varchar(1) NULL, sl_option_value longtext NULL, sl_lastupdated timestamp NOT NULL default current_timestamp, PRIMARY KEY (sl_id), KEY sl_store (sl_store), KEY sl_longitude (sl_longitude), KEY sl_latitude (sl_latitude) ) DEFAULT CHARACTER SET utf8

dbDelta Updates List:

QUERY: ALTER TABLE wp_store_locator CHANGE COLUMN sl_linked_postid sl_linked_postid int NULL
QUERY: ALTER TABLE wp_store_locator CHANGE COLUMN sl_lastupdated sl_lastupdated timestamp NOT NULL default current_timestamp
QUERY: ALTER TABLE wp_store_locator ADD PRIMARY KEY (sl_id)
QUERY: ALTER TABLE wp_store_locator ADD KEY sl_store (sl_store)
QUERY: ALTER TABLE wp_store_locator ADD KEY sl_longitude (sl_longitude)
QUERY: ALTER TABLE wp_store_locator ADD KEY sl_latitude (sl_latitude)

Update Results:

[wp_store_locator.sl_linked_postid] => Changed type of wp_store_locator.sl_linked_postid from int(11) to int
[wp_store_locator.sl_lastupdated] => Changed type of wp_store_locator.sl_lastupdated from timestamp to
[0] => Added index wp_store_locator PRIMARY KEY (sl_id)
[1] => Added index wp_store_locator KEY sl_store (sl_store)
[2] => Added index wp_store_locator KEY sl_longitude (sl_longitude)
[3] => Added index wp_store_locator KEY sl_latitude (sl_latitude)

My WP 4.2 dbDelta Results

Better, but still not quite right.

Source SQL:

CREATE TABLE wp_store_locator ( sl_id mediumint(8) unsigned NOT NULL auto_increment, sl_store varchar(255) NULL, sl_address varchar(255) NULL, sl_address2 varchar(255) NULL, sl_city varchar(255) NULL, sl_state varchar(255) NULL, sl_zip varchar(255) NULL, sl_country varchar(255) NULL, sl_latitude varchar(255) NULL, sl_longitude varchar(255) NULL, sl_tags mediumtext NULL, sl_description text NULL, sl_email varchar(255) NULL, sl_url varchar(255) NULL, sl_hours varchar(255) NULL, sl_phone varchar(255) NULL, sl_fax varchar(255) NULL, sl_image varchar(255) NULL, sl_private varchar(1) NULL, sl_neat_title varchar(255) NULL, sl_linked_postid int NULL, sl_pages_url varchar(255) NULL, sl_pages_on varchar(1) NULL, sl_option_value longtext NULL, sl_lastupdated timestamp NOT NULL default current_timestamp, PRIMARY KEY (sl_id), KEY sl_store (sl_store), KEY sl_longitude (sl_longitude), KEY sl_latitude (sl_latitude) ) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci

dbDelta Updates List:

QUERY: ALTER TABLE wp_store_locator CHANGE COLUMN sl_linked_postid sl_linked_postid int NULL
QUERY: ALTER TABLE wp_store_locator CHANGE COLUMN sl_lastupdated sl_lastupdated timestamp NOT NULL default current_timestamp
QUERY: ALTER TABLE wp_store_locator ADD PRIMARY KEY (sl_id)

Update Results:

[wp_store_locator.sl_linked_postid] => Changed type of wp_store_locator.sl_linked_postid from int(11) to int
[wp_store_locator.sl_lastupdated] => Changed type of wp_store_locator.sl_lastupdated from timestamp to
[0] => Added index wp_store_locator PRIMARY KEY (sl_id)

WP 4.2 dbDelta SQL Best Practices

Some of these carry over from the prior dbDelta comments; some are new with later versions of dbDelta and MySQL. The code is fairly robust, but if your MySQL installation comes back with a different format for the DESCRIBE command it can wreak havoc on dbDelta(). I've presented things to look out for when crafting your dbDelta SQL statements as well as possible improvements to dbDelta in WP Core. I may submit patches once again, but it has proven nearly impossible to get patches reviewed and approved for dbDelta. That makes sense given the HUGE impact this function has on the WordPress Core engine, but there are thousands of plugins and themes wreaking havoc on WordPress performance due to some of these issues. Maybe a detailed review like this will help get some updates into core.

Table Creation / Modification Commands

dbDelta() uses the MySQL DESCRIBE <table> command to fetch data about the table passed to dbDelta(). The table name is parsed out of the SQL statement with a regular expression that looks for CREATE TABLE <…>, CREATE DATABASE <…>, INSERT INTO <…>, and UPDATE <…>, where the first "word" in the <…> part is assumed to be the table name.

Notes

  • The process appears to be case-sensitive. Use the commands as noted above, in all caps.
  • Make sure you have no extra spaces, newlines, or other whitespace between CREATE and TABLE or within the other two-word directives noted here.
  • Make sure your table name has a leading space before it and a trailing space after it.

Possible Improvements

  • Consider making the preg_match() call case-insensitive (see the sketch below).
  • Consider changing the single space in commands like INSERT INTO into a multi-whitespace regex, i.e. INSERT\s+INTO.
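For illustration only, this is roughly what a case-insensitive, whitespace-tolerant match could look like; it is not the regex WordPress core actually uses:

// Illustrative sketch only; not the actual core parsing code.
// Matches "CREATE TABLE", "create   table", etc. and captures the table name.
$sql = 'create   table wp_my_table ( my_id mediumint(8) NOT NULL )';

if ( preg_match( '/\bCREATE\s+TABLE\s+`?(\w+)`?/i', $sql, $matches ) ) {
    $table_name = $matches[1];   // "wp_my_table"
}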

Defining Column Types

On some servers, such as my CentOS 7 development box running MySQL Ver 15.1 Distrib 5.5.41-MariaDB, the engine has decided that a column type of INT really means INT(11). Different variations may report different results from the DESCRIBE command, but here is what I find works on MOST servers including my Linux dev box:

  • Always specify the LENGTH for each field type that accepts a length.  For most fields use the default length.
    • int(11)
    • mediumint(8)
    • varchar(1) to varchar(255), 255 is the default.
  • Fields without lengths:
    • longtext
    • mediumtext
    • text
  • Do not put spaces between the field type and the parens or the number in parens for the length.
  • Use lowercase for the field name and the field type.
  • Use uppercase for the NULL and NOT NULL specification.
  • Do NOT have extra spaces between the field name, the field type, and any of the additional definition parameters.

Examples

global $wpdb;
$sql =
'CREATE TABLE my_table (
my_id mediumint(8) unsigned NOT NULL auto_increment,
my_first_field varchar(1) NULL,
my_second_field varchar(255) NULL,
my_third_field int(11) NULL,
my_lastupdate timestamp NOT NULL default CURRENT_TIMESTAMP
)
DEFAULT CHARACTER SET ' . $wpdb->charset;
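To run a statement like this through dbDelta() you need to pull in the upgrade module first. A minimal sketch:

// dbDelta() is defined in the upgrade module, so load it before calling.
require_once ABSPATH . 'wp-admin/includes/upgrade.php';

// dbDelta() returns an array describing what it changed; with a properly
// crafted statement a second run against an unchanged table returns an empty array.
$results = dbDelta( $sql );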

Things that cause problems in my setup:

  • Two spaces between the field name my_lastupdate and the word timestamp.
  • Using int instead of int(11) on my_third_field.
  • Using lowercase current_timestamp instead of CURRENT_TIMESTAMP on my_lastupdate.

Possible Improvements

  • Consider making the preg_match() call case-insensitive.
  • Make the comparison on the default value case-insensitive, or shift the inbound query to uppercase.
  • Consider changing the single space in commands into multi-whitespace regex tests.

 

Defining Keys (Indices)

The key comparison system is just as strict about whitespace and case sensitivity as the column comparison routine in dbDelta. You MUST have TWO spaces after the keywords "PRIMARY KEY" before defining your index field but only ONE SPACE after all other keys.

  • ALWAYS declare the column names for every key type, including the primary key.
  • ALWAYS put your column names inside parentheses with no spaces.
  • PRIMARY KEY should NEVER have a key name.
  • PRIMARY KEY is always followed by two spaces, then the opening parenthesis, the column name, and the closing parenthesis.
  • KEY is always followed by a single space, the key name, then a single space, then parentheses around the column name.
  • The keyword PRIMARY KEY must be uppercase.

Examples

global $wpdb;
$sql =
'CREATE TABLE my_table (
my_id mediumint(8) unsigned NOT NULL auto_increment,
my_first_field varchar(1) NULL,
my_second_field varchar(255) NULL,
my_third_field int(11) NULL,
my_lastupdate timestamp NOT NULL default CURRENT_TIMESTAMP,
PRIMARY KEY  (my_id),
KEY third_key (my_third_field)
)
DEFAULT CHARACTER SET ' . $wpdb->charset;

Things that cause problems in my setup:

  • Missing the second space after the PRIMARY KEY definition.

Possible Improvements

  • Consider changing the single space in commands into multi-whitespace regex tests.
  • Force the incoming index text for the PRIMARY KEY and KEY keywords to uppercase before comparing.

Janitor 4.1.15 Makes Quick Work Of Resetting Locations


The Janitor add-on pack for Store Locator Plus adds a new drop locations feature.    This feature makes quick work of deleting the Store Locator Plus locations when a simple database reset is required.   The new feature is far faster than the previous Clear Locations option, but Drop Locations does have its limits.

J 4.2.15 Clear or Drop Locations

Drop Locations

This new feature removes the main locations table and the extended meta data fields, used by the * Extender add-on packs, via the SQL DROP TABLE command. Both tables are immediately recreated. This feature does NOT remove the associated custom WordPress "store pages" data NOR does it remove the Tagalong and custom taxonomy data. If you are not using Tagalong categories and are not using Store Pages, then Drop Locations can remove tens or even hundreds of thousands of locations in less than a minute. By comparison, a 200,000 location table can take as long as 2 hours using Clear Locations.

Clear Locations

Clear Locations is a more thorough removal mechanism. This process leaves the data tables intact but deletes the records one at a time using the WordPress data interface, which invokes all the hooks and filters. That means any third party or CSA-derived add-on packs will fire all the "location cleanup" processes including Tagalong data cleanup, Store Pages cleanup, and any related meta data that may be hiding in WordPress options tables, taxonomy tables, custom page types, or custom data tables installed by add-on packs.

Janitor Change Log


WordPress Performance Tuning


The Charleston Software Associates website has been getting slower every week.   For the past month I’ve been spending an hour or two trying to figure out what was going on.    I’ve used several tools to analyze the site but it seems no matter what I do the site gets marginally faster at best.   I still receive daily alerts of “your site took more than 15 seconds to respond” from the Windows Azure monitoring service that pings my hosted Linux virtual machines and alerts me to issues.

Last night I decided to switch some of my tables from MyISAM to InnoDB. The theory was that the MySQL InnoDB engine does row-level locking, and since the bulk of my site is order transactions and forum posts, the insert and update operations are at least as frequent as the read-only operations. Row-level locking should be better.

Wow, did I SCREW THAT ONE UP.

Actually I only partly screwed it up.

I made the change around 11PM last evening and everything looked OK.  No notable performance gains but I wanted to see what happened over 24 hours with the “your site is slow as hell” notices I get every day.   By noon today it was clear I moved in the wrong direction.   I was now getting “your site sucks” warnings from my monitoring service every half-hour.   Ouch!

Fixing InnoDB

It turns out you should not just convert a table from MyISAM to InnoDB without first “checking under the hood”.   While it is easy to convert from MyISAM to InnoDB with a simple MySQL ALTER TABLE <x> ENGINE = InnoDB command (make sure the table does not have Full Text indexes with SHOW INDEXES IN <x> first) that may not be prudent.

When I got on my server today I noticed the CPU would spike to 90% or more, then drop back to normal.  Every time it was MySQL spinning the CPU as hard as it could and always when accessing the half-dozen InnoDB tables I created.

Turns out that InnoDB could use some serious tuning before turning it on.  InnoDB can really use a helping hand by allowing multiple CPU threads and by utilizing a buffer.   Turns out there are some other general MySQL buffers and MyISAM settings that can help performance as well.

For MySQL on a Red Hat based distribution, and many other Linux distros, you can tweak the initial startup settings by editing the my.cnf file.  On CentOS it lives in /etc/my.cnf.

Here is what I added to the file in the [mysqld] section:

# default: 1M
innodb_additional_mem_pool_size=32M
# default: 8M
innodb_buffer_pool_size=256M
# default: 0
innodb_thread_concurrency=8
# default: 128K (131072)
join_buffer_size=64M
# default:8M
myisam_sort_buffer_size=32M
# default: 0 (disabled)
query_cache_size=128M
# default: 2M
sort_buffer_size=32M

The MySQL InnoDB settings page will tell you more about each setting. Google can help you find out what the other settings mean.

You can use the SHOW VARIABLES MySQL command to see what your current settings are.   Most of the settings can be added to the my.cnf file and tweaked.  Just make sure you read about settings before blindly guessing what correct values are for your server.  Learn what they do, tweak, and monitor.   Hopefully you can avoid doing this to your CPU and response times:

InnoDB Tuning Fail

By increasing the buffer sizes and turning on multiple threads my CPU is not spinning up to 90% or more when an SQL query is sent along from WordPress. With 4GB of RAM on my server I am certain I can tune this even further and there are other settings that will help, but this is a great start to getting MySQL to perform better than it has been.

Lesson Learned

The lesson here, especially if you are on a dedicated server, is to look at your MySQL configuration. The “out of the box” setup is not usually ready for production in ANY environment. I should know better, but when running a business solo it is one of the many “I’ll do it later” tasks that don’t get addressed until it bites you in the ass. Like it did for me today.

My guess is my quad-core 4GB RAM server can more than handle the load my website brings to it.    If I turn all of the right dials in the right direction I should be able to get the page response times to under 1 second.   As one of the most impatient web surfers you’ll meet I find my current 4-second average page response time is simply far too long to wait.

Some good articles on WordPress DB Tuning

http://wordpresswebmaster.net/mysql-database-optimization-on-wordpress/
http://www.prelovac.com/vladimir/wordpress-optimization-guide


Turn Off dbDelta Describe Errors


I use the Debug Bar and Debug My Plugin plugins to assist in support and development of my WordPress plugins. I also make use of the built-in WordPress database management tools provided by wpdb and the upgrade module of WordPress. One of the nice features of the upgrade module is the dbDelta function. This function provides a simple mechanism for either modifying an existing table structure or creating a new table as needed. This is a Godsend for complex plugins with multiple data tables, like Store Locator Plus.

However the dbDelta function has a little side effect that causes some plugins like Debug Bar to throw up error messages when a brand new data table is created.    The problem is that the upgrade module runs dbDelta with suppress errors ($wpdb->suppress_errors()) which only prevents the immediate printing of errors to the screen.  The database errors are still pushed onto the global EZSQL_ERROR array.     While I can see the utility of such a method, it does generate some error messages in the log files and the Debug Bar interface if it is enabled.

Luckily there is an easy fix to this issue. Immediately after running the dbDelta() function you can either "pop the stack" of the EZSQL_ERROR array or, for lazy people like me, just set EZSQL_ERROR back to a blank array:


// Run the table create / update as usual.
dbDelta($createTableSQL);

// dbDelta() suppresses output but still pushes errors onto this global array.
global $EZSQL_ERROR;

// Discard whatever is on the error stack, including the benign DESCRIBE error.
$EZSQL_ERROR = array();

This will strip the error messages off the stack and bypass Debug Bar's reporting of anything that is in the SQL error stack. For new tables the benign "table doesn't exist" error for the DESCRIBE command will go away.
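If you prefer the "pop the stack" approach mentioned above, a minimal sketch that removes only the most recent error (rather than wiping the whole array) looks like this:

dbDelta( $createTableSQL );

// Remove only the most recent entry, e.g. the benign DESCRIBE error
// pushed on by dbDelta() for a brand new table.
global $EZSQL_ERROR;
if ( ! empty( $EZSQL_ERROR ) ) {
    array_pop( $EZSQL_ERROR );
}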

I do NOT recommend doing this on a regular basis.  I only use it on the production release of Tagalong to prevent the red box warning on Debug Bar that shows up every time the plugin is activated on a new install with Debug Bar running.

 


WordPress dbDelta Bug Affects SLP

There is a bug that remains open in the WordPress Core tracking system that impacts Store Locator Plus database tables.  The bug will degrade database performance after each consecutive update to the Store Locator Plus base plugin.    The problem and resolution are described below.

The Problem

There is a glitch in a core WordPress function, dbDelta(), that creates duplicate table indexes.    Store Locator Plus uses 3 indexes on the locations table.  Indexes significantly increase the performance of database searches when used properly.  For Store Locator Plus an index exists on the store name, the latitude, and the longitude.   The latter two mean that whenever someone does a location search on your website the data returned from the locations table is found much more quickly than if the data index was not in place.  On large installs, those with hundreds or thousands of locations, the difference is notable.

To put the index and any other data table elements in place, SLP uses the built-in dbDelta function from WordPress.  This is the recommended method for managing table and index creation as well as structure updates.  It works great.  Most of the time.

The problem is that dbDelta() has a glitch that creates a new index every time the Store Locator Plus version changes. It should look for the store, lat, and long indexes and create them only if they are not there. Instead, if it finds an existing index it creates a new one and adds a numeric suffix. After a number of upgrades to SLP you end up with a database that has multiple indexes like this:

  • sl_store
  • sl_store_1
  • sl_store_2
  • sl_latitude
  • sl_latitude_1
  • sl_latitude_2
  • sl_longitude
  • sl_longitude_1
  • sl_longitude_2

Those _# indexes are IDENTICAL to the non-suffixed version.

This is bad.   Any time a record is updated or added in the locations table it is MUCH slower as every index needs to be updated.  After just 3 updates MySQL is performing 9 index updates on every record operation versus 3.

It can also confuse and SLOW DOWN record searches as the MySQL engine will take longer to determine which index is best to use since it has to load the entire index directory. I assume MySQL is smart enough to stop after the "first perfect match", but that is purely conjecture. If that is true the search impact would be minimal. If MySQL is not that intelligent and it looks at ALL INDEXES before selecting the "best match" during a search, the impact here can be notable as well, though not nearly as dramatic as updating or inserting new data.

Do You Have A Problem?

Unfortunately there are only 2 ways to tell.

First – your WordPress site starts throwing errors (usually hidden in the error log) that come from MySQL and tell you the index limit of 64 entries has been exceeded. You shouldn't need to increase this limit; it is per-table and there are very few (if any) cases where 64 indexes on a single table are helpful.

Second – use the MySQL command line tool or phpMyAdmin and look at the wp_store_locator table.   If it has _# indexes they need to go.

The Manual Fix

The way to fix installations by hand is to use the MySQL command line tool, or PHPMyAdmin and drop the _# indexes from the wp_store_locator table.   Since Store Locator Plus is not the only plugin affected you may want to check for any other tables that are created by plugins and see if they suffer from the same issue.  Contact the plugin author before randomly deleting indexes.

From the MySQL command line you can execute the DROP INDEX command as often as necessary based on how many duplicate indexes appear:

mysql> DROP INDEX sl_store_2 ON wp_store_locator;
mysql> DROP INDEX sl_latitude_2 ON wp_store_locator;
mysql> DROP INDEX sl_longitude_2 ON wp_store_locator;

You should leave the indexes without the _# intact.

The Automated Fix

I am looking into ways to detect and auto-delete the secondary indexes in Store Locator Plus in the 3.9.X+ updates; however, there is a danger of bloating the update/install process, which I want to avoid. One possible detection approach is sketched below. I am also looking for ways to prevent the problem from happening in the first place and have been active on the reported bug to see if there is a way to get WordPress Core developers to patch the problem.
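One possible detection-and-cleanup approach, sketched here with the standard $wpdb interface (this is not the code that ships in SLP):

// Sketch only: find and drop the duplicate "_<n>" indexes on the locations table.
global $wpdb;
$table = $wpdb->prefix . 'store_locator';

$indexes = $wpdb->get_results( "SHOW INDEX FROM {$table}" );
foreach ( $indexes as $index ) {
    // dbDelta's duplicates end in _1, _2, _3, ...
    if ( preg_match( '/^(sl_store|sl_latitude|sl_longitude)_\d+$/', $index->Key_name ) ) {
        $wpdb->query( "DROP INDEX {$index->Key_name} ON {$table}" );
    }
}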

If a workaround is found it will be noted in the updates road map for SLP.



Connect To Your Database With Emacs

The recent Emacs post made me think of a useful SQL trick I recently learned that I wanted to share. Emacs refers to different SQL engines as ‘products’, e.g. MySQL, SQLite, and so on. You can connect to an SQL database through Emacs but you have to call a different command based on the product.

An easier way to do this, as of version 24, is to use the keyboard shortcut C-u C-c TAB when viewing an SQL document. This will prompt you first for the product, and then for any connection information. So if you are working with an SQLite database you can press C-u C-c TAB, type sqlite, and then the filename of the database; that will open an interactive window connected to the database. Or if you are using PostgreSQL then Emacs will ask for a username, port, etc. The supported products are:

  • db2
  • informix
  • ingres
  • interbase
  • linter
  • ms
  • mysql
  • oracle
  • postgres
  • solid
  • sqlite
  • sybase

I don’t even know what some of those are…

Anyways, sometimes I find this faster than opening a terminal if I need to run some test queries or copy output into a document. (This is very useful with SQLite’s .mode html command, which will print the results of queries in <table> form.)


ejmr
南無妙法蓮華經

This article was originally posted on “The List” by EJMR.


PHP + PostgreSQL App Performance Example

One of our projects this week was to improve the performance of a payment processing system for a client. The system performed well under limited data sets, but as the data set grew larger the response time increased exponentially. In reviewing the application we found several design flaws that were significantly impacting performance. This article outlines some of the issues we found.

Overview

The take-away from this article is that you should know the details of the implementation from ALL angles. If you are a code junkie, then make sure you review and understand the database angle. The same thing in reverse: if you are a DB guru, take the time to understand the code. Nobody can do an effective job of application design & implementation without understanding the big picture.

In this case there is way too much emphasis on putting all the business intelligence into the database.   While there is nothing wrong with that, and in fact that is often a preferred architecture, it was not well thought out and thus not well implemented in this case.   One of the bigger mistakes was putting the business intelligence into simple views versus using proper stored procedure designs.

Bottom line, sometimes the better solution is to put SOME intelligence in the data engine and move some of the extraction/looping and other processing logic on the code side.  ESPECIALLY in a case like this one where we know that the client will not, at least any time soon, be accessing the data from any applications outside the server-side PHP application that is implemented for them.  Thus we know we could put all the intelligence in the code, though that makes for a LOT more work if/when they decide to introduce things like mobile apps.

Lesson 1: Don’t Put Multiple Sub-Selects In A View

This is a simplified example from a simple view that was built in Postgres for the client.

SELECT a,b,
to_char(c_date, (select settings.value from settings where settings.id='date_format')) as c_date_str,
to_char(d_date, (select settings.value from settings where settings.id='date_format')) as d_date_str,
to_char(e_date, (select settings.value from settings where settings.id='date_format')) as e_date_str
...

This view is using a settings table which holds the date format.  The client can use the web interface to change the date format, which is stored in the settings table.   That is a good web app design.

Doing multiple queries to retrieve that date format in a single view is a bad design.   In the simple example we have above we end up hitting the database, and thus doing disk I/O, no less than FOUR TIMES for a single query.  That is going to be slow.

There are a myriad of better options here; these are the two I would consider:

  • Move the sub-select into a stored procedure and turn it into a function. An intelligent design of that procedure will retain the initial data fetch in a global variable that is tested on each call, blocking future data I/O requests. Data I/O is now 2 calls v. 4+ for the view.
  • Return the raw data and allow the code to format the strings. The code can easily fetch the date format ONCE and apply the equivalent PHP formatting call to all of the raw data (a sketch of this option follows below). This also cuts down the data I/O. Using raw data also increases the chances for the PostgreSQL engine to optimize the query via its internal heuristics engine.

In our application improvement the path taken was to avoid this view whenever possible. As it turns out, this view is so complex and convoluted that there are often multiple shortcuts that get to just the few data elements that are needed. Constructing new queries retrieved the necessary data without all the view complexities and data overload. In this case the view is so complex that it hampers performance throughout the application and has limited benefit. The long-term solution will be to break the view into a subset of stored procedures. For the few cases where the larger complex view is actually viable we will see improved performance via an intelligent series of cascading stored procedures or code-side logic.
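As a sketch of the second option, here is roughly what fetching the format once and formatting in code might look like with MDB2. The table and column names come from the simplified example above, and it assumes the stored format string is (or has been mapped to) something PHP's date() understands:

// Fetch the date format one time instead of once per column per row.
$date_format = $database->db()->queryOne(
    "SELECT value FROM settings WHERE id = 'date_format'"
);
// NOTE: a to_char() style format stored for PostgreSQL may need to be
// mapped to an equivalent PHP date() format string first.

// Pull the raw dates from a simpler query (hypothetical table name).
$result = $database->db()->query('SELECT a, b, c_date, d_date, e_date FROM my_table');
$rows   = $result->fetchAll(MDB2_FETCHMODE_ASSOC);

// Format once per value in code rather than via sub-selects in the view.
foreach ($rows as &$row) {
    foreach (array('c_date', 'd_date', 'e_date') as $col) {
        $row[$col . '_str'] = date($date_format, strtotime($row[$col]));
    }
}
unset($row);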

Lesson 2: Use Parameter Binding

Any modern database and its related interfaces will support data binding. If your database does not support this and you are building enterprise-class applications, it is time to select a new data engine. PostgreSQL has supported this for years. Nearly all of the data interfaces, including PHP's MDB2 interface, have also supported parameter binding for years. With parameter binding you will get a significant performance boost when iterating over data, especially in a nested-loop fashion.

In our example the code was doing something similar to this, simplified for instructional purposes:
$qry1 = 'SELECT v_id,a,b,c,d FROM my_first_view WHERE NOT paid';
$result = $database->db()->query($qry1);
$dataset1 = $result->fetchAll(MDB2_FETCHMODE_ASSOC); 
$datastack = $dataset1;
$qry2 = 'SELECT v_id,e,f,g FROM my_second_view WHERE NOT paid';
$result = $database->db()->query($qry2);
$dataset2 = $result->fetchAll(MDB2_FETCHMODE_ASSOC);
foreach ($dataset1 as $data1) {
    foreach ($dataset2 as $data2) {
        if ($data1['v_id'] == $data2['v_id']) { 
             $datastack['subdata'][] = $data2; 
        }
    }
}

There are several significant performance issues here.   To start with there is significant memory consumption as we need to collect ALL the data from the database into memory.  We then collate the data from two complete sets in memory to create a single data element.    There are much more efficient ways to do this without fetching all data in memory first.

The better option would be to fetch the data from dataset1 on a row-by-row basis and push the data onto the stack one record at a time.  The inner loop for dataset2 should then select a subset of data that is specifically for the matching v_id from the outer dataset1 loop.   This is where parameter binding comes in.

Here is the same loop in untested, simplified code format, using parameter binding. In our real-world example this one simple change increased performance more than 50x because the database can be much more intelligent about how it selects subsets of data, and the PHP overhead in both memory and stack management is significantly reduced:

// give us parameter binding in MDB2 please
$database->db()->loadModule('Extended'); 

// setup our queries
$qry1 = 'SELECT v_id,a,b,c,d FROM my_first_view WHERE NOT paid';
$qry2 = 'SELECT v_id,e,f,g FROM my_second_view WHERE v_id = ?';
// get the "outer" data
// since we know we will use all the "outer" data, just put it
// directly on the data stack, cutting this memory consumption in half
$result = $database->db()->query($qry1);
$datastack = $result->fetchAll(MDB2_FETCHMODE_ASSOC);

// still loop through outer data to drive the 2nd query
foreach ($datastack as $data1) {
    // Fetch the data2 subset of matching data as
    // a named array, only getting those records that
    // match on v_id... in essence an SQL JOIN done
    // via code
    $dataset2 = $database->db()->extended()->getAll(
        $qry2,
        null,
        array($data1['v_id']),
        array('integer'),
        MDB2_FETCHMODE_ASSOC
        );
    // Now we attach each of the matching elements in
    // the "inner" data set to the data stack, attaching
    // it under the related v_id 
    //
    foreach ($dataset2 as $data2) {
             $datastack['v_id'][$data1['v_id']]['subdata'][] = $data2; 
    }
}

This can be further refined and improved per our discussion above, but you get the idea.  I’ll leave it to you to figure out how to further refine the process.

You may be asking "why didn't you just do a simple JOIN in the database engine?" Good question. The real-world example is much more complex than this, and some of the data elements and views in play make that solution complex to maintain, cause the database engine to trip up on optimization, and make it actually SLOWER in our real-world case. Here we are simplifying to illustrate the general concept only.

Summary

A couple of simple real-world examples of improving performance have been illustrated here.    When refactoring a real-world application there are often complex interactions that need to be researched & evaluated.  The solutions are rarely simple and often can be approached in several ways.   The options shown here are not necessarily the best solutions but are the solutions that were the quickest to implement while providing a significant performance increase to the client.

Finding the balance between results and costs is always a challenge from the business perspective.    From the technical perspective a similar balance is often necessary between database and code intelligence.  Knowing your environment as a whole will help guide you down the proper path toward improving application performance.

 


PostgreSQL – Exclusive Locking On Triggers

We have been working on a client’s project and dealing with an especially complex view in a PostgreSQL database. The view has a few tables linked together as well as several sub queries, a case statement, and a number of elements that do aggregate functions to build a single record of the view.

After updating the admin panel to allow them to use the database & admin page for processing orders (versus an email + excel system) we ran into several issues. The first issue was a significant degradation in performance whenever the system was under moderate load. This is important to note as our baseline release testing did not uncover the problem. The logic in the app was syntactically and functionally correct. However the application misbehaved under load, causing a bottleneck and essentially shutting down the app.

The Table Trigger Loop

We implemented a series of update triggers and stored procedures to propagate certain data elements between tables. Part of the design allowed for 2 tables to update each other. The UI allowed an admin user to update Table A, which called a trigger to push a status field update into Table B (the legacy data). In other places Table B *could* be updated first, so we had a trigger to update Table A. The problem is the triggers would loop: Table A triggers an update in Table B, which triggers an update in Table A, and so on.

The initial design called for turning off the triggers on the other table within the stored procedure. This requires an ALTER TABLE command. ALTER TABLE forces an exclusive lock on the table. NOTHING else can happen to that table during this command; it is one of the few (if not the only) truly exclusive locks in the locking hierarchy.

This really slows down bulk updates. “Lock entire table”… insert… “unlock entire table”.

Master/Slave Tables Is A Better Solution

A better solution was to take out the ALTER TABLE commands completely and remove the trigger where Table B updates Table A. We made an application-logic decision to make Table A the master and Table B the slave and to eliminate (and/or log) any case where Table B gets updated first. That should NEVER happen in the new app but some legacy stuff may still do that.

This is much faster as it generates no exclusive DB locks.

Entire Application Assessment v. Technical Complexity

In this case thinking about the overall solution which includes the application design and implementation as well as the technical syntax on the application and DB layer all work together to improve performance. Knowing the application and coming up with an overall design solution is often as effective as fancy technological solutions. Sometimes less complexity or technical “mojo” is the right answer to a complex problem.


SQL: Using IN / NOT IN Operators

Here is an explanation I recently shared with someone about how to use NOT IN to select a subset of data from a one-to-many relationship.   It is a simple example that covers the basic use of NOT IN (and IN) operators in SQL.  These operators can be fairly powerful for performing quick queries on complex data relationships where a programmatic solution may otherwise be required.   This is not always the best solution but it can be a useful shortcut when used appropriately.

Syntax in various SQL dialects can vary, so I am using a generic SQL dialect based on my PostgreSQL knowledge.

The Problem

You have 2 tables you are linking together.   I forget the names so I’ll call them Nurses and RolesOfNurses.    For the purpose of this explanation let’s assume you are linking them by an ID field for the Nurse.

Your Nurse data may look like this:

ID: 1   Name: Karen
ID: 2  Name: Sharon

And your RolesOfNurses data:

Nurse ID: 1 RoleID: 10
Nurse ID: 1 RoleID: 555 
Nurse ID: 2 RoleID: 10

Normally with a one-to-many (1:M) relationship this RolesOfNurses table is just a middle-man, so there would be another table called "Roles" that may have the RoleID and a common name to make human-readable output (like reports) possible. So role #10 may be "nurse" and the specialized role #555 might be "Can Give Meds".

In your select statement you want to find "all nurses that do not have the Can Give Meds role". In database-like talk you want "all nurses that have role #10 but not role #555". So your instinct is to join the two tables and filter by where role == 10 and role != 555. But that is wrong. The reason is that joining tables creates one unique row for every combination of data in the joined tables.

Breaking It Down : Select Nurses and Roles

Take it from the beginning.  Let’s just say “I want a list of all the roles nurses have”.  That is a simple join:

select * from nurses, rolesofnurses where nurses.id = rolesofnurses.nurseid;

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 1, Name: Karen, NurseID: 1, RoleID: 555
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Next: Filter The Data On Role # 10 Only

This data set is the basis of your advanced query, but before we get to the "dual and" let's start with a single where clause to filter our data. I've added the parentheses only to make the where clause groupings easier to read. The first where clause links the two tables together to give you the above data set. The 2nd where clause filters that data to just show nurses with role #10.

select * from nurses, rolesofnurses where (nurses.id = rolesofnurses.nurseid) and (rolesofnurses.roleid = 10);

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Then : Add The Second Role Filter

Now add your 2nd where clause so you can have “all nurses that don’t have the Can Give Meds role” filter.

select * from nurses, rolesofnurses where (nurses.id = rolesofnurses.nurseid) and (rolesofnurses.roleid = 10) and (rolesofnurses.roleid != 555);

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Notice that this is NOT the desired result!

Why? Look at the data.  All of your roles match BOTH of those filters.   The roles are all 10 AND they are NOT 555.  The first instinct is wrong. Personally I blame the fact that SQL, as powerful as it is, does not process English-based query syntax the same way that our brains do. Many people will make the same mistake. Sadly, the solution is a bit more complex to explain, but I’ll try to make it simple.

Fixing The Query: Step 1 – Build A List Of IDs

I’m going to start with the 2nd part of your filter, all the IDs of nurses that have role #555.  Note I said “IDs of nurses” here, that is important.  I don’t want the full set of nurse data like we have been doing up until now.  I simply want the list of ID numbers of nurses that have role #555.

select nurseid from rolesofnurses where roleid = 555;
NurseID: 1

This is the ONLY NurseID in our data set that has role # 555.

The Correct Query : Using NOT IN

Now let’s go and get our data we want by telling SQL “give me all the nurses that have role # 10 but are NOT in the list of nurses with role # 555”:

select * from nurses, rolesofnurses where (nurses.id = rolesofnurses.nurseid) and (rolesofnurses.roleid = 10) and (nurses.id NOT IN (select nurseid from rolesofnurses where roleid = 555) );

And we get the desired result:

ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Explaining NOT IN

The trick here is the <blah> NOT IN <SQL QUERY> syntax. Most (not all) SQL dialects support this, and it is an important advanced SQL construct. It allows you to filter data based on the results of another select statement. The trick with the NOT IN operator is to make the <SQL QUERY> part return a list of the single element you want to filter on. In our case the <SQL QUERY> returns the entire list of nurse IDs that have role #555. We use NOT IN to tell it to only accept the record if the Nurse ID is NOT IN the list of nurse IDs that have role #555, or in English "pick all nurses that do not have role #555". In our example it was a single record, but in a larger data set it could return an entire list of records.

Notes About IN/NOT IN Usage

For a few hundred records this construct is OK.  At some point you need to start looking into temp tables and other methods as using the <blah> NOT IN <SQL QUERY> construct can consume a lot of memory on the server. You need to be intelligent about what kind of resources your server has and how large the potential data set is for the second part of the IN query because most DB engines will load the entire list of ID #s into RAM.

Another note, you can create complex constructs with IN the same way as NOT IN, for example “give me a list of all nurses that have role #10 AND role #555”.

select * from nurses, rolesofnurses where (nurses.id = rolesofnurses.nurseid) and (rolesofnurses.roleid = 10) and (nurses.id IN (select nurseid from rolesofnurses where roleid = 555) );

BTW, I’ve used uppercase for emphasis only.   SQL is not case sensitive.


MySQL – I Am a Dummy


Today I learned about an interesting command-line option for MySQL:
$ mysql --i-am-a-dummy
This is an alias for --safe-updates, which imposes three restrictions:

  1. Any UPDATE or DELETE that does not have a WHERE or LIMIT is rejected.
  2. Any SELECT without a LIMIT will be automatically limited to 1,000 rows in the result.
  3. Any join that would result in examining more than a million row combinations is rejected.

I thought it was funny at first, but the first restriction alone makes this useful for doing work on the command line on live servers. Ever intend to run something like DELETE FROM users WHERE id = … and forget that tiny little where-clause? Because I have, and all you want to do is hide under your desk, on the assumption that if no one can see you then you must be invisible.


Restaurant Apps with The Table Tap

Cyber Sprocket Labs started a fun and exciting relationship that is the perfect cross section of two of our favorite things, programming and beer.    While we’ve worked on many exciting projects over the years, this is definitely in the top two.  None of our esteemed programmers ever thought they’d be working on an application that facilitates the delivery of beer directly to the consumer.  Yet, that is exactly what they are doing.

The Table Tap provides a unique and innovative new service to bars and restaurants anywhere in the world. This service puts full-service beer taps within the consumer's reach, literally. A computer-controlled interface with up to 4 beer taps is installed directly in the table. A quick swipe of an RFID card activates the taps and allows the customer to pour their own beer, as much or as little as they'd like.

Cyber Sprocket has been instrumental in helping Jeff Libby bring his concept to the next level. By providing technical support both during and after installation, we have enabled him to speed up his deployment cycle, increasing revenue. We have also provided extensive programming services to update the business manager, hostess, and system administrator interfaces. During our first few months working on the project we've also been tasked with several new additions to the software, the newest of which is enabling direct table-to-table chat using the system's built-in color LCD displays.

Like we said, a very fun and exciting project that has taken our technology services to places we never expected.   Thanks Jeff, we look forward to a long and prosperous relationship as your one-of-a-kind solution takes off!

Technical Overview

Services Provided

  • Web Application Programming
  • Database Design
  • Database Maintenance
  • Network Support
  • Device Interface Programming
  • System Configuration and Installation
  • Technical Support
  • Customer Support

Platform Details


PostgreSQL Cheat Sheet

PostgreSQL is one of our favorite database engines for a variety of reasons.  Here is our cheat sheet to help you get online and get around Postgres with minimal effort.

Database Access Security

Database security is handled primarily in two places: at the system service level via a file called pg_hba.conf, and within the database metadata itself. The pg_hba.conf file controls what level of credentials is needed based on the IP address the requesting connection is coming from. The metadata within the engine itself generally controls user-level access once users are connected and approved at the system level.

Systemwide Configuration via pg_hba.conf

This file matches IP address with a set of rules to determine how much data you need to provide in the first place before getting access to the database engine.   It includes the IP address, the username trying to connect, and what type of validation is needed.

The data comes in a series of tab separated columns including:

  • Connection Type
    • local = from local tty connection
    • host = from an internet connection
  • Database = which database is the user trying to connect to?
  • User = which user they are connecting as.
  • IP Address = what address are they coming from?
  • Method = how shall we authenticate them?
    • md5 = let them in if the password matches
    • ident sameuser = let them in if their system login user matches the database user they are trying to connect as
    • trust = let them in as long as the ip address matches, no password required

Finding pg_hba.conf

The pg_hba.conf file lives in various places depending on the flavor of Linux.

  • Red Hat, CentOS, Fedora = /var/lib/pgsql/data/
  • Ubuntu = /etc/postgresql/<version>/main/

Command Line Access

You can do a lot of maintenance operations or test queries using the command line interpreter. The command line in PostgreSQL is accessed via the psql command. The most often used parameters with psql are to connect as a user other than your login user, provide your password, and give it the name of the database to connect to.

Example:

# psql -U other_name -W other_db

Command Line Shortcuts

From the command line there are a variety of shortcuts to help you navigate around the database engine or see what is going on.  Here are a few of the most useful:

  • List Databases: \l
  • List (display) Tables : \dt
  • List Columns in a Table: \d <tablename>

Creating A Database

Here is how you create a new database that is owned by a specific user. This assumes a "clean slate" install. You will need to have the postgres user login credentials and/or root access. You will be creating a PostgreSQL user and password and will change the system-level postgres daemon security settings to allow access with the password regardless of which user you log in as.

# # login as postgres user or su postgres if you are root

# psql
psql> create user lance with password 'cleveland';
psql> create database myfirstdb with owner lance;
psql> \q
# vi /var/lib/pgsql/data/pg_hba.conf

While in pg_hba.conf change this line:

local   all         all                              ident

to this:

local   all         all                              md5

Backing Up / Dumping Your Data

Data dumps are a quick way to put the schema, data, or a combination of both out into a file that can be used to re-create the database on other systems or just back it up to a remote location. The PostgreSQL command for this is pg_dump. It takes the same parameters as the command line access.

Simple Data Dump

Example:

# pg_dump -U myname -W the_db_name > dump_thedb_2010_0704_001.sql

Clean Data Dump

This is the format to use if you want to ensure the entire database is dropped & re-created when loading on a new system.

Example:

# pg_dump --clean --create -U myname -W the_db_name > dump_thedb_2010_0704_001.sql

Reloading Dumped Data

To reload such a script into a (freshly created) database named the_db_name :

# psql -d the_db_name -f dump_thedb_2010_0704_001.sql

If the clean data dump method was used you will want to log in as postgres and let the SQL script create the database:

# su postgres

# psql -f dump_thedb_2010_0704_001.sql

Summary

There are plenty more tips, tricks, and shortcuts.  These will get you started.  We’ll add more as time allows.


Logistics & Inventory Management with Alutiiq

Alutiiq recently awarded Cyber Sprocket a 1-year teaming agreement, making this our 4th year of working with Alutiiq on developing, supporting, & maintaining their military logistics & inventory management application. We are very excited to be working with Alutiiq for another year. The upcoming year will bring some exciting new possibilities for follow-on projects that augment the system already in place. We're looking forward to being part of the design & development team and supporting our clients and our country both at home and abroad.

Technical Overview

Services Provided

  • Web Application Programming
  • Database Design
  • Database Maintenance
  • System Architecture
  • Network Support

Platform Details


Consumer Web Apps with Abundatrade

When Abundatrade decided to take their project to the next level they chose Cyber Sprocket Labs to help them get there.   They brought their existing website to us and asked us to help.  They were looking for a more fluid, more enjoyable web experience for their users.   They needed an updated easy-to-use web calculator & they needed it fast.

Cyber Sprocket did the job quickly & did it right.   More than a year after that first quick “help us fix our site” project we are still helping them push their technology further.   We are now not only helping them slowly morph their consumer services into something bigger & better, we are also helping them run their operations more efficiently by integrating the web technologies with their daily operations & services that run the business.

The main part of the consumer interface is a product valuation calculator.  Using a proprietary algorithm the system looks up a bar code on a product that a consumer enters and runs a series of data queries against a variety of popular online stores such as Amazon to determine a fair market valuation on the product.  The system then compares that to current stock levels within the Abundatrade warehouse and generates a real-time trade-in value for the consumer.

During our first year with Abundatrade we have helped them improve the user experience, refined the user interface, and given them some valuable ideas and insight into how technology can make their business more efficient.   We are looking forward to forging an even closer relationship with Abundatrade in the coming year in both a business sense… and a physical sense; Abundatrade is now our closest neighbor – literally just a few hundred yards away from our new Mount Pleasant home!

We are looking forward to another fun-filled and exciting year working with Abundatrade and helping them grow!

Thanks Kent & Clayton, we appreciate working with you!

Technical Overview

Services Provided

  • Web Application Programming
    • PayPal API
    • Amazon API
    • Monsoon API
    • 3rd Party Music DB API
  • Web Interface Design
  • Process Flow Analysis

Platform Details


Data Schema Documentation Tool : SchemaSpy

A topic that came up today is that the 3DChat game database does not really have a reference schema.  It used to, in the early stages of the Admin Panel repository, but was eventually phased out and replaced with a combination of PDFs and PNGs generated from a Visio document.

There is an issue of usefulness here, with two sides to consider.

1. As a developer, I prefer to have a text, written, SQL schema to work with and refer to.  Something I can feed directly into my local database engine.  When I have questions about what a column is used for, or how tables relate, this is where I look first and where I expect to see the explanations.

2. However, I may be a developer, but not a developer who deals directly with the schema above.  Consider 3DChat in particular. Our game database schema was in a Git repository that Golem Labs would never use.  It is much easier from their point-of-view to have an external (outside of the repo) schema to serve as their reference.  This is ultimately why the schema went onto Google docs like it did.

I think both of these are valid.  One problem that comes up—however—is that when it comes time to update the schema, we waste time doing the work twice, in two different ways.

So in the back of my head I had thought that surely there was some way to automatically generate diagrams and documentation from an SQL schema.  When the subject came up again today, some searching led me to the program SchemaSpy:

http://schemaspy.sourceforge.net/

It is a Java program that connects directly to your database and spits out a lot of information and diagrams.  To demonstrate, I ran the program on the staging Social Engine database.

You run SchemaSpy from the command line:
java -jar schemaSpy.jar -t dbType -db dbName [-s schema] -u user [-p password] -o outputDir

You can see the results here:
http://eric.cybersprocket.com/se-schema/

I am going to mess around some more with this over the weekend, but already I think this could be a viable tool to let developers like myself continue to maintain text-based schemata for reference, while being able to generate documentation that can be put elsewhere for other interested parties who don’t/can’t dig into the code directly.