Posted on

WordPress dbDelta Better In 4.2, Not Yet Perfect


There have been a LOT of changes in WordPress Core, including updates to the dbDelta() method in the upgrade function set. What is dbDelta()? It is the primary function used by WordPress to determine whether the structure of data tables needs to be augmented between releases. It works great for upgrading core WordPress data tables. Where it has problems is with developer-specific SQL statements used to create secondary tables in plugins and themes. Store Locator Plus, for example, uses a secondary table to store basic location data.

Unfortunately dbDelta has always had issues (see the running commentary on Trac ticket 10404), especially when it comes to indices on a MySQL table. Most well-designed data tables will have primary and secondary indexes on the table to improve query performance. dbDelta has a hard time figuring out when an index already exists and has not been modified. If you did not craft your CREATE TABLE command EXACTLY the way dbDelta wants to see it, the dbDelta method will create new indexes. This can cause performance degradation and unnecessary overhead in the MySQL tables when a data column ends up with 10, 20, 30+ copies of the same index; one created every time the plugin or theme is updated.

Many plugins and themes suffer from this malady. Over two years ago I dug into dbDelta and learned the syntax secrets to get WordPress to stop creating duplicate indexes. Today I found a site that was back to its old tricks running the latest version of Store Locator Plus and WordPress 4.2.2. As it turns out, the rules for dbDelta have changed, and thus the syntax for creating a "dbDelta friendly" SQL statement has changed as well. With dbDelta being a behind-the-scenes feature, changes to it are never broadcast to the general public and thus go undetected for a while when something goes awry.

Turns out the changes are not new. They go back to at least WordPress 4.1. The last time I dug into this issue was WordPress 3.8. Guess it is time for some PHPUnit testing on dbDelta within my own plugin suite tests. If things are running properly, the dbDelta command for an existing installation of Store Locator Plus 4.2 should yield NO QUERY UPDATES when activating/installing a newer version of the plugin.

WP 4.1 dbDelta Results

Source SQL:

dbDelta SQL: CREATE TABLE wp_store_locator ( sl_id mediumint(8) unsigned NOT NULL auto_increment, sl_store varchar(255) NULL, sl_address varchar(255) NULL, sl_address2 varchar(255) NULL, sl_city varchar(255) NULL, sl_state varchar(255) NULL, sl_zip varchar(255) NULL, sl_country varchar(255) NULL, sl_latitude varchar(255) NULL, sl_longitude varchar(255) NULL, sl_tags mediumtext NULL, sl_description text NULL, sl_email varchar(255) NULL, sl_url varchar(255) NULL, sl_hours varchar(255) NULL, sl_phone varchar(255) NULL, sl_fax varchar(255) NULL, sl_image varchar(255) NULL, sl_private varchar(1) NULL, sl_neat_title varchar(255) NULL, sl_linked_postid int NULL, sl_pages_url varchar(255) NULL, sl_pages_on varchar(1) NULL, sl_option_value longtext NULL, sl_lastupdated timestamp NOT NULL default current_timestamp, PRIMARY KEY (sl_id), KEY sl_store (sl_store), KEY sl_longitude (sl_longitude), KEY sl_latitude (sl_latitude) ) DEFAULT CHARACTER SET utf8

dbDelta Updates List:

QUERY: ALTER TABLE wp_store_locator CHANGE COLUMN sl_linked_postid sl_linked_postid int NULL
QUERY: ALTER TABLE wp_store_locator CHANGE COLUMN sl_lastupdated sl_lastupdated timestamp NOT NULL default current_timestamp
QUERY: ALTER TABLE wp_store_locator ADD PRIMARY KEY (sl_id)
QUERY: ALTER TABLE wp_store_locator ADD KEY sl_store (sl_store)
QUERY: ALTER TABLE wp_store_locator ADD KEY sl_longitude (sl_longitude)
QUERY: ALTER TABLE wp_store_locator ADD KEY sl_latitude (sl_latitude)

Update Results:

[wp_store_locator.sl_linked_postid] => Changed type of wp_store_locator.sl_linked_postid from int(11) to int
[wp_store_locator.sl_lastupdated] => Changed type of wp_store_locator.sl_lastupdated from timestamp to
[0] => Added index wp_store_locator PRIMARY KEY (sl_id)
[1] => Added index wp_store_locator KEY sl_store (sl_store)
[2] => Added index wp_store_locator KEY sl_longitude (sl_longitude)
[3] => Added index wp_store_locator KEY sl_latitude (sl_latitude)

My WP 4.2 dbDelta Results

Better, but still not quite right.

Source SQL:

CREATE TABLE wp_store_locator ( sl_id mediumint(8) unsigned NOT NULL auto_increment, sl_store varchar(255) NULL, sl_address varchar(255) NULL, sl_address2 varchar(255) NULL, sl_city varchar(255) NULL, sl_state varchar(255) NULL, sl_zip varchar(255) NULL, sl_country varchar(255) NULL, sl_latitude varchar(255) NULL, sl_longitude varchar(255) NULL, sl_tags mediumtext NULL, sl_description text NULL, sl_email varchar(255) NULL, sl_url varchar(255) NULL, sl_hours varchar(255) NULL, sl_phone varchar(255) NULL, sl_fax varchar(255) NULL, sl_image varchar(255) NULL, sl_private varchar(1) NULL, sl_neat_title varchar(255) NULL, sl_linked_postid int NULL, sl_pages_url varchar(255) NULL, sl_pages_on varchar(1) NULL, sl_option_value longtext NULL, sl_lastupdated timestamp NOT NULL default current_timestamp, PRIMARY KEY (sl_id), KEY sl_store (sl_store), KEY sl_longitude (sl_longitude), KEY sl_latitude (sl_latitude) ) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci

dbDelta Updates List:

QUERY: ALTER TABLE wp_store_locator CHANGE COLUMN sl_linked_postid sl_linked_postid int NULL
QUERY: ALTER TABLE wp_store_locator CHANGE COLUMN sl_lastupdated sl_lastupdated timestamp NOT NULL default current_timestamp
QUERY: ALTER TABLE wp_store_locator ADD PRIMARY KEY (sl_id)

Update Results:

[wp_store_locator.sl_linked_postid] => Changed type of wp_store_locator.sl_linked_postid from int(11) to int
[wp_store_locator.sl_lastupdated] => Changed type of wp_store_locator.sl_lastupdated from timestamp to
[0] => Added index wp_store_locator PRIMARY KEY (sl_id)

WP 4.2 dbDelta SQL Best Practices

Some of these carry over from the prior dbDelta comments; some are new with later versions of dbDelta and MySQL. The code is fairly robust, but if your MySQL installation comes back with a different format for the DESCRIBE command output it can wreak havoc on dbDelta(). I've presented things to look out for when crafting your dbDelta SQL statements as well as possible improvements to dbDelta in WP Core. I may submit patches once again, but it has proven nearly impossible to get patches reviewed and approved for dbDelta. That makes sense given the HUGE impact this function has on the WordPress Core engine, but there are thousands of plugins and themes wreaking havoc on WordPress performance due to some of these issues. Maybe a detailed review like this will help get some updates into core.

Table Creation / Modification Commands

dbDelta() uses the MySQL DESCRIBE <table> command to fetch data about the table name passed in to dbDelta(). The table name is parsed out of the SQL statement with a regular expression looking for CREATE TABLE <…>, CREATE DATABASE <…>, INSERT INTO <…>, and UPDATE <…>, where the first "word" in the <…> part is assumed to be the table name.

Notes

  • The process appears to be case sensitive. Use the commands as noted above, in all caps.
  • Make sure you have no extra spaces, newlines, or other whitespace between CREATE and TABLE or within the other two-word directives noted here.
  • Make sure your table name has a leading space before it and a trailing space after it.

Possible Improvements

  • Consider making the preg_match() calls case insensitive.
  • Consider changing the single space in commands like INSERT INTO to a multi-whitespace regex, i.e. INSERT\s+INTO.
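
As a rough sketch of what those two improvements might look like (an assumption on my part, not the actual core code), the table-name match could tolerate case and whitespace variations:

// Sketch only; assumes the same "first word after CREATE TABLE" convention dbDelta uses.
$sql = 'create   table wp_store_locator ( ... )';
if ( preg_match( '/\bCREATE\s+TABLE\s+`?(\w+)`?/i', $sql, $matches ) ) {
    $table_name = $matches[1]; // matches despite lowercase keywords and extra whitespace
}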

Defining Column Types

On some servers, such as my CentOS 7 development box running MySQL Ver 15.1 Distrib 5.5.41-MariaDB, the engine has decided that column types of INT really mean INT(11). Different variations may report different results back from the DESCRIBE command, but here is what I find works on MOST servers including my Linux dev box:

  • Always specify the LENGTH for each field type that accepts a length.  For most fields use the default length.
    • int(11)
    • mediumint(8)
    • varchar(1) to varchar(255), 255 is the default.
  • Fields without lengths:
    • longtext
    • mediumtext
    • text
  • Do not put spaces between the field type and the parens or the number in parens for the length.
  • Use lowercase for the field name and the field type.
  • Use uppercase for the NULL and NOT NULL specification.
  • Do NOT have extra spaces between the field name, the field type, and any of the additional definition parameters.

Examples

global $wpdb;

// Column definitions only; key definitions are covered in the next section.
// (An auto_increment column must ultimately be declared as a key, so the
// PRIMARY KEY added in the keys example below completes this statement.)
$sql =
'CREATE TABLE my_table (
my_id mediumint(8) unsigned NOT NULL auto_increment,
my_first_field varchar(1) NULL,
my_second_field varchar(255) NULL,
my_third_field int(11) NULL,
my_lastupdate timestamp NOT NULL default CURRENT_TIMESTAMP
)
DEFAULT CHARACTER SET ' . $wpdb->charset;

Things that cause problems in my setup:

  • Two spaces between the field name my_lastupdate and the word timestamp.
  • Using int instead of int(11) on my_third_field.
  • Using lowercase current_timestamp instead of CURRENT_TIMESTAMP on my_lastupdate.

Possible Improvements

  • Consider making the preg_match() calls case insensitive.
  • Make the comparison on the default value case insensitive, or shift the inbound query to uppercase.
  • Consider changing the single spaces in commands to multi-whitespace regex tests.

 

Defining Keys (Indices)

The key comparison system is just as strict with whitespace and case sensitivity as the column comparison routine in dbDelta. You MUST have TWO spaces after the keywords "PRIMARY KEY" before defining your index field, but only ONE SPACE after all other keys.

  • ALWAYS declare the column names for every key type, including the primary key.
  • ALWAYS put your column names inside parentheses with no spaces.
  • PRIMARY KEY should NEVER have a key name.
  • PRIMARY KEY is always followed by two spaces, then the opening parenthesis, the column name, and the closing parenthesis.
  • KEY is always followed by a single space, the key name, then a single space, then parentheses around the column name.
  • The keyword PRIMARY KEY must be uppercase.

Examples

global $wpdb;
$sql =
'CREATE TABLE my_table (
my_id mediumint(8) unsigned NOT NULL auto_increment,
my_first_field varchar(1) NULL,
my_second_field varchar(255) NULL,
my_third_field int(11) NULL,
my_lastupdate timestamp NOT NULL default CURRENT_TIMESTAMP,
PRIMARY KEY  (my_id),
KEY third_key (my_third_field)
)
DEFAULT CHARACTER SET ' . $wpdb->charset;

// dbDelta() lives in the upgrade include, which is not loaded on normal requests.
require_once ABSPATH . 'wp-admin/includes/upgrade.php';
dbDelta( $sql );

Things that cause problems in my setup:

  • Missing the second space after the PRIMARY KEY definition.

Possible Improvements

  • Consider changing the single spaces in commands to multi-whitespace regex tests.
  • Force the incoming index text for the PRIMARY KEY and KEY keywords to uppercase.
Posted on

Janitor 4.1.15 Makes Quick Work Of Resetting Locations


The Janitor add-on pack for Store Locator Plus adds a new Drop Locations feature. This feature makes quick work of deleting the Store Locator Plus locations when a simple database reset is required. The new feature is far faster than the previous Clear Locations option, but Drop Locations does have its limits.

J 4.2.15 Clear or Drop Locations

Drop Locations

This new feature removes the main locations table and the extended meta data fields, used by the * Extender add-on packs, via the SQL DROP TABLE command. Both tables are immediately recreated. This feature does NOT remove the associated custom WordPress "store pages" data NOR does it remove the Tagalong and custom taxonomy data. If you are not using Tagalong categories and are not using Store Pages, then Drop Locations can remove tens or even hundreds of thousands of locations in less than a minute. By comparison, a 200,000 location table can take as long as 2 hours using Clear Locations.
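
Conceptually, Drop Locations boils down to something like the following sketch (the extended-data table name is a stand-in; actual names depend on your table prefix):

-- Drop the location tables outright; no per-record hooks fire, which is why it is fast.
DROP TABLE IF EXISTS wp_store_locator;
DROP TABLE IF EXISTS wp_store_locator_extended;  -- stand-in name for the * Extender meta table
-- The plugin then immediately recreates both tables, empty, via its normal install routine.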

Clear Locations

Clear Locations is a more thorough removal mechanism. This process leaves the data tables intact but deletes the records one at a time using the WordPress data interface, which invokes all the hooks-and-filters. That means any third party or CSA-derived add-on packs will fire all the "location cleanup" processes, including Tagalong data cleanup, Store Pages cleanup, and any related meta data that may be hiding in WordPress options tables, taxonomy tables, custom page types, or custom data tables installed by add-on packs.

Janitor Change Log

Posted on

WordPress Performance Tuning


The Charleston Software Associates website has been getting slower every week.   For the past month I’ve been spending an hour or two trying to figure out what was going on.    I’ve used several tools to analyze the site but it seems no matter what I do the site gets marginally faster at best.   I still receive daily alerts of “your site took more than 15 seconds to respond” from the Windows Azure monitoring service that pings my hosted Linux virtual machines and alerts me to issues.

Last night I decided to switch some of my tables from MyISAM to InnoDB. The theory being that the MySQL InnoDB engine does row-level locking, and since the bulk of my site is order transactions and forum posts, the insert and update operations are at least as frequent as the read-only operations. Row-level locking should be better.

Wow, did I SCREW THAT ONE UP.

Actually I only partly screwed it up.

I made the change around 11PM last evening and everything looked OK.  No notable performance gains but I wanted to see what happened over 24 hours with the “your site is slow as hell” notices I get every day.   By noon today it was clear I moved in the wrong direction.   I was now getting “your site sucks” warnings from my monitoring service every half-hour.   Ouch!

Fixing InnoDB

It turns out you should not just convert a table from MyISAM to InnoDB without first "checking under the hood". While it is easy to convert from MyISAM to InnoDB with a simple MySQL ALTER TABLE <x> ENGINE = InnoDB command (make sure the table does not have Full Text indexes with SHOW INDEXES IN <x> first), that may not be prudent.
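
For example, using wp_posts as a stand-in table name:

-- Check for FULLTEXT indexes first; MyISAM supports them, InnoDB of this era does not.
SHOW INDEXES IN wp_posts;
-- If nothing in the Index_type column reads FULLTEXT, the conversion is a single command:
ALTER TABLE wp_posts ENGINE = InnoDB;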

When I got on my server today I noticed the CPU would spike to 90% or more, then drop back to normal.  Every time it was MySQL spinning the CPU as hard as it could and always when accessing the half-dozen InnoDB tables I created.

It turns out that InnoDB could use some serious tuning before you turn it on. InnoDB can really use a helping hand by allowing multiple CPU threads and by utilizing a buffer. There are also some other general MySQL buffers and MyISAM settings that can help performance.

For MySQL on a Red Hat based distribution, and many other Linux distros, you can tweak the initial startup settings by editing the my.cnf file.  On CentOS it lives in /etc/my.cnf.

Here is what I added to the file in the [mysqld] section:

# default: 1M
innodb_additional_mem_pool_size=32M
# default: 8M
innodb_buffer_pool_size=256M
# default: 0
innodb_thread_concurrency=8
# default: 128K (131072)
join_buffer_size=64M
# default: 8M
myisam_sort_buffer_size=32M
# default: 0 (disabled)
query_cache_size=128M
# default: 2M
sort_buffer_size=32M

The MySQL InnoDB settings page will tell you more about each setting. Google can help you find out what the other settings mean.

You can use the SHOW VARIABLES MySQL command to see what your current settings are.   Most of the settings can be added to the my.cnf file and tweaked.  Just make sure you read about settings before blindly guessing what correct values are for your server.  Learn what they do, tweak, and monitor.   Hopefully you can avoid doing this to your CPU and response times:
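
For example, to inspect the values touched by the settings above:

mysql> SHOW VARIABLES LIKE 'innodb%';
mysql> SHOW VARIABLES LIKE '%buffer%';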

InnoDB Tuning Fail

By increasing the buffer sizes and turning on multiple threads my CPU is not spinning up to 90% or more when an SQL query is sent along from WordPress. With 4GB of RAM on my server I am certain I can tune this even further and there are other settings that will help, but this is a great start to getting MySQL to perform better than it has been.

Lesson Learned

The lesson here, especially if you are on a dedicated server, is to look at your MySQL configuration. The “out of the box” setup is not usually ready for production in ANY environment. I should know better, but when running a business solo it is one of the many “I’ll do it later” tasks that don’t get addressed until it bites you in the ass. Like it did for me today.

My guess is my quad-core 4GB RAM server can more than handle the load my website brings to it.    If I turn all of the right dials in the right direction I should be able to get the page response times to under 1 second.   As one of the most impatient web surfers you’ll meet I find my current 4-second average page response time is simply far too long to wait.

Some good articles on WordPress DB Tuning

http://wordpresswebmaster.net/mysql-database-optimization-on-wordpress/
http://www.prelovac.com/vladimir/wordpress-optimization-guide

Posted on

WordPress dbDelta Bug Affects SLP

There is a bug that remains open in the WordPress Core tracking system that impacts Store Locator Plus database tables.  The bug will degrade database performance after each consecutive update to the Store Locator Plus base plugin.    The problem and resolution are described below.

The Problem

There is a glitch in a core WordPress function, dbDelta(), that creates duplicate table indexes.    Store Locator Plus uses 3 indexes on the locations table.  Indexes significantly increase the performance of database searches when used properly.  For Store Locator Plus an index exists on the store name, the latitude, and the longitude.   The latter two mean that whenever someone does a location search on your website the data returned from the locations table is found much more quickly than if the data index was not in place.  On large installs, those with hundreds or thousands of locations, the difference is notable.

To put the index and any other data table elements in place, SLP uses the built-in dbDelta function from WordPress.  This is the recommended method for managing table and index creation as well as structure updates.  It works great.  Most of the time.

The problem is that dbDelta() has a glitch that creates a new index every time the Store Locator Plus version changes. It should look for the store, lat, and long indexes and create them only if they are missing. Instead, if it finds an existing index it creates a new one and adds a numeric suffix. After a number of upgrades to SLP you end up with a database that has multiple indexes like this:

  • sl_store
  • sl_store_1
  • sl_store_2
  • sl_latitude
  • sl_latitude_1
  • sl_latitude_2
  • sl_longitude
  • sl_longitude_1
  • sl_longitude_2

Those _# indexes are IDENTICAL to the non-suffixed version.

This is bad.   Any time a record is updated or added in the locations table it is MUCH slower as every index needs to be updated.  After just 3 updates MySQL is performing 9 index updates on every record operation versus 3.

It can also confuse and SLOW DOWN record searches, as the MySQL engine will take longer to determine which is the best index to use since it has to load the entire index directory. I assume MySQL is smart enough to stop after the "first perfect match", but that is purely conjecture. If that is true the search impact would be minimal. If MySQL is not that intelligent and it looks at ALL INDEXES before selecting the "best match" during a search, the impact here can be notable as well, though not nearly as dramatic as updating or inserting new data.

Do You Have A Problem?

Unfortunately there are only 2 ways to tell.

First – your WordPress site starts throwing errors (usually hidden in the error log) that come from MySQL and tell you your index limit of 64 entries has been exceeded. You shouldn't need to increase this limit; it is per-table, and there are very few (if any) cases where 64 indexes on a single table are helpful.

Second – use the MySQL command line tool or phpMyAdmin and look at the wp_store_locator table.   If it has _# indexes they need to go.
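
A quick check from the MySQL command line; duplicate indexes show up as extra rows in the Key_name column:

mysql> SHOW INDEXES FROM wp_store_locator;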

The Manual Fix

The way to fix installations by hand is to use the MySQL command line tool, or phpMyAdmin, and drop the _# indexes from the wp_store_locator table. Since Store Locator Plus is not the only plugin affected, you may want to check for any other tables that are created by plugins and see if they suffer from the same issue. Contact the plugin author before randomly deleting indexes.

From the MySQL command line you can execute the DROP INDEX command as often as necessary based on how many duplicate indexes appear:

mysql> DROP INDEX sl_store_2 ON wp_store_locator;
mysql> DROP INDEX sl_latitude_2 ON wp_store_locator;
mysql> DROP INDEX sl_longitude_2 ON wp_store_locator;

You should leave the indexes without the _# intact.

The Automated Fix

I am looking into ways to detect and auto-delete the secondary indexes in Store Locator Plus in the 3.9.X+ updates, however there is a danger of bloating the update/install process which I want to avoid.    I am also looking for ways to prevent the problem from happening in the first place and have been active with the reported bug to see if there is a way to get WordPress Core developers to patch the problem.

If a workaround is found it will be noted in the updates road map for SLP.


Posted on

Connect To Your Database With Emacs

The recent Emacs post made me think of a useful SQL trick I recently learned that I wanted to share. Emacs refers to different SQL engines as 'products', e.g. MySQL, SQLite, and so on. You can connect to an SQL database through Emacs, but you have to call a different command based on the product.

An easier way to do this, as of version 24, is to use the keyboard shortcut C-u C-c TAB when viewing an SQL document. This will prompt you first for the product, and then for any connection information. So if you are working with an SQLite database you can press C-u C-c TAB, type sqlite, and then the filename of the database; that will open an interactive window connected to the database. Or if you are using PostgreSQL then Emacs will ask for a username, port, etc. The supported products are:

  • db2
  • informix
  • ingres
  • interbase
  • linter
  • ms
  • mysql
  • oracle
  • postgres
  • solid
  • sqlite
  • sybase

I don’t even know what some of those are…

Anyways, sometimes I find this faster than opening a terminal if I need to run some test queries or copy output into a document. (This is very useful with SQLite’s .mode html command, which will print the results of queries in <table> form.)


ejmr
南無妙法蓮華經

This article was originally posted on “The List” by EJMR.

Posted on

SQL: Using IN / NOT IN Operators

Here is an explanation I recently shared with someone about how to use NOT IN to select a subset of data from a one-to-many relationship.   It is a simple example that covers the basic use of NOT IN (and IN) operators in SQL.  These operators can be fairly powerful for performing quick queries on complex data relationships where a programmatic solution may otherwise be required.   This is not always the best solution but it can be a useful shortcut when used appropriately.

Syntax in various SQL dialects can vary, so I am using a generic SQL dialect based on my PostgreSQL knowledge.

The Problem

You have 2 tables you are linking together.   I forget the names so I’ll call them Nurses and RolesOfNurses.    For the purpose of this explanation let’s assume you are linking them by an ID field for the Nurse.

Your Nurse data may look like this:

ID: 1   Name: Karen
ID: 2  Name: Sharon

And your RolesOfNurses data:

Nurse ID: 1  RoleID: 10
Nurse ID: 1  RoleID: 555
Nurse ID: 2  RoleID: 10

Normally with a one-to-many (1:M) relationship this RolesOfNurses table is just a middle-man, so there would be another table called "Roles" that may have the RoleID and a common name to make human readable output (like reports) possible. So role #10 may be "nurse" and the specialized role #555 might be "Can Give Meds".

In your select statement you want to find "all nurses that do not have the Can Give Meds role". In database-like talk you want "all nurses that have role #10 but not role #555". So your instinct is to join the two tables and filter by where role == 10 and role != 555. But that is wrong. The reason is that joining tables creates one unique row for every combination of data in the joined tables.

Breaking It Down : Select Nurses and Roles

Take it from the beginning.  Let’s just say “I want a list of all the roles nurses have”.  That is a simple join:

select * from nurses, rolesofnurses where nurses.id = rolesofnurses.nurseid;

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 1, Name: Karen, NurseID: 1, RoleID: 555
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Next: Filter The Data On Role # 10 Only

This data set is the basis of your advanced query, but before we get to the "dual and" let's start with a single where clause to filter our data. I've added the parentheses only to make the where clause groupings easier to read. The first where clause links the two tables together to give you the above data set. The 2nd where clause filters that data to just show nurses.

select * from nurses, rolesofnurses where (nurses.id = rolesofnurses.nurseid) and (rolesofnurses.roleid = 10);

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Then : Add The Second Role Filter

Now add your 2nd where clause so you can have “all nurses that don’t have the Can Give Meds role” filter.

select * from nurses, rolesofnurses where (nurses.id = rolesofnurses.nurseid) and (rolesofnurses.roleid = 10) and (rolesofnurses.roleid != 555);

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Notice that this is NOT the desired result!

Why? Look at the data.  All of your roles match BOTH of those filters.   The roles are all 10 AND they are NOT 555.  The first instinct is wrong. Personally I blame the fact that SQL, as powerful as it is, does not process English-based query syntax the same way that our brains do. Many people will make the same mistake. Sadly, the solution is a bit more complex to explain, but I’ll try to make it simple.

Fixing The Query: Step 1 – Build A List Of IDs

I’m going to start with the 2nd part of your filter, all the IDs of nurses that have role #555.  Note I said “IDs of nurses” here, that is important.  I don’t want the full set of nurse data like we have been doing up until now.  I simply want the list of ID numbers of nurses that have role #555.

select nurseid from rolesofnurses where roleid = 555;
NurseID: 1

This is the ONLY NurseID in our data set that has role # 555.

The Correct Query : Using NOT IN

Now let's go get the data we want by telling SQL "give me all the nurses that have role #10 but are NOT in the list of nurses with role #555":

select * from nurses, rolesofnurses where (nurses.id = rolesofnurses.nurseid) and (rolesofnurses.roleid = 10) and (nurses.id NOT IN (select nurseid from rolesofnurses where roleid = 555));

And we get the desired result:

ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Explaining NOT IN

The trick here is the <blah> NOT IN <SQL QUERY> syntax. Most (not all) SQL dialects support this, and it is an important advanced SQL construct. It allows you to filter data based on the results of another select statement. The trick with the NOT IN operator is to make the <SQL QUERY> part return a list of just the values you want to filter on. In our case the <SQL QUERY> returns the entire list of nurse IDs that have role #555. We use NOT IN to tell it to only accept the record if the Nurse ID is NOT IN the list of nurse IDs that have role #555, or in English "pick all nurses that do not have role #555". In our example it was a single record, but in a larger data set it could return an entire list of records.

Notes About IN/NOT IN Usage

For a few hundred records this construct is OK. At some point you need to start looking into temp tables and other methods, as the <blah> NOT IN <SQL QUERY> construct can consume a lot of memory on the server. You need to be intelligent about what kind of resources your server has and how large the potential data set is for the second part of the IN query, because most DB engines will load the entire list of ID #s into RAM. One join-based alternative is sketched below.
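
That alternative is an anti-join, sketched here against the same hypothetical tables; it lets the database filter without holding the full ID list in memory:

select nurses.*
from nurses
join rolesofnurses as r10 on r10.nurseid = nurses.id and r10.roleid = 10
left join rolesofnurses as r555 on r555.nurseid = nurses.id and r555.roleid = 555
where r555.nurseid is null;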

Another note: you can create complex constructs with IN the same way as with NOT IN, for example "give me a list of all nurses that have role #10 AND role #555".

select * from nurses, rolesofnurses where (nurses.id = rolesofnurses.nurseid) and (rolesofnurses.roleid = 10) and (nurses.id IN (select nurseid from rolesofnurses where roleid = 555));

BTW, I’ve used uppercase for emphasis only.   SQL is not case sensitive.

Posted on

MySQL – I Am a Dummy


Today I learned about an interesting command-line option for MySQL:
$ mysql --i-am-a-dummy
This is an alias for --safe-updates, which imposes three restrictions:

  1. Any UPDATE or DELETE that does not have a WHERE or LIMIT is rejected.
  2. Any SELECT without a LIMIT will be automatically limited to 1,000 rows in the result.
  3. Any join that would result in examining more than a million row combinations is rejected.

I thought it was funny at first, but the first restriction alone makes this useful for doing work on the command line on live servers. Ever intend to run something like DELETE FROM users WHERE id = … and forget that tiny little where-clause? Because I have, and all you want to do is hide under your desk, under the assumption that if no one can see you then you must be invisible.
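
The same guard can also be flipped on for a session through the underlying server variable, sql_safe_updates:

SET SESSION sql_safe_updates = 1;
DELETE FROM users;                    -- rejected: no WHERE or LIMIT clause
DELETE FROM users WHERE id = 42;      -- allowed, assuming id is an indexed column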

Posted on

Consumer Web Apps with Abundatrade

When Abundatrade decided to take their project to the next level they chose Cyber Sprocket Labs to help them get there.   They brought their existing website to us and asked us to help.  They were looking for a more fluid, more enjoyable web experience for their users.   They needed an updated easy-to-use web calculator & they needed it fast.

Cyber Sprocket did the job quickly & did it right.   More than a year after that first quick “help us fix our site” project we are still helping them push their technology further.   We are now not only helping them slowly morph their consumer services into something bigger & better, we are also helping them run their operations more efficiently by integrating the web technologies with their daily operations & services that run the business.

The main part of the consumer interface is a product valuation calculator.  Using a proprietary algorithm the system looks up a bar code on a product that a consumer enters and runs a series of data queries against a variety of popular online stores such as Amazon to determine a fair market valuation on the product.  The system then compares that to current stock levels within the Abundatrade warehouse and generates a real-time trade-in value for the consumer.

During our first year with Abundatrade we have helped them improve the user experience, refined the user interface, and given them some valuable ideas and insight into how technology can make their business more efficient.   We are looking forward to forging an even closer relationship with Abundatrade in the coming year in both a business sense… and a physical sense; Abundatrade is now our closest neighbor – literally just a few hundred yards away from our new Mount Pleasant home!

We are looking forward to another fun-filled and exciting year working with Abundatrade and helping them grow!

Thanks Kent & Clayton, we appreciate working with you!

Technical Overview

Services Provided

  • Web Application Programming
    • PayPal API
    • Amazon API
    • Monsoon API
    • 3rd Party Music DB API
  • Web Interface Design
  • Process Flow Analysis

Platform Details

Posted on

Powerful Data Insertion Features in MySQL

There are several powerful techniques for getting your data into a MySQL database. MySQL supports non-standard extensions to SQL, giving you more flexibility in certain situations. If you want to take full advantage of MySQL, keep these tips in mind.

The Problem

Say you need to shove some new data into a table. Naturally, you want to INSERT. But it so happens that if the data is already there, you want to UPDATE the existing data. You don't care whether it takes an INSERT or an UPDATE – you just want the new data in the table. You decide to do one of two things. You might try:

-- `key` is a reserved word in MySQL, so it needs backticks when used as a column name.
DELETE FROM yourtable WHERE `key`='yourkey';
INSERT INTO yourtable (`key`, data1, data2) VALUES ('yourkey', 1, 2);

This is simple and effective. You could also check to see if the record already exists, and if so, UPDATE. What you really wish you had is an “INSERT OR UPDATE” command.

Allow me to introduce two features specific to MySQL: REPLACE and ON DUPLICATE KEY UPDATE.
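
For reference, the examples below assume a table shaped roughly like this (a hypothetical definition; the unique key on `key` is what both features pivot on):

CREATE TABLE yourtable (
  `key` varchar(32) NOT NULL,
  data1 int NOT NULL,
  data2 int NOT NULL,
  PRIMARY KEY (`key`)
);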

The Solutions

REPLACE

The following query is equivalent to the DELETE / INSERT pair above.

REPLACE INTO yourtable (`key`, data1, data2) VALUES ('yourkey', 1, 2);

The MySQL manual states: “REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE  index, the old row is deleted before the new row is inserted.”

So it simply deletes any row that gets in the way if there is one, and then inserts the new data. This can be quite handy. Do not confuse it with the “Replace” function, however, which is part of standard SQL. The “Replace” function is for replacing parts of strings. REPLACE, as shown above, just blazes a trail, inserting right over existing data if it is there. I love it.

ON DUPLICATE KEY UPDATE

A more powerful but potentially slower method is ON DUPLICATE KEY UPDATE.  The following query would accomplish the same thing as the previous two examples:

INSERT INTO yourtable (`key`, data1, data2) VALUES ('yourkey', 1, 2)
ON DUPLICATE KEY UPDATE data1=VALUES(data1), data2=VALUES(data2);

This will detect the duplicate key and, instead of performing the insert, perform the given update, referencing the new values with the VALUES() function. This is much faster than going back to the calling application with that logic, and also faster than other SQL-only means to the same end. Essentially, every INSERT may potentially trigger a duplicate key constraint violation; the ON DUPLICATE KEY UPDATE query traps that error and tries the update instead of throwing it.

You can also do fancier stuff like the following, with “item_id” as a unique key:

INSERT INTO yourtable (item_id, quantity)
VALUES ('foo', 2), ('bar', 4), ('baz', 5)
ON DUPLICATE KEY UPDATE quantity=quantity+VALUES(quantity);

This will try to insert the values given. If it encounters a duplicate key constraint violation, it will update the row, incrementing the quantity by the given amount. This is very powerful and quite a nice feature.

Caveats

Be careful using these techniques on tables with multiple unique keys or compound keys. Remember that unique constraint violations may occur on any constrained column – not just on your primary key.

Also be careful with REPLACE if you are using InnoDB as your database engine. If you have any ON DELETE CASCADE relationships defined for the table you are working with, remember that REPLACE may delete records during its execution, in preparation for INSERTs. That DELETE will trigger a cascade, deleting from related tables. This is one downside of the REPLACE query as compared to the ON DUPLICATE KEY UPDATE query.

All in all, these are quite useful techniques.

Further reading can be found at MySQL’s REPLACE documentation and ON DUPLICATE KEY UPDATE documentation pages.

Posted on

Issues With MySQL

If I had my choice I'd choose PostgreSQL over MySQL every time. It is a stronger, more reliable, more robust database engine without question. So why do I even think about MySQL, ever? Because I have to. It is my job to know technology and, unfortunately for the industry at large, MySQL is the de facto standard for nearly every open source web application in existence. The very predominance of MySQL does not make it better, just a necessary evil that we have to deal with every day. Unfortunately the pure power of numbers often forces us to create new systems in MySQL just for the sake of simplicity on our client systems.

The Complaints List

So why do I dislike MySQL so much? There are a number of reasons. Granted, MySQL has done a good job in the past 2 years of finally putting decades-old technology into play. Things like stored procedures, though still quite a bit wonky in their early iterations, are no longer a "can't do it" mark against MySQL. Same goes for views.

However, there are still many nagging issues out there that get to me every time I dig into MySQL.  Here is my short list as of the v5.1 release:

  • Want full text search? Can’t have foreign keys!
    This is because the MyISAM engine supports full text search but NOT foreign keys.
  • Want foreign keys?  Can’t have full text search!
    Same thing as above, but in reverse. InnoDB supports foreign keys (hey, who needs 'em, it's only a relational DB we want) but alas does not provide any method for full text search.
  • Want foreign keys?  Can’t have static length records.
    Another give & take between MyISAM and InnoDB. MyISAM provides for static length records, which provide MUCH faster access times on large lookup sets. InnoDB does not have this. Unfortunately the heaviest use of lookup tables happens to be in relational databases, which by their definition require the foreign keys of InnoDB.
  • Default NOW()
    This staple of many database engines like Oracle and PostgreSQL allows you to set datetime fields to a default that sets the time automatically when a record is updated or inserted. Not available in MySQL. Yes, you can use TIMESTAMP, but you are only allowed ONE default CURRENT_TIMESTAMP per table. So much for created/modified date + time fields.
  • Default <any function here>
    Expanding on the above concept… you can't use ANY function as a default value. Yeah, it's difficult to code in a DB engine, but not impossible. Nearly all the other engines allow for this. That means no default USER() either.
  • Stored Procedure Delimiters
    While stored procedures are a welcome addition, why do we need to redefine the standard end-of-command delimiter when creating the functions? Every other DB engine is smart enough to know you are in a stored procedure block and allow for nested semi-colons. Not MySQL; you need to change your delimiter to something funky that you'll never use, like maybe //, and then remember to change it back when you're done or all hell breaks loose (see the sketch after this list).
  • Default Engine = MyISAM
    Yes, it is faster. Yes, it has full text search. BUT it is less stable than InnoDB AND it does not support foreign keys. A properly designed database application should really be using foreign keys. It ensures data integrity at the database vs. application level and, until the advent of stored procedures in MySQL, was a good way to get deletes and updates to cascade. Yes, you can change the default, but shouldn't a good database be relational and inherently more stable by default? How about making speed, instability, and lack of relations secondary? Even more fun is the fact that if you did create a table with the default MyISAM engine and then need to refer to it via a foreign key in InnoDB, you need to convert the entire table. Hope there's not much data in there!
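
To illustrate the delimiter complaint above, here is roughly what defining even a trivial stored procedure looks like (hypothetical procedure and table names):

DELIMITER //
CREATE PROCEDURE count_users()
BEGIN
  SELECT COUNT(*) FROM users;  -- the nested semi-colon is why the delimiter must change
END //
DELIMITER ;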

Summary

While there are some good features behind MySQL, most notably the strong online community and user documentation, for true heavy lifting I’ll always argue that PostgreSQL can do the same thing better 99% of the time. Until I can convince enough people to make the conversion, I’ll keep adding to my “dislike list” for MySQL.

Posted on

Custom Site & Store Builder with Energy Inc.

The Energy Detective (TED) is a consumer-based product that helps home users track their energy usage on a per-device or whole-household level. When Energy Inc., the makers of TED, needed to upgrade their site with an easy-to-update content management system (CMS) and the addition of a custom storefront, they came to Cyber Sprocket Labs.

Within months we had ported their old static-page driven site to our new custom site builder. They could now easily update their own content without getting developers involved, and better yet – the system protected them from inadvertently breaking their site design. The staff at Energy Inc. soon became experts at the system and added new content as well as new product models to the site.

The site also started with a simple storefront module. It allowed Energy Inc. to upload new products and track inventory levels to ensure customers knew when an item was put on backorder. The new storefront module allowed Energy Inc. to easily show and sell their wares while automating part of the order process on the back end.

Soon the orders started to roll in and Energy Inc. needed more sophisticated order tracking and management. Updates were made to add automated interfaces with FedEx for real-time shipping quotes anywhere in the US and its territories. New order search and tracking features were added so that Energy Inc. knew what shipped, what was backordered, and what was being returned under their return merchandise authorization policy.

Energy Inc’s TED product was doing well, and the media started to notice. So did Google. As one of the first partners in Google’s new energy management program, Energy Inc. realized that their shared Linux server was not going to be able to handle the new influx of traffic. Luckily, Cyber Sprocket Labs had already been working on the Amazon Web Services cloud for more than 18 months. We knew our way around the system and helped Energy Inc. navigate the maze of cloud computing and served as a guide to the new platform. Energy Inc. decided to make the move to the nearly infinite scalability and on-demand compute environment of cloud computing.

Cyber Sprocket Labs helped migrate Energy Inc. over to the Amazon Cloud in less than a week, with no downtime, while at the same time providing a significant boost in processing power… just in time for Google's big announcement.

Congratulations on your success, Dolph! Glad we could be there to help get your web services off the ground!

Technical Overview

Services Provided

  • Custom website builder
  • Custom shopping cart
  • Custom order processing and management system
  • Porting to Amazon Web Services Cloud

Platform Details

Posted on

MySQL Introduction

MySQL is a common database for web based applications, primarily because it is free and there is a lot of community support for the database. The most recent version of MySQL (v5 as of this writing) has even added some “big boy” features to help create robust database applications.

MySQL Versions

The current release of MySQL is 5.

Notice from MySQL AB…

End of Product Lifecycle

Active development and support for MySQL database server versions 3.23, 4.0, and 4.1 has ended. However, for MySQL 4.0 and 4.1, there is still extended support available. For details, see http://www.mysql.com/company/legal/lifecycle/#calendar. According to the MySQL Lifecycle Policy (see http://www.mysql.com/company/legal/lifecycle/#policy), only Security and Severity Level 1 issues will still be fixed for MySQL 4.0 and 4.1. Please consider upgrading to a recent version (MySQL 5.0 or 5.1).

Linux Command Line Commands

Some basic MySQL commands you can execute from the Linux command line are shown below.

The # shown in the lines below represent your Linux prompt when logged in via SSH or Telnet. You do NOT type the #, just the command after it.

The lines starting with // are Cyber Sprocket comments for your reading pleasure.

What version of MySQL am I using?

// Show your current MySQL install version
# mysql -V
mysql  Ver 14.7 Distrib 4.1.22, for pc-linux-gnu (i686) using readline 4.3

In the result shown above we are using MySQL 4.1 with release version 14.7. When reviewing available features, the DISTRIBUTION (commonly referred to as the "version", I know – very confusing) is what determines the level of MySQL you are using. Most developers want to know the base level, which is typically 3, 4, or 5. In this case we are using MySQL 4 (the first number after Distrib).

Dump my data schema…

mysqldump -p -n -d --single-transaction -u cybersprocket --databases cybersprocket_db > schema_2007-12-17.sql

In the example above we connect as user "cybersprocket" to the cybersprocket_db database and dump all the schemas for the tables and indexes into a file named schema_2007-12-17.sql. This file can then be loaded into another MySQL database to recreate the tables, as shown after the flag list below.

The flags:

  • -p = ask for the password for the user specified
  • -n = don’t output the create database command
  • -d = don’t dump any of the data (dump the schema only)
  • -u <username> = log in as this user
  • --databases <dbname> = extract from this database
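
To recreate the tables elsewhere, feed the dump file back into mysql (the target database must already exist):

// Load the saved schema into another MySQL database
# mysql -p -u cybersprocket cybersprocket_db < schema_2007-12-17.sql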

Links