Posted on

WordPress dbDelta Better In 4.2, Not Yet Perfect

WP dbDelta Bug Report Banner

There have been a LOT of changes in WordPress Core including updates to the dbDelta() method in the upgrade function set.   What is dbDelta()?  dbDelta is the primary function used by WordPress to determine if the structure of data tables needs to be augmented between releases.   It works great for upgrading core WordPress data tables.    Where it has problems is with developer-specific SQL statements used to create secondary tables in plugins and themes.    Store Locator Plus, for example, uses a secondary table to storing basic location data.

Unfortunately dbDelta has always had issues (see running commentary on Trac ticket 10404), especially when it comes to indices on a MySQL table.   Most well-designed data tables will have primary and secondary indexes on the table to improve query performance.    dbDelta has a hard time figuring out when an index already exists and has not been modified.   If you did not craft your CREATE TABLE command EXACTLY the way dbDelta wants to see it, the dbDelta method will create new indexes.    This can cause performance degradation and unnecessary overhead in the MySQL tables when a data column ends up with 10, 20 , 30+ copies of the index; one created every time the plugin or theme is updated.

Many plugins and themes suffer from this malady.   Over two years ago I dug into dbDelta and learned the syntax secrets to get WordPress to stop creating duplicate indexes.   Today I found a site that was back to its old tricks running the latest version of Store Locator Plus and WordPress 4.2.2.   As it turns out the rules for dbDelta have changed and thus the syntax for creating a “dbDelta friendly” SQL statement has changed as well.    With dbDelta a behind-the-scenes feature it is never broadcast to the general public that it has changed and thus goes undetected for a while when something goes awry.

Turns out the changes are not new.  They go back to at least WordPress 4.1.    The last time I dug into this issue was WordPress 3.8.   Guess it is time for some phpUnit testing on dbDelta within my own plugin suite testing.   If things are running properly the dbDelta command for an existing installation of Store Locator Plus 4.2 should yield NO QUERY UPDATES when activating/installing a newer version of the plugin.

WP 4.1 dbDelta Results

Source SQL:

dbDelta SQL: CREATE TABLE wp_store_locator ( sl_id mediumint(8) unsigned NOT NULL auto_increment, sl_store varchar(255) NULL, sl_address varchar(255) NULL, sl_address2 varchar(255) NULL, sl_city varchar(255) NULL, sl_state varchar(255) NULL, sl_zip varchar(255) NULL, sl_country varchar(255) NULL, sl_latitude varchar(255) NULL, sl_longitude varchar(255) NULL, sl_tags mediumtext NULL, sl_description text NULL, sl_email varchar(255) NULL, sl_url varchar(255) NULL, sl_hours varchar(255) NULL, sl_phone varchar(255) NULL, sl_fax varchar(255) NULL, sl_image varchar(255) NULL, sl_private varchar(1) NULL, sl_neat_title varchar(255) NULL, sl_linked_postid int NULL, sl_pages_url varchar(255) NULL, sl_pages_on varchar(1) NULL, sl_option_value longtext NULL, sl_lastupdated timestamp NOT NULL default current_timestamp, PRIMARY KEY (sl_id), KEY sl_store (sl_store), KEY sl_longitude (sl_longitude), KEY sl_latitude (sl_latitude) ) DEFAULT CHARACTER SET utf8

dbDelta Updates List:

QUERY: ALTER TABLE wp_store_locator CHANGE COLUMN sl_linked_postid sl_linked_postid int NULL
QUERY: ALTER TABLE wp_store_locator CHANGE COLUMN sl_lastupdated sl_lastupdated timestamp NOT NULL default current_timestamp
QUERY: ALTER TABLE wp_store_locator ADD PRIMARY KEY (sl_id)
QUERY: ALTER TABLE wp_store_locator ADD KEY sl_store (sl_store)
QUERY: ALTER TABLE wp_store_locator ADD KEY sl_longitude (sl_longitude)
QUERY: ALTER TABLE wp_store_locator ADD KEY sl_latitude (sl_latitude)
for update

Update Results:

[wp_store_locator.sl_linked_postid] => Changed type of wp_store_locator.sl_linked_postid from int(11) to int
[wp_store_locator.sl_lastupdated] => Changed type of wp_store_locator.sl_lastupdated from timestamp to
[0] => Added index wp_store_locator PRIMARY KEY (sl_id)
[1] => Added index wp_store_locator KEY sl_store (sl_store)
[2] => Added index wp_store_locator KEY sl_longitude (sl_longitude)
[3] => Added index wp_store_locator KEY sl_latitude (sl_latitude)

My WP 4.2 dbDelta Results

Better, but still not quite right.

Source SQL:

CREATE TABLE wp_store_locator ( sl_id mediumint(8) unsigned NOT NULL auto_increment, sl_store varchar(255) NULL, sl_address varchar(255) NULL, sl_address2 varchar(255) NULL, sl_city varchar(255) NULL, sl_state varchar(255) NULL, sl_zip varchar(255) NULL, sl_country varchar(255) NULL, sl_latitude varchar(255) NULL, sl_longitude varchar(255) NULL, sl_tags mediumtext NULL, sl_description text NULL, sl_email varchar(255) NULL, sl_url varchar(255) NULL, sl_hours varchar(255) NULL, sl_phone varchar(255) NULL, sl_fax varchar(255) NULL, sl_image varchar(255) NULL, sl_private varchar(1) NULL, sl_neat_title varchar(255) NULL, sl_linked_postid int NULL, sl_pages_url varchar(255) NULL, sl_pages_on varchar(1) NULL, sl_option_value longtext NULL, sl_lastupdated timestamp NOT NULL default current_timestamp, PRIMARY KEY (sl_id), KEY sl_store (sl_store), KEY sl_longitude (sl_longitude), KEY sl_latitude (sl_latitude) ) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci

dbDelta Updates List:

QUERY: ALTER TABLE wp_store_locator CHANGE COLUMN sl_linked_postid sl_linked_postid int NULL
QUERY: ALTER TABLE wp_store_locator CHANGE COLUMN sl_lastupdated sl_lastupdated timestamp NOT NULL default current_timestamp
QUERY: ALTER TABLE wp_store_locator ADD PRIMARY KEY (sl_id)

Update Results:

[wp_store_locator.sl_linked_postid] => Changed type of wp_store_locator.sl_linked_postid from int(11) to int
[wp_store_locator.sl_lastupdated] => Changed type of wp_store_locator.sl_lastupdated from timestamp to
[0] => Added index wp_store_locator PRIMARY KEY (sl_id)

WP 4.2 dbDelta SQL Best Practices

Some of these carry over from the prior dbDelta comments, some are new with later versions of dbDelta and MySQL.   The code is fairly robust, but if your MySQL installation comes back with a different format for the describe command this can wreak havoc on dbDelta().     I’ve presented things to look out for when crafting your dbDelta SQL statements as well as possible improvements in WP Core for dbDelta.    I may submit patches once again, but it has proven nearly impossible to get patches reviewed and approved for dbDelta.   Makes sense given the HUGE impact this function has on the WordPress Core engine, but there are thousands of plugins and themes wreaking havoc on WordPress performance due to some of these issues.    Maybe a detailed review like this will help get some updates into core.

Table Creation / Modification Commands

dbDelta() uses the MySQL DESCRIBE <table> command to fetch data about the table name passed in to dbDelta().  It is parsed with a regular expression out of the SQL statement looking for CREATE TABLE <…> , CREATE DATABASE <…> , INSERT INTO <..>, and UPDATE <…>  where the first “word” in the <…> part is assumed to be the table name.

Notes

  • The process appears to be case sensitive.  Use the commands as noted above in all caps.
  • Make sure you have no extra spaces, newlines, or other whitespace in between CREATE and TABLE or the other two-word directives noted here.
  • Make sure your table name has a leading space before and trailing space after.

Possible Improvements

  • Consider make thing preg_match() case insensitive.
  • Consider changing the single-space in commands like INSERT INTO into multi-whitespace regex i.e. INSERT\s+INTO

Defining Column Types

On some servers, such as my CentOS 7 development box, MySQL Ver 15.1 Distrib 5.5.41-MariaDB, has decided that post types of INT really mean INT(11).   Different variations may post different results back from the DESCRIBE command, but here is what I find working on MOST servers including my Linux dev box:

  • Always specify the LENGTH for each field type that accepts a length.  For most fields use the default length.
    • int(11)
    • mediumint(8)
    • varchar(1) to varchar(255), 255 is the default.
  • Fields without lengths:
    • longtext
    • mediumtext
    • text
  • Do not put spaces between the field type and the parens or the number in parens for the length.
  • User lowercase for field name and the field type.
  • Use uppercase for the NULL and NOT NULL specification.
  • Do NOT have extra spaces between the field name, the field type, and any of the additional definition parameters.

Examples

global $wpdb;
$sql =
'CREATE TABLE my_table (
my_id mediumint(8) unsigned NOT NULL auto_increment,
my_first_field varchar(1) NULL,
my_second_field varchar(255) NULL,
my_third_field int(11) NULL,
my_lastupdate timestamp NOT NULL default CURRENT_TIMESTAMP
)
DEFAULT CHARACTER SET ' . $wpdb->charset;

Things the cause problems in my setup:

  • Two spaces after the field name my_lastupdate and the word timestamp.
  • Using int instead of int(11) on my_third_field.
  • Using lowercase CURRENT_TIMESTAMP on my_lastupdate.

Possible Improvements

  • Consider make thing preg_match() case insensitive.
  • Make the comparison on the default case insensitive, or shift the inbound query uppercase.
  • Consider changing the single-space in commands into multi-whitespace regex tests.

 

Defining Keys (Indices)

The key comparison system is just as strict with whitespaces and case sensitivity as the column comparison routine in dbDelta.   You MUST have TWO spaces after the keywords “PRIMARY KEY” before defining your index field but only ONE SPACE after all other keys.

  •  ALWAYS declare the column names for every key type, including primary key.
  • ALWAYS put your column names inside parenthesis with no spaces.
  • PRIMARY KEY should NEVER have a key name.
  • PRIMARY KEY is always followed by two spaces then the parenthesis then the column name and closing parenthesis.
  • KEY is always followed by a single space, the column name, then a single space then parenthesis around the column name.
  • They keyword PRIMARY KEY must be uppercase.

Examples

global $wpdb;
$sql =
'CREATE TABLE my_table (
my_id mediumint(8) unsigned NOT NULL auto_increment,
my_first_field varchar(1) NULL,
my_second_field varchar(255) NULL,
my_third_field int(11) NULL,
my_lastupdate timestamp NOT NULL default CURRENT_TIMESTAMP,
PRIMARY KEY  (my_id),
KEY third_key (my_third_field)
)
DEFAULT CHARACTER SET ' . $wpdb->charset;

Things the cause problems in my setup:

  • Missing the second space after the PRIMARY KEY definition.

Possible Improvements

  • Consider changing the single-space in commands into multi-whitespace regex tests.
  • Make the indices incoming text for the PRIMARY and KEY keywords force to uppercase.
Posted on

Connect To Your Database With Emacs

The recent Emacs post made me think of a useful SQL trick I recently learned that I wanted to share. Emacs refers to different SQL engines as ‘products’, e.g. MySQL, SQLite, and so on. You can connect to an SQL database through Emacs but you have a call a different command based on the product.

An easier way to do this, as of version 24, is to use the keyboard shortcut C-u C-c TAB when viewing an SQL document. This will prompt you first for the product, and then for any connection information. So if you are working with an SQLite database you can press C-u C-c TAB, type sqlite, and then the filename of the database; that will open an interactive window connected to the database. Or if you are using PostgreSQL then Emacs will ask for a username, port, etc. The supported products are:

  • db2
  • informix
  • ingres
  • interbase
  • linter
  • ms
  • mysql
  • oracle
  • postgres
  • solid
  • sqlite
  • sybase

I don’t even know what some of those are…

Anyways, sometimes I find this faster than opening a terminal if I need to run some test queries or copy output into a document. (This is very useful with SQLite’s .mode html command, which will print the results of queries in <table> form.)


ejmr
南無妙法蓮華經

This article was originally posted on “The List” by EJMR.

Posted on

SQL: Using IN / NOT IN Operators

Here is an explanation I recently shared with someone about how to use NOT IN to select a subset of data from a one-to-many relationship.   It is a simple example that covers the basic use of NOT IN (and IN) operators in SQL.  These operators can be fairly powerful for performing quick queries on complex data relationships where a programmatic solution may otherwise be required.   This is not always the best solution but it can be a useful shortcut when used appropriately.

Syntax in various SQL dialects can vary, so I am using a generic SQL dialect based on my PostgreSQL knowledge.

The Problem

You have 2 tables you are linking together.   I forget the names so I’ll call them Nurses and RolesOfNurses.    For the purpose of this explanation let’s assume you are linking them by an ID field for the Nurse.

Your Nurse data may look like this:

ID: 1   Name: Karen
ID: 2  Name: Sharon

And your RolesOfNurses data:

Nurse ID: 1 RoleID: 10
Nurse ID: 1 RoleID: 555 
NurseID: 2 RoleID: 10

Normally with a one-to-many (1:M) relationship this RolesOfNurses table is just a middle-man so there would be another table called “Roles” that may have the RoleID and an common name to make human readable output (like reports) possible. So role #10 may be “nurse” and the specialized role #555 might be “Can Give Meds”.

In your select statement you want to find “All nurses that do not have the Can Give Meds” role. In database-like talk you want “All nurses that have role #10 but not role #555”. So your instinct is to join the two tables and filter by where role == 10 and role != 555. But that is wrong. The reason is that joining tables creates one unique row for every combination of data in the joined tables.

Breaking It Down : Select Nurses and Roles

Take it from the beginning.  Let’s just say “I want a list of all the roles nurses have”.  That is a simple join:

select * from nurses where nurse.id = rolesofnurses.nurseid;

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 1, Name: Karen, NurseID: 1, RoleID: 555
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Next: Filter The Data On Role # 10 Only

This data set is the basis of your advanced query, but before we get to the “dual and” let’s start with a single where clause to filter our data.  I’ve added the parenthesis only to make the where clause groupings easier to read.  The first where clause links the two tables together to give you the above data set.  The 2nd where clause filters that data to just show nurses.

select * from nurses where (nurse.id = rolesofnurses.nurseid) and (roleofnurses.roleid = 10);

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Then : Add The Second Role Filter

Now add your 2nd where clause so you can have “all nurses that don’t have the Can Give Meds role” filter.

select * from nurses where (nurse.id = rolesofnurses.nurseid) and (roleofnurses.roleid = 10) and (roleofnurses.roleid != 555);

You get this:

ID: 1, Name: Karen, NurseID: 1, RoleID: 10
ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Notice that this is NOT the desired result!

Why? Look at the data.  All of your roles match BOTH of those filters.   The roles are all 10 AND they are NOT 555.  The first instinct is wrong. Personally I blame the fact that SQL, as powerful as it is, does not process English-based query syntax the same way that our brains do. Many people will make the same mistake. Sadly, the solution is a bit more complex to explain, but I’ll try to make it simple.

Fixing The Query: Step 1 – Build A List Of IDs

I’m going to start with the 2nd part of your filter, all the IDs of nurses that have role #555.  Note I said “IDs of nurses” here, that is important.  I don’t want the full set of nurse data like we have been doing up until now.  I simply want the list of ID numbers of nurses that have role #555.

select nurseid from rolesofnurses where roledid = 555;
NurseID: 1

This is the ONLY NurseID in our data set that has role # 555.

The Correct Query : Using NOT IN

Now let’s go and get our data we want by telling SQL “give me all the nurses that have role # 10 but are NOT in the list of nurses with role # 555”:

select * from nurses where (nurse.id = rolesofnurses.nurseid) and (roleofnurses.roleid = 10) and (nurse.id NOT IN (select nurseid from rolesofnurses where roledid = 555) );

And we get the desired result:

ID: 2, Name: Sharon, NurseID: 2, RoleID: 10

Explaining NOT IN

The trick here is the <blah> NOT IN <SQL QUERY> syntax.   Most (not all) SQL dialects support this, and it is an important advanced SQL construct. It allows you to filter data based on the results of another select statement. The trick with the NOT IN operator is to make the <SQL QUERY> part return a list that consists of a single elements you want to filter on. In our case the <SQL QUERY> returns the entire list of nurses that have role #555. We use NOT IN to tell it to only accept the record if the Nurse ID is NOT IN the list of nurse IDs that have role #555 or in English “pick all nurses that do not have role #555”. In our example it was a single record, but in a larger data set it could return an entire list of records.

Notes About IN/NOT IN Usage

For a few hundred records this construct is OK.  At some point you need to start looking into temp tables and other methods as using the <blah> NOT IN <SQL QUERY> construct can consume a lot of memory on the server. You need to be intelligent about what kind of resources your server has and how large the potential data set is for the second part of the IN query because most DB engines will load the entire list of ID #s into RAM.

Another note, you can create complex constructs with IN the same way as NOT IN, for example “give me a list of all nurses that have role #10 AND role #555”.

select * from nurses where (nurse.id = rolesofnurses.nurseid) and (roleofnurses.roleid = 10) and (nurse.id IN (select nurseid from rolesofnurses where roledid = 555) );

BTW, I’ve used uppercase for emphasis only.   SQL is not case sensitive.

Posted on

SQL Movement Commands

Most programming modes in Emacs support two commands, beginning-of-defun and end-of-defun.  These move you to the beginning and end of the nearest function, and by default are bound to C-M-a and C-M-e, respectively.

However, sql-mode does not have anything similar.  I have the above two commands bound to C-up and C-down for quickly moving around functions, but I find myself often hitting them when viewing an SQL file to move through tables—which doesn’t work.  Fortunately that functionality is pretty easy to add.

For my fellow Emacs users, here is some code I whipped up that you can use to easily move between tables and views.

(defconst sql-beginning-of-table-or-view-regexp
"^\\s-*create\\s-+\\(?:table\\|view\\)"
"Regular expression for matching the start of an SQL table or view definition.")

(defun sql-prev-table-or-view ()
"Move backwards to the beginning of the nearest table or view
from point."
(interactive)
(re-search-backward sql-beginning-of-table-or-view-regexp
nil 'noerror))

(defun sql-next-table-or-view ()
"Move forwards to the beginning of the nearest table or view from point."
(interactive)
(re-search-forward sql-beginning-of-table-or-view-regexp
nil 'noerror))

Then you can add a hook for sql-mode to bind those commands to keys whenever sql-mode is first loaded. I know you guys already know about using global-set-key, but you can also create key bindings specific for modes. Each mode in Emacs has an associated “mode map”, for example, sql-mode has the sql-mode-map, php-mode has the php-mode-map, et cetera. You can use the function define-key to create a binding for a specific mode map. Like I said, I have C-up and C-down globally bound to move functions, but I use define-key like this to make the keys move by tables/views is sql-mode:

(global-set-key (kbd "<C-up>") 'beginning-of-defun)
(global-set-key (kbd "<C-down>") 'end-of-defun)

(add-hook 'sql-mode-hook
(lambda ()
(define-key sql-mode-map (kbd "<C-up>")
'sql-prev-table-or-view)
(define-key sql-mode-map (kbd "<C-down>")
'sql-next-table-or-view)))

I put my global key settings in there just for comparison. The call to define-key is almost identical, except the first argument is the mode map.

While I’m on the subject, there are two similar commands for text-mode that have no default binding, which I find come in handy:

(add-hook 'text-mode-hook
(lambda ()
(define-key text-mode-map (kbd "<C-down>")
'forward-paragraph)
(define-key text-mode-map (kbd "<C-up>")
'backward-paragraph)))

So there’s some hopefully useful commands for SQL work, and for configuring key bindings in general.