
Janitor 4.1.15 Makes Quick Work Of Resetting Locations


The Janitor add-on pack for Store Locator Plus adds a new Drop Locations feature. It makes quick work of deleting Store Locator Plus locations when a simple database reset is required. The new feature is far faster than the previous Clear Locations option, but Drop Locations does have its limits.

J 4.2.15 Clear or Drop Locations

Drop Locations

This new feature removes the main locations table and the extended meta data fields used by the *Extender add-on packs via the SQL DROP TABLE command. Both tables are immediately recreated. This feature does NOT remove the associated custom WordPress “store pages” data, NOR does it remove the Tagalong and custom taxonomy data. If you are not using Tagalong categories and are not using Store Pages, Drop Locations can remove tens or even hundreds of thousands of locations in less than a minute. By comparison, clearing a 200,000-location table can take as long as 2 hours using Clear Locations.
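At the SQL level the process is conceptually just a drop-and-recreate rather than a row-by-row delete. The sketch below is illustrative only; the table and field names are placeholders, not the plugin’s actual schema:

-- Illustrative sketch only: placeholder names, not the real Store Locator Plus schema.
DROP TABLE IF EXISTS wp_example_locations;
DROP TABLE IF EXISTS wp_example_locations_extended;

-- Recreate the table empty so the plugin can start fresh.
CREATE TABLE wp_example_locations (
sl_id bigint unsigned NOT NULL AUTO_INCREMENT,
sl_store varchar(255),
PRIMARY KEY (sl_id)
);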

Clear Locations

Clear Locations is a more thorough removal mechanism. This process leaves the data tables intact but deletes the records one at a time using the WordPress data interface, which invokes all of the hooks and filters. That means any third party or CSA-derived add-on packs will fire all of the “location cleanup” processes, including Tagalong data cleanup, Store Pages cleanup, and removal of any related meta data that may be hiding in WordPress options tables, taxonomy tables, custom page types, or custom data tables installed by add-on packs.

Janitor Change Log


WordPress dbDelta Too Many Keys Specified

dbDelta Ticket 10404

I just submitted a patch to WordPress Core for ticket #10404, which is related to the built-in dbDelta function and how it manages the creation of keys in a new data table. Since Store Locator Plus and the new Super Extendo add-on packs both use external tables with indices attached, the dbDelta function is used on a regular basis. Here are some things I learned about the dbDelta function and how it handles key creation that other plugin developers may find useful.

The dbDelta method uses TWO SETS of information to decide what to do with key management. First, it strips all of the key creation commands out of your CREATE TABLE statement into a “user provided SQL that builds indexes” array; this array contains every “KEY …” line from your SQL command. Second, it runs the MySQL SHOW INDEX command for the table being created and uses the data from those results to build a separate array of “metadata-derived index related SQL” statements.

As a side note, the way in which the CREATE TABLE SQL command is processed is fairly simplistic, and it is one reason why each of your field definitions and your KEY statements should be on separate lines with no extra whitespace. The process does nothing more than break up the string based on newline characters. If you have extra whitespace before or after your field or index definitions the algorithm will never match the existing data and will keep creating and re-creating your table structure.
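As an illustration, here is the general shape of a CREATE TABLE statement that plays nicely with that line-by-line parsing. The table and field names are placeholders, and the two spaces after PRIMARY KEY follow the separately documented dbDelta convention:

CREATE TABLE wp_myplugin_data (
id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
slid bigint(20) unsigned NOT NULL,
fldname varchar(250) NOT NULL default '',
PRIMARY KEY  (id),
KEY slid (slid),
KEY mycompoundkey (id,slid)
);

Every field and KEY definition sits on its own line with no leading or trailing whitespace, so dbDelta’s newline split sees exactly what SHOW INDEX will report back later.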

Once dbDelta has built both index lists it compares the two. If an index string built from the MySQL SHOW INDEX metadata matches EXACTLY the corresponding string in your CREATE TABLE command, all is well and that create-index command is removed from the user provided list of SQL commands. The dbDelta method then loops through and executes whatever create-index commands remain.

In simpler terms:

  • dbDelta builds a list of create index commands based on what is already in the WordPress database for your new data table.
  • dbDelta compares that list of create index commands to your create table command and deletes the entries that already exist.

In other words: “hey, we already have that index for that table, so we are going to skip it”.

However, as many have learned, the dbDelta function can appear to “go crazy” with index building and create the same index over and over. By default most servers running MySQL set a limit of 64 indices on a single table. Also, by design, the common practice when adding or modifying a table in WordPress is to hook the table creation to the plugin upgrade and/or activation methods. In the simplest setup that means your create table command would only be run ONCE PER UPGRADE or ACTIVATION of your plugin. If your plugin is only creating a single index, on an ID field for example, that means someone would need to upgrade or activate your plugin SIXTY-THREE TIMES before you started seeing any indication of trouble.

The problem is that the other 62 duplicate indices are significantly slowing down the website your plugin lives on whenever your data table gets a new record written to it or an existing record updated. MySQL incurs a lot of overhead and extra data I/O for each index on your table. That is the reason you don’t want to index every field, only the fields where there is a definitive performance benefit in doing so.

By the time you see the first “Too many keys specified” warnings your plugin has already been slowing down servers for quite some time. This is why I first rewrote the Store Locator Plus data indices nearly a year ago. It can be a notable burden on sites that routinely update data, and with 30,000+ installs accounting for MILLIONS of data records, that is a LOT of extra processing going on out there.
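If you suspect a table has been quietly collecting duplicates, you can ask MySQL directly. The table name below is a placeholder:

-- Lists every index on the table; duplicates of an unnamed KEY show up
-- with auto-generated names such as id_2, id_3, and so on.
SHOW INDEX FROM wp_myplugin_data;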

However, there are ways to prevent the duplicate indices from occurring: understand HOW dbDelta builds the comparison string and use the same syntax in the CREATE TABLE commands you pass to dbDelta. Here are the hints:

  • Always use an uppercase KEY directive. This is something that one of my patches to the dbDelta 10404 ticket resolves if accepted into core.
  • Always include an index name in lowercase. This, too, is resolved by one of my patches to the 10404 ticket if accepted into core.
  • Always include your field list in lowercase.
  • Do not add field lengths to the keys.
  • When creating compound keys always eliminate whitespace. That means no spaces inside the parens or around the commas.

Here are a few examples of the right and wrong syntax.  Doing it wrong will create a duplicate index on the data table every time your dbDelta command is run.  Hopefully that command is only run when upgrading/activating a plugin:


WRONG:
key ( id , slid )

RIGHT:
KEY mycompoundkey (id,slid)

 


WRONG:
KEY (id)

RIGHT:
KEY id (id)


WRONG:
KEY fldname (fldname(250))

RIGHT:
KEY fldname (fldname)

Make sure you do not have ANY extra whitespace in your commands. No space before the KEY directive. Only a single space before and after the index name. No spaces inside the parens or around the commas in compound field lists.

This should help keep extra index creation commands from executing on your tables.

If the patches I submitted to core are accepted, the case sensitivity issues should go away and you may be able to get away with the shorthand version where the index name and the key field are the same (using KEY (id) would be valid).


Turn Off dbDelta Describe Errors


I use the Debug Bar and Debug My Plugin plugins to assist in support and development of my WordPress plugins. I also make use of the built-in WordPress database management tools provided by wpdb and the upgrade module of WordPress. One of the nice features of the upgrade module is the dbDelta function. This function provides a simple mechanism for either modifying an existing table structure or creating a new table as needed. This is a Godsend for complex plugins with multiple data tables, like Store Locator Plus.

However, the dbDelta function has a little side effect that causes some plugins, like Debug Bar, to throw up error messages when a brand new data table is created. The problem is that the upgrade module runs dbDelta with errors suppressed ($wpdb->suppress_errors()), which only prevents the immediate printing of errors to the screen. The database errors are still pushed onto the global EZSQL_ERROR array. While I can see the utility of such a method, it does generate some error messages in the log files and in the Debug Bar interface if it is enabled.

Luckily there is an easy fix for this. Immediately after running the dbDelta() function you can either “pop the stack” of the EZSQL_ERROR array or, for lazy people like me, just set EZSQL_ERROR back to a blank array:


// Create or update the table; dbDelta() suppresses the screen output but
// still pushes the DESCRIBE errors onto the global EZSQL_ERROR array.
dbDelta( $createTableSQL );

// Discard the queued SQL errors so Debug Bar has nothing to report.
global $EZSQL_ERROR;
$EZSQL_ERROR = array();

This strips the error messages off the stack and bypasses Debug Bar’s reporting of anything that is in the SQL error stack. For new tables the benign “table doesn’t exist” error from the describe command will go away.

I do NOT recommend doing this on a regular basis.  I only use it on the production release of Tagalong to prevent the red box warning on Debug Bar that shows up every time the plugin is activated on a new install with Debug Bar running.

 


PostgreSQL – Exclusive Locking On Triggers

We have been working on a client’s project and dealing with an especially complex view in a PostgreSQL database. The view links a few tables together and includes several subqueries, a case statement, and a number of aggregate functions that build a single record of the view.

After updating the admin panel to allow them to use the database and admin page for processing orders (versus an email + Excel system) we ran into several issues. The first was a significant degradation in performance whenever the system was under moderate load. This is important to note, as our baseline release testing did not uncover the problem. The logic in the app was syntactically and functionally correct. However, the application misbehaved under load, causing a bottleneck and essentially shutting down the app.

The Table Trigger Loop

We implemented a series of update triggers and stored procedures to propagate certain data elements between tables. Part of the design allowed for two tables to update each other. The UI allowed an admin user to update Table A, which called a trigger to push a status field update into Table B (the legacy data). In other places Table B *could* be updated first, so we had a trigger to update Table A. The problem is that the triggers would loop: an update to Table A triggers an update in Table B, which triggers an update in Table A, and so on.

The initial design called for turning off the triggers on the other table from within the stored procedure. This requires an ALTER TABLE command, and ALTER TABLE forces an exclusive lock on the table. NOTHING else can happen during that command; it is one of the few (if not the only) truly exclusive locks in the locking hierarchy.
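The pattern looked roughly like the sketch below; the table, trigger, and column names are hypothetical stand-ins for the client’s schema:

-- Hypothetical sketch of the original, lock-heavy approach.
-- Each ALTER TABLE takes an ACCESS EXCLUSIVE lock on the table.
ALTER TABLE table_b DISABLE TRIGGER trg_b_updates_a;
UPDATE table_b SET status = 'shipped' WHERE order_id = 12345;
ALTER TABLE table_b ENABLE TRIGGER trg_b_updates_a;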

This really slows down bulk updates. “Lock entire table”… insert… “unlock entire table”.

Master/Slave Tables Is A Better Solution

A better solution was to take out the ALTER TABLE commands completely and remove the trigger where Table B updates Table A. We made an application logic decision to make Table A the master and Table B the slave, and to eliminate (and/or log) any case where Table B gets updated first. That should NEVER happen in the new app, but some legacy stuff may still do it.

This is much faster as it generates no exclusive DB locks.
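A minimal sketch of the one-direction trigger, with hypothetical table, column, and function names, looks something like this:

-- Table A (master) pushes status changes into Table B (slave).
-- There is deliberately no trigger in the other direction, so nothing
-- loops and no ALTER TABLE lock is ever needed.
CREATE OR REPLACE FUNCTION push_status_to_table_b() RETURNS trigger AS $$
BEGIN
    UPDATE table_b
       SET status = NEW.status
     WHERE order_id = NEW.order_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_a_updates_b
AFTER UPDATE OF status ON table_a
FOR EACH ROW
EXECUTE PROCEDURE push_status_to_table_b();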

Entire Application Assessment v. Technical Complexity

In this case, thinking about the overall solution, which includes the application design and implementation as well as the technical syntax at the application and DB layers, is what improved performance. Knowing the application and coming up with an overall design solution is often as effective as fancy technological solutions. Sometimes less complexity or technical “mojo” is the right answer to a complex problem.


PostgreSQL Cheat Sheet

PostgreSQL is one of our favorite database engines for a variety of reasons.  Here is our cheat sheet to help you get online and get around Postgres with minimal effort.

Database Access Security

Database security is handled primarily in two places: at the system service level via a file called pg_hba.conf, and within the database metadata itself. The pg_hba.conf file controls what level of credentials is needed based on the IP address the requesting connection is coming from. The metadata within the engine itself generally controls user-level access once a connection has been approved at the system level.

Systemwide Configuration via pg_hba.conf

This file matches the IP address against a set of rules to determine how much data you need to provide before getting access to the database engine. Each rule includes the IP address, the username trying to connect, and what type of validation is needed.

The data comes in a series of tab-separated columns, including:

  • Connection Type
    • local = from a local Unix-domain socket connection
    • host = from a network (TCP/IP) connection
  • Database = which database is the user trying to connect to?
  • User = which user they are connecting as.
  • IP Address = what address are they coming from?
  • Method = how shall we authenticate them?
    • md5 = let them in if the password matches
    • ident sameuser = let them in if their operating system login user matches the database user they are trying to connect as
    • trust = let them in as long as the ip address matches, no password required

Finding pg_hba.conf

The pg_hba.conf file lives in various places depending on the flavor of Linux.

  • Red Hat, CentOS, Fedora = /var/lib/pgsql/data/
  • Ubuntu = /etc/postgresql/<version>/main/

Command Line Access

You can do a lot of maintenance operations or test queries using the command line interpreter. The command line in PostgreSQL is accessed via the psql command. The most often used parameters with psql are to connect as a user other than your login user, prompt for your password, and give the name of the database to connect to.

Example:

# psql -U other_name -W other_db

Command Line Shortcuts

From the command line there are a variety of shortcuts to help you navigate around the database engine or see what is going on.  Here are a few of the most useful:

  • List Databases: \l
  • List (display) Tables: \dt
  • List Columns in a Table: \d <tablename>

Creating A Database

Here is how you create a new database that is owned by a specific user. This assumes a “clean slate” install. You will need the postgres user login credentials and/or root access. You will be creating a PostgreSQL user and password and will change the system-level postgres daemon security settings to allow access with that password regardless of which user you log in as.

# # login as postgres user or su postgres if you are root

# psql
psql> create user lance with password 'cleveland';
psql> create database myfirstdb with owner lance;
psql> \q
# vi /var/lib/pgsql/data/pg_hba.conf

While in pg_hba.conf change this line:

local   all         all                              ident

to this:

local   all         all                              md5

Backing Up / Dumping Your Data

Data dumps are a quick way to put the schema, data, or a combination of both out into a file that can be used to re-create the database on other systems or just back it up to a remote location. The PostgreSQL command for this is pg_dump. It takes the same parameters as the command line access.

Simple Data Dump

Example:

# pg_dump -U myname -W the_db_name > dump_thedb_2010_0704_001.sql

Clean Data Dump

This is the format to use if you want to ensure the entire database is dropped & re-created when loading on a new system.

Example:

# pg_dump --clean --create -U myname -W the_db_name > dump_thedb_2010_0704_001.sql

Reloading Dumped Data

To reload such a script into a (freshly created) database named the_db_name:

# psql -d the_db_name -f dump_thedb_2010_0704_001.sql

If the clean data dump method was used you will want to log in as postgres and let the SQL script create the database:

# su postgres

# psql -f dump_thedb_2010_0704_001.sql

Summary

There are plenty more tips, tricks, and shortcuts.  These will get you started.  We’ll add more as time allows.


Powerful Data Insertion Features in MySQL

There are several powerful techniques for getting your data into a MySQL database. MySQL supports non-standard extensions to SQL, giving you more flexibility in certain situations. If you want to take full advantage of MySQL, keep these tips in mind.

The Problem

Say you need to shove some new data into a table.  Naturally, you want to INSERT.  But it so happens that if the data is already there, you want to UPDATE the existing data. You don’t care if you have to INSERT or UPDATE – you just want the new data in the table. You decide to do one of two things. You might try

DELETE FROM yourtable WHERE `key`='yourkey';
INSERT INTO yourtable (`key`, data1, data2) VALUES ('yourkey', 1, 2);

This is simple and effective. You could also check to see if the record already exists, and if so, UPDATE. What you really wish you had is an “INSERT OR UPDATE” command.
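For the examples in this post, assume a hypothetical table along these lines; the column named key is backtick-quoted because KEY is a reserved word in MySQL:

-- Hypothetical table used by the examples in this post.
CREATE TABLE yourtable (
  `key` VARCHAR(32) NOT NULL,
  data1 INT,
  data2 INT,
  PRIMARY KEY (`key`)
);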

Allow me to introduce two features specific to MySQL: REPLACE and ON DUPLICATE KEY UPDATE.

The Solutions

REPLACE

The following query is equivalent to the DELETE / INSERT pair above.

REPLACE INTO yourtable (`key`, data1, data2) VALUES ('yourkey', 1, 2);

The MySQL manual states: “REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE  index, the old row is deleted before the new row is inserted.”

So it simply deletes any row that gets in the way if there is one, and then inserts the new data. This can be quite handy. Do not confuse it with the “Replace” function, however, which is part of standard SQL. The “Replace” function is for replacing parts of strings. REPLACE, as shown above, just blazes a trail, inserting right over existing data if it is there. I love it.

ON DUPLICATE KEY UPDATE

A more powerful but potentially slower method is ON DUPLICATE KEY UPDATE.  The following query would accomplish the same thing as the previous two examples:

INSERT INTO yourtable (`key`, data1, data2) VALUES ('yourkey', 1, 2)
ON DUPLICATE KEY UPDATE data1=VALUES(data1), data2=VALUES(data2);

This will detect the duplicate key and, instead of performing the insert, perform the given update, referencing the new values via the VALUES() function. This is much faster than going back to the calling application with that logic, and also faster than other SQL-only means to the same end. Every INSERT may potentially trigger a duplicate key constraint violation; ON DUPLICATE KEY UPDATE essentially traps that error and tries the update instead of throwing it.

You can also do fancier stuff like the following, with “item_id” as a unique key:

INSERT INTO yourtable (item_id, quantity)
VALUES ('foo', 2), ('bar', 4), ('baz', 5)
ON DUPLICATE KEY UPDATE quantity=quantity+VALUES(quantity);

This will try to insert the values given. If it encounters a duplicate key constraint violation, it will update the row, incrementing the quantity by the given amount. This is very powerful and quite a nice feature.

Caveats

Be careful using these techniques on tables with multiple unique keys or compound keys. Remember that unique constraint violations may occur on any constrained column – not just on your primary key.

Also be careful with REPLACE if you are using InnoDB as your database engine. If you have any ON DELETE CASCADE relationships defined for the table you are working with, remember that REPLACE may delete records during its execution, in preparation for INSERTs. That DELETE will trigger a cascade, deleting from related tables. This is one downside of the REPLACE query as compared to the ON DUPLICATE KEY UPDATE query.
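To make that caveat concrete, here is a small sketch with hypothetical tables where a REPLACE on the parent silently wipes the related child row:

CREATE TABLE parent (
  id INT NOT NULL PRIMARY KEY,
  name VARCHAR(50)
) ENGINE=InnoDB;

CREATE TABLE child (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  parent_id INT NOT NULL,
  note VARCHAR(50),
  FOREIGN KEY (parent_id) REFERENCES parent (id) ON DELETE CASCADE
) ENGINE=InnoDB;

INSERT INTO parent VALUES (1, 'foo');
INSERT INTO child (parent_id, note) VALUES (1, 'keep me');

-- REPLACE deletes parent row 1 before inserting the new one,
-- so the cascade removes the child row as well.
REPLACE INTO parent VALUES (1, 'foo renamed');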

All in all, these are quite useful techniques.

Further reading can be found at MySQL’s REPLACE documentation and ON DUPLICATE KEY UPDATE documentation pages.