Posted on

Powerful Data Insertion Features in MySQL

There are several powerful techniques for getting your data into a MySQL database. MySQL supports non-standard extensions to SQL, giving you more flexibility in certain situations. If you want to take full advantage of MySQL, keep these tips in mind.

The Problem

Say you need to shove some new data into a table.  Naturally, you want to INSERT.  But it so happens that if the data is already there, you want to UPDATE the existing data. You don’t care if you have to INSERT or UPDATE – you just want the new data in the table. You decide do one of two things. You might try

DELETE FROM yourtable WHERE key='yourkey';
INSERT INTO yourtable (key, data1, data2) VALUES ('yourkey', 1, 2);

This is simple and effective. You could also check to see if the record already exists, and if so, UPDATE. What you really wish you had is an “INSERT OR UPDATE” command.

Allow me to introduce two features specific to MySQL. “REPLACE”, and “ON DUPLICATE KEY UPDATE”.

The Solutions

REPLACE

The following query is equivalent to the DELETE / INSERT pair above.

REPLACE INTO yourtable (key, data1, data2) values ('yourkey', 1, 2);

The MySQL manual states: “REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE  index, the old row is deleted before the new row is inserted.”

So it simply deletes any row that gets in the way if there is one, and then inserts the new data. This can be quite handy. Do not confuse it with the “Replace” function, however, which is part of standard SQL. The “Replace” function is for replacing parts of strings. REPLACE, as shown above, just blazes a trail, inserting right over existing data if it is there. I love it.

ON DUPLICATE KEY UPDATE

A more powerful but potentially slower method is ON DUPLICATE KEY UPDATE.  The following query would accomplish the same thing as the previous two examples:

INSERT INTO yourtable (key, data1, data2) values ('yourkey', 1, 2)
ON DUPLICATE KEY UPDATE data1=VALUES(data1), data2=VALUES(data2);

This will detect the duplicate key, and instead of performing the insert, it will perform the given update, referencing the values using the VALUES function. This is much faster than going back to the calling application with that logic, and also faster than other SQL-only means to the same end. Essentially, every INSERT may potentially trigger a duplicate key constraint violation. The ON DUPLICATE KEY UPDATE query essentially traps that error, and tries the update instead of throwing the error.

You can also do fancier stuff like the following, with “item_id” as a unique key:

INSERT INTO yourtable (item_id, quantity)
VALUES ('foo', 2), ('bar', 4), ('baz', 5)
ON DUPLICATE KEY UPDATE quantity=quantity+VALUES(quantity);

This will try to insert the values given. If it encounters a duplicate key constraint violation, it will update the row, incrementing the quantity by the given amount. This is very powerful and quite a nice feature.

Caveats

Be careful using these techniques on tables with multiple unique keys or compound keys. Remember that unique constraint violations may occur on any constrained column – not just on your primary key.

Also be careful with REPLACE if you are using InnoDB as your database engine. If you have any ON DELETE CASCADE relationships defined for the table you are working with, remember that REPLACE may delete records during its execution, in preparation for INSERTs. That DELETE will trigger a cascade, deleting from related tables. This is one downside of the REPLACE query as compared to the ON DUPLICATE KEY UPDATE query.

All in all, these are quite useful techniques.

Further reading can be found at MySQL’s REPLACE documentation and ON DUPLICATE KEY UPDATE documentation pages.

Posted on

On Delete Cascade

ON DELETE CASCADE

In a relational database where would you use the following construct? What does it do?

 … ON DELETE CASCADE

Answer

ON DELETE CASCADE is used when creating a table that has a field with a FOREIGN KEY. The purpose it to ensure that when a row in the parent table is deleted all of the related rows in the CHILD table are deleted.

Example

 CREATE TABLE cities (
       city     varchar(80) primary key
 );

 CREATE TABLE weather_history (
       city      varchar(80) references cities(city) ON DELETE CASCADE,
       temp_lo   int,
       temp_hi   int,
       date      date
  );

The parent table is cities, the child is weather_history. We normally load it up so that we can get reports like this:

 New Orleans
    Jan 1 , 60f 70f
    Jan 2 , 61f 74f
 Charleston
    Jan 1 , 35f 50f
    Jan 2 , 45f 65f

If New Orleans gets washed into the ocean due to a hurricane, we can now get rid of it like this:

 DELETE FROM cities where city='New Orleans';

The ON DELETE CASCADE ensures all the related records in weather_history are removed, preventing orphans (a bunch of records in weather history that have no city attached to them)