MySQL multiple rows update performance experiment

12 Mai 2012 - tags: mysql performance

MySQL doesn't offer something specifically for the job of updating multiple rows with different values (which is what you want to do when you're updating thousands of rows - one by one just won't do it), so you have to improvise.

The available methods, that I know of are as follows:

Insert-Update

INSERT INTO `test` (`id`, `value`) VALUES (1, 123), (2, 456), (3, 789) ON DUPLICATE KEY UPDATE `value` = VALUES(`value`);

This will generate a duplicate key exception on the `id` field and will update the `value` column with the apropiate value due to VALUES(`value`).

Pros: It's very fast.
Cons: If you happen to delete a row between fetching the id's to start processing the data and doing the update, this statement will insert a most likely invalid row, since you won't insert all the data of the row back, but just the portion you intend to update. After all, if you deleted it, you don't want that row in there anyway. This problem could be fixed by a constrain, but that will just result in an error, so not a great solution.

Bottom Line: If you know you won't delete rows, this option is great.
 

Temporary Table Update (Temp-Update)

CREATE TEMPORARY TABLE `test_update` LIKE `test`;

INSERT INTO `test_update` (`id`, `value`) VALUES (1, 123), (2, 456), (3, 789);

UPDATE `test` JOIN `test_update` ON `test`.`id` = `test_update`.`id` SET `test`.`value` = `test_update`.`value`

DROP TEMPORARY TABLE `test_update`;    // only needed if you're going to do this several times on the same connection

Pros: It won't fail if you delete rows from the main table in the meantime.
Const: Slower than an Insert-Update
 

CASE Update

UPDATE `test` SET `value` = CASE `id`
WHEN 1 THEN 123
WHEN 2 THEN 456
WHEN 3 THEN 789
ELSE `value`                    ; This makes sure that if the case statement fails, the value won't be set to NULL
END
WHERE `id` IN (1, 2, 3)

Pros: It works well with deleted rows, it's faster, in some circumstances, than Temp-Update
Cons: It looks shitty, gets slower on bigger updates.

 

The Test

In order to test the performance of this methods, I did the following things, 50 times for each method, and computed the average time it takes them to complete:

I generated 1000 id, value pairs and updated them.
I generated 1000 id, value pairs and updated them, generating new ones for each update.
Did the same with 5000 rows.

So for each method tested, I updated 1000/5000 rows, 50 times and divided the total elapsed time to 50.
Then I generated 1000/5000 rows, 50 times, and updated them, dividing the totall elapsed time to 50.

I only took into account the time the queries took to execute, excluding the additional code's execution time.

The environment:
OS: Win7 x64
CPU: 6 cores @ 3.3Ghz
Memory: 16Gb, 12Gb dedicated to MySQL
Test Table: `id` - primary key, `value` - INTEGER, 9 milllion rows. `value` was set to 0 after each test.
 

The Results

The time is expressed in seconds.

  Same Rows Unique Rows
  1000 Rows 5000 Rows 1000 Rows 5000 Rows
Insert-Update 0.009552 0.049210 0.028019 0.088515
Temp-Update 0.172427 0.267719 0.192720 0.309200
CASE-Update 0.031331 0.994978 0.045903 1.010313

 

What I haven't tested is what happens when there are multiple values to update for the same `id`. I have a feeling CASE-Update is gonna fall short on that one, but can't tell how the other ones will behave.

Comments

Name:
Email:
Title:
Message:
5 chars: captcha reload