MySQL doesn't offer something specifically for the job of updating multiple rows with different values (which is what you want to do when you're updating thousands of rows - one by one just won't do it), so you have to improvise.
The available methods, that I know of are as follows:
Insert-Update
INSERT INTO `test` (`id`, `value`) VALUES (1, 123), (2, 456), (3, 789) ON DUPLICATE KEY UPDATE `value` = VALUES(`value`);
This will generate a duplicate key exception on the `id` field and will update the `value` column with the apropiate value due to VALUES(`value`).
Pros: It's very fast.
Cons: If you happen to delete a row between fetching the id's to start processing the data and doing the update, this statement will insert a most likely invalid row, since you won't insert all the data of the row back, but just the portion you intend to update. After all, if you deleted it, you don't want that row in there anyway. This problem could be fixed by a constrain, but that will just result in an error, so not a great solution.
Bottom Line: If you know you won't delete rows, this option is great.
Temporary Table Update (Temp-Update)
CREATE TEMPORARY TABLE `test_update` LIKE `test`;
INSERT INTO `test_update` (`id`, `value`) VALUES (1, 123), (2, 456), (3, 789);
UPDATE `test` JOIN `test_update` ON `test`.`id` = `test_update`.`id` SET `test`.`value` = `test_update`.`value`
DROP TEMPORARY TABLE `test_update`; // only needed if you're going to do this several times on the same connection
Pros: It won't fail if you delete rows from the main table in the meantime.
Const: Slower than an Insert-Update
CASE Update
UPDATE `test` SET `value` = CASE `id`
WHEN 1 THEN 123
WHEN 2 THEN 456
WHEN 3 THEN 789
ELSE `value` ; This makes sure that if the case statement fails, the value won't be set to NULL
END
WHERE `id` IN (1, 2, 3)
Pros: It works well with deleted rows, it's faster, in some circumstances, than Temp-Update
Cons: It looks shitty, gets slower on bigger updates.
The Test
In order to test the performance of this methods, I did the following things, 50 times for each method, and computed the average time it takes them to complete:
I generated 1000 id, value pairs and updated them.
I generated 1000 id, value pairs and updated them, generating new ones for each update.
Did the same with 5000 rows.
So for each method tested, I updated 1000/5000 rows, 50 times and divided the total elapsed time to 50.
Then I generated 1000/5000 rows, 50 times, and updated them, dividing the totall elapsed time to 50.
I only took into account the time the queries took to execute, excluding the additional code's execution time.
The environment:
OS: Win7 x64
CPU: 6 cores @ 3.3Ghz
Memory: 16Gb, 12Gb dedicated to MySQL
Test Table: `id` - primary key, `value` - INTEGER, 9 milllion rows. `value` was set to 0 after each test.
The Results
The time is expressed in seconds.
| Same Rows | Unique Rows | |||
|---|---|---|---|---|
| 1000 Rows | 5000 Rows | 1000 Rows | 5000 Rows | |
| Insert-Update | 0.009552 | 0.049210 | 0.028019 | 0.088515 |
| Temp-Update | 0.172427 | 0.267719 | 0.192720 | 0.309200 |
| CASE-Update | 0.031331 | 0.994978 | 0.045903 | 1.010313 |
What I haven't tested is what happens when there are multiple values to update for the same `id`. I have a feeling CASE-Update is gonna fall short on that one, but can't tell how the other ones will behave.
Comments