/dev/blog/ID10T

Horde: Deleting Duplicates from MySQL Database

• MySQL and Horde • Comments
Advertisement

I recently encountered a problem with my Horde installation. Due to an app misconfiguration, my wife created hundreds of calendar entries for the same event [Edit 2018-04-17: It turned out, she also created thousand of duplicate tasks. Yay!]. It is tedious to delete every event by hand, so I wanted to drop the entries from the database. As this is something other people might profit from, I’m documenting it. Of course part of this procedure can be done with other MySQL databases as well.

Get listing of double entries

I wanted to see all duplicate entries first, to make safe I’m not deleting important stuff. This can be done with a SELECT COUNT command:

   SELECT event_title,event_start,COUNT(*) c FROM kronolith_events
     GROUP BY event_title,event_start HAVING c > 1 ORDER BY c ASC;

This is the result:

MySQL count duplicates table

Ouch! That are a lot of duplicate entries. Let’s delete them!

Advertisement

We will need the IDs of the entries for that. For Hordes Kronolith this is the event_uid. While we won’t use this as single command in this scenario, it’s interesting to know it:

SELECT GROUP_CONCAT(event_uid) FROM kronolith_events
  GROUP by event_title, event_start HAVING (
    COUNT(event_title) > 1 AND COUNT(event_start) > 1);

Get the affected tables

I had no idea in which tables I would find the affected entries. I am also not good enough in MySQL to find that out in the CLI (if it is even possible). So I used bash:

mysqldump --extended-insert=FALSE horde_db > /tmp/horde_db.sql \
  && for ID in $(mysql -B horde_db <<< "SELECT GROUP_CONCAT(event_uid) FROM kronolith_events
     GROUP by event_title, event_start HAVING (COUNT(event_title) > 1 AND COUNT(event_start) > 1);" \
  | tr ',' ' ' | tr '\n' ' '); \
  do grep "${ID}" tmp/horde_db.sql| awk '{print $3}'| tr -d '`'; \
  done | sort -u
# OUTPUT
horde_dav_objects
horde_histories
kronolith_events
nag_tasks
rampage_objects

This might differ for other users, so don’t rely on my results.

Now that we got all the affected tables and all the event_uids. We still need the column names the uids are in. As I have not worked thoroughly with MySQL in ages, I couldn’t find a quick way to get this, therefore I created a mapping manually. In my case it is this:

Delete the rows

Now I could have done some JOINs to get all the entries into one row, but again, due to my rusted skills it was too complicated for me and I could not find a quick way. So I just wrote a small bash script. It is very inefficient - my run took around 90s - because it iterates over every ID, but it gets the job done:

#!/bin/bash
set -euo pipefail
#set -x

for ID in $(mysql -B --skip-column-names horde_db <<< "
  SET @@group_concat_max_len = 10000000;
  SELECT GROUP_CONCAT(event_uid) FROM kronolith_events GROUP by event_title, event_start HAVING (COUNT(event_title)
  > 1 AND COUNT(event_start) > 1);" | tr ',' ' ' | tr '\n' ' ')
  do  
    echo "$ID"
    mysql --skip-column-names -B horde_db <<< "
    SET @@group_concat_max_len = 10000000;
    DELETE FROM horde_dav_objects WHERE id_external LIKE '${ID}.ics';
    DELETE FROM horde_histories WHERE object_uid LIKE '%${ID}%';
    DELETE FROM kronolith_events WHERE event_uid = '${ID}';
    DELETE FROM nag_tasks WHERE task_uid = '${ID}';
    DELETE FROM rampage_objects WHERE object_name = '${ID}';"
  done

This deletes every occurence of these entries, so you will need to recreate those you need.

If you have ideas how to improve this, because you are better in SQL than I am, feel free to comment. Other comments are of course welcome as well.

Advertisement
More posts
comments powered by isso

Advertisement