Bruno Raljić

A Java Developer's Blog

How to merge two MySQL databases manually [part 1]

If you want a thing done well, do it yourself!

Sometimes, when working on large projects, you’ll end up with customer’s request to merge two databases (db with identical structure but different data, stored probably on different servers). If you’re lucky enough, you won’t end up with this kind of requests, and you won’t find this article through a Google query “How to merge two databases”.

Is there any DB migration/merging tool on the internet?

This kind of job is very tricky and you need to be very careful what you are doing. Perhaps you can find some tools on the internet which will do the database migration for you in just few clicks. I’m not saying those tools can’t do the migration properly, nor they are generally bad choice. I believe some of them are really powerful and can do most of the hard work without you even noticing it. And that’s not good for you, in my opinion. That’s my point here. In order to response to some bugs after merging databases (you’ll probably have some application specific bugs with or without merging tools), you need to know exactly  what happened and where in your database during the migration.

Sometimes you have to reinvent the wheel!

In most cases, you’ll end up with problems that people already encountered before and made good solutions for them, but here and there you’ll need to implement your own solution (even if it is primitive), just to make sure you are controlling everything. Well, at least you think you are controlling everything.

Steps in manual DB migration

Since we are doing it manually, the idea is to write some SQL scripts and execute them on D-Day. Here are few general steps you can follow with your DB migration.

  • Examine every single table in your DB schema
  • Find all tables that can be commonly migrated (common rules, same behavior)
  • Find all tables that have some table-specific rules (these are tricky, high possibility of making mistake)
  • Create backup dump
  • Create TODO list
  • Execute migration on test server
  • Find some ways to test it
  • Execute migration production server
  • Find some god(s) to pray you didn’t screwed up anything
  • Take vacation

Examine every single table in your DB schema

I think this one is perfectly clear. In order to successfully migrate your DB you’ll need to know everything about it. If not everything, then the more is better. Don’t forget about indexes, unique keys etc. Take your time here, don’t rush. You’ll spend much more time in this step, and it would be great if more than one person is involved in this phase. You may see what other person(s) may not and vice verse. The point is here – take longer analysis so you don’t end up with errors during the migration or (worse case) after it.

Find all tables that can be commonly migrated

When doing migration, there are two main groups of tables. First one is the group of tables that shares a common migration rules (specific to your application or migration logic). This group is easier to migrate since all you need is to identify all the tables belonging to this group. After that, there is a lot of copy paste work.

I’ll give you an example of a common rule. Let’s say you have tables products, orders and many to many table orders_products. You have products in both dbA and dbB databases, but those products are not the same, nor the orders (but they can have same ID). Since the product_id is unique, you can’t just simply move products from dbB to dbA (error, duplicate product_id). You need to update product_id in dbB and then move it to dbA. Remember, you need to update product_id in both products table and orders_products (and in all other places where you can find product_id, I’ll show you later how to find those places). There it is,you have first common rule. Now you need only to identify all your tables that can be safely migrated applying common rules.

Find all tables that have some table-specific rules

This is the second group of tables. This is where the headaches come. The group of tables that don’t share common rules. Instead, every table will probably have own set of rules. As I said, this is the place where you can make mistakes very easily (sometimes without even noticing it). For these cases you need to know both your database and your application, how the table is used, what are specific cases, what’s done in usual way, what’s not etc. Spend some extra time analyzing this group of tables.

I’ll give you another example. Table users. Lets say you have users in dbA and dbB, but some of them are same users (same person, same username but different user_id). If you think for a moment, you’ll realize it’s not possible just to increment values in user_id field and move them like in first group. You’ll end up with duplicate users. I mean, you won’t be able to do it at all if you say the username is unique. So, for this table you have two rules. First you need to take care of same users (adapt their user_id from one db to another), and after that you increment user_id for the other users and move them freely. You may have another table, some configurations for example where you will need only to adapt the values to a new db. The point is – you need to analyze this group carefully and there is no common pattern. Every table has its story. That’s one of the reasons I prefer manual migration.

Create backup dump

I don’t need to explain why you need this. Migration is the high-risk operation. Prepare dump of both databases so you can go back if something was wrong.

Create TODO list

Write down on paper (or in some TODO application) every single step you need to perform. You can add it while analyzing, or testing or whatever but just add it. Believe me, there are a lot of steps and during the testing you’ll repeat those tasks many times. There is really no need to keep this in head together with all other migration things you need to think about.

Execute migration on test server

After you finished your scripts, it’s time to execute them all on your test server. Load your test server with fresh dumps from production DB. Watch out for errors. Here is allowed to have them. In this step you can find out some things you didn’t think about before during analyzing. For example, if you updated some many to many table, during the migration on the test server you can find out there are already a combination with those values (unique key, two fields). This is very likely to be related with that second group of tables where you simply forget about something. Fix those mistakes and try again (from beginning) until the whole migration pass without error.

Find some ways to test it

Yes, you need to find it. I can’t say – here are the needed tests, run them, they are sufficient. No. You need to improvise with this. Think about how this can be tested at all. I can tell you one thing for sure – you can never test it 100%. Few simple ideas for testing are maybe count the number of rows in each db, and check the sum after migration to see if it match. Other thing you can test is the ID field. If you have some other ideas, feel free to leave them in comments.

Prepare some simple table where you will keep some numbers, for example count, id values, it can be useful later. Let’s say you have 100 products in dbA and 20 products in dbB. The idea is to increment id values in db2 (+100) so they become 101, 102… 120. Now write down the last ID from dbA, total count from dbB and after some time if you want the information about products (whether they are from dbA, or dbB or they are created after migration), you can simply get that information from these numbers. You know for sure that products with ID between 1 and 100 are from dbA, products with ID between 101 and 120 comes from dbB, and every product with higher value than 120 are made after migration.

Execute migration production server

Stop the application servers, and follow the task list. If all went fine on a test server, you won’t have any problems on production server. Of course, there is always room for some problems, so be careful. But, again, if you created TODO list, just take care of your tasks one by one and soon enough you’ll have your databases merged.

Find some god(s) to pray you didn’t screwed up anything

No comment here : )

Take vacation

Consider taking vacation after this hard work. Why not? Day or two. Perhaps two days in a week, not in a row. Just to be fair with your team members. Use this time to rest, fill your batteries and then continue to do your everyday job.

Epilogue

I went through this whole process with my colleague Nikola Čakarević, and I just wanted to share with you some thoughts. As you can see, this post is splitted in two parts. First part is more theoretic, I just wanted to point out some steps, while the second part has more technical details.

, ,

Leave a Reply

Your email address will not be published. Required fields are marked *

Show my latest blog post with this comment!