As time goes by, applications get more complex, with more and more new features. This means the data models and the original architecture become outdated and need to be refactored.
Refactoring a relational database is complicated, particularly when structural changes involve migrating existing data.
Here’s how my team did it and the lessons we learned.
The project at hand required us to merge the table representations of two different types of objects: booking modifications (such as date changes or passenger additions) and the current state of booking data.
Denormalization allowed us to simplify our codebase by a great deal without having a strong negative impact to our performance. However there were some caveats,
- Booking data in more than 40 pairs of tables needed to be merged to new, unified tables.
- The site had to be up and running at the same time the migration was happening. For a a top ecommerce platform used worldwide, each minute down cost big money in bookings.
- We had 10 years worth of data, which meant millions of rows needed to be copied over.
As we discovered exactly what we had gotten into, we came up with a plan to migrate the data safely.
The Master Plan
We had to do the refactoring while being online the whole time,
- Creating the new tables with the desired schema
- This step was the easiest. We even used the opportunity to standardize naming for tables and columns.
- Copying rows to new tables
- We chose to have a background task that ran once in awhile and copied data in small batches, as to avoid putting too much pressure on the database servers
- Changing write and read paths
- Updating references to old tables in your codebase can be tiresome and, depending on your release process, a bit tricky. For us, it meant that during the code release, the app would reference both the old and new table names. We decided to use views and triggers to redirect all reads and writes to the appropriate table.
- Testing, testing, testing
TIP: Set up a test environment as close to production as possible and simulate high traffic load. Some problems only become apparent this way.
- I cannot emphasize enough the importance of testing. We used a combination of database diffs, and unit and automation tests to verify data consistency as much as possible.
- Getting rid of old data
- Now data in the old tables can be safely disregarded. However, since very delicate data operations took place, we decided to still keep the data around for a bit just in case we need to recover any piece of the original bookings.
- Things can and will go wrong in production. Prepare for faster recovery as well as fail-proofing your implementation.
- All the logs in the world won’t matter if you don’t monitor or have any alerting system working.
- Communication is key. Make sure all important parties are aware of your refactor, so that bugs are identified and fixed quicker.
- Incremental changes are better than huge, irreversible ones.
- Don’t be afraid of stopping the show. While deadlines are important, you also want to invest in thorough testing before your code touches critical data.
Database refactoring is often complicated. Each individual case is different, but I hope you now have a better idea of how to approach it. Be prepared to make hard decisions and remember that baby steps are safer.
Have you done a similar refactor before? Got ideas on how to do it better? Let me know!