A DDL migration is a set of DDL changes applied in a consistent manner, often via version-controlled scripts or tools, that enables a schema to evolve alongside the applications that use it. In an active-active multi-master database, properly managing DDL migrations is key to maintaining replication health across your database nodes.

Generally, DDL changes in a multi-master database must be coordinated across all nodes to prevent divergence and to ensure that replication can appropriately apply ongoing changes to each node.

pgEdge provides Automatic DDL Replication (AutoDDL) to make it easier to manage changes to your database schema. This feature allows you to make your DDL changes against a single node and have them replicated to other active nodes. With Spock’s powerful capabilities, you can also take full control of schema updates and their associated replication configuration using Replication Sets.
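
For reference, here is a minimal sketch of both approaches. The setting names and function below (spock.enable_ddl_replication, spock.include_ddl_repsets, and spock.repset_add_table) are taken from the pgEdge Spock documentation as an assumption, so verify them against your installed Spock version; the table name is illustrative:

    -- postgresql.conf: enable AutoDDL so DDL run on one node is
    -- captured and replicated to the other nodes (assumed setting names)
    spock.enable_ddl_replication = on
    spock.include_ddl_repsets = on

    -- Alternatively, manage replication explicitly by adding a table
    -- to a replication set on each node (illustrative table name)
    SELECT spock.repset_add_table('default', 'public.orders');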

As you consider moving your applications to a multi-master database, you should employ the following strategies to improve your application’s write availability and to limit unexpected downtime, by making DDL changes more safely and efficiently.

Make Backward-compatible Changes

To evolve your schema safely, it’s best to make backward-compatible changes that ensure writes can continue to be replicated to other nodes without downtime.

These guidelines can help you craft backward-compatible changes:

  • Make new columns nullable or give them a default value, so that write transactions currently being replicated can safely land on other nodes, regardless of the migration state.

  • Avoid renaming existing objects (tables, columns, indexes, etc.).

  • Use a phased deployment approach for non-backward-compatible changes, such as column, table, or index renames (see the sketch after this list):
    ○ On the first deployment, add the new column and begin dual writes to the old and new columns to populate the new column.
    ○ On the second deployment, adjust your application to apply read and write transactions only to the new column.
    ○ On the final deployment, drop the old column.

  • Avoid reusing object names across multiple dependent migrations:
    For example, if you are dropping a constraint and introducing a newer version of that constraint, use a different name so it’s easier to recognize the migration status and perform corrective actions.

  • Avoid changing types for existing columns:
    Instead, introduce a new column with the new type and populate it through application logic or triggers to keep it in sync with the original column.
    Once your application has migrated write transactions to the new column, you can then safely remove the old column in a separate migration.

  • Avoid removing an object (table, column, index) until it is no longer used by application code.

  • Utilize pre-migration validations when applying UNIQUE or NOT NULL constraints, which safely skip these migrations (and any that depend on them) if the underlying data is not valid.

  • Use CREATE INDEX CONCURRENTLY separately on each node if you want to avoid blocking other queries in the database.

  • Use IF EXISTS / IF NOT EXISTS where possible in your DDL to defend against migrations being run multiple times:
    This also gives you an escape hatch to make manual corrective changes on specific nodes when queued DDL statements are not completing, as they will become a no-op.

  • For more complex migrations that include longer operations, it may be best to apply your changes individually on each node using spock.repair_mode rather than making them with Spock's AutoDDL functionality:
    This may require coordination and scripting to run the changes across all nodes, with verification performed to ensure the migration is complete.
    If necessary, you can leverage pg_advisory_lock in coordination with your application to better control the behavior of other write statements to the database.
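
To make several of these guidelines concrete, here is a minimal sketch of a phased, backward-compatible column migration; the table, column, constraint, and index names are illustrative:

    -- Phase 1: add the new column as nullable so in-flight replicated
    -- writes still apply cleanly on every node
    ALTER TABLE orders ADD COLUMN IF NOT EXISTS status_v2 text;

    -- After backfilling via application logic, enforce the constraint
    -- without a long lock: NOT VALID skips existing rows, and VALIDATE
    -- then checks them without blocking writes
    ALTER TABLE orders
        ADD CONSTRAINT orders_status_v2_not_null
        CHECK (status_v2 IS NOT NULL) NOT VALID;
    ALTER TABLE orders VALIDATE CONSTRAINT orders_status_v2_not_null;

    -- Build supporting indexes without blocking writes; run this on each
    -- node separately, since CONCURRENTLY cannot run inside a transaction
    CREATE INDEX CONCURRENTLY IF NOT EXISTS orders_status_v2_idx
        ON orders (status_v2);

    -- Final phase, in a later migration, once no application code reads
    -- the old column:
    ALTER TABLE orders DROP COLUMN IF EXISTS status;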

Regardless of the approach you take to your DDL changes, it’s best to always test and verify the changes on a staging environment before rolling them out to production.

Run Migrations Once when Using AutoDDL

When you are relying on AutoDDL to propagate DDL changes to other nodes, it’s generally best to run your migrations only once against a single, dedicated node.

If you are leveraging DNS routing to your nodes or using a load balancer that routes writes to multiple nodes, you run the risk of dependent DDL changes being applied to different nodes out of order. Changes may then fail to apply cleanly due to missing dependencies, resulting in a schema mismatch.

This issue can be made worse when using migration tools, including ones bundled with Object-Relational Mappers (ORMs). Migration tools typically rely on a migration state that is stored in a database table and updated as migrations are successfully applied in their desired order to a single node.

If your changes are applied on application startup, multiple deployments of your application may attempt to apply the same DDL changes multiple times. This can generate errors and break replication across your subscriptions if the replicated statements do not prevent this scenario.

To counteract this, you should configure your applications to only run DDL changes from a single migration or application instance. If you're using a standalone migration tool, such as the Liquibase CLI, you should only run that tool against a single pgEdge Postgres instance.

If you are leveraging a library that runs as part of your application startup, you may need to introduce an environment variable like RUN_MIGRATIONS and wrap any library code involved in the migration so that it only executes on one application instance.
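
If your tooling cannot enforce a single runner on its own, one approach is to gate the migration behind a Postgres advisory lock so that only the first instance to acquire it proceeds. Here is a minimal sketch, with an arbitrary, illustrative lock key; note that advisory locks are local to a single node, so all migration runners must connect to the same node for this to serialize them:

    -- Only the session that receives 'true' should run the migration
    SELECT pg_try_advisory_lock(727274);  -- illustrative lock key

    -- ...run the migration statements here if the lock was acquired...

    -- Release the lock once the migration has completed
    SELECT pg_advisory_unlock(727274);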

Migration documentation for various libraries and tools is linked below. If you are using a home-grown solution, the same approach should work to ensure your changes are only run once.

  • Python
  • Go
  • Node.js
  • Java

Make DDL Changes in Advance

With the popularity of migration tooling integrations with applications, it’s common to perform migrations during startup and then immediately begin leveraging new tables and columns in newly deployed application code.

This can cause issues in a multi-master system, especially when deployments may contain many DDL updates that may take time to apply across all nodes, even with AutoDDL managing those changes. DDL changes are queued by Spock, and applied in order, so if one change takes longer than expected, other database nodes may not be updated immediately.

In that event, your application may begin to exhibit errors when connecting to those other nodes, because it expects new tables and columns to be available for write transactions. In general, making DDL changes well in advance of when they are actually needed will help you avoid this problem.

One strategy to avoid this problem is to perform migrations as their own deployments. DDL changes typically occur early in the development process for new features, and if you are using a continuous delivery approach, you can make smaller deployments that are only meant to migrate the database. When you combine this approach with backward-compatible DDL changes, your application should not be impacted by new columns and tables that are not required for functionality.

Another strategy is to run migrations before the application is deployed: within your deployment process, add a step that performs and verifies migrations against your database before the new application version rolls out. If you are deploying your application in Kubernetes, an init container or pre-deployment job can achieve the same outcome.

Migrations should be verified using a separate verification script, ensuring they are applied on all nodes before moving on to deploying the application. If you integrate this separate step into your CI/CD process, it should be easy to set up alarms and notifications to know when something may be wrong.
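
A verification script can be as simple as checking each node’s catalog for the objects the new application version requires. Here is a minimal sketch, using an illustrative table and column:

    -- Run against every node; proceed with the application deployment
    -- only when each node reports 'true'
    SELECT EXISTS (
        SELECT 1
        FROM information_schema.columns
        WHERE table_schema = 'public'
          AND table_name   = 'orders'
          AND column_name  = 'status_v2'
    ) AS column_present;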

Monitoring Migrations and Resolving Problems

It’s important to be able to monitor the status of schema migrations across all nodes in the system to know when they have succeeded.

In general, you should monitor the PostgreSQL logs to ensure that there were no issues applying changes to other nodes within the system. Look for Spock apply errors to see whether updates are taking too long, or are not being applied due to a previous mismatch.
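
Since queued DDL is stored in spock.queue, you can also inspect it directly to see what has not yet been applied on a node. A minimal sketch; the table’s columns vary by Spock version, so check your installation:

    -- List DDL statements still queued on this node
    SELECT * FROM spock.queue;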

A good way to understand where differences may exist is to utilize the schema-diff capability in the Active Consistency Engine (ACE) to identify differences in your schemas across nodes. You can then determine whether a previous mismatch is preventing a DDL statement from being applied, or where your schema is out of sync. ACE can be run after your migration process to ensure the schema is stable before proceeding to deploy applications.

If you identify any differences across nodes, you can fix individual nodes using a manual DDL update with Spock’s repair_mode feature enabled. This can help to unblock a queued DDL change that relied on the existence of a specific column or table, without also applying that corrective DDL to other nodes.

If you make manual updates to resolve inconsistencies using spock.repair_mode, you may need to delete a queued DDL statement from spock.queue to allow other DDL changes and data replication to be unblocked. This is uncommon if you follow the best practices highlighted here, but may be necessary if a troublesome DDL migration is applied that is causing timeouts within Spock’s apply worker.
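
As a sketch, a manual repair on a single out-of-sync node might look like the following, assuming the spock.repair_mode(boolean) session function described in the pgEdge documentation, and with illustrative object names:

    -- Keep this session's changes out of replication while repairing
    SELECT spock.repair_mode(true);

    -- Apply the corrective DDL that the queued change depends on
    ALTER TABLE orders ADD COLUMN IF NOT EXISTS status_v2 text;

    -- Resume normal replication behavior for this session
    SELECT spock.repair_mode(false);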

Conclusion

While DDL changes in a multi-master database require additional care, a thoughtful, staged approach can make schema evolution as manageable as it is with a single-master PostgreSQL setup. pgEdge provides the tools you need to create, manage, and monitor schema changes, ensuring your database remains highly available and consistent as it evolves alongside your applications.