Beginning with dotCMS 23.06, MS SQL Server support has been deprecated; 23.01 LTS will be the last LTS major version to receive active MSSQL support. Similarly, 21.06 LTS, already past end of life by the time of writing, was the last major version to support either MySQL or Oracle databases.
Accordingly, PostgreSQL will soon be the only dotCMS-supported database. This page is intended to assist in dotCMS database migration to PostgreSQL.
Disclaimer: The information in this document are recommendations based on previous migration efforts; while this may suffice for many, unique database problems must be analyzed and resolved according to their specific characteristics and needs.
Requirements
- Linux-like server: We will assume the use of Amazon Linux 2.
- Resources: Since this server will run dotCMS, Postgres, and a second database, the recommendation is 16 GB of RAM and at least 4 CPUs.
- DBeaver 22.2.1+ (DBeaver Community)
- Postgres 12+ (15+ preferred)
- Docker
- Docker images: Postgres, dotCMS, and ElasticSearch
Migration
1. Upgrading to 21.06
If using a version prior to 21.06, we recommend first upgrading your dotCMS instance to dotCMS 21.06.14 LTS, the last patch in the 21.06 LTS series.
This is all the more crucial if you are using MySQL or Oracle; for either of those, 21.06 is the last-supported version. Once upgraded, begin with your dotCMS 21.06 docker instance configured to point to your MySQL or Oracle database.
2. Before Migration, Address Known Issues
a) Remove null characters
It has been found in some MySQL databases that some null characters caused the copy from MySQL to Postgres to crash. So far, the findings have been on the contentlet
table and text_area1
field. To verify the presence of null characters, the following query can be run; if 0 results are returned, then the field is ok to be migrated:
select count(*) from contentlet where text_area1 like '%\0%';
In the event that null characters have been found, the issue can be fixed by running the following command before transferring the data to the PostgreSQL database:
UPDATE contentlet AS c1
INNER JOIN contentlet AS c2 ON c2.inode = c1.inode
SET c2.text_area1 = REPLACE(c1.text_area1,'\0','')
WHERE c1.text_area1 like '%\0%';
Note: The latter code snippet has been tested on MySQL; it has been syntax-validated for, but not directly tested on, MS SQL Server and Oracle. On Oracle,
CHR(0)
may also be preferable to\0
. This document will expand to reflect successful tests on MSSQL and/or Oracle.
Please keep in mind this fix takes as an assumption based on past findings that the field with the null character is text_area1
on the contentlet
table. However, it is not impossible that such may be present on another field and another table. For this reason, it is important to remain mindful of any further errors during the copy process via DBeaver.
b) Drop old asset versions
Certain inconsistencies on the contentlet
table can be fixed by deleting old asset versions.
In the Admin Panel, go to System → Maintenance → Tools and locate the Drop Old Assets Versions operation. For the “Remove assets older than” option, use 01/01/2010 as the date, and execute. Then repeat the same step incrementing one year (01/01/2011) and so on, until the present date. Remember to follow the logs for any issues.
As mentioned in the above-linked document, this process will never remove live or working versions of any assets, regardless of their age.
3. Get the Database Schema
First, download 21.06.16-schema-only-no-constraints.sql.
Please be aware that this dump contains NO data and NO constraints. It will serve as a base to facilitate the data import. Constraints will be imported later during this migration.
Note also that no database changes occurred between 21.06.11 and the remaining 21.06.x patches, and so this schema remains fully compatible with the final 21.06 patch.
4. Prepare the Target Database
Next, we'll build a target PostgreSQL database to receive the data.
Create a new database in Postgres with CREATE DATABASE dotcms
and restore the dotCMS schema using the schema-only.sql
file from the previous sections:
pg_restore -U <db_username> -v -d <database_name> <dump_file>
For example:
pg_restore -U postgres -v -d dotcms 21.06.16-schema-only-no-constraints.sql
5. Copy Data With DBeaver
Install DBeaver and connect to both databases — the legacy DB and the Postgres DB. Once that is done, we can start migrating data from the tables on the former to the same tables on the latter. The below examples will assume a MySQL legacy DB, but MS SQL Server and Oracle should proceed similarly.
a) Migrating data
Using DBeaver's Database Navigator, search for the tables on the MySQL database. For example, let’s migrate the image table. Once the table is selected, click on it and then Export Data:
Use the Database method to export:
On the Tables mapping section, select the Postgres database if is not select by default:
You can make sure that the columns mappings are right by clicking on the arrow on the left:
If the table does not exist in the target, by default DBeaver will suggest creating it. However, three circled tables are no longer needed, so set the mapping to skip:
The list below includes all tables that can be safely skipped, because they are temporal or related to the live environment:
abcontact | dot_cluster |
abcontacts_ablists | dist_journal |
ablist | dist_process |
analytic_summary | dist_reindex_journal |
analytic_summary_404 | indicies |
analytic_summary_pages | notification |
analytic_summary_period | publishing_bundle |
analytic_summary_referer | publishing_end_point |
analytic_summary_visits | publishing_environment |
analytic_summary_workstream | publishing_pushed_assets |
analytic_summary_content | publishing_queue |
click | publishing_queue_audit |
clickstream | publishing_bundle_environment |
clickstream_404 | pollschoice |
clickstream_request | pollsdisplay |
cluster_server_uptime | pollsquestion |
cluster_server_action | pollsvote |
cluster_server | system_event |
Now, modify the extraction settings. These are the configurations that will use DBeaver to pull the data from the MySQL database. For more information please see DBeaver's data migration documentation:
If necessary, change the data load settings. These are the configurations used by DBeaver to insert the data into the Postgres database. Again, more detail can be found in DBeaver's data migration documentation:
Finally, confirm the operation to begin the transfer:
These steps should be performed on all tables, minus the exceptions in the table above.
b) Finishing the Migration
After copying all the tables from MySQL to Postgres, the last step is to recreate the constraints on the Postgres database.
Download 21.06.16-constraint-create.sql.
Now run the following command with reference to it:
psql -U dotcmsdbsuer dotcms < 21.06.16-constraint-create.sql
c) After Migration
There is a known issue that affects QRTZ jobs after migration. The following operations have been shown to fix it:
DELETE FROM qrtz_locks;
INSERT INTO qrtz_locks values('TRIGGER_ACCESS');
INSERT INTO qrtz_locks values('JOB_ACCESS');
INSERT INTO qrtz_locks values('CALENDAR_ACCESS');
INSERT INTO qrtz_locks values('STATE_ACCESS');
INSERT INTO qrtz_locks values('MISFIRE_ACCESS');
DELETE from QRTZ_EXCL_locks;
INSERT INTO QRTZ_EXCL_locks values('TRIGGER_ACCESS');
INSERT INTO QRTZ_EXCL_locks values('JOB_ACCESS');
INSERT INTO QRTZ_EXCL_locks values('CALENDAR_ACCESS');
INSERT INTO QRTZ_EXCL_locks values('STATE_ACCESS');
INSERT INTO QRTZ_EXCL_locks values('MISFIRE_ACCESS');