Assets and files saved in dotCMS are stored on an underlying shared file system, with pointers and metadata stored in the database. Generally, dotCMS nodes in a cluster share an NFS or ReadWriteMany volume, which allows all nodes to save and retrieve assets as they are requested.
Binary Assets
All assets in dotCMS are stored using binary fields. A binary field is a content type field that can be added to any content type to store binary files along with content. The advantage of binary fields is that binary assets, such as images, files and videos, are included, versioned and updated together with the content object that references them. dotCMS stores these binary files on an underlying shared file system. The files are saved in the /assets folder in a b-tree-like folder structure that uses {inode} + {fieldVariable} as the key. For example, take a PDF file named a-big-document.pdf, referenced in a content type field called associatedPdf, with a content version inode of 71b8a1ca-37b6-4b6e-a43b-c7482f28db6c. In this case, the underlying file would be located at the following file system location, starting at the /assets folder:
/assets/7/1/71b8a1ca-37b6-4b6e-a43b-c7482f28db6c/associatedPdf/a-big-document.pdf
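The layout above can be sketched as a small shell helper. This is only an illustration: asset_path is a hypothetical function, not part of dotCMS, and it assumes the two fan-out directories are always the first and second characters of the inode, as they are in the example path.

```shell
# Hypothetical helper (not a dotCMS command): builds the expected on-disk path
# for a binary field from the version inode, the field variable and the file name.
# Assumption: the two fan-out directories are the first two characters of the inode.
asset_path() {
  inode="$1"
  field_var="$2"
  file_name="$3"
  first=$(printf '%s' "$inode" | cut -c1)
  second=$(printf '%s' "$inode" | cut -c2)
  printf '/assets/%s/%s/%s/%s/%s\n' "$first" "$second" "$inode" "$field_var" "$file_name"
}

# Reproduces the example path shown above.
asset_path 71b8a1ca-37b6-4b6e-a43b-c7482f28db6c associatedPdf a-big-document.pdf
```

A helper like this can be handy when you know a version inode from the database and want to locate the matching file on the shared volume.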
Shared Assets Location
In the dotCMS container, the /data/shared directory is used to share the dotCMS assets and should be mapped in as a volume or persistent volume claim for use by the running containers. In our docker compose examples, we map this directory to a Docker-managed volume, cms-shared, e.g.:
volumes:
- cms-shared:/data/shared
In multi-node production environments, you will want to use a ReadWriteMany file system share or a network shared volume using NFS.
Changing the Path to the Assets Folder (Deprecated)
While not generally recommended, you may change the location where the hard assets are stored by setting the DOT_ASSET_REAL_PATH environment variable:
export DOT_ASSET_REAL_PATH=/var/data/dotcms/assets
HardLinks & Storage Space
In order to minimize storage requirements, dotCMS uses hardlinks when storing versions of the same asset. In essence, this means that uploaded files, images or videos in dotCMS are only stored once. If further edits are made to the metadata surrounding that asset, creating new versions of the content in dotCMS, the file is unmodified and is still stored only once across all versions. As a demonstration:
In our starter site, we have the file /images/404.jpg. You can see that this image has a version inode of 249eeb5c-7002-48e8-9ef3-ea6cd8e
Looking at that stored image on the dotCMS /assets file system, you can see it under the /assets directory with a size of 47K and a file system inode of 62669676.
$ ls -lih ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg
62669676 -rw-r--r-- 5 will staff 47K Jul 30 17:14 ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg
Now if edits are made to the 404 content, for example if we change the title of the image or set show on menu=true, dotCMS will create new versions of the content. Under the covers, though, the image file for each new version is stored as a hardlink to the original image. This is where the savings come from: hardlinks are just pointers and take up almost no disk space. You can test this by editing the content a few times, then running find on the file system and printing the file system inode of each match.
$ find ./assets -name 404.jpg -exec ls -i {} \;
62669676 ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg
62669676 ./assets/4/a/4a352130-523d-44bc-934a-f63e7af4779a/fileAsset/404.jpg
62669676 ./assets/9/c/9c6b1880-c78e-42e4-94d9-725a50a99235/fileAsset/404.jpg
62669676 ./assets/3/0/305d7840-7b1d-45e3-8be1-e6bf8aeb697e/fileAsset/404.jpg
You can see they all have the same inode, 62669676, which means they are all hardlinks to the same data on disk, which is stored only once. You can confirm this by running du on all the 404.jpg files found:
$ du -shc \
> ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg \
> ./assets/4/a/4a352130-523d-44bc-934a-f63e7af4779a/fileAsset/404.jpg \
> ./assets/9/c/9c6b1880-c78e-42e4-94d9-725a50a99235/fileAsset/404.jpg \
> ./assets/3/0/305d7840-7b1d-45e3-8be1-e6bf8aeb697e/fileAsset/404.jpg
48K ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg
48K total
The original image was 47K. Storing 4 versions of the image using hardlinks takes up only 48K rather than the expected 188K (47K*4); du reports whole disk blocks, hence 48K rather than 47K. Now if I edit the /images/404.jpg content again, and this time choose to upload a new image instead, things look different. Let’s say I replace the 404.jpg with another jpg that is 100K and save my content, creating a new version. If I run my find again, I get:
$ find ./assets -name 404.jpg -exec ls -i {} \;
62700210 ./assets/0/c/0cef7994-2bc4-4fdc-82f7-f74ac57270f9/fileAsset/404.jpg
62669676 ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg
62669676 ./assets/4/a/4a352130-523d-44bc-934a-f63e7af4779a/fileAsset/404.jpg
62669676 ./assets/3/0/305d7840-7b1d-45e3-8be1-e6bf8aeb697e/fileAsset/404.jpg
62669676 ./assets/9/c/9c6b1880-c78e-42e4-94d9-725a50a99235/fileAsset/404.jpg
And you can see the inode list now has two unique inodes in it: 62700210 and 62669676. To check how much disk space is now being taken up by these 5 versions of content, we can run our du again, and it returns the space taken by these 5 files:
$ du -shc \
> ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg \
> ./assets/4/a/4a352130-523d-44bc-934a-f63e7af4779a/fileAsset/404.jpg \
> ./assets/9/c/9c6b1880-c78e-42e4-94d9-725a50a99235/fileAsset/404.jpg \
> ./assets/3/0/305d7840-7b1d-45e3-8be1-e6bf8aeb697e/fileAsset/404.jpg \
> ./assets/0/c/0cef7994-2bc4-4fdc-82f7-f74ac57270f9/fileAsset/404.jpg
100K ./assets/0/c/0cef7994-2bc4-4fdc-82f7-f74ac57270f9/fileAsset/404.jpg
48K ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg
148K total
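The same behaviour can be reproduced outside dotCMS in a scratch directory. The sketch below is only an illustration of how hardlinks work on the file system. The v1, v2 and v3 directories and the inode_of helper are made-up names, and zero-filled files stand in for the images. It does not show how dotCMS creates its links internally.

```shell
# Scratch-directory sketch of hardlink behaviour; nothing here touches dotCMS.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/v1" "$demo_dir/v2" "$demo_dir/v3"

# "Upload" a file once (47 KiB of zero bytes stands in for the image).
dd if=/dev/zero of="$demo_dir/v1/404.jpg" bs=1024 count=47 2>/dev/null

# A metadata-only edit: the new version is a hardlink to the existing file.
ln "$demo_dir/v1/404.jpg" "$demo_dir/v2/404.jpg"

# Replacing the binary: the new version is a separate file with its own inode.
dd if=/dev/zero of="$demo_dir/v3/404.jpg" bs=1024 count=100 2>/dev/null

# Hypothetical helper: prints the file system inode number of a file.
inode_of() { ls -i "$1" | awk '{print $1}'; }

inode_v1=$(inode_of "$demo_dir/v1/404.jpg")
inode_v2=$(inode_of "$demo_dir/v2/404.jpg")
inode_v3=$(inode_of "$demo_dir/v3/404.jpg")

# Second column of ls -l is the number of hardlinks pointing at the file.
link_count=$(ls -l "$demo_dir/v1/404.jpg" | awk '{print $2}')

echo "v1=$inode_v1 v2=$inode_v2 v3=$inode_v3 links=$link_count"

# du counts hardlinked data once, so v1 and v2 contribute a single entry.
du -shc "$demo_dir/v1/404.jpg" "$demo_dir/v2/404.jpg" "$demo_dir/v3/404.jpg"

rm -rf "$demo_dir"
```

The v1 and v2 files should report the same inode and a link count of 2, while v3 gets a different inode. Exact sizes printed by du depend on the block size of the file system in use.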