Discussion:FAQ new config
Draft squashfs/mountimg
How can I use many small files efficiently?
You can improve performance and reduce the load on /data in the following cases:
- case1: your jobs only read from the directories where your zotfiles reside
- case2: your jobs read your zotfiles and also add new files to them
- case3: your jobs generate zotfiles that are afterwards only read or extended with new files
For case1:
- convert your zotfiles directories to squashfs images
- in your jobs:
  - mount those images using sudo mountimg
  - use those mounted directories for processing
For case2:
- convert your zotfiles directories to squashfs images
- in your jobs:
  - mount those images using sudo mountimg
  - use those mounted directories for processing, but generate new files on the node's local filesystem (e.g. /tmp)
  - unmount the images with sudo mountimg -u
  - add the new files to the images with mksquashfs-no-compression
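The case2 steps could be sketched as a small shell function like the one below. This is only an illustration under stated assumptions: the function name process_and_append, its arguments, and the way the processing command is invoked are hypothetical, and the sketch assumes that sudo mountimg -u takes the mountpoint as its argument, which this page does not confirm.

```shell
# Sketch for case2. The function name, its arguments and the way the
# processing command is called are hypothetical.
process_and_append() {
    image=$1        # existing squashfs image
    mountpoint=$2   # existing, empty directory
    shift 2         # remaining arguments: the processing command

    # New files go to a scratch directory on the node's local filesystem.
    scratch=$(mktemp -d /tmp/zotfiles.XXXXXX) || return 1

    sudo mountimg "$image" "$mountpoint" || { rm -rf "$scratch"; return 1; }

    # The command gets the mounted (read-only) directory and the scratch
    # directory appended as its last two arguments.
    "$@" "$mountpoint" "$scratch"
    status=$?

    # ASSUMPTION: "mountimg -u" takes the mountpoint as its argument.
    sudo mountimg -u "$mountpoint" || status=1

    # Append the new files only if processing and unmounting succeeded.
    if [ "$status" -eq 0 ]; then
        mksquashfs-no-compression "$scratch" "$image" || status=1
    fi

    rm -rf "$scratch"
    return "$status"
}
```

A call would then look like process_and_append IMAGE MOUNTPOINT my_program, where my_program stands for whatever command reads from its next-to-last argument and writes into its last one.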
For case3:
- in your jobs:
  - generate your zotfiles on the node's local filesystem (e.g. /tmp)
  - convert them to squashfs images under /data with mksquashfs-no-compression
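For case3, a job might wrap the two steps roughly as follows. This is a minimal sketch: the function name generate_to_image and the convention of passing the scratch directory as the last argument of the generating command are assumptions made for illustration.

```shell
# Sketch for case3: generate files on node-local storage, then store them
# as one squashfs image. The function name and arguments are hypothetical.
generate_to_image() {
    image=$1    # destination image, e.g. somewhere under /data
    shift       # remaining arguments: the generating command

    scratch=$(mktemp -d /tmp/zotfiles.XXXXXX) || return 1

    # The generating command receives the scratch directory as last argument.
    if "$@" "$scratch"; then
        mksquashfs-no-compression "$scratch" "$image"
        status=$?
    else
        status=1
    fi

    # Free the local disk space whether or not the image was written.
    rm -rf "$scratch"
    return "$status"
}
```

Nothing is written under /data if the generating command fails, so a failed job leaves no partial image behind.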
Creating squashfs images
To convert your zotfiles to images, first choose the granularity appropriate to your case.
sudo mountimg currently allows at most 4000 images to be mounted on a node.
If, for example, you have a really big directory /data/.../DDD/DD/ containing hundreds of sub-directories D1 D2 ... DN, you may prefer to make one image per sub-directory.
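Before settling on a granularity, it may help to check how many images a job would have to mount. The sketch below counts the *.squashfs files in one directory and compares the count with the limit of 4000 mentioned above; the function name check_image_count and the optional second argument are hypothetical.

```shell
# Succeeds if the number of *.squashfs files directly inside a directory
# does not exceed the limit. The function name is hypothetical.
check_image_count() {
    dir=$1
    limit=${2:-4000}   # 4000 is the limit stated above

    count=0
    for f in "$dir"/*.squashfs; do
        # The glob stays literal when nothing matches, so test existence.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done

    if [ "$count" -gt "$limit" ]; then
        echo "$count images in $dir exceed the limit of $limit" >&2
        return 1
    fi
    echo "$count images in $dir, within the limit of $limit"
}
```

If the check fails, a coarser granularity (fewer, larger images) is probably the better choice.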
Example (in bash):
    cd /data/.../DDD
    # Build a separate directory for the images and the mountpoints
    mkdir DD-img DD-mnt
    cd DD
    for i in D*; do
        # Create the image
        mksquashfs-no-compression "$i" "../DD-img/$i.squashfs"
        # Create the mountpoint for your jobs
        mkdir "../DD-mnt/$i"
    done
Then, in your jobs, if you need to mount all those images:
    cd /data/.../DDD/DD-mnt || exit
    for i in *; do
        sudo mountimg "../DD-img/$i.squashfs" "$i" || exit
    done
Some mksquashfs hints:
- If the destination image exists, the source files/directories are added (appended) to the image.
  - use the -noappend option if you want to re-create the image completely, or remove the image first.
- If a single directory is specified (i.e. mksquashfs source output_fs), the squashfs filesystem will consist of that directory, with the top-level root directory corresponding to the source directory.
  - use the -keep-as-directory option if you want the root directory to contain the source directory itself instead.
- If multiple source directories or files are specified, mksquashfs will merge the specified sources into a single filesystem, with the root directory containing each of the source files/directories. The name of each directory entry will be the basename of the source path. If more than one source entry maps to the same name, the conflicts are named xxx_1, xxx_2, etc. where xxx is the original name.
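Because entries with the same basename get renamed (xxx_1, xxx_2, ...), it can be worth checking the source list before merging several sources into one image. The helper below is a minimal sketch of such a check; its name, find_basename_conflicts, is hypothetical and it is not part of mksquashfs.

```shell
# Print every basename that occurs more than once among the given source
# paths. mksquashfs would rename such entries (xxx_1, xxx_2, ...).
# The function name is hypothetical.
find_basename_conflicts() {
    for src in "$@"; do
        basename "$src"
    done | sort | uniq -d
}
```

An empty output means that every source keeps its original name at the root of the image.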
mksquashfs-no-compression is a simple wrapper around mksquashfs that disables all compression to focus on speed. Feel free to try mksquashfs directly with other options, such as -comp lzo, to save disk space.
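Calling mksquashfs directly with LZO compression might look like the sketch below. The function name build_lzo_image is hypothetical, and whether LZO is available depends on how the squashfs-tools on the node were built. The -noappend option is included so that an existing image is re-created rather than appended to.

```shell
# Sketch: build a compressed image by calling mksquashfs directly.
# The function name is hypothetical; LZO support depends on the local
# squashfs-tools build.
build_lzo_image() {
    src=$1      # source directory
    image=$2    # destination image

    # -comp lzo  : compress with LZO to save disk space
    # -noappend  : re-create the image instead of appending to it
    mksquashfs "$src" "$image" -comp lzo -noappend
}
```

Compressed images take less space under /data but cost more CPU time to create and read, so it is worth timing a typical job with both variants.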