Discussion:FAQ new config
From ClustersSophia
Draft squashfs/mountimg
How can I use many small files efficiently?
You can improve performance and reduce the load on /data in the following cases:
- case1: your jobs only read from the directories where your zotfiles reside
- case2: your jobs read your zotfiles and also add new files to those directories
- case3: your jobs generate zotfiles that will afterwards only be read or extended with new files
For case1:
- convert your zotfiles directories to squashfs images
- in your jobs:
  - mount those images using sudo mountimg
  - use those mounted directories for processing
For case2:
- convert your zotfiles directories to squashfs images
- in your jobs:
  - mount those images using sudo mountimg
  - use those mounted directories for processing, but write new files to the local filesystem of the node (e.g. /tmp)
  - unmount the images with sudo mountimg -u
  - add the new files to the images with mksquashfs-no-compression
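The case2 steps could be sketched roughly as follows. This is only an illustration: case2_job and process_files are hypothetical names, the three arguments are placeholders, and the sketch assumes that mountimg takes an image followed by a mountpoint and that mountimg -u takes the mountpoint. It relies on mksquashfs appending to an existing image by default, and assumes the wrapper passes its arguments through unchanged.

```shell
# Hypothetical processing step: replace with the real workload.
# It reads from the mounted image ($1) and writes to local scratch ($2).
process_files() {
    ls "$1" > "$2/listing.txt"
}

# Sketch of a case2 job (hypothetical helper, not a site-provided command).
case2_job() {
    img=$1    # squashfs image under /data
    mnt=$2    # existing, empty mountpoint directory
    new=$3    # scratch directory on the node's local filesystem

    mkdir -p "$new" || return 1
    sudo mountimg "$img" "$mnt" || return 1

    process_files "$mnt" "$new"
    status=$?

    # Assumption: "mountimg -u" takes the mountpoint as its argument.
    sudo mountimg -u "$mnt" || return 1

    # Append the new files to the image only if processing succeeded.
    if [ "$status" -eq 0 ]; then
        mksquashfs-no-compression "$new" "$img" || return 1
    fi
    return "$status"
}
```

The image is unmounted before it is modified, and nothing is appended when the processing step fails, so a failed job leaves the image as it was.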
For case3:
- in your jobs:
  - generate your zotfiles on the local filesystem of the node (e.g. /tmp)
  - convert them to squashfs images under /data with mksquashfs-no-compression
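For case3, one possible shape of the job is shown below. case3_job and generate_files are hypothetical names, the scratch directory pattern /tmp/zot.XXXXXX is an arbitrary choice, and the wrapper is assumed to accept the same "source destination" arguments as mksquashfs.

```shell
# Hypothetical generator: replace with the real workload.
# It writes a few small sample files into the directory given as $1.
generate_files() {
    n=1
    while [ "$n" -le 3 ]; do
        echo "sample $n" > "$1/file$n.txt"
        n=$((n + 1))
    done
}

# Sketch of a case3 job (hypothetical helper).
case3_job() {
    img=$1    # target squashfs image under /data

    # Work on the node's local filesystem, not under /data.
    scratch=$(mktemp -d /tmp/zot.XXXXXX) || return 1

    generate_files "$scratch" &&
        mksquashfs-no-compression "$scratch" "$img"
    status=$?

    # Free the local disk whether or not the job succeeded.
    rm -rf "$scratch"
    return "$status"
}
```

Only the finished image is written under /data; the many small files exist only on the node's local disk for the duration of the job.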
To convert your zotfiles to images, first choose the granularity appropriate to your case.
sudo mountimg currently allows at most 4000 images to be mounted on a node.
For example, if you have a really big directory /data/.../DDD/DD/ containing hundreds of sub-directories D1 D2 ... DN, you may prefer to make one image per sub-directory.
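Before settling on one image per sub-directory, it may help to check how many images that would produce against the 4000-image limit. A minimal sketch, where count_subdirs and fits_mount_limit are hypothetical helper names and MAX_IMAGES simply restates the limit quoted above:

```shell
MAX_IMAGES=4000    # mount limit per node, as stated in this FAQ

# Print the number of immediate sub-directories of the directory $1.
count_subdirs() {
    count=0
    for d in "$1"/*/; do
        [ -d "$d" ] && count=$((count + 1))
    done
    echo "$count"
}

# Report whether one image per sub-directory of $1 stays within the limit.
# Returns 0 when it does and 1 when it does not.
fits_mount_limit() {
    total=$(count_subdirs "$1")
    if [ "$total" -le "$MAX_IMAGES" ]; then
        echo "$total images: within the limit of $MAX_IMAGES"
    else
        echo "$total images: over the limit of $MAX_IMAGES, use a coarser granularity"
        return 1
    fi
}
```

For instance, fits_mount_limit /data/.../DDD/DD would report on the directory used in this FAQ. Hidden sub-directories are not counted, since the shell glob skips them.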
Example (in bash):
    cd /data/.../DDD
    # Build separate directories for the images and the mountpoints
    mkdir DD-img DD-mnt
    cd DD
    for i in D*; do
        # Create the image
        mksquashfs-no-compression "$i" "../DD-img/$i.squashfs"
        # Create the mountpoint for your jobs
        mkdir "../DD-mnt/$i"
    done
then in your jobs:
    cd /data/.../DDD/DD-mnt
    for i in *; do
        sudo mountimg "../DD-img/$i.squashfs" "$i"
    done
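At the end of a job, the same mountpoints can be released with a matching loop. The sketch below assumes that mountimg -u takes the mountpoint as its argument, which this FAQ does not state explicitly; unmount_all is a hypothetical helper name.

```shell
# Unmount every mountpoint found directly under the directory $1
# (DD-mnt in the example above). Returns 1 if any unmount failed.
unmount_all() {
    failed=0
    for m in "$1"/*/; do
        [ -d "$m" ] || continue
        m=${m%/}    # drop the trailing slash added by the glob
        # Assumption: "mountimg -u" takes the mountpoint as its argument.
        if ! sudo mountimg -u "$m"; then
            echo "could not unmount $m" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

A job would call it as unmount_all /data/.../DDD/DD-mnt once processing is finished. The loop keeps going after a failure so that one busy mountpoint does not leave the others mounted.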
mksquashfs-no-compression is a simple wrapper around mksquashfs that disables all compression to favor speed. Feel free to try mksquashfs with other options, such as -comp lzo, to save disk space.
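To judge whether compression is worth it for your data, one option is to build a compressed image and look at its size. In the sketch below, build_compressed is a hypothetical helper; -comp and -noappend are standard mksquashfs options, but whether lzo is available depends on how squashfs-tools was built on the node.

```shell
# Build a compressed image of directory $1 at path $2 and print its size
# in bytes. $3 selects the compressor and defaults to lzo.
build_compressed() {
    src=$1
    img=$2
    comp=${3:-lzo}

    # -noappend replaces an existing image rather than appending to it.
    mksquashfs "$src" "$img" -comp "$comp" -noappend > /dev/null || return 1

    # Arithmetic expansion strips any padding that wc may add.
    echo $(( $(wc -c < "$img") ))
}
```

Comparing this figure with the size of the image produced by mksquashfs-no-compression for the same directory shows how much disk space the compressor saves, at the cost of slower image creation and reads.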
Refs:
- man mksquashfs
- /usr/share/doc/squashfs-tools/README