Discussion:FAQ new config

From ClustersSophia
Revision as of 11 February 2019 at 14:12 by Fm (talk | contributions)

Draft squashfs/mountimg

How can I use many small files efficiently?

You can improve performance and reduce the load on /data in the following cases:

  • case1: your jobs only read from the directories where your zotfiles reside
  • case2: your jobs read your zotfiles and also add new files to those directories
  • case3: your jobs generate zotfiles that will afterwards only be read or extended with new files

For case1:

  • convert your zotfiles directories to squashfs images
  • in your jobs:
    • mount those images using sudo mountimg
    • use those mounted directories for processing

For case2:

  • convert your zotfiles directories to squashfs images
  • in your jobs:
    • mount those images using sudo mountimg
    • use those mounted directories for processing, but generate new files on a local filesystem of the node (e.g. /tmp)
    • unmount the images with sudo mountimg -u
    • add the new files to the images with mksquashfs-no-compression
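
The case2 steps above can be sketched as a job function. This is only a sketch under stated assumptions: case2_job and run are hypothetical helper names, the argument given to sudo mountimg -u (the mountpoint) is an assumption, and DRY_RUN is introduced here only so that the commands can be printed instead of executed.

```shell
# Print the command when DRY_RUN is set, run it otherwise.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Sketch of a case2 job: image, mountpoint, local scratch directory.
case2_job() {
  local img="$1" mnt="$2" scratch="$3"
  # Mount the existing image
  run sudo mountimg "$img" "$mnt" || return
  # ... read the files under "$mnt", write new files under "$scratch" ...
  # Unmount the image (assumption: -u takes the mountpoint)
  run sudo mountimg -u "$mnt" || return
  # Append the new files to the existing image
  run mksquashfs-no-compression "$scratch" "$img"
}
```

Calling the function with DRY_RUN=1 set beforehand prints the three commands in order, which is a convenient way to check the paths before running the job for real.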

For case3:

  • in your jobs:
    • generate your zotfiles on a local filesystem of the node (e.g. /tmp)
    • convert them to squashfs images under /data with mksquashfs-no-compression

Creating squashfs images

To convert your zotfiles to images, first choose the granularity appropriate to your case.

sudo mountimg currently allows at most 4000 images to be mounted on a node.

If you have, for example, a really big directory /data/.../DDD/DD/ containing hundreds of sub-directories D1 D2 ... DN, you may prefer to make one image per sub-directory.

Example (in bash):

 cd /data/.../DDD || exit
 # Build separate directories for the images and the mountpoints
 mkdir DD-img DD-mnt
 cd DD || exit
 for i in D*; do
   # Create the image (quoted so that names with spaces survive)
   mksquashfs-no-compression "$i" "../DD-img/$i.squashfs"
   # Create the mountpoint for your jobs
   mkdir "../DD-mnt/$i"
 done

Then, in your jobs, if you need to mount all those images:

 cd /data/.../DDD/DD-mnt || exit
 for i in *; do
   sudo mountimg "../DD-img/$i.squashfs" "$i" || exit
 done
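
Because at most 4000 images can be mounted on a node, a job may want to count its images before mounting them. A minimal sketch, where count_images and check_image_limit are hypothetical helper names; it only counts the images this job intends to mount, and does not account for images already mounted by other jobs on the same node.

```shell
# Count the *.squashfs files directly inside a directory.
count_images() {
  local dir="$1" n=0 f
  for f in "$dir"/*.squashfs; do
    # Skip the unexpanded pattern when the directory has no image
    [ -e "$f" ] && n=$((n + 1))
  done
  echo "$n"
}

# Succeed only if the directory holds no more images than the limit
# (default 4000, the figure given for sudo mountimg).
check_image_limit() {
  local dir="$1" limit="${2:-4000}" n
  n=$(count_images "$dir")
  if [ "$n" -gt "$limit" ]; then
    echo "too many images in $dir: $n > $limit" >&2
    return 1
  fi
}
```

A job could then start with check_image_limit /data/.../DDD/DD-img || exit before entering the mount loop.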

Some mksquashfs hints:

  • If the destination image exists, the source files/directories will be added (appended) to the image.
    • use the -noappend option if you want to re-create the image completely, or remove the image first.
  • If a single directory is specified (i.e. mksquashfs source output_fs), the squashfs filesystem will consist of the contents of that directory: the root of the image corresponds to the source directory itself.
    • use the -keep-as-directory option if you want the source directory to appear as an entry under the root instead.
  • If multiple source directories or files are specified, mksquashfs will merge the specified sources into a single filesystem, with the root directory containing each of the source files/directories. The name of each directory entry will be the basename of the source path. If more than one source entry maps to the same name, the conflicts are named xxx_1, xxx_2, etc. where xxx is the original name.
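
The first two hints can be illustrated with a short sketch. The function names are hypothetical, and the MKSQUASHFS variable is introduced here only so that another command can be substituted; whether mksquashfs-no-compression forwards these extra options is an assumption.

```shell
# Command to call; can be overridden before calling the functions.
MKSQUASHFS="${MKSQUASHFS:-mksquashfs}"

# Re-create the image from scratch instead of appending to it.
rebuild_image() {
  "$MKSQUASHFS" "$1" "$2" -noappend
}

# Keep the source directory itself as an entry under the image root.
image_keeping_dir() {
  "$MKSQUASHFS" "$1" "$2" -keep-as-directory
}
```

For instance, rebuild_image D1 ../DD-img/D1.squashfs replaces the previous content of the image, whereas running plain mksquashfs with the same arguments would append to it.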

mksquashfs-no-compression is a simple wrapper around mksquashfs that disables all compression in order to favor speed. Feel free to try mksquashfs directly with other options, such as -comp lzo, to save disk space.
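
The actual wrapper is not shown on this page. A wrapper of this kind could look like the sketch below, which assumes that it simply forwards its arguments and adds the mksquashfs switches that turn off compression of inodes, data blocks, fragments and extended attributes. The function name and the MKSQUASHFS variable are hypothetical.

```shell
# Hypothetical equivalent of mksquashfs-no-compression
# (not the site's actual script).
mksquashfs_uncompressed() {
  "${MKSQUASHFS:-mksquashfs}" "$@" -noI -noD -noF -noX
}
```

By contrast, a compressed image would be built with something like mksquashfs D1 ../DD-img/D1.squashfs -comp lzo, trading some speed for disk space.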

Using sudo mountimg