Discussion:FAQ new config

From ClustersSophia
Revision as of 13 February 2019, 08:37 by Fm (check spell)

Draft squashfs/mountimg

How can I use many small files efficiently?

If your jobs use many small files (called zotfiles below), you can improve performance and reduce the load on /data in the following cases:

  • case1: your jobs only read the directories where your zotfiles reside
  • case2: your jobs read your zotfiles and only add new files or directories to them
  • case3: your jobs generate zotfiles that are afterwards only read or extended with new files

For case1:

  • convert your zotfiles directories to squashfs images
  • in your jobs:
    • mount those images using sudo mountimg
    • use those mounted directories for processing
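
Inside an OAR job, the case1 steps could be wrapped as below. This is a minimal bash sketch, not a tool provided on the cluster. The helper run_on_image and the command passed to it are hypothetical, and only sudo mountimg comes from this page.

```shell
# Case1 sketch: read-only processing of zotfiles stored in a squashfs image.
# run_on_image is a hypothetical helper, not a cluster tool.
# Usage: run_on_image <image.squashfs> <mountpoint> <command> [args...]
# The command receives the mountpoint as its last argument.
run_on_image() {
  local img=$1 mnt=$2
  shift 2
  # Mount the image on an existing mountpoint directory
  sudo mountimg "$img" "$mnt" || return 1
  # Read the zotfiles directly from the mounted directory
  "$@" "$mnt"
  # No explicit unmount: in an OAR job, mountimg mounts are
  # released automatically when the job terminates
}

# Example (placeholder paths, my_analysis is hypothetical):
# run_on_image /data/.../DDD/DD-img/D1.squashfs /data/.../DDD/DD-mnt/D1 my_analysis
```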

For case2:

  • convert your zotfiles directories to squashfs images
  • in your jobs:
    • mount those images using sudo mountimg
    • read from those mounted directories, but write new files to a local filesystem of the node (e.g. /tmp), since squashfs images are read-only
    • unmount the images with sudo mountimg -u
    • add the new files to the images with mksquashfs-no-compression
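
The case2 steps could be sketched in bash as follows. The helper process_and_append is hypothetical, and so is the way the command receives its input and output directories. The sketch assumes that no other job still uses the image when the new files are appended. It also assumes that mksquashfs-no-compression appends to an existing image as mksquashfs does.

```shell
# Case2 sketch: read zotfiles from a mounted image, write new files on the
# node's local disk, then append them to the image.
# process_and_append is a hypothetical helper, not a cluster tool.
# Usage: process_and_append <image.squashfs> <mountpoint> <command> [args...]
# The command gets two extra arguments: the mountpoint (read-only input)
# and a fresh local directory where it must write its new files.
process_and_append() {
  local img=$1 mnt=$2 out status=0
  shift 2
  # Local scratch directory, assumed to be on the node's local disk
  out=$(mktemp -d "${TMPDIR:-/tmp}/zotfiles.XXXXXX") || return 1
  if ! sudo mountimg "$img" "$mnt"; then
    rm -rf "$out"
    return 1
  fi
  # Squashfs images are read-only, so new files go to $out
  "$@" "$mnt" "$out" || status=$?
  # Unmount before modifying the image
  if ! sudo mountimg -u "$mnt"; then
    rm -rf "$out"
    return 1
  fi
  if [ "$status" -eq 0 ]; then
    # Append the contents of $out to the root of the existing image
    mksquashfs-no-compression "$out" "$img" || status=$?
  fi
  rm -rf "$out"
  return "$status"
}
```

Names that clash with entries already in the image root are renamed xxx_1, xxx_2, etc., as described in the mksquashfs hints.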

For case3:

  • in your jobs:
    • generate your zotfiles on a local filesystem of the node (e.g. /tmp)
    • convert them to squashfs images under /data with mksquashfs-no-compression
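
The case3 steps can be sketched the same way. generate_into_image is a hypothetical helper. The local scratch directory is created under $TMPDIR, or under /tmp if $TMPDIR is unset, and is assumed to be on the node's local disk.

```shell
# Case3 sketch: generate zotfiles on the node's local disk, then pack
# them into a single squashfs image under /data.
# generate_into_image is a hypothetical helper, not a cluster tool.
# Usage: generate_into_image <image.squashfs> <command> [args...]
# The command receives a fresh local directory as its last argument.
generate_into_image() {
  local img=$1 work status=0
  shift
  work=$(mktemp -d "${TMPDIR:-/tmp}/zotfiles.XXXXXX") || return 1
  # Write the many small files locally instead of under /data
  "$@" "$work" || status=$?
  if [ "$status" -eq 0 ]; then
    # Only one large file is written under /data; if $img already
    # exists, the new files are appended to it (mksquashfs behavior)
    mksquashfs-no-compression "$work" "$img" || status=$?
  fi
  # Free the local disk space
  rm -rf "$work"
  return "$status"
}
```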

Creating squashfs images

You can convert your zotfiles on nef-devel or nef-devel2.

Before converting your zotfiles to images, first choose the granularity (how much of your data goes into each image) appropriate to your case.

Note that sudo mountimg currently allows at most 4000 images to be mounted on a node.

For example, if you have a very large directory /data/.../DDD/DD/ containing hundreds of sub-directories D1 D2 ... DN, you may prefer to make one image per sub-directory.
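
Before settling on one image per sub-directory, you can check that this layout stays within the 4000-image limit. This assumes a single job may need to mount all the images on one node. The helpers count_subdirs and check_granularity below are hypothetical, not cluster tools:

```shell
# Per-node limit of sudo mountimg, as stated on this page
max_images=4000

# count_subdirs (hypothetical helper): number of immediate sub-directories
# of $1, i.e. the number of images a one-image-per-sub-directory layout
# would produce (names containing newlines would be miscounted)
count_subdirs() {
  echo $(( $(find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l) ))
}

# check_granularity (hypothetical helper): succeeds if one image per
# sub-directory of $1 could all be mounted on a single node
check_granularity() {
  local n
  n=$(count_subdirs "$1") || return 1
  if [ "$n" -gt "$max_images" ]; then
    echo "$1: $n sub-directories exceed the $max_images-image limit" >&2
    return 1
  fi
  echo "$1: $n sub-directories, within the $max_images-image limit"
}

# Example (placeholder path): check_granularity /data/.../DDD/DD
```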

Example (in bash):

 cd /data/.../DDD || exit
 # Build a separate directory for the images and the mountpoints
 mkdir -p DD-img DD-mnt
 cd DD || exit
 for i in D*/; do
   i=${i%/}   # strip the trailing slash added by the D*/ pattern
   # Create the image
   mksquashfs-no-compression "$i" "../DD-img/$i.squashfs"
   # Create the mountpoint for your future jobs
   mkdir -p "../DD-mnt/$i"
 done

mksquashfs-no-compression is a simple wrapper around mksquashfs that disables all compression to favor speed. Feel free to call mksquashfs directly with other options, such as -comp lzo, to save disk space.
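
As an illustration, the conversion loop could call mksquashfs directly with LZO compression. This sketch assumes the installed mksquashfs was built with LZO support. make_lzo_images is a hypothetical helper. The standard mksquashfs option -noappend is added so that rerunning the loop overwrites the images instead of appending to them.

```shell
# Variant of the conversion loop using mksquashfs with LZO compression,
# trading some speed for disk space.
# make_lzo_images is a hypothetical helper.
# Usage: make_lzo_images <source parent dir> <image dir>
make_lzo_images() {
  local src=$1 dst=$2 d name
  mkdir -p "$dst" || return 1
  for d in "$src"/*/; do
    # Skip if the pattern matched nothing
    [ -d "$d" ] || continue
    name=$(basename "$d")
    # -noappend: overwrite an existing image instead of appending to it
    mksquashfs "$d" "$dst/$name.squashfs" -comp lzo -noappend || return 1
  done
}

# Example (placeholder path): make_lzo_images /data/.../DDD/DD /data/.../DDD/DD-img
```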

Some mksquashfs hints:

  • If the destination image exists, the source files/directories are appended to the image.
    • If a file/directory with the same name already exists in the image, the new one is added under the name xxx_1, xxx_2, etc., where xxx is the original name.
  • If a single directory is specified (e.g. mksquashfs source output.squashfs), the squashfs filesystem consists of that directory: the root directory of the image corresponds to the source directory, so the image contains the contents of source, not source itself.
    • Use the -keep-as-directory option to make mksquashfs keep the directory itself, under its basename, at the root of the image.
  • If multiple source directories or files are specified, mksquashfs merges them into a single filesystem whose root directory contains each of the source files/directories. Each directory entry is named after the basename of its source path. If more than one source entry maps to the same name, the conflicts are named xxx_1, xxx_2, etc., where xxx is the original name.
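
You can append a directory of new results to an existing image as a sub-directory, rather than merging its contents into the image root. To do this, combine -keep-as-directory with appending. This sketch calls mksquashfs directly because this page does not say whether mksquashfs-no-compression forwards extra options. As a result, mksquashfs's default options apply rather than the wrapper's. append_as_subdir and list_image_root are hypothetical helpers. The listing assumes unsquashfs -l prints paths prefixed with squashfs-root/.

```shell
# append_as_subdir (hypothetical helper): append directory $1 to image $2
# as a sub-directory named after its basename, instead of merging its
# contents into the image root
append_as_subdir() {
  local dir=${1%/} img=$2
  mksquashfs "$dir" "$img" -keep-as-directory
}

# list_image_root (hypothetical helper): print the top-level entries of
# image $1, from the squashfs-root/<name> lines of unsquashfs -l
list_image_root() {
  unsquashfs -l "$1" | awk -F/ 'NF == 2 { print $2 }'
}

# Example (placeholder names): append_as_subdir results D1.squashfs
```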

Mounting squashfs images

To mount one image, simply call: sudo mountimg <image path> <directory>

To unmount: sudo mountimg -u <directory>

Example: mount every squashfs image in /data/.../DDD/DD-img/ on the corresponding sub-directory of /data/.../DDD/DD-mnt/:

 cd /data/.../DDD/DD-mnt || exit
 for i in */; do
   i=${i%/}   # strip the trailing slash added by the */ pattern
   sudo mountimg "../DD-img/$i.squashfs" "$i" || exit
 done
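
The mount loop has an unmount counterpart, for example before appending new files to the images in case2. In an OAR job this is otherwise optional. umount_all is a hypothetical helper. How sudo mountimg -u behaves when other jobs share a mount is not covered on this page.

```shell
# umount_all (hypothetical helper): unmount every image mounted on the
# sub-directories of $1 (e.g. a DD-mnt directory)
umount_all() {
  local d rc=0
  for d in "$1"/*/; do
    # Skip if the pattern matched nothing
    [ -d "$d" ] || continue
    # Keep going on errors, but report a failure at the end
    sudo mountimg -u "${d%/}" || rc=1
  done
  return "$rc"
}

# Example (placeholder path): umount_all /data/.../DDD/DD-mnt
```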

In an OAR job, a mount done with mountimg is automatically unmounted when the job terminates.

Such a mount can also be shared by several OAR jobs and several users. In this case, the image is unmounted only when all these jobs have terminated. Beware that every job must perform the mount itself, even if the image is already mounted, to register in the list of processes that need it.