Discussion:FAQ new config: Difference between revisions

Revision as of 11 February 2019, 15:51

Draft squashfs/mountimg

How can I use many small files efficiently?

You can improve performance and reduce the load on /data in the following cases:

case1: your jobs only read from the directories where your zotfiles reside
case2: your jobs read your zotfiles but only add new files or directories to them
case3: your jobs generate zotfiles that will afterwards only be read or extended with new files

For case1:

convert your zotfile directories to squashfs images
in your jobs:
- mount those images using sudo mountimg
- use those mounted directories for processing
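The case1 steps can be sketched as a small bash function to call from a job script. This is a minimal sketch, not a site-provided tool: the name run_readonly_job and its argument order are hypothetical, and it assumes the mountpoint directory already exists and that the processing command only reads the directory it is given.

```bash
# Hypothetical helper for case1: mount one squashfs image, then run a
# read-only processing command on the mounted directory.
# Usage: run_readonly_job <image> <mountpoint> <command> [args...]
run_readonly_job() {
  local image=$1 mnt=$2
  shift 2
  # Attach the image to its (existing, empty) mountpoint
  sudo mountimg "$image" "$mnt" || return 1
  # Run the command with the mounted directory as its last argument
  "$@" "$mnt"
  # No explicit unmount: in an OAR job, mountimg mounts are released
  # automatically when the job terminates
}
```

For example: run_readonly_job /data/.../DDD/DD-img/D1.squashfs /data/.../DDD/DD-mnt/D1 ./my_analysis, where ./my_analysis is a placeholder for your own program.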

For case2:

convert your zotfile directories to squashfs images
in your jobs:
- mount these images with sudo mountimg
- read from the mounted directories, but write new files to the node's local filesystem (e.g. /tmp)
- unmount the images with sudo mountimg -u
- append the new files to the images with mksquashfs-no-compression
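The case2 workflow can be sketched the same way. The function name run_append_job is hypothetical; the sketch assumes that the processing command takes the mounted input directory and a node-local output directory as its last two arguments, and that no other job is still using the same image when it is unmounted and extended.

```bash
# Hypothetical helper for case2: read from a mounted image, write the new
# files to node-local storage, then append them to the image.
# Usage: run_append_job <image> <mountpoint> <command> [args...]
# The command receives two extra arguments: the mounted input directory
# and a node-local output directory.
run_append_job() {
  local image=$1 mnt=$2 out
  shift 2
  out=$(mktemp -d /tmp/zotfiles.XXXXXX) || return 1
  sudo mountimg "$image" "$mnt" || return 1
  "$@" "$mnt" "$out" || return 1
  # The image must be unmounted before it is modified
  sudo mountimg -u "$mnt" || return 1
  # Single source directory: its contents are appended at the image root
  # (name clashes get a _1, _2, ... suffix)
  mksquashfs-no-compression "$out" "$image" || return 1
  # Free the node-local space once the image is updated
  rm -rf "$out"
}
```

Passing the output directory itself (rather than its entries) as the only source makes the new files land at the root of the image, following the single-directory rule listed in the mksquashfs hints.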

For case3:

in your jobs:
- generate your zotfiles on the node's local filesystem (e.g. /tmp)
- convert them to squashfs images under /data with mksquashfs-no-compression
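For case3, a sketch under similar assumptions: the function name run_generate_job is hypothetical, and the generating command is assumed to write its output into the node-local directory passed as its last argument.

```bash
# Hypothetical helper for case3: generate zotfiles on node-local storage,
# then pack them into one squashfs image (e.g. under /data).
# Usage: run_generate_job <image> <command> [args...]
# The command receives a node-local output directory as its last argument.
run_generate_job() {
  local image=$1 work
  shift
  work=$(mktemp -d /tmp/zotfiles.XXXXXX) || return 1
  "$@" "$work" || return 1
  # Single source directory: its contents become the root of the image
  mksquashfs-no-compression "$work" "$image" || return 1
  # Free the node-local space once the image is written
  rm -rf "$work"
}
```

Since mksquashfs appends to an existing destination, calling this twice with the same image path adds the second batch of files to the existing image rather than replacing it.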

Creating squashfs images

You can convert your zotfiles on nef-devel or nef-devel2.

To convert your zotfiles to images, first choose the granularity appropriate to your case.

Note that sudo mountimg currently allows at most 4000 images to be mounted on a node.

For example, if you have a really big directory /data/.../DDD/DD/ containing hundreds of sub-directories D1 D2 ... DN, you may prefer to create one image per sub-directory.

Example (in bash):

 cd /data/.../DDD || exit
 # Build a separate directory for the images and the mountpoints
 mkdir DD-img DD-mnt
 cd DD || exit
 for i in D*; do
   # Create the image
   mksquashfs-no-compression "$i" "../DD-img/$i.squashfs"
   # Create the mountpoint for your future jobs
   mkdir "../DD-mnt/$i"
 done

mksquashfs-no-compression is a simple wrapper around mksquashfs that disables all compression to favour speed. Feel free to call mksquashfs directly with other options, such as -comp lzo, to save disk space.
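The wrapper's actual content is not shown on this page. As an assumption, an equivalent could forward its arguments to mksquashfs together with the standard squashfs-tools options that turn off compression of the inode table (-noI), data blocks (-noD), fragments (-noF) and extended attributes (-noX). The name mksquashfs_nocomp is hypothetical.

```bash
# Hypothetical stand-in for mksquashfs-no-compression (the real wrapper
# may differ): pass sources and destination through to mksquashfs and
# disable compression of inodes, data blocks, fragments and xattrs.
# Usage: mksquashfs_nocomp <source>... <image> [extra mksquashfs options]
mksquashfs_nocomp() {
  mksquashfs "$@" -noI -noD -noF -noX
}
```

To trade speed for disk space instead, call mksquashfs directly, e.g. mksquashfs D1 ../DD-img/D1.squashfs -comp lzo.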

Some mksquashfs hints:

If the destination image already exists, the source files/directories are added (appended) to it.
- In addition, if a file/directory with the same name already exists in the image, the new one is added under the name xxx_1, xxx_2, etc., where xxx is the original name.
If a single directory is specified (e.g. mksquashfs source output.squashfs), the squashfs filesystem consists of the contents of that directory: the root directory of the image corresponds to the source directory.
- Use the -keep-as-directory option to make mksquashfs keep the directory itself, under its basename, at the root of the image.
If multiple source directories or files are specified, mksquashfs merges them into a single filesystem whose root directory contains each of the source files/directories, named after the basename of its source path. If more than one source entry maps to the same name, the conflicts are named xxx_1, xxx_2, etc., where xxx is the original name.

Mounting squashfs images

To mount an image with mountimg, simply call: sudo mountimg <image path> <directory>

To unmount: sudo mountimg -u <directory>

Example: mount every squashfs image of /data/.../DDD/DD-img/ on the corresponding sub-directory of /data/.../DDD/DD-mnt/

 cd /data/.../DDD/DD-mnt || exit
 for i in *; do
   sudo mountimg "../DD-img/$i.squashfs" "$i" || exit
 done
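After the loop above, it can be worth checking that every mountpoint is actually mounted before starting the processing. A minimal sketch using the util-linux mountpoint command; the function name check_mounts is hypothetical and it assumes it is run from the DD-mnt directory.

```bash
# Hypothetical check to run from the DD-mnt directory after the mount loop:
# report every sub-directory that is not an active mountpoint.
# Returns non-zero if at least one of them is not mounted.
check_mounts() {
  local d status=0
  for d in */; do
    [ -d "$d" ] || continue   # glob matched nothing
    d=${d%/}
    if ! mountpoint -q "$d"; then
      echo "not mounted: $d" >&2
      status=1
    fi
  done
  return "$status"
}
```

Calling check_mounts || exit right after the loop makes the job stop early instead of reading empty directories.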

In an OAR job, a mount made with mountimg is automatically unmounted when the job terminates.

Such a mount can also be shared by several OAR jobs and several users; in that case, the image is unmounted only once all of those jobs have terminated. Beware that every job has to perform the mount itself in order to be registered in the list of processes that need it.

mountimg currently allows at most 4000 images to be mounted on a node.
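A pre-flight check against this limit could look like the following sketch (hypothetical function check_image_count). It only counts the images in one directory; since mounts can be shared between jobs and users, the actual number of images mounted on the node may be higher.

```bash
# Hypothetical pre-flight check against the 4000-image limit per node.
# Usage: check_image_count <image directory>
# Prints the number of *.squashfs files; fails if it exceeds the limit.
MAX_IMAGES=4000
check_image_count() {
  local img n=0
  for img in "$1"/*.squashfs; do
    [ -e "$img" ] && n=$((n + 1))   # skip the literal pattern if no match
  done
  if [ "$n" -gt "$MAX_IMAGES" ]; then
    echo "$n images: more than the $MAX_IMAGES allowed per node" >&2
    return 1
  fi
  echo "$n"
}
```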
