Message-ID: <Yf1EUCgLIzb+rnIQ@kroah.com>
Date:   Fri, 4 Feb 2022 16:20:48 +0100
From:   Greg KH <gregkh@...uxfoundation.org>
To:     Imran Khan <imran.f.khan@...cle.com>
Cc:     tj@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 0/2] kernfs: use hashed mutex and spinlock in place of
 global ones.

On Thu, Feb 03, 2022 at 01:50:25AM +1100, Imran Khan wrote:
> Reduce contention around global locks used in kernfs.
> 
> The current kernfs design makes use of 3 global locks to synchronize
> various operations. There are a few other global locks as well but
> they are used for specific cases and hence don't cause severe contention.
> 
> The above mentioned 3 main global locks used in kernfs are:
> 
> 1. A global mutex, kernfs_open_file_mutex, to protect the list of
> kernfs_open_file instances corresponding to a sysfs attribute.
> 
> 2. A global spinlock, kernfs_open_node_lock, to protect
> kernfs_node->attr.open which points to kernfs_open_node instance
> corresponding to a kernfs_node.
> 
> 3. A per-fs rw semaphore, root->kernfs_rwsem, global within each
> filesystem, to synchronize most of the other operations, such as
> adding, removing, and renaming files or directories.
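> 
> As a simplified sketch (abstracted from the kernfs sources, not the
> exact code), the pre-patch locking looks roughly like this:
> 
>   /* fs/kernfs/file.c: one mutex and one spinlock shared by all nodes */
>   static DEFINE_MUTEX(kernfs_open_file_mutex);
>   static DEFINE_SPINLOCK(kernfs_open_node_lock);
> 
>   /* include/linux/kernfs.h: one rwsem shared by every node of a fs */
>   struct kernfs_root {
>           ...
>           struct rw_semaphore     kernfs_rwsem;
>   };
> 
> So, for example, every open of every sysfs attribute serializes on the
> same kernfs_open_file_mutex, no matter which file is being opened.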
> 
> Since these locks are global, they can cause contention when multiple
> (for example, a few hundred) applications try to access files of sysfs
> (or another kernfs based file system) in parallel, even if the
> applications are accessing different and unrelated files.
> 
> For example, on a system with 384 cores, if I run 200 instances of an
> application that mostly executes the following loop:
>   for (int loop = 0; loop < 100; loop++)
>   {
>     for (int port_num = 1; port_num < 2; port_num++)
>     {
>       for (int gid_index = 0; gid_index < 254; gid_index++)
>       {
>         char ret_buf[64];
>         char gid_file_path[1024];
> 
>         int      ret_fd;
>         ssize_t  ret_rd;
>         int      saved_errno;
> 
>         memset(ret_buf, 0, sizeof(ret_buf));
>         memset(gid_file_path, 0, sizeof(gid_file_path));
> 
>         snprintf(gid_file_path, sizeof(gid_file_path),
>                  "/sys/class/infiniband/%s/ports/%d/gids/%d",
>                  dev_name,
>                  port_num,
>                  gid_index);
> 
>         ret_fd = open(gid_file_path, O_RDONLY | O_CLOEXEC);
>         if (ret_fd < 0)
>         {
>           printf("Failed to open %s\n", gid_file_path);
>           continue;
>         }
> 
>         /* Read the GID */
>         ret_rd = read(ret_fd, ret_buf, 40);
> 
>         if (ret_rd == -1)
>         {
>           /* Save errno before close() can clobber it */
>           saved_errno = errno;
>           printf("Failed to read from file %s, errno: %d\n",
>                  gid_file_path, saved_errno);
>           close(ret_fd);
>           continue;
>         }
> 
>         close(ret_fd);
>       }
>     }
>   }
> 
> I can see contention around the above mentioned locks as follows:
> 
> -   54.07%    53.60%  showgids         [kernel.kallsyms]       [k] osq_lock
>    - 53.60% __libc_start_main
>       - 32.29% __GI___libc_open
>            entry_SYSCALL_64_after_hwframe
>            do_syscall_64
>            sys_open
>            do_sys_open
>            do_filp_open
>            path_openat
>            vfs_open
>            do_dentry_open
>            kernfs_fop_open
>            mutex_lock
>          - __mutex_lock_slowpath
>             - 32.23% __mutex_lock.isra.5
>                  osq_lock
>       - 21.31% __GI___libc_close
>            entry_SYSCALL_64_after_hwframe
>            do_syscall_64
>            exit_to_usermode_loop
>            task_work_run
>            ____fput
>            __fput
>            kernfs_fop_release
>            kernfs_put_open_node.isra.8
>            mutex_lock
>          - __mutex_lock_slowpath
>             - 21.28% __mutex_lock.isra.5
>                  osq_lock
> 
> -   10.49%    10.39%  showgids         [kernel.kallsyms]      [k] down_read
>      10.39% __libc_start_main
>         __GI___libc_open
>         entry_SYSCALL_64_after_hwframe
>         do_syscall_64
>         sys_open
>         do_sys_open
>         do_filp_open
>       - path_openat
>          - 9.72% link_path_walk
>             - 5.21% inode_permission
>                - __inode_permission
>                   - 5.21% kernfs_iop_permission
>                        down_read
>             - 4.08% walk_component
>                  lookup_fast
>                - d_revalidate.part.24
>                   - 4.08% kernfs_dop_revalidate
> 
> -    7.48%     7.41%  showgids         [kernel.kallsyms]       [k] up_read
>      7.41% __libc_start_main
>         __GI___libc_open
>         entry_SYSCALL_64_after_hwframe
>         do_syscall_64
>         sys_open
>         do_sys_open
>         do_filp_open
>       - path_openat
>          - 7.01% link_path_walk
>             - 4.12% inode_permission
>                - __inode_permission
>                   - 4.12% kernfs_iop_permission
>                        up_read
>             - 2.61% walk_component
>                  lookup_fast
>                - d_revalidate.part.24
>                   - 2.61% kernfs_dop_revalidate
> 
> Moreover, this run with 200 application instances takes 32-34 secs to
> complete.
> 
> This patch set reduces the above mentioned contention by replacing
> these global locks with hashed locks.
> 
> For example, with the patched kernel on the same test setup, we no
> longer see contention around osq_lock (i.e. kernfs_open_file_mutex),
> and contention around the per-fs kernfs_rwsem has reduced significantly
> as well. This can be seen in the following perf snippet:
> 
> -    1.66%     1.65%  showgids         [kernel.kallsyms]      [k] down_read
>      1.65% __libc_start_main
>         __GI___libc_open
>         entry_SYSCALL_64_after_hwframe
>         do_syscall_64
>         sys_open
>         do_sys_open
>         do_filp_open
>       - path_openat
>          - 1.62% link_path_walk
>             - 0.98% inode_permission
>                - __inode_permission
>                   + 0.98% kernfs_iop_permission
>             - 0.52% walk_component
>                  lookup_fast
>                - d_revalidate.part.24
>                   - 0.52% kernfs_dop_revalidate
> 
> -    1.12%     1.11%  showgids         [kernel.kallsyms]      [k] up_read
>      1.11% __libc_start_main
>         __GI___libc_open
>         entry_SYSCALL_64_after_hwframe
>         do_syscall_64
>         sys_open
>         do_sys_open
>         do_filp_open
>       - path_openat
>          - 1.11% link_path_walk
>             - 0.69% inode_permission
>                - __inode_permission
>                   - 0.69% kernfs_iop_permission
>                        up_read
> 
> Moreover, the test execution time has dropped from 32-34 secs to 18-19 secs.
> 
> The patches of this patchset introduce the following changes:
> 
> PATCH-1: Make the global kernfs_open_file_mutex and kernfs_open_node_lock
> 	 per-fs hashed locks, where the address of a kernfs_node acts as the
> 	 hash key. As a result, kernfs_node objects whose addresses hash to
> 	 different values use different kernfs_open_file_mutex and
> 	 kernfs_open_node_lock instances, rather than all kernfs_node objects
> 	 sharing the same kernfs_open_file_mutex and kernfs_open_node_lock
> 	 as was the case earlier.
> 
> PATCH-2: Replace the single per-fs rw semaphore with per-fs hashed rw
> 	 semaphores, where the address of a kernfs_node again acts as the
> 	 hash key, to reduce contention around the single per-fs rw semaphore.
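> 
> Both patches use the same address-hashing idea. As a rough sketch
> (the lock-array size, the kernfs_locks field, and the helper name
> below are illustrative, not necessarily the exact patch):
> 
>   #include <linux/hash.h>
> 
>   #define NR_KERNFS_LOCK_BITS 6                /* 64 locks per fs */
>   #define NR_KERNFS_LOCKS     (1 << NR_KERNFS_LOCK_BITS)
> 
>   struct kernfs_global_locks {
>           struct mutex    open_file_mutex[NR_KERNFS_LOCKS];
>           spinlock_t      open_node_lock[NR_KERNFS_LOCKS];
>   };
> 
>   /* Pick the mutex for a node by hashing its address. */
>   static struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn)
>   {
>           int idx = hash_ptr(kn, NR_KERNFS_LOCK_BITS);
> 
>           return &kernfs_root(kn)->kernfs_locks->open_file_mutex[idx];
>   }
> 
> Two unrelated nodes can still hash to the same bucket and share a lock,
> but with 64 buckets per fs the chance of that is far lower than with a
> single global lock, while the memory overhead stays bounded.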
> 
> ------------------------------------------------------------------
> 
> Changes since v3:
>  - Make open_file_mutex and open_node_lock per-fs.
>  - Replace per-fs rwsem with per-fs hashed rwsem.
>    (Suggested by Tejun Heo <tj@...nel.org>)

Can you fix the issues found by the 0-day bot?

thanks,

greg k-h
