Message-ID: <Yf1EUCgLIzb+rnIQ@kroah.com>
Date: Fri, 4 Feb 2022 16:20:48 +0100
From: Greg KH <gregkh@...uxfoundation.org>
To: Imran Khan <imran.f.khan@...cle.com>
Cc: tj@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 0/2] kernfs: use hashed mutex and spinlock in place of
global ones.
On Thu, Feb 03, 2022 at 01:50:25AM +1100, Imran Khan wrote:
> Reduce contention around global locks used in kernfs.
>
> The current kernfs design uses three global locks to synchronize
> various operations. There are a few other global locks as well, but
> they are used only in specific cases and hence do not cause severe contention.
>
> The three main global locks used in kernfs are:
>
> 1. A global mutex, kernfs_open_file_mutex, to protect the list of
> kernfs_open_file instances corresponding to a sysfs attribute.
>
> 2. A global spinlock, kernfs_open_node_lock, to protect
> kernfs_node->attr.open which points to kernfs_open_node instance
> corresponding to a kernfs_node.
>
> 3. A global per-fs rw semaphore, root->kernfs_rwsem, to synchronize most
> of the other operations, such as adding, removing, and renaming files
> or directories.
>
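> For reference, the pre-patch locking looks roughly like this (an
> illustrative sketch only, not copied verbatim from fs/kernfs/):
>
> static DEFINE_MUTEX(kernfs_open_file_mutex);	/* protects kernfs_open_file lists */
> static DEFINE_SPINLOCK(kernfs_open_node_lock);	/* protects kernfs_node->attr.open */
>
> struct kernfs_root {
> 	/* ... */
> 	struct rw_semaphore	kernfs_rwsem;	/* add/remove/rename, etc. */
> };
>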
> Since these locks are global, they can cause contention when multiple
> (for example, a few hundred) applications try to access sysfs (or another
> kernfs-based file system) in parallel, even if the applications are
> accessing different and unrelated files.
>
> For example, on a system with 384 cores, if I run 200 instances of an
> application that mostly executes the following loop:
> /* Needs <stdio.h>, <string.h>, <errno.h>, <fcntl.h> and <unistd.h>;
>  * dev_name comes from the surrounding application.
>  */
> for (int loop = 0; loop < 100; loop++)
> {
> 	for (int port_num = 1; port_num < 2; port_num++)
> 	{
> 		for (int gid_index = 0; gid_index < 254; gid_index++)
> 		{
> 			char ret_buf[64], ret_buf_lo[64];
> 			char gid_file_path[1024];
>
> 			int ret_len;
> 			int ret_fd;
> 			ssize_t ret_rd;
>
> 			unsigned int saved_errno;
>
> 			memset(ret_buf, 0, sizeof(ret_buf));
> 			memset(gid_file_path, 0, sizeof(gid_file_path));
>
> 			ret_len = snprintf(gid_file_path, sizeof(gid_file_path),
> 					   "/sys/class/infiniband/%s/ports/%d/gids/%d",
> 					   dev_name,
> 					   port_num,
> 					   gid_index);
>
> 			ret_fd = open(gid_file_path, O_RDONLY | O_CLOEXEC);
> 			if (ret_fd < 0)
> 			{
> 				printf("Failed to open %s\n", gid_file_path);
> 				continue;
> 			}
>
> 			/* Read the GID */
> 			ret_rd = read(ret_fd, ret_buf, 40);
>
> 			if (ret_rd == -1)
> 			{
> 				saved_errno = errno;
> 				printf("Failed to read from file %s, errno: %u\n",
> 				       gid_file_path, saved_errno);
> 				close(ret_fd);
> 				continue;
> 			}
>
> 			close(ret_fd);
> 		}
> 	}
> }
>
> I can see contention around the above-mentioned locks as follows:
>
> - 54.07% 53.60% showgids [kernel.kallsyms] [k] osq_lock
> - 53.60% __libc_start_main
> - 32.29% __GI___libc_open
> entry_SYSCALL_64_after_hwframe
> do_syscall_64
> sys_open
> do_sys_open
> do_filp_open
> path_openat
> vfs_open
> do_dentry_open
> kernfs_fop_open
> mutex_lock
> - __mutex_lock_slowpath
> - 32.23% __mutex_lock.isra.5
> osq_lock
> - 21.31% __GI___libc_close
> entry_SYSCALL_64_after_hwframe
> do_syscall_64
> exit_to_usermode_loop
> task_work_run
> ____fput
> __fput
> kernfs_fop_release
> kernfs_put_open_node.isra.8
> mutex_lock
> - __mutex_lock_slowpath
> - 21.28% __mutex_lock.isra.5
> osq_lock
>
> - 10.49% 10.39% showgids [kernel.kallsyms] [k] down_read
> 10.39% __libc_start_main
> __GI___libc_open
> entry_SYSCALL_64_after_hwframe
> do_syscall_64
> sys_open
> do_sys_open
> do_filp_open
> - path_openat
> - 9.72% link_path_walk
> - 5.21% inode_permission
> - __inode_permission
> - 5.21% kernfs_iop_permission
> down_read
> - 4.08% walk_component
> lookup_fast
> - d_revalidate.part.24
> - 4.08% kernfs_dop_revalidate
>
> - 7.48% 7.41% showgids [kernel.kallsyms] [k] up_read
> 7.41% __libc_start_main
> __GI___libc_open
> entry_SYSCALL_64_after_hwframe
> do_syscall_64
> sys_open
> do_sys_open
> do_filp_open
> - path_openat
> - 7.01% link_path_walk
> - 4.12% inode_permission
> - __inode_permission
> - 4.12% kernfs_iop_permission
> up_read
> - 2.61% walk_component
> lookup_fast
> - d_revalidate.part.24
> - 2.61% kernfs_dop_revalidate
>
> Moreover, this run of 200 application instances takes 32-34 seconds to
> complete.
>
> This patch set reduces the above-mentioned contention by replacing
> these global locks with hashed locks.
>
> For example, with the patched kernel on the same test setup, we no
> longer see contention around osq_lock (i.e. kernfs_open_file_mutex), and
> contention around the per-fs kernfs_rwsem has reduced significantly as well.
> This can be seen in the following perf snippet:
>
> - 1.66% 1.65% showgids [kernel.kallsyms] [k] down_read
> 1.65% __libc_start_main
> __GI___libc_open
> entry_SYSCALL_64_after_hwframe
> do_syscall_64
> sys_open
> do_sys_open
> do_filp_open
> - path_openat
> - 1.62% link_path_walk
> - 0.98% inode_permission
> - __inode_permission
> + 0.98% kernfs_iop_permission
> - 0.52% walk_component
> lookup_fast
> - d_revalidate.part.24
> - 0.52% kernfs_dop_revalidate
>
> - 1.12% 1.11% showgids [kernel.kallsyms] [k] up_read
> 1.11% __libc_start_main
> __GI___libc_open
> entry_SYSCALL_64_after_hwframe
> do_syscall_64
> sys_open
> do_sys_open
> do_filp_open
> - path_openat
> - 1.11% link_path_walk
> - 0.69% inode_permission
> - __inode_permission
> - 0.69% kernfs_iop_permission
> up_read
>
> Moreover, the test execution time has reduced from 32-34 seconds to 18-19 seconds.
>
> The patches in this patch set introduce the following changes:
>
> PATCH-1: Make the global kernfs_open_file_mutex and kernfs_open_node_lock
> per-fs hashed locks, where the address of a kernfs_node acts as the hash
> key. As a result, kernfs_node objects whose addresses hash to different
> values use different instances of kernfs_open_file_mutex and
> kernfs_open_node_lock, rather than all kernfs_node objects sharing
> the same kernfs_open_file_mutex and kernfs_open_node_lock as was
> the case earlier. A rough sketch of the hashing scheme follows below.
>
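> Not taken from the patch itself -- just a minimal sketch of the hashing
> idea, where KERNFS_LOCKS_BITS, struct kernfs_global_locks, the 'locks'
> pointer in kernfs_root and the *_ptr() helpers are made-up names for
> illustration:
>
> #include <linux/hash.h>
> #include <linux/mutex.h>
> #include <linux/spinlock.h>
>
> #define KERNFS_LOCKS_BITS	6
> #define KERNFS_LOCKS_COUNT	(1 << KERNFS_LOCKS_BITS)
>
> struct kernfs_global_locks {
> 	struct mutex	open_file_mutex[KERNFS_LOCKS_COUNT];
> 	spinlock_t	open_node_lock[KERNFS_LOCKS_COUNT];
> };
>
> /* assumes struct kernfs_root gains a 'struct kernfs_global_locks *locks' member */
> static struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn)
> {
> 	int idx = hash_ptr(kn, KERNFS_LOCKS_BITS);
>
> 	return &kernfs_root(kn)->locks->open_file_mutex[idx];
> }
>
> static spinlock_t *kernfs_open_node_lock_ptr(struct kernfs_node *kn)
> {
> 	int idx = hash_ptr(kn, KERNFS_LOCKS_BITS);
>
> 	return &kernfs_root(kn)->locks->open_node_lock[idx];
> }
>
> Callers then do mutex_lock(kernfs_open_file_mutex_ptr(kn)) instead of
> mutex_lock(&kernfs_open_file_mutex), so two nodes whose addresses hash
> differently no longer contend on the same mutex.
>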
> PATCH-2: Replace the single per-fs rw semaphore with per-fs hashed rw
> semaphores, where the address of a kernfs_node again acts as the hash key,
> to reduce contention around the single per-fs rw semaphore (see the
> sketch below).
>
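> The rwsem side would follow the same pattern (again an illustrative
> sketch; kernfs_rwsem_ptr(), kernfs_down_read() and kernfs_up_read() are
> made-up names, and the per-fs structure above is assumed to also carry
> an array of rw_semaphores):
>
> static struct rw_semaphore *kernfs_rwsem_ptr(struct kernfs_node *kn)
> {
> 	int idx = hash_ptr(kn, KERNFS_LOCKS_BITS);
>
> 	return &kernfs_root(kn)->locks->kernfs_rwsem[idx];
> }
>
> /* callers take the node's hashed rwsem instead of root->kernfs_rwsem */
> static inline void kernfs_down_read(struct kernfs_node *kn)
> {
> 	down_read(kernfs_rwsem_ptr(kn));
> }
>
> static inline void kernfs_up_read(struct kernfs_node *kn)
> {
> 	up_read(kernfs_rwsem_ptr(kn));
> }
>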
> ------------------------------------------------------------------
>
> Changes since v3:
> - Make open_file_mutex and open_node_lock per-fs.
> - Replace per-fs rwsem with per-fs hashed rwsem.
> (Suggested by Tejun Heo <tj@...nel.org>)
Can you fix the issues found by the 0-day bot?
thanks,
greg k-h