linux-kernel - Re: [RFC PATCH] kernfs: use kernfs_node specific mutex and spinlock.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <3f340003-2339-85a9-7d3a-51a488dcc720@oracle.com>
Date:   Mon, 3 Jan 2022 20:07:09 +1100
From:   Imran Khan <imran.f.khan@...cle.com>
To:     Greg KH <gregkh@...uxfoundation.org>
Cc:     tj@...nel.org, linux-kernel@...r.kernel.org,
        akpm@...ux-foundation.org
Subject: Re: [RFC PATCH] kernfs: use kernfs_node specific mutex and spinlock.

Hi Greg,

On 24/12/21 7:40 pm, Greg KH wrote:
> On Fri, Dec 24, 2021 at 09:52:51AM +1100, Imran Khan wrote:
>> Hi everyone,
>>
>> On 16/12/21 2:06 am, Imran Khan wrote:
>>> Right now a global mutex (kernfs_open_file_mutex) protects list of
>>> kernfs_open_file instances corresponding to a sysfs attribute, so even
>>> if different tasks are opening or closing different sysfs files they
>>> can contend on osq_lock of this mutex. The contention is more apparent
>>> in large scale systems with few hundred CPUs where most of the CPUs have
>>> running tasks that are opening, accessing or closing sysfs files at any
>>> point of time. Since each list of kernfs_open_file belongs to a
>>> kernfs_open_node instance which in turn corresponds to one kernfs_node,
>>> move global kernfs_open_file_mutex within kernfs_node so that it does
>>> not block access to kernfs_open_file lists corresponding to other
>>> kernfs_node.
>>>
>>> Also since kernfs_node->attr.open points to kernfs_open_node instance
>>> corresponding to the kernfs_node, we can use a kernfs_node specific
>>> spinlock in place of current global spinlock i.e kernfs_open_node_lock.
>>> So make this spinlock local to kernfs_node instance as well.
>>>
>>> Signed-off-by: Imran Khan <imran.f.khan@...cle.com>
>>> ---
>>> I have kept this patch as RFC, as I am not sure if I have overlooked any
>>> scenario(s) where these global locks are needed.
>>>
>>
>> Could someone please provide some feedback about this change? Also if
>> there is any issues in this change, can I make these locks per-fs as has
>> been done in [1].
>>
>> [1] https://urldefense.com/v3/__https://lore.kernel.org/lkml/YZvV0ESA*zHHqHBU@google.com/__;Kw!!ACWV5N9M2RV99hQ!ZNLlzuX1cVFEAE5Ila2y8AzhvA3xI4HG4q13ZdcaQN__JaPLy6yuzdV0lypeVEIOHA$ 
> 
> Please test this using some tests to verify that sysfs is still working
> properly and that this actually takes less time overall.  In the
> conversations about the last time kernfs was changed, there were lots of
> discussions about proving that it actually mattered.
> 

Thanks for getting back on this.

Yes sysfs and cgroup are working with this change.

I verified the change:

1. Using LTP sysfs tests
2. Doing CPU hotplugging and reading CPU topology info from sysfs in
parallel. I was getting correct topology information or "No such file or
directory error"

If you could suggest me some further tests, I can test with those as well.

As far as overall time taken was concerned, I did not see any
improvement with my test application (I am running 200 instances of it
on a system with 384 CPUs).
The main loop of this application is as follows (One can use any other
sysfs hierarchy as well):


for (int loop = 0; loop <100 ; loop++)
{
  for (int port_num = 1; port_num < 2; port_num++)
  {
    for (int gid_index = 0; gid_index < 254; gid_index++ )
    {
      char ret_buf[64], ret_buf_lo[64];
      char gid_file_path[1024];
      int      ret_len, ret_fd;
      ssize_t  ret_rd;
      unsigned int  i, saved_errno;

      memset(ret_buf, 0, sizeof(ret_buf));
      memset(gid_file_path, 0, sizeof(gid_file_path));

      ret_len = snprintf(gid_file_path, sizeof(gid_file_path),
                         "/sys/class/infiniband/%s/ports/%d/gids/%d",
                         dev_name, port_num, gid_index);

      ret_fd = open(gid_file_path, O_RDONLY | O_CLOEXEC);
      if (ret_fd < 0)
      {
        printf("Failed to open %s\n", gid_file_path);
        continue;
      }

      /* Read the GID */
      ret_rd = read(ret_fd, ret_buf, 40);

      if (ret_rd == -1)
      {
        printf("Failed to read from file %s, errno: %u\n",
                gid_file_path, saved_errno);

        continue;
      }
      close(ret_fd);
    }
  }
}

The patch just moved the contention from osq_lock (corresponding to
kernfs_open_file_mutex) to read-write semaphore (kernfs_rwsem). I have
tried to address the kernfs_rwsem contention in v2 of this patch set at [1].

v2 of the patch set, reduces the test execution time to half (From ~36
secs to ~18 secs)
and contention around kernfs_rwsem lock is reduced to 1/3rd of earlier case.

8.61%     8.55%  showgids   [kernel.kallsyms]          [k] down_read
7.80%     7.75%  showgids [kernel.kallsyms]            [k] up_read

I will await feedback regarding v2 of this patchset.

[1]
https://lore.kernel.org/lkml/20220103084544.1109829-1-imran.f.khan@oracle.com/

> thanks,
> 
> greg k-h