Message-ID: <YraLOZb2SnKl0wUO@mtj.duckdns.org>
Date:   Sat, 25 Jun 2022 13:12:41 +0900
From:   Tejun Heo <tj@...nel.org>
To:     Imran Khan <imran.f.khan@...cle.com>
Cc:     gregkh@...uxfoundation.org, viro@...iv.linux.org.uk,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] kernfs: Change kernfs_rwsem to a per-cpu rwsem.

On Mon, Jun 20, 2022 at 01:26:34PM +1000, Imran Khan wrote:
> On large systems, when a few hundred CPUs simultaneously acquire
> kernfs_rwsem for reading, we see performance degradation due to bouncing
> of the cache line that contains kernfs_rwsem. Changing kernfs_rwsem into
> a per-fs, per-cpu rwsem can fix this degradation.
...
> Moreover, this run of 200 applications takes more than 32 secs to finish
> on this system.
> 
> After changing kernfs_rwsem to a per-cpu rwsem, I can see that contention
> for kernfs_rwsem is no longer visible in perf data and the test execution
> time has reduced to almost half (17 secs or less from 32 secs or more).
> 
> The overhead of write operations with a per-cpu rwsem will be higher,
> but kernfs files are created/deleted much less frequently than kernfs
> (cgroup, sysfs) files are read.

The problem with percpu_rwsem is that write locking requires going
through an RCU grace period, which can easily add latencies of tens of
milliseconds or more. I'm pretty sure there are code paths which are
pretty heavy on write - e.g. during boot, depending on the machine
configuration, we could be write-acquiring the rwsem hundreds of
thousands of times and we'd be constantly doing RCU grace periods.
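
To put a rough number on that, here is a back-of-the-envelope sketch in
C. It is only a model, not kernel code: the function name is
hypothetical, and it assumes the worst case where every write
acquisition waits for one full grace period (in practice back-to-back
writers may not each pay the full cost). The inputs are illustrative
assumptions, not measurements.

```c
#include <stdint.h>

/*
 * Hypothetical helper: total time spent waiting if each write
 * acquisition of a percpu rwsem waits for one RCU grace period.
 * Worst-case model; both arguments are assumed values.
 */
uint64_t total_write_stall_ms(uint64_t nr_write_locks,
                              uint64_t grace_period_ms)
{
    /* Writers are serialized, so the waits add up linearly. */
    return nr_write_locks * grace_period_ms;
}

/* Same figure in whole seconds, for easier reading. */
uint64_t total_write_stall_secs(uint64_t nr_write_locks,
                                uint64_t grace_period_ms)
{
    return total_write_stall_ms(nr_write_locks, grace_period_ms) / 1000;
}
```

With an assumed 200000 write acquisitions at an assumed 10 msec each,
this model gives 2000 secs of serialized waiting, which dwarfs the
~15 secs saved on the read side in the test quoted above.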

So, I don't think kernfs_rwsem is a good candidate for percpu rwsem.
There likely are plenty of cases where the write path isn't cold enough.

Thanks.

-- 
tejun