linux-kernel - Re: [PATCH 1/2] lockref: speculatively spin waiting for the lock to be released

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5cixyyivolodhsru23y5gf5f6w6ov2zs5rbkxleljeu6qvc4gu@ivawdfkvus3p>
Date: Thu, 13 Jun 2024 08:09:57 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: brauner@...nel.org, viro@...iv.linux.org.uk, jack@...e.cz, 
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 1/2] lockref: speculatively spin waiting for the lock to
 be released

On Wed, Jun 12, 2024 at 06:23:18PM -0700, Linus Torvalds wrote:
> So I have no problem with your patch 2/2 - moving the lockref data
> structure away from everything else that can be shared read-only makes
> a ton of sense independently of anything else.
> 
> Except you also randomly increased a retry count in there, which makes no sense.
> 

Cmon man, that's a change which unintentionally crept into the patch and
I failed to notice.

> But this patch 1/2 makes me go "Eww, hacky hacky".
> 
> We already *have* the retry loop, it's just that currently it only
> covers the cmpxchg failures.
> 
> The natural thing to do is to just make the "wait for unlocked" be
> part of the same loop.
> 

I was playing with a bpftrace probe reporting how many spins were
performed waiting for the lock. For my intentional usage with ls it was
always < 30 or so. The random-ass intruder which messes with my bench on
occasion needed over 100.

The bump is something I must have fat-fingered into the wrong patch.

Ultimately even if *some* iterations will still take the lock, they
should be able to avoid it next time around, so the permanent
degradation problem is still solved.

> Mind testing this approach instead?
> 

So I'm not going to argue about the fix.

I tested your code and confirm it fixes the problem, nothing blows up
either and I fwiw I don't see any bugs in it.

When writing the commit message feel free to use mine in whatever capacity
(including none) you want. 

On Wed, Jun 12, 2024 at 06:49:00PM -0700, Linus Torvalds wrote:

> On Wed, 12 Jun 2024 at 18:23, Linus Torvalds
> One of the ideas behind the reflux was that locking should be an
> exceptional thing when something special happens. So things like final
> dput() and friends.
> 
> What I *think* is going on - judging by your description of how you
> triggered this - is that sadly our libfs 'readdir()' thing is pretty
> nasty.
> 
> It does use d_lock a lot for the cursor handling, and things like
> scan_positives() in particular.
> 
> I understand *why* it does that, and maybe it's practically unfixable,
> but I do think the most likely deeper reason for that "go into slow
> mode" is the cursor on the directory causing issues.
> 
> Put another way: while I think doing the retry loop will help
> benchmarks, it would be lovely if you were to look at that arguably
> deeper issue of the 'd_sib' list.
> 

I think lockref claiming to be a general locking facility means it
should not be suffering the degradation problem to begin with, so that
would be the real problem as far as lockref goes.

For vfs as a whole I do agree that the d_lock usage in various spots is
probably avoidable and if so would be nice to get patched out, readdir
included. So Someone(tm) should certainly look into it.

As for me at the moment I have other stuff on my todo list, so I am not
going to do it for the time being.

Regardless, patching up lockref to dodge the lock is a step forward in
the right direction AFAICS.

=====

All that aside, you did not indicate how do you want to move forward
regarding patch submission.

As indicated in my cover letter the vfs change (if it is to land) needs
to be placed *after* the lockref issue is addressed, otherwise it may
result in bogus regression reports. Thus I think it makes most sense to
just ship them together.

So maybe you can send me a patch and I send a v2 of the patchset with
that, alternatively perhaps you can edit out the unintional retry
adjustment from my dentry patch and handle the rest yourself.

Or maybe you have some other idea.

The thing that matters to me is not landing in a state where d_lockref
is moved around, but the lockref corner case is not patched (even it is
patched *after*). I really don't want to investigate a regression report
only to find it's caused by the above.