Message-ID: <CAHk-=wi99R8i=uvHiHo3jjZPzg6oTJW1rin3ekuPbuccS5XZqA@mail.gmail.com>
Date:   Wed, 23 Mar 2022 10:06:43 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Jaegeuk Kim <jaegeuk@...nel.org>
Cc:     Tim Murray <timmurray@...gle.com>,
        Waiman Long <longman@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux F2FS Dev Mailing List 
        <linux-f2fs-devel@...ts.sourceforge.net>
Subject: Re: [GIT PULL] f2fs for 5.18

On Wed, Mar 23, 2022 at 9:26 AM Jaegeuk Kim <jaegeuk@...nel.org> wrote:
>
> OTOH, I was suspecting the major contention would be
>         f2fs_lock_op -> f2fs_down_read(&sbi->cp_rwsem),
> which is used for most filesystem operations.

Very possible; I was just looking at a random one in f2fs/file.c,
obviously with no actual numbers in hand.

In general, I really hate seeing specialized locks, but this f2fs use
case is in some ways worse than other ad-hoc locks I've seen - simply
because it's one wholesale conversion of "down_read/write()" to
"f2fs_down_read/write()" - regardless of _which_ lock is being locked.

(Now, it's not all bad news - in other respects it's much better than
some ad-hoc locking: at least you will still participate in lockdep,
and you use the actual low-level locking primitives instead of making
up your own and getting the memory ordering wrong.)

But basically I think it would have been much nicer if you had done
this for just the _one_ lock that mattered, whichever lock that might
be. Partly as documentation, and partly so that maybe some day you
can split that lock up (or maybe notice cases where you can avoid it
entirely).

For example, if it's really just f2fs_lock_op() that needs this, the
special "wait_event(trylock)" hack could have been entirely local to
just *that*, rather than affecting all the other locks too.
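
To make that concrete, here is a minimal sketch (mine, not actual f2fs
code) of how the hack could stay local: keep cp_rwsem a plain
rw_semaphore, and confine the wait_event()+trylock pattern to
f2fs_lock_op()/f2fs_unlock_op() themselves. The waitqueue "cp_lock_wq"
is a hypothetical field added to f2fs_sb_info just for this sketch:

  /* Hypothetical: the wait_event(trylock) hack, local to f2fs_lock_op()
   * only, instead of wrapping every rwsem in the filesystem.  Assumes a
   * "wait_queue_head_t cp_lock_wq" member in struct f2fs_sb_info. */
  static inline void f2fs_lock_op(struct f2fs_sb_info *sbi)
  {
          /* Sleep until a trylock succeeds, instead of letting the
           * rwsem's optimistic spinning burn CPU under contention. */
          wait_event(sbi->cp_lock_wq, down_read_trylock(&sbi->cp_rwsem));
  }

  static inline void f2fs_unlock_op(struct f2fs_sb_info *sbi)
  {
          up_read(&sbi->cp_rwsem);
          wake_up(&sbi->cp_lock_wq);
  }

  /* The write side (f2fs_lock_all/f2fs_unlock_all) would need a
   * matching wake_up_all() after up_write(), or sleeping readers
   * would never be woken when the writer goes away. */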

And in the very first f2fs_lock_op() case I looked at, I see that the
lock is pointless. Again, that's unlikely to be the *cause* of any of
these problems, but the fact that I've now looked at two of the f2fs
locks and gone "the locking seems to be pointlessly badly done" does
imply that the problem isn't "down_read()", it's the use.

That other lock I reacted to was the f2fs_lock_op(sbi) at the top of
f2fs_new_inode().

Look, you have a new inode that you just allocated, that nobody else
can yet access.

And the only thing that that f2fs_lock_op(sbi) -> f2fs_unlock_op(sbi)
sequence protects is the f2fs_alloc_nid() for that new inode.

Ok, so maybe f2fs_alloc_nid() needs that lock?

No, it doesn't. It already has

 - &nm_i->nid_list_lock spinlock for its own in-memory internal NID caches

*and* when that fails

 - &NM_I(sbi)->build_lock for protecting all of f2fs_build_free_nids()

*and* inside of that lock

 - f2fs_down_read(&nm_i->nat_tree_lock) for protecting the NAT tree structures.
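
In code shape, the nesting condenses to something like this (a sketch
simplified from the 5.18-era paths; error handling and the free-nid
bookkeeping are elided, so treat it as an illustration of the
structure, not verbatim f2fs code):

  static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode)
  {
          struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
          struct inode *inode = new_inode(dir->i_sb); /* fresh, private */
          nid_t ino;

          f2fs_lock_op(sbi);     /* outer: f2fs_down_read(&sbi->cp_rwsem) */
          if (!f2fs_alloc_nid(sbi, &ino)) {
                  f2fs_unlock_op(sbi);
                  return ERR_PTR(-ENOSPC);
          }
          f2fs_unlock_op(sbi);
          /* ... set the inode up with ino ... */
          return inode;
  }

  bool f2fs_alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid)
  {
          struct f2fs_nm_info *nm_i = NM_I(sbi);

          spin_lock(&nm_i->nid_list_lock);        /* nested lock #1 */
          /* fast path: pop a cached free nid, set *nid, return true */
          spin_unlock(&nm_i->nid_list_lock);

          /* slow path, cache empty - refill the free-nid list: */
          mutex_lock(&nm_i->build_lock);          /* nested lock #2 */
          f2fs_down_read(&nm_i->nat_tree_lock);   /* nested lock #3 */
          /* scan NAT pages for free nids */
          f2fs_up_read(&nm_i->nat_tree_lock);
          mutex_unlock(&nm_i->build_lock);
          return f2fs_alloc_nid(sbi, nid);        /* retry the fast path */
  }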

So I see two major issues in the very first user of that
f2fs_lock_op() that I look at:

 (a) it seems to be entirely unnecessary

 (b) it is a classic case of "multiple nested locks".

Now, it's possible that I'm wrong on (a) and there's some odd reason
that lock is needed (maybe there is a lock ordering problem for one of
the other locks between readers and writers, and the op-lock acts as a
mutual exclusion for that).

But (b) really is a classic problem case for locking: nested locks are
*much* more likely to cause horrible contention, because now any
contention in any of the locks will end up affecting the others (and
you easily get "bunching up" of different processes when they get
synchronized with each other thanks to the inner lock).

Nested locking is often required, but it's one of those things where
you just need to be aware that they can be horribly bad for
performance, _particularly_ if an inner lock sees contention and
essentially "transfers" that contention to an outer lock.
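
The effect is easy to demonstrate in userspace. A self-contained toy
(plain pthreads, nothing f2fs-specific; the names and the sleep-based
"work" are purely illustrative):

  #include <pthread.h>
  #include <unistd.h>

  static pthread_mutex_t outer = PTHREAD_MUTEX_INITIALIZER;
  static pthread_mutex_t inner = PTHREAD_MUTEX_INITIALIZER;

  /* Type A: takes only the inner lock (think nid_list_lock users). */
  static void *inner_only(void *arg)
  {
          for (int i = 0; i < 1000; i++) {
                  pthread_mutex_lock(&inner);
                  usleep(100);                    /* stand-in for work */
                  pthread_mutex_unlock(&inner);
          }
          return NULL;
  }

  /* Type B: outer, then inner (think f2fs_lock_op() around
   * f2fs_alloc_nid()).  Every microsecond spent waiting on `inner`
   * here is a microsecond `outer` stays held, so B threads bunch up
   * on `outer` even though they never contend over the real work. */
  static void *nested(void *arg)
  {
          for (int i = 0; i < 1000; i++) {
                  pthread_mutex_lock(&outer);
                  pthread_mutex_lock(&inner);     /* contention transfers */
                  usleep(100);
                  pthread_mutex_unlock(&inner);
                  pthread_mutex_unlock(&outer);
          }
          return NULL;
  }

  int main(void)
  {
          pthread_t t[8];

          for (int i = 0; i < 8; i++)
                  pthread_create(&t[i], NULL, i < 4 ? inner_only : nested, NULL);
          for (int i = 0; i < 8; i++)
                  pthread_join(t[i], NULL);
          return 0;
  }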

Maybe I've been unlucky. Maybe the two cases I happened to look at
were completely harmless and very unusual. But the fact that I'm
two-for-two on "that locking looks like a prime candidate to be
fixed" makes me suspect there's a lot of low-hanging fruit in there.

And that whole "wait_event(trylock)" thing is a symptom of problematic
f2fs locking, rather than a solution to it.

                 Linus
