lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 26 Aug 2022 10:19:53 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Mikulas Patocka <mpatocka@...hat.com>
Cc:     Alan Stern <stern@...land.harvard.edu>,
        Andrea Parri <parri.andrea@...il.com>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Nicholas Piggin <npiggin@...il.com>,
        David Howells <dhowells@...hat.com>,
        Jade Alglave <j.alglave@....ac.uk>,
        Luc Maranget <luc.maranget@...ia.fr>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Akira Yokosawa <akiyks@...il.com>,
        Daniel Lustig <dlustig@...dia.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org
Subject: Re: [PATCH v3] wait_on_bit: add an acquire memory barrier

On Fri, Aug 26, 2022 at 9:45 AM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> So narrowing the load is fine (but you generally never want to narrow
> a *store*, because that results in huge problems with subsequent wider
> loads).

.. so making that statement made me go look around, and it's exactly
what we use for clear_bit() on x86.

Which is actually ok too, because it's a locked operation, and at that
point the whole "store buffer forwarding" issue pretty much goes out
the window anyway.

But because I looked at where the new test_bit_acquire() triggers and
where there are other bitops around it, I found this beauty in
fs/buffer.c: clean_bdev_aliases():

                                if (!buffer_mapped(bh) ||
(bh->b_blocknr < block))
                                        goto next;
                                if (bh->b_blocknr >= block + len)
                                        break;
                                clear_buffer_dirty(bh);
                                wait_on_buffer(bh);
                                clear_buffer_req(bh);

where it basically works on four different bits (buffer_mapped, dirty,
lock, req) right next to each other.

That code sequence really doesn't matter, but it was interesting
seeing the generated code. Not pretty, but the ugliest part was
actually how the might_sleep() calls in those helper functions result
in __cond_resched() when PREEMPT_VOLUNTARY is on, and how horrid that
is for register allocation.

And in bh_submit_read() we have another ugly thing:

        mov    (%rdi),%rax
        test   $0x4,%al
        je     <bh_submit_read+0x7d>
        push   %rbx
        mov    %rdi,%rbx
        testb  $0x1,(%rdi)

where we have a mix of those two kinds of "test_bit()" (and test
"testb" version most definitely looks better. But that's due to the
source code being

        BUG_ON(!buffer_locked(bh));
        if (buffer_uptodate(bh)) {

ie we have that *ancient* BUG_ON() messing things up. Oh well.

None of the buffer-head code matters any more, bit it's certainly
showing the effects of that patch of yours, and the code isn't pretty.

But none of the ugliness I found was actually _due_ to your patch. It
was all due to other things, and your patch just makes code generation
better in the cases I looked at.

             Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ