linux-ext4 - Re: Kernel Benchmarking

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200916103446.GB3607@quack2.suse.cz>
Date:   Wed, 16 Sep 2020 12:34:46 +0200
From:   Jan Kara <jack@...e.cz>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Matthieu Baerts <matthieu.baerts@...sares.net>,
        Michael Larabel <Michael@...haellarabel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Amir Goldstein <amir73il@...il.com>,
        Ted Ts'o <tytso@...gle.com>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Jan Kara <jack@...e.cz>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: Kernel Benchmarking

On Tue 15-09-20 16:35:45, Linus Torvalds wrote:
> On Tue, Sep 15, 2020 at 12:56 PM Matthieu Baerts
> <matthieu.baerts@...sares.net> wrote:
> >
> > I am sorry, I am not sure how to verify this. I guess it was one
> > processor because I removed "-smp 2" option from qemu. So I guess it
> > switched to a uniprocessor mode.
> 
> Ok, that all sounds fine. So yes, your problem happens even with just
> one CPU, and it's not any subtle SMP race.
> 
> Which is all good - apart from the bug existing in the first place, of
> course. It just reinforces the "it's probably a latent deadlock"
> thing.

So from the traces another theory that appeared to me is that it could be a
"missed wakeup" problem. Looking at the code in wait_on_page_bit_common() I
found one suspicious thing (which isn't a great match because the problem
seems to happen on UP as well and I think it's mostly a theoretical issue but
still I'll write it here):

wait_on_page_bit_common() has:

        spin_lock_irq(&q->lock);
        SetPageWaiters(page);
        if (!trylock_page_bit_common(page, bit_nr, wait))
	  - which expands to:
	  (
	        if (wait->flags & WQ_FLAG_EXCLUSIVE) {
        	        if (test_and_set_bit(bit_nr, &page->flags))
                	        return false;
	        } else if (test_bit(bit_nr, &page->flags))
        	        return false;
	  )

                __add_wait_queue_entry_tail(q, wait);
        spin_unlock_irq(&q->lock);

Now the suspicious thing is the ordering here. What prevents the compiler
(or the CPU for that matter) from reordering SetPageWaiters() call behind
the __add_wait_queue_entry_tail() call? I know SetPageWaiters() and
test_and_set_bit() operate on the same long but is it really guaranteed
something doesn't reorder these?

In unlock_page() we have:

        if (clear_bit_unlock_is_negative_byte(PG_locked, &page->flags))
                wake_up_page_bit(page, PG_locked);

So if the reordering happens, clear_bit_unlock_is_negative_byte() could
return false even though we have a waiter queued.

And this seems to be a thing commit 2a9127fcf22 ("mm: rewrite
wait_on_page_bit_common() logic") introduced because before we had
set_current_state() between SetPageWaiters() and test_bit() which implies a
memory barrier.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR