Message-ID: <20201026094948.GA29758@quack2.suse.cz>
Date:   Mon, 26 Oct 2020 10:49:48 +0100
From:   Jan Kara <jack@...e.cz>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Qian Cai <cai@....pw>, Christoph Hellwig <hch@...radead.org>,
        "Darrick J. Wong" <darrick.wong@...cle.com>,
        linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, Jens Axboe <axboe@...nel.dk>,
        linux-mm@...ck.org
Subject: Re: kernel BUG at mm/page-writeback.c:2241 [
 BUG_ON(PageWriteback(page); ]

On Thu 22-10-20 01:49:06, Matthew Wilcox wrote:
> On Wed, Oct 21, 2020 at 08:30:18PM -0400, Qian Cai wrote:
> > Today's linux-next starts to trigger this wondering if anyone has any clue.
> 
> I've seen that occasionally too.  I changed that BUG_ON to VM_BUG_ON_PAGE
> to try to get a clue about it.  Good to know it's not the THP patches
> since they aren't in linux-next.
> 
> I don't understand how it can happen.  We have the page locked, and then we do:
> 
>                         if (PageWriteback(page)) {
>                                 if (wbc->sync_mode != WB_SYNC_NONE)
>                                         wait_on_page_writeback(page);
>                                 else
>                                         goto continue_unlock;
>                         }
> 
>                         VM_BUG_ON_PAGE(PageWriteback(page), page);
> 
> Nobody should be able to put this page under writeback while we have it
> locked ... right?  The page can be redirtied by the code that's supposed
> to be writing it back, but I don't see how anyone can make PageWriteback
> true while we're holding the page lock.
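
The invariant described in the quoted text can be sketched as a small
user-space model. This is not kernel code: struct model_page and the
model_* helpers are hypothetical names made up for illustration, and
waiting for writeback is replaced by simply clearing the flag. It only
shows why the check after the wait is expected never to fire as long as
the writeback bit is set under the page lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the page state; not the kernel's struct page. */
enum model_sync_mode { MODEL_SYNC_NONE, MODEL_SYNC_ALL };

struct model_page {
	bool locked;     /* stands in for PageLocked() */
	bool writeback;  /* stands in for PageWriteback() */
};

/* Take the page lock; returns false if somebody already holds it. */
static bool model_trylock_page(struct model_page *p)
{
	if (p->locked)
		return false;
	p->locked = true;
	return true;
}

static void model_unlock_page(struct model_page *p)
{
	assert(p->locked);
	p->locked = false;
}

/*
 * Assumed rule: only the lock holder may start writeback.
 * Returns false if the caller does not hold the lock.
 */
static bool model_set_page_writeback(struct model_page *p, bool caller_holds_lock)
{
	if (!p->locked || !caller_holds_lock)
		return false;
	p->writeback = true;
	return true;
}

/* I/O completion clears the bit and does not need the page lock. */
static void model_end_page_writeback(struct model_page *p)
{
	p->writeback = false;
}

/*
 * Mirrors the quoted check, called with the page lock held.
 * Returns 0 if the page may be written, 1 if it is skipped
 * (the continue_unlock path), and -1 if the writeback bit is
 * still set, which is where VM_BUG_ON_PAGE() would fire.
 */
static int model_writeback_check(struct model_page *p, enum model_sync_mode mode)
{
	assert(p->locked);
	if (p->writeback) {
		if (mode != MODEL_SYNC_NONE)
			model_end_page_writeback(p); /* stand-in for wait_on_page_writeback() */
		else
			return 1;
	}
	return p->writeback ? -1 : 0;
}
```

In this model -1 is unreachable as long as every caller goes through
model_set_page_writeback() with the lock held, so the reported BUG
suggests that either this rule is being broken somewhere or the wait
returns while the bit is still set.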

FWIW here's a very similar report for ext4 [1], and I strongly suspect this
started happening after Linus' rewrite of the page bit waiting logic. Linus
thinks it's a preexisting bug which just got exposed by his changes (which
is possible). I've been searching for a culprit for some time but so far I
have failed. It's good to know it isn't ext4-specific, so we should be
searching in the generic code ;). So far I was concentrating more on the
ext4 bits...

								Honza

[1] https://lore.kernel.org/lkml/000000000000d3a33205add2f7b2@google.com/

-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
