linux-kernel - Re: kernel BUG at mm/page-writeback.c:LINE!

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210112104425.GA8760@quack2.suse.cz>
Date:   Tue, 12 Jan 2021 11:44:25 +0100
From:   Jan Kara <jack@...e.cz>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        syzbot <syzbot+2fc0712f8f8b8b8fa0ef@...kaller.appspotmail.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>
Subject: Re: kernel BUG at mm/page-writeback.c:LINE!

On Fri 08-01-21 18:04:21, Linus Torvalds wrote:
> On Tue, Jan 5, 2021 at 11:53 AM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > I took your "way to go" statement as an ack, and made it all be commit
> > c2407cf7d22d ("mm: make wait_on_page_writeback() wait for multiple
> > pending writebacks").
> 
> Oh, and Michael Larabel (of phoronix) reports that that one-liner does
> something bad to a few PostgreSQL tests, on the order of 5-10%
> regression on some machines (but apparently not others).

Do you have more details? From my experience (we do regular pgbench runs
for various kernels in various configs in SUSE) PostgreSQL numbers tend to
be somewhat noisy and more dependent on CPU scheduling and NUMA locality
than anything else. But it very much depends on the exact config passed to
pgbench so that's why I'm asking...

> I suspect that's a sign of instability in the benchmark numbers, but
> it probably also means that we have some silly condition where
> multiple threads want to clean the same page.
> 
> I sent him a patch to try if it ends up being better to just not wake
> things up early at all (instead of the "if" -> "while") conversion.
> That trivial patch appended here in case anybody has comments.
> 
> Just the fact that that one-liner made a performance impact makes me
> go "hmm", though. Michael didn't see the BUG_ON(), so it's presumably
> some _other_ user of wait_on_page_writeback() than the
> write_cache_pages() one that causes issues.
> 
> Anybody got any suspicions? Honestly, when working on the page wait
> queues, I was working under the assumption that it's really just the
> page lock that truly matters.
> 
> I'm thinking things like __filemap_fdatawait_range(), which doesn't
> hold the page lock at all, so it's all kinds of non-serialized, and
> could now be waiting for any number of IO's ro complete..
> 
> Oh well. This email doesn't really have a point, it's more of a
> heads-up that that "wait to see one or multiple writebacks" thing
> seems to matter more than I would have expected for some loads..

Honestly I'm surprised your patch made a difference as well. It is pretty
common a page gets redirtied while it's being written back but usually it
takes time before next writeback of the page is started. But I guess with
the DB load it is possible e.g. if we frequently flush out some page for
data consistency reasons (I know PostgreSQL is using sync_file_range(2)
interface to start flushing pages early and then uses fsync(2) when it
really needs the pages written which could create a situation with unfair
treatment of PageWriteback bit).

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR