linux-kernel - Re: [syzbot] [mm?] kernel BUG in vma_replace_policy

Message-ID: <ZQShw8lESIBle7GF@casper.infradead.org>
Date:   Fri, 15 Sep 2023 19:26:11 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     Hugh Dickins <hughd@...gle.com>
Cc:     Suren Baghdasaryan <surenb@...gle.com>,
        Yang Shi <shy828301@...il.com>, Michal Hocko <mhocko@...e.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        syzbot <syzbot+b591856e0f0139f83023@...kaller.appspotmail.com>,
        akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [mm?] kernel BUG in vma_replace_policy

On Thu, Sep 14, 2023 at 09:26:15PM -0700, Hugh Dickins wrote:
> On Thu, 14 Sep 2023, Suren Baghdasaryan wrote:
> > Yes, I just finished running the reproducer on both upstream and
> > linux-next builds listed in
> > https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023 and the
> > problem does not happen anymore.
> > I'm fine with your suggestion too, just wanted to point out it would
> > introduce change in the behavior. Let me know how you want to proceed.
> 
> Well done, identifying the mysterious cause of this problem:
> I'm glad to hear that you've now verified that hypothesis.
> 
> You're right, it would be a regression to follow Matthew's suggestion.
> 
> Traditionally, modulo bugs and inconsistencies, the queue_pages_range()
> phase of do_mbind() has done the best it can, gathering all the pages it
> can that need migration, even if some were missed; and proceeds to do the
> mbind_range() phase if there was nothing "seriously" wrong (a gap causing
> -EFAULT).  Then at the end, if MPOL_MF_STRICT was set, and not all the
> pages could be migrated (or MOVE was not specified and not all pages
> were well placed), it returns -EIO rather than 0 to inform the caller
> that not all could be done.
> 
> There have been numerous tweaks, but I think most importantly
> 5.3's d883544515aa ("mm: mempolicy: make the behavior consistent when
> MPOL_MF_MOVE* and MPOL_MF_STRICT were specified") added those "return 1"s
> which stop the pagewalk early.  In my opinion, not an improvement - makes
> it harder to get mbind() to do the best job it can (or is it justified as
> what you're asking for if you say STRICT?).

I suspect you agree that it's inconsistent to stop early.  Userspace
doesn't know at which point we found an unmovable page, so it can't behave
rationally.  Perhaps we should remove the 'early stop' and attempt to
migrate every page in the range, whether it's before or after the first
unmovable page?
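
To make the difference concrete, here is a toy model of the two behaviours
being discussed. It is not kernel code: Page, queue_pages() and
mbind_result() are hypothetical names invented for this sketch, and the
real queue_pages_range() / do_mbind() logic is considerably more involved.
It only assumes what the thread describes: with MPOL_MF_STRICT, the caller
gets -EIO if some page could not be handled, and the "early stop" variant
gives up at the first unmovable page.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Page:
    """Hypothetical stand-in for a page in the mbind() range."""
    index: int
    movable: bool


def queue_pages(pages: List[Page], early_stop: bool) -> Tuple[List[int], bool]:
    """Collect indices of pages that can be migrated.

    Returns (queued_indices, saw_unmovable). With early_stop=True the walk
    ends at the first unmovable page, modelling the "return 1" behaviour
    described in the thread; with early_stop=False the walk covers the
    whole range regardless.
    """
    queued: List[int] = []
    saw_unmovable = False
    for page in pages:
        if page.movable:
            queued.append(page.index)
            continue
        saw_unmovable = True
        if early_stop:
            break
    return queued, saw_unmovable


def mbind_result(saw_unmovable: bool, strict: bool) -> str:
    """Model of the final return value: -EIO only if STRICT and incomplete."""
    return "-EIO" if (strict and saw_unmovable) else "0"


# Six pages, with page 2 unmovable (an arbitrary choice for illustration).
pages = [Page(i, movable=(i != 2)) for i in range(6)]

early, early_bad = queue_pages(pages, early_stop=True)
full, full_bad = queue_pages(pages, early_stop=False)

print("early stop :", early, mbind_result(early_bad, strict=True))
print("full walk  :", full, mbind_result(full_bad, strict=True))
```

In this model both variants report -EIO, but the early-stop walk queues only
pages 0 and 1 while the full walk also queues pages 3, 4 and 5. The caller
sees the same error either way and cannot tell how much of the range was
actually processed.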