linux-kernel - Re: [syzbot] [mm?] kernel BUG in vma_replace

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ace766bc-49c1-8399-9548-5faac04e87c8@google.com>
Date:   Fri, 15 Sep 2023 19:54:16 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     Matthew Wilcox <willy@...radead.org>
cc:     Hugh Dickins <hughd@...gle.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Yang Shi <shy828301@...il.com>, Michal Hocko <mhocko@...e.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        syzbot <syzbot+b591856e0f0139f83023@...kaller.appspotmail.com>,
        akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [mm?] kernel BUG in vma_replace_policy

On Fri, 15 Sep 2023, Matthew Wilcox wrote:
> On Thu, Sep 14, 2023 at 09:26:15PM -0700, Hugh Dickins wrote:
> > On Thu, 14 Sep 2023, Suren Baghdasaryan wrote:
> > > Yes, I just finished running the reproducer on both upstream and
> > > linux-next builds listed in
> > > https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023 and the
> > > problem does not happen anymore.
> > > I'm fine with your suggestion too, just wanted to point out it would
> > > introduce change in the behavior. Let me know how you want to proceed.
> > 
> > Well done, identifying the mysterious cause of this problem:
> > I'm glad to hear that you've now verified that hypothesis.
> > 
> > You're right, it would be a regression to follow Matthew's suggestion.
> > 
> > Traditionally, modulo bugs and inconsistencies, the queue_pages_range()
> > phase of do_mbind() has done the best it can, gathering all the pages it
> > can that need migration, even if some were missed; and proceeds to do the
> > mbind_range() phase if there was nothing "seriously" wrong (a gap causing
> > -EFAULT).  Then at the end, if MPOL_MF_STRICT was set, and not all the
> > pages could be migrated (or MOVE was not specified and not all pages
> > were well placed), it returns -EIO rather than 0 to inform the caller
> > that not all could be done.
> > 
> > There have been numerous tweaks, but I think most importantly
> > 5.3's d883544515aa ("mm: mempolicy: make the behavior consistent when
> > MPOL_MF_MOVE* and MPOL_MF_STRICT were specified") added those "return 1"s
> > which stop the pagewalk early.  In my opinion, not an improvement - makes
> > it harder to get mbind() to do the best job it can (or is it justified as
> > what you're asking for if you say STRICT?).
> 
> I suspect you agree that it's inconsistent to stop early.  Userspace
> doesn't know at which point we found an unmovable page, so it can't behave
> rationally.  Perhaps we should remove the 'early stop' and attempt to
> migrate every page in the range, whether it's before or after the first
> unmovable page?

Yes, that's what I was arguing for, and how it was done in olden days.
Though (after Yang Shi's following comments, and looking back at my
last attempted patch here) I may disagree with myself about the right
behavior in the MPOL_MF_STRICT case.

Hugh