linux-kernel - Re: [PATCH 1/1] mm: lock VMAs skipped by a failed queue_pages

Message-ID: <CAHbLzkpV8+0Bn_mpGODDbRsAOmDexG_JofUKQEVW-tGPJB-iyw@mail.gmail.com>
Date:   Tue, 19 Sep 2023 14:09:43 -0700
From:   Yang Shi <shy828301@...il.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Suren Baghdasaryan <surenb@...gle.com>, akpm@...ux-foundation.org,
        willy@...radead.org, hughd@...gle.com, vbabka@...e.cz,
        syzkaller-bugs@...glegroups.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        syzbot+b591856e0f0139f83023@...kaller.appspotmail.com
Subject: Re: [PATCH 1/1] mm: lock VMAs skipped by a failed queue_pages_range()

On Tue, Sep 19, 2023 at 1:53 AM Michal Hocko <mhocko@...e.com> wrote:
>
> On Mon 18-09-23 14:16:08, Suren Baghdasaryan wrote:
> > When queue_pages_range() encounters an unmovable page, it terminates
> > its page walk. This walk, among other things, locks the VMAs in the range.
> > This termination might result in some VMAs being left unlocked after
> > queue_pages_range() completes. Since do_mbind() continues to operate on
> > these VMAs despite the failure from queue_pages_range(), it will encounter
> > an unlocked VMA.
> > This mbind() behavior has been modified several times before and might
> > need some changes to either finish the page walk even in the presence
> > of unmovable pages or to error out immediately after the failure to
> > queue_pages_range(). However, that requires more discussion, so to
> > fix the immediate issue, explicitly lock the VMAs in the range if
> > queue_pages_range() failed. The added condition does not save much;
> > it is there mainly to document when this extra locking is needed.
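The failure mode described above can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: `toy_vma`, `toy_queue_pages_range()` and `toy_do_mbind()` are invented stand-ins, a boolean replaces the real per-VMA lock, and the walk is assumed to lock each VMA before inspecting its pages and to report unmovable pages with a nonzero return (the toy uses 1). The `apply_fix` flag mirrors the patch's `if (ret) vma_start_write(vma);` check.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model; not kernel code. */
struct toy_vma {
	bool unmovable;    /* stands in for "range contains an unmovable page" */
	bool write_locked; /* stands in for the per-VMA write lock */
};

/*
 * Models queue_pages_range() walking with VMA locking enabled: each VMA is
 * locked before its pages are inspected, and the walk stops at the first
 * unmovable page, so later VMAs are never locked.
 */
static int toy_queue_pages_range(struct toy_vma *vmas, int n)
{
	for (int i = 0; i < n; i++) {
		vmas[i].write_locked = true;
		if (vmas[i].unmovable)
			return 1; /* assumed nonzero "unmovable page found" result */
	}
	return 0;
}

/*
 * Models the do_mbind() loop, which keeps going after a nonzero result.
 * Returns how many VMAs would be modified without holding their lock.
 */
static int toy_do_mbind(struct toy_vma *vmas, int n, bool apply_fix)
{
	int ret = toy_queue_pages_range(vmas, n);
	int unlocked_updates = 0;

	for (int i = 0; i < n; i++) {
		/* Mirrors the patch: lock explicitly only if the walk failed. */
		if (apply_fix && ret)
			vmas[i].write_locked = true;
		/* mbind_range() would modify the VMA here. */
		if (!vmas[i].write_locked)
			unlocked_updates++;
	}
	return unlocked_updates;
}
```

With a fresh array of four VMAs where only the second is marked unmovable, `toy_do_mbind(vmas, 4, false)` returns 2 (the last two VMAs are modified without a lock), while the same call with `apply_fix` set returns 0. When the walk completes, `ret` is 0 and the extra locking is skipped, which is the case the condition is meant to document.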
>
> The semantics of the walk in this case are really clear as mud. I was
> trying to reconstruct the whole picture and it really hurts... Then I
> found http://lkml.kernel.org/r/CAHbLzkrmTaqBRmHVdE2kyW57Uoghqd_E+jAXC9cB5ofkhL-uvw@mail.gmail.com
> and that helped a lot. Let's keep it as a reference, at least in the email
> thread here, for the future.

FYI, I'm working on a fix for the regression mentioned in that series,
and Hugh has some cleanups and enhancements for that too.

>
> > Fixes: 49b0638502da ("mm: enable page walking API to lock vmas during the walk")
> > Reported-by: syzbot+b591856e0f0139f83023@...kaller.appspotmail.com
> > Closes: https://lore.kernel.org/all/000000000000f392a60604a65085@google.com/
> > Signed-off-by: Suren Baghdasaryan <surenb@...gle.com>
>
> I cannot say I like the patch (it looks like potential double locking
> unless you realize this lock is special), but considering this might be just
> temporary I do not mind.
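The "special" property behind the double-locking remark is that the per-VMA write lock does not nest like an ordinary lock. Assuming the scheme used by kernels of this period (each VMA records a `vm_lock_seq`, `vma_start_write()` returns early when it already equals `mm->mm_lock_seq`, and releasing the mmap write lock bumps `mm->mm_lock_seq`, which unlocks every VMA at once), a second `vma_start_write()` within the same mmap_lock write cycle is a no-op. The sketch below is a simplified, hypothetical userspace model of that idea; the `seq_*` names are invented and no real rwsem is involved.

```c
#include <stdbool.h>

/* Hypothetical model of the mm: only the sequence counter matters here. */
struct seq_mm {
	int mm_lock_seq; /* bumped when the (modelled) mmap write lock is released */
};

/* Hypothetical model of a VMA's write-lock state. */
struct seq_vma {
	struct seq_mm *mm;
	int vm_lock_seq;  /* equals mm->mm_lock_seq while write-locked */
	int acquisitions; /* how many times the lock was actually taken */
};

static bool seq_vma_is_write_locked(const struct seq_vma *vma)
{
	return vma->vm_lock_seq == vma->mm->mm_lock_seq;
}

/* Caller is assumed to hold the (modelled) mmap lock for write. */
static void seq_vma_start_write(struct seq_vma *vma)
{
	if (seq_vma_is_write_locked(vma))
		return; /* already locked in this cycle: a repeat call is a no-op */

	/* The kernel briefly takes the per-VMA rwsem here; the model only counts. */
	vma->acquisitions++;
	vma->vm_lock_seq = vma->mm->mm_lock_seq;
}

/* Models releasing the mmap write lock: one increment unlocks every VMA. */
static void seq_mmap_write_unlock(struct seq_mm *mm)
{
	mm->mm_lock_seq++;
}
```

In this model a VMA starts unlocked by giving it a `vm_lock_seq` that differs from the mm's counter (for example -1 against 0). Calling `seq_vma_start_write()` twice then takes the lock only once, and a single `seq_mmap_write_unlock()` releases it, which is why locking an already-locked VMA in the `do_mbind()` loop is harmless.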
>
> Acked-by: Michal Hocko <mhocko@...e.com>
>
> Thanks!
>
> > ---
> >  mm/mempolicy.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 42b5567e3773..cbc584e9b6ca 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -1342,6 +1342,9 @@ static long do_mbind(unsigned long start, unsigned long len,
> >       vma_iter_init(&vmi, mm, start);
> >       prev = vma_prev(&vmi);
> >       for_each_vma_range(vmi, vma, end) {
> > +             /* If queue_pages_range failed then not all VMAs might be locked */
> > +             if (ret)
> > +                     vma_start_write(vma);
> >               err = mbind_range(&vmi, vma, &prev, start, end, new);
> >               if (err)
> >                       break;
> > --
> > 2.42.0.459.ge4e396fd5e-goog
>
> --
> Michal Hocko
> SUSE Labs