linux-kernel - Re: [v2 PATCH] fs/proc: task_mmu.c: don't read mapcount for migration entry


Message-ID: <CAG48ez17d3p53tSfuDTNCaANyes8RNNU-2i+eFMqkMwuAbRT4Q@mail.gmail.com>
Date:   Wed, 26 Jan 2022 12:48:29 +0100
From:   Jann Horn <jannh@...gle.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     Yang Shi <shy828301@...il.com>, kirill.shutemov@...ux.intel.com,
        willy@...radead.org, akpm@...ux-foundation.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [v2 PATCH] fs/proc: task_mmu.c: don't read mapcount for migration entry

On Wed, Jan 26, 2022 at 12:38 PM David Hildenbrand <david@...hat.com> wrote:
> On 26.01.22 12:29, Jann Horn wrote:
> > On Wed, Jan 26, 2022 at 11:51 AM David Hildenbrand <david@...hat.com> wrote:
> >> On 20.01.22 21:28, Yang Shi wrote:
> >>> The syzbot reported the below BUG:
> >>>
> >>> kernel BUG at include/linux/page-flags.h:785!
[...]
> >>> RIP: 0010:PageDoubleMap include/linux/page-flags.h:785 [inline]
> >>> RIP: 0010:__page_mapcount+0x2d2/0x350 mm/util.c:744
[...]
> >> Does this point at the bigger issue that reading the mapcount without
> >> having the page locked is completely unstable?
> >
> > (See also https://lore.kernel.org/all/CAG48ez0M=iwJu=Q8yUQHD-+eZDg6ZF8QCF86Sb=CN1petP=Y0Q@mail.gmail.com/
> > for context.)
>
> Thanks for the pointer.
>
> >
> > I'm not sure what you mean by "unstable". Do you mean "the result is
> > not guaranteed to still be valid when the call returns", "the result
> > might not have ever been valid", or "the call might crash because the
> > page's state as a compound page is unstable"?
>
> A little bit of everything :)
[...]
> > In case you mean "the result might not have ever been valid":
> > Yes, even with this patch applied, in theory concurrent THP splits
> > could cause us to count some page mappings twice. Arguably that's not
> > entirely correct.
>
> Yes, the snapshot is not atomic and, thereby, unreliable. That's what I
> mostly meant as "unstable".
>
> >
> > In case you mean "the call might crash because the page's state as a
> > compound page could concurrently change":
>
> I think that's just a side-product of the snapshot not being "correct",
> right?

I guess you could see it that way? The way I look at it is that
page_mapcount() is designed to return a number that's at least as high
as the number of mappings (rarely higher due to races), and using
page_mapcount() on an unlocked page is legitimate if you're fine with
the rare double-counting of references. In my view, the problem here
is:

There are different types of references to "struct page" - some of
them allow you to call page_mapcount(), some don't. And in particular,
get_page() doesn't give you a reference that can be used with
page_mapcount(), but locking a (real, non-migration) PTE pointing to
the page does give you such a reference.

This concept of "different types of references" is the same as you
e.g. get with mmgrab() vs mmget() - they both give references to the
same object, but those references have different usage restrictions.
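
To make the mmgrab() vs mmget() analogy concrete, here is a minimal
userspace sketch. It is a toy model, not kernel code: struct toy_mm and
all toy_* functions are hypothetical names made up for illustration.
They only mimic the idea that mm_count pins the structure itself, while
mm_users pins the address space, and that all mm_users holders together
own a single mm_count reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct mm_struct; the layout is an assumption. */
struct toy_mm {
    int users;            /* like mm_users: may use the address space */
    int count;            /* like mm_count: only pins the struct itself */
    bool mappings_alive;  /* address space not yet torn down */
    bool struct_alive;    /* struct not yet freed */
};

void toy_mm_init(struct toy_mm *mm)
{
    mm->users = 1;        /* the creating task uses the address space */
    mm->count = 1;        /* one count reference shared by all users */
    mm->mappings_alive = true;
    mm->struct_alive = true;
}

/* Like mmgrab(): keeps the struct around, says nothing about mappings. */
void toy_mmgrab(struct toy_mm *mm)
{
    assert(mm->struct_alive);
    mm->count++;
}

/* Like mmdrop(): the struct goes away with the last count reference. */
void toy_mmdrop(struct toy_mm *mm)
{
    assert(mm->count > 0);
    if (--mm->count == 0)
        mm->struct_alive = false;
}

/* Like mmget(): only legal while the address space is known to be alive. */
void toy_mmget(struct toy_mm *mm)
{
    assert(mm->mappings_alive);
    mm->users++;
}

/* Like mmget_not_zero(): try to upgrade a grab-style reference. */
bool toy_mmget_not_zero(struct toy_mm *mm)
{
    assert(mm->struct_alive);
    if (mm->users == 0)
        return false;     /* too late, the address space is gone */
    mm->users++;
    return true;
}

/* Like mmput(): the last user tears down the mappings, then drops the
 * count reference that the users held collectively. */
void toy_mmput(struct toy_mm *mm)
{
    assert(mm->users > 0);
    if (--mm->users == 0) {
        mm->mappings_alive = false;
        toy_mmdrop(mm);
    }
}
```

In this model, a holder of a toy_mmgrab() reference can still safely
look at the struct after the last toy_mmput(), but must not touch the
mappings; it has to go through toy_mmget_not_zero() first, and that can
fail. The get_page() vs locked-PTE distinction above has the same shape:
both references point at the same struct page, but only one of them
makes page_mapcount() legitimate.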