lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 10 Oct 2019 15:12:00 -0400
From:   Qian Cai <cai@....pw>
To:     David Hildenbrand <david@...hat.com>, akpm@...ux-foundation.org
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        "mhocko@...e.com" <mhocko@...e.com>
Subject: Re: [PATCH] mm/page_owner: fix a crash after memory offline

On Thu, 2019-10-10 at 20:55 +0200, David Hildenbrand wrote:
> On 10.10.19 20:32, Qian Cai wrote:
> > The linux-next series "mm/memory_hotplug: Shrink zones before removing
> > memory" [1] seems make a crash easier to reproduce while reading
> > /proc/pagetypeinfo after offlining a memory section. Fix it by using
> > pfn_to_online_page() in the PFN walker.
> 
> Can you please rephrase the subject+description to describe the actual 
> problem and drop the reference to the series?

I'd figure it is better for you to post this as you are on the top of this whole
mess. What do you think?

> 
> E.g., similar to my recent patches:
> 
> "mm/page_owner: Don't access uninitialized memmaps when reading 
> /proc/pagetypeinfo
> 
> Uninitialized memmaps contain garbage and in the worst case trigger 
> kernel BUGs, especially with CONFIG_PAGE_POISONING. They should not get
> touched.
> 
> For example, when not onlining a memory block that is spanned by a zone 
> and reading /proc/pagetypeinfo, we can trigger a kernel BUG: ...
> "
> 
> However, you also have to justify why it is okay to no longer consider 
> ZONE_DEVICE (I think walk_zones_in_node() will skip ZONE_DEVICE due to 
> assert_populated == true and ZONE_DEVICE will never be populated, 
> Therefore, we will never end in this code path with ZONE_DEVICE).
> 
> 
> > 
> > [1] https://lore.kernel.org/linux-mm/20191006085646.5768-1-david@redhat.com/
> > 
> > page:ffffea0021200000 is uninitialized and poisoned
> > raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> > raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> > page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> > There is not page extension available.
> > ------------[ cut here ]------------
> > kernel BUG at include/linux/mm.h:1107!
> > RIP: 0010:pagetypeinfo_showmixedcount_print+0x4fb/0x680
> > Call Trace:
> >   walk_zones_in_node+0x3a/0xc0
> >   pagetypeinfo_show+0x260/0x2c0
> >   seq_read+0x27e/0x710
> >   proc_reg_read+0x12e/0x190
> >   __vfs_read+0x50/0xa0
> >   vfs_read+0xcb/0x1e0
> >   ksys_read+0xc6/0x160
> >   __x64_sys_read+0x43/0x50
> >   do_syscall_64+0xcc/0xaec
> >   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > Signed-off-by: Qian Cai <cai@....pw>
> > ---
> >   mm/page_owner.c | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/page_owner.c b/mm/page_owner.c
> > index dee931184788..03a6b19b3cdd 100644
> > --- a/mm/page_owner.c
> > +++ b/mm/page_owner.c
> > @@ -296,11 +296,10 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m,
> >   		pageblock_mt = get_pageblock_migratetype(page);
> >   
> 
> What about the pfn_valid() in the outermost loop? You can skip over the 
> whole pageblock if the first page is not online.
> 
> >   		for (; pfn < block_end_pfn; pfn++) {
> > -			if (!pfn_valid_within(pfn))
> > +			page = pfn_to_online_page(pfn);
> > +			if (!page)
> >   				continue;
> >   
> > -			page = pfn_to_page(pfn);
> > -
> >   			if (page_zone(page) != zone)
> >   				continue;
> >   
> > 
> 
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ