lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 10 Oct 2019 20:55:41 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Qian Cai <cai@....pw>, akpm@...ux-foundation.org
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        "mhocko@...e.com" <mhocko@...e.com>
Subject: Re: [PATCH] mm/page_owner: fix a crash after memory offline

On 10.10.19 20:32, Qian Cai wrote:
> The linux-next series "mm/memory_hotplug: Shrink zones before removing
> memory" [1] seems make a crash easier to reproduce while reading
> /proc/pagetypeinfo after offlining a memory section. Fix it by using
> pfn_to_online_page() in the PFN walker.

Can you please rephrase the subject+description to describe the actual 
problem and drop the reference to the series?

E.g., similar to my recent patches:

"mm/page_owner: Don't access uninitialized memmaps when reading 
/proc/pagetypeinfo

Uninitialized memmaps contain garbage and in the worst case trigger 
kernel BUGs, especially with CONFIG_PAGE_POISONING. They should not get
touched.

For example, when not onlining a memory block that is spanned by a zone 
and reading /proc/pagetypeinfo, we can trigger a kernel BUG: ...
"

However, you also have to justify why it is okay to no longer consider 
ZONE_DEVICE (I think walk_zones_in_node() will skip ZONE_DEVICE due to 
assert_populated == true and ZONE_DEVICE will never be populated, 
Therefore, we will never end in this code path with ZONE_DEVICE).


> 
> [1] https://lore.kernel.org/linux-mm/20191006085646.5768-1-david@redhat.com/
> 
> page:ffffea0021200000 is uninitialized and poisoned
> raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> There is not page extension available.
> ------------[ cut here ]------------
> kernel BUG at include/linux/mm.h:1107!
> RIP: 0010:pagetypeinfo_showmixedcount_print+0x4fb/0x680
> Call Trace:
>   walk_zones_in_node+0x3a/0xc0
>   pagetypeinfo_show+0x260/0x2c0
>   seq_read+0x27e/0x710
>   proc_reg_read+0x12e/0x190
>   __vfs_read+0x50/0xa0
>   vfs_read+0xcb/0x1e0
>   ksys_read+0xc6/0x160
>   __x64_sys_read+0x43/0x50
>   do_syscall_64+0xcc/0xaec
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Signed-off-by: Qian Cai <cai@....pw>
> ---
>   mm/page_owner.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index dee931184788..03a6b19b3cdd 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -296,11 +296,10 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m,
>   		pageblock_mt = get_pageblock_migratetype(page);
>   

What about the pfn_valid() in the outermost loop? You can skip over the 
whole pageblock if the first page is not online.

>   		for (; pfn < block_end_pfn; pfn++) {
> -			if (!pfn_valid_within(pfn))
> +			page = pfn_to_online_page(pfn);
> +			if (!page)
>   				continue;
>   
> -			page = pfn_to_page(pfn);
> -
>   			if (page_zone(page) != zone)
>   				continue;
>   
> 


-- 

Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ