Message-ID: <20201105134305.GA16424@shbuild999.sh.intel.com>
Date:   Thu, 5 Nov 2020 21:43:05 +0800
From:   Feng Tang <feng.tang@...el.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Vlastimil Babka <vbabka@...e.cz>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>, dave.hansen@...el.com,
        ying.huang@...el.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable
 zone only node

On Thu, Nov 05, 2020 at 02:12:45PM +0100, Michal Hocko wrote:
> On Thu 05-11-20 21:07:10, Feng Tang wrote:
> [...]
> > My debug traces shows it is, and its gfp_mask is 'GFP_KERNEL'
> 
> Can you provide the full information please? Which node has been
> requested. Which cpuset the calling process run in and which node has
> the allocation succeeded from? A bare dump_stack without any further
> context is not really helpful.

I don't have the same platform as the original reporter, so I simulated
a similar setup (with fakenuma and movablecore) which has 2 memory
nodes: node 0 has DMA/DMA32/Movable zones, while node 1 has only the
Movable zone. With it, I can reproduce the same error and the same OOM
callstack as the original report (as in the cover letter).
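
For reference, the simulated setup boots with kernel parameters roughly
like the following (the exact movablecore size is my local choice, not
something taken from the original report):

	numa=fake=2 movablecore=2G

which leaves node 1 (the higher fake node) with only ZONE_MOVABLE memory.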

The test command is:
	# docker run -it --rm --cpuset-mems 1 ubuntu:latest bash -c "grep Mems_allowed /proc/self/status"

To debug, I only added some tracing in __alloc_pages_nodemask(); here is
the callstack which gets the page successfully:

	[  567.510903] Call Trace:
	[  567.510909]  dump_stack+0x74/0x9a
	[  567.510910]  __alloc_pages_nodemask.cold+0x22/0xe5
	[  567.510913]  alloc_pages_current+0x87/0xe0
	[  567.510914]  __vmalloc_node_range+0x14c/0x240
	[  567.510918]  module_alloc+0x82/0xe0
	[  567.510921]  bpf_jit_alloc_exec+0xe/0x10
	[  567.510922]  bpf_jit_binary_alloc+0x7a/0x120
	[  567.510925]  bpf_int_jit_compile+0x145/0x424
	[  567.510926]  bpf_prog_select_runtime+0xac/0x130
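
The trace itself is just a debug hack along these lines (a sketch of
what I added, not an actual patch; the message string is made up, while
node_isset()/page_to_nid()/dump_stack() are the stock kernel helpers):

	/* debug only: report pages handed out from outside the cpuset */
	if (page && !node_isset(page_to_nid(page), current->mems_allowed)) {
		pr_info("allocation escaped cpuset, got node %d\n",
			page_to_nid(page));
		dump_stack();
	}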

The incoming nodemask parameter is NULL, so the function first tries the
cpuset's nodemask (node 1 here). Since the request is GFP_KERNEL, the
highest allowed zone index is only 2 (ZONE_NORMAL), and node 1 has no
zone at or below that index, so the preferred zoneref in 'ac' is NULL.
It therefore goes into __alloc_pages_slowpath(), where the nodemask is
first set back to NULL; this time it gets a preferred zone, DMA32 from
node 0, and the following get_page_from_freelist() allocates a page
from that zone.
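
For context, the "set back to NULL" step is the restore done in
__alloc_pages_nodemask() right before entering the slowpath, and the
slowpath then recomputes the zonelist iterator start (a simplified
paraphrase of mm/page_alloc.c around v5.9, not the exact source):

	/* in __alloc_pages_nodemask(): the fast path may have substituted
	 * cpuset_current_mems_allowed; restore the caller's nodemask
	 * (NULL here) before trying the slowpath */
	ac.nodemask = nodemask;
	page = __alloc_pages_slowpath(alloc_mask, order, &ac);

	/* in __alloc_pages_slowpath(): recalculate the preferred zoneref;
	 * with a NULL nodemask this now finds DMA32 on node 0 */
	ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
			ac->highest_zoneidx, ac->nodemask);

That is why this particular GFP_KERNEL allocation, though nominally
bound to a Movable-only node, silently falls back to node 0.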

Thanks,
Feng

> 
> -- 
> Michal Hocko
> SUSE Labs
