linux-kernel - Re: Stuck looping on list_empty(list) in free_pcppages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <11357114-eb6e-39a6-b16d-5e380f770943@suse.cz>
Date:   Tue, 31 Aug 2021 18:51:23 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Sultan Alsawaf <sultan@...neltoast.com>, linux-mm@...ck.org
Cc:     mhocko@...e.com, mgorman@...hsingularity.net,
        akpm@...ux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: Stuck looping on list_empty(list) in free_pcppages_bulk()

On 8/31/21 01:12, Sultan Alsawaf wrote:
> With some more gdb digging, I found that the `count` variable was stored in %ESI
> at the time of the stall. According to register dump in the splat, %ESI was 7.
> 
> It looks like, for some reason, the pcp count was 7 higher than the number of
> pages actually present in the pcp lists.
> 
> I tried to find some way that this could happen, but the only thing I could
> think of was that maybe an allocation had both __GFP_RECLAIMABLE and
> __GFP_MOVABLE set in its gfp mask, in which case the rmqueue() call in
> get_page_from_freelist() would pass in a migratetype equal to MIGRATE_PCPTYPES
> and then pages could be added to an out-of-bounds pcp list while still
> incrementing the overall pcp count. This seems pretty unlikely though. As
> another side note, it looks like there's nothing stopping this from occurring;
> there's only a VM_WARN_ON() in gfp_migratetype() that checks if both bits are
> set.
> 
> Any ideas on what may have happened here, or a link to a commit that may have
> fixed this issue in newer kernels, would be much appreciated.

Does the kernel have commit 88e8ac11d2ea3 backported? If not, and there were
memory hotplug operations happening, the infinite loop could happen. If that
commit was not backported, and instead 5c3ad2eb7104 was backported, possibly
there are more scenarios outside hotplug that can cause trouble.

> Thanks,
> Sultan
>