lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 18 Aug 2020 14:26:02 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Oscar Salvador <osalvador@...e.de>, stable@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org, mhocko@...e.com,
        vbabka@...e.com, Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [PATCH STABLE 4.9] mm: Avoid calling build_all_zonelists_init
 under hotplug context

On 18.08.20 13:00, Oscar Salvador wrote:
> Recently a customer of ours experienced a crash when booting the
> system while enabling memory-hotplug.
> 
> The problem is that Normal zones on different nodes don't get their private
> zone->pageset allocated, and keep sharing the initial boot_pageset.
> The sharing between zones is normally safe as explained by the comment for
> boot_pageset - it's a percpu structure, and manipulations are done with
> disabled interrupts, and boot_pageset is set up in a way that any page placed
> on its pcplist is immediately flushed to shared zone's freelist, because
> pcp->high == 1.
> However, the hotplug operation updates pcp->high to a higher value as it
> expects to be operating on a private pageset.
> 
> The problem is in build_all_zonelists(), which is called when the first range
> of pages is onlined for the Normal zone of node X or Y:
> 
> 	if (system_state == SYSTEM_BOOTING) {
> 		build_all_zonelists_init();
> 	} else {
> 	#ifdef CONFIG_MEMORY_HOTPLUG
> 		if (zone)
> 			setup_zone_pageset(zone);
> 	#endif
> 		/* we have to stop all cpus to guarantee there is no user
> 		of zonelist */
> 		stop_machine(__build_all_zonelists, pgdat, NULL);
> 		/* cpuset refresh routine should be here */
> 	}
> 
> When called during hotplug, it should execute the setup_zone_pageset(zone)
> which allocates the private pageset.
> However, with memhp_default_state=online, this happens early while
> system_state == SYSTEM_BOOTING is still true, hence this step is skipped.
> (and build_all_zonelists_init() is probably unsafe anyway at this point).
> 
> Another hotplug operation on the same zone then leads to zone_pcp_update(zone)
> called from online_pages(), which updates the pcp->high for the shared
> boot_pageset to a value higher than 1.
> At that point, pages freed from Node X and Y Normal zones can end up on the same
> pcplist and from there they can be freed to the wrong zone's freelist,
> leading to the corruption and crashes.
> 
> Please, note that upstream has fixed that differently (and unintentionally) by
> adding another boot state (SYSTEM_SCHEDULING), which is set before smp_init().
> That should happen before memory hotplug events even with memhp_default_state=online.
> Backporting that would be too intrusive.
> 
> Signed-off-by: Oscar Salvador <osalvador@...e.de>
> Debugged-by: Vlastimil Babka <vbabka@...e.cz>

So, we have ACPI running and already adding DIMMs while booting? Crazy.

Looks sane to me. Thanks!


-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ