lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <df95f828-1963-d8b9-ab58-6d29d2d152d2@linux.vnet.ibm.com>
Date:   Tue, 2 Oct 2018 09:51:40 -0500
From:   Michael Bringmann <mwb@...ux.vnet.ibm.com>
To:     Tyrel Datwyler <tyreld@...ux.vnet.ibm.com>,
        Michal Hocko <mhocko@...nel.org>
Cc:     Thomas Falcon <tlfalcon@...ux.vnet.ibm.com>,
        Kees Cook <keescook@...omium.org>,
        Mathieu Malaterre <malat@...ian.org>,
        Pavel Tatashin <pasha.tatashin@...cle.com>,
        Nicholas Piggin <npiggin@...il.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Mauricio Faria de Oliveira <mauricfo@...ux.vnet.ibm.com>,
        Juliet Kim <minkim@...ibm.com>,
        Thiago Jung Bauermann <bauerman@...ux.vnet.ibm.com>,
        Nathan Fontenot <nfont@...ux.vnet.ibm.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        YASUAKI ISHIMATSU <yasu.isimatu@...il.com>,
        linuxppc-dev@...ts.ozlabs.org,
        Dan Williams <dan.j.williams@...el.com>,
        Oscar Salvador <osalvador@...e.de>
Subject: Re: [PATCH] migration/mm: Add WARN_ON to try_offline_node

See below.

On 10/01/2018 06:20 PM, Tyrel Datwyler wrote:
> On 10/01/2018 01:27 PM, Michal Hocko wrote:
>> On Mon 01-10-18 13:56:25, Michael Bringmann wrote:
>>> In some LPAR migration scenarios, device-tree modifications are
>>> made to the affinity of the memory in the system.  For instance,
>>> it may occur that memory is installed to nodes 0,3 on a source
>>> system, and to nodes 0,2 on a target system.  Node 2 may not
>>> have been initialized/allocated on the target system.
>>>
>>> After migration, if a RTAS PRRN memory remove is made to a
>>> memory block that was in node 3 on the source system, then
>>> try_offline_node tries to remove it from node 2 on the target.
>>> The NODE_DATA(2) block would not be initialized on the target,
>>> and there is no validation check in the current code to prevent
>>> the use of a NULL pointer.
>>
>> I am not familiar with ppc and the above doesn't really help me
>> much. Sorry about that. But from the above it is not clear to me whether
>> it is the caller which does something unexpected or the hotplug code
>> being not robust enough. From your changelog I would suggest the later
>> but why don't we see the same problem for other archs? Is this a problem
>> of unrolling a partial failure?
>>
>> dlpar_remove_lmb does the following
>>
>> 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
>>
>> 	remove_memory(nid, lmb->base_addr, block_sz);
>>
>> 	/* Update memory regions for memory remove */
>> 	memblock_remove(lmb->base_addr, block_sz);
>>
>> 	dlpar_remove_device_tree_lmb(lmb);
>>
>> Is the whole operation correct when remove_memory simply backs off
>> silently. Why don't we have to care about memblock resp
>> dlpar_remove_device_tree_lmb parts? In other words how come the physical
>> memory range is valid while the node association is not?
>>
> 
> I think the issue here is a race between the LPM code updating affinity and PRRN events being processed. Does your other patch[1] not fix the issue? Or is it that the LPM affinity updates don't do any of the initialization/allocation you mentioned?

This patch addresses the specific case where PRRN changes to CPU or memory are occurring on a system that also observes affinity changes during migration.  Yes,  there is a race condition -- if the PRRN events reliably occurred before the device-tree was updated, this error would not occur.  However, they overlap or occur after the changes during most LPMs observed.

When the device-tree affinity attributes have changed for memory, the 'nid' affinity calculated points to a different node for the memory block than the one used to install it, previously on the source system.  The newly calculated 'nid' affinity may not yet be initialized on the target system.  The current memory tracking mechanisms do not record the node to which a memory block was associated when it was added.  Nathan is looking at adding this feature to the new implementation of LMBs, but it is not there yet, and won't be present in earlier kernels without backporting a significant number of changes.

My other patch[1] is more intended to address locking and CPU update dependencies between PRRN changes and RTAS requests which did not necessarly involve memory updates.  The node to memory association problem after migration would be present even with this issue resolved.

> 
> -Tyrel
> 
> [1] https://lore.kernel.org/linuxppc-dev/20181001185603.11373.61650.stgit@ltcalpine2-lp9.aus.stglabs.ibm.com/T/#u
> 

Michael

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@...ux.vnet.ibm.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ