lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <e7dd66c1-d196-3a14-0115-acdaf538ebfd@linux.vnet.ibm.com>
Date:   Tue, 2 Oct 2018 13:13:22 -0500
From:   Michael Bringmann <mwb@...ux.vnet.ibm.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Thomas Falcon <tlfalcon@...ux.vnet.ibm.com>,
        Kees Cook <keescook@...omium.org>,
        Mathieu Malaterre <malat@...ian.org>,
        Pavel Tatashin <pasha.tatashin@...cle.com>,
        Nicholas Piggin <npiggin@...il.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Mauricio Faria de Oliveira <mauricfo@...ux.vnet.ibm.com>,
        Juliet Kim <minkim@...ibm.com>,
        Tyrel Datwyler <tyreld@...ux.vnet.ibm.com>,
        Thiago Jung Bauermann <bauerman@...ux.vnet.ibm.com>,
        Nathan Fontenot <nfont@...ux.vnet.ibm.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        YASUAKI ISHIMATSU <yasu.isimatu@...il.com>,
        linuxppc-dev@...ts.ozlabs.org,
        Dan Williams <dan.j.williams@...el.com>,
        Oscar Salvador <osalvador@...e.de>
Subject: Re: [PATCH] migration/mm: Add WARN_ON to try_offline_node



On 10/02/2018 11:04 AM, Michal Hocko wrote:
> On Tue 02-10-18 10:14:49, Michael Bringmann wrote:
>> On 10/02/2018 09:59 AM, Michal Hocko wrote:
>>> On Tue 02-10-18 09:51:40, Michael Bringmann wrote:
>>> [...]
>>>> When the device-tree affinity attributes have changed for memory,
>>>> the 'nid' affinity calculated points to a different node for the
>>>> memory block than the one used to install it, previously on the
>>>> source system.  The newly calculated 'nid' affinity may not yet
>>>> be initialized on the target system.  The current memory tracking
>>>> mechanisms do not record the node to which a memory block was
>>>> associated when it was added.  Nathan is looking at adding this
>>>> feature to the new implementation of LMBs, but it is not there
>>>> yet, and won't be present in earlier kernels without backporting a
>>>> significant number of changes.
>>>
>>> Then the patch you have proposed here just papers over a real issue, no?
>>> IIUC then you simply do not remove the memory if you lose the race.
>>
>> The problem occurs when removing memory after an affinity change
>> references a node that was previously unreferenced.  Other code
>> in 'kernel/mm/memory_hotplug.c' deals with initializing an empty
>> node when adding memory to a system.  The 'removing memory' case is
>> specific to systems that perform LPM and allow device-tree changes.
>> The powerpc kernel does not have the option of accepting some PRRN
>> requests and accepting others.  It must perform them all.
> 
> I am sorry, but you are still too cryptic for me. Either there is a
> correctness issue and the the patch doesn't really fix anything or the
> final race doesn't make any difference and then the ppc code should be
> explicit about that. Checking the node inside the hotplug core code just
> looks as a wrong layer to mitigate an arch specific problem. I am not
> saying the patch is a no-go but if anything we want a big fat comment
> explaining how this is possible because right now it just points to an
> incorrect API usage.
> 
> That being said, this sounds pretty much ppc specific problem and I
> would _prefer_ it to be handled there (along with a big fat comment of
> course).

Let me try again.  Regardless of the path to which we get to this condition,
we currently crash the kernel.  This patch changes that to a WARN_ON notice
and continues executing the kernel without shutting down the system.  I saw
the problem during powerpc testing, because that is the focus of my work.
There are other paths to this function besides powerpc.  I feel that the
kernel should keep running instead of halting.

Regards,

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@...ux.vnet.ibm.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ