Message-ID: <a7f354b7-d2f9-71c0-7311-97255933b9a2@nvidia.com>
Date:   Thu, 5 Dec 2019 15:24:05 -0800
From:   John Hubbard <jhubbard@...dia.com>
To:     Qian Cai <cai@....pw>
CC:     Yang Shi <yang.shi@...ux.alibaba.com>, <fabecassis@...dia.com>,
        <mhocko@...e.com>, <cl@...ux.com>, <vbabka@...e.cz>,
        <mgorman@...hsingularity.net>, <akpm@...ux-foundation.org>,
        <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
        <stable@...r.kernel.org>
Subject: Re: [v3 PATCH] mm: move_pages: return valid node id in status if the
 page is already on the target node

On 12/5/19 3:16 PM, Qian Cai wrote:
> 
> 
>> On Dec 5, 2019, at 5:41 PM, John Hubbard <jhubbard@...dia.com> wrote:
>>
>> Please recall how this started: it was due to a report from a real end user, who was 
>> seeing a real problem. After a few emails, it was clear that there's no good
>> workaround available for cases like this:
>>
>> * User space calls move_pages(), gets 0 (success) returned, and based on that,
>> proceeds to iterate through the status array.
>>
>> * The status array remains untouched by the move_pages() call, so confusion and
>> wrong behavior ensue.
>>
>> After some further discussion, we decided that the current behavior really is 
>> incorrect, and that it needs fixing in the kernel. Which this patch does.
> 
> Well, that test code itself does not really demonstrate any real-world use case. Also, thanks to the discussion, it became more obvious to me, and more critical, that the return code is wrong according to the spec. Then, if that part is taken care of, it would kill two birds with one stone, because there would be no need to return the status array anymore. Make sense?
> 
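
For illustration only, the scenario in the two quoted bullets (a 0 return
followed by a walk over an untouched status array) can be sketched as a small
C simulation. Everything named here is an assumption made for the sketch:
fake_move_pages() is a hypothetical stand-in with a simplified signature, not
the real move_pages() and not the kernel code, and its writes_status flag only
toggles between "status left alone" and "status holds the node id", as the
subject line describes for a page that is already on the target node.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for move_pages(). It only models pages that are
 * already on their target node and is NOT the kernel implementation.
 * writes_status == 0: status[] is left exactly as the caller passed it in.
 * writes_status != 0: status[i] receives the node id the page is on.
 */
static int fake_move_pages(size_t count, const int *nodes, int *status,
                           int writes_status)
{
    assert(nodes != NULL && status != NULL);
    for (size_t i = 0; i < count; i++) {
        if (writes_status)
            status[i] = nodes[i];
    }
    return 0; /* reports success in both modes */
}

/* What a caller might do after a 0 return: trust status[] as node ids. */
static size_t count_pages_on_node(size_t count, const int *status, int node)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if (status[i] == node)
            n++;
    }
    return n;
}
```

With nodes = {1, 1} and a zero-initialized status array, the first mode leaves
count_pages_on_node(2, status, 1) at 0 even though the call returned success,
while the second mode makes it 2.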

Let's check in the fix that is clearly correct and non-controversial, in one
patch. Then another patch can be created for the other case. This allows forward
progress and quick resolution of the user's bug report, while still dealing
with all the problems.

If you try to fix too many problems in one patch (and remember, sometimes ">1"
is too many), then things bog down. It's always a judgment call, but what's 
unfolding here is quite consistent with the usual judgment calls in this area.

I don't think anyone is saying, "don't work on the second problem", it's just
that it's less urgent, due to no reports from the field. If you are passionate
about fixing the second problem (and are ready and willing to handle the fallout
from user space, if it occurs), then I'd encourage you to look into it.

Just to warn you in advance, though: it could turn out to be one of those
"cannot change this because user space expectations have baked and hardened,
and changes would break user space" situations.

thanks,
-- 
John Hubbard
NVIDIA
