Message-ID: <a7f354b7-d2f9-71c0-7311-97255933b9a2@nvidia.com>
Date: Thu, 5 Dec 2019 15:24:05 -0800
From: John Hubbard <jhubbard@...dia.com>
To: Qian Cai <cai@....pw>
CC: Yang Shi <yang.shi@...ux.alibaba.com>, <fabecassis@...dia.com>,
<mhocko@...e.com>, <cl@...ux.com>, <vbabka@...e.cz>,
<mgorman@...hsingularity.net>, <akpm@...ux-foundation.org>,
<linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
<stable@...r.kernel.org>
Subject: Re: [v3 PATCH] mm: move_pages: return valid node id in status if the
page is already on the target node
On 12/5/19 3:16 PM, Qian Cai wrote:
>
>
>> On Dec 5, 2019, at 5:41 PM, John Hubbard <jhubbard@...dia.com> wrote:
>>
>> Please recall how this started: it was due to a report from a real end user, who was
>> seeing a real problem. After a few emails, it was clear that there's not a good
>> work around available for cases like this:
>>
>> * User space calls move_pages(), gets 0 (success) returned, and based on that,
>> proceeds to iterate through the status array.
>>
>> * The status array remains untouched by the move_pages() call, so confusion and
>> wrong behavior ensues.
>>
>> After some further discussion, we decided that the current behavior really is
>> incorrect, and that it needs fixing in the kernel. Which this patch does.
>
> Well, that test code itself does not really demonstrate any real-world use case. Also, thanks to the discussion, it became more obvious to me, and more critical, that the return code is wrong according to the spec. Then, if that part is taken care of, it would kill two birds with one stone, because there would be no need to return the status array anymore. Make sense?
>
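For reference, the user-space pattern from the report can be sketched roughly
as below. This is only an illustration, not the reporter's actual code: the
helper name count_pages_on_node() is hypothetical, and it assumes the status
convention documented in move_pages(2), where a non-negative entry is the node
the page resides on and a negative entry is a negated errno.

```c
#include <stddef.h>

/*
 * Hypothetical helper: after move_pages() has returned 0, walk the status
 * array and count the pages that the kernel reports as residing on `node`.
 *
 * Assumed convention (per move_pages(2)): status[i] >= 0 is a node id,
 * status[i] < 0 is a negated errno for that page.
 */
static size_t count_pages_on_node(const int *status, size_t count, int node)
{
    size_t hits = 0;

    for (size_t i = 0; i < count; i++) {
        if (status[i] == node)
            hits++;
    }
    return hits;
}
```

If the kernel returns 0 but leaves status[] untouched, a loop like this only
sees whatever the caller stored there beforehand. For example, an array
prefilled with -1 yields a count of 0 for the target node, even though every
page may already be on it.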
Let's check in the fix that is clearly correct and non-controversial, in one
patch. Then another patch can be created for the other case. This allows forward
progress and quick resolution of the user's bug report, while still dealing
with all the problems.

If you try to fix too many problems in one patch (and remember, sometimes ">1"
is too many), then things bog down. It's always a judgment call, but what's
unfolding here is quite consistent with the usual judgment calls in this area.

I don't think anyone is saying, "don't work on the second problem"; it's just
that it's less urgent, because there are no reports from the field. If you are
passionate about fixing the second problem (and are ready and willing to handle
the fallout from user space, if it occurs), then I'd encourage you to look into it.

Just to warn you in advance, though: it could turn out to be one of those
"cannot change this because user space expectations have baked and hardened,
and changes would break user space" situations.

thanks,
--
John Hubbard
NVIDIA