Message-ID: <ada6743c-a212-39ef-f206-fc81ed4492ef@arm.com>
Date: Tue, 24 Sep 2019 17:12:19 +0100
From: Valentin Schneider <valentin.schneider@....com>
To: Dietmar Eggemann <dietmar.eggemann@....com>,
shikemeng <shikemeng@...wei.com>, mingo@...hat.com,
peterz@...radead.org
Cc: linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched: fix migration to invalid cpu in
__set_cpus_allowed_ptr

On 24/09/2019 15:09, Dietmar Eggemann wrote:
> On 9/23/19 6:06 PM, Valentin Schneider wrote:
>> On 23/09/2019 16:43, Dietmar Eggemann wrote:
>>> I'm not sure that CONFIG_DEBUG_PER_CPU_MAPS=y will help you here.
>>>
>>> __set_cpus_allowed_ptr(...)
>>> {
>>> ...
>>> dest_cpu = cpumask_any_and(...)
>>> ...
>>> }
>>>
>>> With:
>>>
>>> #define cpumask_any_and(mask1, mask2) cpumask_first_and((mask1), (mask2))
>>> #define cpumask_first_and(src1p, src2p) cpumask_next_and(-1, (src1p), (src2p))
>>>
>>> cpumask_next_and() is called with n = -1 and in this case does not
>>> invoke cpumask_check().
>>>
>>
>> It won't warn here because it's still a valid return value, but it should
>> warn in the cpumask_test_cpu() that follows (in is_cpu_allowed()) because
>> it would be passed a value >= nr_cpu_ids. So at the very least this config
>> does catch cpumask_any*() return values being blindly passed to
>> cpumask_test_cpu().
>
> OK, I see and agree.
>
> But IMHO, we still don't call cpumask_test_cpu(dest_cpu, ...), right?
>
> What the patch fixes is that it closes the window between two reads of
> cpu_active_mask in which cpuhp can potentially punch a hole into the
> cpu_active_mask.
>
> If p is not running or queued and its state is unequal to TASK_WAKING,
> a 'dest_cpu == nr_cpu_ids' goes unnoticed.

In this case we don't need to force it off to another CPU, since that will
get sorted out at its next wakeup. However, the patch still catches that,
since it does an early

	if (dest_cpu >= nr_cpu_ids) {
		ret = -EINVAL;
		goto out;
	}

and that's regardless of the task's state.
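
To make the early check concrete, here is a minimal userspace sketch. It is
not kernel code: NR_CPU_IDS, mask_any_and() and pick_dest_cpu() are
hypothetical stand-ins for nr_cpu_ids, cpumask_any_and() and the relevant
part of __set_cpus_allowed_ptr(), and masks are modelled as plain
unsigned long bitmaps. It only shows that an empty intersection yields a
value >= the CPU count, and that validating it once, before any use,
rejects it whatever the task's state is.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical CPU count for this model; stands in for nr_cpu_ids. */
#define NR_CPU_IDS 8u

/*
 * Toy stand-in for cpumask_any_and(): return the first CPU set in both
 * masks, or NR_CPU_IDS if the intersection is empty.
 */
static unsigned int mask_any_and(unsigned long a, unsigned long b)
{
	unsigned long both = a & b;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (both & (1UL << cpu))
			return cpu;

	return NR_CPU_IDS;
}

/*
 * Model of the patched flow: compute dest_cpu once, then bail out early
 * if it is out of range, before anything else gets to use it.
 */
static int pick_dest_cpu(unsigned long valid_mask, unsigned long new_mask,
			 unsigned int *dest_cpu)
{
	assert(dest_cpu != NULL);

	*dest_cpu = mask_any_and(valid_mask, new_mask);
	if (*dest_cpu >= NR_CPU_IDS)
		return -EINVAL;

	return 0;
}
```

With valid_mask = 0x0f and new_mask = 0x0c the sketch picks CPU 2; with
valid_mask = 0x03 and new_mask = 0x0c the intersection is empty and
pick_dest_cpu() returns -EINVAL instead of handing an out-of-range CPU to
the caller.
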
> Otherwise we see an 'unable
> to handle kernel paging request' or 'unable to handle page fault for
> address' bug in migration_cpu_stop() or move_queued_task().
>
> Do I miss something?
>
> [...]
>