linux-kernel - Re: [Question] report a race condition between CPU hotplug state machine and hrtimer 'sched_cfs_period


Message-ID: <6620fb0a-19c1-526c-77b9-61098f59256d@huawei.com>
Date:   Tue, 22 Aug 2023 16:58:26 +0800
From:   Xiongfeng Wang <wangxiongfeng2@...wei.com>
To:     Vincent Guittot <vincent.guittot@...aro.org>,
        Thomas Gleixner <tglx@...utronix.de>
CC:     <vschneid@...hat.com>, Phil Auld <pauld@...hat.com>,
        <vdonnefort@...gle.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Wei Li <liwei391@...wei.com>,
        "liaoyu (E)" <liaoyu15@...wei.com>, <zhangqiao22@...wei.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ingo Molnar <mingo@...nel.org>, <xiafukun@...wei.com>,
        "Chenhui (Judy)" <judy.chenhui@...wei.com>, <tanghui20@...wei.com>
Subject: Re: [Question] report a race condition between CPU hotplug state
 machine and hrtimer 'sched_cfs_period_timer' for cfs bandwidth throttling

(+Cc other colleagues who are testing the modification Thomas gave)

Kindly ping

Does Thomas's modification look all right? I can help to send the patch.
Other colleagues from my department are also running stress tests based on
this modification.

Thanks,
Xiongfeng

On 2023/6/29 16:30, Vincent Guittot wrote:
> On Thu, 29 Jun 2023 at 00:01, Thomas Gleixner <tglx@...utronix.de> wrote:
>>
>> On Wed, Jun 28 2023 at 14:35, Vincent Guittot wrote:
>>> On Wed, 28 Jun 2023 at 14:03, Thomas Gleixner <tglx@...utronix.de> wrote:
>>>> No, because this is fundamentally wrong.
>>>>
>>>> If the CPU is on the way out, then the scheduler hotplug machinery
>>>> has to handle the period timer so that the problem Xiongfeng analyzed
>>>> does not happen in the first place.
>>>
>>> But the hrtimer was enqueued before it starts to offline the cpu
>>
>> It does not really matter when it was enqueued. The important point is
>> that it was enqueued on that outgoing CPU for whatever reason.
>>
>>> Then, hrtimers_dead_cpu should take care of migrating the hrtimer out
>>> of the outgoing cpu but :
>>> - it must run on another target cpu to migrate the hrtimer.
>>> - it runs in the context of the caller which can be throttled.
>>
>> Sure. I completely understand the problem. The hrtimer hotplug callback
>> does not run because the task is stuck and waits for the timer to
>> expire. Circular dependency.
>>
>>>> sched_cpu_wait_empty() would be the obvious place to cleanup armed CFS
>>>> timers, but let me look into whether we can migrate hrtimers early in
>>>> general.
>>>
>>> but for that we must check if the timer is enqueued on the outgoing
>>> cpu and we then need to choose a target cpu.
>>
>> You're right. I somehow assumed that cfs knows where it queued stuff,
>> but obviously it does not.
> 
> The scheduler doesn't know where hrtimer enqueues the timer.
> 
>>
>> I think we can avoid all that by simply taking that user space task out
>> of the picture completely, which avoids debating whether there are other
>> possible weird conditions to consider altogether.
> 
> yes, the offline sequence should not be impacted by the caller context
> 
>>
>> Something like the untested below should just work.
>>
>> Thanks,
>>
>>         tglx
>> ---
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -1490,6 +1490,13 @@ static int cpu_down(unsigned int cpu, en
>>         return err;
>>  }
>>
>> +static long __cpu_device_down(void *arg)
>> +{
>> +       struct device *dev = arg;
>> +
>> +       return cpu_down(dev->id, CPUHP_OFFLINE);
>> +}
>> +
>>  /**
>>   * cpu_device_down - Bring down a cpu device
>>   * @dev: Pointer to the cpu device to offline
>> @@ -1502,7 +1509,12 @@ static int cpu_down(unsigned int cpu, en
>>   */
>>  int cpu_device_down(struct device *dev)
>>  {
>> -       return cpu_down(dev->id, CPUHP_OFFLINE);
>> +       unsigned int cpu = cpumask_any_but(cpu_online_mask, dev->id);
>> +
>> +       if (cpu >= nr_cpu_ids)
>> +               return -EBUSY;
>> +
>> +       return work_on_cpu(cpu, __cpu_device_down, dev);
> 
> The comment for work_on_cpu :
> 
>  * It is up to the caller to ensure that the cpu doesn't go offline.
>  * The caller must not hold any locks which would prevent @fn from completing.
> 
> makes me wonder if this should be done only once the hotplug lock is
> taken so the selected cpu will not go offline
> 
>>  }
>>
>>  int remove_cpu(unsigned int cpu)
> .
>
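
For readers following the quoted diff: the target selection in the proposed
cpu_device_down() can be modeled in userspace. The sketch below is an
illustration under stated assumptions, not kernel code. A 64-bit word stands
in for cpu_online_mask, and the model_* names and MODEL_NR_CPU_IDS are
hypothetical. It shows only how a CPU other than the outgoing one is chosen
and when -EBUSY is returned. It does not address the question raised above
about whether the selected CPU can itself go offline before work_on_cpu()
runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace model only: a 64-bit word stands in for cpu_online_mask. */
#define MODEL_NR_CPU_IDS 64u

/*
 * Hypothetical model of cpumask_any_but(): return the lowest set bit in
 * @mask that is not @cpu, or MODEL_NR_CPU_IDS if there is none.
 */
static unsigned int model_cpumask_any_but(uint64_t mask, unsigned int cpu)
{
	unsigned int i;

	for (i = 0; i < MODEL_NR_CPU_IDS; i++) {
		if (i != cpu && (mask & (UINT64_C(1) << i)))
			return i;
	}
	return MODEL_NR_CPU_IDS;
}

/*
 * Hypothetical model of the selection step in the proposed
 * cpu_device_down(): return the CPU that would run the offline work,
 * or -EBUSY when no CPU other than @outgoing is online.
 */
static int model_pick_offline_worker(uint64_t online_mask, unsigned int outgoing)
{
	unsigned int cpu;

	assert(outgoing < MODEL_NR_CPU_IDS);

	cpu = model_cpumask_any_but(online_mask, outgoing);
	if (cpu >= MODEL_NR_CPU_IDS)
		return -EBUSY;

	return (int)cpu;
}
```

In this model, with CPUs 0 and 1 online (mask 0x3), offlining CPU 0 selects
CPU 1; with only CPU 0 online (mask 0x1), the result is -EBUSY.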