Date:   Sat, 09 Jul 2022 02:19:42 +0800
From:   Schspa Shi <schspa@...il.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
        vschneid@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] sched/rt: fix bad task migration for rt tasks


Steven Rostedt <rostedt@...dmis.org> writes:

> On Fri, 08 Jul 2022 12:51:14 +0800
> Schspa Shi <schspa@...il.com> wrote:
>
>> Steven Rostedt <rostedt@...dmis.org> writes:
>> 
>> > On Fri,  8 Jul 2022 00:50:14 +0800
>> > Schspa Shi <schspa@...il.com> wrote:
>> >  
>> >> Please refer to the following scenarios.  
>> >
>> > I'm not sure this is what is happening. Do you have a trace to 
>> > back this up?
>> >  
>> 
>> I don't have a trace. This is inferred from the exception log.
>> 
>> >> 
>> >>            CPU0                                  CPU1
>> >> ------------------------------------------------------------------
>> >> push_rt_task
>> >>   check is_migration_disabled(next_task)
>> >>                                         task not running and
>> >>                                         migration_disabled == 0
>> >>   find_lock_lowest_rq(next_task, rq);
>> >>     _double_lock_balance(this_rq, busiest);
>> >>       raw_spin_rq_unlock(this_rq);
>> >>       double_rq_lock(this_rq, busiest);
>> >>         <<wait for busiest rq>>
>> >>                                             <wakeup>  
>> >
>> > Here's the problem I have. next_task is queued on CPU0, 
>> > (otherwise CPU0
>> > would not be pushing it). As CPU0 is currently running 
>> > push_rt_task, how
>> > did next_task start running to set its migrate_disable flag?  
>> 
>> The next_task wasn't queued on CPU0; it's queued on CPU1 in this
>> scenario.
>
> Bah, I forgot that we still do pushing for other CPUs. I was thinking that
> we removed that in favor of pulling. It's been a while since I worked on
> this.
>
>> 
>> And that's because when the task wakes up, the rq argument is not the
>> current CPU's rq, it's next_task's rq
>> (i.e. CPU1's rq in this sample scenario).
>> 
>> And you can check this with the Call trace from the crash log.
>> 
>>     [123671.996969] Call trace:
>>     [123671.996975]  set_task_cpu+0x8c/0x108
>>     [123671.996984]  push_rt_task.part.0+0x144/0x184
>>     [123671.996995]  push_rt_tasks+0x28/0x3c
>>     [123671.997002]  task_woken_rt+0x58/0x68
>>     [123671.997009]  ttwu_do_wakeup+0x5c/0xd0
>>     [123671.997019]  ttwu_do_activate+0xc0/0xd4
>>     [123671.997028]  try_to_wake_up+0x244/0x288
>>     [123671.997036]  wake_up_process+0x18/0x24
>>     [123671.997045]  __irq_wake_thread+0x64/0x80
>>     [123671.997056]  __handle_irq_event_percpu+0x110/0x124
>> 
>> The function ttwu_do_wakeup() will lock the task's rq, not the
>> current running CPU's rq.
>> 
>> >
>> > Even if it was woken up on another CPU and ran there, by setting
>> > migrate_disable, it would not be put back to CPU0, because its
>> > migrate_disable flag is set (if it is, then there's the bug).
>> >  
>> 
>> There is no need to put it back on CPU0 for this issue; it's still
>> on CPU1.
>> 
>
> Worse things can actually happen than migrating a migrate-disabled task.
> What prevents next_task from being scheduled and in a running state, or
> even migrated?
>
> Hmm, that's covered in find_lock_lowest_rq().
>
> Looks like the migrate-disable check needs to go there.
>
> 		/* if the prio of this runqueue changed, try again */
> 		if (double_lock_balance(rq, lowest_rq)) {
> 			/*
> 			 * We had to unlock the run queue. In
> 			 * the mean time, task could have
> 			 * migrated already or had its affinity changed.
> 			 * Also make sure that it wasn't scheduled on its rq.
> 			 */
> 			if (unlikely(task_rq(task) != rq ||
> 				     !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
> 				     task_running(rq, task) ||
> 				     !rt_task(task) ||
> +				     is_migration_disabled(task) ||
> 				     !task_on_rq_queued(task))) {
>

Yes, that's what I did in the V1 patch.
Link: https://lore.kernel.org/all/20220623182932.58589-1-schspa@gmail.com/

But I don't think it's the best solution to this problem.
In this scenario, we still have a chance to make the task run sooner by
retrying the push, moving away the task currently running on this CPU.

There are more details in the reply to the V2 patch.
Link: https://lore.kernel.org/all/CAMA88TrZ-o4W81Yfw9Wcs3ghoxwpeAKtFejtMTt78GNB0tKaSA@mail.gmail.com/#t
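To make the race window concrete, here is a minimal, self-contained
userspace sketch (plain pthreads; the names fake_task, task_side and
try_push are hypothetical, this is not kernel code and not the actual
patch). It mimics the pattern above: check the flag, drop the lock the
way _double_lock_balance() does, and only a re-check after re-locking
catches a concurrent migrate_disable(); on failure the pusher retries
with a fresh candidate instead of giving up:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static struct {
	pthread_mutex_t lock;		/* stands in for the task's rq lock */
	bool migration_disabled;
} fake_task = { PTHREAD_MUTEX_INITIALIZER, false };

/* The task side: wakes up on another CPU and disables migration. */
static void *task_side(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&fake_task.lock);
	fake_task.migration_disabled = true;
	pthread_mutex_unlock(&fake_task.lock);
	return NULL;
}

/* The pusher side: first check, then drop-and-retake (the race
 * window), then the re-check that the hunk above adds. */
static bool try_push(void)
{
	bool ok;

	pthread_mutex_lock(&fake_task.lock);
	ok = !fake_task.migration_disabled;	/* check in push_rt_task() */
	if (ok) {
		pthread_mutex_unlock(&fake_task.lock);	/* lock dropped here */
		usleep(1000);		/* widen the window for the demo */
		pthread_mutex_lock(&fake_task.lock);
		ok = !fake_task.migration_disabled;	/* re-check after re-locking */
	}
	pthread_mutex_unlock(&fake_task.lock);
	return ok;
}

int main(void)
{
	pthread_t tid;

	pthread_create(&tid, NULL, task_side, NULL);
	if (!try_push())	/* timing-dependent; usually fails here */
		printf("candidate became migration-disabled; retry with a new task\n");
	pthread_join(tid, NULL);
	return 0;
}

(In the kernel, "retry" would mean picking a new pushable task and
trying the push again, as argued above, rather than just dropping the
attempt.)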

> 				double_unlock_balance(rq, lowest_rq);
> 				lowest_rq = NULL;
> 				break;
> 			}
> 		}
>
> -- Steve

-- 
BRs
Schspa Shi
