linux-kernel - Re: [External] Re: [PATCH v2 1/2] sched/core: Avoid obvious double update_rq

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a44b811d-6106-f609-4f43-e3f760c98151@bytedance.com>
Date:   Tue, 26 Apr 2022 15:57:11 +0800
From:   Hao Jia <jiahao.os@...edance.com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, rostedt@...dmis.org,
        bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com
Cc:     linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH v2 1/2] sched/core: Avoid obvious double
 update_rq_clock warning



On 2022/4/25 Dietmar Eggemann wrote:
> On 22/04/2022 11:09, Hao Jia wrote:
>> When we use raw_spin_rq_lock to acquire the rq lock and have to
>> update the rq clock while holding the lock, the kernel may issue
>> a WARN_DOUBLE_CLOCK warning.
>>
>> Since we directly use raw_spin_rq_lock to acquire rq lock instead of
>> rq_lock, there is no corresponding change to rq->clock_update_flags.
>> In particular, we have obtained the rq lock of other cores,
>> the core rq->clock_update_flags may be RQCF_UPDATED at this time, and
>> then calling update_rq_clock will trigger the WARN_DOUBLE_CLOCK warning.
>>
>> So we need to clear RQCF_UPDATED of rq->clock_update_flags synchronously
>> to avoid the WARN_DOUBLE_CLOCK warning.
>>
>> Some call trace reports:
>> Call Trace 1:
>>   <IRQ>
>>   sched_rt_period_timer+0x10f/0x3a0
>>   ? enqueue_top_rt_rq+0x110/0x110
>>   __hrtimer_run_queues+0x1a9/0x490
>>   hrtimer_interrupt+0x10b/0x240
> 
> [...]
> 
> For the  sched_rt_period_timer() case you simply replace
> raw_spin_rq_lock()/raw_spin_rq_unlock() with rq_lock()/rq_unlock().
> I.e. you're not using the new double_rq_clock_clear_update() which
> is depicted in the text above.
> 

Thanks for your review comments.
Yes, adding this description would make it clearer.
I will do it in patch v3.
Thanks.


>> Call Trace 2:
>>   <TASK>
>>   activate_task+0x8b/0x110
>>   push_rt_task.part.108+0x241/0x2c0
>>   push_rt_tasks+0x15/0x30
>>   finish_task_switch+0xaa/0x2e0
>>   ? __switch_to+0x134/0x420
>>   __schedule+0x343/0x8e0
> 
> [...]
> 
>> Call Trace 3:
>>   <TASK>
>>   deactivate_task+0x93/0xe0
>>   pull_rt_task+0x33e/0x400
>>   balance_rt+0x7e/0x90
>>   __schedule+0x62f/0x8e0
>>   do_task_dead+0x3f/0x50
> 
> [...]
> 
>> Steps to reproduce:
>> 1. Enable CONFIG_SCHED_DEBUG when compiling the kernel
>> 2. echo 1 > /sys/kernel/debug/clear_warn_once
>>     echo "WARN_DOUBLE_CLOCK" > /sys/kernel/debug/sched_features
> 
> s/sched_features/sched/features

I will fix it in patch v3.
Thanks.

> 
>>     echo "NO_RT_PUSH_IPI" > /sys/kernel/debug/sched_features
> 
> s/sched_features/sched/features

I will fix it in patch v3.
Thanks.

> 
>> 3. Run some rt tasks that periodically change the priority and sleep
> 
> Not sure about the `change the priority` part? I'm using rt-app with 2*n
> rt or dl (90% running) tasks (on a system with n CPUs) and can trigger
> all of these cases.

Yes, this problem is relatively easy to reproduce. I wrote a test case 
for rt scheduler to trigger all cases. I create 150 rt threads (my test 
environment has 96 cpus), change priority and sleep periodically. 
Constantly changing priorities in order to trigger push/pull_rt_task.


void *ThreadFun(void *arg)
{
	int cnt = *(int*)arg;
	struct sched_param param;

	while (1) {
		sqrt(MAGIC_NUM);
		cnt = cnt % 10 + 1;
		param.sched_priority = cnt;
		pthread_setschedparam(pthread_self(), SCHED_RR, &param);
		sqrt(MAGIC_NUM);
		sqrt(MAGIC_NUM);
		sleep(cnt);
	}
	return NULL;
}


I will try to reproduce again using rt-app, and will change the 
description of the reproduction steps in patch v3.
Thanks.


>   
>> Signed-off-by: Hao Jia <jiahao.os@...edance.com>
>> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@....com>
> 
> You can remove this Signed-off-by. I can provide a Reviewed-By for the
> next version.


OK, Thanks again for your review and suggestion, it really helped me.

> 
> [...]
> 
> When running PREEMPT_RT kernel there is another similar case in DL.
> 
> [  215.158100] ------------[ cut here ]------------
> [  215.158105] rq->clock_update_flags & RQCF_UPDATED
> [  215.158113] WARNING: CPU: 2 PID: 1942 at kernel/sched/core.c:690 update_rq_clock+0x128/0x1a0
> ...
> [  215.158245]  update_rq_clock+0x128/0x1a0
> [  215.158253]  migrate_task_rq_dl+0xec/0x310    <--- !!!
> [  215.158259]  set_task_cpu+0x84/0x1e4
> [  215.158264]  try_to_wake_up+0x1d8/0x5c0
> [  215.158268]  wake_up_process+0x1c/0x30
> [  215.158272]  hrtimer_wakeup+0x24/0x3c
> [  215.158279]  __hrtimer_run_queues+0x114/0x270
> [  215.158285]  hrtimer_interrupt+0xe8/0x244
> [  215.158291]  arch_timer_handler_phys+0x30/0x50
> [  215.158301]  handle_percpu_devid_irq+0x88/0x140
> [  215.158306]  generic_handle_domain_irq+0x40/0x60
> [  215.158313]  gic_handle_irq+0x48/0xe0
> [  215.158320]  call_on_irq_stack+0x2c/0x60
> [  215.158327]  do_interrupt_handler+0x80/0x84
> 
> For a non_contending task, this is the same issue as in sched_rt_period_timer().
> 
> -->8--
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index cfe7b40bc4ff..668a9910cd6d 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1804,6 +1804,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags)
>   
>   static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused)
>   {
> +	struct rq_flags rf;
>   	struct rq *rq;
>   
>   	if (READ_ONCE(p->__state) != TASK_WAKING)
> @@ -1815,7 +1816,7 @@ static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused
>   	 * from try_to_wake_up(). Hence, p->pi_lock is locked, but
>   	 * rq->lock is not... So, lock it
>   	 */
> -	raw_spin_rq_lock(rq);
> +	rq_lock(rq, &rf);
>   	if (p->dl.dl_non_contending) {
>   		update_rq_clock(rq);
>   		sub_running_bw(&p->dl, &rq->dl);
> @@ -1831,7 +1832,7 @@ static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused
>   			put_task_struct(p);
>   	}
>   	sub_rq_bw(&p->dl, &rq->dl);
> -	raw_spin_rq_unlock(rq);
> +	rq_unlock(rq, &rf);
>   }
>   

I will fix it in patch v3.
Thanks.