linux-kernel - Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue

Message-ID: <8052b12a-98ac-4309-8286-a330315d531b@arm.com>
Date: Tue, 4 Jun 2024 15:49:59 +0100
From: Hongyan Xia <hongyan.xia2@....com>
To: Luis Machado <luis.machado@....com>, Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
 dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
 mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
 linux-kernel@...r.kernel.org, kprateek.nayak@....com,
 wuyun.abel@...edance.com, tglx@...utronix.de, efault@....de, nd
 <nd@....com>, John Stultz <jstultz@...gle.com>
Subject: Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue

On 04/06/2024 15:23, Luis Machado wrote:
> On 6/4/24 11:11, Peter Zijlstra wrote:
>> On Mon, Jun 03, 2024 at 08:30:43PM +0100, Luis Machado wrote:
>>
>>> Exchanging some information with Hongyan today, he was a bit suspicious of the uclamp
>>> behavior with the eevdf complete series applied.
>>>
>>> Checking the uclamp code, I see we have some refcounting tied to enqueuing/dequeuing
>>> of tasks, and the uclamp values are organized in buckets.
>>>
>>> Just for fun I added a few trace_printk's in uclamp_eff_value, uclamp_rq_inc_id and
>>> uclamp_rq_dec_id.
>>>
>>> Booting up the system with delayed_dequeue disabled and running the benchmark, I
>>> see the uclamp bucket management pretty stable. Tasks get added to the uclamp
>>> buckets but then get removed. At the end of the benchmark, the uclamp buckets
>>> are (almost always) clean of tasks.
>>>
>>> Enabling delayed dequeue, I can see the uclamp buckets slowly filling up with
>>> tasks. At the end of the benchmark, I see uclamp buckets with 30, 40 or 50
>>> tasks still. If I do another run, I can see 80, 100 tasks still.
>>>
>>> I suspect refcounting might be going wrong somewhere due to delayed dequeue
>>> tasks, but that's more of a guess right now. Hopefully that is useful
>>> information. I'll resume investigation tomorrow.
>>
>> Thank you both!!
>>
>> Does the below help?
>>
>> Note how dequeue_task() does uclamp_rq_dec() unconditionally, which is
>> then not balanced in the case below.
>>
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -3664,6 +3664,7 @@ static int ttwu_runnable(struct task_str
>>   			/* mustn't run a delayed task */
>>   			SCHED_WARN_ON(task_on_cpu(rq, p));
>>   			enqueue_task(rq, p, ENQUEUE_DELAYED);
>> +			uclamp_rq_inc(rq, p);
>>   		}
>>   		if (!task_on_cpu(rq, p)) {
>>   			/*
> 
> As Hongyan pointed out in a separate message, the above makes things
> worse, as we end up with even more leftover tasks in the uclamp
> buckets.
> 
> I'm trying a fix in kernel/sched/core.c:enqueue_task that only
> calls uclamp_rq_inc if the task is not sched_delayed, so:
> 
> -       uclamp_rq_inc(rq, p);
> +       if (!p->se.sched_delayed)
> +         uclamp_rq_inc(rq, p);
> 
> I'm not entirely sure it is correct, but it seems to fix things.
> I'm still running some tests.

So far this has not triggered any WARN or bad behavior in my tests.

> 
> With the current code, given uclamp_rq_inc and uclamp_rq_dec get
> called in enqueue_task and dequeue_task, the additional enqueue_task
> call from ttwu_runnable for a delayed_dequeue task may do an additional
> unconditional call to uclamp_rq_inc, no?
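
To make the suspected imbalance concrete, here is a toy userspace model of
the bucket refcount. It is only a sketch, not kernel code: toy_rq, toy_task,
toy_enqueue, toy_dequeue and toy_leftover are made-up names, and the event
order inside the loop is an assumption. In particular, the model assumes
that the step which leaves the task in the sched_delayed state does not
drop the bucket count, so that the later re-enqueue is the "additional"
increment asked about above. Whether the real call paths behave like this
is exactly what is still open.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration only; not the kernel structures. */
struct toy_task { bool sched_delayed; };
struct toy_rq   { unsigned int bucket_tasks; };

/*
 * Models enqueue_task(): with guard == false the bucket count is bumped
 * unconditionally; with guard == true the bump is skipped for a task that
 * is still marked sched_delayed (the experimental change quoted above).
 */
static void toy_enqueue(struct toy_rq *rq, struct toy_task *p, bool guard)
{
	if (!guard || !p->sched_delayed)
		rq->bucket_tasks++;
	p->sched_delayed = false;
}

/* Models the decrement done on the dequeue side. */
static void toy_dequeue(struct toy_rq *rq)
{
	assert(rq->bucket_tasks > 0);
	rq->bucket_tasks--;
}

/* Runs the hypothetical cycle `cycles` times and returns the leftover count. */
static unsigned int toy_leftover(unsigned int cycles, bool guard)
{
	struct toy_rq rq = { 0 };
	struct toy_task p = { false };

	for (unsigned int i = 0; i < cycles; i++) {
		toy_enqueue(&rq, &p, guard);	/* normal enqueue: +1 */
		p.sched_delayed = true;		/* ASSUMPTION: count not dropped here */
		toy_enqueue(&rq, &p, guard);	/* re-enqueue of the delayed task */
		toy_dequeue(&rq);		/* one matching decrement: -1 */
	}
	return rq.bucket_tasks;
}
```

Under these assumptions, toy_leftover(n, false) grows by one per cycle,
which is the same shape as the buckets slowly filling up across runs, while
toy_leftover(n, true) stays at zero. This says nothing about which kernel
path actually produces the extra increment.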