Message-ID: <c3611c0a-007f-4e09-b92d-3752438e653e@gmail.com>
Date: Thu, 11 Apr 2024 09:32:23 +0800
From: Yan-Jie Wang <yanjiewtw@...il.com>
To: Peter Zijlstra <peterz@...radead.org>, Chen Yu <yu.c.chen@...el.com>
Cc: mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
 dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
 mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
 linux-kernel@...r.kernel.org, kprateek.nayak@....com,
 wuyun.abel@...edance.com, tglx@...utronix.de, efault@....de,
 yu.chen.surf@...il.com
Subject: Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue

> 
>> The 99th wakeup latency increases a little bit, and should be in the acceptable
>> range (25 -> 31 us).
> 
> Ah, my test runs haven't been stable enough to observe that.
> 
>> Meanwhile the throughput increases accordingly. Here are
>> the possible reasons I can think of:
>>
>> 1. wakeup latency: The time to find an eligible entity in the tree
>>     during wakeup might take longer - if there are more delayed-dequeue
>>     tasks in the tree.
> 
> Another possible cause might be that previously a schedule() would be
> 1 dequeue, 1 pick.
> 
> But now it can be much more variable, a pick can basically do N dequeues
> and N+1 picks.
> 
> So not only do we do more picks, but if you're focussed on worst case
> latency, it goes up, because we can do multiple dequeues for a single
> pick.
> 
> If we find this to really be a problem, I had some half baked ideas to
> fix it, but it added significant complexity, so keep it simple until
> need proves we need more etc.

I have an alternative approach to delayed-dequeue inspired by the 
original CFS implementation.

The idea is to keep the task's vruntime when it goes to sleep.
When the task is woken up, check whether its lag is positive at wakeup
time; if so, clamp the lag to 0 by setting vruntime to avg_vruntime().

<Sleep>

In dequeue_entity(): Remove the task from the runqueue, but keep the
task's vruntime, and do not calculate vlag at this time.

<Wake Up on the same CPU>

In enqueue_entity():
  1. Do not call place_entity().
  2. If the task's vruntime is less than the cfs_rq's avg_vruntime()
(i.e. its lag is positive), set the task's vruntime to avg_vruntime(),
and update the task's deadline according to its timeslice.
  3. Insert the task into the runqueue.
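
To make the same-CPU wakeup steps concrete, here is a rough userspace
sketch, not kernel code. struct toy_entity, vtime_before() and
toy_wake_same_cpu() are made-up names for illustration, and vslice is
assumed to be the timeslice already converted to virtual time (the
weight scaling is left out).

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct sched_entity. */
struct toy_entity {
	uint64_t vruntime;	/* kept across sleep, not turned into vlag */
	uint64_t deadline;	/* virtual deadline */
	uint64_t vslice;	/* assumed: timeslice already in virtual time */
};

/* Wrap-safe "a is before b", using a signed delta. */
static int vtime_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/*
 * Same-CPU wakeup without place_entity(): if the entity is behind the
 * average (positive lag), forfeit the lag by moving vruntime up to the
 * average and refresh the deadline. Returns 1 if it was clamped.
 */
static int toy_wake_same_cpu(struct toy_entity *se, uint64_t avg_vruntime)
{
	if (!vtime_before(se->vruntime, avg_vruntime))
		return 0;	/* zero or negative lag: keep as is */

	se->vruntime = avg_vruntime;
	se->deadline = se->vruntime + se->vslice;
	return 1;
}
```

A task that slept with negative lag keeps its vruntime and deadline
untouched, so it still has to wait for the average to catch up with it.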

<Wake Up on different CPU>

In migrate_task_rq_fair():
  1. Calculate the task's vlag as if it is on the original cfs_rq.
  2. Set the task's vlag to 0 if it is positive.

In enqueue_entity(): Use place_entity() to calculate the task's new 
vruntime and deadline according to the vlag and the new runqueue before 
inserting it into the runqueue.
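
The cross-CPU path could look roughly like the sketch below, again
plain userspace C with made-up names (struct mig_entity, mig_detach(),
mig_place()). It assumes vlag = avg_vruntime - vruntime and ignores the
load-based lag scaling that the real place_entity() does, so take it as
an illustration of the ordering only.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct sched_entity. */
struct mig_entity {
	uint64_t vruntime;
	uint64_t deadline;
	uint64_t vslice;	/* assumed: timeslice already in virtual time */
	int64_t  vlag;		/* only computed when migrating */
};

/*
 * Migration step: compute the lag against the old cfs_rq's average and
 * drop any positive lag.
 */
static void mig_detach(struct mig_entity *se, uint64_t old_avg_vruntime)
{
	int64_t lag = (int64_t)(old_avg_vruntime - se->vruntime);

	se->vlag = lag > 0 ? 0 : lag;
}

/*
 * Enqueue step on the new cfs_rq: rebuild vruntime from the preserved
 * (non-positive) lag, then set the deadline from the timeslice.
 */
static void mig_place(struct mig_entity *se, uint64_t new_avg_vruntime)
{
	se->vruntime = new_avg_vruntime - (uint64_t)se->vlag;
	se->deadline = se->vruntime + se->vslice;
}
```

With this ordering a task that was ahead of the old average by 20
lands 20 ahead of the new average, while a task that was behind lands
exactly on it.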