linux-kernel - Re: [RFC PATCH v4 1/2] sched/fair: Introduce short duration task check

Message-ID: <Y7gwWUKyG+6OUYd9@chenyu5-mobl1>
Date:   Fri, 6 Jan 2023 22:29:45 +0800
From:   Chen Yu <yu.c.chen@...el.com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
CC:     Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Tim Chen <tim.c.chen@...el.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Juri Lelli <juri.lelli@...hat.com>,
        "Rik van Riel" <riel@...riel.com>, Aaron Lu <aaron.lu@...el.com>,
        Abel Wu <wuyun.abel@...edance.com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        "Yicong Yang" <yangyicong@...ilicon.com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        "Daniel Bristot de Oliveira" <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Hillf Danton <hdanton@...a.com>,
        Honglei Wang <wanghonglei@...ichuxing.com>,
        Len Brown <len.brown@...el.com>,
        Chen Yu <yu.chen.surf@...il.com>,
        "Tianchen Ding" <dtcccc@...ux.alibaba.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Josh Don <joshdon@...gle.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH v4 1/2] sched/fair: Introduce short duration task
 check

On 2023-01-06 at 12:28:26 +0100, Dietmar Eggemann wrote:
> On 06/01/2023 09:34, Chen Yu wrote:
> > Hi Dietmar,
> > thanks for reviewing the patch!
> > On 2023-01-05 at 12:33:16 +0100, Dietmar Eggemann wrote:
> >> On 16/12/2022 07:11, Chen Yu wrote:
> >>
> >> [...]
> >>
> >>> @@ -5995,6 +6005,18 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> >>>  
> >>>  static void set_next_buddy(struct sched_entity *se);
> >>>  
> >>> +static inline void dur_avg_update(struct task_struct *p, bool task_sleep)
> >>> +{
> >>> +	u64 dur;
> >>> +
> >>> +	if (!task_sleep)
> >>> +		return;
> >>> +
> >>> +	dur = p->se.sum_exec_runtime - p->se.prev_sum_exec_runtime_vol;
> >>> +	p->se.prev_sum_exec_runtime_vol = p->se.sum_exec_runtime;
> >>
> >> Shouldn't se->prev_sum_exec_runtime_vol be set in enqueue_task_fair()
> >> and not in dequeue_task_fair()->dur_avg_update()? Otherwise `dur` will
> >> contain sleep time.
> >>
> > After the task p is dequeued, p's sum_exec_runtime will not be increased.
> 
> True.
> 
> > Unless task p is switched in again, p's sum_exec_runtime will continue to
> > increase. So dur should not include the sleep time, because we subtract
> 
> Not sure I get this sentence? p's se->sum_exec_runtime will only
> increase if p is current, so running?
>
Yes, that was a typo; it should read "will not continue to increase".
> > between the sum_exec_runtime rather than rq->clock_task. Not sure if I understand
> > this correctly?
> 
> No, you're right. We're not dealing with time snapshots but rather with
> sum_exec_runtime snapshots. So the value will not change between dequeue
> and the next enqueue.
> 
> e ... enqueue_task_fair()
> d ... dequeue_task_fair()
> s ... set_next_entity()
> p ... put_prev_entity()
> u ... update_curr_fair()->update_curr()
> 
> p1:
> 
> ---|---||--|--|---|--|--||---
>    d   es  u  p   s  u  pd
> 
>    ^   ^
>    |   |
>   (A) (B)
> 
> Same se->prev_sum_exec_runtime_vol value in (A) and (B).
> 
Yes.
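
To make the snapshot argument concrete, here is a minimal user-space sketch
(not kernel code). struct toy_se and all toy_* helpers are hypothetical names
made up for illustration; only the subtraction in toy_dequeue_sleep() mirrors
the dur_avg_update() hunk quoted above. The 1ms and 0.5ms run sections come
from the p1/p2 example quoted further down; the preemption and sleep lengths
are arbitrary assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sched_entity fields under discussion. */
struct toy_se {
	uint64_t sum_exec_runtime;          /* ns of CPU time consumed so far */
	uint64_t prev_sum_exec_runtime_vol; /* snapshot from the last voluntary dequeue */
};

/* Models update_curr(): only a task that is running accumulates runtime. */
static void toy_run(struct toy_se *se, uint64_t delta_ns)
{
	se->sum_exec_runtime += delta_ns;
}

/* Models being preempted or asleep: wall time passes, nothing accumulates. */
static void toy_off_cpu(struct toy_se *se, uint64_t delta_ns)
{
	(void)se;
	(void)delta_ns;
}

/* Same arithmetic as dur_avg_update() in the patch when task_sleep is true. */
static uint64_t toy_dequeue_sleep(struct toy_se *se)
{
	uint64_t dur;

	assert(se->sum_exec_runtime >= se->prev_sum_exec_runtime_vol);
	dur = se->sum_exec_runtime - se->prev_sum_exec_runtime_vol;
	se->prev_sum_exec_runtime_vol = se->sum_exec_runtime;
	return dur;
}

/* Replays the p1 timeline and checks that off-CPU time never leaks into dur. */
static void toy_replay_p1(void)
{
	struct toy_se p1 = { 0, 0 };

	toy_run(&p1, 1000000);       /* p1 runs 1ms */
	toy_off_cpu(&p1, 3000000);   /* p2 preempts p1 (length is an assumption) */
	toy_run(&p1, 500000);        /* p1 runs 0.5ms, then blocks */
	assert(toy_dequeue_sleep(&p1) == 1500000);  /* (1 + 0.5)ms */

	toy_off_cpu(&p1, 10000000);  /* p1 sleeps (length is an assumption) */
	/* The snapshot is unchanged across the sleep: same value at (A) and (B). */
	assert(p1.prev_sum_exec_runtime_vol == p1.sum_exec_runtime);

	toy_run(&p1, 200000);        /* next run section after wakeup */
	assert(toy_dequeue_sleep(&p1) == 200000);   /* sleep time not included */
}
```

Calling toy_replay_p1() from a main() should pass all asserts, since
toy_off_cpu() never touches sum_exec_runtime, which is the whole point.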
> > My original thought was to record the average run time of every section,
> > considering only the cases where the task voluntarily relinquishes the CPU.
> > For example, suppose on CPU1, task p1 and p2 run alternately:
> > 
> >  --------------------> time
> > 
> >  | p1 runs 1ms | p2 preempt p1 | p1 switch in, runs 0.5ms and blocks |
> >                ^               ^                                     ^
> >  |_____________|               |_____________________________________|
> >                                                                      ^
> >                                                                      |
> >                                                                   p1 dequeued
> > 
> > p1's duration in one section is (1 + 0.5)ms. Because if p2 does not
> > preempt p1, p1 can run 1.5ms. This reflects the nature of a task,
> > how long it wishes to run at most.
> > 
> >> Like we do for se->prev_sum_exec_runtime in set_next_entity() but for
> >> one `set_next_entity()-put_prev_entity()` run section.
> >>
> >> AFAICS, you want to measure the exec_runtime sum over all run sections
> >> between enqueue and dequeue.
> > Yes, we tried to record the 'decayed' average exec_runtime for each section.
> > Say, task p runs for a ms, then p is dequeued and blocks for b ms, and then
> > runs for c ms, its average duration is 0.875 * a + 0.125 * c, which is
> > what update_avg() does.
> 
> OK.
> 
I'll add more description in the next version to avoid confusion.
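
For reference, a small user-space sketch of the 0.875 * a + 0.125 * c
arithmetic. toy_update_avg() is a hypothetical name; its body follows the
diff / 8 form of the scheduler's update_avg() as I understand it. The sketch
assumes the average already equals a when the c sample arrives, and the
sample values are made up. Note that b, the blocked time, does not appear
anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Move the average 1/8 of the way toward the new sample. */
static void toy_update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	/* Unsigned wraparound makes this correct for negative diff as well. */
	*avg += (uint64_t)(diff / 8);
}

/* Worked example with made-up values: a = 0.8ms, c = 1.6ms. */
static void toy_dur_avg_example(void)
{
	uint64_t avg = 800000;            /* assume the average currently equals a */

	toy_update_avg(&avg, 1600000);    /* sample c arrives at the next sleep */
	/* 0.875 * 800000 + 0.125 * 1600000 = 700000 + 200000 */
	assert(avg == 900000);

	toy_update_avg(&avg, 100000);     /* a much shorter section pulls it down */
	/* 900000 + (100000 - 900000) / 8 = 900000 - 100000 */
	assert(avg == 800000);
}
```

So each new section contributes 1/8 of its length, and older sections decay
by 7/8 per update.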

thanks,
Chenyu