linux-kernel - Re: [PATCH v3 03/12] sched/fair: Change the variable to hold the number of periods to 32bit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKfTPtDdyZ4e8FcARTq1ANc5noBWQ5c7=fDJAZH1aWoN2f+VmA@mail.gmail.com>
Date:	Thu, 5 May 2016 13:23:33 +0200
From:	Vincent Guittot <vincent.guittot@...aro.org>
To:	Morten Rasmussen <morten.rasmussen@....com>
Cc:	Yuyang Du <yuyang.du@...el.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Benjamin Segall <bsegall@...gle.com>,
	Paul Turner <pjt@...gle.com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Juri Lelli <juri.lelli@....com>
Subject: Re: [PATCH v3 03/12] sched/fair: Change the variable to hold the
 number of periods to 32bit

Hi Morten,

On 5 May 2016 at 13:13, Morten Rasmussen <morten.rasmussen@....com> wrote:
> On Wed, May 04, 2016 at 04:02:44AM +0800, Yuyang Du wrote:
>> In sched average update, a period is about 1ms, so a 32-bit unsigned
>> integer can approximately hold a maximum of 49 (=2^32/1000/3600/24)
>> days.
>>
>> For usual cases, 32bit is big enough and 64bit is needless. But if
>> a task sleeps longer than it, there can be two outcomes:
>>
>> Consider a task sleeps whatever m milliseconds, then n = (u32)m.
>
> Maybe it crystal clear that you mean: For any m > UINT_MAX?
>
>>
>> 1. If n >= 32*64, then the task's sched avgs will be surely decayed
>>    to 0. In this case, it really doesn't matter that the 32bit is not
>>    big enough to hold m. In other words, a task sleeps 2 secs or sleeps
>>    50 days are the same from sched average point of view.
>>
>> 2. If m < 32*64, the chance to be here is very very low, but if so,
>
> Should that be: n < 32*64 ?
>
> Talking about 32*64, I don't get why we don't use LOAD_AVG_MAX_N. I had
> a patch ready to post for that:
>
> From 5055e5f82c8d207880035c2ec4ecf1ac1e7f9e91 Mon Sep 17 00:00:00 2001
> From: Morten Rasmussen <morten.rasmussen@....com>
> Date: Mon, 11 Apr 2016 15:41:37 +0100
> Subject: [PATCH] sched/fair: Fix decay to zero period in decay_load()
>
> In __compute_runnable_contrib() we are happy with returning LOAD_AVG_MAX
> when the update period n >= LOAD_AVG_MAX_N (=345), so we should be happy
> with returning zero for n >= LOAD_AVG_MAX_N when decaying in
> decay_load() as well instead of only returning zero for n >
> LOAD_AVG_PERIOD * 63 (=2016).
>
> As both conditions are unlikely() the impact is quite likely very small,
> but at least it makes the rounding errors for load accumulation and
> decay symmetrical.
>
> Signed-off-by: Morten Rasmussen <morten.rasmussen@....com>
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 56b7d4b..42515b6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2527,7 +2527,7 @@ static __always_inline u64 decay_load(u64 val, u64 n)
>
>         if (!n)
>                 return val;
> -       else if (unlikely(n > LOAD_AVG_PERIOD * 63))
> +       else if (unlikely(n > LOAD_AVG_MAX_N))

I had the same question about this 63 and IIUC, the 63 comes from 64bits-1.
The load can be that high that the 64bits are used. We shift right
every LOAD_AVG_PERIOD so we will be sure that the value will be 0
after shifting all 64 bits

>                 return 0;
>
>         /* after bounds checking we can collapse to 32-bit */
> --
> 1.9.1
>
>
>>    the task's sched avgs MAY NOT be decayed to 0, depending on how
>>    big its sums are, and the chance to 0 is still good as load_sum
>>    is way less than ~0ULL and util_sum way less than ~0U.
>
> I don't get the last bit about load_sum < ~0ULL and util_sum < ~0U.
> Whether you get to zero depends on the sums (as you say) and the actual
> value of 'n'. It is true that you might get to zero even if n <
> LOAD_AVG_MAX_N if the sums are small.
>
>> Nevertheless, what really maters is what happens in the worst-case
>> scenario, which is when (u32)m = 0? So in that case, it would be like
>> after so long a sleep, we treat the task as it never slept, and has the
>> same sched averages as before it slept. That is actually not bad or
>> nothing to worry about, and think of it as the same as how we treat
>> a new born task.
>
> There is subtle but important difference between not decaying a task
> correctly and adding new task: The sleeping task is already accounted
> for in the cfs_rq.avg.{load,util}_sum. The sleeping task's contribution
> to cfs_rq.avg has been decayed correctly in the mean time which means
> that by not guaranteeing a decay of the se.avg at wake-up you introduce
> a discrepancy between the task's owen view of its contribution (se.avg)
> and the cfs_rq view (cfs_rq.avg). That might lead to trouble if the task
> is migrated at wake-up as we remove se.avg amount of contribution from
> the previous cpu's cfs_rq.avg. In other words, you remove too much from
> the cfs_rq.avg.
>
> The discrepancy will decay and disappear after LOAD_AVG_MAX_N ms, which
> might be acceptable, but it is not a totally harmless change IMHO.