linux-kernel - Re: [PATCH v4 1/4] sched/fair: Introduce primitives for CFS bandwidth burst

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <B29F9250-E432-4CBD-8D51-2302A889C9F8@linux.alibaba.com>
Date:   Sat, 20 Mar 2021 10:06:52 +0800
From:   changhuaixin <changhuaixin@...ux.alibaba.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     changhuaixin <changhuaixin@...ux.alibaba.com>,
        Benjamin Segall <bsegall@...gle.com>, dietmar.eggemann@....com,
        juri.lelli@...hat.com, khlebnikov@...dex-team.ru,
        open list <linux-kernel@...r.kernel.org>, mgorman@...e.de,
        mingo@...hat.com, Odin Ugedal <odin@...d.al>,
        Odin Ugedal <odin@...dal.com>, pauld@...head.com,
        Paul Turner <pjt@...gle.com>, rostedt@...dmis.org,
        Shanpei Chen <shanpeic@...ux.alibaba.com>,
        Tejun Heo <tj@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        xiyou.wangcong@...il.com
Subject: Re: [PATCH v4 1/4] sched/fair: Introduce primitives for CFS bandwidth
 burst



> On Mar 19, 2021, at 8:39 PM, changhuaixin <changhuaixin@...ux.alibaba.com> wrote:
> 
> 
> 
>> On Mar 18, 2021, at 11:05 PM, Peter Zijlstra <peterz@...radead.org> wrote:
>> 
>> On Thu, Mar 18, 2021 at 09:26:58AM +0800, changhuaixin wrote:
>>>> On Mar 17, 2021, at 4:06 PM, Peter Zijlstra <peterz@...radead.org> wrote:
>> 
>>>> So what is the typical avg,stdev,max and mode for the workloads where you find
>>>> you need this?
>>>> 
>>>> I would really like to put a limit on the burst. IMO a workload that has
>>>> a burst many times longer than the quota is plain broken.
>>> 
>>> I see. Then the problem comes down to how large the limit on burst shall be.
>>> 
>>> I have sampled the CPU usage of a bursty container in 100ms periods. The statistics are:
>> 
>> So CPU usage isn't exactly what is required, job execution time is what
>> you're after. Assuming there is a relation...
>> 
> 
> Yes, job execution time is important. To be specific, it is to improve the CPU usage of the whole
> system to reduce the total cost of ownership, while not damaging job execution time. This
> requires lower the average CPU resource of underutilized cgroups, and allowing their bursts
> at the same time.
> 
>>> average	: 42.2%
>>> stddev	: 81.5%
>>> max		: 844.5%
>>> P95		: 183.3%
>>> P99		: 437.0%
>> 
>> Then your WCET is 844% of 100ms ? , which is .84s.
>> 
>> But you forgot your mode; what is the most common duration, given P95 is
>> so high, I doubt that avg is representative of the most common duration.
>> 
> 
> It is true.
> 
>>> If quota is 100000ms, burst buffer needs to be 8 times more in order
>>> for this workload not to be throttled.
>> 
>> Where does that 100s come from? And an 800s burst is bizarre.
>> 
>> Did you typo [us] as [ms] ?
>> 
> 
> Sorry, it should be 100000us.
> 
>>> I can't say this is typical, but these workloads exist. On a machine
>>> running Kubernetes containers, where there is often room for such
>>> burst and the interference is hard to notice, users would prefer
>>> allowing such burst to being throttled occasionally.
>> 
>> Users also want ponies. I've no idea what kubernetes actually is or what
>> it has to do with containers. That's all just word salad.
>> 
>>> In this sense, I suggest limit burst buffer to 16 times of quota or
>>> around. That should be enough for users to improve tail latency caused
>>> by throttling. And users might choose a smaller one or even none, if
>>> the interference is unacceptable. What do you think?
>> 
>> Well, normal RT theory would suggest you pick your runtime around 200%
>> to get that P95 and then allow a full period burst to get your P99, but
>> that same RT theory would also have you calculate the resulting
>> interference and see if that works with the rest of the system...
>> 
> 
> I am sorry that I don't know much about the RT theory you mentioned, and can't provide
> the desired calculation now. But I'd like to try and do some reading if that is needed.
> 
>> 16 times is horrific.
> 
> So can we decide on a more relative value now? Or is the interference probabilities still the
> missing piece?

A more [realistic] value, I mean.

> 
> Is the paper you mentioned about called "Insensitivity results in statistical bandwidth sharing",
> or some related ones on statistical bandwidth results under some kind of fairness?