lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAC=E7cVrLFDOm3uNQ_45DzmMVJ_i9bNvjsi7yRPaPe0HcaykUg@mail.gmail.com>
Date:   Tue, 28 May 2019 17:25:22 -0500
From:   Dave Chiluk <chiluk+linux@...eed.com>
To:     Peter Oskolkov <posk@...k.io>
Cc:     Phil Auld <pauld@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, cgroups@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Brendan Gregg <bgregg@...flix.com>,
        Kyle Anderson <kwa@...p.com>,
        Gabriel Munos <gmunoz@...flix.com>,
        John Hammond <jhammond@...eed.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jonathan Corbet <corbet@....net>, linux-doc@...r.kernel.org,
        Ben Segall <bsegall@...gle.com>
Subject: Re: [PATCH v2 1/1] sched/fair: Fix low cpu usage with high throttling
 by removing expiration of cpu-local slices

On Fri, May 24, 2019 at 5:07 PM Peter Oskolkov <posk@...k.io> wrote:
> Linux CPU scheduling tail latency is a well-known issue and a major
> pain point in some workloads:
> https://www.google.com/search?q=linux+cpu+scheduling+tail+latency
>
> Even assuming that nobody noticed this particular cause
> of CPU scheduling latencies, it does not mean the problem should be waved
> away. At least it should be documented, if at this point it decided that
> it is difficult to address it in a meaningful way. And, preferably, a way
> to address the issue later on should be discussed and hopefully agreed to.

Pursuing reducing tail latencies for our web application is the
precise reason I created this patch set.  Those applications that
previously were responding in 20ms 95% where now taking 220ms.  Those
were correctly sized applications prior to 512ac999. After which, they
started seeing massive increases in their latencies  due to hitting
throttling with lower than quota amounts of cpu usage.

I'll see if I can rework the documentation.  Any specific
suggestions for how that can be worded would be appreciated.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ