Date:   Mon, 30 Mar 2020 12:44:15 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Huaixin Chang <changhuaixin@...ux.alibaba.com>
Cc:     linux-kernel@...r.kernel.org, shanpeic@...ux.alibaba.com,
        yun.wang@...ux.alibaba.com, xlpang@...ux.alibaba.com,
        mingo@...hat.com, bsegall@...gle.com, chiluk+linux@...eed.com,
        vincent.guittot@...aro.org
Subject: Re: [PATCH v3] sched/fair: Fix race between runtime distribution and
 assignment

On Fri, Mar 27, 2020 at 11:26:25AM +0800, Huaixin Chang wrote:
> Currently, there is a potential race between distribute_cfs_runtime()
> and assign_cfs_rq_runtime(). The race happens when cfs_b->runtime is
> read, runtime is distributed without holding the lock, and it then turns
> out that there is not enough runtime left to charge against after
> distribution, because assign_cfs_rq_runtime() might be called during
> distribution and use cfs_b->runtime at the same time.
> 
> Fibtest is the tool used to test this race. Assume all gcfs_rq are
> throttled and the cfs period timer runs; slow threads might run and
> sleep, returning unused cfs_rq runtime and keeping min_cfs_rq_runtime in
> their local pool. If all this happens sufficiently quickly,
> cfs_b->runtime will drop a lot. If the runtime distributed is large too,
> over-use of runtime happens.
> 
> Runtime over-use of about 70 percent of quota is seen when we run
> fibtest on a 96-core machine. We run fibtest with 1 fast thread and
> 95 slow threads in the test group, configure a 10ms quota for this
> group, and see that the CPU usage of fibtest is 17.0%, which is far more
> than the expected 10%.
> 
> On a smaller machine with 32 cores, we also run fibtest with 96
> threads. CPU usage is more than 12%, which is also more than the
> expected 10%. This shows that on similar workloads, this race does
> affect CPU bandwidth control.
> 
> Solve this by holding the lock inside distribute_cfs_runtime().
> 
> Fixes: c06f04c70489 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop")
> Signed-off-by: Huaixin Chang <changhuaixin@...ux.alibaba.com>
> Reviewed-by: Ben Segall <bsegall@...gle.com>

Thanks!
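
The race in the quoted changelog can be sketched as a small,
single-threaded user-space model. This is not the kernel code: struct
bw_pool, assign(), distribute_racy() and distribute_locked() are
hypothetical names, the "concurrent" assignment is simulated by calling
assign() once after every grant, and the numbers are illustrative only.
The racy variant distributes from a snapshot of the pool and charges the
pool afterwards; the locked variant charges the pool on every grant, as
holding the lock inside the distribution loop would allow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a bandwidth pool; not a kernel structure. */
struct bw_pool {
	int64_t runtime;    /* quota left in this period (models cfs_b->runtime) */
	int64_t handed_out; /* total runtime granted to runqueues so far */
};

static int64_t min64(int64_t a, int64_t b)
{
	return a < b ? a : b;
}

/* Models the assignment path: take up to `want` straight from the pool. */
static void assign(struct bw_pool *p, int64_t want)
{
	int64_t got = min64(want, p->runtime);

	p->runtime -= got;
	p->handed_out += got;
}

/*
 * Racy variant: read the pool once, distribute from that snapshot while
 * assign() keeps draining the real pool, and charge the pool only at the
 * end (clamped at zero). Returns the total runtime handed out.
 */
static int64_t distribute_racy(struct bw_pool *p, int nr_rq,
			       int64_t need, int64_t slice)
{
	int64_t snapshot, remaining, distributed;

	assert(p && need > 0 && slice >= 0);
	snapshot = p->runtime;
	remaining = snapshot;
	for (int i = 0; i < nr_rq && remaining > 0; i++) {
		int64_t grant = min64(need, remaining);

		remaining -= grant;
		p->handed_out += grant;
		assign(p, slice); /* simulated concurrent assignment */
	}
	distributed = snapshot - remaining;
	p->runtime -= min64(distributed, p->runtime);
	return p->handed_out;
}

/*
 * Locked variant: every grant is charged against the pool immediately,
 * so grants and assignments together cannot exceed the quota.
 */
static int64_t distribute_locked(struct bw_pool *p, int nr_rq,
				 int64_t need, int64_t slice)
{
	assert(p && need > 0 && slice >= 0);
	for (int i = 0; i < nr_rq && p->runtime > 0; i++) {
		int64_t grant = min64(need, p->runtime);

		p->runtime -= grant;
		p->handed_out += grant;
		assign(p, slice); /* simulated concurrent assignment */
	}
	return p->handed_out;
}
```

With a pool of 10000 units, 10 runqueues needing 1000 each and an
interleaved assignment of 500 per step, the racy variant hands out 15000
units in total, while the locked variant stops at 10000.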
