lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZBz/V5a7/6PZeM7S@slm.duckdns.org>
Date:   Thu, 23 Mar 2023 15:39:35 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Yosry Ahmed <yosryahmed@...gle.com>
Cc:     Josef Bacik <josef@...icpanda.com>, Jens Axboe <axboe@...nel.dk>,
        Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Shakeel Butt <shakeelb@...gle.com>,
        Muchun Song <muchun.song@...ux.dev>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vasily Averin <vasily.averin@...ux.dev>,
        cgroups@...r.kernel.org, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        bpf@...r.kernel.org
Subject: Re: [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the
 percpu lock

Hello,

On Thu, Mar 23, 2023 at 04:00:31AM +0000, Yosry Ahmed wrote:
> Currently, when sleeping is not allowed during rstat flushing, we hold
> the global rstat lock with interrupts disabled throughout the entire
> flush operation. Flushing in an O(# cgroups * # cpus) operation, and
> having interrupts disabled throughout is dangerous.
> 
> For some contexts, we may not want to sleep, but can be interrupted
> (e.g. while holding a spinlock or RCU read lock). As such, do not
> disable interrupts throughout rstat flushing, only when holding the
> percpu lock. This breaks down the O(# cgroups * # cpus) duration with
> interrupts disabled to a series of O(# cgroups) durations.
> 
> Furthermore, if a cpu spinning waiting for the global rstat lock, it
> doesn't need to spin with interrupts disabled anymore.

I'm generally not a fan of big spin locks w/o irq protection. They too often
become a source of unpredictable latency spikes. As you said, the global
rstat lock can be held for quite a while. Removing _irq makes irq latency
better on the CPU but on the other hand it makes a lot more likely that the
lock is gonna be held even longer, possibly significantly so depending on
the configuration and workload which will in turn stall other CPUs waiting
for the lock. Sure, irqs are being serviced quicker but if the cost is more
and longer !irq context multi-cpu stalls, what's the point?

I don't think there's anything which requires the global lock to be held
throughout the entire flushing sequence and irq needs to be disabled when
grabbing the percpu lock anyway, so why not just release the global lock on
CPU boundaries instead? We don't really lose anything significant that way.
The durations of irq disabled sections are still about the same as in the
currently proposed solution at O(# cgroups) and we avoid the risk of holding
the global lock for too long unexpectedly from getting hit repeatedly by
irqs while holding the global lock.

Thanks.

-- 
tejun

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ