linux-kernel - Re: [PATCH 3/3] mm: memcg: use non-unified stats flushing for userspace reads

Message-ID: <ZOcDLD/1WaOwWis9@dhcp22.suse.cz>
Date:   Thu, 24 Aug 2023 09:13:48 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Yosry Ahmed <yosryahmed@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Shakeel Butt <shakeelb@...gle.com>,
        Muchun Song <muchun.song@...ux.dev>,
        Ivan Babrou <ivan@...udflare.com>, Tejun Heo <tj@...nel.org>,
        linux-mm@...ck.org, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] mm: memcg: use non-unified stats flushing for
 userspace reads

On Wed 23-08-23 07:55:40, Yosry Ahmed wrote:
> On Wed, Aug 23, 2023 at 12:33 AM Michal Hocko <mhocko@...e.com> wrote:
> >
> > On Tue 22-08-23 08:30:05, Yosry Ahmed wrote:
> > > On Tue, Aug 22, 2023 at 2:06 AM Michal Hocko <mhocko@...e.com> wrote:
> > > >
> > > > On Mon 21-08-23 20:54:58, Yosry Ahmed wrote:
> > [...]
> > > So to answer your question, I don't think a random user can really
> > > affect the system in a significant way by constantly flushing. In
> > > fact, in the test script (which I am now attaching, in case you're
> > > interested), there are hundreds of threads that are reading stats of
> > > different cgroups every 1s, and I don't see any negative effects on
> > > in-kernel flushers in this case (reclaimers).
> >
> > I suspect you have missed my point.
> 
> I suspect you are right :)
> 
> 
> > Maybe I am just misunderstanding
> > the code but it seems to me that the lock dropping inside
> > cgroup_rstat_flush_locked effectively allows an unbounded number of
> > contenders, which is really dangerous when it is triggerable from
> > userspace. The number of spinners at any moment is always bounded by
> > the number of CPUs, but depending on timing many potential spinners
> > might be cond_rescheded and the worst-case latency to complete can be
> > really high. Makes more sense?
> 
> I think I understand better now. So basically because we might drop
> the lock and resched, there can be nr_cpus spinners + other spinners
> that are currently scheduled away, so these will need to wait to be
> scheduled and then start spinning on the lock. This may happen for one
> reader multiple times during its read, which is what can cause a high
> worst case latency.
> 
> I hope I understood you correctly this time. Did I?

Yes. I would just add that this could also affect the worst-case
latency of a different reader, so an adversarial user can stall others.
Exposing a shared global lock in an uncontrolled way over a generally
available user interface is not really a great idea IMHO.
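
To make the concern concrete, here is a rough back-of-the-envelope model,
not kernel code. It assumes a reader flushes in per-CPU chunks and drops
the shared lock after each chunk, and that in the worst case every other
contender (spinning or scheduled away) takes the lock before the reader
gets it back. The function name, the fixed per-hold cost and the numbers
are hypothetical and only illustrate how the latency scales.

```c
/*
 * Toy model (an assumption-laden sketch, not the rstat implementation):
 * a reader needs the shared lock once per CPU chunk and drops it in
 * between. Each lock hold, by anyone, costs hold_us microseconds.
 *
 * contenders is deliberately not capped by nr_cpus: tasks that were
 * scheduled away still queue up for the lock once they run again.
 */
static unsigned long worst_case_flush_us(unsigned int nr_cpus,
                                         unsigned int contenders,
                                         unsigned long hold_us)
{
    unsigned long total = 0;
    unsigned int cpu;

    for (cpu = 0; cpu < nr_cpus; cpu++) {
        /* Worst case: every contender wins the lock ahead of us. */
        total += (unsigned long)contenders * hold_us;
        /* Only then do we hold the lock for our own chunk. */
        total += hold_us;
    }
    return total;
}
```

In this model, worst_case_flush_us(64, 0, 10) gives 640 us for an
uncontended read, while worst_case_flush_us(64, 1000, 10) gives 640640 us.
The reader's latency grows with the number of contenders that userspace
can create, not with the number of CPUs.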
-- 
Michal Hocko
SUSE Labs