Message-ID: <qsncixzj7s7jd7f3l2erjjs7cx3fanmlbkh4auaapsvon45rx3@62o2nqwrb43e>
Date: Thu, 7 Aug 2025 13:52:39 -0700
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Daniel Sedlak <daniel.sedlak@...77.com>
Cc: Kuniyuki Iwashima <kuniyu@...gle.com>, 
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Jonathan Corbet <corbet@....net>, Neal Cardwell <ncardwell@...gle.com>, 
	David Ahern <dsahern@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>, 
	Yosry Ahmed <yosry.ahmed@...ux.dev>, linux-mm@...ck.org, netdev@...r.kernel.org, 
	Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>, 
	Roman Gushchin <roman.gushchin@...ux.dev>, Muchun Song <muchun.song@...ux.dev>, cgroups@...r.kernel.org, 
	Tejun Heo <tj@...nel.org>, Michal Koutný <mkoutny@...e.com>, 
	Matyas Hurtik <matyas.hurtik@...77.com>
Subject: Re: [PATCH v4] memcg: expose socket memory pressure in a cgroup

On Thu, Aug 07, 2025 at 12:22:01PM +0200, Daniel Sedlak wrote:
> On 8/7/25 1:34 AM, Shakeel Butt wrote:
> > On Wed, Aug 06, 2025 at 03:01:44PM -0700, Kuniyuki Iwashima wrote:
> > > On Wed, Aug 6, 2025 at 2:54 PM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
> > > > 
> > > > On Wed, Aug 06, 2025 at 12:20:25PM -0700, Kuniyuki Iwashima wrote:
> > > > > > > -                     WRITE_ONCE(memcg->socket_pressure, jiffies + HZ);
> > > > > > > +                     socket_pressure = jiffies + HZ;
> > > > > > > +
> > > > > > > +                     jiffies_diff = min(socket_pressure - READ_ONCE(memcg->socket_pressure), HZ);
> > > > > > > +                     memcg->socket_pressure_duration += jiffies_to_usecs(jiffies_diff);
> > > > > > 
> > > > > > KCSAN will complain about this. I think we can use atomic_long_add() and
> > > > > > don't need the one with strict ordering.
> 
> Thanks for the KCSAN recommendation, I didn't know about this sanitizer.
> 
> > > > > 
> > > > > Assuming from atomic_ that vmpressure() could be called concurrently
> > > > > for the same memcg, should we protect socket_pressure and duration
> > > > > within the same lock instead of mixing WRITE/READ_ONCE() and
> > > > > atomic?  Otherwise jiffies_diff could be incorrect (the error is smaller
> > > > > than HZ though).
> > > > > 
> > > > 
> > > > Yeah good point. Also this field needs to be hierarchical. So, with lock
> > > > something like following is needed:
> 
> Thanks for the snippet, will incorporate it.

Cool, though that is just pseudo-code, so be careful when using it.

> 
> > > > 
> > > >          if (!spin_trylock(memcg->net_pressure_lock))
> > > >                  return;
> > > > 
> > > >          socket_pressure = jiffies + HZ;
> > > >          diff = min(socket_pressure - READ_ONCE(memcg->socket_pressure), HZ);
> > > 
> > > READ_ONCE() should be unnecessary here.
> > > 
> > > > 
> > > >          if (diff) {
> > > >                  WRITE_ONCE(memcg->socket_pressure, socket_pressure);
> > > >                  // mod_memcg_state(memcg, MEMCG_NET_PRESSURE, diff);
> > > >                  // OR
> > > >                  // while (memcg) {
> > > >                  //      memcg->sk_pressure_duration += diff;
> > > >                  //      memcg = parent_mem_cgroup(memcg);
> > > 
> > > The parents' sk_pressure_duration is not protected by the lock
> > > taken by trylock.  Maybe we need another global mutex if we want
> > > the hierarchy ?
> > 
> > We don't really need lock protection for sk_pressure_duration. The lock
> 
> By this you mean that we don't need the possible new global lock or the
> local memcg->net_pressure_lock?

We definitely don't need a global lock. For memcg->net_pressure_lock, we
need to be very clear about why we need it. Basically, we are doing a
RMW on memcg->socket_pressure and we want to know, consistently, how
much further we are pushing memcg->socket_pressure; in other words, we
want a consistent value of diff. The lock is one way to get that
consistent diff. We could also play some atomic-ops tricks to get a
consistent value without a lock, but I don't think that complexity is
worth it. A third option would be to drop the consistency requirement
for diff, since the error is bounded by HZ (as Kuniyuki mentioned), but
this code path is not performance critical enough to make that worth
it. So option 1, a simple memcg->net_pressure_lock, is good enough.
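
To make option 1 concrete, something like the following (untested; the
function name and the net_pressure_lock field are made up for
illustration, and the duration helper is sketched after the next
paragraph):

	static void memcg_account_socket_pressure(struct mem_cgroup *memcg)
	{
		unsigned long socket_pressure, diff;

		/* Losing a contended sample only costs an error < HZ. */
		if (!spin_trylock(&memcg->net_pressure_lock))
			return;

		socket_pressure = jiffies + HZ;
		/*
		 * The lock serializes this read-modify-write, so diff is
		 * a consistent measure of how much further we push the
		 * pressure window. No READ_ONCE() needed under the lock.
		 */
		diff = min_t(unsigned long,
			     socket_pressure - memcg->socket_pressure, HZ);
		if (diff)
			/* WRITE_ONCE() pairs with the lockless readers. */
			WRITE_ONCE(memcg->socket_pressure, socket_pressure);

		spin_unlock(&memcg->net_pressure_lock);

		if (diff)
			/* Hierarchical accounting, sketched below. */
			memcg_add_socket_pressure_duration(memcg, diff);
	}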

We don't need memcg->net_pressure_lock to protect sk_pressure_duration
of the memcg and its ancestors, as long as additions to
sk_pressure_duration are atomic.
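
That part could be as simple as the following (again hypothetical
names, and assuming sk_pressure_duration becomes an atomic_long_t):

	static void memcg_add_socket_pressure_duration(struct mem_cgroup *memcg,
						       unsigned long diff)
	{
		/* Atomic adds, so no lock is needed on this path. */
		for (; memcg; memcg = parent_mem_cgroup(memcg))
			atomic_long_add(jiffies_to_usecs(diff),
					&memcg->sk_pressure_duration);
	}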

Someone might notice that we do the upward traversal for
memcg->sk_pressure_duration but not for memcg->socket_pressure. We
could do that too if we decide to optimize
mem_cgroup_under_socket_pressure(), but that is orthogonal work, and
the consistency requirement would change as well.
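
For illustration only, with such upward propagation in place at write
time, the reader could shrink to a single check instead of walking the
ancestors (a hypothetical rewrite, not part of this patch):

	static inline bool
	mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
	{
		/* Ancestors' pressure already folded in at write time. */
		return time_before(jiffies,
				   READ_ONCE(memcg->socket_pressure));
	}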
