Date:   Wed, 7 Dec 2022 13:53:00 +0100
From:   Johannes Weiner <hannes@...xchg.org>
To:     Shakeel Butt <shakeelb@...gle.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        Ivan Babrou <ivan@...udflare.com>,
        Linux MM <linux-mm@...ck.org>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Michal Hocko <mhocko@...nel.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Muchun Song <songmuchun@...edance.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "David S. Miller" <davem@...emloft.net>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        David Ahern <dsahern@...nel.org>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, cgroups@...r.kernel.org,
        kernel-team <kernel-team@...udflare.com>
Subject: Re: Low TCP throughput due to vmpressure with swap enabled

On Tue, Dec 06, 2022 at 11:10:49PM +0000, Shakeel Butt wrote:
> On Tue, Dec 06, 2022 at 09:51:01PM +0100, Johannes Weiner wrote:
> > On Tue, Dec 06, 2022 at 08:13:50PM +0100, Eric Dumazet wrote:
> > > On Tue, Dec 6, 2022 at 8:00 PM Johannes Weiner <hannes@...xchg.org> wrote:
> > > > @@ -1701,10 +1701,10 @@ void mem_cgroup_sk_alloc(struct sock *sk);
> > > >  void mem_cgroup_sk_free(struct sock *sk);
> > > >  static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
> > > >  {
> > > > -       if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->tcpmem_pressure)
> > > > +       if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->socket_pressure)
> > > 
> > > && READ_ONCE(memcg->socket_pressure))
> > > 
> > > >                 return true;
> > > >         do {
> > > > -               if (time_before(jiffies, READ_ONCE(memcg->socket_pressure)))
> > > > +               if (memcg->socket_pressure)
> > > 
> > > if (READ_ONCE(...))
> > 
> > Good point, I'll add those.
> > 
> > > > @@ -7195,10 +7194,10 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
> > > >                 struct page_counter *fail;
> > > >
> > > >                 if (page_counter_try_charge(&memcg->tcpmem, nr_pages, &fail)) {
> > > > -                       memcg->tcpmem_pressure = 0;
> > > 
> > > Orthogonal to your patch, but:
> > > 
> > > Maybe avoid touching this cache line too often and use READ/WRITE_ONCE() ?
> > > 
> > >     if (READ_ONCE(memcg->socket_pressure))
> > >       WRITE_ONCE(memcg->socket_pressure, false);
> > 
> > Ah, that's a good idea.
> > 
> > I think it'll be fine in the failure case, since that's associated
> > with OOM and total performance breakdown anyway.
> > 
> > But certainly, in the common case of the charge succeeding, we should
> > not keep hammering false into that variable over and over.
> > 
> > How about the delta below? I also flipped the branches around to keep
> > the common path at the first indentation level, hopefully making that
> > a bit clearer too.
> > 
> > Thanks for taking a look, Eric!
> > 
> 
> I still think we should not put a persistent state of socket pressure on
> unsuccessful charge which will only get reset on successful charge. I
> think the better approach would be to limit the pressure state by time
> window same as today but set it on charge path. Something like below:

I don't mind doing that if necessary, but looking at the code I don't
see why it would be.
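
For comparison, here is the time-bounded alternative Shakeel describes. It keeps a deadline rather than a flag, as the removed `time_before(jiffies, READ_ONCE(memcg->socket_pressure))` check did. A failed charge re-arms the deadline, and the pressure state lapses on its own once the deadline passes. This is a hypothetical userspace model. The tick counter, `PRESSURE_WINDOW`, `struct group`, and the function names are assumptions standing in for jiffies, the real window length, and memcg. The C11 relaxed atomics only loosely stand in for READ_ONCE()/WRITE_ONCE().

```c
#include <stdbool.h>
#include <stdatomic.h>

/* Assumed window length in ticks (a stand-in for something like HZ). */
#define PRESSURE_WINDOW 100UL

/* Wraparound-safe "a is earlier than b", in the style of time_before(). */
static inline bool ticks_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical stand-in for the memcg: only the pressure state is modeled. */
struct group {
	/* Tick until which the group counts as being under pressure. */
	_Atomic unsigned long pressure_deadline;
};

/* Failed charge: (re)arm pressure for one window starting at 'now'. */
static void on_charge_failure(struct group *g, unsigned long now)
{
	atomic_store_explicit(&g->pressure_deadline, now + PRESSURE_WINDOW,
			      memory_order_relaxed);
}

/* Pressure expires by itself; no success or free path has to clear it. */
static bool under_pressure(struct group *g, unsigned long now)
{
	unsigned long deadline =
		atomic_load_explicit(&g->pressure_deadline, memory_order_relaxed);

	return ticks_before(now, deadline);
}
```

The trade-off: a deadline needs no clearing on the success and free paths, but pressure can outlive the condition that caused it by up to one window. A flag tracks the condition exactly, but only if every path that relieves it clears it.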

The socket code sets protocol memory pressure on allocations that run
into limits, and clears it on allocations that succeed and on frees.
Why shouldn't we do the same thing for memcg?

@@ -7237,6 +7235,9 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
        mod_memcg_state(memcg, MEMCG_SOCK, -nr_pages);
 
        refill_stock(memcg, nr_pages);
+
+       if (unlikely(READ_ONCE(memcg->socket_pressure)))
+               WRITE_ONCE(memcg->socket_pressure, false);
 }
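
Below is a minimal userspace model of the flag lifecycle argued for above. The flag is set when a charge runs into the limit and cleared on the next successful charge or on uncharge. The clear is written as a read-before-write, as Eric suggested, so the common path only reads the flag's cache line instead of dirtying it. C11 relaxed atomics loosely stand in for READ_ONCE()/WRITE_ONCE(). `struct group`, `try_charge()`, `uncharge()`, and the simple page-count limit are hypothetical simplifications, not the memcg or page_counter API.

```c
#include <stdbool.h>
#include <stdatomic.h>

/* Hypothetical group with a hard limit on charged socket pages. */
struct group {
	_Atomic long usage;	/* pages currently charged */
	long limit;		/* hard limit, in pages */
	_Atomic bool pressure;	/* set on a failed charge */
};

/* Read first, write only if set: the common case leaves the line clean. */
static inline void clear_pressure(struct group *g)
{
	if (atomic_load_explicit(&g->pressure, memory_order_relaxed))
		atomic_store_explicit(&g->pressure, false, memory_order_relaxed);
}

static bool try_charge(struct group *g, long nr_pages)
{
	long old = atomic_load_explicit(&g->usage, memory_order_relaxed);

	do {
		if (old + nr_pages > g->limit) {
			/* Rare failure path: an unconditional store is fine. */
			atomic_store_explicit(&g->pressure, true,
					      memory_order_relaxed);
			return false;
		}
	} while (!atomic_compare_exchange_weak_explicit(&g->usage, &old,
							old + nr_pages,
							memory_order_relaxed,
							memory_order_relaxed));

	clear_pressure(g);	/* successful charge relieves pressure */
	return true;
}

static void uncharge(struct group *g, long nr_pages)
{
	atomic_fetch_sub_explicit(&g->usage, nr_pages, memory_order_relaxed);
	clear_pressure(g);	/* frees relieve pressure too */
}

static bool under_pressure(struct group *g)
{
	return atomic_load_explicit(&g->pressure, memory_order_relaxed);
}
```

Unlike the real charge path, this model does not force the charge through on failure. It only shows when the flag goes up and when it comes down.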
