Message-ID: <20201210221605.4236-1-sjpark@amazon.com>
Date:   Thu, 10 Dec 2020 23:16:05 +0100
From:   SeongJae Park <sjpark@...zon.com>
To:     Eric Dumazet <edumazet@...gle.com>
CC:     SeongJae Park <sjpark@...zon.com>,
        David Miller <davem@...emloft.net>,
        SeongJae Park <sjpark@...zon.de>,
        Jakub Kicinski <kuba@...nel.org>,
        "Alexey Kuznetsov" <kuznet@....inr.ac.ru>,
        Florian Westphal <fw@...len.de>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        netdev <netdev@...r.kernel.org>, <rcu@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 0/1] net: Reduce rcu_barrier() contentions from 'unshare(CLONE_NEWNET)'

On Thu, 10 Dec 2020 15:09:10 +0100 Eric Dumazet <edumazet@...gle.com> wrote:

> On Thu, Dec 10, 2020 at 9:09 AM SeongJae Park <sjpark@...zon.com> wrote:
> >
> > From: SeongJae Park <sjpark@...zon.de>
> >
> > On a few of our systems, I found that frequent 'unshare(CLONE_NEWNET)'
> > calls make the number of active slab objects, including those of the
> > 'sock_inode_cache' type, increase rapidly and continuously.  As a
> > result, memory pressure occurs.
> >
> > In more detail, I made an artificial reproducer that resembles the
> > workload in which we found the problem and reproduces it faster.  It
> > merely repeats 'unshare(CLONE_NEWNET)' 50,000 times in a loop.  It takes
> > about 2 minutes.  On a machine with 40 CPU cores and 70GB of DRAM, it
> > reduced available memory by about 15GB in total.  Note that the issue
> > doesn't reproduce on every machine.  On my 6-CPU-core machine, the
> > problem didn't reproduce.
> 
> OK, that is the number before the patch, but what is the number after
> the patch ?

With the patch, there is no continuous memory reduction, though some
fluctuation is observed.  Even so, available memory dropped by at most about
400MB.
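
For reference, the reproducer described in the quoted cover letter can be
sketched roughly as below.  This is not the original program, only a minimal
userspace approximation: the helper name repeat_unshare_newnet() is
hypothetical, and it assumes a Linux system with glibc and a caller holding
CAP_SYS_ADMIN (without it, unshare() fails with EPERM and the loop stops).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (the name is an assumption, not from the thread):
 * call unshare(CLONE_NEWNET) up to `iterations` times.  Each successful
 * call moves the process into a fresh network namespace, so the previous
 * one loses its last user and is queued for asynchronous cleanup.
 *
 * Returns the number of successful calls.  The loop stops at the first
 * failure and reports the errno value on stderr.
 */
int repeat_unshare_newnet(int iterations)
{
	int done = 0;

	assert(iterations >= 0);

	while (done < iterations) {
		if (unshare(CLONE_NEWNET) != 0) {
			fprintf(stderr, "unshare failed after %d calls: %s\n",
				done, strerror(errno));
			break;
		}
		done++;
	}
	return done;
}
```

Calling repeat_unshare_newnet(50000) from main() as root while watching
MemAvailable in /proc/meminfo should approximate the loop described above;
whether memory actually drops depends on the machine, as noted.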

> 
> I think the idea is very nice, but this will serialize fqdir hash
> tables destruction on one single cpu,
> this might become a real issue _if_ these hash tables are populated.
> 
> (Obviously in your for (i=1;i<50000;i++) unshare(CLONE_NEWNET);  all
> these tables are empty...)
> 
> As you may know, frags are often used as vectors for DDoS attacks.
> 
> I would suggest maybe to not (ab)use system_wq, but a dedicated work queue
> with a limit (@max_active argument set to 1 in alloc_workqueue()), to
> make sure that the number of threads is optimal/bounded.
> 
> Only the phase after hash table removal could benefit from your
> deferral to a single context, so that a single rcu_barrier() is active,
> since the part after rcu_barrier() is damn cheap and _can_ be serialized
> 
>   if (refcount_dec_and_test(&f->refcnt))
>                 complete(&f->completion);

Good point, thanks for the kind suggestion.  I will do so in the next version.


Thanks,
SeongJae Park