netdev - Re: [RFC PATCH] net: frag limit checks need to use percpu_counter

Message-ID: <20170901081006.6hfftamapa56m2jv@unicorn.suse.cz>
Date:   Fri, 1 Sep 2017 10:10:06 +0200
From:   Michal Kubecek <mkubecek@...e.cz>
To:     Jesper Dangaard Brouer <brouer@...hat.com>
Cc:     liujian56@...wei.com, netdev@...r.kernel.org,
        Florian Westphal <fw@...len.de>
Subject: Re: [RFC PATCH] net: frag limit checks need to use
 percpu_counter_compare

On Fri, Sep 01, 2017 at 09:41:56AM +0200, Jesper Dangaard Brouer wrote:
> On Thu, 31 Aug 2017 18:23:49 +0200 Michal Kubecek <mkubecek@...e.cz> wrote:
> 
> > If we go this way (which would IMHO require some benchmarks to make sure
> > it doesn't harm performance too much) we can drop the explicit checks
> > for zero thresholds which were added to work around the unreliability of
> > fast checks of percpu counters (or at least the second one was, by commit
> > 30759219f562 ("net: disable fragment reassembly if high_thresh is zero")).
>   
> After much consideration, together with Florian, I'm now instead
> looking at reverting the use of percpu_counter for this memory
> accounting use-case.  The complexity and maintenance cost are not worth
> it.  And I'm of course testing the perf effect, and currently I'm _not_
> seeing any perf regression on my 10G + 100G testlab (although this is
> not a NUMA system, which was my original optimization case).

This sounds reasonable to me. It is indeed questionable if percpu
counters are still worth the complexity if all checks have to be changed
to the exact version.

Perhaps there would be some gain for many CPUs if thresholds are large
enough to (almost) always avoid the need to calculate the sum. But once
we leave that safe area, I would be surprised if a simple atomic_t
weren't more efficient.
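
To make the trade-off concrete, here is a minimal userspace sketch of a
batched per-CPU counter and an exact compare. It is a single-threaded
model, not the kernel's percpu_counter code: the names (pcpu_model,
pcpu_add, pcpu_sum, pcpu_compare) and the values of NCPUS and BATCH are
hypothetical, and locking and real per-CPU storage are left out. The
compare trusts the rough shared count only when it is further than
BATCH * NCPUS from the threshold; otherwise it has to walk all CPUs.

```c
#include <stdint.h>
#include <stdlib.h>

#define NCPUS 4   /* hypothetical CPU count for this model */
#define BATCH 32  /* hypothetical per-CPU batch size */

/* Simplified, single-threaded model of a batched per-CPU counter. */
struct pcpu_model {
    int64_t count;            /* shared, approximate total */
    int32_t local[NCPUS];     /* per-CPU deltas not yet folded in */
    unsigned long slow_sums;  /* how often the exact sum was computed */
};

/* Add on one CPU; fold into the shared count only once the local
 * delta reaches the batch size. */
static void pcpu_add(struct pcpu_model *c, int cpu, int32_t amount)
{
    int32_t v = c->local[cpu] + amount;

    if (v >= BATCH || v <= -BATCH) {
        c->count += v;
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = v;
    }
}

/* Exact value: shared count plus every CPU's pending delta. */
static int64_t pcpu_sum(struct pcpu_model *c)
{
    int64_t sum = c->count;

    for (int i = 0; i < NCPUS; i++)
        sum += c->local[i];
    c->slow_sums++;
    return sum;
}

/* Returns -1, 0 or 1 for counter <, ==, > rhs. */
static int pcpu_compare(struct pcpu_model *c, int64_t rhs)
{
    int64_t rough = c->count;

    /* The rough count is off by less than BATCH per CPU, so it is
     * decisive only when it is far enough away from rhs. */
    if (llabs((long long)(rough - rhs)) > (long long)BATCH * NCPUS)
        return rough > rhs ? 1 : -1;

    int64_t exact = pcpu_sum(c);
    return (exact > rhs) - (exact < rhs);
}
```

In this model, ten additions of 10 spread round-robin over the four CPUs
leave the shared count at 0 although the real total is 100. A compare
against a threshold of 1000000 is answered from the rough count alone,
while a compare against 90 (or against 0) falls into the exact sum on
every call. With an atomic_t-style counter there is a single shared
variable and every read is already exact, at the price of contention on
updates.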

Michal Kubecek