Message-ID: <4F88C5DDA1E80143B232E89585ACE27D018F6A9B@DGGEMA502-MBX.china.huawei.com>
Date: Fri, 1 Sep 2017 02:25:32 +0000
From: "liujian (CE)" <liujian56@...wei.com>
To: Michal Kubecek <mkubecek@...e.cz>,
Jesper Dangaard Brouer <brouer@...hat.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Florian Westphal <fw@...len.de>
Subject: RE: [RFC PATCH] net: frag limit checks need to use
percpu_counter_compare
Best Regards,
liujian
> -----Original Message-----
> From: Michal Kubecek [mailto:mkubecek@...e.cz]
> Sent: Friday, September 01, 2017 12:24 AM
> To: Jesper Dangaard Brouer
> Cc: liujian (CE); netdev@...r.kernel.org; Florian Westphal
> Subject: Re: [RFC PATCH] net: frag limit checks need to use
> percpu_counter_compare
>
> On Thu, Aug 31, 2017 at 12:20:19PM +0200, Jesper Dangaard Brouer wrote:
> > To Liujian: can you please test this patch?
> > I want to understand if using __percpu_counter_compare() solves the
> > problem correctness-wise (even though this will be slower than using
> > a simple atomic_t on your big system).
I have tested the patch, and it works.
1. Make sure frag_mem_limit reaches the thresh:
===>FRAG: inuse 0 memory 0 frag_mem_limit 5386864
2. Change the NIC RX IRQ's affinity to a fixed CPU.
3. iperf -u -c 9.83.1.41 -l 10000 -i 1 -t 1000 -P 10 -b 20M
Then check /proc/net/snmp: there are no ReasmFails.
And I think adding some counter sync points, as you said, would be a better way.
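
A rough sketch of what such a sync point might look like (my guess at
the idea, untested, and the helper name frag_mem_sync() is made up):

#include <linux/percpu_counter.h>

/* Hypothetical sync point: fold every CPU's pending delta into the
 * global count under fbc->lock, so that a following cheap
 * percpu_counter_read() is accurate again.  This mirrors what
 * percpu_counter_add_batch() does for one CPU once its local delta
 * crosses the batch size.  Note the fast path updates the per-CPU
 * slots without taking fbc->lock, so a real implementation would
 * need to handle that race.
 */
static void frag_mem_sync(struct percpu_counter *fbc)
{
	unsigned long flags;
	int cpu;

	raw_spin_lock_irqsave(&fbc->lock, flags);
	for_each_online_cpu(cpu) {
		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);

		fbc->count += *pcount;
		*pcount = 0;
	}
	raw_spin_unlock_irqrestore(&fbc->lock, flags);
}

Called from a slow path such as the eviction worker, this would keep
the cheap read within one batch of the true value most of the time.
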
> > Fix a bug in the fragmentation code's use of the percpu_counter API
> > that causes issues on systems with many CPUs.
> >
> > frag_mem_limit() just reads the global counter (fbc->count) without
> > considering that other CPUs can each hold up to a batch size (130K)
> > that hasn't been subtracted yet. Due to the 3 MB lower thresh limit,
> > this becomes dangerous at >= 24 CPUs (3*1024*1024/130000 = 24).
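
To make the arithmetic above concrete, here is a standalone userspace
illustration (my own sketch, plain C, not kernel code; the batch and
thresh values are taken from the patch):

#include <stdio.h>

int main(void)
{
	const long batch = 130000;               /* frag_percpu_counter_batch */
	const long low_thresh = 3 * 1024 * 1024; /* default lower thresh, 3 MB */

	/* Each CPU may hold up to 'batch' bytes not yet folded into
	 * the global count, so a fast read can be off by up to
	 * nr_cpus * batch.  The ratio below is where that worst-case
	 * slack reaches the threshold.
	 */
	printf("%ld\n", low_thresh / batch);	/* prints 24 */
	return 0;
}
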
> >
> > The __percpu_counter_compare() does the right thing: it takes the
> > number of (online) CPUs and the batch size into account, and calls
> > __percpu_counter_sum() when needed.
> >
> > On systems with many CPUs this will unfortunately always result in
> > the heavier, fully locked __percpu_counter_sum(), which touches the
> > per_cpu_ptr of all (online) CPUs.
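
For reference, the decision logic is roughly the following (paraphrased
from lib/percpu_counter.c as of v4.13; see the kernel source for the
authoritative version):

int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
{
	s64 count = percpu_counter_read(fbc);	/* cheap, possibly stale */

	/* If the rough value is farther from rhs than the worst-case
	 * per-CPU slack, the cheap answer is already decisive. */
	if (abs(count - rhs) > batch * num_online_cpus())
		return count > rhs ? 1 : -1;

	/* Otherwise take the exact, fully locked sum.  With a small
	 * thresh and many CPUs this branch is hit almost every time,
	 * which is the cost described above. */
	count = percpu_counter_sum(fbc);
	if (count > rhs)
		return 1;
	if (count < rhs)
		return -1;
	return 0;
}
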
> >
> > On systems with a smaller number of CPUs this solution is also not
> > optimal, because __percpu_counter_compare()/__percpu_counter_sum()
> > doesn't help synchronize the global counter.
> > Florian Westphal has an idea of adding some counter sync points,
> > which should help address this issue.
> > ---
> > include/net/inet_frag.h | 16 ++++++++++++++--
> > net/ipv4/inet_fragment.c | 6 +++---
> > 2 files changed, 17 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
> > index 6fdcd2427776..b586e320783d 100644
> > --- a/include/net/inet_frag.h
> > +++ b/include/net/inet_frag.h
> > @@ -147,9 +147,21 @@ static inline bool inet_frag_evicting(struct inet_frag_queue *q)
> > */
> > static unsigned int frag_percpu_counter_batch = 130000;
> >
> > -static inline int frag_mem_limit(struct netns_frags *nf)
> > +static inline bool frag_mem_over_limit(struct netns_frags *nf, int thresh)
> > {
> > - return percpu_counter_read(&nf->mem);
> > + /* When the counter is read here, the __percpu_counter_compare()
> > + * call will invoke __percpu_counter_sum() when needed, which
> > + * depends on num_online_cpus() * batch size, as each CPU can
> > + * potentially hold a batch count.
> > + *
> > + * With many CPUs this heavier sum operation will
> > + * unfortunately always occur.
> > + */
> > + if (__percpu_counter_compare(&nf->mem, thresh,
> > + frag_percpu_counter_batch) > 0)
> > + return true;
> > + else
> > + return false;
> > }
> >
> > static inline void sub_frag_mem_limit(struct netns_frags *nf, int i)
> > diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> > index 96e95e83cc61..ee2cf56900e6 100644
> > --- a/net/ipv4/inet_fragment.c
> > +++ b/net/ipv4/inet_fragment.c
> > @@ -120,7 +120,7 @@ static void inet_frag_secret_rebuild(struct inet_frags *f)
> > static bool inet_fragq_should_evict(const struct inet_frag_queue *q)
> > {
> > return q->net->low_thresh == 0 ||
> > - frag_mem_limit(q->net) >= q->net->low_thresh;
> > + frag_mem_over_limit(q->net, q->net->low_thresh);
> > }
> >
> > static unsigned int
> > @@ -355,7 +355,7 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,
> > {
> > struct inet_frag_queue *q;
> >
> > - if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) {
> > + if (!nf->high_thresh || frag_mem_over_limit(nf, nf->high_thresh)) {
> > inet_frag_schedule_worker(f);
> > return NULL;
> > }
>
> If we go this way (which would IMHO require some benchmarks to make sure
> it doesn't harm performance too much), we can drop the explicit checks
> for zero thresholds which were added to work around the unreliability of
> fast checks of percpu counters (or at least the second one was, by
> commit 30759219f562 ("net: disable fragment reassembly if high_thresh is
> zero")).
>
> Michal Kubecek
>
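
If I follow the suggestion, the inet_frag_alloc() check above could
then shrink to something like this (untested sketch; note the patch
uses '>', so fully disabling reassembly at high_thresh == 0 may still
need '>=' semantics inside frag_mem_over_limit()):

	if (frag_mem_over_limit(nf, nf->high_thresh)) {
		inet_frag_schedule_worker(f);
		return NULL;
	}
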
> > @@ -396,7 +396,7 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
> > struct inet_frag_queue *q;
> > int depth = 0;
> >
> > - if (frag_mem_limit(nf) > nf->low_thresh)
> > + if (frag_mem_over_limit(nf, nf->low_thresh))
> > inet_frag_schedule_worker(f);
> >
> > hash &= (INETFRAGS_HASHSZ - 1);
> >