Message-ID: <20151103132232.GQ7701@secunet.com>
Date: Tue, 3 Nov 2015 14:22:32 +0100
From: Steffen Klassert <steffen.klassert@...unet.com>
To: Dan Streetman <dan.streetman@...onical.com>
CC: Herbert Xu <herbert@...dor.apana.org.au>,
"David S. Miller" <davem@...emloft.net>,
Alexey Kuznetsov <kuznet@....inr.ac.ru>,
James Morris <jmorris@...ei.org>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
Patrick McHardy <kaber@...sh.net>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
Dan Streetman <ddstreet@...e.org>
Subject: Re: [PATCHv3] xfrm: dst_entries_init() per-net dst_ops

On Thu, Oct 29, 2015 at 09:51:16AM -0400, Dan Streetman wrote:
> Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops
> templates; their dst_entries counters will never be used. Move the
> xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to
> xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init
> and dst_entries_destroy for each net namespace.
>
> The ipv4 and ipv6 xfrms each create a dst_ops template, and perform
> dst_entries_init on the templates. The template values are copied to each
> net namespace's xfrm.xfrm*_dst_ops. The problem is that the dst_ops
> pcpuc_entries field is a percpu counter and cannot be used correctly by
> simply copying it to another object.
>
> The result of this is a very subtle bug; changes to the dst entries
> counter from one net namespace may sometimes get applied to a different
> net namespace dst entries counter. This is because of how the percpu
> counter works; it has a main count field as well as a pointer to the
> percpu variables. Each net namespace maintains its own main count
> variable, but all point to one set of percpu variables. When any net
> namespace happens to change one of the percpu variables to outside its
> small batch range, its count is moved to the net namespace's main count
> variable. So with multiple net namespaces operating concurrently, the
> dst_ops entries counter can stray from the actual value that it should
> be; if counts are consistently moved from one net namespace to another
> (which my testing showed is likely), then one net namespace winds up
> with a negative dst_ops count while another winds up with a continually
> increasing count, eventually reaching its gc_thresh limit, which causes
> all new traffic on the net namespace to fail with -ENOBUFS.
>
> Signed-off-by: Dan Streetman <dan.streetman@...onical.com>
> Signed-off-by: Dan Streetman <ddstreet@...e.org>

Applied to the ipsec tree, thanks Dan!