Date:	Thu, 10 Sep 2015 17:01:26 -0400
From:	Dan Streetman <dan.streetman@...onical.com>
To:	Steffen Klassert <steffen.klassert@...unet.com>
Cc:	Dan Streetman <ddstreet@...e.org>,
	Jay Vosburgh <jay.vosburgh@...onical.com>,
	netdev@...r.kernel.org
Subject: xfrm4_garbage_collect reaching limit

Hi Steffen,

I've been working with Jay on an ipsec issue, which I believe he
discussed with you.  In this case xfrm4_garbage_collect returns an
error because the number of xfrm4 dst entries has exceeded twice the
gc_thresh, which causes new allocations of xfrm4 dst objects to fail,
thus making the ipsec connection unusable (until dst objects are
removed/freed).

The main reason the count reaches the limit is that the
xfrm4_policy_afinfo.garbage_collect function - which points to
flow_cache_flush (indirectly) - doesn't actually guarantee that any
xfrm4 dst will get cleaned up; it only cleans up unused entries.

The flow cache hashtable size limit watermark does restrict how many
flow cache entries exist (by shrinking the per-cpu hashtable once it
has 4k entries), and therefore indirectly controls the total number of
xfrm4 dst objects.  However, there's a mismatch between the default
xfrm4 gc_thresh - of 32k objects (which sets a 64k max of xfrm4 dst
objects) - and the flow cache hashtable limit of 4k objects per cpu.
Any system with 16 or fewer cpus will have a total limit of 64k (or
less) flow cache entries, so the 64k xfrm4 dst entry limit will never
be reached.  However, for any system with more than 16 cpus, the flow
cache limit is greater than the xfrm4 dst limit, and so the xfrm4 dst
allocation can fail, rendering the ipsec connection unusable.
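
To make the arithmetic above concrete, here is a small Python sketch
(not kernel code).  The constants are the figures quoted in this mail;
the function names are made up for illustration.

```python
# Figures quoted in the mail; the names are hypothetical, not kernel symbols.
FLOW_CACHE_PER_CPU_LIMIT = 4 * 1024   # per-cpu flow cache hashtable watermark
XFRM4_GC_THRESH_DEFAULT = 32 * 1024   # default xfrm4 gc_thresh


def max_flow_cache_entries(ncpus):
    """Upper bound on flow cache entries across all cpus."""
    return ncpus * FLOW_CACHE_PER_CPU_LIMIT


def xfrm4_dst_limit(gc_thresh=XFRM4_GC_THRESH_DEFAULT):
    """xfrm4 dst allocation starts failing above twice gc_thresh."""
    return 2 * gc_thresh


def dst_limit_reachable(ncpus, gc_thresh=XFRM4_GC_THRESH_DEFAULT):
    """True if the flow cache can hold more entries than the dst limit allows."""
    return max_flow_cache_entries(ncpus) > xfrm4_dst_limit(gc_thresh)


if __name__ == "__main__":
    for ncpus in (8, 16, 17, 32):
        print(ncpus, max_flow_cache_entries(ncpus), dst_limit_reachable(ncpus))
```

With the defaults, 16 cpus give exactly 64k flow cache entries, which
matches the dst limit, and 17 cpus are the first count that can exceed it.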

The most obvious solution is for the system admin to increase the
xfrm4_gc_thresh value, although it's not really obvious to the
end-user what value they should set it to :-)  Possibly the
default value of xfrm4_gc_thresh could be set proportional to
num_online_cpus(), but that doesn't help when cpus are onlined after
boot.  Also, a warning message indicating the xfrm4_gc_thresh limit
was reached, and a suggestion to increase the limit, may help anyone
who hits the issue.
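
As a rough illustration of a cpu-proportional value, the Python sketch
below picks the smallest gc_thresh for which twice the threshold covers
the worst-case flow cache size.  This is only one possible reading of
"proportional to num_online_cpus()"; the formula and the function name
are assumptions, not an existing kernel policy.

```python
# Figures quoted in the mail; the sizing rule itself is an assumption.
FLOW_CACHE_PER_CPU_LIMIT = 4 * 1024   # per-cpu flow cache hashtable watermark
XFRM4_GC_THRESH_DEFAULT = 32 * 1024   # default xfrm4 gc_thresh


def suggested_gc_thresh(ncpus):
    """Smallest gc_thresh such that 2 * gc_thresh >= ncpus * per-cpu limit,
    never going below the current default."""
    worst_case_entries = ncpus * FLOW_CACHE_PER_CPU_LIMIT
    needed = (worst_case_entries + 1) // 2   # round up when halving
    return max(XFRM4_GC_THRESH_DEFAULT, needed)


if __name__ == "__main__":
    for ncpus in (8, 16, 17, 32, 64):
        print(ncpus, suggested_gc_thresh(ncpus))
```

A value computed this way at boot would still be too small if more cpus
are onlined later, which is the caveat noted above.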

I'm not sure if something more aggressive is appropriate, like
removing active entries during garbage collection.  Other options
would be removing the failure condition from xfrm4_garbage_collect so
xfrm4 dst objects can always be allocated, or just increasing the
limit from gc_thresh * 2 up to gc_thresh * 4 or more.
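
For comparison, this Python sketch estimates how far a larger multiplier
would move the problem, using the figures from this mail.  The function
name is hypothetical, and the model assumes the flow cache watermark is
the only bound on the number of dst entries.

```python
# Figures quoted in the mail; the model is a simplification.
FLOW_CACHE_PER_CPU_LIMIT = 4 * 1024   # per-cpu flow cache hashtable watermark
XFRM4_GC_THRESH_DEFAULT = 32 * 1024   # default xfrm4 gc_thresh


def max_cpus_without_failure(multiplier, gc_thresh=XFRM4_GC_THRESH_DEFAULT):
    """Largest cpu count whose worst-case flow cache size still fits
    under a dst limit of multiplier * gc_thresh."""
    return (multiplier * gc_thresh) // FLOW_CACHE_PER_CPU_LIMIT


if __name__ == "__main__":
    for multiplier in (2, 4, 8):
        print(multiplier, max_cpus_without_failure(multiplier))
```

Under these assumptions, going from * 2 to * 4 only moves the threshold
from 16 to 32 cpus, so larger systems would still be able to hit the limit.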

Also, I refer to xfrm4 above, but I believe this will affect xfrm6 as well.

Any thoughts and/or suggestions?

Thanks!