Message-ID: <CAOZ2QJPAb4PfoBybPuhMokh9jhRFY0XSNMajf75Mgx=3=KXfUg@mail.gmail.com>
Date: Mon, 14 Sep 2015 23:14:59 -0400
From: Dan Streetman <dan.streetman@...onical.com>
To: Steffen Klassert <steffen.klassert@...unet.com>
Cc: Dan Streetman <ddstreet@...e.org>,
Jay Vosburgh <jay.vosburgh@...onical.com>,
netdev@...r.kernel.org
Subject: Re: xfrm4_garbage_collect reaching limit
On Fri, Sep 11, 2015 at 5:48 AM, Steffen Klassert
<steffen.klassert@...unet.com> wrote:
> Hi Dan.
>
> On Thu, Sep 10, 2015 at 05:01:26PM -0400, Dan Streetman wrote:
>> Hi Steffen,
>>
>> I've been working with Jay on an ipsec issue, which I believe he
>> discussed with you.
>
> Yes, we talked about this at the LPC.
>
>> In this case xfrm4_garbage_collect is
>> returning an error because the number of xfrm4 dst entries has exceeded
>> twice the gc_thresh, which causes new allocations of xfrm4 dst objects
>> to fail, thus making the ipsec connection unusable (until dst objects
>> are removed/freed).
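for reference, the check in question looks roughly like the below (going
from memory of net/ipv4/xfrm4_policy.c, so the exact code may differ by
kernel version):

    static int xfrm4_garbage_collect(struct dst_ops *ops)
    {
            struct net *net = container_of(ops, struct net,
                                           xfrm.xfrm4_dst_ops);

            /* this only flushes unused flow cache entries */
            xfrm4_policy_afinfo.garbage_collect(net);

            /* a nonzero return here makes the dst allocation fail */
            return (dst_entries_get_slow(ops) > 2 * ops->gc_thresh);
    }
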
>>
>> The main reason the count reaches the limit is that the
>> xfrm4_policy_afinfo.garbage_collect function - which (indirectly) points
>> to flow_cache_flush - doesn't actually guarantee that any xfrm4 dst will
>> get cleaned up; it only removes unused entries.
>>
>> The flow cache hashtable's high watermark does restrict how many
>> flow cache entries exist (by shrinking the per-cpu hashtable once it
>> has 4k entries), and therefore indirectly controls the total number of
>> xfrm4 dst objects. However, there's a mismatch between the default
>> xfrm4 gc_thresh - of 32k objects (which sets a 64k max of xfrm4 dst
>> objects) - and the flow cache hashtable limit of 4k objects per cpu.
>> Any system with 16 or fewer cpus will have a total limit of 64k (or
>> less) flow cache entries, so the 64k xfrm4 dst entry limit will never
>> be reached. However for any system with more than 16 cpus, the flow
>> cache limit is greater than the xfrm4 dst limit, and so the xfrm4 dst
>> allocation can fail, rendering the ipsec connection unusable.
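spelling out the arithmetic with the defaults mentioned above:

    flow cache capacity  ~= 4096 entries per cpu * num_online_cpus()
    xfrm4 dst hard limit  = 2 * xfrm4_gc_thresh = 2 * 32768 = 65536

so once num_online_cpus() exceeds 65536 / 4096 = 16, the flow cache can
hold more entries than the xfrm4 dst limit allows.
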
>>
>> The most obvious solution is for the system admin to increase the
>> xfrm4_gc_thresh value, although it's not at all obvious to the
>> end user what value they should set it to :-)
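(for anyone following along, that's the net.ipv4.xfrm4_gc_thresh sysctl,
i.e. something like "sysctl -w net.ipv4.xfrm4_gc_thresh=<value>"; going
by the arithmetic above it would need to be at least 2048 * nr_cpus,
though the right number really depends on the workload)
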
>
> Yes, a static gc threshold is always wrong for some workloads. So
> the user needs to adjust it to his needs, even if the right value
> is not obvious.
>
>> Possibly the
>> default value of xfrm4_gc_thresh could be set proportional to
>> num_online_cpus(), but that doesn't help when cpus are onlined after
>> boot.
>
> This could be an option; we could change the xfrm4_gc_thresh value with
> a cpu notifier callback if more cpus come up after boot.
the issue there is, if the user has changed the value, does a cpu
hotplug event reset it back to the default...
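e.g. something like the rough sketch below could avoid clobbering a
user-set value; the xfrm4_gc_thresh_user_set flag is made up, and I'm
hand-waving about where the threshold actually lives, so this is only
meant to illustrate the idea:

    static int xfrm4_gc_cpu_notify(struct notifier_block *nb,
                                   unsigned long action, void *hcpu)
    {
            /* don't touch a value the admin has set explicitly */
            if (action == CPU_ONLINE && !xfrm4_gc_thresh_user_set)
                    xfrm4_dst_ops.gc_thresh = 2048 * num_online_cpus();
            return NOTIFY_OK;
    }

    static struct notifier_block xfrm4_gc_cpu_nb = {
            .notifier_call = xfrm4_gc_cpu_notify,
    };

    /* registered at init time: */
    register_hotcpu_notifier(&xfrm4_gc_cpu_nb);
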
>
>> Also, a warning message indicating the xfrm4_gc_thresh limit
>> was reached, and a suggestion to increase the limit, may help anyone
>> who hits the issue.
what do you think about this?  it's the simplest option; something like

    pr_warn_ratelimited("xfrm4_gc_limit exceeded\n");

or with a suggestion...

    pr_warn_ratelimited("xfrm4_gc_limit exceeded, you may want to increase "
                        "xfrm4_gc_thresh to %d or more\n",
                        2048 * num_online_cpus());
>>
>> I'm not sure if something more aggressive is appropriate, like
>> removing active entries during garbage collection.
>
> It would not make too much sense to push an active flow out of the
> fastpath just to add some other flow. If the number of active
> entries is too high, there is no other option than increasing the
> gc threshold.
>
> You could try to reduce the number of active entries by shutting
> down stale security associations frequently.
>
>> Or, removing the
>> failure condition from xfrm4_garbage_collect so xfrm4 dst objects can
>> always be allocated,
>
> This would open the door to DoS attacks, so we can't do this.
>
>> or just increasing the hard limit from gc_thresh * 2 up to
>> gc_thresh * 4 or more.
>
> This would just defer the problem, so not a real solution.
>
> That said, whatever we do, we just paper over the real problem,
> which is the flowcache itself. Everything that needs this kind
> of garbage collecting is fundamentally broken. But as long as
> nobody volunteers to work on a replacement, we have to live
> with this situation somehow.
>