Message-ID: <1366366287.3205.98.camel@edumazet-glaptop>
Date: Fri, 19 Apr 2013 03:11:27 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
netdev@...r.kernel.org
Subject: Re: [net-next PATCH 2/3] net: fix enforcing of fragment queue hash list depth
On Thu, 2013-04-18 at 23:38 +0200, Jesper Dangaard Brouer wrote:
> I have found an issue with commit:
>
> commit 5a3da1fe9561828d0ca7eca664b16ec2b9bf0055
> Author: Hannes Frederic Sowa <hannes@...essinduktion.org>
> Date: Fri Mar 15 11:32:30 2013 +0000
>
> inet: limit length of fragment queue hash table bucket lists
>
> There is a connection between the fixed 128 hash depth limit and the
> frag mem limit/thresh settings, which limits how high the thresh can
> be set.
>
> The 128-element hash depth limit results in bad behaviour if the mem limit
> thresholds are increased via /proc/sys/net ::
>
> /proc/sys/net/ipv4/ipfrag_high_thresh
> /proc/sys/net/ipv4/ipfrag_low_thresh
>
> If we increase the thresh, to something allowing 128 elements in each
> bucket, which is not that high given the hash array size of 64
> (64*128=8192), e.g.
> big MTU frags (2944(truesize)+208(ipq))*8192(max elems)=25755648
> small frags ( 896(truesize)+208(ipq))*8192(max elems)=9043968
>
> The problem with commit 5a3da1fe (inet: limit length of fragment queue
> hash table bucket lists) is that, once we hit the limit, we *keep*
> the existing frag queues, not allowing new frag queues to be created.
> Thus, an attacker can effectively block handling of fragments for 30
> sec (as each frag queue has a timeout of 30 sec).
>
> Even without increasing the limit, as Hannes showed, an attacker on
> IPv6 can "attack" a specific hash bucket, and via that change, can
> block/drop new fragments also (trying to) utilize this bucket.
>
> Summary:
> With the default mem limit/thresh settings, this is not a general
> problem, but adjusting the thresh limits results in somewhat
> unexpected behavior.
>
> Proposed solution:
> IMHO instead of keeping existing frag queues, we should kill one of
> the frag queues in the hash.
This strategy won't really help against DDoS attacks: no frag will ever
complete. I am not sure it's worth adding extra complexity.
>
> Implementation complications:
> Killing of frag queues while only holding the hash bucket lock, and
> not the frag queue lock, complicates the implementation, as we race
> and can end up (trying to) remove the hash element twice (resulting in
> an oops). This has been addressed by using hlist_del_init() and a
> hlist_unhashed() check in fq_unlink_hash().
>
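For readers following along, the double-removal guard described above can be
modelled in a few lines of Python. This is only a behavioural sketch, not the
kernel code: FragQueue, hash_add() and unlink_once() are hypothetical names,
and the real fq_unlink_hash() body is not shown in this thread. Setting the
back-reference to None stands in for hlist_del_init(), and testing it stands
in for hlist_unhashed().

```python
class FragQueue:
    """Stand-in for a frag queue; `bucket` is None while it is unhashed."""

    def __init__(self, name):
        self.name = name
        self.bucket = None


def hash_add(bucket, fq):
    """Link fq into a bucket (a plain list in this model)."""
    bucket.append(fq)
    fq.bucket = bucket


def unlink_once(fq):
    """Remove fq from its bucket; return False if it was already unlinked."""
    if fq.bucket is None:       # plays the role of the hlist_unhashed() check
        return False
    fq.bucket.remove(fq)
    fq.bucket = None            # plays the role of hlist_del_init()
    return True


bucket = []
fq = FragQueue("fq0")
hash_add(bucket, fq)

first = unlink_once(fq)    # the eviction path gets there first
second = unlink_once(fq)   # a later path finds it already unhashed
print(first, second, len(bucket))
```

The model is single-threaded; it only shows why a second removal attempt
becomes a no-op instead of corrupting the list.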
> Extra:
> * Added new sysctl "max_hash_depth" option, to allow users to adjust the hash
> depth along with adjusting the thresh limits.
> * Change max hash depth to 32, thus limit handling to approx 2048 frag queues.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
> ---
>
> include/net/inet_frag.h | 9 +---
> net/ipv4/inet_fragment.c | 64 ++++++++++++++++++++-----------
> net/ipv4/ip_fragment.c | 13 +++++-
> net/ipv6/netfilter/nf_conntrack_reasm.c | 5 +-
> net/ipv6/reassembly.c | 15 ++++++-
> 5 files changed, 68 insertions(+), 38 deletions(-)
Hmm... adding a new sysctl without documentation is a clear sign you'll
be the only user of it.
You are also setting a default limit of 32, which is more likely to hit
the problem than the current value of 128.
We know the real solution is to have a correctly sized hash table, so
why add a temporary sysctl?
As soon as /proc/sys/net/ipv4/ipfrag_high_thresh is changed, a resize
should be attempted.
But the max depth itself should be a reasonable value, and doesn't need
to be tuned IMHO.
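As a rough illustration of what "correctly sized" could mean, here is a Python
sketch of one possible sizing rule: pick the smallest power-of-two bucket
count whose full buckets can hold ipfrag_high_thresh worth of queues. The
function name, the power-of-two choice and the 64-bucket floor are
assumptions for illustration, not an existing kernel interface. The
per-queue cost of 896 + 208 bytes is the small-frag figure quoted earlier.

```python
def buckets_for_thresh(high_thresh, per_queue_bytes, max_depth, min_buckets=64):
    """Smallest power-of-two bucket count (>= min_buckets) such that
    buckets * max_depth queues can consume high_thresh bytes."""
    queues = -(-high_thresh // per_queue_bytes)   # ceiling division
    needed = -(-queues // max_depth)              # buckets needed at full depth
    buckets = min_buckets
    while buckets < needed:
        buckets *= 2
    return buckets


PER_QUEUE = 896 + 208   # truesize + ipq, small-frag case from the quoted text

# 64 buckets are exactly enough for the 9043968-byte case.
print(buckets_for_thresh(9043968, PER_QUEUE, 128))
# A 256 MB threshold would want a much larger table.
print(buckets_for_thresh(256 * 1024 * 1024, PER_QUEUE, 128))
```

With a fixed depth of 128, only the bucket count changes when the threshold
does, which is why the depth itself would not need a sysctl.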
The 64-slot hash table was chosen years ago, when machines had three
orders of magnitude less RAM than today.
Before hash resizing, I would just bump hash size to something more
reasonable like 1024.
That would allow some admin to set /proc/sys/net/ipv4/ipfrag_high_thresh
to a quite large value.
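To put numbers on this, a small Python sketch of the arithmetic follows. The
helper names are hypothetical, and the per-queue cost (896 bytes truesize
plus 208 bytes for the ipq) is the small-frag figure from the quoted text, so
treat the results as estimates.

```python
def max_frag_queues(buckets, depth):
    """Upper bound on frag queues when every bucket is at its depth limit."""
    return buckets * depth


def frag_mem_ceiling(buckets, depth, truesize, queue_overhead):
    """Memory those queues can hold, assuming one fragment per queue."""
    return max_frag_queues(buckets, depth) * (truesize + queue_overhead)


for buckets, depth in [(64, 128), (64, 32), (1024, 128)]:
    queues = max_frag_queues(buckets, depth)
    mem = frag_mem_ceiling(buckets, depth, truesize=896, queue_overhead=208)
    print(f"{buckets:5d} buckets x depth {depth:3d}: {queues:6d} queues, {mem} bytes")
```

Under these assumptions, 1024 buckets at depth 128 allow 131072 queues, so
the depth limit only becomes the bottleneck once the threshold is raised well
beyond 100 MB.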