netdev - Re: [PATCH] net: implement emergency route cache rebulds when gc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <48E1C104.2080801@cosmosbay.com>
Date:	Tue, 30 Sep 2008 08:02:44 +0200
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Neil Horman <nhorman@...driver.com>
Cc:	netdev@...r.kernel.org, kuznet@....inr.ac.ru, davem@...emloft.net,
	pekkas@...core.fi, jmorris@...ei.org, yoshfuji@...ux-ipv6.org,
	kaber@...sh.net
Subject: Re: [PATCH] net: implement emergency route cache rebulds when	gc_elasticity
 is exceeded

Neil Horman a écrit :
> On Mon, Sep 29, 2008 at 11:00:35PM +0200, Eric Dumazet wrote:
>> Neil Horman a écrit :
>>> On Mon, Sep 29, 2008 at 10:22:03PM +0200, Eric Dumazet wrote:
>>>> Neil Horman a écrit :
>>>>> Hey all-
>>>>> 	We currently have the ability to disable our route cache secret interval
>>>>> rebuild timer (by setting it to zero), but if we do that its possible for an
>>>>> attacker (if they guess our route cache hash secret, to fill our system with
>>>>> routes that all hash to the same bucket, destroying our performance.  This patch
>>>>> provides a backstop for that issues.  In the event that our rebuild interval is
>>>>> disabled (or very large), if any hash chain exceeds ip_rt_gc_elasticity, we do
>>>>> an emergency hash rebuild.  During the hash rebuild we:
>>>>> 1) warn the user of the emergency
>>>>> 2) disable the rebuild timer
>>>>> 3) invalidate the route caches
>>>>> 4) re-enable the rebuild timer with its old value
>>>>>
>>>>> Regards
>>>>> Neil
>>>> This sounds not good at all to me.
>>>>
>>>> 1) Dont set ip_rt_secret_interval to zero, this is plain silly, since
>>>>   you give attackers infinite time to break your machine.
>>>>
>>>> To quote Herbert (who allowed to set this interval to 0)
>>>>
>>>>    "Let me first state that disabling the route cache hash rebuild
>>>>     should not be done without extensive analysis on the risk profile
>>>>     and careful deliberation.
>>>>
>>>>     However, there are times when this can be done safely or for
>>>>     testing.  For example, when you have mechanisms for ensuring
>>>>     that offending parties do not exist in your network."
>>>>
>>> Thats really rather the motivation behind this.  The patch that Herbert
>>> submitted with that commit explicitly lets one disable their rebuild timer.  I
>>> agree its stupid to do that, but we added code to allow it.  This provides a
>>> patch to help people who are victimized because they've done exactly this
>>> (additionaly providing them a warning to stop doing it).
>> Strange... many kernel parameters can be set to hazardous values that make machine unusable...
>>
>> ip_rt_gc_interval can also be set to a very large value : No more route cache gc
>>
> If you want to talk philosophy, then I accept your premise: we can tune the
> kernel in ways that make it unusable.  Does that mean we should avoid doing
> things to prevent that and maintain usibility.  Ithink this patch accomplishes
> that goal in its small way.
>  
>>>
>>>> 2) Many machines have ip_rt_gc_elasticity set to 2,
>>>>   because they have a huge hash table, but low chain depths.
>>> Ok, that seem reasonable, and this isn't going to disallow that.  By the same
>>> resoning, people who have huge hash tables, and low chain depths won't
>>> want their low chain length being violated, would they?  This patch will warn
>>> them if their assumptions are being violated.
>> Warn only ? If I read your patch, you not only warn in this case.
> No, not warn only, Warn and correct is the clear intent here
> 
>> (you invalidate cache for each struct net, potentially wraping rt_genid)
>>
> If you overflow your elasticity 2^16 times, yes (I think rt_genid is a 16 bit
> value, but I don't have the kernel in front of me).  I would hope that would be
> enough warnings to make a sysadmin address the problem
> 
>> When you have 2^20 slots in route cache hash table, you dont care if few slots have 3 or 4 elements.
>> And chance is very high that more than one slot has 3 or even 4 elements, no need for an attacker.
> Ok, then I would assert if your ok with a few chains getting to be X entries in
> length, then you should set your elasticity to be X.

Nope. We set elasticity to 2 when we want to trigger code starting at line 1048

        if (cand) {
                if (chain_length > ip_rt_gc_elasticity) {
                        *candp = cand->u.dst.rt_next;
                        rt_free(cand);
                }
        }

That is, freeing one entry if chain length is > 2 AND one entry is freeable.
This means that *some* chains eventually can have live entries and length > 2
without any problem. This is not an emergency case. I have seen on real 
machines some chains hitting length of 13 (elasticity=2), that was normal traffic,
and rt cache was on equilibrium.

Your patch adds to the "if (chain_length > elasticity)" case :

If no entry is freeable in this slot, invalidate *all* cache and put a warning.

Invalidating live entries is puting a high pressure on dst_garbage.list.
Having 1.000.000 entries in this list is not very cool, and should be done
only if really necessary.

When you know you have about 1.000.000 live entries in rt cache, you
can safely make your hash table sized to 2^20 slots, and elasticity set to 2
so that average chain length is <= 2, reducing cache misses at lookup time.

That is quite important to be able to handle 100.000 packets per second

> 
>> Now if you change your code to something like
>>
>> if (unlikely(chain_length > some_quite_big_number &&
>>     ip_rt_secret_interval == 0)) {
>> 	do_something();
>> }
>>
>> some_quite_big_number could be >= 30 or something...
>>
> I'd be ok with that, if some others chimed in with consensus on the matter.  I
> felt that we had a value definining what chain length should be
> (ip_rt_gc_elasticity is already used in comparison on chain length in
> rt_intern_hash, so I was keeping with that).  But if some others speak up and
> support a new sysctl, I can get behind that

I said 30, but could have said 100. No need for a sysctl.
If only one chain is really that long (and attacked), all its entries are hot.
If many chains are hit, then other rt...sysctl_params will control and limit cache grow.

> 
>> then it might be OK (at least it wont break common setups)
>>
> I don't think it will break many setups, the default value is 8 for this sysctl.
> If someone has set it lower, and sees a warning, they won't loose their system
> altogether, they'll just see a momentary reduction in throughput, and a
> warning to increase the value they have set.  Its going to far to say anything
> will 'break'.

I spent many days so that route cache doesnt crash some big machines I have around,
I have feelings your patch will make them crawl again, unless sysadmin is smart
enough to change once again rt sysctls.
So my replies are not just random things from me.

When a machine is targeted by a DDOS attack, about all slots of the hash table
are fully loaded (ie chain length >= elasticity). We dont need to invalidate 
the cache, but find an equilibrium, with small adjustements.

Thank you




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html