Message-ID: <4A8AE76D.7040707@iki.fi>
Date: Tue, 18 Aug 2009 20:39:57 +0300
From: Timo Teräs <timo.teras@....fi>
To: Patrick McHardy <kaber@...sh.net>
CC: netfilter-devel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: bad nat connection tracking performance with ip_gre

Patrick McHardy wrote:
> Timo Teräs wrote:
>> Looped back by multicast routing:
>>
>> raw:PREROUTING:policy:1 IN=eth1 OUT= MAC= SRC=10.252.5.1
>> DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36594 DF
>> PROTO=UDP SPT=33977 DPT=1234 LEN=1324
>> mangle:PREROUTING:policy:1 IN=eth1 OUT= MAC= SRC=10.252.5.1
>> DST=239.255.12.42 LEN=1344 TOS=0x00 PREC=0x00 TTL=8 ID=36594 DF
>> PROTO=UDP SPT=33977 DPT=1234 LEN=1324
>>
>> The cpu hogging happens somewhere below this, since the more
>> multicast destinations I have, the more CPU it takes.
>
> So you're sending to multiple destinations? That obviously increases
> the time spent in netfilter and the remaining networking stack.
Yes. But my observation was that, for the same number of packets, the
CPU usage is significantly higher when they are sent locally than when
they are forwarded from a physical interface. That's what made me
curious.

If I had remembered that icmp conntrack entries get pruned right when
the icmp reply comes back, I probably would not have bothered to bug
you. But that made me think it was a more generic problem than my
patch.
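
(For reference, the pruning I mean is the early-kill in icmp_packet();
roughly like this, quoting net/netfilter/nf_conntrack_proto_icmp.c
from memory, so details may be off:)

	/* Once a packet is seen in the reply direction, the conntrack
	 * entry is killed immediately instead of waiting for the
	 * normal timeout, which is why my icmp tests never left
	 * entries behind. */
	if (CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL)
		nf_ct_kill_acct(ct, ctinfo, skb);
	else
		nf_ct_refresh_acct(ct, ctinfo, skb, nf_ct_icmp_timeout);
	return NF_ACCEPT;
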
>> Multicast forwarded (I hacked this into the code; but similar
>> dump happens on local sendto()):
>>
>> Actually, now that I think about it, here we should have the inner
>> IP contents, and not the incomplete outer header yet. So apparently
>> ipgre_header() messes up the network_header position.
>
> It shouldn't even have been called at this point. Please retry this
> without your changes.
I patched ipmr.c to explicitly call dev_hard_header() to set up the
ipgre nbma receiver. Sadly, the call was on the wrong side of the
NF_HOOK. Adjusting that makes the forward hooks look ok.

I thought the hook was using network_header to figure out where the
IP header is, but it looks like that isn't the case.
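
(To spell out what I broke: in ipmr_queue_xmit() the skb goes through
the FORWARD hook before it is handed to the output path. A rough
sketch, with my hack marked; not the exact patch:)

	/* net/ipv4/ipmr.c, ipmr_queue_xmit(): the stock code runs the
	 * hook first and builds the link-layer header only later, in
	 * the output path. */
	NF_HOOK(PF_INET, NF_INET_FORWARD, skb, skb->dev, dev,
		ipmr_forward_finish);

	/* My hack called dev_hard_header() before the NF_HOOK above;
	 * for a gre device that lands in ipgre_header(), which
	 * skb_push()es a half-built outer IP/GRE header. That is why
	 * the FORWARD trace below shows SRC=0.0.0.0 LEN=0 PROTO=47.
	 * Calling it from ipmr_forward_finish() instead, after the
	 * hook, lets the hooks see the inner packet again. */
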
>> mangle:FORWARD:policy:1 IN=eth1 OUT=gre1 SRC=0.0.0.0 DST=re.mo.te.ip
>> LEN=0 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=47
>> filter:FORWARD:rule:2 IN=eth1 OUT=gre1 SRC=0.0.0.0 DST=re.mo.te.ip
>> LEN=0 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=47
>
> This looks really broken. Why is the protocol already 47 before it even
> reaches the gre tunnel?
Broken by me as explained.
>> ip_gre xmit sends out:
>
> There should be a POSTROUTING hook here.
Hmm... looking at the code, I probably broke this too. Could missing
this hook have a performance penalty for future packets of the same
flow?
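
(For context on the hook that should be there: a normal
ipgre_tunnel_xmit() exits through IPTUNNEL_XMIT(), which calls
ip_local_out(), and that is what takes the packet through LOCAL_OUT
and POSTROUTING. Roughly, from net/ipv4/ip_output.c as I remember it:)

	int ip_local_out(struct sk_buff *skb)
	{
		int err;

		/* __ip_local_out() fixes up tot_len/checksum and runs
		 * the NF_INET_LOCAL_OUT hook. */
		err = __ip_local_out(skb);
		if (likely(err == 1))
			/* dst_output() -> ip_output(), which runs
			 * NF_INET_POST_ROUTING before the device. */
			err = dst_output(skb);

		return err;
	}

If I read nf_conntrack right, POSTROUTING is also where new conntrack
entries get confirmed into the hash table, so skipping it could indeed
mean every packet of the flow starts a fresh unconfirmed entry.
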
Ok, I'll go back to the drawing board. I should have done the
multicast handling for nbma destinations on the ip_gre side, as I was
wondering earlier. I'll also double-check with oprofile where the
local sendto() approach dies.
- Timo