Date: Thu, 6 May 2010 17:25:32 +0530
From: Bhaskar Dutta <bhaskie@...il.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Stephen Hemminger <shemminger@...tta.com>, Ben Hutchings <bhutchings@...arflare.com>, netdev@...r.kernel.org
Subject: Re: TCP-MD5 checksum failure on x86_64 SMP

On Thu, May 6, 2010 at 12:23 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Wednesday, 5 May 2010 at 23:33 +0530, Bhaskar Dutta wrote:
>
>> Hi,
>>
>> TSO, GSO and SG are already turned off.
>> rx/tx checksumming is on, but that shouldn't matter, right?
>>
>> # ethtool -k eth0
>> Offload parameters for eth0:
>> rx-checksumming: on
>> tx-checksumming: on
>> scatter-gather: off
>> tcp segmentation offload: off
>> udp fragmentation offload: off
>> generic segmentation offload: off
>>
>> The bad packets are very small in size, most have no data at all (<300 bytes).
>>
>> After adding some logs to kernel 2.6.31-12, it seems that the data used by
>> tcp_v4_md5_hash_skb (the function that calculates the md5 hash) is
>> (might be?) getting corrupted.
>>
>> The tcp4_pseudohdr (bp = &hp->md5_blk.ip4) structure's saddr, daddr
>> and len fields get modified to different values towards the end of the
>> tcp_v4_md5_hash_skb function whenever there is a checksum error.
>>
>> The tcp4_pseudohdr (bp) is within the tcp_md5sig_pool (hp), which is
>> filled up by tcp_get_md5sig_pool (which calls per_cpu_ptr).
>>
>> Using a local copy of the tcp4_pseudohdr in the same function
>> tcp_v4_md5_hash_skb (copied all fields from the original
>> tcp4_pseudohdr within the tcp_md5sig_pool) and calculating the md5
>> checksum with the local tcp4_pseudohdr seems to solve the issue
>> (I don't see bad packets for hours in load tests, and without the
>> change I can see them instantaneously in the load tests).
>>
>> I am still unable to figure out how this is happening. Please let me
>> know if you have any pointers.
>
> I am not familiar with this code, but I suspect the same per_cpu data can be
> used at the same time by a sender (process context) and by a receiver
> (softirq context).
>
> To trigger this, you need at least two active md5 sockets.
>
> tcp_get_md5sig_pool() should probably disable bh to make sure the current
> cpu won't be preempted by softirq processing.
>
>
> Something like:
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index fb5c66b..e232123 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1221,12 +1221,15 @@ struct tcp_md5sig_pool *tcp_get_md5sig_pool(void)
>  	struct tcp_md5sig_pool *ret = __tcp_get_md5sig_pool(cpu);
>  	if (!ret)
>  		put_cpu();
> +	else
> +		local_bh_disable();
>  	return ret;
>  }
>
>  static inline void tcp_put_md5sig_pool(void)
>  {
>  	__tcp_put_md5sig_pool();
> +	local_bh_enable();
>  	put_cpu();
>  }
>
>

I put in the above change and ran some load tests with around 50
active TCP connections doing MD5.
I could see only 1 bad packet in 30 min (earlier the problem used to
occur instantaneously and repeatedly).

I think there is another possibility of being preempted when calling
tcp_alloc_md5sig_pool(): this function releases the spinlock when
calling __tcp_alloc_md5sig_pool().

I will run some more tests after changing tcp_alloc_md5sig_pool()
and see if the problem is completely resolved.

Thanks a lot for your help!
Bhaskar
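
The race suspected in this thread can be illustrated with a small userspace
sketch. This is not kernel code and not the real MD5 path: the struct
pseudo_hdr type, the toy_hash() function, and the fake_softirq() callback are
all hypothetical stand-ins, and the "interrupt" is just a function call made
between filling the scratch area and hashing it. It only shows why a scratch
buffer shared per CPU can be clobbered, and why hashing a local copy (the
workaround described in the message) is unaffected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for tcp4_pseudohdr; field names follow the thread. */
struct pseudo_hdr {
    uint32_t saddr;
    uint32_t daddr;
    uint16_t len;
};

/* One scratch slot shared by everything on "this CPU", like the md5sig pool. */
static struct pseudo_hdr shared_scratch;

typedef void (*interrupt_fn)(void);

/* Toy checksum (NOT MD5): enough to reveal whether the inputs were clobbered. */
uint32_t toy_hash(const struct pseudo_hdr *h)
{
    return h->saddr * 31u + h->daddr * 17u + h->len;
}

/* Simulated softirq: a receive path reusing the same slot for another socket. */
void fake_softirq(void)
{
    shared_scratch.saddr = 0xdeadbeefu;
    shared_scratch.daddr = 0xfeedfaceu;
    shared_scratch.len = 999;
}

/* Unsafe pattern: fill the shared slot, get interrupted, then hash it. */
uint32_t hash_shared(uint32_t saddr, uint32_t daddr, uint16_t len,
                     interrupt_fn irq)
{
    shared_scratch.saddr = saddr;
    shared_scratch.daddr = daddr;
    shared_scratch.len = len;
    if (irq != NULL)
        irq(); /* the point where a softirq could run on the same CPU */
    return toy_hash(&shared_scratch);
}

/* Workaround from the thread: hash a local copy the interrupt cannot touch. */
uint32_t hash_local(uint32_t saddr, uint32_t daddr, uint16_t len,
                    interrupt_fn irq)
{
    struct pseudo_hdr local = { saddr, daddr, len };
    if (irq != NULL)
        irq();
    return toy_hash(&local);
}

/* Checks both behaviours against the uninterrupted result. */
void check_demo(void)
{
    uint32_t expected = hash_shared(1, 2, 20, NULL);
    assert(hash_shared(1, 2, 20, fake_softirq) != expected);
    assert(hash_local(1, 2, 20, fake_softirq) == expected);
}
```

Calling check_demo() from a main() exercises both paths. In the kernel, the
proposed local_bh_disable()/local_bh_enable() pair addresses the same problem
from the other side: instead of copying the data, it keeps the softirq from
running while the shared slot is in use.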