netdev - Re: TCP-MD5 checksum failure on x86_64 SMP

Message-ID: <n2u571fb4001005070159y91d8b13crb20d2f14ea26dd1a@mail.gmail.com>
Date:	Fri, 7 May 2010 14:29:48 +0530
From:	Bhaskar Dutta <bhaskie@...il.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Stephen Hemminger <shemminger@...tta.com>,
	Ben Hutchings <bhutchings@...arflare.com>,
	netdev@...r.kernel.org, David Miller <davem@...emloft.net>
Subject: Re: TCP-MD5 checksum failure on x86_64 SMP

On Fri, May 7, 2010 at 1:30 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Friday, 07 May 2010 at 07:39 +0200, Eric Dumazet wrote:
>> On Thursday, 06 May 2010 at 17:25 +0530, Bhaskar Dutta wrote:
>> > On Thu, May 6, 2010 at 12:23 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
>>
>> > > I am not familiar with this code, but I suspect the same per_cpu data
>> > > can be used at the same time by a sender (process context) and by a
>> > > receiver (softirq context).
>> > >
>> > > To trigger this, you need at least two active md5 sockets.
>> > >
>> > > tcp_get_md5sig_pool() should probably disable bh to make sure the
>> > > current cpu won't be preempted by softirq processing
>> > >
>> > >
>> > > Something like :
>> > >
>> > > diff --git a/include/net/tcp.h b/include/net/tcp.h
>> > > index fb5c66b..e232123 100644
>> > > --- a/include/net/tcp.h
>> > > +++ b/include/net/tcp.h
>> > > @@ -1221,12 +1221,15 @@ struct tcp_md5sig_pool          *tcp_get_md5sig_pool(void)
>> > >        struct tcp_md5sig_pool *ret = __tcp_get_md5sig_pool(cpu);
>> > >        if (!ret)
>> > >                put_cpu();
>> > > +       else
>> > > +               local_bh_disable();
>> > >        return ret;
>> > >  }
>> > >
>> > >  static inline void             tcp_put_md5sig_pool(void)
>> > >  {
>> > >        __tcp_put_md5sig_pool();
>> > > +       local_bh_enable();
>> > >        put_cpu();
>> > >  }
>> > >
>> > >
>> > >
>> >
>> > I put in the above change and ran some load tests with around 50
>> > active TCP connections doing MD5.
>> > I could see only 1 bad packet in 30 min (earlier the problem used to
>> > occur instantaneously and repeatedly).
>> >
>>
>>
>> > I think there is another possibility of being preempted when calling
>> > tcp_alloc_md5sig_pool(): this function releases the spinlock when
>> > calling __tcp_alloc_md5sig_pool().
>> >
>> > I will run some more tests after changing tcp_alloc_md5sig_pool()
>> > and see if the problem is completely resolved.
>
> Here is my official patch submission, could you please test it ?
>


Eric,

Thanks a lot! I will test it out and let you know.
BTW this patch seems to essentially do the same as the earlier fix you
had posted (where you just do bh disable/enable).
Am I missing something?

With the earlier fix, I ran load tests with 80 TCP connections for
over 6 hours and found 5 packets with bad checksums.
So there is still a problem, although without the fix I see a bad
packet every minute or so.

Bhaskar
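
For readers following the thread, the suspected race can be sketched in
userspace C. This is an illustration only, not kernel code: cpu_pool,
toy_update(), maybe_preempt(), sim_raise_softirq(), sim_local_bh_disable()
and sim_local_bh_enable() are hypothetical stand-ins invented for this
sketch, and a toy hash replaces MD5. It assumes a single simulated CPU
whose scratch state is shared by a sender (process context) and a
receiver (softirq context), as suggested in the quoted analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one CPU's MD5 scratch state. */
struct toy_pool {
	uint32_t state;
};

static struct toy_pool cpu_pool;  /* shared by sender and receiver */
static int bh_disabled;           /* nesting count of simulated bh-disable */
static const char *pending_rx;    /* non-NULL: a receive softirq is waiting */

/* Toy hash (h = h * 31 + byte). NOT MD5, just enough to show corruption. */
void toy_init(struct toy_pool *p)
{
	p->state = 0;
}

void toy_update(struct toy_pool *p, const char *s)
{
	size_t i, len = strlen(s);

	for (i = 0; i < len; i++)
		p->state = p->state * 31u + (unsigned char)s[i];
}

/* Receive path (softirq context): reuses the same per-CPU pool. */
void run_softirq(void)
{
	const char *pkt = pending_rx;

	pending_rx = NULL;
	toy_init(&cpu_pool);
	toy_update(&cpu_pool, pkt);
}

/* Simulated interrupt point: a pending softirq runs unless bh is disabled. */
void maybe_preempt(void)
{
	if (pending_rx && bh_disabled == 0)
		run_softirq();
}

void sim_raise_softirq(const char *pkt)
{
	pending_rx = pkt;
}

void sim_local_bh_disable(void)
{
	bh_disabled++;
}

/* Re-enabling bh lets a deferred softirq run, but only after we are done. */
void sim_local_bh_enable(void)
{
	assert(bh_disabled > 0);
	if (--bh_disabled == 0)
		maybe_preempt();
}

/* Send path (process context): hashes header, then payload, in cpu_pool. */
uint32_t send_hash(const char *hdr, const char *payload, bool protect)
{
	uint32_t result;

	if (protect)
		sim_local_bh_disable();
	toy_init(&cpu_pool);
	toy_update(&cpu_pool, hdr);
	maybe_preempt();  /* an interrupt lands between the two update steps */
	toy_update(&cpu_pool, payload);
	result = cpu_pool.state;
	if (protect)
		sim_local_bh_enable();
	return result;
}

/* Reference value computed in a private pool that nothing can clobber. */
uint32_t expected_hash(const char *hdr, const char *payload)
{
	struct toy_pool p;

	toy_init(&p);
	toy_update(&p, hdr);
	toy_update(&p, payload);
	return p.state;
}
```

Calling sim_raise_softirq("rx") and then send_hash("hdr", "payload", false)
gives a value that differs from expected_hash("hdr", "payload"), because
the simulated receive path reinitialises the shared state halfway through.
With protect set to true the two values match, since the softirq is
deferred until the sender has read its result. The sketch models only the
get/put window; it does not attempt to explain the residual bad packets
reported above.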