netdev - Re: 3.9.5+: Crash in tcp

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <51D398C0.5060802@candelatech.com>
Date:	Tue, 02 Jul 2013 20:21:36 -0700
From:	Ben Greear <greearb@...delatech.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	netdev <netdev@...r.kernel.org>
Subject: Re: 3.9.5+:  Crash in tcp_input.c:4810.

On 07/02/2013 06:04 PM, Eric Dumazet wrote:
> On Mon, 2013-07-01 at 11:10 -0700, Ben Greear wrote:
>
>> offset: -1459  start: -1146162927 seq: -1146161468 size: 16047 copy: 3576
>> ...
>>
>> There were 80 total splats of this nature grouped together, and then
>> the system recovered and continue to function normally as far as I
>> can tell.  The later splats are a bit farther apart...maybe the
>> TCP connection is dying.
>>
>> It appears my 'work-around' is poor at best, but I'd rather kill
>> a TCP connection and spam the logs than crash the OS.
>>
>> I'd be more than happy to add more/different debugging code.
>
> It would be nice to pinpoint the origin of the bug. Really.
>
> This BUG_ON() is at least 7 years old. I do not think invariant has
> changed ?
>
> Sure we can avoid crashes but it looks like we could randomly corrupt
> tcp payload or whatever kernel memory, if it turns out its caused by a
> buggy driver.
>
> Is it happening while collapsing the receive queue, or the ofo queue ?

What kinds of things could a driver do to cause this.  Maybe modify an
skb after it has sent it up the stack, or something like that?

We haven't been able to reproduce on a clean 3.10 yet...but it often takes days,
so we'll leave the test up through end of this week if we don't hit it
sooner...

I'll add your patch to my 3.9 tree.

Thanks,
Ben


> In receive queue, all skbs skb2 following skb1 must have
>
> TCP_SKB_CB(skb1)->end_seq >= TCP_SKB_CB(skb2)->seq
>
> Only on ofo, we could have this not respected, and it should be handled
> properly in tcp_collapse_ofo_queue()
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 28af45a..d77f1f0 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4457,7 +4457,12 @@ restart:
>   			int offset = start - TCP_SKB_CB(skb)->seq;
>   			int size = TCP_SKB_CB(skb)->end_seq - start;
>
> -			BUG_ON(offset < 0);
> +			if (unlikely(offset < 0)) {
> +				pr_err("tcp_collapse() bug on %s offset:%d size:%d copy:%d skb->len %u truesize %u, nskb->len %u\n",
> +					list == &sk->sk_receive_queue ? "receive_queue" : "ofo_queue",
> +					offset, size, copy, skb->len, skb->truesize, nskb->len);
> +				return;
> +			}
>   			if (size > 0) {
>   				size = min(copy, size);
>   				if (skb_copy_bits(skb, offset, skb_put(nskb, size), size))
>


-- 
Ben Greear <greearb@...delatech.com>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html