[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.00.1103312212360.9481@melkinpaasi.cs.helsinki.fi>
Date: Thu, 31 Mar 2011 22:37:21 +0300 (EEST)
From: "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To: henrick19777@...oo.com
cc: Netdev <netdev@...r.kernel.org>
Subject: Re: Fw: [Bug 32322] New: Kernel crashes randomly due to unknown
reason
On Thu, 31 Mar 2011, Stephen Hemminger wrote:
> Begin forwarded message:
>
> Date: Thu, 31 Mar 2011 08:22:53 GMT
> From: bugzilla-daemon@...zilla.kernel.org
> To: shemminger@...ux-foundation.org
> Subject: [Bug 32322] New: Kernel crashes randomly due to unknown reason
>
>
> https://bugzilla.kernel.org/show_bug.cgi?id=32322
>
> Summary: Kernel crashes randomly due to unknown reason
> Product: Networking
> Version: 2.5
> Kernel Version: 2.6.37.2
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: IPV4
> AssignedTo: shemminger@...ux-foundation.org
> ReportedBy: henrick19777@...oo.com
> Regression: No
>
>
> Created an attachment (id=52732)
> --> (https://bugzilla.kernel.org/attachment.cgi?id=52732)
> kernel config file
>
> Got a second kernel panic on this machine just randomly after ~26 days of
> uptime. This server runs Debian Squeeze with vsftpd, rsync and apache2 services
> installed from repositories. Here is the crash log:
>
> ------------[ cut here ]------------
> kernel BUG at net/ipv4/tcp_output.c:994!
BUG(len < skb->len); it seems...
len = (packets-oldcount) * gso_size, but:
oldcnt < packets < cnt == oldcnt + pcount.
...I'd say there has to be some other invariant violated as skb should
always have length of at least gso_size * (pcount-1) + 1?
> invalid opcode: 0000 [#1] SMP
> last sysfs file: /sys/devices/pci0000:05/0000:05:07.1/local_cpus
> Modules linked in:
>
> Pid: 0, comm: kworker/0:1 Not tainted 2.6.37.2-hid3 #2 IBM eserver xSeries 235
> -
> [8671MAX]-/
> EIP: 0060:[<c11c7f53>] EFLAGS: 00010206 CPU: 3
> EIP is at tcp_fragment+0x15/0x239
> EAX: c039ee00 EBX: f40e1200 ECX: 00003de0 EDX: f40e1200
> ESI: f40e1200 EDI: c039ee00 EBP: 00003880 ESP: f50b7dd8
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process kworker/0:1 (pid: 0, ti=f50b6000 task=f50b1bc0 task.ti=f50b2000)
> Stack:
> 00003de0 00000286 c102841f c039ee00 f40e1200 f40e1218 00000023 c11c132d
> 000005a0 00000000 00000002 c039ee7c 00000001 c039ee00 0000072e 00000000
> c11c57f6 00000001 00000001 00000000 00000001 00000026 00000006 c039ee7c
> Call Trace:
> [<c102841f>] ? __mod_timer+0xe3/0xec
> [<c11c132d>] ? tcp_mark_head_lost+0x100/0x1a4
...Another point... this particular check has unnecessarily high severity
as the callers need to be prepared to failures anyway... A patch below
(but this doesn't resolve the actual issue).
Yet another point, I suppose is should also be changed to check for
equality as then there isn't any point in calling fragment, but I don't
think it makes difference here now.
--
i.
--
[PATCH] tcp: len check is unnecessarily devastating, change to WARN_ON
All callers are prepared to alloc failures anyway, so this error
can safely be boomeranged to the callers domain without super
bad consequences. ...At worst the connection might go into a state
where each RTO tries to (unsuccessfully) re-fragment with such
a mis-sized value and eventually dies.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
---
net/ipv4/tcp_output.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index dfa5beb..8b0d016 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1003,7 +1003,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
int nlen;
u8 flags;
- BUG_ON(len > skb->len);
+ if (WARN_ON(len > skb->len))
+ return -EINVAL;
nsize = skb_headlen(skb) - len;
if (nsize < 0)
--
1.7.2.5
Powered by blists - more mailing lists