netdev - Re: oops in tcp_xmit_retransmit

Message-ID: <alpine.DEB.2.00.1007161448330.13946@melkinpaasi.cs.helsinki.fi>
Date:	Fri, 16 Jul 2010 15:02:48 +0300 (EEST)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	Lennart Schulte <lennart.schulte@...s.rwth-aachen.de>
cc:	Eric Dumazet <eric.dumazet@...il.com>, Tejun Heo <tj@...nel.org>,
	"David S. Miller" <davem@...emloft.net>,
	lkml <linux-kernel@...r.kernel.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"Fehrmann, Henning" <henning.fehrmann@....mpg.de>,
	Carsten Aulbert <carsten.aulbert@....mpg.de>
Subject: Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15

On Thu, 15 Jul 2010, Lennart Schulte wrote:

> Since tcp_xmit_retransmit_queue also gets skb == NULL I'm pretty sure it is
> the same bug.
> Up to now I only experienced the problem with ACK loss (without ACK loss the
> test ran about 30min without problems, with ACK loss it had panicked within
> 10min).
> The data sender only has a HTB queue for traffic shaping (set to 20 Mbit/s).
> The ACK loss is done by another router.
> The setup looks like this. This way it seems to be the most realistic.
> 
> o sender with HTB
> |
> |
> o netem queue for forward path delay
> |
> o netem queue for a queue limit
> |
> o netem queue for backward path delay
> |
> o netem queue for ACK loss
> |
> |
> o receiver with HTB
> 
> Perhaps now it is a little bit clearer.

> > > [ 2754.413150] NULL head, pkts 0
> > > [ 2754.413156] Errors caught so far 1

Thanks for reporting the results.

Could you post the oops too, or double-check that the timestamps really
match (and that there weren't more "Errors caught" prints in between)? I
ask because this condition by itself doesn't seem able to crash the
kernel: send_head should also be NULL, which saves the day here by
exiting the loop (unless send_head is corrupt too). ...However, I don't
much like that we can end up in the tcp_xmit_retransmit_queue() loop with
packets_out being zero, with only a side effect of the send_head check
causing the proper action.
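
To make that concrete, here is a toy user-space model of the situation. It
is only a sketch, not the kernel code: the toy_* structures and functions
are made up for illustration, and the write queue is assumed to be a plain
NULL-terminated list. toy_walk_implicit() stops only when it reaches
send_head, so an empty queue is handled just because NULL == NULL.
toy_walk_guarded() checks packets_out first, which is one way the
dependency could be made explicit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct toy_skb {
    struct toy_skb *next;
};

struct toy_sock {
    struct toy_skb *queue_head;  /* first skb in the write queue, or NULL */
    struct toy_skb *send_head;   /* first unsent skb, or NULL if none */
    unsigned int packets_out;    /* skbs sent but not yet acked */
};

/*
 * Walk from the queue head; the only exit is reaching send_head.
 * Returns how many skbs were visited before the walk stopped.
 * Assumes a consistent queue: send_head is NULL or is on the list.
 */
unsigned int toy_walk_implicit(const struct toy_sock *sk)
{
    const struct toy_skb *skb;
    unsigned int visited = 0;

    assert(sk != NULL);
    skb = sk->queue_head;
    for (;;) {
        if (skb == sk->send_head)  /* with an empty queue: NULL == NULL */
            break;
        visited++;
        skb = skb->next;
    }
    return visited;
}

/* Same walk, but nothing in flight means nothing to retransmit. */
unsigned int toy_walk_guarded(const struct toy_sock *sk)
{
    assert(sk != NULL);
    if (sk->packets_out == 0)
        return 0;
    return toy_walk_implicit(sk);
}
```

In this model, a state with an empty queue but a non-NULL (corrupt)
send_head would make toy_walk_implicit() dereference NULL, while
toy_walk_guarded() returns before touching the list.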

Besides, Tejun has also found that it's the hint->next pointer which is
NULL in his case, so this won't solve his case anyway. Tejun, can you
confirm whether retransmit_skb_hint->next was already NULL on _entry_ to
tcp_xmit_retransmit_queue(), or only became NULL later in the loop, after
the loop's own updates to the hint (or that your testing couldn't tell
either way)?

-- 
 i.