netdev - Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 12 Nov 2007 00:40:39 +0200 (EET)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	Chazarain Guillaume <guichaz@...oo.fr>
cc:	David Miller <davem@...emloft.net>, Netdev <netdev@...r.kernel.org>
Subject: Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571
 tcp_remove_reno_sacks()

On Sun, 11 Nov 2007, Chazarain Guillaume wrote:

> > Do you have GSO enabled?
>
> According to ethtool -k, no.

Ok, thanks, it excludes lot of possibilities...

> > Is this reproducable?
>
> Unfortunately not, I saw it only once.

The messages you had in the other mail are very likely symptom of the 
same problem, it's just hard to tell from them where it really originates 
from (because it would requires expensive verification that nobody wants 
to do by default after simple operations). In many cases that WARN_ON is 
simply too late to tell when the problem causing adjustment/corruption 
occurred but it's still better than nothing as a starting point :-).

> > You can try to provoke it by setting tcp_sack sysctl 
> >  to 0 as this seems to be non-SACK related... If so, you could try the 
> > debug patch below
>
> > > # CONFIG_DEBUG_LIST is not set

> I'm currently running bittorrent with all of this, I just saw this (for 
> the first time ever), but otherwise it works fine:

WARNING: at net/ipv4/tcp_output.c:1807 tcp_simple_retransmit()
 [<c0104cb3>] show_trace_log_lvl+0x1a/0x2f
 [<c0105563>] show_trace+0x12/0x14
 [<c0105668>] dump_stack+0x15/0x17
 [<c02f6a79>] tcp_simple_retransmit+0xfa/0x185
 [<c02fa072>] tcp_v4_err+0x35d/0x4cb
 [<c0301f7d>] icmp_unreach+0x327/0x352

Hmm, that's related to path MTU things... It might have something to do 
with this... I'm not at all sure how it handles pcounts...

> > Have you run memtest recently?
>
> Just ran it with no errors for 6 minutes 30. The box is otherwise stable though.

Yeah, it's more likely a miscount somewhere rather than corruption but 
that wasn't obvious from the first mail...

...but alas, I haven't yet been able to come up with any theory on how 
a miscount could occur....

> I forgot to say that I have a kdump image of the crash (I had to 
> recompile this
>
> 2.6.24-rc2 kernel as I deleted its vmlinux), so I could check that you 
> are right on track with your assertions at the time of the crash.
>
> > +    if (WARN_ON(tcp_write_queue_head(sk) == NULL))
> > +        return;
> 
> (gdb) p sk->sk_write_queue.next
> $11 = (struct sk_buff *) 0xe43a04b0
> (gdb) p &sk->sk_write_queue
> $12 = (struct sk_buff_head *) 0xe43a04b0
> 
> 
> > +    if (WARN_ON(!tp->packets_out))
> > +        return;
> 
> (gdb) p ((struct tcp_sock *) sk)->packets_out
> $13 = 0

Yeah, they are expected, the write_queue is empty. Another cause for 
those could have been corrupted write_queue (that's why I asked for the 
list debugging).

> > +    if (tp->lost_out > tp->packets_out)
> > +        printk(KERN_ERR "Lost underflowed to %u\n", tp->lost_out);
>
> (gdb) p ((struct tcp_sock *) sk)->lost_out
> $14 = 4294967295

Underflows by one. ...We should just find out what causes this and fix 
that and we're done with it. :-)


> Some more gdb output for information:

Thanks about them, though they're not that useful because the problem 
occurred prior to its detection... :-)

> My naive attempt at understanding what's going on:
> 
> My oops starts with:
> BUG: unable to handle kernel NULL pointer dereference at virtual address 
> 00000045
>
>
> gdb tells me the crash is in:
> #0  tcp_xmit_retransmit_queue (sk=0xe43a0440) at 
> net/ipv4/tcp_output.c:1962
> 1962                            __u8 sacked = TCP_SKB_CB(skb)->sacked;
>
> (gdb) p ((struct tcp_skb_cb *)((struct sk_buff *)0)->cb)->sacked
> Cannot access memory at address 0x45
> 
> A 0x45 offset is definitely a ->sacked on a null skb, but:

This is right.

> (gdb) p skb
> $5 = (struct sk_buff *) 0xe43a04b0
>
> which is sk->sk_write_queue so I don't understand why the 
> tcp_for_write_queue_from made an iteration.
>
> I don't know if gdb is playing tricks or if it's because I had to 
> recompile the crashing kernel.

No, it won't happen like that. ...I'd say that gdb is just confused. In 
case packets_out is zero (it occurs after a cumulative ACK only), for sure 
skb will become NULL because the retransmit_skb_hint was cleared due to 
cumulative ACK.

The crash location is the expected one in case packets_out gets zero 
during recovery and lost_out is miscounted/corrupt, as your dump shows.

Anyway, thanks for digging these out.


Here's a bruteforce patch below... Since you had couple of them during 
your overnight test, I'm sure it's relatively easy to catch... The 
first place where the tcp_verify_lost is triggered is the most 
interesting, rest are likely ripples due to that earlier corruption... 
(Hopefully I've placed them this time to places where both queue and 
lost_out states should agree, once did similar patch that had
incorrectly placed some verification calls which caused lot of spurious 
stacktraces :-))

...I left those !packets_out things there to prevent crashing when it 
occurs though it's not the main problem itself.

Please keep the tcp_sack set to 0, and once you have at least one 
stacktrace with it, you could try also with tcp_sack if the same thing 
occurs there as well.

--
[PATCH] TCP DEBUG

- Check if empty queue is passed to xmit_retrans...
- Print lost_out underflow value
- Track lost_out and LOST discrepancies everywhere (costs a bit).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
---
 include/net/tcp.h     |    2 ++
 net/ipv4/tcp_input.c  |   21 +++++++++++++++++++++
 net/ipv4/tcp_ipv4.c   |   19 +++++++++++++++++++
 net/ipv4/tcp_output.c |   14 ++++++++++++++
 4 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index d695cea..a939bd5 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -272,6 +272,8 @@ DECLARE_SNMP_STAT(struct tcp_mib, tcp_statistics);
 #define TCP_ADD_STATS_BH(field, val)	SNMP_ADD_STATS_BH(tcp_statistics, field, val)
 #define TCP_ADD_STATS_USER(field, val)	SNMP_ADD_STATS_USER(tcp_statistics, field, val)
 
+extern void			tcp_verify_lost(struct sock *sk);
+
 extern void			tcp_v4_err(struct sk_buff *skb, u32);
 
 extern void			tcp_shutdown (struct sock *sk, int how);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ca9590f..588b105 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1512,6 +1512,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 		flag |= tcp_mark_lost_retrans(sk, highest_sack_end_seq);
 
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 
 	if ((reord < tp->fackets_out) && icsk->icsk_ca_state != TCP_CA_Loss &&
 	    (!tp->frto_highmark || after(tp->snd_una, tp->frto_highmark)))
@@ -1552,6 +1553,7 @@ static void tcp_add_reno_sack(struct sock *sk)
 	tp->sacked_out++;
 	tcp_check_reno_reordering(sk, 0);
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 }
 
 /* Account for ACK, ACKing some data in Reno Recovery phase. */
@@ -1569,6 +1571,7 @@ static void tcp_remove_reno_sacks(struct sock *sk, int acked)
 	}
 	tcp_check_reno_reordering(sk, acked);
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 }
 
 static inline void tcp_reset_reno_sack(struct tcp_sock *tp)
@@ -1670,6 +1673,7 @@ void tcp_enter_frto(struct sock *sk)
 		tp->retrans_out -= tcp_skb_pcount(skb);
 	}
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 
 	/* Earlier loss recovery underway (see RFC4138; Appendix B).
 	 * The last condition is necessary at least in tp->frto_counter case.
@@ -1727,6 +1731,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 		}
 	}
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 
 	tp->snd_cwnd = tcp_packets_in_flight(tp) + allowed_segments;
 	tp->snd_cwnd_cnt = 0;
@@ -1812,6 +1817,7 @@ void tcp_enter_loss(struct sock *sk, int how)
 		}
 	}
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 
 	tp->reordering = min_t(unsigned int, tp->reordering,
 					     sysctl_tcp_reordering);
@@ -2044,6 +2050,7 @@ static void tcp_mark_head_lost(struct sock *sk, int packets)
 		}
 	}
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 }
 
 /* Account newly detected lost packet(s) */
@@ -2088,6 +2095,7 @@ static void tcp_update_scoreboard(struct sock *sk)
 		tp->scoreboard_skb_hint = skb;
 
 		tcp_verify_left_out(tp);
+		tcp_verify_lost(sk);
 	}
 }
 
@@ -2304,6 +2312,7 @@ static void tcp_try_to_open(struct sock *sk, int flag)
 	struct tcp_sock *tp = tcp_sk(sk);
 
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 
 	if (tp->retrans_out == 0)
 		tp->retrans_stamp = 0;
@@ -2403,6 +2412,7 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 
 	/* D. Check consistency of the current state. */
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 
 	/* E. Check state exit conditions. State can be terminated
 	 *    when high_seq is ACKed. */
@@ -2521,6 +2531,12 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	if (do_lost || tcp_head_timedout(sk))
 		tcp_update_scoreboard(sk);
 	tcp_cwnd_down(sk, flag);
+	
+	if (WARN_ON(tcp_write_queue_head(sk) == NULL))
+		return;
+	if (WARN_ON(!tp->packets_out))
+		return;
+	
 	tcp_xmit_retransmit_queue(sk);
 }
 
@@ -2721,6 +2737,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 		sk_stream_free_skb(sk, skb);
 		tcp_clear_all_retrans_hints(tp);
 	}
+	
+	tcp_verify_lost(sk);
 
 	if (flag & FLAG_ACKED) {
 		u32 pkts_acked = prior_packets - tp->packets_out;
@@ -2759,6 +2777,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 #if FASTRETRANS_DEBUG > 0
 	BUG_TRAP((int)tp->sacked_out >= 0);
 	BUG_TRAP((int)tp->lost_out >= 0);
+	if (tp->lost_out > tp->packets_out)
+		printk(KERN_ERR "Lost underflowed to %u\n", tp->lost_out);
 	BUG_TRAP((int)tp->retrans_out >= 0);
 	if (!tp->packets_out && tcp_is_sack(tp)) {
 		icsk = inet_csk(sk);
@@ -2931,6 +2951,7 @@ static int tcp_process_frto(struct sock *sk, int flag)
 	struct tcp_sock *tp = tcp_sk(sk);
 
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 
 	/* Duplicate the behavior from Loss state (fastretrans_alert) */
 	if (flag&FLAG_DATA_ACKED)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index e566f3c..5e10d90 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -108,6 +108,25 @@ struct inet_hashinfo __cacheline_aligned tcp_hashinfo = {
 	.lhash_wait  = __WAIT_QUEUE_HEAD_INITIALIZER(tcp_hashinfo.lhash_wait),
 };
 
+void tcp_verify_lost(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	u32 lost = 0;
+	struct sk_buff *skb;
+
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+		if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+			lost += tcp_skb_pcount(skb);
+	}
+	
+	if (WARN_ON(lost != tp->lost_out)) {
+		printk(KERN_ERR "Lost: %u vs %u, %u (%d)\n", lost, tp->lost_out,
+		       tp->packets_out, tcp_is_sack(tp));
+	}
+}
+
 static int tcp_v4_get_port(struct sock *sk, unsigned short snum)
 {
 	return inet_csk_get_port(&tcp_hashinfo, sk, snum,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 324b420..09260ac 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -779,6 +779,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 	skb_header_release(buff);
 	tcp_insert_write_queue_after(skb, buff, sk);
 
+	tcp_verify_lost(sk);
+
 	return 0;
 }
 
@@ -1443,10 +1445,12 @@ static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle)
 
 	/* Do MTU probing. */
 	if ((result = tcp_mtu_probe(sk)) == 0) {
+		tcp_verify_lost(sk);
 		return 0;
 	} else if (result > 0) {
 		sent_pkts = 1;
 	}
+	tcp_verify_lost(sk);
 
 	while ((skb = tcp_send_head(sk))) {
 		unsigned int limit;
@@ -1767,6 +1771,8 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 		}
 
 		sk_stream_free_skb(sk, next_skb);
+		
+		tcp_verify_lost(sk);
 	}
 }
 
@@ -1798,6 +1804,8 @@ void tcp_simple_retransmit(struct sock *sk)
 			}
 		}
 	}
+	
+	tcp_verify_lost(sk);
 
 	tcp_clear_all_retrans_hints(tp);
 
@@ -1819,6 +1827,8 @@ void tcp_simple_retransmit(struct sock *sk)
 		tcp_set_ca_state(sk, TCP_CA_Loss);
 	}
 	tcp_xmit_retransmit_queue(sk);
+	
+	tcp_verify_lost(sk);
 }
 
 /* This retransmits one SKB.  Policy decisions and retransmit queue
@@ -2000,6 +2010,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 			}
 		}
 	}
+	
+	tcp_verify_lost(sk);
 
 	/* OK, demanded retransmission is finished. */
 
@@ -2058,6 +2070,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 
 		NET_INC_STATS_BH(LINUX_MIB_TCPFORWARDRETRANS);
 	}
+	
+	tcp_verify_lost(sk);
 }
 
 
-- 
1.5.0.6