[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a8e1da0801221744m37021e88h5c188e5b31ea42bf@mail.gmail.com>
Date: Wed, 23 Jan 2008 09:44:30 +0800
From: "Dave Young" <hidave.darkstar@...il.com>
To: "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
Cc: LKML <linux-kernel@...r.kernel.org>,
"David Miller" <davem@...emloft.net>,
Netdev <netdev@...r.kernel.org>,
"Andrew Morton" <akpm@...ux-foundation.org>
Subject: Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings
On Jan 22, 2008 6:47 PM, Ilpo Järvinen <ilpo.jarvinen@...sinki.fi> wrote:
>
> On Tue, 22 Jan 2008, Dave Young wrote:
>
> > On Jan 22, 2008 12:37 PM, Dave Young <hidave.darkstar@...il.com> wrote:
> > >
> > > On Jan 22, 2008 5:14 AM, Ilpo Järvinen <ilpo.jarvinen@...sinki.fi> wrote:
> > > >
> > > > On Mon, 21 Jan 2008, Dave Young wrote:
> > > >
> > > > > Please see the kernel messages following,(trigged while using some qemu session)
> > > > > BTW, seems there's some e100 error message as well.
> > > > >
> > > > > PCI: Setting latency timer of device 0000:00:1b.0 to 64
> > > > > e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
> > > > > e100: Copyright(c) 1999-2006 Intel Corporation
> > > > > ACPI: PCI Interrupt 0000:03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
> > > > > modprobe:2331 conflicting cache attribute efaff000-efb00000 uncached<->default
> > > > > e100: 0000:03:08.0: e100_probe: Cannot map device registers, aborting.
> > > > > ACPI: PCI interrupt for device 0000:03:08.0 disabled
> > > > > e100: probe of 0000:03:08.0 failed with error -12
> > > > > eth0: setting full-duplex.
> > > > > ------------[ cut here ]------------
> > > > > WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
> > > > > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse snd_hda_intel snd_pcm snd_timer btusb rtc_cmos thermal bluetooth rtc_core serio_raw intel_agp button processor sg snd rtc_lib i2c_i801 evdev agpgart soundcore dcdbas 3c59x pcspkr snd_page_alloc
> > > > > Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
> > > > > [<c0132100>] ? printk+0x0/0x20
> > > > > [<c0131834>] warn_on_slowpath+0x54/0x80
> > > > > [<c03e8df8>] ? ip_finish_output+0x128/0x2e0
> > > > > [<c03e9527>] ? ip_output+0xe7/0x100
> > > > > [<c03e8a88>] ? ip_local_out+0x18/0x20
> > > > > [<c03e991c>] ? ip_queue_xmit+0x3dc/0x470
> > > > > [<c043641e>] ? _spin_unlock_irqrestore+0x5e/0x70
> > > > > [<c0186be1>] ? check_pad_bytes+0x61/0x80
> > > > > [<c03f6031>] tcp_mark_head_lost+0x121/0x150
> > > > > [<c03f60ac>] tcp_update_scoreboard+0x4c/0x170
> > > > > [<c03f6e0a>] tcp_fastretrans_alert+0x48a/0x6b0
> > > > > [<c03f7d93>] tcp_ack+0x1b3/0x3a0
> > > > > [<c03fa14b>] tcp_rcv_established+0x3eb/0x710
> > > > > [<c04015c5>] tcp_v4_do_rcv+0xe5/0x100
> > > > > [<c0401bbb>] tcp_v4_rcv+0x5db/0x660
> > > >
> > > > Doh, once more these S+L things..., the rest are symptom of the first
> > > > problem.
> > >
> > > What is the S+L thing? Could you explain a bit?
>
> It means that one of the skbs is both SACKed and marked as LOST at the
> same time in the counters (might be due to miscount of lost/sacked_out
> too, not necessarilily in the ->sacked bits). Such state is logically
> invalid because it would mean that the sender thinks that the same packet
> both reached the receiver and is lost in the network.
>
> Traditionally TCP has just silently "corrected" over-estimates
> (sacked_out+lost_out > packets_out). I changed this couple of releases ago
> because those over-estimates often are due to bugs that should be fixed
> (there have been couple of them but it has been very quite on this front
> long time, months or even half year already; but I might have broken
> something with the early Dec changes).
>
> These problem may originate from a bug that occurred a number of ACKs
> earlier the WARN_ON triggered, therefore they are a bit tricky to track,
> those WARN_ON serve just for alerting purposes and usually do not point
> out where the bug actually occurred.
>
> I usually just asked people to include exhaustive verifier which compares
> ->sacked bitmaps with sacked/lost_out counters and report immediately when
> the problem shows up, rather than waiting for the cheaper S+L check we do
> in the WARN_ON to trigger. I tried to collect tracking patch from the
> previous efforts (hopefully got it right after modifications).
>
> > > I'm a bit worried about its
> > > > reproducability if it takes this far to see it...
> > > >
> >
> > It's trigged again in my pc, just while using firefox.
>
> ...Good, then there's some chance to catch it.
>
> --
> i.
>
> [PATCH] [TCP]: debug S+L
Thanks, If there's new findings I will let you know.
>
> ---
> include/net/tcp.h | 8 +++-
> net/ipv4/tcp_input.c | 6 +++
> net/ipv4/tcp_ipv4.c | 101 +++++++++++++++++++++++++++++++++++++++++++++++++
> net/ipv4/tcp_output.c | 21 +++++++---
> 4 files changed, 129 insertions(+), 7 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 7de4ea3..0685035 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -272,6 +272,8 @@ DECLARE_SNMP_STAT(struct tcp_mib, tcp_statistics);
> #define TCP_ADD_STATS_BH(field, val) SNMP_ADD_STATS_BH(tcp_statistics, field, val)
> #define TCP_ADD_STATS_USER(field, val) SNMP_ADD_STATS_USER(tcp_statistics, field, val)
>
> +extern void tcp_verify_wq(struct sock *sk);
> +
> extern void tcp_v4_err(struct sk_buff *skb, u32);
>
> extern void tcp_shutdown (struct sock *sk, int how);
> @@ -768,7 +770,11 @@ static inline __u32 tcp_current_ssthresh(const struct sock *sk)
> }
>
> /* Use define here intentionally to get WARN_ON location shown at the caller */
> -#define tcp_verify_left_out(tp) WARN_ON(tcp_left_out(tp) > tp->packets_out)
> +#define tcp_verify_left_out(tp) \
> + do { \
> + WARN_ON(tcp_left_out(tp) > tp->packets_out); \
> + tcp_verify_wq((struct sock *)tp); \
> + } while(0)
>
> extern void tcp_enter_cwr(struct sock *sk, const int set_ssthresh);
> extern __u32 tcp_init_cwnd(struct tcp_sock *tp, struct dst_entry *dst);
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index fa2c85c..0bda0e1 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2645,6 +2645,10 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
> if (do_lost || (tcp_is_fack(tp) && tcp_head_timedout(sk)))
> tcp_update_scoreboard(sk, fast_rexmit);
> tcp_cwnd_down(sk, flag);
> +
> + WARN_ON(tcp_write_queue_head(sk) == NULL);
> + WARN_ON(!tp->packets_out);
> +
> tcp_xmit_retransmit_queue(sk);
> }
>
> @@ -2848,6 +2852,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets)
> tcp_clear_all_retrans_hints(tp);
> }
>
> + tcp_verify_left_out(tp);
> +
> if (skb && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
> flag |= FLAG_SACK_RENEGING;
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 9aea88b..21f5888 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -108,6 +108,107 @@ struct inet_hashinfo __cacheline_aligned tcp_hashinfo = {
> .lhash_wait = __WAIT_QUEUE_HEAD_INITIALIZER(tcp_hashinfo.lhash_wait),
> };
>
> +void tcp_print_queue(struct sock *sk)
> +{
> + struct tcp_sock *tp = tcp_sk(sk);
> + struct sk_buff *skb;
> + char s[50+1];
> + char h[50+1];
> + int idx = 0;
> + int i;
> +
> + tcp_for_write_queue(skb, sk) {
> + if (skb == tcp_send_head(sk))
> + break;
> +
> + for (i = 0; i < tcp_skb_pcount(skb); i++) {
> + if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) {
> + s[idx] = 'S';
> + if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
> + s[idx] = 'B';
> +
> + } else if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST) {
> + s[idx] = 'L';
> + } else {
> + s[idx] = ' ';
> + }
> + if (s[idx] != ' ' && skb->len < tp->mss_cache)
> + s[idx] += 'a' - 'A';
> +
> + if (i == 0) {
> + if (TCP_SKB_CB(skb)->seq == tcp_highest_sack_seq(tp))
> + h[idx] = 'h';
> + else
> + h[idx] = '+';
> + } else {
> + h[idx] = '-';
> + }
> +
> + if (++idx >= 50) {
> + s[idx] = 0;
> + h[idx] = 0;
> + printk(KERN_ERR "TCP wq(s) %s\n", s);
> + printk(KERN_ERR "TCP wq(h) %s\n", h);
> + idx = 0;
> + }
> + }
> + }
> + if (idx) {
> + s[idx] = '<';
> + s[idx+1] = 0;
> + h[idx] = '<';
> + h[idx+1] = 0;
> + printk(KERN_ERR "TCP wq(s) %s\n", s);
> + printk(KERN_ERR "TCP wq(h) %s\n", h);
> + }
> + printk(KERN_ERR "l%u s%u f%u p%u seq: su%u hs%u sn%u\n",
> + tp->lost_out, tp->sacked_out, tp->fackets_out, tp->packets_out,
> + tp->snd_una, tcp_highest_sack_seq(tp), tp->snd_nxt);
> +}
> +
> +void tcp_verify_wq(struct sock *sk)
> +{
> + struct tcp_sock *tp = tcp_sk(sk);
> + u32 lost = 0;
> + u32 sacked = 0;
> + u32 packets = 0;
> + struct sk_buff *skb;
> +
> + tcp_for_write_queue(skb, sk) {
> + if (skb == tcp_send_head(sk))
> + break;
> +
> + if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) {
> + sacked += tcp_skb_pcount(skb);
> + if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
> + printk(KERN_ERR "Sacked bitmap S+L: %u %u-%u/%u\n",
> + TCP_SKB_CB(skb)->sacked,
> + TCP_SKB_CB(skb)->end_seq - tp->snd_una,
> + TCP_SKB_CB(skb)->seq - tp->snd_una,
> + tp->snd_una);
> + }
> + if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
> + lost += tcp_skb_pcount(skb);
> +
> + packets += tcp_skb_pcount(skb);
> + }
> +
> + WARN_ON(lost != tp->lost_out);
> + WARN_ON(sacked != tp->sacked_out);
> + WARN_ON(packets != tp->packets_out);
> + if ((lost != tp->lost_out) ||
> + (sacked != tp->sacked_out) ||
> + (packets != tp->packets_out)) {
> + printk(KERN_ERR "P: %u L: %u vs %u S: %u vs %u w: %u-%u (%u)\n",
> + tp->packets_out,
> + lost, tp->lost_out,
> + sacked, tp->sacked_out,
> + tp->snd_una, tp->snd_nxt,
> + tp->rx_opt.sack_ok);
> + tcp_print_queue(sk);
> + }
> +}
> +
> static int tcp_v4_get_port(struct sock *sk, unsigned short snum)
> {
> return inet_csk_get_port(&tcp_hashinfo, sk, snum,
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 89f0188..648340f 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -779,10 +779,9 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
> tp->lost_out -= diff;
>
> /* Adjust Reno SACK estimate. */
> - if (tcp_is_reno(tp) && diff > 0) {
> + if (tcp_is_reno(tp) && diff > 0)
> tcp_dec_pcount_approx_int(&tp->sacked_out, diff);
> - tcp_verify_left_out(tp);
> - }
> +
> tcp_adjust_fackets_out(sk, skb, diff);
> }
>
> @@ -790,6 +789,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
> skb_header_release(buff);
> tcp_insert_write_queue_after(skb, buff, sk);
>
> + tcp_verify_left_out(tp);
> +
> return 0;
> }
>
> @@ -1463,6 +1464,7 @@ static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle)
> } else if (result > 0) {
> sent_pkts = 1;
> }
> + tcp_verify_left_out(tp);
>
> while ((skb = tcp_send_head(sk))) {
> unsigned int limit;
> @@ -1764,6 +1766,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb,
> tcp_clear_retrans_hints_partial(tp);
>
> sk_wmem_free_skb(sk, next_skb);
> + tcp_verify_left_out(tp);
> }
>
> /* Do a simple retransmit without using the backoff mechanisms in
> @@ -1795,13 +1798,13 @@ void tcp_simple_retransmit(struct sock *sk)
> }
> }
>
> + tcp_verify_left_out(tp);
> +
> tcp_clear_all_retrans_hints(tp);
>
> if (!lost)
> return;
>
> - tcp_verify_left_out(tp);
> -
> /* Don't muck with the congestion window here.
> * Reason is that we do not increase amount of _data_
> * in network, but units changed and effective
> @@ -1970,8 +1973,10 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
> * packet to be MSS sized and all the
> * packet counting works out.
> */
> - if (tcp_packets_in_flight(tp) >= tp->snd_cwnd)
> + if (tcp_packets_in_flight(tp) >= tp->snd_cwnd) {
> + tcp_verify_left_out(tp);
> return;
> + }
>
> if (sacked & TCPCB_LOST) {
> if (!(sacked & (TCPCB_SACKED_ACKED|TCPCB_SACKED_RETRANS))) {
> @@ -1997,6 +2002,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
> }
> }
>
> + tcp_verify_left_out(tp);
> +
> /* OK, demanded retransmission is finished. */
>
> /* Forward retransmissions are possible only during Recovery. */
> @@ -2054,6 +2061,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
>
> NET_INC_STATS_BH(LINUX_MIB_TCPFORWARDRETRANS);
> }
> +
> + tcp_verify_left_out(tp);
> }
>
> /* Send a fin. The caller locks the socket for us. This cannot be
> --
> 1.5.2.2
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists