[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1346937667.2484.33.camel@edumazet-glaptop>
Date: Thu, 06 Sep 2012 15:21:07 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Shawn Bohrer <sbohrer@...advisors.com>
Cc: netdev@...r.kernel.org
Subject: Re: Increased multicast packet drops in 3.4
On Thu, 2012-09-06 at 08:03 -0500, Shawn Bohrer wrote:
> On Thu, Sep 06, 2012 at 08:22:40AM +0200, Eric Dumazet wrote:
> > On Wed, 2012-09-05 at 19:11 -0500, Shawn Bohrer wrote:
> > > I've been testing the 3.4 kernel compared to the 3.1 kernel and
> > > noticed my application is experiencing a noticeable increase in packet
> > > drops compared to 3.1. In this case I have 8 processes all listening
> > > on the same multicast group and occasionally 1 or more of the
> > > processes will report drops based on gaps in the sequence numbers on
> > > the packets. One thing I find interesting is that some of the time 2
> > > or 3 of the 8 processes will report that they missed the exact same
> > > 50+ packets. Since the other processes receive the packets I know
> > > that they are making it to the machine and past the driver.
> > >
> > > So far I have not been able to _see_ any OS counters increase when the
> > > drops occur but perhaps there is a location that I have not yet
> > > looked. I've been looking for drops in /proc/net/udp /proc/net/snmp
> > > and /proc/net/dev.
> > >
> > > I've tried using dropwatch/drop_monitor but it is awfully noisy even
> > > after back porting many of the patches Eric Dumazet has contributed to
> > > silence the false positives. Similarly I setup trace-cmd/ftrace to
> > > record skb:kfree_skb calls with a stacktrace and had my application
> > > stop the trace when a drop was reported. From these traces I see a
> > > number of the following:
> > >
> > > md_connector-12791 [014] 7952.982818: kfree_skb: skbaddr=0xffff880583bd7500 protocol=2048 location=0xffffffff813c930b
> > > md_connector-12791 [014] 7952.982821: kernel_stack: <stack trace>
> > > => skb_release_data (ffffffff813c930b)
> > > => __kfree_skb (ffffffff813c934e)
> > > => skb_free_datagram_locked (ffffffff813ccca8)
> > > => udp_recvmsg (ffffffff8143335c)
> > > => inet_recvmsg (ffffffff8143cbfb)
> > > => sock_recvmsg_nosec (ffffffff813be80f)
> > > => __sys_recvmsg (ffffffff813bfe70)
> > > => __sys_recvmmsg (ffffffff813c2392)
> > > => sys_recvmmsg (ffffffff813c25b0)
> > > => system_call_fastpath (ffffffff8148cfd2)
> > >
> > > Looking at the code it does look like these could be the drops, since
> > > I do not see any counters incremented in this code path. However I'm
> > > not very familiar with this code so it could also be a false positive.
> > > It does look like the above stack only gets called if
> > > skb_has_frag_list(skb) does this imply the packet was over one MTU
> > > (1500)?
> > >
> > > I'd appreciate any input on possible causes/solutions for these drops.
> > > Or ways that I can further debug this issue to find the root cause of
> > > the increase in drops on 3.4.
> > >
> > > Thanks,
> > > Shawn
> > >
> >
> > What NIC driver are you using ?
>
> $ sudo ethtool -i eth4
> driver: mlx4_en
> version: 2.0 (Dec 2011)
> firmware-version: 2.10.700
> bus-info: 0000:05:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: no
> supports-register-dump: no
>
> This is the in tree driver from 3.4.9
>
> [ sbohrer@...box12:/home/sbohrer ]
> $ /sbin/lspci | grep -i mell
> 05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
>
> > Could you trace if skb_copy_and_csum_datagram_iovec() or
> > skb_copy_datagram_iovec() returns an error (it could be EFAULT by
> > example) ?
> >
> > If so, you could add some debugging to these functions to track what
> > exact error it is
> >
> > It seems following patch is needed anyway :
> >
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 6f6d1ac..2c965c9 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -1226,6 +1226,8 @@ try_again:
> >
> > if (unlikely(err)) {
> > trace_kfree_skb(skb, udp_recvmsg);
> > + if (!peeked)
> > + UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
> > goto out_free;
> > }
>
> Sorry, I should have mentioned that it doesn't appear I'm hitting that
> tracepoint. That tracepoint would have a location=udp_recvmsg and I
> believe the stack trace would also start at udp_recvmsg. I didn't see
> any of these in the traces I captured.
>
> I think the one I'm hitting is the following with some of my own extra
> annotation:
>
> => kfree_skb()
> => skb_drop_list()
> => skb_drop_fraglist()
> > > => skb_release_data (ffffffff813c930b)
> => skb_release_all()
> > > => __kfree_skb (ffffffff813c934e)
> > > => skb_free_datagram_locked (ffffffff813ccca8)
> > > => udp_recvmsg (ffffffff8143335c)
> > > => inet_recvmsg (ffffffff8143cbfb)
> > > => sock_recvmsg_nosec (ffffffff813be80f)
> > > => __sys_recvmsg (ffffffff813bfe70)
> > > => __sys_recvmmsg (ffffffff813c2392)
> > > => sys_recvmmsg (ffffffff813c25b0)
> > > => system_call_fastpath (ffffffff8148cfd2)
>
> kfree_skb() has the trace_kfree_skb() call on net/core/skbuff.c:3283
>
> I can of course still try your patch and double check that I'm not
> hitting that one.
kfree_skb() can free a list of skb, and we use a generic function to do
so, without forwarding the drop/notdrop status. So its unfortunate, but
adding extra parameters just for the sake of drop_monitor is not worth
it. skb_drop_fraglist() doesnt know if the parent skb is dropped or
only freed, so it calls kfree_skb(), not consume_skb() or kfree_skb()
Are you receiving fragmented UDP frames ?
I ask this because with latest kernels (linux-3.5), we should no longer
build a list of skb, but a single skb with page fragments.
commit 3cc4949269e01f39443d0fcfffb5bc6b47878d45
Author: Eric Dumazet <edumazet@...gle.com>
Date: Sat May 19 03:02:20 2012 +0000
ipv4: use skb coalescing in defragmentation
ip_frag_reasm() can use skb_try_coalesce() to build optimized skb,
reducing memory used by them (truesize), and reducing number of cache
line misses and overhead for the consumer.
Signed-off-by: Eric Dumazet <edumazet@...gle.com>
Cc: Alexander Duyck <alexander.h.duyck@...el.com>
Signed-off-by: David S. Miller <davem@...emloft.net>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists