netdev - Re: Increased multicast packet drops in 3.4

Date:	Fri, 7 Sep 2012 17:38:43 -0500
From:	Shawn Bohrer <sbohrer@...advisors.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	netdev@...r.kernel.org
Subject: Re: Increased multicast packet drops in 3.4

On Fri, Sep 07, 2012 at 08:08:45AM +0200, Eric Dumazet wrote:
> On Thu, 2012-09-06 at 23:00 -0500, Shawn Bohrer wrote:
> > On Thu, Sep 06, 2012 at 03:21:07PM +0200, Eric Dumazet wrote:
> > > kfree_skb() can free a list of skb, and we use a generic function to do
> > > so, without forwarding the drop/notdrop status. So it's unfortunate, but
> > > adding extra parameters just for the sake of drop_monitor is not worth
> > > it.  skb_drop_fraglist() doesn't know if the parent skb is dropped or
> > > only freed, so it calls kfree_skb(), not consume_skb() or kfree_skb()
> > 
> > I understand that this means that dropwatch or the skb:kfree_skb
> > tracepoint won't know if the packet was really dropped, but do we
> > know in this case from the context of the stack trace?  I'm assuming
> > since we didn't receive an error that the packet was delivered and
> > these aren't real drops.
> 
> I am starting to believe this is an application error.
> 
> This application uses recvmmsg() to fetch a lot of messages in one
> syscall, and it might well be it throws out a batch of 50+ messages
> because of an application bug. Yes, this starts with 3.4, but it can be
> triggered by a timing difference or something that is not a proper
> kernel bug...

Eric, you are absolutely correct.  There is at least one bug in the
application.  The code that re-orders out of order packets would give
up around the 50+ point and assume the packets in between were
dropped.  I did prove that if we keep reading from the socket we do
receive those packets.  So no packets are being dropped in the kernel.
I also proved this is happening on 3.1 as well, but 3.4 does trigger
it more often.
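
A rough sketch of the failure mode described above, not the actual
application code. The names ReorderBuffer, max_pending and run are
hypothetical, and the threshold of 50 is only taken from the "50+" figure
mentioned in this thread. With a give-up threshold, a packet that arrives
late is reported as lost even though the socket still delivers it; without
the threshold, every packet is recovered in order.

```python
class ReorderBuffer:
    """Releases packets in sequence order, buffering those that arrive early."""

    def __init__(self, first_seq=0, max_pending=None):
        self.next_seq = first_seq       # next sequence number owed to the consumer
        self.pending = {}               # seq -> payload for packets that arrived early
        self.max_pending = max_pending  # None means never give up on a gap
        self.presumed_lost = []         # sequence numbers skipped after giving up

    def push(self, seq, payload):
        """Accept one packet; return the payloads now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []                   # duplicate, or a gap we already gave up on
        self.pending[seq] = payload
        out = self._drain()
        # Give-up policy: too many packets are waiting behind a gap, so
        # assume the missing ones were dropped and skip ahead.
        while self.max_pending is not None and len(self.pending) > self.max_pending:
            oldest = min(self.pending)
            self.presumed_lost.extend(range(self.next_seq, oldest))
            self.next_seq = oldest
            out.extend(self._drain())
        return out

    def _drain(self):
        """Pop every buffered packet that is contiguous with next_seq."""
        out = []
        while self.next_seq in self.pending:
            out.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return out


def run(arrivals, max_pending):
    """Feed sequence numbers in arrival order; return (delivered, presumed_lost)."""
    buf = ReorderBuffer(max_pending=max_pending)
    delivered = []
    for seq in arrivals:
        delivered.extend(buf.push(seq, seq))
    return delivered, buf.presumed_lost


# Packet 0 arrives after 60 later packets have already been read.
arrivals = list(range(1, 61)) + [0]
gave_up = run(arrivals, max_pending=50)     # reports packet 0 as lost
kept_reading = run(arrivals, max_pending=None)  # recovers packet 0
```

In a real receiver an unbounded buffer is not practical either; a
time-based limit on how long to wait for a gap is one possible middle
ground, but that choice is outside what this thread establishes.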

I'm still debugging the application because it appears I'm getting
very large batches of packets out of order.  It isn't clear to me if
this is another application bug, the kernel, the switch or something
else.
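
One way to quantify how far out of order the packets arrive is to log the
sequence numbers in arrival order and measure how late each packet is
relative to the highest sequence number already seen. This is only a
sketch; reorder_stats is a hypothetical helper, and it assumes the
application's packets carry a monotonically increasing sequence number.

```python
def reorder_stats(arrivals):
    """Return (late_count, max_distance) for sequence numbers in arrival order.

    A packet is counted as late when its sequence number is not greater
    than the highest one seen so far (so duplicates of the highest count
    as late with distance 0). max_distance is the largest gap between
    that highest number and a late packet's own number.
    """
    highest = None
    late_count = 0
    max_distance = 0
    for seq in arrivals:
        if highest is None or seq > highest:
            highest = seq
        else:
            late_count += 1
            max_distance = max(max_distance, highest - seq)
    return late_count, max_distance


in_order = reorder_stats([0, 1, 2, 3])
one_very_late = reorder_stats(list(range(1, 61)) + [0])
```

A max_distance well above the reorder window would point at the kind of
large out-of-order batches described here, without saying where they
originate.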

Thanks for all of your help looking into this (non)-issue.  If I have
further questions about the kernel side I'll let you know.

Thanks,
Shawn
