lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090924152451.GB19787@hmsreliant.think-freely.org>
Date:	Thu, 24 Sep 2009 11:24:51 -0400
From:	Neil Horman <nhorman@...driver.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	netdev@...r.kernel.org, davem@...emloft.net
Subject: Re: [PATCH] af_packet: add interframe drop cmsg (v2)

On Thu, Sep 24, 2009 at 04:10:43PM +0200, Eric Dumazet wrote:
> Neil Horman a écrit :
> > On Thu, Sep 24, 2009 at 07:24:47AM +0200, Eric Dumazet wrote:
> >> Neil Horman a écrit :
> >>>    
> >>> Ok, Version 2 of this patch, Taking Erics change notes into account.  Again,
> >>> tested by me, using the same test harness, and it works fine.
> >>> Neil
> >>>  
> >>>
> >>>     Add Ancilliary data to better represent loss information
> >>>     
> >>>     I've had a few requests recently to provide more detail regarding frame loss
> >>>     during an AF_PACKET packet capture session.  Specifically the requestors want to
> >>>     see where in a packet sequence frames were lost, i.e. they want to see that 40
> >>>     frames were lost between frames 302 and 303 in a packet capture file.  In order
> >>>     to do this we need:
> >>>     
> >>>     1) The kernel to export this data to user space
> >>>     2) The applications to make use of it
> >>>     
> >>>     This patch addresses item (1).  It does this by doing the following:
> >>>     
> >>>     A) attaching ancilliary data to any skb enqueued to a socket recieve queue for
> >>>     which frames were lost between it and the previously enqueued frame.  Note I use
> >>>     a ring buffer with a correlator value (the skb pointer) to do this.  This was
> >>>     done because the skb->cb array is exhausted already for AF_PACKET
> >> Yes, skb->cb array is fully used... Hmm...
> >>
> >> As we do skb_dst_drop(skb) before queueing skb into sk_receive_queue,
> >> we could overide _skb_dst to store the 'gap', (but would need 
> >> a special skb_queue_purge() implementation to ignore _skb_dst or force it to 0)
> >>
> > Yeah, we could do that, but I really hate the idea of overriding fields in an
> > skb.  I think that leads to so much confusion should someone have a need for
> > that fields legitimate purpose in the future.  I was thinking about perhaps
> > changing origlen to be a u16, and making gap a u16 so that we could fit both
> > fields into skb->cb. It would imply that we would loose info if any gap exceeded
> > 65535 packets, or if the origional length of an skb exceeded 65535 bytes.  I
> > think both conditions are highly unlikely.  We could mitigate the gap overflow
> > by checking the gap counter on increment and letting the 0xffff value be a
> > special 'more than 65k frames lost' meaning. We could perhaps do something
> > simmilar with the origional length.  Thoughts?
> 
> Well, changing origlen might be OK
> 
Ok, I'll give such a patch a go and post shortly.

> > 
> >> Problem with your implementation is that the 'ring buffer' could be exhausted as well
> >> and user could miss some information. (not counting you didnt rate limit the error message ;) )
> >>
> > Yeah, I'm not a fan of that either.  As I write it down above, I'm liking
> > splitting the origlen value into two 16 bit values more and more.  Let me know
> > what you think, and I'll make the appropriate changes.
> > 
> > Thanks!
> > Neil
> > 
> > P.S. I was thinking about your mmap comments last night, and I recalled that
> > Arnaldo was pushing some patches to allows for multiple skb receive and send
> > operations with a single syscall.  I'm wondering if the judicious use of such a
> > syscall might mitigate the performance advantage of mmap for libpcap?
> 
> Quite frankly I dont like adding super-syscalls like this one. This is bloat,
> and actually slows down the normal/legacy apps (larger kernel -> bigger latencies)
> 
> It would be better to sit down and think about better practices, to speedup
> normal path (ie existing syscalls)
> 
> The performance problem comes from cache line ping pongs between cpus, and
> lock contention.
> Hitting the spinlock each time a frame is queueued, and each time a frame is
> dequeued is the problem.
> We could imagine a two level receive queue, so that the application
> touch the "provider cache line" only to tranfert a batch of skb, when/if the first queue
> is empty.
> 
I think this is (in a sense) what acme's patch did, only using 1 queue.  A
single aquisition of the socket lock allows for a dequeue of N frames, where N
is equal to min(number of frames requested from user space, number of frames on
queue).  That reduces lock contnention by reducing the number of time it needs
to be acquired.

Of course, thats all thought experiment, I'd certainly defer to evidence to the
contrary.  Either way.  The patches are either on netdev (or may already be in
the net-next tree), I don't quite recall.

> See previous thread from Christoph Lameter, that showed that something as simple
> as a UDP echoer (of say 100 frames per second) is worse with current kernels.
> 
> I am not saying mmap() thing is the way to go, maybe ring buffer new interface could
> give us faster af_packet, not requiring pre-allocation of huge zones
> (__get_free_pages() of insane order pages. This is just not working !!!),
> allowing splice() or things like that...
> 
I don't think acmes patch did that, all it did was allow for the reception of
multiple skbs using 1 syscall.


Anywho, let me make some adjustments to my patch here, and I'll repost.

Thanks!
Neil

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ