netdev - Re: [PATCH] af_packet: add interframe drop cmsg (v2)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4ABB7DE3.5000300@gmail.com>
Date:	Thu, 24 Sep 2009 16:10:43 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Neil Horman <nhorman@...driver.com>
CC:	netdev@...r.kernel.org, davem@...emloft.net
Subject: Re: [PATCH] af_packet: add interframe drop cmsg (v2)

Neil Horman a écrit :
> On Thu, Sep 24, 2009 at 07:24:47AM +0200, Eric Dumazet wrote:
>> Neil Horman a écrit :
>>>    
>>> Ok, Version 2 of this patch, Taking Erics change notes into account.  Again,
>>> tested by me, using the same test harness, and it works fine.
>>> Neil
>>>  
>>>
>>>     Add Ancilliary data to better represent loss information
>>>     
>>>     I've had a few requests recently to provide more detail regarding frame loss
>>>     during an AF_PACKET packet capture session.  Specifically the requestors want to
>>>     see where in a packet sequence frames were lost, i.e. they want to see that 40
>>>     frames were lost between frames 302 and 303 in a packet capture file.  In order
>>>     to do this we need:
>>>     
>>>     1) The kernel to export this data to user space
>>>     2) The applications to make use of it
>>>     
>>>     This patch addresses item (1).  It does this by doing the following:
>>>     
>>>     A) attaching ancilliary data to any skb enqueued to a socket recieve queue for
>>>     which frames were lost between it and the previously enqueued frame.  Note I use
>>>     a ring buffer with a correlator value (the skb pointer) to do this.  This was
>>>     done because the skb->cb array is exhausted already for AF_PACKET
>> Yes, skb->cb array is fully used... Hmm...
>>
>> As we do skb_dst_drop(skb) before queueing skb into sk_receive_queue,
>> we could overide _skb_dst to store the 'gap', (but would need 
>> a special skb_queue_purge() implementation to ignore _skb_dst or force it to 0)
>>
> Yeah, we could do that, but I really hate the idea of overriding fields in an
> skb.  I think that leads to so much confusion should someone have a need for
> that fields legitimate purpose in the future.  I was thinking about perhaps
> changing origlen to be a u16, and making gap a u16 so that we could fit both
> fields into skb->cb. It would imply that we would loose info if any gap exceeded
> 65535 packets, or if the origional length of an skb exceeded 65535 bytes.  I
> think both conditions are highly unlikely.  We could mitigate the gap overflow
> by checking the gap counter on increment and letting the 0xffff value be a
> special 'more than 65k frames lost' meaning. We could perhaps do something
> simmilar with the origional length.  Thoughts?

Well, changing origlen might be OK

> 
>> Problem with your implementation is that the 'ring buffer' could be exhausted as well
>> and user could miss some information. (not counting you didnt rate limit the error message ;) )
>>
> Yeah, I'm not a fan of that either.  As I write it down above, I'm liking
> splitting the origlen value into two 16 bit values more and more.  Let me know
> what you think, and I'll make the appropriate changes.
> 
> Thanks!
> Neil
> 
> P.S. I was thinking about your mmap comments last night, and I recalled that
> Arnaldo was pushing some patches to allows for multiple skb receive and send
> operations with a single syscall.  I'm wondering if the judicious use of such a
> syscall might mitigate the performance advantage of mmap for libpcap?

Quite frankly I dont like adding super-syscalls like this one. This is bloat,
and actually slows down the normal/legacy apps (larger kernel -> bigger latencies)

It would be better to sit down and think about better practices, to speedup
normal path (ie existing syscalls)

The performance problem comes from cache line ping pongs between cpus, and
lock contention.
Hitting the spinlock each time a frame is queueued, and each time a frame is
dequeued is the problem.
We could imagine a two level receive queue, so that the application
touch the "provider cache line" only to tranfert a batch of skb, when/if the first queue
is empty.

See previous thread from Christoph Lameter, that showed that something as simple
as a UDP echoer (of say 100 frames per second) is worse with current kernels.

I am not saying mmap() thing is the way to go, maybe ring buffer new interface could
give us faster af_packet, not requiring pre-allocation of huge zones
(__get_free_pages() of insane order pages. This is just not working !!!),
allowing splice() or things like that...

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html