netdev - Re: [PATCH 1/3] tuntap: rx batching


Message-ID: <39c36d36-9029-5d1f-496f-6ff404c3b77a@redhat.com>
Date:   Tue, 15 Nov 2016 11:14:48 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     John Fastabend <john.fastabend@...il.com>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] tuntap: rx batching



On 2016-11-12 00:20, Michael S. Tsirkin wrote:
> On Fri, Nov 11, 2016 at 12:28:38PM +0800, Jason Wang wrote:
>>
>> On 2016-11-11 12:17, John Fastabend wrote:
>>> On 16-11-10 07:31 PM, Michael S. Tsirkin wrote:
>>>>> On Fri, Nov 11, 2016 at 10:07:44AM +0800, Jason Wang wrote:
>>>>>>>
>>>>>>> On 2016-11-10 00:38, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Nov 09, 2016 at 03:38:31PM +0800, Jason Wang wrote:
>>>>>>>>>>> Backlog was used for tuntap rx, but it can only process 1 packet at
>>>>>>>>>>> a time since it was scheduled during sendmsg() synchronously in
>>>>>>>>>>> process context. This leads to bad cache utilization, so this patch
>>>>>>>>>>> tries to do some batching before calling rx NAPI. This is done through:
>>>>>>>>>>>
>>>>>>>>>>> - accept MSG_MORE as a hint from the sendmsg() caller; if it is set,
>>>>>>>>>>>     batch the packets temporarily in a linked list and submit them all
>>>>>>>>>>>     once MSG_MORE is cleared.
>>>>>>>>>>> - implement a tuntap specific NAPI handler for processing this kind of
>>>>>>>>>>>     possible batching. (This could be done by extending backlog to
>>>>>>>>>>>     support skb like, but using a tun specific one looks cleaner and
>>>>>>>>>>>     easier for future extension).
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Jason Wang<jasowang@...hat.com>
>>>>>>>>> So why do we need an extra queue?
>>>>>>> The idea was borrowed from backlog to allow some kind of bulking and avoid
>>>>>>> spinlock on each dequeuing.
>>>>>>>
>>>>>>>>>    This is not what hardware devices do.
>>>>>>>>> How about adding the packet to queue unconditionally, deferring
>>>>>>>>> signalling until we get sendmsg without MSG_MORE?
>>>>>>> Then you need to touch the spinlock when dequeuing each packet.
>>> Random thought, I have a cmpxchg ring I am using for the qdisc work that
>>> could possibly replace the spinlock implementation. I haven't figured
>>> out the resizing API yet because I did not need it but I assume it could
>>> help here and let you dequeue multiple skbs in one operation.
>>>
>>> I can post the latest version if useful or an older version is
>>> somewhere on patchworks as well.
>>>
>>> .John
>>>
>>>
>> Looks useful here, and I can compare the performance if you post it.
>>
>> One question: can we extend skb_array to support that?
>>
>> Thanks
> I'd like to start with a simple patch adding napi with one queue, then add
> optimization patches on top.

The point is that tun currently uses the backlog, which itself uses two
queues (process_queue and input_pkt_queue).

How about something like:

1) NAPI support with skb_array
2) MSG_MORE support
3) other optimizations on top

?
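
To make step 2 concrete, here is a minimal user-space sketch of the
MSG_MORE idea: packets are appended to a pending list while the flag is
set, and the whole list is submitted once a send arrives without it. This
is not the patch code. struct pkt, struct batch_queue, batch_send(),
batch_flush() and SKETCH_MSG_MORE are hypothetical names made up for
illustration, and the rx path is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MSG_MORE; not the real socket flag value. */
#define SKETCH_MSG_MORE 0x1

/* Hypothetical packet type, standing in for an skb. */
struct pkt {
    struct pkt *next;
};

/* Hypothetical per-queue state; none of these names come from the patch. */
struct batch_queue {
    struct pkt *head;    /* first packet held back while MSG_MORE is set */
    struct pkt *tail;    /* last packet held back */
    size_t len;          /* number of packets currently held back */
    size_t delivered;    /* packets handed to the (simulated) rx path */
    size_t flushes;      /* how many times a batch was submitted */
};

/* Submit every pending packet in one pass and empty the list. */
static void batch_flush(struct batch_queue *q)
{
    struct pkt *p = q->head;

    while (p) {
        struct pkt *next = p->next;

        p->next = NULL;
        q->delivered++;  /* a real driver would process the packet here */
        p = next;
    }
    q->head = NULL;
    q->tail = NULL;
    q->len = 0;
    q->flushes++;
}

/* Queue one packet; submit the whole batch once MSG_MORE is not set. */
static void batch_send(struct batch_queue *q, struct pkt *p, int flags)
{
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->len++;

    if (!(flags & SKETCH_MSG_MORE))
        batch_flush(q);
}
```

Sending two packets with SKETCH_MSG_MORE and a third without it results in
a single flush of three packets, which is the batching effect we are after.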

>
> One issue that comes to mind is that write queue limits
> are byte based, they do not count packets unlike tun rx queue.

I'm not sure I get the issue: the write queue is not exported and is only
used for batching. We probably need an internal limit in tun to prevent
OOM attacks from the guest.
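
As a rough illustration of such an internal limit, the sketch below caps
the number of packets held back for batching and refuses anything beyond
that. The policy (drop when full), the cap value and all names
(SKETCH_BATCH_LIMIT, struct capped_batch, capped_batch_add()) are
assumptions for illustration only; nothing here is decided in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap on held-back packets; not a value from the thread. */
#define SKETCH_BATCH_LIMIT 4

/* Hypothetical accounting for the internal batching queue. */
struct capped_batch {
    size_t pending;        /* packets currently held back */
    size_t pending_bytes;  /* their total size, tracked for information */
    size_t dropped;        /* packets refused because the cap was hit */
};

/*
 * Account for one packet of len bytes.
 * Returns true if it may be queued, false if it must be dropped.
 * The limit is counted in packets, like the tun rx queue, not in bytes.
 */
static bool capped_batch_add(struct capped_batch *b, size_t len)
{
    if (b->pending >= SKETCH_BATCH_LIMIT) {
        b->dropped++;
        return false;
    }
    b->pending++;
    b->pending_bytes += len;
    return true;
}
```

With a cap like this, a guest that keeps setting MSG_MORE cannot grow the
list without bound; flushing early instead of dropping would be another
option.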

Thanks