lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 3 Nov 2017 22:55:35 +0900
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     "Karlsson, Magnus" <magnus.karlsson@...el.com>
Cc:     Björn Töpel <bjorn.topel@...il.com>,
        "Duyck, Alexander H" <alexander.h.duyck@...el.com>,
        Alexander Duyck <alexander.duyck@...il.com>,
        John Fastabend <john.fastabend@...il.com>,
        Alexei Starovoitov <ast@...com>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        "michael.lundkvist@...csson.com" <michael.lundkvist@...csson.com>,
        "ravineet.singh@...csson.com" <ravineet.singh@...csson.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Network Development <netdev@...r.kernel.org>,
        "Topel, Bjorn" <bjorn.topel@...el.com>,
        "Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
        "Singhai, Anjali" <anjali.singhai@...el.com>,
        "Rosen, Rami" <rami.rosen@...el.com>,
        "Shaw, Jeffrey B" <jeffrey.b.shaw@...el.com>,
        "Yigit, Ferruh" <ferruh.yigit@...el.com>,
        "Zhang, Qi Z" <qi.z.zhang@...el.com>
Subject: Re: [RFC PATCH 00/14] Introducing AF_PACKET V4 support

On Fri, Nov 3, 2017 at 7:13 PM, Karlsson, Magnus
<magnus.karlsson@...el.com> wrote:
>
>
>> -----Original Message-----
>> From: Willem de Bruijn [mailto:willemdebruijn.kernel@...il.com]
>> Sent: Friday, November 3, 2017 5:35 AM
>> To: Björn Töpel <bjorn.topel@...il.com>
>> Cc: Karlsson, Magnus <magnus.karlsson@...el.com>; Duyck, Alexander H
>> <alexander.h.duyck@...el.com>; Alexander Duyck
>> <alexander.duyck@...il.com>; John Fastabend
>> <john.fastabend@...il.com>; Alexei Starovoitov <ast@...com>; Jesper
>> Dangaard Brouer <brouer@...hat.com>; michael.lundkvist@...csson.com;
>> ravineet.singh@...csson.com; Daniel Borkmann <daniel@...earbox.net>;
>> Network Development <netdev@...r.kernel.org>; Topel, Bjorn
>> <bjorn.topel@...el.com>; Brandeburg, Jesse
>> <jesse.brandeburg@...el.com>; Singhai, Anjali <anjali.singhai@...el.com>;
>> Rosen, Rami <rami.rosen@...el.com>; Shaw, Jeffrey B
>> <jeffrey.b.shaw@...el.com>; Yigit, Ferruh <ferruh.yigit@...el.com>; Zhang,
>> Qi Z <qi.z.zhang@...el.com>
>> Subject: Re: [RFC PATCH 00/14] Introducing AF_PACKET V4 support
>>
>> On Tue, Oct 31, 2017 at 9:41 PM, Björn Töpel <bjorn.topel@...il.com>
>> wrote:
>> > From: Björn Töpel <bjorn.topel@...el.com>
>> >
>> > This RFC introduces AF_PACKET_V4 and PACKET_ZEROCOPY that are
>> > optimized for high performance packet processing and zero-copy
>> > semantics. Throughput improvements can be up to 40x compared to V2
>> and
>> > V3 for the micro benchmarks included. Would be great to get your
>> > feedback on it.
>> >
>> > The main difference between V4 and V2/V3 is that TX and RX descriptors
>> > are separated from packet buffers.
>>
>> Cool feature. I'm looking forward to the netdev talk. Aside from the inline
>> comments in the patches, a few architecture questions.
>
> Glad to hear. Are you going to Netdev in Seoul? If so, let us hook up
> and discuss your comments in further detail. Some initial thoughts
> below.

Sounds great. I'll be there.

>> Is TX support needed? Existing PACKET_TX_RING already sends out packets
>> without copying directly from the tx_ring. Indirection through a descriptor
>> ring is not helpful on TX if all packets still have to come from a pre-registered
>> packet pool. The patch set adds a lot of tx-only code and is complex enough
>> without it.
>
> That is correct, but what if the packet you are going to transmit came
> in from the receive path and is already in the packet buffer?

Oh, yes, of course. That is a common use case. I should have
thought of that.

> This
> might happen if the application is examining/sniffing packets then
> sending them out, or doing some modification to them. In that case we
> avoid a copy in V4 since the packet is already in the packet
> buffer. With V2 and V3, a copy from the RX ring to the TX ring would
> be needed. In the PACKET_ZEROCOPY case, avoiding this copy increases
> performance quite a lot.
>
>> Can you use the existing PACKET_V2 format for the packet pool? The
>> v4 format is nearly the same as V2. Using the same version might avoid some
>> code duplication and simplify upgrading existing legacy code.
>> Instead of continuing to add new versions whose behavior is implicit,
>> perhaps we can add explicit mode PACKET_INDIRECT to PACKET_V2.
>
> Interesting idea that I think is worth thinking more about. One
> problem though with the V2 ring format model, and the current V4
> format too by the way, when applied to a user-space allocated memory
> is that they are symmetric, i.e. that user space and kernel space have
> to produce and consume the same amount of entries (within the length
> of the descriptor area). User space sends down a buffer entry that the
> kernel fills in for RX for example. Symmetric queues do not work when
> you have a shared packet buffer between two processes. (This is not a
> requirement, but someone might do a mmap with MAP_SHARED for the
> packet buffer and then fork of a child that then inherits this packet
> buffer.) One of the processes might just receive packets, while the
> other one is transmitting. Or if you have a veth link pair between two
> processes and they have been set up to share packet buffer area. With
> a symmetric queue you have to copy even if they share the same packet
> buffer, but with an asymmetric queue, you do not and the driver only
> needs to copy the packet buffer id between the TX desc ring of the
> sender to the RX desc ring of the receiver, not the data. I think this
> gives an indication that we need a new structure. Anyway, I like your
> idea and I think it is worth thinking more about it. Let us have a
> discussion about this at Netdev, if you are there.

Okay. I don't quite understand the definition of symmetric here. At
least one problem that you describe, the veth pair, is solved by
introducing descriptor rings as level of indirection, regardless of the
format of the frames in the packet ring (now, really, random access
packet pool).

>> Finally, is it necessary to define a new descriptor ring format? Same for the
>> packet array and frame set. The kernel already has a few, such as virtio for
>> the first, skb_array/ptr_ring, even linux list for the second. These containers
>> add a lot of new boilerplate code. If new formats are absolutely necessary, at
>> least we should consider making them generic (like skb_array and ptr_ring).
>> But I'd like to understand first why, e.g., virtio cannot be used.
>
> Agree with you. Good if we can use something existing. The descriptor
> format of V4 was based on one of the first Virtio 1.1 proposal by
> Michael Tsirkin (tools/virtio/ringtest/ring.c). Then we have diverged
> somewhat due to performance reasons and Virtio 1.1 has done the same
> but in another direction. We should take a look at the latest Virtio
> 1.1 proposal again and see what it offers. The reason we did not go
> with Virtio 0.9 was for performance. Too many indirections, something
> that the people behind Virtio 1.1 had identified too. With ptr_ring,
> how do we deal with the pointers in the structure as this now has to
> go to user-space? In any way, we would like to have a ring structure
> that is asymmetric for the reasons above. Other than that, we would
> not mind using anything as long as it is fast. If it already exists,
> perfect.

Thanks for that context. I was not aware that this format branched off
of the early virtio 1.1 draft at all. Yes, I'm not sure where that stands
and which workloads it is targeting. One issue is dealing with hw and
minimizing communication over the PCI bus. That is not immediately
relevant to this virtual descriptor model.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ