Date:   Wed, 22 Feb 2017 09:22:53 -0800
From:   Tom Herbert <tom@...bertland.com>
To:     Jesper Dangaard Brouer <brouer@...hat.com>
Cc:     Saeed Mahameed <saeedm@....mellanox.co.il>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Alexander Duyck <alexander.duyck@...il.com>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        John Fastabend <john.fastabend@...il.com>,
        David Miller <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Brenden Blanco <bblanco@...il.com>
Subject: Re: Focusing the XDP project

On Wed, Feb 22, 2017 at 1:43 AM, Jesper Dangaard Brouer
<brouer@...hat.com> wrote:
>
> On Tue, 21 Feb 2017 14:54:35 -0800 Tom Herbert <tom@...bertland.com> wrote:
>> On Tue, Feb 21, 2017 at 2:29 PM, Saeed Mahameed <saeedm@....mellanox.co.il> wrote:
> [...]
>> > The only complexity XDP is adding to the drivers is the constraints on
>> > RX memory management and the memory model; calling the XDP program itself
>> > and handling the action is really a simple thing once you have the
>> > correct memory model.
>
> Exactly, that is why I've been looking at introducing a generic
> facility for a memory model for drivers.  This should help simplify
> drivers.  Due to performance needs this needs to be a very thin API
> layer on top of the page allocator. (That's why I'm working with Mel
> Gorman to get closer integration with the page allocator, e.g. a
> bulking facility).
>
>> > Who knows! Maybe someday XDP will define one unified RX API for all
>> > drivers and it will even handle normal stack delivery itself :).
>> >
>> That's exactly the point and what we need for TXDP. I'm missing why
>> doing this is such rocket science other than the fact that all these
>> drivers are vastly different and changing the existing API is
>> unpleasant. The only functional complexity I see in creating a generic
>> batching interface is handling return codes asynchronously. This is
>> entirely feasible though...
>
> I'll be happy as long as we get a batching interface; then we can
> do the optimizations incrementally later.
>
> In the future, I do hope (like Saeed) this RX API will evolve into
> delivering (a bulk of) raw-packet-pages into the netstack; this should
> simplify drivers, and we can keep the complexity and SKB allocations
> out of the drivers.
> To start with, we can play with delivering (a bulk of)
> raw-packet-pages into Tom's TXDP engine/system?
>
Hi Jesper,

Maybe we can start to narrow in on what a batching API might look like.

Looking at mlx5 (as a model of how XDP is implemented), the main RX
loop in mlx5e_poll_rx_cq calls the backend handler through one indirect
function call. The XDP path goes through mlx5e_handle_rx_cqe,
skb_from_cqe, and mlx5e_xdp_handle. The first two deal largely with
building the skbuff. As a prerequisite to RX batching it would be
helpful if this could be flattened so that most of the logic is obvious
in the main RX loop.
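
To make that concrete, here is roughly the loop shape I have in mind
(purely an illustrative sketch; all helper names here are invented,
not the real mlx5 functions):

static int rx_poll(struct rx_ring *ring, int budget)
{
        int work = 0;

        while (work < budget) {
                struct rx_desc *desc = rx_ring_next(ring);   /* invented helper */
                void *data;

                if (!desc)
                        break;

                data = rx_desc_data(ring, desc);

                /* XDP verdict handled right in the main loop */
                switch (run_xdp_prog(ring->xdp_prog, data, desc->len)) {
                case XDP_DROP:
                        rx_ring_recycle(ring, desc);
                        break;
                case XDP_TX:
                        rx_xmit_back(ring, data, desc->len);
                        break;
                default:        /* XDP_PASS: skb path visible here too */
                        rx_build_skb_and_pass(ring, desc);
                        break;
                }
                work++;
        }
        return work;
}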

The model of RX batching seems straightforward enough-- pull packets
from the ring, save the xdp_data information in a vector, and
periodically call into the stack to handle a batch, where one argument
is the vector of packets and another is an output vector that gives the
return codes (XDP actions); the driver then processes the return code
for each packet accordingly. Presumably there is a maximum allowed
batch size, which may or may not be the same as the NAPI budget, so the
batching call needs to be done when that limit is reached and also
before exiting NAPI. For each packet the stack can return an XDP code;
XDP_PASS in this case could be interpreted as the packet having been
consumed by the stack, which would be used when the stack creates an
skbuff for the packet. The stack, for its part, can process the batch
however it sees fit: it can process each packet individually in the
canonical model, or we can continue processing the batch in a VPP-like
fashion.
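
In code, the shape might look something like this (just a sketch to
frame the discussion; the struct layout and function names are
placeholders, not a proposed API):

#define RX_BATCH_MAX    64      /* may or may not equal the NAPI budget */

struct rx_batch {
        unsigned int    count;
        void            *data[RX_BATCH_MAX];    /* xdp_data pointers */
        unsigned int    len[RX_BATCH_MAX];
        int             verdict[RX_BATCH_MAX];  /* XDP actions, filled in by the stack */
};

/* Placeholder stack entry point: consumes the vector and fills in the
 * verdicts. XDP_PASS would mean the stack built an skbuff and consumed
 * the packet.
 */
void netif_receive_batch(struct rx_batch *batch);

static void rx_flush_batch(struct rx_ring *ring, struct rx_batch *batch)
{
        unsigned int i;

        netif_receive_batch(batch);

        for (i = 0; i < batch->count; i++) {
                switch (batch->verdict[i]) {
                case XDP_DROP:
                        rx_recycle(ring, batch->data[i]);
                        break;
                case XDP_TX:
                        rx_xmit_back(ring, batch->data[i], batch->len[i]);
                        break;
                case XDP_PASS:
                        /* already consumed by the stack */
                        break;
                }
        }
        batch->count = 0;
}

The driver would call rx_flush_batch() whenever count hits the batch
limit and once more before returning from its NAPI poll.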

The batching API could be transparent to the driver or not. In the
transparent case, the driver calls what looks like a receive function
but the stack may defer processing for batching. A callback function
(that can be inlined) is used to process return codes, as I mentioned
previously. In the non-transparent model, the driver knowingly creates
the packet vector and then explicitly calls another function to
process the vector. Personally, I lean towards the transparent API;
it may mean less complexity in drivers and it gives the stack more
control over the parameters of batching (for instance, it may choose a
batch size that optimizes its own processing instead of the driver
guessing the best size).
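
As a rough illustration of the two call shapes (again, placeholder
names only):

/* Transparent model: the driver calls what looks like an ordinary
 * per-packet receive; the stack may defer and batch internally, then
 * invoke the (inlinable) callback with the verdict for each packet.
 */
typedef void (*rx_verdict_cb_t)(void *priv, void *data, int xdp_action);

void netif_receive_deferred(struct net_device *dev, void *data,
                            unsigned int len,
                            rx_verdict_cb_t cb, void *priv);

/* Non-transparent model: the driver owns the vector and explicitly
 * flushes it, i.e. the netif_receive_batch()/rx_flush_batch() shape
 * sketched above.
 */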

Btw, the logic for RX batching is very similar to how we batch packets
for RPS (I think you already mentioned an skb-less RPS, and that should
hopefully be something that falls out of this design).

Tom
