lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Mon, 3 Jun 2019 10:27:02 +0200
From:   Magnus Karlsson <magnus.karlsson@...il.com>
To:     Saeed Mahameed <saeedm@...lanox.com>
Cc:     "daniel@...earbox.net" <daniel@...earbox.net>,
        "magnus.karlsson@...el.com" <magnus.karlsson@...el.com>,
        "ast@...nel.org" <ast@...nel.org>,
        "bjorn.topel@...el.com" <bjorn.topel@...el.com>,
        Maxim Mikityanskiy <maximmi@...lanox.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "yhs@...com" <yhs@...com>,
        "songliubraving@...com" <songliubraving@...com>,
        Tariq Toukan <tariqt@...lanox.com>,
        "kafai@...com" <kafai@...com>,
        "jakub.kicinski@...ronome.com" <jakub.kicinski@...ronome.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "maciejromanfijalkowski@...il.com" <maciejromanfijalkowski@...il.com>,
        "bsd@...com" <bsd@...com>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>
Subject: Re: [PATCH bpf-next v3 00/16] AF_XDP infrastructure improvements and
 mlx5e support

On Fri, May 31, 2019 at 11:59 PM Saeed Mahameed <saeedm@...lanox.com> wrote:
>
> On Fri, 2019-05-24 at 12:18 +0200, Björn Töpel wrote:
> > On 2019-05-24 11:35, Maxim Mikityanskiy wrote:
> > > This series contains improvements to the AF_XDP kernel
> > > infrastructure
> > > and AF_XDP support in mlx5e. The infrastructure improvements are
> > > required for mlx5e, but also some of them benefit to all drivers,
> > > and
> > > some can be useful for other drivers that want to implement AF_XDP.
> > >
> > >
> [...]
> >
> > Maxim, this doesn't address the uapi concern we had on your v2.
> > Please refer to Magnus' comment here [1].
> >
> > Please educate me why you cannot publish AF_XDP without the uapi
> > change?
> > It's an extension, right? If so, then existing XDP/AF_XDP program can
> > use Mellanox ZC without your addition? It's great that Mellanox has a
> > ZC
> > capable driver, but the uapi change is a NAK.
> >
> > To reiterate; We'd like to get the queue setup/steering for AF_XDP
> > correct. I, and Magnus, dislike this approach. It requires a more
> > complicated XDP program, and is hard for regular users to understand.
> >
> >
>
> Hi Bjorn and Magnus,
>
> It is not clear to me why you don't like this approach, if anything,
> this approach is addressing many concerns you raised about current
> limited approach of re-using/"stealing" only regular RX rings for xsk
> traffic !
>
> for instance
> 1) xsk ring now has a unique id. (wasn't this the plan from the
> beginning ?)
> 2) No RSS issues, only explicit steering rules got the the newly
> created isolated xsk ring, default RSS is not affected regular RX rings
> are still intact.
> 3) the new scheme is flexible and will allow as much xsk sockets as
> needed, and can co-exist with regular rings.
> 4) We want to have a solution that will replace DPDK, having such
> limitations of a limited number of RX rings and stealing from regular
> rings, is really not a worthy design, just because some drivers do not
> want to deal or don't know how to deal with creating dedicated
> resources.
> 5) i think it is wrong to compare xsk rings with regular rings, xsk
> rings are actually just a a device context that redirects traffic to a
> special buffer space, there is no real memory buffers model behind it,
> other than the rx/tx descriptors. (memory model is handled outside the
> driver).
> 6) mlx5 is designed and optimized for such use cases (dedicated/unique
> rx/tx rings for XDP), limiting us to current AF_XDP limitation without
> allowing us to improve the AF_XDP design is really not fair.

Hi Saeed,

Agree on all your statements. We need and should proceed in this
direction (as I have stated before) so I am all aligned with what you
want and hopefully also what other users or potential users want. The
only issue I personally have is that we identify the "create a new
channel/pipe/flow of packets from the NIC completely separated from
the regular SKB flows" with the queue id. The user really does not
care about queue ids. He/She only wants an endpoint that can be used
to receive and send packets and then to be able to program this
endpoint to receive the packets that the application is interested in.
Queue ids and conglomerates of queue ids expose too much of HW
concepts that we really should not care about in user space, IMHO.
What I would like is to be able to do all these things without
exposing queue ids. Maybe just with the socket instead.

Now I am going to argue for the other side :-). The above approach
would require some new plumbing for sure, since ethtool uses queue ids
today to set the redirection of packets. So maybe it is just a pipe
dream to get to this point (it will take too long, for example) and we
should just reuse the queue id space for a new set of queue ids that
have all the good properties that you describe. What I would
preferably not like to end up with is some new interface that we do
not need in one to two years time.

Is there some way we can extend the uapi so that it can be used for
both the intermediary step (tagging along the queue ids) and the end
goal (not having to use them at all)? Could we extend the bind() call
so that it creates a socket that is bound to some queue that we do not
specify (it can be created together with some other queues and bundled
with them, or some completely new queue and irq line, or whatever, we
do not care). It is bound to an "anonymous queue" you could say. Then
as an intermediary step, we can add a getsockopt that gets the real
queue id identifiers from the socket that we can use in ethtool, for
example. Other suggestions?

As I have said before, I like 99% of the work that Maxim has done. But
new uapis should not be take lightly as they have repercussions that
lasts forever. That is why your patch set is taking time as you
bundled them all together. Or maybe I am just chicken ;-).

/Magnus

> the way i see it, this new extension is actually a generalization to
> allow for more drivers support and AF_XDP flexibility.
>
> if you have different ideas on how to implement the new design, please
> provide your feedback and we will be more than happy to improve the
> current implementation, but requesting to drop it, i think is not a
> fair request.
>
> Side note: Our task is to provide a scalable and flexible in-kernel XDP
> solution so we can offer a valid replacement for DPDK and userspace
> only solutions, I think we need to have a scheme where we allow an
> unlimited number of xsk sockets/rings with full flow
> separation/isolation between different user sockets/apps, the driver/hw
> resources are really very cheap (as there is no buffer management) much
> cheaper than allocating a full blown regular socket buffers.
>
> Thanks,
> Saeed.
>
> > Thanks,
> > Björn
> >
> > [1]
> > https://lore.kernel.org/bpf/CAJ8uoz2UHk+5xPwz-STM9gkQZdm7r_=jsgaB0nF+mHgch=axPg@mail.gmail.com/
> >
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ