Message-ID: <CAKgT0UfE+RN-bK_Hu05kJv62s-edJtkrmkBefHU6UCYQSDdkvw@mail.gmail.com>
Date:   Sat, 18 Feb 2017 10:18:10 -0800
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Eric Dumazet <eric.dumazet@...il.com>
Cc:     Jesper Dangaard Brouer <brouer@...hat.com>,
        John Fastabend <john.fastabend@...il.com>,
        Netdev <netdev@...r.kernel.org>,
        Tom Herbert <tom@...bertland.com>,
        Alexei Starovoitov <ast@...nel.org>,
        John Fastabend <john.r.fastabend@...el.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        David Miller <davem@...emloft.net>
Subject: Re: Questions on XDP

On Sat, Feb 18, 2017 at 9:41 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Sat, 2017-02-18 at 17:34 +0100, Jesper Dangaard Brouer wrote:
>> On Thu, 16 Feb 2017 14:36:41 -0800
>> John Fastabend <john.fastabend@...il.com> wrote:
>>
>> > On 17-02-16 12:41 PM, Alexander Duyck wrote:
>> > > So I'm in the process of working on enabling XDP for the Intel NICs
>> > > and I had a few questions so I just thought I would put them out here
>> > > to try and get everything sorted before I paint myself into a corner.
>> > >
>> > > So my first question is why does the documentation mention 1 frame per
>> > > page for XDP?
>>
>> Yes, XDP defines upfront a memory model where there is only one packet
>> per page[1], please respect that!
>>
>> This is currently used/needed for fast-direct recycling of pages inside
>> the driver for XDP_DROP and XDP_TX, _without_ performing any atomic
>> refcnt operations on the page. E.g. see mlx4_en_rx_recycle().
>
>
> XDP_DROP does not require having one page per frame.

Agreed.

> (Look after my recent mlx4 patch series if you need to be convinced)
>
> Only XDP_TX is.
>
> This requirement makes XDP useless (very OOM likely) on arches with 64K
> pages.

Actually, I have been having a side discussion with John about XDP_TX.
Looking at the Mellanox way of doing it, I am not entirely sure it is
useful.  It looks good for benchmarks, but that is about it.  Also, I
don't see it extending to the point where we would be able to
exchange packets between interfaces, which really seems like it should
be the ultimate goal for XDP_TX.

It seems like eventually we want to be able to peel off the buffer and
send it to something other than ourselves.  For example, it might be
useful at some point to use XDP to do traffic classification and have
it route packets between multiple interfaces on a host.  In that case
it wouldn't make sense to have all of them map every page as
bidirectional, because that becomes ridiculous once you have dozens of
interfaces in a system.

As per our original discussion at netconf, if we want to be able to do
XDP Tx with a fully lockless Tx ring, we need to have a Tx ring per
CPU that is performing XDP.  The Tx path will end up needing to do the
map/unmap itself in the case of physical devices, but the expense of
that can be somewhat mitigated, on x86 at least, by either disabling
the IOMMU or using identity mapping.  I think this might be the route
worth exploring, as we could then start looking at doing things like
implementing bridges and routers in XDP and see what performance gains
can be had there.
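
To make the per-CPU Tx ring idea concrete, here is a minimal user-space
sketch of what I mean.  It is only a model, not driver code: the names
(xdp_tx_ring, xdp_tx_enqueue, xdp_tx_clean, NR_CPUS_MODEL, RING_SIZE)
are all made up for illustration, and the DMA map/unmap step is left
out entirely.  The point is just that each ring is touched only by the
CPU that owns it, so neither enqueue nor clean-up needs a lock or an
atomic operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4   /* hypothetical CPU count for this model */
#define RING_SIZE     8   /* hypothetical ring size, power of two */

struct xdp_tx_desc {
	uint64_t dma_addr;    /* address the device would read from */
	uint32_t len;         /* frame length in bytes */
};

struct xdp_tx_ring {
	struct xdp_tx_desc desc[RING_SIZE];
	uint32_t head;        /* next slot to fill, owner CPU only */
	uint32_t tail;        /* next slot to clean, owner CPU only */
};

/* One ring per CPU; ring[cpu] is only ever touched by that CPU. */
static struct xdp_tx_ring xdp_tx_rings[NR_CPUS_MODEL];

/* Queue one frame on the calling CPU's ring; false if the ring is full. */
static bool xdp_tx_enqueue(unsigned int cpu, uint64_t dma_addr, uint32_t len)
{
	struct xdp_tx_ring *ring;

	assert(cpu < NR_CPUS_MODEL);
	ring = &xdp_tx_rings[cpu];

	if (ring->head - ring->tail == RING_SIZE)
		return false;

	ring->desc[ring->head & (RING_SIZE - 1)] =
		(struct xdp_tx_desc){ .dma_addr = dma_addr, .len = len };
	ring->head++;
	return true;
}

/* Reclaim up to 'budget' completed descriptors on the owner CPU's ring. */
static unsigned int xdp_tx_clean(unsigned int cpu, unsigned int budget)
{
	struct xdp_tx_ring *ring;
	unsigned int cleaned = 0;

	assert(cpu < NR_CPUS_MODEL);
	ring = &xdp_tx_rings[cpu];

	while (cleaned < budget && ring->tail != ring->head) {
		ring->tail++;
		cleaned++;
	}
	return cleaned;
}
```

The model only holds as long as clean-up for a ring also runs on the
CPU that owns it; once that stops being true you need synchronization
of some kind.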

Also, as far as the one page per frame requirement goes, it occurs to
me that you will eventually have to deal with things like frame
replication.  Once that comes into play everything becomes much more
difficult, because the recycling doesn't work without some sort of
reference counting.  And since the device interrupt can migrate, you
could end up with clean-up occurring on a different CPU, so you need
some sort of synchronization mechanism.
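
As a rough illustration of the replication problem, here is a small
user-space model using C11 atomics.  Again, this is a sketch under my
own assumptions and not how any driver does it today: model_page,
model_page_get and model_page_put are hypothetical names, and the
"recycle pool" is reduced to a flag.  It shows that once a frame is
replicated, the page can only be recycled by whoever drops the last
reference, and that the drop has to be atomic because it may happen on
a different CPU than the one that received the frame.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_page {
	atomic_int refcnt;       /* one reference per outstanding user */
	bool in_recycle_pool;    /* stands in for the driver's recycle list */
};

/* A freshly received frame holds the only reference to its page. */
static void model_page_init(struct model_page *page)
{
	atomic_init(&page->refcnt, 1);
	page->in_recycle_pool = false;
}

/* Take an extra reference, e.g. when replicating the frame. */
static void model_page_get(struct model_page *page)
{
	atomic_fetch_add_explicit(&page->refcnt, 1, memory_order_relaxed);
}

/*
 * Drop one reference.  Only the caller that drops the last one may
 * recycle the page; returns true in that case.
 */
static bool model_page_put(struct model_page *page)
{
	int old = atomic_fetch_sub_explicit(&page->refcnt, 1,
					    memory_order_acq_rel);

	assert(old > 0);
	if (old == 1) {
		page->in_recycle_pool = true;
		return true;
	}
	return false;
}
```

With one frame per page and no replication the count never goes above
one, which is why the current recycling can skip the atomics.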

Thanks.

- Alex
