netdev - Re: [PATCH bpf-next v3 00/15] Introducing AF

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ccb3ed90-440a-bc29-d6bb-9dc09824ed82@iogearbox.net>
Date:   Fri, 18 May 2018 18:17:57 +0200
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Björn Töpel <bjorn.topel@...il.com>
Cc:     Alexei Starovoitov <ast@...com>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        "Duyck, Alexander H" <alexander.h.duyck@...el.com>,
        Alexander Duyck <alexander.duyck@...il.com>,
        John Fastabend <john.fastabend@...il.com>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Netdev <netdev@...r.kernel.org>,
        Björn Töpel <bjorn.topel@...el.com>,
        michael.lundkvist@...csson.com,
        "Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
        "Singhai, Anjali" <anjali.singhai@...el.com>,
        "Zhang, Qi Z" <qi.z.zhang@...el.com>
Subject: Re: [PATCH bpf-next v3 00/15] Introducing AF_XDP support

On 05/18/2018 05:18 PM, Björn Töpel wrote:
> 2018-05-18 15:43 GMT+02:00 Daniel Borkmann <daniel@...earbox.net>:
>> On 05/18/2018 05:38 AM, Alexei Starovoitov wrote:
>>> On 5/16/18 11:46 PM, Björn Töpel wrote:
>>>> 2018-05-04 1:38 GMT+02:00 Alexei Starovoitov <alexei.starovoitov@...il.com>:
>>>>> On Fri, May 04, 2018 at 12:49:09AM +0200, Daniel Borkmann wrote:
>>>>>> On 05/02/2018 01:01 PM, Björn Töpel wrote:
>>>>>>> From: Björn Töpel <bjorn.topel@...el.com>
>>>>>>>
>>>>>>> This patch set introduces a new address family called AF_XDP that is
>>>>>>> optimized for high performance packet processing and, in upcoming
>>>>>>> patch sets, zero-copy semantics. In this patch set, we have removed
>>>>>>> all zero-copy related code in order to make it smaller, simpler and
>>>>>>> hopefully more review friendly. This patch set only supports copy-mode
>>>>>>> for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
>>>>>>> for RX using the XDP_DRV path. Zero-copy support requires XDP and
>>>>>>> driver changes that Jesper Dangaard Brouer is working on. Some of his
>>>>>>> work has already been accepted. We will publish our zero-copy support
>>>>>>> for RX and TX on top of his patch sets at a later point in time.
>>>>>>
>>>>>> +1, would be great to see it land this cycle. Saw few minor nits here
>>>>>> and there but nothing to hold it up, for the series:
>>>>>>
>>>>>> Acked-by: Daniel Borkmann <daniel@...earbox.net>
>>>>>>
>>>>>> Thanks everyone!
>>>>>
>>>>> Great stuff!
>>>>>
>>>>> Applied to bpf-next, with one condition.
>>>>> Upcoming zero-copy patches for both RX and TX need to be posted
>>>>> and reviewed within this release window.
>>>>> If netdev community as a whole won't be able to agree on the zero-copy
>>>>> bits we'd need to revert this feature before the next merge window.
>>>>>
>>>>> Few other minor nits:
>>>>> patch 3:
>>>>> +struct xdp_ring {
>>>>> +       __u32 producer __attribute__((aligned(64)));
>>>>> +       __u32 consumer __attribute__((aligned(64)));
>>>>> +};
>>>>> It kinda begs for ____cacheline_aligned_in_smp to be introduced for uapi headers.
>>>>
>>>> Hmm, I need some guidance on what a sane uapi variant would be. We
>>>> can't have the uapi depend on the kernel build. ARM64, e.g., can have
>>>> both 64B and 128B according to the specs. Contemporary IA processors
>>>> have 64B.
>>>>
>>>> The simplest, and maybe most future-proof, would be 128B aligned for
>>>> all. Another is having 128B for ARM and 64B for all IA. A third option
>>>> is having a hand-shaking API (I think virtio has that) for determine
>>>> the cache line size, but I'd rather not go down that route.
>>>>
>>>> Thoughts/ideas on how a uapi ____cacheline_aligned_in_smp version
>>>> would look like?
>>>
>>> I suspect i40e+arm combination wasn't tested anyway.
>>> The api may have endianness issues too on something like sparc.
>>> I think the way to be backwards compatible in this area
>>> is to make the api usable on x86 only by adding
>>> to include/uapi/linux/if_xdp.h
>>> #if defined(__x86_64__)
>>> #define AF_XDP_CACHE_BYTES 64
>>> #else
>>> #error "AF_XDP support is not yet available for this architecture"
>>> #endif
>>> and doing:
>>>     __u32 producer __attribute__((aligned(AF_XDP_CACHE_BYTES)));
>>>     __u32 consumer __attribute__((aligned(AF_XDP_CACHE_BYTES)));
>>>
>>> And progressively add to this for arm64 and few other archs.
>>> Eventually removing #error and adding some generic define
>>> that's good enough for long tail of architectures that
>>> we really cannot test.
>>
>> Been looking into this yesterday as well a bit, and it's a bit of a mess what
>> uapi headers do on this regard (though there are just a handful of such headers).
>> Some of the kernel uapi headers hard-code generally 64 bytes regardless of the
>> underlying arch. In general, the kernel does expose it to user space via sysfs
>> (coherency_line_size). Here's what perf does to retrieve it:
>>
>> #ifdef _SC_LEVEL1_DCACHE_LINESIZE
>> #define cache_line_size(cacheline_sizep) *cacheline_sizep = sysconf(_SC_LEVEL1_DCACHE_LINESIZE)
>> #else
>> static void cache_line_size(int *cacheline_sizep)
>> {
>>         if (sysfs__read_int("devices/system/cpu/cpu0/cache/index0/coherency_line_size", cacheline_sizep))
>>                 pr_debug("cannot determine cache line size");
>> }
>> #endif
>>
>> The sysconf() implementation for _SC_LEVEL1_DCACHE_LINESIZE seems also only
>> available for x86, arm64, s390 and ppc on a cursory glance in the glibc code.
>> In the x86 case it retrieves the info from cpuid insn. In order to generically
>> use it in combination with the header you'd have some probe which would then
>> set this as a define before including the header.
> 
> But as a uapi we cannot depend on the L1 cache line size for what's
> currently running on the system, right? So, either a "one cache line
> size for all flavors of an arch", e.g. for ARMv8 that would be 128,
> even though there can be 64 flavors out there.
> 
> Another way would be to remove the ring structure completely, and
> leave that to the user-space application to figure out. So there's a
> runtime interface (getsockopt) to probe the offsets the head and tail
> pointer is after/before the mmap call, and only expose the descriptor
> format explicitly in if_xdp.h. Don't know if that is too unorthodox or
> not...

Good point, I think that may not be too unreasonable thing to do, imho,
at least this doesn't get us into the issue discussed here in the first
place, and it would work for other archs more seamlessly rather than ugly
build error or single per arch 'catch-all' define.

Thanks,
Daniel