[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJ+HfNj4CogEhakBvHpfQjdaK53juNAoJCaUgC939nN=GRntdA@mail.gmail.com>
Date: Fri, 3 Nov 2017 10:54:00 +0100
From: Björn Töpel <bjorn.topel@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: "Karlsson, Magnus" <magnus.karlsson@...el.com>,
Alexander Duyck <alexander.h.duyck@...el.com>,
Alexander Duyck <alexander.duyck@...il.com>,
John Fastabend <john.fastabend@...il.com>,
Alexei Starovoitov <ast@...com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
michael.lundkvist@...csson.com, ravineet.singh@...csson.com,
Daniel Borkmann <daniel@...earbox.net>,
Network Development <netdev@...r.kernel.org>,
Björn Töpel <bjorn.topel@...el.com>,
jesse.brandeburg@...el.com, anjali.singhai@...el.com,
rami.rosen@...el.com, jeffrey.b.shaw@...el.com,
ferruh.yigit@...el.com, qi.z.zhang@...el.com
Subject: Re: [RFC PATCH 01/14] packet: introduce AF_PACKET V4 userspace API
2017-11-03 3:29 GMT+01:00 Willem de Bruijn <willemdebruijn.kernel@...il.com>:
>>>> +/*
>>>> + * struct tpacket_memreg_req is used in conjunction with PACKET_MEMREG
>>>> + * to register user memory which should be used to store the packet
>>>> + * data.
>>>> + *
>>>> + * There are some constraints for the memory being registered:
>>>> + * - The memory area has to be memory page size aligned.
>>>> + * - The frame size has to be a power of 2.
>>>> + * - The frame size cannot be smaller than 2048B.
>>>> + * - The frame size cannot be larger than the memory page size.
>>>> + *
>>>> + * Corollary: The number of frames that can be stored is
>>>> + * len / frame_size.
>>>> + *
>>>> + */
>>>> +struct tpacket_memreg_req {
>>>> + unsigned long addr; /* Start of packet data area */
>>>> + unsigned long len; /* Length of packet data area */
>>>> + unsigned int frame_size; /* Frame size */
>>>> + unsigned int data_headroom; /* Frame head room */
>>>> +};
>>>
>>> Existing packet sockets take a tpacket_req, allocate memory and let the
>>> user process mmap this. I understand that TPACKET_V4 distinguishes
>>> the descriptor from packet pools, but could both use the existing structs
>>> and logic (packet_mmap)? That would avoid introducing a lot of new code
>>> just for granting user pages to the kernel.
>>>
>>
>> We could certainly pass the "tpacket_memreg_req" fields as part of
>> descriptor ring setup ("tpacket_req4"), but we went with having the
>> memory register as a new separate setsockopt. Having it separated,
>> makes it easier to compare regions at the kernel side of things. "Is
>> this the same umem as another one?" If we go the path of passing the
>> range at descriptor ring setup, we need to handle all kind of
>> overlapping ranges to determine when a copy is needed or not, in those
>> cases where the packet buffer (i.e. umem) is shared between processes.
>
> That's not what I meant. Both descriptor rings and packet pools are
> memory regions. Packet sockets already have logic to allocate regions
> and make them available to userspace with mmap(). Packet v4 reuses
> that logic for its descriptor rings. Can it use the same for its packet
> pool? Why does the kernel map user memory, instead? That is a lot of
> non-trivial new logic.
Ah, got it. So, why do we register packet pool memory, instead of
allocating in the kernel and mapping *that* memory.
Actually, we started out with that approach, where the packet_mmap
call mapped Tx/Rx descriptor rings and the packet buffer region. We
later moved to this (register umem) approach, because it's more
flexible for user space, not having to use a AF_PACKET specific
allocator (i.e. continue to use regular mallocs, huge pages and such).
I agree that the memory register code is adding a lot of new logic,
but I believe it's worth the flexibility for user space. I'm looking
into if I can share the memory register logic from Infiniband/verbs
subsystem (drivers/infiniband/core/umem.c).
Björn
Powered by blists - more mailing lists