[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <34090015-d061-8f7a-18e3-c8c67ade800f@oracle.com>
Date: Thu, 2 Nov 2017 09:40:58 -0700
From: Tushar Dave <tushar.n.dave@...cle.com>
To: Björn Töpel <bjorn.topel@...il.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: "Karlsson, Magnus" <magnus.karlsson@...el.com>,
Alexander Duyck <alexander.h.duyck@...el.com>,
Alexander Duyck <alexander.duyck@...il.com>,
John Fastabend <john.fastabend@...il.com>,
Alexei Starovoitov <ast@...com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
michael.lundkvist@...csson.com, ravineet.singh@...csson.com,
Daniel Borkmann <daniel@...earbox.net>,
Network Development <netdev@...r.kernel.org>,
Björn Töpel <bjorn.topel@...el.com>,
jesse.brandeburg@...el.com, anjali.singhai@...el.com,
rami.rosen@...el.com, jeffrey.b.shaw@...el.com,
ferruh.yigit@...el.com, qi.z.zhang@...el.com
Subject: Re: [RFC PATCH 01/14] packet: introduce AF_PACKET V4 userspace API
On 11/02/2017 03:06 AM, Björn Töpel wrote:
> On 2017-11-02 02:45, Willem de Bruijn wrote:
>> On Tue, Oct 31, 2017 at 9:41 PM, Björn Töpel <bjorn.topel@...il.com> wrote:
>>> From: Björn Töpel <bjorn.topel@...el.com>
>>>
>>> This patch adds the necessary AF_PACKET V4 structures for usage from
>>> userspace. AF_PACKET V4 is a new interface optimized for high
>>> performance packet processing.
>>>
>>> Signed-off-by: Björn Töpel <bjorn.topel@...el.com>
>>> ---
>>> include/uapi/linux/if_packet.h | 65 +++++++++++++++++++++++++++++++++++++++++-
>>> 1 file changed, 64 insertions(+), 1 deletion(-)
>>>
>>> +struct tpacket4_queue {
>>> + struct tpacket4_desc *ring;
>>> +
>>> + unsigned int avail_idx;
>>> + unsigned int last_used_idx;
>>> + unsigned int num_free;
>>> + unsigned int ring_mask;
>>> +};
>>>
>>> struct packet_mreq {
>>> @@ -294,6 +335,28 @@ struct packet_mreq {
>>> unsigned char mr_address[8];
>>> };
>>>
>>> +/*
>>> + * struct tpacket_memreg_req is used in conjunction with PACKET_MEMREG
>>> + * to register user memory which should be used to store the packet
>>> + * data.
>>> + *
>>> + * There are some constraints for the memory being registered:
>>> + * - The memory area has to be memory page size aligned.
>>> + * - The frame size has to be a power of 2.
>>> + * - The frame size cannot be smaller than 2048B.
>>> + * - The frame size cannot be larger than the memory page size.
>>> + *
>>> + * Corollary: The number of frames that can be stored is
>>> + * len / frame_size.
>>> + *
>>> + */
>>> +struct tpacket_memreg_req {
>>> + unsigned long addr; /* Start of packet data area */
>>> + unsigned long len; /* Length of packet data area */
>>> + unsigned int frame_size; /* Frame size */
>>> + unsigned int data_headroom; /* Frame head room */
>>> +};
>>
>> Existing packet sockets take a tpacket_req, allocate memory and let the
>> user process mmap this. I understand that TPACKET_V4 distinguishes
>> the descriptor from packet pools, but could both use the existing structs
>> and logic (packet_mmap)? That would avoid introducing a lot of new code
>> just for granting user pages to the kernel.
>>
>
> We could certainly pass the "tpacket_memreg_req" fields as part of
> descriptor ring setup ("tpacket_req4"), but we went with having the
> memory register as a new separate setsockopt. Having it separated,
> makes it easier to compare regions at the kernel side of things. "Is
> this the same umem as another one?" If we go the path of passing the
> range at descriptor ring setup, we need to handle all kind of
> overlapping ranges to determine when a copy is needed or not, in those
> cases where the packet buffer (i.e. umem) is shared between processes.
Is there a reason to use separate packet socket for umem? Looks like
userspace has to create separate packet socket for PACKET_MEMREG.
-Tushar>
>> Also, use of unsigned long can cause problems on 32/64 bit compat
>> environments. Prefer fixed width types in uapi. Same for pointer in
>> tpacket4_queue.
>
> I agree; We'll change to a fixed width type in next version. Do you
> (and others on the list) prefer __u32/__u64 or unsigned int / unsigned
> long long?
>
>
> Thanks,
> Björn
>
Powered by blists - more mailing lists