[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fb888d27-825c-37c9-128c-b67843777e32@redhat.com>
Date: Tue, 1 Nov 2022 15:23:26 +0100
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: Stanislav Fomichev <sdf@...gle.com>, Yonghong Song <yhs@...a.com>
Cc: brouer@...hat.com,
Toke Høiland-Jørgensen <toke@...hat.com>,
"Bezdeka, Florian" <florian.bezdeka@...mens.com>,
"kuba@...nel.org" <kuba@...nel.org>,
"john.fastabend@...il.com" <john.fastabend@...il.com>,
"alexandr.lobakin@...el.com" <alexandr.lobakin@...el.com>,
"anatoly.burakov@...el.com" <anatoly.burakov@...el.com>,
"song@...nel.org" <song@...nel.org>,
"Deric, Nemanja" <nemanja.deric@...mens.com>,
"andrii@...nel.org" <andrii@...nel.org>,
"Kiszka, Jan" <jan.kiszka@...mens.com>,
"magnus.karlsson@...il.com" <magnus.karlsson@...il.com>,
"willemb@...gle.com" <willemb@...gle.com>,
"ast@...nel.org" <ast@...nel.org>, "yhs@...com" <yhs@...com>,
"martin.lau@...ux.dev" <martin.lau@...ux.dev>,
"kpsingh@...nel.org" <kpsingh@...nel.org>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"mtahhan@...hat.com" <mtahhan@...hat.com>,
"xdp-hints@...-project.net" <xdp-hints@...-project.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"jolsa@...nel.org" <jolsa@...nel.org>,
"haoluo@...gle.com" <haoluo@...gle.com>
Subject: Re: [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs
On 31/10/2022 23.55, Stanislav Fomichev wrote:
> On Mon, Oct 31, 2022 at 3:38 PM Yonghong Song<yhs@...a.com> wrote:
>>
>> On 10/31/22 3:09 PM, Stanislav Fomichev wrote:
>>> On Mon, Oct 31, 2022 at 12:36 PM Yonghong Song<yhs@...a.com> wrote:
>>>>
>>>> On 10/31/22 8:28 AM, Toke Høiland-Jørgensen wrote:
>>>>> "Bezdeka, Florian"<florian.bezdeka@...mens.com> writes:
>>>>>>
>>>>>> On Fri, 2022-10-28 at 18:14 -0700, Jakub Kicinski wrote:
>>>>>>> On Fri, 28 Oct 2022 16:16:17 -0700 John Fastabend wrote:
[...]
>>>>>> All parts of my application (BPF program included) should not be
>>>>>> optimized/adjusted for all the different HW variants out there.
>>>>> Yes, absolutely agreed. Abstracting away those kinds of hardware
>>>>> differences is the whole*point* of having an OS/driver model. I.e.,
>>>>> it's what the kernel is there for! If people want to bypass that and get
>>>>> direct access to the hardware, they can already do that by using DPDK.
>>>>>
>>>>> So in other words, 100% agreed that we should not expect the BPF
>>>>> developers to deal with hardware details as would be required with a
>>>>> kptr-based interface.
>>>>>
>>>>> As for the kfunc-based interface, I think it shows some promise.
>>>>> Exposing a list of function names to retrieve individual metadata items
>>>>> instead of a struct layout is sorta comparable in terms of developer UI
>>>>> accessibility etc (IMO).
>>>> >>>> Looks like there are quite some use cases for hw_timestamp.
>>>> Do you think we could add it to the uapi like struct xdp_md?
>>>>
>>>> The following is the current xdp_md:
>>>> struct xdp_md {
>>>> __u32 data;
>>>> __u32 data_end;
>>>> __u32 data_meta;
>>>> /* Below access go through struct xdp_rxq_info */
>>>> __u32 ingress_ifindex; /* rxq->dev->ifindex */
>>>> __u32 rx_queue_index; /* rxq->queue_index */
>>>>
>>>> __u32 egress_ifindex; /* txq->dev->ifindex */
>>>> };
>>>>
>>>> We could add __u64 hw_timestamp to the xdp_md so user
>>>> can just do xdp_md->hw_timestamp to get the value.
>>>> xdp_md->hw_timestamp == 0 means hw_timestamp is not
>>>> available.
>>>>
>>>> Inside the kernel, the ctx rewriter can generate code
>>>> to call driver specific function to retrieve the data.
>>> If the driver generates the code to retrieve the data, how's that
>>> different from the kfunc approach?
>>> The only difference I see is that it would be a more strong UAPI than
>>> the kfuncs?
>> Right. it is a strong uapi.
>>
>>>> The kfunc approach can be used to*less* common use cases?
>>> What's the advantage of having two approaches when one can cover
>>> common and uncommon cases?
>>
>> Beyond hw_timestamp, do we have any other fields ready to support?
>>
>> If it ends up with lots of fields to be accessed by the bpf program,
>> and bpf program actually intends to access these fields,
>> using a strong uapi might be a good thing as it can make code
>> much streamlined.
> > There are a bunch. Alexander's series has a good list:
>
> https://github.com/alobakin/linux/commit/31bfe8035c995fdf4f1e378b3429d24b96846cc8
>
Below are the fields I've identified, which are close to what Alexander
also found.
struct xdp_hints_common {
union {
__wsum csum;
struct {
__u16 csum_start;
__u16 csum_offset;
};
};
u16 rx_queue;
u16 vlan_tci;
u32 rx_hash32;
u32 xdp_hints_flags;
u64 btf_full_id; /* BTF object + type ID */
} __attribute__((aligned(4))) __attribute__((packed));
Some of the fields are encoded via flags:
enum xdp_hints_flags {
HINT_FLAG_CSUM_TYPE_BIT0 = BIT(0),
HINT_FLAG_CSUM_TYPE_BIT1 = BIT(1),
HINT_FLAG_CSUM_TYPE_MASK = 0x3,
HINT_FLAG_CSUM_LEVEL_BIT0 = BIT(2),
HINT_FLAG_CSUM_LEVEL_BIT1 = BIT(3),
HINT_FLAG_CSUM_LEVEL_MASK = 0xC,
HINT_FLAG_CSUM_LEVEL_SHIFT = 2,
HINT_FLAG_RX_HASH_TYPE_BIT0 = BIT(4),
HINT_FLAG_RX_HASH_TYPE_BIT1 = BIT(5),
HINT_FLAG_RX_HASH_TYPE_MASK = 0x30,
HINT_FLAG_RX_HASH_TYPE_SHIFT = 0x4,
HINT_FLAG_RX_QUEUE = BIT(7),
HINT_FLAG_VLAN_PRESENT = BIT(8),
HINT_FLAG_VLAN_PROTO_ETH_P_8021Q = BIT(9),
HINT_FLAG_VLAN_PROTO_ETH_P_8021AD = BIT(10),
/* Flags from BIT(16) can be used by drivers */
};
> We can definitely call some of them more "common" than the others, but
> not sure how strong of a definition that would be.
The important fields that would be worth considering as UAPI candidates
are: (1) RX-hash, (2) Hash-type and (3) RX-checksum.
With these three we can avoid calling the flow-dissector and GRO frame
aggregations works. (This currently hurts xdp_frame to SKB performance a
lot in practice).
*BUT* in it's current form above (incl. Alexanders approach/patch) it
would be a mistake to UAPI standardize the "(2) Hash-type" in this
simplified "reduced" form (which is what the SKB "needs").
There is a huge untapped potential in the Hash-type. Thanks to
Microsoft almost all NIC hardware provided a Hash-type that gives us the
L3-protocol (IPv4 or IPv6) and the L4-protocol (UDP or TCP and sometimes
SCTP), plus info if extention-headers are provided. (Digging in
datasheets, we can often also get the header-size).
Think about how many cycles XDP BPF-prog can save parsing protocol
headers. I'm also hoping we can leveregate this to allow SKBs created
from an xdp_frame to have skb->transport_header and skb->network_header
pre-populated (and skip some of these netstack layers).
--Jesper
p.s. in my patchset, I exposed the "raw" Hash-type bits from the
descriptor in hope this would evolve.
Powered by blists - more mailing lists