Message-ID: <fb888d27-825c-37c9-128c-b67843777e32@redhat.com>
Date:   Tue, 1 Nov 2022 15:23:26 +0100
From:   Jesper Dangaard Brouer <jbrouer@...hat.com>
To:     Stanislav Fomichev <sdf@...gle.com>, Yonghong Song <yhs@...a.com>
Cc:     brouer@...hat.com,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        "Bezdeka, Florian" <florian.bezdeka@...mens.com>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "john.fastabend@...il.com" <john.fastabend@...il.com>,
        "alexandr.lobakin@...el.com" <alexandr.lobakin@...el.com>,
        "anatoly.burakov@...el.com" <anatoly.burakov@...el.com>,
        "song@...nel.org" <song@...nel.org>,
        "Deric, Nemanja" <nemanja.deric@...mens.com>,
        "andrii@...nel.org" <andrii@...nel.org>,
        "Kiszka, Jan" <jan.kiszka@...mens.com>,
        "magnus.karlsson@...il.com" <magnus.karlsson@...il.com>,
        "willemb@...gle.com" <willemb@...gle.com>,
        "ast@...nel.org" <ast@...nel.org>, "yhs@...com" <yhs@...com>,
        "martin.lau@...ux.dev" <martin.lau@...ux.dev>,
        "kpsingh@...nel.org" <kpsingh@...nel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        "mtahhan@...hat.com" <mtahhan@...hat.com>,
        "xdp-hints@...-project.net" <xdp-hints@...-project.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "jolsa@...nel.org" <jolsa@...nel.org>,
        "haoluo@...gle.com" <haoluo@...gle.com>
Subject: Re: [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs



On 31/10/2022 23.55, Stanislav Fomichev wrote:
> On Mon, Oct 31, 2022 at 3:38 PM Yonghong Song <yhs@...a.com> wrote:
>>
>> On 10/31/22 3:09 PM, Stanislav Fomichev wrote:
>>> On Mon, Oct 31, 2022 at 12:36 PM Yonghong Song <yhs@...a.com> wrote:
>>>>
>>>> On 10/31/22 8:28 AM, Toke Høiland-Jørgensen wrote:
>>>>> "Bezdeka, Florian" <florian.bezdeka@...mens.com> writes:
>>>>>>
>>>>>> On Fri, 2022-10-28 at 18:14 -0700, Jakub Kicinski wrote:
>>>>>>> On Fri, 28 Oct 2022 16:16:17 -0700 John Fastabend wrote:
[...]
>>>>>> All parts of my application (BPF program included) should not be
>>>>>> optimized/adjusted for all the different HW variants out there.
>>>>> Yes, absolutely agreed. Abstracting away those kinds of hardware
>>>>> differences is the whole *point* of having an OS/driver model. I.e.,
>>>>> it's what the kernel is there for! If people want to bypass that and get
>>>>> direct access to the hardware, they can already do that by using DPDK.
>>>>>
>>>>> So in other words, 100% agreed that we should not expect the BPF
>>>>> developers to deal with hardware details as would be required with a
>>>>> kptr-based interface.
>>>>>
>>>>> As for the kfunc-based interface, I think it shows some promise.
>>>>> Exposing a list of function names to retrieve individual metadata items
>>>>> instead of a struct layout is sorta comparable in terms of developer UI
>>>>> accessibility etc (IMO).
>>>>
>>>> Looks like there are quite some use cases for hw_timestamp.
>>>> Do you think we could add it to the uapi like struct xdp_md?
>>>>
>>>> The following is the current xdp_md:
>>>> struct xdp_md {
>>>>            __u32 data;
>>>>            __u32 data_end;
>>>>            __u32 data_meta;
>>>>            /* Below access go through struct xdp_rxq_info */
>>>>            __u32 ingress_ifindex; /* rxq->dev->ifindex */
>>>>            __u32 rx_queue_index;  /* rxq->queue_index  */
>>>>
>>>>            __u32 egress_ifindex;  /* txq->dev->ifindex */
>>>> };
>>>>
>>>> We could add  __u64 hw_timestamp to the xdp_md so user
>>>> can just do xdp_md->hw_timestamp to get the value.
>>>> xdp_md->hw_timestamp == 0 means hw_timestamp is not
>>>> available.
>>>>
>>>> Inside the kernel, the ctx rewriter can generate code
>>>> to call driver specific function to retrieve the data.
>>> If the driver generates the code to retrieve the data, how's that
>>> different from the kfunc approach?
>>> The only difference I see is that it would be a more strong UAPI than
>>> the kfuncs?
>> Right. it is a strong uapi.
>>
>>>> The kfunc approach can be used to *less* common use cases?
>>> What's the advantage of having two approaches when one can cover
>>> common and uncommon cases?
>>
>> Beyond hw_timestamp, do we have any other fields ready to support?
>>
>> If it ends up with lots of fields to be accessed by the bpf program,
>> and bpf program actually intends to access these fields,
>> using a strong uapi might be a good thing as it can make code
>> much streamlined.
>
> There are a bunch. Alexander's series has a good list:
> 
> https://github.com/alobakin/linux/commit/31bfe8035c995fdf4f1e378b3429d24b96846cc8
> 

Below are the fields I've identified, which are close to what Alexander 
also found.

  struct xdp_hints_common {
	union {
		__wsum		csum;
		struct {
			__u16	csum_start;
			__u16	csum_offset;
		};
	};
	u16 rx_queue;
	u16 vlan_tci;
	u32 rx_hash32;
	u32 xdp_hints_flags;
	u64 btf_full_id; /* BTF object + type ID */
  } __attribute__((aligned(4))) __attribute__((packed));

Some of the fields are encoded via flags:

  enum xdp_hints_flags {
	HINT_FLAG_CSUM_TYPE_BIT0  = BIT(0),
	HINT_FLAG_CSUM_TYPE_BIT1  = BIT(1),
	HINT_FLAG_CSUM_TYPE_MASK  = 0x3,

	HINT_FLAG_CSUM_LEVEL_BIT0 = BIT(2),
	HINT_FLAG_CSUM_LEVEL_BIT1 = BIT(3),
	HINT_FLAG_CSUM_LEVEL_MASK = 0xC,
	HINT_FLAG_CSUM_LEVEL_SHIFT = 2,

	HINT_FLAG_RX_HASH_TYPE_BIT0 = BIT(4),
	HINT_FLAG_RX_HASH_TYPE_BIT1 = BIT(5),
	HINT_FLAG_RX_HASH_TYPE_MASK = 0x30,
	HINT_FLAG_RX_HASH_TYPE_SHIFT = 0x4,

	HINT_FLAG_RX_QUEUE = BIT(7),

	HINT_FLAG_VLAN_PRESENT            = BIT(8),
	HINT_FLAG_VLAN_PROTO_ETH_P_8021Q  = BIT(9),
	HINT_FLAG_VLAN_PROTO_ETH_P_8021AD = BIT(10),
	/* Flags from BIT(16) can be used by drivers */
  };
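
As a rough illustration of how a consumer could unpack these flag bits, here
is a minimal userspace sketch. The helper names (hint_csum_type() etc.) and
the local BIT() macro are my own assumptions for this example and are not
part of any patchset; only the mask/shift constants are copied from the enum
above. What each decoded value *means* is deliberately left open here.

```c
#include <stdint.h>
#include <stdbool.h>

#define BIT(n) (1U << (n))

/* Subset of enum xdp_hints_flags, constants copied from the enum above */
enum xdp_hints_flags {
	HINT_FLAG_CSUM_TYPE_MASK     = 0x3,
	HINT_FLAG_CSUM_LEVEL_MASK    = 0xC,
	HINT_FLAG_CSUM_LEVEL_SHIFT   = 2,
	HINT_FLAG_RX_HASH_TYPE_MASK  = 0x30,
	HINT_FLAG_RX_HASH_TYPE_SHIFT = 0x4,
	HINT_FLAG_RX_QUEUE           = BIT(7),
	HINT_FLAG_VLAN_PRESENT       = BIT(8),
};

/* Hypothetical helpers: extract each sub-field from xdp_hints_flags */
static inline uint32_t hint_csum_type(uint32_t flags)
{
	/* CSUM type lives in bits 0-1, so no shift is needed */
	return flags & HINT_FLAG_CSUM_TYPE_MASK;
}

static inline uint32_t hint_csum_level(uint32_t flags)
{
	return (flags & HINT_FLAG_CSUM_LEVEL_MASK) >> HINT_FLAG_CSUM_LEVEL_SHIFT;
}

static inline uint32_t hint_rx_hash_type(uint32_t flags)
{
	return (flags & HINT_FLAG_RX_HASH_TYPE_MASK) >> HINT_FLAG_RX_HASH_TYPE_SHIFT;
}

static inline bool hint_rx_queue_valid(uint32_t flags)
{
	return (flags & HINT_FLAG_RX_QUEUE) != 0;
}

static inline bool hint_vlan_present(uint32_t flags)
{
	return (flags & HINT_FLAG_VLAN_PRESENT) != 0;
}
```

Note that this reduced form only leaves two bits for the hash type, so a
decoder like hint_rx_hash_type() can distinguish at most four values.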

> We can definitely call some of them more "common" than the others, but
> not sure how strong of a definition that would be.

The important fields that would be worth considering as UAPI candidates
are: (1) RX-hash, (2) Hash-type and (3) RX-checksum.
With these three we can avoid calling the flow-dissector, and GRO frame
aggregation works. (Lacking them currently hurts xdp_frame to SKB
performance a lot in practice).

*BUT* in its current form above (incl. Alexander's approach/patch) it
would be a mistake to UAPI standardize the "(2) Hash-type" in this
simplified "reduced" form (which is what the SKB "needs").

There is a huge untapped potential in the Hash-type.  Thanks to
Microsoft almost all NIC hardware provides a Hash-type that gives us the
L3-protocol (IPv4 or IPv6) and the L4-protocol (UDP or TCP and sometimes
SCTP), plus info on whether extension-headers are present. (Digging in
datasheets, we can often also get the header-size).

Think about how many cycles an XDP BPF-prog can save on parsing protocol
headers.  I'm also hoping we can leverage this to allow SKBs created
from an xdp_frame to have skb->transport_header and skb->network_header
pre-populated (and skip some of these netstack layers).
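
To make the idea concrete, here is a sketch of how a richer hash-type could
let a program locate the L4 header without walking the packet. Everything
named hyp_* below is hypothetical: the struct, the enums and the assumption
that hardware reports the L3 header size are invented for this example and
do not describe any existing driver or descriptor format. It also assumes an
untagged Ethernet frame (14-byte header).

```c
#include <stdint.h>

/* HYPOTHETICAL decoded hash-type info; not an existing kernel structure */
enum hyp_l3 { HYP_L3_NONE = 0, HYP_L3_IPV4, HYP_L3_IPV6 };
enum hyp_l4 { HYP_L4_NONE = 0, HYP_L4_TCP, HYP_L4_UDP, HYP_L4_SCTP };

struct hyp_rx_hint {
	uint8_t l3;         /* enum hyp_l3 */
	uint8_t l4;         /* enum hyp_l4 */
	uint8_t l3_hdr_len; /* bytes; 0 when hardware did not report it */
};

#define HYP_ETH_HLEN 14 /* assumes no VLAN tag in the frame */

/*
 * Return the L4 header offset from the start of the frame, or -1 when the
 * hint is insufficient and the caller must fall back to normal parsing.
 */
static inline int hyp_l4_offset(const struct hyp_rx_hint *h)
{
	if (h->l3 == HYP_L3_NONE || h->l4 == HYP_L4_NONE || h->l3_hdr_len == 0)
		return -1;
	return HYP_ETH_HLEN + h->l3_hdr_len;
}
```

The same two offsets (HYP_ETH_HLEN for the network header, the return value
for the transport header) are what a later SKB conversion step would need.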

--Jesper

p.s. in my patchset, I exposed the "raw" Hash-type bits from the 
descriptor in hope this would evolve.
