Message-ID: <CAADnVQ+_O+kwTV-qhXqA9jc-L3w6uwn9FShG_859qx30NPkzsw@mail.gmail.com>
Date: Fri, 7 Mar 2025 09:29:07 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Arthur Fabre <arthur@...hurfabre.com>
Cc: Network Development <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>, 
	Jakub Sitnicki <jakub@...udflare.com>, Jesper Dangaard Brouer <hawk@...nel.org>, Yan Zhai <yan@...udflare.com>, 
	jbrandeburg@...udflare.com, Toke Høiland-Jørgensen <thoiland@...hat.com>, 
	lbiancon@...hat.com, Arthur Fabre <afabre@...udflare.com>
Subject: Re: [PATCH RFC bpf-next 01/20] trait: limited KV store for packet metadata

On Fri, Mar 7, 2025 at 3:14 AM Arthur Fabre <arthur@...hurfabre.com> wrote:
>
> On Fri Mar 7, 2025 at 7:36 AM CET, Alexei Starovoitov wrote:
> > On Wed, Mar 5, 2025 at 6:33 AM <arthur@...hurfabre.com> wrote:
> > >
> > > +struct __trait_hdr {
> > > +       /* Values are stored ordered by key, immediately after the header.
> > > +        *
> > > +        * The size of each value set is stored in the header as two bits:
> > > +        *  - 00: Not set.
> > > +        *  - 01: 2 bytes.
> > > +        *  - 10: 4 bytes.
> > > +        *  - 11: 8 bytes.
> >
> > ...
> >
> > > +        *  - hweight(low) + hweight(high)<<1 is offset.
> >
> > the comment doesn't match the code
> >
> > > +        */
> > > +       u64 high;
> > > +       u64 low;
> >
> > ...
> >
> > > +static __always_inline int __trait_total_length(struct __trait_hdr h)
> > > +{
> > > +       return (hweight64(h.low) << 1) + (hweight64(h.high) << 2)
> > > +               // For size 8, we only get 4+2=6. Add another 2 in.
> > > +               + (hweight64(h.high & h.low) << 1);
> > > +}
> >
> > This is a really cool idea, but a 2-byte size doesn't feel that useful.
> > How about:
> > - 00: Not set.
> > - 01: 4 bytes.
> > - 10: 8 bytes.
> > - 11: 16 bytes.
> >
> > 4 bytes may be useful for ipv4, 16 for ipv6, and 8 is just a good number.
> > And compute the same way with 3 popcounts, with an extra +1 to the shifts.
>
> I chose the sizes arbitrarily, happy to change them.
>
> 16 is also useful for UUIDs, for tracing.
>
> Size 0 could store bools / flags. Keys could be set without a value,
> and users could check if the key is set or not.
> That replaces single bits of the mark today, for example a
> "route locally" key.

I don't understand how that would work.
If I'm reading the code correctly, 00 means the key is not present.
How would one differentiate present from not present for a key with a
zero-size value?
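To illustrate with an untested sketch against the patch's header layout
(__trait_is_set() is hypothetical, not from the patch): presence is
derived from a non-zero two-bit size code, so there is no spare encoding
left for "present, size 0":

/* A key is considered set iff its two-bit size code is non-zero,
 * so code 00 cannot double as "present, zero-size flag".
 */
static __always_inline bool __trait_is_set(struct __trait_hdr h, u64 key)
{
	u64 mask = 1ull << key;

	/* both bits clear (00) is indistinguishable from "never set" */
	return ((h.low | h.high) & mask) != 0;
}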

>
> That only leaves one other size, maybe 4 for smaller values?
>
> If we want more sizes, we could also:
>
> - Add another u64 word to the header, so we have 3 bits per key. It
>   uses more room, and we need more popcnts, but most modern x86 CPUs can
>   do 3 popcnts in parallel so it could be ok.

Two u64s already need 3 pop counts, so it's a good compromise as-is.
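For the 4/8/16 sizes suggested above, an untested sketch (not the
patch's code) keeps the same three popcounts and just bumps each shift
by one:

/* 01 = 4 bytes, 10 = 8 bytes, 11 = 16 bytes */
static __always_inline int __trait_total_length(struct __trait_hdr h)
{
	return (hweight64(h.low) << 2) + (hweight64(h.high) << 3)
		/* For size 16, we only get 8+4=12. Add another 4 in. */
		+ (hweight64(h.high & h.low) << 2);
}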

> - Let users set consecutive keys to one big value. Instead of supporting
>   size 16, we let them set two 8 byte KVs in one trait_set() call and
>   provide a 16 byte value. Eg:
>
>         trait_set_batch(u64 key_from, u64 key_to, size, ...);
>
>   It's easy to implement, but it makes the API more complicated.

I don't think it complicates the API.
With max size 16 the user can put two consecutive keys of, say, 16 and 8
bytes to make 24 bytes of info,
or use 4 keys of 16 bytes each to form 64 bytes of info.
The bit manipulations are too tricky for compilers to optimize.
So even with full inlining, two trait_set() calls on consecutive keys
will still compile to largely separate blobs of code.
So trait_[gs]et_batch() makes sense to me.
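Roughly, with a hypothetical signature along the lines of Arthur's
sketch above (trait_set_batch() doesn't exist in the patch):

	/* spread one 32-byte value across two consecutive 16-byte keys */
	u8 blob[32];

	trait_set_batch(/* key_from */ 4, /* key_to */ 5, sizeof(blob), blob);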

Also, let's not over-focus on networking use cases.
This mini KV store will be useful in all bpf maps, including local storage.
For example, a user process can add information about itself into
task local storage, while sched-ext can use that to make scheduling decisions.
