lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <877ciz0w2m.fsf@toke.dk>
Date: Tue, 20 Feb 2024 20:33:53 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Paolo Abeni <pabeni@...hat.com>, Daniel Borkmann <daniel@...earbox.net>,
 Alexei Starovoitov <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>,
 Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman
 <eddyz87@...il.com>, Song Liu <song@...nel.org>, Yonghong Song
 <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>, KP
 Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...gle.com>, Hao Luo
 <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, "David S. Miller"
 <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, Jesper Dangaard
 Brouer <hawk@...nel.org>
Cc: Alexander Lobakin <aleksander.lobakin@...el.com>, Eric Dumazet
 <edumazet@...gle.com>, bpf@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH net-next 2/3] bpf: test_run: Use system page pool for
 XDP live frame mode

Paolo Abeni <pabeni@...hat.com> writes:

> On Tue, 2024-02-20 at 14:14 +0100, Toke Høiland-Jørgensen wrote:
>> Paolo Abeni <pabeni@...hat.com> writes:
>> 
>> > On Tue, 2024-02-20 at 10:06 +0100, Daniel Borkmann wrote:
>> > > On 2/15/24 2:26 PM, Toke Høiland-Jørgensen wrote:
>> > > > The BPF_TEST_RUN code in XDP live frame mode creates a new page pool
>> > > > each time it is called and uses that to allocate the frames used for the
>> > > > XDP run. This works well if the syscall is used with a high repetitions
>> > > > number, as it allows for efficient page recycling. However, if used with
>> > > > a small number of repetitions, the overhead of creating and tearing down
>> > > > the page pool is significant, and can even lead to system stalls if the
>> > > > syscall is called in a tight loop.
>> > > > 
>> > > > Now that we have a persistent system page pool instance, it becomes
>> > > > pretty straight forward to change the test_run code to use it. The only
>> > > > wrinkle is that we can no longer rely on a custom page init callback
>> > > > from page_pool itself; instead, we change the test_run code to write a
>> > > > random cookie value to the beginning of the page as an indicator that
>> > > > the page has been initialised and can be re-used without copying the
>> > > > initial data again.
>> > > > 
>> > > > Signed-off-by: Toke Høiland-Jørgensen <toke@...hat.com>
>> > > 
>> > > [...]
>> > > > -
>> > > >   	/* We create a 'fake' RXQ referencing the original dev, but with an
>> > > >   	 * xdp_mem_info pointing to our page_pool
>> > > >   	 */
>> > > >   	xdp_rxq_info_reg(&xdp->rxq, orig_ctx->rxq->dev, 0, 0);
>> > > > -	xdp->rxq.mem.type = MEM_TYPE_PAGE_POOL;
>> > > > -	xdp->rxq.mem.id = pp->xdp_mem_id;
>> > > > +	xdp->rxq.mem.type = MEM_TYPE_PAGE_POOL; /* mem id is set per-frame below */
>> > > >   	xdp->dev = orig_ctx->rxq->dev;
>> > > >   	xdp->orig_ctx = orig_ctx;
>> > > >   
>> > > > +	/* We need a random cookie for each run as pages can stick around
>> > > > +	 * between runs in the system page pool
>> > > > +	 */
>> > > > +	get_random_bytes(&xdp->cookie, sizeof(xdp->cookie));
>> > > > +
>> > > 
>> > > So the assumption is that there is only a tiny chance of collisions with
>> > > users outside of xdp test_run. If they do collide however, you'd leak data.
>> > 
>> > Good point. @Toke: what is the worst-case thing that could happen in
>> > case a page is recycled from another pool's user?
>> > 
>> > could we possibly end-up matching the cookie for a page containing
>> > 'random' orig_ctx/ctx, so that bpf program later tries to access
>> > equally random ptrs?
>> 
>> Well, yes, if there's a collision in the cookie value we'll end up
>> basically dereferencing garbage pointer values, with all the badness
>> that ensues (most likely just a crash, but system compromise is probably
>> also possible in such a case).
>> 
>> A 64-bit value is probably too small to be resistant against random
>> collisions in a "protect global data across the internet" type scenario
>> (for instance, a 64-bit cryptographic key is considered weak). However,
>> in this case the collision domain is only for the lifetime of the
>> running system, and each cookie value only stays valid for the duration
>> of a single syscall (seconds, at most), so I figured it was acceptable.
>> 
>> We could exclude all-zeros as a valid cookie value (and also anything
>> that looks as a valid pointer), but that only removes a few of the
>> possible random collision values, so if we're really worried about
>> random collisions of 64-bit numbers, I think a better approach would be
>> to just make the cookie a 128-bit value instead. I can respin with that
>> if you prefer? :)
>
> I must admit that merging a code that will allow trashing the kernel -
> even with a very low probability - is quite scaring to me.
>
> How much relevant is the recycle case optimization? Could removing
> completely that optimization be considered?

Did a quick test of this, and skipping the recycling eats ~12.5%
performance, so I don't think getting rid of it is a good solution.

However, increasing the cookie size to 128 bits makes no performance
difference (everything stays in the same cache lines). If we do that,
the collision probability enters "won't happen before the heat death of
the universe" territory, so I don't think there's any real concern that
this will happen.

I'll respin with the bigger cookie size (and add that patch Olek
suggested to remove the init callback from the page pool code).

-Toke


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ