Message-ID: <83e7faeb4a241a00053fc71dbb18d1dbc7c0fac6.camel@redhat.com>
Date: Tue, 20 Feb 2024 15:57:04 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>, Daniel
Borkmann <daniel@...earbox.net>, Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau
<martin.lau@...ux.dev>, Eduard Zingerman <eddyz87@...il.com>, Song Liu
<song@...nel.org>, Yonghong Song <yonghong.song@...ux.dev>, John Fastabend
<john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>, Stanislav
Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>, Jiri Olsa
<jolsa@...nel.org>, "David S. Miller" <davem@...emloft.net>, Jakub Kicinski
<kuba@...nel.org>, Jesper Dangaard Brouer <hawk@...nel.org>
Cc: Alexander Lobakin <aleksander.lobakin@...el.com>, Eric Dumazet
<edumazet@...gle.com>, bpf@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH net-next 2/3] bpf: test_run: Use system page pool for
XDP live frame mode
On Tue, 2024-02-20 at 14:14 +0100, Toke Høiland-Jørgensen wrote:
> Paolo Abeni <pabeni@...hat.com> writes:
>
> > On Tue, 2024-02-20 at 10:06 +0100, Daniel Borkmann wrote:
> > > On 2/15/24 2:26 PM, Toke Høiland-Jørgensen wrote:
> > > > The BPF_TEST_RUN code in XDP live frame mode creates a new page pool
> > > > each time it is called and uses that to allocate the frames used for the
> > > > XDP run. This works well if the syscall is used with a high repetitions
> > > > number, as it allows for efficient page recycling. However, if used with
> > > > a small number of repetitions, the overhead of creating and tearing down
> > > > the page pool is significant, and can even lead to system stalls if the
> > > > syscall is called in a tight loop.
> > > >
> > > > Now that we have a persistent system page pool instance, it becomes
> > > > pretty straight forward to change the test_run code to use it. The only
> > > > wrinkle is that we can no longer rely on a custom page init callback
> > > > from page_pool itself; instead, we change the test_run code to write a
> > > > random cookie value to the beginning of the page as an indicator that
> > > > the page has been initialised and can be re-used without copying the
> > > > initial data again.
> > > >
> > > > Signed-off-by: Toke Høiland-Jørgensen <toke@...hat.com>
> > >
> > > [...]
> > > > -
> > > > /* We create a 'fake' RXQ referencing the original dev, but with an
> > > > * xdp_mem_info pointing to our page_pool
> > > > */
> > > > xdp_rxq_info_reg(&xdp->rxq, orig_ctx->rxq->dev, 0, 0);
> > > > - xdp->rxq.mem.type = MEM_TYPE_PAGE_POOL;
> > > > - xdp->rxq.mem.id = pp->xdp_mem_id;
> > > > + xdp->rxq.mem.type = MEM_TYPE_PAGE_POOL; /* mem id is set per-frame below */
> > > > xdp->dev = orig_ctx->rxq->dev;
> > > > xdp->orig_ctx = orig_ctx;
> > > >
> > > > + /* We need a random cookie for each run as pages can stick around
> > > > + * between runs in the system page pool
> > > > + */
> > > > + get_random_bytes(&xdp->cookie, sizeof(xdp->cookie));
> > > > +
> > >
> > > So the assumption is that there is only a tiny chance of collisions with
> > > users outside of xdp test_run. If they do collide however, you'd leak data.
> >
> > Good point. @Toke: what is the worst-case thing that could happen in
> > case a page is recycled from another pool's user?
> >
> > could we possibly end-up matching the cookie for a page containing
> > 'random' orig_ctx/ctx, so that bpf program later tries to access
> > equally random ptrs?
>
> Well, yes, if there's a collision in the cookie value we'll end up
> basically dereferencing garbage pointer values, with all the badness
> that ensues (most likely just a crash, but system compromise is probably
> also possible in such a case).
>
> A 64-bit value is probably too small to be resistant against random
> collisions in a "protect global data across the internet" type scenario
> (for instance, a 64-bit cryptographic key is considered weak). However,
> in this case the collision domain is only for the lifetime of the
> running system, and each cookie value only stays valid for the duration
> of a single syscall (seconds, at most), so I figured it was acceptable.
>
> We could exclude all-zeros as a valid cookie value (and also anything
> that looks as a valid pointer), but that only removes a few of the
> possible random collision values, so if we're really worried about
> random collisions of 64-bit numbers, I think a better approach would be
> to just make the cookie a 128-bit value instead. I can respin with that
> if you prefer? :)
I must admit that merging code that could allow trashing the kernel,
even with a very low probability, is quite scary to me.

How relevant is the recycle-case optimization? Could removing that
optimization completely be considered?

Thanks!

Paolo
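
For readers following the thread, here is a minimal userspace sketch of the kind of cookie check being discussed. It is not the code from the patch: `struct page_head`, `prepare_page()` and `PAGE_SZ` are hypothetical names chosen for illustration, and the layout is an assumption. It only shows why a stale page whose first bytes happen to equal the current cookie would be trusted without being re-initialised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 4096 /* hypothetical page size used for the sketch */

/* Hypothetical header at the start of each page (assumed layout). */
struct page_head {
	uint64_t cookie; /* per-run random value marking "initialised" */
	void *orig_ctx;  /* pointer that is only meaningful after init */
};

/*
 * Initialise the page unless it already carries this run's cookie.
 * Returns true if the page was (re)initialised, false if its existing
 * contents were trusted as-is.
 */
static bool prepare_page(void *page, uint64_t run_cookie, void *orig_ctx)
{
	struct page_head head;

	assert(page != NULL);
	memcpy(&head, page, sizeof(head));

	if (head.cookie == run_cookie)
		return false; /* cookie matches: contents are NOT rewritten */

	head.cookie = run_cookie;
	head.orig_ctx = orig_ctx;
	memcpy(page, &head, sizeof(head));
	return true;
}
```

In this sketch, a page recycled from an unrelated user that happens to start with the same 64-bit value takes the early-return path, so whatever follows the cookie is later treated as a valid pointer. That is the collision case raised above; dropping the recycle shortcut would amount to always taking the initialisation path.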