Message-ID: <aEBPF5wkOqYIUhOl@boxer>
Date: Wed, 4 Jun 2025 15:50:15 +0200
From: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
To: Eryk Kubanski <e.kubanski@...tner.samsung.com>
CC: Stanislav Fomichev <stfomichev@...il.com>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "bjorn@...nel.org" <bjorn@...nel.org>,
	"magnus.karlsson@...el.com" <magnus.karlsson@...el.com>,
	"jonathan.lemon@...il.com" <jonathan.lemon@...il.com>
Subject: Re: Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in
 __xsk_generic_xmit()

On Mon, Jun 02, 2025 at 06:18:57PM +0200, Eryk Kubanski wrote:
> > Eryk, can you tell us a bit more about HW you're using? The problem you
> > described simply can not happen for HW with in-order completions. You
> > can't complete descriptor from slot 5 without going through completion of
> > slot 3. So our assumption is you're using HW with out-of-order
> > completions, correct?
> 
> Maciej, this isn't reproduced on any hardware.
> I found this bug while working on generic AF_XDP.
> 
> We're using a MACVLAN deployment where two or more
> sockets share a single MACVLAN device queue.
> It doesn't even need to go out of the host...
> 
> The SKB doesn't even need to complete in this case
> to observe this bug. It's enough if an earlier writer
> just fails after the descriptor write. This case is
> written in my diagram, Notes 5).

Thanks for shedding a bit more light on it. In the future it would be nice
if you could come up with a reproducer for the bug that others could use
on their side. An overview of your deployment from the beginning would
also help people understand the issue :)

> 
> Are you sure that __dev_direct_xmit will keep
> the packets on the same thread? What about
> NAPI, XPS, IRQs, etc?
> 
> If sendmsg() is issued by two threads, you don't
> know which one will complete faster. You can still
> have out-of-order completion in relation to
> the descriptor CQ write.
> 
> This isn't a problem with out-of-order HW completion,
> but a problem with out-of-order completion in relation
> to the sendmsg() call and the descriptor write.
> 
> But this doesn't even need to be sent. As I
> explained above, a situation where one of the threads
> fails is more than enough to catch that bug.
> 
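
For illustration only, the two scenarios described above can be modelled
with a toy completion ring. This is a simplified sketch and not the kernel
code: it assumes that a descriptor address is written into the shared CQ
when the slot is reserved, and that completion merely advances the producer
index by a count. The class and function names (CompletionQueue,
reserve_addr, cancel, submit) are hypothetical.

```python
class CompletionQueue:
    """Toy model of a completion ring shared by two sockets (not kernel code)."""

    def __init__(self, size):
        self.ring = [None] * size
        self.cached_prod = 0  # next slot to reserve
        self.producer = 0     # slots below this index are visible to user space

    def reserve_addr(self, addr):
        # Assumption: the descriptor address is written at reservation time.
        self.ring[self.cached_prod % len(self.ring)] = addr
        self.cached_prod += 1

    def cancel(self, n=1):
        # Give back the most recent reservation(s) after a failed transmit.
        self.cached_prod -= n

    def submit(self, n=1):
        # Publish n more entries; the call does not say *which* entries.
        self.producer += n

    def visible(self):
        # Addresses user space would read back as completed.
        return [self.ring[i % len(self.ring)] for i in range(self.producer)]


def out_of_order_completion():
    cq = CompletionQueue(4)
    cq.reserve_addr("A")  # first sender writes its descriptor
    cq.reserve_addr("B")  # second sender writes its descriptor
    cq.submit(1)          # the second sender's packet completes first
    return cq.visible()   # A is reported although it is still in flight


def earlier_writer_fails():
    cq = CompletionQueue(4)
    cq.reserve_addr("A")
    cq.reserve_addr("B")
    cq.cancel(1)          # first sender fails after the descriptor write
    cq.submit(1)          # second sender completes normally
    return cq.visible()   # A is reported, B is never returned
```

In both cases the address that becomes visible to user space is the one in
the lowest slot, not the one belonging to the packet that actually finished,
which is the ordering mismatch between descriptor write and completion that
the quoted text describes.
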
> > If that is the case then we have to think about possible solutions which
> > probably won't be straight-forward. As Stan said current fix is a no-go.
> 
> Okay, what is your idea? In my opinion the only
> thing I can do is to just push the descriptors
> before or after __dev_direct_xmit() and keep
> these descriptors in some stack array.
> However, this won't be compatible with the behaviour
> of DRV-deployed AF_XDP. Descriptors will be returned
> right after the copy to the SKB instead of after the SKB is sent.
> If this is fine for you, it's fine for me.
> 
> Otherwise this needs to be tied to the SKB lifetime,
> but how?

I'm looking into it. The bottom line is that we discussed it with Magnus
and agree that the issue you're reporting needs to be addressed.

I'll get back to you to discuss a potential way of attacking it.

Thanks!
