Message-ID: <20241002055338.GI3296@linux-l9pv.suse>
Date: Wed, 2 Oct 2024 13:53:38 +0800
From: joeyli <jlee@...e.com>
To: Valentin Kleibel <valentin@...is.at>
Cc: Chun-Yi Lee <joeyli.kernel@...il.com>,
	Justin Sanders <justin@...aid.com>, Jens Axboe <axboe@...nel.dk>,
	Pavel Emelianov <xemul@...nvz.org>,
	Kirill Korotaev <dev@...nvz.org>,
	"David S . Miller" <davem@...emloft.net>,
	Nicolai Stange <nstange@...e.com>,
	Greg KH <gregkh@...uxfoundation.org>, linux-block@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] aoe: fix the potential use-after-free problem in more
 places

Hi Valentin,

On Thu, Sep 12, 2024 at 12:58:46PM +0200, Valentin Kleibel wrote:
> > Then Nicolai Stange found more places in aoe have potential use-after-free
> > problem with tx(). e.g. revalidate(), aoecmd_ata_rw(), resend(), probe()
> > and aoecmd_cfg_rsp(). Those functions also use aoenet_xmit() to push
> > packet to tx queue. So they should also use dev_hold() to increase the
> > refcnt of skb->dev.
> 
> We've tested your patch on our servers and ran into an issue.
> With heavy I/O load the aoe device had stale I/Os (e.g. rsync waiting
> indefinetly on one core) that can be "fixed" by running aoe-revalidate on
> that device.
> 
> Additionally when trying to shut down the system we see the message:
> unregister_netdevice: waiting for XXX to become free. Usage Count = XXXXX
> on aoe devices with a usage count somewhere in the millions.
> This has been the same as without the patch, i assume the fix is still
> incomplete.
>

For the reference count debugging, I have sent a patch series here:

[RFC PATCH 0/2] tracking the references of net_device in aoe
https://lore.kernel.org/lkml/20241002040616.25193-1-jlee@suse.com/T/#t

Based on my testing, the numbers of dev_hold(nd) and dev_put(nd) calls are
balanced in aoe after this 'aoe: fix the potential use-after-free problem in
more places' patch is applied on the v6.11 kernel. I tested adding, modifying
and deleting files on a remote target through aoe. My testing is not heavy I/O
testing, but the result is balanced.
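
To illustrate what "balanced" means here, below is a minimal userspace sketch.
It is not the debug patch series and not kernel code: struct mock_netdev and
all mock_* helpers are hypothetical names made up for this example. It only
models the idea that every hold taken before a packet is queued should be
matched by a put after transmit, so the outstanding count returns to zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a device reference counter.
 * This is NOT the kernel's struct net_device. */
struct mock_netdev {
	long holds;	/* number of mock_dev_hold() calls */
	long puts;	/* number of mock_dev_put() calls */
};

static void mock_dev_hold(struct mock_netdev *nd)
{
	nd->holds++;
}

static void mock_dev_put(struct mock_netdev *nd)
{
	nd->puts++;
}

/* Model one packet: take a reference before queuing it and, if
 * drop_ref_after_tx is true, release the reference after "transmit". */
static void mock_queue_and_xmit(struct mock_netdev *nd, bool drop_ref_after_tx)
{
	mock_dev_hold(nd);
	/* ... the packet would be transmitted here ... */
	if (drop_ref_after_tx)
		mock_dev_put(nd);
}

/* References still outstanding: zero means holds and puts are balanced. */
static long mock_outstanding_refs(const struct mock_netdev *nd)
{
	return nd->holds - nd->puts;
}

/* Small self-check: balanced paths leave nothing outstanding, while one
 * path that forgets the put leaks exactly one reference. */
static void mock_selfcheck(void)
{
	struct mock_netdev nd = { 0, 0 };
	int i;

	for (i = 0; i < 1000; i++)
		mock_queue_and_xmit(&nd, true);
	assert(mock_outstanding_refs(&nd) == 0);

	mock_queue_and_xmit(&nd, false);
	assert(mock_outstanding_refs(&nd) == 1);
}
```

Calling mock_selfcheck() from a small main() exercises both cases. In this
model, a usage count that keeps growing would correspond to some path that
takes a hold without a matching put.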

Could you please try the above debug patch series to look at the refcnt value
in aoe on your side?

Thanks a lot!
Joey Lee
