Message-ID: <SN6PR02MB41572415707F0FA6D9A61247D4132@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Thu, 9 Jan 2025 03:16:03 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Breno Leitao <leitao@...ian.org>, Herbert Xu
	<herbert@...dor.apana.org.au>, "saeedm@...dia.com" <saeedm@...dia.com>,
	"tariqt@...dia.com" <tariqt@...dia.com>, "linux-hyperv@...r.kernel.org"
	<linux-hyperv@...r.kernel.org>
CC: Andrew Morton <akpm@...ux-foundation.org>, Thomas Graf <tgraf@...g.ch>,
	Tejun Heo <tj@...nel.org>, Hao Luo <haoluo@...gle.com>, Josh Don
	<joshdon@...gle.com>, Barret Rhoden <brho@...gle.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] rhashtable: Fix potential deadlock by moving
 schedule_work outside lock

From: Breno Leitao <leitao@...ian.org> Sent: Thursday, January 2, 2025 2:16 AM
> 
> On Sat, Dec 21, 2024 at 05:06:55PM +0800, Herbert Xu wrote:
> > On Thu, Dec 12, 2024 at 08:33:31PM +0800, Herbert Xu wrote:
> > >
> > > The growth check should stay with the atomic_inc.  Something like
> > > this should work:
> >
> > OK I've applied your patch with the atomic_inc move.
> 
> Sorry, I was on vacation, and I am back now. Let me know if you need
> anything further.
> 
> Thanks for fixing it,
> --breno

Breno and Herbert --

This patch seems to break things in linux-next. I'm testing with
linux-next-20250108 in a VM in the Azure public cloud. The Mellanox mlx5
Ethernet NIC in the VM is failing to get set up.

I bisected to commit e1d3422c95f0 ("rhashtable: Fix potential deadlock
by moving schedule_work outside lock"), then debugged why opening
the mlx5 NIC device fails. The failure is in the XDP code, in
__xdp_reg_mem_model(), where the call to rhashtable_insert_slow()
returns -E2BIG. The problem does not occur when the commit
is reverted.

The function call stack is this:

dev_open()
__dev_open()
mlx5e_open()
mlx5e_open_locked()
mlx5e_open_channels()
mlx5e_open_channel()
mlx5e_open_queues()
mlx5e_open_rxq_rq()
mlx5e_open_rq()
mlx5e_alloc_rq()
xdp_rxq_info_reg_mem_model()
__xdp_reg_mem_model()
rhashtable_insert_slow()

I have not debugged further as I don't know anything about the
rhashtable code or the XDP code. The only repro I have is a VM
in Azure. I thought I'd ask you (Breno and Herbert) to review
the patch again and see if there's a path that could cause the
hash table to be incorrectly detected as full.

I've included the linux-hyperv mailing list and the mlx5 driver
maintainers on this email. Someone involved with Azure/Hyper-V
or the mlx5 driver may have seen the problem, and I want to
avoid duplicate debugging.

Let me know if there's something I can do to help debug further.

Thanks,

Michael Kelley
