linux-kernel - Re: [Bridge] [PATCH] net: bridge: Fix refcnt issues in dev


Message-ID: <20240212152828.4049756-2-oficerovas@altlinux.org>
Date: Mon, 12 Feb 2024 18:28:28 +0300
From: Alexander Ofitserov <oficerovas@...linux.org>
To: astrajoan@...oo.com
Cc: arnd@...db.de,
	bridge@...ts.linux-foundation.org,
	davem@...emloft.net,
	edumazet@...gle.com,
	f.fainelli@...il.com,
	hkallweit1@...il.com,
	ivan.orlov0322@...il.com,
	keescook@...omium.org,
	kuba@...nel.org,
	linux-kernel@...r.kernel.org,
	mudongliangabcd@...il.com,
	netdev@...r.kernel.org,
	nikolay@...dia.com,
	pabeni@...hat.com,
	razor@...ckwall.org,
	roopa@...dia.com,
	skhan@...uxfoundation.org,
	syzbot+881d65229ca4f9ae8c84@...kaller.appspotmail.com,
	syzkaller-bugs@...glegroups.com,
	vladimir.oltean@....com,
	dutyrok@...linux.org,
	Alexander Ofitserov <oficerovas@...linux.org>
Subject: Re: [Bridge] [PATCH] net: bridge: Fix refcnt issues in dev_ioctl

On Wed, Aug 23, 2023 at 00:38:46PM +0300, Ziqi Zhao wrote:
> On Tue, Aug 22, 2023 at 01:40:45PM +0300, Nikolay Aleksandrov wrote:
> > Thank you for testing, but we really need to understand what is going on
> > and why the device isn't getting deleted for so long. Currently I don't
> > have the time to debug it properly (I'll be able to next week at the
> > earliest). We can't apply the patch based only on tests without
> > understanding the underlying issue. I'd look into what
> > the reproducer is doing exactly and also check the system state while the
> > deadlock has happened. Also you can list the currently held locks (if
> > CONFIG_LOCKDEP is enabled) via magic sysrq + d for example. See which
> > process is holding them, what are their priorities and so on.
> > Try to build some theory of how a deadlock might happen and then go
> > about proving it. Does the 8021q module have the same problem? It uses
> > similar code to set its hook.
>
> Hi Nik,
>
> Thank you so much for the instructions! I was able to obtain a decoded
> stacktrace showing the reproducer behavior in my QEMU VM running kernel
> 6.5-rc4, in case that would give us more context for pinpointing the
> problem. Here's a link to the output:
>
> https://pastecat.io/?p=IlKZlflN9j2Z2mspjKe7
>
> Basically, after running the reproducer (line 1854) for about 180
> seconds or so, the unregister_netdevice warning was shown (line 1856),
> and then after another 50 seconds, the kernel detected that some tasks
> have been stalled for more than 143 seconds (line 1866), so it panicked
> on the blocked tasks (line 2116). Before the panic, we did get to see
> all the locks held in the system (line 2068), and it did show that many
> processes created by the reproducer were contending for the br_ioctl_mutex.
> I'm now starting to wonder whether this is really a deadlock, or simply
> some tasks not being able to grab the lock because so many processes
> are trying to acquire it.
>
> Let me know what you think about the situation shown in the above log,
> and let's keep in touch for any future debugging. Thank you again for
> guiding me through the problem!
>
> Best regards,
> Ziqi
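
For reference, the lock dump Nikolay mentions above (magic sysrq + d) can
also be requested without a keyboard by writing the command character to
/proc/sysrq-trigger. The following is a minimal sketch, not something taken
from this thread: the helper name trigger_sysrq and its path parameter are
made up for illustration, and it assumes a kernel built with CONFIG_LOCKDEP,
sysrq support, and root privileges.

```python
# Hypothetical helper: request a sysrq action by writing its command
# character to the procfs trigger file. Assumes root and sysrq support.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def trigger_sysrq(key: str, path: str = SYSRQ_TRIGGER) -> None:
    """Write a single sysrq command character to the trigger file."""
    if len(key) != 1:
        raise ValueError("sysrq key must be a single character")
    # The kernel acts on the character as soon as it is written.
    with open(path, "w") as trigger:
        trigger.write(key)
```

Calling trigger_sysrq("d") as root asks the kernel to print all currently
held locks to the kernel log, which can then be read with dmesg.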

Hello,

I've also encountered this bug while fuzzing. Is there any ongoing work on
it?


-- 
2.42.1