Message-ID: <20240212152828.4049756-2-oficerovas@altlinux.org>
Date: Mon, 12 Feb 2024 18:28:28 +0300
From: Alexander Ofitserov <oficerovas@...linux.org>
To: astrajoan@...oo.com
Cc: arnd@...db.de,
bridge@...ts.linux-foundation.org,
davem@...emloft.net,
edumazet@...gle.com,
f.fainelli@...il.com,
hkallweit1@...il.com,
ivan.orlov0322@...il.com,
keescook@...omium.org,
kuba@...nel.org,
linux-kernel@...r.kernel.org,
mudongliangabcd@...il.com,
netdev@...r.kernel.org,
nikolay@...dia.com,
pabeni@...hat.com,
razor@...ckwall.org,
roopa@...dia.com,
skhan@...uxfoundation.org,
syzbot+881d65229ca4f9ae8c84@...kaller.appspotmail.com,
syzkaller-bugs@...glegroups.com,
vladimir.oltean@....com,
dutyrok@...linux.org,
Alexander Ofitserov <oficerovas@...linux.org>
Subject: Re: [Bridge] [PATCH] net: bridge: Fix refcnt issues in dev_ioctl
On Wed, Aug 23, 2023 at 00:38:46PM +0300, Ziqi Zhao wrote:
> On Tue, Aug 22, 2023 at 01:40:45PM +0300, Nikolay Aleksandrov wrote:
> > Thank you for testing, but we really need to understand what is going on
> > and why the device isn't getting deleted for so long. Currently I don't
> > have the time to debug it properly (I'll be able to next week at the
> > earliest). We can't apply the patch based only on tests without
> > understanding the underlying issue. I'd look into what
> > the reproducer is doing exactly and also check the system state while the
> > deadlock has happened. Also you can list the currently held locks (if
> > CONFIG_LOCKDEP is enabled) via magic sysrq + d for example. See which
> > process is holding them, what are their priorities and so on.
> > Try to build some theory of how a deadlock might happen and then go
> > about proving it. Does the 8021q module have the same problem? It uses
> > similar code to set its hook.
>
> Hi Nik,
>
> Thank you so much for the instructions! I was able to obtain a decoded
> stacktrace showing the reproducer behavior in my QEMU VM running kernel
> 6.5-rc4, in case that would give us more context for pinpointing the
> problem. Here's a link to the output:
>
> https://pastecat.io/?p=IlKZlflN9j2Z2mspjKe7
>
> Basically, after running the reproducer (line 1854) for about 180
> seconds or so, the unregister_netdevice warning was shown (line 1856),
> and then after another 50 seconds, the kernel detected that some tasks
> had been stalled for more than 143 seconds (line 1866), so it panicked
> on the blocked tasks (line 2116). Before the panic, we did get to see
> all the locks held in the system (line 2068), and it did show that many
> processes created by the reproducer were contending for the br_ioctl_mutex.
> I'm now starting to wonder whether this is really a deadlock, or simply
> some tasks not being able to grab the lock because so many processes
> are trying to acquire it.
>
> Let me know what you think about the situation shown in the above log,
> and let's keep in touch for any future debugging. Thank you again for
> guiding me through the problem!
>
> Best regards,
> Ziqi
Hello,

I've also encountered this bug while fuzzing. Is there any ongoing work on
this bug?
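
For reference, with CONFIG_MAGIC_SYSRQ, CONFIG_LOCKDEP and
CONFIG_DETECT_HUNG_TASK enabled, the held locks and the hung-task
behaviour discussed above can be inspected roughly along these lines
(illustrative commands only, not taken from the original report):

    # dump all locks currently held in the system (requires CONFIG_LOCKDEP)
    echo d > /proc/sysrq-trigger

    # report stalled tasks sooner than the default 120s timeout
    echo 30 > /proc/sys/kernel/hung_task_timeout_secs

    # optionally panic when a hung task is detected, to capture full state
    echo 1 > /proc/sys/kernel/hung_task_panic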
--
2.42.1