Message-Id: <2B5C19AE-C125-45A3-8C6F-CA6BBC01A6D9@gmail.com>
Date: Sat, 9 Dec 2023 01:01:04 +0200
From: Martin Zaharinov <micron10@...il.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: peterz@...radead.org,
netdev <netdev@...r.kernel.org>,
Paolo Abeni <pabeni@...hat.com>,
patchwork-bot+netdevbpf@...nel.org,
Jakub Kicinski <kuba@...nel.org>,
Stephen Hemminger <stephen@...workplumber.org>,
kuba+netdrv@...nel.org,
dsahern@...il.com,
Eric Dumazet <edumazet@...gle.com>
Subject: Re: Urgent Bug Report Kernel crash 6.5.2
Hi Thomas,
> On 9 Dec 2023, at 0:20, Thomas Gleixner <tglx@...utronix.de> wrote:
>
> On Thu, Dec 07 2023 at 00:38, Martin Zaharinov wrote:
>>> On 7 Dec 2023, at 0:26, Martin Zaharinov <micron10@...il.com> wrote:
>>>
>>> in this line is :
>>>
>>>
>>> /*
>>> * If the reference count was already in the dead zone, then this
>>> * put() operation is imbalanced. Warn, put the reference count back to
>>> * DEAD and tell the caller to not deconstruct the object.
>>> */
>>> if (WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")) {
>>> atomic_set(&ref->refcnt, RCUREF_DEAD);
>>> return false;
>>> }
>
> So a rcuref_put() operation triggers the warning because the reference
> count is already dead, which means the rcuref_put() operation is
> imbalanced.
>
>>> [529520.875413] CPU: 13 PID: 0 Comm: swapper/13 Tainted: G O 6.6.3 #1
>
> Can you reproduce this without the Out of Tree module?
The same error occurs without out-of-tree modules. I have tried many times, on kernels from 6.5.x up to now.
>
>>> [529520.875653] RIP: 0010:rcuref_put_slowpath+0x5f/0x70
>>> [529520.878136] dst_release+0x1c/0x40
>>> [529520.878229] __dev_queue_xmit+0x594/0xcd0
>>> [529520.878324] ? eth_header+0x25/0xc0
>>> [529520.878417] ip_finish_output2+0x1a0/0x530
>>> [529520.878514] process_backlog+0x107/0x210
>>> [529520.878610] __napi_poll+0x20/0x180
>>> [529520.878702] net_rx_action+0x29f/0x380
>>> [529520.878935] __do_softirq+0xd0/0x202
>>> [529520.879033] do_softirq+0x3a/0x50
>
> So this is one call chain triggering the issue...
>
>>>> report same problem with kernel 6.6.1 - i think problem is in rcu
>>>> but … if have options to add people from RCU here.
>
> That's definitely not a RCU problem. It's a simple refcount fail.
>
> Thanks,
>
> tglx
>
Is this a serious problem or only a simple failure? And is it possible to find out what causes it and fix it?
m.
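
To illustrate what "imbalanced put()" means in the quoted check, here is a minimal user-space sketch. It is a toy model under stated assumptions, not the kernel's rcuref code: the names toy_ref, toy_init, toy_put and the value TOY_DEAD are hypothetical and chosen only for this example. It shows a counter that moves to a "dead" marker when the last reference is dropped, and a put() that warns and returns false when it is called on an already dead counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model only: hypothetical names and values, not the kernel's rcuref. */
#define TOY_DEAD 0xE0000000u

struct toy_ref {
    atomic_uint cnt;    /* number of live references, or TOY_DEAD */
};

static void toy_init(struct toy_ref *r, unsigned int n)
{
    assert(r != NULL);
    atomic_init(&r->cnt, n);
}

/*
 * Drop one reference.
 * Returns true only when the caller dropped the last reference and
 * is therefore the one who must free the object.
 */
static bool toy_put(struct toy_ref *r)
{
    assert(r != NULL);
    unsigned int old = atomic_load(&r->cnt);

    for (;;) {
        /* No reference left to drop: this put() has no matching get(). */
        if (old == 0 || old >= TOY_DEAD) {
            fprintf(stderr, "toy_ref: imbalanced put()\n");
            atomic_store(&r->cnt, TOY_DEAD);
            return false;   /* caller must not free the object again */
        }

        /* Dropping the last reference marks the counter as dead. */
        unsigned int next = (old == 1) ? TOY_DEAD : old - 1;

        /* On failure, old is reloaded and the checks above run again. */
        if (atomic_compare_exchange_weak(&r->cnt, &old, next))
            return old == 1;
    }
}
```

With two references held, the first toy_put() returns false, the second returns true (the object may be freed), and a third call is the imbalanced case: some code path dropped a reference it never took, or dropped the same one twice. In the trace above, that extra drop would be the dst_release() call, but this sketch cannot say which caller is at fault.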