netdev - Re: Urgent Bug Report Kernel crash 6.5.2

Message-Id: <2B5C19AE-C125-45A3-8C6F-CA6BBC01A6D9@gmail.com>
Date: Sat, 9 Dec 2023 01:01:04 +0200
From: Martin Zaharinov <micron10@...il.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: peterz@...radead.org,
 netdev <netdev@...r.kernel.org>,
 Paolo Abeni <pabeni@...hat.com>,
 patchwork-bot+netdevbpf@...nel.org,
 Jakub Kicinski <kuba@...nel.org>,
 Stephen Hemminger <stephen@...workplumber.org>,
 kuba+netdrv@...nel.org,
 dsahern@...il.com,
 Eric Dumazet <edumazet@...gle.com>
Subject: Re: Urgent Bug Report Kernel crash 6.5.2

Hi Thomas,



> On 9 Dec 2023, at 0:20, Thomas Gleixner <tglx@...utronix.de> wrote:
> 
> On Thu, Dec 07 2023 at 00:38, Martin Zaharinov wrote:
>>> On 7 Dec 2023, at 0:26, Martin Zaharinov <micron10@...il.com> wrote:
>>> 
>>> in this line is : 
>>> 
>>> 
>>>       /*
>>>        * If the reference count was already in the dead zone, then this
>>>        * put() operation is imbalanced. Warn, put the reference count back to
>>>        * DEAD and tell the caller to not deconstruct the object.
>>>        */
>>>       if (WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")) {
>>>               atomic_set(&ref->refcnt, RCUREF_DEAD);
>>>               return false;
>>>       }
> 
> So a rcuref_put() operation triggers the warning because the reference
> count is already dead, which means the rcuref_put() operation is
> imbalanced.
> 
>>> [529520.875413] CPU: 13 PID: 0 Comm: swapper/13 Tainted: G           O       6.6.3 #1
> 
> Can you reproduce this without the Out of Tree module?
The same error occurs without out-of-tree modules. I have tried many times, on kernels from 6.5.x up to now.

> 
>>> [529520.875653] RIP: 0010:rcuref_put_slowpath+0x5f/0x70
>>> [529520.878136]  dst_release+0x1c/0x40
>>> [529520.878229]  __dev_queue_xmit+0x594/0xcd0
>>> [529520.878324]  ? eth_header+0x25/0xc0
>>> [529520.878417]  ip_finish_output2+0x1a0/0x530
>>> [529520.878514]  process_backlog+0x107/0x210
>>> [529520.878610]  __napi_poll+0x20/0x180
>>> [529520.878702]  net_rx_action+0x29f/0x380
>>> [529520.878935]  __do_softirq+0xd0/0x202
>>> [529520.879033]  do_softirq+0x3a/0x50
> 
> So this is one call chain triggering the issue...
> 
>>>> report same problem with kernel 6.6.1 - i think problem is in rcu
>>>> but … if have options to add people from RCU here.
> 
> That's definitely not a RCU problem. It's a simple refcount fail.
> 
> Thanks,
> 
>        tglx
> 

Is this a real problem or only a simple failure? And is it possible to find out what is causing it and fix it?
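
To illustrate what "imbalanced put()" means in the warning quoted above, here is a toy, single-threaded userspace model in C. It is only a sketch under stated assumptions: the names toy_ref, toy_init, toy_get, toy_put, toy_demo and TOY_DEAD are hypothetical, and the counter encoding is simplified. The kernel's rcuref code uses atomic operations and a different encoding, so this shows the idea, not the real implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy model, NOT the kernel's rcuref implementation. */
#define TOY_DEAD (-1)

struct toy_ref {
	int cnt;	/* number of live references, TOY_DEAD once released */
};

/* A new object starts with one reference held by its creator. */
static void toy_init(struct toy_ref *r)
{
	r->cnt = 1;
}

/* Take a reference; fails if the object was already released. */
static bool toy_get(struct toy_ref *r)
{
	if (r->cnt <= 0)
		return false;
	r->cnt++;
	return true;
}

/*
 * Drop a reference. Returns true only when the caller dropped the last
 * reference and must destroy the object. A put on an already dead
 * counter is imbalanced: warn, keep the counter dead, and tell the
 * caller not to destroy the object a second time.
 */
static bool toy_put(struct toy_ref *r, int *imbalanced)
{
	if (r->cnt <= 0) {
		fprintf(stderr, "toy_ref - imbalanced put()\n");
		r->cnt = TOY_DEAD;
		(*imbalanced)++;
		return false;
	}
	if (--r->cnt == 0) {
		r->cnt = TOY_DEAD;
		return true;
	}
	return false;
}

/* Two owners, three puts: the third put has no matching get. */
int toy_demo(void)
{
	struct toy_ref r;
	int imbalanced = 0;

	toy_init(&r);				/* owner A */
	assert(toy_get(&r));			/* owner B */
	assert(!toy_put(&r, &imbalanced));	/* B drops, A still holds */
	assert(toy_put(&r, &imbalanced));	/* A drops the last reference */
	assert(!toy_put(&r, &imbalanced));	/* extra put: imbalanced */
	return imbalanced;
}
```

In this model the warning fires only on the third put, the one with no matching get. In the trace above, the put that hit the warning came from dst_release() called from __dev_queue_xmit(). As far as I understand it, that call path only detected the imbalance; the put without a matching get may be somewhere else.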

m.