Message-ID: <66479f9e-fd0b-41d0-b7b8-07a336c3341b@candelatech.com>
Date: Mon, 16 Sep 2024 09:15:17 -0700
From: Ben Greear <greearb@...delatech.com>
To: Przemek Kitszel <przemyslaw.kitszel@...el.com>
Cc: Jan Glaza <jan.glaza@...el.com>,
 Aleksandr Loktionov <aleksandr.loktionov@...el.com>,
 "intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
 netdev <netdev@...r.kernel.org>
Subject: Re: tcp_ack __list_del crash in 6.10.3+ hacks

On 9/16/24 04:09, Przemek Kitszel wrote:
> On 9/16/24 12:32, Przemek Kitszel wrote:
>> On 9/14/24 07:27, Ben Greear wrote:
>>> Hello,
>>>
>>> We found this during a long duration network test where we are using
>>> lots of wifi network devices in a single system, talking with
>>
>> It will be really hard to repro for us. Still would like to help.

We also have trouble reproducing this.  Thanks for the debugging
suggestions below; we'll try to get some better debug output
to share (on stock kernels).

>>
>>> an intel 10g
>>
>> It's more likely to get Intel's help if you mail (also) to our IWL list
>> (CCed, +Aleksandr for ixgbe expertise).
>>
>>
>>> NIC in the same system (using vrfs and such).  The system ran around
>>> 7 hours before it crashed.  Seems to be a null pointer in a list, but
>>> I'm not having great luck understanding where exactly in the large tcp_ack
>>> method this is happening.  Any suggestions for how to get more relevant
>>> info out of gdb?
> 
> I would also enable kmemleak, lockdep, ubsan to get some easy helpers.
> 
>>>
>>> BUG: kernel NULL pointer dereference, address: 0000000000000008
>>> #PF: supervisor write access in kernel mode
> 
> could you share your virtualization config?

We are using vrf for each of the network devices.  We're using mac-vlans
and 12 intel ax210 as well, though I need to verify the netdevs to make sure I'm
not confusing it with a second mostly unrelated problem we are tracking.

>>> #PF: error_code(0x0002) - not-present page
>>> PGD 115855067 P4D 115855067 PUD 283ed3067 PMD 0
>>> Oops: Oops: 0002 [#1] PREEMPT SMP
>>> CPU: 6 PID: 115673 Comm: btserver Tainted: G           O       6.10.3+
> 
> so, what hacks do you have? those are to aid debugging or to enable some
> of the wifi devices?

Great piles of wifi related hacks mostly.

> I don't have any insightful comment unfortunately, sorry.

We are able to reproduce on upstream 6.11.0 as well.  Or at least, we reproduced a
soft-lockup there.  We are trying again now with lockdep, list debugging, and some
other debugging options enabled.

Thanks,
Ben

-- 
Ben Greear <greearb@...delatech.com>
Candela Technologies Inc  http://www.candelatech.com

