Date:   Wed, 3 Aug 2022 08:37:51 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Bruno Goncalves <bgoncalv@...hat.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Networking <netdev@...r.kernel.org>,
        CKI Project <cki-project@...hat.com>
Subject: Re: RIP: 0010:qede_load+0x128d/0x13b0 [qede] - 5.19.0

On Wed, 3 Aug 2022 14:13:00 +0200 Bruno Goncalves wrote:
> Got this from the most recent failure (kernel built using commit 0805c6fb39f6):
> 
> the tarball is https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/603714145/build%20x86_64%20debug/2807738987/artifacts/kernel-mainline.kernel.org-redhat_603714145_x86_64_debug.tar.gz
> and the call trace from
> https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/08/02/redhat:603123526/build_x86_64_redhat:603123526_x86_64_debug/tests/1/results_0001/console.log/console.log
> 
> [   69.876513] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [   69.888521] Hardware name: HPE ProLiant DL325 Gen10 Plus/ProLiant DL325 Gen10 Plus, BIOS A43 08/09/2021
> [   69.897971] RIP: 0010:qede_load.cold
> (/builds/2807738987/workdir/./include/linux/spinlock.h:389
> /builds/2807738987/workdir/./include/linux/netdevice.h:4294
> /builds/2807738987/workdir/./include/linux/netdevice.h:4385
> /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2594
> /builds/2807738987/workdir/drivers/net/ethernet/qlogic/qede/qede_main.c:2575)

Thanks a lot! That seems to point the finger at commit 3aa6bce9af0e
("net: watchdog: hold device global xmit lock during tx disable") but
frankly IDK why... The driver must be fully initialized to get to
ndo_open() so how is the tx_global_lock busted?!

Would you be able to re-run with CONFIG_KASAN=y?
Perhaps KASAN can tell us what's messing up the lock.
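
For reference, a minimal sketch of the config fragment such a re-run might use. Only CONFIG_KASAN=y is what's being asked for above; the generic mode and inline instrumentation lines are assumptions about a typical x86_64 debug build, not something taken from the CKI config, so adjust them to whatever the build already selects:

```
# Requested above: enable the Kernel Address Sanitizer
CONFIG_KASAN=y
# Assumption: generic (software) KASAN mode, the usual choice on x86_64
CONFIG_KASAN_GENERIC=y
# Assumption: inline instrumentation (faster at run time, larger image);
# CONFIG_KASAN_OUTLINE=y is the alternative
CONFIG_KASAN_INLINE=y
```

If something is scribbling over the lock, KASAN should report the offending access with its own stack trace before the invalid opcode is hit.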
