lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <5cee9a36-b094-37a0-e961-d7404b3dafe2@gonehiking.org>
Date:   Tue, 11 Jan 2022 16:17:32 -0700
From:   Khalid Aziz <khalid@...ehiking.org>
To:     nbd@....name, lorenzo.bianconi83@...il.com, ryder.lee@...iatek.com,
        shayne.chen@...iatek.com, sean.wang@...iatek.com, kvalo@...nel.org
Cc:     davem@...emloft.net, kuba@...nel.org, matthias.bgg@...il.com,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: [Bug] mt7921e driver in 5.16 causes kernel panic

I am seeing an intermittent bug in mt7921e driver. When the driver module is loaded
and is being initialized, almost every other time it seems to write to some wild
memory location. This results in driver failing to initialize with message
"Timeout for driver own" and at the same time I start to see "Bad page state" messages
for random processes. Here is the relevant part of dmesg:

[OK] Found device SAMSUNG MZVLB1T0HBLR-000L7 6.
[OK ]Found device SAMSUNG MZVLB1T0HBLR-000L7 SYSTEM.
[OK] Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
Starting Cryptography Setup for nvme8n1p6_crypt...
[  5.687489] mt7921e 0000:03:00.0: ASIC revision: 79610010
Starting File System Check on /dev/disk/by-uuid/CCSA-8086...
Please enter passphrase for disk SAMSUNG MZVLB1T0HBLR-000L7 (nvme8n1p6_crypt) on /home
[  7.798962] mt7921e 0000:03:00.0: Timeout for driver own
[  8.874863] mt7921e 0000:03:00.0: Timeout for driver own
[  8.876266] BUG: Bad page state in process systemd-udevd  pfn:123848
[  8.877953] BUG: Bad page state in process napi/phy8-8194  pfn:10a4a8
[  9.958899] mt7921e 0000:03:00.0: Timeout for driver own
[  9.961595] BUG: Bad page state in process systemd-udevd  pfn:1037e8
[ 11.843129] mt7921e 0000:03:00.0: Timeout for driver own
[ 11.845823] BUG: Bad page state in process systemd-udevd  pfn:104380
[ 12.126922] mt7921e 0000:03:00.0: Timeout for driver own
[ 12.128788] BUG: Bad page state in process systemd-udevd  pfn:10a050
[ 13.287898] mt7921e 0000:03:00.0: Timeout for driver own
[ 14.287827] mt7921e 0000:03:00.0: Timeout for driver own
[ 14.288968] BUG: Bad page state in process systemd-udevd  pfn:109f51
[ 14.298599] BUG: Bad page state in process systemd-udevd  pfn:105f60
[ 14.292162] BUG: Bad page state in process systemd-udevd  pfn:10ac07
[ 15.372501] mt7921e 0000:03:00.0: Timeout for driver own
[ 16.454773] mt7921e 0000:03:00.0: Timeout for driver own
[ 16.456238] BUG: Bad page state in process systemd-udevd pfn:1a0c00
[ 16.515869) mt7921e 0000:03:00.0: hardware init failed

These "Bad page state" messages continue until kernel finally panics with a page
fault in a seemingly random place:

[  17.544222] BUG: Bad page state in process apparmor_parser  pfn:1116f8
[  OK  ] Finished Create Volatile Files and Directories
          Starting Network Name Resolution...
          Starting Network Time Synchronization...
          Starting Update UTMP about System Boot/Shutdown...
[  17.677144] BUG: unable to handle page fault for address: 0000396eb08090ec
[  17.680395] #PF: supervisor read access in kernel mode
[  17.681086] #PF: error code(0x0000) - not-present page
[  17.681086] PGD 0 P4D 0
[  17.681006] Opps: 0000 [#1] PREEMPT SMP NOPTI
[  17.681006] CPU: 8 PID: 63 Con: ksoftirgd/8 Tainted: G  B  W        5.16.0 #3
[  17.681606] Hardware name: LENOVO 20XF004WUS/20XF004WUS, BIOS R1NET44W (1.14) 11/08/2821

Rest of the kernel stack trace is in form of a picture which I can send if it helps. Kernel
is compiled from git tag "v5.16". Details of mediatek controller:

$ lspci -v -s 03:00.0
03:00.0 Network controller: MEDIATEK Corp. Device 7961
	Subsystem: Lenovo Device e0bc
	Physical Slot: 0
	Flags: bus master, fast devsel, latency 0, IRQ 85, IOMMU group 11
	Memory at 870200000 (64-bit, prefetchable) [size=1M]
	Memory at 870300000 (64-bit, prefetchable) [size=16K]
	Memory at 870304000 (64-bit, prefetchable) [size=4K]
	Capabilities: [80] Express Endpoint, MSI 00
	Capabilities: [e0] MSI: Enable+ Count=1/32 Maskable+ 64bit+
	Capabilities: [f8] Power Management version 3
	Capabilities: [100] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
	Capabilities: [108] Latency Tolerance Reporting
	Capabilities: [110] L1 PM Substates
	Capabilities: [200] Advanced Error Reporting
	Kernel driver in use: mt7921e
	Kernel modules: mt7921e

This is an intermittent problem and I did not see this with 5.16-rc6 kernel.
Please let me know if you need more information.

Thanks,
Khalid

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ