Message-ID: <CALvjV29aZ5yG6CbZMuC1FgyfedwCSR-dLbUo=OJ2eN8G=Nwqag@mail.gmail.com>
Date: Tue, 10 Sep 2024 12:04:19 -0700
From: Hugues Bruant <hugues.bruant@...il.com>
To: Thomas Zimmermann <tzimmermann@...e.de>
Cc: Borislav Petkov <bp@...en8.de>, stable@...r.kernel.org, regressions@...ts.linux.dev,
linux-kernel@...r.kernel.org, Fenghua Yu <fenghua.yu@...el.com>,
Reinette Chatre <reinette.chatre@...el.com>, Tony Luck <tony.luck@...el.com>,
Tzung-Bi Shih <tzungbi@...nel.org>, Brian Norris <briannorris@...omium.org>,
Julius Werner <jwerner@...omium.org>, chrome-platform@...ts.linux.dev,
Jani Nikula <jani.nikula@...ux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>, Rodrigo Vivi <rodrigo.vivi@...el.com>,
Tvrtko Ursulin <tursulin@...ulin.net>, intel-gfx@...ts.freedesktop.org,
dri-devel@...ts.freedesktop.org
Subject: Re: [REGRESSION] soft lockup on boot starting with kernel 6.10 /
commit 5186ba33234c9a90833f7c93ce7de80e25fac6f5
On Mon, Sep 9, 2024 at 2:49 AM Thomas Zimmermann <tzimmermann@...e.de> wrote:
>
> Hi
>
> On 09.09.24 at 10:02, Borislav Petkov wrote:
> > Aug 20 20:29:37 luna kernel: resource: Trying to free nonexistent resource <0x00000000a0000000-0x00000000a0257fff>
> > Aug 20 20:29:37 luna kernel: BUG: unable to handle page fault for address: 0000000300000031
> > Aug 20 20:29:37 luna kernel: #PF: supervisor read access in kernel mode
> > Aug 20 20:29:37 luna kernel: #PF: error_code(0x0000) - not-present page
> > Aug 20 20:29:37 luna kernel: PGD 0 P4D 0
> > Aug 20 20:29:37 luna kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> > Aug 20 20:29:37 luna kernel: CPU: 9 PID: 552 Comm: (udev-worker) Tainted: G OE 6.10.6-arch1-1 #1 703d152c24f1971e36f16e505405e456fc9e23f8
> > Aug 20 20:29:37 luna kernel: Hardware name: Purism Librem 14/Librem 14, BIOS 4.14-Purism-1 06/18/2021
> > Aug 20 20:29:37 luna kernel: RIP: 0010:__release_resource+0x34/0xb0
> > Aug 20 20:29:37 luna kernel: Code: 8d 50 38 48 8b 40 38 48 85 c0 75 27 eb 6a 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 50 30 <48> 8b 40 30 48 85 c0 74 45 48 39 c7 75 ee 40 84 f6 75 45 48 8b 4f
> > Aug 20 20:29:37 luna kernel: RSP: 0018:ffffb30dc207f930 EFLAGS: 00010296
> > Aug 20 20:29:37 luna kernel: RAX: 0000000300000001 RBX: ffff8fa34616e900 RCX: ffff8fa3424aac50
> > Aug 20 20:29:37 luna kernel: RDX: 0000000300000031 RSI: 0000000000000001 RDI: ffff8fa34616e900
> > Aug 20 20:29:37 luna kernel: RBP: ffff8fa3460e1400 R08: ffff8fa3424a97b8 R09: 0000000000000000
> > Aug 20 20:29:37 luna kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fa341671000
> > Aug 20 20:29:37 luna kernel: R13: 0000000000000000 R14: ffff8fa3416710c8 R15: ffff8fa341671000
> > Aug 20 20:29:37 luna kernel: FS: 00007b1bee0eb880(0000) GS:ffff8fae6e480000(0000) knlGS:0000000000000000
> > Aug 20 20:29:37 luna kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Aug 20 20:29:37 luna kernel: CR2: 0000000300000031 CR3: 0000000103924002 CR4: 00000000003706f0
> > Aug 20 20:29:37 luna kernel: Call Trace:
> > Aug 20 20:29:37 luna kernel: <TASK>
> > Aug 20 20:29:37 luna kernel: ? __die_body.cold+0x19/0x27
> > Aug 20 20:29:37 luna kernel: ? page_fault_oops+0x15a/0x2d0
> > Aug 20 20:29:37 luna kernel: ? exc_page_fault+0x81/0x190
> > Aug 20 20:29:37 luna kernel: ? asm_exc_page_fault+0x26/0x30
> > Aug 20 20:29:37 luna kernel: ? __release_resource+0x34/0xb0
> > Aug 20 20:29:37 luna kernel: release_resource+0x26/0x40
> > Aug 20 20:29:37 luna kernel: platform_device_del+0x51/0x90
> > Aug 20 20:29:37 luna kernel: platform_device_unregister+0x12/0x30
> > Aug 20 20:29:37 luna kernel: sysfb_disable+0x2f/0x80
> > Aug 20 20:29:37 luna kernel: aperture_remove_conflicting_pci_devices+0x8c/0xa0
>
> It looks like another report of a known problem. Please try the patch at
>
> https://patchwork.freedesktop.org/patch/610171/?series=137587&rev=1
>
Thanks for the suggestion. I tried 6.11-rc7, which I am told includes
this patch. The first boot attempt succeeded, but the next 4 attempts
all hit the soft lockup again, so the patch seems to make the lockup
harder to reproduce without actually fixing it.

Also noteworthy: with the patch applied, the location of the soft
lockup seems to vary slightly more; it is no longer always in the i915
driver probe triggered by udev. See the attached boot logs for details.
> Best regards
> Thomas
>
> > Aug 20 20:29:37 luna kernel: i915_driver_probe+0x7c8/0xac0 [i915 6caac5d02e3122d822ca0c852e7e5ed826a3aaea]
> > Aug 20 20:29:37 luna kernel: local_pci_probe+0x42/0x90
> > Aug 20 20:29:37 luna kernel: pci_device_probe+0xbd/0x290
> > Aug 20 20:29:37 luna kernel: ? sysfs_do_create_link_sd+0x6e/0xe0
> > Aug 20 20:29:37 luna kernel: really_probe+0xdb/0x340
> > Aug 20 20:29:37 luna kernel: ? pm_runtime_barrier+0x54/0x90
> > Aug 20 20:29:37 luna kernel: ? __pfx___driver_attach+0x10/0x10
> > Aug 20 20:29:37 luna kernel: __driver_probe_device+0x78/0x110
> > Aug 20 20:29:37 luna kernel: driver_probe_device+0x1f/0xa0
> > Aug 20 20:29:37 luna kernel: __driver_attach+0xba/0x1c0
> > Aug 20 20:29:37 luna kernel: bus_for_each_dev+0x8c/0xe0
> > Aug 20 20:29:37 luna kernel: bus_add_driver+0x112/0x1f0
> > Aug 20 20:29:37 luna kernel: driver_register+0x72/0xd0
> > Aug 20 20:29:37 luna kernel: i915_init+0x23/0x90 [i915 6caac5d02e3122d822ca0c852e7e5ed826a3aaea]
> > Aug 20 20:29:37 luna kernel: ? __pfx_i915_init+0x10/0x10 [i915 6caac5d02e3122d822ca0c852e7e5ed826a3aaea]
> > Aug 20 20:29:37 luna kernel: do_one_initcall+0x58/0x310
> > Aug 20 20:29:37 luna kernel: do_init_module+0x60/0x220
> > Aug 20 20:29:37 luna kernel: init_module_from_file+0x89/0xe0
> > Aug 20 20:29:37 luna kernel: idempotent_init_module+0x121/0x320
> > Aug 20 20:29:37 luna kernel: __x64_sys_finit_module+0x5e/0xb0
> > Aug 20 20:29:37 luna kernel: do_syscall_64+0x82/0x190
> > Aug 20 20:29:37 luna kernel: ? switch_fpu_return+0x4e/0xd0
> > Aug 20 20:29:37 luna kernel: ? syscall_exit_to_user_mode+0x72/0x200
> > Aug 20 20:29:37 luna kernel: ? do_syscall_64+0x8e/0x190
> > Aug 20 20:29:37 luna kernel: ? syscall_exit_to_user_mode+0x72/0x200
> > Aug 20 20:29:37 luna kernel: ? do_syscall_64+0x8e/0x190
> > Aug 20 20:29:37 luna kernel: ? clear_bhb_loop+0x25/0x80
> > Aug 20 20:29:37 luna kernel: ? clear_bhb_loop+0x25/0x80
> > Aug 20 20:29:37 luna kernel: ? clear_bhb_loop+0x25/0x80
> > Aug 20 20:29:37 luna kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > Aug 20 20:29:37 luna kernel: RIP: 0033:0x7b1bee2f81fd
> > Aug 20 20:29:37 luna kernel: Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 fa 0c 00 f7 d8 64 89 01 48
> > Aug 20 20:29:37 luna kernel: RSP: 002b:00007ffe062c2ac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > Aug 20 20:29:37 luna kernel: RAX: ffffffffffffffda RBX: 000056171c8d0a00 RCX: 00007b1bee2f81fd
> > Aug 20 20:29:37 luna kernel: RDX: 0000000000000004 RSI: 00007b1bee0e5061 RDI: 0000000000000026
> > Aug 20 20:29:37 luna kernel: RBP: 00007ffe062c2b80 R08: 0000000000000001 R09: 00007ffe062c2b10
> > Aug 20 20:29:37 luna kernel: R10: 0000000000000040 R11: 0000000000000246 R12: 00007b1bee0e5061
> > Aug 20 20:29:37 luna kernel: R13: 0000000000020000 R14: 000056171c8d18c0 R15: 000056171c8d31e0
> > Aug 20 20:29:37 luna kernel: </TASK>
> > Aug 20 20:29:37 luna kernel: Modules linked in: intel_powerclamp ath9k(+) snd_compress coretemp ac97_bus ath9k_common snd_pcm_dmaengine kvm_intel snd_hda_intel ath9k_hw joydev snd_intel_dspcfg mousedev ath snd_intel_sdw_acpi i915(+) kvm snd_hda_codec iTCO_wdt mac80211 snd_hda_core processor_thermal_device_pci_legacy intel_pmc_bxt snd_hwdep processor_thermal_device hid_multitouch ee1004 iTCO_vendor_support processor_thermal_wt_hint drm_buddy snd_pcm rapl processor_thermal_rfim hid_generic spi_nor r8169 i2c_i801 i2c_algo_bit libarc4 memconsole_coreboot processor_thermal_rapl snd_timer intel_cstate intel_rapl_msr framebuffer_coreboot memconsole cbmem intel_uncore snd intel_rapl_common realtek ttm i2c_smbus cfg80211 mtd processor_thermal_wt_req psmouse mdio_devres pcspkr soundcore i2c_mux processor_thermal_power_floor drm_display_helper intel_lpss_pci libphy processor_thermal_mbox intel_lpss cec rfkill int340x_thermal_zone intel_pmc_core i2c_hid_acpi idma64 intel_gtt intel_soc_dts_iosf intel_pch_thermal i2c_hid intel_vsec intel_hid video
> > Aug 20 20:29:37 luna kernel: pmt_telemetry pmt_class pinctrl_cannonlake wmi sparse_keymap coreboot_table mac_hid pkcs8_key_parser crypto_user loop acpi_call(OE) nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 uas usb_storage dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel serio_raw sha512_ssse3 atkbd sha256_ssse3 sha1_ssse3 libps2 aesni_intel vivaldi_fmap nvme crypto_simd nvme_core spi_intel_pci cryptd xhci_pci spi_intel i8042 nvme_auth xhci_pci_renesas serio librem_ec_acpi(OE)
> > Aug 20 20:29:37 luna kernel: CR2: 0000000300000031
> > Aug 20 20:29:37 luna kernel: ---[ end trace 0000000000000000 ]---
> >
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
>
View attachment "lockup-m3.log" of type "text/x-log" (89285 bytes)
View attachment "lockup-m5.log" of type "text/x-log" (57625 bytes)
View attachment "lockup-m2.log" of type "text/x-log" (69715 bytes)
View attachment "lockup-m4.log" of type "text/x-log" (70326 bytes)