Message-ID: <259ff554-76b8-8523-033-9e996f549c70@google.com>
Date:   Sat, 2 Oct 2021 03:17:29 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     Steven Rostedt <rostedt@...dmis.org>
cc:     Daniele Ceraolo Spurio <daniele.ceraolospurio@...el.com>,
        Matt Roper <matthew.d.roper@...el.com>,
        Lucas De Marchi <lucas.demarchi@...el.com>,
        Tvrtko Ursulin <tvrtko.ursulin@...ux.intel.com>,
        Caz Yokoyama <caz.yokoyama@...el.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jani Nikula <jani.nikula@...ux.intel.com>,
        Matthew Brost <matthew.brost@...el.com>,
        intel-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org
Subject: Re: [BUG 5.15-rc3] kernel BUG at
 drivers/gpu/drm/i915/i915_sw_fence.c:245!

On Sat, 2 Oct 2021, Steven Rostedt wrote:

> When I tried to test patches applied to v5.15-rc3, I hit this bug (and
> hence can not test my code), on 32 bit x86.
> 
> ------------[ cut here ]------------
> kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc1-test+ #456
> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> EIP: __i915_sw_fence_init+0x15/0x38
> Code: 2b 3d 58 98 88 c1 74 05 e8 60 d9 58 00 8d 65 f4 5b 5e 5f 5d c3 3e
> 8d 74 26 00 55 89 e5 56 89 d6 53 85 d2 74 05 f6 c2 03 74 02 <0f> 0b 89
> ca 8b 4d 08 89 c3 e8 48 94 ab ff 89 73 34 c7 43 38 01 00
> EAX: c2508260 EBX: c2508000 ECX: c143de1e EDX: c09dfadd
> ESI: c09dfadd EDI: c45e7200 EBP: c26c9c68 ESP: c26c9c60
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010202
> CR0: 80050033 CR2: 00000000 CR3: 019e2000 CR4: 001506f0
> Call Trace:
>  intel_context_init+0x112/0x145
>  intel_context_create+0x29/0x37
>  intel_ring_submission_setup+0x3cb/0x5a8
>  ? kfree+0x135/0x1c6
>  ? wa_init_finish+0x32/0x59
>  ? wa_init_finish+0x4f/0x59
>  ? intel_engine_init_ctx_wa+0x39a/0x3b3
>  intel_engines_init+0x2dd/0x4d0
>  ? gen6_bsd_submit_request+0x97/0x97
>  intel_gt_init+0x122/0x20d
>  i915_gem_init+0x80/0xef
>  i915_driver_probe+0x880/0xa90
>  ? i915_pci_remove+0x27/0x27
>  i915_pci_probe+0xdd/0xf6
>  ? __pm_runtime_resume+0x63/0x6b
>  ? i915_pci_remove+0x27/0x27
>  pci_device_probe+0xbc/0x11e
>  really_probe+0x13e/0x328
>  __driver_probe_device+0x140/0x176
>  driver_probe_device+0x1f/0x71
>  __driver_attach+0xf6/0x109
>  ? __device_attach_driver+0xbd/0xbd
>  bus_for_each_dev+0x5b/0x88
>  driver_attach+0x19/0x1b
>  ? __device_attach_driver+0xbd/0xbd
>  bus_add_driver+0xf2/0x199
>  driver_register+0x8c/0xbe
>  __pci_register_driver+0x5b/0x60
>  i915_register_pci_driver+0x19/0x1b
>  i915_init+0x15/0x67
>  ? radeon_module_init+0x6a/0x6a
>  do_one_initcall+0xce/0x21c
>  ? rcu_read_lock_sched_held+0x35/0x6d
>  ? trace_initcall_level+0x5f/0x99
>  kernel_init_freeable+0x1fb/0x247
>  ? rest_init+0x129/0x129
>  kernel_init+0x17/0xfd
>  ret_from_fork+0x1c/0x28
> Modules linked in:
> ---[ end trace 791dc89810d853da ]---
> EIP: __i915_sw_fence_init+0x15/0x38
> Code: 2b 3d 58 98 88 c1 74 05 e8 60 d9 58 00 8d 65 f4 5b 5e 5f 5d c3 3e
> 8d 74 26 00 55 89 e5 56 89 d6 53 85 d2 74 05 f6 c2 03 74 02 <0f> 0b 89
> ca 8b 4d 08 89 c3 e8 48 94 ab ff 89 73 34 c7 43 38 01 00
> EAX: c2508260 EBX: c2508000 ECX: c143de1e EDX: c09dfadd
> ESI: c09dfadd EDI: c45e7200 EBP: c26c9c68 ESP: c26c9c60
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010202
> CR0: 80050033 CR2: 00000000 CR3: 019e2000 CR4: 001506f0
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> Kernel Offset: disabled
> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
> 
> Attached is the dmesg and the config.
> 
> I bisected it down to this commit:
> 
> 3ffe82d701a4 ("drm/i915/xehp: handle new steering options")

Yes (though bisection doesn't work right on this one): the fix
https://lore.kernel.org/lkml/1f955bff-fd9e-d2ee-132a-f758add9e9cb@google.com/
seems to have got lost in the system: it has not even appeared in
linux-next yet. I was going to send a reminder later this weekend.

Here it is again (but edited to replace "__aligned(4)" in the original
by the official "__i915_sw_fence_call" I discovered afterwards; and
ignoring recent discussions of where __attributes ought to appear :-)


[PATCH] drm/i915: fix blank screen booting crashes

5.15-rc1 crashes with blank screen when booting up on two ThinkPads
using i915.  Bisections converge convincingly, but arrive at different
and surprising "culprits", none of them the actual culprit.

netconsole (with init_netconsole() hacked to call i915_init() when
logging has started, instead of by module_init()) tells the story:

kernel BUG at drivers/gpu/drm/i915/i915_sw_fence.c:245!
with RSI: ffffffff814d408b pointing to sw_fence_dummy_notify().
I've been building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, and that
function needs to be 4-byte aligned.

Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation")
Signed-off-by: Hugh Dickins <hughd@...gle.com>
---

 drivers/gpu/drm/i915/gt/intel_context.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -362,6 +362,7 @@ static int __intel_context_active(struct
 	return 0;
 }
 
+__i915_sw_fence_call	/* Respect the I915_SW_FENCE_MASK */
 static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
 				 enum i915_sw_fence_notify state)
 {
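
For readers wondering why alignment matters here: the comment in the patch
suggests the low bits of the notify function's address are reserved by
I915_SW_FENCE_MASK, so a function placed on an odd address trips the check.
Below is a minimal userspace sketch of that pointer-packing idea, not the
i915 code. All names in it (LOW_MASK, notify_fn, dummy_notify, pack and so on)
are hypothetical, the two-bit mask is an assumption taken from the "4-byte
aligned" wording above, and it assumes GCC or Clang on x86, where a function
pointer can be converted to uintptr_t and back.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: two low bits are reserved for flags, hence 4-byte alignment. */
#define LOW_MASK ((uintptr_t)3)

typedef int (*notify_fn)(int state);

/* Force the alignment instead of relying on compiler defaults, which may
 * drop to 1 byte when optimizing for size. */
__attribute__((aligned(4)))
static int dummy_notify(int state)
{
	return state;
}

/* Low two bits of an address; non-zero means it is not 4-byte aligned. */
static unsigned int addr_low_bits(uint64_t addr)
{
	return (unsigned int)(addr & 3);
}

/* True if the function address leaves the reserved low bits clear. */
static bool fn_is_packable(notify_fn fn)
{
	return fn && ((uintptr_t)fn & LOW_MASK) == 0;
}

/* Store the function address and the flags in a single word. */
static uintptr_t pack(notify_fn fn, unsigned int flags)
{
	return (uintptr_t)fn | ((uintptr_t)flags & LOW_MASK);
}

/* Recover the function pointer by masking off the flag bits. */
static notify_fn unpack_fn(uintptr_t word)
{
	return (notify_fn)(word & ~LOW_MASK);
}

/* Recover the flags from the low bits. */
static unsigned int unpack_flags(uintptr_t word)
{
	return (unsigned int)(word & LOW_MASK);
}
```

Fed the addresses from the two reports, addr_low_bits() is non-zero for both
EDX: c09dfadd and RSI: ffffffff814d408b, so neither would pass a check like
fn_is_packable(). With the aligned attribute in place, pack() followed by
unpack_fn() returns the original function and the flags survive separately.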
