Message-ID: <ee1d26c2-4e14-42f2-8dc0-d57a49490c9e@linux.intel.com>
Date: Thu, 17 Apr 2025 10:46:28 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: "Tian, Kevin" <kevin.tian@...el.com>, Joerg Roedel <joro@...tes.org>,
 Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
 Jarkko Nikula <jarkko.nikula@...ux.intel.com>
Cc: "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/1] iommu/vt-d: Revert ATS timing change to fix boot
 failure

On 4/17/25 10:23, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@...ux.intel.com>
>> Sent: Wednesday, April 16, 2025 3:36 PM
>>
>> Commit <5518f239aff1> ("iommu/vt-d: Move scalable mode ATS enablement to
>> probe path") changed the PCI ATS enablement logic to run earlier,
>> specifically before the default domain attachment.
>>
>> On some client platforms, this change resulted in boot failures, causing
>> the kernel to panic with the following message and call trace:
>>
>>   Kernel panic - not syncing: DMAR hardware is malfunctioning
>>   CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #175
>>   Call Trace:
>>    <TASK>
>>    dump_stack_lvl+0x6f/0xb0
>>    dump_stack+0x10/0x16
>>    panic+0x10a/0x2b7
>>    iommu_enable_translation.cold+0xc/0xc
>>    intel_iommu_init+0xe39/0xec0
>>    ? trace_hardirqs_on+0x1e/0xd0
>>    ? __pfx_pci_iommu_init+0x10/0x10
>>    pci_iommu_init+0xd/0x40
>>    do_one_initcall+0x5b/0x390
>>    kernel_init_freeable+0x26d/0x2b0
>>    ? __pfx_kernel_init+0x10/0x10
>>    kernel_init+0x15/0x120
>>    ret_from_fork+0x35/0x60
>>    ? __pfx_kernel_init+0x10/0x10
>>    ret_from_fork_asm+0x1a/0x30
>>   RIP: 1f0f:0x0
>>   Code: Unable to access opcode bytes at 0xffffffffffffffd6.
>>   RSP: 0000:0000000000000000 EFLAGS: 841f0f2e66 ORIG_RAX: 1f0f2e6600000000
>>   RAX: 0000000000000000 RBX: 1f0f2e6600000000 RCX: 2e66000000000084
>>   RDX: 0000000000841f0f RSI: 000000841f0f2e66 RDI: 00841f0f2e660000
>>   RBP: 00841f0f2e660000 R08: 00841f0f2e660000 R09: 000000841f0f2e66
>>   R10: 0000000000841f0f R11: 2e66000000000084 R12: 000000841f0f2e66
>>   R13: 0000000000841f0f R14: 2e66000000000084 R15: 1f0f2e6600000000
>>    </TASK>
>>   ---[ end Kernel panic - not syncing: DMAR hardware is malfunctioning ]---
>>
>> Fix this by reverting the timing change for ATS enablement introduced by
>> the offending commit and restoring the previous behavior.
>>
> 
> It's unclear how this timing is related to the dumped stack. Are there
> more details on how they are related?
> 

I'm not sure, but I'm trying to find a machine and get more information.
Anyway, let's revert the change and remove the boot regression first.

Thanks,
baolu
