Message-ID: <78cefd4c-f735-2ec4-0c09-35c8191280c5@linux.alibaba.com>
Date: Sun, 20 Mar 2022 21:11:58 +0800
From: Shuai Xue <xueshuai@...ux.alibaba.com>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: "rjw@...ysocki.net" <rjw@...ysocki.net>,
"lenb@...nel.org" <lenb@...nel.org>,
"james.morse@....com" <james.morse@....com>,
"bp@...en8.de" <bp@...en8.de>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"graeme.gregory@...aro.org" <graeme.gregory@...aro.org>,
"will.deacon@....com" <will.deacon@....com>,
"myron.stowe@...hat.com" <myron.stowe@...hat.com>,
"Brown, Len" <len.brown@...el.com>,
"Huang, Ying" <ying.huang@...el.com>
Subject: Re: [BUG] kernel side can NOT trigger memory error with einj
On 2022/3/18 12:57 AM, Luck, Tony wrote:
>> - rc = apei_exec_run(&trigger_ctx, ACPI_EINJ_TRIGGER_ERROR);
>> + ptr = kmap(pfn_to_page(pfn));
>> + tmp = *(ptr + (param1 & ~ PAGE_MASK));
>
> That hack works when the trigger action is just trying to access the injected
> location. But on Intel platforms the trigger "kicks" the patrol scrubber in the
> memory controller to access the address. So the error is triggered not by
> an access from the core, but by internal memory controller access.
>
> This results in a different error signature (for an uncorrected error injection
> it will be a UCNA or SRAO in Intel acronym-speak).
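To spell out the address arithmetic in the hack quoted above: pfn_to_page(pfn) selects the page and (param1 & ~PAGE_MASK) is the byte offset of the injected address within that page. Below is a minimal user-space sketch of that arithmetic, assuming 4 KiB pages. The DEMO_* macros and demo_* helpers are made-up names for illustration, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel takes the real values from <asm/page.h>. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (UINT64_C(1) << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Page frame number: the physical address with the in-page offset dropped. */
static uint64_t demo_pfn(uint64_t paddr)
{
	return paddr >> DEMO_PAGE_SHIFT;
}

/* Byte offset inside the page, i.e. the (param1 & ~PAGE_MASK) term. */
static uint64_t demo_page_offset(uint64_t paddr)
{
	return paddr & ~DEMO_PAGE_MASK;
}

/* Recombine a frame number and an in-page offset into a physical address. */
static uint64_t demo_rebuild(uint64_t pfn, uint64_t off)
{
	assert(off < DEMO_PAGE_SIZE);
	return (pfn << DEMO_PAGE_SHIFT) | off;
}
```

Note that the offset is in bytes, so adding it to ptr only lands on the injected address if ptr is a byte-sized pointer type.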
As far as I know, APEI defines only five injection instructions: ACPI_EINJ_READ_REGISTER,
ACPI_EINJ_READ_REGISTER_VALUE, ACPI_EINJ_WRITE_REGISTER, ACPI_EINJ_WRITE_REGISTER_VALUE and
ACPI_EINJ_NOOP. The ACPI_EINJ_TRIGGER_ERROR action should run one of them, and I don't see
how any of them would kick the patrol scrubber. For example, a trigger with ACPI_EINJ_READ_REGISTER:
apei_exec_run(&trigger_ctx, ACPI_EINJ_TRIGGER_ERROR)
    __apei_exec_run // ins=0
    => apei_exec_read_register
        => apei_read
            => acpi_os_read_memory
                => acpi_map_vaddr_lookup /* lookup VA of PA from acpi_ioremap */
                => acpi_os_ioremap
                => acpi_os_read_iomem
                    => *(u32 *) value = readl(virt_addr);
As we can see, the error is triggered by access from the core. However, the physical
address can NOT be mapped by acpi_os_ioremap.
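To make the point concrete, here is a rough user-space model of how I understand the interpreter to dispatch a trigger action. It is only a sketch: the demo_* names and the struct layout are hypothetical, the register is a plain variable instead of an ioremap'ed address, and the write handling is simplified compared with the kernel code. What it shows is that every instruction ends in an ordinary load or store issued by the CPU running the interpreter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the five instruction types listed above. */
enum demo_ins {
	DEMO_READ_REGISTER,
	DEMO_READ_REGISTER_VALUE,
	DEMO_WRITE_REGISTER,
	DEMO_WRITE_REGISTER_VALUE,
	DEMO_NOOP,
};

/* One simplified table entry: which action it belongs to and what to do. */
struct demo_entry {
	uint8_t action;   /* action this entry implements */
	uint8_t ins;      /* one of enum demo_ins */
	uint64_t *reg;    /* stand-in for the mapped register address */
	uint64_t value;   /* immediate operand */
	uint64_t mask;    /* bits of the register that are significant */
};

struct demo_ctx {
	uint64_t value;   /* running value, read or written by instructions */
};

/* Run every entry matching 'action'; return how many entries were executed. */
static int demo_exec_run(struct demo_ctx *ctx, const struct demo_entry *tab,
			 size_t n, uint8_t action)
{
	int ran = 0;

	for (size_t i = 0; i < n; i++) {
		const struct demo_entry *e = &tab[i];

		if (e->action != action)
			continue;
		assert(e->ins <= DEMO_NOOP);

		switch (e->ins) {
		case DEMO_READ_REGISTER:        /* plain load by the core */
			ctx->value = *e->reg & e->mask;
			break;
		case DEMO_READ_REGISTER_VALUE:  /* load, then compare */
			ctx->value = ((*e->reg & e->mask) == e->value);
			break;
		case DEMO_WRITE_REGISTER:       /* plain store by the core */
			*e->reg = ctx->value & e->mask;
			break;
		case DEMO_WRITE_REGISTER_VALUE: /* store an immediate */
			*e->reg = e->value & e->mask;
			break;
		case DEMO_NOOP:
			break;
		}
		ran++;
	}
	return ran;
}
```

In this model nothing other than the calling CPU touches the register, which is why I don't see where a patrol scrubber would come in unless the firmware reacts to the register access itself.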
If I missed anything, please let me know. Thank you very much.
Best Regards,
Shuai