Message-ID: <f93a5532-3e07-edf4-38ca-142a0f1d78d7@linux.alibaba.com>
Date: Thu, 17 Mar 2022 10:56:27 +0800
From: Shuai Xue <xueshuai@...ux.alibaba.com>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: "rjw@...ysocki.net" <rjw@...ysocki.net>,
"lenb@...nel.org" <lenb@...nel.org>,
"james.morse@....com" <james.morse@....com>,
"bp@...en8.de" <bp@...en8.de>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
graeme.gregory@...aro.org, will.deacon@....com,
myron.stowe@...hat.com, len.brown@...el.com, ying.huang@...el.com
Subject: Re: [BUG] kernel side can NOT trigger memory error with einj
Hi, Tony,
Thank you for your quick reply.
On 2022/3/17 1:29 AM, Luck, Tony wrote:
> On Tue, Mar 08, 2022 at 01:19:12PM +0800, Shuai Xue wrote:
>> Hi folks,
>>
>> If we inject a memory error at a physical memory address, e.g. 0x92f033038,
>> used by a user space process:
>>
>> echo 0x92f033038 > /sys/kernel/debug/apei/einj/param1
>> echo 0xfffffffffffff000 > /sys/kernel/debug/apei/einj/param2
>> echo 0x1 > /sys/kernel/debug/apei/einj/flags
>> echo 0x8 > /sys/kernel/debug/apei/einj/error_type
>> echo 1 > /sys/kernel/debug/apei/einj/error_inject
>>
>> Then the following error will be reported in dmesg:
>>
>> ACPI: [Firmware Bug]: requested region covers kernel memory @ 0x000000092f033038
>>
>> After digging into einj trigger interface, I think it's a kernel bug.
>
> I think you are right. This isn't the first bug where Linux tries
> to validate addresses supplied by EINJ for Linux to read/write.
>
> I hadn't come across it because I almost always set:
>
> # echo 1 > notrigger
>
> so that I can have some application, or function in the kernel
> trigger the error. Instead of running the EINJ trigger action
> to make it happen right away.
Haha, I know your great test suite, ras-tools. None of its cases are
triggered by the EINJ trigger action. I have learned a lot from it.
>> I am wondering whether we should use kmap to map RAM in acpi_map, or add
>> another path to address this issue. Any comments are welcome.
>
> Perhaps just drop the sanity checks? Just trusting the BIOS? Sounds
> radical, but this is validation code where the user is deliberately
> injecting errors. If there are BIOS bugs, then people doing validation
> may be well positioned to find the BIOS people to make them fix
> things.
>
> Problem with this approach is that EINJ calls into the APEI code
> that is used for other things besides error injection for validation.
> So a blanket removal of sanity checks wouldn't be a good idea.
Agreed. A blanket removal of the APEI sanity checks is not a good idea. How
about mapping the memory with kmap instead of the APEI API, but only in
__einj_error_trigger()? That way we would not weaken the validation in the
rest of the APEI code, and the injected error could still be triggered.
I provided some rough code in my last mail:
> A hacky way to address this issue is to map the RAM with kmap
> instead of apei_exec_pre_map_gars, and to read it directly instead of
> calling apei_exec_run.
> - rc = apei_exec_pre_map_gars(&trigger_ctx);
> - if (rc)
> - goto out_release;
> + volatile long *ptr;
> + long tmp;
> + unsigned long pfn;
> + pfn = param1 >> PAGE_SHIFT;
>
> - rc = apei_exec_run(&trigger_ctx, ACPI_EINJ_TRIGGER_ERROR);
> + ptr = kmap(pfn_to_page(pfn));
> + tmp = *(volatile long *)((char *)ptr + (param1 & ~PAGE_MASK));
>
> - apei_exec_post_unmap_gars(&trigger_ctx);
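
To make the address arithmetic in that rough patch concrete, here is a
minimal user-space sketch of how param1 splits into a page frame number and
an in-page byte offset. It is only an illustration, not kernel code: the
demo_* names are hypothetical, and it assumes 4 KiB pages (matching the
0xfffffffffffff000 mask written to param2 above). Note that the byte offset
is added to a char pointer before casting; adding it to a long pointer would
scale it by sizeof(long).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel's PAGE_SHIFT may differ per config. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1ULL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Page frame number: the physical address with the in-page bits dropped. */
static inline uint64_t demo_phys_to_pfn(uint64_t phys)
{
	return phys >> DEMO_PAGE_SHIFT;
}

/* Byte offset of the physical address inside its page. */
static inline uint64_t demo_page_offset(uint64_t phys)
{
	return phys & ~DEMO_PAGE_MASK;
}

/*
 * Given the base of an already-mapped page (what kmap() would return in
 * the kernel), return a pointer to the word at the injected address.
 * The offset is in bytes, so it is applied through a char pointer.
 */
static inline volatile long *demo_target_in_page(void *page_base, uint64_t phys)
{
	return (volatile long *)((char *)page_base + demo_page_offset(phys));
}
```

For the address used in this thread, 0x92f033038, this gives pfn 0x92f033
and an in-page offset of 0x38.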
Best Regards.
Shuai