Message-ID: <57665787-66f1-1d5a-a190-e73f4b941dce@linux.alibaba.com>
Date:   Tue, 22 Mar 2022 11:36:29 +0800
From:   Shuai Xue <xueshuai@...ux.alibaba.com>
To:     "Huang, Ying" <ying.huang@...el.com>,
        "Luck, Tony" <tony.luck@...el.com>
Cc:     "rjw@...ysocki.net" <rjw@...ysocki.net>,
        "lenb@...nel.org" <lenb@...nel.org>,
        "james.morse@....com" <james.morse@....com>,
        "bp@...en8.de" <bp@...en8.de>,
        "linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "graeme.gregory@...aro.org" <graeme.gregory@...aro.org>,
        "will.deacon@....com" <will.deacon@....com>,
        "myron.stowe@...hat.com" <myron.stowe@...hat.com>,
        "Brown, Len" <len.brown@...el.com>
Subject: Re: [BUG] kernel side can NOT trigger memory error with einj

On 2022/3/21 10:43 AM, Huang, Ying wrote:
> Shuai Xue <xueshuai@...ux.alibaba.com> writes:
> 
>> On 2022/3/18 12:57 AM, Luck, Tony wrote:
>>>> -       rc = apei_exec_run(&trigger_ctx, ACPI_EINJ_TRIGGER_ERROR);
>>>> +       ptr = kmap(pfn_to_page(pfn));
>>>> +       tmp = *(ptr + (param1 & ~ PAGE_MASK));
>>>
>>> That hack works when the trigger action is just trying to access the injected
>>> location. But on Intel platforms the trigger "kicks" the patrol scrubber in the
>>> memory controller to access the address. So the error is triggered not by
>>> an access from the core, but by internal memory controller access.
>>>
>>> This results in a different error signature (for an uncorrected error injection
>>> it will be a UCNA or SRAO in Intel acronym-speak).
>>
>> As far as I know, APEI only defines five injection instructions, ACPI_EINJ_READ_REGISTER,
>> ACPI_EINJ_READ_REGISTER_VALUE, ACPI_EINJ_WRITE_REGISTER, ACPI_EINJ_WRITE_REGISTER_VALUE and
>> ACPI_EINJ_NOOP. ACPI_EINJ_TRIGGER_ERROR action should run one of them, I don't see
>> any of them will kick the patrol scrubber. For example, trigger with ACPI_EINJ_READ_REGISTER:
>>
>> apei_exec_run(&trigger_ctx, ACPI_EINJ_TRIGGER_ERROR)
>>     __apei_exec_run	// ins=0
>>         => apei_exec_read_register
>>             => apei_read
>>                 => acpi_os_read_memory
>>                     => acpi_map_vaddr_lookup    /* lookup VA of PA from acpi_ioremap */
>>                     => acpi_os_ioremap
>>                     => acpi_os_read_iomem
>>                         => *(u32 *) value = readl(virt_addr);
>>
>> As we can see, the error is triggered by access from the core. However, the physical
>> address can NOT be mapped by acpi_os_ioremap.
>>
>> If I missed anything, please let me know. Thank you very much.


> If you write a device register, the device can kick the patrol scrubber
> for you.  This device behavior needs not to be defined in APEI spec.

I see, thank you. On our platform, the patrol scrubber triggers a deferred
error, and the fatal error is triggered by an access from the CPU.

> As the name suggested, ACPI_EINJ_READ/WRITE_REGISTER are used to
> read/write device registers via iomem.  They aren't used to read/write
> normal physical memory.  If that's needed, you can try some other method
> I guess.

I think so. Should we add a new injection instruction to address this problem,
e.g. ACPI_EINJ_READ_MEMORY implemented with kmap()?
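
To make the idea concrete, here is a minimal userspace sketch of the address
arithmetic such an instruction would need. It follows the kmap() hack quoted at
the top of the thread. Everything prefixed with sk_ or SK_ is hypothetical, the
4 KiB page size is an assumption, and the `page` buffer only stands in for the
pointer that kmap(pfn_to_page(pfn)) would return in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's PAGE_SHIFT and PAGE_MASK.
 * 4 KiB pages are an assumption of this sketch. */
#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE  (UINT64_C(1) << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))

/* Page frame number that would be handed to pfn_to_page(). */
static uint64_t sk_paddr_to_pfn(uint64_t paddr)
{
	return paddr >> SK_PAGE_SHIFT;
}

/* Hypothetical handler for the proposed ACPI_EINJ_READ_MEMORY instruction.
 * `page` models the mapping of the page that contains `paddr`. */
static uint8_t sk_einj_read_memory(const uint8_t *page, uint64_t paddr)
{
	/* Same expression as the hack: param1 & ~PAGE_MASK */
	uint64_t offset = paddr & ~SK_PAGE_MASK;

	assert(page != NULL);
	/* volatile so that the compiler cannot drop the load */
	return *(const volatile uint8_t *)(page + offset);
}
```

With paddr = 0x12345234 this gives pfn 0x12345 and page offset 0x234. In the
kernel the load itself would be the access that consumes the injected error.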

By the way, commit fdea163d8c17 ("ACPI, APEI, EINJ, Fix resource conflict on some
machine") removes the injected memory address range, which conflicts with
regular memory, from the trigger table resources. That makes sense when calling
apei_resources_request(). **However, the actual mapping is done by
apei_exec_pre_map_gars() with trigger_ctx, and the conflicting physical address
is still in trigger_ctx.**

		// drivers/acpi/apei/einj.c: __einj_error_trigger
		trigger_param_region = einj_get_trigger_parameter_region(
			trigger_tab, param1, param2);
		if (trigger_param_region) {
			...
		}

If trigger_param_region is valid, the triggered address is in
ACPI_ADR_SPACE_SYSTEM_MEMORY, so we should not use apei_exec_pre_map_gars() to
map it like a register, right? If we had ACPI_EINJ_READ_MEMORY, we could run
ACPI_EINJ_TRIGGER_ERROR directly through ACPI_EINJ_READ_MEMORY.
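
A rough userspace sketch of that decision, only to illustrate the question
above. The struct, the enum and sk_pick_access() are hypothetical names, and
the address/mask comparison against param1 and param2 is an assumption about
how the injected region is matched, not a copy of the driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one trigger table entry. */
struct sk_trigger_entry {
	bool     system_memory; /* models space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY */
	uint64_t address;       /* models register_region.address */
};

enum sk_access {
	SK_ACCESS_REGISTER, /* pre-map and access like a device register */
	SK_ACCESS_MEMORY,   /* access as normal RAM (READ_MEMORY-style path) */
};

/* Pick the access path for one entry. param1 is the injected address and
 * param2 the address mask (assumed matching rule, see above). */
static enum sk_access sk_pick_access(const struct sk_trigger_entry *e,
				     uint64_t param1, uint64_t param2)
{
	if (e->system_memory && (e->address & param2) == (param1 & param2))
		return SK_ACCESS_MEMORY;
	return SK_ACCESS_REGISTER;
}
```

Only an entry that is in system memory and matches the injected address would
take the memory path. Every other entry would still be mapped and accessed as
a register.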

Best Regards
Shuai





