[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <997efa76-20b4-bf0e-2249-e5c850e52fbf@linux.alibaba.com>
Date: Wed, 20 Oct 2021 11:16:38 +0800
From: Xuesong Chen <xuesong.chen@...ux.alibaba.com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: catalin.marinas@....com, lorenzo.pieralisi@....com,
james.morse@....com, will@...nel.org, rafael@...nel.org,
tony.luck@...el.com, bp@...en8.de, mingo@...nel.org,
bhelgaas@...gle.com, linux-pci@...r.kernel.org,
linux-acpi@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, Huang Ying <ying.huang@...el.com>
Subject: Re: [PATCH v3 2/2] ACPI: APEI: Filter the PCI MCFG address with an
arch-agnostic method
On 20/10/2021 03:23, Bjorn Helgaas wrote:
> [+cc Huang in case I'm misinterpreting EINJ resource requests]
>
> On Tue, Oct 19, 2021 at 12:50:33PM +0800, Xuesong Chen wrote:
>> The commit d91525eb8ee6 ("ACPI, EINJ: Enhance error injection tolerance
>> level") fixes the issue that the ACPI/APEI can not access the PCI MCFG
>> address on x86 platform, but this issue can also happen on other
>> architectures, for instance, we got below error message on arm64 platform:
>> ...
>> APEI: Can not request [mem 0x50100000-0x50100003] for APEI EINJ Trigger registers
>
> I'm guessing this address is something in the MCFG area?
Right, correct! The address 0x50100000 is within the MCFG area.
> I wish we
> could also include the matching MCFG info For x86, we put something
> like this in the dmesg log, but I guess pci_mcfg.c doesn't do this:
>
> PCI: MMCONFIG for domain 0000 [bus 00-7f] at [mem 0xf0000000-0xf7ffffff] (base 0xf0000000)
>
I can add the similar dmesg log in pci_mcfg.c to make it's consistent between different arches
>> This patch will try to handle this case in a more common way instead of the
>> original 'arch' specific solution, which will be beneficial to all the
>> APEI-dependent platforms after that.
>
> This actually doesn't say anything about what the patch does or how it
> works. It says "handles this case in a more common way" but with no
> details.
Good suggestion, I'll give more details about that...
> The EINJ table contains "injection instructions" that can read or
> write "register regions" described by generic address structures (see
> ACPI v6.3, sec 18.6.2 and 18.6.3), and __einj_error_trigger() requests
> those register regions with request_mem_region() or request_region()
> before executing the injections instructions.
>
> IIUC, this patch basically says "if this region is part of the MCFG
> area, we don't need to reserve it." That leads to the questions of why
> we need to reserve *any* of the areas
AFAIK, the MCFG area is reserved since the ECAM module will provide a
generic Kernel Programming Interfaces(KPI), e.g, pci_generic_config_read(...),
so all the drivers are allowed to access the pci config space only by those
KPIs in a consistent and safe way, direct raw access will break the rule.
Correct me if I am missing sth.
> and why it's safe to simply skip
> reserving regions that are part of the MCFG area.
Actual there is a commit d91525eb8ee6("ACPI, EINJ: Enhance error injection tolerance
level") before to address this issue, the entire commit log as below:
Some BIOSes utilize PCI MMCFG space read/write opertion to trigger
specific errors. EINJ will report errors as below when hitting such
cases:
APEI: Can not request [mem 0x83f990a0-0x83f990a3] for APEI EINJ Trigger registers
It is because on x86 platform ACPI based PCI MMCFG logic has
reserved all MMCFG spaces so that EINJ can't reserve it again.
We already trust the ACPI/APEI code when using the EINJ interface
so it is not a big leap to also trust it to access the right
MMCFG addresses. Skip address checking to allow the access.
Signed-off-by: Chen, Gong <gong.chen@...ux.intel.com>
Signed-off-by: Tony Luck <tony.luck@...el.com>
Except that the above explanation, IMO the EINJ is only a RAS debug framework,
in this code path, sometimes we need to acesss the address within the MCFG space
directly to trigger kind of HW error, which behavior does not like the normal
device driver's, in this case some possible unsafe operations(bypass the ecam ops)
can be mitigated because the touched device will generate some HW errors and the
RAS handling part will preempt its corresponding drivers to fix/log the HW error,
that's my understanding about that.
>
>> Signed-off-by: Xuesong Chen <xuesong.chen@...ux.alibaba.com>
>> Reported-by: kernel test robot <lkp@...el.com>
>> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@....com>
>> Cc: Catalin Marinas <catalin.marinas@....com>
>> Cc: James Morse <james.morse@....com>
>> Cc: Will Deacon <will@...nel.org>
>> Cc: Rafael. J. Wysocki <rafael@...nel.org>
>> Cc: Tony Luck <tony.luck@...el.com>
>> Cc: Tomasz Nowicki <tn@...ihalf.com>
>> ---
>> arch/x86/pci/mmconfig-shared.c | 28 --------------------------
>> drivers/acpi/apei/apei-base.c | 45 ++++++++++++++++++++++++++++--------------
>> 2 files changed, 30 insertions(+), 43 deletions(-)
>>
>> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
>> index 0b961fe6..12f7d96 100644
>> --- a/arch/x86/pci/mmconfig-shared.c
>> +++ b/arch/x86/pci/mmconfig-shared.c
>> @@ -605,32 +605,6 @@ static int __init pci_parse_mcfg(struct acpi_table_header *header)
>> return 0;
>> }
>>
>> -#ifdef CONFIG_ACPI_APEI
>> -extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
>> - void *data), void *data);
>> -
>> -static int pci_mmcfg_for_each_region(int (*func)(__u64 start, __u64 size,
>> - void *data), void *data)
>> -{
>> - struct pci_mmcfg_region *cfg;
>> - int rc;
>> -
>> - if (list_empty(&pci_mmcfg_list))
>> - return 0;
>> -
>> - list_for_each_entry(cfg, &pci_mmcfg_list, list) {
>> - rc = func(cfg->res.start, resource_size(&cfg->res), data);
>> - if (rc)
>> - return rc;
>> - }
>> -
>> - return 0;
>> -}
>> -#define set_apei_filter() (arch_apei_filter_addr = pci_mmcfg_for_each_region)
>> -#else
>> -#define set_apei_filter()
>> -#endif
>> -
>> static void __init __pci_mmcfg_init(int early)
>> {
>> pci_mmcfg_reject_broken(early);
>> @@ -665,8 +639,6 @@ void __init pci_mmcfg_early_init(void)
>> else
>> acpi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
>> __pci_mmcfg_init(1);
>> -
>> - set_apei_filter();
>> }
>> }
>>
>> diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
>> index c7fdb12..daae75a 100644
>> --- a/drivers/acpi/apei/apei-base.c
>> +++ b/drivers/acpi/apei/apei-base.c
>> @@ -21,6 +21,7 @@
>> #include <linux/kernel.h>
>> #include <linux/module.h>
>> #include <linux/init.h>
>> +#include <linux/pci.h>
>> #include <linux/acpi.h>
>> #include <linux/slab.h>
>> #include <linux/io.h>
>> @@ -448,13 +449,34 @@ static int apei_get_nvs_resources(struct apei_resources *resources)
>> return acpi_nvs_for_each_region(apei_get_res_callback, resources);
>> }
>>
>> -int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
>> - void *data), void *data);
>> -static int apei_get_arch_resources(struct apei_resources *resources)
>> +#ifdef CONFIG_PCI
>> +extern struct list_head pci_mmcfg_list;
>> +static int apei_filter_mcfg_addr(struct apei_resources *res,
>> + struct apei_resources *mcfg_res)
>> +{
>> + int rc = 0;
>> + struct pci_mmcfg_region *cfg;
>> +
>> + if (list_empty(&pci_mmcfg_list))
>> + return 0;
>> +
>> + apei_resources_init(mcfg_res);
>> + list_for_each_entry(cfg, &pci_mmcfg_list, list) {
>> + rc = apei_res_add(&mcfg_res->iomem, cfg->res.start, resource_size(&cfg->res));
>> + if (rc)
>> + return rc;
>> + }
>>
>> + /* filter the mcfg resource from current APEI's */
>> + return apei_resources_sub(res, mcfg_res);
>> +}
>> +#else
>> +static inline int apei_filter_mcfg_addr(struct apei_resources *res,
>> + struct apei_resources *mcfg_res)
>> {
>> - return arch_apei_filter_addr(apei_get_res_callback, resources);
>> + return 0;
>> }
>> +#endif
>>
>> /*
>> * IO memory/port resource management mechanism is used to check
>> @@ -486,15 +508,9 @@ int apei_resources_request(struct apei_resources *resources,
>> if (rc)
>> goto nvs_res_fini;
>>
>> - if (arch_apei_filter_addr) {
>> - apei_resources_init(&arch_res);
>> - rc = apei_get_arch_resources(&arch_res);
>> - if (rc)
>> - goto arch_res_fini;
>> - rc = apei_resources_sub(resources, &arch_res);
>> - if (rc)
>> - goto arch_res_fini;
>> - }
>> + rc = apei_filter_mcfg_addr(resources, &arch_res);
>> + if (rc)
>> + goto arch_res_fini;
>>
>> rc = -EINVAL;
>> list_for_each_entry(res, &resources->iomem, list) {
>> @@ -544,8 +560,7 @@ int apei_resources_request(struct apei_resources *resources,
>> release_mem_region(res->start, res->end - res->start);
>> }
>> arch_res_fini:
>> - if (arch_apei_filter_addr)
>> - apei_resources_fini(&arch_res);
>> + apei_resources_fini(&arch_res);
>> nvs_res_fini:
>> apei_resources_fini(&nvs_resources);
>> return rc;
>> --
>> 1.8.3.1
>>
Powered by blists - more mailing lists