[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <671708cd6d84f_232629418@dwillia2-xfh.jf.intel.com.notmuch>
Date: Mon, 21 Oct 2024 19:07:09 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Li Zhijian <lizhijian@...itsu.com>, <linux-cxl@...r.kernel.org>
CC: Davidlohr Bueso <dave@...olabs.net>, Jonathan Cameron
<jonathan.cameron@...wei.com>, Dave Jiang <dave.jiang@...el.com>, "Alison
Schofield" <alison.schofield@...el.com>, Vishal Verma
<vishal.l.verma@...el.com>, Ira Weiny <ira.weiny@...el.com>, Dan Williams
<dan.j.williams@...el.com>, <linux-kernel@...r.kernel.org>, Li Zhijian
<lizhijian@...itsu.com>, "Huang, Ying" <ying.huang@...el.com>
Subject: Re: [PATCH v2] cxl/core: Return error when
cxl_endpoint_gather_bandwidth() handles a non-PCI device
Li Zhijian wrote:
> The function cxl_endpoint_gather_bandwidth() invokes
> pci_bus_read/write_XXX(), however, not all CXL devices are presently
> implemented via PCI. It is recognized that the cxl_test has realized a CXL
> device using a platform device.
> Calling pci_bus_read/write_XXX() in cxl_test will cause kernel panic:
I like that you include the failure info. I would also trim it to just
the salient information, like this:
platform cxl_host_bridge.3: host supports CXL (restricted)
Oops: general protection fault, probably for non-canonical address 0x3ef17856fcae4fbd: 0000 [#1] PREEMPT SMP PTI
RIP: 0010:pci_bus_read_config_word+0x1c/0x60
Call Trace:
<TASK>
? __die_body.cold+0x19/0x27
? die_addr+0x38/0x60
? exc_general_protection+0x1f5/0x4b0
? asm_exc_general_protection+0x22/0x30
? pci_bus_read_config_word+0x1c/0x60
pcie_capability_read_word+0x93/0xb0
pcie_link_speed_mbps+0x18/0x50
cxl_pci_get_bandwidth+0x18/0x60 [cxl_core]
cxl_endpoint_gather_bandwidth.constprop.0+0xf4/0x230 [cxl_core]
? xas_store+0x54/0x660
? preempt_count_add+0x69/0xa0
? _raw_spin_lock+0x13/0x40
? __kmalloc_cache_noprof+0xe7/0x270
cxl_region_shared_upstream_bandwidth_update+0x9c/0x790 [cxl_core]
cxl_region_attach+0x520/0x7e0 [cxl_core]
store_targetN+0xf2/0x120 [cxl_core]
> And Ying also reported a KASAN error with similar calltrace.
>
> Reported-by: "Huang, Ying" <ying.huang@...el.com>
> Closes: https://lore.kernel.org/linux-cxl/87y12w9vp5.fsf@yhuang6-desk2.ccr.corp.intel.com/
Minor, but this can also be trimmed:
Closes: http://lore.kernel.org/87y12w9vp5.fsf@yhuang6-desk2.ccr.corp.intel.com
> Fixes: a5ab0de0ebaa ("cxl: Calculate region bandwidth of targets with shared upstream link")
>
> Signed-off-by: Li Zhijian <lizhijian@...itsu.com>
> ---
> V2:
> Check device type in original cxl_endpoint_gather_bandwidth() instead of mocking a new one. # Dan
> Also noticed that the existing cxl_switch_gather_bandwidth() also have the same check.
> Signed-off-by: Li Zhijian <lizhijian@...itsu.com>
> ---
> drivers/cxl/core/cdat.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> index ef1621d40f05..1a510e692ac0 100644
> --- a/drivers/cxl/core/cdat.c
> +++ b/drivers/cxl/core/cdat.c
> @@ -641,6 +641,9 @@ static int cxl_endpoint_gather_bandwidth(struct cxl_region *cxlr,
> void *ptr;
> int rc;
>
> + if (!dev_is_pci(cxlds->dev))
> + return -EINVAL;
This should be -ENODEV or -ENXIO. If this error code ever leaked out to
userspace the application would think it passed an "invalid argument" vs
encountering "no such device".
Feel free to add:
Reviewed-by: Dan Williams <dan.j.williams@...el.com>
...after fixing that up.
Powered by blists - more mailing lists