lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <20210210185319.chharluce2ly4cne@intel.com> Date: Wed, 10 Feb 2021 10:53:19 -0800 From: Ben Widawsky <ben.widawsky@...el.com> To: Jonathan Cameron <Jonathan.Cameron@...wei.com> Cc: linux-cxl@...r.kernel.org, linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org, linux-nvdimm@...ts.01.org, linux-pci@...r.kernel.org, Bjorn Helgaas <helgaas@...nel.org>, Chris Browy <cbrowy@...ry-design.com>, Christoph Hellwig <hch@...radead.org>, Dan Williams <dan.j.williams@...el.com>, David Hildenbrand <david@...hat.com>, David Rientjes <rientjes@...gle.com>, Ira Weiny <ira.weiny@...el.com>, Jon Masters <jcm@...masters.org>, Rafael Wysocki <rafael.j.wysocki@...el.com>, Randy Dunlap <rdunlap@...radead.org>, Vishal Verma <vishal.l.verma@...el.com>, "John Groves (jgroves)" <jgroves@...ron.com>, "Kelley, Sean V" <sean.v.kelley@...el.com> Subject: Re: [PATCH v2 2/8] cxl/mem: Find device capabilities On 21-02-10 17:41:04, Jonathan Cameron wrote: > On Tue, 9 Feb 2021 16:02:53 -0800 > Ben Widawsky <ben.widawsky@...el.com> wrote: > > > Provide enough functionality to utilize the mailbox of a memory device. > > The mailbox is used to interact with the firmware running on the memory > > device. The flow is proven with one implemented command, "identify". > > Because the class code has already told the driver this is a memory > > device and the identify command is mandatory. > > > > CXL devices contain an array of capabilities that describe the > > interactions software can have with the device or firmware running on > > the device. A CXL compliant device must implement the device status and > > the mailbox capability. Additionally, a CXL compliant memory device must > > implement the memory device capability. Each of the capabilities can > > [will] provide an offset within the MMIO region for interacting with the > > CXL device. > > > > The capabilities tell the driver how to find and map the register space > > for CXL Memory Devices. The registers are required to utilize the CXL > > spec defined mailbox interface. The spec outlines two mailboxes, primary > > and secondary. The secondary mailbox is earmarked for system firmware, > > and not handled in this driver. > > > > Primary mailboxes are capable of generating an interrupt when submitting > > a background command. That implementation is saved for a later time. > > > > Link: https://www.computeexpresslink.org/download-the-specification > > Signed-off-by: Ben Widawsky <ben.widawsky@...el.com> > > Reviewed-by: Dan Williams <dan.j.williams@...el.com> > > A few more comments inline (proper review whereas my other reply was a > bug chase). > > Jonathan > > > --- > > drivers/cxl/Kconfig | 14 + > > drivers/cxl/cxl.h | 93 +++++++ > > drivers/cxl/mem.c | 511 +++++++++++++++++++++++++++++++++- > > drivers/cxl/pci.h | 13 + > > include/uapi/linux/pci_regs.h | 1 + > > 5 files changed, 630 insertions(+), 2 deletions(-) > > create mode 100644 drivers/cxl/cxl.h > > > > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig > > index 9e80b311e928..c4ba3aa0a05d 100644 > > --- a/drivers/cxl/Kconfig > > +++ b/drivers/cxl/Kconfig > > @@ -32,4 +32,18 @@ config CXL_MEM > > Chapter 2.3 Type 3 CXL Device in the CXL 2.0 specification. > > > > If unsure say 'm'. > > + > > +config CXL_MEM_INSECURE_DEBUG > > + bool "CXL.mem debugging" > > As mentioned below, this makes me a tiny bit uncomfortable. > > > + depends on CXL_MEM > > + help > > + Enable debug of all CXL command payloads. > > + > > + Some CXL devices and controllers support encryption and other > > + security features. The payloads for the commands that enable > > + those features may contain sensitive clear-text security > > + material. Disable debug of those command payloads by default. > > + If you are a kernel developer actively working on CXL > > + security enabling say Y, otherwise say N. > > + > > endif > > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h > > new file mode 100644 > > index 000000000000..745f5e0bfce3 > > --- /dev/null > > +++ b/drivers/cxl/cxl.h > > @@ -0,0 +1,93 @@ > > +/* SPDX-License-Identifier: GPL-2.0-only */ > > +/* Copyright(c) 2020 Intel Corporation. */ > > + > > +#ifndef __CXL_H__ > > +#define __CXL_H__ > > + > > +#include <linux/bitfield.h> > > +#include <linux/bitops.h> > > +#include <linux/io.h> > > + > > +/* CXL 2.0 8.2.8.1 Device Capabilities Array Register */ > > +#define CXLDEV_CAP_ARRAY_OFFSET 0x0 > > +#define CXLDEV_CAP_ARRAY_CAP_ID 0 > > +#define CXLDEV_CAP_ARRAY_ID_MASK GENMASK(15, 0) > > +#define CXLDEV_CAP_ARRAY_COUNT_MASK GENMASK(47, 32) > > +/* CXL 2.0 8.2.8.2.1 CXL Device Capabilities */ > > +#define CXLDEV_CAP_CAP_ID_DEVICE_STATUS 0x1 > > +#define CXLDEV_CAP_CAP_ID_PRIMARY_MAILBOX 0x2 > > +#define CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX 0x3 > > +#define CXLDEV_CAP_CAP_ID_MEMDEV 0x4000 > > + > > +/* CXL 2.0 8.2.8.4 Mailbox Registers */ > > +#define CXLDEV_MBOX_CAPS_OFFSET 0x00 > > +#define CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0) > > +#define CXLDEV_MBOX_CTRL_OFFSET 0x04 > > +#define CXLDEV_MBOX_CTRL_DOORBELL BIT(0) > > +#define CXLDEV_MBOX_CMD_OFFSET 0x08 > > +#define CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK GENMASK(15, 0) > > +#define CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK GENMASK(36, 16) > > +#define CXLDEV_MBOX_STATUS_OFFSET 0x10 > > +#define CXLDEV_MBOX_STATUS_RET_CODE_MASK GENMASK(47, 32) > > +#define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18 > > +#define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20 > > + > > +/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */ > > +#define CXLMDEV_STATUS_OFFSET 0x0 > > +#define CXLMDEV_DEV_FATAL BIT(0) > > +#define CXLMDEV_FW_HALT BIT(1) > > +#define CXLMDEV_STATUS_MEDIA_STATUS_MASK GENMASK(3, 2) > > +#define CXLMDEV_MS_NOT_READY 0 > > +#define CXLMDEV_MS_READY 1 > > +#define CXLMDEV_MS_ERROR 2 > > +#define CXLMDEV_MS_DISABLED 3 > > +#define CXLMDEV_READY(status) \ > > + (FIELD_GET(CXLMDEV_STATUS_MEDIA_STATUS_MASK, status) == \ > > + CXLMDEV_MS_READY) > > +#define CXLMDEV_MBOX_IF_READY BIT(4) > > +#define CXLMDEV_RESET_NEEDED_MASK GENMASK(7, 5) > > +#define CXLMDEV_RESET_NEEDED_NOT 0 > > +#define CXLMDEV_RESET_NEEDED_COLD 1 > > +#define CXLMDEV_RESET_NEEDED_WARM 2 > > +#define CXLMDEV_RESET_NEEDED_HOT 3 > > +#define CXLMDEV_RESET_NEEDED_CXL 4 > > +#define CXLMDEV_RESET_NEEDED(status) \ > > + (FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) != \ > > + CXLMDEV_RESET_NEEDED_NOT) > > + > > +/** > > + * struct cxl_mem - A CXL memory device > > + * @pdev: The PCI device associated with this CXL device. > > + * @regs: IO mappings to the device's MMIO > > + * @status_regs: CXL 2.0 8.2.8.3 Device Status Registers > > + * @mbox_regs: CXL 2.0 8.2.8.4 Mailbox Registers > > + * @memdev_regs: CXL 2.0 8.2.8.5 Memory Device Registers > > + * @payload_size: Size of space for payload > > + * (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register) > > + * @mbox_mutex: Mutex to synchronize mailbox access. > > + * @firmware_version: Firmware version for the memory device. > > + * @pmem: Persistent memory capacity information. > > + * @ram: Volatile memory capacity information. > > + */ > > +struct cxl_mem { > > + struct pci_dev *pdev; > > + void __iomem *regs; > > + > > + void __iomem *status_regs; > > + void __iomem *mbox_regs; > > + void __iomem *memdev_regs; > > + > > + size_t payload_size; > > + struct mutex mbox_mutex; /* Protects device mailbox and firmware */ > > + char firmware_version[0x10]; > > + > > + struct { > > + struct range range; > > + } pmem; > > Christoph raised this in v1, and I agree with him that his would me more compact > and readable as > > struct range pmem_range; > struct range ram_range; > > The discussion seemed to get lost without getting resolved that I can see. > I had been waiting for Dan to chime in, since he authored it. I'll change it and he can yell if he cares. > > + > > + struct { > > + struct range range; > > + } ram; > > > +}; > > + > > +#endif /* __CXL_H__ */ > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c > > index 99a6571508df..0a868a15badc 100644 > > --- a/drivers/cxl/mem.c > > +++ b/drivers/cxl/mem.c > > > ... > > > +static void cxl_mem_mbox_timeout(struct cxl_mem *cxlm, > > + struct mbox_cmd *mbox_cmd) > > +{ > > + struct device *dev = &cxlm->pdev->dev; > > + > > + dev_dbg(dev, "Mailbox command (opcode: %#x size: %zub) timed out\n", > > + mbox_cmd->opcode, mbox_cmd->size_in); > > + > > + if (IS_ENABLED(CONFIG_CXL_MEM_INSECURE_DEBUG)) { > > Hmm. Whilst I can see the advantage of this for debug, I'm not sure we want > it upstream even under a rather evil looking CONFIG variable. > > Is there a bigger lock we can use to avoid chance of accidental enablement? Any suggestions? I'm told this functionality was extremely valuable for NVDIMM, though I haven't personally experienced it. > > > > + print_hex_dump_debug("Payload ", DUMP_PREFIX_OFFSET, 16, 1, > > + mbox_cmd->payload_in, mbox_cmd->size_in, > > + true); > > + } > > +} > > + > > +/** > > + * cxl_mem_mbox_send_cmd() - Send a mailbox command to a memory device. > > + * @cxlm: The CXL memory device to communicate with. > > + * @mbox_cmd: Command to send to the memory device. > > + * > > + * Context: Any context. Expects mbox_lock to be held. > > + * Return: -ETIMEDOUT if timeout occurred waiting for completion. 0 on success. > > + * Caller should check the return code in @mbox_cmd to make sure it > > + * succeeded. > > + * > > + * This is a generic form of the CXL mailbox send command, thus the only I/O > > + * operations used are cxl_read_mbox_reg(). Memory devices, and perhaps other > > + * types of CXL devices may have further information available upon error > > + * conditions. > > + * > > + * The CXL spec allows for up to two mailboxes. The intention is for the primary > > + * mailbox to be OS controlled and the secondary mailbox to be used by system > > + * firmware. This allows the OS and firmware to communicate with the device and > > + * not need to coordinate with each other. The driver only uses the primary > > + * mailbox. > > + */ > > +static int cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm, > > + struct mbox_cmd *mbox_cmd) > > +{ > > + void __iomem *payload = cxlm->mbox_regs + CXLDEV_MBOX_PAYLOAD_OFFSET; > > + u64 cmd_reg, status_reg; > > + size_t out_len; > > + int rc; > > + > > + lockdep_assert_held(&cxlm->mbox_mutex); > > + > > + /* > > + * Here are the steps from 8.2.8.4 of the CXL 2.0 spec. > > + * 1. Caller reads MB Control Register to verify doorbell is clear > > + * 2. Caller writes Command Register > > + * 3. Caller writes Command Payload Registers if input payload is non-empty > > + * 4. Caller writes MB Control Register to set doorbell > > + * 5. Caller either polls for doorbell to be clear or waits for interrupt if configured > > + * 6. Caller reads MB Status Register to fetch Return code > > + * 7. If command successful, Caller reads Command Register to get Payload Length > > + * 8. If output payload is non-empty, host reads Command Payload Registers > > + * > > + * Hardware is free to do whatever it wants before the doorbell is rung, > > + * and isn't allowed to change anything after it clears the doorbell. As > > + * such, steps 2 and 3 can happen in any order, and steps 6, 7, 8 can > > + * also happen in any order (though some orders might not make sense). > > + */ > > + > > + /* #1 */ > > + if (cxl_doorbell_busy(cxlm)) { > > + dev_err_ratelimited(&cxlm->pdev->dev, > > + "Mailbox re-busy after acquiring\n"); > > + return -EBUSY; > > + } > > + > > + cmd_reg = FIELD_PREP(CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK, > > + mbox_cmd->opcode); > > + if (mbox_cmd->size_in) { > > + if (WARN_ON(!mbox_cmd->payload_in)) > > + return -EINVAL; > > + > > + cmd_reg |= FIELD_PREP(CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK, > > + mbox_cmd->size_in); > > + memcpy_toio(payload, mbox_cmd->payload_in, mbox_cmd->size_in); > > + } > > + > > + /* #2, #3 */ > > + writeq(cmd_reg, cxlm->mbox_regs + CXLDEV_MBOX_CMD_OFFSET); > > + > > + /* #4 */ > > + dev_dbg(&cxlm->pdev->dev, "Sending command\n"); > > + writel(CXLDEV_MBOX_CTRL_DOORBELL, > > + cxlm->mbox_regs + CXLDEV_MBOX_CTRL_OFFSET); > > + > > + /* #5 */ > > + rc = cxl_mem_wait_for_doorbell(cxlm); > > + if (rc == -ETIMEDOUT) { > > + cxl_mem_mbox_timeout(cxlm, mbox_cmd); > > + return rc; > > + } > > + > > + /* #6 */ > > + status_reg = readq(cxlm->mbox_regs + CXLDEV_MBOX_STATUS_OFFSET); > > + mbox_cmd->return_code = > > + FIELD_GET(CXLDEV_MBOX_STATUS_RET_CODE_MASK, status_reg); > > + > > + if (mbox_cmd->return_code != 0) { > > + dev_dbg(&cxlm->pdev->dev, "Mailbox operation had an error\n"); > > + return 0; > > See earlier diversion whilst I was chasing my bug (another branch of this > thread) > > > + } > > + > > + /* #7 */ > > + cmd_reg = readq(cxlm->mbox_regs + CXLDEV_MBOX_CMD_OFFSET); > > + out_len = FIELD_GET(CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK, cmd_reg); > > + > > + /* #8 */ > > + if (out_len && mbox_cmd->payload_out) > > + memcpy_fromio(mbox_cmd->payload_out, payload, out_len); > > + > > + mbox_cmd->size_out = out_len; > > + > > + return 0; > > +} > > + > > > ... > > > +static struct cxl_mem *cxl_mem_create(struct pci_dev *pdev, u32 reg_lo, > > + u32 reg_hi) > > +{ > > + struct device *dev = &pdev->dev; > > + struct cxl_mem *cxlm; > > + void __iomem *regs; > > + u64 offset; > > + u8 bar; > > + int rc; > > + > > + cxlm = devm_kzalloc(&pdev->dev, sizeof(*cxlm), GFP_KERNEL); > > + if (!cxlm) { > > + dev_err(dev, "No memory available\n"); > > + return NULL; > > + } > > + > > + offset = ((u64)reg_hi << 32) | FIELD_GET(CXL_REGLOC_ADDR_MASK, reg_lo); > > + bar = FIELD_GET(CXL_REGLOC_BIR_MASK, reg_lo); > > + > > + /* Basic sanity check that BAR is big enough */ > > + if (pci_resource_len(pdev, bar) < offset) { > > + dev_err(dev, "BAR%d: %pr: too small (offset: %#llx)\n", bar, > > + &pdev->resource[bar], (unsigned long long)offset); > > + return NULL; > > + } > > + > > + rc = pcim_iomap_regions(pdev, BIT(bar), pci_name(pdev)); > > + if (rc != 0) { > > if (rc) > > > + dev_err(dev, "failed to map registers\n"); > > + return NULL; > > + } > > + regs = pcim_iomap_table(pdev)[bar]; > > + > > + mutex_init(&cxlm->mbox_mutex); > > + cxlm->pdev = pdev; > > + cxlm->regs = regs + offset; > > + > > + dev_dbg(dev, "Mapped CXL Memory Device resource\n"); > > + return cxlm; > > +} > > > > ... > > > static int cxl_mem_probe(struct pci_dev *pdev, const struct pci_device_id *id) > > { > > struct device *dev = &pdev->dev; > > - int regloc; > > + struct cxl_mem *cxlm; > > + int rc, regloc, i; > > + u32 regloc_size; > > + > > + rc = pcim_enable_device(pdev); > > + if (rc) > > + return rc; > > > > regloc = cxl_mem_dvsec(pdev, PCI_DVSEC_ID_CXL_REGLOC_OFFSET); > > if (!regloc) { > > @@ -39,7 +509,44 @@ static int cxl_mem_probe(struct pci_dev *pdev, const struct pci_device_id *id) > > return -ENXIO; > > } > > > > - return 0; > > + /* Get the size of the Register Locator DVSEC */ > > + pci_read_config_dword(pdev, regloc + PCI_DVSEC_HEADER1, ®loc_size); > > + regloc_size = FIELD_GET(PCI_DVSEC_HEADER1_LENGTH_MASK, regloc_size); > > + > > + regloc += PCI_DVSEC_ID_CXL_REGLOC_BLOCK1_OFFSET; > > + > > + rc = -ENXIO; > > + for (i = regloc; i < regloc + regloc_size; i += 8) { > > + u32 reg_lo, reg_hi; > > + u8 reg_type; > > + > > + /* "register low and high" contain other bits */ > > high doesn't contain any other bits so that's a tiny bit misleading. > > > + pci_read_config_dword(pdev, i, ®_lo); > > + pci_read_config_dword(pdev, i + 4, ®_hi); > > + > > + reg_type = FIELD_GET(CXL_REGLOC_RBI_MASK, reg_lo); > > + > > + if (reg_type == CXL_REGLOC_RBI_MEMDEV) { > > + rc = 0; > > I sort of assumed this unusual structure was to allow for some future > change, but checked end result and it still looks like this. > So, drop the rc assignment here and... > [snip] > > return -ENODEV; > [snip] > > With above direct return, only get here if rc = -ENXIO. > Could just as easily check if i >= regloc + regloc_size then it's > obvious this is kind of canonical form of 'not found'. > > > Alternative would be to treat the above as a 'find' loop then > have the clxm = cxl_mem_create() outside of the loop. > I don't recall honestly, but I think it was meant to help distinguish the failure type. ENXIO - No register locator found ENODEV - Some BAR or other resource not found/mapped. I think this distinction is shown through debug messages or lack of, so I'm fine to just make it -ENODEV in any failure. > > > + > > + rc = cxl_mem_setup_regs(cxlm); > > + if (rc) > > + return rc; > > + > > + rc = cxl_mem_setup_mailbox(cxlm); > > + if (rc) > > + return rc; > > + > > + return cxl_mem_identify(cxlm); > > } > > > > static const struct pci_device_id cxl_mem_pci_tbl[] = { > > diff --git a/drivers/cxl/pci.h b/drivers/cxl/pci.h > > index f135b9f7bb21..ffcbc13d7b5b 100644 > > --- a/drivers/cxl/pci.h > > +++ b/drivers/cxl/pci.h > > @@ -14,5 +14,18 @@ > > #define PCI_DVSEC_ID_CXL 0x0 > > > > #define PCI_DVSEC_ID_CXL_REGLOC_OFFSET 0x8 > > +#define PCI_DVSEC_ID_CXL_REGLOC_BLOCK1_OFFSET 0xC > > + > > +/* BAR Indicator Register (BIR) */ > > +#define CXL_REGLOC_BIR_MASK GENMASK(2, 0) > > + > > +/* Register Block Identifier (RBI) */ > > +#define CXL_REGLOC_RBI_MASK GENMASK(15, 8) > > +#define CXL_REGLOC_RBI_EMPTY 0 > > +#define CXL_REGLOC_RBI_COMPONENT 1 > > +#define CXL_REGLOC_RBI_VIRT 2 > > +#define CXL_REGLOC_RBI_MEMDEV 3 > > + > > +#define CXL_REGLOC_ADDR_MASK GENMASK(31, 16) > > CXL_REGLOCL_ADDR_LOW_MASK perhaps for clarity? > > > > > #endif /* __CXL_PCI_H__ */ > > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h > > index e709ae8235e7..6267ca9ae683 100644 > > --- a/include/uapi/linux/pci_regs.h > > +++ b/include/uapi/linux/pci_regs.h > > @@ -1080,6 +1080,7 @@ > > > > /* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */ > > #define PCI_DVSEC_HEADER1 0x4 /* Designated Vendor-Specific Header1 */ > > +#define PCI_DVSEC_HEADER1_LENGTH_MASK 0xFFF00000 > > Seems sensible to add the revision mask as well. > The vendor id currently read using a word read rather than dword, but perhaps > neater to add that as well for completeness? > > Having said that, given Bjorn's comment on clashes and the fact he'd rather see > this stuff defined in drivers and combined later (see review patch 1 and follow > the link) perhaps this series should not touch this header at all. I'm fine to move it back. > > > #define PCI_DVSEC_HEADER2 0x8 /* Designated Vendor-Specific Header2 */ > > > > /* Data Link Feature */ >
Powered by blists - more mailing lists