lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YtAl9WlqbiCFslcD@iweiny-desk3>
Date:   Thu, 14 Jul 2022 07:19:33 -0700
From:   Ira Weiny <ira.weiny@...el.com>
To:     Dan Williams <dan.j.williams@...el.com>
CC:     Bjorn Helgaas <bhelgaas@...gle.com>,
        Jonathan Cameron <Jonathan.Cameron@...wei.com>,
        "Li, Ming" <ming4.li@...el.com>,
        Bjorn Helgaas <helgaas@...nel.org>,
        Lukas Wunner <lukas@...ner.de>,
        Alison Schofield <alison.schofield@...el.com>,
        Vishal Verma <vishal.l.verma@...el.com>,
        "Dave Jiang" <dave.jiang@...el.com>,
        Ben Widawsky <bwidawsk@...nel.org>,
        <linux-kernel@...r.kernel.org>, <linux-cxl@...r.kernel.org>,
        <linux-pci@...r.kernel.org>
Subject: Re: [PATCH V13 3/9] PCI: Create PCIe library functions in support of
 DOE mailboxes.

On Wed, Jul 13, 2022 at 08:16:53PM -0700, Dan Williams wrote:
> ira.weiny@ wrote:
> > From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
> > 

[snip]

> > +
> > +static int pci_doe_abort(struct pci_doe_mb *doe_mb)
> > +{
> > +	struct pci_dev *pdev = doe_mb->pdev;
> > +	int offset = doe_mb->cap_offset;
> > +	unsigned long timeout_jiffies;
> > +
> > +	pci_dbg(pdev, "[%x] Issuing Abort\n", offset);
> > +
> > +	timeout_jiffies = jiffies + PCI_DOE_TIMEOUT;
> > +	pci_doe_write_ctrl(doe_mb, PCI_DOE_CTRL_ABORT);
> > +
> > +	do {
> > +		u32 val;
> > +
> > +		if (pci_doe_wait(doe_mb, PCI_DOE_POLL_INTERVAL))
> > +			return -EIO;
> 
> nit, why translate the pci_doe_wait() return value? Not worth respinning
> just for this though.

Ok.

[snip]

> > +static int pci_doe_recv_resp(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
> > +{
> > +	struct pci_dev *pdev = doe_mb->pdev;
> > +	int offset = doe_mb->cap_offset;
> > +	size_t length, payload_length;
> > +	u32 val;
> > +	int i;
> > +
> > +	/* Read the first dword to get the protocol */
> > +	pci_read_config_dword(pdev, offset + PCI_DOE_READ, &val);
> > +	if ((FIELD_GET(PCI_DOE_DATA_OBJECT_HEADER_1_VID, val) != task->prot.vid) ||
> > +	    (FIELD_GET(PCI_DOE_DATA_OBJECT_HEADER_1_TYPE, val) != task->prot.type)) {
> > +		pci_err(pdev,
> > +			"[%x] expected [VID, Protocol] = [%04x, %02x], got [%04x, %02x]\n",
> > +			doe_mb->cap_offset,
> > +			task->prot.vid, task->prot.type,
> > +			FIELD_GET(PCI_DOE_DATA_OBJECT_HEADER_1_VID, val),
> > +			FIELD_GET(PCI_DOE_DATA_OBJECT_HEADER_1_TYPE, val));
> 
> Is this an error that happens when userspace collides DOE accesses with
> the kernel? Or is this a "*should* never happen" error?
> 
> If this has the potential to spam the log it probably deserves to be
> pci_err_ratelimited() (dev_err_ratelimited()).
> 
> ...but just a question not a request to change.

I was originally viewing other actors as BIOS or something like an FM.  But
thinking about it yea userspace could be poking at this.

Of course we still need to close on restricting user space access.

Yes, ratelimited would be best.

[snip]

> > +static void signal_task_abort(struct pci_doe_task *task, int rv)
> > +{
> > +	struct pci_doe_mb *doe_mb = task->doe_mb;
> > +	struct pci_dev *pdev = doe_mb->pdev;
> > +
> > +	if (pci_doe_abort(doe_mb)) {
> > +		/*
> > +		 * If the device can't process an abort; set the mailbox dead
> > +		 *	- no more submissions
> > +		 */
> > +		pci_err(pdev, "[%x] Abort failed marking mailbox dead\n",
> > +			doe_mb->cap_offset);
> 
> Feels like a dev_dbg(), no?

No.  If this happens we are stopping all processing on the mailbox.  So I think
it needs to be an error such that the admin knows what happened.

> 
> > +		set_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags);
> > +	}
> > +	signal_task_complete(task, rv);
> > +}
> > +
> > +static void doe_statemachine_work(struct work_struct *work)
> > +{
> > +	struct pci_doe_task *task = container_of(work, struct pci_doe_task,
> > +						 work);
> > +	struct pci_doe_mb *doe_mb = task->doe_mb;
> > +	struct pci_dev *pdev = doe_mb->pdev;
> > +	int offset = doe_mb->cap_offset;
> > +	unsigned long timeout_jiffies;
> > +	u32 val;
> > +	int rc;
> > +
> > +	if (test_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags)) {
> > +		signal_task_complete(task, -EIO);
> > +		return;
> > +	}
> > +
> > +	/* Send request */
> > +	rc = pci_doe_send_req(doe_mb, task);
> > +
> > +	if (rc) {
> > +		/*
> > +		 * The specification does not provide any guidance on how to
> > +		 * resolve conflicting requests from other entities.
> > +		 * Furthermore, it is likely that busy will not be detected
> > +		 * most of the time.  Flag any detection of status busy with an
> > +		 * error.
> > +		 */
> > +		if (rc == -EBUSY) {
> > +			pci_err(pdev,
> > +				"[%x] busy detected; another entity is sending conflicting requests\n",
> > +				offset);
> 
> This definitely feels like something that needs to be ratelimited.

I think you are right with the potential for user space to be poking at the
registers.

[snip]

> > +
> > +static int pci_doe_cache_protocols(struct pci_doe_mb *doe_mb)
> > +{
> > +	u8 index = 0;
> > +	u8 xa_idx = 0;
> > +
> > +	do {
> > +		int rc;
> > +		u16 vid;
> > +		u8 prot;
> > +
> > +		rc = pci_doe_discovery(doe_mb, &index, &vid, &prot);
> > +		if (rc)
> > +			return rc;
> > +
> > +		pci_dbg(doe_mb->pdev,
> > +			"[%x] Found protocol %d vid: %x prot: %x\n",
> > +			doe_mb->cap_offset, xa_idx, vid, prot);
> > +
> > +		rc = xa_insert(&doe_mb->prots, xa_idx++,
> > +			       pci_doe_xa_prot_entry(vid, prot), GFP_KERNEL);
> > +		if (rc)
> > +			return -ENOMEM;
> 
> Why translate the xa_insert() return value when xa_insert() can also
> report -EBUSY?

By definition there will never be a entry present because xa_idx is being
incremented.  But I think you are correct on principle that it is best to let
the error flow up.  Especially if this call is ever used for other purposes.

[snip]

> > +
> > +/**
> > + * pcim_doe_create_mb() - Create a DOE mailbox object
> > + *
> > + * @pdev: PCI device to create the DOE mailbox for
> > + * @cap_offset: Offset of the DOE mailbox
> > + *
> > + * Create a single mailbox object to manage the mailbox protocol at the
> > + * cap_offset specified.
> > + *
> > + * RETURNS: created mailbox object on success
> > + *	    ERR_PTR(-errno) on failure
> > + */
> > +struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset)
> > +{
> > +	struct pci_doe_mb *doe_mb;
> > +	struct device *dev = &pdev->dev;
> > +	int rc;
> > +
> > +	doe_mb = devm_kzalloc(dev, sizeof(*doe_mb), GFP_KERNEL);
> > +	if (!doe_mb)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	doe_mb->pdev = pdev;
> > +	doe_mb->cap_offset = cap_offset;
> > +	init_waitqueue_head(&doe_mb->wq);
> > +
> > +	xa_init(&doe_mb->prots);
> > +	rc = devm_add_action(dev, pci_doe_xa_destroy, doe_mb);
> > +	if (rc)
> > +		return ERR_PTR(rc);
> > +
> > +	doe_mb->work_queue = alloc_ordered_workqueue("DOE: [%x]", 0,
> > +						     doe_mb->cap_offset);
> 
> I don't see much in the way of end user useful information in that name.
> Not that the thread names matter all that much, but for debug it might
> be nice to be able to tie queue threads back to the corresponding
> device. So maybe put the PCI device name and dev_driver_string() in the
> name?

Good point.  I got too used to the pci_* calls including that.

[snip]

> > +
> > +/**
> > + * pci_doe_submit_task() - Submit a task to be processed by the state machine
> > + *
> > + * @doe_mb: DOE mailbox capability to submit to
> > + * @task: task to be queued
> > + *
> > + * Submit a DOE task (request/response) to the DOE mailbox to be processed.
> > + * Returns upon queueing the task object.  If the queue is full this function
> > + * will sleep until there is room in the queue.
> > + *
> > + * task->complete will be called when the state machine is done processing this
> > + * task.
> > + *
> > + * Excess data will be discarded.
> > + *
> > + * RETURNS: 0 when task has been successful queued, -ERRNO on error
> > + */
> > +int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task)
> > +{
> > +	if (!pci_doe_supports_prot(doe_mb, task->prot.vid, task->prot.type))
> > +		return -EINVAL;
> > +
> > +	/*
> > +	 * DOE requests must be a whole number of DW
> > +	 * and the response needs to be big enough for at least 1 DW
> > +	 */
> > +	if (task->request_pl_sz % sizeof(u32) ||
> > +	    task->response_pl_sz < sizeof(u32))
> > +		return -EINVAL;
> > +
> > +	if (test_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags))
> > +		return -EIO;
> > +
> > +	task->doe_mb = doe_mb;
> > +	INIT_WORK(&task->work, doe_statemachine_work);
> 
> Would be nice to move all init to the callers and, similar to
> submit_bio(), rely on the task already knowing all its submit
> parameters.

That could work.  But I think that is awkward.  doe_statemachine_work() is not
exported and is an internal implementation detail of the way the MB processes
the task.  The doe_mb parameter could be set up.  But again seems safer to deal
with it here.

I thought about wrapping struct pci_doe_task with an internal structure but it
seemed overly complex.

> Can be a follow-on cleanup.

Maybe I'll wrap struct pci_doe_task with an internal structure as a follow on.

[snip]

> > 
> 
> Hey, this looks pretty good, thanks for all the hard work on this.

I'll spin soon thanks,
Ira

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ