linux-kernel - Re: [PATCH 04/15] net: rnpgbe: Add get_capability mbx


Message-ID: <60002AD49B3F6BD5+20250708020150.GB216877@nic-Precision-5820-Tower>
Date: Tue, 8 Jul 2025 10:01:50 +0800
From: Yibo Dong <dong100@...se.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
	pabeni@...hat.com, horms@...nel.org, corbet@....net,
	andrew+netdev@...n.ch, gur.stavi@...wei.com, maddy@...ux.ibm.com,
	mpe@...erman.id.au, danishanwar@...com, lee@...ger.us,
	gongfan1@...wei.com, lorenzo@...nel.org, geert+renesas@...der.be,
	Parthiban.Veerasooran@...rochip.com, lukas.bulwahn@...hat.com,
	alexanderduyck@...com, netdev@...r.kernel.org,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 04/15] net: rnpgbe: Add get_capability mbx_fw ops support

On Mon, Jul 07, 2025 at 02:09:23PM +0200, Andrew Lunn wrote:
> On Mon, Jul 07, 2025 at 03:37:43PM +0800, Yibo Dong wrote:
> > On Fri, Jul 04, 2025 at 08:25:12PM +0200, Andrew Lunn wrote:
> > > > +/**
> > > > + * mucse_fw_send_cmd_wait - Send cmd req and wait for response
> > > > + * @hw: Pointer to the HW structure
> > > > + * @req: Pointer to the cmd req structure
> > > > + * @reply: Pointer to the fw reply structure
> > > > + *
> > > > + * mucse_fw_send_cmd_wait sends req to pf-cm3 mailbox and wait
> > > > + * reply from fw.
> > > > + *
> > > > + * Returns 0 on success, negative on failure
> > > > + **/
> > > > +static int mucse_fw_send_cmd_wait(struct mucse_hw *hw,
> > > > +				  struct mbx_fw_cmd_req *req,
> > > > +				  struct mbx_fw_cmd_reply *reply)
> > > > +{
> > > > +	int err;
> > > > +	int retry_cnt = 3;
> > > > +
> > > > +	if (!hw || !req || !reply || !hw->mbx.ops.read_posted)
> > > 
> > > Can this happen?
> > > 
> > > If this is not supposed to happen, it is better that the driver oopses,
> > > so you get a stack trace and find where the driver is broken.
> > > 
> > Yes, it is not supposed to happen. So, you mean I should remove this
> > check in order to get an oops when this condition happens?
> 
> You should remove all defensive code. Let it explode with an Oops, so
> you can find your bugs.
> 

Got it.

> > > > +		return -EINVAL;
> > > > +
> > > > +	/* if pcie off, nothing todo */
> > > > +	if (pci_channel_offline(hw->pdev))
> > > > +		return -EIO;
> > > 
> > > What can cause it to go offline? Is this to do with PCIe hotplug?
> > > 
> > Yes, I try to detect a PCIe hotplug condition with 'pci_channel_offline'.
> > If that happens, the driver should never do a BAR read or BAR write, so
> > return here.
> 
> I don't know PCI hotplug too well, but i assume the driver core will
> call the .release function. Can this function be called as part of
> release? What actually happens on the PCI bus when you try to access a
> device which no longer exists?
> 

This function may be called as part of release:
->release
-->unregister_netdev
--->ndo_stop
---->this function
Based on what I have come across, some devices return 0xffffffff, while
others may hang when the driver tries to access a device which no longer
exists.

> How have you tested this? Do you have the ability to do a hot{un}plug?
> 

I tested hot{un}plug with an OCP card before.
But I think all the code related to PCIe hot{un}plug should be in a
separate patch, so I will move it to that patch.

> > > > +	if (mutex_lock_interruptible(&hw->mbx.lock))
> > > > +		return -EAGAIN;
> > > 
> > > mutex_lock_interruptible() returns -EINTR, which is what you should
> > > return, not -EAGAIN.
> > > 
> > Got it, I should return '-EINTR' here.
> 
> No, you should return whatever mutex_lock_interruptible()
> returns. Whenever you call a function which returns an error code, you
> should pass that error code up the call stack. Never replace one error
> code with another.
> 

Ok, I see.
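
To make sure I understand, here is a minimal userspace sketch of passing
the lock's return value up the call stack. The fake_mutex type, the
fake_* helpers and send_cmd_wait_sketch() are hypothetical stand-ins for
illustration only, not the real kernel API or the driver function:

```c
#include <errno.h>

/* Hypothetical stand-in for a kernel mutex, for illustration only. */
struct fake_mutex {
	int interrupted;	/* simulate a signal arriving while waiting */
	int locked;
};

/* Mimics the contract of mutex_lock_interruptible(): 0 or -EINTR. */
static int fake_mutex_lock_interruptible(struct fake_mutex *m)
{
	if (m->interrupted)
		return -EINTR;
	m->locked = 1;
	return 0;
}

static void fake_mutex_unlock(struct fake_mutex *m)
{
	m->locked = 0;
}

/* Hypothetical simplified version of the send/wait path. */
static int send_cmd_wait_sketch(struct fake_mutex *lock)
{
	int err;

	err = fake_mutex_lock_interruptible(lock);
	if (err)
		return err;	/* pass the callee's code up unchanged */

	/* ... mailbox write and read would happen here ... */

	fake_mutex_unlock(lock);
	return 0;
}
```

The caller then sees whatever the lock function reported, instead of a
code such as -EAGAIN chosen by the wrapper.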

> > > > +	if (reply->error_code)
> > > > +		return -reply->error_code;
> > > 
> > > The mbox is using linux error codes? 
> > > 
> > It is used only between the driver and fw, and may be as simple as this:
> > 0     -- no error
> > not 0 -- error
> > So, it is not using linux error codes.
> 
> Your functions should always use linux/POSIX error codes. So if your
> firmware says an error has happened, turn it into a linux/POSIX error
> code. EINVAL, ETIMEDOUT, EIO, whatever makes the most sense.
> 
> 	Andrew
> 

Got it, I will turn it into a linux/POSIX error code.
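
A rough sketch of what I have in mind, assuming the firmware only
distinguishes zero from non-zero as described above. The helper name
fw_err_to_errno() is hypothetical, and -EIO is just my first guess at
the most sensible code:

```c
#include <errno.h>

/*
 * Hypothetical helper: translate the firmware's private status into a
 * negative linux/POSIX error code. The firmware is assumed to report
 * only "0 = no error, non-zero = error", so every failure maps to -EIO.
 */
static int fw_err_to_errno(int fw_err)
{
	return fw_err ? -EIO : 0;
}
```

If the firmware later reports more specific failure reasons, the helper
could grow a switch that maps each one to the closest errno value.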

Thanks for your feedback.