Message-ID: <60002AD49B3F6BD5+20250708020150.GB216877@nic-Precision-5820-Tower>
Date: Tue, 8 Jul 2025 10:01:50 +0800
From: Yibo Dong <dong100@...se.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, horms@...nel.org, corbet@....net,
andrew+netdev@...n.ch, gur.stavi@...wei.com, maddy@...ux.ibm.com,
mpe@...erman.id.au, danishanwar@...com, lee@...ger.us,
gongfan1@...wei.com, lorenzo@...nel.org, geert+renesas@...der.be,
Parthiban.Veerasooran@...rochip.com, lukas.bulwahn@...hat.com,
alexanderduyck@...com, netdev@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 04/15] net: rnpgbe: Add get_capability mbx_fw ops support
On Mon, Jul 07, 2025 at 02:09:23PM +0200, Andrew Lunn wrote:
> On Mon, Jul 07, 2025 at 03:37:43PM +0800, Yibo Dong wrote:
> > On Fri, Jul 04, 2025 at 08:25:12PM +0200, Andrew Lunn wrote:
> > > > +/**
> > > > + * mucse_fw_send_cmd_wait - Send cmd req and wait for response
> > > > + * @hw: Pointer to the HW structure
> > > > + * @req: Pointer to the cmd req structure
> > > > + * @reply: Pointer to the fw reply structure
> > > > + *
> > > > + * mucse_fw_send_cmd_wait sends req to pf-cm3 mailbox and waits for
> > > > + * the reply from fw.
> > > > + *
> > > > + * Returns 0 on success, negative on failure
> > > > + **/
> > > > +static int mucse_fw_send_cmd_wait(struct mucse_hw *hw,
> > > > + struct mbx_fw_cmd_req *req,
> > > > + struct mbx_fw_cmd_reply *reply)
> > > > +{
> > > > + int err;
> > > > + int retry_cnt = 3;
> > > > +
> > > > + if (!hw || !req || !reply || !hw->mbx.ops.read_posted)
> > >
> > > Can this happen?
> > >
> > > If this is not supposed to happen, it is better that the driver oopses, so
> > > you get a stack trace and can find where the driver is broken.
> > >
> > Yes, it is not supposed to happen. So you mean I should remove this
> > check in order to get an oops when this condition happens?
>
> You should remove all defensive code. Let it explode with an Oops, so
> you can find your bugs.
>
Got it.
> > > > + return -EINVAL;
> > > > +
> > > > + /* if pcie off, nothing to do */
> > > > + if (pci_channel_offline(hw->pdev))
> > > > + return -EIO;
> > >
> > > What can cause it to go offline? Is this to do with PCIe hotplug?
> > >
> > Yes, I try to detect a PCIe hotplug condition with pci_channel_offline().
> > If that happens, the driver should never do BAR reads/writes, so it
> > returns here.
>
> I don't know PCI hotplug too well, but I assume the driver core will
> call the .release function. Can this function be called as part of
> release? What actually happens on the PCI bus when you try to access a
> device which no longer exists?
>
This function may be called as part of release:
->release
-->unregister_netdev
--->ndo_stop
---->this function
Based on what I have come across, when the driver accesses a device which
no longer exists, some devices return 0xffffffff on reads, while others
may hang.
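The all-ones behaviour described above can be modelled in user space as a rough sketch. Everything in it is hypothetical: struct model_dev, model_readl() and model_read_status() stand in for the device, readl() and a driver helper, and MODEL_PCI_ERROR_RESPONSE mirrors the kernel's PCI_ERROR_RESPONSE. In real driver code, recent kernels provide PCI_POSSIBLE_ERROR() in <linux/pci.h> for this kind of check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * User-space model, not kernel code. A plain struct field stands in for
 * a memory-mapped register; a read from a removed device completes as
 * all ones, which is what MODEL_PCI_ERROR_RESPONSE represents.
 */
#define MODEL_PCI_ERROR_RESPONSE 0xffffffffu

/* Hypothetical stand-in for the PCIe device behind the BAR. */
struct model_dev {
	bool present;      /* false once the card has been surprise-removed */
	uint32_t status;   /* value the status register holds while present */
};

/* Model of readl(): a device that is gone answers with all ones. */
static uint32_t model_readl(const struct model_dev *dev)
{
	return dev->present ? dev->status : MODEL_PCI_ERROR_RESPONSE;
}

/*
 * Read the status register and treat an all-ones value as "device gone".
 * *val is only written on success.
 */
static int model_read_status(const struct model_dev *dev, uint32_t *val)
{
	uint32_t v = model_readl(dev);

	if (v == MODEL_PCI_ERROR_RESPONSE)
		return -EIO;
	*val = v;
	return 0;
}
```

Note that 0xffffffff can be a legitimate value for some registers, so this heuristic only makes sense on a register where all ones never occurs in normal operation.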
> How have you tested this? Do you have the ability to do a hot{un}plug?
>
I tested hot{un}plug with an OCP card before.
But I think all the code related to PCIe hot{un}plug should be in a
separate patch, so I will move it to that patch.
> > > > + if (mutex_lock_interruptible(&hw->mbx.lock))
> > > > + return -EAGAIN;
> > >
> > > mutex_lock_interruptible() returns -EINTR, which is what you should
> > > return, not -EAGAIN.
> > >
> > Got it, I should return '-EINTR' here.
>
> No, you should return whatever mutex_lock_interruptible()
> returns. Whenever you call a function which returns an error code, you
> should pass that error code up the call stack. Never replace one error
> code with another.
>
Ok, I see.
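A minimal user-space sketch of the propagation rule above. It assumes a hypothetical model_lock_interruptible() that behaves like mutex_lock_interruptible() (0 when the lock is taken, -EINTR when a signal interrupts the wait); struct model_mutex and model_send_cmd() are likewise illustrative names. In the driver itself the equivalent would be err = mutex_lock_interruptible(&hw->mbx.lock); if (err) return err;

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical lock model; signal_pending simulates an interrupted wait. */
struct model_mutex {
	bool locked;
	bool signal_pending;
};

/* Stand-in for mutex_lock_interruptible(): 0 on success, -EINTR if interrupted. */
static int model_lock_interruptible(struct model_mutex *m)
{
	if (m->signal_pending)
		return -EINTR;
	m->locked = true;
	return 0;
}

static void model_unlock(struct model_mutex *m)
{
	m->locked = false;
}

/* Hypothetical caller: forward the lock error code unchanged. */
static int model_send_cmd(struct model_mutex *m)
{
	int err;

	err = model_lock_interruptible(m);
	if (err)
		return err;	/* do not remap it to -EAGAIN or anything else */

	/* ... write the request and wait for the reply here ... */

	model_unlock(m);
	return 0;
}
```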
> > > > + if (reply->error_code)
> > > > + return -reply->error_code;
> > >
> > > The mbox is using linux error codes?
> > >
> > It is used only between the driver and fw, and it may simply be like this:
> > 0 -- no error
> > not 0 -- error
> > So it does not use Linux error codes.
>
> Your functions should always use Linux/POSIX error codes. So if your
> firmware says an error has happened, turn it into a Linux/POSIX error
> code: EINVAL, ETIMEDOUT, EIO, whatever makes the most sense.
>
> Andrew
>
Got it, I will turn it into a Linux/POSIX error code.
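A sketch of converting the firmware status into a Linux error code at the driver/firmware boundary, so callers only ever see 0 or a negative errno. struct model_fw_reply and model_fw_status_to_errno() are hypothetical names, the width of error_code is assumed, and mapping every non-zero value to -EIO is an assumed choice; a finer mapping (for example to -EINVAL or -ETIMEDOUT) would need the firmware's actual code definitions.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical reply layout: only the error_code field from the patch is
 * modelled. Firmware reports 0 for success and any non-zero value for
 * failure; the value is not a Linux errno.
 */
struct model_fw_reply {
	uint16_t error_code;
};

/* Map the firmware status to 0 or a negative Linux errno. */
static int model_fw_status_to_errno(const struct model_fw_reply *reply)
{
	if (reply->error_code == 0)
		return 0;
	return -EIO;	/* assumed generic mapping for any firmware failure */
}
```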
Thanks for your feedback.