[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <yhxlrqqt4cuxp2hkv7nm7h5i25jjaxjhmuzhlvpfwb24jga7o2@f47d4wqe75tu>
Date: Thu, 16 Jan 2025 17:40:07 +0530
From: Siddharth Vadapalli <s-vadapalli@...com>
To: Roger Quadros <rogerq@...nel.org>
CC: Siddharth Vadapalli <s-vadapalli@...com>,
Andrew Lunn
<andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
"Eric
Dumazet" <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
<pabeni@...hat.com>,
Grygorii Strashko <grygorii.strashko@...com>, <srk@...com>,
<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in
am65_cpsw_nuss_remove_tx_chns()
On Thu, Jan 16, 2025 at 01:47:59PM +0200, Roger Quadros wrote:
>
>
> On 16/01/2025 07:15, Siddharth Vadapalli wrote:
> > On Wed, Jan 15, 2025 at 06:38:57PM +0200, Roger Quadros wrote:
> >> Siddharth,
> >>
> >> On 15/01/2025 17:49, Roger Quadros wrote:
> >>> Hi Siddharth,
> >>>
> >>> On 15/01/2025 12:38, Siddharth Vadapalli wrote:
> >>>> On Wed, Jan 15, 2025 at 12:04:17PM +0200, Roger Quadros wrote:
> >>>>> Hi Siddharth,
> >>>>>
> >>>>> On 15/01/2025 07:18, Siddharth Vadapalli wrote:
> >>>>>> On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote:
> >>>>>>
> >>>>>> Hello Roger,
> >>>>>>
> >>>>>>> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns
> >>>>>>
> >>>>>> You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to
> >>>>>> assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows:
> >>>>>
> >>>>> Yes I meant tx instead of rx.
> >>>>>
> >>>>>>
> >>>>>> tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn);
> >>>>>>
> >>>>>> Additionally, following the above section we have:
> >>>>>>
> >>>>>> if (tx_chn->irq < 0) {
> >>>>>> dev_err(dev, "Failed to get tx dma irq %d\n",
> >>>>>> tx_chn->irq);
> >>>>>> ret = tx_chn->irq;
> >>>>>> goto err;
> >>>>>> }
> >>>>>>
> >>>>>> Could you please provide details on the code-path which will lead to a
> >>>>>> negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"?
> >>>>>>
> >>>>>> There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely:
> >>>>>> 1. am65_cpsw_nuss_update_tx_rx_chns()
> >>>>>> 2. am65_cpsw_nuss_suspend()
> >>>>>> Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only
> >>>>>> in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it
> >>>>>> appears to me that "tx_chn->irq" will never be negative within
> >>>>>> am65_cpsw_nuss_remove_tx_chns()
> >>>>>>
> >>>>>> Please let me know if I have overlooked something.
> >>>>>
> >>>>> The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called
> >>>>> repeatedly (by user changing number of TX queues) even if previous call
> >>>>> to am65_cpsw_nuss_init_tx_chns() failed.
> >>>>
> >>>> Thank you for clarifying. So the issue/bug was discovered since the
> >>>> implementation of am65_cpsw_nuss_update_tx_rx_chns(). The "Fixes" tag
> >>>> misled me. Maybe the "Fixes" tag should be updated? Though we should
> >>>> code to future-proof it as done in this patch, the "Fixes" tag pointing
> >>>> to the very first commit of the driver might not be accurate as the
> >>>> code-path associated with the bug cannot be exercised at that commit.
> >>>
> >>> Fair enough. I'll change the Fixes commit.
> >>
> >> Now that I check the code again, am65_cpsw_nuss_remove_tx_chns(),
> >> am65_cpsw_nuss_update_tx_chns() and am65_cpsw_nuss_init_tx_chns()
> >> were all introduced in the Fixes commit I stated.
> >>
> >> Could you please share why you thought it is not accurate?
> >
> > Though the functions were introduced in the Fixes commit that you have
> > mentioned in the commit message, the check for "tx_chn->irq" being
> > strictly positive as implemented in this patch, is not required until
> > the commit which added am65_cpsw_nuss_update_tx_rx_chns(). The reason
> > I say so is that a negative value for "tx_chn->irq" within
> > am65_cpsw_nuss_remove_tx_chns() requires am65_cpsw_nuss_init_tx_chns()
> > to partially fail *followed by* invocation of
> > am65_cpsw_nuss_remove_tx_chns(). That isn't possible in the blamed
> > commit which introduced them, since the driver probe fails when
> > am65_cpsw_nuss_init_tx_chns() fails. The code path:
> >
> > am65_cpsw_nuss_init_tx_chns() => Partially fails / Fails
> > am65_cpsw_nuss_remove_tx_chns() => Invoked later on
> >
> > isn't possible in the blamed commit.
>
> But, am65_cpsw_nuss_update_tx_chns() and am65_cpsw_set_channels() was
> introduced in the blamed commit and the test case I shared to
> test .set_channels with different channel counts can still
> fail with warning if am65_cpsw_nuss_init_tx_chns() partially fails.
I was looking for "am65_cpsw_nuss_update_tx_rx_chns()" in the blamed
commit. I realize now that it was renamed from
am65_cpsw_nuss_update_tx_chns() which indeed is present in the blamed
commit. I apologize for the confusion caused.
Regards,
Siddharth.
Powered by blists - more mailing lists