[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e2d17324-39b8-4db5-85e7-bd66e67fcd52@kernel.org>
Date: Thu, 16 Jan 2025 13:47:59 +0200
From: Roger Quadros <rogerq@...nel.org>
To: Siddharth Vadapalli <s-vadapalli@...com>
Cc: Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Grygorii Strashko <grygorii.strashko@...com>, srk@...com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in
am65_cpsw_nuss_remove_tx_chns()
On 16/01/2025 07:15, Siddharth Vadapalli wrote:
> On Wed, Jan 15, 2025 at 06:38:57PM +0200, Roger Quadros wrote:
>> Siddharth,
>>
>> On 15/01/2025 17:49, Roger Quadros wrote:
>>> Hi Siddharth,
>>>
>>> On 15/01/2025 12:38, Siddharth Vadapalli wrote:
>>>> On Wed, Jan 15, 2025 at 12:04:17PM +0200, Roger Quadros wrote:
>>>>> Hi Siddharth,
>>>>>
>>>>> On 15/01/2025 07:18, Siddharth Vadapalli wrote:
>>>>>> On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote:
>>>>>>
>>>>>> Hello Roger,
>>>>>>
>>>>>>> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns
>>>>>>
>>>>>> You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to
>>>>>> assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows:
>>>>>
>>>>> Yes I meant tx instead of rx.
>>>>>
>>>>>>
>>>>>> tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn);
>>>>>>
>>>>>> Additionally, following the above section we have:
>>>>>>
>>>>>> if (tx_chn->irq < 0) {
>>>>>> dev_err(dev, "Failed to get tx dma irq %d\n",
>>>>>> tx_chn->irq);
>>>>>> ret = tx_chn->irq;
>>>>>> goto err;
>>>>>> }
>>>>>>
>>>>>> Could you please provide details on the code-path which will lead to a
>>>>>> negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"?
>>>>>>
>>>>>> There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely:
>>>>>> 1. am65_cpsw_nuss_update_tx_rx_chns()
>>>>>> 2. am65_cpsw_nuss_suspend()
>>>>>> Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only
>>>>>> in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it
>>>>>> appears to me that "tx_chn->irq" will never be negative within
>>>>>> am65_cpsw_nuss_remove_tx_chns()
>>>>>>
>>>>>> Please let me know if I have overlooked something.
>>>>>
>>>>> The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called
>>>>> repeatedly (by user changing number of TX queues) even if previous call
>>>>> to am65_cpsw_nuss_init_tx_chns() failed.
>>>>
>>>> Thank you for clarifying. So the issue/bug was discovered since the
>>>> implementation of am65_cpsw_nuss_update_tx_rx_chns(). The "Fixes" tag
>>>> misled me. Maybe the "Fixes" tag should be updated? Though we should
>>>> code to future-proof it as done in this patch, the "Fixes" tag pointing
>>>> to the very first commit of the driver might not be accurate as the
>>>> code-path associated with the bug cannot be exercised at that commit.
>>>
>>> Fair enough. I'll change the Fixes commit.
>>
>> Now that I check the code again, am65_cpsw_nuss_remove_tx_chns(),
>> am65_cpsw_nuss_update_tx_chns() and am65_cpsw_nuss_init_tx_chns()
>> were all introduced in the Fixes commit I stated.
>>
>> Could you please share why you thought it is not accurate?
>
> Though the functions were introduced in the Fixes commit that you have
> mentioned in the commit message, the check for "tx_chn->irq" being
> strictly positive as implemented in this patch, is not required until
> the commit which added am65_cpsw_nuss_update_tx_rx_chns(). The reason
> I say so is that a negative value for "tx_chn->irq" within
> am65_cpsw_nuss_remove_tx_chns() requires am65_cpsw_nuss_init_tx_chns()
> to partially fail *followed by* invocation of
> am65_cpsw_nuss_remove_tx_chns(). That isn't possible in the blamed
> commit which introduced them, since the driver probe fails when
> am65_cpsw_nuss_init_tx_chns() fails. The code path:
>
> am65_cpsw_nuss_init_tx_chns() => Partially fails / Fails
> am65_cpsw_nuss_remove_tx_chns() => Invoked later on
>
> isn't possible in the blamed commit.
But, am65_cpsw_nuss_update_tx_chns() and am65_cpsw_set_channels() was
introduced in the blamed commit and the test case I shared to
test .set_channels with different channel counts can still
fail with warning if am65_cpsw_nuss_init_tx_chns() partially fails.
>
> Regards,
> Siddharth.
--
cheers,
-roger
Powered by blists - more mailing lists