[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <0004a78f-6802-4c47-9f1f-414cdecb2b2e@kernel.org>
Date: Thu, 16 Jan 2025 15:00:47 +0200
From: Roger Quadros <rogerq@...nel.org>
To: Siddharth Vadapalli <s-vadapalli@...com>
Cc: Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Grygorii Strashko <grygorii.strashko@...com>, srk@...com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in
am65_cpsw_nuss_remove_tx_chns()
On 16/01/2025 14:10, Siddharth Vadapalli wrote:
> On Thu, Jan 16, 2025 at 01:47:59PM +0200, Roger Quadros wrote:
>>
>>
>> On 16/01/2025 07:15, Siddharth Vadapalli wrote:
>>> On Wed, Jan 15, 2025 at 06:38:57PM +0200, Roger Quadros wrote:
>>>> Siddharth,
>>>>
>>>> On 15/01/2025 17:49, Roger Quadros wrote:
>>>>> Hi Siddharth,
>>>>>
>>>>> On 15/01/2025 12:38, Siddharth Vadapalli wrote:
>>>>>> On Wed, Jan 15, 2025 at 12:04:17PM +0200, Roger Quadros wrote:
>>>>>>> Hi Siddharth,
>>>>>>>
>>>>>>> On 15/01/2025 07:18, Siddharth Vadapalli wrote:
>>>>>>>> On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote:
>>>>>>>>
>>>>>>>> Hello Roger,
>>>>>>>>
>>>>>>>>> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns
>>>>>>>>
>>>>>>>> You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to
>>>>>>>> assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows:
>>>>>>>
>>>>>>> Yes I meant tx instead of rx.
>>>>>>>
>>>>>>>>
>>>>>>>> tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn);
>>>>>>>>
>>>>>>>> Additionally, following the above section we have:
>>>>>>>>
>>>>>>>> if (tx_chn->irq < 0) {
>>>>>>>> dev_err(dev, "Failed to get tx dma irq %d\n",
>>>>>>>> tx_chn->irq);
>>>>>>>> ret = tx_chn->irq;
>>>>>>>> goto err;
>>>>>>>> }
>>>>>>>>
>>>>>>>> Could you please provide details on the code-path which will lead to a
>>>>>>>> negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"?
>>>>>>>>
>>>>>>>> There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely:
>>>>>>>> 1. am65_cpsw_nuss_update_tx_rx_chns()
>>>>>>>> 2. am65_cpsw_nuss_suspend()
>>>>>>>> Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only
>>>>>>>> in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it
>>>>>>>> appears to me that "tx_chn->irq" will never be negative within
>>>>>>>> am65_cpsw_nuss_remove_tx_chns()
>>>>>>>>
>>>>>>>> Please let me know if I have overlooked something.
>>>>>>>
>>>>>>> The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called
>>>>>>> repeatedly (by user changing number of TX queues) even if previous call
>>>>>>> to am65_cpsw_nuss_init_tx_chns() failed.
>>>>>>
>>>>>> Thank you for clarifying. So the issue/bug was discovered since the
>>>>>> implementation of am65_cpsw_nuss_update_tx_rx_chns(). The "Fixes" tag
>>>>>> misled me. Maybe the "Fixes" tag should be updated? Though we should
>>>>>> code to future-proof it as done in this patch, the "Fixes" tag pointing
>>>>>> to the very first commit of the driver might not be accurate as the
>>>>>> code-path associated with the bug cannot be exercised at that commit.
>>>>>
>>>>> Fair enough. I'll change the Fixes commit.
>>>>
>>>> Now that I check the code again, am65_cpsw_nuss_remove_tx_chns(),
>>>> am65_cpsw_nuss_update_tx_chns() and am65_cpsw_nuss_init_tx_chns()
>>>> were all introduced in the Fixes commit I stated.
>>>>
>>>> Could you please share why you thought it is not accurate?
>>>
>>> Though the functions were introduced in the Fixes commit that you have
>>> mentioned in the commit message, the check for "tx_chn->irq" being
>>> strictly positive as implemented in this patch, is not required until
>>> the commit which added am65_cpsw_nuss_update_tx_rx_chns(). The reason
>>> I say so is that a negative value for "tx_chn->irq" within
>>> am65_cpsw_nuss_remove_tx_chns() requires am65_cpsw_nuss_init_tx_chns()
>>> to partially fail *followed by* invocation of
>>> am65_cpsw_nuss_remove_tx_chns(). That isn't possible in the blamed
>>> commit which introduced them, since the driver probe fails when
>>> am65_cpsw_nuss_init_tx_chns() fails. The code path:
>>>
>>> am65_cpsw_nuss_init_tx_chns() => Partially fails / Fails
>>> am65_cpsw_nuss_remove_tx_chns() => Invoked later on
>>>
>>> isn't possible in the blamed commit.
>>
>> But, am65_cpsw_nuss_update_tx_chns() and am65_cpsw_set_channels() was
>> introduced in the blamed commit and the test case I shared to
>> test .set_channels with different channel counts can still
>> fail with warning if am65_cpsw_nuss_init_tx_chns() partially fails.
>
> I was looking for "am65_cpsw_nuss_update_tx_rx_chns()" in the blamed
> commit. I realize now that it was renamed from
> am65_cpsw_nuss_update_tx_chns() which indeed is present in the blamed
> commit. I apologize for the confusion caused.
>
No worries at all. I'll respin the patch with the typo fix and add more
description in commit log to clarify this.
--
cheers,
-roger
Powered by blists - more mailing lists