lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <8829d58b-fbfc-4040-93de-51970631d935@kernel.org>
Date: Wed, 15 Jan 2025 18:38:57 +0200
From: Roger Quadros <rogerq@...nel.org>
To: Siddharth Vadapalli <s-vadapalli@...com>
Cc: Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
 <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Grygorii Strashko <grygorii.strashko@...com>, srk@...com,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in
 am65_cpsw_nuss_remove_tx_chns()

Siddharth,

On 15/01/2025 17:49, Roger Quadros wrote:
> Hi Siddharth,
> 
> On 15/01/2025 12:38, Siddharth Vadapalli wrote:
>> On Wed, Jan 15, 2025 at 12:04:17PM +0200, Roger Quadros wrote:
>>> Hi Siddharth,
>>>
>>> On 15/01/2025 07:18, Siddharth Vadapalli wrote:
>>>> On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote:
>>>>
>>>> Hello Roger,
>>>>
>>>>> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns
>>>>
>>>> You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to
>>>> assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows:
>>>
>>> Yes I meant tx instead of rx.
>>>
>>>>
>>>> 		tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn);
>>>>
>>>> Additionally, following the above section we have:
>>>>
>>>> 		if (tx_chn->irq < 0) {
>>>> 			dev_err(dev, "Failed to get tx dma irq %d\n",
>>>> 				tx_chn->irq);
>>>> 			ret = tx_chn->irq;
>>>> 			goto err;
>>>> 		}
>>>>
>>>> Could you please provide details on the code-path which will lead to a
>>>> negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"?
>>>>
>>>> There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely:
>>>> 1. am65_cpsw_nuss_update_tx_rx_chns()
>>>> 2. am65_cpsw_nuss_suspend()
>>>> Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only
>>>> in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it
>>>> appears to me that "tx_chn->irq" will never be negative within
>>>> am65_cpsw_nuss_remove_tx_chns()
>>>>
>>>> Please let me know if I have overlooked something.
>>>
>>> The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called
>>> repeatedly (by user changing number of TX queues) even if previous call
>>> to am65_cpsw_nuss_init_tx_chns() failed.
>>
>> Thank you for clarifying. So the issue/bug was discovered since the
>> implementation of am65_cpsw_nuss_update_tx_rx_chns(). The "Fixes" tag
>> misled me. Maybe the "Fixes" tag should be updated? Though we should
>> code to future-proof it as done in this patch, the "Fixes" tag pointing
>> to the very first commit of the driver might not be accurate as the
>> code-path associated with the bug cannot be exercised at that commit.
> 
> Fair enough. I'll change the Fixes commit.

Now that I check the code again, am65_cpsw_nuss_remove_tx_chns(),
am65_cpsw_nuss_update_tx_chns() and am65_cpsw_nuss_init_tx_chns()
were all introduced in the Fixes commit I stated.

Could you please share why you thought it is not accurate?

> 
>>
>> Independent of the above change suggested for the "Fixes" tag,
>>
>> Reviewed-by: Siddharth Vadapalli <s-vadapalli@...com>
>>
>> There seems to be a different bug in am65_cpsw_nuss_update_tx_rx_chns()
>> which I have described below.
>>
>>>
>>> Please try the below patch to simulate the error condition.
>>>
>>> Then do the following
>>> - bring down all network interfaces
>>> - change num TX queues to 2. IRQ for 2nd channel fails.
>>> - change num TX queues to 3. Now we try to free an invalid IRQ and we see warning.
>>>
>>> Also I think it is good practice to code for set value than to code
>>> for existing code paths as they can change in the future.
>>>
>>> --test patch starts--
>>>
>>> diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
>>> index 36c29d3db329..c22cadaaf3d3 100644
>>> --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
>>> +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
>>> @@ -155,7 +155,7 @@
>>>  			 NETIF_MSG_IFUP	| NETIF_MSG_PROBE | NETIF_MSG_IFDOWN | \
>>>  			 NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
>>>  
>>> -#define AM65_CPSW_DEFAULT_TX_CHNS	8
>>> +#define AM65_CPSW_DEFAULT_TX_CHNS	1
>>>  #define AM65_CPSW_DEFAULT_RX_CHN_FLOWS	1
>>>  
>>>  /* CPPI streaming packet interface */
>>> @@ -2346,7 +2348,10 @@ static int am65_cpsw_nuss_init_tx_chns(struct am65_cpsw_common *common)
>>>  		tx_chn->dsize_log2 = __fls(hdesc_size_out);
>>>  		WARN_ON(hdesc_size_out != (1 << tx_chn->dsize_log2));
>>>  
>>> -		tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn);
>>> +		if (i == 1)
>>> +			tx_chn->irq = -ENODEV;
>>> +		else
>>> +			tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn);
>>
>> The pair - am65_cpsw_nuss_init_tx_chns()/am65_cpsw_nuss_remove_tx_chns()
>> seem to be written under the assumption that failure will result in the
>> driver's probe failing.
>>
>> With am65_cpsw_nuss_update_tx_rx_chns(), that assumption no longer holds
>> true. Please consider the following sequence:
>>
>> 1.
>> am65_cpsw_nuss_probe()
>>   am65_cpsw_nuss_register_ndevs()
>>     am65_cpsw_nuss_init_tx_chns() => Succeeds
>>
>> 2.
>> Probe is successful
>>
>> 3.
>> am65_cpsw_nuss_update_tx_rx_chns() => Invoked by user
>>   am65_cpsw_nuss_remove_tx_chns() => Succeeds
>>     am65_cpsw_nuss_init_tx_chns() => Partially fails
>>       devm_add_action(dev, am65_cpsw_nuss_free_tx_chns, common);
>>       ^ DEVM Action is added, but since the driver isn't removed,
>>       the cleanup via am65_cpsw_nuss_free_tx_chns() will not run.
>>
>> Only when the user re-invokes am65_cpsw_nuss_update_tx_rx_chns(),
>> the cleanup will be performed. This might have to be fixed in the
>> following manner:
>>
>> @@ -3416,10 +3416,17 @@ int am65_cpsw_nuss_update_tx_rx_chns(struct am65_cpsw_common *common,
>>         common->tx_ch_num = num_tx;
>>         common->rx_ch_num_flows = num_rx;
>>         ret = am65_cpsw_nuss_init_tx_chns(common);
>> -       if (ret)
>> +       if (ret) {
>> +               devm_remove_action(dev, am65_cpsw_nuss_free_tx_chns, common);
>> +               am65_cpsw_nuss_free_tx_chns(common);
>>                 return ret;
>> +       }
>>
>>         ret = am65_cpsw_nuss_init_rx_chns(common);
>> +       if (ret) {
>> +               devm_remove_action(dev, am65_cpsw_nuss_free_rx_chns, common);
>> +               am65_cpsw_nuss_free_rx_chns(common);
>> +       }
>>
>>         return ret;
>>  }
>>
>>  Please let me know what you think.
> 
> I've already implemented a cleanup series to get rid of devm_add/remove_action,
> cleanup probe error path and streamline TX and RQ queue init/cleanup.
> I'll send out the series soon as soon as I finish some tests.
> 

-- 
cheers,
-roger


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ