netdev - Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <m4rhkzcr7dlylxr54udyt6lal5s2q4krrvmyay6gzgzhcu4q2c@r34snfumzqxy>
Date: Wed, 15 Jan 2025 16:08:14 +0530
From: Siddharth Vadapalli <s-vadapalli@...com>
To: Roger Quadros <rogerq@...nel.org>
CC: Siddharth Vadapalli <s-vadapalli@...com>,
        Andrew Lunn
	<andrew+netdev@...n.ch>,
        "David S. Miller" <davem@...emloft.net>,
        "Eric
 Dumazet" <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
	<pabeni@...hat.com>,
        Grygorii Strashko <grygorii.strashko@...com>, <srk@...com>,
        <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in
 am65_cpsw_nuss_remove_tx_chns()

On Wed, Jan 15, 2025 at 12:04:17PM +0200, Roger Quadros wrote:
> Hi Siddharth,
> 
> On 15/01/2025 07:18, Siddharth Vadapalli wrote:
> > On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote:
> > 
> > Hello Roger,
> > 
> >> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns
> > 
> > You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to
> > assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows:
> 
> Yes I meant tx instead of rx.
> 
> > 
> > 		tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn);
> > 
> > Additionally, following the above section we have:
> > 
> > 		if (tx_chn->irq < 0) {
> > 			dev_err(dev, "Failed to get tx dma irq %d\n",
> > 				tx_chn->irq);
> > 			ret = tx_chn->irq;
> > 			goto err;
> > 		}
> > 
> > Could you please provide details on the code-path which will lead to a
> > negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"?
> > 
> > There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely:
> > 1. am65_cpsw_nuss_update_tx_rx_chns()
> > 2. am65_cpsw_nuss_suspend()
> > Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only
> > in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it
> > appears to me that "tx_chn->irq" will never be negative within
> > am65_cpsw_nuss_remove_tx_chns()
> > 
> > Please let me know if I have overlooked something.
> 
> The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called
> repeatedly (by user changing number of TX queues) even if previous call
> to am65_cpsw_nuss_init_tx_chns() failed.

Thank you for clarifying. So the issue/bug was discovered since the
implementation of am65_cpsw_nuss_update_tx_rx_chns(). The "Fixes" tag
misled me. Maybe the "Fixes" tag should be updated? Though we should
code to future-proof it as done in this patch, the "Fixes" tag pointing
to the very first commit of the driver might not be accurate as the
code-path associated with the bug cannot be exercised at that commit.

Independent of the above change suggested for the "Fixes" tag,

Reviewed-by: Siddharth Vadapalli <s-vadapalli@...com>

There seems to be a different bug in am65_cpsw_nuss_update_tx_rx_chns()
which I have described below.

> 
> Please try the below patch to simulate the error condition.
> 
> Then do the following
> - bring down all network interfaces
> - change num TX queues to 2. IRQ for 2nd channel fails.
> - change num TX queues to 3. Now we try to free an invalid IRQ and we see warning.
> 
> Also I think it is good practice to code for set value than to code
> for existing code paths as they can change in the future.
> 
> --test patch starts--
> 
> diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
> index 36c29d3db329..c22cadaaf3d3 100644
> --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
> +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
> @@ -155,7 +155,7 @@
>  			 NETIF_MSG_IFUP	| NETIF_MSG_PROBE | NETIF_MSG_IFDOWN | \
>  			 NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
>  
> -#define AM65_CPSW_DEFAULT_TX_CHNS	8
> +#define AM65_CPSW_DEFAULT_TX_CHNS	1
>  #define AM65_CPSW_DEFAULT_RX_CHN_FLOWS	1
>  
>  /* CPPI streaming packet interface */
> @@ -2346,7 +2348,10 @@ static int am65_cpsw_nuss_init_tx_chns(struct am65_cpsw_common *common)
>  		tx_chn->dsize_log2 = __fls(hdesc_size_out);
>  		WARN_ON(hdesc_size_out != (1 << tx_chn->dsize_log2));
>  
> -		tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn);
> +		if (i == 1)
> +			tx_chn->irq = -ENODEV;
> +		else
> +			tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn);

The pair - am65_cpsw_nuss_init_tx_chns()/am65_cpsw_nuss_remove_tx_chns()
seem to be written under the assumption that failure will result in the
driver's probe failing.

With am65_cpsw_nuss_update_tx_rx_chns(), that assumption no longer holds
true. Please consider the following sequence:

1.
am65_cpsw_nuss_probe()
  am65_cpsw_nuss_register_ndevs()
    am65_cpsw_nuss_init_tx_chns() => Succeeds

2.
Probe is successful

3.
am65_cpsw_nuss_update_tx_rx_chns() => Invoked by user
  am65_cpsw_nuss_remove_tx_chns() => Succeeds
    am65_cpsw_nuss_init_tx_chns() => Partially fails
      devm_add_action(dev, am65_cpsw_nuss_free_tx_chns, common);
      ^ DEVM Action is added, but since the driver isn't removed,
      the cleanup via am65_cpsw_nuss_free_tx_chns() will not run.

Only when the user re-invokes am65_cpsw_nuss_update_tx_rx_chns(),
the cleanup will be performed. This might have to be fixed in the
following manner:

@@ -3416,10 +3416,17 @@ int am65_cpsw_nuss_update_tx_rx_chns(struct am65_cpsw_common *common,
        common->tx_ch_num = num_tx;
        common->rx_ch_num_flows = num_rx;
        ret = am65_cpsw_nuss_init_tx_chns(common);
-       if (ret)
+       if (ret) {
+               devm_remove_action(dev, am65_cpsw_nuss_free_tx_chns, common);
+               am65_cpsw_nuss_free_tx_chns(common);
                return ret;
+       }

        ret = am65_cpsw_nuss_init_rx_chns(common);
+       if (ret) {
+               devm_remove_action(dev, am65_cpsw_nuss_free_rx_chns, common);
+               am65_cpsw_nuss_free_rx_chns(common);
+       }

        return ret;
 }

 Please let me know what you think.


Regards,
Siddharth.