linux-kernel - Re: [PATCH] net: fec: handle page_pool_dev_alloc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <bef66501-f769-4196-924c-1ec9ba2cfc93@lenbrook.com>
Date: Tue, 24 Dec 2024 15:20:16 -0500
From: Kevin Groeneveld <kgroeneveld@...brook.com>
To: Wei Fang <wei.fang@....com>
Cc: Shenwei Wang <shenwei.wang@....com>, Clark Wang <xiaoning.wang@....com>,
 Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
 <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 "imx@...ts.linux.dev" <imx@...ts.linux.dev>,
 "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] net: fec: handle page_pool_dev_alloc_pages error

On 2024-12-24 07:56, Wei Fang wrote:
> Please simplify the log, it's too long.

I will simplify it.

>> @@ -1943,10 +1943,12 @@ static int fec_enet_rx_napi(struct napi_struct
>> *napi, int budget)
>>   	struct fec_enet_private *fep = netdev_priv(ndev);
>>   	int done = 0;
>>
>> +	fep->rx_err_nomem = false;
>> +
>>   	do {
>>   		done += fec_enet_rx(ndev, budget - done);
>>   		fec_enet_tx(ndev, budget);
>> -	} while ((done < budget) && fec_enet_collect_events(fep));
>> +	} while ((done < budget) && !fep->rx_err_nomem &&
>> fec_enet_collect_events(fep));
> 
> Is the condition "!fep->rx_err_nomem" necessary here? If not, then there
> is no need to add this variable to fec_enet_private.

For my test case it often seems to loop forever without making any 
progress unless I add that condition.

> One situation I am concerned about is that when the issue occurs, the Rx
> rings are full. At the same time, because the 'done < budget' condition is
> met, the interrupt mode will be used to receive the packets. However,
> since the Rx rings are full, no Rx interrupt events will be generated. This
> means that the packets on the Rx rings may not be received by the CPU
> for a long time unless Tx interrupt events are generated.

These are the types of things I was worried might exist with my patch.

> Another approach is to discard the packets when the issue occurs, as
> shown below. Note that the following modification has not been verified.
> 
> -static void fec_enet_update_cbd(struct fec_enet_priv_rx_q *rxq,
> +static int fec_enet_update_cbd(struct fec_enet_priv_rx_q *rxq,
>                                  struct bufdesc *bdp, int index)
>   {
>          struct page *new_page;
>          dma_addr_t phys_addr;
> 
>          new_page = page_pool_dev_alloc_pages(rxq->page_pool);
> -       WARN_ON(!new_page);
> +       if (unlikely(!new_page))
> +               return -ENOMEM;
> +
>          rxq->rx_skb_info[index].page = new_page;
> 
>          rxq->rx_skb_info[index].offset = FEC_ENET_XDP_HEADROOM;
>          phys_addr = page_pool_get_dma_addr(new_page) + FEC_ENET_XDP_HEADROOM;
>          bdp->cbd_bufaddr = cpu_to_fec32(phys_addr);
> +
> +       return 0;
>   }
> 
>   static u32
> @@ -1771,7 +1775,10 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
>                                          pkt_len,
>                                          DMA_FROM_DEVICE);
>                  prefetch(page_address(page));
> -               fec_enet_update_cbd(rxq, bdp, index);
> +               if (fec_enet_update_cbd(rxq, bdp, index)) {
> +                       ndev->stats.rx_dropped++;
> +                       goto rx_processing_done;
> +               }
> 
>                  if (xdp_prog) {
>                          xdp_buff_clear_frags_flag(&xdp);

Thanks for the suggestion. I had considered something similar but I was 
not sure it was safe to just jump to rx_processing_done at that point in 
the code. I will try your patch and if it seems to work okay I will 
submit a new version.

I probably will not have time to work on this further until the new year.

Kevin