netdev - Re: [PATCH net-next v2 2/3] netconsole: pr_err() when netpoll

Message-ID: <20240821155404.5fc89ff6@kernel.org>
Date: Wed, 21 Aug 2024 15:54:04 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Breno Leitao <leitao@...ian.org>
Cc: davem@...emloft.net, edumazet@...gle.com, pabeni@...hat.com,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v2 2/3] netconsole: pr_err() when netpoll_setup
 fails

On Wed, 21 Aug 2024 01:41:55 -0700 Breno Leitao wrote:
> On Tue, Aug 20, 2024 at 04:24:09PM -0700, Jakub Kicinski wrote:
> > On Mon, 19 Aug 2024 03:36:12 -0700 Breno Leitao wrote:  
> > > netpoll_setup() can fail in several ways, some of which print an error
> > > message, while others simply return without any message. For example,
> > > __netpoll_setup() returns in a few places without printing anything.
> > > 
> > > To address this issue, modify the code to print an error message on
> > > netconsole if the target is not enabled. This will help us identify and
> > > troubleshoot netconsole issues related to netpoll setup failures
> > > more easily.  
> > 
> > Only if memory allocation fails, it seems, and memory allocation
> > failures with GFP_KERNEL will be quite noisy.  
> 
> Or anything that fails in ->ndo_netpoll_setup() and doesn't print
> anything else.

Which also only fails because of memory allocation AFAICT.

> Do you think this is useless?

I think it's better to put more precise messages at the failure sites.
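Here is a userspace sketch of what "more precise messages at the failure sites" could look like. It rests on assumptions: `fake_dev` and `fake_setup()` are hypothetical names, and their checks are made up for illustration, not taken from netpoll. Each non-allocation failure reports its exact cause where it is detected. The allocation failure stays silent because, as noted in the exchange above, GFP_KERNEL allocation failures are already noisy.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical device descriptor; the fields are illustrative only. */
struct fake_dev {
	const char *name;
	bool up;
	bool supports_poll;
};

/*
 * Hypothetical setup routine. Each non-allocation failure prints a message
 * naming the exact cause at the point where it is detected, so the caller
 * does not need to add a generic "setup failed" line on top.
 */
static int fake_setup(const struct fake_dev *dev, void **state_out)
{
	void *state;

	assert(dev != NULL && state_out != NULL);

	if (!dev->up) {
		fprintf(stderr, "%s: device is down, aborting\n", dev->name);
		return -ENETDOWN;
	}
	if (!dev->supports_poll) {
		fprintf(stderr, "%s: device does not support polling, aborting\n",
			dev->name);
		return -EOPNOTSUPP;
	}

	/*
	 * Deliberately silent: the thread points out that GFP_KERNEL
	 * allocation failures are already noisy in the kernel, so an extra
	 * message here would only duplicate that report.
	 */
	state = malloc(64);
	if (!state)
		return -ENOMEM;

	*state_out = state;
	return 0;
}
```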

> > BTW I looked thru 4 random implementations of ndo_netpoll_setup
> > and they look almost identical :S Perhaps they can be refactored?  
> 
> Correct. This should be refactored.
>
> In fact, since you opened this topic, there are a few other things that
> come to my mind:
> 
> 1) Possibly reduce refill_skbs() work in the critical path (UDP send
> path) by moving it to a workqueue?
>
> When sending a message, netpoll tries to fill the whole skb pool, and then
> tries to allocate a new skb before sending the packet.
> 
> When netconsole needs to write a message, it calls netpoll_send_udp():
> 
> 	send_ext_msg_udp() {
> 		netpoll_send_udp() {
> 			refill_skbs() {
> 				while (skb_pool.qlen < MAX_SKBS) {
> 					skb = alloc_skb(MAX_SKB_SIZE, GFP_ATOMIC);
> 				}
> 			}
> 			skb = alloc_skb(len, GFP_ATOMIC);
> 			if (!skb)
> 				skb = skb_dequeue(&skb_pool);
> 		}
> 	}
>
> Would it be better if the hot path just took one of the skbs from the
> pool and the pool were refilled in a workqueue? If skb_pool is empty,
> then fall back to alloc_skb(len, GFP_ATOMIC)?

Yeah, that seems a bit odd. If you can't find anything in the history
that would explain this design - refactoring SG.
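Breno's proposal can be modeled in a few lines of single-threaded userspace C. This is a sketch, not netpoll code. `buf_pool`, `pool_get()`, `pool_refill_work()`, `pool_drain()`, `POOL_MAX` and `BUF_SIZE` are hypothetical stand-ins for skb_pool, MAX_SKBS, MAX_SKB_SIZE and the refill. The workqueue is modeled as an explicit call, and the sketch omits the locking that a kernel version would need around skb_pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define POOL_MAX 4     /* hypothetical stand-in for MAX_SKBS */
#define BUF_SIZE 1500  /* hypothetical stand-in for MAX_SKB_SIZE */

/* Preallocated buffers; plays the role of skb_pool in this model. */
struct buf_pool {
	void *bufs[POOL_MAX];
	int qlen;
	bool refill_pending; /* a deferred refill has been requested */
};

/*
 * Deferred work: top the pool back up to POOL_MAX. In the proposal this
 * would run from a workqueue; in this model the caller invokes it directly.
 */
static void pool_refill_work(struct buf_pool *p)
{
	while (p->qlen < POOL_MAX) {
		void *b = malloc(BUF_SIZE);

		if (!b)
			break;
		p->bufs[p->qlen++] = b;
	}
	/* Leave the request pending if allocation failed part way. */
	p->refill_pending = p->qlen < POOL_MAX;
}

/*
 * Hot path: hand out a preallocated buffer when one fits, and allocate on
 * the spot only when the pool is empty or the request is too large (in the
 * kernel proposal that fallback would be alloc_skb(len, GFP_ATOMIC)).
 * The refill is only requested here, never performed.
 */
static void *pool_get(struct buf_pool *p, size_t len)
{
	void *b;

	assert(p->qlen >= 0 && p->qlen <= POOL_MAX);

	if (len <= BUF_SIZE && p->qlen > 0)
		b = p->bufs[--p->qlen];
	else
		b = malloc(len);

	p->refill_pending = p->qlen < POOL_MAX;
	return b;
}

/* Release every buffer still held by the pool. */
static void pool_drain(struct buf_pool *p)
{
	while (p->qlen > 0)
		free(p->bufs[--p->qlen]);
}
```

The trade-off is that a burst of messages can drain the pool before the deferred refill runs, and at that point the send path pays for an atomic allocation anyway. In exchange, the hot path no longer calls refill_skbs() on every send.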

> 2) Report statistics back from netpoll_send_udp(). The return values of
> netpoll_send_skb() are discarded, so it is hard to know whether the
> packet was transmitted (NETDEV_TX_OK) or ended up as NET_XMIT_DROP or
> NETDEV_TX_BUSY.
> 
> It is also unclear where this should be reported. Maybe a configfs entry?

Also sounds good. We don't use configfs much in networking so IDK if
it's okay to use it for stats. But no other obviously better place
comes to mind for me.
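For item 2, here is a minimal userspace sketch of recording transmit results instead of discarding them. All names are hypothetical. SEND_OK, SEND_DROP and SEND_BUSY stand in for NETDEV_TX_OK, NET_XMIT_DROP and NETDEV_TX_BUSY but do not use their real values. `format_stats()` only suggests what a read-only stats attribute, in configfs or elsewhere, might print.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical outcome codes standing in for NETDEV_TX_OK, NET_XMIT_DROP
 * and NETDEV_TX_BUSY. They are NOT the kernel's numeric values.
 */
enum send_status { SEND_OK, SEND_DROP, SEND_BUSY, SEND_NSTATUS };

/* Per-target counters that a stats file could expose. */
struct send_stats {
	unsigned long count[SEND_NSTATUS];
};

/* Transmit hook; in netpoll this role is played by netpoll_send_skb(). */
typedef enum send_status (*xmit_fn)(const char *msg);

/* Toy transmit stubs used to exercise the accounting. */
static enum send_status xmit_always_ok(const char *msg)
{
	(void)msg;
	return SEND_OK;
}

static enum send_status xmit_always_busy(const char *msg)
{
	(void)msg;
	return SEND_BUSY;
}

/* Send one message and record its outcome instead of discarding it. */
static enum send_status send_and_account(struct send_stats *st, xmit_fn xmit,
					 const char *msg)
{
	enum send_status rc = xmit(msg);

	assert((unsigned int)rc < (unsigned int)SEND_NSTATUS);
	st->count[rc]++;
	return rc;
}

/* Render the counters the way a read-only stats attribute might. */
static int format_stats(const struct send_stats *st, char *buf, size_t len)
{
	return snprintf(buf, len, "ok %lu\ndrop %lu\nbusy %lu\n",
			st->count[SEND_OK], st->count[SEND_DROP],
			st->count[SEND_BUSY]);
}
```

Keeping one counter per outcome, rather than only the last return code, lets a reader tell an occasional BUSY apart from a steady stream of drops.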