Message-ID: <20241101-prompt-carrot-hare-ff2aaa@leitao>
Date: Fri, 1 Nov 2024 11:18:29 -0700
From: Breno Leitao <leitao@...ian.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: horms@...nel.org, davem@...emloft.net, edumazet@...gle.com,
	pabeni@...hat.com, thepacketgeek@...il.com, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, davej@...emonkey.org.uk,
	vlad.wing@...il.com, max@...sevol.com, kernel-team@...a.com,
	jiri@...nulli.us, jv@...sburgh.net, andy@...yhouse.net,
	aehkn@...hub.one, Rik van Riel <riel@...riel.com>,
	Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [PATCH net-next 1/3] net: netpoll: Defer skb_pool population
 until setup success

On Fri, Nov 01, 2024 at 03:51:59AM -0700, Breno Leitao wrote:
> Hello Jakub,
> 
> On Thu, Oct 31, 2024 at 06:26:47PM -0700, Jakub Kicinski wrote:
> > On Fri, 25 Oct 2024 07:20:18 -0700 Breno Leitao wrote:
> > > The current implementation has a flaw where it populates the skb_pool
> > > with 32 SKBs before calling __netpoll_setup(). If the setup fails, the
> > > skb_pool buffer will persist indefinitely and never be cleaned up.
> > > 
> > > This change moves the skb_pool population to after the successful
> > > completion of __netpoll_setup(), ensuring that the buffers are not
> > > unnecessarily retained. Additionally, this modification alleviates rtnl
> > > lock pressure by allowing the buffer filling to occur outside of the
> > > lock.
> > 
> > arguably if the setup succeeds there would now be a window of time
> > where np is active but pool is empty.
> 
> I am not convinced this is a problem, given that netpoll_setup() is only
> called from netconsole.
> 
> In netconsole, a target does not send packets until the netconsole
> target is, in fact, enabled (nt->enabled = true), and enabling the
> target (nt) only happens after netpoll_setup() returns successfully.
> 
> Example:
> 
> 	static void write_ext_msg(struct console *con, const char *msg,
> 				  unsigned int len)
> 	{
> 		...
> 		list_for_each_entry(nt, &target_list, list)
> 			if (nt->extended && nt->enabled && netif_running(nt->np.dev))
> 				send_ext_msg_udp(nt, msg, len);
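> 
> And the enable path itself (simplified sketch of enabled_store() in
> drivers/net/netconsole.c, details trimmed here) only flips nt->enabled
> after netpoll_setup() has succeeded:
> 
> 	static ssize_t enabled_store(struct config_item *item,
> 				     const char *buf, size_t count)
> 	{
> 		...
> 		ret = netpoll_setup(&nt->np);
> 		if (ret)
> 			goto out_unlock;
> 
> 		nt->enabled = true;
> 		...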
> 
> So, back to your point, the netpoll interface will be up, but not used
> at all.
> 
> On top of that, two other considerations:
> 
>  * If the netpoll target is used while the pool is empty, it is not a
> big deal, since refill_skbs() is called regardless of whether the pool
> is full or not (which is not ideal, and I eventually want to improve
> it). See the refill_skbs() sketch below.
> 
> Anyway, this is how the code works today:
> 
> 
> 	void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
> 	{
> 		...
> 		skb = find_skb(np, total_len + np->dev->needed_tailroom,...
> 		// transmit the skb
> 		
> 
> 	static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
> 	{
> 		...
> 		refill_skbs(np);
> 		skb = alloc_skb(len, GFP_ATOMIC);
> 		if (!skb)
> 			skb = skb_dequeue(&np->skb_pool);
> 		...
> 		// return the skb
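> 
> For reference, refill_skbs() itself is roughly the following (sketch,
> details trimmed): it just tops the pool up to MAX_SKBS skbs with
> GFP_ATOMIC allocations, and silently stops if an allocation fails:
> 
> 	static void refill_skbs(struct netpoll *np)
> 	{
> 		struct sk_buff_head *skb_pool = &np->skb_pool;
> 		struct sk_buff *skb;
> 		unsigned long flags;
> 
> 		spin_lock_irqsave(&skb_pool->lock, flags);
> 		while (skb_pool->qlen < MAX_SKBS) {
> 			skb = alloc_skb(MAX_SKB_SIZE, GFP_ATOMIC);
> 			if (!skb)
> 				break;
> 
> 			__skb_queue_tail(skb_pool, skb);
> 		}
> 		spin_unlock_irqrestore(&skb_pool->lock, flags);
> 	}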
> 
> So, even if there is a transmission in the window between enabling the
> netpoll target and populating the pool (which is NOT the case in the
> code today), it would not be a problem, given that netpoll_send_udp()
> will call refill_skbs() anyway.
> 
> I have an in-development patch to improve it by deferring the refill to
> a workqueue, mainly because this whole allocation dance is done with a
> bunch of locks held, including the printk/console lock.
> 
> I think that a better mechanism might be something like:
> 
>  * If find_skb() needs to consume from the pool (which is rare, only
> when alloc_skb() fails), schedule a work item that tries to repopulate
> the pool in the background.
> 
>  * Eventually skip the initial alloc_skb() and get the skb directly
>    from the pool first; if the pool is depleted, fall back to
>    alloc_skb(GFP_ATOMIC). This might make the code faster, but I don't
>    have data yet.

I've hacked up this case (getting the skb from the pool first and
refilling it on a workqueue) today, and the performance improvement is
significant.

I've tested sending 2k messages and measured the time it takes to run
`netpoll_send_udp`, which is the critical function in netpoll.

Current code (with this patchset applied), where the buckets are in
nanoseconds:

	[14K, 16K)           107 |@@@                                                 |
	[16K, 20K)          1757 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
	[20K, 24K)            59 |@                                                   |
	[24K, 28K)            35 |@                                                   |
	[28K, 32K)            35 |@                                                   |
	[32K, 40K)             5 |                                                    |
	[40K, 48K)             0 |                                                    |
	[48K, 56K)             0 |                                                    |
	[56K, 64K)             0 |                                                    |
	[64K, 80K)             1 |                                                    |
	[80K, 96K)             0 |                                                    |
	[96K, 112K)            1 |                                                    |


With the optimization applied, I get a solid win -- the bulk of the
calls moves from the [16K, 20K) ns bucket down to [10K, 14K):

	[8K, 10K)             32 |@                                                   |
	[10K, 12K)           514 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@                        |
	[12K, 14K)           932 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
	[14K, 16K)           367 |@@@@@@@@@@@@@@@@@@@@                                |
	[16K, 20K)           102 |@@@@@                                               |
	[20K, 24K)            29 |@                                                   |
	[24K, 28K)            17 |                                                    |
	[28K, 32K)             1 |                                                    |
	[32K, 40K)             3 |                                                    |
	[40K, 48K)             0 |                                                    |
	[48K, 56K)             0 |                                                    |
	[56K, 64K)             1 |                                                    |
	[64K, 80K)             0 |                                                    |
	[80K, 96K)             1 |                                                    |
	[96K, 112K)            1 |                                                    |

That was captured with this simple bpftrace script:

	kprobe:netpoll_send_udp {
	    @start[tid] = nsecs;
	}

	kretprobe:netpoll_send_udp /@start[tid]/ {
	    $duration = nsecs - @start[tid];
	    delete(@start[tid]);
	    @ = hist($duration, 2)
	}

	END
	{
		clear(@start);
		print(@);
	}
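
(In case it helps reproduce the numbers: assuming the script above is
saved as, say, netpoll_lat.bt -- the name is just an example -- it can
be run with a plain `bpftrace netpoll_lat.bt` while the 2k test
messages are sent through netconsole.)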

And this is the patch I am testing right now:


	commit 262de00e439e0708fadf5e4c2896837c046a325b
	Author: Breno Leitao <leitao@...ian.org>
	Date:   Fri Nov 1 10:03:58 2024 -0700

	    netpoll: prioritize the skb from the pool.

	    Prioritize allocating SKBs from the pool over alloc_skb() to reduce
	    overhead in the critical path.

	    Move the pool refill operation to a worktask, allowing it to run
	    outside of the printk/console lock.

	    This change improves performance by minimizing the time spent in the
	    critical path filling and allocating skbs, reducing contention on the
	    printk/console lock.

	    Signed-off-by: Breno Leitao <leitao@...ian.org>

	diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
	index 77635b885c18..c81dc9cc0139 100644
	--- a/include/linux/netpoll.h
	+++ b/include/linux/netpoll.h
	@@ -33,6 +33,7 @@ struct netpoll {
		u16 local_port, remote_port;
		u8 remote_mac[ETH_ALEN];
		struct sk_buff_head skb_pool;
	+	struct work_struct work;
	 };

	 struct netpoll_info {
	diff --git a/net/core/netpoll.c b/net/core/netpoll.c
	index bf2064d689d5..657100393489 100644
	--- a/net/core/netpoll.c
	+++ b/net/core/netpoll.c
	@@ -278,18 +278,24 @@ static void zap_completion_queue(void)
		put_cpu_var(softnet_data);
	 }

	+
	+static void refill_skbs_wt(struct work_struct *work)
	+{
	+	struct netpoll *np = container_of(work, struct netpoll, work);
	+
	+	refill_skbs(np);
	+}
	+
	 static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
	 {
		int count = 0;
		struct sk_buff *skb;

		zap_completion_queue();
	-	refill_skbs(np);
	 repeat:
	-
	-	skb = alloc_skb(len, GFP_ATOMIC);
	+	skb = skb_dequeue(&np->skb_pool);
		if (!skb)
	-		skb = skb_dequeue(&np->skb_pool);
	+		skb = alloc_skb(len, GFP_ATOMIC);

		if (!skb) {
			if (++count < 10) {
	@@ -301,6 +307,7 @@ static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)

		refcount_set(&skb->users, 1);
		skb_reserve(skb, reserve);
	+	schedule_work(&np->work);
		return skb;
	 }

	@@ -780,6 +787,7 @@ int netpoll_setup(struct netpoll *np)

		/* fill up the skb queue */
		refill_skbs(np);
	+	INIT_WORK(&np->work, refill_skbs_wt);
		return 0;

	 put:


