Message-ID: <20240221164942.5af086c5@kernel.org>
Date: Wed, 21 Feb 2024 16:49:42 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Miao Wang <shankerwangmiao@...il.com>
Cc: netdev@...r.kernel.org, pabeni@...hat.com, "David S. Miller"
<davem@...emloft.net>
Subject: Re: [Bug report] veth cannot be created, reporting page allocation
failure

On Tue, 20 Feb 2024 22:38:52 +0800 Miao Wang wrote:
> I tried to bisect the kernel to find the commit that introduced the problem,
> but carrying out the full set of tests would have taken too long. However,
> after 4 rounds of bisecting, by examining the remaining commits, I'm
> convinced that the problem is caused by the following commit:
>
> 9d3684c24a5232 ("veth: create by default nr_possible_cpus queues")
>
> where changes are made to the veth module to create queues for all possible
> CPUs when the userland does not specify the expected number of queues. The
> previous behavior was to create only one queue in that case. The memory
> required becomes large when the number of CPUs is large: here it is
> 96 * 768 = 72KB, i.e. 18 contiguous 4K pages in total, so no wonder the
> allocation fails. I guess that on certain platforms the number of possible
> CPUs might be even larger, and larger than the number of CPU cores
> physically installed, since several people in the discussion above
> mentioned that manually specifying nr_cpus on the boot command line works
> around the problem.
>
> I've carried out a cross-check by applying the commit to the working 5.10
> kernel, and the problem occurs. Then I reverted the commit on the 6.1
> kernel, and the problem has not occurred for 27 hours.
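
Right, and the arithmetic checks out. As a back-of-the-envelope sketch
using the numbers from the report (96 possible CPUs and a 768-byte
struct veth_rq, both of which vary with platform and kernel config):

	/* Not actual veth code, just the reported numbers spelled out. */
	size_t per_queue = 768;			/* sizeof(struct veth_rq) */
	size_t nqueues = 96;			/* nr_possible_cpus */
	size_t total = nqueues * per_queue;	/* 73728 bytes == 72KB */

kcalloc() needs physically contiguous memory, and for a request this
large kmalloc() falls through to the page allocator, which rounds up to
a power-of-two number of pages; the 18 pages thus become an order-5
(32-page) allocation, which is easy to fail on a fragmented machine.
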
Thank you for the very detailed report! Would you be willing to give
this patch a try and report back whether it fixes the problem for you?
It won't help with the memory waste, but it should make the allocation
failures less likely:

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index a786be805709..cd4a6fe458f9 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1461,7 +1461,8 @@ static int veth_alloc_queues(struct net_device *dev)
 	struct veth_priv *priv = netdev_priv(dev);
 	int i;
 
-	priv->rq = kcalloc(dev->num_rx_queues, sizeof(*priv->rq), GFP_KERNEL_ACCOUNT);
+	priv->rq = kvcalloc(dev->num_rx_queues, sizeof(*priv->rq),
+			    GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
 	if (!priv->rq)
 		return -ENOMEM;
 
@@ -1477,7 +1478,7 @@ static void veth_free_queues(struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
 
-	kfree(priv->rq);
+	kvfree(priv->rq);
 }
 
 static int veth_dev_init(struct net_device *dev)
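
For context on why this helps: kvcalloc() first attempts a regular
physically contiguous allocation and transparently falls back to
vmalloc(), which only needs virtually contiguous pages, so the order-5
request above stops being a hard requirement. __GFP_RETRY_MAYFAIL makes
the contiguous attempt retry reclaim harder while still never invoking
the OOM killer. A rough sketch of the fallback semantics (not the
actual mm/util.c implementation, which handles more corner cases):

	/* Roughly what the kvmalloc() family does: try physically
	 * contiguous memory first, quietly, then fall back to
	 * virtually contiguous pages.
	 */
	void *kvmalloc_sketch(size_t size, gfp_t flags)
	{
		void *p = kmalloc(size, flags | __GFP_NOWARN);

		if (!p)
			p = __vmalloc(size, flags);
		return p;
	}

Since the buffer may now come from vmalloc(), the matching free must go
through kvfree(), which checks is_vmalloc_addr() and dispatches to
vfree() or kfree(); hence the change in veth_free_queues() as well.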