netdev - RE: [PATCH][net-next][v2] rtnetlink: instroduce vnlmsg_new and use it in rtnl

Message-ID: <3f479dcb95c04e54b689fa96386022e0@baidu.com>
Date: Tue, 14 Nov 2023 12:02:12 +0000
From: "Li,Rongqing" <lirongqing@...du.com>
To: Yunsheng Lin <linyunsheng@...wei.com>, "davem@...emloft.net"
	<davem@...emloft.net>, "edumazet@...gle.com" <edumazet@...gle.com>,
	"kuba@...nel.org" <kuba@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>,
	"Liam.Howlett@...cle.com" <Liam.Howlett@...cle.com>,
	"anjali.k.kulkarni@...cle.com" <anjali.k.kulkarni@...cle.com>,
	"leon@...nel.org" <leon@...nel.org>, "fw@...len.de" <fw@...len.de>,
	"shayagr@...zon.com" <shayagr@...zon.com>, "idosch@...dia.com"
	<idosch@...dia.com>, "razor@...ckwall.org" <razor@...ckwall.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [PATCH][net-next][v2] rtnetlink: instroduce vnlmsg_new and use it
 in rtnl_getlink



> -----Original Message-----
> From: Yunsheng Lin <linyunsheng@...wei.com>
> Sent: Tuesday, November 14, 2023 7:32 PM
> To: Li,Rongqing <lirongqing@...du.com>; davem@...emloft.net;
> edumazet@...gle.com; kuba@...nel.org; pabeni@...hat.com;
> Liam.Howlett@...cle.com; anjali.k.kulkarni@...cle.com; leon@...nel.org;
> fw@...len.de; shayagr@...zon.com; idosch@...dia.com;
> razor@...ckwall.org; netdev@...r.kernel.org
> Subject: Re: [PATCH][net-next][v2] rtnetlink: instroduce vnlmsg_new and use it
> in rtnl_getlink
> 
> On 2023/11/14 17:55, Li RongQing wrote:
> > if a PF has 256 or more VFs, ip link command will allocate a order 3
> > memory or more, and maybe trigger OOM due to memory fragement,
> 
> fragement -> fragment?

I will fix it 
Thanks
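
For reference, here is a rough userspace sketch of how an allocation size maps to a page order. It assumes 4 KiB pages; PAGE_SIZE_ASSUMED, order_for_size() and bytes_for_order() are hypothetical names for illustration only, roughly mirroring what the kernel's get_order() computes. It is not code from the patch.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as on typical x86_64 configurations. */
#define PAGE_SIZE_ASSUMED 4096UL

/* Smallest order such that (PAGE_SIZE_ASSUMED << order) >= size. */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;
	size_t span = PAGE_SIZE_ASSUMED;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Bytes of physically contiguous memory that one block of this order needs. */
static size_t bytes_for_order(unsigned int order)
{
	return PAGE_SIZE_ASSUMED << order;
}
```

Under that assumption, order 3 means 8 contiguous pages (32 KiB), so any reply buffer larger than 16 KiB already needs an order-3 block, which is hard to find once memory is fragmented.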

> 
> > the VFs needed memory size is computed in rtnl_vfinfo_size.
> >
> > so instroduce vnlmsg_new which calls netlink_alloc_large_skb in which
> 
> instroduce -> introduce?

Thanks

> 
> > vmalloc is used for large memory, to avoid the failure of allocating
> > memory
> >
> > >     ip invoked oom-killer: gfp_mask=0xc2cc0(GFP_KERNEL|__GFP_NOWARN|\
> > > 	__GFP_COMP|__GFP_NOMEMALLOC), order=3, oom_score_adj=0
> > >     CPU: 74 PID: 204414 Comm: ip Kdump: loaded Tainted: P           OE
> >     Call Trace:
> >     dump_stack+0x57/0x6a
> >     dump_header+0x4a/0x210
> >     oom_kill_process+0xe4/0x140
> >     out_of_memory+0x3e8/0x790
> >     __alloc_pages_slowpath.constprop.116+0x953/0xc50
> >     __alloc_pages_nodemask+0x2af/0x310
> >     kmalloc_large_node+0x38/0xf0
> >     __kmalloc_node_track_caller+0x417/0x4d0
> >     __kmalloc_reserve.isra.61+0x2e/0x80
> >     __alloc_skb+0x82/0x1c0
> >     rtnl_getlink+0x24f/0x370
> >     rtnetlink_rcv_msg+0x12c/0x350
> >     netlink_rcv_skb+0x50/0x100
> >     netlink_unicast+0x1b2/0x280
> >     netlink_sendmsg+0x355/0x4a0
> >     sock_sendmsg+0x5b/0x60
> >     ____sys_sendmsg+0x1ea/0x250
> >     ___sys_sendmsg+0x88/0xd0
> >     __sys_sendmsg+0x5e/0xa0
> >     do_syscall_64+0x33/0x40
> >     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >     RIP: 0033:0x7f95a65a5b70
> >
> > Cc: Yunsheng Lin <linyunsheng@...wei.com>
> > Signed-off-by: Li RongQing <lirongqing@...du.com>
> > ---
> > diff with v1: not move netlink_alloc_large_skb to skbuff.c
> >
> >  include/linux/netlink.h  |  1 +
> >  include/net/netlink.h    | 17 +++++++++++++++++
> >  net/core/rtnetlink.c     |  2 +-
> >  net/netlink/af_netlink.c |  2 +-
> >  4 files changed, 20 insertions(+), 2 deletions(-)
> >
> > > diff --git a/include/linux/netlink.h b/include/linux/netlink.h
> > > index 75d7de3..abe91ed 100644
> > --- a/include/linux/netlink.h
> > +++ b/include/linux/netlink.h
> > @@ -351,5 +351,6 @@ bool netlink_ns_capable(const struct sk_buff *skb,
> >  			struct user_namespace *ns, int cap);  bool
> netlink_capable(const
> > struct sk_buff *skb, int cap);  bool netlink_net_capable(const struct
> > sk_buff *skb, int cap);
> > +struct sk_buff *netlink_alloc_large_skb(unsigned int size, int
> > +broadcast);
> >
> >  #endif	/* __LINUX_NETLINK_H */
> > > diff --git a/include/net/netlink.h b/include/net/netlink.h
> > > index 83bdf78..7d31217 100644
> > --- a/include/net/netlink.h
> > +++ b/include/net/netlink.h
> > @@ -1011,6 +1011,23 @@ static inline struct sk_buff *nlmsg_new(size_t
> > payload, gfp_t flags)  }
> >
> >  /**
> > + * vnlmsg_new - Allocate a new netlink message with non-contiguous
> > + * physical memory
> > + * @payload: size of the message payload
> > + *
> > + * Use NLMSG_DEFAULT_SIZE if the size of the payload isn't known
> > + * and a good default is needed.
> > + *
> > + * The allocated skb is unable to have frag page for shinfo->frags*,
> > + * as the NULL setting for skb->head in netlink_skb_destructor() will
> > > + * bypass most of the handling in skb_release_data()
> > > + */
> > > +static inline struct sk_buff *vnlmsg_new(size_t payload)
> > > +{
> > > +	return netlink_alloc_large_skb(nlmsg_total_size(payload), 0);
> > > +}
> 
> The nlmsg_new() has the below parameters, there is no gfp flags for
> vnlmsg_new() and always assuming GFP_KERNEL?
> 

I think vnlmsg_new is similar to vmalloc, so no flags argument is needed; it always assumes GFP_KERNEL.
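
To illustrate, here is a simplified userspace sketch of the decision netlink_alloc_large_skb() makes, as I understand the code in af_netlink.c. GOODSIZE_ASSUMED is a hypothetical stand-in for NLMSG_GOODSIZE (the real value depends on PAGE_SIZE and the skb overhead), and enum backing and choose_backing() are illustrative names, not kernel symbols.

```c
#include <stddef.h>

/* Assumption: placeholder cutoff standing in for NLMSG_GOODSIZE. */
#define GOODSIZE_ASSUMED 4096UL

enum backing {
	BACKING_CONTIGUOUS,	/* regular alloc_skb() path */
	BACKING_VMALLOC		/* vmalloc()-backed skb head */
};

/*
 * Small messages and broadcast messages stay on the regular path;
 * only large unicast messages are backed by vmalloc memory.
 */
static enum backing choose_backing(size_t size, int broadcast)
{
	if (size <= GOODSIZE_ASSUMED || broadcast)
		return BACKING_CONTIGUOUS;
	return BACKING_VMALLOC;
}
```

Since vmalloc() can sleep, the large path is only usable from process context anyway, which is why a gfp argument would add little here.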

-Li
>  * @payload: size of the message payload
>  * @flags: the type of memory to allocate.
> 
> There are a lot of callers for nlmsg_new(), I am wondering how many of existing
> nlmsg_new() caller can change to use vnlmsg_new().
> https://elixir.free-electrons.com/linux/v6.7-rc1/A/ident/nlmsg_new
>