[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250522162358.35076-1-kuniyu@amazon.com>
Date: Thu, 22 May 2025 09:23:55 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <matttbe@...nel.org>
CC: <davem@...emloft.net>, <edumazet@...gle.com>, <horms@...nel.org>,
<kuba@...nel.org>, <kuni1840@...il.com>, <kuniyu@...zon.com>,
<netdev@...r.kernel.org>, <pabeni@...hat.com>, <willemb@...gle.com>
Subject: Re: [PATCH v1 net-next 3/6] socket: Restore sock_create_kern().
From: Matthieu Baerts <matttbe@...nel.org>
Date: Thu, 22 May 2025 17:02:45 +0200
> Hi Kuniyuki,
>
> On 17/05/2025 05:50, Kuniyuki Iwashima wrote:
> > Let's restore sock_create_kern() that holds a netns reference.
> >
> > Now, it's the same as the version before commit 26abe14379f8 ("net:
> > Modify sk_alloc to not reference count the netns of kernel sockets.").
> >
> > Back then, after creating a socket in init_net, we used sk_change_net()
> > to drop the netns ref and switch to another netns, but now we can
> > simply use __sock_create_kern() instead.
> >
> > $ git blame -L:sk_change_net include/net/sock.h 26abe14379f8~
> >
> > DEBUG_NET_WARN_ON_ONCE() is to catch a path calling sock_create_kern()
> > from __net_init functions, since doing so would leak the netns as
> > __net_exit functions cannot run until the socket is removed.
>
> Thank you for working on this!
>
> (...)
>
> > diff --git a/net/socket.c b/net/socket.c
> > index 7c4474c966c0..aeece4c4bb08 100644
> > --- a/net/socket.c
> > +++ b/net/socket.c
> > @@ -1632,6 +1632,48 @@ int __sock_create_kern(struct net *net, int family, int type, int protocol, stru
> > }
> > EXPORT_SYMBOL(__sock_create_kern);
> >
> > +/**
> > + * sock_create_kern - creates a socket for kernel space
> > + *
> > + * @net: net namespace
> > + * @family: protocol family (AF_INET, ...)
> > + * @type: communication type (SOCK_STREAM, ...)
> > + * @protocol: protocol (0, ...)
> > + * @res: new socket
> > + *
> > + * Creates a new socket and assigns it to @res.
> > + *
> > + * The socket is for kernel space and should not be exposed to
> > + * userspace via a file descriptor nor BPF hooks except for LSM
> > + * (see inet_create(), inet_release(), etc).
> > + *
> > + * The socket bypasses some LSMs that take care of @kern in
> > + * security_socket_create() and security_socket_post_create().
> > + *
> > + * The socket holds a reference count of @net so that the caller
> > + * does not need to care about @net's lifetime.
> > + *
> > + * This MUST NOT be called from the __net_init path and @net MUST
> > + * be alive as of calling sock_create_net().
> > + *
> > + * Context: Process context. This function internally uses GFP_KERNEL.
> > + * Return: 0 or an error.
> > + */
> > +int sock_create_kern(struct net *net, int family, int type, int protocol,
> > + struct socket **res)
> > +{
> > + int ret;
> > +
> > + DEBUG_NET_WARN_ON_ONCE(!net_initialized(net));
> > +
> > + ret = __sock_create(net, family, type, protocol, res, 1);
> > + if (!ret)
>
> A small suggestion if you have to send a v2: when quickly reading the
> code, I find it easy to interpret the code above as: "in case of error
> with __sock_create(), the refcnt is upgraded" . It might be clearer to
> simply rename "ret" to "err" or use "(ret < 0)".
>
> Up to you, a small detail for those who didn't directly realise what
> "ret" is :)
Makes sense, will use !err in v2 :)
Powered by blists - more mailing lists