[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <EF32EEAE-EEE3-4715-8048-F5E47A360D74@oracle.com>
Date: Thu, 16 Feb 2023 15:15:51 +0000
From: Chuck Lever III <chuck.lever@...cle.com>
To: Paolo Abeni <pabeni@...hat.com>
CC: Jakub Kicinski <kuba@...nel.org>,
Eric Dumazet <edumazet@...gle.com>,
"open list:NETWORKING [GENERAL]" <netdev@...r.kernel.org>,
"hare@...e.com" <hare@...e.com>,
David Howells <dhowells@...hat.com>,
Benjamin Coddington <bcodding@...hat.com>,
Olga Kornievskaia <kolga@...app.com>,
"jmeneghi@...hat.com" <jmeneghi@...hat.com>
Subject: Re: [PATCH v4 1/2] net/handshake: Create a NETLINK service for
handling handshake requests
> On Feb 16, 2023, at 8:12 AM, Paolo Abeni <pabeni@...hat.com> wrote:
>
> [partial feedback /me is still a bit lost in the code ;]
Thanks to you, Hannes, and Jakub for your review.
Responses/questions below.
> On Wed, 2023-02-15 at 14:23 -0500, Chuck Lever wrote:
>> +/*
>> + * This function is careful to not close the socket. It merely removes
>> + * it from the file descriptor table so that it is no longer visible
>> + * to the calling process.
>> + */
>> +static int handshake_genl_cmd_done(struct sk_buff *skb, struct genl_info *gi)
>> +{
>> + struct nlattr *tb[HANDSHAKE_GENL_ATTR_MAX + 1];
>> + struct handshake_req *req;
>> + struct socket *sock;
>> + int fd, status, err;
>> +
>> + err = genlmsg_parse(nlmsg_hdr(skb), &handshake_genl_family, tb,
>> + HANDSHAKE_GENL_ATTR_MAX, handshake_genl_policy,
>> + NULL);
>> + if (err) {
>> + pr_err_ratelimited("%s: genlmsg_parse() returned %d\n",
>> + __func__, err);
>> + return err;
>> + }
>> +
>> + if (!tb[HANDSHAKE_GENL_ATTR_SOCKFD])
>> + return handshake_genl_status_reply(skb, gi, -EINVAL);
>> + err = 0;
>> + fd = nla_get_u32(tb[HANDSHAKE_GENL_ATTR_SOCKFD]);
>> + sock = sockfd_lookup(fd, &err);
>> + if (err)
>> + return handshake_genl_status_reply(skb, gi, -EBADF);
>> +
>> + req = sock->sk->sk_handshake_req;
>> + if (req->hr_fd != fd) /* sanity */
>> + return handshake_genl_status_reply(skb, gi, -EBADF);
>> +
>> + status = -EIO;
>> + if (tb[HANDSHAKE_GENL_ATTR_SESS_STATUS])
>> + status = nla_get_u32(tb[HANDSHAKE_GENL_ATTR_SESS_STATUS]);
>> +
>> + put_unused_fd(req->hr_fd);
>
> If I read correctly, at this point the user-space is expected to have
> already closed hr_fd , but that is not enforced, right? a buggy or
> malicious user-space could cause bad things not closing such fd.
No, user space is no longer supposed to close the fd. The
CMD_DONE operation functions as "close" now. But maybe
that's a bad idea. More below.
The problem is what happens if user space /does/ close
without calling DONE; for example, if the daemon seg faults
and exits? (In other words, I'm not sure if the upcall
mechanism as it is now handles that kind of behavior).
> Can we use sockfd_put(sock) instead? will make the code more readable,
> I think.
Not sure yet, I need more detail; and if we use an
sk_destructor function, maybe that won't be needed.
> BTW I don't think there is any problem with the sock->sk dereference
> above, the fd reference count will prevent __sock_release from being
> called.
Yes, I tried to ensure that socket reference counting
keeps it alive where it might be used or dereferenced.
> [...]
>
>> +static void __net_exit handshake_net_exit(struct net *net)
>> +{
>> + struct handshake_req *req;
>> + LIST_HEAD(requests);
>> +
>> + /*
>> + * XXX: This drains the net's pending list, but does
>> + * nothing about requests that have been accepted
>> + * and are in progress.
>> + */
>> + spin_lock(&net->hs_lock);
>> + list_splice_init(&requests, &net->hs_requests);
>> + spin_unlock(&net->hs_lock);
>
> If I read correctly accepted, uncompleted reqs are leaked.
Yes, that's exactly right.
> I think that
> could be prevented installing a custom sk_destructor in sock->sk
> tacking care of freeing the sk->sk_handshake_req. The existing/old
> sk_destructor - if any - could be stored in an additional
> sk_handshake_req field and tail-called by the req's one.
I've been looking for a way to modify socket close behavior
for these sockets, and this sounds like it's in the
neighborhood. I'll have a look.
So one thing we might do is have CMD_DONE act just as a way
to report handshake results, and have the handshake daemon
close the fd to signal it's finished with it. sk_destructor
would then fire handshake_complete and free the
handshake_req.
Might make things a little more robust?
> [...]
>
>> +/*
>> + * This limit is to prevent slow remotes from causing denial of service.
>> + * A ulimit-style tunable might be used instead.
>> + */
>> +#define HANDSHAKE_PENDING_MAX (10)
>
> I liked the idea of a core mem based limit ;) not a big deal anyway ;)
Well, this is a placeholder, carried over from the last
version of this series. It's based on the same concept for
the maximum length of a listener queue.
I'm not dropping your idea, but instead trying to get the
high order bits taken care of first. If you have some
sample code, I'm happy to integrate it sooner rather than
later!
>> +
>> +struct handshake_req *handshake_req_get(struct handshake_req *req)
>> +{
>> + return likely(refcount_inc_not_zero(&req->hr_ref)) ? req : NULL;
>> +}
>
> It's unclear to me under which circumstances the refcount should be >
> 1: AFAICS the req should have always a single owner: initially the
> creator, then the accept queue and finally the user-space serving the
> request.
I think during request cancelation there are some moments
where a race between cancel and complete might result in
one of those two ending up with a reference to a freed
handshake_req. So I added reference counting.
Hannes is concerned about handshakes taking too long, and
would like a timeout mechanism. A handshake timeout would
be the same as a call to handshake_cancel, and thus be
likewise racy.
However if we use close/sk_destructor to fire the
completion and free the handshake_req, then maybe cancel/
timeout could be done simply by killing the user space
process that is handling the handshake request.
--
Chuck Lever
Powered by blists - more mailing lists