[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGXJAmw4tULA02TLXBPa8Lv5cXD1Oe+1ajTWrM2x9=byMTy4jA@mail.gmail.com>
Date: Thu, 5 Dec 2024 11:49:09 -0800
From: John Ousterhout <ouster@...stanford.edu>
To: "D. Wythe" <alibuda@...ux.alibaba.com>
Cc: netdev@...r.kernel.org, linux-api@...r.kernel.org
Subject: Re: [PATCH net-next v2 11/12] net: homa: create homa_plumbing.c homa_utils.c
On Sun, Dec 1, 2024 at 7:51 PM D. Wythe <alibuda@...ux.alibaba.com> wrote:
> > +int homa_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
> > + unsigned int optlen)
> > +{
> > + struct homa_sock *hsk = homa_sk(sk);
> > + struct homa_set_buf_args args;
> > + int ret;
> > +
> > + if (level != IPPROTO_HOMA || optname != SO_HOMA_SET_BUF ||
> > + optlen != sizeof(struct homa_set_buf_args))
> > + return -EINVAL;
>
> SO_HOMA_SET_BUF is a bit odd here, maybe HOMA_RCVBUF ? which also can be
> implemented in getsockopt.
I have changed it to HOMA_RCVBUF (and renamed struct homa_set_buf_args
to struct homa_rcvbuf_args). I also implemented getsockopt for
HOMA_RCVBUF.
> > +
> > + if (copy_from_sockptr(&args, optval, optlen))
> > + return -EFAULT;
> > +
> > + /* Do a trivial test to make sure we can at least write the first
> > + * page of the region.
> > + */
> > + if (copy_to_user((__force void __user *)args.start, &args, sizeof(args)))
> > + return -EFAULT;
>
> To share buffer between kernel and userspace, maybe you should refer to the implementation of
> io_pin_pbuf_ring()
I'm not sure what you mean here. Are you suggesting that I look at the
code of io_pin_pbuf_ring to make sure I've done everything needed to
share buffers? I don't believe that Homa needs to do anything special
(e.g. it doesn't need to pin the user's buffers); it just saves the
address and makes copy_to_user calls later when needed (and these
calls are all done at syscall level in the context of the
application). Is there something I'm missing?
> > +
> > + homa_sock_lock(hsk, "homa_setsockopt SO_HOMA_SET_BUF");
> > + ret = homa_pool_init(hsk, (__force void __user *)args.start, args.length);
> > + homa_sock_unlock(hsk);
>
> It seems that if the option was not set, poll->hsk will not be initialized
> and then if recvmsg is called, a panic will be triggered.
I fixed homa_pool_check_waiting to properly handle empty pools.
> > +/**
> > + * homa_sendmsg() - Send a request or response message on a Homa socket.
> > + * @sk: Socket on which the system call was invoked.
> > + * @msg: Structure describing the message to send; the msg_control
> > + * field points to additional information.
> > + * @length: Number of bytes of the message.
> > + * Return: 0 on success, otherwise a negative errno.
> > + */
> > +int homa_sendmsg(struct sock *sk, struct msghdr *msg, size_t length)
> > +{
> > + struct homa_sock *hsk = homa_sk(sk);
> > + struct homa_sendmsg_args args;
> > + int result = 0;
> > + struct homa_rpc *rpc = NULL;
> > + union sockaddr_in_union *addr = (union sockaddr_in_union *)msg->msg_name;
>
> msg->msg_name can be NULL.
I have added a check for this.
> > + if (addr->in6.sin6_family != sk->sk_family) {
> > + result = -EAFNOSUPPORT;
> > + goto error;
> > + }
>
> addr->sa.sa_family? Using sin6_family would be odd to me, making it seem like homa can only run with
> IPv6.
The family is in the same place for all protocols so I thought it
would be safe to reference it as in6.sin6_family. But you're right
that it looks odd, so I changed it to addr->sa.sa_family.
> > + rpc = homa_rpc_new_client(hsk, addr);
> > + if (IS_ERR(rpc)) {
> > + result = PTR_ERR(rpc);
> > + rpc = NULL;
> > + goto error;
> > + }
> > + rpc->completion_cookie = args.completion_cookie;
> > + result = homa_message_out_fill(rpc, &msg->msg_iter, 1);
> > + if (result)
> > + goto error;
> > + args.id = rpc->id;
> > + homa_rpc_unlock(rpc);
>
> Strongly recommend that adding comments with all the unlock part to indicate where it was
> locked from.
I went through the code and added comments in places where the locking
isn't relatively obvious (e.g. it happens through methods without
"lock" in their name, such as homa_rpc_new_client). I also renamed
homa_try_bucket_lock to homa_try_rpc_lock, which will clarify RPC
locking a bit better.
> > + canonical_dest = canonical_ipv6_addr(addr);
>
> Are you treating all addresses as IPv6 addresses just for the sake of simplicity? It's a bit odd,
> but okay to me.
Yes: Homa uses IPv6 addresses internally for both IPv4 and IPv6; this
allows Homa to work with both IPv4 and IPv6 with only a small amount
of code that is protocol-specific.
> > +int homa_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags,
> > + int *addr_len)
> > +{
> > + struct homa_sock *hsk = homa_sk(sk);
> > + struct homa_recvmsg_args control;
> > + struct homa_rpc *rpc;
> > + int result;
> > +
> > + if (unlikely(!msg->msg_control)) {
> > + /* This test isn't strictly necessary, but it provides a
> > + * hook for testing kernel call times.
> > + */
> > + return -EINVAL;
> > + }
> > + if (msg->msg_controllen != sizeof(control)) {
> > + result = -EINVAL;
> > + goto done;
>
> Then you copied an uninitialized control into userspace ...
> Is goto really necessary ? Maybe just return.
Good catch; I changed it to just return. In general, I prefer to
funnel everything through a single cleanup-and-return point rather
than returning directly in some cases and jumping to a cleanup point
in others; I think that makes the code cleaner and more obvious. But,
that doesn't really work here.
>
> > + }
> > + if (unlikely(copy_from_user(&control,
> > + (__force void __user *)msg->msg_control,
> > + sizeof(control)))) {
> > + result = -EFAULT;
> > + goto done;
>
> Same as above, is goto really necessary ?
I changed it to return directly.
> > + }
> > + control.completion_cookie = 0;
> > +
> > + if (control.num_bpages > HOMA_MAX_BPAGES ||
> > + (control.flags & ~HOMA_RECVMSG_VALID_FLAGS)) {
> > + result = -EINVAL;
> > + goto done;
> > + }
> > + homa_pool_release_buffers(hsk->buffer_pool, control.num_bpages,
> > + control.bpage_offsets);
>
> homa_pool_release_buffers() quietly ignores erroneous bpage_index values passed in from the
> userspace. This behavior may obscure more complex issues in userspace, and exposure this problem
> could help users identify issues earlier.
Good point; homa_recvmsg now returns -EINVAL if there are erroneous
bpage_index values passed in.
> > + control.num_bpages = 0;
> > +
> > + rpc = homa_wait_for_message(hsk, (flags & MSG_DONTWAIT)
> > + ? (control.flags | HOMA_RECVMSG_NONBLOCKING)
> > + : control.flags, control.id);
> > + if (IS_ERR(rpc)) {
> > + /* If we get here, it means there was an error that prevented
> > + * us from finding an RPC to return. If there's an error in
> > + * the RPC itself we won't get here.
> > + */
> > + result = PTR_ERR(rpc);
> > + goto done;
> > + }
> > + result = rpc->error ? rpc->error : rpc->msgin.length;
>
> A trivial tips.
>
> result = rpc->error ?: rpc->msgin.length;
I wasn't aware that the shorter form existed. However, I find the
longer form a bit more obvious, so I'd prefer to leave that unless you
feel strongly that it should be abbreviated.
> > +
> > + /* Collect result information. */
> > + control.id = rpc->id;
> > + control.completion_cookie = rpc->completion_cookie;
> > + if (likely(rpc->msgin.length >= 0)) {
> > + control.num_bpages = rpc->msgin.num_bpages;
> > + memcpy(control.bpage_offsets, rpc->msgin.bpage_offsets,
> > + sizeof(control.bpage_offsets));
>
> A trivial tips.
>
> Although sizeof(control.bpage_offsets) and sizeof(rpc->msgin.bpage_offsets) are the same, but
> passing sizeof(rpc->msgin.bpage_offsets) would be more appropriate.
Done.
Thanks for the comments; lots of good caches there.
-John-
Powered by blists - more mailing lists