[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <67237f8ec3078_b635c29443@willemb.c.googlers.com.notmuch>
Date: Thu, 31 Oct 2024 09:01:02 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Anna Nyiri <annaemesenyiri@...il.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: netdev@...r.kernel.org,
fejes@....elte.hu
Subject: Re: [PATCH net-next] support SO_PRIORITY cmsg
Anna Nyiri wrote:
> Willem de Bruijn <willemdebruijn.kernel@...il.com> ezt írta (időpont:
> 2024. okt. 29., K, 16:15):
> >
> > Anna Emese Nyiri wrote:
> > > The Linux socket API currently supports setting SO_PRIORITY at the socket
> > > level, which applies a uniform priority to all packets sent through that
> > > socket. The only exception is IP_TOS, if that is specified as ancillary
> > > data, the packet does not inherit the socket's priority. Instead, the
> > > priority value is computed when handling the ancillary data (as implemented
> > > in commit <f02db315b8d888570cb0d4496cfbb7e4acb047cb>: "ipv4: IP_TOS
> > > and IP_TTL can be specified as ancillary data").
> >
> > Please use commit format <$SHA1:12> ("subject"). Checkpatch might also
> > flag this.
> >
> > > Currently, there is no option to set the priority directly from userspace
> > > on a per-packet basis. The following changes allow SO_PRIORITY to be set
> > > through control messages (CMSG), giving userspace applications more
> > > granular control over packet priorities.
> > >
> > > This patch enables setting skb->priority using CMSG. If SO_PRIORITY is
> > > specified as ancillary data, the packet is sent with the priority value
> > > set through sockc->priority_cmsg_value, overriding the socket-level
> > > values set via the traditional setsockopt() method.
> >
> > Please also describe how this interacts with priority set from IP_TOS or
> > IPV6_TCLASS.
> >
> > > This is analogous to
> > > existing support for SO_MARK (as implemented in commit
> > > <c6af0c227a22bb6bb8ff72f043e0fb6d99fd6515>, “ip: support SO_MARK
> > > cmsg”).
> > >
> > > Suggested-by: Ferenc Fejes <fejes@....elte.hu>
> > > Signed-off-by: Anna Emese Nyiri <annaemesenyiri@...il.com>
> > > ---
> > > include/net/inet_sock.h | 2 ++
> > > include/net/sock.h | 5 ++++-
> > > net/can/raw.c | 6 +++++-
> > > net/core/sock.c | 12 ++++++++++++
> > > net/ipv4/ip_output.c | 11 ++++++++++-
> > > net/ipv4/raw.c | 5 ++++-
> > > net/ipv6/ip6_output.c | 8 +++++++-
> > > net/ipv6/raw.c | 6 +++++-
> > > net/packet/af_packet.c | 6 +++++-
> > > 9 files changed, 54 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> > > index f9ddd47dc4f8..9d4e4e2a8232 100644
> > > --- a/include/net/inet_sock.h
> > > +++ b/include/net/inet_sock.h
> > > @@ -175,6 +175,8 @@ struct inet_cork {
> > > __u16 gso_size;
> > > u64 transmit_time;
> > > u32 mark;
> > > + __u8 priority_cmsg_set;
> > > + u32 priority_cmsg_value;
> >
> > Just priority, drop the cmsg value.
> >
> > Instead of an explicit "is set" bit, preferred is to initialize the
> > cookie field from the sock. See sockcm_init(), below, and also
> > ipcm_init_sk(). That also avoids the branches later in the datapath.
> >
> > > };
> > >
> > > struct inet_cork_full {
> > > diff --git a/include/net/sock.h b/include/net/sock.h
> > > index cce23ac4d514..e02170977165 100644
> > > --- a/include/net/sock.h
> > > +++ b/include/net/sock.h
> > > @@ -1794,13 +1794,16 @@ struct sockcm_cookie {
> > > u64 transmit_time;
> > > u32 mark;
> > > u32 tsflags;
> > > + u32 priority_cmsg_value;
> > > + u8 priority_cmsg_set;
> > > };
> > >
> > > static inline void sockcm_init(struct sockcm_cookie *sockc,
> > > const struct sock *sk)
> > > {
> > > *sockc = (struct sockcm_cookie) {
> > > - .tsflags = READ_ONCE(sk->sk_tsflags)
> > > + .tsflags = READ_ONCE(sk->sk_tsflags),
> > > + .priority_cmsg_set = 0
> > > };
> > > }
> > >
> > > diff --git a/net/can/raw.c b/net/can/raw.c
> > > index 00533f64d69d..cf7e7ae64cde 100644
> > > --- a/net/can/raw.c
> > > +++ b/net/can/raw.c
> > > @@ -962,7 +962,11 @@ static int raw_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
> > > }
> > >
> > > skb->dev = dev;
> > > - skb->priority = READ_ONCE(sk->sk_priority);
> > > + if (sockc.priority_cmsg_set)
> > > + skb->priority = sockc.priority_cmsg_value;
> > > + else
> > > + skb->priority = READ_ONCE(sk->sk_priority);
> > > +
> > > skb->mark = READ_ONCE(sk->sk_mark);
> > > skb->tstamp = sockc.transmit_time;
> > >
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index 9abc4fe25953..899bf850b52a 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -2863,6 +2863,18 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
> > > case SCM_RIGHTS:
> > > case SCM_CREDENTIALS:
> > > break;
> > > + case SO_PRIORITY:
> > > + if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
> > > + return -EINVAL;
> > > +
> > > + if ((*(u32 *)CMSG_DATA(cmsg) >= 0 && *(u32 *)CMSG_DATA(cmsg) <= 6) ||
> > > + sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) ||
> > > + sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
> > > + sockc->priority_cmsg_value = *(u32 *)CMSG_DATA(cmsg);
> > > + sockc->priority_cmsg_set = 1;
> > > + break;
> > > + }
> >
> > What is the magic constant 6 here?
>
> The mechanism for setting the priority value via cmsg mirrors that of
> setting the priority value through setsockopt. The control of the
> priority value is managed by the sk_setsockopt function, which allows
> setting the priority within the range of 0 to 6. However, if the user
> has CAP_NET_ADMIN or CAP_NET_RAW capability, they are permitted to set
> any priority value without restriction. The specified range of 0 to 6
> was selected to align with existing priority value check.
Oh right. This is just copied from setsockopt SO_PRIORITY.
Having an non-annotated constant there is unfortunate too, but goes
back to before the introduction of git.
And that goes back to the priority bands configured with
rt_tos2priority. As setsockopt IP_TOS is not a privileged operation.
Ideally this would say TC_PRIO_BESTEFFORT and TC_PRIO_INTERACTIVE.
Since both the setsockopt and cmsg check are in net/core/sock.c,
can we deduplicate the logic and introduce helper:
static bool sk_set_prio_allowed(const struct sock *sk, int val)
{
return ((val >= TC_PRIO_BESTEFFORT && val <= TC_PRIO_INTERACTIVE) ||
sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) ||
sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
}
Powered by blists - more mailing lists