[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6726d1954a48f_2980c729499@willemb.c.googlers.com.notmuch>
Date: Sat, 02 Nov 2024 21:27:49 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Anna Emese Nyiri <annaemesenyiri@...il.com>,
netdev@...r.kernel.org
Cc: fejes@....elte.hu,
annaemesenyiri@...il.com,
edumazet@...gle.com,
kuba@...nel.org,
pabeni@...hat.com,
willemdebruijn.kernel@...il.com
Subject: Re: [PATCH net-next v2 2/2] support SO_PRIORITY cmsg
Anna Emese Nyiri wrote:
> The Linux socket API currently supports setting SO_PRIORITY at the
> socket level, which applies a uniform priority to all packets sent
> through that socket. The only exception is IP_TOS; if specified as
> ancillary data, the packet does not inherit the socket's priority.
> Instead, the priority value is computed when handling the ancillary
> data (as implemented in commit <f02db315b8d88>
nit: drop the brackets
> ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data")). If
> the priority is set via IP_TOS, then skb->priority derives its value
> from the rt_tos2priority function, which calculates the priority
> based on the value of ipc->tos obtained from IP_TOS. However, if
> IP_TOS is not used and the priority has been set through a control
> message, skb->priority will take the value provided by that control
> message.
The above describes the new situation? There is no way to set
priority to a control message prior to this patch.
> Therefore, when both options are available, the primary
> source for skb->priority is the value set via IP_TOS.
>
> Currently, there is no option to set the priority directly from
> userspace on a per-packet basis. The following changes allow
> SO_PRIORITY to be set through control messages (CMSG), giving
> userspace applications more granular control over packet priorities.
>
> This patch enables setting skb->priority using CMSG. If SO_PRIORITY
Duplicate statement. Overall, the explanation can perhaps be
condensed and made more clear.
> is specified as ancillary data, the packet is sent with the priority
> value set through sockc->priority_cmsg_value, overriding the
No longer matches the code.
> socket-level values set via the traditional setsockopt() method. This
> is analogous to existing support for SO_MARK (as implemented in commit
> <c6af0c227a22> ("ip: support SO_MARK cmsg")).
>
> Suggested-by: Ferenc Fejes <fejes@....elte.hu>
> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@...il.com>
> ---
> include/net/inet_sock.h | 2 +-
> include/net/ip.h | 3 ++-
> include/net/sock.h | 4 +++-
> net/can/raw.c | 2 +-
> net/core/sock.c | 8 ++++++++
> net/ipv4/ip_output.c | 7 +++++--
> net/ipv4/raw.c | 2 +-
> net/ipv6/ip6_output.c | 3 ++-
> net/ipv6/raw.c | 2 +-
> net/packet/af_packet.c | 2 +-
> 10 files changed, 25 insertions(+), 10 deletions(-)
>
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index 56d8bc5593d3..3ccbad881d74 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -172,7 +172,7 @@ struct inet_cork {
> u8 tx_flags;
> __u8 ttl;
> __s16 tos;
> - char priority;
> + u32 priority;
Let's check with pahole how this affects struct size and holes.
It likely adds a hole, but unavoidably so.
> __u16 gso_size;
> u32 ts_opt_id;
> u64 transmit_time;
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 0e548c1f2a0e..e8f71a191277 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -81,7 +81,7 @@ struct ipcm_cookie {
> __u8 protocol;
> __u8 ttl;
> __s16 tos;
> - char priority;
> + u32 priority;
No need for a field in ipcm_cookie, when also present in
sockcm_cookie. As SO_PRIORITY is not limited to IP, sockcm_cookie is
the right location.
If cmsg IP_TOS is present, that can overridde ipc->sockc.priority with
rt_tos2priority.
Interesting that this override by IP_TOS seems to be IPV4 only. There
is no equivalent call to rt_tos2priority when setting IPV6_TCLASS.
> __u16 gso_size;
> };
>
> @@ -96,6 +96,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm,
> ipcm_init(ipcm);
>
> ipcm->sockc.mark = READ_ONCE(inet->sk.sk_mark);
> + ipcm->sockc.priority = READ_ONCE(inet->sk.sk_priority);
> ipcm->sockc.tsflags = READ_ONCE(inet->sk.sk_tsflags);
> ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if);
> ipcm->addr = inet->inet_saddr;
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 7464e9f9f47c..316a34d6c48b 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1814,13 +1814,15 @@ struct sockcm_cookie {
> u32 mark;
> u32 tsflags;
> u32 ts_opt_id;
> + u32 priority;
> };
>
> static inline void sockcm_init(struct sockcm_cookie *sockc,
> const struct sock *sk)
> {
> *sockc = (struct sockcm_cookie) {
> - .tsflags = READ_ONCE(sk->sk_tsflags)
> + .tsflags = READ_ONCE(sk->sk_tsflags),
> + .priority = READ_ONCE(sk->sk_priority),
> };
> }
>
> diff --git a/net/can/raw.c b/net/can/raw.c
> index 255c0a8f39d6..46e8ed9d64da 100644
> --- a/net/can/raw.c
> +++ b/net/can/raw.c
> @@ -962,7 +962,7 @@ static int raw_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
> }
>
> skb->dev = dev;
> - skb->priority = READ_ONCE(sk->sk_priority);
> + skb->priority = sockc.priority;
> skb->mark = READ_ONCE(sk->sk_mark);
> skb->tstamp = sockc.transmit_time;
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 5ecf6f1a470c..d5586b9212dd 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2941,6 +2941,14 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
> case SCM_RIGHTS:
> case SCM_CREDENTIALS:
> break;
> + case SO_PRIORITY:
> + if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
> + return -EINVAL;
> + if (sk_set_prio_allowed(sk, *(u32 *)CMSG_DATA(cmsg))) {
> + sockc->priority = *(u32 *)CMSG_DATA(cmsg);
> + break;
> + }
> + return -EPERM;
nit: invert to make the error case the (speculated as unlikely) branch
and have the common path unindented.
> default:
> return -EINVAL;
> }
Powered by blists - more mailing lists