Message-ID: <CADvbK_daHuittutNqWaiRR-GzaZ8g5iWw-WCEP5GNhiqFcwySg@mail.gmail.com>
Date: Sun, 5 Jan 2025 13:28:15 -0500
From: Xin Long <lucien.xin@...il.com>
To: Guillaume Nault <gnault@...hat.com>
Cc: David Miller <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org,
Simon Horman <horms@...nel.org>, Marcelo Ricardo Leitner <marcelo.leitner@...il.com>, linux-sctp@...r.kernel.org,
Ido Schimmel <idosch@...dia.com>
Subject: Re: [PATCH net-next] sctp: Prepare sctp_v4_get_dst() to dscp_t conversion.
On Fri, Jan 3, 2025 at 11:59 AM Guillaume Nault <gnault@...hat.com> wrote:
>
> On Fri, Jan 03, 2025 at 10:35:55AM -0500, Xin Long wrote:
> > On Thu, Jan 2, 2025 at 11:34 AM Guillaume Nault <gnault@...hat.com> wrote:
> > >
> > > Define inet_sk_dscp() to get a dscp_t value from struct inet_sock, so
> > > that sctp_v4_get_dst() can easily set ->flowi4_tos from a dscp_t
> > > variable. For the SCTP_DSCP_SET_MASK case, we can just use
> > > inet_dsfield_to_dscp() to get a dscp_t value.
> > >
> > > Then, when converting ->flowi4_tos from __u8 to dscp_t, we'll just have
> > > to drop the inet_dscp_to_dsfield() conversion function.
> > With inet_dsfield_to_dscp() && inet_dscp_to_dsfield(), the logic
> > looks like: tos(dsfield) -> dscp_t -> tos(dsfield)
> > It's a bit confusing, but it has been doing that all over routing places.
>
> The objective is to have DSCP values stored in dscp_t variables in the
> kernel and keep __u8 values in user space APIs and packet headers. In
> practice this means using inet_dscp_to_dsfield() and
> inet_dsfield_to_dscp() at boundaries with user space or networking.
>
> However, since core kernel functions and structures are getting updated
> incrementally, some inet_dscp_to_dsfield() and inet_dsfield_to_dscp()
> conversions are temporarily needed between already converted and not yet
> converted parts of the stack.
>
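The boundary conversions described above can be sketched in user space. This is only an illustration: the sketch_ names are hypothetical stand-ins for dscp_t, inet_dsfield_to_dscp() and inet_dscp_to_dsfield(), and the struct wrapper is an assumption made to get compiler type checking outside the kernel.

```c
#include <stdint.h>

/* Hypothetical stand-in for dscp_t: a distinct type, so a raw TOS byte
 * cannot be assigned to it without going through a conversion helper. */
typedef struct { uint8_t v; } sketch_dscp_t;

#define SKETCH_DSCP_MASK 0xfcu	/* upper six bits of the DS field */

/* Boundary "into" the typed world: drop the two ECN bits. */
static inline sketch_dscp_t sketch_dsfield_to_dscp(uint8_t dsfield)
{
	sketch_dscp_t d = { (uint8_t)(dsfield & SKETCH_DSCP_MASK) };
	return d;
}

/* Boundary "out of" the typed world: back to a plain byte. */
static inline uint8_t sketch_dscp_to_dsfield(sketch_dscp_t dscp)
{
	return dscp.v;
}

/* tos(dsfield) -> dscp_t -> tos(dsfield), the temporary round trip that
 * exists while only part of the stack has been converted. */
static inline uint8_t sketch_round_trip(uint8_t tos)
{
	return sketch_dscp_to_dsfield(sketch_dsfield_to_dscp(tos));
}
```

The round trip is not an identity: any ECN bits in the input are cleared, which is why it is only safe where ECN is not needed afterwards.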
> > In sctp_v4_xmit(), there's a similar tos/dscp thing, although it's not
> > for fl4.flowi4_tos.
>
> The sctp_v4_xmit() case is special because its dscp variable, despite
> its name, doesn't only carry a DSCP value, but also ECN bits.
> Converting it to a dscp_t variable would lose the ECN information.
>
> To be more precise, this is only the case if the SCTP_DSCP_SET_MASK
> flag is not set. That is, when the "dscp" variable is set using
> inet->tos. Since inet->tos contains both DSCP and ECN bits, this allows
> the socket owner to manage ECN. I don't know if that's intended by the
> SCTP code. If that isn't, and the ECN bits aren't supposed to be taken
> into account here, then I'm happy to send a patch to convert
> sctp_v4_xmit() to dscp_t too.
From the beginning, SCTP has sent its packets via ip_queue_xmit(), which
allows socket owners to manage ECN, like TCP. So let's just leave it.
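
A rough user-space sketch of the two cases discussed above, not the actual sctp_v4_xmit() code. The sketch_ names, the flag in bit 0 and the 0xfc value mask are assumptions made for illustration.

```c
#include <stdint.h>

/* Assumed layout for this sketch: bit 0 of the per-transport value means
 * "an explicit DSCP was configured"; the upper six bits carry that DSCP. */
#define SKETCH_DSCP_SET_MASK 0x01u
#define SKETCH_DSCP_VAL_MASK 0xfcu
#define SKETCH_ECN_MASK      0x03u

/* Pick the byte that would end up in the IP header's TOS field. */
static inline uint8_t sketch_pick_tos(uint8_t transport_dscp, uint8_t sock_tos)
{
	if (transport_dscp & SKETCH_DSCP_SET_MASK)
		return (uint8_t)(transport_dscp & SKETCH_DSCP_VAL_MASK); /* DSCP only */

	return sock_tos; /* DSCP and ECN bits, as set by the socket owner */
}
```

With the flag clear, the socket's ECN bits reach the header unchanged, which is the behaviour a dscp_t variable could not carry.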
>
> > Also, I'm curious: there are still a few places under net/ using:
> >
> > fl4.flowi4_tos = tos & INET_DSCP_MASK;
> >
> > Will you consider converting all of them with
> > inet_dsfield_to_dscp() && inet_dscp_to_dsfield() as well?
>
> Yes, I have a few more cases to convert. But some of them will have to
> stay. For example, in net/ipv4/ip_output.c, __ip_queue_xmit() has
> "fl4->flowi4_tos = tos & INET_DSCP_MASK;", but we can't just convert
> that "tos" variable to dscp_t because it carries both DSCP and ECN
> values. Although ->flowi4_tos isn't concerned with ECN, these ECN bits
> are used later to set the IP header.
>
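The __ip_queue_xmit() situation described above can be pictured as one byte feeding two consumers. A minimal sketch with hypothetical names (struct sketch_xmit and sketch_split_tos() are not kernel symbols):

```c
#include <stdint.h>

#define SKETCH_DSCP_MASK 0xfcu

/* The two places a single "tos" byte ends up in this sketch. */
struct sketch_xmit {
	uint8_t route_key;	/* what the route lookup sees: DSCP only */
	uint8_t header_tos;	/* what goes in the IP header: DSCP and ECN */
};

static inline struct sketch_xmit sketch_split_tos(uint8_t tos)
{
	struct sketch_xmit x;

	x.route_key = (uint8_t)(tos & SKETCH_DSCP_MASK);
	x.header_tos = tos;	/* ECN bits must survive until here */
	return x;
}
```

Because the unmasked byte is still needed for the header, the variable itself has to stay a plain u8 and only the lookup key gets masked.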
> There are other cases that I'm not planning to convert, for example
> because the value is read from a UAPI structure that can't be updated.
> For example the "fl4.flowi4_tos = params->tos & INET_DSCP_MASK;" case
> in bpf_ipv4_fib_lookup(), where "params" is a struct bpf_fib_lookup,
> exported in UAPI.
>
> To summarise, the plan is to incrementally convert most ->flowi4_tos
> assignments, so that we have a dscp_t variable at hand. Then I'll send
> a patch converting all ->flowi4_tos users at once. Most of it should
> consist of trivial inet_dscp_to_dsfield() removals, thanks to the
> previous dscp_t conversions. The cases that won't follow that pattern
> will be explained in the commit message, but the idea is to have as few
> of them as possible.
>
> BTW, the reason for this work is to avoid having ECN bits interfering
> with route lookups. We had several such issues and regressions in the
> past because of ->flowi4_tos having ECN bits set in specific scenarios.
>
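As a toy illustration of the interference mentioned above (not how the FIB actually matches): assume a rule that compares its configured tos with the lookup key for equality. struct sketch_rule and both helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DSCP_MASK 0xfcu

/* Hypothetical routing rule keyed on a DSCP-only tos value. */
struct sketch_rule {
	uint8_t tos;
};

/* Exact-match comparison, assumed here for simplicity. */
static inline bool sketch_rule_matches(const struct sketch_rule *r, uint8_t key)
{
	return r->tos == key;
}

/* Build a lookup key that cannot carry ECN bits. */
static inline uint8_t sketch_lookup_key(uint8_t tos)
{
	return (uint8_t)(tos & SKETCH_DSCP_MASK);
}
```

A rule configured with 0xb8 would not match a raw key of 0xba (same DSCP, ECT(0) set), but does match once the key is built with sketch_lookup_key(). A dedicated type turns a forgotten mask into a type error instead of a silent lookup miss.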
Got it, thanks for the detailed explanation.
Acked-by: Xin Long <lucien.xin@...il.com>