lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZqIhv1RuZudqaUmQ@shredder.mtl.com>
Date: Thu, 25 Jul 2024 12:58:23 +0300
From: Ido Schimmel <idosch@...dia.com>
To: Guillaume Nault <gnault@...hat.com>
Cc: netdev@...r.kernel.org
Subject: Re: Matching on DSCP with IPv4 FIB rules

On Fri, Jul 19, 2024 at 07:57:27PM +0200, Guillaume Nault wrote:
> On Thu, Jul 18, 2024 at 04:14:08PM +0300, Ido Schimmel wrote:
> > On Thu, Jul 04, 2024 at 08:19:05PM +0200, Guillaume Nault wrote:
> > 
> > > So, do you mean to centralise the effect of all the current RT_TOS()
> > > calls inside a few functions? So that we can eventually remove all
> > > those RT_TOS() calls later?
> > 
> > Yes. See this patch:
> > 
> > https://github.com/idosch/linux/commit/80f536f629c4dccdb6c015a10ca25d7743233208.patch
> > 
> > I can send it when net-next opens. It should allow us to start removing
> > the masking of the high order DSCP bits without introducing regressions
> > as the masking now happens at the core and not at individual call sites
> > or along the path to the core.
> 
> Thanks for the sample patch. I think we're on the same page.
> 
> > > > I was only able to find two call paths that can reach these functions
> > > > with a TOS value that does not have its three upper DSCP bits masked:
> > > > 
> > > > nl_fib_input()
> > > > 	nl_fib_lookup()
> > > > 		flowi4_tos = frn->fl_tos	// Directly from user space
> > > > 		fib_table_lookup()
> > > > 
> > > > nft_fib4_eval()
> > > > 	flowi4_tos = iph->tos & DSCP_BITS
> > > > 	fib_lookup()
> > > > 
> > > > The first belongs to an ancient "NETLINK_FIB_LOOKUP" family which I am
> > > > quite certain nobody is using and the second belongs to netfilter's fib
> > > > expression.
> > > 
> > > I agree that nl_fib_input() probably doesn't matter.
> > > 
> > > For nft_fib4_eval() it really looks like the current behaviour is
> > > intended. And even though it's possible that nobody currently relies on
> > > it, I think it's the correct one. So I don't really feel like changing
> > > it.
> > 
> > Yes, I agree. The patch I mentioned takes care of that by setting the
> > new 'FLOWI_FLAG_MATCH_FULL_DSCP' in nft_fib4_eval().
> 
> Okay, let me contradict myself... :)
> 
> Considering that the number of users of this new flag has no
> reason to grow and that we can ignore nl_fib_input() (which is close to
> unusable as the necessary fib_result_nl structure isn't exported to
> include/uapi/), does it really make sense to add a special case just
> for nft_fib4_eval()?

I have no problem removing the flag and introducing it only if
regressions are reported. See more below.

> 
> I imagine the pain of describing the tos and dscp keywords in man pages
> for example. There will be enough important details about the ECN bits,
> the behaviour differences between IPv4 and IPv6, the different number
> representation between dscp and tos (bit shift)... If we also have to
> explain that the behaviour also depends on the module at the origin of
> the route lookup, end users are going to get completely lost.
> 
> > > > If regressions are reported for any of them (unlikely, IMO), we can add
> > > > a new flow information flag (e.g., 'FLOWI_FLAG_DSCP_NO_MASK') which will
> > > > tell the core routing functions to not apply the 'IPTOS_RT_MASK' mask.
> > > > 
> > > > 3. Removing 'struct flowi4::flowi4_tos'.
> > > > 
> > > > 4. Adding a new DSCP FIB rule attribute (e.g., 'FRA_DSCP') with a
> > > > matching "dscp" keyword in iproute2 that accepts values in the range of
> > > > [0, 63] which both address families will support. IPv4 will support it
> > > > via the new DSCP field ('struct flowi4::dscp') and IPv6 will support it
> > > > using the existing flow label field ('struct flowi6::flowlabel').
> > > 
> > > I'm sorry, something isn't clear to me. Since masking the high order
> > > bits has been centralised at step 2, how will you match them?
> > > 
> > > If we continue to take fib4_rule_match() as an example; do you mean to
> > > extend struct fib4_rule to store the extra information, so that
> > > fib4_rule_match() knows how to test fl4->dscp? For example:
> > 
> > Yes. See these patches:
> > 
> > https://github.com/idosch/linux/commit/1a79fb59f66731cfc891e3fecb3b08cda6bb0170.patch
> > https://github.com/idosch/linux/commit/a4990aab8ee4866b9f853777a50de09537255d67.patch
> > https://github.com/idosch/linux/commit/7328d60b7cfe2b07b2d565c9af36f650e96552a5.patch
> > https://github.com/idosch/linux/commit/73a739735d27bef613813f0ac0a9280e6427264d.patch
> > 
> > Specifically these hunks from the second patch:
> > 
> > @@ -37,6 +37,7 @@ struct fib4_rule {
> >  	u8			dst_len;
> >  	u8			src_len;
> >  	dscp_t			dscp;
> > +	u8			is_dscp_sel:1;	/* DSCP or TOS selector */
> >  	__be32			src;
> >  	__be32			srcmask;
> >  	__be32			dst;
> > @@ -186,7 +187,14 @@ INDIRECT_CALLABLE_SCOPE int fib4_rule_match(struct fib_rule *rule,
> >  	    ((daddr ^ r->dst) & r->dstmask))
> >  		return 0;
> >  
> > -	if (r->dscp && !fib_dscp_match(r->dscp, fl4))
> > +	/* When DSCP selector is used we need to match on the entire DSCP field
> > +	 * in the flow information structure. When TOS selector is used we need
> > +	 * to mask the upper three DSCP bits prior to matching to maintain
> > +	 * legacy behavior.
> > +	 */
> > +	if (r->is_dscp_sel && r->dscp != inet_dsfield_to_dscp(fl4->flowi4_tos))
> > +		return 0;
> > +	else if (!r->is_dscp_sel && r->dscp && !fib_dscp_match(r->dscp, fl4))
> >  		return 0;
> >  
> >  	if (rule->ip_proto && (rule->ip_proto != fl4->flowi4_proto))
> > 
> > Note that it is just an RFC. I first need to remove the masking of the
> > high order DSCP bits before I can send it.
> 
> I find "is_dscp_sel" not very informative as a field name. Maybe
> "dscp_nomask" or "dscp_full" would be better? They're more intuitive
> to me, but I have no problem if you prefer to keep "is_dscp_sel" of
> course.

OK, changed to "dscp_full" as you suggested.

> 
> > >     /* Assuming FRA_DSCP sets ->dscp_mask to 0xff while the default
> > >      * would be 0x1c to keep the old behaviour.
> > >      */
> > >     if (r->dscp && r->dscp != (fl4->dscp & r->dscp_mask))
> > 
> > It's a bit more involved. Even if the old TOS selector is used, we don't
> > always want to mask using 0x1c. If nft_fib4_eval() filled 0xfc in
> > 'flowi4_tos', then by masking using 0x1c it will now match a FIB rule
> > that was configured with 'tos 0x1c' whereas previously it didn't. The
> > new 'FLOWI_FLAG_MATCH_FULL_DSCP' takes care of that, but it only applies
> > to rules configured with the TOS selector. The new DSCP selector will
> > always match against the entire DSCP field.
> 
> Back to nft_fib4_eval(), making it behave like the rest of the kernel
> would also mean it'd behave like the existing ipt_rpfilter module. So
> people moving from iptables to nftables would keep the same behaviour.
> Unless you strongly feel about keeping the FLOWI_FLAG_MATCH_FULL_DSCP
> flag, I think we should ask Pablo and Florian if they're okay for
> making nft_fib4_eval() behave like the rest of the stack.

Yes, that's a good idea. I will post a RFC patchset that converts
nft_fib4_eval() and "NETLINK_FIB_LOOKUP" to mask the upper DSCP bits
along with the RT_TOS() centralization patch. Let's see what Pablo and
Florian say.

> 
> > Are you OK with the approach that I outlined above? Basically:
> > 
> > 1. Submitting the patch that centralizes TOS matching
> > 2. Removing the masking of the high order DSCP bits
> > 3. Adding new DSCP selector for FIB rules
> 
> Yes, looks like a good plan!

Great, thanks!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ