lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZpqpB8vJU/Q6LSqa@debian>
Date: Fri, 19 Jul 2024 19:57:27 +0200
From: Guillaume Nault <gnault@...hat.com>
To: Ido Schimmel <idosch@...dia.com>
Cc: netdev@...r.kernel.org
Subject: Re: Matching on DSCP with IPv4 FIB rules

On Thu, Jul 18, 2024 at 04:14:08PM +0300, Ido Schimmel wrote:
> On Thu, Jul 04, 2024 at 08:19:05PM +0200, Guillaume Nault wrote:
> 
> > So, do you mean to centralise the effect of all the current RT_TOS()
> > calls inside a few functions? So that we can eventually remove all
> > those RT_TOS() calls later?
> 
> Yes. See this patch:
> 
> https://github.com/idosch/linux/commit/80f536f629c4dccdb6c015a10ca25d7743233208.patch
> 
> I can send it when net-next opens. It should allow us to start removing
> the masking of the high order DSCP bits without introducing regressions
> as the masking now happens at the core and not at individual call sites
> or along the path to the core.

Thanks for the sample patch. I think we're on the same page.

> > > I was only able to find two call paths that can reach these functions
> > > with a TOS value that does not have its three upper DSCP bits masked:
> > > 
> > > nl_fib_input()
> > > 	nl_fib_lookup()
> > > 		flowi4_tos = frn->fl_tos	// Directly from user space
> > > 		fib_table_lookup()
> > > 
> > > nft_fib4_eval()
> > > 	flowi4_tos = iph->tos & DSCP_BITS
> > > 	fib_lookup()
> > > 
> > > The first belongs to an ancient "NETLINK_FIB_LOOKUP" family which I am
> > > quite certain nobody is using and the second belongs to netfilter's fib
> > > expression.
> > 
> > I agree that nl_fib_input() probably doesn't matter.
> > 
> > For nft_fib4_eval() it really looks like the current behaviour is
> > intended. And even though it's possible that nobody currently relies on
> > it, I think it's the correct one. So I don't really feel like changing
> > it.
> 
> Yes, I agree. The patch I mentioned takes care of that by setting the
> new 'FLOWI_FLAG_MATCH_FULL_DSCP' in nft_fib4_eval().

Okay, let me contradict myself... :)

Considering that the number of users of this new flag has no
reason to grow and that we can ignore nl_fib_input() (which is close to
unusable as the necessary fib_result_nl structure isn't exported to
include/uapi/), does it really make sense to add a special case just
for nft_fib4_eval()?

I imagine the pain of describing the tos and dscp keywords in man pages
for example. There will be enough important details about the ECN bits,
the behaviour differences between IPv4 and IPv6, the different number
representation between dscp and tos (bit shift)... If we also have to
explain that the behaviour also depends on the module at the origin of
the route lookup, end users are going to get completely lost.

> > > If regressions are reported for any of them (unlikely, IMO), we can add
> > > a new flow information flag (e.g., 'FLOWI_FLAG_DSCP_NO_MASK') which will
> > > tell the core routing functions to not apply the 'IPTOS_RT_MASK' mask.
> > > 
> > > 3. Removing 'struct flowi4::flowi4_tos'.
> > > 
> > > 4. Adding a new DSCP FIB rule attribute (e.g., 'FRA_DSCP') with a
> > > matching "dscp" keyword in iproute2 that accepts values in the range of
> > > [0, 63] which both address families will support. IPv4 will support it
> > > via the new DSCP field ('struct flowi4::dscp') and IPv6 will support it
> > > using the existing flow label field ('struct flowi6::flowlabel').
> > 
> > I'm sorry, something isn't clear to me. Since masking the high order
> > bits has been centralised at step 2, how will you match them?
> > 
> > If we continue to take fib4_rule_match() as an example; do you mean to
> > extend struct fib4_rule to store the extra information, so that
> > fib4_rule_match() knows how to test fl4->dscp? For example:
> 
> Yes. See these patches:
> 
> https://github.com/idosch/linux/commit/1a79fb59f66731cfc891e3fecb3b08cda6bb0170.patch
> https://github.com/idosch/linux/commit/a4990aab8ee4866b9f853777a50de09537255d67.patch
> https://github.com/idosch/linux/commit/7328d60b7cfe2b07b2d565c9af36f650e96552a5.patch
> https://github.com/idosch/linux/commit/73a739735d27bef613813f0ac0a9280e6427264d.patch
> 
> Specifically these hunks from the second patch:
> 
> @@ -37,6 +37,7 @@ struct fib4_rule {
>  	u8			dst_len;
>  	u8			src_len;
>  	dscp_t			dscp;
> +	u8			is_dscp_sel:1;	/* DSCP or TOS selector */
>  	__be32			src;
>  	__be32			srcmask;
>  	__be32			dst;
> @@ -186,7 +187,14 @@ INDIRECT_CALLABLE_SCOPE int fib4_rule_match(struct fib_rule *rule,
>  	    ((daddr ^ r->dst) & r->dstmask))
>  		return 0;
>  
> -	if (r->dscp && !fib_dscp_match(r->dscp, fl4))
> +	/* When DSCP selector is used we need to match on the entire DSCP field
> +	 * in the flow information structure. When TOS selector is used we need
> +	 * to mask the upper three DSCP bits prior to matching to maintain
> +	 * legacy behavior.
> +	 */
> +	if (r->is_dscp_sel && r->dscp != inet_dsfield_to_dscp(fl4->flowi4_tos))
> +		return 0;
> +	else if (!r->is_dscp_sel && r->dscp && !fib_dscp_match(r->dscp, fl4))
>  		return 0;
>  
>  	if (rule->ip_proto && (rule->ip_proto != fl4->flowi4_proto))
> 
> Note that it is just an RFC. I first need to remove the masking of the
> high order DSCP bits before I can send it.

I find "is_dscp_sel" not very informative as a field name. Maybe
"dscp_nomask" or "dscp_full" would be better? They're more intuitive
to me, but I have no problem if you prefer to keep "is_dscp_sel" of
course.

> >     /* Assuming FRA_DSCP sets ->dscp_mask to 0xff while the default
> >      * would be 0x1c to keep the old behaviour.
> >      */
> >     if (r->dscp && r->dscp != (fl4->dscp & r->dscp_mask))
> 
> It's a bit more involved. Even if the old TOS selector is used, we don't
> always want to mask using 0x1c. If nft_fib4_eval() filled 0xfc in
> 'flowi4_tos', then by masking using 0x1c it will now match a FIB rule
> that was configured with 'tos 0x1c' whereas previously it didn't. The
> new 'FLOWI_FLAG_MATCH_FULL_DSCP' takes care of that, but it only applies
> to rules configured with the TOS selector. The new DSCP selector will
> always match against the entire DSCP field.

Back to nft_fib4_eval(), making it behave like the rest of the kernel
would also mean it'd behave like the existing ipt_rpfilter module. So
people moving from iptables to nftables would keep the same behaviour.
Unless you strongly feel about keeping the FLOWI_FLAG_MATCH_FULL_DSCP
flag, I think we should ask Pablo and Florian if they're okay for
making nft_fib4_eval() behave like the rest of the stack.

> Are you OK with the approach that I outlined above? Basically:
> 
> 1. Submitting the patch that centralizes TOS matching
> 2. Removing the masking of the high order DSCP bits
> 3. Adding new DSCP selector for FIB rules

Yes, looks like a good plan!

> If you already have some patches that convert to 'dscp_t', then I can
> work on top of them to avoid conflicts.

I think we can complete the new dscp feature faster than the dscp_t
conversion. Therefore I think it'll make more sense for me to rebase
on top of your patches.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ