Message-ID: <20210120152359.GM3158@orbyte.nwl.cc>
Date: Wed, 20 Jan 2021 16:23:59 +0100
From: Phil Sutter <phil@....cc>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: Stephen Hemminger <stephen@...workplumber.org>,
netdev@...r.kernel.org, Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>,
Russell Stuart <russell-lartc@...art.id.au>
Subject: Re: tc: u32: Wrong sample hash calculation
Hi Jamal,
On Wed, Jan 20, 2021 at 08:55:11AM -0500, Jamal Hadi Salim wrote:
> On 2021-01-18 6:29 a.m., Phil Sutter wrote:
> > Hi!
> >
> > Playing with u32 filter's hash table I noticed it is not possible to use
> > 'sample' option with keys larger than 8 bits to calculate the hash
> > bucket.
>
>
> I have mostly used something like: ht 2:: sample ip protocol 1 0xff
> Hoping this is continuing to work.
This should read 'sample ip protocol 1 divisor 0xff', right?
> I feel I am missing something basic in the rest of your email:
> Sample is a user space concept, i.e. it is used to instruct the
> kernel what table/bucket to insert the node into. This computation
> is done in user space. The kernel should just walk the nodes in
> the bucket and match.
Correct, but the kernel has to find the right bucket first. This is
where its key hashing comes into play.
> Reminder: you can only have 256 buckets (8 bit representation).
> Could that be the contributing factor?
It is. Any key below 256 (i.e. one that fits into 8 bits) is unaffected,
as folding then has no effect in either kernel or user space.
> Here's an example of something which is not 8 bit that I found in
> an old script that should work (but I didn't test in current kernels).
> ht 2:: sample u32 0x00000800 0x0000ff00 at 12
> We are still going to extract only 8 bits for the bucket.
Yes. The resulting key is 8 bits wide, as the low zeroes are automatically
shifted away.
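For keys that fit into 8 bits, the kernel's shift and user space's folding land in the same bucket. The POSIX sh sketch below models both sides for the key above. It assumes the linking filter uses `hashkey mask 0x0000ff00 at 12` with `divisor 256`; that the kernel masks the host-order key, shifts right by the number of trailing zero bits in the mask and keeps `divisor - 1` bits; and that iproute2 XOR-folds the masked key (`h ^= h >> 16; h ^= h >> 8`) before taking it modulo the divisor. The helpers `ctz`, `kernel_bucket` and `sample_bucket` are hypothetical, not part of tc.

```shell
#!/bin/sh
# Minimal sketch (assumption): bucket selection for the key from
# 'sample u32 0x00000800 0x0000ff00 at 12', assuming the linking filter
# uses 'hashkey mask 0x0000ff00 at 12' and 'divisor 256'.
# Function names are hypothetical; keys are given in host byte order.

# Count the trailing zero bits of a non-zero mask.
ctz() {
    m=$(( $1 ))          # normalize hex input to decimal for 'test'
    n=0
    while [ "$m" -ne 0 ] && [ $(( m & 1 )) -eq 0 ]; do
        m=$(( m >> 1 ))
        n=$(( n + 1 ))
    done
    echo "$n"
}

# Kernel side (assumed): mask, shift the mask's low zero bits away,
# keep enough low bits to index one of <divisor> buckets.
kernel_bucket() {   # args: key mask divisor
    s=$(ctz "$2")
    echo $(( (($1 & $2) >> s) & ($3 - 1) ))
}

# User space 'sample' side (assumed): XOR-fold the masked key,
# then take it modulo the divisor.
sample_bucket() {   # args: key mask divisor
    h=$(( $1 & $2 ))
    h=$(( h ^ (h >> 16) ))
    h=$(( h ^ (h >> 8) ))
    echo $(( h % $3 ))
}

kernel_bucket 0x00000800 0x0000ff00 256   # prints 8
sample_bucket 0x00000800 0x0000ff00 256   # prints 8
```

Because the masked key has a single non-zero octet, the XOR fold has nothing else to mix in, so both computations agree.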
> Can you provide an example of what wouldn't work?
Sure, sorry for not including it in the original email. Let's apply
actions to some packets based on source IP address. To efficiently
support an arbitrary number of source addresses, we use a hash table
with 256 buckets:
# tc qd add dev test0 ingress
# tc filter add dev test0 parent ffff: prio 99 handle 1: u32 divisor 256
# tc filter add dev test0 parent ffff: prio 1 protocol ip u32 \
hashkey mask 0xffffffff at 12 link 1: match u8 0 0
So with the above in place, the kernel uses 32 bits at offset 12 as a key
to determine the bucket to jump to. This is done by just extracting the
lowest 8 bits in host byte order, i.e. the last octet of the packet's
source address.
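The kernel side of this can be modelled in a few lines of POSIX sh. This is a sketch under the assumption described above: a full 32-bit mask means nothing is shifted, and a 256-bucket table keeps the low 8 bits of the host-order address. `ip_to_u32` and `kernel_bucket` are hypothetical helpers.

```shell
#!/bin/sh
# Minimal sketch (assumption): bucket the kernel jumps to for
# 'hashkey mask 0xffffffff at 12' in a table with 'divisor 256'.
# Helper names are hypothetical, not part of tc.

# Dotted quad -> 32-bit value in host byte order, i.e. what ntohl()
# yields for the address as it appears in the packet.
ip_to_u32() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Full mask: no shift. 256 buckets: keep the low 8 bits (256 - 1).
kernel_bucket() {
    key=$(ip_to_u32 "$1")
    echo $(( key & 0xffffffff & 255 ))
}

for src in 10.0.0.1 10.0.0.2 192.168.1.77; do
    echo "$src -> bucket $(kernel_bucket "$src")"
done
```

This prints buckets 1, 2 and 77: the kernel's choice depends only on the last octet.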
Users don't know the above (and shouldn't need to), so they use sample
to have the bucket determined automatically:
# tc filter add dev test0 parent ffff: prio 99 u32 \
match ip src 10.0.0.2 \
ht 1: sample ip src 10.0.0.2 divisor 256 \
action drop
iproute2 calculates bucket 8 (= 10 ^ 0 ^ 0 ^ 2, i.e. all four octets
XORed together), while the kernel will check bucket 2. So the above
filter will never match.
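The mismatch can be reproduced with a small POSIX sh sketch comparing both computations for a few source addresses. It assumes iproute2 XOR-folds the 32-bit key (`h ^= h >> 16; h ^= h >> 8`) and takes it modulo the divisor, while the kernel keeps the low 8 bits of the host-order address; all helper names are hypothetical. The low byte of that fold is the XOR of all four octets whatever the byte order, so the sketch can fold the host-order value even though iproute2 folds the value in network byte order: for power-of-two divisors up to 256 the bucket is the same.

```shell
#!/bin/sh
# Minimal sketch (assumption): 'sample ip src A divisor 256' in user
# space versus 'hashkey mask 0xffffffff at 12' in the kernel.
# Helper names are hypothetical.

# Dotted quad -> 32-bit value in host byte order.
ip_to_u32() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# User space (assumed): XOR-fold, then modulo divisor (256).
sample_bucket() {
    h=$(ip_to_u32 "$1")
    h=$(( h ^ (h >> 16) ))
    h=$(( h ^ (h >> 8) ))
    echo $(( h % 256 ))
}

# Kernel (assumed): full mask, no shift, keep the low 8 bits.
kernel_bucket() {
    key=$(ip_to_u32 "$1")
    echo $(( key & 255 ))
}

for src in 10.0.0.1 10.0.0.2 192.168.1.77; do
    echo "$src: sample -> $(sample_bucket "$src"), kernel -> $(kernel_bucket "$src")"
done
```

This prints 11 vs 1, 8 vs 2 and 36 vs 77. Every address whose first three octets do not XOR to zero is installed into a bucket the kernel never consults for it.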
Cheers, Phil