Message-ID: <7d493e9f-23ee-34cf-fbdd-b13a4d3bb4af@mojatatu.com>
Date: Fri, 22 Jan 2021 06:25:22 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Phil Sutter <phil@....cc>,
Stephen Hemminger <stephen@...workplumber.org>,
netdev@...r.kernel.org, Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>,
Russell Stuart <russell-lartc@...art.id.au>
Subject: Re: tc: u32: Wrong sample hash calculation
Hi Phil,
On 2021-01-20 10:23 a.m., Phil Sutter wrote:
> Hi Jamal,
>
> On Wed, Jan 20, 2021 at 08:55:11AM -0500, Jamal Hadi Salim wrote:
>> On 2021-01-18 6:29 a.m., Phil Sutter wrote:
>>> Hi!
>>>
>>> Playing with u32 filter's hash table I noticed it is not possible to use
>>> 'sample' option with keys larger than 8bits to calculate the hash
>>> bucket.
>>
>>
>> I have mostly used something like: ht 2:: sample ip protocol 1 0xff
>> Hoping this is continuing to work.
>
> This should read 'sample ip protocol 1 divisor 0xff', right?
>
No, 0xff is a mask.
The table (256 buckets) is created earlier, with something like:
filter add dev XXX parent ffff: protocol ip prio 10 handle 2:: u32
divisor 256
This is from some scripts I have that worked. I can't see anything
that suggests they would break today.
>> Reminder: you can only have 256 buckets (8 bit representation).
>> Could that be the contributing factor?
>
> It is. Any key smaller than 256B is unaffected as no folding is done in
> either kernel or user space.
>
Ok. I have never used it in any scenario other than 8 bits
(maybe subconsciously, because the 256-bucket limit was in the back of
my head). I am not sure whether Alexey at the time thought it would be
useful for more than that.
>> Here's an example of something which is not 8 bit that i found in
>> an old script that should work (but I didnt test in current kernels).
>> ht 2:: sample u32 0x00000800 0x0000ff00 at 12
>> We are still going to extract only 8 bits for the bucket.
>
> Yes. The resulting key is 8Bit as the low zeroes are automatically
> shifted away.
>
ok.
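For the record, a rough Python sketch of the "low zeroes are shifted
away" step for the u32 example above. This is only a model of the
behaviour described in this thread, not code taken from the kernel;
the helper name is made up for illustration.

```python
def shift_away_low_zeroes(value, mask):
    """Apply mask, then shift right by the number of trailing zero bits in mask.

    Hypothetical helper: models the behaviour described in the thread,
    it is not copied from kernel or iproute2 sources.
    """
    if mask == 0:
        return 0
    # (mask & -mask) isolates the lowest set bit of the mask.
    shift = (mask & -mask).bit_length() - 1
    return (value & mask) >> shift


# sample u32 0x00000800 0x0000ff00 at 12
key = shift_away_low_zeroes(0x00000800, 0x0000ff00)
bucket = key & 0xff  # 256 buckets, so only 8 bits select the bucket
print(hex(bucket))   # 0x8
```

So the 16 bit looking value still yields an 8 bit key, which is why
that case fits into 256 buckets without any folding.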
>> Can you provide an example of what wouldnt work?
>
> Sure, sorry for not including it in the original email. Let's apply
> actions to some packets based on source IP address. To efficiently
> support arbitrary numbers, we use a hash table with 256 buckets:
>
> # tc qd add dev test0 ingress
> # tc filter add dev test0 parent ffff: prio 99 handle 1: u32 divisor 256
> # tc filter add dev test0 parent ffff: prio 1 protocol ip u32 \
> hashkey mask 0xffffffff at 12 link 1: match u8 0 0
>
> So with the above in place, the kernel uses 32bits at offset 12 as a key
> to determine the bucket to jump to. This is done by just extracting the
> lowest 8bits in host byteorder, i.e. the last octet of the packet's
> source address.
>
> Users don't know the above (and shouldn't need to), so they use sample
> to have the bucket determined automatically:
>
> # tc filter add dev test0 parent ffff: prio 99 u32 \
> match ip src 10.0.0.2 \
> ht 1: sample ip src 10.0.0.2 divisor 256 \
> action drop
>
> iproute2 calculates bucket 8 (= 10 ^ 2), while the kernel will check
> bucket 2. So the above filter will never match.
>
Ok, that makes more sense.
Is this always true, though, for all scenarios with keys > 8 bits?
And is there a pattern that can be deduced?
My gut feeling is that user space is the right/easier spot to fix this,
as long as it doesn't break the working 8 bit setups.
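
To have something concrete to poke at, here is a small Python model of
the two calculations as described in this thread. The kernel side
follows Phil's description (lowest 8 bits of the masked key). The user
space side is an assumption: an XOR fold of the four octets, chosen only
because it reproduces the "8 (= 10 ^ 2)" result quoted above. Function
names are made up, and none of this is copied from actual sources.

```python
import socket
import struct


def kernel_bucket(addr, mask=0xffffffff, divisor=256):
    """Model of the kernel side as described in the thread: mask the key,
    shift away the mask's low zero bits, keep the lowest bits.
    Assumes divisor is a power of two."""
    key = struct.unpack("!I", socket.inet_aton(addr))[0]
    if mask == 0:
        return 0
    shift = (mask & -mask).bit_length() - 1
    return ((key & mask) >> shift) & (divisor - 1)


def xor_fold_bucket(addr, divisor=256):
    """ASSUMED model of the user space 'sample' result: XOR all four
    octets together, then reduce modulo the divisor."""
    folded = 0
    for octet in socket.inet_aton(addr):
        folded ^= octet
    return folded % divisor


for addr in ("10.0.0.2", "10.10.0.2", "192.168.1.7"):
    print(addr, kernel_bucket(addr), xor_fold_bucket(addr))
# 10.0.0.2 2 8
# 10.10.0.2 2 2
# 192.168.1.7 7 110
```

If this model is right, the two only agree when the upper three octets
XOR to zero, so for 32 bit keys a mismatch would be the common case
rather than the exception.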
cheers,
jamal