netdev - Re: tc: u32: Wrong sample hash calculation

Message-ID: <7d493e9f-23ee-34cf-fbdd-b13a4d3bb4af@mojatatu.com>
Date:   Fri, 22 Jan 2021 06:25:22 -0500
From:   Jamal Hadi Salim <jhs@...atatu.com>
To:     Phil Sutter <phil@....cc>,
        Stephen Hemminger <stephen@...workplumber.org>,
        netdev@...r.kernel.org, Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        Russell Stuart <russell-lartc@...art.id.au>
Subject: Re: tc: u32: Wrong sample hash calculation

Hi Phil,

On 2021-01-20 10:23 a.m., Phil Sutter wrote:
> Hi Jamal,
> 
> On Wed, Jan 20, 2021 at 08:55:11AM -0500, Jamal Hadi Salim wrote:
>> On 2021-01-18 6:29 a.m., Phil Sutter wrote:
>>> Hi!
>>>
>>> Playing with u32 filter's hash table I noticed it is not possible to use
>>> 'sample' option with keys larger than 8bits to calculate the hash
>>> bucket.
>>
>>
>> I have mostly used something like: ht 2:: sample ip protocol 1 0xff
>> Hoping this is continuing to work.
> 
> This should read 'sample ip protocol 1 divisor 0xff', right?
> 

0xff is a mask.
The table (256 buckets) is created earlier, with something like:
filter add dev XXX parent ffff: protocol ip prio 10 handle 2:: u32 divisor 256
This is from some scripts I have that worked. I can't see anything
that would make them break today.


>> Reminder: you can only have 256 buckets (8 bit representation).
>> Could that be the contributing factor?
> 
> It is. Any key smaller than 256B is unaffected as no folding is done in
> either kernel or user space.
> 

Ok. I have never used it in any scenario other than 8 bits
(maybe subconsciously because the 256-bucket limit was playing in
my head). I am not sure if Alexey at the time was thinking it is
useful for more than that.

>> Here's an example of something which is not 8 bit that I found in
>> an old script that should work (but I didn't test on current kernels).
>> ht 2:: sample u32 0x00000800 0x0000ff00 at 12
>> We are still going to extract only 8 bits for the bucket.
> 
> Yes. The resulting key is 8 bits wide, as the low zeroes are automatically
> shifted away.
> 

ok.
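
To make the "low zeroes are shifted away" step concrete, here is a rough
Python sketch. It is only a model of the behaviour described in this
thread, not the kernel code: the helper names are made up, and it assumes
the divisor is a power of two so the bucket can be taken with a bit mask.

```python
def trailing_zero_bits(mask: int) -> int:
    """Count the zero bits below the lowest set bit of a non-zero mask."""
    if mask == 0:
        raise ValueError("mask must be non-zero")
    return (mask & -mask).bit_length() - 1


def masked_bucket(value: int, mask: int, divisor: int = 256) -> int:
    """Hypothetical model: mask the value, shift the mask's low zeroes
    away, then keep only as many bits as the table has buckets
    (assumes divisor is a power of two)."""
    key = (value & mask) >> trailing_zero_bits(mask)
    return key & (divisor - 1)


# sample u32 0x00000800 0x0000ff00 at 12
print(masked_bucket(0x00000800, 0x0000FF00))  # 8
# sample ip protocol 1 0xff
print(masked_bucket(0x01, 0xFF))              # 1
```

In both cases only 8 bits survive the mask, so the key fits the 256-bucket
table without any folding.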

>> Can you provide an example of what wouldn't work?
> 
> Sure, sorry for not including it in the original email. Let's apply
> actions to some packets based on source IP address. To efficiently
> support arbitrary numbers, we use a hash table with 256 buckets:
> 
> # tc qd add dev test0 ingress
> # tc filter add dev test0 parent ffff: prio 99 handle 1: u32 divisor 256
> # tc filter add dev test0 parent ffff: prio 1 protocol ip u32 \
> 	hashkey mask 0xffffffff at 12 link 1: match u8 0 0
> 
> So with the above in place, the kernel uses 32 bits at offset 12 as a key
> to determine the bucket to jump to. This is done by just extracting the
> lowest 8 bits in host byte order, i.e. the last octet of the packet's
> source address.
> 
> Users don't know the above (and shouldn't need to), so they use sample
> to have the bucket determined automatically:
> 
> # tc filter add dev test0 parent ffff: prio 99 u32 \
> 	match ip src 10.0.0.2 \
> 	ht 1: sample ip src 10.0.0.2 divisor 256 \
> 	action drop
> 
> iproute2 calculates bucket 8 (= 10 ^ 2), while the kernel will check
> bucket 2. So the above filter will never match.
> 

Ok, makes more sense.
Is this always true though for all scenarios with keys wider than 8 bits?
And is there a pattern that can be deduced?
My gut feeling is that user space is the right/easier spot to fix this,
as long as it doesn't break the working 8-bit setups.
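
For reference, the mismatch in Phil's example can be reproduced with a
small Python sketch. It goes only by what is described in this thread:
user space is modeled as XOR-folding the octets of the address into one
byte (consistent with "8 = 10 ^ 2"), and the kernel as taking the lowest
bits of the address in host byte order. The function names are made up
and the divisor is assumed to be a power of two; this is not the actual
iproute2 or kernel code.

```python
import ipaddress


def userspace_sample_bucket(addr: str, divisor: int = 256) -> int:
    """Hypothetical model of 'sample': XOR-fold all four octets."""
    folded = 0
    for octet in ipaddress.IPv4Address(addr).packed:
        folded ^= octet
    return folded % divisor


def kernel_hashkey_bucket(addr: str, divisor: int = 256) -> int:
    """Hypothetical model of 'hashkey mask 0xffffffff at 12': lowest
    bits of the address as an integer, i.e. the last octet for 256
    buckets (assumes divisor is a power of two)."""
    return int(ipaddress.IPv4Address(addr)) & (divisor - 1)


for addr in ("10.0.0.2", "0.0.0.5"):
    print(addr, userspace_sample_bucket(addr), kernel_hashkey_bucket(addr))
```

With 10.0.0.2 the two models give buckets 8 and 2; with an address whose
upper three octets are zero (0.0.0.5) they agree, which fits the
observation that 8-bit keys are unaffected.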

cheers,
jamal