netdev - tc filter hash table efficiency

Date:	Tue, 28 Feb 2012 22:55:18 -0500
From:	"John A. Sullivan III" <jsullivan@...nsourcedevel.com>
To:	netdev@...r.kernel.org
Subject: tc filter hash table efficiency

Hello, all.  Would it be correct to assume that tc filter hash tables
are like iptables user-defined chains, and can be used to reduce the
number of evaluations which must be made for each packet?  Or do such
hash tables incur significant additional overhead?

We are finding that this approach isn't working for us.

For example, we wanted to shape VoIP packets on ingress to our PBX from
both VPN and direct Internet connections.  Since the ifb interface gets
the packet before NAT, we must account for both the public and internal
destination addresses.  Because tc u32 filters match ports by bit mask
rather than by range, we also need several rules to cover all UDP ports
of 4096 and above.  So we did the following:

# UDP
tc filter replace dev ifb0 parent 20:0 protocol ip prio 2 handle 627: \
    u32 divisor 1
tc filter replace dev ifb0 parent 20:0 protocol ip prio 2 u32 \
    match ip protocol 17 0xff \
    link 627: offset at 0 mask 0x0f00 shift 6 plus 0

That is, we created a hash table for all UDP packets with an accurate
pointer to the beginning of the UDP header.
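
For reference, here is a minimal sketch of the arithmetic we understand
"offset at 0 mask 0x0f00 shift 6" to perform: take the first 16 bits of
the IP header, keep the IHL nibble, and shift it so the result is the
header length in bytes.  The helper name u32_offset is made up for
illustration and is not part of tc.

```shell
# Hypothetical helper (not a tc command): mimic "offset at 0 mask 0x0f00 shift 6".
# $1 is the first 16 bits of the IP header (version, IHL, TOS) as a hex value.
u32_offset() {
    # IHL sits in bits 8-11 and counts 32-bit words; (x & 0x0f00) >> 6
    # equals IHL * 4, i.e. the IP header length in bytes.
    echo $(( ($1 & 0x0f00) >> 6 ))
}

u32_offset 0x4500   # IHL 5, no IP options: prints 20
u32_offset 0x4600   # IHL 6, one option word: prints 24
```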


# VoIP - UDP packets to the VoIP network under 256 Bytes over port 1024
tc filter replace dev ifb0 parent 62:0 protocol ip prio 2 handle 628: \
    u32 divisor 1
tc filter replace dev ifb0 parent 62:0 protocol ip prio 2 u32 ht 627:0 \
    match ip dst 172.x.x.0/24 match u16 0 0xffc0 at 2 link 628:
tc filter replace dev ifb0 parent 62:0 protocol ip prio 2 u32 ht 627:0 \
    match ip dst 208.z.z.z match u16 0 0xffc0 at 2 link 628:

That is, we created another hash table linked from the UDP hash table,
so that we only have to evaluate the IP protocol once and can send all
small packets destined for either the NAT or the real address to that
hash table.
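
As a sanity check on the size match, the sketch below works through the
mask arithmetic for "match u16 0 <mask> at 2", assuming the 16-bit
field at offset 2 is the IP total length.  The helper name len_matches
is hypothetical and exists only for this illustration.

```shell
# Hypothetical helper (not a tc command): does an IP total length pass
# "match u16 0 <mask> at 2"?  The match succeeds when every bit selected
# by the mask is zero.
len_matches() {   # $1 = total length in bytes, $2 = mask
    [ $(( $1 & $2 )) -eq 0 ]
}

for len in 63 64 200 255 256; do
    for mask in 0xffc0 0xff00; do
        if len_matches "$len" "$mask"; then r="match"; else r="no match"; fi
        echo "len=$len mask=$mask: $r"
    done
done
```

By this arithmetic, mask 0xffc0 passes only lengths below 64 bytes,
while a limit of "under 256 bytes" would correspond to mask 0xff00.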

tc filter replace dev ifb0 parent 62:0 protocol ip prio 2 u32 ht 628:0 \
    match udp dst 32768 0x8000 at nexthdr+2 flowid 62:20
tc filter replace dev ifb0 parent 62:0 protocol ip prio 2 u32 ht 628:0 \
    match udp dst 16384 0x4000 at nexthdr+2 flowid 62:20
tc filter replace dev ifb0 parent 62:0 protocol ip prio 2 u32 ht 628:0 \
    match udp dst 8192 0x2000 at nexthdr+2 flowid 62:20
tc filter replace dev ifb0 parent 62:0 protocol ip prio 2 u32 ht 628:0 \
    match udp dst 4096 0x1000 at nexthdr+2 flowid 62:20

We then created filters for the various port ranges.  We figured that
this way we only had to evaluate the IP protocol once instead of two or
three times, we only had to evaluate the destination address and packet
size once instead of twice, and we needed only four port-range rules in
total instead of four for each IP address.
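
To illustrate why four single-bit masks are enough, the sketch below
checks a destination port against the same four value/mask pairs used
in the rules above.  A port is 4096 or higher exactly when at least one
of its top four bits is set.  The helper name port_matches is
hypothetical, not a tc feature.

```shell
# Hypothetical helper (not a tc command): succeed if a UDP destination
# port would match any of the four single-bit rules in table 628.
port_matches() {   # $1 = destination port
    for bit in 0x8000 0x4000 0x2000 0x1000; do
        # Each rule is "match udp dst <bit> <bit>": the masked port must equal the bit.
        [ $(( $1 & $bit )) -eq $(( $bit )) ] && return 0
    done
    return 1
}

for port in 1024 4095 4096 10000 65535; do
    if port_matches "$port"; then r="match"; else r="no match"; fi
    echo "port=$port: $r"
done
```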

Is this a proper use of hash tables or have I really abused the concept?
When we look at the filter stats, we see:

 tc -s filter show dev ifb0
filter parent 62: protocol ip pref 1 u32
filter parent 62: protocol ip pref 1 u32 fh 800: ht divisor 1
filter parent 62: protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 62:50  (rule hit 37879 success 295)
  match 00320000/00ff0000 at 8 (success 295 )
filter parent 62: protocol ip pref 1 u32 fh 800::801 order 2049 key ht 800 bkt 0 flowid 62:40  (rule hit 37584 success 7866)
  match d02e5d08/ffffffff at 16 (success 7868 )
  match 00060000/00ff0000 at 8 (success 7866 )
filter parent 62: protocol ip pref 2 u32
filter parent 62: protocol ip pref 2 u32 fh 628: ht divisor 1
filter parent 62: protocol ip pref 2 u32 fh 628::800 order 2048 key ht 628 bkt 0 flowid 62:20  (rule hit 35 success 0)
  match 00008000/00008000 at nexthdr+0 (success 0 )
filter parent 62: protocol ip pref 2 u32 fh 628::801 order 2049 key ht 628 bkt 0 flowid 62:20  (rule hit 35 success 0)
  match 00004000/00004000 at nexthdr+0 (success 0 )
filter parent 62: protocol ip pref 2 u32 fh 628::802 order 2050 key ht 628 bkt 0 flowid 62:20  (rule hit 35 success 0)
  match 00002000/00002000 at nexthdr+0 (success 0 )
filter parent 62: protocol ip pref 2 u32 fh 628::803 order 2051 key ht 628 bkt 0 flowid 62:20  (rule hit 35 success 33)
  match 00001000/00001000 at nexthdr+0 (success 33 )
filter parent 62: protocol ip pref 2 u32 fh 627: ht divisor 1
filter parent 62: protocol ip pref 2 u32 fh 627::800 order 2048 key ht 627 bkt 0 link 628:  (rule hit 11617 success 0)
  match axxxxe00/ffffff00 at 16 (success 4847 )
  match 00000000/0000ffc0 at 0 (success 35 )
filter parent 62: protocol ip pref 2 u32 fh 627::801 order 2049 key ht 627 bkt 0 link 628:  (rule hit 11584 success 0)
  match d0yyyy0e/ffffffff at 16 (success 20 )
  match 00000000/0000ffc0 at 0 (success 0 )

Yet packet traces show oodles of 200-byte UDP packets on ports over
10000 destined for the PBX, while I see only a handful of packets
flowing into class 62:20.

Thanks - John

