Date:	Fri, 23 Dec 2011 09:38:48 -0500
From:	"John A. Sullivan III" <jsullivan@...nsourcedevel.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	netdev@...r.kernel.org
Subject: Re: SFQ on HFSC leaf does not seem to work

On Fri, 2011-12-23 at 15:00 +0100, Eric Dumazet wrote:
> On Friday, 23 December 2011 at 14:45 +0100, Eric Dumazet wrote:
> 
> > 1) Which kernel version are you using?
> > 
> > 2) How many concurrent flows are running (number of netperf/netcat
> > instances)?
> > 
> > 3) Bear in mind that 'perturb xxx' introduces a temporary doubling of
> > the number of flows.
> > 
> > 4) Have you disabled tso on eth1?
> >    (If not, you might send 64KB packets, and at 400kbit they take a
> > long time to transmit: 65536 bytes * 8 = 524288 bits, which at
> > 400000 bit/s is roughly 1.3 seconds per burst.)
> > 
> > 
> 
> Using your script on net-next (only with eth3 instead of eth1), and after running:
> 
> ethtool -K eth3 tso off
> ethtool -K eth3 gso off
> ip ro flush cache
> 
> with one ssh bulk flow: dd if=/dev/zero | ssh 192.168.0.1 "dd of=/dev/null"
> my ping times are quite good:
> 
> 
> $ ping -c 20 192.168.0.1
> PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
> 2011/11/23 14:57:01.106 64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=59.4 ms
> 2011/11/23 14:57:02.121 64 bytes from 192.168.0.1: icmp_seq=2 ttl=64 time=72.7 ms
> 2011/11/23 14:57:03.109 64 bytes from 192.168.0.1: icmp_seq=3 ttl=64 time=60.3 ms
> 2011/11/23 14:57:04.108 64 bytes from 192.168.0.1: icmp_seq=4 ttl=64 time=57.8 ms
> 2011/11/23 14:57:05.115 64 bytes from 192.168.0.1: icmp_seq=5 ttl=64 time=62.6 ms
> 2011/11/23 14:57:06.116 64 bytes from 192.168.0.1: icmp_seq=6 ttl=64 time=62.6 ms
> 2011/11/23 14:57:07.112 64 bytes from 192.168.0.1: icmp_seq=7 ttl=64 time=57.6 ms
> 2011/11/23 14:57:08.127 64 bytes from 192.168.0.1: icmp_seq=8 ttl=64 time=70.9 ms
> 2011/11/23 14:57:09.123 64 bytes from 192.168.0.1: icmp_seq=9 ttl=64 time=65.4 ms
> 2011/11/23 14:57:10.113 64 bytes from 192.168.0.1: icmp_seq=10 ttl=64 time=53.5 ms
> 2011/11/23 14:57:11.127 64 bytes from 192.168.0.1: icmp_seq=11 ttl=64 time=66.7 ms
> 2011/11/23 14:57:12.129 64 bytes from 192.168.0.1: icmp_seq=12 ttl=64 time=67.4 ms
> 2011/11/23 14:57:13.119 64 bytes from 192.168.0.1: icmp_seq=13 ttl=64 time=56.3 ms
> 2011/11/23 14:57:14.127 64 bytes from 192.168.0.1: icmp_seq=14 ttl=64 time=64.0 ms
> 2011/11/23 14:57:15.116 64 bytes from 192.168.0.1: icmp_seq=15 ttl=64 time=51.9 ms
> 2011/11/23 14:57:16.127 64 bytes from 192.168.0.1: icmp_seq=16 ttl=64 time=61.2 ms
> 2011/11/23 14:57:17.127 64 bytes from 192.168.0.1: icmp_seq=17 ttl=64 time=60.4 ms
> 2011/11/23 14:57:18.135 64 bytes from 192.168.0.1: icmp_seq=18 ttl=64 time=68.2 ms
> 2011/11/23 14:57:19.137 64 bytes from 192.168.0.1: icmp_seq=19 ttl=64 time=69.1 ms
> 2011/11/23 14:57:20.136 64 bytes from 192.168.0.1: icmp_seq=20 ttl=64 time=67.0 ms
> 
> --- 192.168.0.1 ping statistics ---
> 20 packets transmitted, 20 received, 0% packet loss, time 19022ms
> rtt min/avg/max/mdev = 51.909/62.796/72.751/5.579 ms
> 
> $ tc -s -d class show dev eth3
> class hfsc 1: root 
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 0p requeues 0 
>  period 0 level 2 
> 
> class hfsc 1:1 parent 1: sc m1 0bit d 0us m2 1490Kbit ul m1 0bit d 0us m2 1490Kbit 
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 0p requeues 0 
>  period 69 work 38559740 bytes level 1 
> 
> class hfsc 1:10 parent 1:1 leaf 1101: rt m1 327680bit d 50.0ms m2 200000bit ls m1 0bit d 0us m2 1000Kbit 
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 0p requeues 0 
>  period 0 level 0 
> 
> class hfsc 1:20 parent 1:1 leaf 1201: rt m1 0bit d 0us m2 400000bit ls m1 0bit d 0us m2 200000bit 
>  Sent 38587058 bytes 27022 pkt (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 19p requeues 0 
>  period 69 work 38559740 bytes rtwork 10358780 bytes level 0 
> 
> class hfsc 1:30 parent 1:1 leaf 1301: rt m1 605600bit d 20.0ms m2 20000bit 
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 0p requeues 0 
>  period 0 level 0 
> 
> class sfq 1201:f7 parent 1201: 
>  (dropped 0, overlimits 0 requeues 0) 
>  backlog 25804b 18p requeues 0 
>  allot -1336 
> 
> 
> Hmm... we could probably fill in the hfsc class information with a non-null bytes backlog...
> I'll take a look.
> 
> 
> 
Thanks very much, Eric.  GSO (and GSO only) was enabled, but disabling
it does not seem to have solved the problem when I activate netem:

root@...tswitch01:~# ./tcplay
root@...tswitch01:~# man ethtool
root@...tswitch01:~# ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off
ntuple-filters: off
receive-hashing: off
root@...tswitch01:~# ethtool -K eth1 gso off
root@...tswitch01:~# ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
ntuple-filters: off
receive-hashing: off
ip ro flush cache
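
For good measure, every segmentation offload can be swept off in one
pass (a sketch; I only toggled gso above, and the short ethtool -K flag
names are assumed to match this ethtool version):

	for f in tso gso gro lro; do ethtool -K eth1 $f off; done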

64 bytes from 192.168.223.84: icmp_req=16 ttl=64 time=42.6 ms
64 bytes from 192.168.223.84: icmp_req=17 ttl=64 time=39.1 ms
64 bytes from 192.168.223.84: icmp_req=18 ttl=64 time=45.5 ms
64 bytes from 192.168.223.84: icmp_req=19 ttl=64 time=406 ms
64 bytes from 192.168.223.84: icmp_req=20 ttl=64 time=919 ms
64 bytes from 192.168.223.84: icmp_req=21 ttl=64 time=920 ms
64 bytes from 192.168.223.84: icmp_req=22 ttl=64 time=1013 ms
64 bytes from 192.168.223.84: icmp_req=23 ttl=64 time=1158 ms
64 bytes from 192.168.223.84: icmp_req=24 ttl=64 time=1521 ms
64 bytes from 192.168.223.84: icmp_req=25 ttl=64 time=1915 ms
64 bytes from 192.168.223.84: icmp_req=26 ttl=64 time=2371 ms
64 bytes from 192.168.223.84: icmp_req=27 ttl=64 time=2797 ms
64 bytes from 192.168.223.84: icmp_req=28 ttl=64 time=3161 ms
64 bytes from 192.168.223.84: icmp_req=29 ttl=64 time=3162 ms
64 bytes from 192.168.223.84: icmp_req=30 ttl=64 time=3163 ms

Just in case something is amiss in my methodology: I have four ssh
sessions open to the test firewall; ssh is in a separate, prioritized
queue.  In one session I run:
	ping 192.168.223.84
Then, in another, I run:
	nc 192.168.223.100 443 >/dev/null
which should go into a non-default, prioritized queue.  Pings are still
OK at this point.  Then, in a third, I run:
	nc 192.168.223.100 80 >/dev/null
which goes into the default queue, the same as ping, and that is when
the trouble starts.
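
For reference, the whole sequence condensed into one sketch (same hosts
and ports as above; the sleep durations are arbitrary):

	ping 192.168.223.84 > ping.log & ping_pid=$!
	nc 192.168.223.100 443 >/dev/null & nc1_pid=$!  # prioritized queue; pings stay OK
	sleep 10
	nc 192.168.223.100 80 >/dev/null & nc2_pid=$!   # default queue, shared with ping
	sleep 30                                        # ping times climb into the seconds here
	kill $ping_pid $nc1_pid $nc2_pid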

I did alter the queue lengths at the recommendation of Dave Taht.  Here
is my current script with netem:

tc qdisc add dev eth1 root handle 1: hfsc default 20
tc class add dev eth1 parent 1: classid 1:1 hfsc sc rate 1490kbit ul rate 1490kbit
tc class add dev eth1 parent 1:1 classid 1:20 hfsc rt rate 400kbit ls rate 200kbit
tc qdisc add dev eth1 parent 1:20 handle 1201 sfq perturb 60 limit 30
tc class add dev eth1 parent 1:1 classid 1:10 hfsc rt umax 16kbit dmax 50ms rate 200kbit ls rate 1000kbit
tc qdisc add dev eth1 parent 1:10 handle 1101 sfq perturb 60 limit 30
tc class add dev eth1 parent 1:1 classid 1:30 hfsc rt umax 1514b dmax 20ms rate 20kbit
tc qdisc add dev eth1 parent 1:30 handle 1301 sfq perturb 60 limit 30
iptables -t mangle -A POSTROUTING -p 6 --syn --dport 443 -j CONNMARK --set-mark 0x10
iptables -t mangle -A PREROUTING -p 6 --syn --dport 822 -j CONNMARK --set-mark 0x11
iptables -t mangle -A POSTROUTING -o eth1 -p 6 -j CONNMARK --restore-mark
modprobe ifb
ifconfig ifb0 up
ifconfig ifb1 up
tc qdisc add dev ifb0 root handle 1: hfsc default 20
tc class add dev ifb0 parent 1: classid 1:1 hfsc sc rate 1490kbit ul rate 1490kbit
tc class add dev ifb0 parent 1:1 classid 1:20 hfsc rt rate 400kbit ls rate 200kbit
tc qdisc add dev ifb0 parent 1:20 handle 1201 netem delay 25ms 5ms distribution normal loss 0.1% 30%
tc class add dev ifb0 parent 1:1 classid 1:10 hfsc rt umax 16kbit dmax 50ms rate 200kbit ls rate 1000kbit
tc qdisc add dev ifb0 parent 1:10 handle 1101 netem delay 25ms 5ms distribution normal loss 0.1% 30%
tc class add dev ifb0 parent 1:1 classid 1:30 hfsc rt umax 1514b dmax 20ms rate 20kbit
tc qdisc add dev ifb0 parent 1:30 handle 1301 netem delay 25ms 5ms distribution normal loss 0.1% 30%
tc filter add dev ifb0 parent 1:0 protocol ip prio 1 handle 6: u32 divisor 1
tc filter add dev ifb0 parent 1:0 protocol ip prio 1 u32 match ip protocol 6 0xff link 6: offset at 0 mask 0x0f00 shift 6 plus 0 eat
tc filter add dev ifb0 parent 1:0 protocol ip prio 1 u32 ht 6:0 match tcp src 443 0x00ff flowid 1:10
tc filter add dev ifb0 parent 1:0 protocol ip prio 1 u32 ht 6:0 match tcp dst 822 0xff00 flowid 1:30
tc qdisc add dev ifb1 root handle 2 netem delay 25ms 5ms distribution normal loss 0.1% 30%
tc qdisc add dev eth1 ingress
tc filter add dev eth1 parent ffff: protocol ip prio 50 u32 match u32 0 0 action mirred egress redirect dev ifb0
tc filter add dev eth1 parent 1:1 protocol ip prio 1 handle 0x11 fw flowid 1:30
tc filter add dev eth1 parent 1:1 protocol ip prio 1 handle 0x10 fw flowid 1:10
tc filter add dev eth1 parent 1:1 protocol ip prio 2 u32 match u32 0 0 flowid 1:20
tc filter add dev eth1 parent 1:0 protocol ip prio 1 u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev ifb1
ip link set eth1 txqueuelen 100
ip link set ifb1 txqueuelen 100
ip link set ifb0 txqueuelen 100
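
Not part of the script, but a quick way to watch where the backlog
builds while the test runs (just the tc stats commands from above,
wrapped in watch):

	watch -n 1 'tc -s -d class show dev eth1; tc -s qdisc show dev ifb0'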

I'd love to solve this.  Just when I thought I was finished, having
cracked the multiple-filter problem to add netem to hfsc, I hit this.
Thanks again - John
