Date:	Fri, 08 May 2009 12:15:01 +0200
From:	Paweł Staszewski <pstaszewski@...are.pl>
To:	Linux Network Development list <netdev@...r.kernel.org>
CC:	netdev <netdev@...r.kernel.org>
Subject: Re: htb parallelism on multi-core platforms

Radu, I think there is something wrong with your configuration.

I do traffic management for many different networks: a /18 of public
address space outside plus 10.0.0.0/18 inside, and several other
networks with /20, /21, /22, and /23 prefixes.

Some stats from my router:

tc -s -d filter show dev eth0 | grep dst | wc -l
14087
tc -s -d filter show dev eth1 | grep dst | wc -l
14087

cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU            3075  @ 2.66GHz
stepping        : 11
cpu MHz         : 2659.843
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow vnmi flexpriority
bogomips        : 5319.68
clflush size    : 64
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU            3075  @ 2.66GHz
stepping        : 11
cpu MHz         : 2659.843
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow vnmi flexpriority
bogomips        : 5320.30
clflush size    : 64
power management:


mpstat -P ALL 1 10
Average:     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
Average:     all    0.00    0.00    0.15    0.00    0.00    0.10    0.00   99.75  73231.70
Average:       0    0.00    0.00    0.20    0.00    0.00    0.10    0.00   99.70      0.00
Average:       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00  27686.80
Average:       2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00

Some opreport:
CPU: Core 2, speed 2659.84 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        app name                 symbol name
7592      8.3103  vmlinux                  rb_next
5393      5.9033  vmlinux                  e1000_get_hw_control
4514      4.9411  vmlinux                  hfsc_dequeue
4069      4.4540  vmlinux                  e1000_intr_msi
3695      4.0446  vmlinux                  u32_classify
3522      3.8552  vmlinux                  poll_idle
2234      2.4454  vmlinux                  _raw_spin_lock
2077      2.2735  vmlinux                  read_tsc
1855      2.0305  vmlinux                  rb_prev
1834      2.0075  vmlinux                  getnstimeofday
1800      1.9703  vmlinux                  e1000_clean_rx_irq
1553      1.6999  vmlinux                  ip_route_input
1509      1.6518  vmlinux                  hfsc_enqueue
1451      1.5883  vmlinux                  irq_entries_start
1419      1.5533  vmlinux                  mwait_idle
1392      1.5237  vmlinux                  e1000_clean_tx_irq
1345      1.4723  vmlinux                  rb_erase
1294      1.4164  vmlinux                  sfq_enqueue
1187      1.2993  libc-2.6.1.so            (no symbols)
1162      1.2719  vmlinux                  sfq_dequeue
1134      1.2413  vmlinux                  ipt_do_table
1116      1.2216  vmlinux                  apic_timer_interrupt
1108      1.2128  vmlinux                  cftree_insert
1039      1.1373  vmlinux                  rtsc_y2x
985       1.0782  vmlinux                  e1000_xmit_frame
943       1.0322  vmlinux                  update_vf

 bwm-ng v0.6 (probing every 5.000s), press 'h' for help
  input: /proc/net/dev type: rate
  iface                   Rx                   Tx                Total
  ==============================================================================
               lo:           0.00 KB/s            0.00 KB/s            0.00 KB/s
             eth1:       20716.35 KB/s        24258.43 KB/s        44974.78 KB/s
             eth0:       24365.31 KB/s        30691.10 KB/s        55056.42 KB/s
  ------------------------------------------------------------------------------

bwm-ng v0.6 (probing every 5.000s), press 'h' for help
  input: /proc/net/dev type: rate
  iface                   Rx                   Tx                Total
  ==============================================================================
               lo:            0.00 P/s             0.00 P/s             0.00 P/s
             eth1:        38034.00 P/s         36751.00 P/s         74785.00 P/s
             eth0:        37195.40 P/s         38115.00 P/s         75310.40 P/s
      
Maximum CPU load occurs during rush hour (from 5:00 pm to 10:00 pm),
when it reaches 20-30% on each CPU.


So I think you need to change the type of hash tree in your u32
filtering. I simply split big nets such as /18, /20, and /21 into /24
prefixes to build my hash tree. I have run many tests, and this hash
layout works best for my configuration.
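To illustrate, the /24-bucket split can be built with u32 hash tables
roughly like this (a rough sketch, not my real script -- the device
name, qdisc handle 1:0, and class id 1:10 are only examples; drop the
echo to actually apply the rules):

```shell
#!/bin/sh
# Sketch: hash 10.0.0.0/18 into 64 buckets, one per /24, so a lookup
# costs one hash on the third octet instead of a linear walk over
# thousands of filters. The commands are printed for review.

gen_u32_hash() {
    dev=$1

    # One hash table (handle 2:) with 64 buckets -- 64 x /24 = /18.
    echo tc filter add dev "$dev" parent 1:0 prio 5 \
        handle 2: protocol ip u32 divisor 64

    # Link the root table (800:) to it, hashing on the third octet of
    # the destination IP (the dst address starts at offset 16 in the
    # IP header); the 0x3f mask keeps the 6 bits selecting the /24.
    echo tc filter add dev "$dev" parent 1:0 prio 5 protocol ip u32 \
        ht 800:: match ip dst 10.0.0.0/18 \
        hashkey mask 0x00003f00 at 16 link 2:

    # Per-host filters then go into the matching bucket, e.g. a
    # filter for 10.0.5.1 lands in bucket 5 of table 2:.
    echo tc filter add dev "$dev" parent 1:0 prio 5 protocol ip u32 \
        ht 2:5: match ip dst 10.0.5.1/32 flowid 1:10
}

gen_u32_hash eth0
```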



Regards
Paweł Staszewski





Calin Velea pisze:
> Thursday, April 30, 2009, 2:19:36 PM, you wrote:
>
>   
>> On Thu, 2009-04-30 at 01:49 +0300, Calin Velea wrote:
>>     
>>>    I tested with e1000 only, on a single quad-core CPU - the L2 cache was
>>> shared between the cores.
>>>
>>>   For 8 cores I suppose you have 2 quad-core CPUs. If the cores actually
>>> used belong to different physical CPUs, L2 cache sharing does not occur -
>>> maybe this could explain the performance drop in your case.
>>>   Or there may be another explanation...
>>>       
>
>   
>> It is correct, I have 2 quad-core CPUs. If adjacent kernel-identified
>> CPUs are on the same physical CPU (e.g. CPU0, CPU1, CPU2 and CPU3) - and
>> it is very probable - then I think the L2 cache was actually shared.
>> That's because the used CPUs were either 0-3 or 4-7 but never a mix of
>> them. So perhaps there is another explanation (maybe driver/hardware).
>>     
>
>   
>>>   It could be the only way to get more power is to increase the number 
>>> of devices where you are shaping. You could split the IP space into 4 groups
>>> and direct the traffic to 4 IMQ devices with 4 iptables rules -
>>>
>>> -d 0.0.0.0/2 -j IMQ --todev imq0,
>>> -d 64.0.0.0/2 -j IMQ --todev imq1, etc...
>>>       
>
>   
>> Yes, but what if let's say 10.0.0.0/24 and 70.0.0.0/24 need to share
>> bandwidth? 10.a.b.c goes to imq0 qdisc, and 70.x.y.z goes to imq1 qdisc,
>> and the two qdiscs (HTB sets) are independent. This will result in a
>> maximum of double the allocated bandwidth (if HTB sets are identical and
>> traffic is equally distributed).
>>     
>
>   
>>>   The performance gained through parallelism might be a lot higher than the 
>>> added overhead of iptables and/or ipset nethash match. Anyway - this is more of
>>> a "hack" than a clean solution :)
>>>
>>> p.s.: latest IMQ at http://www.linuximq.net/ is for 2.6.26 so you will need to try with that
>>>       
>
>   
>> Yes, the performance gained through parallelism is expected to be higher
>> than the loss of the additional overhead. That's why I asked for
>> parallel HTB in the first place, but got very disappointed after David
>> Miller's reply :)
>>     
>
>   
>> Thanks a lot for all the hints and for the imq link. Imq is very
>> interesting regardless of whether it proves to be useful for this
>> project of mine or not.
>>     
>
>   
>> Radu Rendec
>>     
>
>
>    Indeed, you need to use ipset with nethash to avoid bandwidth doubling.
> Let's say we have a shaping bridge: customer side (download) is
> on eth0, the upstream side (upload) is on eth1.
>
>    Create customer groups with ipset (http://ipset.netfilter.org/)
>
> ipset -N cust_group1_ips nethash
> ipset -A cust_group1_ips <subnet/mask>
> ....
> ....for each subnet
>
>
>
> To shape the upload with multiple IMQs:
>
> -m physdev --physdev-in eth0 -m set --set cust_group1_ips src -j IMQ --to-dev 0
> -m physdev --physdev-in eth0 -m set --set cust_group2_ips src -j IMQ --to-dev 1
> -m physdev --physdev-in eth0 -m set --set cust_group3_ips src -j IMQ --to-dev 2
> -m physdev --physdev-in eth0 -m set --set cust_group4_ips src -j IMQ --to-dev 3
>
>
>  You will apply the same htb upload limits to imq 0-3.
>  Upload for customers having source IPs from the first group will be shaped
> by imq0, for the second, by imq1, etc...
>
>
> For download:
>
> -m physdev --physdev-in eth1 -m set --set cust_group1_ips dst -j IMQ --to-dev 4
> -m physdev --physdev-in eth1 -m set --set cust_group2_ips dst -j IMQ --to-dev 5
> -m physdev --physdev-in eth1 -m set --set cust_group3_ips dst -j IMQ --to-dev 6
> -m physdev --physdev-in eth1 -m set --set cust_group4_ips dst -j IMQ --to-dev 7
>
> and apply the same download limits on imq 4-7
>
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
