Message-ID: <49F77134.9030907@myri.com>
Date:	Tue, 28 Apr 2009 17:12:20 -0400
From:	Andrew Gallatin <gallatin@...i.com>
To:	Herbert Xu <herbert@...dor.apana.org.au>
CC:	David Miller <davem@...emloft.net>, brice@...i.com,
	sgruszka@...hat.com, netdev@...r.kernel.org
Subject: Re: [PATCH] myr10ge: again fix lro_gen_skb() alignment

For variety, I grabbed a different "slow" receiver.  This is another
2-CPU machine, but a dual-socket, single-core Opteron (Tyan S2895):

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 37
model name      : AMD Opteron(tm) Processor 252
stepping        : 1
cpu MHz         : 2611.738
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt 
lm 3dnowext 3dnow rep_good pni lahf_lm
bogomips        : 5223.47
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

The sender was an identical machine running an ancient RHEL4 kernel
(2.6.9-42.ELsmp) and our downloadable (backported) driver
(http://www.myri.com/ftp/pub/Myri10GE/myri10ge-linux.1.4.4.tgz).
I disabled LRO on the sender.
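
With the in-tree myri10ge driver of this era that is just the LRO module
parameter; roughly, assuming the backport keeps the same parameter name:

    # load the sender's driver with LRO off ("myri10ge_lro" is the
    # in-tree parameter name; adjust if the backport differs)
    modprobe myri10ge myri10ge_lro=0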

Binding the IRQ to CPU0 and the netserver to CPU1, I see 8.1 Gb/s with
LRO and 8.0 Gb/s with GRO.
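
The binding itself is nothing exotic, just the usual affinity masks.  A
rough sketch, with the IRQ number and receiver hostname as placeholders:

    # pin the NIC's interrupt to CPU0 (hex mask, 0x1 = CPU0); look up
    # the IRQ number in /proc/interrupts first
    echo 1 > /proc/irq/<irq>/smp_affinity

    # run netserver on CPU1 (use -c 0 for the same-CPU case below)
    taskset -c 1 netserver

    # then a plain TCP_STREAM test from the sender
    netperf -H <receiver> -t TCP_STREAM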

Binding the IRQ to CPU0 and the netserver to CPU0, I see 6.9 Gb/s
with LRO and 5.5 Gb/s with GRO.  Monitoring the packet/byte counts
on the interface once per second (a sampling sketch follows the two
tables), LRO looks like this:

        Ipkts       IBytes        Opkts       Obytes
       588992    891733888         9758       644028
       589610    892669540         9771       644886
       589079    891865606         9754       643764

And GRO looks like this:

       480309    727187826         7949       524634
       480032    726768448         7947       524502
       480000    726720000         7943       524238
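
Sanity check: both tables work out to exactly 1514 bytes per packet
(full-size frames), so the LRO case is moving roughly 7.1 Gb/s on the
wire and the GRO case roughly 5.8 Gb/s, which lines up with the netperf
numbers above once headers are discounted.  Something along these lines
reproduces the once-per-second sampling (the interface name is a
placeholder for the 10GbE port):

    # print one-second deltas of the rx packet/byte counters
    prev_p=$(cat /sys/class/net/<iface>/statistics/rx_packets)
    prev_b=$(cat /sys/class/net/<iface>/statistics/rx_bytes)
    while sleep 1; do
        p=$(cat /sys/class/net/<iface>/statistics/rx_packets)
        b=$(cat /sys/class/net/<iface>/statistics/rx_bytes)
        echo "$((p - prev_p)) pkts/s  $((b - prev_b)) bytes/s"
        prev_p=$p; prev_b=$b
    done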


Similarly, in this same scenario (binding the app/IRQ to the same
CPU), mpstat -P 0 1 shows LRO at about 60% sys and 40% irq+softirq,
while GRO shows about 45% sys and 55% irq+softirq.

I can't put my finger on it, but something about GRO is certainly
more expensive on these types of machines.  I wish there were some
way you could see it, since it happens on every older AMD I try
it on.  If you haven't been able to reproduce it, I'll see tomorrow
if I can make it happen on a newer "slow" amd64 box I have.


Drew
--
