Message-ID: <4948566E.1050309@hp.com>
Date:	Tue, 16 Dec 2008 17:31:26 -0800
From:	Rick Jones <rick.jones2@...com>
To:	Linux Network Development list <netdev@...r.kernel.org>,
	"linux-ia64@...r.kernel.org" <linux-ia64@...r.kernel.org>
Subject: Soft Lockups on 2.6.28-rc8 under netperf bulk receive workload

I have a 32-core, 1.6 GHz Montvale HP rx8640 with 128 GB of RAM (64 x 2 GB
DIMMs) configured as ILM (memory interleaved on a cacheline boundary)
rather than cell-local memory.  HyperThreading is disabled.  The system
has four AD386A PCIe 10G Ethernet interfaces, each in a separate PCIe x8
slot.  The AD386A is a single-port card based on the Chelsio T3C chip.
The interrupts of the 8 queues on each card are spread across the 32
cores - the 8 queues of card one map one-to-one onto cores 0-7, those of
card two onto cores 8-15, and so on.  The NICs are in turn connected to
an HP ProCurve 5406 with a number of 10G modules, which in turn connects
to four 4P/16C, 2.3 GHz Opteron 8356 HP DL585 G5's, each with two
AD386As, also in x8 slots or better.  I configure four subnets -
192.168.[2345]/24 - set arp_ignore to 1 (since they are all carried on
the same switch), and put all five systems in all four subnets (two IPs
per interface on the DL585s).
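
For reference, the queue-to-core spreading is nothing exotic - just
one-bit cpumasks written into /proc/irq/<N>/smp_affinity.  A minimal
sketch; the base IRQ numbers below are placeholders for whatever
/proc/interrupts actually shows for each card's queues:

#include <stdio.h>

int main(void)
{
	/* hypothetical base IRQ of the first queue on each of the four cards */
	int base_irq[4] = { 64, 80, 96, 112 };
	int card, q;

	for (card = 0; card < 4; card++) {
		for (q = 0; q < 8; q++) {
			int irq = base_irq[card] + q;
			int cpu = card * 8 + q;	/* card one -> cores 0-7, etc. */
			char path[64];
			FILE *f;

			snprintf(path, sizeof(path),
				 "/proc/irq/%d/smp_affinity", irq);
			f = fopen(path, "w");
			if (!f)
				continue;
			/* hex cpumask with only bit 'cpu' set (32 cores total) */
			fprintf(f, "%x\n", 1u << cpu);
			fclose(f);
		}
	}
	return 0;
}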

The MTU on all interfaces is 1500 bytes.  The cxgb3 driver settings are
the defaults.  net.core.[rw]mem_max is set to 4194304, and netperf makes
explicit setsockopt() calls asking for 1 MB SO_[SND|RCV]BUF values.
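
To be clear about what that sizing amounts to: the sysctls only raise
the ceiling, and the 1 MB requests themselves are ordinary setsockopt()
calls.  A minimal sketch of the shape of it (not netperf's actual code):

#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
	int s = socket(AF_INET, SOCK_STREAM, 0);
	int req = 1024 * 1024;		/* the 1 MB netperf asks for */
	int got;
	socklen_t len = sizeof(got);

	if (s < 0)
		return 1;

	/* Requests made this way are clamped to net.core.rmem_max /
	 * wmem_max, hence raising those to 4194304 first.  The kernel
	 * also doubles the value internally to cover bookkeeping
	 * overhead, and that doubled number is what getsockopt()
	 * reports back. */
	setsockopt(s, SOL_SOCKET, SO_RCVBUF, &req, sizeof(req));
	setsockopt(s, SOL_SOCKET, SO_SNDBUF, &req, sizeof(req));

	getsockopt(s, SOL_SOCKET, SO_RCVBUF, &got, &len);
	printf("SO_RCVBUF ended up at %d bytes\n", got);
	return 0;
}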

I then launch 64 concurrent netperf TCP_MAERTS tests (actually the
"omni" test equivalent, which does the same thing) from the rx8640.
This causes each of the DL585 G5's to start sending data to the rx8640.
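
In case it helps to picture the receive side: each of those 64 tests
boils down to a socket being drained as fast as recv() can be called -
the same sys_recv -> tcp_recvmsg path that shows up at the bottom of the
lockup trace below.  A rough sketch (the address and port are
placeholders, not netperf's real control/data protocol):

#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	static char buf[64 * 1024];	/* mirrors the -M 64K receive size */
	struct sockaddr_in sa;
	int s = socket(AF_INET, SOCK_STREAM, 0);
	ssize_t n;
	unsigned long long total = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(12345);	/* placeholder data port */
	inet_pton(AF_INET, "192.168.5.3", &sa.sin_addr);

	if (s < 0 || connect(s, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		return 1;

	/* Just drain whatever the remote sender pushes at us; this is
	 * the recv() -> tcp_recvmsg path in the backtrace. */
	while ((n = recv(s, buf, sizeof(buf), 0)) > 0)
		total += n;

	printf("received %llu bytes\n", total);
	close(s);
	return 0;
}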

I first ran a not-yet-released distro based on an old 2.6 kernel with
the 1.1.022 out-of-tree cxgb3 driver and saw soft lockups.  I then moved
on to a Debian Lenny 2.6.26 kernel, still with the same out-of-tree
driver, and again saw soft lockups.

Presently, the system is running a 2.6.28-rc8 kernel from kernel.org 
with the in-tree cxgb3 driver and I still see soft lockups which look like:

[ 1564.448850] BUG: soft lockup - CPU#31 stuck for 61s! [netperf:4094]
[ 1564.449062] Modules linked in: ipv6 loop evdev ext3 jbd mbcache 
dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sg sr_mod cdrom 
sd_mod mptspi mptscsih mptbase cxgb3 scsi_transport_spi e1000 inet_lro 
tg3 scsi_mod libphy
[ 1564.449828] Modules linked in: ipv6 loop evdev ext3 jbd mbcache 
dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sg sr_mod cdrom 
sd_mod mptspi mptscsih mptbase cxgb3 scsi_transport_spi e1000 inet_lro 
tg3 scsi_mod libphy
[ 1564.450575]
[ 1564.450576] Pid: 4094, CPU 31, comm:              netperf
[ 1564.450803] psr : 0000121008026018 ifs : 800000000000040b ip  : 
[<a000000100190121>]    Not tainted (2.6.28-rc8-raj)
[ 1564.451156] ip is at kmem_cache_alloc_node+0x201/0x280
[ 1564.451322] unat: 0000000000000000 pfs : 000000000000040b rsc : 
0000000000000003
[ 1564.451559] rnat: 0000000000003200 bsps: e000001e4f4a8008 pr  : 
a6144659159a6665
[ 1564.451796] ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 
0009804c8a70033f
[ 1564.452033] csd : 0000000000000000 ssd : 0000000000000000
[ 1564.452208] b0  : a0000001001900f0 b6  : a00000010060f240 b7  : 
a000000100494f40
[ 1564.452445] f6  : 1003e0000000000002c00 f7  : 1003e000000000000002c
[ 1564.452646] f8  : 1003e0000000000000100 f9  : 1003efffffffffffffa58
[ 1564.452848] f10 : 1003e0000000000000015 f11 : 1003ecf3cf3cf3cf3cf3d
[ 1564.453050] r1  : a000000100b377d0 r2  : 0000000000000691 r3  : 
0000000000000004
[ 1564.453287] r8  : e000001e4d8d2d80 r9  : 000000000000003e r10 : 
e00000408c243718
[ 1564.454333] r11 : 0000000000041200 r12 : e000001f803479b0 r13 : 
e000001f80340000
[ 1564.454570] r14 : 0000000000000000 r15 : e00000408c243508 r16 : 
0000000000002c00
[ 1564.454807] r17 : 0000000000002c00 r18 : e000001e8db6c000 r19 : 
e00000408d248490
[ 1564.455044] r20 : e000001e4d8d0000 r21 : e000001e4d8d0008 r22 : 
e000001e4d8d0180
[ 1564.455281] r23 : 0000000000000100 r24 : e000001e8db6c008 r25 : 
0000000000100100
[ 1564.455519] r26 : 0000000000004000 r27 : 0000000000004000 r28 : 
e000001e06408000
[ 1564.455755] r29 : 0000000000000bd2 r30 : e00000408d2484b0 r31 : 
0000000000000bd3
[ 1564.456002]
[ 1564.456003] Call Trace:
[ 1564.456148]  [<a000000100014c30>] show_stack+0x50/0xa0
[ 1564.456149]                                 sp=e000001f80347610 
bsp=e000001f80341bc8
[ 1564.456564]  [<a0000001000154a0>] show_regs+0x820/0x860
[ 1564.456566]                                 sp=e000001f803477e0 
bsp=e000001f80341b70
[ 1564.515003]  [<a000000100119710>] softlockup_tick+0x350/0x400
[ 1564.515004]                                 sp=e000001f803477e0 
bsp=e000001f80341b28
[ 1564.515445]  [<a0000001000ad040>] run_local_timers+0x40/0x60
[ 1564.515447]                                 sp=e000001f803477e0 
bsp=e000001f80341b10
[ 1564.516443]  [<a0000001000ad140>] update_process_times+0x40/0xc0
[ 1564.516445]                                 sp=e000001f803477e0 
bsp=e000001f80341ae0
[ 1564.516891]  [<a000000100039a60>] timer_interrupt+0x1c0/0x3e0
[ 1564.516892]                                 sp=e000001f803477e0 
bsp=e000001f80341a78
[ 1564.517325]  [<a00000010011a340>] handle_IRQ_event+0x80/0x120
[ 1564.517326]                                 sp=e000001f803477e0 
bsp=e000001f80341a40
[ 1564.517759]  [<a00000010011a560>] __do_IRQ+0x180/0x4c0
[ 1564.517760]                                 sp=e000001f803477e0 
bsp=e000001f803419d0
[ 1564.577929]  [<a000000100011130>] ia64_handle_irq+0x290/0x340
[ 1564.577931]                                 sp=e000001f803477e0 
bsp=e000001f80341958
[ 1564.578365]  [<a00000010000bf40>] ia64_native_leave_kernel+0x0/0x270
[ 1564.578366]                                 sp=e000001f803477e0 
bsp=e000001f80341958
[ 1564.579299]  [<a000000100190120>] kmem_cache_alloc_node+0x200/0x280
[ 1564.579300]                                 sp=e000001f803479b0 
bsp=e000001f80341900
[ 1564.579863]  [<a0000001004cd210>] __alloc_skb+0x50/0x220
[ 1564.579864]                                 sp=e000001f803479b0 
bsp=e000001f803418b0
[ 1564.581319]  [<a00000020ecc9850>] process_responses+0xc50/0x2080 [cxgb3]
[ 1564.581321]                                 sp=e000001f803479b0 
bsp=e000001f80341610
[ 1564.581793]  [<a00000020eccacc0>] napi_rx_handler+0x40/0x200 [cxgb3]
[ 1564.581794]                                 sp=e000001f803479f0 
bsp=e000001f803415d0
[ 1564.582250]  [<a0000001004d76f0>] net_rx_action+0x210/0x560
[ 1564.582251]                                 sp=e000001f803479f0 
bsp=e000001f80341538
[ 1564.642479]  [<a0000001000a1760>] __do_softirq+0x120/0x2e0
[ 1564.642481]                                 sp=e000001f803479f0 
bsp=e000001f80341498
[ 1564.643364]  [<a0000001000a1990>] do_softirq+0x70/0xc0
[ 1564.643366]                                 sp=e000001f803479f0 
bsp=e000001f80341438
[ 1564.643780]  [<a0000001000a1c50>] irq_exit+0x70/0xa0
[ 1564.643781]                                 sp=e000001f803479f0 
bsp=e000001f80341420
[ 1564.644190]  [<a0000001000111a0>] ia64_handle_irq+0x300/0x340
[ 1564.644191]                                 sp=e000001f803479f0 
bsp=e000001f803413a0
[ 1564.644623]  [<a00000010000bf40>] ia64_native_leave_kernel+0x0/0x270
[ 1564.644625]                                 sp=e000001f803479f0 
bsp=e000001f803413a0
[ 1564.645077]  [<a000000100192720>] kmem_cache_free+0x580/0x5c0
[ 1564.645079]                                 sp=e000001f80347bc0 
bsp=e000001f80341328
[ 1564.703103]  [<a0000001004cb0a0>] __kfree_skb+0x180/0x1a0
[ 1564.703104]                                 sp=e000001f80347bd0 
bsp=e000001f80341308
[ 1564.703537]  [<a000000100556150>] tcp_recvmsg+0xcd0/0x1180
[ 1564.703538]                                 sp=e000001f80347bd0 
bsp=e000001f80341180
[ 1564.704667]  [<a0000001004beab0>] sock_common_recvmsg+0x90/0xe0
[ 1564.704668]                                 sp=e000001f80347be0 
bsp=e000001f80341140
[ 1564.705112]  [<a0000001004bb090>] sock_recvmsg+0x1d0/0x220
[ 1564.705114]                                 sp=e000001f80347bf0 
bsp=e000001f80341108
[ 1564.705538]  [<a0000001004bd590>] sys_recvfrom+0x110/0x220
[ 1564.705540]                                 sp=e000001f80347d60 
bsp=e000001f80341088
[ 1564.705965]  [<a0000001004bd6e0>] sys_recv+0x40/0x60
[ 1564.705966]                                 sp=e000001f80347e30 
bsp=e000001f80341030
[ 1564.706376]  [<a00000010000bda0>] ia64_ret_from_syscall+0x0/0x20
[ 1564.706377]                                 sp=e000001f80347e30 
bsp=e000001f80341030
[ 1564.755966]  [<a000000000010720>] 
__start_ivt_text+0xffffffff00010720/0x400
[ 1564.755967]                                 sp=e000001f80348000 
bsp=e000001f80341030

I have also taken Caliper (http://www.hp.com/go/caliper - it uses the
perfmon interface present in ia64 kernels) cycles and scgprof profiles.

The cycles profile gathers its samples via the Itanium PMU, so whether
interrupts are enabled or disabled is a don't-care.  The profile for one
of the netperf processes looks like this:

----------------------------------------------------
Run Summary (50 of 50 for netperf, Collection Run 1)
----------------------------------------------------

Target Application
   Program:            /opt/netperf2/bin/netperf
   Invocation:         netperf -P 0 -t omni -c -C -H 192.168.5.3 -l 30 
-i 30 30 -- -o tmpfoo -H 192.168.5.3 -s 1M -S 1M -M  64K -u 
868b6602-cbd0-11dd-aa0c-45ce53e6b179
   Process ID:         4022 (unknown start)
   Start time:         12:20:36 AM
   End time:           12:27:16 AM
   Termination Status: n/a
   Last modified:      December 15, 2008 at 08:15 PM
   Memory model:       LP64
   Hyperthreading:     disabled in firmware

Sampling Specification
   Number of samples: 478
   Data sampled:      ETB

Load Module Summary
---------------------------------------------
% Total
      IP  Cumulat             IP
Samples    % of         Samples
  (ETB)     Total         (ETB)    Load Module
---------------------------------------------
   99.05    99.05        822136    vmlinux
    0.90    99.95          7447    cxgb3.ko
    0.04    99.98           296    libc-2.7.so
    0.01    99.99           108    netperf
    0.00   100.00            30    *kernel gateway*
    0.00   100.00            10    inet_lro.ko
    0.00   100.00             2    ld-2.7.so
    0.00   100.00             1    mptscsih.ko
---------------------------------------------
  100.00   100.00        830030    Total
---------------------------------------------

Function Summary
-----------------------------------------------------------------------
% Total
      IP  Cumulat             IP
Samples    % of         Samples
  (ETB)     Total         (ETB)   Function                          File
-----------------------------------------------------------------------
   75.45    75.45        626273   vmlinux::ia64_spinlock_contention
    4.42    79.87         36683   vmlinux::_spin_lock
    4.08    83.95         33848   vmlinux::free_block
    3.24    87.18         26856   vmlinux::____cache_alloc_node
    2.09    89.27         17334   vmlinux::__copy_user
    1.75    91.03         14554   vmlinux::fallback_alloc
    0.99    92.01          8180   vmlinux::kfree
    0.79    92.80          6540   vmlinux::kmem_cache_free
    0.61    93.41          5026   cxgb3.ko::process_responses
    0.41    93.82          3423   vmlinux::__alloc_skb
    0.36    94.18          2971   vmlinux::skb_pull
    0.26    94.43          2142   vmlinux::dst_release
    0.25    94.68          2034   vmlinux::tcp_v4_do_rcv
    0.24    94.92          2033   vmlinux::tcp_recvmsg
    0.23    95.15          1886   vmlinux::tcp_parse_md5sig_option
    0.22    95.37          1789   vmlinux::tcp_v4_rcv
    0.21    95.57          1729   vmlinux::cache_alloc_refill
    0.20    95.78          1699   vmlinux::ip_route_input
    0.20    95.98          1657   vmlinux::eth_type_trans
    0.20    96.18          1655   vmlinux::tcp_event_data_recv
    0.18    96.36          1476   vmlinux::skb_copy_datagram_iovec
    0.17    96.53          1425   vmlinux::__inet_lookup_established
    0.15    96.67          1217   vmlinux::tcp_rcv_established
    0.14    96.82          1196   cxgb3.ko::refill_fl
    0.13    96.95          1098   vmlinux::netif_receive_skb
    0.13    97.08          1040   vmlinux::cache_grow
    0.12    97.20          1036   cxgb3.ko::t3_eth_xmit
    0.11    97.31           942   vmlinux::memset

(Being one of the cobbler's children, I don't yet know how to use the
tools to produce one combined cycles profile across the whole kernel,
but each netperf process looks pretty much like that.)

The scgprof profile is a sampled call-graph profile.  I believe it is
sensitive to whether interrupts are enabled or disabled, and that may be
why it looks rather different.

All three - the cycles profile, the scgprof profile and the dmesg
output - along with the config for the kernel build, can be found under:

ftp://ftp.netperf.org/lockup

At this point I'm open to suggestions for next steps.  My Linux kernel
debugging experience is close to epsilon, but I can follow explicit
directions pretty well.

FWIW, an outbound test (from the rx8640 to the DL585s) does not appear
to generate soft lockup messages.

rick jones
