Date:	Mon, 19 Oct 2009 21:52:05 -0700
From:	Ben Greear <greearb@...delatech.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	NetDev <netdev@...r.kernel.org>
Subject: Re: pktgen and spin_lock_bh in xmit path

Eric Dumazet wrote:
> Ben Greear wrote:
>   
>> I'm having strange issues when running pktgen on 10G interfaces while
>> also running
>> pktgen on mac-vlans on that interface, when the mac-vlan pktgen threads
>> are on a different
>> CPU.
>>
>> First, lockdep gives up and says that things are not properly
>> annotated.  I believe this is because
>> the macvlan tx path will lock its txq and will also lock the
>> lower-dev's txq.  To fix this, perhaps
>> we need some new lockdep aware primitives for netdev txq locking?
>>
>> Second, is using _bh() locking really sufficient if we have pktgen
>> writing to a physical device
>> and also have other pktgen threads writing to that same device through
>> mac-vlans?   I'm seeing
>> deadlocks spinning on the _bh() lock in pktgen as well as strange
>> corruptions, so I think there
>> must be *some* problem somewhere, I just don't know quite what it is yet.
>>
>>     
>
> Could you please give us a copy of your pktgen scripts?
>   
I'm driving it with another program, and my pktgen is a bit hacked, but 
the basic idea is:

1 pktgen connection on CPU 0 running as fast as it can (trying for
10Gbps, but getting maybe 3-4),
  running between two 10G ports (Intel 82599).
  Multi-pkt is set to 10,000 on each side.
3 pairs of mac-vlans on each of the two physical 10G ports.
  3 pktgen 'connections' between these, each running at about 1Gbps.
  These 3 pktgen connections are on CPU 4.
  Multi-pkt is set to 1, since multi-pkt is a very bad idea on virtual
  devices.

1514 byte pkts.  No IPs on the interfaces, using ToS in pktgen, but 
nothing else is configured to
care.

The two physical ports are cabled together directly with a fibre cable.

All pktgen connections are full duplex (both sides transmit to each
other, and I have rx logic to gather stats on received pkts as well).
With no kernel debugging, this can run right at 10Gbps bi-directional;
with lockdep it gets around 5-6Gbps in each direction.
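
For readers who want to approximate this setup with stock pktgen, here
is a minimal sketch that only builds the (file, command) pairs one
would write under /proc/net/pktgen. It is not my actual driver program.
The device names, the ToS value, and the use of clone_skb as a rough
stand-in for my hacked multi-pkt option are all assumptions, and the
two options may not behave the same way.

```python
# Sketch only: generate stock-pktgen commands for one sender device.
# Nothing is written to /proc here; the caller decides how to apply them.
# Device names, ToS value, and clone_skb-as-multi-pkt are assumptions.

PROC = "/proc/net/pktgen"


def pktgen_commands(dev, cpu, pkt_size=1514, clone_skb=0, tos=None):
    """Return a list of (proc_file, command) pairs for one device."""
    thread_file = f"{PROC}/kpktgend_{cpu}"
    dev_file = f"{PROC}/{dev}"
    cmds = [
        (thread_file, f"add_device {dev}"),    # bind device to this CPU's thread
        (dev_file, "count 0"),                 # 0 means run until stopped
        (dev_file, f"pkt_size {pkt_size}"),
        (dev_file, f"clone_skb {clone_skb}"),  # assumed analogue of multi-pkt
    ]
    if tos is not None:
        cmds.append((dev_file, f"tos {tos:02x}"))  # pktgen takes two hex digits
    return cmds


def describe_setup():
    """Physical port on CPU 0, three mac-vlans on CPU 4 (names hypothetical)."""
    plan = pktgen_commands("eth2", cpu=0, clone_skb=10000, tos=0x28)
    for i in range(3):
        plan += pktgen_commands(f"macvlan{i}", cpu=4, clone_skb=0, tos=0x28)
    plan.append((f"{PROC}/pgctrl", "start"))
    return plan
```

Each pair would be applied by writing the command string to the named
file as root; the final "start" on pgctrl kicks off all threads.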

The lockup often occurs near starting/stopping pktgen, but also happens 
while just normally
running under load, usually within 10 minutes.

I tried and failed to reproduce this on a 1G network, but I may just
not be getting (un)lucky; I didn't try for very long.

Among other things, it appears that pktgen is sometimes locked out of
transmitting on the mac-vlan interfaces, while a raw socket in
user-space can still send on them fine.  I'm going to add some
debugging for this particular issue tomorrow to try to figure out why
that happens.

Please note I have the rest of my network patches applied (but I am not
using any proprietary modules), so it could easily be something I've
caused.  I think fixing lockdep to work with the txq _bh locks would be
a good first step to fixing this.
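
To illustrate the kind of annotation problem I suspect, here is a toy
model of a validator that tracks held locks by class rather than by
instance. It is not the kernel's lockdep, and the class name "txq" and
the subclass numbers are hypothetical. The subclass argument is only
loosely analogous to what spin_lock_nested() provides in the kernel.

```python
# Toy model only: a class-based lock-order checker, NOT the kernel's lockdep.
# It shows why taking two locks of the same class (macvlan txq, then the
# lower device's txq) looks like recursion unless the nesting is annotated.


class LockOrderChecker:
    def __init__(self):
        self.held = []  # stack of (lock_class, subclass) currently held

    def acquire(self, lock_class, subclass=0):
        key = (lock_class, subclass)
        if key in self.held:
            # Same class and subclass already held: the checker cannot tell
            # a legitimate nested lock from a true self-deadlock.
            raise RuntimeError(f"possible recursive locking: {key}")
        self.held.append(key)

    def release(self, lock_class, subclass=0):
        self.held.remove((lock_class, subclass))


def macvlan_xmit(checker, annotated):
    """Model the tx path: lock the macvlan txq, then the lower-dev txq."""
    checker.acquire("txq")                        # macvlan's own txq
    lower_subclass = 1 if annotated else 0
    checker.acquire("txq", lower_subclass)        # lower device's txq
    checker.release("txq", lower_subclass)
    checker.release("txq")
    return True
```

With annotated=False the second acquire raises, which is the toy
equivalent of the validator complaining; with annotated=True the nested
lock is accepted because it is tracked under a different subclass.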

Thanks,
Ben

-- 
Ben Greear <greearb@...delatech.com> 
Candela Technologies Inc  http://www.candelatech.com

