Message-id: <47ED241F.9080003@sun.com>
Date: Fri, 28 Mar 2008 10:00:15 -0700
From: Matheos Worku <Matheos.Worku@....COM>
To: hadi@...erus.ca
Cc: Herbert Xu <herbert@...dor.apana.org.au>,
David Miller <davem@...emloft.net>, jesse.brandeburg@...el.com,
jarkao2@...il.com, netdev@...r.kernel.org
Subject: Re: 2.6.24 BUG: soft lockup - CPU#X
jamal wrote:
> On Thu, 2008-27-03 at 18:58 -0700, Matheos Worku wrote:
>
>
>> In general, while the TX serialization improves performance in terms of
>> lock contention, wouldn't it reduce throughput, since only one guy is
>> doing the actual TX at any given time? Wondering if it would be
>> worthwhile to have an enable/disable option, especially for multi-queue TX.
>>
>
> Empirical evidence so far says that at some point the bottleneck is
> going to be the wire, i.e. modern CPUs are "fast enough" that sooner or
> later they will fill up the DMA ring of the transmitting driver and go
> back to doing other things.
>
> It is hard to create the condition you seem to have come across. I had
> access to a dual-core Opteron but found it very hard, with parallel UDP
> sessions, to keep the TX CPU locked in that region (while the other 3
> were busy pumping packets). My folly could have been that I had a GigE
> wire; maybe a 10G would have recreated the condition.
> If you can reproduce this at will, can you try reducing the number of
> UDP iperf senders and see when it begins to happen?
> Are all the iperfs destined out of the same netdevice?
>
I am using a 10G NIC at this time. With the same driver, I haven't come
across the lockup on a 1G NIC, though I haven't really tried to reproduce
it there. Regarding the number of connections it takes to create the
situation, I have noticed the lockup with 3 or more UDP connections.
Also, with TSO disabled, I have come across it with lots of TCP connections.
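
To make the trade-off concrete, here is a tiny self-contained userspace
model of it (pthreads; all names are made up for illustration, this is
not kernel code): N senders funnel through a single TX lock, so only one
of them is ever "in the driver" at a time, just like the serialized TX
path we are discussing.

#include <pthread.h>
#include <stdio.h>

#define NSENDERS 4

static pthread_mutex_t tx_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long tx_count;  /* packets handed to the "driver" */

/* stands in for the driver placing one packet on the DMA ring */
static void fake_hard_start_xmit(void)
{
        tx_count++;
}

static void *sender(void *arg)
{
        int i;

        (void)arg;
        for (i = 0; i < 1000000; i++) {
                pthread_mutex_lock(&tx_lock);   /* the serialization point */
                fake_hard_start_xmit();
                pthread_mutex_unlock(&tx_lock);
        }
        return NULL;
}

int main(void)
{
        pthread_t t[NSENDERS];
        int i;

        for (i = 0; i < NSENDERS; i++)
                pthread_create(&t[i], NULL, sender, NULL);
        for (i = 0; i < NSENDERS; i++)
                pthread_join(t[i], NULL);
        printf("tx_count = %lu\n", tx_count);
        return 0;
}

On 1G the wire saturates before the lock does; the open question is
whether that still holds at 10G.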
> [Typically the TX path on the driver side is inefficient either because
> of coding (e.g. unnecessary locks) or because of expensive I/O. But this
> has not mattered much thus far (given fast enough CPUs).
>
That could be true, though oprofile is not providing obvious clues, at
least not yet.
> It all could be improved by reducing the per-packet operations the
> driver incurs - as an example, the CPU (via the driver) could batch a
> set of packets to the device, then kick the device DMA once for the
> whole batch, etc.]
>
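
If it helps, the batching idea above would look roughly like this in a
driver-shaped sketch (the ring layout and every name here are
illustrative, not any real driver's API):

#define RING_SIZE 256

struct fake_ring {
        unsigned int tail;
        unsigned int desc[RING_SIZE];
};

/* fill one TX descriptor; cheap, no device I/O */
static void post_descriptor(struct fake_ring *ring, unsigned int pkt)
{
        ring->desc[ring->tail++ % RING_SIZE] = pkt;
}

/* stands in for the expensive MMIO doorbell write */
static void kick_dma(struct fake_ring *ring)
{
        (void)ring;
}

/* one doorbell kick per batch instead of one per packet */
void xmit_batch(struct fake_ring *ring, const unsigned int *pkts, int n)
{
        int i;

        for (i = 0; i < n; i++)
                post_descriptor(ring, pkts[i]);
        kick_dma(ring);
}

The per-packet cost collapses to a descriptor write, and the expensive
doorbell I/O is amortized over the whole batch.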
Regards
matheos
> cheers,
> jamal
>
>