linux-kernel - Re: questions on NAPI processing latency and dropped network packets

Message-ID: <478B943C.7080009@cosmosbay.com>
Date:	Mon, 14 Jan 2008 17:56:28 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Chris Friesen <cfriesen@...tel.com>
Cc:	Ray Lee <ray-lk@...rabbit.org>, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: questions on NAPI processing latency and dropped network packets

Chris Friesen wrote:
> Ray Lee wrote:
>> On Jan 10, 2008 9:24 AM, Chris Friesen <cfriesen@...tel.com> wrote:
>
>>> After a recent userspace app change, we've started seeing packets being
>>> dropped by the ethernet hardware (e1000, NAPI is enabled).  The
>>> error/dropped/fifo counts are going up in ethtool:
>
>> Can you reproduce it with a simple userspace cpu hog? (Two, really,
>> one per cpu.)
>> Can you reproduce it with the newer e1000?
>
> Hmm...good questions and I haven't checked either.  The first one is 
> relatively straightforward.  The second is a bit trickier...last time 
> I tried the latest e1000 driver the card wouldn't boot (we use netboot).
>
>> Can you reproduce it with git head?
>
> Unfortunately, I don't think I'll be able to try this.  We require 
> kernel mods for our userspace to run, and I doubt I'd be able to get 
> the time to port all the changes forward to git head.
>
>> If the answer to the first one is yes, the last no, then bisect until
>> you get a kernel that doesn't show the problem. Backport the fix,
>> unless the fix happens to be CFS. However, I suspect that your
>> userspace app is just starving the system from time to time.
>
> It's conceivable that userspace is starving the kernel, but we have 
> about 45% idle on one cpu, and 7-10% idle on the other.
>
> We also have an odd situation where on an initial test run after 
> bootup we have 18-24% idle on cpu1, but resetting the test tool drops 
> that to the 7-10% I mentioned above.
>
> Based on profiling and instrumentation it seems like the cost of 
> sctp_endpoint_lookup_assoc() more than triples, which means that the 
> amount of time that bottom halves are disabled in that function also 
> triples.
Any idea what size your SCTP hash tables are?
(Your dmesg probably includes a message starting with "SCTP: Hash tables 
configured...".)

How many concurrent SCTP sockets are handled?

Maybe sctp_assoc_hashfn() is too weak for your use, and some chains are 
*really* long.
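
To illustrate how that could happen, here is a minimal userspace sketch.
It is not the kernel code: toy_hashfn() and longest_chain() are
hypothetical names, and the hash is only assumed to look at the local and
remote ports, never at addresses. Under that assumption, associations that
share both ports all land in the same bucket.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical weak hash, for illustration only (not the kernel's
 * function): folds the two ports into a bucket index and ignores
 * addresses. table_size must be a power of two. */
static unsigned int toy_hashfn(uint16_t lport, uint16_t rport,
			       unsigned int table_size)
{
	uint32_t h = ((uint32_t)lport << 16) + rport;

	h ^= h >> 8;
	return h & (table_size - 1);
}

/* Hash n associations that share one local port; the i-th association
 * uses remote port rport_base + i * rport_stride (wrapping at 16 bits).
 * Returns the length of the longest chain, or 0 if allocation fails. */
static unsigned int longest_chain(uint16_t lport, uint16_t rport_base,
				  unsigned int rport_stride,
				  unsigned int n, unsigned int table_size)
{
	unsigned int *chain_len = calloc(table_size, sizeof(*chain_len));
	unsigned int i, max = 0;

	if (!chain_len)
		return 0;

	for (i = 0; i < n; i++) {
		uint16_t rport = (uint16_t)(rport_base + i * rport_stride);
		unsigned int b = toy_hashfn(lport, rport, table_size);

		if (++chain_len[b] > max)
			max = chain_len[b];
	}

	free(chain_len);
	return max;
}
```

With a 1024-bucket table, longest_chain(5000, 0, 1, 1024, 1024) gives
every association its own bucket, while longest_chain(5000, 5000, 0,
1024, 1024), where all peers use the same remote port, puts all 1024
associations on a single chain. A lookup then has to walk that whole
chain, which would be consistent with the lookup cost growing as
associations accumulate.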



