Message-ID: <478B8473.6080506@nortel.com>
Date:	Mon, 14 Jan 2008 09:49:07 -0600
From:	"Chris Friesen" <cfriesen@...tel.com>
To:	Ray Lee <ray-lk@...rabbit.org>
CC:	netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: questions on NAPI processing latency and dropped network packets

Ray Lee wrote:
> On Jan 10, 2008 9:24 AM, Chris Friesen <cfriesen@...tel.com> wrote:

>>After a recent userspace app change, we've started seeing packets being
>>dropped by the ethernet hardware (e1000, NAPI is enabled).  The
>>error/dropped/fifo counts are going up in ethtool:

> Can you reproduce it with a simple userspace cpu hog? (Two, really,
> one per cpu.)
> Can you reproduce it with the newer e1000?

Hmm...good questions, and I haven't checked either one.  The first is 
relatively straightforward.  The second is a bit trickier...the last 
time I tried the latest e1000 driver the card wouldn't boot (we use 
netboot).
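
For reference, a minimal hog of the sort you describe -- one spinner 
pinned per cpu -- could be as simple as the sketch below (a sketch 
only; standard sched_setaffinity() plumbing, nothing exotic):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
	long cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		if (fork() == 0) {
			cpu_set_t set;

			CPU_ZERO(&set);
			CPU_SET(cpu, &set);
			if (sched_setaffinity(0, sizeof(set), &set) < 0)
				perror("sched_setaffinity");
			for (;;)
				;	/* spin: burn this cpu until killed */
		}
	}
	while (wait(NULL) > 0)
		;	/* children never exit; ^C kills the group */
	return 0;
}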

> Can you reproduce it with git head?

Unfortunately, I don't think I'll be able to try this.  We require 
kernel mods for our userspace to run, and I doubt I'd be able to get the 
time to port all the changes forward to git head.

> If the answer to the first one is yes, the last no, then bisect until
> you get a kernel that doesn't show the problem. Backport the fix,
> unless the fix happens to be CFS. However, I suspect that your
> userspace app is just starving the system from time to time.

It's conceivable that userspace is starving the kernel, but we do have 
about 45% idle on one cpu, and 7-10% idle on the other.

We also have an odd situation where on an initial test run after bootup 
we have 18-24% idle on cpu1, but resetting the test tool drops that to 
the 7-10% I mentioned above.
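
For reference, those idle numbers are per-cpu deltas of the idle field 
in /proc/stat over a sampling window.  A rough sketch of that 
measurement (assuming the usual "cpuN user nice system idle iowait irq 
softirq" field layout):

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <unistd.h>

#define MAXCPU 16

/* One snapshot: per-cpu idle jiffies and total jiffies. */
static int snap(long long idle[], long long total[])
{
	FILE *f = fopen("/proc/stat", "r");
	char line[256];
	int n = 0;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f) && n < MAXCPU) {
		long long v[7] = { 0, 0, 0, 0, 0, 0, 0 };
		int cpu, i;

		/* skip the aggregate "cpu " line; we want "cpuN ..." */
		if (strncmp(line, "cpu", 3) != 0 ||
		    !isdigit((unsigned char)line[3]))
			continue;
		if (sscanf(line, "cpu%d %lld %lld %lld %lld %lld %lld %lld",
			   &cpu, &v[0], &v[1], &v[2], &v[3], &v[4], &v[5],
			   &v[6]) < 5)
			continue;
		idle[n] = v[3];		/* 4th field is idle */
		total[n] = 0;
		for (i = 0; i < 7; i++)
			total[n] += v[i];
		n++;
	}
	fclose(f);
	return n;
}

int main(void)
{
	long long i0[MAXCPU], t0[MAXCPU], i1[MAXCPU], t1[MAXCPU];
	int n = snap(i0, t0), c;

	sleep(5);			/* sampling interval */
	snap(i1, t1);
	for (c = 0; c < n; c++)
		printf("cpu%d: %5.1f%% idle\n", c,
		       100.0 * (i1[c] - i0[c]) / (t1[c] - t0[c]));
	return 0;
}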

Based on profiling and instrumentation, it looks like the cost of 
sctp_endpoint_lookup_assoc() more than triples, which means that the 
amount of time that bottom halves are disabled in that function also 
more than triples.
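
To picture why that matters: assuming (as the profile suggests) that 
sctp_endpoint_lookup_assoc() is doing a linear walk of the endpoint's 
association list with bottom halves disabled, the time spent in that 
BH-off window scales directly with the list length.  A quick userspace 
analogy of the effect (purely illustrative, not kernel code):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct assoc {
	int id;
	struct assoc *next;
};

static struct assoc *build(int n)
{
	struct assoc *head = NULL;
	int i;

	for (i = 0; i < n; i++) {
		struct assoc *a = malloc(sizeof(*a));

		a->id = i;
		a->next = head;		/* id 0 ends up at the tail */
		head = a;
	}
	return head;
}

static volatile int sink;		/* defeat dead-code elimination */

static void lookup(struct assoc *head, int id)
{
	struct assoc *a;

	for (a = head; a; a = a->next)
		if (a->id == id) {
			sink = a->id;
			return;
		}
}

int main(void)
{
	int sizes[] = { 100, 300 };
	int s, i, iters = 200000;

	for (s = 0; s < 2; s++) {
		struct assoc *head = build(sizes[s]);
		clock_t t0 = clock();

		for (i = 0; i < iters; i++)
			lookup(head, 0);	/* worst case: tail hit */
		printf("%3d assocs: %.0f ns/lookup\n", sizes[s],
		       (clock() - t0) * 1e9 / CLOCKS_PER_SEC / iters);
	}
	return 0;
}

Going from 100 to 300 entries should roughly triple the per-lookup 
time, which is the same shape as the 3x increase we measured.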