Date:	Tue, 18 Aug 2009 14:26:20 -0700
From:	Ron Mercer <ron.mercer@...gic.com>
To:	David Miller <davem@...emloft.net>
Cc:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [RFC net-next PATCH 0/4] qlge: Performance changes for qlge.

Dave,
Thanks for the quick feedback.  I will re-spin per my comments below.

> 
> > 1) Do TX completions in send path (with cleaner timer).
> 
> You should really do them in NAPI context.
> 
> When you do them from hardware interrupt context, they all
> get rescheduled into a softirq for the real SKB freeing
> work anyways.
> 
> So by doing it in NAPI poll, you're avoiding some needless
> overhead.
> 
> BTW, it's insanely confusing that there is a function called
> qlge_msix_tx_isr() that of all things does RX work :-/
> 

I tried to structure the patch series as a logical progression, but that
may have made it more confusing.  Patch 1 moves TX completion processing
into hardware interrupt context (as you pointed out), and patch 2 then
moves it from interrupt context to the send path, as many drivers do.
It wasn't my intention to do the processing in the ISR.  Sorry about
the confusion.
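
To make sure we are talking about the same thing, here is roughly the
shape I understand you to be suggesting.  All names below are an
illustrative sketch, not the actual qlge symbols, and the helpers
(my_process_tx_completions, my_process_rx, my_reenable_interrupts) are
assumed to exist elsewhere in the driver:

	#include <linux/netdevice.h>

	struct my_tx_ring;			/* illustrative */

	struct my_rx_ring {			/* illustrative */
		struct napi_struct napi;
		struct my_tx_ring *tx_ring;
	};

	static int my_napi_poll(struct napi_struct *napi, int budget)
	{
		struct my_rx_ring *rx_ring =
			container_of(napi, struct my_rx_ring, napi);
		int work_done;

		/* Reclaim completed TX descriptors first.  NAPI poll
		 * already runs in softirq context, so the SKBs can be
		 * freed directly with dev_kfree_skb() instead of being
		 * rescheduled as they would be from hard IRQ context. */
		my_process_tx_completions(rx_ring->tx_ring);

		/* Then do the usual RX work within the budget. */
		work_done = my_process_rx(rx_ring, budget);

		if (work_done < budget) {
			napi_complete(napi);
			my_reenable_interrupts(rx_ring);
		}
		return work_done;
	}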


> > 2) Change RSS queue count to match MSIx vector count instead
> >    of CPU count.  Some platforms didn't offer enough vectors
> >    for our previous approach.
> 
> Ideally you want "max(num_msix_vectors, num_cpus)" because
> if you hook up more MSIX vectors than you have cpus it's just
> extra overhead and depending upon the discrepancy between the
> two counts it might unevenly distribute traffic work amongst
> the cpus.

I think you mean "min(num_msix_vectors, num_cpus)".  That is what I'm
trying to do in the patch.  I will clean it up and improve the comments
before I resubmit.
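
Concretely, the respin computes it along these lines (the struct and
field names here are illustrative, not the actual qlge code):

	#include <linux/cpumask.h>	/* num_online_cpus() */
	#include <linux/kernel.h>	/* min_t() */

	static void my_set_rss_ring_count(struct my_adapter *adapter)
	{
		/* No point in more RSS queues than either the MSI-X
		 * vectors the platform granted or the online CPUs;
		 * the excess only adds overhead and can distribute
		 * work unevenly across the CPUs. */
		adapter->rss_ring_count = min_t(int,
						adapter->msix_vectors_granted,
						num_online_cpus());
	}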

> 
> > 3) Change large RX buffer logic to use either multiple pages
> >    or chunks of pages based on MTU and system page size.
> > 
> >    Examples:
> > 
> >    64K pages with 1500 MTU.  The RX buffer size would be
> >    2048 bytes and there would be 32 per page.
> > 
> >    4K pages with 9000 MTU.  The RX buffer size would be 16K,
> >    or 4 pages per buffer.
> 
> This is wasteful; does the card have a mechanism by which it
> can dynamically carve up pages depending upon the actual
> frame size?
> 
> If anything, make sure that skb->truesize gets set to something
> reasonable, or else TCP is going to reallocate SKBs when the
> receive queue limits are hit.
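
Good point on skb->truesize; I will make sure it accounts for the full
buffer.  For reference, the sizing rule behind the examples above
amounts to rounding the worst-case frame up to a power of two (the
helper name is illustrative, not actual qlge code):

	#include <linux/if_ether.h>	/* ETH_HLEN */
	#include <linux/if_vlan.h>	/* VLAN_HLEN */
	#include <linux/log2.h>		/* roundup_pow_of_two() */

	static unsigned int my_rx_buf_size(unsigned int mtu)
	{
		/* 1500 MTU -> 2048-byte buffers, 32 per 64K page;
		 * 9000 MTU -> 16K buffers, i.e. 4 pages on a system
		 * with 4K pages.  skb->truesize should then reflect
		 * the whole buffer, not just the received frame. */
		return roundup_pow_of_two(mtu + ETH_HLEN + VLAN_HLEN);
	}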


