Date:	Fri, 11 Jul 2014 14:11:55 +0000
From:	"Skidmore, Donald C" <donald.c.skidmore@...el.com>
To:	Carlos Carvalho <carlos@...ica.ufpr.br>,
	Flavio Leitner <fbl@...hat.com>
CC:	Tom Herbert <therbert@...gle.com>,
	Linux Netdev List <netdev@...r.kernel.org>
Subject: RE: RSS is not efficient when forwarding (ixgbe)



> -----Original Message-----
> From: Carlos Carvalho [mailto:carlos@...ica.ufpr.br]
> Sent: Thursday, July 10, 2014 5:12 PM
> To: Flavio Leitner
> Cc: Skidmore, Donald C; Tom Herbert; Linux Netdev List
> Subject: Re: RSS is not efficient when forwarding (ixgbe)
> 
> Flavio Leitner (fbl@...hat.com) wrote on 9 July 2014 22:14:
>  >On Wed, Jul 09, 2014 at 09:08:27PM -0300, Carlos Carvalho wrote:
>  >> Flavio Leitner (fbl@...hat.com) wrote on 9 July 2014 02:22:
>  >>  >On Tue, Jul 08, 2014 at 02:32:43PM -0300, Carlos Carvalho wrote:
>  >>  >> Flavio Leitner (fbl@...hat.com) wrote on 8 July 2014 14:21:
>  >>  >>  >On Tue, Jul 08, 2014 at 02:09:13PM -0300, Carlos Carvalho wrote:
>  >>  >>  >> Flavio Leitner (fbl@...hat.com) wrote on 7 July 2014 21:28:
>  >>  >>  >>  >>  >On Mon, Jul 07, 2014 at 04:33:24PM +0000, Skidmore, Donald C wrote:
>  >>  >>  >>  >>
>  >>  >>  >>  >>
>  >>  >>  >>  >> >
>  >>  >>  >>  >> > It's a router forwarding traffic from one interface to another, so I guess it's
>  >>  >>  >>  >> > only the kernel. BTW, no firewall.
>  >>  >>  >>  >> >
>  >>  >>  >>  >> > Flow Director needs to be enabled and I am using defaults.
>  >>  >>  >>  >>
>  >>  >>  >>  >> Flow Director in ATR mode is on by default for ixgbe.  So like
>  >>  >>  >>  >> Tom mentioned the driver will create hash buckets for egress packets.  You
>  >>  >>  >>  >> could try disabling ATR and just use RSS.  Which would probably be the right
>  >>  >>  >>  >> thing to do any way since Flow Director isn't very useful for routing scenarios.
>  >>  >>  >>  >
>  >>  >>  >>  >That was it.
>  >>  >>  >>
>  >>  >>  >> We have a similar setup and similar problem. How do we disable ATR? I
>  >>  >>  >> tried to set ntuple off but this almost zeroed traffic. I also tried
>  >>  >>  >> to change rx-flow-hash but ethtool says it's not possible. The docs
>  >>  >>  >> say that one can disable ATR by setting AtrSampleRate to 0 but this
>  >>  >>  >> parameter doesn't exist in 3.14.10.
>  >>  >>  >>
>  >>  >>  >> So, how do we disable ATR and keep RSS?
>  >>  >>  >
>  >>  >>  >Keep in mind that this is actually 2 problems. One is enabling the
>  >>  >>  >NIC to receive the streams in all queues for this scenario (setting
>  >>  >>  >ntuple off and restarting the traffic works for me). The second problem
>  >>  >>  >is having all the queue interrupts spread among the CPUs. That's what
>  >>  >>  >does irqbalance, tuna, etc...
>  >>  >>
>  >>  >> Spreading the interrupts among the cpus is not the issue for us. The
>  >>  >> problem is that the number of interrupts is *very* different among the
>  >>  >> irq's, so no matter how I distribute them among cores there will
>  >>  >> always be a few that get saturated while 70% of the machine capacity
>  >>  >> remains idle. Your case seems to be the extreme of ours, where all
>  >>  >> the flux goes to a single irq.
>  >>  >>
>  >>  >> The problem is in how the NIC distributes traffic among the irq's in
>  >>  >> the router. Traffic comes almost only from a single machine and
>  >>  >> spreads through several thousand destinations in the internet. That's
>  >>  >> why I tried to set the receiving hash mode to the destination IP, but
>  >>  >> the NIC or driver refuses. So how do I even out the frequency of irq's?
>  >>  >
>  >>  >So you see the traffic going to a few queues only and the rest is
>  >>  >idle, is that correct?  If so, then RSS seems to be working, but
>  >>  >since all the traffic comes from one server and likely one port,
>  >>  >maybe the hash is not good enough to distribute among all queues.
>  >>  >I'd try using software hashing instead of hw hashing to see if it
>  >>  >helps:
>  >>  ># ethtool -K <iface> rxhash off
>  >>
>  >> I'm all for using software instead of hardware. However, in this case
>  >> this is a fundamental function of the NIC, to distribute the load
>  >> among cores; if we do it via software, a single core (or subset of
>  >> them) will have to do all the work. So in this particular case I think
>  >> the correct way is to try to do it in the NIC.
>  >
>  >Actually no, you can use RPS with software hashing to distribute the
>  >workload.  Take a look at Documentation/networking/scaling.txt for
>  >more details.
> 
> Thanks for the pointer. I'll try that if there's no way to do it on the NIC.
> 
>  >>  >BTW, there was a typo in my previous post, I had to turn on ntuple to
>  >>  >disable ATR.
>  >>
>  >> Ah. Here it was off by default, which contradicts what Donald said
>  >> above... I turned it on, and nothing changed(?!). The nic was reset
>  >> but the distribution among queues is the same.
>  >
>  >ntuple is about Perfect Filters and it's OFF by default which leaves
>  >ATR mode ON. In my case, ATR mode was responsible for directing all
>  >the packets to a single queue. Once I enable ntuple, the driver
>  >disables ATR and that works for me.
> 
> There does seem to be some confusion about this. However this thread is
> getting long and since it makes no difference for us I'd rather focus on our
> problem.
> 
>  >> I checked now and in fact the distribution is almost constant among
>  >> 16 IRQ's. That's the problem, because it leaves the other 24
>  >> cores idle. Not completely, but the difference is 4 orders of
>  >> magnitude: 5.41e+08 interruptions in the active 16 IRQ's versus
>  >> 6.36e+04 in the others. So 60% of the machine is just contributing to
>  >> global warming and, importantly, limiting our performance :-(
>  >
>  >How many NIC queues do you have?  It sounds like you have only 16,
>  >so you're limited by the number of queues which maps to 16 CPUs.
> 
> 40 for each NIC. The driver seems to configure the number of queues equal
> to the number of cores; in another machine with the same NIC but
> 32 cores there are 32 queues.
> 
> Does hyperthreading make a difference? There are actually only 20 cores. It
> seems that hyperthreading doesn't help for NIC interrupt processing, so the
> card/kernel could just be ignoring the virtual cores. However, in this case it
> should not allocate queues for the virtual cores. Also, the number of
> interruptions in the virtual cores should be zero, but it isn't.
> 
> Further, why doesn't the NIC use all of the 20 real cores? Is it limited to
> power of 2 cores?

Hey Guys,

This is getting a bit difficult to read, so I will just reply down here. :)

The ixgbe driver by default runs with ATR and RSS enabled.  ATR is one mode of Flow Director; the other mode, Perfect Filter, is enabled, as mentioned above, by ntuple.  ATR stands for Application Targeted Receive, which means we (the driver) create a hash bucket for every SYN packet transmitted, and returning packets of that flow will be scheduled to the same queue.  Since that packet could be created on any core, we need one queue per core (including hyper-threads).  Any received packet that does not match an ATR hash bucket is then scheduled via RSS.
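
For reference, the knob that has worked in this thread is the ntuple feature flag in ethtool: on ixgbe, turning ntuple on puts the driver into Perfect Filter mode and disables ATR, and turning it back off re-enables ATR.  Rough sketch only ("<iface>" is a placeholder, and exact output varies by ethtool version):

    # see whether ntuple (Perfect Filter mode) is currently on or off
    ethtool -k <iface> | grep ntuple

    # turn ntuple on, which on ixgbe disables ATR;
    # "ethtool -K <iface> ntuple off" brings ATR back
    ethtool -K <iface> ntuple on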

I believe you mentioned you were forwarding traffic; as you can imagine, ATR is not that useful for traffic that never reaches user space.  As others suggested, you could just turn off ATR, and then all packets would be assigned a queue via RSS.  Our RSS hash produces an output that is 4 bits long, so the most we can spread the load over is 16 queues.  This is why you only see 16 interrupts in use when you rely on RSS alone.
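
If you want to see how the hardware is spreading flows, and then fan the per-queue work out to more cores in software, RPS (which Flavio already pointed at via Documentation/networking/scaling.txt) is the usual answer.  A rough sketch, with eth0 and the CPU mask as made-up examples:

    # dump the RSS indirection table: which RX queue each hash value maps to
    ethtool -x eth0

    # let RPS spread packets from RX queue 0 across CPUs 0-7 (bitmask 0xff);
    # repeat for each rx-<n> queue you want to fan out
    echo ff > /sys/class/net/eth0/queues/rx-0/rps_cpus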

If you wanted to use all the queues, and thus all the cores, you could create your own filtering rules with Perfect Filter.  This may be tricky, as you would need a pretty good understanding of your flow patterns to balance the load evenly.
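
As a rough illustration only (the address and queue number below are made up, and mask/flow-type support depends on your ethtool and driver versions, see ethtool(8)), a Perfect Filter rule steering one destination to a specific queue would look something like:

    # steer IPv4 TCP traffic for one destination address to RX queue 20
    ethtool -N <iface> flow-type tcp4 dst-ip 198.51.100.7 action 20

    # list the rules that are currently installed
    ethtool -u <iface>

With traffic spread over thousands of destinations you would probably want masked/subnet rules rather than per-host ones, which is where knowing your flow patterns really matters.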

Thanks,
-Don Skidmore <donald.c.skidmore@...el.com>

