netdev - Re: tx queue hashing hot-spots and poor performance (multiq, ixgbe)

Message-ID: <606676310904302119w92e610crbdb7d5d824e6ed01@mail.gmail.com>
Date:	Thu, 30 Apr 2009 21:19:36 -0700
From:	Andrew Dickinson <andrew@...dna.net>
To:	David Miller <davem@...emloft.net>
Cc:	jelaas@...il.com, netdev@...r.kernel.org
Subject: Re: tx queue hashing hot-spots and poor performance (multiq, ixgbe)

Adding a bit more info...

I should add that the other 4 ksoftirqd threads _are_ running; they're
just not busy. (In case that wasn't clear...)

Also of note: I rebooted the box (after recompiling with NUMA off).
This time, when I pushed traffic through, only the even-numbered
ksoftirqd threads were busy.  I then tweaked some of the ring settings
via ethtool, and suddenly the odd-numbered ones became busy (and the
even ones went idle).

Thoughts?  Suggestions?  Driver issue?  I'm running 2.6.30-rc3.

(BTW, I'm assuming that since only 4 of the 8 ksoftirqd threads are
busy, I still have room to make this box go faster.)

-A


On Thu, Apr 30, 2009 at 4:53 PM, Andrew Dickinson <andrew@...dna.net> wrote:
> OK... I've got some more data on it...
>
> I passed a small number of packets through the system and added a ton
> of printks to it ;-P
>
> Here's the distribution of values as seen by
> skb_rx_queue_recorded()... count on the left, value on the right:
>     37 0
>     31 1
>     31 2
>     39 3
>     37 4
>     31 5
>     42 6
>     39 7
>
> That's nice and even....  Here's what's getting returned from the
> skb_tx_hash().  Again, count on the left, value on the right:
>     31 0
>     81 1
>     37 2
>     70 3
>     37 4
>     31 6
>
> Note that we're entirely missing 5 and 7 and that those interrupts
> seem to have gotten munged onto 1 and 3.
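A gap like this is what you would expect, not bad luck, under one simplifying assumption: the 8 hashed rx queue values behave like independent, uniformly random picks among the 8 tx queues.

```latex
\Pr[\text{all 8 tx queues distinct}] = \frac{8!}{8^8} = \frac{40320}{16777216} \approx 0.0024
\qquad
\mathbb{E}[\#\text{unused tx queues}] = 8\left(1 - \tfrac{1}{8}\right)^{8} = 8 \cdot \frac{7^8}{8^8} \approx 2.75
```

Under that model, 2 or 3 idle tx queues would be typical. That matches the two (5 and 7) missing here.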
>
> I think the voodoo lies within:
>    return (u16) (((u64) hash * dev->real_num_tx_queues) >> 32);
>
> David,  I made the change that you suggested:
>        //hash = skb_get_rx_queue(skb);
>        return skb_get_rx_queue(skb) % dev->real_num_tx_queues;
>
> And now, I see a nice even mixing of interrupts on the TX side (yay!).
>
> However, my problem's not solved entirely... here's what top is showing me:
> top - 23:37:49 up 9 min,  1 user,  load average: 3.93, 2.68, 1.21
> Tasks: 119 total,   5 running, 114 sleeping,   0 stopped,   0 zombie
> Cpu0  :  0.0%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.3%hi,  0.3%si,  0.0%st
> Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  4.3%hi, 95.7%si,  0.0%st
> Cpu2  :  0.0%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  4.3%hi, 95.7%si,  0.0%st
> Cpu4  :  0.0%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.3%hi,  0.3%si,  0.0%st
> Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,  2.0%id,  0.0%wa,  4.0%hi, 94.0%si,  0.0%st
> Cpu6  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,  5.6%id,  0.0%wa,  2.3%hi, 92.1%si,  0.0%st
> Mem:  16403476k total,   335884k used, 16067592k free,    10108k buffers
> Swap:  2096472k total,        0k used,  2096472k free,   146364k cached
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>    7 root      15  -5     0    0    0 R 100.2  0.0   5:35.24 ksoftirqd/1
>   13 root      15  -5     0    0    0 R 100.2  0.0   5:36.98 ksoftirqd/3
>   19 root      15  -5     0    0    0 R 97.8  0.0   5:34.52 ksoftirqd/5
>   25 root      15  -5     0    0    0 R 94.5  0.0   5:13.56 ksoftirqd/7
>  3905 root      20   0 12612 1084  820 R  0.3  0.0   0:00.14 top
> <snip>
>
>
> It appears that only the odd CPUs are actually handling the
> interrupts, which doesn't jibe with what /proc/interrupts shows me:
>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
>  66:    2970565          0          0          0          0          0          0          0   PCI-MSI-edge    eth2-rx-0
>  67:         28     821122          0          0          0          0          0          0   PCI-MSI-edge    eth2-rx-1
>  68:         28          0    2943299          0          0          0          0          0   PCI-MSI-edge    eth2-rx-2
>  69:         28          0          0     817776          0          0          0          0   PCI-MSI-edge    eth2-rx-3
>  70:         28          0          0          0    2963924          0          0          0   PCI-MSI-edge    eth2-rx-4
>  71:         28          0          0          0          0     821032          0          0   PCI-MSI-edge    eth2-rx-5
>  72:         28          0          0          0          0          0    2979987          0   PCI-MSI-edge    eth2-rx-6
>  73:         28          0          0          0          0          0          0     845422   PCI-MSI-edge    eth2-rx-7
>  74:    4664732          0          0          0          0          0          0          0   PCI-MSI-edge    eth2-tx-0
>  75:         34    4679312          0          0          0          0          0          0   PCI-MSI-edge    eth2-tx-1
>  76:         28          0    4665014          0          0          0          0          0   PCI-MSI-edge    eth2-tx-2
>  77:         28          0          0    4681531          0          0          0          0   PCI-MSI-edge    eth2-tx-3
>  78:         28          0          0          0    4665793          0          0          0   PCI-MSI-edge    eth2-tx-4
>  79:         28          0          0          0          0    4671596          0          0   PCI-MSI-edge    eth2-tx-5
>  80:         28          0          0          0          0          0    4665279          0   PCI-MSI-edge    eth2-tx-6
>  81:         28          0          0          0          0          0          0    4664504   PCI-MSI-edge    eth2-tx-7
>  82:          2          0          0          0          0          0          0          0   PCI-MSI-edge    eth2:lsc
>
>
> Why would ksoftirqd only run on half of the cores (and only the odd
> ones, to boot)?  The one commonality that strikes me is that all the
> odd-numbered CPUs are on the same physical processor:
>
> -bash-3.2# cat /proc/cpuinfo | grep -E '(physical|processor)' | grep -v virtual
> processor       : 0
> physical id     : 0
> processor       : 1
> physical id     : 1
> processor       : 2
> physical id     : 0
> processor       : 3
> physical id     : 1
> processor       : 4
> physical id     : 0
> processor       : 5
> physical id     : 1
> processor       : 6
> physical id     : 0
> processor       : 7
> physical id     : 1
>
> I did compile the kernel with NUMA support... am I being bitten by
> something there?  Any other thoughts on where I should look?
>
> Also... is there an incantation to get NAPI to work in the Torvalds
> kernel?  As you can see, I'm generating quite a few interrupts.
>
> -A
>
>
> On Thu, Apr 30, 2009 at 7:08 AM, David Miller <davem@...emloft.net> wrote:
>> From: Andrew Dickinson <andrew@...dna.net>
>> Date: Thu, 30 Apr 2009 07:04:33 -0700
>>
>>>  I'll do some debugging around skb_tx_hash() and see if I can make
>>> sense of it.  I'll let you know what I find.  My hypothesis is that
>>> skb_record_rx_queue() isn't being called, but I should dig into it
>>> before I start making claims. ;-P
>>
>> That's one possibility.
>>
>> Another is that the hashing isn't working out.  One way to
>> play with that is to simply replace the:
>>
>>                hash = skb_get_rx_queue(skb);
>>
>> in skb_tx_hash() with something like:
>>
>>                return skb_get_rx_queue(skb) % dev->real_num_tx_queues;
>>
>> and see if that improves the situation.
>>
>