Message-ID: <606676310904302119w92e610crbdb7d5d824e6ed01@mail.gmail.com>
Date: Thu, 30 Apr 2009 21:19:36 -0700
From: Andrew Dickinson <andrew@...dna.net>
To: David Miller <davem@...emloft.net>
Cc: jelaas@...il.com, netdev@...r.kernel.org
Subject: Re: tx queue hashing hot-spots and poor performance (multiq, ixgbe)
Adding a bit more info...
I should add that the other 4 ksoftirqd threads _are_ running; they're
just not busy. (In case that wasn't clear...)
Also of note: I rebooted the box (after recompiling with NUMA off).
This time when I pushed traffic through, only the even-numbered
ksoftirqd threads were busy. I then tweaked some of the ring settings
via ethtool and suddenly the odd-numbered ones became busy (and the
even ones went idle).
Thoughts? Suggestions? A driver issue? I'm on 2.6.30-rc3.
(BTW, I'm assuming that since only 4 of the 8 ksoftirqd threads are
busy, I still have room to make this box go faster.)
-A
On Thu, Apr 30, 2009 at 4:53 PM, Andrew Dickinson <andrew@...dna.net> wrote:
> OK... I've got some more data on it...
>
> I passed a small number of packets through the system and added a ton
> of printks to it ;-P
>
> Here's the distribution of values as seen by
> skb_rx_queue_recorded()... count on the left, value on the right:
> 37 0
> 31 1
> 31 2
> 39 3
> 37 4
> 31 5
> 42 6
> 39 7
>
> That's nice and even... Here's what's getting returned from the
> skb_tx_hash(). Again, count on the left, value on the right:
> 31 0
> 81 1
> 37 2
> 70 3
> 37 4
> 31 6
>
> Note that 5 and 7 are missing entirely, and that the traffic that
> should have hashed to them appears to have been folded onto 1 and 3.
>
> I think the voodoo lies within:
> return (u16) (((u64) hash * dev->real_num_tx_queues) >> 32);
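>
> If I'm reading 2.6.30's skb_tx_hash() right, the recorded rx queue is
> first run through jhash_1word() before that multiply-shift. Here's a
> minimal userspace sketch of the effect (mix32() below is a stand-in
> mixer, not the kernel's jhash): hash the eight rx queue numbers, scale
> them into eight tx queues, and you get eight effectively random
> buckets, so collisions (and therefore empty tx queues) are expected:
>
> #include <stdio.h>
> #include <stdint.h>
>
> /* Stand-in 32-bit mixer (NOT jhash_1word(); any decent mixer
>  * shows the same effect). */
> static uint32_t mix32(uint32_t x)
> {
>     x ^= x >> 16;
>     x *= 0x7feb352d;
>     x ^= x >> 15;
>     x *= 0x846ca68b;
>     x ^= x >> 16;
>     return x;
> }
>
> /* The multiply-shift from skb_tx_hash(): map a 32-bit hash
>  * uniformly onto [0, nqueues). */
> static uint16_t scale(uint32_t hash, uint32_t nqueues)
> {
>     return (uint16_t)(((uint64_t)hash * nqueues) >> 32);
> }
>
> int main(void)
> {
>     for (uint32_t rxq = 0; rxq < 8; rxq++)
>         printf("rx queue %u -> tx queue %u\n",
>                rxq, (unsigned)scale(mix32(rxq), 8));
>     return 0;
> }
>
> Throwing 8 hashed values into 8 buckets is a birthday problem: on
> average about 3 of the 8 buckets come up empty, which matches 5 and
> 7 going missing above.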
>
> David, I made the change that you suggested:
> //hash = skb_get_rx_queue(skb);
> return skb_get_rx_queue(skb) % dev->real_num_tx_queues;
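>
> (With real_num_tx_queues == 8 and recorded rx queues 0 through 7, that
> modulo is just the identity mapping, so each rx queue lands 1:1 on the
> matching tx queue.)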
>
> And now, I see a nice even mixing of interrupts on the TX side (yay!).
>
> However, my problem's not solved entirely... here's what top is showing me:
> top - 23:37:49 up 9 min, 1 user, load average: 3.93, 2.68, 1.21
> Tasks: 119 total, 5 running, 114 sleeping, 0 stopped, 0 zombie
> Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.3%hi, 0.3%si, 0.0%st
> Cpu1 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 4.3%hi, 95.7%si, 0.0%st
> Cpu2 : 0.0%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu3 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 4.3%hi, 95.7%si, 0.0%st
> Cpu4 : 0.0%us, 0.0%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.3%hi, 0.3%si, 0.0%st
> Cpu5 : 0.0%us, 0.0%sy, 0.0%ni, 2.0%id, 0.0%wa, 4.0%hi, 94.0%si, 0.0%st
> Cpu6 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu7 : 0.0%us, 0.0%sy, 0.0%ni, 5.6%id, 0.0%wa, 2.3%hi, 92.1%si, 0.0%st
> Mem: 16403476k total, 335884k used, 16067592k free, 10108k buffers
> Swap: 2096472k total, 0k used, 2096472k free, 146364k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
>     7 root      15  -5     0    0    0 R 100.2  0.0   5:35.24 ksoftirqd/1
>    13 root      15  -5     0    0    0 R 100.2  0.0   5:36.98 ksoftirqd/3
>    19 root      15  -5     0    0    0 R  97.8  0.0   5:34.52 ksoftirqd/5
>    25 root      15  -5     0    0    0 R  94.5  0.0   5:13.56 ksoftirqd/7
>  3905 root      20   0 12612 1084  820 R   0.3  0.0   0:00.14 top
> <snip>
>
>
> It appears that only the odd CPUs are actually handling the
> interrupts, which doesn't jibe with what /proc/interrupts shows me:
>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
>  66:    2970565          0          0          0          0          0          0          0   PCI-MSI-edge  eth2-rx-0
>  67:         28     821122          0          0          0          0          0          0   PCI-MSI-edge  eth2-rx-1
>  68:         28          0    2943299          0          0          0          0          0   PCI-MSI-edge  eth2-rx-2
>  69:         28          0          0     817776          0          0          0          0   PCI-MSI-edge  eth2-rx-3
>  70:         28          0          0          0    2963924          0          0          0   PCI-MSI-edge  eth2-rx-4
>  71:         28          0          0          0          0     821032          0          0   PCI-MSI-edge  eth2-rx-5
>  72:         28          0          0          0          0          0    2979987          0   PCI-MSI-edge  eth2-rx-6
>  73:         28          0          0          0          0          0          0     845422   PCI-MSI-edge  eth2-rx-7
>  74:    4664732          0          0          0          0          0          0          0   PCI-MSI-edge  eth2-tx-0
>  75:         34    4679312          0          0          0          0          0          0   PCI-MSI-edge  eth2-tx-1
>  76:         28          0    4665014          0          0          0          0          0   PCI-MSI-edge  eth2-tx-2
>  77:         28          0          0    4681531          0          0          0          0   PCI-MSI-edge  eth2-tx-3
>  78:         28          0          0          0    4665793          0          0          0   PCI-MSI-edge  eth2-tx-4
>  79:         28          0          0          0          0    4671596          0          0   PCI-MSI-edge  eth2-tx-5
>  80:         28          0          0          0          0          0    4665279          0   PCI-MSI-edge  eth2-tx-6
>  81:         28          0          0          0          0          0          0    4664504   PCI-MSI-edge  eth2-tx-7
>  82:          2          0          0          0          0          0          0          0   PCI-MSI-edge  eth2:lsc
>
>
> Why would ksoftirqd only run on half of the cores (and only the odd
> ones to boot)? The one commonality that strikes me is that all the
> odd CPU#'s are on the same physical processor:
>
> -bash-3.2# cat /proc/cpuinfo | grep -E '(physical|processor)' | grep -v virtual
> processor : 0
> physical id : 0
> processor : 1
> physical id : 1
> processor : 2
> physical id : 0
> processor : 3
> physical id : 1
> processor : 4
> physical id : 0
> processor : 5
> physical id : 1
> processor : 6
> physical id : 0
> processor : 7
> physical id : 1
>
> I did compile the kernel with NUMA support... am I being bitten by
> something there? Other thoughts on where I should look?
>
> Also... is there an incantation to get NAPI to work in the torvalds
> kernel? As you can see, I'm generating quite a few interrupts.
>
> -A
>
>
> On Thu, Apr 30, 2009 at 7:08 AM, David Miller <davem@...emloft.net> wrote:
>> From: Andrew Dickinson <andrew@...dna.net>
>> Date: Thu, 30 Apr 2009 07:04:33 -0700
>>
>>> I'll do some debugging around skb_tx_hash() and see if I can make
>>> sense of it. I'll let you know what I find. My hypothesis is that
>>> skb_record_rx_queue() isn't being called, but I should dig into it
>>> before I start making claims. ;-P
>>
>> That's one possibility.
>>
>> Another is that the hashing isn't working out. One way to
>> play with that is to simply replace the:
>>
>> hash = skb_get_rx_queue(skb);
>>
>> in skb_tx_hash() with something like:
>>
>> return skb_get_rx_queue(skb) % dev->real_num_tx_queues;
>>
>> and see if that improves the situation.
>>
>