Date:	Sat, 8 Mar 2014 19:30:40 -0500
From:	Ming Chen <v.mingchen@...il.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	netdev@...r.kernel.org, Erez Zadok <ezk@....cs.sunysb.edu>,
	Dean Hildebrand <dhildeb@...ibm.com>,
	Geoff Kuenning <geoff@...hmc.edu>
Subject: Re: [BUG?] ixgbe: only num_online_cpus() of the tx queues are enabled

Hi Eric,

Thanks for the suggestion. I believe "tc qdisc replace dev eth0 root
fq" can achieve fairness if we have only one queue. My understanding
is that we cannot directly apply FQ to a multiqueue device, is that
right? If we apply FQ separately to each tx queue, then what happens
if we have one flow (fl-0) in tx-0 but two flows (fl-1 and fl-2) in
tx-1? With FQ, the two flows in tx-1 should get the same bandwidth.
But what about fl-0 and fl-1?
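
For example, is the intended multiqueue setup something like keeping
the mq scheduler at the root and attaching one fq instance per tx
queue? A rough sketch of what I mean (the device name and class
handles below are just placeholders):

  tc qdisc replace dev eth0 root handle 1: mq
  tc qdisc replace dev eth0 parent 1:1 fq
  tc qdisc replace dev eth0 parent 1:2 fq
  # ...one fq child per enabled tx queue

If so, I can see how that gives fairness among the flows that hash
into the same tx queue, but it still leaves the fl-0 vs. fl-1 case
above.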

Best,
Ming

On Sat, Mar 8, 2014 at 10:27 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Sat, 2014-03-08 at 01:13 -0500, Ming Chen wrote:
>> Hi,
>>
>> We have an Intel 82599EB dual-port 10GbE NIC, which has 128 tx queues
>> (64 per port; we used only one port). We found that only 12 of the tx
>> queues are enabled, where 12 is the number of CPUs on our system.
>>
>> We realized that, in the driver code, adapter->num_tx_queues (which
>> determines netdev->real_num_tx_queues) is indirectly set to "min_t(int,
>> IXGBE_MAX_RSS_INDICES, num_online_cpus())". It looks like this limit is
>> meant for RSS. But why is the number of tx queues also capped at the
>> same value as the rx queues?
>>
>> The problem with having a small number of tx queues is a high
>> probability of hash collisions in skb_tx_hash(). If we have a small
>> number of long-lived, data-intensive TCP flows, the hash collisions
>> can cause unfairness. We found this problem while benchmarking NFS,
>> where identical NFS clients were getting very different throughput
>> when reading a big file from the server. We call this problem
>> Hash-Cast. If interested, you can take a look at this poster:
>> http://www.fsl.cs.sunysb.edu/~mchen/fast14poster-hashcast-portrait.pdf
>>
>> Can anybody take a look at this? It would be better to have all tx
>> queues enabled by default. If this is unlikely to happen, is there a
>> way to reconfigure the NIC so that we can use all tx queues if we
>> want?
>>
>> FYI, our kernel version is 3.12.0, but I found the same limit on tx
>> queues in the code of the latest kernel. I am counting the number of
>> enabled tx queues using "ls /sys/class/net/p3p1/queues | grep -c tx-"
>>
>> Best,
>
> Quite frankly, with a 1GbE link, I would just use FQ and your problem
> would disappear.
>
> (I also use FQ with 40GbE links, if that matters.)
>
> For a 1GbE link, the following command is more than enough.
>
> tc qdisc replace dev eth0 root fq
>
> Also, the following patch would probably help fairness. I'll submit an
> official and more complete patch later.
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index bc0fb0fc7552..296c201516d1 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1911,8 +1911,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
>                  * of queued bytes to ensure line rate.
>                  * One example is wifi aggregation (802.11 AMPDU)
>                  */
> -               limit = max_t(unsigned int, sysctl_tcp_limit_output_bytes,
> -                             sk->sk_pacing_rate >> 10);
> +               limit = 2 * skb->truesize;
>
>                 if (atomic_read(&sk->sk_wmem_alloc) > limit) {
>                         set_bit(TSQ_THROTTLED, &tp->tsq_flags);
>
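
Back to the hash collisions mentioned above: as a rough sanity check,
here is a quick back-of-the-envelope sketch. It assumes flows hash
independently and uniformly into the 12 enabled tx queues, which is an
idealization of skb_tx_hash():

  awk 'BEGIN { q = 12; n = 5; p = 1.0;
               for (i = 0; i < n; i++) p *= (q - i) / q;
               printf("P(no collision, %d flows, %d queues) = %.2f\n", n, q, p) }'

With just 5 long-lived flows and 12 tx queues, the chance that at
least two flows share a queue already comes out around 62%.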
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
