netdev - Re: XPS configuration question (on tg3)

Message-ID: <aa6714ed-73f4-4416-53fa-81babe9916fa@ziu.info>
Date:   Wed, 7 Sep 2016 09:13:35 +0200
From:   Michal Soltys <soltys@....info>
To:     Eric Dumazet <eric.dumazet@...il.com>
Cc:     Alexander Duyck <alexander.duyck@...il.com>,
        Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: XPS configuration question (on tg3)

On 2016-09-07 02:19, Eric Dumazet wrote:
> On Tue, 2016-09-06 at 23:00 +0200, Michal Soltys wrote:
>> On 2016-09-06 22:21, Alexander Duyck wrote:
>> > On Tue, Sep 6, 2016 at 11:46 AM, Michal Soltys <soltys@....info> wrote:
>> >> Hi,
>> >>
>> >> I've been testing different configurations and I didn't manage to get XPS to "behave" correctly, so I'm probably misunderstanding or forgetting something. The NIC in question (under the tg3 driver; BCM5720 and BCM5719 models) was configured with 3 tx and 4 rx queues. 3 irqs were shared (tx and rx), 1 was unused (this got me scratching my head a bit) and the remaining one was for the last rx (though due to another bug, recently fixed, the 4th rx queue was not configurable on the receive side). The names were: eth1b-0, eth1b-txrx-1, eth1b-txrx-2, eth1b-txrx-3, eth1b-rx-4.
>> >>
>> >> The XPS was configured as:
>> >>
>> >> echo f >/sys/class/net/eth1b/queues/tx-0/xps_cpus
>> >> echo f0 >/sys/class/net/eth1b/queues/tx-1/xps_cpus
>> >> echo ff00 >/sys/class/net/eth1b/queues/tx-2/xps_cpus
>> >>
>> >> So as far as I understand - cpus 0-3 should be allowed to use tx-0 queue only, 4-7 tx-1 and 8-15 tx-2.
>> >>
>> >> Just in case rx side could get in the way as far as flows go, relevant irqs were pinned to specific cpus - txrx-1 to 2, txrx-2 to 4, txrx-3 to 10 - falling into groups defined by the above masks.
>> >>
>> >> I tested both with mq and multiq scheduler, essentially either this:
>> >>
>> >> qdisc mq 2: root
>> >> qdisc pfifo_fast 0: parent 2:1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> >> qdisc pfifo_fast 0: parent 2:2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> >> qdisc pfifo_fast 0: parent 2:3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> >>
>> >> or this (for the record, skbaction queue_mapping was behaving correctly with the one below):
>> >>
>> >> qdisc multiq 3: root refcnt 6 bands 3/5
>> >> qdisc pfifo_fast 31: parent 3:1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> >> qdisc pfifo_fast 32: parent 3:2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> >> qdisc pfifo_fast 33: parent 3:3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>> >>
>> >> Now, do I understand correctly, that under the above setup - commands such as
>> >>
>> >> taskset 400 nc -p $prt host_ip 12345 </dev/zero
>> >> or
>> >> yancat -i /dev/zero -o t:host_ip:12345 -u 10 -U 10
>> >>
>> >> In other words, pinning a simple nc command to cpu #10 (or using a tool that supports affinity by itself) and sending data to some other host on the net should *always* use the tx-2 queue?
>> >> I also tested variation such as: taskset 400 nc -l -p host_ip 12345 </dev/zero (just in case taskset was "too late" with the affinity).
>> >>
>> >> In my case, the queue it used was basically random (on top of that, it sometimes changed queues mid-transfer), which could easily be confirmed through both /proc/interrupts and tc -s qdisc show. I'm a bit at a loss now, as I thought the xps configuration should be absolute.
>> >>
>> >> Well, I'd be grateful for some pointers / hints.
>> > 
>> > So it sounds like you have everything configured correctly.  The one
>> > question I would have is if we are certain the CPU pinning is working
>> > for the application.  You might try using something like perf to
>> > verify what is running on CPU 10, and what is running on the CPUs that
>> > the queues are associated with.
>> > 
>> 
>> I did verify with 'top' in this case. I'll double check tomorrow just to
>> be sure. Other than testing, there was nothing else running on the machine.
>> 
>> > Also after you have configured things you may want to double check and
>> > verify the xps_cpus value is still set.  I know under some
>> > circumstances the value can be reset by a device driver if the number
>> > of queues changes, or if the interface toggles between being
>> > administratively up/down.
>> 
>> Hmm, none of this was happening during tests.
>> 
>> Are there any other circumstances where xps settings could be ignored or
>> changed during the test (that is during the actual transfer, not between
>> separate attempts) ?
>> 
>> One thing I'm a bit afraid of is that the kernel was not exactly the newest
>> (3.16), so maybe I'm missing some crucial fixes, though xps was added much
>> earlier than that. Either way, I'll try to redo the tests with a current
>> kernel tomorrow.
>> 
> 
> Keep in mind that TCP stack can send packets, responding to incoming
> ACK.
> 
> So you might check that incoming ACK are handled by the 'right' cpu.
> 
> Without RFS, there is no such guarantee.
> 
> echo 32768 >/proc/sys/net/core/rps_sock_flow_entries
> echo 8192 >/sys/class/net/eth1/queues/rx-0/rps_flow_cnt
> echo 8192 >/sys/class/net/eth1/queues/rx-1/rps_flow_cnt
> echo 8192 >/sys/class/net/eth1/queues/rx-2/rps_flow_cnt
> echo 8192 >/sys/class/net/eth1/queues/rx-3/rps_flow_cnt
> 

I do need to enable RPS as well (queues/rx-.../rps_cpus) before RFS can
take any effect, right?
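
For reference, the mask arithmetic from the quoted setup can be double-checked with a small shell sketch. The function names below are hypothetical helpers, not part of any existing tool, and the sketch only evaluates the bitmasks; it says nothing about which queue the kernel actually picks at runtime. It assumes masks are given as hex strings without a 0x prefix, as written to xps_cpus, and CPU numbers small enough to fit the shell's integer width.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Print the hex mask with only bit <cpu> set (the form taskset accepts).
cpu_to_mask() {
    printf '%x\n' $(( 1 << $1 ))
}

# Succeed if hex mask <mask> (as written to xps_cpus) includes CPU <cpu>.
mask_has_cpu() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Print the first tx queue whose mask contains <cpu>; fail if none does.
# Usage: xps_queue_for_cpu <cpu> <mask-tx-0> <mask-tx-1> ...
xps_queue_for_cpu() {
    cpu=$1
    shift
    q=0
    for m in "$@"; do
        if mask_has_cpu "$m" "$cpu"; then
            echo "tx-$q"
            return 0
        fi
        q=$(( q + 1 ))
    done
    return 1
}

cpu_to_mask 10                    # prints 400
xps_queue_for_cpu 10 f f0 ff00    # prints tx-2
```

With the masks f, f0 and ff00 this confirms that taskset 400 means cpu #10, and that cpu #10 falls only into the tx-2 mask.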