Date:   Tue, 06 Sep 2016 17:19:23 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Michal Soltys <soltys@....info>
Cc:     Alexander Duyck <alexander.duyck@...il.com>,
        Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: XPS configuration question (on tg3)

On Tue, 2016-09-06 at 23:00 +0200, Michal Soltys wrote:
> On 2016-09-06 22:21, Alexander Duyck wrote:
> > On Tue, Sep 6, 2016 at 11:46 AM, Michal Soltys <soltys@....info> wrote:
> >> Hi,
> >>
> >> I've been testing different configurations and I didn't manage to get XPS to "behave" correctly - so I'm probably misunderstanding or forgetting something. The NIC in question (under the tg3 driver - BCM5720 and BCM5719 models) was configured with 3 tx and 4 rx queues. 3 irqs were shared (tx and rx), 1 was unused (this got me scratching my head a bit) and the remaining one was for the last rx queue (though due to another recently fixed bug, the 4th rx queue was unconfigurable on the receive side). The names were: eth1b-0, eth1b-txrx-1, eth1b-txrx-2, eth1b-txrx-3, eth1b-rx-4.
> >>
> >> The XPS was configured as:
> >>
> >> echo f >/sys/class/net/eth1b/queues/tx-0/xps_cpus
> >> echo f0 >/sys/class/net/eth1b/queues/tx-1/xps_cpus
> >> echo ff00 >/sys/class/net/eth1b/queues/tx-2/xps_cpus
> >>
> >> So as far as I understand - cpus 0-3 should be allowed to use the tx-0 queue only, cpus 4-7 tx-1, and cpus 8-15 tx-2.
> >>
> >> Just in case the rx side could get in the way as far as flows go, the relevant irqs were pinned to specific cpus - txrx-1 to cpu 2, txrx-2 to cpu 4, txrx-3 to cpu 10 - falling into the groups defined by the above masks.
> >>
> >> I tested with both the mq and multiq schedulers, essentially either this:
> >>
> >> qdisc mq 2: root
> >> qdisc pfifo_fast 0: parent 2:1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> >> qdisc pfifo_fast 0: parent 2:2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> >> qdisc pfifo_fast 0: parent 2:3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> >>
> >> or this (for the record, the skbedit queue_mapping action was behaving correctly with the one below):
> >>
> >> qdisc multiq 3: root refcnt 6 bands 3/5
> >> qdisc pfifo_fast 31: parent 3:1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> >> qdisc pfifo_fast 32: parent 3:2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> >> qdisc pfifo_fast 33: parent 3:3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> >>
> >> Now, do I understand correctly that, under the above setup, commands such as
> >>
> >> taskset 400 nc -p $prt host_ip 12345 </dev/zero
> >> or
> >> yancat -i /dev/zero -o t:host_ip:12345 -u 10 -U 10
> >>
> >> IOW - pinning a simple nc command to cpu #10 (or using a tool that supports affinity by itself) and sending data to some other host on the net - should that *always* use the tx-2 queue?
> >> I also tested a variation such as: taskset 400 nc -l -p host_ip 12345 </dev/zero (just in case taskset was "too late" with the affinity).
> >>
> >> In my case, which queue was used was basically random (on top of that, it sometimes changed the queue mid-transfer), which could be easily confirmed through both /proc/interrupts and tc -s qdisc show. I'm a bit at a loss now, as I thought the XPS configuration was supposed to be absolute.
> >>
> >> Well, I'd be grateful for some pointers / hints.
> > 
> > So it sounds like you have everything configured correctly.  The one
> > question I would have is if we are certain the CPU pinning is working
> > for the application.  You might try using something like perf to
> > verify what is running on CPU 10, and what is running on the CPUs that
> > the queues are associated with.
> > 
> 
> I did verify with 'top' in this case. I'll double check tomorrow just to
> be sure. Other than the tests, nothing else was running on the machine.
> 
> > Also after you have configured things you may want to double check and
> > verify the xps_cpus value is still set.  I know under some
> > circumstances the value can be reset by a device driver if the number
> > of queues changes, or if the interface toggles between being
> > administratively up/down.
> 
> Hmm, none of this was happening during tests.
> 
> Are there any other circumstances where XPS settings could be ignored or
> changed during the test (that is, during the actual transfer, not between
> separate attempts)?
> 
> One thing I'm a bit afraid of is that the kernel was not exactly the newest
> (3.16), so maybe I'm missing some crucial fixes, though XPS was added much
> earlier than that. Either way, I'll try to redo the tests with a current
> kernel tomorrow.
> 
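
Before anything else it is worth double checking that the masks and the
pinning really did stick (a minimal sketch - 'eth1b', the masks and cpu 10
are taken from the setup above; $pid stands for the pid of the sender):

# readback: f -> cpus 0-3 (tx-0), f0 -> cpus 4-7 (tx-1), ff00 -> cpus 8-15 (tx-2)
grep . /sys/class/net/eth1b/queues/tx-*/xps_cpus

# confirm the sending process is really pinned to cpu 10 (mask 400)
taskset -p $pid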

Keep in mind that the TCP stack can send packets on its own, in response to
incoming ACKs.

So you might check that incoming ACKs are handled by the 'right' cpu.

Without RFS, there is no such guarantee.

# global RFS flow table size
echo 32768 >/proc/sys/net/core/rps_sock_flow_entries
# per-queue flow counts, 32768 / 4 rx queues (the interface here is eth1b)
echo 8192 >/sys/class/net/eth1b/queues/rx-0/rps_flow_cnt
echo 8192 >/sys/class/net/eth1b/queues/rx-1/rps_flow_cnt
echo 8192 >/sys/class/net/eth1b/queues/rx-2/rps_flow_cnt
echo 8192 >/sys/class/net/eth1b/queues/rx-3/rps_flow_cnt
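
With RFS in place, a rough way to see whether the ACKs for the flow really
land on the expected cpu (counters only, not exact accounting; 'eth1b' as
above):

watch -n1 'grep eth1b /proc/interrupts'   # per-cpu counts for the rx irqs
tc -s qdisc show dev eth1b                # per-queue tx packet counters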


