Message-ID: <CAHA+R7NQgYRG9oCXYB+H_7opcbebJCbpPq_UnYNf0bQ3fEWh+g@mail.gmail.com>
Date: Thu, 12 Mar 2015 16:48:11 -0700
From: Cong Wang <cwang@...pensource.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Willem de Bruijn <willemb@...gle.com>,
Nandita Dukkipati <nanditad@...gle.com>,
Yuchung Cheng <ycheng@...gle.com>
Subject: Re: [PATCH net-next] xps: fix xps for stacked devices
On Tue, Feb 3, 2015 at 11:48 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> From: Eric Dumazet <edumazet@...gle.com>
>
> A typical qdisc setup is the following :
>
> bond0 : bonding device, using HTB hierarchy
> eth1/eth2 : slaves, multiqueue NIC, using MQ + FQ qdisc
>
> XPS allows to spread packets on specific tx queues, based on the cpu
> doing the send.
>
> Problem is that dequeues from bond0 qdisc can happen on random cpus,
> due to the fact that qdisc_run() can dequeue a batch of packets.
>
> CPUA -> queue packet P1 on bond0 qdisc, P1->ooo_okay=1
> CPUA -> queue packet P2 on bond0 qdisc, P2->ooo_okay=0
>
> CPUB -> dequeue packet P1 from bond0
> enqueue packet on eth1/eth2
> CPUC -> dequeue packet P2 from bond0
> enqueue packet on eth1/eth2 using sk cache (ooo_okay is 0)
>
> get_xps_queue() then might select wrong queue for P1, since current cpu
> might be different than CPUA.
>
> P2 might be sent on the old queue (stored in sk->sk_tx_queue_mapping),
> if CPUC runs a bit faster (or CPUB spins a bit on qdisc lock)
>
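A rough model of the quoted scenario may help when reading the questions below. It is only a sketch: the XPS map, the old queue number 3, and the names Sock, Packet, pick_tx_queue and run are hypothetical, and the selection logic is a simplification of what the transmit path does on the slave device (use the queue cached in the socket unless ooo_okay is set, otherwise ask XPS for the queue of the CPU that is running right now).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical XPS map for a slave device: CPU running the xmit path -> tx queue.
XPS_MAP = {"CPUA": 0, "CPUB": 1, "CPUC": 2}
# Hypothetical queue already cached in the socket before P1 and P2 are sent.
OLD_QUEUE = 3


@dataclass
class Sock:
    # Models sk->sk_tx_queue_mapping; None means no queue is cached yet.
    tx_queue_mapping: Optional[int] = None


@dataclass
class Packet:
    name: str
    sk: Sock
    ooo_okay: bool


def pick_tx_queue(pkt: Packet, current_cpu: str) -> int:
    """Simplified tx queue selection on the slave device."""
    cached = pkt.sk.tx_queue_mapping
    if cached is None or pkt.ooo_okay:
        # Reordering is allowed (or nothing is cached): use XPS, which only
        # sees the CPU running now, not the CPU that originally did the send.
        queue = XPS_MAP[current_cpu]
        pkt.sk.tx_queue_mapping = queue
        return queue
    # ooo_okay is 0: stay on the queue cached in the socket.
    return cached


def run(order):
    """Replay slave-side queue selection for (packet name, cpu) pairs, in order."""
    sk = Sock(tx_queue_mapping=OLD_QUEUE)
    pkts = {
        "P1": Packet("P1", sk, ooo_okay=True),
        "P2": Packet("P2", sk, ooo_okay=False),
    }
    return {name: pick_tx_queue(pkts[name], cpu) for name, cpu in order}


# CPUB handles P1 first: P1 lands on CPUB's queue, not CPUA's queue 0.
print(run([("P1", "CPUB"), ("P2", "CPUC")]))  # {'P1': 1, 'P2': 1}
# CPUC gets there first: P2 goes out on the old queue, P1 on CPUB's queue.
print(run([("P2", "CPUC"), ("P1", "CPUB")]))  # {'P2': 3, 'P1': 1}
```

In this model both orderings miss queue 0, the queue XPS would have chosen for CPUA, and the second ordering additionally splits the two packets across queues.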
I am trying to understand this, and I wonder how CPUC (or whichever CPU
is faster) could dequeue P2 instead of P1. Since we always dequeue from
the root qdisc, and take the root qdisc lock before dequeuing, shouldn't
the order be guaranteed?
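The ordering argument can be sketched as follows. This is a toy model under stated assumptions, not kernel code: the root qdisc is reduced to a plain FIFO (HTB is not one), a threading.Lock stands in for the root qdisc lock, and replay is a hypothetical helper. It covers only the dequeue step itself; whatever happens to a packet after the lock is dropped is not modelled.

```python
import threading
from collections import deque


def replay():
    """Let two 'CPUs' each dequeue one packet from a lock-protected FIFO."""
    root_lock = threading.Lock()      # stands in for the root qdisc lock
    root_queue = deque(["P1", "P2"])  # bond0 root qdisc, modelled as a FIFO
    dequeue_log = []                  # (cpu, packet), appended under the lock

    def dequeue_one(cpu):
        with root_lock:
            pkt = root_queue.popleft()
            dequeue_log.append((cpu, pkt))

    threads = [
        threading.Thread(target=dequeue_one, args=(cpu,))
        for cpu in ("CPUB", "CPUC")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dequeue_log


log = replay()
# Which CPU wins the lock may vary, but the packet order does not.
print([pkt for _, pkt in log])  # ['P1', 'P2']
```

Whichever thread takes the lock first always gets P1, so the dequeue order is fixed even though the CPU doing each dequeue is not.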
Also, since CPUA enqueues P1 and P2, the root qdisc is supposed to be
scheduled on CPUA too, so in which case would CPUB or CPUC run dequeue()?
It looks like HTB schedules the root qdisc by itself, but it schedules the
work on the same CPU as dequeue(), and the hrtimer is pinned as well, so I
don't see how this could happen.
Thanks.