Message-Id: <C3WTSYT5IONB.21JUB7GA3HBW0@wkz-x280>
Date: Fri, 03 Jul 2020 09:55:22 +0200
From: "Tobias Waldekranz" <tobias@...dekranz.com>
To: "Andy Duan" <fugang.duan@....com>,
"davem@...emloft.net" <davem@...emloft.net>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [EXT] [PATCH v2 net] net: ethernet: fec: prevent tx starvation
under high rx load
On Fri Jul 3, 2020 at 4:45 AM CEST, Andy Duan wrote:
> From: Tobias Waldekranz <tobias@...dekranz.com> Sent: Friday, July 3,
> 2020 4:58 AM
> > In the ISR, we poll the event register for the queues in need of service and
> > then enter polled mode. After this point, the event register will never be read
> > again until we exit polled mode.
> >
> > In a scenario where a UDP flow is routed back out through the same interface,
> > i.e. "router-on-a-stick", we'll typically only see an rx queue event initially.
> > Once we start to process the incoming flow we'll be locked in polled mode, but
> > we'll never clean the tx rings since that event is never caught.
> >
> > Eventually the netdev watchdog will trip, causing all buffers to be dropped and
> > then the process starts over again.
> >
> > Rework the NAPI poll to keep trying to consume the entire budget as long as
> > new events are coming in, making sure to service all rx/tx queues, in priority
> > order, on each pass.
> >
> > Fixes: 4d494cdc92b3 ("net: fec: change data structure to support
> > multiqueue")
> > Signed-off-by: Tobias Waldekranz <tobias@...dekranz.com>
> > ---
> >
> > v1 -> v2:
> > * Always do a full pass over all rx/tx queues as soon as any event is
> > received, as suggested by David.
> > * Keep dequeuing packets as long as events keep coming in and we're
> > under budget.
> >
> > drivers/net/ethernet/freescale/fec.h | 5 --
> > drivers/net/ethernet/freescale/fec_main.c | 94 ++++++++---------------
> > 2 files changed, 31 insertions(+), 68 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/freescale/fec.h
> > b/drivers/net/ethernet/freescale/fec.h
> > index a6cdd5b61921..d8d76da51c5e 100644
> > --- a/drivers/net/ethernet/freescale/fec.h
> > +++ b/drivers/net/ethernet/freescale/fec.h
> > @@ -525,11 +525,6 @@ struct fec_enet_private {
> > unsigned int total_tx_ring_size;
> > unsigned int total_rx_ring_size;
> >
> > - unsigned long work_tx;
> > - unsigned long work_rx;
> > - unsigned long work_ts;
> > - unsigned long work_mdio;
> > -
> > struct platform_device *pdev;
> >
> > int dev_id;
> > diff --git a/drivers/net/ethernet/freescale/fec_main.c
> > b/drivers/net/ethernet/freescale/fec_main.c
> > index 2d0d313ee7c5..84589d464850 100644
> > --- a/drivers/net/ethernet/freescale/fec_main.c
> > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > @@ -75,8 +75,6 @@ static void fec_enet_itr_coal_init(struct net_device
> > *ndev);
> >
> > #define DRIVER_NAME "fec"
> >
> > -#define FEC_ENET_GET_QUQUE(_x) ((_x == 0) ? 1 : ((_x == 1) ? 2 : 0))
> > -
> > /* Pause frame feild and FIFO threshold */
> > #define FEC_ENET_FCE (1 << 5)
> > #define FEC_ENET_RSEM_V 0x84
> > @@ -1248,8 +1246,6 @@ fec_enet_tx_queue(struct net_device *ndev, u16
> > queue_id)
> >
> > fep = netdev_priv(ndev);
> >
> > - queue_id = FEC_ENET_GET_QUQUE(queue_id);
> > -
> > txq = fep->tx_queue[queue_id];
> > /* get next bdp of dirty_tx */
> > nq = netdev_get_tx_queue(ndev, queue_id);
> > @@ -1340,17 +1336,14 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id)
> > writel(0, txq->bd.reg_desc_active);
> > }
> >
> > -static void
> > -fec_enet_tx(struct net_device *ndev)
> > +static void fec_enet_tx(struct net_device *ndev)
> > {
> > struct fec_enet_private *fep = netdev_priv(ndev);
> > - u16 queue_id;
> > - /* First process class A queue, then Class B and Best Effort queue */
> > - for_each_set_bit(queue_id, &fep->work_tx,
> > FEC_ENET_MAX_TX_QS) {
> > - clear_bit(queue_id, &fep->work_tx);
> > - fec_enet_tx_queue(ndev, queue_id);
> > - }
> > - return;
> > + int i;
> > +
> > + /* Make sure that AVB queues are processed first. */
> > + for (i = fep->num_rx_queues - 1; i >= 0; i--)
>
> In fact, you have already changed the queue priority compared to before.
> Before: queue1 (audio) > queue2 (video) > queue0 (best effort)
> Now: queue2 (video) > queue1 (audio) > queue0 (best effort)
Yes, thank you, I meant to ask about that. I was looking at these
definitions in fec.h:
#define RCMR_CMP_1 (RCMR_CMP_CFG(0, 0) | RCMR_CMP_CFG(1, 1) | \
RCMR_CMP_CFG(2, 2) | RCMR_CMP_CFG(3, 3))
#define RCMR_CMP_2 (RCMR_CMP_CFG(4, 0) | RCMR_CMP_CFG(5, 1) | \
RCMR_CMP_CFG(6, 2) | RCMR_CMP_CFG(7, 3))
I read that as PCP 0-3 being mapped to queue 1 and 4-7 to queue
2. That led me to believe that the order should be 2, 1, 0. Is the
driver supposed to prioritize PCP 0-3 over 4-7, or have I
misunderstood completely?
> Other logic seems fine, but you should run stress tests to avoid any
> blocking issues, since the driver covers more than 20 i.MX platforms.
I have run stress tests and I observe that we're dequeuing about as
many packets from each queue when the incoming line is filled with 1/3
each of untagged/tagged-pcp0/tagged-pcp7 traffic:
root@...oy:~# ply -c "sleep 2" '
t:net/napi_gro_receive_entry {
@[data->napi_id, data->queue_mapping] = count();
}'
ply: active
ply: deactivating
@:
{ 66, 3 }: 165811
{ 66, 2 }: 167733
{ 66, 1 }: 169470
It seems like this is due to "Receive flushing" not being enabled in
the FEC. If I manually enable it for queue 0, processing is restricted
to only queue 1 and 2:
root@...oy:~# devmem 0x30be01f0 32 $((1 << 3))
root@...oy:~# ply -c "sleep 2" '
t:net/napi_gro_receive_entry {
@[data->napi_id, data->queue_mapping] = count();
}'
ply: active
ply: deactivating
@:
{ 66, 2 }: 275055
{ 66, 3 }: 275870
Enabling flushing on queue 1 focuses all processing on queue 2:
root@...oy:~# devmem 0x30be01f0 32 $((3 << 3))
root@...oy:~# ply -c "sleep 2" '
t:net/napi_gro_receive_entry {
@[data->napi_id, data->queue_mapping] = count();
}'
ply: active
ply: deactivating
@:
{ 66, 3 }: 545442
Changing the default QoS settings feels like a separate change, but I
can submit a v3 as a series if you want?
I do not have access to a single-queue i.MX device; would it be
possible for you to test this change on such a device?