Date:   Fri, 3 Jul 2020 09:58:18 +0000
From:   Andy Duan <fugang.duan@....com>
To:     Tobias Waldekranz <tobias@...dekranz.com>,
        "davem@...emloft.net" <davem@...emloft.net>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [EXT] [PATCH v2 net] net: ethernet: fec: prevent tx starvation under high rx load

From: Tobias Waldekranz <tobias@...dekranz.com> Sent: Friday, July 3, 2020 3:55 PM
> On Fri Jul 3, 2020 at 4:45 AM CEST, Andy Duan wrote:
> > From: Tobias Waldekranz <tobias@...dekranz.com> Sent: Friday, July 3, 2020 4:58 AM
> > > In the ISR, we poll the event register for the queues in need of
> > > service and then enter polled mode. After this point, the event
> > > register will never be read again until we exit polled mode.
> > >
> > > In a scenario where a UDP flow is routed back out through the same
> > > interface, i.e. "router-on-a-stick", we'll typically only see an rx
> > > queue event initially. Once we start to process the incoming flow
> > > we'll be locked in polled mode, but we'll never clean the tx rings
> > > since that event is never caught.
> > >
> > > Eventually the netdev watchdog will trip, causing all buffers to be
> > > dropped and then the process starts over again.
> > >
> > > Rework the NAPI poll to keep trying to consume the entire budget as
> > > long as new events are coming in, making sure to service all rx/tx
> > > queues, in priority order, on each pass.
> > >
> > > Fixes: 4d494cdc92b3 ("net: fec: change data structure to support multiqueue")
> > > Signed-off-by: Tobias Waldekranz <tobias@...dekranz.com>
> > > ---
> > >
> > > v1 -> v2:
> > > * Always do a full pass over all rx/tx queues as soon as any event is
> > >   received, as suggested by David.
> > > * Keep dequeuing packets as long as events keep coming in and we're
> > >   under budget.
> > >
> > >  drivers/net/ethernet/freescale/fec.h      |  5 --
> > >  drivers/net/ethernet/freescale/fec_main.c | 94 ++++++++---------------
> > >  2 files changed, 31 insertions(+), 68 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
> > > index a6cdd5b61921..d8d76da51c5e 100644
> > > --- a/drivers/net/ethernet/freescale/fec.h
> > > +++ b/drivers/net/ethernet/freescale/fec.h
> > > @@ -525,11 +525,6 @@ struct fec_enet_private {
> > >         unsigned int total_tx_ring_size;
> > >         unsigned int total_rx_ring_size;
> > >
> > > -       unsigned long work_tx;
> > > -       unsigned long work_rx;
> > > -       unsigned long work_ts;
> > > -       unsigned long work_mdio;
> > > -
> > >         struct  platform_device *pdev;
> > >
> > >         int     dev_id;
> > > diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> > > index 2d0d313ee7c5..84589d464850 100644
> > > --- a/drivers/net/ethernet/freescale/fec_main.c
> > > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > > @@ -75,8 +75,6 @@ static void fec_enet_itr_coal_init(struct net_device *ndev);
> > >
> > >  #define DRIVER_NAME    "fec"
> > >
> > > -#define FEC_ENET_GET_QUQUE(_x) ((_x == 0) ? 1 : ((_x == 1) ? 2 : 0))
> > > -
> > >  /* Pause frame feild and FIFO threshold */
> > >  #define FEC_ENET_FCE   (1 << 5)
> > >  #define FEC_ENET_RSEM_V        0x84
> > > @@ -1248,8 +1246,6 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id)
> > >
> > >         fep = netdev_priv(ndev);
> > >
> > > -       queue_id = FEC_ENET_GET_QUQUE(queue_id);
> > > -
> > >         txq = fep->tx_queue[queue_id];
> > >         /* get next bdp of dirty_tx */
> > >         nq = netdev_get_tx_queue(ndev, queue_id);
> > > @@ -1340,17 +1336,14 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id)
> > >                 writel(0, txq->bd.reg_desc_active);
> > >  }
> > >
> > > -static void
> > > -fec_enet_tx(struct net_device *ndev)
> > > +static void fec_enet_tx(struct net_device *ndev)
> > >  {
> > >         struct fec_enet_private *fep = netdev_priv(ndev);
> > > -       u16 queue_id;
> > > -       /* First process class A queue, then Class B and Best Effort queue */
> > > -       for_each_set_bit(queue_id, &fep->work_tx, FEC_ENET_MAX_TX_QS) {
> > > -               clear_bit(queue_id, &fep->work_tx);
> > > -               fec_enet_tx_queue(ndev, queue_id);
> > > -       }
> > > -       return;
> > > +       int i;
> > > +
> > > +       /* Make sure that AVB queues are processed first. */
> > > +       for (i = fep->num_rx_queues - 1; i >= 0; i--)
> >
> > In fact, you have already changed the queue priority ordering here.
> > Before: queue 1 (audio) > queue 2 (video) > queue 0 (best effort)
> > Now: queue 2 (video) > queue 1 (audio) > queue 0 (best effort)
> 
> Yes, thank you, I meant to ask about that. I was looking at these definitions in
> fec.h:
> 
> #define RCMR_CMP_1              (RCMR_CMP_CFG(0, 0) | RCMR_CMP_CFG(1, 1) | \
>                                 RCMR_CMP_CFG(2, 2) | RCMR_CMP_CFG(3, 3))
> #define RCMR_CMP_2              (RCMR_CMP_CFG(4, 0) | RCMR_CMP_CFG(5, 1) | \
>                                 RCMR_CMP_CFG(6, 2) | RCMR_CMP_CFG(7, 3))
> 
> I read that as PCP 0-3 being mapped to queue 1 and 4-7 to queue 2. That led
> me to believe that the order should be 2, 1, 0. Is the driver supposed to
> prioritize PCP 0-3 over 4-7, or have I misunderstood completely?

The configuration is for the RX queues.
If we consider PCP 0 as high priority, that ordering does make sense: 2 > 1 > 0.
> 
> > Other logic seems fine, but you should run stress tests to avoid any
> > blocking issue, since the driver covers more than 20 i.MX platforms.
> 
> I have run stress tests and I observe that we're dequeuing about as many
> packets from each queue when the incoming line is filled with 1/3 each of
> untagged/tagged-pcp0/tagged-pcp7 traffic:
> 
> root@...oy:~# ply -c "sleep 2" '
> t:net/napi_gro_receive_entry {
>     @[data->napi_id, data->queue_mapping] = count(); }'
> ply: active
> ply: deactivating
> 
> @:
> { 66, 3 }: 165811
> { 66, 2 }: 167733
> { 66, 1 }: 169470
> 
> It seems like this is due to "Receive flushing" not being enabled in the FEC. If I
> manually enable it for queue 0, processing is restricted to only queues 1 and 2:
> 
> root@...oy:~# devmem 0x30be01f0 32 $((1 << 3))
> root@...oy:~# ply -c "sleep 2" '
> t:net/napi_gro_receive_entry {
>     @[data->napi_id, data->queue_mapping] = count(); }'
> ply: active
> ply: deactivating
> 
> @:
> { 66, 2 }: 275055
> { 66, 3 }: 275870
> 
> Enabling flushing on queue 1 focuses all processing on queue 2:

Please don't enable flush; there is an IC issue with it.
NXP's latest errata document should include the issue. The flush issue
was fixed on the i.MX8DXL platform.
> 
> root@...oy:~# devmem 0x30be01f0 32 $((3 << 3))
> root@...oy:~# ply -c "sleep 2" '
> t:net/napi_gro_receive_entry {
>     @[data->napi_id, data->queue_mapping] = count(); }'
> ply: active
> ply: deactivating
> 
> @:
> { 66, 3 }: 545442
> 
> Changing the default QoS settings feels like a separate change, but I can
> submit a v3 as a series if you want?

I think this version is fine. No need to submit a separate change.
> 
> I do not have access to a single-queue i.MX device; would it be possible for
> you to test this change on such a device?

Yes, I will run stress tests on i.MX8 and on legacy single-queue platforms
like i.MX6, to try to avoid any blocking issue.
Thank you!
