[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAF=yD-KEcWuYrSofugYBc3kwWvQdFDsv3z_yWx5-JDzyMga1qw@mail.gmail.com>
Date: Fri, 14 Sep 2018 00:46:16 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jason Wang <jasowang@...hat.com>
Cc: Florian Fainelli <f.fainelli@...il.com>,
Network Development <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>, caleb.raitto@...il.com,
"Michael S. Tsirkin" <mst@...hat.com>,
"Jon Olson (Google Drive)" <jonolson@...gle.com>,
Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH net-next] virtio_net: ethtool tx napi configuration
On Thu, Sep 13, 2018 at 11:53 PM Jason Wang <jasowang@...hat.com> wrote:
>
>
>
> On 2018年09月14日 11:40, Willem de Bruijn wrote:
> > On Thu, Sep 13, 2018 at 11:27 PM Jason Wang <jasowang@...hat.com> wrote:
> >>
> >>
> >> On 2018年09月13日 22:58, Willem de Bruijn wrote:
> >>> On Thu, Sep 13, 2018 at 5:02 AM Jason Wang <jasowang@...hat.com> wrote:
> >>>>
> >>>> On 2018年09月13日 07:27, Willem de Bruijn wrote:
> >>>>> On Wed, Sep 12, 2018 at 3:11 PM Willem de Bruijn
> >>>>> <willemdebruijn.kernel@...il.com> wrote:
> >>>>>> On Wed, Sep 12, 2018 at 2:16 PM Florian Fainelli <f.fainelli@...il.com> wrote:
> >>>>>>> On 9/12/2018 11:07 AM, Willem de Bruijn wrote:
> >>>>>>>> On Wed, Sep 12, 2018 at 1:42 PM Florian Fainelli <f.fainelli@...il.com> wrote:
> >>>>>>>>> On 9/9/2018 3:44 PM, Willem de Bruijn wrote:
> >>>>>>>>>> From: Willem de Bruijn <willemb@...gle.com>
> >>>>>>>>>>
> >>>>>>>>>> Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers.
> >>>>>>>>>> Interrupt moderation is currently not supported, so these accept and
> >>>>>>>>>> display the default settings of 0 usec and 1 frame.
> >>>>>>>>>>
> >>>>>>>>>> Toggle tx napi through a bit in tx-frames. So as to not interfere
> >>>>>>>>>> with possible future interrupt moderation, use bit 10, well outside
> >>>>>>>>>> the reasonable range of real interrupt moderation values.
> >>>>>>>>>>
> >>>>>>>>>> Changes are not atomic. The tx IRQ, napi BH and transmit path must
> >>>>>>>>>> be quiesced when switching modes. Only allow changing this setting
> >>>>>>>>>> when the device is down.
> >>>>>>>>> Humm, would not a private ethtool flag to switch TX NAPI on/off be more
> >>>>>>>>> appropriate rather than use the coalescing configuration API here?
> >>>>>>>> What do you mean by private ethtool flag? A new field in ethtool
> >>>>>>>> --features (-k)?
> >>>>>>> I meant using ethtool_drvinfo::n_priv_flags, ETH_SS_PRIV_FLAGS and then
> >>>>>>> ETHTOOL_GFPFLAGS and ETHTOOL_SPFLAGS to control the toggling of that
> >>>>>>> private flag. mlx5 has a number of privates flags for instance.
> >>>>>> Interesting, thanks! I was not at all aware of those ethtool flags.
> >>>>>> Am having a look. It definitely looks promising.
> >>>>> Okay, I made that change. That is indeed much cleaner, thanks.
> >>>>> Let me send the patch, initially as RFC.
> >>>>>
> >>>>> I've observed one issue where if we toggle the flag before bringing
> >>>>> up the device, it hits a kernel BUG at include/linux/netdevice.h:515
> >>>>>
> >>>>> BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
> >>>> This reminds me that we need to check netif_running() before trying to
> >>>> enable and disable tx napi in ethtool_set_coalesce().
> >>> The first iteration of my patch checked IFF_UP and effectively
> >>> only allowed the change when not running. What do you mean
> >>> by need to check?
> >> I mean if device is not up, there's no need to toggle napi state and tx
> >> lock.
> >>
> >>> And to respond to the other follow-up notes at once:
> >>>
> >>>> Consider we may have interrupt moderation in the future, I tend to use
> >>>> set_coalesce. Otherwise we may need two steps to enable moderation:
> >>>>
> >>>> - tx-napi on
> >>>> - set_coalesce
> >>> FWIW, I don't care strongly whether we do this through coalesce or priv_flags.
> >> Ok.
> > Since you prefer coalesce, let's go with that (and a revision of your
> > latest patch).
>
> Good to know this.
>
> >>>>> + if (!napi_weight)
> >>>>> + virtqueue_enable_cb(vi->sq[i].vq);
> >>>> I don't get why we need to disable enable cb here.
> >>> To avoid entering no-napi mode with too few descriptors to
> >>> make progress and no way to get out of that state. This is a
> >>> pretty crude attempt at handling that, admittedly.
> >> But in this case, we will call enable_cb_delayed() and we will finally
> >> get a interrupt?
> > Right. It's a bit of a roundabout way to ensure that
> > netif_tx_wake_queue and thus eventually free_old_xmit_skbs are called.
> > It might make more sense to just wake the device without going through
> > an interrupt.
>
> I'm not sure I get this. If we don't enable tx napi, we tend to delay TX
> interrupt if we found the ring is about to full to avoid interrupt
> storm, so we're probably ok in this case.
I'm only concerned about the transition state when converting from
napi to no-napi when the queue is stopped and tx interrupt disabled.
With napi mode the interrupt is only disabled if napi is scheduled,
in which case it will eventually reenable the interrupt. But when
switching to no-napi mode in this state no progress will be made.
But it seems this cannot happen. When converting to no-napi
mode, set_coalesce waits for napi to complete in napi_disable.
So the interrupt should always start enabled when transitioning
into no-napi mode.
Powered by blists - more mailing lists