Message-ID: <d535d5142e42b8c550f0220200e3779d@dlink.ru>
Date:   Mon, 25 Nov 2019 15:02:24 +0300
From:   Alexander Lobakin <alobakin@...nk.ru>
To:     Paolo Abeni <pabeni@...hat.com>
Cc:     Johannes Berg <johannes@...solutions.net>,
        Edward Cree <ecree@...arflare.com>,
        Nicholas Johnson <nicholas.johnson-opensource@...look.com.au>,
        David Miller <davem@...emloft.net>, jiri@...lanox.com,
        edumazet@...gle.com, idosch@...lanox.com, petrm@...lanox.com,
        sd@...asysnail.net, f.fainelli@...il.com,
        jaswinder.singh@...aro.org, ilias.apalodimas@...aro.org,
        linux-kernel@...r.kernel.org, emmanuel.grumbach@...el.com,
        luciano.coelho@...el.com, linuxwifi@...el.com,
        kvalo@...eaurora.org, netdev@...r.kernel.org,
        linux-wireless@...r.kernel.org
Subject: Re: [PATCH v2 net-next] net: core: use listified Rx for GRO_NORMAL in
 napi_gro_receive()

Paolo Abeni wrote 25.11.2019 14:42:
> On Mon, 2019-11-25 at 12:05 +0100, Johannes Berg wrote:
>> On Mon, 2019-11-25 at 13:58 +0300, Alexander Lobakin wrote:
>> > Edward Cree wrote 25.11.2019 13:31:
>> > > On 25/11/2019 09:09, Nicholas Johnson wrote:
>> > > > The default value of /proc/sys/net/core/gro_normal_batch was 8.
>> > > > Setting it to 1 allowed it to connect to the Wi-Fi network.
>> > > >
>> > > > Setting it back to 8 did not kill the connection.
>> > > >
>> > > > But when I disconnected and tried to reconnect, it did not re-connect.
>> > > >
>> > > > Hence, it appears that the problem only affects the initial handshake
>> > > > when associating with a network, and not normal packet flow.
>> > > That sounds like the GRO batch isn't getting flushed at the end of the
>> > >  NAPI — maybe the driver isn't calling napi_complete_done() at the
>> > >  appropriate time?
>> >
>> > Yes, this was the first thing I thought of, but I haven't looked at
>> > iwlwifi yet. I already knew this driver has some tricky parts, but
>> > this 'fake NAPI' solution seems rather strange to me.
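
For context, the listified Rx batching works roughly like this (a
sketch of the helpers in net/core/dev.c as of current net-next; the
exact code may differ slightly):

/* Pass the accumulated rx_list up to the stack and reset it */
static void gro_normal_list(struct napi_struct *napi)
{
	if (!napi->rx_count)
		return;
	netif_receive_skb_list_internal(&napi->rx_list);
	INIT_LIST_HEAD(&napi->rx_list);
	napi->rx_count = 0;
}

/* Queue one GRO_NORMAL skb; flush the batch once it reaches the
 * net.core.gro_normal_batch limit
 */
static void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb)
{
	list_add_tail(&skb->list, &napi->rx_list);
	if (++napi->rx_count >= gro_normal_batch)
		gro_normal_list(napi);
}

This is also why gro_normal_batch=1 masks the problem: every skb gets
flushed immediately, so nothing is left sitting on napi->rx_list.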
>> 
>> Truth be told, we kinda just fudged it until we got GRO, since that's
>> what we really want on wifi (to reduce the costly TCP ACKs if 
>> possible).
>> 
>> Maybe we should call napi_complete_done() instead? But as Edward noted
>> (below), we don't actually really do NAPI polling, we just fake it for
>> each interrupt since we will often get a lot of frames in one 
>> interrupt
>> if there's high throughput (A-MPDUs are basically coming in all at the
>> same time). I've never really looked too much at what exactly happens
>> here, beyond seeing the difference from GRO.
>> 
>> 
>> > > Indeed, from digging through the layers of iwlwifi I eventually get to
>> > >  iwl_pcie_rx_handle() which doesn't really have a NAPI poll (the
>> > >  napi->poll function is iwl_pcie_dummy_napi_poll() { WARN_ON(1);
>> > >  return 0; }) and instead calls napi_gro_flush() at the end of its RX
>> > >  handling.  Unfortunately, napi_gro_flush() is no longer enough,
>> > >  because it doesn't call gro_normal_list() so the packets on the
>> > >  GRO_NORMAL list just sit there indefinitely.
>> > >
>> > > It was seeing drivers calling napi_gro_flush() directly that had me
>> > >  worried in the first place about whether listifying napi_gro_receive()
>> > >  was safe and where the gro_normal_list() should go.
>> > > I wondered if other drivers that show up in [1] needed fixing with a
>> > >  gro_normal_list() next to their napi_gro_flush() call.  From a cursory
>> > >  check:
>> > > brocade/bna: has a real poller, calls napi_complete_done() so is OK.
>> > > cortina/gemini: calls napi_complete_done() straight after
>> > >  napi_gro_flush(), so is OK.
>> > > hisilicon/hns3: calls napi_complete(), so is _probably_ OK.
>> > > But it's far from clear to me why *any* of those drivers are calling
>> > >  napi_gro_flush() themselves...
>> >
>> > Agree. I mean, we _can_ handle this particular problem from the
>> > networking core side, but from my point of view, only rethinking the
>> > driver's logic is the correct way to solve this and other issues that
>> > may potentially appear in the future.
>> 
>> Do tell what you think it should be doing :)
>> 
>> One additional wrinkle is that we have firmware notifications, command
>> completions and actual RX interleaved, so I think we do want to have
>> interrupts for the notifications and command completions?
> 
> I think it would be nice to move the iwlwifi driver to full/plain NAPI
> mode. The interrupt handler could keep processing extra work as it does
> now and queue real pkts on some internal queue, and then schedule the
> relevant napi, which in turn could process such queue in the napi poll
> method. Likely I missed tons of details and/or oversimplified it...

Yep, full NAPI is the best option, but I may be missing a lot too.
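
Something along these lines, maybe (a very rough sketch; all the names
below are made up, and the real iwlwifi code paths are certainly more
involved):

static irqreturn_t iwl_rx_irq(int irq, void *data)
{
	struct iwl_rxq *rxq = data;

	/* keep handling firmware notifications and command
	 * completions right here, in IRQ context
	 */

	if (napi_schedule_prep(&rxq->napi)) {
		/* mask RX interrupts for this queue, then ... */
		__napi_schedule(&rxq->napi);
	}

	return IRQ_HANDLED;
}

static int iwl_napi_poll(struct napi_struct *napi, int budget)
{
	struct iwl_rxq *rxq = container_of(napi, struct iwl_rxq, napi);
	int done = iwl_rx_process(rxq, budget);	/* hypothetical helper */

	if (done < budget && napi_complete_done(napi, done)) {
		/* ... and unmask RX interrupts here */
	}

	return done;
}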

> For -net, I *think* something as dumb and hacky as the following could
> possibly work:
> ----
> diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
> index 4bba6b8a863c..df82fad96cbb 100644
> --- a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
> +++ b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
> @@ -1527,7 +1527,7 @@ static void iwl_pcie_rx_handle(struct iwl_trans *trans, int queue)
>                 iwl_pcie_rxq_alloc_rbs(trans, GFP_ATOMIC, rxq);
> 
>         if (rxq->napi.poll)
> -               napi_gro_flush(&rxq->napi, false);
> +               napi_complete_done(&rxq->napi, 0);
> 
>         iwl_pcie_rxq_restock(trans, rxq);
>  }
> ---

napi_complete_done(napi, 0) has an equivalent static inline wrapper,
napi_complete(napi). I'm not sure it will work without issues, though,
as iwlwifi doesn't _really_ put NAPI into the scheduled state.
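
(For reference, from include/linux/netdevice.h, roughly:

static inline bool napi_complete(struct napi_struct *n)
{
	return napi_complete_done(n, 0);
}

so the two calls should behave identically here.)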

I'm not very familiar with iwlwifi, but as a workaround, in addition to
the manual napi_gro_flush(), you can also manually flush napi->rx_list
to prevent packets from stalling:

diff -Naur a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
--- a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c	2019-11-25 14:55:03.610355230 +0300
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c	2019-11-25 14:57:29.399556868 +0300
@@ -1526,8 +1526,16 @@
 	if (unlikely(emergency && count))
 		iwl_pcie_rxq_alloc_rbs(trans, GFP_ATOMIC, rxq);
 
-	if (rxq->napi.poll)
+	if (rxq->napi.poll) {
+		if (rxq->napi.rx_count) {
+			netif_receive_skb_list(&rxq->napi.rx_list);
+
+			INIT_LIST_HEAD(&rxq->napi.rx_list);
+			rxq->napi.rx_count = 0;
+		}
+
 		napi_gro_flush(&rxq->napi, false);
+	}
 
 	iwl_pcie_rxq_restock(trans, rxq);
 }

> Cheers,
> 
> Paolo

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ
