netdev - Re: [PATCH net-next] net: ibm: replenish rx pool and poll less frequently

Message-ID: <YLfbEjiu671HApgi@us.ibm.com>
Date:   Wed, 2 Jun 2021 12:25:06 -0700
From:   Sukadev Bhattiprolu <sukadev@...ux.ibm.com>
To:     Lijun Pan <lijunp213@...il.com>
Cc:     netdev@...r.kernel.org
Subject: Re: [PATCH net-next] net: ibm: replenish rx pool and poll less
 frequently

Lijun Pan [lijunp213@...il.com] wrote:
> The old mechanism replenishes the rx pool even if only one frame is processed in
> the poll function, which causes a lot of overhead. The old mechanism

The soft lockup is not seen when replenishing a small number of buffers at
a time. It's only seen under some conditions when replenishing a _large_
number at once, and that appears to be because the netdev_alloc_skb() calls
collectively take a long time.

Replenishing a small number at a time is not a problem.

> restarts polling until the number of processed frames reaches the budget, which can
> cause the poll function to loop into restart_poll 63 times at most and to
> call replenish_rx_pool 63 times at most. This will cause a soft lockup very
> easily. So, don't replenish too often, and don't goto restart_poll in each

The 64 comes from the budget the system gave us. And for us to hit the
goto restart_poll:
	a. pending_scrq() in the while loop must not have found a packet,
	   and
	b. by the time we have replenished the pool, completed NAPI, etc., we
	   must have found a packet.

For this to happen 64 times:
	- we must find exactly zero packets in a., and
	- we must find exactly one packet in b., and
	- that tight sequence must occur 64 times.

IOW, it's more theoretical, right?

Even if it did happen a handful of times, the only "overheads" in the
replenish path are netdev_alloc_skb() and the send-subcrq-indirect hcall.

The skb alloc cannot be avoided - we must do it now or in the future
anyway. The hcall is issued once every 16 skbs. If we issue it for fewer
than 16 skbs, it means the traffic is extremely low, and there's no point
optimizing for that. Besides, the hcalls are not very expensive.

There was a lot of testing done in Nov 2020 when the subcrq-indirect
hcall support was added. We would need to repeat that testing at the
least.

Thanks,

Sukadev