linux-kernel - Re: [PATCH net-next v5 3/3] selftests: net: add netpoll basic functionality test

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <k45gwtae6yx3cjy5kprctlnw4wox5fnfd5yjlczgpyu2cw5bsj@dol2der4ykat>
Date: Fri, 11 Jul 2025 08:51:19 -0700
From: Breno Leitao <leitao@...ian.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Andrew Lunn <andrew+netdev@...n.ch>, 
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Paolo Abeni <pabeni@...hat.com>, Shuah Khan <shuah@...nel.org>, Simon Horman <horms@...nel.org>, 
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org, linux-kselftest@...r.kernel.org, 
	Willem de Bruijn <willemdebruijn.kernel@...il.com>, bpf@...r.kernel.org, kernel-team@...a.com, 
	Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH net-next v5 3/3] selftests: net: add netpoll basic
 functionality test

Hello Jakub,

On Thu, Jul 10, 2025 at 06:45:35PM -0700, Jakub Kicinski wrote:
> > +MAX_WRITES: int = 40
> 
> FWIW the test takes 25sec on our debug-heavy VMs right now.
> I think we can crank the writes quite a bit.. ?

Sure. I will increase it to 40. On my VMs I get more than 30 hits every
single time:

	2025-07-11 08:30:08,904 - DEBUG - BPFtrace output: {'hits': 30}
	2025-07-11 08:30:08,904 - DEBUG - MAPS coming from bpftrace = {'hits': 30}

> > +def ethtool_read_rx_tx_queue(interface_name: str) -> tuple[int, int]:
> > +    """
> > +    Read the number of RX and TX queues using ethtool. This will be used
> > +    to restore it after the test
> > +    """
> > +    rx_queue = 0
> > +    tx_queue = 0
> > +
> > +    try:
> > +        ethtool_result = ethtool(f"-g {interface_name}").stdout
> 
> json=True please and you'll get a dict, on CLI you can try:
> 
> ethtool --json -g eth0

Sure. I was parsing manually because some options do not have --json
format. 

	ethtool --json -l eth0
	ethtool: bad command line argument(s)

I haven't checked upstream, but, if this feature is upstream, is it worth
implementing it?

> > +        ethtool(f"-G {interface_name} rx {rx_val} tx {tx_val}")
> 
> This is setting _ring size_ not queue count.
> I suppose we want both, this and queue count to 1 (with ethtool -l / -L)
> The ring size of 1 is unlikely to work on real devices.
> I'd try setting it to 128 and 256 and if neither sticks just carry on
> with whatever was there.

Thanks. I creating a function to do it. Something as:

	def configure_network(ifname: str) -> None:
	"""Configure ring size and queue numbers"""

	# Set defined queues to 1 to force congestion
	prev_queues = ethtool_get_queues_cnt(ifname)
	logging.debug("RX/TX/combined queues: %s", prev_queues)
	# Only set the queues to 1 if they exists in the device. I.e, they are > 0
	ethtool_set_queues_cnt(ifname, tuple(1 if x > 0 else x for x in prev_queues))
	defer(ethtool_set_queues_cnt, ifname, prev_queues)

	# Try to set the ring size to some low value.
	# Do not fail if the hardware do not accepted desired values
	prev_ring_size = ethtool_get_ringsize(ifname)
	for size in [(1, 1), (128, 128), (256, 256)]:
		if ethtool_set_ringsize(ifname, size):
		# hardware accepted the desired ringsize
		logging.debug("Set RX/TX ringsize to: %s from %s", size, prev_ring_size)
		break
	defer(ethtool_set_ringsize, ifname, prev_ring_size)

> > +        "remote_ip": (
> > +            cfg.remote_addr_v["4"] if cfg.addr_ipver == "4" else cfg.remote_addr_v["6"]
> > +        ),
> 
> this is already done for you
> cfg.addr is either v4 or v6 depending on what was provided in the env

Ack!

> > +# toggle the interface up and down, to cause some congestion
> 
> Let's not do this, you're missing disruptive annotation and for many
> drivers NAPI is stopped before queues
> https://github.com/linux-netdev/nipa/wiki/Guidance-for-test-authors#ksft_disruptive

Sure. This is not needed to reproduce the issue.
I just put it in there in order to create more entropy. Anyway, removing
it.

> > +def main() -> None:
> > +    """Main function to run the test"""
> > +    netcons_load_module()
> > +    test_check_dependencies()
> > +    with NetDrvEpEnv(__file__, nsim_test=True) as cfg:
> 
> I think nsim_test=True will make the test run _only_ on netdevsim.
> But there's nothing netdevsim specific here right?
> You can remove the argument and let's have this run against real
> drivers, too?

Sure. that is our goal, by the end of the day. Being able to run it on
real hardware.


Thanks of the review. I will send an updated version soon, and we can
continue the discusion over there.
--breno