netdev - Re: Unexcepted latency (order of 100-200 ms) with TCP (packet receive)


Message-Id: <20070509.000339.93384349.davem@davemloft.net>
Date:	Wed, 09 May 2007 00:03:39 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	ilpo.jarvinen@...sinki.fi
Cc:	linville@...driver.com, akpm@...l.org, jeff@...zik.org,
	billfink@...dspring.com, cebbert@...hat.com, netdev@...r.kernel.org
Subject: Re: Unexcepted latency (order of 100-200 ms) with TCP (packet
 receive)

From: "Ilpo_Järvinen" <ilpo.jarvinen@...sinki.fi>
Date: Mon, 7 May 2007 14:45:53 +0300 (EEST)

> I found out this shows up when 900fd17dd01d2c99dfd1ec0b53a860894a2673ee is 
> included, kernels before that seem to work fine. The problem description 
> is in this same thread. I'm not going to repeat it here as it is valid 
> except for the fact that my original claim that it happens with another 
> hardware is false, please see it in here:
> 
> http://marc.info/?l=linux-netdev&m=117758295411277&w=2
> 
> Should these delays I see be considered as evidence of mmio not working 
> with this card/model or could there be something else wrong somewhere? 
> ...I can just disable it using global_use_mmio parameter, which removes 
> the problem in 2.6.20.7 (tested) but just in case somebody has more clue 
> than I do about this, I'm willing to do more testing...

One thing that MMIO changes a lot is timing.

The port I/O instructions on x86 have fixed timings, so they take the
same amount of time to execute regardless of the underlying bus
capabilities.

This could have been masking some issue in the driver or the hardware.

Port I/O also does not have any write posting issues like MMIO does.

In my opinion it can only cause trouble to start using MMIO on such
old chips when we've been using port I/O to access them for 5+ years.
:-)

Wait, it's timing, I see the bug.  There was a similar problem
in the Linux floppy driver some 6 years ago.

Look at issue_and_wait(), how it polls:

	iowrite16(cmd, ioaddr + EL3_CMD);
	for (i = 0; i < 2000; i++) {
		if (!(ioread16(ioaddr + EL3_STATUS) & CmdInProgress))
			return;
	}

That takes longer to run with port I/O than MMIO.  So I bet that with
MMIO the loop runs through all 2000 iterations before the command
has completed, and thus falls through to this thing:

	/* OK, that didn't work.  Do it the slow way.  One second */
	for (i = 0; i < 100000; i++) {
		if (!(ioread16(ioaddr + EL3_STATUS) & CmdInProgress)) {
			if (vortex_debug > 1)
				printk(KERN_INFO "%s: command 0x%04x took %d usecs\n",
					   dev->name, cmd, i * 10);
			return;
		}
		udelay(10);
	}

and that's where your delays are coming from.

I would suggest adding some kind of (very small) fixed delay to the
first loop.
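
To make the timing argument concrete, here is a rough user-space
simulation of the first loop.  It is only a sketch, not driver code:
the simulated clock, the function names, and the per-read costs used
below (about 1000 ns for a port I/O read, about 100 ns for an MMIO
read, a command that takes 500 usecs) are all illustrative
assumptions, not measurements of this hardware.

```c
#include <stdbool.h>

#define CMD_IN_PROGRESS 0x1000u	/* stand-in for the driver's CmdInProgress bit */
#define FAST_POLLS 2000		/* iteration count of the first loop */

/* Simulated clock, in nanoseconds (hypothetical, not a kernel API). */
static unsigned long long now_ns;
static unsigned long long cmd_done_ns;

/* Each status read costs read_cost_ns of simulated time. */
static unsigned int sim_read_status(unsigned long long read_cost_ns)
{
	now_ns += read_cost_ns;
	return now_ns < cmd_done_ns ? CMD_IN_PROGRESS : 0;
}

/*
 * Returns true if the command finishes inside the first loop, false if
 * the loop is exhausted and the caller would fall through to the slow
 * udelay(10) loop.  delay_ns models the suggested small fixed delay.
 */
static bool fast_loop_completes(unsigned long long read_cost_ns,
				unsigned long long delay_ns,
				unsigned long long cmd_duration_ns)
{
	now_ns = 0;
	cmd_done_ns = cmd_duration_ns;
	for (int i = 0; i < FAST_POLLS; i++) {
		if (!(sim_read_status(read_cost_ns) & CMD_IN_PROGRESS))
			return true;
		now_ns += delay_ns;
	}
	return false;
}
```

Under those assumed numbers, fast_loop_completes(1000, 0, 500000)
is true (2000 slow reads cover 2 ms), fast_loop_completes(100, 0,
500000) is false (2000 fast reads cover only 200 usecs), and adding
a 1 usec delay per iteration, fast_loop_completes(100, 1000, 500000),
makes it true again.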

The rest of the driver should be audited for this kind of problem.
