Date:	Thu, 20 Aug 2009 11:03:09 +0200
From:	Krzysztof Halasa <khc@...waw.pl>
To:	walt@...mansrus.com
Cc:	David Miller <davem@...emloft.net>, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org
Subject: Re: Strange network timeouts w/ 2.6.30.5

> Since patching to 2.6.30.5 I'm experiencing periodic timeouts on my
> e100 which is used as my WAN interface on a server/router box. Nothing
> is reported in any logs and eventually the traffic resumes. It seems
> to happen at fairly regular intervals, although I've not timed them.
> The timeouts last for approx. 60-120 seconds and then traffic resumes
> normally with no hint of what happened.

x86-64, intel P965...

Can you provide "dmesg" output, please?

I wonder what additional side effect the patch caused. Streaming
allocs on such x86 should already be coherent, no?

Perhaps you have more than 2 GB RAM (or so) and swiotlb has to provide
buffering? I'm thinking of something like:

- the driver does "sync for CPU" and examines status
- the descriptor is tested to be still empty
- meanwhile e100 chip changes the status in the descriptor
- the driver does "sync for device" (it's what the patch added)
- at this point swiotlb doesn't know the descriptor is clean and writes
  it out, thus dropping the change done by the e100 chip.

Does the above seem plausible? I admit I'm not a swiotlb expert; it's
a pure guess that it simply and blindly moves data in and out.
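
To make the suspected sequence concrete, here is a minimal user-space
model of it. It only illustrates the guess above and is not swiotlb or
e100 code: struct rx_desc, struct bounce_map, DESC_COMPLETE and all
function names are hypothetical, and the bounce buffer is assumed to be
copied blindly in both directions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified descriptor; not the real e100 layout. */
struct rx_desc {
    uint16_t status;            /* 0 = empty, nonzero = written by the chip */
    uint16_t size;
};

#define DESC_COMPLETE 0x8000u   /* illustrative flag value (assumption) */

/* Model of a bounce-buffered streaming mapping: the driver works on
 * cpu_copy, the chip works on bounce. */
struct bounce_map {
    struct rx_desc cpu_copy;
    struct rx_desc bounce;
};

/* "sync for CPU": blindly copy the chip-side buffer to the driver side. */
static void sync_for_cpu(struct bounce_map *m)
{
    memcpy(&m->cpu_copy, &m->bounce, sizeof(m->bounce));
}

/* "sync for device": blindly copy the driver side back to the chip side. */
static void sync_for_device(struct bounce_map *m)
{
    memcpy(&m->bounce, &m->cpu_copy, sizeof(m->cpu_copy));
}

/* The chip marks the descriptor as complete in the memory it can see. */
static void chip_completes(struct bounce_map *m)
{
    m->bounce.status = DESC_COMPLETE;
}

/* Replays the suspected sequence and returns the status the driver
 * observes on its next poll. sync_back selects whether the extra
 * "sync for device" is performed. */
static uint16_t run_sequence(int sync_back)
{
    struct bounce_map m;

    memset(&m, 0, sizeof(m));

    sync_for_cpu(&m);                   /* driver examines the status */
    assert(m.cpu_copy.status == 0);     /* descriptor is still empty */

    chip_completes(&m);                 /* chip updates it meanwhile */

    if (sync_back)
        sync_for_device(&m);            /* stale copy overwrites the update */

    sync_for_cpu(&m);                   /* next poll */
    return m.cpu_copy.status;
}
```

In this model run_sequence(1) returns 0, i.e. the chip's update is
lost, while run_sequence(0) returns DESC_COMPLETE. Whether the real
swiotlb behaves this way is exactly the open question.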

If that's the case, I don't really know how it could work without the
patch in question. Perhaps the timings were just right?

What can we do about it? Rewrite the driver to use consistent allocs,
of course. As a temporary measure, add #ifdef CONFIG_ARM around the
pci_dma_sync_single_for_device() call? Not sure if other archs were
affected.

The root problem is that the driver shouldn't use streaming allocations
for its descriptors (they are written from both sides simultaneously).
Only skb->data can be streaming.
-- 
Krzysztof Halasa