Message-ID: <985009134.71250769263099.JavaMail.root@mail.holmansrus.com>
Date:	Thu, 20 Aug 2009 06:54:23 -0500 (CDT)
From:	Walt Holman <walt@...mansrus.com>
To:	Krzysztof Halasa <khc@...waw.pl>
Cc:	David Miller <davem@...emloft.net>, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org
Subject: Re: Strange network timeouts w/ 2.6.30.5


----- "Krzysztof Halasa" <khc@...waw.pl> wrote:

> > Since patching to 2.6.30.5 I'm experiencing periodic timeouts on my
> > e100 which is used as my WAN interface on a server/router box. Nothing
> > is reported in any logs and eventually the traffic resumes. It seems
> > to happen at fairly regular intervals, although I've not timed them.
> > The timeouts last for approx. 60-120 seconds and then traffic resumes
> > normally with no hint of what happened.
> 
> x86-64, intel P965...
> 
> Can you provide "dmesg" output, please?
> 
> I wonder what additional side effect did the patch cause. Streaming
> allocs on such x86 should already be coherent, no?
> 
> Perhaps you have more than 2 GB RAM (or so) and swiotlb has to provide
> buffering? I think of something like:
> 
> - the driver does "sync for CPU" and examines status
> - the descriptor is tested to be still empty
> - meanwhile e100 chip changes the status in the descriptor
> - the driver does "sync for device" (it's what the patch added)
> - at this point swiotlb doesn't know the descriptor is clean and writes
>   it out, thus dropping the change done by the e100 chip.
> 
> Does the above seem plausible? I admit I'm not swiotlb expert, it's
> a pure guess that it simply and blindly moves data in and out.
> 
> If that's the case, I don't really know how could it work without the
> patch in question. Perhaps the timings were just right?
> 
> What can we do with it? Rewriting to use consistent allocs, of course.
> Temporarily adding #ifdef CONFIG_ARM around the
> pci_dma_sync_single_for_device()? Not sure if other archs were affected.
> 
> The root problem is that the driver shouldn't use streaming allocations
> for its descriptors (they are written from both sides simultaneously).
> Only skb->data can be streaming.
> -- 
> Krzysztof Halasa

Hi Krzysztof,

dmesg is attached. This box does have more than 2 GB of RAM (6 GB total). The dmesg shows e100 initialized three times: the first is the stock modprobe, the second was forced with use_io, and the third modprobe was after reverting the patch.

-Walt

View attachment "dmesg.txt" of type "text/plain" (60718 bytes)
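
The lost-update sequence guessed at in the quoted message can be illustrated with a toy model. This is only a sketch under the assumption stated there, namely that a bounce buffer copies data blindly in each direction on every sync. The class and function names below (BounceBuffer, poll_descriptor, run) are hypothetical and are not part of swiotlb or the e100 driver; the model says nothing about what the real code does.

```python
EMPTY, COMPLETE = 0, 1  # hypothetical descriptor status values


class BounceBuffer:
    """Toy bounce-buffered streaming mapping (assumed blind copies)."""

    def __init__(self, status):
        self.cpu = status     # copy the driver reads and writes
        self.device = status  # bounce copy the device reads and writes

    def sync_for_cpu(self):
        # Copy the device-side data into the driver's view.
        self.cpu = self.device

    def sync_for_device(self):
        # Blindly copy the driver's view back, even if it is unchanged.
        self.device = self.cpu

    def device_write(self, status):
        # The chip updates the descriptor on its own side.
        self.device = status


def poll_descriptor(buf, device_completes_during_poll, resync_for_device):
    """One driver poll: sync for CPU, read status, optionally sync back."""
    buf.sync_for_cpu()
    seen = buf.cpu
    if device_completes_during_poll:
        # The chip writes the status after the driver has sampled it.
        buf.device_write(COMPLETE)
    if resync_for_device:
        # The stale EMPTY status overwrites the chip's update.
        buf.sync_for_device()
    return seen


def run(resync_for_device):
    """Return the status seen by two consecutive polls."""
    buf = BounceBuffer(EMPTY)
    first = poll_descriptor(buf, True, resync_for_device)
    second = poll_descriptor(buf, False, resync_for_device)
    return first, second
```

In this model, run(True) never observes the completion on the second poll because the write-back discards it, while run(False) does observe it. Whether the real hardware and swiotlb behave this way is exactly the open question in the thread.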
