netdev - Re: [PATCH net 0/2] net: marvell: Fix highmem support on non-TSO path

Message-ID: <54C16B43.5040504@tpi.com>
Date:	Thu, 22 Jan 2015 13:27:31 -0800
From:	Dean Gehnert <deang@....com>
To:	Russell King - ARM Linux <linux@....linux.org.uk>
CC:	Ezequiel Garcia <ezequiel.garcia@...e-electrons.com>,
	netdev@...r.kernel.org, David Miller <davem@...emloft.net>,
	B38611@...escale.com, fabio.estevam@...escale.com
Subject: Re: [PATCH net 0/2] net: marvell: Fix highmem support on non-TSO
 path

On 01/22/2015 01:09 PM, Russell King - ARM Linux wrote:
> On Thu, Jan 22, 2015 at 10:41:00AM -0800, Dean Gehnert wrote:
>> FYI, I found a way to reproduce the mv643xx_eth transmit corruption without
>> using a network filesystem by using SOCAT (should also be able to use NETCAT
>> or NC) and I have a bit more information about the corruption that looks
>> like it is somehow related to the cache line size.
> That's not quite what I'm seeing.  What I'm seeing with NFS is that the
> machine is basically unusable.  I have the etna_viv source in a NFS
> share (it's shared amongst not only the Dove box but also my collection
> of iMX6 based hardware.)
>
> I'm fairly fully IPv6 enabled here, which includes NFS.
>
> On the Dove, if I try to build this without any fixes, and then try to
> build the etna_viv sources, it will take the machine out to the extent
> that I have to reboot it - either the machine will freeze solidly, or
> the kernel will oops in the DMA API functions, in a path which was
> called from an interrupt handler.  That takes out the entire machine
> because we miss acknowledging the interrupt.
I wonder whether the root cause could be in the arch DMA layer. In my
testing with SOCAT and different cache line alignments, I am seeing
4-byte corruptions in transmitted Ethernet data. My fear is that this
is not restricted to Ethernet transmit and that the root cause is a
DMA / cache issue, though I have no way to prove that theory. Your DMA
API oops is concerning, since it suggests that some corruption may be
happening during DMA operation.
>
> Either way, it's effectively a power cycle as there's no reset button on
> the machine.
>
> I have yet to see any sign of data corruption.
>
Can you try the SOCAT test on your Dove platform and see if it passes
the non-cache-line-aligned test case? I think the SOCAT test takes the
NFS "variable" out of the equation. My theory is that if there is DMA
corruption, it is hard to tell what kinds of problems will occur. The
payload of a file might be corrupted, or, if the NFS structures are
corrupted, it could manifest itself as a problem in the NFS code.
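
For anyone comparing captures from such a test, here is a rough Python
sketch of the checking side only. It is not the script I used. The
helper names and the 32-byte cache line size are assumptions made for
illustration, and the transfer itself (for example through SOCAT) is
left out. It builds a payload in which every 4-byte word holds its own
index, then reports each corrupted run along with its offset within a
cache line.

```python
CACHE_LINE = 32  # assumed cache line size in bytes; adjust for the platform


def make_pattern(length):
    """Build a payload where each 4-byte word holds its own word index."""
    words = (length + 3) // 4
    data = b"".join(i.to_bytes(4, "big") for i in range(words))
    return data[:length]


def find_corrupt_runs(sent, received):
    """Return (offset, length) for each contiguous run of differing bytes."""
    runs = []
    start = None
    n = min(len(sent), len(received))
    for i in range(n):
        if sent[i] != received[i]:
            if start is None:
                start = i  # a new corrupted run begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, n - start))
    return runs


def describe(runs, cache_line=CACHE_LINE):
    """Add each run's offset within its cache line: (offset, length, offset % line)."""
    return [(off, length, off % cache_line) for off, length in runs]


if __name__ == "__main__":
    sent = make_pattern(100)
    received = bytearray(sent)
    received[36:40] = b"\xff" * 4  # simulate one corrupted 4-byte word
    for off, length, in_line in describe(find_corrupt_runs(sent, bytes(received))):
        print(f"offset {off}: {length} bytes differ, {in_line} bytes into a cache line")
```

If the corrupted runs always land at the same position within a cache
line, that would point toward a DMA / cache problem rather than
something in the NFS code.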
