netdev - Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

Message-ID: <21807.35967.660396.209954@mariner.uk.xensource.com>
Date:	Thu, 16 Apr 2015 11:18:39 +0100
From:	Ian Jackson <Ian.Jackson@...citrix.com>
To:	Prashant <prashant@...adcom.com>
CC:	Michael Chan <mchan@...adcom.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Boris Ostrovsky <boris.ostrovsky@...cle.com>,
	"David Vrabel" <david.vrabel@...rix.com>,
	Thadeu Lima de Souza Cascardo <cascardo@...ux.vnet.ibm.com>,
	Vlad Yasevich <vyasevich@...il.com>,
	<xen-devel@...ts.xensource.com>, <netdev@...r.kernel.org>,
	"Siva Reddy (Siva) Kallam" <siva.kallam@...adcom.com>,
	Sanjeev Bansal <sanjeevb@...adcom.com>
Subject: Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

Prashant writes ("Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]"):
> Ian, using your config we are able to recreate the problem that you are 
> seeing. The driver finds the RX data buffer to be all zero, with a 
> analyzer trace we are seeing the chip is DMA'ing valid RX data buffer 
> contents to the host but once the driver tries to read this DMA area, it 
> is seeing all zero's which is the reason of the corruption. This is only 
> for the RX data buffer, the RX descriptor and status block update DMA 
> regions are having valid contents.

I am no expert in this area, but this suggests that the driver is
misusing the Linux DMA management API.  I think this is what
Konrad suspected when he suggested the `iommu=soft swiotlb=force'
command line options.

Note in kernel-parameters.txt:

        swiotlb=        [ARM,IA-64,PPC,MIPS,X86]
                        Format: { <int> | force }
                        <int> -- Number of I/O TLB slabs
                        force -- force using of bounce buffers even if they
                                 wouldn't be automatically used by the kernel

So with `swiotlb=force' the DMA is _expected_ to go to a bounce buffer
managed by the kernel DMA API.
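
To make the suspected failure mode concrete, here is a toy model of a
bounce-buffered receive.  It is only a sketch and not kernel code: the
class and method names are invented for illustration, and it assumes
(without proof) that the symptom comes from the driver reading its
buffer before the bounced data has been copied back.

```python
# Toy model of a bounce-buffered receive. NOT kernel code.
# BounceMapping, device_write and sync_for_cpu are hypothetical names
# made up for this sketch; they are not part of any real API.

class BounceMapping:
    """Models a device-to-CPU DMA mapping that goes via a bounce buffer."""

    def __init__(self, driver_buf: bytearray):
        self.driver_buf = driver_buf              # buffer the driver reads
        self.bounce = bytearray(len(driver_buf))  # buffer the device writes

    def device_write(self, data: bytes) -> None:
        # The device only ever sees the bounce buffer.
        self.bounce[:len(data)] = data

    def sync_for_cpu(self) -> None:
        # Copy the bounced data back so the CPU side can see it.
        self.driver_buf[:] = self.bounce


rx_buf = bytearray(8)                      # zero-filled RX data buffer
mapping = BounceMapping(rx_buf)
mapping.device_write(b"\x45\x00\x00\x54")  # device DMAs valid data

before_sync = bytes(rx_buf)   # driver reads without syncing first
mapping.sync_for_cpu()
after_sync = bytes(rx_buf)    # driver reads after syncing

print(before_sync.hex())
print(after_sync.hex())
```

In this model the device has written valid data, yet the read taken
before the sync still sees only zeroes, which matches the reported
symptom.  In the real DMA API the copy-back for a bounced
device-to-CPU mapping happens in dma_sync_single_for_cpu() or
dma_unmap_single(); whether tg3 actually omits or misorders such a
call here is an assumption, not something established in this thread.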

> This is unlikely to be a chip or driver issue, as the chip is doing the 
> correct DMA but the corruption occurs before driver reads it. Would 
> request iommu experts to take a look and suggest what can be done next.

As I say above, I think this is probably a driver bug.

I have seen identical symptoms on a >5yo desktop box under my desk and
on two brand new rackmount servers; I therefore doubt that it's a
hardware problem.

Ian.