Date:	Tue, 15 Sep 2015 00:46:03 +0200
From:	Simon Guinot <simon.guinot@...uanux.org>
To:	Oren Laskin <oren@...eous.io>
Cc:	David Miller <davem@...emloft.net>,
	thomas.petazzoni@...e-electrons.com, andrew@...n.ch,
	jason@...edaemon.net, netdev@...r.kernel.org, vdonnefort@...il.com,
	stable@...r.kernel.org,
	Gregory CLEMENT <gregory.clement@...e-electrons.com>,
	yoann@...lo.fr, linux-arm-kernel@...ts.infradead.org,
	sebastian.hesselbarth@...il.com
Subject: Re: [PATCH v2] net: mvneta: fix refilling for Rx DMA buffers

On Mon, Sep 14, 2015 at 03:16:50PM -0700, Oren Laskin wrote:
> I would hit this error on my Armada 370 board about 20% of the time
> after downloading a 30MB file to /tmp.  We're running a 1 Gb SGMII
> link.  I would hit this in less than a minute before removing this
> commit from my tree.  I've now been running this test in a loop for a
> few hours with no problems.

Ouch.

At the time, I tested this patch with several runs of 20 wget/md5sum
passes on a 1GB file (with and without jumbo frames enabled). I also
used this program: http://git.lacie-nas.org/?p=netsum.git;a=summary. It
detects data corruption over the network very quickly.
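
For illustration, a download-and-checksum loop of this kind could look
roughly like the sketch below. It is not the netsum program or the exact
commands used in these tests; the function names, the choice of SHA-256
and the URL in the usage note are assumptions made for the example.

```python
import hashlib
import urllib.request
from typing import Callable, List, Tuple


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def fetch_url(url: str) -> bytes:
    """Download a URL fully into memory (fine for a 30MB test file)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def find_corrupt_runs(
    fetch: Callable[[], bytes], expected_hex: str, runs: int
) -> List[Tuple[int, str]]:
    """Call fetch() `runs` times and report every digest mismatch.

    Returns a list of (run index, observed digest) pairs, one per
    download whose digest differs from expected_hex.
    """
    bad = []
    for i in range(runs):
        digest = sha256_hex(fetch())
        if digest != expected_hex:
            bad.append((i, digest))
    return bad
```

With a hypothetical server, this would be called as
find_corrupt_runs(lambda: fetch_url("http://server.example/file.bin"),
expected_hex, 20), where expected_hex is computed once on the machine
serving the file.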

It is very weird, I should have seen something...

Moreover, I understand that you reproduce the issue very quickly and
without any refilling errors. That is also quite weird, because the patch
does basically nothing in such a case.
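
To make that reasoning concrete, here is a toy model of the ordering
described in the patch description quoted further down: refill first,
then hand the filled buffer to the stack, and on allocation failure drop
the data and keep the original buffer in the ring. This is only a sketch
in Python, not the mvneta driver code; RxRing, receive and alloc are
hypothetical names.

```python
from typing import Callable, List, Optional


class RxRing:
    """Toy Rx ring: each slot holds one buffer owned by the ring."""

    def __init__(self, size: int, alloc: Callable[[], Optional[bytearray]]):
        # alloc is a hypothetical allocator; it returns None on failure.
        self.alloc = alloc
        self.slots: List[bytearray] = [bytearray(16) for _ in range(size)]

    def receive(self, idx: int) -> Optional[bytearray]:
        """Handle one received frame in slot idx.

        Returns the buffer handed to the stack, or None if the frame
        was dropped because no replacement buffer could be allocated.
        """
        # Refill first, before the filled buffer leaves the ring.
        new_buf = self.alloc()
        if new_buf is None:
            # Allocation failed: drop the data and leave the original
            # buffer in its slot, so it is never owned by both sides.
            return None
        filled = self.slots[idx]
        self.slots[idx] = new_buf
        return filled
```

When every allocation succeeds, the filled buffer is handed up and a
fresh one takes its slot, exactly as without the reordering. Only the
failure branch behaves differently.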

BTW, which hardware are you using exactly?

I'll definitely have a closer look at it tomorrow.

Simon

> 
> It was somewhat hard to diagnose since files I copied with scp didn't
> show the issue (or at least not as quickly).  I set up an HTTP program
> to serve a file, replicated the problem with wget, and found it.
> 
> Oren
> 
> On Mon, Sep 14, 2015 at 3:13 PM, Simon Guinot <simon.guinot@...uanux.org> wrote:
> > Hi Oren,
> >
> > On Mon, Sep 14, 2015 at 01:22:12PM -0700, Oren Laskin wrote:
> >> I had to undo this change on my Armada 370 based board.  It was causing
> >> corrupt data to make it through on large downloads.  I'm using wget to get
> >> the same 30MB file many times and the SHA would occasionally be different.
> >
> > During your tests, do you see any "Linux processing - Can't refill"
> > messages along with the data corruption?
> >
> >> I tracked it down to this commit.  In it, I would find on the order of a
> >> few hundred bytes to simply be wrong data.
> >
> > I am a little bit surprised here. To me, this patch is very simple and
> > does the exact opposite: it fixes kernel crashes and data corruption
> > in case of refilling errors. These can happen, for example, if you run
> > large data transfers with jumbo frames enabled...
> >
> > But anyway, I'll try to reproduce the issue tomorrow. I only have to
> > wget the same file (size 30MB) in a loop and check its md5sum?
> > Is that it? And how long should I wait for the error?
> >
> > Thanks,
> >
> > Simon
> >
> >>
> >> Thanks,
> >>
> >> Oren
> >>
> >> On Tue, Jul 21, 2015 at 12:30 AM, David Miller <davem@...emloft.net> wrote:
> >>
> >> > From: Simon Guinot <simon.guinot@...uanux.org>
> >> > Date: Sun, 19 Jul 2015 13:00:53 +0200
> >> >
> >> > > With the current code, if a memory allocation error happens while
> >> > > refilling an Rx descriptor, then the original Rx buffer is both passed
> >> > > to the networking stack (in an SKB) and left in the Rx ring. This leads
> >> > > to various kernel oopses and crashes.
> >> > >
> >> > > As a fix, this patch moves Rx descriptor refilling ahead of building
> >> > > SKB with the associated Rx buffer. In case of a memory allocation
> >> > > failure, data is dropped and the original DMA buffer is put back into
> >> > > the Rx ring.
> >> > >
> >> > > Signed-off-by: Simon Guinot <simon.guinot@...uanux.org>
> >> > > Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit")
> >> > > Cc: <stable@...r.kernel.org> # v3.8+
> >> > > Tested-by: Yoann Sculo <yoann@...lo.fr>
> >> >
> >> > Applied, thanks.
> >> >
> >> > _______________________________________________
> >> > linux-arm-kernel mailing list
> >> > linux-arm-kernel@...ts.infradead.org
> >> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> >> >

