linux-kernel - Re: kernel panic in skb_copy_bits

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <06F8DA5DD1D9E277F2AF6F1E@Ximines.local>
Date:	Thu, 04 Jul 2013 13:57:16 +0100
From:	Alex Bligh <alex@...x.org.uk>
To:	Eric Dumazet <eric.dumazet@...il.com>,
	Ian Campbell <Ian.Campbell@...rix.com>
cc:	Joe Jin <joe.jin@...cle.com>,
	Frank Blaschka <frank.blaschka@...ibm.com>,
	"David S. Miller" <davem@...emloft.net>,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
	zheng.x.li@...cle.com, Xen Devel <xen-devel@...ts.xen.org>,
	Jan Beulich <JBeulich@...e.com>,
	Stefano Stabellini <stefano.stabellini@...citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Alex Bligh <alex@...x.org.uk>
Subject: Re: kernel panic in skb_copy_bits

--On 4 July 2013 03:12:10 -0700 Eric Dumazet <eric.dumazet@...il.com> wrote:

> It looks like a typical COW issue to me.
>
> If the page content is written while there is still a reference on this
> page, we should allocate a new page and copy the previous content.
>
> And this has little to do with networking.
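To make the COW suggestion concrete, here is a minimal user-space sketch. It assumes a hypothetical refcounted page type (`struct cow_page`, `cow_alloc`, `cow_get`, `cow_put` and `cow_make_writable` are illustrative names, not kernel APIs). Before writing, the writer checks whether anyone else still holds a reference and, if so, switches to a private copy, so the other holder keeps seeing the old bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 4096

/* Hypothetical model of a shared page; NOT the kernel's struct page. */
struct cow_page {
    int refcount;                   /* writer plus any in-flight users */
    unsigned char data[PAGE_BYTES];
};

static struct cow_page *cow_alloc(void)
{
    struct cow_page *p = calloc(1, sizeof(*p));
    if (p)
        p->refcount = 1;
    return p;
}

static struct cow_page *cow_get(struct cow_page *p)
{
    p->refcount++;
    return p;
}

static void cow_put(struct cow_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        free(p);
}

/*
 * Make *pp safe to write. If another holder still references the page,
 * copy its contents into a fresh page, drop the writer's share of the
 * old one and point *pp at the copy. Returns 0 on success, -1 on ENOMEM.
 */
static int cow_make_writable(struct cow_page **pp)
{
    struct cow_page *old = *pp;
    if (old->refcount == 1)
        return 0;                   /* sole owner: write in place */

    struct cow_page *copy = cow_alloc();
    if (!copy)
        return -1;
    memcpy(copy->data, old->data, PAGE_BYTES);
    cow_put(old);                   /* other holder keeps the old bytes */
    *pp = copy;
    return 0;
}
```

The copy is only paid for when a second reference actually exists; a sole owner writes in place.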

I suspect this would get more attention if we could make Ian's case
below trigger (a) outside Xen, (b) outside networking.

> 	memset(buf, 0xaa, 4096);
> 	write(fd, buf, 4096);
> 	memset(buf, 0x55, 4096);
> (where fd is opened with O_DIRECT on NFS) can result in 0x55 being seen on the wire
> in the TCP retransmit.
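The quoted sequence can be modelled in user space to show why 0x55 can reach the wire. This is a sketch under stated assumptions: `struct tx_entry`, `queue_tx` and `retransmit` are hypothetical stand-ins for an skb fragment and the TCP retransmit path, not the kernel's skb API. A zero-copy path keeps a pointer to the caller's buffer, so once write() has returned, a later retransmit sends whatever the buffer holds at that moment; a path that snapshots the bytes does not have this problem.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LEN 4096

/* Hypothetical stand-in for a fragment queued for (re)transmission. */
struct tx_entry {
    const unsigned char *frag;      /* bytes a retransmit will send */
    unsigned char copy[LEN];        /* private snapshot, copying path only */
};

/*
 * Queue buf for transmission. zero_copy != 0 keeps a pointer to the
 * caller's buffer (like a zero-copy path); 0 snapshots the bytes.
 * In this model the caller's write() is considered complete on return.
 */
static void queue_tx(struct tx_entry *e, const unsigned char *buf, int zero_copy)
{
    if (zero_copy) {
        e->frag = buf;
    } else {
        memcpy(e->copy, buf, LEN);
        e->frag = e->copy;
    }
}

/* Simulated retransmit: returns the first byte put on the wire. */
static unsigned char retransmit(const struct tx_entry *e)
{
    return e->frag[0];
}
```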

We know this can fail with O_DIRECT+NFS. We've had reports suggesting
it also fails with O_DIRECT+iSCSI, though those showed up as a kernel
panic (under Xen) rather than the data corruption described above.

Historical trawling suggests this is an issue with DRBD (see Ian's
original thread from the mists of time).

I don't quite understand why we aren't seeing corruption with standard
ATA devices + O_DIRECT and no Xen involved at all.

My memory is a bit misty on this, but I had thought the reason this
would NOT be solved simply by O_DIRECT taking a reference to the page
was that the O_DIRECT I/O completes (and so drops its reference) before
the networking stack has actually finished with the page. If the
O_DIRECT I/O did not complete until the networking stack was finished
with the page, we wouldn't see the problem in the first place. I may be
completely off base here.
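The alternative described above, where the O_DIRECT I/O does not complete until every user of the page is done, amounts to signalling completion on the last reference drop rather than on the submitter's. A toy model follows, assuming both the submitter and the network stack hold counted references; `struct dio_req`, `dio_start`, `dio_get` and `dio_put` are hypothetical names, not kernel bio/skb/page APIs, and whether the real code paths can be arranged this way is not established here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; NOT the kernel's bio/skb/page APIs. */
struct dio_req {
    int refs;        /* holders of the user page */
    bool completed;  /* what the O_DIRECT caller waits on */
};

static void dio_start(struct dio_req *r)
{
    r->refs = 1;     /* submitter's reference */
    r->completed = false;
}

static void dio_get(struct dio_req *r)
{
    r->refs++;       /* e.g. the network stack queues the page */
}

/*
 * Completion is signalled only when the LAST reference goes away, so
 * the caller cannot reuse the buffer while a retransmit might still
 * read from it.
 */
static void dio_put(struct dio_req *r)
{
    assert(r->refs > 0);
    if (--r->refs == 0)
        r->completed = true;
}
```

In the sequence modelled here, the RPC reply arriving drops only the submitter's reference; the request completes only after the stack releases its own reference as well.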

-- 
Alex Bligh