linux-kernel - build_skb() and data corruption


Message-ID: <CACmBeS35RAEQ+t2vtYtFTKNT-VdQw20hURjTi193Jk8HG7UECA@mail.gmail.com>
Date:	Mon, 13 Jan 2014 12:47:42 +0100
From:	Jonas Jensen <jonas.jensen@...il.com>
To:	netdev <netdev@...r.kernel.org>
Cc:	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	bhutchings@...arflare.com, alexander.h.duyck@...el.com,
	Arnd Bergmann <arnd@...db.de>,
	Florian Fainelli <florian@...nwrt.org>
Subject: build_skb() and data corruption

Please help,

I think I see memory corruption with a driver recently added to linux-next.

The following error occurs when downloading a large file with wget (or
ncftp): "read error: Bad address"
wget then exits and leaves the file unfinished.

The error goes away when build_skb() is patched out, in this case by
allocating pages directly in the RX loop.

I currently have two branches with code placed under ifdef USE_BUILD_SKB:
https://bitbucket.org/Kasreyn/linux-next/src/faa7c408a655fdfd7c383f79259031da5a05d61e/drivers/net/ethernet/moxa/moxart_ether.c#cl-472

If build_skb() is the cause, is there something the driver can do about it?

A quick search for "build_skb memory corruption" turns up the following
commit, "igb: Revert support for build_skb in igb"
(f9d40f6a9921cc7d9385f64362314054e22152bd):

"The reason for reverting this patch is that it can lead to data corruption.
The following flow was pointed out by Ben Hutchings:
1. skb is forwarded to another device
2. Packet headers are modified and it's put into a queue
3. Second packet is received into the other half of this page
4. Page cannot be reused, so is DMA-unmapped
5. The DMA mapping was non-coherent, so unmap copies or invalidates
cache
The headers added in step 2 get trashed in step 5."
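
To make the quoted flow concrete, here is a minimal userspace sketch. It is a toy model and not kernel code: the names toy_page, toy_dma_write, toy_sync_for_cpu, toy_cpu_write and toy_unmap_page are hypothetical, the "cache" is just a second buffer, and the unmap is assumed to invalidate the whole page (the commit message says "copies or invalidates", so this covers only the invalidate case).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_HALF (TOY_PAGE_SIZE / 2)

/* Toy model, not kernel code: dram is what the device sees,
 * cache is what the CPU sees through a write-back cache. */
struct toy_page {
	unsigned char dram[TOY_PAGE_SIZE];
	unsigned char cache[TOY_PAGE_SIZE];
};

/* Device DMA goes straight to memory and bypasses the CPU cache. */
static void toy_dma_write(struct toy_page *p, size_t off,
			  const char *src, size_t len)
{
	assert(off + len <= TOY_PAGE_SIZE);
	memcpy(p->dram + off, src, len);
}

/* Make a received range visible to the CPU (refill cache from memory). */
static void toy_sync_for_cpu(struct toy_page *p, size_t off, size_t len)
{
	assert(off + len <= TOY_PAGE_SIZE);
	memcpy(p->cache + off, p->dram + off, len);
}

/* CPU store that stays dirty in the cache and is never written back. */
static void toy_cpu_write(struct toy_page *p, size_t off,
			  const char *src, size_t len)
{
	assert(off + len <= TOY_PAGE_SIZE);
	memcpy(p->cache + off, src, len);
}

/* Unmap of a non-coherent mapping, modeled (assumption) as invalidating
 * the whole page: dirty CPU data is discarded and memory wins. */
static void toy_unmap_page(struct toy_page *p)
{
	memcpy(p->cache, p->dram, TOY_PAGE_SIZE);
}

/* Returns 1 if the header written in step 2 is lost after step 5. */
static int toy_header_trashed(void)
{
	struct toy_page page;

	memset(&page, 0, sizeof(page));

	/* First packet is received into the first half of the page. */
	toy_dma_write(&page, 0, "OLDHDR", 6);
	toy_sync_for_cpu(&page, 0, TOY_HALF);

	/* Steps 1-2: skb is forwarded and its headers are modified. */
	toy_cpu_write(&page, 0, "NEWHDR", 6);

	/* Step 3: second packet is received into the other half. */
	toy_dma_write(&page, TOY_HALF, "PKT2", 4);

	/* Steps 4-5: page cannot be reused, so it is unmapped. */
	toy_unmap_page(&page);

	return memcmp(page.cache, "NEWHDR", 6) != 0;
}
```

In this model toy_header_trashed() reports that the CPU's view of the first half has reverted to the old header, because the unmap covers the whole page and not just the half that received the second packet.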


Interestingly, the "Bad address" error only happens on a locally mounted
ext3 filesystem, never on tmpfs (where wget instead stops with
"short write"), e.g.:

# mount
/dev/mmcblk0p1 on / type ext3 (rw,relatime,errors=continue,barrier=1,data=ordered)
tmpfs on /dev/shm type tmpfs (rw,relatime,mode=777)
tmpfs on /tmp type tmpfs (rw,relatime)

# cd /tmp
# wget -c ftp://149.20.4.69/pub/linux/kernel/v2.6/linux-2.6.11.11.tar.gz
Connecting to 149.20.4.69 (149.20.4.69:21)
linux-2.6.11.11.tar.  25% |*******                        | 11374k  0:01:36 ETA
wget: short write
[  153.560000] wget (383) used greatest stack depth: 4776 bytes left
# rm linux-2.6.11.11.tar.gz
# wget -c ftp://149.20.4.69/pub/linux/kernel/v2.6/linux-2.6.11.11.tar.gz
Connecting to 149.20.4.69 (149.20.4.69:21)
linux-2.6.11.11.tar.  25% |*******                        | 11315k  0:01:34 ETA
wget: short write
# rm linux-2.6.11.11.tar.gz
# wget -c ftp://149.20.4.69/pub/linux/kernel/v2.6/linux-2.6.11.11.tar.gz
Connecting to 149.20.4.69 (149.20.4.69:21)
linux-2.6.11.11.tar.  25% |*******                        | 11473k  0:01:38 ETA
wget: short write
[  237.300000] wget (387) used greatest stack depth: 4752 bytes left

# cd /root/
# wget -c ftp://149.20.4.69/pub/linux/kernel/v2.6/linux-2.6.11.11.tar.gz
Connecting to 149.20.4.69 (149.20.4.69:21)
linux-2.6.11.11.tar.   3% |*                              |  1647k  0:03:02 ETA
wget: read error: Bad address


Regards,
Jonas