netdev - Re: Memory corruption with r8169 across several device revisions and kernels


Message-ID: <3ceebc43-6baf-2b4c-af10-70522e97385e@googlemail.com>
Date:   Mon, 22 Jan 2018 23:55:58 +0100
From:   Oliver Freyermuth <o.freyermuth@...glemail.com>
To:     Francois Romieu <romieu@...zoreil.com>
Cc:     netdev@...r.kernel.org
Subject: Re: Memory corruption with r8169 across several device revisions and
 kernels

Dear Francois, other r8169 experts, 

On 22.01.2018 at 01:09, Francois Romieu wrote:
> Are you able to retrieve the layout ? That is, does it appear to match:
> 
> - r8169 hardware stats DMA buffer ?
>   TxOk, RxOk, TxErr, RxErr, ...
> 
> - rtnl_link_stats ?
>   rx_packets, tx_packets, rx_bytes, tx_bytes, ...
> 
> or something else ?
> 

It took me a while; the bug does not seem to occur every time, so there may also be a race involved.
Reproducing on an Ubuntu 17.10 system, I found the following:

Address in virtual memory || value
0x7f87bb4c6000            || 0x00000217
0x7f87bb4c6008            || 0x000003ab
0x7f87bb4c6018            || 0x00000000
0x7f87bb4c6028            || 0x00000279
0x7f87bb4c6030            || 0x000000e1
0x7f87bb4c6038            || 0x00000051

At almost the same time, I find the following numbers in /proc/self/net/dev for the device:

             decimal || hex
RX bytes:    870820  || 0x000d49a4
   packets:     945  || 0x000003b1
   errs           0  || 
   drop           0  || 
   fifo           0  ||  
   frame          0  || 
   compressed     0  || 
   multicast     83  || 0x00000053
TX bytes:     58505  || 0x0000e489
   packets:     535  || 0x00000217
   errs           0  || 
   drop           0  ||
   fifo           0  || 
   frame          0  || 
   compressed     0  || 
   multicast      0  || 

Since /proc/self/net/dev was read a few seconds after the memory dump was taken,
these values are off from the memory dump by a few packets.

So I deduce the layout:
0x7f87bb4c6000   TX Packets
0x7f87bb4c6008   RX Packets
0x7f87bb4c6010    * corruption not seen by memtester for whatever reason *
0x7f87bb4c6018   ???
0x7f87bb4c6020    * corruption not seen by memtester for whatever reason *
0x7f87bb4c6028   ???
0x7f87bb4c6030   ???
0x7f87bb4c6038   RX multicast (?)

So the only thing that is fully clear is that TX packets come first, followed by RX packets.

Checking the driver sources, I find that rtnl_link_stats64 cannot be the culprit, since it has rx_packets first and tx_packets only after it.
However, struct rtl8169_counters looks like this:
struct rtl8169_counters {
	__le64	tx_packets;
	__le64	rx_packets;
	__le64	tx_errors;
	__le32	rx_errors;
	__le16	rx_missed;
	__le16	align_errors;
	__le32	tx_one_collision;
	__le32	tx_multi_collision;
	__le64	rx_unicast;
	__le64	rx_broadcast;
	__le32	rx_multicast;
	__le16	tx_aborted;
	__le16	tx_underun;
};
This could very well match the structure found in memory, so something related to rtl8169_do_counters, in the DMA transfer, would be broken.

Does this help, and can I provide more info? I get the feeling this affects many tens of thousands of systems and has simply stayed hidden because
network stats are read rarely...

Cheers,
Oliver