Message-ID: <3fb413aa-997f-42f5-2a43-c29d8de51d3d@googlemail.com>
Date: Mon, 22 Jan 2018 01:44:33 +0100
From: Oliver Freyermuth <o.freyermuth@...glemail.com>
To: Francois Romieu <romieu@...zoreil.com>
Cc: netdev@...r.kernel.org
Subject: Re: Memory corruption with r8169 across several device revisions and kernels
On 22.01.2018 at 01:09, Francois Romieu wrote:
> You said:
>
> Oliver Freyermuth <o.freyermuth@...glemail.com> :
> [...]
>> The values found in overwritten memory match those contained in
>> /proc/self/net/dev for the realtek ethernet device.
>
> Are you able to retrieve the layout ? That is, does it appear to match:
>
> - r8169 hardware stats DMA buffer ?
> TxOk, RxOk, TxErr, RxErr, ...
>
> - rtnl_link_stats ?
> rx_packets, tx_packets, rx_bytes, tx_bytes, ...
>
> or something else ?
Not cleanly.
Since I'm no expert in kernel module development, I can only deduce from what I see in mapped memory,
e.g. with memtester. The values I found there also showed up in /proc/self/net/dev.
I'm no longer sure whether they were the RX or TX bytes / packets (but they were none of the error counters).
I can try to reproduce this to clarify, but it's a somewhat dangerous undertaking.
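One way to answer the layout question from a raw dump would be to decode the same bytes under both candidate layouts and see which reading gives plausible counters. This is a minimal sketch, not something used in this thread: `decode_candidates` is a hypothetical helper, and it assumes little-endian x86, an r8169 tally dump that starts with 64-bit TxOk and RxOk counters, and a `struct rtnl_link_stats` that starts with four 32-bit fields (rx_packets, tx_packets, rx_bytes, tx_bytes).

```python
import struct


def decode_candidates(buf: bytes) -> dict:
    """Interpret the first 16 bytes of a dump under both candidate layouts.

    Assumptions (not confirmed in this thread): little-endian byte order,
    r8169 tally counters TxOk/RxOk are 64-bit, and struct rtnl_link_stats
    begins with four 32-bit counters.
    """
    if len(buf) < 16:
        raise ValueError("need at least 16 bytes")
    # Assumed r8169 hardware tally layout: two 64-bit counters.
    tx_ok, rx_ok = struct.unpack_from("<QQ", buf, 0)
    # Assumed rtnl_link_stats layout: four 32-bit counters.
    rx_packets, tx_packets, rx_bytes, tx_bytes = struct.unpack_from("<IIII", buf, 0)
    return {
        "r8169_tally": {"TxOk": tx_ok, "RxOk": rx_ok},
        "rtnl_link_stats": {
            "rx_packets": rx_packets,
            "tx_packets": tx_packets,
            "rx_bytes": rx_bytes,
            "tx_bytes": tx_bytes,
        },
    }
```

Under the 32-bit layout, the upper half of a 64-bit word lands in the next field, so a single small 64-bit value decodes very differently under the two readings.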
Also, from a time when the corrupted physical address was in low memory, I got the following in syslog:
Oct 12 10:05:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065b8ea
Oct 12 10:10:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065be39
Oct 12 10:11:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065be8c
Oct 12 10:12:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065bef8
Oct 12 10:13:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065bfbe
Oct 12 10:18:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065c37a
Oct 12 10:19:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065c3db
Oct 12 10:31:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065cc48
Oct 12 10:35:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065d402
Oct 12 10:47:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065dcbb
Oct 12 10:53:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065e0a3
Oct 12 11:39:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 006602f2
Oct 12 11:44:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 00661ef0
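To check that the value at 0x9000 behaves like a steadily increasing counter, one could compute the increase between consecutive reports. This is a minimal sketch under assumptions: `counter_deltas` is a hypothetical helper, the regular expression assumes exactly the syslog line format quoted above, and a dummy year is prepended because syslog timestamps omit it.

```python
import re
from datetime import datetime

# Assumes the exact "Corrupted low memory" syslog format quoted above.
LINE_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"Corrupted low memory at \S+ \(\w+ phys\) = ([0-9a-f]+)$"
)


def counter_deltas(lines):
    """Return (minutes elapsed, value increase) for each pair of consecutive reports."""
    samples = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a corruption report
        # syslog omits the year; a dummy year keeps strptime unambiguous,
        # and only differences between timestamps are used anyway.
        ts = datetime.strptime("2000 " + m.group(1), "%Y %b %d %H:%M:%S")
        samples.append((ts, int(m.group(2), 16)))
    return [
        ((t1 - t0).total_seconds() / 60, v1 - v0)
        for (t0, v0), (t1, v1) in zip(samples, samples[1:])
    ]
```

Fed the thirteen lines above, every delta comes out positive; for the first pair (10:05:02 to 10:10:02) the value grows by 0x54f = 1359 over five minutes.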
Also, I'm not sure whether the low memory scanner continues after finding a single corruption; it might only ever report the first corrupted region.
memtester in userspace stops at the first corruption and then starts another pass. In any case, I only ever saw one corrupted region with the tools I used.
The same was true for the corrupted btrfs filesystem: as far as I could tell, there was a single corrupted region, not a series of counters, i.e. not a full structure.
Cheers,
Oliver