netdev - Re: Memory corruption with r8169 across several device revisions and kernels


Message-ID: <22a1f2d1-4ae8-d941-ab8e-00deac41d4ef@googlemail.com>
Date:   Tue, 23 Jan 2018 16:47:11 +0100
From:   Oliver Freyermuth <o.freyermuth@...glemail.com>
To:     David Miller <davem@...emloft.net>
Cc:     romieu@...zoreil.com, netdev@...r.kernel.org
Subject: Re: Memory corruption with r8169 across several device revisions and
 kernels

On 23.01.2018 at 16:28, David Miller wrote:
> Looking at how these DMA counters are handled, there appears to be a
> requirement that the memory buffer is 64-byte aligned.
> 
> [...]
> 
> Therefore the driver needs to allocate "size + (64 - 1)" bytes and do
> the 64-byte alignment of the CPU pointer and the DMA address by hand.

This is also what I wondered about as a non-expert in hardware drivers; 
alignment should surely be enforced here. 

However, for the memory corruption I observed, I used an x86_64 system
(which I believe always has PAGE_SIZE aligned buffers). 
So there should be another bug, unless I am mistaken about x86_64. 
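
For reference, the manual alignment David describes ("size + (64 - 1)"
bytes, then rounding the pointer up by hand) could look roughly like the
sketch below. This is a plain userspace illustration using malloc, not
code from r8169; the names COUNTER_ALIGN, struct aligned_buf, align_up,
aligned_buf_alloc and aligned_buf_free are made up for this example. In
a driver, the same rounding would presumably have to be applied to both
the CPU pointer and the DMA address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical constant: the 64-byte alignment discussed above. */
#define COUNTER_ALIGN ((uintptr_t)64)

/* Hypothetical helper type: keeps the raw pointer so it can be freed. */
struct aligned_buf {
    void *raw;     /* pointer as returned by malloc() */
    void *aligned; /* first 64-byte aligned address inside raw */
};

/* Round addr up to the next multiple of align (align: power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(align - 1);
}

/* Over-allocate by (align - 1) bytes, then align by hand. */
static int aligned_buf_alloc(struct aligned_buf *b, size_t size)
{
    b->raw = malloc(size + (size_t)(COUNTER_ALIGN - 1));
    if (b->raw == NULL) {
        b->aligned = NULL;
        return -1;
    }
    b->aligned = (void *)align_up((uintptr_t)b->raw, COUNTER_ALIGN);
    return 0;
}

/* Free via the raw pointer, never via the aligned one. */
static void aligned_buf_free(struct aligned_buf *b)
{
    free(b->raw);
    b->raw = NULL;
    b->aligned = NULL;
}
```

The extra 63 bytes guarantee that an aligned region of the requested
size always fits inside the allocation, whatever address the allocator
returns.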

I checked the deprecated r8168 driver by Realtek (I am not sure whether
it is also affected by the issue, though) and found two major
differences in DMA handling:
1) It wraps the DMA operations (writing of addresses, waiting for cmd
   bits to be pulled down) in spin_lock_irqsave / spin_unlock_irqrestore.
2) It does not reset CounterAddrLow / CounterAddrHigh to 0 / 0 after
   finishing. That's not really good, but may have hidden this issue
   with r8168.

Again, I have not tried to use r8168 yet (especially since it only supports old kernels),
but maybe this helps to trigger some ideas. 

Worst case, this could be a firmware timing bug, i.e. the card writes
the counters to system memory shortly before the cmd bits are pulled
high / shortly after they have been pulled down (in the latter case
using the partially zeroed-out memory address) - I don't know.
Let me know if I can extract any more info from an affected machine,
but I believe these machines should be very common.

HTH and thanks,
Oliver