lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 23 Feb 2024 16:37:23 -0800
From: Elliot Berman <quic_eberman@...cinc.com>
To: Christoph Hellwig <hch@...radead.org>, Will Deacon <will@...nel.org>,
        Quentin Perret <qperret@...gle.com>,
        Chris Goldsworthy
	<quic_cgoldswo@...cinc.com>,
        Android KVM <android-kvm@...gle.com>,
        "Patrick
 Daly" <quic_pdaly@...cinc.com>
CC: Alex Elder <elder@...aro.org>,
        Srinivas Kandagatla
	<srinivas.kandagatla@...aro.org>,
        Murali Nalajal <quic_mnalajal@...cinc.com>,
        Trilok Soni <quic_tsoni@...cinc.com>,
        Srivatsa Vaddagiri
	<quic_svaddagi@...cinc.com>,
        Carl van Schaik <quic_cvanscha@...cinc.com>,
        Philip Derrin <quic_pderrin@...cinc.com>,
        Prakruthi Deepak Heragu
	<quic_pheragu@...cinc.com>,
        Jonathan Corbet <corbet@....net>, Rob Herring
	<robh+dt@...nel.org>,
        Krzysztof Kozlowski
	<krzysztof.kozlowski+dt@...aro.org>,
        Conor Dooley <conor+dt@...nel.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>,
        Konrad Dybcio <konrad.dybcio@...aro.org>,
        Bjorn Andersson
	<andersson@...nel.org>,
        Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
        "Fuad
 Tabba" <tabba@...gle.com>,
        Sean Christopherson <seanjc@...gle.com>,
        "Andrew
 Morton" <akpm@...ux-foundation.org>,
        <linux-arm-msm@...r.kernel.org>, <linux-doc@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <devicetree@...r.kernel.org>,
        <linux-arm-kernel@...ts.infradead.org>, <linux-mm@...ck.org>
Subject: Re: Re: [PATCH v17 19/35] arch/mm: Export direct {un,}map functions

On Thu, Feb 22, 2024 at 11:09:40PM -0800, Christoph Hellwig wrote:
> On Thu, Feb 22, 2024 at 03:16:42PM -0800, Elliot Berman wrote:
> > Firmware and hypervisor drivers can donate system heap memory to their
> > respective firmware/hypervisor entities. Those drivers should unmap the
> > pages from the kernel's logical map before doing so.
> > 
> > Export can_set_direct_map, set_direct_map_invalid_noflush, and
> > set_direct_map_default_noflush.
> 
> Err, not they should not.  And not using such super low-level interfaces
> from modular code.

Hi Cristoph,
 
We've observed a few times that Linux can unintentionally access a page
we've unmapped from host's stage 2 page table via an unaligned load from
an adjacent page. The stage 2 is managed by Gunyah. There are few
scenarios where even though we allocate and own a page from buddy,
someone else could try to access the page without going through the
hypervisor driver. One such instance we know about is
load_unaligned_zeropad() via pathlookup_at() [1].
 
load_unaligned_zeropad() could be called near the end of a page. If the
next page isn't mapped by the kernel in the stage one page tables, then
the access from to the unmapped page from load_unaligned_zeropad() will
land in __do_kernel_fault(), call fixup_exception(), and fill the
remainder of the load with zeroes. If the page in question is mapped in
stage 1 but was unmapped from stage 2, then the access lands back in
Linux in do_sea(), leading to a panic().
 
Our preference would be to add fixup_exception() to S2 PTW errors for
two reasons:
1. It's cheaper to do performance wise: we've already manipulated S2
   page table and prevent intentional access to the page because
   pKVM/Gunyah drivers know that access to the page has been lost.
2. Page-granular S1 mappings only happen on arm64 with rodata=full.
 
In an off-list discussion with the Android pkvm folks, their preference
was to have the pages unmapped from stage 1. I've gone with that
approach to get started but welcome discussion on the best approach.
 
The Android (downstream) implementation of arm64 pkvm is currently
implementing a hack where s2 ptw faults are given back to the host as s1
ptw faults (i.e. __do_kernel_fault() gets called and not do_sea()) --
allowing the kernel to fixup the exception.
 
arm64 pKVM will also face this issue when implementing guest_memfd or
when donating more memory to the hyp for s2 page tables, etc. As far as
I can tell, this isn't an issue for arm64 pKVM today because memory
isn't being dynamically donated to the hypervisor.
 
Thanks,
Elliot

[1]:
path_lookupat+0x340/0x3228
filename_lookup+0xbc/0x1c0
__arm64_sys_newfstatat+0xb0/0x4a0
invoke_syscall+0x58/0x118


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ