lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZeXIWBLVWzVycm0r@google.com>
Date: Mon, 4 Mar 2024 13:10:48 +0000
From: Quentin Perret <qperret@...gle.com>
To: Christoph Hellwig <hch@...radead.org>, Will Deacon <will@...nel.org>,
	Chris Goldsworthy <quic_cgoldswo@...cinc.com>,
	Android KVM <android-kvm@...gle.com>,
	Patrick Daly <quic_pdaly@...cinc.com>,
	Alex Elder <elder@...aro.org>,
	Srinivas Kandagatla <srinivas.kandagatla@...aro.org>,
	Murali Nalajal <quic_mnalajal@...cinc.com>,
	Trilok Soni <quic_tsoni@...cinc.com>,
	Srivatsa Vaddagiri <quic_svaddagi@...cinc.com>,
	Carl van Schaik <quic_cvanscha@...cinc.com>,
	Philip Derrin <quic_pderrin@...cinc.com>,
	Prakruthi Deepak Heragu <quic_pheragu@...cinc.com>,
	Jonathan Corbet <corbet@....net>, Rob Herring <robh+dt@...nel.org>,
	Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>,
	Conor Dooley <conor+dt@...nel.org>,
	Catalin Marinas <catalin.marinas@....com>,
	Konrad Dybcio <konrad.dybcio@...aro.org>,
	Bjorn Andersson <andersson@...nel.org>,
	Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
	Fuad Tabba <tabba@...gle.com>,
	Sean Christopherson <seanjc@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-arm-msm@...r.kernel.org, linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org, devicetree@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org, linux-mm@...ck.org
Subject: Re: Re: [PATCH v17 19/35] arch/mm: Export direct {un,}map functions

On Friday 23 Feb 2024 at 16:37:23 (-0800), Elliot Berman wrote:
> On Thu, Feb 22, 2024 at 11:09:40PM -0800, Christoph Hellwig wrote:
> > On Thu, Feb 22, 2024 at 03:16:42PM -0800, Elliot Berman wrote:
> > > Firmware and hypervisor drivers can donate system heap memory to their
> > > respective firmware/hypervisor entities. Those drivers should unmap the
> > > pages from the kernel's logical map before doing so.
> > > 
> > > Export can_set_direct_map, set_direct_map_invalid_noflush, and
> > > set_direct_map_default_noflush.
> > 
> > Err, not they should not.  And not using such super low-level interfaces
> > from modular code.
> 
> Hi Cristoph,
>  
> We've observed a few times that Linux can unintentionally access a page
> we've unmapped from host's stage 2 page table via an unaligned load from
> an adjacent page. The stage 2 is managed by Gunyah. There are few
> scenarios where even though we allocate and own a page from buddy,
> someone else could try to access the page without going through the
> hypervisor driver. One such instance we know about is
> load_unaligned_zeropad() via pathlookup_at() [1].
>  
> load_unaligned_zeropad() could be called near the end of a page. If the
> next page isn't mapped by the kernel in the stage one page tables, then
> the access from to the unmapped page from load_unaligned_zeropad() will
> land in __do_kernel_fault(), call fixup_exception(), and fill the
> remainder of the load with zeroes. If the page in question is mapped in
> stage 1 but was unmapped from stage 2, then the access lands back in
> Linux in do_sea(), leading to a panic().
>  
> Our preference would be to add fixup_exception() to S2 PTW errors for
> two reasons:
> 1. It's cheaper to do performance wise: we've already manipulated S2
>    page table and prevent intentional access to the page because
>    pKVM/Gunyah drivers know that access to the page has been lost.
> 2. Page-granular S1 mappings only happen on arm64 with rodata=full.
>  
> In an off-list discussion with the Android pkvm folks, their preference
> was to have the pages unmapped from stage 1. I've gone with that
> approach to get started but welcome discussion on the best approach.
>  
> The Android (downstream) implementation of arm64 pkvm is currently
> implementing a hack where s2 ptw faults are given back to the host as s1
> ptw faults (i.e. __do_kernel_fault() gets called and not do_sea()) --
> allowing the kernel to fixup the exception.
>  
> arm64 pKVM will also face this issue when implementing guest_memfd or
> when donating more memory to the hyp for s2 page tables, etc. As far as
> I can tell, this isn't an issue for arm64 pKVM today because memory
> isn't being dynamically donated to the hypervisor.

FWIW pKVM already donates memory dynamically to the hypervisor, to store
e.g. guest VM metadata and page-tables, and we've never seen that
problem as far as I can recall.

A key difference is that pKVM injects a data abort back into the kernel
in case of a stage-2 fault, so the whole EXTABLE trick/hack in
load_unaligned_zeropad() should work fine out of the box.

As discussed offline, Gunyah injecting an SEA into the kernel is
questionable, but I understand that the architecture is a bit lacking in
this department, and that's probably the next best thing.

Could the Gunyah driver allocate from a CMA region instead? That would
surely simplify unmapping from EL1 stage-1 (similar to how drivers
usually donate memory to TZ).

Thanks,
Quentin



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ