[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZeXIWBLVWzVycm0r@google.com>
Date: Mon, 4 Mar 2024 13:10:48 +0000
From: Quentin Perret <qperret@...gle.com>
To: Christoph Hellwig <hch@...radead.org>, Will Deacon <will@...nel.org>,
Chris Goldsworthy <quic_cgoldswo@...cinc.com>,
Android KVM <android-kvm@...gle.com>,
Patrick Daly <quic_pdaly@...cinc.com>,
Alex Elder <elder@...aro.org>,
Srinivas Kandagatla <srinivas.kandagatla@...aro.org>,
Murali Nalajal <quic_mnalajal@...cinc.com>,
Trilok Soni <quic_tsoni@...cinc.com>,
Srivatsa Vaddagiri <quic_svaddagi@...cinc.com>,
Carl van Schaik <quic_cvanscha@...cinc.com>,
Philip Derrin <quic_pderrin@...cinc.com>,
Prakruthi Deepak Heragu <quic_pheragu@...cinc.com>,
Jonathan Corbet <corbet@....net>, Rob Herring <robh+dt@...nel.org>,
Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>,
Conor Dooley <conor+dt@...nel.org>,
Catalin Marinas <catalin.marinas@....com>,
Konrad Dybcio <konrad.dybcio@...aro.org>,
Bjorn Andersson <andersson@...nel.org>,
Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
Fuad Tabba <tabba@...gle.com>,
Sean Christopherson <seanjc@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-arm-msm@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, devicetree@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-mm@...ck.org
Subject: Re: Re: [PATCH v17 19/35] arch/mm: Export direct {un,}map functions
On Friday 23 Feb 2024 at 16:37:23 (-0800), Elliot Berman wrote:
> On Thu, Feb 22, 2024 at 11:09:40PM -0800, Christoph Hellwig wrote:
> > On Thu, Feb 22, 2024 at 03:16:42PM -0800, Elliot Berman wrote:
> > > Firmware and hypervisor drivers can donate system heap memory to their
> > > respective firmware/hypervisor entities. Those drivers should unmap the
> > > pages from the kernel's logical map before doing so.
> > >
> > > Export can_set_direct_map, set_direct_map_invalid_noflush, and
> > > set_direct_map_default_noflush.
> >
> > Err, not they should not. And not using such super low-level interfaces
> > from modular code.
>
> Hi Cristoph,
>
> We've observed a few times that Linux can unintentionally access a page
> we've unmapped from host's stage 2 page table via an unaligned load from
> an adjacent page. The stage 2 is managed by Gunyah. There are few
> scenarios where even though we allocate and own a page from buddy,
> someone else could try to access the page without going through the
> hypervisor driver. One such instance we know about is
> load_unaligned_zeropad() via pathlookup_at() [1].
>
> load_unaligned_zeropad() could be called near the end of a page. If the
> next page isn't mapped by the kernel in the stage one page tables, then
> the access from to the unmapped page from load_unaligned_zeropad() will
> land in __do_kernel_fault(), call fixup_exception(), and fill the
> remainder of the load with zeroes. If the page in question is mapped in
> stage 1 but was unmapped from stage 2, then the access lands back in
> Linux in do_sea(), leading to a panic().
>
> Our preference would be to add fixup_exception() to S2 PTW errors for
> two reasons:
> 1. It's cheaper to do performance wise: we've already manipulated S2
> page table and prevent intentional access to the page because
> pKVM/Gunyah drivers know that access to the page has been lost.
> 2. Page-granular S1 mappings only happen on arm64 with rodata=full.
>
> In an off-list discussion with the Android pkvm folks, their preference
> was to have the pages unmapped from stage 1. I've gone with that
> approach to get started but welcome discussion on the best approach.
>
> The Android (downstream) implementation of arm64 pkvm is currently
> implementing a hack where s2 ptw faults are given back to the host as s1
> ptw faults (i.e. __do_kernel_fault() gets called and not do_sea()) --
> allowing the kernel to fixup the exception.
>
> arm64 pKVM will also face this issue when implementing guest_memfd or
> when donating more memory to the hyp for s2 page tables, etc. As far as
> I can tell, this isn't an issue for arm64 pKVM today because memory
> isn't being dynamically donated to the hypervisor.
FWIW pKVM already donates memory dynamically to the hypervisor, to store
e.g. guest VM metadata and page-tables, and we've never seen that
problem as far as I can recall.
A key difference is that pKVM injects a data abort back into the kernel
in case of a stage-2 fault, so the whole EXTABLE trick/hack in
load_unaligned_zeropad() should work fine out of the box.
As discussed offline, Gunyah injecting an SEA into the kernel is
questionable, but I understand that the architecture is a bit lacking in
this department, and that's probably the next best thing.
Could the Gunyah driver allocate from a CMA region instead? That would
surely simplify unmapping from EL1 stage-1 (similar to how drivers
usually donate memory to TZ).
Thanks,
Quentin
Powered by blists - more mailing lists