lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <68acf6c9-4ab8-4ed5-bddc-f3fc5313597e@redhat.com>
Date: Wed, 31 Jul 2024 16:36:49 +1000
From: Gavin Shan <gshan@...hat.com>
To: Suzuki K Poulose <suzuki.poulose@....com>,
 Steven Price <steven.price@....com>, kvm@...r.kernel.org,
 kvmarm@...ts.linux.dev
Cc: Catalin Marinas <catalin.marinas@....com>, Marc Zyngier <maz@...nel.org>,
 Will Deacon <will@...nel.org>, James Morse <james.morse@....com>,
 Oliver Upton <oliver.upton@...ux.dev>, Zenghui Yu <yuzenghui@...wei.com>,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
 Joey Gouly <joey.gouly@....com>, Alexandru Elisei
 <alexandru.elisei@....com>, Christoffer Dall <christoffer.dall@....com>,
 Fuad Tabba <tabba@...gle.com>, linux-coco@...ts.linux.dev,
 Ganapatrao Kulkarni <gankulkarni@...amperecomputing.com>
Subject: Re: [PATCH v4 05/15] arm64: Mark all I/O as non-secure shared

Hi Suzuki,

On 7/30/24 8:36 PM, Suzuki K Poulose wrote:
> On 30/07/2024 02:36, Gavin Shan wrote:
>> On 7/1/24 7:54 PM, Steven Price wrote:
>> I'm unable to understand this. Steven, could you please explain a bit how
>> PROT_NS_SHARED is turned to a shared (non-secure) mapping to hardware?
>> According to tf-rmm's implementation in tf-rmm/lib/s2tt/src/s2tt_pvt_defs.h,
>> a shared (non-secure) mapping is is identified by NS bit (bit#55). I find
>> difficulties how the NS bit is correlate with PROT_NS_SHARED. For example,
>> how the NS bit is set based on PROT_NS_SHARED.
> 
> 
> There are two things at play here :
> 
> 1. Stage1 mapping controlled by the Realm (Linux in this case, as above).
> 2. Stage2 mapping controlled by the RMM (with RMI commands from NS Host).
> 
> Also :
> The Realm's IPA space is divided into two halves (decided by the IPA Width of the Realm, not the NSbit #55), protected (Lower half) and
> Unprotected (Upper half). All stage2 mappings of the "Unprotected IPA"
> will have the NS bit (#55) set by the RMM. By design, any MMIO access
> to an unprotected half is sent to the NS Host by RMM and any page
> the Realm wants to share with the Host must be in the Upper half
> of the IPA.
> 
> What we do above is controlling the "Stage1" used by the Linux. i.e,
> for a given VA, we flip the Guest "PA" (in reality IPA) to the
> "Unprotected" alias.
> 
> e.g., DTB describes a UART at address 0x10_0000 to Realm (with an IPA width of 40, like in the normal VM case), emulated by the host. Realm is
> trying to map this I/O address into Stage1 at VA. So we apply the
> BIT(39) as PROT_NS_SHARED while creating the Stage1 mapping.
> 
> ie., VA == stage1 ==> BIT(39) | 0x10_0000 =(IPA)== > 0x80_10_0000
> 
                                                      0x8000_10_0000

> Now, the Stage2 mapping won't be present for this IPA if it is emulated
> and thus an access to "VA" causes a Stage2 Abort to the Host, which the
> RMM allows the host to emulate. Otherwise a shared page would have been
> mapped by the Host (and NS bit set at Stage2 by RMM), allowing the
> data to be shared with the host.
> 

Thank you for the explanation and details. It really helps to understand
how the access fault to the unprotected space (upper half) is routed to NS
host, and then VMM (QEMU) for emulation. If the commit log can be improved
with those information, it will make reader easier to understand the code.

I had the following call trace and it seems the address 0x8000_10_1000 is
converted to 0x10_0000 in [1], based on current code base (branch: cca-full/v3).
At [1], the GPA is masked with kvm_gpa_stolen_bits() so that BIT#39 is removed
in this particular case.

   kvm_vcpu_ioctl(KVM_RUN)                         // non-secured host
   kvm_arch_vcpu_ioctl_run
   kvm_rec_enter
   rmi_rec_enter                                   // -> SMC_RMI_REC_ENTER
     :
   rmm_handler                                     // tf-rmm
   handle_ns_smc
   smc_rec_enter
   rec_run_loop
   run_realm
     :
   el2_vectors
   el2_sync_lel
   realm_exit
     :
   handle_realm_exit
   handle_exception_sync
   handle_data_abort
     :
   handle_rme_exit                                 // non-secured host
   rec_exit_sync_dabt
   kvm_handle_guest_abort                          // -> [1]
   gfn_to_memslot
   io_mem_abort
   kvm_io_bus_write                                // -> run->exit_reason = KVM_EXIT_MMIO

Another question is how the Granule Protection Check (GPC) table is updated so
that the corresponding granule (0x8000_10_1000) to is accessible by NS host? I
mean how the BIT#39 is synchronized to GPC table and translated to the property
"granule is accessible by NS host".
     
Thanks,
Gavin






Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ