Message-ID: <f4411830-067e-448e-b43b-1418bb264e11@arm.com>
Date: Fri, 7 Feb 2025 17:04:48 +0000
From: Steven Price <steven.price@....com>
To: Gavin Shan <gshan@...hat.com>, kvm@...r.kernel.org, kvmarm@...ts.linux.dev
Cc: Catalin Marinas <catalin.marinas@....com>, Marc Zyngier <maz@...nel.org>,
 Will Deacon <will@...nel.org>, James Morse <james.morse@....com>,
 Oliver Upton <oliver.upton@...ux.dev>,
 Suzuki K Poulose <suzuki.poulose@....com>, Zenghui Yu
 <yuzenghui@...wei.com>, linux-arm-kernel@...ts.infradead.org,
 linux-kernel@...r.kernel.org, Joey Gouly <joey.gouly@....com>,
 Alexandru Elisei <alexandru.elisei@....com>,
 Christoffer Dall <christoffer.dall@....com>, Fuad Tabba <tabba@...gle.com>,
 linux-coco@...ts.linux.dev,
 Ganapatrao Kulkarni <gankulkarni@...amperecomputing.com>,
 Shanker Donthineni <sdonthineni@...dia.com>, Alper Gun
 <alpergun@...gle.com>, "Aneesh Kumar K . V" <aneesh.kumar@...nel.org>
Subject: Re: [PATCH v6 20/43] arm64: RME: Runtime faulting of memory

On 30/01/2025 05:22, Gavin Shan wrote:
> On 12/13/24 1:55 AM, Steven Price wrote:
[...]
>> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
>> index d4561e368cd5..146ef598a581 100644
>> --- a/arch/arm64/kvm/rme.c
>> +++ b/arch/arm64/kvm/rme.c
>> @@ -602,6 +602,162 @@ static int fold_rtt(struct realm *realm,
>> unsigned long addr, int level)
>>       return 0;
>>   }
>>   +int realm_map_protected(struct realm *realm,
>> +            unsigned long ipa,
>> +            kvm_pfn_t pfn,
>> +            unsigned long map_size,
>> +            struct kvm_mmu_memory_cache *memcache)
>> +{
>> +    phys_addr_t phys = __pfn_to_phys(pfn);
>> +    phys_addr_t rd = virt_to_phys(realm->rd);
>> +    unsigned long base_ipa = ipa;
>> +    unsigned long size;
>> +    int map_level;
>> +    int ret = 0;
>> +
>> +    if (WARN_ON(!IS_ALIGNED(ipa, map_size)))
>> +        return -EINVAL;
>> +
>> +    switch (map_size) {
>> +    case PAGE_SIZE:
>> +        map_level = 3;
>> +        break;
>> +    case RMM_L2_BLOCK_SIZE:
>> +        map_level = 2;
>> +        break;
>> +    default:
>> +        return -EINVAL;
>> +    }
>> +
> 
> The same block of code, which returns the RTT level for a given map
> size, is used multiple times. It would be nice to introduce a helper
> for this.

I have some changes to support larger host (and guest) page sizes which
rework this code and should help clean this up.
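For illustration, such a helper could be factored out of the switch quoted
above, along these lines (a sketch only; the helper name is hypothetical and
the constant values below are stand-ins for the real kernel definitions,
assuming 4K pages and 2MB L2 blocks):

```c
#include <assert.h>

/* Stand-in values for illustration; the kernel's definitions would be used. */
#define EXAMPLE_PAGE_SIZE	4096UL
#define EXAMPLE_L2_BLOCK_SIZE	(512UL * EXAMPLE_PAGE_SIZE)

/*
 * Map a mapping size to the corresponding RTT level, or -1 if the size
 * is not supported (the caller would then return -EINVAL).
 */
static int example_rtt_map_level(unsigned long map_size)
{
	switch (map_size) {
	case EXAMPLE_PAGE_SIZE:
		return 3;	/* page-level mapping */
	case EXAMPLE_L2_BLOCK_SIZE:
		return 2;	/* L2 block mapping */
	default:
		return -1;	/* unsupported mapping size */
	}
}
```

Each caller would then collapse its switch into a single call plus an error
check.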

>> +    if (map_level < RMM_RTT_MAX_LEVEL) {
>> +        /*
>> +         * A temporary RTT is needed during the map, precreate it,
>> +         * however if there is an error (e.g. missing parent tables)
>> +         * this will be handled below.
>> +         */
>> +        realm_create_rtt_levels(realm, ipa, map_level,
>> +                    RMM_RTT_MAX_LEVEL, memcache);
>> +    }
>> +
> 
> This block of code could be dropped. If the RTTs already exist,
> realm_create_rtt_levels() does nothing, but several RMI calls are
> still issued. RMI calls aren't cheap, so this can cause a performance
> loss.

As mentioned in patch 19, this is to prevent the first call to
rmi_data_create_unknown() failing when we know we're going to need to
create the RTTs. So this should generally reduce the number of RMI
calls, not increase it.

>> +    for (size = 0; size < map_size; size += PAGE_SIZE) {
>> +        if (rmi_granule_delegate(phys)) {
>> +            /*
>> +             * It's likely we raced with another VCPU on the same
>> +             * fault. Assume the other VCPU has handled the fault
>> +             * and return to the guest.
>> +             */
>> +            return 0;
>> +        }
> 
> We probably can't bail out immediately when an error is returned from
> rmi_granule_delegate(), because we intend to map a region whose size
> is 'map_size'. So a 'continue' instead of 'return 0' seems correct to
> me.

The logic here is that two (or more) vCPUs have faulted on the same
region. So if we get a failure here it means that we're the second vCPU
to hit the fault. While we could continue, it's highly likely that the
other vCPU will be faulting in the later parts of the region, so we
expect to hit more delegation failures.

Returning to the guest and letting the guest retry the access means:
a) The access might now continue because the other vCPU has completed
(enough of) the mapping, or
b) We fault a second time, but give the other vCPU some more time to
progress in creating the mapping (and prevent the other vCPU being
tripped up by continued attempts by this vCPU to delegate the pages).

"continue" would work, but it's likely to waste CPU time with the two
vCPUs fighting each other to complete the mapping.

>> +
>> +        ret = rmi_data_create_unknown(rd, phys, ipa);
>> +
>> +        if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
>> +            /* Create missing RTTs and retry */
>> +            int level = RMI_RETURN_INDEX(ret);
>> +
>> +            ret = realm_create_rtt_levels(realm, ipa, level,
>> +                              RMM_RTT_MAX_LEVEL,
>> +                              memcache);
>> +            WARN_ON(ret);
>> +            if (ret)
>> +                goto err_undelegate;
> 
>             if (WARN_ON(ret))

ack

>> +
>> +            ret = rmi_data_create_unknown(rd, phys, ipa);
>> +        }
>> +        WARN_ON(ret);
>> +
>> +        if (ret)
>> +            goto err_undelegate;
> 
>         if (WARN_ON(ret))

ack

>> +
>> +        phys += PAGE_SIZE;
>> +        ipa += PAGE_SIZE;
>> +    }
>> +
>> +    if (map_size == RMM_L2_BLOCK_SIZE)
>> +        ret = fold_rtt(realm, base_ipa, map_level);
>> +    if (WARN_ON(ret))
>> +        goto err;
>> +
> 
> A nested if statement is needed here so that the WARN_ON() only takes
> effect on the return value from fold_rtt().
> 
>     if (map_size == RMM_L2_BLOCK_SIZE) {
>         ret = fold_rtt(realm, base_ipa, map_level);
>         if (WARN_ON(ret))
>             goto err;
>     }

Technically, to get here ret must already be 0, so there's no need to
nest like this. But I agree it makes the intent much clearer, so I'll
update.

Thanks,
Steve
