lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <b588a211-c3d7-40b2-88d4-76af9042feed@amd.com>
Date: Thu, 11 Sep 2025 18:31:28 +1000
From: Alexey Kardashevskiy <aik@....com>
To: "Aneesh Kumar K.V" <aneesh.kumar@...nel.org>,
 Arto Merilainen <amerilainen@...dia.com>
Cc: linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org, lukas@...ner.de,
 Samuel Ortiz <sameo@...osinc.com>, Xu Yilun <yilun.xu@...ux.intel.com>,
 Jason Gunthorpe <jgg@...pe.ca>, Suzuki K Poulose <Suzuki.Poulose@....com>,
 Steven Price <steven.price@....com>,
 Catalin Marinas <catalin.marinas@....com>, Marc Zyngier <maz@...nel.org>,
 Will Deacon <will@...nel.org>, Oliver Upton <oliver.upton@...ux.dev>,
 kvmarm@...ts.linux.dev, linux-coco@...ts.linux.dev
Subject: Re: [RFC PATCH v1 34/38] coco: guest: arm64: Validate mmio range
 found in the interface report



On 11/9/25 15:33, Aneesh Kumar K.V wrote:
> Arto Merilainen <amerilainen@...dia.com> writes:
> 
>> On 31.7.2025 14.39, Arto Merilainen wrote:
>>> On 28.7.2025 16.52, Aneesh Kumar K.V (Arm) wrote:
>>>
>>>> +    for (int i = 0; i < interface_report->mmio_range_count; i++,
>>>> mmio_range++) {
>>>> +
>>>> +        /*FIXME!! units in 4K size*/
>>>> +        range_id = FIELD_GET(TSM_INTF_REPORT_MMIO_RANGE_ID,
>>>> mmio_range->range_attributes);
>>>> +
>>>> +        /* no secure interrupts */
>>>> +        if (msix_tbl_bar != -1 && range_id == msix_tbl_bar) {
>>>> +            pr_info("Skipping misx table\n");
>>>> +            continue;
>>>> +        }
>>>> +
>>>> +        if (msix_pba_bar != -1 && range_id == msix_pba_bar) {
>>>> +            pr_info("Skipping misx pba\n");
>>>> +            continue;
>>>> +        }
>>>> +
>>>
>>>
>>> MSI-X and PBA can be placed to a BAR that has other registers as well.
>>> While the PCIe specification recommends BAR-level isolation for MSI-X
>>> structures, it is not mandated. It is enough to have sufficient
>>> isolation within the BAR. Therefore, skipping the MSI-X and PBA BARs
>>> altogether may leave registers unintentionally mapped via unprotected
>>> IPA when they should have been mapped via protected IPA.
>>>
>>> Instead of skipping the whole BAR, would it make sense to determine
>>> where the MSI-X related regions reside, and skip validation only from
>>> these regions?
>>
>> I re-reviewed my suggestion, and what I proposed here seems wrong.
>> However, I think there is a different more generic problem related to
>> the MSI-X table, PBA and non-TEE ranges.
>>
>> If a BAR is sparse (e.g., it has TEE pages and the MSI-X table, PBA or
>> non-TEE areas), the TDISP interface report may contain multiple ranges
>> with the same range_id (/BAR id). In case a BAR contains some registers
>> in low addresses, the MSI-X table and other registers after the MSI-X
>> table, the interface report is expected to have two ranges for the same
>> BAR with different "first 4k page" and "size" fields.
>>
>> This creates a tricky problem given that RSI_VDEV_VALIDATE_MAPPING
>> requires both the ipa_base and pa_base which should correspond to the
>> same location. In above scenario, the PA of the first range would
>> correspond to the BAR base whereas the second range would correspond to
>> a location residing after the MSI-X table.
>>
>> Assuming that the report contains obfuscated (but linear) physical
>> addresses, it would be possible to create heuristics for this case.
>> However, the fundamental problem is that none of the "first 4k page"
>> fields in the ranges is guaranteed to correspond to the base of any BAR:
>> Consider a case where the MSI-X table is in the beginning of a BAR and
>> it is followed by a single TEE range. If the MSI-X is not locked, the
>> "first 4k page" field will not correspond to the beginning of the BAR.
>> If the realm naiviely reads the ipa_base using pci_resouce_n() and
>> corresponding pa_base from the interface report, the addresses won't
>> match and the validation will fail.
>>
>> It seems that interpreting the interface report cannot be done without
>> knowledge of the device's register layout. Therefore, I don't think the
>> ranges can be validated/remapped automatically without involving the
>> device driver, but there should be APIs for reading the interface
>> report, and for requesting making specific ranges protected.
>>
> 
> But we need to validate the interface report before accepting the device,
> and the device driver is only loaded after the device has been accepted.
> 
> Can we assume that only the MSI-X table and PBA ranges may be missing
> from the interface report, while all other non-secure regions are
> reported as NON-TEE ranges?

In case of the "some registers"+MSIX+"some more registers" BAR#X, I'd rather expect 3 ranges in the report and not just 2, for each subrange, sorted, with/without NonTEE bit set, as the PCIe spec suggests (sort of, I struggle parsing it):

===
When reporting the MMIO range for a TDI, the MMIO ranges must be reported in the logical order in which the TDI MMIO
range is configured such that the first range reported corresponds to first range of pages in the TDI and so on
===

And if the number of pages in the report differs from the BAR size, then we should fail validation. Otherwise it is impossible to tell from the report's MMIO addresses what part of a BAR needs to be validated (==TEE) and if the guest device driver has to know it - then reports are just useless (which is hardly true).


> If so, we could retrieve the MSI-X guest real address details from
> config space and map the beginning of the BAR correctly.
> 
> Dan / Yilun — how is this handled in Intel TDX?
> 
>  From what I can see, the AMD patches appear to encounter the same issue.

I am skipping MSIX BAR because T=1 is tied to C=1 so after such validation, the BAR belongs to the guest and the hw will reject VFIO-PCI (==HV) attempts to write to MSIX BAR. Probably too straight forward though and I can try assigning it to the host (and add Cbit in those PTEs), and see how the PSP handles it (not). Thanks,


> 
> -aneesh

-- 
Alexey


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ