[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ee5a226b-0e12-8823-0961-bf2fe0bd494e@gmail.com>
Date: Fri, 19 Nov 2021 22:23:16 +0200
From: Oleksandr <olekstysh@...il.com>
To: Stefano Stabellini <sstabellini@...nel.org>
Cc: xen-devel@...ts.xenproject.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>,
Russell King <linux@...linux.org.uk>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Juergen Gross <jgross@...e.com>, Julien Grall <julien@....org>
Subject: Re: [PATCH V2 4/4] arm/xen: Read extended regions from DT and init
Xen resource
On 19.11.21 03:19, Stefano Stabellini wrote:
Hi Stefano
> On Wed, 10 Nov 2021, Oleksandr wrote:
>> On 28.10.21 04:40, Stefano Stabellini wrote:
>>
>> Hi Stefano
>>
>> I am sorry for the late response.
>>
>>> On Tue, 26 Oct 2021, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>
>>>>
>>>> This patch implements arch_xen_unpopulated_init() on Arm where
>>>> the extended regions (if any) are gathered from DT and inserted
>>>> into passed Xen resource to be used as unused address space
>>>> for Xen scratch pages by unpopulated-alloc code.
>>>>
>>>> The extended region (safe range) is a region of guest physical
>>>> address space which is unused and could be safely used to create
>>>> grant/foreign mappings instead of wasting real RAM pages from
>>>> the domain memory for establishing these mappings.
>>>>
>>>> The extended regions are chosen by the hypervisor at the domain
>>>> creation time and advertised to it via "reg" property under
>>>> hypervisor node in the guest device-tree. As region 0 is reserved
>>>> for grant table space (always present), the indexes for extended
>>>> regions are 1...N.
>>>>
>>>> If arch_xen_unpopulated_init() fails for some reason the default
>>>> behaviour will be restored (allocate xenballooned pages).
>>>>
>>>> This patch also removes XEN_UNPOPULATED_ALLOC dependency on x86.
>>>>
>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>
>>>> ---
>>>> Changes RFC -> V2:
>>>> - new patch, instead of
>>>> "[RFC PATCH 2/2] xen/unpopulated-alloc: Query hypervisor to provide
>>>> unallocated space"
>>>> ---
>>>> arch/arm/xen/enlighten.c | 112
>>>> +++++++++++++++++++++++++++++++++++++++++++++++
>>>> drivers/xen/Kconfig | 2 +-
>>>> 2 files changed, 113 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>>>> index dea46ec..1a1e0d3 100644
>>>> --- a/arch/arm/xen/enlighten.c
>>>> +++ b/arch/arm/xen/enlighten.c
>>>> @@ -62,6 +62,7 @@ static __read_mostly unsigned int xen_events_irq;
>>>> static phys_addr_t xen_grant_frames;
>>>> #define GRANT_TABLE_INDEX 0
>>>> +#define EXT_REGION_INDEX 1
>>>> uint32_t xen_start_flags;
>>>> EXPORT_SYMBOL(xen_start_flags);
>>>> @@ -303,6 +304,117 @@ static void __init xen_acpi_guest_init(void)
>>>> #endif
>>>> }
>>>> +#ifdef CONFIG_XEN_UNPOPULATED_ALLOC
>>>> +int arch_xen_unpopulated_init(struct resource *res)
>>>> +{
>>>> + struct device_node *np;
>>>> + struct resource *regs, *tmp_res;
>>>> + uint64_t min_gpaddr = -1, max_gpaddr = 0;
>>>> + unsigned int i, nr_reg = 0;
>>>> + struct range mhp_range;
>>>> + int rc;
>>>> +
>>>> + if (!xen_domain())
>>>> + return -ENODEV;
>>>> +
>>>> + np = of_find_compatible_node(NULL, NULL, "xen,xen");
>>>> + if (WARN_ON(!np))
>>>> + return -ENODEV;
>>>> +
>>>> + /* Skip region 0 which is reserved for grant table space */
>>>> + while (of_get_address(np, nr_reg + EXT_REGION_INDEX, NULL, NULL))
>>>> + nr_reg++;
>>>> + if (!nr_reg) {
>>>> + pr_err("No extended regions are found\n");
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + regs = kcalloc(nr_reg, sizeof(*regs), GFP_KERNEL);
>>>> + if (!regs)
>>>> + return -ENOMEM;
>>>> +
>>>> + /*
>>>> + * Create resource from extended regions provided by the hypervisor to
>>>> be
>>>> + * used as unused address space for Xen scratch pages.
>>>> + */
>>>> + for (i = 0; i < nr_reg; i++) {
>>>> + rc = of_address_to_resource(np, i + EXT_REGION_INDEX,
>>>> ®s[i]);
>>>> + if (rc)
>>>> + goto err;
>>>> +
>>>> + if (max_gpaddr < regs[i].end)
>>>> + max_gpaddr = regs[i].end;
>>>> + if (min_gpaddr > regs[i].start)
>>>> + min_gpaddr = regs[i].start;
>>>> + }
>>>> +
>>>> + /* Check whether the resource range is within the hotpluggable range
>>>> */
>>>> + mhp_range = mhp_get_pluggable_range(true);
>>>> + if (min_gpaddr < mhp_range.start)
>>>> + min_gpaddr = mhp_range.start;
>>>> + if (max_gpaddr > mhp_range.end)
>>>> + max_gpaddr = mhp_range.end;
>>>> +
>>>> + res->start = min_gpaddr;
>>>> + res->end = max_gpaddr;
>>>> +
>>>> + /*
>>>> + * Mark holes between extended regions as unavailable. The rest of
>>>> that
>>>> + * address space will be available for the allocation.
>>>> + */
>>>> + for (i = 1; i < nr_reg; i++) {
>>>> + resource_size_t start, end;
>>>> +
>>>> + start = regs[i - 1].end + 1;
>>>> + end = regs[i].start - 1;
>>>> +
>>>> + if (start > (end + 1)) {
>>> Should this be:
>>>
>>> if (start >= end)
>>>
>>> ?
>> Yes, we can do this here (since the checks are equivalent) but ...
>>
>>>> + rc = -EINVAL;
>>>> + goto err;
>>>> + }
>>>> +
>>>> + /* There is no hole between regions */
>>>> + if (start == (end + 1))
>>> Also here, shouldn't it be:
>>>
>>> if (start == end)
>>>
>>> ?
>> ... not here.
>>
>> As
>>
>> "(start == (end + 1))" is equal to "(regs[i - 1].end + 1 == regs[i].start)"
>>
>> but
>>
>> "(start == end)" is equal to "(regs[i - 1].end + 1 == regs[i].start - 1)"
>
> OK. But the check:
>
> if (start >= end)
>
> Actually covers both cases so that's the only check we need?
Sorry, I don't entirely understand the question.
Is the question to use only a single check in that loop?
Paste the updated code which I have locally for the convenience.
[snip]
/*
* Mark holes between extended regions as unavailable. The rest of that
* address space will be available for the allocation.
*/
for (i = 1; i < nr_reg; i++) {
resource_size_t start, end;
start = regs[i - 1].end + 1;
end = regs[i].start - 1;
if (start > (end + 1)) {
rc = -EINVAL;
goto err;
}
/* There is no hole between regions */
if (start == (end + 1))
continue;
tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
if (!tmp_res) {
rc = -ENOMEM;
goto err;
}
tmp_res->name = "Unavailable space";
tmp_res->start = start;
tmp_res->end = end;
rc = insert_resource(&xen_resource, tmp_res);
if (rc) {
pr_err("Cannot insert resource %pR (%d)\n", tmp_res, rc);
kfree(tmp_res);
goto err;
}
}
[snip]
1. The first check is to detect an overlap (which is a wrong
configuration, correct?) and bail out if true (for example, regX:
0x81000000...0x82FFFFFF and regY: 0x82000000...0x83FFFFFF).
2. The second check is just to skip current iteration as there is no
space/hole between regions (for example, regX: 0x81000000...0x82FFFFFF
and regY: 0x83000000...0x83FFFFFF).
Therefore I think they should be distinguished.
Yes, both check could be transformed to a single one, but this way the
overlaps will be ignored:
if (start >= (end + 1))
continue;
Or I really missed something?
>
>
>>> I think I am missing again something in termination accounting :-)
>> If I understand correctly, we need to follow "end = start + size - 1" rule, so
>> the "end" is the last address inside a range, but not the "first" address
>> outside of a range))
> yeah
>
>
>>>> + continue;
>>>> +
>>>> + /* Check whether the hole range is within the resource range
>>>> */
>>>> + if (start < res->start || end > res->end) {
>>> By definition I don't think this check is necessary as either condition
>>> is impossible?
>>
>> This is a good question, let me please explain.
>> Not all extended regions provided by the hypervisor can be used here. This is
>> because the addressable physical memory range for which the linear mapping
>> could be created has limits on Arm, and maximum addressable range depends on
>> the VA space size (CONFIG_ARM64_VA_BITS_XXX). So we decided to not filter them
>> in hypervisor as this logic could be quite complex as different OS may have
>> different requirement, etc. This means that we need to make sure that regions
>> are within the hotpluggable range to avoid a failure later on when a region is
>> pre-validated by the memory hotplug path.
>>
>> The following code limits the resource range based on that:
>>
>> + /* Check whether the resource range is within the hotpluggable range */
>> + mhp_range = mhp_get_pluggable_range(true);
>> + if (min_gpaddr < mhp_range.start)
>> + min_gpaddr = mhp_range.start;
>> + if (max_gpaddr > mhp_range.end)
>> + max_gpaddr = mhp_range.end;
>> +
>> + res->start = min_gpaddr;
>> + res->end = max_gpaddr;
>>
>> In current loop (when calculating and inserting holes) we also need to make
>> sure that resulting hole range is within the resource range (and adjust/skip
>> it if not true) as regs[] used for the calculations contains raw regions as
>> they described in DT so not updated. Otherwise insert_resource() down the
>> function will return an error for the conflicting operations. Yes, I could
>> took a different route and update regs[] in advance to adjust/skip
>> non-suitable regions in front, but I decided to do it on the fly in the loop
>> here, I thought doing it in advance would add some overhead/complexity. What
>> do you think?
> I understand now.
>
>
>> So I am afraid this check is necessary here.
>>
>> For example in my environment the extended regions are:
>>
>> (XEN) Extended region 0: 0->0x8000000
>> (XEN) Extended region 1: 0xc000000->0x30000000
>> (XEN) Extended region 2: 0x40000000->0x47e00000
>> (XEN) Extended region 3: 0xd0000000->0xe6000000
>> (XEN) Extended region 4: 0xe7800000->0xec000000
>> (XEN) Extended region 5: 0xf1200000->0xfd000000
>> (XEN) Extended region 6: 0x100000000->0x500000000
>> (XEN) Extended region 7: 0x580000000->0x600000000
>> (XEN) Extended region 8: 0x680000000->0x700000000
>> (XEN) Extended region 9: 0x780000000->0x10000000000
>>
>> *With* the check the holes are:
>>
>> holes [47e00000 - cfffffff]
>> holes [e6000000 - e77fffff]
>> holes [ec000000 - f11fffff]
>> holes [fd000000 - ffffffff]
>> holes [500000000 - 57fffffff]
>> holes [600000000 - 67fffffff]
>> holes [700000000 - 77fffffff]
>>
>> And they seem to look correct, you can see that two possible holes between
>> extended regions 0-1 (8000000-bffffff) and 1-2 (30000000-3fffffff) were
>> skipped as they entirely located below res->start
>> which is 0x40000000 in my case (48-bit VA: 0x40000000 - 0x80003fffffff).
>>
>> *Without* the check these two holes won't be skipped and as the result
>> insert_resource() will fail.
>>
>>
>> **********
>>
>>
>> I have one idea how we can simplify filter logic, we can drop all checks here
>> (including confusing one) in Arm code and update common code a bit:
>>
>> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
>> index 1a1e0d3..ed5b855 100644
>> --- a/arch/arm/xen/enlighten.c
>> +++ b/arch/arm/xen/enlighten.c
>> @@ -311,7 +311,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>> struct resource *regs, *tmp_res;
>> uint64_t min_gpaddr = -1, max_gpaddr = 0;
>> unsigned int i, nr_reg = 0;
>> - struct range mhp_range;
>> int rc;
>>
>> if (!xen_domain())
>> @@ -349,13 +348,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>> min_gpaddr = regs[i].start;
>> }
>>
>> - /* Check whether the resource range is within the hotpluggable range
>> */
>> - mhp_range = mhp_get_pluggable_range(true);
>> - if (min_gpaddr < mhp_range.start)
>> - min_gpaddr = mhp_range.start;
>> - if (max_gpaddr > mhp_range.end)
>> - max_gpaddr = mhp_range.end;
>> -
>> res->start = min_gpaddr;
>> res->end = max_gpaddr;
>>
>> @@ -378,17 +370,6 @@ int arch_xen_unpopulated_init(struct resource *res)
>> if (start == (end + 1))
>> continue;
>>
>> - /* Check whether the hole range is within the resource range
>> */
>> - if (start < res->start || end > res->end) {
>> - if (start < res->start)
>> - start = res->start;
>> - if (end > res->end)
>> - end = res->end;
>> -
>> - if (start >= (end + 1))
>> - continue;
>> - }
>> -
>> tmp_res = kzalloc(sizeof(*tmp_res), GFP_KERNEL);
>> if (!tmp_res) {
>> rc = -ENOMEM;
>> diff --git a/drivers/xen/unpopulated-alloc.c b/drivers/xen/unpopulated-alloc.c
>> index 1f1d8d8..a5d3ebb 100644
>> --- a/drivers/xen/unpopulated-alloc.c
>> +++ b/drivers/xen/unpopulated-alloc.c
>> @@ -39,6 +39,7 @@ static int fill_list(unsigned int nr_pages)
>> void *vaddr;
>> unsigned int i, alloc_pages = round_up(nr_pages, PAGES_PER_SECTION);
>> int ret;
>> + struct range mhp_range;
>>
>> res = kzalloc(sizeof(*res), GFP_KERNEL);
>> if (!res)
>> @@ -47,8 +48,10 @@ static int fill_list(unsigned int nr_pages)
>> res->name = "Xen scratch";
>> res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>>
>> + mhp_range = mhp_get_pluggable_range(true);
>> +
>> ret = allocate_resource(target_resource, res,
>> - alloc_pages * PAGE_SIZE, 0, -1,
>> + alloc_pages * PAGE_SIZE, mhp_range.start,
>> mhp_range.end,
>> PAGES_PER_SECTION * PAGE_SIZE, NULL, NULL);
>> if (ret < 0) {
>> pr_err("Cannot allocate new IOMEM resource\n");
>> (END)
>>
>> I believe, this will work on x86 as arch_get_mappable_range() is not
>> implemented there,
>> and the default option contains exactly what being used currently (0, -1).
>>
>> struct range __weak arch_get_mappable_range(void)
>> {
>> struct range mhp_range = {
>> .start = 0UL,
>> .end = -1ULL,
>> };
>> return mhp_range;
>> }
>>
>> And this is going to be more generic and clear, what do you think?
> Yeah this is much better, good thinking!
Great, thank you. Will do in next version.
--
Regards,
Oleksandr Tyshchenko
Powered by blists - more mailing lists