Message-ID: <79d2fe19-f2fa-7df6-3833-81bd0b171fa1@suse.com>
Date: Mon, 12 Jun 2023 08:45:22 +0200
From: Juergen Gross <jgross@...e.com>
To: Demi Marie Obenour <demi@...isiblethingslab.com>,
Stefano Stabellini <sstabellini@...nel.org>,
Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>,
Jan Beulich <JBeulich@...e.com>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
Cc: Xen developer discussion <xen-devel@...ts.xenproject.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
stable@...r.kernel.org
Subject: Re: [PATCH] xen: speed up grant-table reclaim
On 12.06.23 08:27, Juergen Gross wrote:
> On 10.06.23 17:32, Demi Marie Obenour wrote:
>> When a grant entry is still in use by the remote domain, Linux must put
>> it on a deferred list.
>
> This lacks quite some context.
>
> The main problem is that the grant is not unmapped at the end of a request,
> while the side granting the access assumes that it will be.
>
> In general this means that the two sides implementing the protocol don't
> agree on how it should work, or that the protocol itself has a flaw.
>
>> Normally, this list is very short, because
>> the PV network and block protocols expect the backend to unmap the grant
>> first.
>
> Normally the list is just empty. Only in very rare cases, like premature
> PV frontend module unloading, is it expected to see deferred grant
> reclaims.
>
>> However, Qubes OS's GUI protocol is subject to the constraints
>> of the X Window System, and as such winds up with the frontend unmapping
>> the window first. As a result, the list can grow very large, resulting
>> in a massive memory leak and eventual VM freeze.
>
> I do understand that it is difficult to change the protocol and/or
> behavior after the fact, or that performance considerations stand in the
> way of doing so.
>
>> To partially solve this problem, make the number of entries that the VM
>> will attempt to free at each iteration tunable. The default is still
>> 10, but it can be overridden at compile-time (via Kconfig), boot-time
>> (via a kernel command-line option), or runtime (via sysfs).
>
> Is there really a need to have another Kconfig option for this? AFAICS
> only QubesOS is affected by the problem you are trying to solve. I don't
> see why you can't use the command-line option or the sysfs node to set a
> higher reclaim batch size.
>
>>
>> Fixes: 569ca5b3f94c ("xen/gnttab: add deferred freeing logic")
>
> I don't think this "Fixes:" tag is appropriate. The mentioned commit didn't
> have a bug. You are adding new functionality on top of it.
>
>> Cc: stable@...r.kernel.org
>> Signed-off-by: Demi Marie Obenour <demi@...isiblethingslab.com>
>> ---
>> drivers/xen/Kconfig | 12 ++++++++++++
>> drivers/xen/grant-table.c | 40 ++++++++++++++++++++++++++++-----------
>> 2 files changed, 41 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
>> index d5d7c402b65112b8592ba10bd3fd1732c26b771e..8f96e1359eb102d6420775b66e7805004a4ce9fe 100644
>> --- a/drivers/xen/Kconfig
>> +++ b/drivers/xen/Kconfig
>> @@ -65,6 +65,18 @@ config XEN_MEMORY_HOTPLUG_LIMIT
>> This value is used to allocate enough space in internal
>> tables needed for physical memory administration.
>> +config XEN_GRANTS_RECLAIM_PER_ITERATION
>> + int "Default number of grant entries to reclaim per iteration"
>> + default 10
>> + range 10 4294967295
>> + help
>> + This sets the default value for the grant_table.free_per_iteration
>> + kernel command line option, which sets the number of grants that
>> + Linux will try to reclaim at once. The default is 10, but
>> + workloads that make heavy use of gntalloc will likely want to
>> + increase this. The current value can be accessed and/or modified
>> + via /sys/module/grant_table/parameters/free_per_iteration.
>> +
>
> Apart from the fact that I don't like adding a new Kconfig option, the help
> text doesn't tell the whole story. Heavy use of gntalloc isn't to blame, but
> rather the fact that your PV-device implementation relies on the reclaim
> functionality. TBH, someone not familiar with grant reclaim will have no
> chance of selecting a sensible value based on your help text.
>
>> config XEN_SCRUB_PAGES_DEFAULT
>> bool "Scrub pages before returning them to system by default"
>> depends on XEN_BALLOON
>> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
>> index e1ec725c2819d4d5dede063eb00d86a6d52944c0..fa666aa6abc3e786dddc94f895641505ec0b23d8 100644
>> --- a/drivers/xen/grant-table.c
>> +++ b/drivers/xen/grant-table.c
>> @@ -498,14 +498,20 @@ static LIST_HEAD(deferred_list);
>> static void gnttab_handle_deferred(struct timer_list *);
>> static DEFINE_TIMER(deferred_timer, gnttab_handle_deferred);
>> +static atomic64_t deferred_count;
>> +static atomic64_t leaked_count;
>> +static unsigned int free_per_iteration = CONFIG_XEN_GRANTS_RECLAIM_PER_ITERATION;
>> +
>> static void gnttab_handle_deferred(struct timer_list *unused)
>> {
>> - unsigned int nr = 10;
>> + unsigned int nr = READ_ONCE(free_per_iteration);
Two additional remarks:
I wouldn't be opposed to raising the default number of reclaims per
iteration to a higher value, e.g. 100.
Another option would be to move the reclaim handling into a workqueue and
try to reclaim as many grants as possible in one go, with a cond_resched()
call every e.g. 100 reclaims. This would remove the need for a reclaim
upper bound.
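Very roughly, and only to illustrate the idea (untested, the reclaim helper
is made up, locking is omitted; this would sit next to the existing
deferred_list in grant-table.c):

static void gnttab_reclaim_work_fn(struct work_struct *work);
static DECLARE_DELAYED_WORK(reclaim_work, gnttab_reclaim_work_fn);

static void gnttab_reclaim_work_fn(struct work_struct *work)
{
        unsigned int done = 0;

        /* Reclaim whatever can be reclaimed right now, no upper bound. */
        while (gnttab_try_reclaim_one()) {      /* hypothetical helper */
                if (++done % 100 == 0)
                        cond_resched();         /* fine in workqueue context */
        }

        /* Anything still mapped by the other end: try again later. */
        if (!list_empty(&deferred_list))
                schedule_delayed_work(&reclaim_work, HZ);
}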
Juergen