lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 12 Dec 2019 08:18:46 +0100
From:   Jürgen Groß <jgross@...e.com>
To:     Nicholas Tsirakis <niko.tsirakis@...il.com>,
        boris.ostrovsky@...cle.com
Cc:     xen-devel <xen-devel@...ts.xenproject.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [BUG] Xen-ballooned memory never returned to domain after
 partial-free

On 11.12.19 23:08, Nicholas Tsirakis wrote:
> Hello,
> 
> The issue I'm seeing is that pages of previously-xenballooned memory are getting
> trapped in the balloon on free, specifically when they are free'd in batches
> (i.e. not all at once). The first batch is restored to the domain properly, but
> subsequent frees are not.
> 
> Truthfully I'm not sure if this is a bug or not, but the behavior I'm seeing
> doesn't seem to make sense. Note that this "bug" is in the balloon driver, but
> the behavior is seen when using the gnttab API, which utilizes the balloon in
> the background.
> 
> ------------------------------------------------------------------------------
> 
> This issue is better illustrated as an example, seen below. Note that the file
> in question is drivers/xen/balloon.c:
> 
> Kernel version: 4.19.*, code seems consistent on master as well
> Relevant configs:
>      - CONFIG_MEMORY_HOTPLUG not set
>      - CONFIG_XEN_BALLOON_MEMORY_HOTPLUG not set
> 
> * current_pages = # of pages assigned to domain
> * target_pages = # of pages we want assigned to domain
> * credit = target - current
> 
> Start with current_pages/target_pages = 20 pages
> 
> 1. alloc 5 pages with gnttab_alloc_pages(). current_pages = 15, credit = 5.
> 2. alloc 3 pages with gnttab_alloc_pages(). current_pages = 12, credit = 8.
> 3. some time later, free the last 3 pages with gnttab_free_pages().
> 4. 3 pages go back to balloon and balloon worker is scheduled since credit > 0.
>      * Relevant part of balloon worker shown below:
> 
>      do {
>          ...
> 
>          credit = current_credit();
> 
>          if (credit > 0) {
>              if (balloon_is_inflated())
>                  state = increase_reservation(credit);
>              else
>                  state = reserve_additional_memory();
>          }
> 
>          ...
> 
>      } while (credit && state == BP_DONE);
> 
> 5. credit > 0 and the balloon contains 3 pages, so run increase_reservation. 3
>     pages are restored to domain, correctly. current_pages = 15, credit = 5.
> 6. at this point credit is still > 0, so we loop again.
> 7. this time, the balloon has 0 pages, so we call reserve_additional_memory,
>     seen below. note that CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is disabled, so this
>     funciton is very sparse.
> 
>      static enum bp_state reserve_additional_memory(void)
>      {
>          balloon_stats.target_pages = balloon_stats.current_pages;
>          return BP_ECANCELED;
>      }
> 
> 8. now target = current = 15, which drops our credit down to 0.

And I think this is the problem. We want here:

     balloon_stats.target_pages = balloon_stats.current_pages +
                                  balloon_stats.target_unpopulated;

This should fix it. Thanks for the detailed analysis!

Does the attached patch work for you?

And are you fine with the "Reported-by:" added?


Juergen

View attachment "0001-xen-balloon-fix-ballooned-page-accounting-without-ho.patch" of type "text/x-patch" (1288 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ