Message-ID: <YXvFfRKuD574hulr@mail-itl>
Date:   Fri, 29 Oct 2021 11:57:17 +0200
From:   Marek Marczykowski-Górecki 
        <marmarek@...isiblethingslab.com>
To:     Juergen Gross <jgross@...e.com>
Cc:     xen-devel@...ts.xenproject.org, linux-kernel@...r.kernel.org,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Stefano Stabellini <sstabellini@...nel.org>,
        stable@...r.kernel.org
Subject: Re: [PATCH] xen/balloon: add late_initcall_sync() for initial
 ballooning done

On Fri, Oct 29, 2021 at 06:48:44AM +0200, Juergen Gross wrote:
> On 28.10.21 22:16, Marek Marczykowski-Górecki wrote:
> > On Thu, Oct 28, 2021 at 12:59:52PM +0200, Juergen Gross wrote:
> > > When running as PVH or HVM guest with actual memory < max memory the
> > > hypervisor is using "populate on demand" in order to allow the guest
> > > to balloon down from its maximum memory size. For this to work
> > > correctly the guest must not touch more memory pages than its target
> > > memory size as otherwise the PoD cache will be exhausted and the guest
> > > is crashed as a result of that.
> > > 
> > > In extreme cases ballooning down might not be finished today before
> > > the init process is started, which can consume lots of memory.
> > > 
> > > In order to avoid random boot crashes in such cases, add a late init
> > > call to wait for ballooning down having finished for PVH/HVM guests.
> > > 
> > > Cc: <stable@...r.kernel.org>
> > > Reported-by: Marek Marczykowski-Górecki <marmarek@...isiblethingslab.com>
> > > Signed-off-by: Juergen Gross <jgross@...e.com>
> > 
> > It may happen that initial balloon down fails (state==BP_ECANCELED). In
> > that case, it waits indefinitely. I think it should rather report a
> > failure (and panic? it's similar to OOM before PID 1 starts, so rather
> > hard to recover), instead of hanging.
> 
> Okay, I can add something like that. I'm thinking of issuing a failure
> message in case of credit not having changed for 1 minute and panic()
> after two more minutes. Is this fine?

Isn't it better to get the state from balloon_thread()? If the balloon
fails, it won't really try again (until the 3600s timeout), so waiting in
that state doesn't help. And reporting the failure earlier may be more
user-friendly. Or maybe there is something that could wake up the thread
earlier that I don't see? Hot-plugging more RAM is rather unlikely at
this stage...
See my patch at [1], although it is rather hacky (and likely racy).

[1] https://lore.kernel.org/xen-devel/YXFxKC4shCATB913@mail-itl/
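
To illustrate the idea of failing early instead of waiting on a timer, here
is a minimal userspace sketch. It is not the patch at [1] and not kernel
code. BP_ECANCELED is the state mentioned above; the other state names, the
wait_result type, wait_initial_balloon() and the scripted poll helper are
assumptions made up for this illustration. A real wait would also sleep
between polls, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* State names modelled on the discussion; only BP_ECANCELED is taken
 * from the thread, the rest are assumed for this sketch. */
enum bp_state { BP_DONE, BP_WAIT, BP_EAGAIN, BP_ECANCELED };

/* Hypothetical result type for the waiter. */
enum wait_result { WAIT_OK, WAIT_FAILED, WAIT_TIMED_OUT };

/* Callback that returns the current balloon state. */
typedef enum bp_state (*state_fn)(void *ctx);

/*
 * Poll the balloon state up to max_polls times.
 * Returns as soon as the state is final: BP_DONE means success,
 * BP_ECANCELED means the balloon gave up, so further waiting is pointless.
 */
enum wait_result wait_initial_balloon(state_fn poll, void *ctx,
                                      unsigned int max_polls)
{
    assert(poll != NULL);

    for (unsigned int i = 0; i < max_polls; i++) {
        switch (poll(ctx)) {
        case BP_DONE:
            return WAIT_OK;
        case BP_ECANCELED:
            return WAIT_FAILED;   /* report now instead of hanging */
        default:
            break;                /* BP_WAIT / BP_EAGAIN: keep polling */
        }
    }
    return WAIT_TIMED_OUT;
}

/* Test helper: replays a fixed, non-empty sequence of states and then
 * keeps repeating the last one. */
struct script {
    const enum bp_state *states;
    size_t len;
    size_t pos;
};

enum bp_state script_poll(void *ctx)
{
    struct script *s = ctx;

    if (s->pos < s->len)
        return s->states[s->pos++];
    return s->states[s->len - 1];
}
```

With this shape the caller can tell a cancelled balloon apart from one that
is merely slow, and can print a message or panic right away in the first
case rather than after a fixed delay.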

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
