Message-ID: <20200216044551-mutt-send-email-mst@kernel.org>
Date: Sun, 16 Feb 2020 04:46:12 -0500
From: "Michael S. Tsirkin" <mst@...hat.com>
To: Tyler Sanderson <tysand@...gle.com>
Cc: David Hildenbrand <david@...hat.com>,
Michal Hocko <mhocko@...nel.org>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, virtualization@...ts.linux-foundation.org,
Wei Wang <wei.w.wang@...el.com>,
Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
David Rientjes <rientjes@...gle.com>,
Nadav Amit <namit@...are.com>
Subject: Re: [PATCH v1 3/3] virtio-balloon: Switch back to OOM handler for
VIRTIO_BALLOON_F_DEFLATE_ON_OOM
On Fri, Feb 14, 2020 at 12:48:42PM -0800, Tyler Sanderson wrote:
> Regarding Wei's patch that modifies the shrinker implementation, versus this
> patch which reverts to OOM notifier:
> I am in favor of both patches. But I do want to make sure a fix gets
> backported to 4.19, where the performance regression was first introduced.
> My concern with reverting to the OOM notifier is, as mst@ put it (in the other
> thread):
> "when linux hits OOM all kind of error paths are being hit, latent bugs start
> triggering, latency goes up drastically."
> The guest could be in a lot of pain before the OOM notifier is invoked, and it
> seems like the shrinker API might allow more fine-grained control of when we
> deflate.
>
> On the other hand, I'm not totally convinced that Wei's patch is an expected
> use of the shrinker/page-cache APIs, and maybe it is fragile. Needs more
> testing and scrutiny.
>
> It seems to me that the shrinker API is the right API in the long run, perhaps
> with some fixes and modifications. But maybe reverting to the OOM notifier is
> the best patch to backport?
In that case, can I see some Tested-by reports, please?
> On Fri, Feb 14, 2020 at 6:19 AM David Hildenbrand <david@...hat.com> wrote:
>
> >> There was a report that this results in undesired side effects when
> >> inflating the balloon to shrink the page cache. [1]
> >> "When inflating the balloon against page cache (i.e. no free memory
> >> remains) vmscan.c will both shrink page cache, but also invoke the
> >> shrinkers -- including the balloon's shrinker. So the balloon
> >> driver allocates memory which requires reclaim, vmscan gets this
> >> memory by shrinking the balloon, and then the driver adds the
> >> memory back to the balloon. Basically a busy no-op."
> >>
> >> The name "deflate on OOM" makes it pretty clear when deflation should
> >> happen - after other approaches to reclaim memory failed, not while
> >> reclaiming. This allows us to minimize the footprint of a guest - memory
> >> will only be taken out of the balloon when really needed.
> >>
> >> In particular, a drop_slab() will result in the whole balloon getting
> >> deflated, which is undesired.
> >
> > Could you explain why in more detail? drop_caches shouldn't really be used
> > in any production workload, and if somebody really wants all the cache to
> > be dropped, then why is the balloon any different?
> >
>
> Deflation should happen when the guest is out of memory, not when
> somebody thinks it's time to reclaim some memory. That's what the
> feature promised from the beginning: Only give the guest more memory in
> case it *really* needs more memory.
>
> Deflate on oom, not deflate on reclaim/memory pressure. (that's what the
> report was all about)
>
> A priority for shrinkers might be a step in the right direction.
>
> --
> Thanks,
>
> David / dhildenb
>
>
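To make the "busy no-op" from the quoted report concrete, here is a toy page-counting model. It is a sketch only, not kernel code: ToyGuest and all of its fields and methods are hypothetical, and it exaggerates vmscan by asking the balloon shrinker before the page cache rather than interleaving them. With deflate_on_reclaim=True (shrinker-style), inflating against a full page cache keeps handing the same page back and forth. With deflate_on_reclaim=False (deflate on OOM), the balloon grows at the page cache's expense and only gives pages back once reclaim has failed.

```python
from dataclasses import dataclass


@dataclass
class ToyGuest:
    """Hypothetical page-counting model of a guest; not kernel code."""
    free: int
    page_cache: int
    balloon: int = 0
    # True models the shrinker approach (balloon deflated during ordinary reclaim);
    # False models deflate-on-OOM (balloon deflated only after reclaim has failed).
    deflate_on_reclaim: bool = True

    def _reclaim(self, wanted: int) -> int:
        """Free up to `wanted` pages and return how many were freed."""
        freed = 0
        if self.deflate_on_reclaim:
            # Simplification: the balloon's shrinker is asked first during reclaim.
            take = min(wanted, self.balloon)
            self.balloon -= take
            freed += take
        # Shrink the page cache for whatever is still missing.
        take = min(wanted - freed, self.page_cache)
        self.page_cache -= take
        freed += take
        self.free += freed
        return freed

    def _oom_deflate(self, wanted: int) -> int:
        """Deflate on OOM: hand balloon pages back only after reclaim failed."""
        take = min(wanted, self.balloon)
        self.balloon -= take
        self.free += take
        return take

    def inflate(self, pages: int) -> None:
        """Try to move `pages` pages into the balloon, one page at a time."""
        for _ in range(pages):
            if self.free == 0 and self._reclaim(1) == 0:
                # In this toy, the balloon's own allocation gives up instead of
                # triggering OOM, so inflation simply stops here.
                return
            self.free -= 1
            self.balloon += 1

    def allocate(self, pages: int) -> bool:
        """A workload allocation (pages not tracked further); False means OOM."""
        for _ in range(pages):
            if self.free == 0 and self._reclaim(1) == 0:
                # Reclaim failed: this is the point where an OOM notifier deflates.
                if self._oom_deflate(1) == 0:
                    return False  # nothing left to give back: a real OOM
            self.free -= 1
        return True


shrinker_style = ToyGuest(free=0, page_cache=100, deflate_on_reclaim=True)
shrinker_style.inflate(50)   # balloon stays at 1 page: the "busy no-op"

oom_style = ToyGuest(free=0, page_cache=100, deflate_on_reclaim=False)
oom_style.inflate(50)        # balloon reaches 50 pages, page cache drops to 50
oom_style.allocate(60)       # page cache is used up first, then 10 balloon pages
```

In this model, a shrinker priority that consulted the balloon only after the page cache could not free enough would behave like the deflate_on_reclaim=False case, which is the direction the quoted mail suggests.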