Message-ID: <d1f80c05986b439cbeef12bcd595b264@BLUPR03MB050.namprd03.prod.outlook.com>
Date: Wed, 24 Jul 2013 19:45:12 +0000
From: KY Srinivasan <kys@...rosoft.com>
To: Dave Hansen <dave@...1.net>
CC: Dave Hansen <dave.hansen@...el.com>, Michal Hocko <mhocko@...e.cz>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
"olaf@...fle.de" <olaf@...fle.de>,
"apw@...onical.com" <apw@...onical.com>,
"andi@...stfloor.org" <andi@...stfloor.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"kamezawa.hiroyuki@...il.com" <kamezawa.hiroyuki@...il.com>,
"hannes@...xchg.org" <hannes@...xchg.org>,
"yinghan@...gle.com" <yinghan@...gle.com>,
"jasowang@...hat.com" <jasowang@...hat.com>,
"kay@...y.org" <kay@...y.org>
Subject: RE: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining
memory blocks
> -----Original Message-----
> From: Dave Hansen [mailto:dave@...1.net]
> Sent: Wednesday, July 24, 2013 12:43 PM
> To: KY Srinivasan
> Cc: Dave Hansen; Michal Hocko; gregkh@...uxfoundation.org; linux-
> kernel@...r.kernel.org; devel@...uxdriverproject.org; olaf@...fle.de;
> apw@...onical.com; andi@...stfloor.org; akpm@...ux-foundation.org; linux-
> mm@...ck.org; kamezawa.hiroyuki@...il.com; hannes@...xchg.org;
> yinghan@...gle.com; jasowang@...hat.com; kay@...y.org
> Subject: Re: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining
> memory blocks
>
> On 07/23/2013 10:21 AM, KY Srinivasan wrote:
> >> You have allocated some large, physically contiguous areas of memory
> >> under heavy pressure. But you also contend that there is too much
> >> memory pressure to run a small userspace helper. Under heavy memory
> >> pressure, I'd expect large, kernel allocations to fail much more often
> >> than running a small userspace helper.
> >
> > I am only reporting what I am seeing. Broadly, I have two main failure
> > conditions to deal with: (a) resource related failure (add_memory()
> > returning -ENOMEM) and (b) not being able to online a segment that has
> > been successfully hot-added. I have seen both these failures under high
> > memory pressure. By supporting "in context" onlining, we can eliminate
> > one failure case. Our inability to online is not a recoverable failure
> > from the host's point of view - the memory is committed to the guest
> > (since hot add succeeded) but is not usable since it is not onlined.
>
> Could you please precisely report on what you are seeing in detail?
> Where are the -ENOMEMs coming from? Which allocation site? Are you
> seeing OOMs or page allocation failure messages on the console?
The -ENOMEM failure I see comes from the hot-add call itself, that is, the
call to add_memory(). Usually I don't see any OOM messages on the console.
>
> The operation was split up in to two parts for good reason. It's
> actually for your _precise_ use case.
I agree; without this split, I could not have implemented the balloon driver
with hot-add.
>
> A system under memory pressure is going to have troubles doing a
> hot-add. You need memory to add memory. Of the two operations ("add"
> and "online"), "add" is the one vastly more likely to fail. It has to
> allocate several large swaths of contiguous physical memory. For that
> reason, the system was designed so that you could "add" and "online"
> separately. The intention was that you could "add" far in advance and
> then "online" under memory pressure, with the "online" having *VASTLY*
> smaller memory requirements and being much more likely to succeed.
>
> You're lumping the "allocate several large swaths of contiguous physical
> memory" failures in to the same class as "run a small userspace helper".
> They are _really_ different problems. Both prone to allocation
> failures for sure, but _very_ separate problems. Please don't conflate
> them.
I don't think I am conflating these two issues; I am sorry if I gave that
impression. All I am saying is that I see two classes of failures: (a) our
inability to allocate memory to manage the memory that is being hot added,
and (b) our inability to bring the hot-added memory online within a reasonable
amount of time. I am not sure of the cause of (b); I was only speculating that
it could be memory related. What is interesting is that I have seen failures to
online the memory even after the hot add itself succeeded.
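
To make failure class (b) concrete, here is a minimal userspace sketch of
"did the block come online within a reasonable amount of time". It is an
illustration only, not the balloon driver's code. It assumes the standard
sysfs layout, where each memory block exposes a "state" file that reads
"online" or "offline"; the function name, the timeout, and the poll interval
are hypothetical choices.

```python
import time
from pathlib import Path


def wait_until_online(state_file, timeout_s=5.0, poll_s=0.1):
    """Return True if state_file reads 'online' before timeout_s elapses.

    state_file is assumed to be a path such as
    /sys/devices/system/memory/memoryN/state.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if Path(state_file).read_text().strip() == "online":
                return True
        except OSError:
            # The block may not be visible yet, or the read failed;
            # keep polling until the deadline.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A False return here corresponds to case (b): the hot add succeeded, but
nothing onlined the block in time, and the caller cannot tell why.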
>
> >> It _sounds_ like you really want to be able to have the host retry the
> >> operation if it fails, and you return success/failure from inside the
> >> kernel. It's hard for you to tell if running the userspace helper
> >> failed, so your solution is to move what was previously done in
> >> userspace in to the kernel so that you can more easily tell if it failed
> >> or succeeded.
> >>
> >> Is that right?
> >
> > No; I am able to get the proper error code for recoverable failures
> > (hot add failures because of lack of memory). By doing what I am
> > proposing here, we can avoid one class of failures completely and I
> > think this is what resulted in a better "hot add" experience in the
> > guest.
>
> I think you're taking a huge leap here: "We could not online memory,
> thus we must take userspace out of the loop."
>
> You might be right. There might be only one way out of this situation.
> But you need to provide a little more supporting evidence before we all
> arrive at the same conclusion.
I am not even suggesting that. All I am saying is that, in addition to the
existing sysfs mechanism, there should be a mechanism for "in context"
onlining of memory from a kernel context. The Hyper-V balloon driver can
certainly use this functionality. I should be sending out the patches for
this shortly.
>
> BTW, it doesn't _require_ udev. There could easily be another listener
> for hotplug events.
Agreed; but structurally it is identical to having a udev rule.
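
For reference, the userspace side of the sysfs mechanism can be sketched as
below, whether it is driven by a udev rule or by some other hotplug listener.
This is a rough illustration under stated assumptions, not an existing tool:
the function name is hypothetical, and it assumes the usual layout of
memoryN directories under /sys/devices/system/memory, each with a "state"
file that accepts the string "online".

```python
from pathlib import Path


def online_offline_blocks(sysfs_root="/sys/devices/system/memory"):
    """Write 'online' to every memory block whose state reads 'offline'.

    Returns (onlined, failed): two lists of block directory names.
    """
    onlined, failed = [], []
    for block in sorted(Path(sysfs_root).glob("memory[0-9]*")):
        state = block / "state"
        try:
            if state.read_text().strip() != "offline":
                continue  # already online, or in some other state
            state.write_text("online")
            onlined.append(block.name)
        except OSError:
            # The kernel rejected the request or the file is unreadable.
            failed.append(block.name)
    return onlined, failed
```

Either way, the request to online the block originates outside the context
that performed the hot add, which is the structural point above.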
Regards,
K. Y