Date:	Wed, 24 Jul 2013 19:45:12 +0000
From:	KY Srinivasan <kys@...rosoft.com>
To:	Dave Hansen <dave@...1.net>
CC:	Dave Hansen <dave.hansen@...el.com>, Michal Hocko <mhocko@...e.cz>,
	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
	"olaf@...fle.de" <olaf@...fle.de>,
	"apw@...onical.com" <apw@...onical.com>,
	"andi@...stfloor.org" <andi@...stfloor.org>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"kamezawa.hiroyuki@...il.com" <kamezawa.hiroyuki@...il.com>,
	"hannes@...xchg.org" <hannes@...xchg.org>,
	"yinghan@...gle.com" <yinghan@...gle.com>,
	"jasowang@...hat.com" <jasowang@...hat.com>,
	"kay@...y.org" <kay@...y.org>
Subject: RE: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining
 memory blocks



> -----Original Message-----
> From: Dave Hansen [mailto:dave@...1.net]
> Sent: Wednesday, July 24, 2013 12:43 PM
> To: KY Srinivasan
> Cc: Dave Hansen; Michal Hocko; gregkh@...uxfoundation.org; linux-
> kernel@...r.kernel.org; devel@...uxdriverproject.org; olaf@...fle.de;
> apw@...onical.com; andi@...stfloor.org; akpm@...ux-foundation.org; linux-
> mm@...ck.org; kamezawa.hiroyuki@...il.com; hannes@...xchg.org;
> yinghan@...gle.com; jasowang@...hat.com; kay@...y.org
> Subject: Re: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining
> memory blocks
> 
> On 07/23/2013 10:21 AM, KY Srinivasan wrote:
> >> You have allocated some large, physically contiguous areas of memory
> >> under heavy pressure.  But you also contend that there is too much
> >> memory pressure to run a small userspace helper.  Under heavy memory
> >> pressure, I'd expect large, kernel allocations to fail much more often
> >> than running a small userspace helper.
> >
> > I am only reporting what I am seeing. Broadly, I have two main failure
> conditions to
> > deal with: (a) resource related failure (add_memory() returning -ENOMEM)
> and (b) not being
> > able to online a segment that has been successfully hot-added. I have seen
> both these failures
> > under high memory pressure. By supporting "in context" onlining, we can
> eliminate one failure
> > case. Our inability to online is not a recoverable failure from the host's point of
> view - the memory
> > is committed to the guest (since hot add succeeded) but is not usable since it is
> not onlined.
> 
> Could you please precisely report on what you are seeing in detail?
> Where are the -ENOMEMs coming from?  Which allocation site?  Are you
> seeing OOMs or page allocation failure messages on the console?

The ENOMEM failure I see comes from the call to hot add memory, i.e. from
add_memory() itself. Usually I don't see any OOM messages on the console.

> 
> The operation was split up in to two parts for good reason.  It's
> actually for your _precise_ use case.

I agree; without this split, I could not have implemented the balloon driver
with hot add support.
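
The split being discussed is visible from userspace as two separate sysfs
steps; a rough sketch follows (the probe interface exists only on kernels
built with CONFIG_ARCH_MEMORY_PROBE, and the address and block number are
illustrative, not taken from this thread):

```shell
# Phase 1, "add": normally the ACPI or Hyper-V driver calls add_memory()
# inside the kernel; with CONFIG_ARCH_MEMORY_PROBE the same step can be
# driven by hand for a given physical address (illustrative address):
echo 0x100000000 > /sys/devices/system/memory/probe

# Phase 2, "online": each new memoryN block starts offline and must be
# brought online separately (memory32 is an illustrative block name):
cat /sys/devices/system/memory/memory32/state
echo online > /sys/devices/system/memory/memory32/state
```

These commands require root and real hot-pluggable memory, which is why the
second phase is normally triggered automatically from a hotplug listener
such as udev.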

> 
> A system under memory pressure is going to have troubles doing a
> hot-add.  You need memory to add memory.  Of the two operations ("add"
> and "online"), "add" is the one vastly more likely to fail.  It has to
> allocate several large swaths of contiguous physical memory.  For that
> reason, the system was designed so that you could "add" and "online"
> separately.  The intention was that you could "add" far in advance and
> then "online" under memory pressure, with the "online" having *VASTLY*
> smaller memory requirements and being much more likely to succeed.
> 
> You're lumping the "allocate several large swaths of contiguous physical
> memory" failures in to the same class as "run a small userspace helper".
>  They are _really_ different problems.  Both prone to allocation
> failures for sure, but _very_ separate problems.  Please don't conflate
> them.

I don't think I am conflating these two issues; I am sorry if I gave that
impression. All I am saying is that I see two classes of failures: (a) our
inability to allocate memory to manage the memory that is being hot added,
and (b) our inability to bring the hot-added memory online within a
reasonable amount of time. I am not sure of the cause of (b); I was only
speculating that it could be memory related. What is interesting is that I
have seen failures to online the memory even after the hot add itself
succeeded.
 
> 
> >> It _sounds_ like you really want to be able to have the host retry the
> >> operation if it fails, and you return success/failure from inside the
> >> kernel.  It's hard for you to tell if running the userspace helper
> >> failed, so your solution is to move what was previously done in
> >> userspace in to the kernel so that you can more easily tell if it failed
> >> or succeeded.
> >>
> >> Is that right?
> >
> > No; I am able to get the proper error code for recoverable failures (hot add
> failures
> > because of lack of memory). By doing what I am proposing here, we can avoid
> one class
> > of failures completely and I think this is what resulted in a better "hot add"
> experience in the
> > guest.
> 
> I think you're taking a huge leap here: "We could not online memory,
> thus we must take userspace out of the loop."
> 
> You might be right.  There might be only one way out of this situation.
>  But you need to provide a little more supporting evidence before we all
> arrive at the same conclusion.

I am not even suggesting that. All I am saying is that there should be a
mechanism for "in context" onlining of memory, in addition to the existing
sysfs mechanism, so that memory can be brought online from a kernel context.
The Hyper-V balloon driver can certainly use this functionality. I should be
sending out the patches for this shortly.
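
As a sketch only (this is not the actual Hyper-V patch, and the onlining
helper's name is illustrative of the 3.x memory-hotplug API rather than a
real exported symbol), "in context" onlining in a driver could look like:

```c
/*
 * Sketch: hot add followed by onlining from kernel context, assuming
 * the onlining path is exported to modules as the patch in the Subject
 * proposes.  Not compilable standalone; names are illustrative.
 */
static int hv_hot_add_and_online(int nid, u64 start, u64 size)
{
	int ret;

	ret = add_memory(nid, start, size);	/* phase 1: may return -ENOMEM */
	if (ret)
		return ret;			/* recoverable: host can retry */

	/*
	 * Phase 2, done here in kernel context instead of waiting for
	 * udev to write "online" to .../memoryN/state:
	 */
	return online_memory_blocks(start, size);	/* illustrative name */
}
```

The point of the sketch is only that a failure in phase 2 becomes directly
visible to the driver, rather than being inferred from a userspace helper
that may or may not have run.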
> 
> BTW, it doesn't _require_ udev.  There could easily be another listener
> for hotplug events.

Agreed; but structurally it is identical to having a udev rule.
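
For comparison, the conventional udev approach being referred to is a
one-line rule along these lines (a config fragment; the exact rules file
location under /etc/udev/rules.d/ varies by distribution):

```
SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"
```

Structurally this is the same loop: a hotplug event fires, a userspace
listener writes "online" to the new block's state attribute, and the kernel
performs the onlining.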

Regards,

K. Y



