[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b7de7d9d84e9dd47358a254d36f6a24dd48da963.camel@gmail.com>
Date: Mon, 29 Jul 2019 15:42:43 +1000
From: Rashmica Gupta <rashmica.g@...il.com>
To: David Hildenbrand <david@...hat.com>,
Oscar Salvador <osalvador@...e.de>
Cc: akpm@...ux-foundation.org, mhocko@...e.com,
dan.j.williams@...el.com, pasha.tatashin@...een.com,
Jonathan.Cameron@...wei.com, anshuman.khandual@....com,
vbabka@...e.cz, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/5] Allocate memmap from hotadded memory
On Tue, 2019-07-16 at 14:28 +0200, David Hildenbrand wrote:
> On 02.07.19 08:42, Rashmica Gupta wrote:
> > Hi David,
> >
> > Sorry for the late reply.
>
> Hi,
>
> sorry I was on PTO :)
>
> > On Wed, 2019-06-26 at 10:28 +0200, David Hildenbrand wrote:
> > > On 26.06.19 10:15, Oscar Salvador wrote:
> > > > On Wed, Jun 26, 2019 at 10:11:06AM +0200, David Hildenbrand
> > > > wrote:
> > > > > Back then, I already mentioned that we might have some users
> > > > > that
> > > > > remove_memory() they never added in a granularity it wasn't
> > > > > added. My
> > > > > concerns back then were never fully sorted out.
> > > > >
> > > > > arch/powerpc/platforms/powernv/memtrace.c
> > > > >
> > > > > - Will remove memory in memory block size chunks it never
> > > > > added
> > > > > - What if that memory resides on a DIMM added via
> > > > > MHP_MEMMAP_DEVICE?
> > > > >
> > > > > Will it at least bail out? Or simply break?
> > > > >
> > > > > IOW: I am not yet 100% convinced that MHP_MEMMAP_DEVICE is
> > > > > save
> > > > > to be
> > > > > introduced.
> > > >
> > > > Uhm, I will take a closer look and see if I can clear your
> > > > concerns.
> > > > TBH, I did not try to use
> > > > arch/powerpc/platforms/powernv/memtrace.c
> > > > yet.
> > > >
> > > > I will get back to you once I tried it out.
> > > >
> > >
> > > BTW, I consider the code in
> > > arch/powerpc/platforms/powernv/memtrace.c
> > > very ugly and dangerous.
> >
> > Yes it would be nice to clean this up.
> >
> > > We should never allow to manually
> > > offline/online pages / hack into memory block states.
> > >
> > > What I would want to see here is rather:
> > >
> > > 1. User space offlines the blocks to be used
> > > 2. memtrace installs a hotplug notifier and hinders the blocks it
> > > wants
> > > to use from getting onlined.
> > > 3. memory is not added/removed/onlined/offlined in memtrace code.
> > >
> >
> > I remember looking into doing it a similar way. I can't recall the
> > details but my issue was probably 'how does userspace indicate to
> > the kernel that this memory being offlined should be removed'?
>
> Instead of indicating a "size", indicate the offline memory blocks
> that
> the driver should use. E.g. by memory block id for each node
>
> 0:20-24,1:30-32
>
> Of course, other interfaces might make sense.
>
> You can then start using these memory blocks and hinder them from
> getting onlined (as a safety net) via memory notifiers.
>
> That would at least avoid you having to call
> add_memory/remove_memory/offline_pages/device_online/modifying
> memblock
> states manually.
I see what you're saying and that definitely sounds safer.
We would still need to call remove_memory and add_memory from memtrace
as
just offlining memory doesn't remove it from the linear page tables
(if
it's still in the page tables then hardware can prefetch it and if
hardware tracing is using it then the box checkstops).
>
> (binding the memory block devices to a driver would be nicer, but the
> infrastructure is not really there yet - we have no such drivers in
> place yet)
>
> > I don't know the mm code nor how the notifiers work very well so I
> > can't quite see how the above would work. I'm assuming memtrace
> > would
> > register a hotplug notifier and when memory is offlined from
> > userspace,
> > the callback func in memtrace would be called if the priority was
> > high
> > enough? But how do we know that the memory being offlined is
> > intended
> > for usto touch? Is there a way to offline memory from userspace not
> > using sysfs or have I missed something in the sysfs interface?
>
> The notifier would really only be used to hinder onlining as a safety
> net. User space prepares (offlines) the memory blocks and then tells
> the
> drivers which memory blocks to use.
>
> > On a second read, perhaps you are assuming that memtrace is used
> > after
> > adding new memory at runtime? If so, that is not the case. If not,
> > then
> > would you be able to clarify what I'm not seeing?
>
> The main problem I see is that you are calling
> add_memory/remove_memory() on memory your device driver doesn't own.
> It
> could reside on a DIMM if I am not mistaking (or later on
> paravirtualized memory devices like virtio-mem if I ever get to
> implement them ;) ).
This is just for baremetal/powernv so shouldn't affect virtual memory
devices.
>
> How is it guaranteed that the memory you are allocating does not
> reside
> on a DIMM for example added via add_memory() by the ACPI driver?
Good point. We don't have ACPI on powernv but currently this would try
to remove memory from any online memory node, not just the ones that
are backed by RAM. oops.
Powered by blists - more mailing lists