Message-ID: <897363324.7325313.1482778965996.JavaMail.zimbra@redhat.com>
Date: Mon, 26 Dec 2016 14:02:46 -0500 (EST)
From: Jerome Glisse <jglisse@...hat.com>
To: Anshuman Khandual <khandual@...ux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@...el.com>, akpm@...ux-foundation.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
John Hubbard <jhubbard@...dia.com>,
Dan Williams <dan.j.williams@...el.com>,
Ross Zwisler <ross.zwisler@...ux.intel.com>
Subject: Re: [HMM v14 05/16] mm/ZONE_DEVICE/unaddressable: add support for
un-addressable device memory
> On 12/09/2016 02:07 AM, Jerome Glisse wrote:
> > > On 12/08/2016 08:39 AM, Jerome Glisse wrote:
> > > > > On 12/08/2016 08:39 AM, Jérôme Glisse wrote:
> > > > > > Architectures that wish to support un-addressable device memory
> > > > > > should make sure to never populate the kernel linear mapping for
> > > > > > the physical range.
> > > > >
> > > > > Does the platform somehow provide a range of physical addresses
> > > > > for this unaddressable area? How do we know no memory will be
> > > > > hot-added in a range we're using for unaddressable device memory,
> > > > > for instance?
> > > >
> > > > That's one of the big issues. No, the platform does not reserve any
> > > > range, so there is a possibility that some memory gets hotplugged
> > > > and assigned this range.
> > > >
> > > > I pushed the range decision to a higher level (ie it is the device
> > > > driver that picks one), so right now a device driver using HMM (the
> > > > NVidia closed driver, as we don't have nouveau ready for that yet)
> > > > starts from the highest physical address and scans down until it
> > > > finds an empty range big enough.
> > >
> > > I don't think you should be stealing physical address space for things
> > > that don't and can't have physical addresses. Delegating this to
> > > individual device drivers and hoping that they all get it right seems
> > > like a recipe for disaster.
> >
> > Well i expected device drivers to use hmm_devmem_add(), which does not
> > take a physical address but uses the above logic to pick one.
> >
> > >
> > > Maybe worth adding to the changelog:
> > >
> > >         This feature potentially breaks memory hotplug unless every
> > >         driver using it magically predicts the future addresses of
> > >         where memory will be hotplugged.
> >
> > I will add a debug printk to memory hotplug for the case where it fails
> > because of some un-addressable resource. If you really dislike memory
> > hotplug being broken then i can go down the route of allowing memory to
> > be hotplugged above the max physical memory limit. This requires more
> > changes but i believe it is doable for some of the memory models
> > (sparsemem and sparsemem extreme).
>
> Did not get that. Hotplug memory requests will come within the max
> physical memory limit as they are real RAM. The address range also would
> have been specified. How can it be added beyond the physical limit,
> irrespective of which memory model we use?
>

Maybe what you do not know is that on x86 we do not have a resource
reserved by the platform for the device memory (the PCIE BAR never covers
the whole of the device memory, so that range can not be used).

Right now i pick a random unused physical address range for the device
memory, and thus real memory might later be hotplugged right inside the
range i took; that hotplug will fail because i already registered a
resource for my device memory. This is an x86 platform limitation.
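
To make that more concrete, the range picking i describe is roughly the
following (a simplified sketch of the idea, not the code from the series;
dev_mem_find_range() is a made-up name, region_intersects() is the existing
helper from kernel/resource.c):

#include <linux/ioport.h>	/* region_intersects(), IORESOURCE_SYSTEM_RAM */
#include <linux/mmzone.h>	/* MAX_PHYSMEM_BITS */

/*
 * Walk down from the top of the architecturally addressable physical
 * space and return the first range of the requested size that does not
 * intersect System RAM. Real RAM can later be hotplugged into that same
 * hole, which is exactly the conflict discussed above.
 */
static resource_size_t dev_mem_find_range(resource_size_t size)
{
	resource_size_t addr;

	for (addr = (1ULL << MAX_PHYSMEM_BITS) - size; addr >= size; addr -= size) {
		if (region_intersects(addr, size, IORESOURCE_SYSTEM_RAM,
				      IORES_DESC_NONE) == REGION_DISJOINT)
			return addr;
	}
	return 0;
}

The range that comes back is then claimed as a device memory resource,
which is why a later hotplug of real RAM in that window gets refused.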

Now if i bump the maximum physical memory by one bit then i can hotplug
device memory inside that extra bit of range and be sure that i will never
have any conflict with real memory (as i am above the architectural limit).

Allowing the maximum physical memory to be bumped has implications, and i
can not just bump MAX_PHYSMEM_BITS as that would have repercussions i don't
want. Still, with some memory models i can allow hotplug to happen above
MAX_PHYSMEM_BITS without changing MAX_PHYSMEM_BITS, and allow
page_to_pfn() and pfn_to_page() to keep working above MAX_PHYSMEM_BITS,
again without changing it.
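
As a concrete example with x86-64 SPARSEMEM numbers (MAX_PHYSMEM_BITS is 46
there, so real RAM always lives below 1ULL << 46), the device range would
sit in the extra bit right above the limit; the macro names below are made
up just to illustrate the idea:

/* first address above the architectural limit, never used by real RAM */
#define DEVMEM_PHYS_BASE	(1ULL << MAX_PHYSMEM_BITS)
/* one extra bit worth of physical space, reserved for device memory */
#define DEVMEM_PHYS_END		(1ULL << (MAX_PHYSMEM_BITS + 1))

Anything hotplugged in [DEVMEM_PHYS_BASE, DEVMEM_PHYS_END) can never
collide with hotplugged RAM by construction, but pfn_to_page(),
page_to_pfn() and the section arrays then have to cope with pfns above the
limit, which is the extra work i mention above.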

Memory models like SPARSEMEM_VMEMMAP are problematic, as i would need to
change the kernel virtual memory map for the architecture, and that is not
something i want to do.
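
Rough numbers to show why (assuming the usual 64-byte struct page and the
current x86-64 layout, where the vmemmap window is sized for
MAX_PHYSMEM_BITS): 2^46 bytes of physical space is 2^34 pages, and
2^34 * 64 bytes is 2^40 bytes, ie the whole 1TB virtual memory map window.
One more physical bit would need 2TB of struct pages, so the virtual
layout itself would have to grow or move, which is the change i want to
avoid.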

In the meantime, people using HMM are "~happy~" enough with memory hotplug
failing.

Cheers,
Jérôme