Message-ID: <CA+CK2bDJ3hrWoE91L2wpAk+Yu0_=GtYw=4gLDDD7mxs321b_aA@mail.gmail.com>
Date: Fri, 29 Jan 2021 11:24:21 -0500
From: Pavel Tatashin <pasha.tatashin@...een.com>
To: David Hildenbrand <david@...hat.com>
Cc: Anshuman Khandual <anshuman.khandual@....com>,
linux-mm <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Sasha Levin <sashal@...nel.org>,
Tyler Hicks <tyhicks@...ux.microsoft.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Dan Williams <dan.j.williams@...el.com>,
Michal Hocko <mhocko@...e.com>,
Oscar Salvador <osalvador@...e.de>,
Vlastimil Babka <vbabka@...e.cz>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Jason Gunthorpe <jgg@...pe.ca>, Marc Zyngier <maz@...nel.org>,
Linux ARM <linux-arm-kernel@...ts.infradead.org>,
Will Deacon <will.deacon@....com>,
James Morse <james.morse@....com>,
James Morris <jmorris@...ei.org>
Subject: Re: dax alignment problem on arm64 (and other achitectures)
On Fri, Jan 29, 2021 at 8:19 AM David Hildenbrand <david@...hat.com> wrote:
>
> On 29.01.21 03:06, Pavel Tatashin wrote:
> >>> Might be related to the broken custom pfn_valid() implementation for
> >>> ZONE_DEVICE.
> >>>
> >>> https://lkml.kernel.org/r/1608621144-4001-1-git-send-email-anshuman.khandual@arm.com
> >>>
> >>> And essentially ignoring sub-section data in there for now as well (but
> >>> might not be that relevant yet). In addition, this might also be related to
> >>>
> >>> https://lkml.kernel.org/r/161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com
> >>
> >> I will check it, and see what I find. I saw that panic almost a year
> >> ago, things might have changed since then.
> >
> > Hi David,
> >
> > There is no panic anymore, but I also can't offset by 2M anymore, the
> > minimum that works now is 16M, and if alignment is less than 16M
> > creating devdax device fails.
>
> I wonder why we get such different namespace sizes? Where do the
> differences come from? This looks very weird.
>
> >
> > So, I tried the new ARM64 patch that reduces section sizes, and two
> > alignments for pmem: regular 2G alignment, and 2G+16M alignment.
> > (subtracted 16M from the bottom)
> >
> > ***** 4K page, 6G RAM, 2G PRAM *****
> > BOOT:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1c21fffff : namespace0.0
> > 1c2200000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1c21fffff : namespace0.0
> > 1c8000000-23fffffff : dax0.0
> > 1c8000000-23fffffff : System RAM (kmem) 128M Wasted (Expected)
>
> The namespace spans 34MB??
>
> >
> > ***** 4K page, 6G-16M RAM, 2G+16M PRAM *****
> > BOOT:
> > 40000000-1beffffff : System RAM
> > 1bf000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1c11fffff : namespace0.0
> > 1c1200000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1c11fffff : namespace0.0
> > 1c8000000-23fffffff : dax0.0
> > 1c8000000-23fffffff : System RAM (kmem) 144M Wasted (????)
>
> The namespace spans 34MB??
Right, this seems like a bug
>
> >
> > ***** 64K page, 6G RAM, 2G PRAM *****
> > BOOT:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1dfffffff : namespace0.0
> > 1e0000000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1dfffffff : namespace0.0
>
> The namespace spans 512MB ?!? What?
This is because section size is 512M with 64K pages.
>
> > 1e0000000-23fffffff : dax0.0
> > 1e0000000-23fffffff : System RAM (kmem) 512M Wasted (Expected)
> >
> > ***** 64K page, 6G-16M RAM, 2G+16M PRAM *****
> > BOOT:
> > 40000000-1beffffff : System RAM
> > 1bf000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1bf3fffff : namespace0.0
> > 1bf400000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1bf3fffff : namespace0.0
>
> The namespace now consumes 4MB ?!?
>
> > 1c0000000-23fffffff : dax0.0
> > 1c0000000-23fffffff : System RAM (kmem) 16M Wasted (Optimal)
>
> Good :) I guess more optimal would be 2MB/0MB :)
Agreed, but for a 16M offset this is optimal, because 16M is smaller
than the section size.
>
> >
> > In all three cases only System RAM, namespace0.0, and dax0.0 were
> > printed from /proc/iomem.
> > BOOT content of iomem right after boot
> > DEVDAX content of iomem after devdax is created
> > ndctl create-namespace --mode devdax -e namespace0.0
> > HOTPLUG content of iomem after dax0.0 is hotplugged:
> > echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
> > echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
> >
> >
> > The most surprising part is why, with 4K pages and a 16M offset, 144M
> > is wasted. For whatever reason, when devdax is created, 34M is wasted
> > on the label? Something is wrong here. However, I am happy with the 64K
> > pages result, where only 16M is wasted. Of course, optimally we
> > should not be wasting any memory here, but it is still much better
> > than what we have now.
>
> Definitely, but we should try figuring out what's going on here. I
> assume on x86-64 it behaves differently?
Yes, we should root-cause this. I strongly suspect that an alignment
miscalculation somewhere causes this memory waste with the 16M offset.
I am also not sure why the 2M label size was increased, or why 16M is
now an alignment requirement.
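For reference, a rough sketch of the arithmetic behind the ARM64 numbers
quoted above. It assumes (this is inferred from the listings, not from
the code) that the hotplugged range starts at the first block-aligned
address after the namespace metadata, with a 128M block for 4K pages and
a 512M block for 64K pages. The helper names are made up for this
illustration, and it does not explain why the reservation is 34M.

```python
MiB = 1 << 20

def align_up(addr, align):
    """Round addr up to the next multiple of align."""
    return (addr + align - 1) // align * align

def wasted(pmem_start, reserved, block):
    """Bytes of PRAM lost before the first hotpluggable block.

    pmem_start: start of the PRAM region (from /proc/iomem)
    reserved:   observed size of namespace0.0 after devdax creation
    block:      assumed hotplug alignment (an assumption, see above)
    """
    usable_start = align_up(pmem_start + reserved, block)
    return usable_start - pmem_start

# (PRAM start, observed namespace size, assumed block size)
cases = {
    "4K page, 2G PRAM":      (0x1c0000000, 34 * MiB, 128 * MiB),
    "4K page, 2G+16M PRAM":  (0x1bf000000, 34 * MiB, 128 * MiB),
    "64K page, 2G PRAM":     (0x1c0000000, 512 * MiB, 512 * MiB),
    "64K page, 2G+16M PRAM": (0x1bf000000, 4 * MiB, 512 * MiB),
}

for name, (start, reserved, block) in cases.items():
    print(f"{name}: {wasted(start, reserved, block) // MiB}M wasted")
```

Under these assumptions the 144M figure is what falls out of a 34M
reservation followed by a 128M align-up, so the open question is the
size of the reservation rather than the final alignment step.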
I tested on x86 and got pretty much the same results as on ARM64: a 2M
offset is no longer allowed (16M is the minimum), and even with a 16M
offset, 144M is wasted. Here is the full QEMU command if anyone wants
to reproduce it:
KERNEL_PARAM='console=ttyS0 ip=dhcp'
KERNEL_PARAM+=' memmap=2G!8G'
#KERNEL_PARAM+=' memmap=2064M!8176M'
qemu-system-x86_64 \
    -m 8G -smp 1 \
    -machine q35 \
    -nographic \
    -enable-kvm \
    -kernel pmem/native/arch/x86/boot/bzImage \
    -initrd ../poky/build/tmp/deploy/images/qemux86-64/core-image-minimal-qemux86-64.cpio.gz \
    -chardev stdio,id=console,signal=off,mux=on \
    -mon chardev=console \
    -serial chardev:console \
    -netdev user,hostfwd=tcp::5000-:22,id=netdev0 \
    -device virtio-net-pci,netdev=netdev0 \
    -append "$KERNEL_PARAM"
Also, I am using the tip of the current ndctl master branch:
root@...ux86-64:~# ndctl --version
71.2.gea014c0
***** 4K page, 6G RAM, 2G PRAM: kernel parameter memmap=2G!8G *****
BOOT:
100000000-1ffffffff : System RAM
200000000-27fffffff : Persistent Memory (legacy)
200000000-27fffffff : namespace0.0
DEVDAX:
100000000-1ffffffff : System RAM
200000000-27fffffff : Persistent Memory (legacy)
200000000-2021fffff : namespace0.0
202200000-27fffffff : dax0.0
HOTPLUG:
100000000-1ffffffff : System RAM
200000000-27fffffff : Persistent Memory (legacy)
200000000-2021fffff : namespace0.0
208000000-27fffffff : dax0.0
208000000-27fffffff : System RAM (kmem) (128M Wasted)
***** 4K page, 6G-16M RAM, 2G+16M PRAM: kernel parameter memmap=2064M!8176M *****
BOOT:
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
1ff000000-27fffffff : namespace0.0
DEVDAX:
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
1ff000000-2011fffff : namespace0.0
201200000-27fffffff : dax0.0
HOTPLUG:
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
1ff000000-2011fffff : namespace0.0
208000000-27fffffff : dax0.0
208000000-27fffffff : System RAM (kmem) (144M Wasted)
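The "Wasted" figures above are simply the distance from the start of the
PRAM region to the start of the hotplugged range. A small sketch that
derives them from iomem text is below. The sample is the HOTPLUG listing
for the 16M-offset case with the annotation removed, and the function
name is hypothetical.

```python
import re

MiB = 1 << 20

# HOTPLUG listing for memmap=2064M!8176M, without the "(144M Wasted)" note.
IOMEM = """\
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
1ff000000-2011fffff : namespace0.0
208000000-27fffffff : dax0.0
208000000-27fffffff : System RAM (kmem)
"""

def parse_iomem(text):
    """Map each resource name to its (start, end) pair, end inclusive."""
    ranges = {}
    for line in text.splitlines():
        m = re.match(r"\s*([0-9a-f]+)-([0-9a-f]+) : (.+)$", line)
        if m:
            ranges[m[3].strip()] = (int(m[1], 16), int(m[2], 16))
    return ranges

ranges = parse_iomem(IOMEM)
pmem = ranges["Persistent Memory (legacy)"]
namespace = ranges["namespace0.0"]
kmem = ranges["System RAM (kmem)"]

# Size of the namespace resource left after devdax creation.
namespace_mib = (namespace[1] - namespace[0] + 1) // MiB
# Everything between the start of PRAM and the start of kmem is unusable.
wasted_mib = (kmem[0] - pmem[0]) // MiB

print(f"namespace0.0 spans {namespace_mib}M, wasted {wasted_mib}M")
```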
The least wasted memory I can get on x86 in this experiment is with an
offset that is larger than 34M and 16M-aligned, i.e. 48M:
memmap=2096M!8144M
root@...ux86-64:~# cat /proc/iomem | grep 'dax\|namespace\|System\|Pers'
100000000-1fcffffff : System RAM
1fd000000-27fffffff : Persistent Memory (legacy)
1fd000000-1ff1fffff : namespace0.0
200000000-27fffffff : dax0.0
200000000-27fffffff : System RAM (kmem) (48M Wasted)
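To see why 48M comes out best, here is a sketch that sweeps 16M-aligned
offsets and predicts the waste for each. It assumes a fixed 34M
reservation at the start of the namespace (the size seen in the listings
above) and a 128M hotplug block size on x86-64. Both are assumptions
inferred from the observed numbers, and the function names are invented
for this example.

```python
MiB = 1 << 20
GiB = 1 << 30

PMEM_END = 10 * GiB    # PRAM always ends at 0x280000000 in these runs
BASE_START = 8 * GiB   # PRAM start with no offset (memmap=2G!8G)
RESERVED = 34 * MiB    # assumption: observed size of namespace0.0
BLOCK = 128 * MiB      # assumption: hotplug block size on x86-64

def align_up(addr, align):
    """Round addr up to the next multiple of align."""
    return (addr + align - 1) // align * align

def predicted_waste(offset):
    """Bytes between the PRAM start and the first block-aligned address."""
    pmem_start = BASE_START - offset
    return align_up(pmem_start + RESERVED, BLOCK) - pmem_start

def memmap_param(offset):
    """Kernel parameter that moves the PRAM start down by offset bytes."""
    start = BASE_START - offset
    size = PMEM_END - start
    return f"memmap={size // MiB}M!{start // MiB}M"

# Try offsets 0, 16M, 32M, ... 128M.
offsets = range(0, 128 * MiB + 1, 16 * MiB)
for off in offsets:
    print(f"{off // MiB:4d}M offset: {predicted_waste(off) // MiB:4d}M wasted"
          f"  ({memmap_param(off)})")

best = min(offsets, key=predicted_waste)
print(f"best offset: {best // MiB}M")
```

With these assumptions, any offset below 34M pushes the usable start
past the 8G boundary and costs a full extra block, which matches the
128M and 144M results. Offsets of 48M and above waste exactly the
offset itself.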
Pasha
>
> Thanks
>
>
> --
> Thanks,
>
> David / dhildenb
>