lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 25 Jun 2015 16:11:27 -0600
From:	Toshi Kani <toshi.kani@...com>
To:	Dan Williams <dan.j.williams@...el.com>
Cc:	"axboe@...nel.dk" <axboe@...nel.dk>,
	"linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"mingo@...nel.org" <mingo@...nel.org>,
	"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"hch@....de" <hch@....de>
Subject: Re: [PATCH v2 15/17] libnvdimm: Set numa_node to NVDIMM devices

On Thu, 2015-06-25 at 15:00 -0700, Dan Williams wrote:
> On Thu, Jun 25, 2015 at 2:51 PM, Toshi Kani <toshi.kani@...com> wrote:
> > On Thu, 2015-06-25 at 14:31 -0700, Dan Williams wrote:
> >> On Thu, Jun 25, 2015 at 11:34 AM, Williams, Dan J
> >> <dan.j.williams@...el.com> wrote:
> >> > On Thu, 2015-06-25 at 11:45 -0600, Toshi Kani wrote:
> >> >> On Thu, 2015-06-25 at 05:37 -0400, Dan Williams wrote:
> >> >> > From: Toshi Kani <toshi.kani@...com>
> >> >> >
> >> >> > ACPI NFIT table has System Physical Address Range Structure entries that
> >> >> > describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is
> >> >> > set in the flags.
> >> >> >
> >> >> > Change acpi_nfit_register_region() to map a proximity ID to its node ID,
> >> >> > and set it to a new numa_node field of nd_region_desc, which is then
> >> >> > conveyed to the nd_region device.
> >> >> >
> >> >> > The device core arranges for btt and namespace devices to inherit their
> >> >> > node from their parent region.
> >> >> >
> >> >> > Signed-off-by: Toshi Kani <toshi.kani@...com>
> >> >> > [djbw: move set_dev_node() from region 'probe' to 'create']
> >> >>
> >> >> Sorry, I failed to mention other issue, which led me call set_dev_node()
> >> >> in probe.  nd_async_device_register() calls device_add(), which does:
> >> >>
> >> >>         /* use parent numa_node */
> >> >>         if (parent)
> >> >>                 set_dev_node(dev, dev_to_node(parent));
> >> >>
> >> >> and overwrites numa_node to -1.  Since region's parent is ndbusN, we
> >> >> cannot set numa_node to the parent.  So, I had to set it in probe.
> >> >
> >> > In general, I still don't like leaving it up to ->probe() which is
> >> > within its rights to fail and not set the node.  How about the following
> >> > that moves it to the bus uevent code?  Should get triggered before probe
> >> > so the numa_node is valid before userspace is ever notified about the
> >> > device.
> >> >
> >> > device_add() does:
> >> >
> >> >         kobject_uevent(&dev->kobj, KOBJ_ADD);
> >> >         bus_probe_device(dev);
> >> >
> >> > ...so I think we're good, agree?  I also added a missing init of
> >> > ndr_desc.numa_node in arch/x86/kernel/pmem.c, see below.
> >>
> >> This looks good in a quick manual test.  It's interesting/illustrative
> >> that I inadvertently broke the one bit of the libnvdimm sysfs
> >> interface that did not have unit test coverage.
> >
> > Sorry I had some interrupt.  Yes, this works fine for region &
> > namespace.  I'd like to check with you for btt since the attach logic
> > has changed in v2.
> >
> > Previously, as described in patch 16/17, bttN bound to pmem had a valid
> > numa_node value, and seeding btt0 had -1.
> >
> >   /sys/bus/nd/devices
> >   |-- btt0/numa_node:-1
> >   |-- btt1/numa_node:0
> >
> > In this version, there are unbound (seeding?) btt0-3 for every region
> > (there are 4 regions) and btt4 & 5 bound to pmem0 & 3 on my system.
> >
> > btt0/numa_node:0
> > btt1/numa_node:0
> > btt2/numa_node:1
> > btt3/numa_node:1
> > btt4/numa_node:0
> > btt5/numa_node:1
> >
> > btt0
> > -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/btt0
> > btt1
> > -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region1/btt1
> > btt2
> > -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region2/btt2
> > btt3
> > -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region3/btt3
> > btt4
> > -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/btt4
> > btt5
> > -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region3/btt5
> >
> > And unbound bttNs attach to different regions across a reboot.
> >
> > btt0/numa_node:0
> > btt1/numa_node:1
> > btt2/numa_node:1
> > btt3/numa_node:0
> > btt4/numa_node:0
> > btt5/numa_node:1
> >
> > btt0
> > -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/btt0
> > btt1
> > -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region3/btt1
> > btt2
> > -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region2/btt2
> > btt3
> > -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region1/btt3
> > btt4
> > -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region0/btt4
> > btt5
> > -> ../../../devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/region3/btt5
> >
> > Is this how you'd expect btt to work in this version?  (I have not
> > looked at the btt changes yet)
> 
> Yes, this looks fine.
> 
> As requested by Christoph, in the latest version BTTs are child
> devices of regions rather than busses.  They automatically inherit the
> numa_node of the parent region.  In your dump above the numa_nodes are
> not changing from boot-to-boot, instead the BTTs are registered
> asynchronously so get different ids from boot-to-boot.  Userspace
> should not care what the btt id is and the same naming trick we use to
> give block devices static names would not work for BTTs.  The child
> block device of the BTT will still have the static name as we
> discussed earlier (/dev/pmemXs or /dev/ndblkX.Ys) because the scan
> order of those is deterministic.

Yes, I see no problem with bound BTTs and their device files.  So, how
do we bind BTT with this new version?

Thanks,
-Toshi  



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ