linux-kernel - Re: [PATCH v3] libnvdimm, region: sysfs trigger for nvdimm

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAPcyv4hthct-P=W9QLbjswvco8Fcewpf13ZX6-h2PXpHwRtAKg@mail.gmail.com>
Date:   Thu, 27 Apr 2017 13:02:41 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Jeff Moyer <jmoyer@...hat.com>
Cc:     "linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
        Masayoshi Mizuma <m.mizuma@...fujitsu.com>,
        Linux ACPI <linux-acpi@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3] libnvdimm, region: sysfs trigger for nvdimm_flush()

On Thu, Apr 27, 2017 at 12:40 PM, Jeff Moyer <jmoyer@...hat.com> wrote:
> Dan Williams <dan.j.williams@...el.com> writes:
>
>> On Thu, Apr 27, 2017 at 11:41 AM, Jeff Moyer <jmoyer@...hat.com> wrote:
>>> Dan Williams <dan.j.williams@...el.com> writes:
>>>
>>>>> The sentiment is that programs shouldn't have to grovel around in sysfs
>>>>> to do stuff related to an open file descriptor or mapping.  I don't take
>>>>> issue with the name.  I do worry that something like 'wpq_drain' may be
>>>>> too platform specific, though.  The NVM Programming Model specification
>>>>> is going to call this "deep flush", so maybe that will give you
>>>>> some inspiration if you do want to change the name.
>>>>
>>>> I'll change to "deep_flush", and I quibble that this is related to a
>>>> single open file descriptor or mapping. It really is a "region flush"
>>>> for giving extra protection for global metadata, but the persistence
>>>> of individual fds or mappings is handled by ADR. I think an ioctl
>>>> might give the false impression that every time you flush a cacheline
>>>> to persistence you need to call the ioctl.
>>>
>>> fsync, for example, may affect more than one fd--all data in the drive
>>> write cache will be flushed.  I don't see how this is so different.  I
>>> think a sysfs file is awkward because it requires an application to
>>> chase down the correct file in the sysfs hierarchy.  If the application
>>> already has an open fd or a mapping, it should be able to operate on
>>> that.
>>
>> I'm teetering, but still leaning towards sysfs. The use case that
>> needs this is device-dax because we otherwise silently do this behind
>> the application's back on filesystem-dax for fsync / msync.
>
> We may yet get file system support for flush from userspace (NOVA, for
> example).  So I don't think we should restrict ourselves to only
> thinking about the device dax use case.
>
>> A device-dax ioctl would be straightforward, but 'deep flush' assumes
>> that the device-dax instance is fronting persistent memory.  There's
>> nothing persistent memory specific about device-dax except that today
>> only the nvdimm sub-system knows how to create them, but there's
>> nothing that prevents other memory regions from being mapped this way.
>
> You're concerned that applications operating on device dax instances
> that are not backed by pmem will try to issue a deep flush?  Why would
> they do that, and why can't you just return failure from the ioctl?
>
>> So I'd rather this persistent memory specific mechanism stay with the
>> persistent memory specific portion of the interface rather than plumb
>> persistent memory details out through the generic device-dax interface
>> since we have no other intercept point like we do in the
>> filesystem-dax case to hide this flush.
>
> Look at the block layer.  You can issue an ioctl on a block device, and
> if the generic block layer can handle it, it does.  If not, it gets
> passed down to lower layers until either it gets handled, or it bubbles
> back up because nobody knew what to do with it.  I think you can do the
> same thing here, and that solves your layering violation.

So this is where I started. I was going to follow the block layer.
Except recently the block layer has been leaning away from ioctls and
implementing support for syscalls directly. The same approach for
device-dax fallocate() support got NAKd, so I opted for sysfs out of
the gate.

However, since there really is no analog for "deep flush" in the
syscall namespace lets (*gag*) implement an ioctl for this.