linux-kernel - Re: [PATCHSETS] v14 fsdax-rmap + v11 fsdax-reflink


Message-ID: <32f51223-c671-1dc0-e14a-8887863d9071@fujitsu.com>
Date:   Thu, 12 May 2022 20:27:12 +0800
From:   Shiyang Ruan <ruansy.fnst@...itsu.com>
To:     Dan Williams <dan.j.williams@...el.com>,
        "Darrick J. Wong" <djwong@...nel.org>
CC:     Andrew Morton <akpm@...ux-foundation.org>,
        Dave Chinner <david@...morbit.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-xfs <linux-xfs@...r.kernel.org>,
        Linux NVDIMM <nvdimm@...ts.linux.dev>,
        Linux MM <linux-mm@...ck.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Christoph Hellwig <hch@...radead.org>,
        Jane Chu <jane.chu@...cle.com>,
        Goldwyn Rodrigues <rgoldwyn@...e.de>,
        Al Viro <viro@...iv.linux.org.uk>,
        Matthew Wilcox <willy@...radead.org>,
        Naoya Horiguchi <naoya.horiguchi@....com>,
        <linmiaohe@...wei.com>
Subject: Re: [PATCHSETS] v14 fsdax-rmap + v11 fsdax-reflink



On 2022/5/11 23:46, Dan Williams wrote:
> On Wed, May 11, 2022 at 8:21 AM Darrick J. Wong <djwong@...nel.org> wrote:
>>
>> On Tue, May 10, 2022 at 10:24:28PM -0700, Andrew Morton wrote:
>>> On Tue, 10 May 2022 19:43:01 -0700 "Darrick J. Wong" <djwong@...nel.org> wrote:
>>>
>>>> On Tue, May 10, 2022 at 07:28:53PM -0700, Andrew Morton wrote:
>>>>> On Tue, 10 May 2022 18:55:50 -0700 Dan Williams <dan.j.williams@...el.com> wrote:
>>>>>
>>>>>>> It'll need to be a stable branch somewhere, but I don't think it
>>>>>>> really matters where as long as it's merged into the xfs for-next
>>>>>>> tree so it gets filesystem test coverage...
>>>>>>
>>>>>> So how about let the notify_failure() bits go through -mm this cycle,
>>>>>> if Andrew will have it, and then the reflink work has a clean v5.19-rc1
>>>>>> baseline to build from?
>>>>>
>>>>> What are we referring to here?  I think a minimal thing would be the
>>>>> memremap.h and memory-failure.c changes from
>>>>> https://lkml.kernel.org/r/20220508143620.1775214-4-ruansy.fnst@fujitsu.com ?
>>>>>
>>>>> Sure, I can scoot that into 5.19-rc1 if you think that's best.  It
>>>>> would probably be straining things to slip it into 5.19.
>>>>>
>>>>> The use of EOPNOTSUPP is a bit suspect, btw.  It *sounds* like the
>>>>> right thing, but it's a networking errno.  I suppose livable with if it
>>>>> never escapes the kernel, but if it can get back to userspace then a
>>>>> user would be justified in wondering how the heck a filesystem
>>>>> operation generated a networking errno?
>>>>
>>>> <shrug> most filesystems return EOPNOTSUPP rather enthusiastically when
>>>> they don't know how to do something...
>>>
>>> Can it propagate back to userspace?
>>
>> AFAICT, the new code falls back to the current (mf_generic_kill_procs)
>> failure code if the filesystem doesn't provide a ->memory_failure
>> function or if it returns -EOPNOTSUPP.  mf_generic_kill_procs can also
>> return -EOPNOTSUPP, but all the memory_failure() callers (madvise, etc.)
>> convert that to 0 before returning it to userspace.
>>
>> I suppose the weirder question is going to be what happens when madvise
>> starts returning filesystem errors like EIO or EFSCORRUPTED when pmem
>> loses half its brains and even the fs can't deal with it.
> 
> Even then that notification is not in a system call context so it
> would still result in a SIGBUS notification not a EOPNOTSUPP return
> code. The only potential gap I see is which error codes
> MADV_SOFT_OFFLINE might return. The man page is silent on soft
> offline failure codes. Shiyang, that's something to check / update if
> necessary.
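
For reference, the fallback and errno conversion described in the quoted
text above can be modeled in user space. This is only a sketch under stated
assumptions, not the kernel code: every name below (fs_notify_fn,
model_memory_failure, model_syscall_return, and the three sample hooks) is
hypothetical, and the generic path is reduced to a return code passed in by
the caller.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a filesystem's ->memory_failure hook. */
typedef int (*fs_notify_fn)(unsigned long pfn);

/*
 * Model of the dispatch: try the filesystem hook first, and fall back to
 * the generic path when the hook is missing or returns -EOPNOTSUPP.
 * generic_rc stands in for whatever the generic path would return.
 */
int model_memory_failure(fs_notify_fn notify, int generic_rc,
                         unsigned long pfn)
{
    int rc = notify ? notify(pfn) : -EOPNOTSUPP;

    if (rc == -EOPNOTSUPP)
        rc = generic_rc;
    return rc;
}

/*
 * Model of a caller such as madvise: -EOPNOTSUPP is converted to 0 before
 * it is returned to userspace; any other error is passed through.
 */
int model_syscall_return(fs_notify_fn notify, int generic_rc,
                         unsigned long pfn)
{
    int rc = model_memory_failure(notify, generic_rc, pfn);

    return rc == -EOPNOTSUPP ? 0 : rc;
}

/* Sample hooks (hypothetical) covering the three interesting cases. */
int fs_handles_it(unsigned long pfn)  { (void)pfn; return 0; }
int fs_declines(unsigned long pfn)    { (void)pfn; return -EOPNOTSUPP; }
int fs_reports_eio(unsigned long pfn) { (void)pfn; return -EIO; }
```

In this model, -EOPNOTSUPP never reaches the caller's return value, while
an error such as -EIO from the filesystem hook does, which is the case the
"weirder question" above is about.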

According to the code around MADV_SOFT_OFFLINE, madvise() will return -EIO
when the backend is NVDIMM, because soft offline only accepts online pages
and ZONE_DEVICE pages are not online pages.

Here is the logic:
  madvise_inject_error() {
      ...
      if (MADV_SOFT_OFFLINE) {
          ret = soft_offline_page() {
              ...
              /* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */
              page = pfn_to_online_page(pfn);
              if (!page) {
                  put_ref_page(ref_page);
                  return -EIO;
              }
              ...
          }
      } else {
          ret = memory_failure();
      }
      return ret;
  }
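
To see which errno actually reaches userspace, a small probe can call
madvise() with MADV_SOFT_OFFLINE and report the result. The sketch below is
an assumption-laden illustration, not a test from this patchset: the
function name probe_soft_offline_anon is hypothetical, and it uses an
anonymous page rather than a DAX mapping, so it does not exercise the
-EIO path shown above. Hitting that path would require mapping a file on
an fsdax filesystem instead. Note that when it succeeds (it needs
CAP_SYS_ADMIN), the kernel really does take one page of RAM offline.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and MADV_* in glibc */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101 /* value from the asm-generic uapi header */
#endif

/*
 * Soft-offline one freshly faulted anonymous page.
 * Returns 0 on success, or the positive errno that was reported.
 */
int probe_soft_offline_anon(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return EINVAL;

    size_t len = (size_t)page;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return errno;

    p[0] = 1; /* fault the page in so there is a physical page behind it */

    int err = 0;
    if (madvise(p, len, MADV_SOFT_OFFLINE) != 0)
        err = errno; /* save errno before munmap() can overwrite it */

    munmap(p, len);
    return err;
}
```

Without CAP_SYS_ADMIN the probe is expected to report EPERM, and on a
kernel built without CONFIG_MEMORY_FAILURE it is expected to report EINVAL.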


--
Thanks,
Ruan.