linux-kernel - Re: [PATCH 04/10] mm, fsdax: Refactor memory-failure handler for dax mapping

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <781f276b-afdd-091c-3dba-048e415431ab@linux.alibaba.com>
Date:   Wed, 13 Jan 2021 18:04:46 +0800
From:   zhong jiang <zhongjiang-ali@...ux.alibaba.com>
To:     Ruan Shiyang <ruansy.fnst@...fujitsu.com>, Jan Kara <jack@...e.cz>
Cc:     linux-kernel@...r.kernel.org, linux-xfs@...r.kernel.org,
        linux-nvdimm@...ts.01.org, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org, linux-raid@...r.kernel.org,
        darrick.wong@...cle.com, dan.j.williams@...el.com,
        david@...morbit.com, hch@....de, song@...nel.org, rgoldwyn@...e.de,
        qi.fuli@...itsu.com, y-goto@...itsu.com
Subject: Re: [PATCH 04/10] mm, fsdax: Refactor memory-failure handler for dax
 mapping


On 2021/1/12 10:55 上午, Ruan Shiyang wrote:
>
>
> On 2021/1/6 下午11:41, Jan Kara wrote:
>> On Thu 31-12-20 00:55:55, Shiyang Ruan wrote:
>>> The current memory_failure_dev_pagemap() can only handle single-mapped
>>> dax page for fsdax mode.  The dax page could be mapped by multiple 
>>> files
>>> and offsets if we let reflink feature & fsdax mode work together.  So,
>>> we refactor current implementation to support handle memory failure on
>>> each file and offset.
>>>
>>> Signed-off-by: Shiyang Ruan <ruansy.fnst@...fujitsu.com>
>>
>> Overall this looks OK to me, a few comments below.
>>
>>> ---
>>>   fs/dax.c            | 21 +++++++++++
>>>   include/linux/dax.h |  1 +
>>>   include/linux/mm.h  |  9 +++++
>>>   mm/memory-failure.c | 91 
>>> ++++++++++++++++++++++++++++++++++-----------
>>>   4 files changed, 100 insertions(+), 22 deletions(-)
>
> ...
>
>>>   @@ -345,9 +348,12 @@ static void add_to_kill(struct task_struct 
>>> *tsk, struct page *p,
>>>       }
>>>         tk->addr = page_address_in_vma(p, vma);
>>> -    if (is_zone_device_page(p))
>>> -        tk->size_shift = dev_pagemap_mapping_shift(p, vma);
>>> -    else
>>> +    if (is_zone_device_page(p)) {
>>> +        if (is_device_fsdax_page(p))
>>> +            tk->addr = vma->vm_start +
>>> +                    ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
>>
>> It seems strange to use 'pgoff' for dax pages and not for any other 
>> page.
>> Why? I'd rather pass correct pgoff from all callers of add_to_kill() and
>> avoid this special casing...
>
> Because one fsdax page can be shared by multiple pgoffs.  I have to 
> pass each pgoff in each iteration to calculate the address in vma (for 
> tk->addr).  Other kinds of pages don't need this. They can get their 
> unique address by calling "page_address_in_vma()".
>
IMO,   an fsdax page can be shared by multiple files rather than 
multiple pgoffs if fs query support reflink.   Because an page only 
located in an mapping(page->mapping is exclusive),  hence it  only has 
an pgoff or index pointing at the node.

  or  I miss something for the feature ?  thanks,

> So, I added this fsdax case here.  This patchset only implemented the 
> fsdax case, other cases also need to be added here if to be implemented.
>
>
> -- 
> Thanks,
> Ruan Shiyang.
>
>>
>>> +        tk->size_shift = dev_pagemap_mapping_shift(p, vma, tk->addr);
>>> +    } else
>>>           tk->size_shift = page_shift(compound_head(p));
>>>         /*
>>> @@ -495,7 +501,7 @@ static void collect_procs_anon(struct page 
>>> *page, struct list_head *to_kill,
>>>               if (!page_mapped_in_vma(page, vma))
>>>                   continue;
>>>               if (vma->vm_mm == t->mm)
>>> -                add_to_kill(t, page, vma, to_kill);
>>> +                add_to_kill(t, page, NULL, 0, vma, to_kill);
>>>           }
>>>       }
>>>       read_unlock(&tasklist_lock);
>>> @@ -505,24 +511,19 @@ static void collect_procs_anon(struct page 
>>> *page, struct list_head *to_kill,
>>>   /*
>>>    * Collect processes when the error hit a file mapped page.
>>>    */
>>> -static void collect_procs_file(struct page *page, struct list_head 
>>> *to_kill,
>>> -                int force_early)
>>> +static void collect_procs_file(struct page *page, struct 
>>> address_space *mapping,
>>> +        pgoff_t pgoff, struct list_head *to_kill, int force_early)
>>>   {
>>>       struct vm_area_struct *vma;
>>>       struct task_struct *tsk;
>>> -    struct address_space *mapping = page->mapping;
>>> -    pgoff_t pgoff;
>>>         i_mmap_lock_read(mapping);
>>>       read_lock(&tasklist_lock);
>>> -    pgoff = page_to_pgoff(page);
>>>       for_each_process(tsk) {
>>>           struct task_struct *t = task_early_kill(tsk, force_early);
>>> -
>>>           if (!t)
>>>               continue;
>>> -        vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff,
>>> -                      pgoff) {
>>> +        vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, 
>>> pgoff) {
>>>               /*
>>>                * Send early kill signal to tasks where a vma covers
>>>                * the page but the corrupted page is not necessarily
>>> @@ -531,7 +532,7 @@ static void collect_procs_file(struct page 
>>> *page, struct list_head *to_kill,
>>>                * to be informed of all such data corruptions.
>>>                */
>>>               if (vma->vm_mm == t->mm)
>>> -                add_to_kill(t, page, vma, to_kill);
>>> +                add_to_kill(t, page, mapping, pgoff, vma, to_kill);
>>>           }
>>>       }
>>>       read_unlock(&tasklist_lock);
>>> @@ -550,7 +551,8 @@ static void collect_procs(struct page *page, 
>>> struct list_head *tokill,
>>>       if (PageAnon(page))
>>>           collect_procs_anon(page, tokill, force_early);
>>>       else
>>> -        collect_procs_file(page, tokill, force_early);
>>> +        collect_procs_file(page, page->mapping, page_to_pgoff(page),
>>
>> Why not use page_mapping() helper here? It would be safer for THPs if 
>> they
>> ever get here...
>>
>>                                 Honza
>>
>