lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Thu, 14 Jan 2021 09:44:14 +0800 From: Ruan Shiyang <ruansy.fnst@...fujitsu.com> To: zhong jiang <zhongjiang-ali@...ux.alibaba.com>, Jan Kara <jack@...e.cz> CC: <linux-kernel@...r.kernel.org>, <linux-xfs@...r.kernel.org>, <linux-nvdimm@...ts.01.org>, <linux-mm@...ck.org>, <linux-fsdevel@...r.kernel.org>, <linux-raid@...r.kernel.org>, <darrick.wong@...cle.com>, <dan.j.williams@...el.com>, <david@...morbit.com>, <hch@....de>, <song@...nel.org>, <rgoldwyn@...e.de>, <qi.fuli@...itsu.com>, <y-goto@...itsu.com> Subject: Re: [PATCH 04/10] mm, fsdax: Refactor memory-failure handler for dax mapping On 2021/1/13 下午6:04, zhong jiang wrote: > > On 2021/1/12 10:55 上午, Ruan Shiyang wrote: >> >> >> On 2021/1/6 下午11:41, Jan Kara wrote: >>> On Thu 31-12-20 00:55:55, Shiyang Ruan wrote: >>>> The current memory_failure_dev_pagemap() can only handle single-mapped >>>> dax page for fsdax mode. The dax page could be mapped by multiple >>>> files >>>> and offsets if we let reflink feature & fsdax mode work together. So, >>>> we refactor current implementation to support handle memory failure on >>>> each file and offset. >>>> >>>> Signed-off-by: Shiyang Ruan <ruansy.fnst@...fujitsu.com> >>> >>> Overall this looks OK to me, a few comments below. >>> >>>> --- >>>> fs/dax.c | 21 +++++++++++ >>>> include/linux/dax.h | 1 + >>>> include/linux/mm.h | 9 +++++ >>>> mm/memory-failure.c | 91 >>>> ++++++++++++++++++++++++++++++++++----------- >>>> 4 files changed, 100 insertions(+), 22 deletions(-) >> >> ... >> >>>> @@ -345,9 +348,12 @@ static void add_to_kill(struct task_struct >>>> *tsk, struct page *p, >>>> } >>>> tk->addr = page_address_in_vma(p, vma); >>>> - if (is_zone_device_page(p)) >>>> - tk->size_shift = dev_pagemap_mapping_shift(p, vma); >>>> - else >>>> + if (is_zone_device_page(p)) { >>>> + if (is_device_fsdax_page(p)) >>>> + tk->addr = vma->vm_start + >>>> + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT); >>> >>> It seems strange to use 'pgoff' for dax pages and not for any other >>> page. >>> Why? I'd rather pass correct pgoff from all callers of add_to_kill() and >>> avoid this special casing... >> >> Because one fsdax page can be shared by multiple pgoffs. I have to >> pass each pgoff in each iteration to calculate the address in vma (for >> tk->addr). Other kinds of pages don't need this. They can get their >> unique address by calling "page_address_in_vma()". >> > IMO, an fsdax page can be shared by multiple files rather than > multiple pgoffs if fs query support reflink. Because an page only > located in an mapping(page->mapping is exclusive), hence it only has > an pgoff or index pointing at the node. > > or I miss something for the feature ? thanks, Yes, a fsdax page is shared by multiple files because of reflink. I think my description of 'pgoff' here is not correct. This 'pgoff' means the offset within the a file. (We use rmap to find out all the sharing files and their offsets.) So, I said that "can be shared by multiple pgoffs". It's my bad. I think I should name it another word to avoid misunderstandings. -- Thanks, Ruan Shiyang. > >> So, I added this fsdax case here. This patchset only implemented the >> fsdax case, other cases also need to be added here if to be implemented. >> >> >> -- >> Thanks, >> Ruan Shiyang. >> >>> >>>> + tk->size_shift = dev_pagemap_mapping_shift(p, vma, tk->addr); >>>> + } else >>>> tk->size_shift = page_shift(compound_head(p)); >>>> /* >>>> @@ -495,7 +501,7 @@ static void collect_procs_anon(struct page >>>> *page, struct list_head *to_kill, >>>> if (!page_mapped_in_vma(page, vma)) >>>> continue; >>>> if (vma->vm_mm == t->mm) >>>> - add_to_kill(t, page, vma, to_kill); >>>> + add_to_kill(t, page, NULL, 0, vma, to_kill); >>>> } >>>> } >>>> read_unlock(&tasklist_lock); >>>> @@ -505,24 +511,19 @@ static void collect_procs_anon(struct page >>>> *page, struct list_head *to_kill, >>>> /* >>>> * Collect processes when the error hit a file mapped page. >>>> */ >>>> -static void collect_procs_file(struct page *page, struct list_head >>>> *to_kill, >>>> - int force_early) >>>> +static void collect_procs_file(struct page *page, struct >>>> address_space *mapping, >>>> + pgoff_t pgoff, struct list_head *to_kill, int force_early) >>>> { >>>> struct vm_area_struct *vma; >>>> struct task_struct *tsk; >>>> - struct address_space *mapping = page->mapping; >>>> - pgoff_t pgoff; >>>> i_mmap_lock_read(mapping); >>>> read_lock(&tasklist_lock); >>>> - pgoff = page_to_pgoff(page); >>>> for_each_process(tsk) { >>>> struct task_struct *t = task_early_kill(tsk, force_early); >>>> - >>>> if (!t) >>>> continue; >>>> - vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, >>>> - pgoff) { >>>> + vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, >>>> pgoff) { >>>> /* >>>> * Send early kill signal to tasks where a vma covers >>>> * the page but the corrupted page is not necessarily >>>> @@ -531,7 +532,7 @@ static void collect_procs_file(struct page >>>> *page, struct list_head *to_kill, >>>> * to be informed of all such data corruptions. >>>> */ >>>> if (vma->vm_mm == t->mm) >>>> - add_to_kill(t, page, vma, to_kill); >>>> + add_to_kill(t, page, mapping, pgoff, vma, to_kill); >>>> } >>>> } >>>> read_unlock(&tasklist_lock); >>>> @@ -550,7 +551,8 @@ static void collect_procs(struct page *page, >>>> struct list_head *tokill, >>>> if (PageAnon(page)) >>>> collect_procs_anon(page, tokill, force_early); >>>> else >>>> - collect_procs_file(page, tokill, force_early); >>>> + collect_procs_file(page, page->mapping, page_to_pgoff(page), >>> >>> Why not use page_mapping() helper here? It would be safer for THPs if >>> they >>> ever get here... >>> >>> Honza >>> >> > >
Powered by blists - more mailing lists