Message-ID: <0c11714b-06f8-8eba-e0b3-8bb1caa8ebf2@oracle.com>
Date: Thu, 19 Aug 2021 13:50:49 -0700
From: Jane Chu <jane.chu@...cle.com>
To: "ruansy.fnst@...itsu.com" <ruansy.fnst@...itsu.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
"nvdimm@...ts.linux.dev" <nvdimm@...ts.linux.dev>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"dm-devel@...hat.com" <dm-devel@...hat.com>
Cc: "djwong@...nel.org" <djwong@...nel.org>,
"dan.j.williams@...el.com" <dan.j.williams@...el.com>,
"david@...morbit.com" <david@...morbit.com>,
"hch@....de" <hch@....de>, "agk@...hat.com" <agk@...hat.com>,
"snitzer@...hat.com" <snitzer@...hat.com>
Subject: Re: [PATCH RESEND v6 1/9] pagemap: Introduce ->memory_failure()
On 8/19/2021 2:10 AM, ruansy.fnst@...itsu.com wrote:
>> From: Jane Chu <jane.chu@...cle.com>
>> Subject: Re: [PATCH RESEND v6 1/9] pagemap: Introduce ->memory_failure()
>>
>> Sorry, correction in line.
>>
>> On 8/19/2021 12:18 AM, Jane Chu wrote:
>>> Hi, Shiyang,
>>>
>>> > > > 1) What does it take and cost to make
>>> > > > xfs_sb_version_hasrmapbt(&mp->m_sb) to return true?
>>> >
>>> > Enable rmapbt feature when making xfs filesystem
>>> > `mkfs.xfs -m rmapbt=1 /path/to/device`
>>> > BTW, reflink is enabled by default.
>>>
>>> Thanks! I tried
>>> mkfs.xfs -d agcount=2,extszinherit=512,su=2m,sw=1 -m reflink=0 -m
>>> rmapbt=1 -f /dev/pmem0
>>>
>>> Again, injected a HW poison to the first page in a dax-file, had the
>>> poison consumed and received a SIGBUS. The result is better -
>>>
>>> ** SIGBUS(7): canjmp=1, whichstep=0, **
>>> ** si_addr(0x0x7ff2d8800000), si_lsb(0x15), si_code(0x4,
>>> BUS_MCEERR_AR) **
>>>
>>> The SIGBUS payload looks correct.
>>>
>>> However, "dmesg" has 2048 lines on sending SIGBUS, one per 512bytes -
>>
>> Actually that's one per 2MB, even though the poison is located in pfn 0x1850600
>> only.
>>
>>>
>>> [ 7003.482326] Memory failure: 0x1850600: Sending SIGBUS to fsdax_poison_v1:4109 due to hardware memory corruption
>>> [ 7003.507956] Memory failure: 0x1850800: Sending SIGBUS to fsdax_poison_v1:4109 due to hardware memory corruption
>>> [ 7003.531681] Memory failure: 0x1850a00: Sending SIGBUS to fsdax_poison_v1:4109 due to hardware memory corruption
>>> [ 7003.554190] Memory failure: 0x1850c00: Sending SIGBUS to fsdax_poison_v1:4109 due to hardware memory corruption
>>> [ 7003.575831] Memory failure: 0x1850e00: Sending SIGBUS to fsdax_poison_v1:4109 due to hardware memory corruption
>>> [ 7003.596796] Memory failure: 0x1851000: Sending SIGBUS to fsdax_poison_v1:4109 due to hardware memory corruption
>>> ....
>>> [ 7045.738270] Memory failure: 0x194fe00: Sending SIGBUS to fsdax_poison_v1:4109 due to hardware memory corruption
>>> [ 7045.758885] Memory failure: 0x1950000: Sending SIGBUS to fsdax_poison_v1:4109 due to hardware memory corruption
>>> [ 7045.779495] Memory failure: 0x1950200: Sending SIGBUS to fsdax_poison_v1:4109 due to hardware memory corruption
>>> [ 7045.800106] Memory failure: 0x1950400: Sending SIGBUS to fsdax_poison_v1:4109 due to hardware memory corruption
>>>
>>> That's too much for a single process dealing with a single poison in a
>>> PMD page. If nothing else, given an .si_addr_lsb being 0x15, it
>>> doesn't make sense to send a SIGBUS per 512B block.
>>>
>>> Could you determine the user process' mapping size from the
>>> filesystem, and take that as a hint to determine how many iterations
>>> to call
>>> mf_dax_kill_procs() ?
>>
>> Sorry, scratch the 512byte stuff... the filesystem has been notified the length of
>> the poison blast radius, could it take clue from that?
>
> I think this is caused by a mistake I made in the 6th patch: the xfs handler iterates the file range in block-size steps (4k here) even though it is a PMD page. That's why so many messages show up when the poison is on a PMD page. I'll fix it in the next version.
>
Sorry, just to clarify, it looks like XFS has iterated through the
entire file in 2MiB strides. The test file size is 4GiB, which explains
'dmesg' showing 2048 lines about sending SIGBUS.
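
For reference, a quick sanity check of those numbers. This is only a
sketch: the helper names are made up for illustration (they are not
kernel functions), and 4KiB base pages are assumed. It ties together the
si_lsb of 0x15, the 0x200 pfn stride seen in the log, and the 2048-line
count.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, as on x86-64. */
#define BASE_PAGE_SHIFT 12

/* Bytes covered by a SIGBUS whose si_addr_lsb is `lsb` (0x15 means 2 MiB). */
static uint64_t blast_radius_bytes(unsigned int lsb)
{
    assert(lsb < 64);
    return (uint64_t)1 << lsb;
}

/* Number of stride-sized steps needed to walk `len` bytes (rounded up). */
static uint64_t steps_for_range(uint64_t len, uint64_t stride)
{
    assert(stride != 0);
    return (len + stride - 1) / stride;
}

/* Number of log lines from pfn `first` to pfn `last` inclusive,
 * when one line is printed every `pfn_stride` pfns. */
static uint64_t lines_between_pfns(uint64_t first, uint64_t last,
                                   uint64_t pfn_stride)
{
    assert(pfn_stride != 0 && last >= first);
    return (last - first) / pfn_stride + 1;
}
```

With the values from this thread, steps_for_range(4GiB, 2MiB) and
lines_between_pfns(0x1850600, 0x1950400, 0x200) both come out to 2048,
and a 0x200 pfn stride at 4KiB per page is the same 2MiB that si_lsb
0x15 describes.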
thanks,
-jane
>
> --
> Thanks,
> Ruan.
>
>>
>> thanks,
>> -jane
>>
>>>
>>> thanks!
>>> -jane
>>>
>>>
>>>