lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 6 Apr 2022 17:32:31 +0000
From:   Jane Chu <jane.chu@...cle.com>
To:     Christoph Hellwig <hch@...radead.org>
CC:     "david@...morbit.com" <david@...morbit.com>,
        "djwong@...nel.org" <djwong@...nel.org>,
        "dan.j.williams@...el.com" <dan.j.williams@...el.com>,
        "vishal.l.verma@...el.com" <vishal.l.verma@...el.com>,
        "dave.jiang@...el.com" <dave.jiang@...el.com>,
        "agk@...hat.com" <agk@...hat.com>,
        "snitzer@...hat.com" <snitzer@...hat.com>,
        "dm-devel@...hat.com" <dm-devel@...hat.com>,
        "ira.weiny@...el.com" <ira.weiny@...el.com>,
        "willy@...radead.org" <willy@...radead.org>,
        "vgoyal@...hat.com" <vgoyal@...hat.com>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "nvdimm@...ts.linux.dev" <nvdimm@...ts.linux.dev>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v7 4/6] dax: add DAX_RECOVERY flag and .recovery_write
 dev_pgmap_ops

On 4/5/2022 10:19 PM, Christoph Hellwig wrote:
> On Tue, Apr 05, 2022 at 01:47:45PM -0600, Jane Chu wrote:
>> Introduce DAX_RECOVERY flag to dax_direct_access(). The flag is
>> not set by default in dax_direct_access() such that the helper
>> does not translate a pmem range to kernel virtual address if the
>> range contains uncorrectable errors.  When the flag is set,
>> the helper ignores the UEs and return kernel virtual adderss so
>> that the caller may get on with data recovery via write.
>>
>> Also introduce a new dev_pagemap_ops .recovery_write function.
>> The function is applicable to FSDAX device only. The device
>> page backend driver provides .recovery_write function if the
>> device has underlying mechanism to clear the uncorrectable
>> errors on the fly.
> 
> I know Dan suggested it, but I still think dev_pagemap_ops is the very
> wrong choice here.  It is about VM callbacks to ZONE_DEVICE owners
> independent of what pagemap type they are.  .recovery_write on the
> other hand is completely specific to the DAX write path and has no
> MM interactions at all.

Yes, I believe Dan was motivated by avoiding the dm dance as a result of
adding .recovery_write to dax_operations.

I understand your point about .recovery_write is device specific and
thus not something appropriate for device agnostic ops.

I can see 2 options so far -

1)  add .recovery_write to dax_operations and do the dm dance to hunt 
down to the base device that actually provides the recovery action

2)  an ugly but expedient approach based on the observation that 
dax_direct_access() has already gone through the dm dance and thus could 
scoop up the .recovery_write function pointer if DAX_RECOVERY flag is 
set.  Like bundle action-flag with action, and if should there need more
device specific actions, just add another action with associated flag.

I'm thinking about something like this

    long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
                           long nr_pages, struct daxdev_specific *action,
                           int flags, void **kaddr, pfn_t *pfn)

    where
    struct daxdev_specific {
	int flags;	/* DAX_RECOVERY, etc */
	size_t (*recovery_write) (pfn_t pfn, pgoff_t pgoff, void *addr,
				 size_t bytes, void *iter);
    }

    __pmem_direct_access() provides the .recovery_write function pointer;
    dax_iomap_iter() ends up directly invoke the function in pmem.c
      which finds pgmap from pfn_t, and (struct pmem *) from
      pgmap->owner;

In this way, we get rid of dax_recovery_write() interface as well as the
dm dance.

What do you think?

Dan, could you also chime in ?

> 
>>   /* see "strong" declaration in tools/testing/nvdimm/pmem-dax.c */
>>   __weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
>> -		long nr_pages, void **kaddr, pfn_t *pfn)
>> +		long nr_pages, int flags, void **kaddr, pfn_t *pfn)
>>   {
>>   	resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
>> +	sector_t sector = PFN_PHYS(pgoff) >> SECTOR_SHIFT;
>> +	unsigned int num = PFN_PHYS(nr_pages) >> SECTOR_SHIFT;
>> +	struct badblocks *bb = &pmem->bb;
>> +	sector_t first_bad;
>> +	int num_bad;
>> +	bool bad_in_range;
>> +	long actual_nr;
>> +
>> +	if (!bb->count)
>> +		bad_in_range = false;
>> +	else
>> +		bad_in_range = !!badblocks_check(bb, sector, num, &first_bad, &num_bad);
>>   
>> -	if (unlikely(is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
>> -					PFN_PHYS(nr_pages))))
>> +	if (bad_in_range && !(flags & DAX_RECOVERY))
>>   		return -EIO;
> 
> The use of bad_in_range here seems a litle convoluted.  See the attached
> patch on how I would structure the function to avoid the variable and
> have the reocvery code in a self-contained chunk.

Much better, will use your version, thanks!

> 
>> -		map_len = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size),
>> -				&kaddr, NULL);
>> +		nrpg = PHYS_PFN(size);
>> +		map_len = dax_direct_access(dax_dev, pgoff, nrpg, 0, &kaddr, NULL);
> 
> Overly long line here.

Okay, will run the checkpatch.pl test again.

thanks!
-jane

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ