lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 28 Apr 2015 21:24:39 +0000
From:	"Elliott, Robert (Server Storage)" <Elliott@...com>
To:	Dan Williams <dan.j.williams@...el.com>,
	"linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>
CC:	Neil Brown <neilb@...e.de>, Dave Chinner <david@...morbit.com>,
	"H. Peter Anvin" <hpa@...or.com>, Christoph Hellwig <hch@....de>,
	"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
	Robert Moore <robert.moore@...el.com>,
	"Ingo Molnar" <mingo@...nel.org>,
	"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
	Jens Axboe <axboe@...com>, Borislav Petkov <bp@...en8.de>,
	Thomas Gleixner <tglx@...utronix.de>,
	Greg KH <gregkh@...uxfoundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Andy Lutomirski <luto@...capital.net>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: RE: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory
 device	support

> -----Original Message-----
> From: Linux-nvdimm [mailto:linux-nvdimm-bounces@...ts.01.org] On Behalf Of
> Dan Williams
> Sent: Tuesday, April 28, 2015 1:24 PM
> To: linux-nvdimm@...ts.01.org
> Cc: Neil Brown; Dave Chinner; H. Peter Anvin; Christoph Hellwig; Rafael J.
> Wysocki; Robert Moore; Ingo Molnar; linux-acpi@...r.kernel.org; Jens Axboe;
> Borislav Petkov; Thomas Gleixner; Greg KH; linux-kernel@...r.kernel.org;
> Andy Lutomirski; Andrew Morton; Linus Torvalds
> Subject: [Linux-nvdimm] [PATCH v2 00/20] libnd: non-volatile memory device
> support
> 
> Changes since v1 [1]: Incorporates feedback received prior to April 24.

Here are some comments on the sysfs properties reported for a pmem device.
They are based on v1, but I don't think v2 changes anything.

1. This confuses lsblk (part of util-linux):
/sys/block/pmem0/device/type:4

lsblk shows:
NAME                          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
pmem0                         251:0    0     8G  0 worm
pmem1                         251:16   0     8G  0 worm
pmem2                         251:32   0     8G  0 worm
pmem3                         251:48   0     8G  0 worm
pmem4                         251:64   0     8G  0 worm
pmem5                         251:80   0     8G  0 worm
pmem6                         251:96   0     8G  0 worm
pmem7                         251:112  0     8G  0 worm

lsblk's blkdev_scsi_type_to_name() considers 4 to mean 
SCSI_TYPE_WORM (write once read many ... used for certain optical
and tape drives).

I'm not sure what nd and pmem are doing to result in that value.

2. To avoid confusing software trying to detect fast storage vs.
slow storage devices via sysfs, this value should be 0:
/sys/block/pmem0/queue/rotational:1

That can be done by adding this shortly after the blk_alloc_queue call:
                queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->pmem_queue);

3. Is there any reason to have a 512 KiB limit on the transfer
length?
/sys/block/pmem0/queue/max_hw_sectors_kb:512

That is from:
       blk_queue_max_hw_sectors(pmem->pmem_queue, 1024);

4. These are read-writeable, but IOs never reach a queue, so 
the queue size is irrelevant and merging never happens:
/sys/block/pmem0/queue/nomerges:0
/sys/block/pmem0/queue/nr_requests:128

Consider making them both read-only with: 
* nomerges set to 2 (no merging happening) 
* nr_requests as small as the block layer allows to avoid 
wasting memory.

5. No scatter-gather lists are created by the driver, so these
read-only fields are meaningless:
/sys/block/pmem0/queue/max_segments:128
/sys/block/pmem0/queue/max_segment_size:65536

Is there a better way to report them as irrelevant?

6. There is no completion processing, so the read-writeable
cpu affinity is not used:
/sys/block/pmem0/queue/rq_affinity:0

Consider making it read-only and set to 2, meaning the
completions always run on the requesting CPU.

7. With mmap() allowing less than logical block sized accesses
to the device, this could be considered misleading:
/sys/block/pmem0/queue/physical_block_size:512

Perhaps that needs to be 1 byte or a cacheline size (64 bytes
on x86) to indicate that direct partial logical block accesses
are possible.  The btt driver could report 512 as one indication
it is different.

I wouldn't be surprised if smaller values than the logical block
size confused some software, though.

---
Robert Elliott, HP Server Storage
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ