lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120308154326.GA6777@thunk.org>
Date:	Thu, 8 Mar 2012 10:43:26 -0500
From:	Ted Ts'o <tytso@....edu>
To:	Sage Weil <sage@...dream.net>
Cc:	Boaz Harrosh <bharrosh@...asas.com>,
	"Martin K. Petersen" <martin.petersen@...cle.com>,
	linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Re: [PATCH, RFC] Don't do page stablization if
 !CONFIG_BLKDEV_INTEGRITY

On Wed, Mar 07, 2012 at 10:27:43PM -0800, Sage Weil wrote:
> 
> This avoids the problem for devices that don't need stable pages, but 
> doesn't help for those that do (btrfs, raid, iscsi, dif/dix, etc.).  It 
> seems to me like a more elegant solution would be to COW the page in the 
> address_space so that you get stable writeback pages without blocking.  
> That's clearly more complex, and I'm sure there are a range of issues 
> involved in making that work, but I would hope that it would be doable 
> with generic MM infrastructure so that everyone would benefit.

Well, even doing a COW (or anything that involves messing with page
tables) is not free.  So even if we can make the cost of stable
writeback pages cheaper, if we can completely avoid the cost, this
would be good.  I'd also rather fix the performance regression sooner
rather than later, and I suspect the COW solution is not something
that could be prepared in time for the upcoming merge window.

Martin, would you be willing to try to get your patch submitted for
the upcoming merge window?  Or I'd be willing to carry your patch and
then rework Darrick's to use the exported flag, and carry it in my
tree, maybe that would be better.

> I would love to talk to some MM people at LSF about what it would take to 
> make this work...

Sure, long term, I'm much more sympathetic to iSCSI than I am about
DIF/DIX (which due to drive manufacturer's pricing strategies I don't
think it will ever become mainstream --- which was my main concern;
why should we be inflicting pretty severe performance regressions for
the common case, just to improve things for obscure high-end hardware?
It's similar to how Solaris trashed 1 and 2 processor performance,
because they were optimizing things for their high margin SunFires).
So getting something which makes page stablization not suck so much in
the long term seems like a fine goal.  But even though we've worked on
improve SMP scalability in a way that didn't sacrifice 2 and 4-way
processors, we've still supported compiling with !CONFIG_SMP....

     		       	   	      - Ted

> 
> sage
> 
> 
> 
> > 
> > When submitted I will also send a patch to set .needs_stable_pages in 
> > iscsi when needed.
> > 
> > Thanks, Martin
> > Boaz
> > 
> > > diff --git a/block/blk-settings.c b/block/blk-settings.c
> > > index 5680b91..442a0df 100644
> > > --- a/block/blk-settings.c
> > > +++ b/block/blk-settings.c
> > > @@ -125,6 +125,7 @@ void blk_set_default_limits(struct queue_limits *lim)
> > >  	lim->io_opt = 0;
> > >  	lim->misaligned = 0;
> > >  	lim->cluster = 1;
> > > +	lim->needs_stable_pages = false;
> > >  }
> > >  EXPORT_SYMBOL(blk_set_default_limits);
> > >  
> > > @@ -571,6 +572,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> > >  	t->cluster &= b->cluster;
> > >  	t->discard_zeroes_data &= b->discard_zeroes_data;
> > >  
> > > +	t->needs_stable_pages &= b->needs_stable_pages;
> > > +
> > >  	/* Physical block size a multiple of the logical block size? */
> > >  	if (t->physical_block_size & (t->logical_block_size - 1)) {
> > >  		t->physical_block_size = t->logical_block_size;
> > > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > > index 5b85d91..d464aca 100644
> > > --- a/block/blk-sysfs.c
> > > +++ b/block/blk-sysfs.c
> > > @@ -161,6 +161,11 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
> > >  	return queue_var_show(queue_discard_zeroes_data(q), page);
> > >  }
> > >  
> > > +static ssize_t queue_needs_stable_pages_show(struct request_queue *q, char *page)
> > > +{
> > > +	return queue_var_show(q->limits.needs_stable_pages, page);
> > > +}
> > > +
> > >  static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
> > >  {
> > >  	return sprintf(page, "%llu\n",
> > > @@ -364,6 +369,11 @@ static struct queue_sysfs_entry queue_discard_zeroes_data_entry = {
> > >  	.show = queue_discard_zeroes_data_show,
> > >  };
> > >  
> > > +static struct queue_sysfs_entry queue_needs_stable_pages_entry = {
> > > +	.attr = {.name = "needs_stable_pages", .mode = S_IRUGO },
> > > +	.show = queue_needs_stable_pages_show,
> > > +};
> > > +
> > >  static struct queue_sysfs_entry queue_write_same_max_entry = {
> > >  	.attr = {.name = "write_same_max_bytes", .mode = S_IRUGO },
> > >  	.show = queue_write_same_max_show,
> > > @@ -416,6 +426,7 @@ static struct attribute *default_attrs[] = {
> > >  	&queue_discard_granularity_entry.attr,
> > >  	&queue_discard_max_entry.attr,
> > >  	&queue_discard_zeroes_data_entry.attr,
> > > +	&queue_needs_stable_pages_entry.attr,
> > >  	&queue_write_same_max_entry.attr,
> > >  	&queue_nonrot_entry.attr,
> > >  	&queue_nomerges_entry.attr,
> > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> > > index 26eff46..146bed4 100644
> > > --- a/drivers/scsi/sd.c
> > > +++ b/drivers/scsi/sd.c
> > > @@ -1752,10 +1752,11 @@ static void sd_read_protection_type(struct scsi_disk *sdkp, unsigned char *buffe
> > >  		return;
> > >  	}
> > >  
> > > -	if (scsi_host_dif_capable(sdp->host, type))
> > > +	if (scsi_host_dif_capable(sdp->host, type)) {
> > >  		sd_printk(KERN_NOTICE, sdkp,
> > >  			  "Enabling DIF Type %u protection\n", type);
> > > -	else
> > > +		sdkp->disk->queue->limits.needs_stable_pages = true;
> > > +	} else
> > >  		sd_printk(KERN_NOTICE, sdkp,
> > >  			  "Disabling DIF Type %u protection\n", type);
> > >  }
> > > diff --git a/drivers/scsi/sd_dif.c b/drivers/scsi/sd_dif.c
> > > index 0cb39ff..9dc330c 100644
> > > --- a/drivers/scsi/sd_dif.c
> > > +++ b/drivers/scsi/sd_dif.c
> > > @@ -338,6 +338,8 @@ void sd_dif_config_host(struct scsi_disk *sdkp)
> > >  	sd_printk(KERN_NOTICE, sdkp,
> > >  		  "Enabling DIX %s protection\n", disk->integrity->name);
> > >  
> > > +	disk->queue->limits.needs_stable_pages = true;
> > > +
> > >  	/* Signal to block layer that we support sector tagging */
> > >  	if (dif && type && sdkp->ATO) {
> > >  		if (type == SD_DIF_TYPE3_PROTECTION)
> > > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > > index 92956b7..a5a33db 100644
> > > --- a/include/linux/blkdev.h
> > > +++ b/include/linux/blkdev.h
> > > @@ -266,6 +266,8 @@ struct queue_limits {
> > >  	unsigned char		discard_misaligned;
> > >  	unsigned char		cluster;
> > >  	unsigned char		discard_zeroes_data;
> > > +
> > > +	bool			needs_stable_pages;
> > >  };
> > >  
> > >  struct request_queue {
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@...r.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ