lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.1006291811300.27462@hs20-bc2-1.build.redhat.com>
Date:	Tue, 29 Jun 2010 18:28:55 -0400 (EDT)
From:	Mikulas Patocka <mpatocka@...hat.com>
To:	device-mapper development <dm-devel@...hat.com>
cc:	Mike Snitzer <snitzer@...hat.com>, axboe@...nel.dk,
	linux-scsi@...r.kernel.org, martin.petersen@...cle.com,
	linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
	Christoph Hellwig <hch@....de>
Subject: Re: [dm-devel] [PATCH 1/2] block: fix leaks associated with discard
 request payload



On Sun, 27 Jun 2010, James Bottomley wrote:

> linux-scsi cc added, since it's a SCSI patch.
> 
> On Sat, 2010-06-26 at 15:56 -0400, Mike Snitzer wrote:
> > Fix leaks introduced via "block: don't allocate a payload for discard
> > request" commit a1d949f5f44.
> > 
> > sd_done() is not called for REQ_TYPE_BLOCK_PC commands so cleanup
> > discard request's payload directly in scsi_finish_command().
> > 
> > Also cleanup page allocated for discard payload in
> > scsi_setup_discard_cmnd's scsi_setup_blk_pc_cmnd error path.
> > 
> > Signed-off-by: Mike Snitzer <snitzer@...hat.com>
> > ---
> >  block/blk-core.c       |   23 +++++++++++++++++++++++
> >  drivers/scsi/scsi.c    |    8 ++++++++
> >  drivers/scsi/sd.c      |   18 ++++++++----------
> >  include/linux/blkdev.h |    1 +
> >  4 files changed, 40 insertions(+), 10 deletions(-)
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 98b4cee..07925aa 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -1167,6 +1167,29 @@ void blk_add_request_payload(struct request *rq, struct page *page,
> >  }
> >  EXPORT_SYMBOL_GPL(blk_add_request_payload);
> >  
> > +/**
> > + * blk_clear_request_payload - clear a request's payload
> > + * @rq: request to update
> > + *
> > + * The driver needs to take care of freeing the payload itself.
> > + */
> > +void blk_clear_request_payload(struct request *rq)
> > +{
> > +	struct bio *bio = rq->bio;
> > +
> > +	rq->__data_len = rq->resid_len = 0;
> > +	rq->nr_phys_segments = 0;
> > +	rq->buffer = NULL;
> > +
> > +	bio->bi_size = 0;
> > +	bio->bi_vcnt = 0;
> > +	bio->bi_phys_segments = 0;
> > +
> > +	bio->bi_io_vec->bv_page = NULL;
> > +	bio->bi_io_vec->bv_len = 0;
> > +}
> > +EXPORT_SYMBOL_GPL(blk_clear_request_payload);
> > +
> >  void init_request_from_bio(struct request *req, struct bio *bio)
> >  {
> >  	req->cpu = bio->bi_comp_cpu;
> > diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> > index ad0ed21..69c7ea4 100644
> > --- a/drivers/scsi/scsi.c
> > +++ b/drivers/scsi/scsi.c
> > @@ -851,6 +851,14 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
> >  		 */
> >  		if (good_bytes == old_good_bytes)
> >  			good_bytes -= scsi_get_resid(cmd);
> > +	} else if (cmd->request->cmd_flags & REQ_DISCARD) {
> > +		/*
> > +		 * If this is a discard request that originated from the kernel
> > +		 * we need to free our payload here.  Note that we need to check
> > +		 * the request flag as the normal payload rules apply for
> > +		 * pass-through UNMAP / WRITE SAME requests.
> > +		 */
> > +		__free_page(bio_page(cmd->request->bio));
> 
> This is another layering violation:  the page is allocated in the Upper
> layer and freed in the mid-layer.
> 
> I really hate these growing contortions for discard.  They're a clear
> signal that we haven't implemented it right.
> 
> So let's first work out how it should be done.  I really like Tomo's
> idea of doing discard through the normal REQ_TYPE_FS route, which means
> we can control the setup in prep and the tear down in done, all confined
> to the ULD.
> 
> Where I think I'm at is partially what Christoph says:  The command
> transformation belongs in the ULD so that's where the allocation and
> deallocation should be, and partly what Tomo says in that we should
> eliminate the special case paths.
> 
> The payload vs actual request size should be a red herring if we've got
> everything correct:  only the ULD cares about the request parameters.
> Once we've got everything set up, the mid layer and LLD should only care
> about the parameters in the command, so we can confine the size changing
> part to the ULD doing the discard.
> 
> Could someone take a stab at this and see if it works without layering
> violations or any other problematic signals?
> 
> Thanks,
> 
> James

Well, I think that you overestimate the importance of scsi code too much.

There is a layering violation in the code. So what --- you either fix the 
layering violation or let it be there and grind your teeth on it. But in 
either case, that layering violation won't affect anyone except scsi 
developers.

On the other hand, if you say "because we want to avoid layering violation 
in SCSI, every issuer of discard request must supply an empty page", you 
create havoc all over the Linux codebase. md, dm, drbd, xvd, virtio --- 
whatever you think of, will be allocating a dummy page when constructing 
a discard request.

If the layering violation spans only scsi code, it can be eventually 
fixed, but this, much worse "layering violation" that will be spanning all 
block device midlayers, won't ever be fixed.

Imagine for example --- a discard request arrivers at a dm-snapshot 
device. The driver splits it into chunks, remaps each chunk to the 
physical chunk, submits the requests, the elevator merges adjacent 
requests and submits fewer bigger requests to the device. Now, if you had 
to allocate a zeroed page each time you are splitting the request, that 
would exhaust memory and burn cpu needlessly. You delete a 100MB file? --- 
fine, allocate a 100MB of zeroed pages.

So I say --- let there be a layering violation in the scsi code, but don't 
put this problem with a page allocation to all the other bio midlayer 
developers.

Mikulas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ