[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <49FDE3BB.505@garzik.org>
Date: Sun, 03 May 2009 14:34:35 -0400
From: Jeff Garzik <jeff@...zik.org>
To: Boaz Harrosh <bharrosh@...asas.com>
CC: Matthew Wilcox <matthew@....cx>, Hugh Dickins <hugh@...itas.com>,
Matthew Wilcox <willy@...ux.intel.com>,
linux-ide@...r.kernel.org, linux-kernel@...r.kernel.org,
Jeff Garzik <jgarzik@...hat.com>, linux-scsi@...r.kernel.org,
Jens Axboe <jens.axboe@...cle.com>,
Bartlomiej Zolnierkiewicz <bzolnier@...il.com>,
Mark Lord <lkml@....ca>
Subject: Re: New TRIM/UNMAP tree published (2009-05-02)
Boaz Harrosh wrote:
> On 05/03/2009 06:42 PM, Matthew Wilcox wrote:
>> On Sun, May 03, 2009 at 06:02:51PM +0300, Boaz Harrosh wrote:
>>> I agree with Hugh. The allocation is done at, too-low in the food chain.
>>> (And that free of buffer at upper layer allocated by lower layer).
>>>
>>> I think you need to separate the: "does lld need buffer, what size"
>>> from the "here is buffer prepare", so upper layer that can sleep does
>>> sleep.
>> So you want two function pointers in the request queue relating to discard?
>>
>
> OK I don't know what I want, I guess. ;-)
>
> I'm not a block-device export but from the small osdblk device I maintain
> it looks like osdblk_prepare_flush which is set into:
> blk_queue_ordered(q, QUEUE_ORDERED_DRAIN_FLUSH, osdblk_prepare_flush);
>
> does some internal structure setup, but the actual flush command is only executed
> later in the global osdblk_rq_fn which is set into:
> blk_init_queue(osdblk_rq_fn, &osdev->lock);
>
> But I'm not even sure that prepare_flush is called in a better context then
> queue_fn, and what does it means to let block devices take care of another
> new command type at queue_fn.
>
> I guess it comes back to Jeff Garzik's comment about not having a central
> place to ask the request what we need to do.
>
> But I do hate that allocation is done by driver and free by mid-layer,
> so yes two vectors, request_queue is allocated once per device it's not
> that bad. And later when Jeff's comment is addressed it can be removed.
May I presume you are referring to the following osdblk.c comment?
/* deduce our operation (read, write, flush) */
/* I wish the block layer simplified
* cmd_type/cmd_flags/cmd[]
* into a clearly defined set of RPC commands:
* read, write, flush, scsi command, power mgmt req,
* driver-specific, etc.
*/
Yes, the task of figuring out -what to do- in the queue's request
function is quite complex, and discard makes it even more so.
The API makes life difficult -- you have to pass temporary info to
yourself in ->prepare_flush_fn() and ->prepare_discard_fn(), and the
overall sum is a bewildering collection of opcodes, flags, and internal
driver notes to itself.
Add to this yet another prep function, ->prep_rq_fn()
It definitely sucks, especially with regards to failed atomic
allocations... but I think fixing this quite a big more than Matthew
probably willing to tackle ;-)
My ideal block layer interface would be a lot more opcode-based, e.g.
(1) create REQ_TYPE_DISCARD
(2) determine at init if queue (a) supports explicit DISCARD and/or (b)
supports DISCARD flag passed with READ or WRITE
(3) when creating a discard request, use block helpers w/ queue-specific
knowledge to create either
(a) one request, REQ_TYPE_FS, with discard flag or
(b) two requests, REQ_TYPE_FS followed by REQ_TYPE_DISCARD
(4) blkdev_issue_discard() would function like an empty barrier, and
unconditionally create REQ_TYPE_DISCARD.
This type of setup would require NO prepare_discard command, as all
knowledge would be passed directly to ->prep_rq_fn() and ->request_fn()
And to tangent a bit... I feel barriers should be handled in exactly
the same way. Create REQ_TYPE_FLUSH, which would be issued for above
examples #2a and #4, if the queue is setup that way.
All this MINIMIZES the amount of information a driver must "pass to
itself", by utilizing existing ->prep_fn_rq() and ->request_fn() pathways.
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists