Message-ID: <yq15yqvw1f0.fsf@ca-mkp.ca.oracle.com>
Date: Fri, 07 Jan 2022 19:21:28 -0500
From: "Martin K. Petersen" <martin.petersen@...cle.com>
To: Eric Wheeler <bcache@...ts.ewheeler.net>
Cc: Coly Li <colyli@...e.de>, linux-block@...r.kernel.org,
Jonathan Corbet <corbet@....net>,
Kent Overstreet <kent.overstreet@...il.com>,
"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>,
"open list:BCACHE (BLOCK LAYER CACHE)" <linux-bcache@...r.kernel.org>,
"Martin K. Petersen" <martin.petersen@...cle.com>
Subject: Re: [PATCH] bcache: make stripe_size configurable and persistent
for hardware raid5/6
Eric,
> Even new RAID controllers that _do_ provide `io_opt` still do _not_
> indicate partial_stripes_expensive (which is an mdraid feature, but Martin
> please correct me if I'm wrong here).
partial_stripes_expensive is a bcache thing, I am not sure why it needs
a separate flag. It is implied, although I guess one could argue that
RAID0 is a special case since partial writes are not as painful as with
parity RAID.
The SCSI spec states that submitting an I/O that is smaller than io_min
"may incur delays in processing the command". And similarly, submitting
a command larger than io_opt "may incur delays in processing the
command".
IOW, the spec says "don't write less than an aligned multiple of the
stripe chunk size" and "don't write more than an aligned full
stripe". That leaves "aligned multiples of the stripe chunk size but
less than the full stripe width" unaccounted for. And I guess that's
what the bcache flag is trying to capture.
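To make the three cases concrete, here is a rough sketch that classifies a write against the reported limits. It assumes io_min is the stripe chunk size and io_opt is the full stripe width, both in bytes, with io_opt a multiple of io_min. The function name and the return labels are made up for illustration; nothing like this exists in bcache or the block layer under that name.

```python
def classify_write(offset, length, io_min, io_opt):
    """Classify a write against the geometry implied by io_min/io_opt.

    Assumption for this sketch: io_min is the stripe chunk size and
    io_opt is the full stripe width, all values in bytes.
    """
    if io_min <= 0 or io_opt <= 0 or io_opt % io_min != 0:
        raise ValueError("expected io_opt to be a positive multiple of io_min")
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")

    # Not an aligned multiple of the chunk size: the case the spec
    # warns about for io_min.
    if offset % io_min != 0 or length % io_min != 0:
        return "unaligned"

    # Starts on a stripe boundary and covers whole stripes only.
    if offset % io_opt == 0 and length % io_opt == 0:
        return "full-stripe"

    # Chunk-aligned but not covering whole stripes: the case the spec
    # leaves unaccounted for.
    return "partial-stripe"
```

With a 64 KiB chunk and four data disks (io_min = 65536, io_opt = 262144), a 4 KiB write is "unaligned", a 128 KiB write at offset 64 KiB is "partial-stripe", and a 256 KiB write at offset 0 is "full-stripe".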
SCSI doesn't go into details about RAID levels and other implementation
details which is why the wording is deliberately vague. But obviously
the expectation is that partial stripe writes are slower than full.
In my book any component in the stack that sees either io_min or io_opt
should try very hard to send I/Os that are aligned multiples of those
values. I am not opposed to letting users manually twiddle the
settings. But I do think that we should aim for the stack doing the
right thing when it sees io_opt reported on a device.
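As an illustration of what "doing the right thing" could look like, the sketch below splits a byte range so that the bulk of it goes out as aligned multiples of io_opt, leaving only an unaligned head and tail. It is a toy model, not how bcache or the block layer actually splits requests, and split_on_stripes is a hypothetical name.

```python
def split_on_stripes(offset, length, io_opt):
    """Split [offset, offset + length) at io_opt boundaries.

    Returns a list of (offset, length, is_full) tuples, where is_full
    marks the piece that is an aligned multiple of io_opt. Hypothetical
    helper for illustration only; all values are in bytes.
    """
    if io_opt <= 0 or offset < 0 or length <= 0:
        raise ValueError("io_opt and length must be > 0, offset >= 0")

    end = offset + length
    first = -(-offset // io_opt) * io_opt  # offset rounded up to a boundary
    last = (end // io_opt) * io_opt        # end rounded down to a boundary

    # The range does not cover even one whole aligned stripe.
    if first >= last:
        return [(offset, length, False)]

    pieces = []
    if offset < first:
        pieces.append((offset, first - offset, False))  # unaligned head
    pieces.append((first, last - first, True))          # full stripes
    if last < end:
        pieces.append((last, end - last, False))        # unaligned tail
    return pieces
```

For example, with io_opt = 262144, a 512 KiB write starting at offset 64 KiB splits into a 192 KiB head, one full 256 KiB stripe, and a 64 KiB tail.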
--
Martin K. Petersen Oracle Linux Engineering