Message-ID: <yq15yqvw1f0.fsf@ca-mkp.ca.oracle.com>
Date:   Fri, 07 Jan 2022 19:21:28 -0500
From:   "Martin K. Petersen" <martin.petersen@...cle.com>
To:     Eric Wheeler <bcache@...ts.ewheeler.net>
Cc:     Coly Li <colyli@...e.de>, linux-block@...r.kernel.org,
        Jonathan Corbet <corbet@....net>,
        Kent Overstreet <kent.overstreet@...il.com>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        "open list:BCACHE (BLOCK LAYER CACHE)" <linux-bcache@...r.kernel.org>,
        "Martin K. Petersen" <martin.petersen@...cle.com>
Subject: Re: [PATCH] bcache: make stripe_size configurable and persistent
 for hardware raid5/6


Eric,

> Even new RAID controllers that _do_ provide `io_opt` still do _not_
> indicate partial_stripes_expensive (which is an mdraid feature, but Martin,
> please correct me if I'm wrong here).

partial_stripes_expensive is a bcache thing; I am not sure why it needs
a separate flag. It is implied, although I guess one could argue that
RAID0 is a special case since partial writes are not as painful there
as with parity RAID.

The SCSI spec states that submitting an I/O that is smaller than io_min
"may incur delays in processing the command". And similarly, submitting
a command larger than io_opt "may incur delays in processing the
command".
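To make the spec wording concrete, here is a minimal sketch of that check. It assumes io_min and io_opt are byte values like those Linux exposes in /sys/block/<dev>/queue/minimum_io_size and optimal_io_size, and treats 0 as "not reported". The function name is hypothetical; this is not how the block layer or bcache actually implement anything.

```python
def may_incur_delay(length: int, io_min: int, io_opt: int) -> bool:
    """Hypothetical helper: True if a transfer of `length` bytes falls
    outside what the device advertises as efficient, per the loose SCSI
    wording (smaller than io_min, or larger than io_opt).

    A hint value of 0 is assumed to mean "not reported" and is ignored.
    """
    if io_min and length < io_min:
        return True   # below the preferred granularity
    if io_opt and length > io_opt:
        return True   # above the optimal transfer length
    return False
```

With a 64 KiB chunk (io_min = 65536) and four data disks (io_opt = 262144), a 4 KiB write and a 512 KiB write would both be flagged, while a 128 KiB write would not. Note that the 128 KiB case is exactly the partial-stripe write the spec says nothing about.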

IOW, the spec says "don't write less than an aligned multiple of the
stripe chunk size" and "don't write more than an aligned full
stripe". That leaves "aligned multiples of the stripe chunk size but
less than the full stripe width" unaccounted for. And I guess that's
what the bcache flag is trying to capture.
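The three cases above can be sketched as a classification of a single write. This is an illustrative sketch only, assuming the chunk size plays the role of io_min, the full stripe width plays the role of io_opt, both are positive, the stripe is a multiple of the chunk, and the length is nonzero. The names StripeFit and classify_write are hypothetical.

```python
from enum import Enum


class StripeFit(Enum):
    """Hypothetical classification of how a write lines up with the stripe."""
    UNALIGNED = 1       # offset or length not a multiple of the chunk (io_min)
    PARTIAL_STRIPE = 2  # chunk-aligned, but some stripe is only partly written
    FULL_STRIPE = 3     # covers only whole, aligned stripes (io_opt multiples)


def classify_write(offset: int, length: int, chunk: int, stripe: int) -> StripeFit:
    """Classify a write of `length` bytes starting at byte `offset`."""
    if offset % chunk or length % chunk:
        return StripeFit.UNALIGNED
    if offset % stripe or length % stripe:
        # Aligned multiples of the chunk size that do not cover whole
        # stripes: the case the spec leaves unaccounted for.
        return StripeFit.PARTIAL_STRIPE
    return StripeFit.FULL_STRIPE
```

Only the middle case is something io_min and io_opt alone do not explicitly warn about, which is why a separate flag can look like it adds information even though, for parity RAID, the slowdown is implied.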

SCSI doesn't go into details about RAID levels and other implementation
details, which is why the wording is deliberately vague. But obviously
the expectation is that partial stripe writes are slower than full
stripe writes.

In my book any component in the stack that sees either io_min or io_opt
should try very hard to send I/Os that are aligned multiples of those
values. I am not opposed to letting users manually twiddle the
settings. But I do think that we should aim for the stack doing the
right thing when it sees io_opt reported on a device.
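One way a component in the stack could act on io_opt is to cut a large write at stripe boundaries, so that everything except a possible head and tail is a whole, aligned stripe. The sketch below assumes a byte-valued, nonzero io_opt; the function name is hypothetical and it is not taken from any existing kernel code.

```python
def split_on_stripe_boundaries(offset: int, length: int,
                               stripe: int) -> list[tuple[int, int]]:
    """Hypothetical helper: split [offset, offset + length) into
    (offset, length) pieces that never cross a multiple of `stripe`
    (io_opt). Interior pieces are whole, aligned stripes; only the
    first and last piece can be partial."""
    pieces = []
    end = offset + length
    pos = offset
    while pos < end:
        # Next stripe boundary strictly after pos.
        next_boundary = (pos // stripe + 1) * stripe
        piece_end = min(next_boundary, end)
        pieces.append((pos, piece_end - pos))
        pos = piece_end
    return pieces
```

For a 640 KiB write at offset 192 KiB with a 256 KiB stripe, this yields a 64 KiB head, two full 256 KiB stripes, and a 64 KiB tail. Keeping each piece no larger than io_opt follows the "don't write more than an aligned full stripe" reading of the spec; a caller that buffers could additionally try to fill out the head and tail to full stripes.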

-- 
Martin K. Petersen	Oracle Linux Engineering
