linux-ext4 - Re: [LSF/MM TOPIC] - SMR Modifications to EXT4 (and other generic file systems)

Message-ID: <CAKdFiL6Gv8OGhoT6hhhAyjpzaWYxdFNMh3u0fnepVo3wh7kTkw@mail.gmail.com>
Date:	Sun, 15 Feb 2015 22:02:19 -0700
From:	Adrian Palmer <adrian.palmer@...gate.com>
To:	Alireza Haghdoost <haghdoost@...il.com>
Cc:	Andreas Dilger <adilger@...ger.ca>,
	ext4 development <linux-ext4@...r.kernel.org>,
	Linux Filesystem Development List 
	<linux-fsdevel@...r.kernel.org>
Subject: Re: [LSF/MM TOPIC] - SMR Modifications to EXT4 (and other generic
 file systems)

That is an issue that is on deck to explore further.  The DM needs
to manage each disk independently, but aggregate them and present them
as 1 vdev.  The trick to be figured out is how it mixes the disks
in a ZBD-aware way.  Stripes of 256MiB are easily handled, but
impractical.  Stripes of 128KiB are practical, but not easily handled.
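
The stripe-size trade-off can be illustrated with a minimal sketch. It
assumes plain RAID0-style round-robin striping with no parity and a
256MiB zone size; ZONE_SIZE, map_striped and zones_touched are
hypothetical names for illustration, not part of any DM or ext4 code.

```python
ZONE_SIZE = 256 * 1024 * 1024  # assumed zone size in bytes (256 MiB)


def map_striped(offset, stripe_size, n_disks):
    """Map a logical byte offset to (disk, zone, offset_in_zone).

    Assumes plain RAID0-style round-robin striping with no parity.
    """
    stripe_index = offset // stripe_size
    disk = stripe_index % n_disks
    # Each disk holds every n_disks-th stripe, packed back to back.
    disk_offset = (stripe_index // n_disks) * stripe_size + offset % stripe_size
    return disk, disk_offset // ZONE_SIZE, disk_offset % ZONE_SIZE


def zones_touched(length, stripe_size, n_disks):
    """Set of (disk, zone) pairs hit by a sequential write starting at 0."""
    touched = set()
    for off in range(0, length, stripe_size):
        disk, zone, _ = map_striped(off, stripe_size, n_disks)
        touched.add((disk, zone))
    return touched


if __name__ == "__main__":
    one_mib = 1024 * 1024
    # Zone-sized stripes: a 1 MiB write stays inside one physical zone.
    print(zones_touched(one_mib, ZONE_SIZE, 2))
    # 128 KiB stripes: the same write advances a zone on every disk.
    print(zones_touched(one_mib, 128 * 1024, 2))
```

Under these assumptions, zone-sized stripes map each logical zone to
exactly one physical zone, so one write pointer is enough.  With 128KiB
stripes, even a small sequential write moves the write pointers on
every member disk, so the DM would have to track several pointers per
logical zone.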

I see the changes that we're exploring/implementing as working on both
an SMR drive and a conventional drive.  ZBD does not require SMR, so
the superblock and group descriptor changes should not affect
conventional drives.  In fact, the gd will mark the bg as a
conventional zone by default, but the bg can still use the ZBD changes
(forward-write and defragmentation) without the write pointer
information.
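
A rough sketch of that idea follows: a block group that defaults to a
conventional zone type, allocates strictly forward, and only tracks a
write pointer when the zone is sequential.  The class, field and method
names are hypothetical and do not reflect the actual ext4 group
descriptor layout.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ZoneType(Enum):
    CONVENTIONAL = 0
    SEQUENTIAL = 1


@dataclass
class BlockGroup:
    """Hypothetical in-memory view of a block group, not the on-disk gd."""
    n_blocks: int
    zone_type: ZoneType = ZoneType.CONVENTIONAL  # conventional by default
    write_pointer: Optional[int] = None  # only tracked for SEQUENTIAL zones
    next_free: int = 0  # file-system-side forward-write cursor

    def __post_init__(self):
        if self.zone_type is ZoneType.SEQUENTIAL and self.write_pointer is None:
            self.write_pointer = 0

    def allocate(self, count):
        """Allocate `count` blocks strictly forward; return the first block."""
        if self.next_free + count > self.n_blocks:
            raise ValueError("block group full")
        start = self.next_free
        self.next_free += count
        if self.zone_type is ZoneType.SEQUENTIAL:
            # Mirror the cursor into the zone's write pointer.
            self.write_pointer = self.next_free
        return start


if __name__ == "__main__":
    conventional = BlockGroup(n_blocks=8)
    print(conventional.allocate(3), conventional.allocate(2))
    print(conventional.write_pointer)  # stays None on a conventional zone
```

The same forward-only allocation path serves both zone types in this
sketch; the only difference is whether a write pointer is recorded.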

EXT4 will need to be forward-write only as per SMR/ZBD.  If working
with a combination of drives with small stripe sizes, re-writes would
work on one drive (conventional) but not the other (SMR).  The bulk of
the change would need to be in the DM, and will likely not bleed over
to the FS.  The exception I can see is that the bg size may need to
increase to accommodate multiple zones on multiple SMR drives (768MiB
or 1GiB BGs for RAID5).  The DM would be responsible for aggregating
the REPORT_ZONE data before presenting it to the FS (which would
behave as normally expected).  Of note, the standard requires the zone
size to be a power of 2, so a 3-disk RAID5 may violate that in
implementation.  RAID0 has similar constraints, and RAID1 can
operate in the same paradigm with no changes to zone information.
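
The power-of-2 concern can be checked with a small sketch of how a DM
layer might combine one zone from each striped member into a single
logical zone.  It assumes 256MiB member zones and that the aggregate
write pointer is simply the sum of the member write pointers; the
function names and the dict layout are hypothetical and are not the
REPORT_ZONE wire format.

```python
MIB = 1024 * 1024


def is_power_of_two(n):
    """True if n is a positive power of two."""
    return n > 0 and n & (n - 1) == 0


def aggregate_zone(member_zones):
    """Combine one zone from each striped member into one logical zone.

    Each member zone is a dict with 'length' and 'write_pointer' in bytes
    (write_pointer is relative to the start of the zone).
    """
    length = sum(z["length"] for z in member_zones)
    return {
        "length": length,
        "write_pointer": sum(z["write_pointer"] for z in member_zones),
        "power_of_two": is_power_of_two(length),
    }


if __name__ == "__main__":
    empty_zone = {"length": 256 * MIB, "write_pointer": 0}
    for n in (1, 2, 3, 4):
        agg = aggregate_zone([dict(empty_zone) for _ in range(n)])
        print(n, agg["length"] // MIB, agg["power_of_two"])
```

With these assumptions, spanning three 256MiB zones gives a 768MiB
logical zone, which is not a power of 2, while spanning four gives
1GiB, which is.  A single zone (the RAID1 case) keeps the member zone
size unchanged.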

So, in short, the DM would have to be modified to pass the aggregated
zone information up to EXT4.  I don't see much divergence in the
proposed redesign of EXT4.

Adrian Palmer
Firmware Engineer II
R&D Firmware
Seagate, Longmont Colorado
720-684-1307

On Sun, Feb 15, 2015 at 1:27 PM, Alireza Haghdoost <haghdoost@...il.com> wrote:
>>> I think one of the important design decisions that needs to be made
>>> early on is whether it is possible to directly access some storage
>>> that can be updated with small random writes (either a separate flash
>>> LUN on the device, or a section of the disk that is formatted for 4kB
>>> sectors without SMR write requirements).
>>
>> This would be nice, but I am looking more generally at what I call
>> 'single disk' systems.  Several more complicated FSs use a separate
>> flash drive for this purpose, but ext4 expects 1 vdev, and thus only
>> one type of media (agnostic).  We have hybrid HDD that have flash on
>> them, but the lba space isn't separate, so the FS or the DM couldn't
>> very easily treat them as 2 devices.
>>
>
> Adrian,
> What if the vdev exposed to ext4 is composed of an md device
> instead of a regular block device? In other words, how do you see
> these changes in the EXT4 file system applying to a software RAID
> array of SMR drives?
>
> --Alireza