Message-ID: <CAExFE6nqRk6zv-2NX40eCFe3CPhQKuPF86e_PuCBAgNwJBkpSg@mail.gmail.com>
Date: Tue, 10 Apr 2018 15:26:11 +0530
From: Sayan Ghosh <sgdgp.2014@...il.com>
To: Andreas Dilger <adilger@...ger.ca>
Cc: "Theodore Y. Ts'o" <tytso@....edu>,
Ext4 Developers List <linux-ext4@...r.kernel.org>,
Linux FS Devel <linux-fsdevel@...r.kernel.org>,
"Bhattacharya, Suparna" <suparna.bhattacharya@....com>,
niloy ganguly <ganguly.niloy@...il.com>,
Madhumita Mallick <madhu.cse.ju@...il.com>,
"Bharde, Madhumita" <madhumita.bharde@....com>
Subject: Re: [Patch 0/4] RFC : Support for data gradation of a single file.

Hi Andreas,

> In the absence of other information, the Stream ID would just mean "group
> allocations with the same ID together". After some discussion, it looks
> like the latest patch has generic "lifetime" hints rather than "stream IDs",
> but the end result is largely the same.

I looked at the links you provided for StreamID, which attaches
lifetime hints to a file as a whole. In our case the importance/grade
levels pertain to different blocks of a single file, so I am not sure
whether an *allocation type hint*, analogous to the lifetime hints,
can be expressed through a StreamID. I still have to read the StreamID
material in more detail to see whether it can help us place different
blocks of one file on separate tiers, as well as provide the reduced
view. Any insight on this would be really helpful.
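
For comparison, my current understanding is that the existing lifetime
hints are attached per open file through fcntl(). A minimal sketch of
that interface, based on include/uapi/linux/fcntl.h (Linux 4.13+) and
not on our patches, would be:

/*
 * Sketch only: set a write-lifetime hint for a whole file.
 * F_SET_RW_HINT and the RWH_* values come from
 * include/uapi/linux/fcntl.h; the fallback defines are just so this
 * compiles with older libc headers.
 */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>

#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT		1036	/* F_LINUX_SPECIFIC_BASE + 12 */
#endif
#ifndef RWH_WRITE_LIFE_EXTREME
#define RWH_WRITE_LIFE_SHORT	2
#define RWH_WRITE_LIFE_EXTREME	5
#endif

int set_whole_file_hint(int fd, uint64_t hint)
{
	/* e.g. hint = RWH_WRITE_LIFE_EXTREME for long-lived data */
	if (fcntl(fd, F_SET_RW_HINT, &hint) < 0) {
		perror("F_SET_RW_HINT");
		return -1;
	}
	return 0;
}

Since the hint covers the whole inode, it is not obvious to me how the
same mechanism would express different grades for different block
ranges of the same file.
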
> series instead of trying to introduce a new interface. IMHO, there is
> no need to make these hints persistent on disk, since their state could
> be inferred by the allocation placement directly

The problem with not making the hints persistent is twofold:
1) A higher-graded block may end up on the HDD, e.g. because the
higher tier overflowed, while still being critical from the
application's point of view (in our code it can still be accessed from
the HDD); inferring its grade from the placement alone would then
mis-classify it.
2) The grade information should survive a copy of the file. Suppose
the higher tier is full, so we store the high-graded blocks of a file
in the lower tier and, once they are stored, discard the grade
metadata. If we then copy this file to another mixed block device that
does have space in its higher tier, we would still be unable to place
those high-graded blocks in the higher tier there, because the state
inferred from the allocation placement says they belong to the lower
tier.
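
As an aside, one direction we have been considering (purely a sketch,
not what the current patches implement, and the attribute naming below
is made up for illustration) is to persist the per-range grade as a
user extended attribute, so that an xattr-aware copy carries the grade
along with the data:

/*
 * Sketch: record the grade of a logical block range of a file as a
 * user xattr, e.g. user.grade.<first_block>-<last_block> = <grade>.
 * Hypothetical naming scheme, for illustration only.
 */
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>

static int save_range_grade(const char *path, unsigned long first_blk,
			    unsigned long last_blk, int grade)
{
	char name[64], value[16];

	snprintf(name, sizeof(name), "user.grade.%lu-%lu",
		 first_blk, last_blk);
	snprintf(value, sizeof(value), "%d", grade);

	/* Survives "cp --preserve=xattr" or "rsync -X", unlike state
	 * inferred purely from where the blocks happened to land. */
	return setxattr(path, name, value, strlen(value), 0);
}

With something like this, the destination filesystem could place the
copied high-grade blocks in its higher tier even if the source had to
spill them to the lower tier.
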
> That said, having a hard-coded separation of flash vs. disks does not
> make sense, even from an intermediate development point of view. There
> definitely should be a block-device interface for querying what the
> actual layout is, perhaps something like the SMR zones?

Yes, I agree that the ideal situation would be a mechanism to identify
the segment boundaries inside the LVM automatically. However, we could
not find a system call or block-device interface that exposes those
boundaries, or the location of a free block within each segment.
So, just to test out the system, we proceeded by hardcoding the
boundaries to match our simulated LVM. Since this is not practical, we
marked those areas with TODO/FIX IT comments. We are still looking for
a good mechanism and would welcome any advice/suggestions.
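
For reference, zoned devices already expose their layout through the
BLKREPORTZONE ioctl in <linux/blkzoned.h> (Linux 4.10+); an analogous
interface describing tier/segment boundaries is what we are missing.
A rough userspace sketch of the existing zone query, only to
illustrate the kind of interface (our patches do not use it):

/* Sketch: report the first few zones of a zoned block device. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

int main(int argc, char **argv)
{
	unsigned int i, nr = 16;
	struct blk_zone_report *rep;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <zoned-blockdev>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	rep = calloc(1, sizeof(*rep) + nr * sizeof(struct blk_zone));
	if (!rep)
		return 1;
	rep->sector = 0;
	rep->nr_zones = nr;
	/* Fails with ENOTTY on a non-zoned device. */
	if (ioctl(fd, BLKREPORTZONE, rep) < 0) {
		perror("BLKREPORTZONE");
		return 1;
	}
	for (i = 0; i < rep->nr_zones; i++)
		printf("zone %u: start %llu len %llu type %u\n", i,
		       (unsigned long long)rep->zones[i].start,
		       (unsigned long long)rep->zones[i].len,
		       (unsigned int)rep->zones[i].type);
	free(rep);
	close(fd);
	return 0;
}
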
Also, we chose ext4 since it is generally the most commonly used file
system on Linux systems. However, I am not aware whether the problem
of discovering the boundaries can be solved more simply with XFS.

Regards,
Sayan Ghosh
On Mon, Apr 9, 2018 at 9:33 AM, Andreas Dilger <adilger@...ger.ca> wrote:
> On Apr 6, 2018, at 4:27 PM, Theodore Y. Ts'o <tytso@....edu> wrote:
>> The other thing to consider is whether it makes any sense at all to
>> solve this problem by having a single file system where part of the
>> storage is DAX, and part is not. Why not just have two file systems,
>> one which is 100% DAX, and another which is 100% HDD/SSD, and store
>> the data in two files in two different file systems?
>
> I think there definitely *are* benefits to having both flash and HDDs
> (and/or other different storage classes such as RAID-10 and RAID-6) in
> the same filesystem namespace. This is the premise behind bcache,
> XFS realtime volumes, Btrfs, etc.
>
> That said, having a hard-coded separation of flash vs. disks does not
> make sense, even from an intermediate development point of view. There
> definitely should be a block-device interface for querying what the
> actual layout is, perhaps something like the SMR zones?
>
> Alternately, ext4 could add something akin to the realtime volume in
> XFS, where it can directly address multiple storage devices to handle
> different storage classes, but that would need at least some amount of
> development. It was actually one of the options on the table for the
> early ext2resize development, to split the ext4 block groups across
> devices and then concatenate them logically at runtime. That would
> allow e.g. some number of DAX block groups, NVMe block groups, and HDD
> RAID-6 block groups all in the same filesystem. Even then, there would
> need to be some way for ext4 to query the storage type of the underlying
> devices, so that these could be mapped to the lifetime hints.
>
> Cheers, Andreas
>
>
>
>
>