Date: Thu, 01 Jan 2009 04:54:16 -0500
From: Jeff Garzik <jeff@...zik.org>
To: Benny Halevy <bhalevy@...asas.com>
CC: James Bottomley <James.Bottomley@...senPartnership.com>,
open-osd development <osd-dev@...n-osd.org>,
Boaz Harrosh <bharrosh@...asas.com>,
linux-scsi <linux-scsi@...r.kernel.org>,
linux-kernel@...r.kernel.org, avishay@...il.com,
viro@...IV.linux.org.uk, linux-fsdevel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [osd-dev] [PATCH 7/9] exofs: mkexofs

Benny Halevy wrote:
> On Dec. 31, 2008, 17:57 +0200, James Bottomley <James.Bottomley@...senPartnership.com> wrote:
>> On Wed, 2008-12-31 at 17:19 +0200, Boaz Harrosh wrote:
>>> Andrew Morton wrote:
>>>> On Tue, 16 Dec 2008 17:33:48 +0200
>>>> Boaz Harrosh <bharrosh@...asas.com> wrote:
>>>>
>>>>> We need a mechanism to prepare the file system (mkfs).
>>>>> I chose to implement that by means of a couple of
>>>>> mount-options. Because there is no user-mode API for committing
>>>>> OSD commands. And also, all this stuff is highly internal to
>>>>> the file system itself.
>>>>>
>>>>> - Added two mount options mkfs=0/1,format=capacity_in_meg, so mkfs/format
>>>>> can be executed by kernel code just before mount. An mkexofs utility
>>>>> can now be implemented by means of a script that mounts and unmounts the
>>>>> file system with proper options.
>>>> Doing mkfs in-kernel is unusual. I don't think the above description
>>>> sufficiently helps the uninitiated understand why mkfs cannot be done
>>>> in userspace as usual. Please flesh it out a bit.
>>> There are a few main reasons.
>>> - There is no user-mode API for initiating OSD commands. Such a subsystem
>>> would be hundredfold bigger than the mkfs code submitted. I think it would be
>>> hard and stupid to maintain a complex user-mode API just for creating
>>> a couple of objects and writing a couple of on disk structures.
>> This is really a reflection of the whole problem with the OSD paradigm.
>>
>> In theory, a filesystem on OSD is a thin layer of metadata mapping
>> objects to files. Get this right and the storage will manage things,
>> like security and access and attributes (there's even a natural mapping
>> to the VFS concept of extended attributes). Plus, the storage has
>> enough information to manage persistence, backups and replication.
>>
>> The real problem is that no-one has actually managed to come up with a
>> useful VFS<->OSD mapping layer (even by extending or altering the VFS).
>> Every filesystem that currently uses OSD has a separate direct OSD
>> speaking interface (i.e. it slices out the block layer to do this and
>> talks directly to the storage).
>>
>> I suppose this could be taken to show that such a layer is impossibly
>> complex, as you assert, but its lack is reflected in strange looking
>> design decisions like in-kernel mkfs. It would also mean that there
>> would be very little layered code sharing between OSD-based filesystems.
>
> I think that we may need to gain some more experience to extract the
> commonalities of such file systems. Currently we came up with the
> lowest possible denominator: the OSD initiator library that deals
> with command formatting and execution, including attrs, sense status,
> and security.

Not putting words in James' mouth, but I definitely agree that the
in-kernel mkfs raises a red flag or two. mkfs.ext3 for block-based
filesystems has direct and intimate knowledge of ext3 filesystem
structure, and it writes that information from userland directly to the
block(s) necessary.

Similarly, mkfs for an object-based filesystem should be issuing SCSI
commands to the OSD device from userland, AFAICS.
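
As a rough sketch of what "issuing SCSI commands from userland" can look
like on Linux, the fragment below sends a command through the SCSI generic
(sg) driver's SG_IO ioctl. It uses a plain 6-byte INQUIRY as a stand-in;
real OSD commands use much longer CDBs, which are not shown here. The helper
names (build_inquiry_cdb, scsi_inquiry) and the 5-second timeout are
illustrative assumptions, not anything taken from exofs or the osd
initiator library.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

#define INQUIRY_OPCODE  0x12
#define INQUIRY_CDB_LEN 6

/* Fill a 6-byte standard INQUIRY CDB asking for alloc_len bytes of data. */
static void build_inquiry_cdb(uint8_t cdb[INQUIRY_CDB_LEN], uint8_t alloc_len)
{
    memset(cdb, 0, INQUIRY_CDB_LEN);
    cdb[0] = INQUIRY_OPCODE;
    cdb[4] = alloc_len;             /* allocation length, low byte */
}

/*
 * Hypothetical helper: send INQUIRY to the sg device node at `path`
 * and store the response in buf. Returns 0 on success, -1 on any failure.
 */
static int scsi_inquiry(const char *path, uint8_t *buf, uint8_t buf_len)
{
    uint8_t cdb[INQUIRY_CDB_LEN];
    uint8_t sense[32];
    struct sg_io_hdr hdr;
    int fd, rc;

    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    build_inquiry_cdb(cdb, buf_len);

    memset(&hdr, 0, sizeof(hdr));
    hdr.interface_id = 'S';                  /* required by the sg driver */
    hdr.cmdp = cdb;
    hdr.cmd_len = sizeof(cdb);
    hdr.dxfer_direction = SG_DXFER_FROM_DEV; /* data flows device -> host */
    hdr.dxferp = buf;
    hdr.dxfer_len = buf_len;
    hdr.sbp = sense;                         /* sense data lands here on error */
    hdr.mx_sb_len = sizeof(sense);
    hdr.timeout = 5000;                      /* milliseconds (assumed value) */

    rc = ioctl(fd, SG_IO, &hdr);
    close(fd);

    if (rc < 0 || hdr.status != 0 || hdr.host_status != 0 ||
        hdr.driver_status != 0)
        return -1;
    return 0;
}
```

A userland mkfs along these lines would open the device node, build each
command's CDB the same way, and issue it with SG_IO, checking status and
sense data after every call.
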

> To provide a higher level abstraction that would help with "administrative"
> tasks like mkfs and the like we already tossed an idea in the past -
> a file system that will represent the contents of an OSD in a namespace,
> for example: partition_id / object_id / {data, attrs / ..., ctl / ...}.
> Such a file system could provide a generic mapping which one could
> use to easily develop management applications for the OSD. That said,
> it's out of the scope of exofs which focuses mostly on the filesystem
> data and metadata paths.

That's far too complex for what is necessary. Just issue SCSI commands
from userland. We don't need an abstract interface specifically for
low-level details. The VFS is that abstract interface; anything else
should be low-level and purpose-built.

Jeff