[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <000901c73439$297f3050$6721100a@nuitysystems.com>
Date: Tue, 9 Jan 2007 13:57:29 -0800
From: "Hua Zhong" <hzhong@...il.com>
To: "'Hugh Dickins'" <hugh@...itas.com>
Cc: <linux-kernel@...r.kernel.org>, <hch@...radead.org>,
<kenneth.w.chen@...el.com>, <akpm@...l.org>, <torvalds@...l.org>,
<mjt@....msk.ru>
Subject: RE: [PATCH] support O_DIRECT in tmpfs/ramfs
> > Here is a simple patch that does it.
>
> Looks more likely to work than Ken's - which I didn't try,
> but I couldn't see what magic prevented it from just going BUG.
>
> But I have to say, having seen the ensuing requests for this
> to impose the same constraints as other implementations of
> O_DIRECT (though NFS does not), I've veered right back to my
> original position: this all just seems silly to me. O_DIRECT
> is and always has been rather an awkward hack (Linus
> described it in stronger terms!), supported by many but not
> all filesystems: shall we just leave it at that?
So I take your word that NFS does not impose this restraint,
which means filesystems could choose their own alignment
requirement that makes sense. So it would not be too horrible
if tmpfs chooses to be liberal.
In fact, in the O_DIRECT man page it says:
O_DIRECT
[....] Under Linux 2.4 transfer sizes, and the alignment of
user buffer and file offset must all be multiples of the logi-
cal block size of the file system. Under Linux 2.6 alignment to
512-byte boundaries suffices.
So even Linux 2.4 and 2.6 are different - 2.6 is less restrictive.
My point is that as long as we don't put more restrictions, it should
not cause real problems.
And about Linus..let's put his comment aside because O_DIRECT
is there to stay. :-) In fact, since O_DIRECT is not the most
beaufitul piece of code in the kernel, what I try to do is just to
make software developer's life easier by making filesystem
behavior as close to each other as possible with the minimal
effort.
> In particular, having now looked into the code, I'm amused to
> be reminded that one of its particular effects is to
> invalidate the pagecache for the area directly written.
> Well, it's hardly going to be worth replicating that
> behaviour with tmpfs or ramfs; yet if we don't, then we stand
> accused of it behaving misleadingly differently on them.
>
> I think Michael, who started off this discussion, did just the right
> thing: used a direct_IO filesystem on a loop device on a tmpfs file.
That's a rather heavy-weight workaround don't you think?
> > 1. A new fs flag FS_RAM_BASED is added and the O_DIRECT
> flag is ignored
> > if this flag is set (suggestions on a better name?)
> >
> > 2. Specify FS_RAM_BASED for tmpfs and ramfs.
>
> If this is pursued (not my preference, but let me stand aside
> now), you'd want to add in at least hugetlbfs and
> tiny-shmem.c. And set your (renamed) FS_RAM_BASED flag in
> ext2_aops_xip: that seems to be what they're wanting, then
> you can remove that strange test for
> f->f_mapping->a_ops->get_xip_page from __dentry_open.
>
> >
> > 3. When EINVAL is returned only a fput is done. I changed it to go
> > through cleanup_all. But there is still a cleanup problem:
>
> Is that change correct? Are you saying that the existing
> code leaks some structures? If so, please do send a patch to
> fix just that as soon as you can. But are you sure?
Having looked at the code more closely, the change is probably
not correct. fput(f) apparently does everything cleanup_all does,
and more, despite it's a single call. I guess those names are
a bit confusing at first glance. :-)
> > If a new file is created and then EINVAL is returned due to
> > O_DIRECT, the file is still left on the disk. I am not exactly
> > sure how to fix it other than adding another fs flag so we
> > could check O_DIRECT support at a much earlier stage.
> > Comments on how to fix it?
>
> None from me, sorry. It's untidy, but not a new issue you
> have to fix.
Well, looks like people are not in consensus to add the tmpfs
direct-io support, but since I've looked at the code, it would be
nice to fix this bug though.
The get_xip_page thing you mentioned makes it a bit more
complicated since XIP support is a mount option, not a
register_filesystem time option. If we ought to add a flag somewhere,
where is the right place? vfsmount?
I can cook up a patch for this bug if people think it's worth fixing.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists