[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070509143613.GA13375@amitarora.in.ibm.com>
Date: Wed, 9 May 2007 20:06:13 +0530
From: "Amit K. Arora" <aarora@...ux.vnet.ibm.com>
To: David Chinner <dgc@....com>
Cc: Mingming Cao <cmm@...ibm.com>, "Theodore Ts'o" <tytso@....edu>,
linux-ext4@...r.kernel.org,
Johann Lombardi <johann.lombardi@...l.net>,
linux-kernel@...r.kernel.org
Subject: Re: 2.6.21-ext4-1
On Wed, May 09, 2007 at 09:24:49AM +1000, David Chinner wrote:
> On Tue, May 08, 2007 at 03:05:56PM -0700, Mingming Cao wrote:
> > On Tue, 2007-05-08 at 12:50 +1000, David Chinner wrote:
> > > BTW, have you guys tested mmap writes into unwritten extents? ;)
> > >
> > I am not sure, Amit, have you done some mmap write test into
> > uninitialized extents?
> >
> > Sorry, I still not quite clear what's the mapped problem you are worry
> > about. Could you explain to me a bit more? thanks!
>
> XFS needs a ->page_mkwrite() callout to correctly map pages that
> have been dirtied by mmap that span unwritten extents. mmap reads
> (i.e. when the fault first occurred) treat unwritten extents like
> holes and so we need to remap them when they are dirtied to set all
> the unwritten state in the bufferheads correctly for writeback.
>
> See test 166 here:
>
> http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/xfstests/
> http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/xfstests/src/unwritten_mmap.c
Hi David,
I updated the above testcase to use fallocate() and ran it on the
ext4 (with the fallocate patches applied).
It threw following message on console :
# ./a.out 2000 /home/test/mnt/testfile
BUG: at fs/buffer.c:1640 __block_write_full_page()
08c6fcd0: [<080572a7>] dump_stack+0x1b/0x1d
08c6fce8: [<080c0d07>] __block_write_full_page+0xc4/0x24a
08c6fd10: [<080c21e0>] block_write_full_page+0xb0/0xb8
08c6fd40: [<0810c6a3>] ext4_ordered_writepage+0xcb/0x139
08c6fd80: [<08091ba8>] generic_writepages+0x178/0x2a0
08c6fdfc: [<08091cfd>] do_writepages+0x2d/0x38
08c6fe10: [<0808cea2>] __filemap_fdatawrite_range+0x62/0x6d
08c6fe88: [<0808cec5>] filemap_fdatawrite+0x18/0x1d
08c6fea8: [<080bf4bf>] do_fsync+0x26/0x67
08c6fec0: [<080bf521>] __do_fsync+0x21/0x35
08c6fed8: [<080bf542>] sys_fsync+0xd/0xf
08c6fee8: [<08058cac>] handle_syscall+0x8c/0xa4
08c6ff64: [<0806728a>] handle_trap+0xc1/0xc9
08c6ff80: [<08067683>] userspace+0x123/0x166
08c6ffd8: [<080589db>] fork_handler+0xa0/0xa2
08c6fffc: [<a55a5a5a>] 0xa55a5a5a
This is coming from:
fs/buffer.c
1628 if (block > last_block) {
1629 ..........
... .........
1639 } else if (!buffer_mapped(bh) && buffer_dirty(bh)) {
=> 1640 WARN_ON(bh->b_size != blocksize);
1641 err = get_block(inode, block, bh, 1);
1642 ..........
... .........
1649 }
Thus, I think in ext4 also we may need to have ->page_mkwrite
implemented.
I came across a patch you had submitted couple of months back which
implemented a generic block_page_mkwrite() function, to which any file
system could hook easily. Here is the link:
http://lkml.org/lkml/2007/3/18/198
Any idea when is it going to be in the mainline ? Not sure if it is
already part of some -mm kernel, but I did not find it in 2.6.21.
Or, since there was a talk of ->fault() replacing ->page_mkwrite() the
patch is not in the pipeline now ?
And, how does XFS behave now if we write to mmapped preallocated blocks,
since XFS also doesn't have ->page_mkwrite() implemented as of date ?
Thanks!
--
Regards,
Amit Arora
>
> The same behaviour is needed for delalloc extents to prevent ENOSPC
> errors on writeback - the mmap write needs to do the freespace
> accounting at the time the page is dirtied and that can only be done
> through the ->page_mkwrite callout. Otherwise ENOSPC will occur in
> the writeback path and that is a major pain....
>
> This may not be a problem for ext4, but I thought I better point
> out a couple of the more subtle problems mmap can introduce....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
>
>
> ---
> xfstests/src/Makefile | 7
> xfstests/src/falloc.c | 376 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 381 insertions(+), 2 deletions(-)
>
> Index: xfs-cmds/xfstests/src/falloc.c
> ===================================================================
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ xfs-cmds/xfstests/src/falloc.c 2007-04-30 12:41:13.862302450 +1000
> @@ -0,0 +1,376 @@
> +/*
> + * Copyright (c) 2000-2003,2007 Silicon Graphics, Inc.
> + * All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation,
> + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "global.h"
> +
> +/* should end up include somewhere */
> +#define FA_ALLOCATE 0x1
> +#define FA_DEALLOCATE 0x2
> +#define FA_PREALLOCATE 0x3
> +
> +/* ia64 */
> +#define __NR_fallocate 1305
> +/*
> + * Block I/O parameterization. A basic block (BB) is the lowest size of
> + * filesystem allocation, and must equal 512. Length units given to bio
> + * routines are in BB's.
> + */
> +
> +/* Assume that if we have BTOBB, then we have the rest */
> +#ifndef BTOBB
> +#define BBSHIFT 9
> +#define BBSIZE (1<<BBSHIFT)
> +#define BBMASK (BBSIZE-1)
> +#define BTOBB(bytes) (((__u64)(bytes) + BBSIZE - 1) >> BBSHIFT)
> +#define BTOBBT(bytes) ((__u64)(bytes) >> BBSHIFT)
> +#define BBTOB(bbs) ((bbs) << BBSHIFT)
> +#define OFFTOBBT(bytes) ((__u64)(bytes) >> BBSHIFT)
> +
> +#define SEEKLIMIT32 0x7fffffff
> +#define BBSEEKLIMIT32 BTOBBT(SEEKLIMIT32)
> +#define SEEKLIMIT 0x7fffffffffffffffLL
> +#define BBSEEKLIMIT OFFTOBBT(SEEKLIMIT)
> +#endif
> +
> +#ifndef OFFTOBB
> +#define OFFTOBB(bytes) (((__u64)(bytes) + BBSIZE - 1) >> BBSHIFT)
> +#define BBTOOFF(bbs) ((__u64)(bbs) << BBSHIFT)
> +#endif
> +
> +#define FSBTOBB(f) (OFFTOBBT(FSBTOOFF(f)))
> +#define BBTOFSB(b) (OFFTOFSB(BBTOOFF(b)))
> +#define OFFTOFSB(o) ((o) / blocksize)
> +#define FSBTOOFF(f) ((f) * blocksize)
> +
> +void usage(void)
> +{
> + printf("usage: alloc [-b blocksize] [-d dir] [-f file] [-n] [-r] [-t]\n"
> + "flags:\n"
> + " -n - non-interractive mode\n"
> + " -r - real time file\n"
> + " -t - truncate on open\n"
> + "\n"
> + "commands:\n"
> + " a [offset] [length] - alloc\n"
> + " p [offset] [length] - prealloc\n"
> + " f [offset] [length] - free\n"
> + " m [offset] [length] - print map\n"
> + " s - sync file\n"
> + " t [offset] - truncate\n"
> + " q - quit\n"
> + " h/? - this help\n");
> +
> +}
> +
> +int fd = -1;
> +int blocksize;
> +char *filename;
> +
> +/* params are in bytes */
> +void map(off64_t off, off64_t len)
> +{
> + struct getbmap bm[2];
> +
> + bzero(bm, sizeof(bm));
> +
> + bm[0].bmv_count = 2;
> + bm[0].bmv_offset = OFFTOBB(off);
> + if (len==(off64_t)-1) { /* unsigned... */
> + bm[0].bmv_length = -1;
> + printf(" MAP off=%lld, len=%lld [%lld-]\n",
> + (long long)off, (long long)len,
> + (long long)BBTOFSB(bm[0].bmv_offset));
> + } else {
> + bm[0].bmv_length = OFFTOBB(len);
> + printf(" MAP off=%lld, len=%lld [%lld,%lld]\n",
> + (long long)off, (long long)len,
> + (long long)BBTOFSB(bm[0].bmv_offset),
> + (long long)BBTOFSB(bm[0].bmv_length));
> + }
> +
> + printf(" [ofs,count]: start..end\n");
> + for (;;) {
> +#ifdef XFS_IOC_GETBMAP
> + if (xfsctl(filename, fd, XFS_IOC_GETBMAP, bm) < 0) {
> +#else
> +#ifdef F_GETBMAP
> + if (fcntl(fd, F_GETBMAP, bm) < 0) {
> +#else
> +bozo!
> +#endif
> +#endif
> + perror("getbmap");
> + break;
> + }
> +
> + if (bm[0].bmv_entries == 0)
> + break;
> +
> + printf(" [%lld,%lld]: ",
> + (long long)BBTOFSB(bm[1].bmv_offset),
> + (long long)BBTOFSB(bm[1].bmv_length));
> +
> + if (bm[1].bmv_block == -1)
> + printf("hole");
> + else
> + printf("%lld..%lld",
> + (long long)BBTOFSB(bm[1].bmv_block),
> + (long long)BBTOFSB(bm[1].bmv_block +
> + bm[1].bmv_length - 1));
> + printf("\n");
> + }
> +}
> +
> +int
> +main(int argc, char **argv)
> +{
> + int c;
> + char *dirname = NULL;
> + int done = 0;
> + int status = 0;
> + struct flock64 f;
> + off64_t len;
> + char line[1024];
> + off64_t off;
> + int oflags;
> + static char *opnames[] = { "alloc",
> + "prealloc",
> + "free" };
> + int opno;
> + static int optab[] = { FA_ALLOCATE,
> + FA_PREALLOCATE,
> + FA_DEALLOCATE };
> + int rflag = 0;
> + struct statvfs64 svfs;
> + int tflag = 0;
> + int nflag = 0;
> + int unlinkit = 0;
> + __int64_t v;
> +
> + while ((c = getopt(argc, argv, "b:d:f:rtn")) != -1) {
> + switch (c) {
> + case 'b':
> + blocksize = atoi(optarg);
> + break;
> + case 'd':
> + if (filename) {
> + printf("can't specify both -d and -f\n");
> + exit(1);
> + }
> + dirname = optarg;
> + break;
> + case 'f':
> + if (dirname) {
> + printf("can't specify both -d and -f\n");
> + exit(1);
> + }
> + filename = optarg;
> + break;
> + case 'r':
> + rflag = 1;
> + break;
> + case 't':
> + tflag = 1;
> + break;
> + case 'n':
> + nflag++;
> + break;
> + default:
> + printf("unknown option\n");
> + usage();
> + exit(1);
> + }
> + }
> + if (!dirname && !filename)
> + dirname = ".";
> + if (!filename) {
> + static char tmpfile[] = "allocXXXXXX";
> +
> + mkstemp(tmpfile);
> + filename = malloc(strlen(tmpfile) + strlen(dirname) + 2);
> + sprintf(filename, "%s/%s", dirname, tmpfile);
> + unlinkit = 1;
> + }
> + oflags = O_RDWR | O_CREAT | (tflag ? O_TRUNC : 0);
> + fd = open(filename, oflags, 0666);
> + if (!nflag) {
> + printf("alloc:\n");
> + printf(" filename %s\n", filename);
> + }
> + if (fd < 0) {
> + perror(filename);
> + exit(1);
> + }
> + if (!blocksize) {
> + if (fstatvfs64(fd, &svfs) < 0) {
> + perror(filename);
> + status = 1;
> + goto done;
> + }
> + blocksize = (int)svfs.f_bsize;
> + }
> + if (blocksize<0) {
> + fprintf(stderr,"illegal blocksize %d\n", blocksize);
> + status = 1;
> + goto done;
> + }
> + printf(" blocksize %d\n", blocksize);
> + if (rflag) {
> + struct fsxattr a;
> +
> +#ifdef XFS_IOC_FSGETXATTR
> + if (xfsctl(filename, fd, XFS_IOC_FSGETXATTR, &a) < 0) {
> + perror("XFS_IOC_FSGETXATTR");
> + status = 1;
> + goto done;
> + }
> +#else
> +#ifdef F_FSGETXATTR
> + if (fcntl(fd, F_FSGETXATTR, &a) < 0) {
> + perror("F_FSGETXATTR");
> + status = 1;
> + goto done;
> + }
> +#else
> +bozo!
> +#endif
> +#endif
> +
> + a.fsx_xflags |= XFS_XFLAG_REALTIME;
> +
> +#ifdef XFS_IOC_FSSETXATTR
> + if (xfsctl(filename, fd, XFS_IOC_FSSETXATTR, &a) < 0) {
> + perror("XFS_IOC_FSSETXATTR");
> + status = 1;
> + goto done;
> + }
> +#else
> +#ifdef F_FSSETXATTR
> + if (fcntl(fd, F_FSSETXATTR, &a) < 0) {
> + perror("F_FSSETXATTR");
> + status = 1;
> + goto done;
> + }
> +#else
> +bozo!
> +#endif
> +#endif
> + }
> + while (!done) {
> + char *p;
> +
> + if (!nflag) printf("alloc> ");
> + fflush(stdout);
> + if (!fgets(line, 1024, stdin)) break;
> +
> + p=line+strlen(line);
> + if (p!=line&&p[-1]=='\n') p[-1]=0;
> +
> + opno = 0;
> + switch (line[0]) {
> + case 'f':
> + opno++;
> + case 'p':
> + opno++;
> + case 'a':
> + v = strtoll(&line[2], &p, 0);
> + if (*p == 'b') {
> + off = FSBTOOFF(v);
> + p++;
> + } else
> + off = v;
> + if (*p == '\0')
> + v = -1;
> + else
> + v = strtoll(p, &p, 0);
> + if (*p == 'b') {
> + len = FSBTOOFF(v);
> + p++;
> + } else
> + len = v;
> +
> + printf(" CMD %s, off=%lld, len=%lld\n",
> + opnames[opno], (long long)off, (long long)len);
> +
> + c = syscall(__NR_fallocate, fd, optab[opno], off, len);
> +
> + if (c < 0) {
> + perror(opnames[opno]);
> + break;
> + }
> +
> + map(off,len);
> + break;
> + case 'm':
> + p = &line[1];
> + v = strtoll(p, &p, 0);
> + if (*p == 'b') {
> + off = FSBTOOFF(v);
> + p++;
> + } else
> + off = v;
> + if (*p == '\0')
> + len = -1;
> + else {
> + v = strtoll(p, &p, 0);
> + if (*p == 'b')
> + len = FSBTOOFF(v);
> + else
> + len = v;
> + }
> + map(off,len);
> + break;
> + case 't':
> + p = &line[1];
> + v = strtoll(p, &p, 0);
> + if (*p == 'b')
> + off = FSBTOOFF(v);
> + else
> + off = v;
> + printf(" TRUNCATE off=%lld\n", (long long)off);
> + if (ftruncate64(fd, off) < 0) {
> + perror("ftruncate");
> + break;
> + }
> + break;
> + case 's':
> + printf(" SYNC\n");
> + fsync(fd);
> + break;
> + case 'q':
> + printf(" QUIT\n");
> + done = 1;
> + break;
> + case '?':
> + case 'h':
> + usage();
> + break;
> + default:
> + printf("unknown command '%s'\n", line);
> + break;
> + }
> + }
> + if (!nflag) printf("\n");
> +done:
> + if (fd != -1)
> + close(fd);
> + if (unlinkit)
> + unlink(filename);
> + exit(status);
> + /* NOTREACHED */
> +}
> Index: xfs-cmds/xfstests/src/Makefile
> ===================================================================
> --- xfs-cmds.orig/xfstests/src/Makefile 2007-04-23 16:22:06.000000000 +1000
> +++ xfs-cmds/xfstests/src/Makefile 2007-04-30 12:18:42.126750949 +1000
> @@ -14,7 +14,7 @@ TARGETS = dirstress fill fill2 getpagesi
>
> LINUX_TARGETS = loggen xfsctl bstat t_mtab getdevicesize \
> preallo_rw_pattern_reader preallo_rw_pattern_writer ftrunc trunc \
> - fs_perms testx looptest locktest unwritten_mmap
> + fs_perms testx looptest locktest unwritten_mmap falloc
>
> IRIX_TARGETS = open_unlink
>
> @@ -98,7 +98,10 @@ trunc: trunc.o
>
> fs_perms: fs_perms.o
> $(LINKTEST)
> -
> +
> +falloc: falloc.o
> + $(LINKTEST)
> +
> testx: testx.o
> $(LINKTEST)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists