[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200802170306.m1H36xeT031583@agora.fsl.cs.sunysb.edu>
Date: Sat, 16 Feb 2008 22:06:59 -0500
From: Erez Zadok <ezk@...sunysb.edu>
To: Hugh Dickins <hugh@...itas.com>
Cc: Erez Zadok <ezk@...sunysb.edu>, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: unionfs_copy_attr_times oopses
In message <Pine.LNX.4.64.0802011905230.32485@...nde.site>, Hugh Dickins writes:
> Hi Erez,
>
> Aside from the occasional "unionfs: new lower inode mtime" messages
> on directories (which I've got into the habit of ignoring now), the
> only problem I'm still suffering with unionfs over tmpfs (not tested
> any other fs's below it recently) is oops in unionfs_copy_attr_times.
>
> I believe I'm working with your latest: 2.6.24-rc8-mm1 plus the four
> patches you posted to lkml on 26 Jan. But this problem has been around
> for a while before that: I'd been hoping to debug it myself, but taken
> too long to make too little progress, so now handing over to you.
>
> The oops occurs while doing repeated "make -j20" kernel builds in a
> unionfs mount of a tmpfs (though I doubt tmpfs is relevant): most of
> my testing was while swapping, but today I find that's irrelevant,
> and it should happen much quicker without. SMP kernels (4 cpus),
> I haven't tried UP; happens with or without PREEMPT, may just be
> coincidence that it happens quicker on the machines with PREEMPT.
>
> Most commonly it's unionfs_copy_attr_times called from unionfs_create,
> but that's probably just the most common route in this workload:
> I've seen it also when called from unionfs_rename or unionfs_open or
> unionfs_unlink. It looks like there's a locking or refcounting bug,
> hence a race: the unionfs_inode_info which unionfs_copy_attr_times
> is working on gets changed underneath it, so it oopses on NULL
> lower_inodes.
[...]
Hugh,
Check out my latest set of patches (which correspond to release 2.2.4 of
Unionfs). Thanks to your info and the patch, I was able to trigger several
races more frequently, and fix them. I've tested my code with make -j N
(for N=4 and N=20), on a 4 cpu machine a well as a 2 cpu machine (w/
different amounts of memory and CPU speeds, also 32-bit vs 64-bit); I ran a
kernel compile for ~10-12 hours. With the patches I just posted, I wasn't
able to trigger any of the WARN_ON's in unionfs_copy_attr_times. I also
tried it while flushing caches via /proc, and/or performing branch-mgmt
commands in unionfs.
Give it a good shake and let me know what you find.
Thanks,
Erez.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists