Message-ID: <20120702134104.GC785@redhat.com>
Date: Mon, 2 Jul 2012 09:41:04 -0400
From: Mike Snitzer <snitzer@...hat.com>
To: Lukáš Czerner <lczerner@...hat.com>
Cc: Zdenek Kabelac <zkabelac@...hat.com>,
Hugh Dickins <hughd@...gle.com>,
Mikulas Patocka <mpatocka@...hat.com>,
Joe Thornber <ejt@...hat.com>,
LVM general discussion and development
<linux-lvm@...hat.com>, amwang@...hat.com,
Alasdair G Kergon <agk@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: Regression with FALLOC_FL_PUNCH_HOLE in 3.5-rc kernel
On Mon, Jul 02 2012 at 6:35am -0400,
Lukáš Czerner <lczerner@...hat.com> wrote:
> >
> > So you're testing a rather old kernel, so you might be missing some
> > fixes there. Could you rerun the test with a recent kernel?
> >
> > Also it appears that the bug here happens because dm requested a
> > destination page which is within the kernel space. It seems that
> > this has been initiated by the write request from the mirror target.
> > So I do not immediately see how punch hole (discard) is involved at
> > all. Perhaps you were lucky enough to hit a different bug?
> >
> > Looking at git log, this commit has been brought to my attention:
> >
> > 0c535e0d6f463365c29623350dbd91642363c39b dm io: fix discard support
> >
> > seems related to this crash.
> >
> > Please retest with recent kernel.
Ah, you beat me to recommending that fix ;)
> So from the original backtrace for the problem Zdenek is seeing on 3.5.0-rc4
> (https://lkml.org/lkml/2012/6/30/98) I think that this is a
> problem in the device mapper itself. I do not think it has anything
> to do with tmpfs or mm. Zdenek's bisects clearly show that the
> problem appears when discard support for the loop device is added,
> so it is most likely related to the dm discard support.
What about using scsi_debug with the dm-mirror target?
Never say never: the dm-mirror and/or dm-io code could still have an issue,
but the commit referenced above did fix discard with the mirror target
back in 3.3.
> Anyway, the backtrace points to a NULL pointer dereference in
> dm_rh_region_context(), which is a simple function:
>
> void *dm_rh_region_context(struct dm_region *reg)
> {
> 	return reg->rh->context;
> }
>
> so either reg or reg->rh is NULL. Now the only place this is used is
> from recovery_complete() in dm-raid1.c. So this is somewhat related
> to raid recovery. I am not familiar with the dm code, but can
> someone from the dm team look at this?
I'll coordinate with Zdenek.
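
For anyone following along, here is a rough user-space sketch of the two
failure cases described in the quoted text. It is not kernel code: the two
structs are cut-down, hypothetical stand-ins for the real ones (which have
more fields), and region_null_cause() with its enum is a made-up helper
that only names which pointer would fault.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real kernel structs carry more fields. */
struct dm_region_hash {
	void *context;
};

struct dm_region {
	struct dm_region_hash *rh;
};

/* Same body as the helper quoted above: it performs no NULL checks. */
static void *dm_rh_region_context(struct dm_region *reg)
{
	return reg->rh->context;
}

/* Hypothetical result codes for the debugging helper below. */
enum region_null_cause {
	REGION_OK,	/* both pointers are valid */
	REGION_NULL,	/* reg itself is NULL */
	REGION_RH_NULL,	/* reg is valid but reg->rh is NULL */
};

/* Hypothetical helper: report which dereference would fault. */
static enum region_null_cause region_null_cause(const struct dm_region *reg)
{
	if (!reg)
		return REGION_NULL;
	if (!reg->rh)
		return REGION_RH_NULL;
	return REGION_OK;
}
```

Calling region_null_cause() before dm_rh_region_context() in a test harness
separates "reg is NULL" from "reg->rh is NULL". Doing the equivalent in the
real recovery_complete() path would be a debugging aid only, not a fix.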
> But just to be sure to rule out the punch hole thing, Zdenek, can you
> run your tests on a "real" discard-capable device? Or at least on
> a device which does not convert discard requests into punch hole?
> You can use scsi_debug to create such a device:
>
> modprobe scsi_debug dev_size_mb=16 sector_size=512 num_tgts=1 lbpu=1
Great minds think alike ;)