linux-kernel - Re: Regression with FALLOC_FL_PUNCH_HOLE in 3.5-rc kernel

Message-ID: <20120702134104.GC785@redhat.com>
Date:	Mon, 2 Jul 2012 09:41:04 -0400
From:	Mike Snitzer <snitzer@...hat.com>
To:	Lukáš Czerner <lczerner@...hat.com>
Cc:	Zdenek Kabelac <zkabelac@...hat.com>,
	Hugh Dickins <hughd@...gle.com>,
	Mikulas Patocka <mpatocka@...hat.com>,
	Joe Thornber <ejt@...hat.com>,
	LVM general discussion and development 
	<linux-lvm@...hat.com>, amwang@...hat.com,
	Alasdair G Kergon <agk@...hat.com>,
	linux-kernel@...r.kernel.org
Subject: Re: Regression with FALLOC_FL_PUNCH_HOLE in 3.5-rc kernel

On Mon, Jul 02 2012 at  6:35am -0400,
Lukáš Czerner <lczerner@...hat.com> wrote:
> > 
> > So you're testing a rather old kernel, so you might be missing some
> > fixes there. Could you rerun the test with a recent kernel?
> >
> > Also it appears that the bug here happens because dm requested a
> > destination page which is within the kernel space. It seems that
> > this has been initiated by the write request from the mirror target.
> > So I do not immediately see how punch hole (discard) is involved at
> > all. Perhaps you were lucky enough to hit a different bug?
> > 
> > Looking at git log, this commit has been brought to my attention:
> > 
> > 0c535e0d6f463365c29623350dbd91642363c39b dm io: fix discard support
> > 
> > It seems related to this crash.
> > 
> > Please retest with recent kernel.

Ah, you beat me to recommending that fix ;)
 
> So from the original backtrace for the problem Zdenek is seeing on 3.5.0-rc4
> (https://lkml.org/lkml/2012/6/30/98), I think that this is a
> problem in the device mapper itself. I do not think it has anything
> to do with tmpfs or mm. Zdenek's bisects clearly show that the problem
> appears when discard support for the loop device is added, so it is
> most likely related to the dm discard support.

What about using scsi_debug with the dm-mirror target?

Never say never: dm-mirror and/or dm-io code could still have an issue,
but the commit referenced above did fix discard with the mirror target
back in 3.3.
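Before layering dm-mirror on a scsi_debug disk, it may help to check that the disk advertises discard at all. The minimal C sketch below reads the block layer's `/sys/block/<disk>/queue/discard_max_bytes` attribute. A value of 0 there means the device does not support discard. The helper name `discard_max_bytes()` is an illustrative assumption and is not part of any existing tool. The disk name you pass in, for example a hypothetical `sdb`, depends on how scsi_debug enumerated on your system.

```c
#include <stdio.h>

/*
 * Hypothetical helper: return the discard_max_bytes value the block
 * layer reports for the named disk (e.g. "sdb"), or -1 if the sysfs
 * attribute cannot be opened or parsed.  A return value of 0 means the
 * device does not advertise discard support.
 */
long long discard_max_bytes(const char *disk)
{
    char path[256];
    unsigned long long val;
    FILE *f;

    snprintf(path, sizeof(path), "/sys/block/%s/queue/discard_max_bytes", disk);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;              /* no such disk, or sysfs not mounted */
    if (fscanf(f, "%llu", &val) != 1) {
        fclose(f);
        return -1;              /* unexpected file contents */
    }
    fclose(f);
    return (long long)val;
}
```

A caller would treat a negative result as "unknown device" and 0 as "no discard". Only a positive value indicates that discards will actually reach the device rather than being rejected by the block layer.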
 
> Anyway, the backtrace points to a NULL pointer dereference in
> dm_rh_region_context(), which is a simple function:
> 
> void *dm_rh_region_context(struct dm_region *reg)
> {
>        return reg->rh->context;
> }
> 
> so either reg or reg->rh is NULL. Now, the only caller is
> recovery_complete() in dm-raid1.c, so this is somewhat related
> to raid recovery. I am not familiar with the dm code; can
> someone from the dm team look at this?

I'll coordinate with Zdenek.
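The conclusion that either reg or reg->rh must be NULL follows from the two pointer dereferences in dm_rh_region_context(). Below is a user-space sketch of that reasoning, with two assumptions stated plainly:

- The structs are simplified stand-ins for the kernel's struct dm_region and struct dm_region_hash. Only the rh and context fields are modelled.
- region_context_checked() is a hypothetical debugging helper, not kernel code. It reports which link in the chain is NULL instead of faulting.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields touched by dm_rh_region_context() are modelled). */
struct dm_region_hash {
    void *context;              /* per-target context, e.g. the mirror set */
};

struct dm_region {
    struct dm_region_hash *rh;  /* back-pointer to the owning region hash */
};

/* Same body as the function quoted above: two dereferences, so a NULL
 * 'reg' or a NULL 'reg->rh' both fault. */
void *dm_rh_region_context(struct dm_region *reg)
{
    return reg->rh->context;
}

/* Hypothetical debugging variant: returns 0 and stores the context on
 * success, 1 if reg is NULL, 2 if reg->rh is NULL. */
int region_context_checked(struct dm_region *reg, void **out)
{
    if (reg == NULL)
        return 1;
    if (reg->rh == NULL)
        return 2;
    *out = reg->rh->context;
    return 0;
}
```

Distinguishing the two cases matters when debugging. A NULL reg points at the caller (recovery_complete()) being handed a bad region. A NULL reg->rh points at a region whose back-pointer was never set or was cleared.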

> But just to rule out the punch hole path, Zdenek, can you
> run your tests on a "real" discard-capable device? Or at least on
> a device which does not convert discard requests into punch hole?
> You can use scsi_debug to create such a device:
> 
> modprobe scsi_debug dev_size_mb=16 sector_size=512 num_tgts=1 lbpu=1

Great minds think alike ;)
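The path that the scsi_debug test is meant to bypass is the loop device turning a discard into a punch hole on its backing file. A rough user-space analogue is fallocate(2) with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, which the sketch below exercises. This is a sketch, not the loop driver's code. punch_hole() and punch_hole_demo() are hypothetical helpers. The demo also assumes the chosen directory is on a filesystem that supports punch hole; if it does not, the demo returns 1 instead of failing.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Deallocate [offset, offset+len) in an open regular file.  PUNCH_HOLE
 * must be paired with KEEP_SIZE, so the file size does not change and
 * the range reads back as zeros. */
int punch_hole(int fd, off_t offset, off_t len)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, len);
}

/* Write three 4 KiB blocks of 'A', punch the middle block, then check
 * the size is unchanged and the hole reads as zeros.  Returns 0 on
 * success, 1 if punch hole is unsupported here, -1 on other errors. */
int punch_hole_demo(const char *dir)
{
    char path[4096];
    char buf[4096];
    struct stat st;
    int fd, ret = -1;

    snprintf(path, sizeof(path), "%s/punch-demo-XXXXXX", dir);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* file is freed once fd is closed */

    memset(buf, 'A', sizeof(buf));
    for (int i = 0; i < 3; i++)
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
            goto out;

    if (punch_hole(fd, 4096, 4096) < 0) {
        ret = (errno == EOPNOTSUPP || errno == ENOSYS) ? 1 : -1;
        goto out;
    }

    if (fstat(fd, &st) < 0 || st.st_size != 3 * 4096)
        goto out;               /* KEEP_SIZE should preserve the size */
    if (pread(fd, buf, sizeof(buf), 4096) != (ssize_t)sizeof(buf))
        goto out;
    for (size_t i = 0; i < sizeof(buf); i++)
        if (buf[i] != 0)
            goto out;           /* hole must read back as zeros */
    ret = 0;
out:
    close(fd);
    return ret;
}
```

Calling punch_hole_demo() against a tmpfs or ext4 directory exercises the same fallocate mode on that filesystem. A real discard-capable device such as the scsi_debug disk never takes this path.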