lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 22 Jul 2014 01:07:03 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	Sasha Levin <sasha.levin@...cle.com>
cc:	Hugh Dickins <hughd@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	Konstantin Khlebnikov <koct9i@...il.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Michel Lespinasse <walken@...gle.com>,
	Lukas Czerner <lczerner@...hat.com>,
	Dave Jones <davej@...hat.com>, linux-mm@...ck.org,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] shmem: fix faulting into a hole while it's punched,
 take 3

On Mon, 21 Jul 2014, Sasha Levin wrote:
> On 07/19/2014 07:44 PM, Hugh Dickins wrote:
> >> Otherwise, I've been unable to reproduce the shmem_fallocate hang.
> > Great.  Andrew, I think we can say that it's now safe to send
> > 1/2 shmem: fix faulting into a hole, not taking i_mutex
> > 2/2 shmem: fix splicing from a hole while it's punched
> > on to Linus whenever suits you.
> > 
> > (You have some other patches in the mainline-later section of the
> > mmotm/series file: they're okay too, but not in doubt as these two were.)
> 
> I think we may need to hold off on sending them...
> 
> It seems that this code in shmem_fault():
> 
> 	/*
> 	 * shmem_falloc_waitq points into the shmem_fallocate()
> 	 * stack of the hole-punching task: shmem_falloc_waitq
> 	 * is usually invalid by the time we reach here, but
> 	 * finish_wait() does not dereference it in that case;
> 	 * though i_lock needed lest racing with wake_up_all().
> 	 */
> 	spin_lock(&inode->i_lock);
> 	finish_wait(shmem_falloc_waitq, &shmem_fault_wait);
> 	spin_unlock(&inode->i_lock);
> 
> Is problematic. I'm not sure what changed, but it seems to be causing everything
> from NULL ptr derefs:
> 
> [  169.922536] BUG: unable to handle kernel NULL pointer dereference at 0000000000000631
> 
> To memory corruptions:
> 
> [ 1031.264226] BUG: spinlock bad magic on CPU#1, trinity-c99/25740
> 
> And hangs:
> 
> [  212.010020] INFO: rcu_preempt detected stalls on CPUs/tasks:

Ugh.

I'm so tired of this, I'm flailing around here, and could use some help.

But there is one easy change which might do it: please would you try
changing the TASK_KILLABLE a few lines above to TASK_UNINTERRUPTIBLE.

I noticed when deciding on the i_lock'ing there, that a lot of the
difficulty in races between the two ends, came from allowing KILLABLE
at the faulting end.  Which was a nice courtesy, but one I'll gladly
give up on now, if it is responsible for these troubles.

Please give it a try, I don't know what else to suggest.  And I've
no idea why the problem should emerge only now.  If this change
does appear to fix it, please go back and forth with and without,
to gather confidence that the problem is still reproducible without
the fix.  If fix it turns out to be - touch wood.

Thanks,
Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ