linux-kernel - Re: v4.2-rc dcache regression, probably 75a6f82a0d10

lists.openwall.net: Open Source and information security mailing list archives



Message-ID: <alpine.LSU.2.11.1507311207160.11122@eggly.anvils>
Date:	Fri, 31 Jul 2015 12:42:30 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
cc:	"J. Bruce Fields" <bfields@...ldses.org>,
	Dominique Martinet <dominique.martinet@....fr>,
	Hugh Dickins <hughd@...gle.com>,
	Al Viro <viro@...iv.linux.org.uk>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: v4.2-rc dcache regression, probably 75a6f82a0d10

On Fri, 31 Jul 2015, Linus Torvalds wrote:
> On Fri, Jul 31, 2015 at 10:46 AM, Hugh Dickins <hughd@...gle.com> wrote:
> >
> > Sounds like a dcache problem, and 75a6f82a0d10 seemed the only
> > likely candidate, so I experimented with reverting it yesterday,
> > and ran successfully for 24 hours.
> 
> Hmm. Sounds odd. Are you running nfsd? That would explain why it
> happens on ext4 but not tmpfs: ext4 has a get_parent method that can
> get a disconnected entry, while tmpfs does not.
> 
> That said, your load doesn't sound like it would actually ever trigger
> this, unless you just didn't mention that you also end up using that
> filesystem over nfs on another machine.

No, no nfsd nor any kind of networking filesystem stuff going on.
Right, I never looked to see what DCACHE_DISCONNECTED is actually
about; I just rushed ahead and tried running with the revert.

> 
> So leave it running a while longer, but maybe it's 4bf46a272647 like
> Dominique suspects. Although I don't see how that could trigger
> anything either..

I restarted this morning with a slightly different version of the
load, one which has sometimes shown the issue more easily.  I thought
it better to restart with a variant than to persist with a run that
might have settled into a protected pattern.  We'll see what that
shows later on.

It will indeed be odd if it confirms that the DCACHE_DISCONNECTED
revert is good.  I agree that Dominique's 4bf46a272647 now seems more
likely, if still unlikely; but that was included in v4.1, and I saw
no problem with v4.1 once the rmap_walk() skip was fixed.

There may be some completely unrelated commit which alters the
timing enough to expose or mask whichever commit is really guilty.
Or something corrupting dentry->d_flags occasionally.

Hugh