lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 13 Apr 2012 14:06:03 -0400
From:	Ted Ts'o <tytso@....edu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Andi Kleen <andi@...stfloor.org>, Vivek Haldar <haldar@...gle.com>,
	Andreas Dilger <adilger@...ger.ca>, linux-ext4@...r.kernel.org,
	tim.c.chen@...ux.intel.com
Subject: Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent
 cache

On Fri, Apr 13, 2012 at 10:42:10AM -0700, Linus Torvalds wrote:
> 
> > Can we please revert fix it or revert
> > 556b27abf73833923d5cd4be80006292e1b31662 before release.
> 
> That commit ID doesn't make any sense, and doesn't seem to have
> anything to do with any statistics counters that your email talks
> about. So regardless, you'd need to explain why that commit causes the
> problems you talk about, I'm not going to revert a random commit that
> doesn't even look what you describe.
> 
> Ted?

I think Andi cited the wrong commit.  The commit in question is
77f4135f2a219a2127be6cc1208c42e6175b11dd, which first showed up in
2.6.39.  If you have a very fast (PCIe-attached would be required I
suspect) flash device, with a system with a large SMP box, and a
random-read benchmark which reads from large number of CPU's in
parallel, such that we thrash cache line containing
sbi->extent_cache_hits.

It makes sense that there is a problem here, and what had been
discussed was either (a) removing these counters entirely, or (b)
replacing them with a percpu counter.  At the moment these counters
aren't really worth it, although in the future when we replace the
single extent cache with a larger cache, the statistics would be more
interesting.

I just ran out of time before the 3.4 merge window, and quite frankly
I didn't think it was worth persuing this with a high urgency, since
(a) it's not a regression (the commit in question which added these
counters has been around since 2.6.39), and (b) in most circumstances
people it's not noticeable at all.

I'm willing to push commit to you to remove the counters for now, and
we'll probably add it back later using percpu counters if you think
it's worth making a change at this time --- or if Andi can explain why
he's treating this with a high degree of urgency.  Is there some
common use case that I'm missing which is being very badly impacted
with this cache-line thrashing?

Regards,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ