linux-kernel - Re: [PATCH v7 0/5] vfs: Non-blockling buffered fs read (page cache only)


Message-ID: <CANP1eJH5g0oWaLO0nD7XAcAO-rFHNTGUopF1aPLEifAbPnPOKQ@mail.gmail.com>
Date:	Mon, 30 Mar 2015 18:49:06 -0400
From:	Milosz Tanski <milosz@...in.com>
To:	Jeremy Allison <jra@...ba.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Hellwig <hch@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"linux-aio@...ck.org" <linux-aio@...ck.org>,
	Mel Gorman <mgorman@...e.de>,
	Volker Lendecke <Volker.Lendecke@...net.de>,
	Tejun Heo <tj@...nel.org>, Jeff Moyer <jmoyer@...hat.com>,
	"Theodore Ts'o" <tytso@....edu>, Al Viro <viro@...iv.linux.org.uk>,
	Linux API <linux-api@...r.kernel.org>,
	Michael Kerrisk <mtk.manpages@...il.com>,
	linux-arch@...r.kernel.org, Dave Chinner <david@...morbit.com>
Subject: Re: [PATCH v7 0/5] vfs: Non-blockling buffered fs read (page cache only)

On Mon, Mar 30, 2015 at 4:32 PM, Jeremy Allison <jra@...ba.org> wrote:
> On Mon, Mar 30, 2015 at 01:26:25PM -0700, Andrew Morton wrote:
>>
>> cons:
>>
>> d) fincore() is more expensive
>>
>> e) fincore() will very occasionally block
>
> The above is the killer for Samba. If fincore
> returns true but when we schedule the pread
> we block, we're hosed.
>
> Once we block, we're done serving clients on the main
> thread until this returns. That can cause unpredictable
> response times which can cause client timeouts.
>
> A fincore+pread solution that blocks is simply unsafe
> to use for us. We'll have to stay with the threadpool :-(.

We're getting data from a network filesystem (Ceph in our case, but it
could be pNFS). In many cases those filesystems have some kind of
hierarchy, and it's not uncommon for us to see requests that take 20 to
25 milliseconds to complete. In this case the miss becomes very
expensive. And it's not just that one request experiences the
slowdown: all the requests being serviced by that (single) epoll thread
experience head-of-line blocking because of one stalled request.
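
To make the head-of-line effect concrete, here is a minimal sketch of a
single-threaded FIFO server, written in Python purely for illustration.
The 100 microsecond arrival spacing corresponds to 10K requests a second;
the 10 microsecond service time for a cache hit is an assumed number,
and completion_latencies() is a hypothetical helper, not anything from
the patch set.

```python
def completion_latencies(arrivals_us, service_us):
    """Latency of each request on one thread that serves strictly in order."""
    clock = 0
    latencies = []
    for arrive, service in zip(arrivals_us, service_us):
        start = max(clock, arrive)      # wait for the thread to become free
        clock = start + service         # thread is busy until this time
        latencies.append(clock - arrive)
    return latencies

N = 1000
arrivals = [i * 100 for i in range(N)]   # one request every 100 us (10K/s)
HIT_US = 10                              # assumed cost of a page cache hit
MISS_US = 25_000                         # one 25 ms blocking miss

baseline = completion_latencies(arrivals, [HIT_US] * N)
stalled = completion_latencies(arrivals, [MISS_US] + [HIT_US] * (N - 1))

# Count how many requests paid more than the plain hit cost in each run.
print(sum(1 for lat in baseline if lat > HIT_US))
print(sum(1 for lat in stalled if lat > HIT_US))
```

With no miss, every request costs only the hit time. With a single miss
at the front, every request that arrives while the thread is blocked
inherits most of that stall, and the queue takes a while to drain
afterwards.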

10K requests a second is a common load for many web services / video
servers serving chunks of data. If we experience one miss a second,
that 25 millisecond stall will impact 250 other requests (all of them will
have a 25ms latency tacked on).
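
The 250 figure is just the arrival rate multiplied by the stall time. A
quick sanity check, assuming requests arrive evenly over the second
(the function name is made up for this example):

```python
REQUESTS_PER_SECOND = 10_000   # load quoted above
STALL_MS = 25                  # duration of one blocking miss

def requests_arriving_during_stall(requests_per_second, stall_ms):
    """Requests that land on the single epoll thread while it is blocked."""
    return requests_per_second * stall_ms // 1000

delayed = requests_arriving_during_stall(REQUESTS_PER_SECOND, STALL_MS)
print(delayed)  # 250
```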

>
>> And I don't believe that e) will be a problem in the real world.  It's
>> a significant increase in worst-case latency and a negligible increase
>> in average latency.  I've asked at least three times for someone to
>> explain why this is unacceptable and no explanation has been provided.
>
> See above.



-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@...in.com