netdev - Re: Distributed storage.


Message-ID: <20070804163740.GB14175@2ka.mipt.ru>
Date:	Sat, 4 Aug 2007 20:37:40 +0400
From:	Evgeniy Polyakov <johnpol@....mipt.ru>
To:	Daniel Phillips <phillips@...nq.net>
Cc:	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: Distributed storage.

On Fri, Aug 03, 2007 at 06:19:16PM -0700, Daniel Phillips (phillips@...nq.net) wrote:
> It depends on the characteristics of the physical and virtual block 
> devices involved.  Slow block devices can produce surprising effects.  
> Ddsnap still qualifies as "slow" under certain circumstances (big 
> linear write immediately following a new snapshot). Before we added 
> throttling we would see as many as 800,000 bios in flight.  Nice to 

Mmm, sounds tasty to work with such a system :)

> know the system can actually survive this... mostly.  But memory 
> deadlock is a clear and present danger under those conditions and we 
> did hit it (not to mention that read latency sucked beyond belief). 
> 
> Anyway, we added a simple counting semaphore to throttle the bio traffic 
> to a reasonable number and behavior became much nicer, but most 
> importantly, this satisfies one of the primary requirements for 
> avoiding block device memory deadlock: a strictly bounded amount of bio 
> traffic in flight.  In fact, we allow some bounded number of 
> non-memalloc bios *plus* however much traffic the mm wants to throw at 
> us in memalloc mode, on the assumption that the mm knows what it is 
> doing and imposes its own bound of in flight bios per device.   This 
> needs auditing obviously, but the mm either does that or is buggy.  In 
> practice, with this throttling in place we never saw more than 2,000 in 
> flight no matter how hard we hit it, which is about the number we were 
> aiming at.  Since we draw our reserve from the main memalloc pool, we 
> can easily handle 2,000 bios in flight, even under extreme conditions.
> 
> See:
>     http://zumastor.googlecode.com/svn/trunk/ddsnap/kernel/dm-ddsnap.c
>     down(&info->throttle_sem);
> 
> To be sure, I am not very proud of this throttling mechanism for various 
> reasons, but the thing is, _any_ throttling mechanism no matter how 
> sucky solves the deadlock problem.  Over time I want to move the 

make_request_fn is always called in process context, so we can wait in it
for memory from a mempool. Although if we have to wait there, we are already in trouble.

I agree, any kind of upper-bound throttling must be implemented in the
device itself, since the block layer does not know what device is at the end
or what it will need in order to process a given block request.
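
For illustration, the counting-semaphore throttle described in the quoted
text can be sketched in user space roughly as below. This is only a minimal
sketch using POSIX semaphores, not the dm-ddsnap code: struct throttle and
the throttle_*() helpers are hypothetical names, and the limit passed to
throttle_init() stands in for whatever in-flight bound the device chooses.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a per-device throttle.
 * The names are illustrative and not taken from dm-ddsnap.c. */
struct throttle {
    sem_t slots;        /* counts free in-flight slots */
    unsigned int limit; /* upper bound on requests in flight */
};

static int throttle_init(struct throttle *t, unsigned int limit)
{
    t->limit = limit;
    return sem_init(&t->slots, 0, limit);
}

/* Blocking submit path: sleeps until a slot is free, analogous to
 * down(&info->throttle_sem) called from process context. */
static void throttle_acquire(struct throttle *t)
{
    while (sem_wait(&t->slots) != 0 && errno == EINTR)
        ; /* retry only if interrupted by a signal */
}

/* Non-blocking variant: returns false once the bound is reached. */
static bool throttle_try_acquire(struct throttle *t)
{
    return sem_trywait(&t->slots) == 0;
}

/* Completion path: hand the slot back, analogous to up(). */
static void throttle_release(struct throttle *t)
{
    int rc = sem_post(&t->slots);
    assert(rc == 0);
    (void)rc;
}

/* Number of requests currently holding a slot. */
static unsigned int throttle_in_flight(struct throttle *t)
{
    int free_slots = 0;
    sem_getvalue(&t->slots, &free_slots);
    return t->limit - (unsigned int)free_slots;
}
```

The submit path would call throttle_acquire() before queueing a request and
the completion path would call throttle_release(), so the number of requests
in flight can never exceed the limit given at init time.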

-- 
	Evgeniy Polyakov