linux-kernel - Re: Queue upcall locking (was: [dm-devel] [RFC][PATCH] fix dm_any_congested() to properly sync up with suspend code path)


Message-ID: <Pine.LNX.4.64.0811100929380.16965@hs20-bc2-1.build.redhat.com>
Date:	Mon, 10 Nov 2008 09:32:27 -0500 (EST)
From:	Mikulas Patocka <mpatocka@...hat.com>
To:	Peter Zijlstra <peterz@...radead.org>
cc:	Christoph Hellwig <hch@...radead.org>,
	Chandra Seetharaman <sekharan@...ibm.com>,
	Alasdair G Kergon <agk@...hat.com>,
	dm-devel <dm-devel@...hat.com>, linux-kernel@...r.kernel.org,
	axboe@...nel.dk
Subject: Re: Queue upcall locking (was: [dm-devel] [RFC][PATCH] fix
 dm_any_congested() to properly sync up with suspend code path)



On Mon, 10 Nov 2008, Peter Zijlstra wrote:

> On Mon, 2008-11-10 at 09:19 -0500, Mikulas Patocka wrote:
> > On Mon, 10 Nov 2008, Christoph Hellwig wrote:
> > 
> > > On Mon, Nov 10, 2008 at 08:11:51AM -0500, Mikulas Patocka wrote:
> > > > For upstream Linux developers: you are holding a spinlock and calling 
> > > > bdi*_congested functions that can take an indefinite amount of time (there 
> > > > are even users reporting having 50 disks in one logical volume or so). I 
> > > > think it would be good to move these calls out of spinlocks.
> > > 
> > > Umm, they shouldn't block that long, as that completely defeats their
> > > purpose.  These functions are mostly used to avoid throwing more I/O at
> > > a congested device if pdflush could do more useful things instead.  But
> > > if it blocks in those functions anyway we wouldn't have to bother using
> > > them.  Do you have more details about the use cases where this happens
> > > and where the routines spend so much time?
> > 
> > For device mapper, congested_fn asks every device in the tree and ORs 
> > their bits together --- so if the user has 50 devices, it asks them all.
> > 
> > For md-linear, md-raid0, md-raid1, md-raid10 and md-multipath it does the 
> > same --- asking every device.
> > 
> > If you have a better idea how to implement congested_fn, say it.
> 
> Fix the infrastructure by adding a function call so that you can have
> the individual devices report their congestion state to the aggregate.
> 
> Then congested_fn can return a valid state in O(1) because the state is
> kept up-to-date by the individual state changes.
> 
> IOW, add a set_congested_fn() and clear_congested_fn().

If you have a physical disk that has many LVM volumes on it, you end up in 
a situation where every change of the disk's congestion state has to be 
reported to all the volumes. So it just moves the O(n) problem to the 
other side: the query becomes O(1), but each state change becomes O(n).
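
To make the trade-off concrete, here is a minimal userspace sketch in C
of both models. It is not kernel code: struct leaf, struct agg,
MAX_LINKS and all the function names are hypothetical, there is no
locking, and only the general idea (poll the children on query, or push
state changes up to the parents) comes from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LINKS 64	/* hypothetical limit, for this sketch only */

struct agg;

/* A leaf device, e.g. a physical disk. */
struct leaf {
	bool congested;
	struct agg *parents[MAX_LINKS];	/* volumes built on this disk */
	size_t nr_parents;
};

/* An aggregate device, e.g. a logical volume or an md array. */
struct agg {
	struct leaf *children[MAX_LINKS];
	size_t nr_children;
	size_t nr_congested;	/* maintained by the push model */
};

/* Record that aggregate 'a' is built on top of leaf 'l'. */
static void link_devices(struct agg *a, struct leaf *l)
{
	assert(a->nr_children < MAX_LINKS && l->nr_parents < MAX_LINKS);
	a->children[a->nr_children++] = l;
	l->parents[l->nr_parents++] = a;
	if (l->congested)
		a->nr_congested++;
}

/* Pull model: the query walks every child, O(number of children). */
static bool agg_congested_poll(const struct agg *a)
{
	for (size_t i = 0; i < a->nr_children; i++)
		if (a->children[i]->congested)
			return true;
	return false;
}

/* Push model: the query only reads a counter, O(1). */
static bool agg_congested_cached(const struct agg *a)
{
	return a->nr_congested > 0;
}

/*
 * Push model: a state change is reported to every parent,
 * O(number of parents). Returns how many aggregates were updated.
 */
static size_t leaf_set_congested(struct leaf *l, bool congested)
{
	if (l->congested == congested)
		return 0;	/* no change, nothing to report */
	l->congested = congested;
	for (size_t i = 0; i < l->nr_parents; i++) {
		if (congested)
			l->parents[i]->nr_congested++;
		else
			l->parents[i]->nr_congested--;
	}
	return l->nr_parents;
}
```

With one disk linked under 50 volumes, agg_congested_cached() is 
constant time, but a single leaf_set_congested() call that flips the 
state has to update all 50 volumes.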

Mikulas