linux-kernel - Re: [PATCH] writeback: avoid race when update bandwidth

Message-ID: <20120614134818.GA15553@localhost>
Date:	Thu, 14 Jun 2012 21:48:18 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	Wanpeng Li <liwp.linux@...il.com>, linux-kernel@...r.kernel.org,
	Gavin Shan <shangw@...ux.vnet.ibm.com>,
	Wanpeng Li <liswp@...ux.vnet.ibm.com>
Subject: Re: [PATCH] writeback: avoid race when update bandwidth

On Thu, Jun 14, 2012 at 11:36:45AM +1000, Dave Chinner wrote:
> On Wed, Jun 13, 2012 at 12:21:15PM +0800, Fengguang Wu wrote:
> > On Wed, Jun 13, 2012 at 01:56:47PM +1000, Dave Chinner wrote:
> > > On Tue, Jun 12, 2012 at 07:21:29PM +0800, Fengguang Wu wrote:
> > > > On Tue, Jun 12, 2012 at 06:26:43PM +0800, Wanpeng Li wrote:
> > > > > From: Wanpeng Li <liwp@...ux.vnet.ibm.com>
> > > > 
> > > > That email address is no longer in use?
> > > > 
> > > > > Since bdi->wb.list_lock is used to protect the b_* lists,
> > > > > the flushers that call wb_writeback() to write back pages will
> > > > > get stuck while the bandwidth update policy holds this lock. To
> > > > > avoid this race, we can introduce a new bandwidth_lock that
> > > > > is responsible for protecting the bandwidth update policy.
> > > 
> > > This is not a race condition - it is a lock contention condition.
> > 
> > Nod.
> > 
> > > > This looks good to me. wb.list_lock could be contended and it's better
> > > > for bdi_update_bandwidth() to use a standalone and hardly contended
> > > > lock.
> > > 
> > > I'm not sure it will be "hardly contended". That's a global lock, so
> > > now we'll end up with updates on different bdis contending and it's
> > > not uncommon to see a couple of thousand processes on large machines
> > > beating on balance_dirty_pages().  Putting a global scope lock
> > > around such a function doesn't seem like a good solution to me.
> > 
> > It's the number of bdis rather than the number of processes that matters,
> > because there is a per-bdi 200ms ratelimit:
> > 
> > bdi_update_bandwidth():
> > 
> >        if (time_is_after_eq_jiffies(bdi->bw_time_stamp + BANDWIDTH_INTERVAL))
> >                return;
> >        // lock it
> 
> So now you get a thousand processes on a thousand CPUs all hitting that
> case at the same time because they are all writing to disk at the
> same time, all nicely synchronised by MPI. Lock contention ahoy!

Yeah, the cost does increase fast with the number of CPUs...

> > So a global lock should be enough when there are only dozens of disks.
> 
> Only needs one bdi, just with lots of processes trying to hit it at
> the same time such that they all pass the time after check.

It's more related to the number of CPUs: once task A updates
bdi->bw_time_stamp, the other tasks B, C, D, ... will see the updated
value and will all back off for the next 200ms period.
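
A minimal user-space sketch of that back-off behaviour, for illustration only. Everything named toy_* is hypothetical (not the kernel's structures or locks), time is passed in as plain milliseconds instead of jiffies, and a C11 atomic_flag stands in for the spinlock. It shows the unlocked time-stamp check, followed by a recheck under the lock so that only one of the tasks that raced past the first check performs the update:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BANDWIDTH_INTERVAL_MS 200L	/* the 200ms period discussed above */

/* Hypothetical stand-in for per-bdi state; not the kernel's struct. */
struct toy_bdi {
	atomic_flag lock;			/* toy spinlock guarding the update */
	_Atomic unsigned long bw_time_stamp;	/* time of last update, in ms */
	unsigned long nr_updates;		/* how many updates actually ran */
};

static void toy_bdi_init(struct toy_bdi *bdi)
{
	assert(bdi != NULL);
	atomic_flag_clear(&bdi->lock);
	atomic_init(&bdi->bw_time_stamp, 0);
	bdi->nr_updates = 0;
}

/* Wraparound-safe check: has a full interval passed since 'stamp'? */
static bool interval_elapsed(unsigned long stamp, unsigned long now_ms)
{
	return (long)(now_ms - stamp) >= BANDWIDTH_INTERVAL_MS;
}

/* Returns true only for the caller that actually performed the update. */
static bool toy_update_bandwidth(struct toy_bdi *bdi, unsigned long now_ms)
{
	unsigned long stamp;
	bool updated = false;

	/* Unlocked fast path: callers inside the interval never touch the lock. */
	stamp = atomic_load_explicit(&bdi->bw_time_stamp, memory_order_relaxed);
	if (!interval_elapsed(stamp, now_ms))
		return false;

	/* Slow path: every task that passed the check above contends here. */
	while (atomic_flag_test_and_set_explicit(&bdi->lock, memory_order_acquire))
		;	/* spin */

	/* Recheck under the lock: another task may have updated meanwhile. */
	stamp = atomic_load_explicit(&bdi->bw_time_stamp, memory_order_relaxed);
	if (interval_elapsed(stamp, now_ms)) {
		bdi->nr_updates++;	/* placeholder for the real bandwidth math */
		atomic_store_explicit(&bdi->bw_time_stamp, now_ms,
				      memory_order_relaxed);
		updated = true;
	}

	atomic_flag_clear_explicit(&bdi->lock, memory_order_release);
	return updated;
}
```

Tasks that arrive after the stamp is updated leave through the fast path. Tasks that pass the first check at the same instant still queue on the lock, which is the contention case Dave describes.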

> > However, the global bandwidth_lock will probably become a problem once
> > there are hundreds of disks. If there are (or will be) such setups,
> > I'm fine with reverting to the old per-bdi locking.
> 
> There are setups with hundreds of disks. They also tend to
> have hundreds of CPUs, too....

OK, I'll drop the change.

> > > Oh, and if you want to remove the dirty_lock from
> > > global_update_limit(), then replacing the lock with a cmpxchg loop
> > > will do it just fine....
> > 
> > Yes. But to be frank, I don't care about that dirty_lock at all,
> > because it has its own 200ms rate limiting :-)
> 
> That has the same problem, only it's currently nested inside another
> lock which isolates it from contention.  This is why measurement is
> important - until there is evidence that the lock
> contention is a problem, don't change it because it generally has an
> unpredictable cascading effect that often results in worse
> contention than was there originally....

You are right, it's a good attitude to avoid "might be better" changes
for some "suspected problem".
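
For reference only, and not as a proposed change, here is one possible reading of the cmpxchg loop mentioned above for global_update_limit(), sketched in user-space C11. The names claim_update_slot, update_time and UPDATE_INTERVAL_MS are assumptions made for the example, and time is plain milliseconds instead of jiffies. The caller that wins the compare-and-exchange on the time stamp owns the update for that interval; everyone else returns without taking any lock:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define UPDATE_INTERVAL_MS 200L		/* same 200ms rate limit as above */

/* Hypothetical global time stamp of the last limit update, in ms. */
static _Atomic unsigned long update_time;

/*
 * Try to claim the right to run the update for the current interval.
 * Returns true for exactly one caller per interval, false otherwise.
 */
static bool claim_update_slot(unsigned long now_ms)
{
	unsigned long old = atomic_load_explicit(&update_time,
						 memory_order_relaxed);

	/* Loop while an update still looks due from our point of view. */
	while ((long)(now_ms - old) >= UPDATE_INTERVAL_MS) {
		/*
		 * Publish our time stamp only if nobody changed it since we
		 * read it. On failure 'old' is reloaded with the current
		 * value and the loop condition is evaluated again.
		 */
		if (atomic_compare_exchange_weak_explicit(&update_time, &old,
							  now_ms,
							  memory_order_acq_rel,
							  memory_order_relaxed))
			return true;
	}
	return false;	/* someone else already claimed this interval */
}
```

Note that this only serialises the claim on the time stamp. The update that the winner then performs runs outside any lock, so it would have to be safe on its own, and whether any of this helps in practice is exactly what would need measuring first.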

Thanks,
Fengguang