linux-kernel - Re: [PATCH] writeback: avoid race when update bandwidth


Message-ID: <20120614013645.GA7339@dastard>
Date:	Thu, 14 Jun 2012 11:36:45 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Fengguang Wu <fengguang.wu@...el.com>
Cc:	Wanpeng Li <liwp.linux@...il.com>, linux-kernel@...r.kernel.org,
	Gavin Shan <shangw@...ux.vnet.ibm.com>,
	Wanpeng Li <liswp@...ux.vnet.ibm.com>
Subject: Re: [PATCH] writeback: avoid race when update bandwidth

On Wed, Jun 13, 2012 at 12:21:15PM +0800, Fengguang Wu wrote:
> On Wed, Jun 13, 2012 at 01:56:47PM +1000, Dave Chinner wrote:
> > On Tue, Jun 12, 2012 at 07:21:29PM +0800, Fengguang Wu wrote:
> > > On Tue, Jun 12, 2012 at 06:26:43PM +0800, Wanpeng Li wrote:
> > > > From: Wanpeng Li <liwp@...ux.vnet.ibm.com>
> > > 
> > > That email address is no longer in use?
> > > 
> > > > Since bdi->wb.list_lock is used to protect the b_* lists,
> > > > so the flushers who call wb_writeback to writeback pages will
> > > > stuck when bandwidth update policy holds this lock. In order
> > > > to avoid this race we can introduce a new bandwidth_lock who
> > > > is responsible for protecting bandwidth update policy.
> > 
> > This is not a race condition - it is a lock contention condition.
> 
> Nod.
> 
> > > This looks good to me. wb.list_lock could be contended and it's better
> > > for bdi_update_bandwidth() to use a standalone and hardly contended
> > > lock.
> > 
> > I'm not sure it will be "hardly contended". That's a global lock, so
> > now we'll end up with updates on different bdis contending and it's
> > not uncommon to see a couple of thousand processes on large machines
> > beating on balance_dirty_pages().  Putting a global scope lock
> > around such a function doesn't seem like a good solution to me.
> 
> It's more about the number of bdi's than the number of processes that matters.
> Because here is a per-bdi 200ms ratelimit:
> 
> bdi_update_bandwidth():
> 
>        if (time_is_after_eq_jiffies(bdi->bw_time_stamp + BANDWIDTH_INTERVAL))
>                return;
>        // lock it

So now you get a thousand processes on a thousand CPUs all hitting
that case at the same time because they are all writing to disk at
the same time, all nicely synchronised by MPI. Lock contention ahoy!

> So a global should be enough when there are only dozens of disks.

It only needs one bdi, just with lots of processes trying to hit it
at the same time such that they all pass the
time_is_after_eq_jiffies() check.
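
A minimal userspace sketch of the scenario described above, not kernel
code. It models the unlocked interval check with plain integers and
counts how many callers would go on to take the lock. The names
bdi_model, ratelimited(), count_lock_contenders() and INTERVAL_TICKS
are hypothetical stand-ins introduced for illustration; only the shape
of the time check is taken from the quoted snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed stand-in for BANDWIDTH_INTERVAL, in arbitrary ticks. */
#define INTERVAL_TICKS 200UL

/* Hypothetical model of the one field the check looks at. */
struct bdi_model {
	unsigned long bw_time_stamp;
};

/* Same shape as time_is_after_eq_jiffies(stamp + interval):
 * true while "now" has not yet moved past stamp + interval. */
static bool ratelimited(const struct bdi_model *bdi, unsigned long now)
{
	return (long)(bdi->bw_time_stamp + INTERVAL_TICKS - now) >= 0;
}

/*
 * Count the callers that pass the unlocked check and would therefore
 * try to take the lock.
 *
 * serialised == false: every caller samples the timestamp before any
 * of them has updated it (the synchronised-writers case).
 * serialised == true: each caller sees the previous caller's update.
 */
static size_t count_lock_contenders(struct bdi_model *bdi, unsigned long now,
				    size_t ncallers, bool serialised)
{
	const unsigned long stamp_seen = bdi->bw_time_stamp;
	size_t contenders = 0;

	assert(ncallers > 0);

	for (size_t i = 0; i < ncallers; i++) {
		struct bdi_model view = {
			.bw_time_stamp = serialised ? bdi->bw_time_stamp
						    : stamp_seen,
		};

		if (ratelimited(&view, now))
			continue;

		contenders++;
		/* In the real code this update happens under the lock. */
		bdi->bw_time_stamp = now;
	}
	return contenders;
}
```

With one bdi whose interval has expired, calling
count_lock_contenders(&bdi, now, 1000, false) reports every caller as
a contender, while the serialised variant reports only the first. The
ratelimit bounds how often the update runs, not how many CPUs arrive
at the lock together.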

> However, the global bandwidth_lock will probably become a problem when
> there comes hundreds of disks. If there are (or will be) such setups,
> I'm fine to revert to the old per-bdi locking.

There are setups with hundreds of disks. They also tend to
have hundreds of CPUs, too....

> > Oh, and if you want to remove the dirty_lock from
> > global_update_limit(), then replacing the lock with a cmpxchg loop
> > will do it just fine....
> 
> Yes. But to be frank, I don't care about that dirty_lock at all,
> because it has its own 200ms rate limiting :-)

That has the same problem, only it's currently nested inside another
lock which isolates it from contention.  This is why measurement is
important - until there is evidence that the lock contention is a
problem, don't change it, because changing it generally has an
unpredictable cascading effect that often results in worse
contention than was there originally....
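
For reference, the "cmpxchg loop" mentioned in the quoted text is a
lock-free read-modify-write pattern. The following is a userspace
sketch using C11 atomics rather than the kernel's cmpxchg(), and the
update rule in next_limit() is a simplified, assumed stand-in (raise
immediately, decay slowly), not the actual global_update_limit()
logic. All names here (dirty_limit_model, next_limit,
update_limit_cmpxchg) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared limit, standing in for a lock-protected global. */
static _Atomic unsigned long dirty_limit_model;

/* Assumed update rule: jump up to thresh at once, otherwise decay
 * towards max(thresh, dirty) by 1/32 of the remaining gap. */
static unsigned long next_limit(unsigned long limit, unsigned long thresh,
				unsigned long dirty)
{
	if (limit < thresh)
		return thresh;
	if (thresh < dirty)
		thresh = dirty;
	if (limit > thresh)
		return limit - ((limit - thresh) >> 5);
	return limit;
}

/* Compare-and-swap loop: recompute from the freshly observed value
 * and retry whenever another thread changed it in between. */
static unsigned long update_limit_cmpxchg(unsigned long thresh,
					  unsigned long dirty)
{
	unsigned long old = atomic_load(&dirty_limit_model);
	unsigned long new;

	do {
		new = next_limit(old, thresh, dirty);
		if (new == old)
			return old;	/* nothing to change */
		/* On failure, "old" is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&dirty_limit_model,
					       &old, new));

	assert(new != old);
	return new;
}
```

This removes the lock but not the shared cache line: a thousand CPUs
retrying the compare-and-swap still bounce that line between them,
which is one more reason to measure before and after such a change.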

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com