Message-ID: <m1bqixtsr1.fsf@ebiederm.dsl.xmission.com>
Date: Tue, 13 Mar 2007 03:09:06 -0600
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Herbert Poetzl <herbert@...hfloor.at>
Cc: Pavel Emelianov <xemul@...ru>, containers@...ts.osdl.org,
Paul Menage <menage@...gle.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC][PATCH 1/7] Resource counters
Herbert Poetzl <herbert@...hfloor.at> writes:
> On Sun, Mar 11, 2007 at 01:00:15PM -0600, Eric W. Biederman wrote:
>> Herbert Poetzl <herbert@...hfloor.at> writes:
>>
>> >
>> > Linux-VServer does the accounting with atomic counters,
>> > so that works quite fine, just do the checks at the
>> > beginning of whatever resource allocation and the
>> > accounting once the resource is acquired ...
>>
>> Atomic operations versus locks is only a granularity thing.
>> You still need the cache line which is the cost on SMP.
>>
>> Are you using atomic_add_return or atomic_add_unless, or
>> are you performing your actions in two separate steps,
>> which is racy? What I have seen indicates you are using
>> a racy form with two separate operations.
>
> yes, this is the current implementation which
> is more than sufficient, but I'm aware of the
> potential issues here, and I have an experimental
> patch sitting here which removes this race with
> the following change:
>
> - doesn't store the accounted value but
>   limit - accounted (i.e. the free resource)
> - uses atomic_add_return()
> - when negative, an error is returned and
>   the resource amount is added back
>
> changes to the limit have to adjust the 'current'
> value too, but that is again simple and atomic
>
> best,
> Herbert
>
> PS: atomic_add_unless() didn't exist back then
> (at least I think so) but that might be an option
> too ...

As far as this discussion goes, I think that if you can remove that race,
people will be more willing to talk about what vserver does.
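
For reference, the race-free scheme outlined in the quoted text could look
roughly like the sketch below. It is a userspace approximation using C11
atomics rather than the kernel's atomic_t API, and the names (res_counter,
res_charge, res_uncharge, res_adjust_limit, res_free) are hypothetical, not
taken from any posted patch. The counter holds the free amount
(limit - accounted), and the check and the accounting happen in one atomic
step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter: stores the FREE amount (limit - accounted). */
struct res_counter {
    atomic_long free;
};

static void res_init(struct res_counter *rc, long limit)
{
    atomic_init(&rc->free, limit);
}

/* Charge and check in a single atomic step; roll back on overshoot. */
static bool res_charge(struct res_counter *rc, long amount)
{
    assert(amount >= 0);

    /*
     * atomic_fetch_sub() returns the old value, so old - amount is the
     * new value, i.e. what an atomic_sub_return()-style call would yield.
     */
    long left = atomic_fetch_sub(&rc->free, amount) - amount;

    if (left < 0) {
        /* Over the limit: add the amount back and report failure. */
        atomic_fetch_add(&rc->free, amount);
        return false;
    }
    return true;
}

static void res_uncharge(struct res_counter *rc, long amount)
{
    atomic_fetch_add(&rc->free, amount);
}

/* Changing the limit by delta shifts the free amount by the same delta. */
static void res_adjust_limit(struct res_counter *rc, long delta)
{
    atomic_fetch_add(&rc->free, delta);
}

static long res_free(struct res_counter *rc)
{
    return atomic_load(&rc->free);
}
```

One property of this form: between the subtraction and the rollback the
counter is briefly too low, so a concurrent charge that would otherwise fit
can fail spuriously. A compare-and-swap loop in the style of
atomic_add_unless() would avoid that, at the cost of retries under
contention.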

That said, anything that uses locks or atomic operations (which are just
finer-grained locks) is going to have scaling issues on large boxes,
because of the cache line ping-pong.

So in that sense anything short of per-cpu variables sucks at scale. Still,
I would much rather get a simple, correct version without the complexity of
per-cpu counters before we optimize the counters that much.
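
To illustrate why per-cpu counters avoid the ping-pong, here is a rough
userspace sketch, not kernel code. Each slot sits on its own cache line, so
updates from different CPUs do not contend for the same line. NSLOTS,
CACHE_LINE and the sc_* names are assumptions made up for this example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>

#define NSLOTS     4   /* hypothetical: one slot per CPU or thread */
#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* Each slot is aligned (and therefore padded) to a full cache line. */
struct slot {
    alignas(CACHE_LINE) atomic_long count;
};

static_assert(sizeof(struct slot) == CACHE_LINE,
              "each slot should occupy exactly one cache line");

struct sharded_counter {
    struct slot slots[NSLOTS];
};

static void sc_init(struct sharded_counter *sc)
{
    for (int i = 0; i < NSLOTS; i++)
        atomic_init(&sc->slots[i].count, 0);
}

/* Fast path: touch only the caller's own slot. */
static void sc_add(struct sharded_counter *sc, unsigned slot, long n)
{
    atomic_fetch_add_explicit(&sc->slots[slot % NSLOTS].count, n,
                              memory_order_relaxed);
}

/* Slow path: walk every slot; the result is approximate under concurrency. */
static long sc_sum(struct sharded_counter *sc)
{
    long total = 0;

    for (int i = 0; i < NSLOTS; i++)
        total += atomic_load_explicit(&sc->slots[i].count,
                                      memory_order_relaxed);
    return total;
}
```

The sum is only a snapshot, so enforcing a hard limit on top of it needs
extra machinery such as per-slot budgets or a fallback to a shared counter
near the limit. That is the complexity I would rather postpone.
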
Eric