Message-Id: <20070113132023.0f8d2da8.akpm@osdl.org>
Date: Sat, 13 Jan 2007 13:20:23 -0800
From: Andrew Morton <akpm@...l.org>
To: Ravikiran G Thirumalai <kiran@...lex86.org>
Cc: nickpiggin@...oo.com.au, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, ak@...e.de, shai@...lex86.org,
pravin.shelar@...softinc.com
Subject: Re: High lock spin time for zone->lru_lock under extreme conditions
> On Sat, 13 Jan 2007 11:53:34 -0800 Ravikiran G Thirumalai <kiran@...lex86.org> wrote:
> On Sat, Jan 13, 2007 at 12:00:17AM -0800, Andrew Morton wrote:
> > > On Fri, 12 Jan 2007 23:36:43 -0800 Ravikiran G Thirumalai <kiran@...lex86.org> wrote:
> > > > >void __lockfunc _spin_lock_irq(spinlock_t *lock)
> > > > >{
> > > > >        local_irq_disable();
> > > > >------------------------>       rdtsc(t1);
> > > > >        preempt_disable();
> > > > >        spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
> > > > >        _raw_spin_lock(lock);
> > > > >------------------------>       rdtsc(t2);
> > > > >        if (lock->spin_time < (t2 - t1))
> > > > >                lock->spin_time = t2 - t1;
> > > > >}
> > > > >
> > > > >On some runs, we found that the zone->lru_lock spun for 33 seconds or more
> > > > >while the maximal CS time was 3 seconds or so.
> > > >
> > > > What is the "CS time"?
> > >
> > > Critical Section :). This is the maximal time interval I measured from
> > > t2 above to the time point we release the spin lock. This is the hold
> > > time I guess.
> >
> > By no means. The theory here is that CPUA is taking and releasing the
> > lock at high frequency, but CPUB never manages to get in and take it. In
> > which case the maximum-acquisition-time is much larger than the
> > maximum-hold-time.
> >
> > I'd suggest that you use a similar trick to measure the maximum hold time:
> > start the timer after we got the lock, stop it just before we release the
> > lock (assuming that the additional rdtsc delay doesn't "fix" things, of
> > course...)
>
> Well, that is exactly what I described above as CS time.
Seeing the code helps.
> The
> instrumentation goes like this:
>
> void __lockfunc _spin_lock_irq(spinlock_t *lock)
> {
>         unsigned long long t1, t2;
>
>         local_irq_disable();
>         t1 = get_cycles_sync();
>         preempt_disable();
>         spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
>         _raw_spin_lock(lock);
>         t2 = get_cycles_sync();
>         lock->raw_lock.htsc = t2;
>         if (lock->spin_time < (t2 - t1))
>                 lock->spin_time = t2 - t1;
> }
> ...
>
> void __lockfunc _spin_unlock_irq(spinlock_t *lock)
> {
>         unsigned long long t1;
>
>         spin_release(&lock->dep_map, 1, _RET_IP_);
>         t1 = get_cycles_sync();
>         if (lock->cs_time < (t1 - lock->raw_lock.htsc))
>                 lock->cs_time = t1 - lock->raw_lock.htsc;
>         _raw_spin_unlock(lock);
>         local_irq_enable();
>         preempt_enable();
> }
>
> Am I missing something? Is this not what you just described? (The
> synchronizing rdtsc might not really be required at all locations, but I
> doubt it would contribute a significant fraction to the 33s spin time or
> even the 3s hold time on a 2.6 GHz Opteron.)
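(For reference, the synchronizing read mentioned above amounts to putting a
serializing instruction in front of rdtsc, so the timestamp cannot be taken
speculatively early.  A rough user-space-style sketch of the idea follows;
the name tsc_read_serialized() is made up for illustration, and the real
get_cycles_sync() differs in detail.  The cpuid itself costs on the order
of hundreds of cycles, which is noise against spin times measured in
seconds.)

static inline unsigned long long tsc_read_serialized(void)
{
        unsigned int lo, hi;
        unsigned int eax = 1, ebx, ecx, edx;

        /* cpuid is serializing: it drains the pipeline, so the
         * rdtsc below cannot execute ahead of earlier instructions */
        asm volatile("cpuid"
                     : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
                     : "0" (eax));
        asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
        return ((unsigned long long)hi << 32) | lo;
}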
OK, now we need to do a dump_stack() each time we discover a new max hold
time. That might be a bit tricky: the printk code does spinlocking too, so
things could go recursively deadlocky. Maybe make spin_unlock_irq() return
the hold time, then do:
void lru_spin_unlock_irq(struct zone *zone)
{
        long this_time;

        this_time = spin_unlock_irq(&zone->lru_lock);
        if (this_time > zone->max_time) {
                zone->max_time = this_time;
                dump_stack();
        }
}
or similar.
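For completeness, a minimal sketch of an unlock that returns the hold time,
reusing the htsc and cs_time fields from Ravikiran's instrumentation above;
the return-type change would also have to be propagated through the
spin_unlock_irq() wrapper so callers like lru_spin_unlock_irq() see it:

unsigned long long __lockfunc _spin_unlock_irq(spinlock_t *lock)
{
        unsigned long long hold;

        spin_release(&lock->dep_map, 1, _RET_IP_);
        /* hold time: from the post-acquisition timestamp stored in
         * _spin_lock_irq() to just before the release */
        hold = get_cycles_sync() - lock->raw_lock.htsc;
        if (lock->cs_time < hold)
                lock->cs_time = hold;
        _raw_spin_unlock(lock);
        local_irq_enable();
        preempt_enable();
        return hold;
}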