[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171214215404.GK3326@worktop>
Date: Thu, 14 Dec 2017 22:54:04 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Bart Van Assche <Bart.VanAssche@....com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
"kernel-team@...com" <kernel-team@...com>,
"oleg@...hat.com" <oleg@...hat.com>, "hch@....de" <hch@....de>,
"axboe@...nel.dk" <axboe@...nel.dk>,
"jianchao.w.wang@...cle.com" <jianchao.w.wang@...cle.com>,
"osandov@...com" <osandov@...com>, "tj@...nel.org" <tj@...nel.org>
Subject: Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU
and generation based scheme
On Thu, Dec 14, 2017 at 09:42:48PM +0000, Bart Van Assche wrote:
> On Thu, 2017-12-14 at 21:20 +0100, Peter Zijlstra wrote:
> > On Thu, Dec 14, 2017 at 06:51:11PM +0000, Bart Van Assche wrote:
> > > On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote:
> > > > + write_seqcount_begin(&rq->gstate_seq);
> > > > + blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
> > > > + blk_add_timer(rq);
> > > > + write_seqcount_end(&rq->gstate_seq);
> > >
> > > My understanding is that both write_seqcount_begin() and write_seqcount_end()
> > > trigger a write memory barrier. Is a seqcount really faster than a spinlock?
> >
> > Yes lots, no atomic operations and no waiting.
> >
> > The only constraint for write_seqlock is that there must not be any
> > concurrency.
> >
> > But now that I look at this again, TJ, why can't the below happen?
> >
> > write_seqlock_begin();
> > blk_mq_rq_update_state(rq, IN_FLIGHT);
> > blk_add_timer(rq);
> > <timer-irq>
> > read_seqcount_begin()
> > while (seq & 1)
> > cpurelax();
> > // life-lock
> > </timer-irq>
> > write_seqlock_end();
>
> Hello Peter,
>
> Some time ago the block layer was changed to handle timeouts in thread context
> instead of interrupt context. See also commit 287922eb0b18 ("block: defer
> timeouts to a workqueue").
That only makes it a little better:
Task-A Worker
write_seqcount_begin()
blk_mq_rw_update_state(rq, IN_FLIGHT)
blk_add_timer(rq)
<timer>
schedule_work()
</timer>
<context-switch to worker>
read_seqcount_begin()
while(seq & 1)
cpu_relax();
Now normally this isn't fatal because Worker will simply spin its entire
time slice away and we'll eventually schedule our Task-A back in, which
will complete the seqcount and things will work.
But if, for some reason, our Worker was to have RT priority higher than
our Task-A we'd be up some creek without no paddles.
We don't happen to have preemption of IRQs off here? That would fix
things nicely.
Powered by blists - more mailing lists