lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 14 Dec 2017 11:19:35 -0800
From:   "tj@...nel.org" <tj@...nel.org>
To:     Bart Van Assche <Bart.VanAssche@....com>
Cc:     "axboe@...nel.dk" <axboe@...nel.dk>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "kernel-team@...com" <kernel-team@...com>,
        "oleg@...hat.com" <oleg@...hat.com>, "hch@....de" <hch@....de>,
        "jianchao.w.wang@...cle.com" <jianchao.w.wang@...cle.com>,
        "osandov@...com" <osandov@...com>
Subject: Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU
 and generation based scheme

Hello,

On Thu, Dec 14, 2017 at 06:51:11PM +0000, Bart Van Assche wrote:
> On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote:
> > rules.  Unfortunatley, it contains quite a few holes.
>           ^^^^^^^^^^^^^
>           Unfortunately?
> 
> > While this change makes REQ_ATOM_COMPLETE synchornization unnecessary
>                                             ^^^^^^^^^^^^^^^
>                                             synchronization?

lol, believe it or not, my english spelling is a lot better than my
korean.  Will fix them.

> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -126,6 +126,8 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
> >  	rq->start_time = jiffies;
> >  	set_start_time_ns(rq);
> >  	rq->part = NULL;
> > +	seqcount_init(&rq->gstate_seq);
> > +	u64_stats_init(&rq->aborted_gstate_sync);
> >  }
> >  EXPORT_SYMBOL(blk_rq_init);
> 
> Sorry but the above change looks ugly to me. My understanding is that 
> blk_rq_init() is only used inside the block layer to initialize legacy block
> layer requests while gstate_seq and aborted_gstate_sync are only relevant
> for blk-mq requests. Wouldn't it be better to avoid that blk_rq_init() is
> called for blk-mq requests such that the above change can be left out? The
> only callers outside the block layer core of blk_rq_init() I know of are
> ide_prep_sense() and scsi_ioctl_reset(). I can help with converting the SCSI
> code if you want.

This is also used by flush path.  We probably should clean that up,
but let's worry about that later cuz flush handling has enough of its
own complications.

> > +	write_seqcount_begin(&rq->gstate_seq);
> > +	blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
> > +	blk_add_timer(rq);
> > +	write_seqcount_end(&rq->gstate_seq);
> 
> My understanding is that both write_seqcount_begin() and write_seqcount_end()
> trigger a write memory barrier. Is a seqcount really faster than a spinlock?

Write memory barrier has no cost on x86 and usually pretty low cost
elsewhere too and they're likely in the same cacheline as other rq
fields.

> > @@ -792,6 +811,14 @@ void blk_mq_rq_timed_out(struct request *req, bool reserved)
> >  		__blk_mq_complete_request(req);
> >  		break;
> >  	case BLK_EH_RESET_TIMER:
> > +		/*
> > +		 * As nothing prevents from completion happening while
> > +		 * ->aborted_gstate is set, this may lead to ignored
> > +		 * completions and further spurious timeouts.
> > +		 */
> > +		u64_stats_update_begin(&req->aborted_gstate_sync);
> > +		req->aborted_gstate = 0;
> > +		u64_stats_update_end(&req->aborted_gstate_sync);
> 
> If a blk-mq request is resubmitted 2**62 times, can that result in the above
> code setting aborted_gstate to the same value as gstate? Isn't that a bug?
> If so, how about setting aborted_gstate in the above code to e.g. gstate ^ (2**63)?

A request gets aborted only if the state is in-flight, 0 isn't that.
Also, how many years would 2^62 times be?

> > +	struct u64_stats_sync aborted_gstate_sync;
> > +	u64 aborted_gstate;
> > +
> >  	unsigned long deadline;
> >  	struct list_head timeout_list;
> 
> Why are gstate and aborted_gstate 64-bit variables? What makes you think that
> 32 bits would not be enough?

Because 32 bits puts it in the rance where a false hit is still
theoretically possible in a reasonable amount of time.

Thanks.

-- 
tejun

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ