Message-ID: <f67b46158be286ebfad2215022d2c903b4156388.camel@linux.intel.com>
Date: Wed, 04 Sep 2024 14:37:20 +0300
From: Tero Kristo <tero.kristo@...ux.intel.com>
To: Ming Lei <ming.lei@...hat.com>
Cc: Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 1/2] bdev: add support for CPU latency PM QoS tuning

On Fri, 2024-08-30 at 22:26 +0800, Ming Lei wrote:
> On Fri, Aug 30, 2024 at 02:55:56PM +0300, Tero Kristo wrote:
> > On Thu, 2024-08-29 at 05:37 -0600, Jens Axboe wrote:
> > > On 8/29/24 1:18 AM, Tero Kristo wrote:
> > > > diff --git a/block/bio.c b/block/bio.c
> > > > index e9e809a63c59..6c46d75345d7 100644
> > > > --- a/block/bio.c
> > > > +++ b/block/bio.c
> > > > @@ -282,6 +282,8 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
> > > >  	bio->bi_max_vecs = max_vecs;
> > > >  	bio->bi_io_vec = table;
> > > >  	bio->bi_pool = NULL;
> > > > +
> > > > +	bdev_update_cpu_latency_pm_qos(bio->bi_bdev);
> > > >  }
> > > >  EXPORT_SYMBOL(bio_init);
> > >
> > > This is entirely the wrong place to do this, presumably it should
> > > be done at IO dispatch time, not when something initializes a bio.
> > >
> > > And also feels like entirely the wrong way to go about this,
> > > adding overhead to potentially each IO dispatch, of which there
> > > can be millions per second.
> >
> > Any thoughts on where it could/should be added?
> >
> > I moved the bdev_* callback from bio_init to the location below,
> > and it also seems to work there:
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 3b4df8e5ac9e..d97a3a4252de 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -2706,6 +2706,7 @@ static void __blk_mq_flush_plug_list(struct request_queue *q,
> >  {
> >  	if (blk_queue_quiesced(q))
> >  		return;
> > +	bdev_update_cpu_latency_pm_qos(q->disk->part0);
> >  	q->mq_ops->queue_rqs(&plug->mq_list);
>
> The IO submission CPU may not be the same as the completion CPU, so
> this approach looks wrong.
>
> What you are trying to do is to prevent the IO completion CPU from
> entering deep idle states while block IOs are in flight.
>
> Only fast devices care about this CPU latency, so maybe you just need
> to call some generic helper in the driver (NVMe), and you may have to
> figure out the exact IO completion CPU for each hardware queue with
> inflight IOs.
>
> Thanks,
> Ming
>
Thanks for the feedback, I've updated my patch to work in the NVMe
driver instead, taking the queue CPU affinity into account. I will send
a separate RFC of that out soonish to the corresponding list.
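
Roughly, the direction is something like the sketch below (the helper
name, the per-CPU bookkeeping and the exact hook point in the NVMe
submission/completion path are placeholders for illustration, not the
final patch). The idea is to put a resume-latency constraint on the
CPUs in a hardware queue's IRQ affinity mask while IOs are in flight:

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/percpu.h>
#include <linux/pm_qos.h>

/* One resume-latency request per CPU, added lazily on first use. */
struct nvme_cpu_lat_req {
	struct dev_pm_qos_request req;
	bool active;
};

static DEFINE_PER_CPU(struct nvme_cpu_lat_req, nvme_cpu_lat_reqs);

/*
 * Constrain the resume latency of the CPUs that will service this
 * hardware queue's completion interrupt, so they do not drop into
 * deep idle states while IOs are in flight. Releasing the constraint
 * again (e.g. restoring PM_QOS_RESUME_LATENCY_NO_CONSTRAINT once the
 * queue goes idle) is left out of this sketch, as is any locking.
 */
static void nvme_update_cpu_latency_qos(const struct cpumask *irq_affinity,
					s32 latency_limit_us)
{
	int cpu;

	for_each_cpu(cpu, irq_affinity) {
		struct nvme_cpu_lat_req *r =
			per_cpu_ptr(&nvme_cpu_lat_reqs, cpu);

		if (!r->active) {
			dev_pm_qos_add_request(get_cpu_device(cpu), &r->req,
					       DEV_PM_QOS_RESUME_LATENCY,
					       latency_limit_us);
			r->active = true;
		} else {
			dev_pm_qos_update_request(&r->req, latency_limit_us);
		}
	}
}
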
-Tero