Message-ID: <1433920246.7916.121.camel@haakon3.risingtidesystems.com>
Date: Wed, 10 Jun 2015 00:10:46 -0700
From: "Nicholas A. Bellinger" <nab@...ux-iscsi.org>
To: Christoph Hellwig <hch@....de>
Cc: "Nicholas A. Bellinger" <nab@...erainc.com>,
target-devel <target-devel@...r.kernel.org>,
linux-scsi <linux-scsi@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Hannes Reinecke <hare@...e.de>,
Sagi Grimberg <sagig@...lanox.com>
Subject: Re: [RFC 0/2] target: Add TFO->complete_irq queue_work bypass

On Tue, 2015-06-09 at 09:19 +0200, Christoph Hellwig wrote:
> On Thu, Jun 04, 2015 at 12:06:09AM -0700, Nicholas A. Bellinger wrote:
> > So I've been using tcm_loop + RAMDISK backends for prototyping, but this
> > patch is intended for vhost-scsi so it can avoid the unnecessary
> > queue_work() context switch within target_complete_cmd() for all backend
> > driver types.
> >
> > This is because vhost_work_queue() is just updating vhost_dev->work_list
> > and immediately wake_up_process() into a different vhost_worker()
> > process context. For heavy small block workloads into fast IBLOCK
> > backends, avoiding this extra context switch should be a nice efficiency
> > win.
>
> How about trying to merge the two workers instead?
>

IIRC, vhost.c has an existing requirement that completions run within
a single kernel thread context for each vhost_dev context -> vhost-scsi
WWPN.
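
To make the hand-off concrete, here is a minimal userspace sketch (not
kernel code) of the two completion paths discussed in this thread. Every
model_* name is hypothetical and exists only for illustration. The counter
tallies hand-offs to another execution context, assuming the default path
takes one hop through target-core's deferred completion work and a second
hop into the single vhost worker, while the bypass takes only the second.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the kernel's types. */
struct model_cmd {
	int id;
	int ctx_switches;	/* hand-offs to another execution context */
};

struct model_fabric_ops {
	/* Optional hook modeled on the RFC's complete_irq idea; NULL = none. */
	void (*complete_irq)(struct model_cmd *cmd);
	/* Response path reached from the deferred completion work. */
	void (*queue_response)(struct model_cmd *cmd);
};

/*
 * Stands in for adding the command to the per-device work list and
 * waking the single worker thread that runs all completions.
 */
static void model_vhost_work_queue(struct model_cmd *cmd)
{
	cmd->ctx_switches++;
}

/* Stands in for the completion entry point called by the backend. */
static void model_complete_cmd(const struct model_fabric_ops *ops,
			       struct model_cmd *cmd)
{
	assert(ops != NULL && cmd != NULL);

	if (ops->complete_irq) {
		/* Bypass: hand off directly to the fabric's worker. */
		ops->complete_irq(cmd);
		return;
	}

	/* Default: defer to completion work first (one hand-off)... */
	cmd->ctx_switches++;
	/* ...which then calls into the fabric's response path. */
	ops->queue_response(cmd);
}
```

In this model a fabric that leaves complete_irq as NULL pays two hand-offs
per command, and one that points it at model_vhost_work_queue pays one.
Either way the command still ends up in the single worker context, which
is the requirement noted above.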
> > Perhaps tcm_loop LLD code should just be limited to RAMDISK here..?
>
> I'd prefer to not do it especially for the loopback code, as that
> should serve as a simple example.
Fair enough.

> But before making further judgement I'd really like to see the numbers.
>
Sure, will include some performance + context switch results for -v2.

> Note that something that might help much more is getting rid of
> the remaining irq or bh disabling spinlocks in the target core,
> as that tends to introduce a lot of additional latency. Moving
> additional code to hardirq context is fairly diametrical to that
> design.

Within the for-next RCU-enabled target code, the three spinlocks that
disable irqs in the fast-path submit + completion paths are:

* se_cmd->t_state_lock:

Used to update se_cmd->transport_state within target_complete_cmd() from
backend driver irq context, and when passing se_cmd ownership back to
fabric driver code via fast-path transport_cmd_check_stop() response
completion.

Still required while iblock backends are calling target_complete_cmd()
from irq context.

* se_device->execute_task_lock:

Used for tracking device TMR tasks. Completion path called from irq
context in transport_cmd_check_stop() -> target_remove_from_state_list()
when passing se_cmd ownership back to fabric driver.

transport_generic_free_cmd() needs to obtain this lock during se_cmd
exception status too, if the failure occurs before se_cmd->execute_cmd()
submission happens.

* se_session->sess_cmd_lock:

Originally required for tcm_qla2xxx, where qla_hw_data->hardware_lock
must be held while performing the initial per se_session shutdown of
outstanding + active se_cmd_list entries. Other HW LLD fabric drivers
also need this when target-core is responsible for active I/O
shutdown.

However, not all fabric drivers need to disable irq while acquiring this
specific lock.