Message-ID: <20080212082839.GA216917@sgi.com>
Date:	Tue, 12 Feb 2008 00:28:39 -0800
From:	Jeremy Higdon <jeremy@....com>
To:	David Chinner <dgc@....com>
Cc:	Jens Axboe <jens.axboe@...cle.com>, Nick Piggin <npiggin@...e.de>,
	linux-kernel@...r.kernel.org, Alan.Brunelle@...com,
	arjan@...ux.intel.com
Subject: Re: IO queuing and complete affinity with threads (was Re: [PATCH 0/8] IO queuing and complete affinity)

On Mon, Feb 11, 2008 at 04:22:11PM +1100, David Chinner wrote:
> 
> What I think Nick is referring to is my comment that, at a higher
> layer (e.g. filesystems), migrating completions to the submitter
> CPU may be exactly the wrong thing to do. I don't recall making
> any comments on migrating submitters - others have already
> commented on that, so I'll ignore it for the moment and try to
> explain why completion on the submitter CPU /may/ be bad.
> 
> For example, in the case of XFS, completing on the submitter CPU is
> fine for data I/O but wrong for transaction I/O. We want to direct
> all transaction completions to as few CPUs as possible (one,
> ideally) so that all the completion processing happens on the same
> CPU, rather than bouncing global cachelines and locks between all
> the CPUs taking completion interrupts.

So what you want is all XFS processing (for a given filesystem,
presumably) on a limited set of cores (ideally 1) and all block
and SCSI processing (for a given device) on a similarly limited
set.
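
In today's workqueue terms, roughly the following (purely a sketch --
xlog_completion, xlog_completion_cpu and the helpers are made-up names,
not actual XFS code): punt the transaction completion work to one
designated CPU instead of running it on whichever CPU took the
completion interrupt.

/*
 * Illustrative sketch only -- not real XFS code.  All transaction
 * completions are queued to a single designated CPU so the global
 * log/AIL cachelines and locks stay local to that CPU.
 */
#include <linux/kernel.h>
#include <linux/workqueue.h>

struct xlog_completion {			/* hypothetical per-I/O state */
	struct work_struct work;
	/* ... transaction completion state ... */
};

static int xlog_completion_cpu;			/* e.g. chosen at mount time */

static void xlog_complete_worker(struct work_struct *work)
{
	struct xlog_completion *lc =
		container_of(work, struct xlog_completion, work);

	/* all global log/AIL manipulation now happens on one CPU */
	/* ... process lc ... */
}

/* called from the I/O completion path, on whatever CPU took the IRQ */
static void xlog_queue_completion(struct xlog_completion *lc)
{
	INIT_WORK(&lc->work, xlog_complete_worker);
	queue_work_on(xlog_completion_cpu, system_wq, &lc->work);
}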

On Altix, that was far more important than having the interrupt
and issue CPU be close to the hardware -- at least with typical
LSI or Qlogic controllers where there are only one or two MMIO
reads per command issued, and completions can be stacked up.

There is still an advantage to being close to the hardware, but
a much bigger advantage to not bouncing cachelines.

Maybe what you want is a multistage completion mechanism where
each stage can run on a different CPU, if thread context switches
are cheaper than bouncing data structures around....
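
Something like this, say (again just a sketch with made-up names, using
queue_work_on() as the hand-off): stage one does the per-request cleanup
right where the interrupt landed, then hands the rest off as a work item
to whichever CPU the filesystem has nominated, so the cacheline bouncing
is confined to one hand-off per I/O.

/*
 * Sketch of a two-stage completion -- made-up names, not a real API.
 * Stage 1 runs in the completion interrupt, close to the HBA; stage 2
 * always runs on the CPU the filesystem has chosen for its global state.
 */
#include <linux/kernel.h>
#include <linux/workqueue.h>

struct io_done {
	struct work_struct stage2;		/* filesystem-level completion */
	/* ... request / buffer state ... */
};

static int fs_completion_cpu;			/* e.g. picked per filesystem */

static void fs_stage2(struct work_struct *w)
{
	struct io_done *d = container_of(w, struct io_done, stage2);

	/* global per-fs structures touched only on fs_completion_cpu */
	/* ... process d ... */
}

/* stage 1: called from the block/SCSI completion path */
static void blk_stage1(struct io_done *d)
{
	/* per-request, per-device cleanup stays local to this CPU */
	/* ... */

	/* hand the rest off to the filesystem's designated CPU */
	INIT_WORK(&d->stage2, fs_stage2);
	queue_work_on(fs_completion_cpu, system_wq, &d->stage2);
}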

jeremy
