Message-ID: <20080212082839.GA216917@sgi.com>
Date:	Tue, 12 Feb 2008 00:28:39 -0800
From:	Jeremy Higdon <jeremy@....com>
To:	David Chinner <dgc@....com>
Cc:	Jens Axboe <jens.axboe@...cle.com>, Nick Piggin <npiggin@...e.de>,
	linux-kernel@...r.kernel.org, Alan.Brunelle@...com,
	arjan@...ux.intel.com
Subject: Re: IO queuing and complete affinity with threads (was Re: [PATCH 0/8] IO queuing and complete affinity)

On Mon, Feb 11, 2008 at 04:22:11PM +1100, David Chinner wrote:
> 
> What I think Nick is referring to is my comment that, at a higher
> layer (e.g. filesystems), migrating completions to the submitter
> CPU may be exactly the wrong thing to do. I don't recall making
> any comments on migrating submitters - others have already
> commented on that, so I'll ignore it for the moment and try to
> explain why completion on the submitter CPU /may/ be bad.
> 
> For example, in the case of XFS, completing on the submitter CPU is
> fine for data I/O but wrong for transaction I/O. We want to direct
> all transaction completions to as few CPUs as possible (one,
> ideally) so that all the completion processing happens on the same
> CPU, rather than bouncing global cachelines and locks between all
> the CPUs taking completion interrupts.

So what you want is all XFS processing (for a given filesystem,
presumably) on a limited set of cores (ideally 1) and all block
and SCSI processing (for a given device) on a similarly limited
set.
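
In today's workqueue terms, roughly the following (purely a sketch --
xlog_completion, xlog_completion_cpu and the helpers are made-up names,
not actual XFS code): punt the transaction completion work to one
designated CPU instead of running it on whichever CPU took the
completion interrupt.

/*
 * Illustrative sketch only -- not real XFS code.  All transaction
 * completions are queued to a single designated CPU so the global
 * log/AIL cachelines and locks stay local to that CPU.
 */
#include <linux/kernel.h>
#include <linux/workqueue.h>

struct xlog_completion {			/* hypothetical per-I/O state */
	struct work_struct work;
	/* ... transaction completion state ... */
};

static int xlog_completion_cpu;			/* e.g. chosen at mount time */

static void xlog_complete_worker(struct work_struct *work)
{
	struct xlog_completion *lc =
		container_of(work, struct xlog_completion, work);

	/* all global log/AIL manipulation now happens on one CPU */
	/* ... process lc ... */
}

/* called from the I/O completion path, on whatever CPU took the IRQ */
static void xlog_queue_completion(struct xlog_completion *lc)
{
	INIT_WORK(&lc->work, xlog_complete_worker);
	queue_work_on(xlog_completion_cpu, system_wq, &lc->work);
}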

On Altix, that was far more important than having the interrupt
and issue CPU be close to the hardware -- at least with typical
LSI or Qlogic controllers where there are only one or two MMIO
reads per command issued, and completions can be stacked up.

There is still an advantage to being close to the hardware, but
a much bigger advantage to not bouncing cachelines.

Maybe what you want is a multistage completion mechanism where
each stage can run on a different CPU, if thread context switches
are cheaper than bouncing data structures around....
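
Something like this, say (again just a sketch with made-up names, using
queue_work_on() as the hand-off): stage one does the per-request cleanup
right where the interrupt landed, then hands the rest off as a work item
to whichever CPU the filesystem has nominated, so the cacheline bouncing
is confined to one hand-off per I/O.

/*
 * Sketch of a two-stage completion -- made-up names, not a real API.
 * Stage 1 runs in the completion interrupt, close to the HBA; stage 2
 * always runs on the CPU the filesystem has chosen for its global state.
 */
#include <linux/kernel.h>
#include <linux/workqueue.h>

struct io_done {
	struct work_struct stage2;		/* filesystem-level completion */
	/* ... request / buffer state ... */
};

static int fs_completion_cpu;			/* e.g. picked per filesystem */

static void fs_stage2(struct work_struct *w)
{
	struct io_done *d = container_of(w, struct io_done, stage2);

	/* global per-fs structures touched only on fs_completion_cpu */
	/* ... process d ... */
}

/* stage 1: called from the block/SCSI completion path */
static void blk_stage1(struct io_done *d)
{
	/* per-request, per-device cleanup stays local to this CPU */
	/* ... */

	/* hand the rest off to the filesystem's designated CPU */
	INIT_WORK(&d->stage2, fs_stage2);
	queue_work_on(fs_completion_cpu, system_wq, &d->stage2);
}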

jeremy
