lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 10 Nov 2014 15:22:44 +1100
From:	Dave Chinner <david@...morbit.com>
To:	Dan Williams <dan.j.williams@...el.com>
Cc:	Jens Axboe <axboe@...com>,
	"Jason B. Akers" <jason.b.akers@...el.com>,
	IDE/ATA development list <linux-ide@...r.kernel.org>,
	"Karkra, Kapil" <kapil.karkra@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 0/5] Enable use of Solid State Hybrid Drives

[Been distrcted with other issues, so just getting back to this.]

On Thu, Oct 30, 2014 at 10:07:47AM -0700, Dan Williams wrote:
> On Thu, Oct 30, 2014 at 12:21 AM, Dave Chinner <david@...morbit.com> wrote:
> > On Wed, Oct 29, 2014 at 03:24:11PM -0700, Dan Williams wrote:
> >> On Wed, Oct 29, 2014 at 3:09 PM, Dave Chinner <david@...morbit.com> wrote:
> >> > On Wed, Oct 29, 2014 at 03:10:51PM -0600, Jens Axboe wrote:
> >> >> As for the fs accessing this, the io nice fields are readily exposed
> >> >> through the ->bi_rw setting. So while the above example uses ionice to
> >> >> set a task io priority (that a bio will then inherit), nothing prevents
> >> >> you from passing it in directly from the kernel.
> >> >
> >> > Right, but now the filesystem needs to provide that on a per-inode
> >> > basis, not from the task structure as the task that is submitting
> >> > the bio is not necesarily the task doing the read/write syscall.
> >> >
> >> > e.g. the write case above doesn't actually inherit the task priority
> >> > at the bio level at all because the IO is being dispatched by a
> >> > background flusher thread, not the ioniced task calling write(2).
> >>
> >> When the ioniced task calling write(2) inserts the page into the page
> >> cache then the current priority is recorded in the struct page.  The
> >
> > It does? Can you point me to where the page cache code does this,
> > because I've clearly missed something important go by in the past
> > few months...
> 
> Sorry, should have been more clear that this patch set added that
> capability in patch-4.  The idea is to claim some unused extended page
> flags to stash priority bits.  Yes, the PageSetAdvice() helper needs
> to be fixed up to do the flags update atomically, and yes this
> precludes hinting on 32-bit platforms.  I also think that
> bio_add_page() is the better place to read the per-page priority into
> the bio.  We felt ok deferring these items until after the initial
> RFC.

I think that using page flags for this is a 'orrible idea. Yeah,
it's a neat hack that you can use for proff of concept
demonstrations, but my biggest concern is that it isn't a scalable
channel for carrying IO priority information through the page cache.
e.g. it can't carry existing ionice priority scheduling information,
it can't carry blkcg IO control information, etc.

So, really, I think that this buffered write IO priority issue is
bigger than this patch series, and we need to solve it properly
rather than hack ugly special cases into core infrastructure
that are an evolutionary dead-end....

> >> > IOWs, to make effective use of this the task will need different
> >> > cache hints for each different type of data needs to do IO on, and
> >> > so overloading IO priorities just seems the wrong direction to be
> >> > starting from.
> >>
> >> There's also the fadvise() enabling that could be bolted on top of
> >> this capability.  But, before that step, is a thread-id per-caching
> >> context too much to ask?
> >
> > If we do it that way, we are stuck with it forever. So let's get our
> > ducks in line first before pulling the trigger...
> 
> Are you objecting to ionice as the interface or per-pid based hinting
> in general?

Neither. It's the implementation I don't like.

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ