linux-kernel - Re: Read starvation by sync writes

Message-ID: <20121213123032.GB18843@quack.suse.cz>
Date:	Thu, 13 Dec 2012 13:30:32 +0100
From:	Jan Kara <jack@...e.cz>
To:	Jeff Moyer <jmoyer@...hat.com>
Cc:	Jens Axboe <axboe@...nel.dk>, Jan Kara <jack@...e.cz>,
	Shaohua Li <shli@...nel.org>, linux-fsdevel@...r.kernel.org,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: Read starvation by sync writes

On Wed 12-12-12 14:41:13, Jeff Moyer wrote:
> Jeff Moyer <jmoyer@...hat.com> writes:
> 
> >> I agree. This isn't about scheduling, we haven't even reached that part
> >> yet. Back when we split the queues into read vs write, this problem
> >> obviously wasn't there. Now we have sync writes and reads, both eating
> >> from the same pool. The io scheduler can impact this a bit by forcing
> >> reads to must allocate (Jan, which io scheduler are you using?). CFQ
> >> does this when it's expecting a request from this process queue.
> >>
> >> Back in the day, we used to have one list. To avoid a similar problem,
> >> we reserved the top of the list for reads. With the batching, it's a bit
> >> more complicated. If we make the request allocation (just that, not the
> >> scheduling) be read vs write instead of sync vs async, then we have the
> >> same issue for sync vs buffered writes.
> >>
> >> How about something like the below? Due to the nature of sync reads, we
> >> should allow a much longer timeout. The batch is really tailored towards
> >> writes at the moment. Also shrink the batch count, 32 is pretty large...
> >
> > Does batching even make sense for dependent reads?  I don't think it
> > does.
> 
> Having just read the batching code in detail, I'd like to amend this
> misguided comment.  Batching logic kicks in when you happen to be lucky
> enough to use up the last request.  As such, I'd be surprised if the
> patch you posted helped.  Jens, don't you think the writer is way more
> likely to become the batcher?  I do agree with shrinking the batch count
> to 16, whether or not the rest of the patch goes in.
  Well, batching logic also triggers unconditionally after you waited for
a request...
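
To make the two triggers concrete, here is a toy model in Python. It is
not the kernel code, only a sketch of the behaviour discussed in this
thread: a task becomes a batcher either when it takes the last free
request (the case Jeff describes) or after it has slept waiting for one
(the case noted above). The names IoContext, RequestPool, try_get and
after_wait are hypothetical, and the 20 ms window is an assumption; only
the batch count of 32 is taken from the thread.

```python
from dataclasses import dataclass

BATCH_REQ = 32     # batch count mentioned in the thread
BATCH_TIME = 0.02  # assumed batching window in seconds (illustrative)


@dataclass
class IoContext:
    """Per-task batching state (hypothetical, simplified)."""
    nr_batch_requests: int = 0
    last_waited: float = 0.0


def set_batching(ioc, now):
    """Mark the task as a batcher starting at time `now`."""
    ioc.nr_batch_requests = BATCH_REQ
    ioc.last_waited = now


def is_batching(ioc, now):
    """A fresh batcher always gets one request; after that the window applies."""
    if ioc.nr_batch_requests == BATCH_REQ:
        return True
    return ioc.nr_batch_requests > 0 and now < ioc.last_waited + BATCH_TIME


class RequestPool:
    """Toy request pool shared by readers and sync writers."""

    def __init__(self, nr_requests):
        self.nr_requests = nr_requests
        self.count = 0
        self.full = False

    def try_get(self, ioc, now):
        """Return True if a request was allocated, False if the task must wait."""
        if self.count + 1 >= self.nr_requests:
            if not self.full:
                # Trigger 1: the task taking the last request becomes the batcher.
                set_batching(ioc, now)
                self.full = True
            elif not is_batching(ioc, now):
                return False
        self.count += 1
        if is_batching(ioc, now):
            ioc.nr_batch_requests -= 1
        return True

    def after_wait(self, ioc, now):
        """Trigger 2: a task that slept waiting for a request becomes a batcher."""
        set_batching(ioc, now)
```

In this model a writer that fills the pool becomes the batcher by the
first trigger, while a reader that finds the pool full is refused until
it has waited once; after after_wait() it may allocate again. The model
leaves out freeing of requests and any hard cap on the pool size.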

> >  Assuming you disagree, then you'll have to justify that fixed
> > time value of 2 seconds.  The amount of time between dependent reads
> > will vary depending on other I/O sent to the device, the properties of
> > the device, the I/O scheduler, and so on.  If you do stick 2 seconds in
> > there, please comment it.  Maybe it's time we started keeping track of
> > worst case Q->C time?  That could be used to tell worst case latency,
> > and adjust magic timeouts like this one.
> >
> > I'm still thinking about how we might solve this in a cleaner way.
> 
> The way things stand today, you can do a complete end run around the I/O
> scheduler by queueing up enough I/O.  To address that, I think we need
> to move to a request list per io_context as Jan had suggested.  That
> way, we can keep the logic about who gets to submit I/O when in one
> place.
> 
> Jens, what do you think?
> 
> Jan, for now, try bumping nr_requests up really high.  ;-)
  Good idea. I tried bumping nr_requests to 100000 (so one can queue 50 GB
of streaming writes - one transaction can carry several hundred MB) and the
reader can now progress at a reasonable speed. So it indeed worked.
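
For reference, the 50 GB figure can be reproduced with a small Python
sketch. The 512 KiB maximum request size is an assumption inferred from
the numbers above (50 GB / 100000 requests); it is not stated in this
mail. The helper names are hypothetical. The nr_requests file lives
under /sys/block/<dev>/queue/, and writing to it needs root.

```python
import os

KIB = 1024


def max_queued_bytes(nr_requests, max_request_bytes=512 * KIB):
    """Upper bound on queued data if every request is full-sized (assumed 512 KiB)."""
    return nr_requests * max_request_bytes


def set_nr_requests(dev, value, sysfs_root="/sys"):
    """Write a new queue depth to <sysfs_root>/block/<dev>/queue/nr_requests."""
    path = os.path.join(sysfs_root, "block", dev, "queue", "nr_requests")
    with open(path, "w") as f:
        f.write(str(value))
    return path


# 100000 requests at the assumed 512 KiB each, expressed in GB (10**9 bytes).
queued_gb = max_queued_bytes(100_000) / 10**9
```

With these assumptions queued_gb comes out a little above 52, which
matches the rough 50 GB estimate. The sysfs_root parameter exists only
so the helper can be tried against a scratch directory instead of a
live device.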

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR