Message-ID: <20140801132754.GI19379@twins.programming.kicks-ass.net>
Date:	Fri, 1 Aug 2014 15:27:54 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Ilya Dryomov <ilya.dryomov@...tank.com>
Cc:	Mike Galbraith <umgwanakikbuti@...il.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Ceph Development <ceph-devel@...r.kernel.org>,
	davidlohr@...com, jason.low2@...com, axboe@...nel.dk
Subject: Re: [PATCH] locking/mutexes: Revert "locking/mutexes: Add extra
 reschedule point"

On Fri, Aug 01, 2014 at 04:56:27PM +0400, Ilya Dryomov wrote:
> I'm going to fix up rbd_request_fn(), but I want to make sure
> I understand this in full.
> 
> - Previously, the danger of calling blocking primitives on the way to
>   schedule(), i.e. with task->state != TASK_RUNNING, was that if the
>   blocking primitive did indeed block, the task state would be set
>   back to TASK_RUNNING and the schedule() the task was on its way to
>   wouldn't have any effect.  Your "Add extra reschedule point" patch
>   essentially made calling mutex_lock() and probably others much more
>   wrong than it used to be, because mutex_lock() may now reschedule
>   when the task is not on the mutex wait queue.

Right, and in general we cannot allow spurious wakeups (although we try
very hard to deal with them in generic code, which is why things more or
less worked).
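
To see why the generic code mostly copes: the canonical wait loop
re-checks its condition after every wakeup, so a spurious one only
costs an extra iteration. Minimal sketch ('condition' is hypothetical):

	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (condition)	/* re-checked after every wakeup,  */
			break;	/* so a spurious one is harmless   */
		schedule();
	}
	__set_current_state(TASK_RUNNING);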

But if you do a patch that 'randomly' ignores ->state on schedule() (I
did one), stuff comes apart _real_ quick.

Therefore you should very much not destroy ->state on the way to
schedule().
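
Concretely (hypothetical fragment, not from rbd):

	set_current_state(TASK_UNINTERRUPTIBLE);  /* arm the sleep */

	mutex_lock(&m);      /* may block: puts us back to TASK_RUNNING;
	                      * with the extra reschedule point it can also
	                      * call schedule() right here, while our state
	                      * is still TASK_UNINTERRUPTIBLE and we sit on
	                      * no wait queue at all */
	mutex_unlock(&m);

	schedule();          /* ->state was destroyed; no longer sleeps */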

> - There is nothing wrong with releasing queue_lock and re-enabling IRQs
>   in rbd_request_fn() as long as it doesn't block, and I remember to
>   disable IRQs and take queue_lock back before returning.

Releasing queue_lock might be OK (dunno about the blk locking), but
re-enabling IRQs is actually wrong per blk_flush_plug_list(): it uses
local_irq_save()/restore(), which means it can be called from contexts
that cannot deal with IRQs being enabled, and then your ->request_fn()
goes and does exactly that.

Now maybe blk_flush_plug_list() is overly paranoid and could use
local_irq_disable()/enable() instead; I don't know. But until it does, a
->request_fn() should never re-enable IRQs.
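
IOW, if rbd_request_fn() drops the lock at all it must leave the IRQ
state alone. A rough sketch (the deferral here is made up, purely to
show where the IRQ state matters):

	#include <linux/blkdev.h>

	static void rbd_request_fn(struct request_queue *q)
	{
		struct request *rq;

		while ((rq = blk_fetch_request(q)) != NULL) {
			/* plain spin_unlock(): drop the lock but keep
			 * IRQs disabled; spin_unlock_irq() here would
			 * be exactly the bug discussed above */
			spin_unlock(q->queue_lock);

			/*
			 * Do the (non-blocking!) per-request work, or
			 * better, punt it to a workqueue; either way,
			 * IRQs stay off.
			 */

			spin_lock(q->queue_lock);
		}
	}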

> I'm asking because rbd_request_fn() is probably not the only code path
> broken in this way.  I poked around and found read_events() in aio.c;
> it seems to have been written with the "danger" assumption that
> I outlined above, and there is even a comment to that effect.

I'm fairly sure there's more broken stuff; I didn't dare look.

> Does the above make sense, or am I missing something?

I think that's about it.
