Message-ID: <alpine.LRH.2.02.1510211718310.21723@file01.intranet.prod.int.rdu2.redhat.com>
Date:	Wed, 21 Oct 2015 17:49:20 -0400 (EDT)
From:	Mikulas Patocka <mpatocka@...hat.com>
To:	Ming Lei <tom.leiming@...il.com>
cc:	Mike Snitzer <snitzer@...hat.com>, Jens Axboe <axboe@...nel.dk>,
	Kent Overstreet <kent.overstreet@...il.com>,
	dm-devel@...hat.com,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"Alasdair G. Kergon" <agk@...hat.com>,
	Jeff Moyer <jmoyer@...hat.com>
Subject: Re: [PATCH v3 for-4.4] block: flush queued bios when process blocks
 to avoid deadlock



On Thu, 22 Oct 2015, Ming Lei wrote:

> > Some drivers (dm-snapshot, dm-thin) do acquire a mutex in .make_requests()
> > for every bio. It wouldn't be practical to convert them to not acquire the
> > mutex (and it would also degrade performance of these drivers, if they had
> > to offload every bio to a worker thread that can acquire the mutex).
> 
> Lots of drivers handle I/O in that way, and this way makes AIO not possible
> basically for dm-snapshot.

It doesn't have anything to do with asynchronous I/O. Of course you can do 
asynchronous I/O on dm-snapshot.

> >> Also sometimes it can hurt performance by converting I/O submission
> >> from one context into concurrent contexts of workqueue, especially
> >> in case of sequential I/O, since plug & plug merge can't be used any
> >> more.
> >
> > You can add blk_start_plug/blk_finish_plug to the function
> > bio_alloc_rescue. That would be reasonable to make sure that the requests
> > are merged even when they are offloaded to rescue thread.
> 
> The I/Os submitted from each wq context are no longer continuous,
> so plug merge isn't doable, not to mention the extra context switch
> cost.

If the requests are mergeable, blk_start_plug/blk_finish_plug will merge 
them; if they are not, it won't.
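
As a rough illustration of that point, here is a toy userspace model of a
plug list. It is not kernel code: struct toy_req and toy_plug_add() are
hypothetical names made up for this sketch, and only a simple back-merge
of adjacent sector ranges is modelled. In the kernel the submissions would
be bracketed by blk_start_plug() and blk_finish_plug().

```c
#include <stddef.h>

/* Toy request: a run of contiguous sectors (not the kernel's struct request). */
struct toy_req {
    unsigned long long sector;   /* first sector of the run */
    unsigned int nr_sectors;     /* length of the run in sectors */
};

/*
 * Add rq to a plug list holding n entries. If rq starts exactly where
 * the last queued entry ends, extend that entry (back-merge); otherwise
 * append rq as a separate entry. The caller must make sure the array
 * has room for one more entry. Returns the new number of entries.
 */
static size_t toy_plug_add(struct toy_req *plug, size_t n, struct toy_req rq)
{
    if (n > 0 && plug[n - 1].sector + plug[n - 1].nr_sectors == rq.sector) {
        plug[n - 1].nr_sectors += rq.nr_sectors;   /* mergeable: extend */
        return n;
    }
    plug[n] = rq;                                  /* not mergeable: append */
    return n + 1;
}
```

Adding requests for sectors 0-7 and 8-15 leaves a single 16-sector entry in
the list; a following request at sector 100 is simply queued as a second
entry. Whether merging happens depends only on the requests themselves.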

> This kind of cost can be introduced for all bio devices just for handling
> the unusual case, fair?

Offloading bios to a worker thread when the make_request_fn function 
blocks is required to avoid a deadlock (BTW, the deadlock became more 
common in kernel 4.3 due to the unrestricted size of bios).

The bio list current->bio_list introduces a false locking dependency: 
completion of a bio depends on completion of other bios on 
current->bio_list that are directed to different devices, so it can create 
a circular dependency resulting in a deadlock.

To avoid the circular dependency, each bio must be offloaded to a specific 
workqueue, so that completion of a bio for device A no longer depends on 
completion of another bio for device B.
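
The idea can be sketched with a small userspace model. This is only an
illustration under simplifying assumptions, not the patch itself:
struct toy_bio, struct toy_list and toy_flush_on_block() are hypothetical
names, the pending list stands in for current->bio_list, and the rescue
lists are indexed by a device number, whereas the patch keeps the rescue
list in the bio set (bs->rescue_list) and queues bs->rescue_work. No
threads or locking are modelled.

```c
#include <stddef.h>

#define TOY_MAX 8

/* Toy bio: only remembers which device it is directed to. */
struct toy_bio {
    int dev;
};

/* Fixed-size list of toy bios. */
struct toy_list {
    struct toy_bio items[TOY_MAX];
    size_t n;
};

/*
 * Called when the submitting task is about to block. Every bio still
 * waiting on the task's pending list is moved to the rescue list of its
 * own device, so that it can be processed independently of the blocked
 * task. Bios that cannot be moved (bad device index or full rescue
 * list) stay on the pending list. Returns the number of bios moved.
 */
static size_t toy_flush_on_block(struct toy_list *pending,
                                 struct toy_list *rescue, size_t ndev)
{
    size_t moved = 0, kept = 0;

    for (size_t i = 0; i < pending->n; i++) {
        struct toy_bio bio = pending->items[i];
        int ok = bio.dev >= 0 && (size_t)bio.dev < ndev &&
                 rescue[bio.dev].n < TOY_MAX;

        if (ok) {
            rescue[bio.dev].items[rescue[bio.dev].n++] = bio;
            moved++;
        } else {
            pending->items[kept++] = bio;   /* leave it queued */
        }
    }
    pending->n = kept;
    return moved;
}
```

With three pending bios for devices 0, 1 and 0, the call leaves the pending
list empty, two bios on the rescue list of device 0 and one on the rescue
list of device 1. Nothing queued for device 1 waits behind the task that
blocked while working on device 0.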

> >> > -       queue_work(bs->rescue_workqueue, &bs->rescue_work);
> >> > +               spin_lock(&bs->rescue_lock);
> >> > +               bio_list_add(&bs->rescue_list, bio);
> >> > +               queue_work(bs->rescue_workqueue, &bs->rescue_work);
> >> > +               spin_unlock(&bs->rescue_lock);
> >> > +       }
> >>
> >> Unlike the rescuing path, scheduling out can be quite frequent, and the
> >> above change will switch to submit these I/Os from wq concurrently,
> >> which might hurt performance for sequential I/O.
> >>
> >> Also I am wondering why not submit these I/Os in 'current' context
> >> just like what flush plug does?
> >
> > Processing requests doesn't block (they only take the queue spinlock).
> >
> > Processing bios can block (they can take other mutexes or semaphores), so
> > processing them from the schedule hook is impossible - the bio's
> > make_request function could attempt to take some lock that is already
> > held. So - we must offload the bios to a separate workqueue.
> 
> Yes, so better to just handle dm-snapshot in this way.

All dm targets and almost all other bio-processing drivers can block in 
the make_request_fn function (for example, they may block when allocating 
from a mempool).

Mikulas

> Thanks,
> Ming Lei
> 