linux-kernel - Re: [PATCHSET] block, mempool, percpu: implement percpu mempool and fix blkcg percpu alloc deadlock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20111222154138.d6c583e3.akpm@linux-foundation.org>
Date:	Thu, 22 Dec 2011 15:41:38 -0800
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Tejun Heo <tj@...nel.org>
Cc:	avi@...hat.com, nate@...nel.net, cl@...ux-foundation.org,
	oleg@...hat.com, axboe@...nel.dk, vgoyal@...hat.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCHSET] block, mempool, percpu: implement percpu mempool and
 fix blkcg percpu alloc deadlock

On Thu, 22 Dec 2011 15:24:33 -0800
Tejun Heo <tj@...nel.org> wrote:

> Hello,
> 
> On Thu, Dec 22, 2011 at 03:16:49PM -0800, Andrew Morton wrote:
> > > The ones allocated in the last patch.  blk_group_cpu_stats.
> > 
> > What last patch.
> > 
> > I can find no occurence of "blk_group_cpu_stats" on linux-kernel or in
> > the kernel tree.
> 
> Sorry it's blkio_group_stats_cpu.  It's in the seventh path in this
> series.
> 
> > > The stats are per cgroup - request_queue pair.  We don't want to
> > > allocate for all of them for each combination as there are
> > > configurations with stupid number of request_queues and silly many
> > > cgroups and #cgroups * #request_queue * #cpus can be huge.  So, we
> > > want on-demand allocation.  While the stats are important, they are
> > > not critical and allocations can be opportunistic.  If the allocation
> > > fails this time, we can try it for the next time.
> > 
> > Without code to look at I am at a loss.
> 
> block/blk-cgroup.c blk-throttle.c cfq-iosched.c.  Have fun.
> 
> > request_queues are allocated in blk_alloc_queue_node(), which uses
> > GFP_KERNEL (and also mysteriously takes a gfp_t arg).
> 
> Yeah, sure, we *can* allocate everything for every combination when
> either request_queue or cgroup comes up.  That's the thing I tried to
> explain in the above quoted paragraph.
> 

All the code I'm looking at assumes that blkio_group.stats_cpu is
non-zero.  Won't the kerenl just go splat if that allocation failed?

If the code *does* correctly handle ->stats_cpu == NULL then we have
options.

a) Give userspace a new procfs/debugfs file to start stats gathering
   on a particular cgroup/request_queue pair.  Allocate the stats
   memory in that.

b) Or allocate stats_cpu on the first call to blkio_read_stat_cpu()
   and return zeroes for this first call.

c) Or change the low-level code to do
   blkio_group.want_stats_cpu=true, then test that at the top level
   after we've determined that blkio_group.stats_cpu is NULL.

d) Or, worse, punt the allocation into a workqueue thread.

Note that all these option will permit us to use GFP_KERNEL, which is
better.

Note that a) and b) means that users get control over whether these
stats are accumulated at all, so many won't incur needless memory and
CPU consumption.

I think I like b).  Fix the code so it doesn't oops when ->stats_cpu is
NULL, then turn on stats gathering the first time someone tries to read
the stats.

(Someone appears to have misspelled "throttle" as "throtl" for no
apparent reason about 1000 times.  Sigh.)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/