linux-kernel - Re: [PATCHSET] block, mempool, percpu: implement percpu mempool and fix blkcg percpu alloc deadlock


Message-Id: <20111222151649.de57746f.akpm@linux-foundation.org>
Date:	Thu, 22 Dec 2011 15:16:49 -0800
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Tejun Heo <tj@...nel.org>
Cc:	avi@...hat.com, nate@...nel.net, cl@...ux-foundation.org,
	oleg@...hat.com, axboe@...nel.dk, vgoyal@...hat.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCHSET] block, mempool, percpu: implement percpu mempool and
 fix blkcg percpu alloc deadlock

On Thu, 22 Dec 2011 15:00:47 -0800
Tejun Heo <tj@...nel.org> wrote:

> Hello, Andrew.
> 
> On Thu, Dec 22, 2011 at 02:54:26PM -0800, Andrew Morton wrote:
> > > These stats are userland visible and quite useful ones if blkcg is in
> > > use.  I don't really see how these can be removed.
> > 
> > What stats?
> 
> The ones allocated in the last patch.  blk_group_cpu_stats.

What last patch?

I can find no occurrence of "blk_group_cpu_stats" on linux-kernel or in
the kernel tree.

> > For starters, doing pagetable allocation on the I/O path sounds nutty.
> > 
> > Secondly, GFP_NOIO is a *weaker* allocation mode than GFP_KERNEL.  By
> > permitting it with this patchset, we have a kernel which is more likely
> > to get oom failures.  Fixing the kernel to not perform GFP_NOIO
> > allocations for these counters will result in a more robust kernel. 
> > This is a good thing, which improves the kernel while avoiding adding
> > more complexity elsewhere.
> > 
> > This patchset is the worst option and we should try much harder to avoid
> > applying it!
> 
> The stats are per cgroup - request_queue pair.  We don't want to
> allocate for all of them for each combination as there are
> configurations with stupid number of request_queues and silly many
> cgroups and #cgroups * #request_queue * #cpus can be huge.  So, we
> want on-demand allocation.  While the stats are important, they are
> not critical and allocations can be opportunistic.  If the allocation
> fails this time, we can try it for the next time.

Without code to look at I am at a loss.

request_queues are allocated in blk_alloc_queue_node(), which uses
GFP_KERNEL (and also mysteriously takes a gfp_t arg).

> So, yeah, the suggested solution fits the problem.  If you have a
> better idea, please don't be shy.

Unsure which solution you're referring to here.
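
For reference, the on-demand, opportunistic allocation described in the
quoted text above (allocate the stats for a cgroup/request_queue pair on
first use, and if the allocation fails, skip the update and try again next
time) can be sketched in userland C. This is only an illustration under
stated assumptions, not code from the patchset: struct stat_group,
struct cpu_stats, group_account(), the alloc_fn hook and the fixed NR_CPUS
are all hypothetical names, and calloc() stands in for the kernel's percpu
allocator and its gfp flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4	/* hypothetical fixed CPU count for this sketch */

/* Hypothetical per-CPU counters standing in for the stats under discussion. */
struct cpu_stats {
	unsigned long sectors;
};

/* One instance per (cgroup, request_queue) pair; stats start unallocated. */
struct stat_group {
	struct cpu_stats *stats;	/* NULL until an allocation succeeds */
	unsigned long dropped;		/* updates skipped while stats were missing */
};

/* Allocator hook so that a failure can be simulated; NULL means failure. */
typedef void *(*alloc_fn)(size_t nmemb, size_t size);

static void *alloc_ok(size_t nmemb, size_t size)
{
	return calloc(nmemb, size);
}

static void *alloc_fail(size_t nmemb, size_t size)
{
	(void)nmemb;
	(void)size;
	return NULL;
}

/*
 * Account @sectors on @cpu.  The stats array is allocated on first use.
 * A failed allocation is not treated as an error: the update is dropped
 * and the next call tries the allocation again.
 * Returns true if the update was recorded.
 */
static bool group_account(struct stat_group *g, int cpu,
			  unsigned long sectors, alloc_fn alloc)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	if (!g->stats) {
		g->stats = alloc(NR_CPUS, sizeof(*g->stats));
		if (!g->stats) {
			g->dropped++;
			return false;
		}
	}
	g->stats[cpu].sectors += sectors;
	return true;
}
```

The sketch deliberately leaves out the part this thread is arguing about,
namely which allocation mode is safe to use from the I/O path; it only
shows the "allocate lazily, tolerate failure, retry later" shape.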
