Date:	Sat, 12 Mar 2011 00:51:46 -0500
From:	Matthew Wilcox <willy@...ux.intel.com>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: [REVIEW] NVM Express driver

On Fri, Mar 11, 2011 at 02:29:19PM -0800, Andi Kleen wrote:
> Matthew Wilcox <willy@...ux.intel.com> writes:
> > +
> > +static struct nvme_queue *get_nvmeq(struct nvme_ns *ns)
> > +{
> > +	int qid, cpu = get_cpu();
> > +	if (cpu < ns->dev->queue_count)
> > +		qid = cpu + 1;
> > +	else
> > +		qid = (cpu % rounddown_pow_of_two(ns->dev->queue_count)) + 1;
> 
> This will likely be a full divide; better to use a mask.
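
(Side note, not from the posted patch: rounddown_pow_of_two() always
returns a power of two, so the modulo above can indeed be written as a
mask.  A minimal illustration:

	int qsize = rounddown_pow_of_two(ns->dev->queue_count);

	qid = (cpu & (qsize - 1)) + 1;	/* same result as cpu % qsize */

For non-negative cpu the two forms are equivalent.)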

I have a TODO to replace this calculation with a lookup; I've discovered
that not even all Intel systems number their CPUs in the same logical
fashion.  For example, one two-socket system puts cpu 0 in socket 0,
cpu 1 in socket 1, cpu 2 in socket 0, and so on, while another has cpus
0-3 in socket 0, cpus 4-7 in socket 1, and cpus 8-15 as the HT siblings
of 0-7.

Is there a good API to iterate through each socket, then each core in a
socket, then each HT sibling?  E.g., if I have 20 queues and 2x6x2 CPUs
(two sockets, six cores per socket, two HT threads per core), I want to
assign at least one queue to each core; some threads will get their own
queues and others will have to share with their HT sibling.
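
For what it's worth, here is a rough sketch of one way to build such a
per-cpu lookup with the topology helpers from <linux/topology.h>.  The
function name nvme_assign_queues() and the cpu_to_qid array are invented
for illustration, and this simple version only hands out one queue per
core, so HT siblings always share:

static void nvme_assign_queues(struct nvme_dev *dev, int *cpu_to_qid)
{
	int cpu, sibling, core = 0;
	cpumask_var_t seen;

	if (!zalloc_cpumask_var(&seen, GFP_KERNEL))
		return;

	for_each_online_cpu(cpu) {
		if (cpumask_test_cpu(cpu, seen))
			continue;	/* core already covered via a sibling */
		/* this core and all of its HT siblings share the next queue */
		for_each_cpu(sibling, topology_thread_cpumask(cpu)) {
			cpu_to_qid[sibling] = (core % dev->queue_count) + 1;
			cpumask_set_cpu(sibling, seen);
		}
		core++;
	}
	free_cpumask_var(seen);
}

If the assignment needs to go socket by socket, the cpus could first be
grouped by topology_physical_package_id() and topology_core_id() before
handing out queue ids.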

> > +	nprps = DIV_ROUND_UP(length, PAGE_SIZE);
> > +	npages = DIV_ROUND_UP(8 * nprps, PAGE_SIZE);
> > +	prps = kmalloc(sizeof(*prps) + sizeof(__le64 *) * npages, GFP_ATOMIC);
> > +	prp_page = 0;
> > +	if (nprps <= (256 / 8)) {
> > +		pool = dev->prp_small_pool;
> > +		prps->npages = 0;
> 
> Unchecked GFP_ATOMIC allocation? That will oops soon.
> Besides, GFP_ATOMIC is a very risky thing to do in a low-memory
> situation, which can trigger writeouts.

Ah yes, thank you.  There are a few other places like this.  Bizarrely,
they've not oopsed during the xfstests runs.
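
The immediate fix, of course, is just to check the return value before
touching prps; against the quoted snippet that would look something like:

	prps = kmalloc(sizeof(*prps) + sizeof(__le64 *) * npages, GFP_ATOMIC);
	if (!prps)
		return NULL;	/* hypothetical error path; real recovery is up to the caller */

with the actual error handling depending on how the surrounding code is
structured.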

My plan for this is, instead of using a mempool, to submit partial I/Os
in the rare cases where a write cannot allocate memory.  I have the
design in my head, just not committed to code yet.  The design also
avoids allocating any memory in the driver for I/Os that do not cross
a page boundary.
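
To make the single-page case concrete: an NVMe command carries two PRP
entries inline, so a transfer that sits entirely within one page can be
described by prp1 alone and needs no PRP list allocation at all.  A
rough sketch (nvme_setup_inline_prp() is invented for illustration, and
the struct/field names are my reading of the posted patch, so treat them
as assumptions):

static bool nvme_setup_inline_prp(struct nvme_command *cmd,
				  dma_addr_t dma_addr, unsigned length)
{
	unsigned offset = dma_addr & (PAGE_SIZE - 1);

	if (offset + length > PAGE_SIZE)
		return false;	/* crosses a page; needs a PRP list */

	cmd->rw.prp1 = cpu_to_le64(dma_addr);
	cmd->rw.prp2 = 0;
	return true;
}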

