linux-kernel - Re: [PATCH 1/2] irq: sparseirq enabling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20081124144007.GA30725@elte.hu>
Date:	Mon, 24 Nov 2008 15:40:07 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Yinghai Lu <yinghai@...nel.org>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] irq: sparseirq enabling

* Yinghai Lu <yinghai@...nel.org> wrote:

> +/*
> + * Protect the sparse_irqs_free freelist:
> + */
> +static DEFINE_SPINLOCK(sparse_irq_lock);
> +LIST_HEAD(sparse_irqs_head);
> +
> +/*
> + * The sparse irqs are in a hash-table as well, for fast lookup:
> + */
> +#define SPARSEIRQHASH_BITS          (13 - 1)
> +#define SPARSEIRQHASH_SIZE          (1UL << SPARSEIRQHASH_BITS)
> +#define __sparseirqhashfn(key)      hash_long((unsigned long)key, SPARSEIRQHASH_BITS)
> +#define sparseirqhashentry(key)     (sparseirqhash_table + __sparseirqhashfn((key)))
> +
> +static struct list_head sparseirqhash_table[SPARSEIRQHASH_SIZE];
> +

> +struct irq_desc *irq_to_desc(unsigned int irq)
> +{
> +	struct irq_desc *desc;
> +	struct list_head *hash_head;
> +
> +	hash_head = sparseirqhashentry(irq);
> +
> +	/*
> +	 * We can walk the hash lockfree, because the hash only
> +	 * grows, and we are careful when adding entries to the end:
> +	 */
> +	list_for_each_entry(desc, hash_head, hash_entry) {
> +		if (desc->irq == irq)
> +			return desc;
> +	}
> +
> +	return NULL;
> +}

I have talked to Thomas about the current design of sparseirq, and the 
current code looks pretty good already, but we'd like to suggest one 
more refinement to the hashing scheme:

Please simplify it by using a sparse pointers static array instead of 
a full hash. I.e. do it as a:

       struct irq_desc *irq_desc_ptrs[NR_IRQS] __read_mostly;

This can scale up just fine to 4096 CPUs without measurable impact to 
the kernel image size on x86 distro kernels.

Data structure rules:

1) allocation: an entry would be allocated at IRQ number creation time 
   - never at any other time. Pure access to the desc returns NULL - 
   it's atomic and lockless.

2) freeing: we never free them. If a system allocates an irq_desc[], 
   it signals that it uses it. This makes all access lockless.

3) lookup: irq_to_desc() just does a simple irq_desc_ptrs[irq]
   dereference (inlined) - no locking or hash lookup needed. Since 
   irq_desc_ptrs[] is accessed __read_mostly, this will scale really 
   well even on NUMA.

4) iterators: we still keep NR_IRQS as a limit for the
   irq_desc_ptrs[] array - but it would never be used directly by any 
   iteration loop, in generic code.

5) bootup, pre-kmalloc irq_desc access: on x86 we should preallocate a 
   pool of ~32 irq_desc entries for allocation use. This can be a
   simple static array of struct irq_desc that is fed into the
   first 32 legacy irq_desc[] slots or so. No bootmem-alloc and
   no after_bootmem flaggery needed. All SMP init and secondary CPU 
   irq desc allocation happens after kmalloc is active already - 
   cleaning up the allocation path.

6) limits on x86: please scale NR_IRQS to this rule:
               max(32*MAX_IO_APICS, 8*NR_CPUS)

   That gives a minimum of 4096 IRQs on 64-bit systems. On even larger 
   systems, we scale linearly up to 32K IRQs on 4K CPUs. That should 
   be more than enough (and it can be increased if really needed).

Please keep all the irq_desc[] abstraction APIs and cleanups you did 
so far - they are great. In the far future we can still make irq_desc 
a full hash if needed - but right now we'll be just fine with such a 
simpler scheme as well, and scale fine up to 16K CPUs or so.

This should simplify quite a few things in the sparseirq code.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/