Message-ID: <492AFEF9.609@kernel.org>
Date: Mon, 24 Nov 2008 11:22:33 -0800
From: Yinghai Lu <yinghai@...nel.org>
To: Ingo Molnar <mingo@...e.hu>
CC: Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>,
Andrew Morton <akpm@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] irq: sparseirq enabling
Ingo Molnar wrote:
> * Yinghai Lu <yinghai@...nel.org> wrote:
>
>> +/*
>> + * Protect the sparse_irqs_free freelist:
>> + */
>> +static DEFINE_SPINLOCK(sparse_irq_lock);
>> +LIST_HEAD(sparse_irqs_head);
>> +
>> +/*
>> + * The sparse irqs are in a hash-table as well, for fast lookup:
>> + */
>> +#define SPARSEIRQHASH_BITS (13 - 1)
>> +#define SPARSEIRQHASH_SIZE (1UL << SPARSEIRQHASH_BITS)
>> +#define __sparseirqhashfn(key) hash_long((unsigned long)key, SPARSEIRQHASH_BITS)
>> +#define sparseirqhashentry(key) (sparseirqhash_table + __sparseirqhashfn((key)))
>> +
>> +static struct list_head sparseirqhash_table[SPARSEIRQHASH_SIZE];
>> +
>
>> +struct irq_desc *irq_to_desc(unsigned int irq)
>> +{
>> + struct irq_desc *desc;
>> + struct list_head *hash_head;
>> +
>> + hash_head = sparseirqhashentry(irq);
>> +
>> + /*
>> + * We can walk the hash lockfree, because the hash only
>> + * grows, and we are careful when adding entries to the end:
>> + */
>> + list_for_each_entry(desc, hash_head, hash_entry) {
>> + if (desc->irq == irq)
>> + return desc;
>> + }
>> +
>> + return NULL;
>> +}
>
> I have talked to Thomas about the current design of sparseirq, and the
> current code looks pretty good already, but we'd like to suggest one
> more refinement to the hashing scheme:
>
> Please simplify it by using a sparse pointers static array instead of
> a full hash. I.e. do it as a:
>
> struct irq_desc *irq_desc_ptrs[NR_IRQS] __read_mostly;
>
> This can scale up just fine to 4096 CPUs without measurable impact to
> the kernel image size on x86 distro kernels.
>
> Data structure rules:
>
> 1) allocation: an entry would be allocated at IRQ number creation time
> - never at any other time. Pure access to the desc returns NULL -
> it's atomic and lockless.
>
> 2) freeing: we never free them. If a system allocates an irq_desc[],
> it signals that it uses it. This makes all access lockless.
>
> 3) lookup: irq_to_desc() just does a simple irq_desc_ptrs[irq]
> dereference (inlined) - no locking or hash lookup needed. Since
> irq_desc_ptrs[] is accessed __read_mostly, this will scale really
> well even on NUMA.
>
> 4) iterators: we still keep NR_IRQS as a limit for the
> irq_desc_ptrs[] array - but it would never be used directly by any
> iteration loop, in generic code.
>
> 5) bootup, pre-kmalloc irq_desc access: on x86 we should preallocate a
> pool of ~32 irq_desc entries for allocation use. This can be a
> simple static array of struct irq_desc that is fed into the
> first 32 legacy irq_desc[] slots or so. No bootmem-alloc and
> no after_bootmem flaggery needed. All SMP init and secondary CPU
> irq desc allocation happens after kmalloc is active already -
> cleaning up the allocation path.
>
> 6) limits on x86: please scale NR_IRQS to this rule:
> max(32*MAX_IO_APICS, 8*NR_CPUS)
>
> That gives a minimum of 4096 IRQs on 64-bit systems. On even larger
> systems, we scale linearly up to 32K IRQs on 4K CPUs. That should
> be more than enough (and it can be increased if really needed).
>
> Please keep all the irq_desc[] abstraction APIs and cleanups you did
> so far - they are great. In the far future we can still make irq_desc
> a full hash if needed - but right now we'll be just fine with such a
> simpler scheme as well, and scale fine up to 16K CPUs or so.
>
> This should simplify quite a few things in the sparseirq code.
OK, will change to that.
Two options for where MSI irq numbers come from:
1. nr_irqs will be the legacy/GSI IRQ count, and MSI will start taking irq numbers upward from nr_irqs, or
2. should MSI keep taking irq numbers from NR_IRQS - 1 downward?
YH
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/