Message-ID: <45ED1F35.4070600@cosmosbay.com>
Date: Tue, 06 Mar 2007 08:58:45 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: David Miller <davem@...emloft.net>
CC: netdev@...r.kernel.org, robert.olsson@....uu.se, npiggin@...e.de
Subject: Re: [RFC PATCH]: Dynamically sized routing cache hash table.
David Miller wrote:
> From: Eric Dumazet <dada1@...mosbay.com>
> Date: Tue, 06 Mar 2007 08:14:46 +0100
>
>> I wonder... are you sure this has nothing to do with the size of rt_hash_locks /
>> RT_HASH_LOCK_SZ?
>> An entry must map to the same lock in both tables while a resize is in flight.
>> #define MIN_RTHASH_SHIFT LOG2(RT_HASH_LOCK_SZ)
>
> Good point.
>
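The lock constraint can be checked with a small userspace model. As a hypothetical simplification, it assumes the bucket index is hash & (table_size - 1) and the lock index is that bucket index masked with (RT_HASH_LOCK_SZ - 1); lock_index() and same_lock_for_all() are illustrative helpers, not kernel functions. As long as both tables have at least RT_HASH_LOCK_SZ buckets, both masks keep the same low bits of the hash, so an entry lands on the same lock in the old and the new table:

```c
#include <stdint.h>

/* Hypothetical model: bucket = hash masked to the table size,
 * lock = bucket masked to the lock-array size. */
static unsigned int lock_index(uint32_t hash, unsigned int table_shift,
			       unsigned int lock_shift)
{
	uint32_t slot = hash & ((1u << table_shift) - 1);

	return slot & ((1u << lock_shift) - 1);
}

/* Returns 1 if every 16-bit hash maps to the same lock in a table of
 * 2^old_shift buckets and in a table of 2^new_shift buckets. */
static int same_lock_for_all(unsigned int old_shift, unsigned int new_shift,
			     unsigned int lock_shift)
{
	for (uint32_t h = 0; h < (1u << 16); h++)
		if (lock_index(h, old_shift, lock_shift) !=
		    lock_index(h, new_shift, lock_shift))
			return 0;
	return 1;
}
```

With a 256-entry lock array (lock_shift = 8), growing from 2^8 to 2^10 buckets keeps every entry on the same lock, while a 2^4-bucket table does not; that is why the table shift should never go below LOG2(RT_HASH_LOCK_SZ).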
>>> +static struct rt_hash_bucket *rthash_alloc(unsigned int sz)
>>> +{
>>> + struct rt_hash_bucket *n;
>>> +
>>> + if (sz <= PAGE_SIZE)
>>> + n = kmalloc(sz, GFP_KERNEL);
>>> + else if (hashdist)
>>> + n = __vmalloc(sz, GFP_KERNEL, PAGE_KERNEL);
>>> + else
>>> + n = (struct rt_hash_bucket *)
>>> + __get_free_pages(GFP_KERNEL, get_order(sz));
>> I don't feel good about this.
>> Maybe we could try __get_free_pages() first and, on failure, fall back to
>> vmalloc(), keeping a flag so the memory can be freed correctly. In any case, if
>> get_order(sz) >= MAX_ORDER, we know __get_free_pages() will fail.
>
> We have to use vmalloc() for the hashdist case so that the pages
> are spread out properly on NUMA systems. That's exactly what the
> large system hash allocator is going to do on bootup anyways.
Yes, but at bootup an appropriate NUMA memory policy is active. (Well... we
hope so, but it has broken several times in the past.)
I am not sure which mm policy is in effect for scheduled work items (workqueues).
Anyway, I have some XX GB non-NUMA machines, and I would love to be able to
have a 2^20-slot hash table without having to increase MAX_ORDER.
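To put numbers on the 2^20-slot case, assume 8-byte buckets (a single chain pointer on a 64-bit box), 4 KB pages and the default MAX_ORDER of 11. The table then needs 8 MB, which is order 11: one past the largest block the buddy allocator hands out (order 10, i.e. 4 MB). A userspace sketch of that arithmetic; order_for_size() and fits_buddy() are hypothetical helpers modeled on get_order(), not kernel functions:

```c
#include <stddef.h>

/* Hypothetical helper modeled on the kernel's get_order(): the smallest
 * order such that (page_size << order) >= sz. Assumes page_size is a
 * power of two. */
static unsigned int order_for_size(size_t sz, size_t page_size)
{
	unsigned int order = 0;
	size_t block = page_size;

	while (block < sz) {
		block <<= 1;
		order++;
	}
	return order;
}

/* Hypothetical check: the buddy allocator only serves orders strictly
 * below max_order (MAX_ORDER). */
static int fits_buddy(size_t sz, size_t page_size, unsigned int max_order)
{
	return order_for_size(sz, page_size) < max_order;
}
```

For 2^20 eight-byte buckets, order_for_size((size_t)8 << 20, 4096) is 11, so fits_buddy() with max_order 11 returns 0: __get_free_pages() cannot satisfy the request, whereas vmalloc() only needs order-0 pages.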
>
> Look, either both are right or both are wrong. I'm just following
> protocol above and you'll note the PRECISE same logic exists in other
> dynamically growing hash table implementations such as
> net/xfrm/xfrm_hash.c
>
Yes, they are both wrong/dumb :)
Can we be smarter, or do we have to stay dumb? :)
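enum rthash_alloc_kind {	/* hypothetical name for the flag's type */
	allocated_by_kmalloc,
	allocated_by_get_free_pages,
	allocated_by_vmalloc,
};

static struct rt_hash_bucket *rthash_alloc(unsigned int sz,
					   enum rthash_alloc_kind *kind)
{
	struct rt_hash_bucket *n = NULL;

	if (sz <= PAGE_SIZE) {
		n = kmalloc(sz, GFP_KERNEL);
		*kind = allocated_by_kmalloc;
	} else if (!hashdist) {
		/* non-NUMA: try physically contiguous pages first */
		n = (struct rt_hash_bucket *)
			__get_free_pages(GFP_KERNEL, get_order(sz));
		*kind = allocated_by_get_free_pages;
	}
	/* hashdist (spread pages over NUMA nodes), or the attempt above failed */
	if (!n) {
		n = __vmalloc(sz, GFP_KERNEL, PAGE_KERNEL);
		*kind = allocated_by_vmalloc;
	}
	return n;
}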
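The flag only pays off on the free side, which the snippet leaves out. Below is a userspace sketch of the same try-then-fall-back pattern together with its matching free routine. It is an analogy under stated assumptions, not kernel code: malloc() stands in for kmalloc(), posix_memalign() for __get_free_pages(), mmap() for __vmalloc(), and a 4 MB cap mimics MAX_ORDER = 11. All names (table_alloc, table_free, alloc_kind, SIM_*) are hypothetical.

```c
#define _DEFAULT_SOURCE		/* MAP_ANONYMOUS, posix_memalign on glibc */
#include <stdlib.h>
#include <sys/mman.h>

enum alloc_kind { BY_MALLOC, BY_PAGES, BY_MAP };

#define SIM_PAGE_SIZE 4096u
#define SIM_MAX_BLOCK ((size_t)SIM_PAGE_SIZE << 10)	/* largest "order 10" block */

static void *table_alloc(size_t sz, int numa_spread, enum alloc_kind *kind)
{
	void *n = NULL;

	if (sz <= SIM_PAGE_SIZE) {
		n = malloc(sz);			/* plays kmalloc() */
		*kind = BY_MALLOC;
	} else if (!numa_spread && sz <= SIM_MAX_BLOCK) {
		/* plays __get_free_pages(); fails like order >= MAX_ORDER above the cap */
		if (posix_memalign(&n, SIM_PAGE_SIZE, sz) != 0)
			n = NULL;
		*kind = BY_PAGES;
	}
	if (!n) {
		/* plays __vmalloc(): NUMA spreading, or fallback */
		n = mmap(NULL, sz, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (n == MAP_FAILED)
			return NULL;
		*kind = BY_MAP;
	}
	return n;
}

/* The recorded kind selects the matching release routine. */
static void table_free(void *n, size_t sz, enum alloc_kind kind)
{
	if (!n)
		return;
	switch (kind) {
	case BY_MALLOC:
	case BY_PAGES:
		free(n);
		break;
	case BY_MAP:
		munmap(n, sz);
		break;
	}
}
```

A 1 KB request comes back as BY_MALLOC, 1 MB as BY_PAGES, and 8 MB (over the simulated cap) or any request with numa_spread set falls through to BY_MAP; without the stored kind, table_free() could not tell free() from munmap().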