linux-kernel - RE: RFC: memory leak in udp_table

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1330605218.2465.50.camel@edumazet-laptop>
Date:	Thu, 01 Mar 2012 04:33:38 -0800
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	David Laight <David.Laight@...LAB.COM>
Cc:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	David Miller <davem@...emloft.net>,
	paul.gortmaker@...driver.com, tim.bird@...sony.com,
	kuznet@....inr.ac.ru, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org
Subject: RE: RFC: memory leak in udp_table_init

Le jeudi 01 mars 2012 à 08:55 +0000, David Laight a écrit :
> > > The pid table is a good example of something where a hash
> > > table is unnecessary.
> > > Linux should steal the code I put into NetBSD :-)
> > 
> > On this unrelated topic.  What algorithm did you use on NetBSD for
> > dealing with pids?
> 
> Basically I forced the hash chain length to one by allocating
> a pid that hit an empty entry in the table.
> 
> So you start off with (say) 64 entries and use the low 6
> bits to index the table. The higher bits are incremented
> each time a 'slot' is reused.
> Free entries are kept in a FIFO list.
> So each entry either contains a pointer to the process,
> or the high bits and the index of the next free slot.
> (and the PGID pointer).
> When there are only (say) 2 free entries, then the table
> size is doubled, the pointers moved to the correct places,
> the free ist fixed up, and the minimum number of free entries
> doubled.
> 
> The overall effect:
> - lookup is only ever a mask and index + compare.
> - Allocate is always fast and fixed cost (except when
>   the table size has to be doubled).
> - A pid value will never be reused within (about) 2000
>   allocates (for 16bit pids, much larger for 32bit ones).
> - Allocated pid numbers tend to be random, certainly
>   very difficult to predict.
> - Small memory footprint for small systems.
> For pids we normally avoid issuing large values, but
> will do so to avoid immediate re-use on systems that
> have 1000s of active processes.
> 

You describe a hash table mechanism still, and you made chain lengthes
be 0 or 1.

This GEN_ID/SLOT schem is the one used in IBM AIX for pid allocations
(with a 31 (or was it 32) bit range). They did not use FIFO, because
they tried to use lower slots of the proc table.

Hashes values in network land are unpredictable, so we cannot make sure
a slot contains at most one entry.

Note: If you manage 4 million tcp sockets in your server, hash table
must be at _least_ 4 million slots. I would not call that suboptimal as
you said in your previous mail.

We could argue that default sizes of these hash tables (ip route,
tcp, ...) are now more suited for high end uses instead of
desktop/embedded uses, because ram sizes increased so much last 10
years.

[ We added in commits 0ccfe61803ad & c9503e0fe05 a 512K limits for
tcp/ip route hash tables to stop insanity, but thats it ]

RCU lookups make dynamic resizes of these hash table a bit complex.
IIRC David Miller have submitted a patch but this work was not
completed. Not sure its an issue these days, as we prefer to work on ip
route cache (and its associated hash table) removal anyway.

For UDP/UDPLite it certainly is not an issue, since max size is 65536
slots. On a 32bit kernel, with at more 1GB of LOWMEM, max is 512 slots.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/