Message-ID: <937d4a14-050a-56f1-8af9-2c7acdd5ae86@arista.com>
Date: Fri, 11 Aug 2023 22:02:47 +0100
From: Dmitry Safonov <dima@...sta.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: David Ahern <dsahern@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
    Jakub Kicinski <kuba@...nel.org>, "David S. Miller" <davem@...emloft.net>,
    linux-kernel@...r.kernel.org, Andy Lutomirski <luto@...capital.net>,
    Ard Biesheuvel <ardb@...nel.org>, Bob Gilligan <gilligan@...sta.com>,
    Dan Carpenter <error27@...il.com>, David Laight <David.Laight@...lab.com>,
    Dmitry Safonov <0x7f454c46@...il.com>, Donald Cassidy <dcassidy@...hat.com>,
    Eric Biggers <ebiggers@...nel.org>, "Eric W. Biederman" <ebiederm@...ssion.com>,
    Francesco Ruggeri <fruggeri05@...il.com>, "Gaillardetz, Dominik" <dgaillar@...na.com>,
    Herbert Xu <herbert@...dor.apana.org.au>, Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
    Ivan Delalande <colona@...sta.com>, Leonard Crestez <cdleonard@...il.com>,
    Salam Noureddine <noureddine@...sta.com>, "Tetreault, Francois" <ftetreau@...na.com>,
    netdev@...r.kernel.org, Steen Hegelund <Steen.Hegelund@...rochip.com>
Subject: Re: [PATCH v9 net-next 01/23] net/tcp: Prepare tcp_md5sig_pool for TCP-AO

Hi Eric,

On 8/8/23 10:58, Eric Dumazet wrote:
> On Wed, Aug 2, 2023 at 7:27 PM Dmitry Safonov <dima@...sta.com> wrote:
[..]
>> -        hash = crypto_alloc_ahash("md5", 0, CRYPTO_ALG_ASYNC);
>> -        if (IS_ERR(hash))
>> -                return;
>> -
>> -        for_each_possible_cpu(cpu) {
>> -                void *scratch = per_cpu(tcp_md5sig_pool, cpu).scratch;
>> -                struct ahash_request *req;
>> -
>> -                if (!scratch) {
>> -                        scratch = kmalloc_node(sizeof(union tcp_md5sum_block) +
>> -                                               sizeof(struct tcphdr),
>> -                                               GFP_KERNEL,
>> -                                               cpu_to_node(cpu));
>> -                        if (!scratch)
>> -                                return;
>> -                        per_cpu(tcp_md5sig_pool, cpu).scratch = scratch;
>> -                }
>> -                if (per_cpu(tcp_md5sig_pool, cpu).md5_req)
>> -                        continue;
>> -
>> -                req = ahash_request_alloc(hash, GFP_KERNEL);
>> -                if (!req)
>> -                        return;
>> -
>> -                ahash_request_set_callback(req, 0, NULL, NULL);
>> -
>> -                per_cpu(tcp_md5sig_pool, cpu).md5_req = req;
>> +        scratch_size = sizeof(union tcp_md5sum_block) + sizeof(struct tcphdr);
>> +        ret = tcp_sigpool_alloc_ahash("md5", scratch_size);
>> +        if (ret >= 0) {
>> +                tcp_md5_sigpool_id = ret;
>
> tcp_md5_alloc_sigpool() can be called from multiple cpus,
> yet you are writing over tcp_md5_sigpool_id here without any
> spinlock/mutex/annotations ?

Yeah, it's write-only: as long as there is a TCP-MD5 key in the system,
the sigpool id returned by tcp_sigpool_alloc_ahash() stays the same.
The only time a different value gets written is when all previous MD5
keys have been released.

> KCSAN would eventually file a report, and a compiler might very well
> transform this to
>
> tcp_md5_sigpool_id = random_value;
> <window where readers might catch garbage>
> tcp_md5_sigpool_id = ret;

Agree, I hadn't thought about the optimizing-compiler nightmares.
Will add WRITE_ONCE().
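Something along these lines, i.e. annotating the store and pairing it
with READ_ONCE() on the lookup side (just a sketch of the annotation,
not the actual next-version hunk; the reader-side helper and the
initializer are made up for illustration):

static int tcp_md5_sigpool_id = -1;    /* initializer is illustrative */

/* Writer, slow path: (re)allocate the pool and publish its id. */
static int tcp_md5_alloc_sigpool(void)
{
        size_t scratch_size;
        int ret;

        scratch_size = sizeof(union tcp_md5sum_block) + sizeof(struct tcphdr);
        ret = tcp_sigpool_alloc_ahash("md5", scratch_size);
        if (ret >= 0) {
                /* Paired with READ_ONCE() on the reader side, so the
                 * compiler can neither tear the store nor publish a
                 * temporary value in between.
                 */
                WRITE_ONCE(tcp_md5_sigpool_id, ret);
                return 0;
        }
        return ret;
}

/* Reader, fast path (hypothetical helper, just to show the pairing). */
static int tcp_md5_get_sigpool_id(void)
{
        return READ_ONCE(tcp_md5_sigpool_id);
}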
[..]

>> +
>> +/**
>> + * sigpool_reserve_scratch - re-allocates scratch buffer, slow-path
>> + * @size: request size for the scratch/temp buffer
>> + */
>> +static int sigpool_reserve_scratch(size_t size)
>> +{
>> +        struct scratches_to_free *stf;
>> +        size_t stf_sz = struct_size(stf, scratches, num_possible_cpus());
>
> This is wrong. num_possible_cpus() could be 2, with two cpus numbered 0 and 8.
>
> Look for nr_cpu_ids instead.

Hmm, but it seems num_possible_cpus() was exactly what I wanted?
As free_old_scratches() does:

: while (stf->cnt--)
:         kfree(stf->scratches[stf->cnt]);

So that's just the number of previously allocated scratches, not a
per-CPU index.
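To spell it out (a sketch reconstructed from the hunks above; the exact
struct layout and the final kfree(stf) are my reading of the patch, not
a verbatim quote):

struct scratches_to_free {
        struct rcu_head rcu;
        unsigned int cnt;
        void *scratches[];
};

static void free_old_scratches(struct rcu_head *head)
{
        struct scratches_to_free *stf;

        stf = container_of(head, struct scratches_to_free, rcu);
        /* scratches[] gets filled sequentially via stf->cnt++ while walking
         * for_each_possible_cpu(), so at most num_possible_cpus() slots are
         * ever used; the index is a running count of saved old buffers,
         * never a CPU number (which can go up to nr_cpu_ids - 1 on a
         * sparse possible mask).
         */
        while (stf->cnt--)
                kfree(stf->scratches[stf->cnt]);
        kfree(stf);
}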
>> +        int cpu, err = 0;
>> +
>> +        lockdep_assert_held(&cpool_mutex);
>> +        if (__scratch_size >= size)
>> +                return 0;
>> +
>> +        stf = kmalloc(stf_sz, GFP_KERNEL);
>> +        if (!stf)
>> +                return -ENOMEM;
>> +        stf->cnt = 0;
>> +
>> +        size = max(size, __scratch_size);
>> +        cpus_read_lock();
>> +        for_each_possible_cpu(cpu) {
>> +                void *scratch, *old_scratch;
>> +
>> +                scratch = kmalloc_node(size, GFP_KERNEL, cpu_to_node(cpu));
>> +                if (!scratch) {
>> +                        err = -ENOMEM;
>> +                        break;
>> +                }
>> +
>> +                old_scratch = rcu_replace_pointer(per_cpu(sigpool_scratch, cpu),
>> +                                scratch, lockdep_is_held(&cpool_mutex));
>> +                if (!cpu_online(cpu) || !old_scratch) {
>> +                        kfree(old_scratch);
>> +                        continue;
>> +                }
>> +                stf->scratches[stf->cnt++] = old_scratch;
>> +        }
>
> I wonder why you are not simply using something around __alloc_percpu()

I presume that was a heritage of __tcp_alloc_md5sig_pool(), which
allocated scratches this way. In pre-crypto-cloning versions the per-CPU
ahash_requests were allocated with alloc_percpu().

But now, looking closer, if I read the pcpu allocator code correctly, it
uses pcpu_mem_zalloc() to populate per-CPU chunks, which depending on
pcpu_chunk_struct_size may be allocated with vmalloc() or kzalloc().
IOW, as long as CONFIG_NEED_PER_CPU_KM != n, which is:

: config NEED_PER_CPU_KM
:         depends on !SMP || !MMU

the memory returned by the pcpu allocator has no guarantee of coming
from the direct map, which I presume defeats the purpose of having any
scratch/temporary area for sg/crypto. Am I missing something?

>> +        cpus_read_unlock();
>> +        if (!err)
>> +                __scratch_size = size;
>> +
>> +        call_rcu(&stf->rcu, free_old_scratches);
>> +        return err;
>> +}
>> +
>> +static void sigpool_scratch_free(void)
>> +{
>> +        int cpu;
>> +
>> +        for_each_possible_cpu(cpu)
>> +                kfree(rcu_replace_pointer(per_cpu(sigpool_scratch, cpu),
>> +                                          NULL, lockdep_is_held(&cpool_mutex)));
>
> I wonder why bothering about freeing some space ? One scratch buffer
> per cpu is really small.

When all TCP-MD5/TCP-AO keys are removed, why not free some memory?

[..]

>> +        /* slow-path */
>> +        mutex_lock(&cpool_mutex);
>> +        ret = sigpool_reserve_scratch(scratch_size);
>> +        if (ret)
>> +                goto out;
>> +        for (i = 0; i < cpool_populated; i++) {
>> +                if (!cpool[i].alg)
>> +                        continue;
>> +                if (strcmp(cpool[i].alg, alg))
>> +                        continue;
>> +
>> +                if (kref_read(&cpool[i].kref) > 0)
>> +                        kref_get(&cpool[i].kref);
>> +                else
>> +                        kref_init(&cpool[i].kref);
>
> This looks wrong. kref_init() should be called once, after allocation,
> not based on kref_read().

Well, as long as it's the slow-path and cpool_mutex was grabbed, this
essentially re-allocates an entry whose last reference was released but
which wasn't yet destroyed by __cpool_free_entry(). I think, if I
interpreted it correctly, this was suggested by Jakub:
https://lore.kernel.org/all/20230106175326.2d6a4dcd@kernel.org/T/#u

[..]

>> +static void __cpool_free_entry(struct sigpool_entry *e)
>> +{
>> +        crypto_free_ahash(e->hash);
>> +        kfree(e->alg);
>> +        memset(e, 0, sizeof(*e));
>> +}
>> +
>> +static void cpool_cleanup_work_cb(struct work_struct *work)
>
> Really this looks over complicated to me.
>
> All this kref maintenance and cpool_mutex costs :/

Hmm, unsure: I can remove this sigpool thing and just allocate a crypto
tfm for every new setsockopt(). Currently the pool allocates one tfm per
hash algorithm; without it, there would be one tfm per setsockopt()/key.
The currently supported scale is thousands of keys per socket, which
means that with sizeof(struct crypto_ahash) == 104, dropping the pool
would increase memory consumption by hundreds of kilobytes to megabytes.
I thought the pool was a reasonable thing to do.
Should I proceed with the TCP-AO patches without this manager thing?
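To put rough numbers on it (the key count below is a hypothetical
"thousands of keys per socket" box; only the sizeof(struct crypto_ahash)
== 104 figure above is measured):

#include <stdio.h>

int main(void)
{
        /* 104 bytes per tfm is the figure quoted above;
         * 10,000 keys is an illustrative deployment size.
         */
        const unsigned long tfm_sz = 104, keys = 10000;

        printf("tfm per key:       %lu bytes (~%lu KiB)\n",
               keys * tfm_sz, keys * tfm_sz / 1024);
        printf("tfm per algorithm: %lu bytes, independent of key count\n",
               tfm_sz);
        return 0;
}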
>> +{
>> +        bool free_scratch = true;
>> +        unsigned int i;
>> +
>> +        mutex_lock(&cpool_mutex);
>> +        for (i = 0; i < cpool_populated; i++) {
>> +                if (kref_read(&cpool[i].kref) > 0) {
>> +                        free_scratch = false;
>> +                        continue;
>> +                }
>> +                if (!cpool[i].alg)
>> +                        continue;
>> +                __cpool_free_entry(&cpool[i]);
>> +        }
>> +        if (free_scratch)
>> +                sigpool_scratch_free();
>> +        mutex_unlock(&cpool_mutex);
>> +}

[..]

Thanks,
Dmitry