linux-kernel - Re: Regression from 2.6.36

Message-ID: <1302177428.3357.25.camel@edumazet-laptop>
Date:	Thu, 07 Apr 2011 13:57:08 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Américo Wang <xiyou.wangcong@...il.com>
Cc:	Jiri Slaby <jslaby@...e.cz>, azurIt <azurit@...ox.sk>,
	linux-kernel@...r.kernel.org, Changli Gao <xiaosuo@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
	linux-fsdevel@...r.kernel.org, Jiri Slaby <jirislaby@...il.com>
Subject: Re: Regression from 2.6.36

On Thursday, 7 April 2011 at 19:21 +0800, Américo Wang wrote:
> On Thu, Apr 7, 2011 at 6:19 PM, Jiri Slaby <jslaby@...e.cz> wrote:
> > Cc'ed a few people.
> >
> > Also, the series which introduced this was discussed at:
> > http://lkml.org/lkml/2010/5/3/53


> >
> 
> I guess this is because lots of fdt are allocated by kmalloc(),
> not vmalloc(), and we kfree() them in the RCU callback.
> 
> How about deferring all of the freeing to the workqueue? I think
> this may hurt performance, though.
> 
> Anyway, like the patch below... makes sense?
> 
> Not-yet-signed-off-by: WANG Cong <xiyou.wangcong@...il.com>
> 
> ---
> diff --git a/fs/file.c b/fs/file.c
> index 0be3447..34dc355 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -96,20 +96,14 @@ void free_fdtable_rcu(struct rcu_head *rcu)
>                                 container_of(fdt, struct files_struct, fdtab));
>                 return;
>         }
> -       if (!is_vmalloc_addr(fdt->fd) && !is_vmalloc_addr(fdt->open_fds)) {
> -               kfree(fdt->fd);
> -               kfree(fdt->open_fds);
> -               kfree(fdt);
> -       } else {
> -               fddef = &get_cpu_var(fdtable_defer_list);
> -               spin_lock(&fddef->lock);
> -               fdt->next = fddef->next;
> -               fddef->next = fdt;
> -               /* vmallocs are handled from the workqueue context */
> -               schedule_work(&fddef->wq);
> -               spin_unlock(&fddef->lock);
> -               put_cpu_var(fdtable_defer_list);
> -       }
> +
> +       fddef = &get_cpu_var(fdtable_defer_list);
> +       spin_lock(&fddef->lock);
> +       fdt->next = fddef->next;
> +       fddef->next = fdt;
> +       schedule_work(&fddef->wq);
> +       spin_unlock(&fddef->lock);
> +       put_cpu_var(fdtable_defer_list);
>  }


Nope, this makes no sense at all.

It's probably the other way around: we want to free those blocks ASAP.

A fix would be to make alloc_fdmem() use vmalloc() if the size is more
than 4 pages, or whatever limit turns out to be appropriate.
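
A minimal userspace sketch of that size check, not the actual
alloc_fdmem() code. It assumes 4096-byte pages and takes the "4 pages"
figure above as the cutoff; pick_fdmem_allocator(), the enum and the
SKETCH_* macros are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages. Kernel code would use PAGE_SIZE instead. */
#define SKETCH_PAGE_SIZE     4096UL
/* Assumption: the "4 pages" cutoff floated above; the right value is open. */
#define SKETCH_KMALLOC_LIMIT (4 * SKETCH_PAGE_SIZE)

/* Which allocator a request of a given size would be routed to. */
enum fdmem_allocator { FDMEM_KMALLOC, FDMEM_VMALLOC };

/*
 * Hypothetical helper: requests larger than the cutoff go to vmalloc(),
 * everything else stays on kmalloc().
 */
static enum fdmem_allocator pick_fdmem_allocator(size_t size)
{
	return size > SKETCH_KMALLOC_LIMIT ? FDMEM_VMALLOC : FDMEM_KMALLOC;
}
```

With this cutoff, a request of exactly 4 pages still goes to kmalloc()
and only larger ones fall through to vmalloc().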

We had a similar memory problem in fib_trie in the past: we force a
synchronize_rcu() every XXX Mbytes allocated to make sure we don't have
too much RAM waiting to be freed in RCU queues.
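
The accounting idea can be sketched in userspace roughly as follows.
This is not the fib_trie code: struct pending_free_account and both
functions are hypothetical, the threshold is left to the caller, and
the forced synchronize_rcu() is modelled simply as resetting the
counter of bytes still waiting to be freed.

```c
#include <stddef.h>

/* Hypothetical bookkeeping for memory queued for deferred (RCU) freeing. */
struct pending_free_account {
	size_t threshold;     /* caller-chosen limit, in bytes */
	size_t pending_bytes; /* bytes queued but not yet known to be freed */
	unsigned long syncs;  /* how many times a wait was forced */
};

/*
 * Stand-in for synchronize_rcu(): in this model, waiting for a grace
 * period means everything queued so far no longer counts as pending.
 */
static void sketch_wait_for_grace_period(struct pending_free_account *acct)
{
	acct->pending_bytes = 0;
	acct->syncs++;
}

/* Record a deferred free and force a wait once the limit is reached. */
static void sketch_account_deferred_free(struct pending_free_account *acct,
					 size_t size)
{
	acct->pending_bytes += size;
	if (acct->pending_bytes >= acct->threshold)
		sketch_wait_for_grace_period(acct);
}
```

The point of the counter is only to bound how much memory can pile up
behind pending callbacks; the cost is an occasional blocking wait in
the path that queues the frees.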






