[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <YrOx/eG/JZpRv22m@google.com>
Date: Thu, 23 Jun 2022 09:21:17 +0900
From: Sergey Senozhatsky <senozhatsky@...omium.org>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
Minchan Kim <minchan@...nel.org>,
Nitin Gupta <ngupta@...are.org>, linux-kernel@...r.kernel.org,
linux-block@...r.kernel.org
Subject: Re: [PATCH] zram: do not lookup algorithm in backends table
On (22/06/22 12:19), Andrew Morton wrote:
> > On (22/06/22 11:35), Sergey Senozhatsky wrote:
> > > Always use crypto_has_comp() so that crypto can lookup module,
> > > call usermodhelper to load the modules, wait for usermodhelper
> > > to finish and so on. Otherwise crypto will do all of these steps
> > > under CPU hot-plug lock and this looks like too much stuff to
> > > handle under the CPU hot-plug lock. Besides this can end up in
> > > a deadlock when usermodhelper triggers a code path that attempts
> > > to lock the CPU hot-plug lock, that zram already holds.
> >
> > And we think that we (not exactly "we", our partners) actually
> > see a deadlock. It goes something like this:
> >
> > - path A. zram grabs CPU hot-plug lock, execs /sbin/modprobe from crypto
> > and waits for modprobe to finish
>
> Nope, can't do that.
>
> > disksize_store
> > zcomp_create
> > __cpuhp_state_add_instance
> > __cpuhp_state_add_instance_cpuslocked
> > zcomp_cpu_up_prepare
> > crypto_alloc_base
> > crypto_alg_mod_lookup
> > call_usermodehelper_exec
> > wait_for_completion_killable
> > do_wait_for_common
> > schedule
>
> The usermode helper is free to do anything it wants, including
> operations that take the CPU hotplug lock. Or operations which might
> in the future be changed to take that lock.
Agreed.
> > - path B. async work kthread that brings in scsi device. It wants to
> > register CPUHP states at some point, and it needs the CPU hot-plug
> > lock for that, which is owned by zram.
> >
> > async_run_entry_fn
> > scsi_probe_and_add_lun
> > scsi_mq_alloc_queue
> > blk_mq_init_queue
> > blk_mq_init_allocated_queue
> > blk_mq_realloc_hw_ctxs
> > __cpuhp_state_add_instance
> > __cpuhp_state_add_instance_cpuslocked
> > mutex_lock
> > schedule
> >
> > - path C. modprobe sleeps, waiting for all aync works to finish.
> >
> > load_module
> > do_init_module
> > async_synchronize_full
> > async_synchronize_cookie_domain
> > schedule
> >
> > And none can make any progress.
> >
> > So I think we need to move crypto_alg_mod_lookup()->call_usermodehelper_exec()
> > out of CPU hot-plug lock and pre-load modules in advance, before we grab the
> > hot-plug lock.
>
> If the locking is fixed, why is there still a need to preload modules?
We "fix" locking by doing initial crypto compression algorithm lookup
outside of hot-plug lock (pre-load).
Crypto API handles a list of preloaded modules internally. What we do
currently, we call crypto_alloc_base() under hot-plug lock, which calls
crypto_alg_mod_lookup(), which figures out that crypto modules list does
not contain that module yet so then it modprobes it.
With this patch we do the first crypto_alg_mod_lookup() outside of
hot-plug lock, so that it safely modprobes compression module. Then
when we grab the hot-plug lock and setup per-CPU streams,
crypto_alloc_base()->crypto_alg_mod_lookup() figures that module is
already on the list so no modprobe is needed.
Powered by blists - more mailing lists