linux-kernel - Re: [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access

Message-ID: <s5h36eqmtf3.wl-tiwai@suse.de>
Date:   Thu, 14 Nov 2019 15:20:00 +0100
From:   Takashi Iwai <tiwai@...e.de>
To:     Chih-Yang Hsia <paulhsia@...omium.org>
Cc:     linux-kernel@...r.kernel.org, Mark Brown <broonie@...nel.org>,
        Takashi Iwai <tiwai@...e.com>, alsa-devel@...a-project.org
Subject: Re: [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access

On Thu, 14 Nov 2019 15:16:04 +0100,
Chih-Yang Hsia wrote:
> 
> On Wed, Nov 13, 2019 at 7:36 PM Takashi Iwai <tiwai@...e.de> wrote:
> >
> > On Wed, 13 Nov 2019 10:47:51 +0100,
> > Takashi Iwai wrote:
> > >
> > > On Wed, 13 Nov 2019 08:24:41 +0100,
> > > Chih-Yang Hsia wrote:
> > > >
> > > > On Wed, Nov 13, 2019 at 2:16 AM Takashi Iwai <tiwai@...e.de> wrote:
> > > > >
> > > > > On Tue, 12 Nov 2019 18:17:13 +0100,
> > > > > paulhsia wrote:
> > > > > >
> > > > > > Since
> > > > > > - snd_pcm_detach_substream sets runtime to null without stream lock and
> > > > > > - snd_pcm_period_elapsed checks the nullity of the runtime outside of
> > > > > >   stream lock.
> > > > > >
> > > > > > This will trigger null memory access in snd_pcm_running() call in
> > > > > > snd_pcm_period_elapsed.
> > > > >
> > > > > Well, if a stream is detached, it means that the stream must have been
> > > > > already closed; i.e. it's already a clear bug in the driver that
> > > > > snd_pcm_period_elapsed() is called against such a stream.
> > > > >
> > > > > Or am I missing other possible case?
> > > > >
> > > > >
> > > > > thanks,
> > > > >
> > > > > Takashi
> > > > >
> > > >
> > > > In a multithreaded environment, the `interrupt_handler` (from irq) and
> > > > the `substream close` (from snd_pcm_release) can run at the same time.
> > > > Therefore, in a driver implementation, if the "substream close function"
> > > > and the "code section that calls snd_pcm_period_elapsed()" do not hold
> > > > the same lock, then the following things can happen:
> > > >
> > > > 1. interrupt_handler -> goes into snd_pcm_period_elapsed with a valid
> > > > substream pointer
> > > > 2. snd_pcm_release_substream: call close without blocking
> > > > 3. snd_pcm_release_substream: call snd_pcm_detach_substream and set
> > > > substream->runtime to NULL
> > > > 4. interrupt_handler -> call snd_pcm_running() and crash while
> > > > accessing fields in `substream->runtime`
> > > >
> > > > e.g. in the intel8x0.c driver for ac97 devices, `snd_pcm_period_elapsed`
> > > > is called after checking `ichdev->substream` in `snd_intel8x0_update`.
> > > > If a `snd_pcm_release` call from alsa-lib passes through close() and
> > > > reaches snd_pcm_detach_substream() in another thread, it's possible
> > > > to trigger a crash.
> > > > I can reproduce the issue easily within a multithreaded VM.
> > > >
> > > > My patches try to provide basic protection for this situation
> > > > (an internal pcm lock between detach and elapsed), since
> > > > - the usage of `snd_pcm_period_elapsed` does not warn callers about
> > > > the possible race if the driver does not enforce the ordering between
> > > > `calling snd_pcm_period_elapsed` and `close` with a lock, and
> > > > - lots of drivers already have this hidden issue and I can't fix them
> > > > one by one (You can check the "snd_pcm_period_elapsed usage" and the
> > > > "close implementation" within all the drivers). The most common
> > > > mistake is that
> > > >   - the driver checks whether the substream is null and then calls into
> > > > snd_pcm_period_elapsed,
> > > >   - but `close` can happen at any time and return without blocking, and
> > > > snd_pcm_detach_substream will be triggered right after it
> > >
> > > Thanks, point taken.  While this argument is valid and it's good to
> > > harden the PCM core side, the concurrent calls are basically a bug,
> > > and we'd need another fix anyway.  Also, patch 2 makes little
> > > sense; there can't be multiple close calls racing with each other.  So
> > > I'll go for taking your fix but only the first patch.
> > >
> > > Back to this race: the surfaced issue is, as you pointed out, the race
> > > between snd_pcm_period_elapsed() vs close call.  However, the
> > > fundamental problem is the pending action after the PCM trigger-stop
> > > call.  Since the PCM trigger doesn't block nor wait until the hardware
> > > actually stops the things, the driver may go to the other step even
> > > after this "supposed-to-be-stopped" point.  In your case, it goes up
> > > to close, and crashes.  If we had a sync-stop operation, the interrupt
> > > handler should have finished before moving to the close stage, hence
> > > such a race could be avoided.
> > >
> > > It's been a long-known problem, and some drivers have their own
> > > implementation for stop-sync.  I think it's time to investigate and
> > > start implementing the fundamental solution.
> >
> > BTW, what we need essentially for intel8x0 is to just call
> > synchronize_irq() before closing, at best in hw_free procedure:
> >
> > --- a/sound/pci/intel8x0.c
> > +++ b/sound/pci/intel8x0.c
> > @@ -923,8 +923,10 @@ static int snd_intel8x0_hw_params(struct snd_pcm_substream *substream,
> >
> >  static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream)
> >  {
> > +       struct intel8x0 *chip = snd_pcm_substream_chip(substream);
> >         struct ichdev *ichdev = get_ichdev(substream);
> >
> > +       synchronize_irq(chip->irq);
> >         if (ichdev->pcm_open_flag) {
> >                 snd_ac97_pcm_close(ichdev->pcm);
> >                 ichdev->pcm_open_flag = 0;
> >
> >
> > The same would be needed also at the beginning of the prepare, as the
> > application may restart the stream without release.
> >
> > My idea is to add sync_stop PCM ops and call it from PCM core at
> > snd_pcm_prepare() and snd_pcm_hw_free().
> >
> Will adding synchronize_irq() in snd_pcm_hw_free there fix the race issue?
> Is a sequence like the following possible?
> - [Thread 1] snd_pcm_hw_free: just passes synchronize_irq()
> - [Thread 2] another interrupt comes -> snd_intel8x0_update() -> goes
> into the lock region of snd_pcm_period_elapsed() and passes the
> PCM_RUNTIME_CHECK (right before snd_pcm_running())

This shouldn't happen because, by the time snd_pcm_hw_free() is called,
the stream is already in the SETUP state; i.e. the PCM trigger callback
has already programmed the hardware not to generate the PCM stream IRQ.


Takashi
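
The ordering argued for above (stop the IRQ source first, then wait for any
handler that is still running, and only then tear down) can be sketched in
plain userspace C. This is a minimal model and not kernel code: the names
model_chip, model_irq_handler, model_trigger_stop, model_synchronize_irq and
model_close are hypothetical, and the busy-wait only stands in for what the
real synchronize_irq() does in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_chip {
    atomic_bool irq_armed;       /* set while the "hardware" may raise PCM IRQs */
    atomic_int in_flight;        /* number of handlers currently executing */
    unsigned long *period_count; /* stands in for substream->runtime */
};

static void model_chip_init(struct model_chip *c, unsigned long *counter)
{
    assert(counter != NULL);
    atomic_init(&c->irq_armed, true);
    atomic_init(&c->in_flight, 0);
    c->period_count = counter;
}

/* Models the interrupt handler; returns true if it touched the counter. */
static bool model_irq_handler(struct model_chip *c)
{
    bool handled = false;

    /* Announce ourselves before looking at irq_armed. */
    atomic_fetch_add(&c->in_flight, 1);
    if (atomic_load(&c->irq_armed)) {
        (*c->period_count)++;
        handled = true;
    }
    atomic_fetch_sub(&c->in_flight, 1);
    return handled;
}

/* Models trigger-stop: no new PCM interrupts are acted on after this. */
static void model_trigger_stop(struct model_chip *c)
{
    atomic_store(&c->irq_armed, false);
}

/* Stand-in for synchronize_irq(): wait until no handler is executing. */
static void model_synchronize_irq(struct model_chip *c)
{
    while (atomic_load(&c->in_flight) != 0)
        ; /* busy-wait, acceptable only in this toy model */
}

/* Tear-down in the order discussed: stop, synchronize, then detach. */
static void model_close(struct model_chip *c)
{
    model_trigger_stop(c);
    model_synchronize_irq(c);
    c->period_count = NULL; /* no handler can still be using it */
}
```

In this model, a handler that starts after model_trigger_stop() sees
irq_armed cleared and never dereferences period_count, while a handler that
started earlier is waited for by model_synchronize_irq().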


> - [Thread 1] snd_pcm_hw_free() finishes -> snd_pcm_detach_substream()
> -> runtime=NULL
> - [Thread 2] executes snd_pcm_running() and crashes
>
> I can't trigger the issue after adding the synchronize_irq(), but
> maybe it's just luck. Correct me if I missed something.
> 
> Thanks,
> Paul
> 
> 
> 
> 
> >
> > thanks,
> >
> > Takashi
>
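
The core-side hardening discussed earlier in the thread (an internal lock
shared between detach and period-elapsed, with the runtime pointer checked
inside the locked region) can be sketched as follows. This is a userspace
model under stated assumptions, not the actual patch: a pthread mutex plays
the role of the PCM stream lock, and model_substream, model_runtime,
model_detach and model_period_elapsed are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_runtime {
    bool running;
    unsigned long periods_elapsed;
};

struct model_substream {
    pthread_mutex_t lock;          /* plays the role of the PCM stream lock */
    struct model_runtime *runtime; /* NULL once the substream is detached */
};

static void model_substream_init(struct model_substream *s,
                                 struct model_runtime *rt)
{
    int err = pthread_mutex_init(&s->lock, NULL);

    assert(err == 0);
    (void)err;
    s->runtime = rt;
}

/* Models the detach step: clear the pointer while holding the lock. */
static void model_detach(struct model_substream *s)
{
    pthread_mutex_lock(&s->lock);
    s->runtime = NULL;
    pthread_mutex_unlock(&s->lock);
}

/*
 * Models the interrupt-side call. The NULL check happens inside the
 * locked region, so it cannot race with model_detach().
 * Returns true if a period was accounted.
 */
static bool model_period_elapsed(struct model_substream *s)
{
    bool handled = false;

    pthread_mutex_lock(&s->lock);
    if (s->runtime != NULL && s->runtime->running) {
        s->runtime->periods_elapsed++;
        handled = true;
    }
    pthread_mutex_unlock(&s->lock);
    return handled;
}
```

As noted in the thread, this only hardens the core side; a driver that can
still call into period-elapsed after close needs its own fix as well.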