lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220304015650.GC6112@X58A-UD3R>
Date:   Fri, 4 Mar 2022 10:56:51 +0900
From:   Byungchul Park <byungchul.park@....com>
To:     Jan Kara <jack@...e.cz>
Cc:     torvalds@...ux-foundation.org, damien.lemoal@...nsource.wdc.com,
        linux-ide@...r.kernel.org, adilger.kernel@...ger.ca,
        linux-ext4@...r.kernel.org, mingo@...hat.com,
        linux-kernel@...r.kernel.org, peterz@...radead.org,
        will@...nel.org, tglx@...utronix.de, rostedt@...dmis.org,
        joel@...lfernandes.org, sashal@...nel.org, daniel.vetter@...ll.ch,
        chris@...is-wilson.co.uk, duyuyang@...il.com,
        johannes.berg@...el.com, tj@...nel.org, tytso@....edu,
        willy@...radead.org, david@...morbit.com, amir73il@...il.com,
        bfields@...ldses.org, gregkh@...uxfoundation.org,
        kernel-team@....com, linux-mm@...ck.org, akpm@...ux-foundation.org,
        mhocko@...nel.org, minchan@...nel.org, hannes@...xchg.org,
        vdavydov.dev@...il.com, sj@...nel.org, jglisse@...hat.com,
        dennis@...nel.org, cl@...ux.com, penberg@...nel.org,
        rientjes@...gle.com, vbabka@...e.cz, ngupta@...are.org,
        linux-block@...r.kernel.org, axboe@...nel.dk,
        paolo.valente@...aro.org, josef@...icpanda.com,
        linux-fsdevel@...r.kernel.org, viro@...iv.linux.org.uk,
        jack@...e.com, jlayton@...nel.org, dan.j.williams@...el.com,
        hch@...radead.org, djwong@...nel.org,
        dri-devel@...ts.freedesktop.org, airlied@...ux.ie,
        rodrigosiqueiramelo@...il.com, melissa.srw@...il.com,
        hamohammed.sa@...il.com
Subject: Re: Report 2 in ext4 and journal based on v5.17-rc1

On Thu, Mar 03, 2022 at 10:54:56AM +0100, Jan Kara wrote:
> On Thu 03-03-22 10:00:33, Byungchul Park wrote:
> > Unfortunately, it's neither perfect nor safe without another wakeup
> > source - rescue wakeup source.
> > 
> >    consumer			producer
> > 
> > 				lock L
> > 				(too much work queued == true)
> > 				unlock L
> > 				--- preempted
> >    lock L
> >    unlock L
> >    do work
> >    lock L
> >    unlock L
> >    do work
> >    ...
> >    (no work == true)
> >    sleep
> > 				--- scheduled in
> > 				sleep
> > 
> > This code leads a deadlock without another wakeup source, say, not safe.
> 
> So the scenario you describe above is indeed possible. But the trick is
> that the wakeup from 'consumer' as is doing work will remove 'producer'
> from the wait queue and change the 'producer' process state to
> 'TASK_RUNNING'. So when 'producer' calls sleep (in fact schedule()), the
> scheduler will just treat this as another preemption point and the
> 'producer' will immediately or soon continue to run. So indeed we can think
> of this as "another wakeup source" but the source is in the CPU scheduler
> itself. This is the standard way how waitqueues are used in the kernel...

Nice! Thanks for the explanation. I will take it into account if needed.

> > Lastly, just for your information, I need to explain how Dept works a
> > little more for you not to misunderstand Dept.
> > 
> > Assuming the consumer and producer guarantee not to lead a deadlock like
> > the following, Dept won't report it a problem:
> > 
> >    consumer			producer
> > 
> > 				sleep
> >    wakeup work_done
> > 				queue work
> >    sleep
> > 				wakeup work_queued
> >    do work
> > 				sleep
> >    wakeup work_done
> > 				queue work
> >    sleep
> > 				wakeup work_queued
> >    do work
> > 				sleep
> >    ...				...
> > 
> > Dept does not consider all waits preceeding an event but only waits that
> > might lead a deadlock. In this case, Dept works with each region
> > independently.
> > 
> >    consumer			producer
> > 
> > 				sleep <- initiates region 1
> >    --- region 1 starts
> >    ...				...
> >    --- region 1 ends
> >    wakeup work_done
> >    ...				...
> > 				queue work
> >    ...				...
> >    sleep <- initiates region 2
> > 				--- region 2 starts
> >    ...				...
> > 				--- region 2 ends
> > 				wakeup work_queued
> >    ...				...
> >    do work
> >    ...				...
> > 				sleep <- initiates region 3
> >    --- region 3 starts
> >    ...				...
> >    --- region 3 ends
> >    wakeup work_done
> >    ...				...
> > 				queue work
> >    ...				...
> >    sleep <- initiates region 4
> > 				--- region 4 starts
> >    ...				...
> > 				--- region 4 ends
> > 				wakeup work_queued
> >    ...				...
> >    do work
> >    ...				...
> > 
> > That is, Dept does not build dependencies across different regions. So
> > you don't have to worry about unreasonable false positives that much.
> > 
> > Thoughts?
> 
> Thanks for explanation! And what exactly defines the 'regions'? When some
> process goes to sleep on some waitqueue, this defines a start of a region
> at the place where all the other processes are at that moment and wakeup of
> the waitqueue is an end of the region?

Yes. Let me explain it more for better understanding.
(I copied it from the talk I did with Matthew..)


   ideal view
   -----------
   context X			context Y

   request event E		...
      write REQUESTEVENT	when (notice REQUESTEVENT written)
   ...				   notice the request from X [S]

				--- ideally region 1 starts here
   wait for the event		...
      sleep			if (can see REQUESTEVENT written)
   				   it's on the way to the event
   ...				
   				...
				--- ideally region 1 ends here

				finally the event [E]

Dept basically works with the above view with regard to wait and event.
But it's very hard to identify the ideal [S] point in practice. So Dept
instead identifies [S] point by checking WAITSTART with memory barriers
like the following, which would make Dept work conservatively.


   Dept's view
   ------------
   context X			context Y

   request event E		...
      write REQUESTEVENT	when (notice REQUESTEVENT written)
   ...				   notice the request from X

				--- region 2 Dept gives up starts
   wait for the event		...
      write barrier
      write WAITSTART		read barrier
      sleep			when (notice WAITSTART written)
				   ensure the request has come [S]

				--- region 2 Dept gives up ends
				--- region 3 starts here
				...
				if (can see WAITSTART written)
				   it's on the way to the event
   ...				
   				...
				--- region 3 ends here

   				finally the event [E]

In short, Dept works with region 3.

Thanks,
Byungchul

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ