lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 21 Sep 2015 12:53:40 +0200
From:	Dmitry Vyukov <dvyukov@...gle.com>
To:	Jeff Layton <jlayton@...chiereds.net>
Cc:	bfields@...ldses.org, Al Viro <viro@...iv.linux.org.uk>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Kostya Serebryany <kcc@...gle.com>,
	Andrey Konovalov <andreyknvl@...gle.com>,
	Alexander Potapenko <glider@...gle.com>,
	ktsan@...glegroups.com, Paul McKenney <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH] fs: fix data races on inode->i_flctx

On Mon, Sep 21, 2015 at 12:50 PM, Jeff Layton <jlayton@...chiereds.net> wrote:
> On Mon, 21 Sep 2015 09:43:06 +0200
> Dmitry Vyukov <dvyukov@...gle.com> wrote:
>
>> locks_get_lock_context() uses cmpxchg() to install i_flctx.
>> cmpxchg() is a release operation which is correct. But it uses
>> a plain load to load i_flctx. This is incorrect. Subsequent loads
>> from i_flctx can hoist above the load of i_flctx pointer itself
>> and observe uninitialized garbage there. This in turn can lead
>> to corruption of ctx->flc_lock and other members.
>>
>
> I don't get this bit. The i_flctx field is initialized to zero when the
> inode is allocated, and we only assign it with cmpxchg(). How would you
> ever see uninitialized garbage in there? It should either be NULL or a
> valid pointer, no?

This is not about i_flctx pointer, this is about i_flctx object
contents (pointee).
Readers can see a non-NULL pointer, but read garbage from *i_flctx.

> If that'st the case, then most of the places where you're adding
> smp_load_acquire are places that can tolerate seeing NULL there. e.g.
> you have racing LOCK_EX/LOCK_UN flock() calls in different tasks or
> something.
>
>> Documentation/memory-barriers.txt explicitly requires to use
>> a barrier in such context:
>> "A load-load control dependency requires a full read memory barrier".
>>
>
> The same file also says:
>
> "Any atomic operation that modifies some state in memory and returns information
> about the state (old or new) implies an SMP-conditional general memory barrier
> (smp_mb()) on each side of the actual operation (with the exception of
> explicit lock operations, described later).  These include:
>
> ...
> /* when succeeds */
> cmpxchg();
> "
>
> If there is an implied smp_mb() before and after the cmpxchg(), how
> could the CPU hoist anything before that point?
>
> Note that I'm not saying that you're wrong, I just want to make sure I
> fully understand the problem before we go trying to fix it.
>
>> Use smp_load_acquire() in locks_get_lock_context() and in bunch
>> of other functions that can proceed concurrently with
>> locks_get_lock_context().
>>
>> The data race was found with KernelThreadSanitizer (KTSAN).
>>
>> Signed-off-by: Dmitry Vyukov <dvyukov@...gle.com>
>> ---
>> For the record, here is the KTSAN report:
>>
>> ThreadSanitizer: data-race in _raw_spin_lock
>>
>> Read at 0xffff8800bae50da0 of size 1 by thread 3060 on CPU 2:
>>  [<     inline     >] __raw_spin_lock include/linux/spinlock_api_smp.h:158
>>  [<ffffffff81ee37d0>] _raw_spin_lock+0x50/0x70 kernel/locking/spinlock.c:151
>>  [<     inline     >] spin_lock include/linux/spinlock.h:312
>>  [<ffffffff812e1187>] flock_lock_inode+0xb7/0x440 fs/locks.c:895
>>  [<ffffffff812e16e6>] flock_lock_inode_wait+0x46/0x110 fs/locks.c:1871
>>  [<ffffffff812e2814>] SyS_flock+0x224/0x23
>>
>> Previous write at 0xffff8800bae50da0 of size 8 by thread 3063 on CPU 8:
>>  [<ffffffff81248628>] kmem_cache_alloc+0xd8/0x2f0 mm/slab.c:3420
>>  [<ffffffff812ddba0>] locks_get_lock_context+0x60/0x140 fs/locks.c:213
>>  [<ffffffff812e112a>] flock_lock_inode+0x5a/0x440 fs/locks.c:882
>>  [<ffffffff812e16e6>] flock_lock_inode_wait+0x46/0x110 fs/locks.c:1871
>>  [<     inline     >] flock_lock_file_wait include/linux/fs.h:1219
>>  [<     inline     >] SYSC_flock fs/locks.c:1941
>>  [<ffffffff812e2814>] SyS_flock+0x224/0x230 fs/locks.c:1904
>>  [<ffffffff81ee3e11>] entry_SYSCALL_64_fastpath+0x31/0x95 arch/x86/entry/entry_64.S:188
>> ---
>>  fs/locks.c | 63 +++++++++++++++++++++++++++++++++++---------------------------
>>  1 file changed, 36 insertions(+), 27 deletions(-)
>>
>> diff --git a/fs/locks.c b/fs/locks.c
>> index 2a54c80..316e474 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -205,28 +205,32 @@ static struct kmem_cache *filelock_cache __read_mostly;
>>  static struct file_lock_context *
>>  locks_get_lock_context(struct inode *inode, int type)
>>  {
>> -     struct file_lock_context *new;
>> +     struct file_lock_context *ctx;
>>
>> -     if (likely(inode->i_flctx) || type == F_UNLCK)
>> +     /* paired with cmpxchg() below */
>> +     ctx = smp_load_acquire(&inode->i_flctx);
>> +     if (likely(ctx) || type == F_UNLCK)
>>               goto out;
>>
>> -     new = kmem_cache_alloc(flctx_cache, GFP_KERNEL);
>> -     if (!new)
>> +     ctx = kmem_cache_alloc(flctx_cache, GFP_KERNEL);
>> +     if (!ctx)
>>               goto out;
>>
>> -     spin_lock_init(&new->flc_lock);
>> -     INIT_LIST_HEAD(&new->flc_flock);
>> -     INIT_LIST_HEAD(&new->flc_posix);
>> -     INIT_LIST_HEAD(&new->flc_lease);
>> +     spin_lock_init(&ctx->flc_lock);
>> +     INIT_LIST_HEAD(&ctx->flc_flock);
>> +     INIT_LIST_HEAD(&ctx->flc_posix);
>> +     INIT_LIST_HEAD(&ctx->flc_lease);
>>
>>       /*
>>        * Assign the pointer if it's not already assigned. If it is, then
>>        * free the context we just allocated.
>>        */
>> -     if (cmpxchg(&inode->i_flctx, NULL, new))
>> -             kmem_cache_free(flctx_cache, new);
>> +     if (cmpxchg(&inode->i_flctx, NULL, ctx)) {
>> +             kmem_cache_free(flctx_cache, ctx);
>> +             ctx = smp_load_acquire(&inode->i_flctx);
>> +     }
>>  out:
>> -     return inode->i_flctx;
>> +     return ctx;
>>  }
>>
>>  void
>> @@ -762,7 +766,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
>>       struct file_lock_context *ctx;
>>       struct inode *inode = file_inode(filp);
>>
>> -     ctx = inode->i_flctx;
>> +     ctx = smp_load_acquire(&inode->i_flctx);
>>       if (!ctx || list_empty_careful(&ctx->flc_posix)) {
>>               fl->fl_type = F_UNLCK;
>>               return;
>> @@ -1203,7 +1207,7 @@ int locks_mandatory_locked(struct file *file)
>>       struct file_lock_context *ctx;
>>       struct file_lock *fl;
>>
>> -     ctx = inode->i_flctx;
>> +     ctx = smp_load_acquire(&inode->i_flctx);
>>       if (!ctx || list_empty_careful(&ctx->flc_posix))
>>               return 0;
>>
>> @@ -1388,7 +1392,7 @@ any_leases_conflict(struct inode *inode, struct file_lock *breaker)
>>  int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>>  {
>>       int error = 0;
>> -     struct file_lock_context *ctx = inode->i_flctx;
>> +     struct file_lock_context *ctx;
>>       struct file_lock *new_fl, *fl, *tmp;
>>       unsigned long break_time;
>>       int want_write = (mode & O_ACCMODE) != O_RDONLY;
>> @@ -1400,6 +1404,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>>       new_fl->fl_flags = type;
>>
>>       /* typically we will check that ctx is non-NULL before calling */
>> +     ctx = smp_load_acquire(&inode->i_flctx);
>>       if (!ctx) {
>>               WARN_ON_ONCE(1);
>>               return error;
>> @@ -1494,9 +1499,10 @@ EXPORT_SYMBOL(__break_lease);
>>  void lease_get_mtime(struct inode *inode, struct timespec *time)
>>  {
>>       bool has_lease = false;
>> -     struct file_lock_context *ctx = inode->i_flctx;
>> +     struct file_lock_context *ctx;
>>       struct file_lock *fl;
>>
>> +     ctx = smp_load_acquire(&inode->i_flctx);
>>       if (ctx && !list_empty_careful(&ctx->flc_lease)) {
>>               spin_lock(&ctx->flc_lock);
>>               if (!list_empty(&ctx->flc_lease)) {
>> @@ -1543,10 +1549,11 @@ int fcntl_getlease(struct file *filp)
>>  {
>>       struct file_lock *fl;
>>       struct inode *inode = file_inode(filp);
>> -     struct file_lock_context *ctx = inode->i_flctx;
>> +     struct file_lock_context *ctx;
>>       int type = F_UNLCK;
>>       LIST_HEAD(dispose);
>>
>> +     ctx = smp_load_acquire(&inode->i_flctx);
>>       if (ctx && !list_empty_careful(&ctx->flc_lease)) {
>>               spin_lock(&ctx->flc_lock);
>>               time_out_leases(file_inode(filp), &dispose);
>> @@ -1713,9 +1720,10 @@ static int generic_delete_lease(struct file *filp, void *owner)
>>       struct file_lock *fl, *victim = NULL;
>>       struct dentry *dentry = filp->f_path.dentry;
>>       struct inode *inode = dentry->d_inode;
>> -     struct file_lock_context *ctx = inode->i_flctx;
>> +     struct file_lock_context *ctx;
>>       LIST_HEAD(dispose);
>>
>> +     ctx = smp_load_acquire(&inode->i_flctx);
>>       if (!ctx) {
>>               trace_generic_delete_lease(inode, NULL);
>>               return error;
>> @@ -2359,13 +2367,14 @@ out:
>>  void locks_remove_posix(struct file *filp, fl_owner_t owner)
>>  {
>>       struct file_lock lock;
>> -     struct file_lock_context *ctx = file_inode(filp)->i_flctx;
>> +     struct file_lock_context *ctx;
>>
>>       /*
>>        * If there are no locks held on this file, we don't need to call
>>        * posix_lock_file().  Another process could be setting a lock on this
>>        * file at the same time, but we wouldn't remove that lock anyway.
>>        */
>> +     ctx =  smp_load_acquire(&file_inode(filp)->i_flctx);
>>       if (!ctx || list_empty(&ctx->flc_posix))
>>               return;
>>
>> @@ -2389,7 +2398,7 @@ EXPORT_SYMBOL(locks_remove_posix);
>>
>>  /* The i_flctx must be valid when calling into here */
>>  static void
>> -locks_remove_flock(struct file *filp)
>> +locks_remove_flock(struct file *filp, struct file_lock_context *flctx)
>>  {
>>       struct file_lock fl = {
>>               .fl_owner = filp,
>> @@ -2400,7 +2409,6 @@ locks_remove_flock(struct file *filp)
>>               .fl_end = OFFSET_MAX,
>>       };
>>       struct inode *inode = file_inode(filp);
>> -     struct file_lock_context *flctx = inode->i_flctx;
>>
>>       if (list_empty(&flctx->flc_flock))
>>               return;
>> @@ -2416,10 +2424,8 @@ locks_remove_flock(struct file *filp)
>>
>>  /* The i_flctx must be valid when calling into here */
>>  static void
>> -locks_remove_lease(struct file *filp)
>> +locks_remove_lease(struct file *filp, struct file_lock_context *ctx)
>>  {
>> -     struct inode *inode = file_inode(filp);
>> -     struct file_lock_context *ctx = inode->i_flctx;
>>       struct file_lock *fl, *tmp;
>>       LIST_HEAD(dispose);
>>
>> @@ -2439,17 +2445,20 @@ locks_remove_lease(struct file *filp)
>>   */
>>  void locks_remove_file(struct file *filp)
>>  {
>> -     if (!file_inode(filp)->i_flctx)
>> +     struct file_lock_context *ctx;
>> +
>> +     ctx = smp_load_acquire(&file_inode(filp)->i_flctx);
>> +     if (!ctx)
>>               return;
>>
>>       /* remove any OFD locks */
>>       locks_remove_posix(filp, filp);
>>
>>       /* remove flock locks */
>> -     locks_remove_flock(filp);
>> +     locks_remove_flock(filp, ctx);
>>
>>       /* remove any leases */
>> -     locks_remove_lease(filp);
>> +     locks_remove_lease(filp, ctx);
>>  }
>>
>>  /**
>> @@ -2616,7 +2625,7 @@ void show_fd_locks(struct seq_file *f,
>>       struct file_lock_context *ctx;
>>       int id = 0;
>>
>> -     ctx = inode->i_flctx;
>> +     ctx = smp_load_acquire(&inode->i_flctx);
>>       if (!ctx)
>>               return;
>>
>
>
> --
> Jeff Layton <jlayton@...chiereds.net>



-- 
Dmitry Vyukov, Software Engineer, dvyukov@...gle.com
Google Germany GmbH, Dienerstraße 12, 80331, München
Geschäftsführer: Graham Law, Christine Elizabeth Flores
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Diese E-Mail ist vertraulich. Wenn Sie nicht der richtige Adressat
sind, leiten Sie diese bitte nicht weiter, informieren Sie den
Absender und löschen Sie die E-Mail und alle Anhänge. Vielen Dank.
This e-mail is confidential. If you are not the right addressee please
do not forward it, please inform the sender, and please erase this
e-mail including any attachments. Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists