netdev - Re: [PATCH v1 net-next 5/9] phonet: Don't hold RTNL for getaddr

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20241020013039.4151-1-kuniyu@amazon.com>
Date: Sat, 19 Oct 2024 18:30:39 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <remi@...lab.net>
CC: <kuniyu@...zon.com>, <netdev@...r.kernel.org>
Subject: Re: [PATCH v1 net-next 5/9] phonet: Don't hold RTNL for getaddr_dumpit().

From: "Rémi Denis-Courmont" <remi@...lab.net>
Date: Sat, 19 Oct 2024 10:48:57 +0300
> > > >  	list_for_each_entry_rcu(pnd, &pndevs->list, list) {
> > > > 
> > > > +		DECLARE_BITMAP(addrs, 64);
> > > > 
> > > >  		u8 addr;
> > > >  		
> > > >  		if (dev_idx > dev_start_idx)
> > > > 
> > > > @@ -143,23 +146,26 @@ static int getaddr_dumpit(struct sk_buff *skb,
> > > > struct
> > > > netlink_callback *cb) continue;
> > > > 
> > > >  		addr_idx = 0;
> > > > 
> > > > -		for_each_set_bit(addr, pnd->addrs, 64) {
> > > > +		memcpy(addrs, pnd->addrs, sizeof(pnd->addrs));
> > > 
> > > Is that really safe? Are we sure that the bit-field writers are atomic
> > > w.r.t. memcpy() on all platforms? If READ_ONCE is needed for an integer,
> > > using memcpy() seems sketchy, TBH.
> > 
> > I think bit-field read/write need not be atomic here because even
> > if a data-race happens, for_each_set_bit() iterates each bit, which
> > is the real data, regardless of whether data-race happened or not.
> 
> Err, it looks to me that a corrupt bit would lead to the index getting corrupt 
> and addresses getting skipped or repeated. AFAICT, the RTNL lock is still 
> needed here.

Let's say pnd->addrs has 0b00010001 as the lowest in 8 bits
and addr_dumpit() and addr_doit() are executed concurrently,
and addr_doit() tries to change it to 0b00010011.

If we have a lock, there could be two results of dumpit().

  1.
  lock()
  dumpit() -> 0b00010001
  unlock()
                   lock()
                   doit()
                   unlock()

  2.
                   lock()
                   doit()
                   unlock()
  lock()
  dumpit() -> 0b00010011
  unlock()

If we don't have a lock and dumpit()'s read and doit()'s write
are split to upper 4 bits (0b0001) and lower 4 bits (0b0011),

there are 6 patterns of occurences, but the results are only
2 patterns and the same with the locked version.

If you get 0b00010001, you can think dumpit() completes earlier,
and the opposite if you get 0b00010011.

This is how we think with lockless algo.  In this case, we do
not require lock to iterate bitmaps.

  1.
  upper half read of dumpit()
  lower half read of dumpit()
                   upper half write of doit() -> 0b0001
                   lower half write of doit() -> 0b0011
  -> 0b00010001

  2.
  upper half read of dumpit()
                   upper half write of doit() -> 0b0001
  lower half read of dumpit()
                   lower half write of doit() -> 0b0011
  -> 0b00010001

  3.
                   upper half write of doit() -> 0b0001
  upper half read of dumpit()
  lower half read of dumpit()
                   lower half write of doit() -> 0b0011
  -> 0b00010001

  4.
                   upper half write of doit() -> 0b0001
                   lower half write of doit() -> 0b0011
  upper half read of dumpit()
  lower half read of dumpit()
  -> 0b00010011

  5.
                   upper half write of doit() -> 0b0001
  upper half read of dumpit()
                   lower half write of doit() -> 0b0011
  lower half read of dumpit()
  -> 0b00010011

  6.
  upper half read of dumpit()
                   upper half write of doit() -> 0b0001
                   lower half write of doit() -> 0b0011
  lower half read of dumpit()
  -> 0b00010011