linux-kernel - Kernel rwlock design, Multicore and IGMP

Date:	Thu, 11 Nov 2010 21:49:56 +0800
From:	Cypher Wu <cypher.w@...il.com>
To:	linux-kernel@...r.kernel.org
Subject: Kernel rwlock design, Multicore and IGMP

I'm using TILEPro, and its kernel rwlock is a little different from
that of other platforms. It gives priority to the write lock: once a
writer tries to take the lock, it blocks any subsequent read lock,
even if the read lock is already held by others. The code can be found
in Linux kernel 2.6.36 in arch/tile/lib/spinlock_32.c.

That difference can cause a deadlock in the kernel if we join/leave
multicast groups simultaneously and frequently on multiple cores. The
IGMP message is sent by

igmp_ifc_timer_expire() -> igmpv3_send_cr() -> igmpv3_sendpack()

in the timer interrupt. igmpv3_send_cr() generates the sk_buff for the
IGMP message with mc_list_lock read-locked, and then calls
igmpv3_sendpack() with it unlocked.
But if so many join/leave messages have to be generated that they
can't be sent in one sk_buff, then igmpv3_send_cr() -> add_grec() will
call igmpv3_sendpack() to send it, while the read lock is still held,
and allocate a new buffer. When the message is sent:

__mkroute_output() -> ip_check_mc()

will read-lock mc_list_lock again. If another core tries to
write-lock mc_list_lock between the two read locks, a deadlock
occurs.
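
To make the interleaving concrete, here is a rough single-threaded
model of a writer-priority rwlock. It is only a sketch and not the code
in arch/tile/lib/spinlock_32.c: the struct, the wp_* helpers and
igmp_interleaving_deadlocks() are hypothetical names, and "try"
operations stand in for spinning so the sequence can be replayed
deterministically.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a writer-priority rwlock (not the TILEPro code). */
struct wp_rwlock {
    int  readers;         /* read locks currently held */
    bool writer_waiting;  /* a writer has announced itself and is spinning */
    bool writer_active;   /* a writer holds the lock */
};

/* New readers are refused as soon as a writer is waiting or active. */
static inline bool wp_read_trylock(struct wp_rwlock *l)
{
    if (l->writer_active || l->writer_waiting)
        return false;
    l->readers++;
    return true;
}

static inline void wp_read_unlock(struct wp_rwlock *l)
{
    assert(l->readers > 0);
    l->readers--;
}

/* A writer announces itself first, then needs all readers to drain. */
static inline bool wp_write_trylock(struct wp_rwlock *l)
{
    l->writer_waiting = true;
    if (l->readers > 0 || l->writer_active)
        return false;
    l->writer_waiting = false;
    l->writer_active = true;
    return true;
}

/* Replays the interleaving; returns true if neither side can progress. */
static inline bool igmp_interleaving_deadlocks(void)
{
    struct wp_rwlock mc_list_lock = { 0, false, false };

    /* Core A: igmpv3_send_cr() takes the read lock. */
    bool outer = wp_read_trylock(&mc_list_lock);

    /* Core B: a join/leave tries the write lock and starts waiting. */
    bool writer = wp_write_trylock(&mc_list_lock);

    /* Core A: add_grec() -> igmpv3_sendpack() -> ... -> ip_check_mc()
     * tries to take the read lock a second time. */
    bool inner = wp_read_trylock(&mc_list_lock);

    /* The writer waits for core A's outer read lock, and core A's inner
     * read lock waits for the writer: nobody can move. */
    return outer && !writer && !inner;
}
```

In this model the inner read attempt fails only because writer_waiting
is set, which is exactly the priority rule described above.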

The rwlocks on the other platforms I've checked (PowerPC, x86, ARM)
simply make the read lock shared and the write lock exclusive, so if
we already hold the read lock the writer will just wait, and a second
read lock will succeed.
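
For comparison, the same kind of single-threaded model with the
"shared read, exclusive write" behaviour looks roughly like this.
Again this is only a sketch with hypothetical names (rs_*), not the
real PowerPC, x86 or ARM implementation: a waiting writer leaves no
mark on the lock, so a nested read lock still gets in.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: readers share, a writer needs the lock empty. */
struct rs_rwlock {
    int  readers;        /* read locks currently held */
    bool writer_active;  /* a writer holds the lock */
};

/* Readers are refused only while a writer actually holds the lock. */
static inline bool rs_read_trylock(struct rs_rwlock *l)
{
    if (l->writer_active)
        return false;
    l->readers++;
    return true;
}

static inline void rs_read_unlock(struct rs_rwlock *l)
{
    assert(l->readers > 0);
    l->readers--;
}

/* A failed write attempt changes nothing, so it cannot block readers. */
static inline bool rs_write_trylock(struct rs_rwlock *l)
{
    if (l->readers > 0 || l->writer_active)
        return false;
    l->writer_active = true;
    return true;
}

/* Same interleaving as the IGMP case; returns true if everyone finishes. */
static inline bool nested_read_completes(void)
{
    struct rs_rwlock mc_list_lock = { 0, false };

    bool outer = rs_read_trylock(&mc_list_lock);           /* core A */
    bool writer_blocked = !rs_write_trylock(&mc_list_lock); /* core B waits */
    bool inner = rs_read_trylock(&mc_list_lock);           /* core A again */

    rs_read_unlock(&mc_list_lock);  /* inner read lock released */
    rs_read_unlock(&mc_list_lock);  /* outer read lock released */

    bool writer_after = rs_write_trylock(&mc_list_lock);   /* core B proceeds */

    return outer && writer_blocked && inner && writer_after;
}
```

The cost of this policy is that a steady stream of readers can keep a
writer waiting for a long time.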

So, what are the criteria for rwlock design in the Linux kernel? Is
the nested read lock in IGMP a design error in the Linux kernel, or
does the read lock have to be designed to allow it?

There is another thing: the timer interrupt will start a timer on the
same in_dev. Should that be optimized?

BTW: if we have many cores, say 64, are there other things we have to
consider about spinlocks? If contention occurs, should we just read
the shared memory again and again, or is a very small 'delay' better?
I've seen relax() called in the spinlock implementation on the TILEPro
platform.
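
To show what I mean by a small 'delay', here is a minimal user-space
sketch using C11 atomics. It is an assumption of how such a lock could
look, not the TILEPro implementation: backoff_lock, spin_pause() and
BACKOFF_MAX are hypothetical names, spin_pause() is only a compiler
barrier standing in for a real CPU relax hint, and the cap is arbitrary.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { BACKOFF_MAX = 1024 };  /* arbitrary cap; would need tuning */

struct backoff_lock {
    atomic_bool locked;
};

/* Stand-in for a CPU relax hint: here only a compiler barrier. */
static inline void spin_pause(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

static inline void backoff_lock_init(struct backoff_lock *l)
{
    atomic_init(&l->locked, false);
}

/* One atomic exchange; returns true if we took the lock. */
static inline bool backoff_lock_try(struct backoff_lock *l)
{
    return !atomic_exchange_explicit(&l->locked, true,
                                     memory_order_acquire);
}

static inline void backoff_lock_acquire(struct backoff_lock *l)
{
    unsigned int delay = 1;

    while (!backoff_lock_try(l)) {
        /* Pause, then poll with plain loads instead of hammering the
         * line with atomic writes; double the pause on each miss. */
        do {
            for (unsigned int i = 0; i < delay; i++)
                spin_pause();
            if (delay < BACKOFF_MAX)
                delay *= 2;
        } while (atomic_load_explicit(&l->locked, memory_order_relaxed));
    }
}

static inline void backoff_lock_release(struct backoff_lock *l)
{
    /* Releasing a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

Whether the backoff helps would depend on the interconnect and on how
many cores contend, so it would have to be measured.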