Damn Heisenbugs!
Last night someone came into the Mumble IRC channel with a complaint about our permissions system not working correctly. Initially, I responded with some skepticism: our permissions system looks pretty complicated to an end user, but the code that actually implements it is merely a pair of for loops toggling matching states, with the last state “winning”, and for the most part it hasn’t changed in about 5 years. I consider it (in my uneducated opinion) to be one of our most thoroughly tested pieces of code.
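To make the “pair of for loops” idea concrete, here is a minimal Python sketch of last-match-wins ACL evaluation. It is not Murmur’s actual code (which is C++). The permission bit values, the `Channel` and `AclEntry` types, resetting to no permissions when a channel stops inheriting, applying an entry’s deny after its allow, and treating write as implying enter are all assumptions made for illustration. The example at the bottom encodes the reporter’s setup described in the next few paragraphs.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical permission bits for illustration; Murmur's real values differ.
WRITE = 0x1
TRAVERSE = 0x2
ENTER = 0x4
SPEAK = 0x8
ALL = WRITE | TRAVERSE | ENTER | SPEAK


@dataclass
class AclEntry:
    group: str                # group name without the "@", e.g. "all" or "admin"
    allow: int = 0
    deny: int = 0
    apply_here: bool = True   # applies to the channel it is defined on
    apply_subs: bool = True   # applies to sub-channels


@dataclass
class Channel:
    name: str
    acls: List[AclEntry] = field(default_factory=list)
    inherit_acl: bool = True
    parent: Optional["Channel"] = None


def effective_permissions(channel: Channel, user_groups: Set[str]) -> int:
    # Build the chain of channels from the root down to the target.
    chain = []
    node = channel
    while node is not None:
        chain.append(node)
        node = node.parent
    chain.reverse()

    granted = 0
    for ch in chain:                  # outer loop: channels, root first
        if not ch.inherit_acl:
            granted = 0               # assumption: drop everything inherited so far
        for acl in ch.acls:           # inner loop: ACL entries, in order
            applies = acl.apply_here if ch is channel else acl.apply_subs
            if applies and acl.group in user_groups:
                # Later matching entries override earlier ones ("last wins");
                # within one entry, deny is applied after allow.
                granted |= acl.allow
                granted &= ~acl.deny
    if granted & WRITE:
        granted |= ENTER              # assumption: write implies enter
    return granted


# The reporter's setup: a non-inheriting channel that denies everything
# to @all, then allows write/speak to @admin.
root = Channel("Root", acls=[AclEntry("all", allow=TRAVERSE | ENTER | SPEAK)])
private = Channel("Private", parent=root, inherit_acl=False, acls=[
    AclEntry("all", deny=ALL),
    AclEntry("admin", allow=WRITE | SPEAK),
])

print(bool(effective_permissions(private, {"all", "admin"}) & ENTER))  # True
print(bool(effective_permissions(private, {"all"}) & ENTER))           # False
```

Under these assumptions, anyone in @admin can enter the channel and everyone else is locked out, because the @admin entry is the last one to match. Swapping the two entries would make the @all deny win instead.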
After a bit more complaining, I eventually got the user to elaborate on what exactly the problem was, attempted to replicate it, and managed to! What the hell? I was certainly not expecting that. I also figured out a workaround.
It seemed at the time that creating a new channel, editing the channel’s ACL so it doesn’t inherit from its parent, adding an @all rule that denies all privileges, and then adding another rule that allows @admin write/speak privileges meant that only the user who edited the rules could actually enter the channel (enter is implied by write).
That last part makes sense: Murmur detects when you’re denying everything to everyone and automatically creates a rule allowing the user who made the change to write ACLs, so you don’t end up with a broken channel that only the superuser can fix. But since the @admin rule comes last, anyone in the @admin group should be able to enter the channel. What’s going on?
It also seemed that simply creating a new group that wasn’t called @admin fixed this, which the user confirmed: creating a copy of the @admin group and assigning permissions to that fixed their problem. However, as I was documenting how to reproduce it, I blew away the channel and recreated it using exactly the steps above (at least ten times in total), and not once could I reproduce the error! Every single time it worked as you’d expect.
I’ve asked the user whether they can anonymize a copy of their server’s SQLite database and send it my way, but at the moment the only explanation I can think of is that the user made some mistake in creating the @admin group, one that I managed to reproduce the first time but not once I paid more attention. I can’t see any reason in the code for the @admin group to get special treatment.
I’m not entirely sure this qualifies for “Heisenbug” status, but it’s sure infuriating!