Less known Solaris 11.1 features: A user in 1024 groups and a workaround for a 25 year old problem

For a long time, the maximum number of groups a user could belong to was 16, although there was a way to raise it to 32. In Solaris 11 and recent versions of Solaris 10, a user can belong to up to 1024 groups (the same limit Windows sets in this regard). It’s easy to set the new limit by adding the following line to /etc/system:

set ngroups_max=1024

After a reboot, this change will be active. But why isn’t this the default? There are good reasons for it, and I will show you one of them in this entry. Like storing the year in two digits or using a signed 32-bit integer for the system time, the issue has its root cause in a decision made a long time ago; in this case, at least 25 years ago. And often just changing something breaks stuff that is really old but still in use. Experienced Solaris users who tuned their systems for up to 32 groups per user already know the component that breaks when a user has more than 16 groups, because the next boot after the change in /etc/system delivers a warning. It’s the problem many people encounter when using NFS with a user in more than 16 groups (it’s not really an NFS problem, but I will explain that). However, as I already said, there has been a solution for this problem since Solaris 11.1. This blog entry will show the workaround in action.

AUTH_SYS

What is the problem? The problem is NFS, or to be exact, a mechanism used by NFS. So it isn’t a problem of NFS itself; it’s a problem when using NFS in conjunction with AUTH_SYS from the ONC RPC specification, as NFS depends on the mechanisms provided by RPC for user authentication and identification. The security mechanism AUTH_SYS (sometimes called AUTH_UNIX) doesn’t accept more than 16 groups for a user. It just ignores any additional groups by not transmitting them. The protocol cannot pass more than 16 group IDs to the server due to its specification. RFC 1831 specifies in Appendix A:

   struct authsys_parms {
         unsigned int stamp;
         string machinename<255>;
         unsigned int uid;
         unsigned int gid;
         unsigned int gids<16>;
      };

The problem is the last member, gids<16>: the 16-group limit of AUTH_SYS originates from it. The same definition is already in RFC 1057 from 1988. Windows 2.1x was introduced that year. Linus would release his kernel three years later. It’s a 25-year-old specification. Perhaps another example of why you should never assume your stuff won’t still be used in 25 years. Perhaps people will have to develop workarounds for your stuff in 25 years.
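The structure also fixes how large such a credential can get on the wire. As a rough sketch, assuming the usual XDR encoding rules (4 bytes per unsigned int, a 4-byte length in front of a string or variable-length array, strings padded to a multiple of 4 bytes), the size can be computed like this. The function name authsys_cred_len is made up for this illustration:

```shell
# authsys_cred_len MACHINENAME NGIDS
# Sketch: size in bytes of an XDR-encoded authsys_parms structure.
authsys_cred_len() {
    name_len=${#1}
    # XDR pads string data to the next multiple of 4 bytes
    padded=$(( (name_len + 3) / 4 * 4 ))
    # stamp + string length + padded string + uid + gid + gids count + gids
    echo $(( 4 + 4 + padded + 4 + 4 + 4 + 4 * $2 ))
}

authsys_cred_len ca 16
```

For a two-character machine name and a full set of 16 groups this yields 88, which is the credential length you will see in the snoop traces later in this entry.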

A short rant and an announcement

That said, I have a lot of reservations about AUTH_SYS as a security mechanism for NFS. I don’t want to make the case against AUTH_SYS in this blog entry. Many have written before about the intrinsic security issues of this mechanism, which was introduced a long time ago based on assumptions that are no longer valid in many cases. At the very least, using AUTH_SYS needs a lot of thought to protect your data. However, the reality is that many installations are still using this really basic mechanism.
That said, one of the next tutorials will be about setting up the alternatives that have been developed since to make user authentication and identification more secure.

Preparations

The problem is easy to show. First you start up a Solaris 11.1 client; after that you fire up a Solaris 10 VM and a Solaris 11.1 VM for use as file servers. 192.168.1.147 is the client named client, 192.168.1.149 is the Solaris 11.1 system named s11, and 192.168.1.150 is the Solaris 10 system named s10. Then you add the following line to /etc/system on all systems used in the test:

set ngroups_max=1024
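Before rebooting, it doesn’t hurt to double-check that the entry really made it into the file. A minimal sketch: the helper name ngroups_max_setting is my own invention, and it only does plain text matching on the file you give it; it does not ask the kernel for the active value.

```shell
# ngroups_max_setting FILE
# Print the last ngroups_max value set in an /etc/system-style file.
# Comment lines in /etc/system start with "*"; they are skipped because
# the pattern only matches lines that begin with "set".
ngroups_max_setting() {
    sed -n 's/^[[:space:]]*set[[:space:]][[:space:]]*ngroups_max[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | tail -n 1
}
```

Calling ngroups_max_setting /etc/system on each of the three systems should then print 1024.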

Now you have to reboot the systems. Create a user; in my example I’ve used the username jmoekamp. Then add a number of groups by appending the following snippet to /etc/group on all three systems:

g2::101:jmoekamp
g3::102:jmoekamp
g4::103:jmoekamp
g5::104:jmoekamp
g6::105:jmoekamp
g7::106:jmoekamp
g8::107:jmoekamp
g9::108:jmoekamp
g10::109:jmoekamp
g11::110:jmoekamp
g12::111:jmoekamp
g13::112:jmoekamp
g14::113:jmoekamp
g15::114:jmoekamp
g16::115:jmoekamp
g17::116:jmoekamp
g18::117:jmoekamp
g19::118:jmoekamp
g20::119:jmoekamp
g21::120:jmoekamp
g22::121:jmoekamp
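If you don’t want to type these lines by hand, a small loop can generate them. This is just a sketch; the function name gen_groups is made up, and it only prints the lines, so you still have to append the output to /etc/group yourself.

```shell
# gen_groups USER
# Print /etc/group entries g2..g22 with gids 101..121 for the given user.
gen_groups() {
    user=$1
    i=2
    while [ "$i" -le 22 ]; do
        # group gN gets gid N + 99, matching the list above
        printf 'g%d::%d:%s\n' "$i" "$((i + 99))" "$user"
        i=$((i + 1))
    done
}

gen_groups jmoekamp
```

Printing instead of writing directly to /etc/group is deliberate: you can review the output first and then append it on each of the three systems.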

As you want to demonstrate something with NFS, setting up an NFS share doesn’t hurt. First on the Solaris 11.1 system:

root@s11:~# mkdir /nfsshare
root@s11:~# share /nfsshare
root@s11:~# touch /nfsshare/file
root@s11:~# chmod 660 /nfsshare/file
root@s11:~# chgrp g21 /nfsshare/file

Then you repeat the same steps on the Solaris 10 system.

root@s10:~# mkdir /nfsshare
root@s10:~# share /nfsshare
root@s10:~# touch /nfsshare/file
root@s10:~# chmod 660 /nfsshare/file
root@s10:~# chgrp g21 /nfsshare/file

Now you mount both filesystems on the client:

root@client:~# mkdir /s10
root@client:~# mkdir /s11
root@client:~# mount -o vers=3 192.168.1.149:/nfsshare /s11
root@client:~# mount -o vers=3 192.168.1.150:/nfsshare /s10

Situation with a Solaris 10 NFS server

Okay, now let’s check. By using the /s10 directory, we are using the Solaris 10 based system.

jmoekamp@client:~$ cd /s10
jmoekamp@client:/s10$ ls -l
total 1
-rw-rw----   1 root     g21            0 Aug 23 19:32 file
jmoekamp@client:/s10$ ls -ln
total 1
-rw-rw----   1 0        120            0 Aug 23 19:32 file

As you see, the file is owned by group 120, which is correctly translated to the name g21. The user jmoekamp is a member of group g21, so you should be able to access the file. However, when you try it, the outcome is a little bit different.

jmoekamp@client:/s10$ cat file
cat: cannot open file: Permission denied

The system denies you access to the file. Okay, let’s try it with a different group. I log into the fileserver and change the group of the file:

jmoekamp@s10:/nfsshare$ chgrp g5 file

I try it again. And now the system gives me access to the file.

jmoekamp@client:/s10$ cat file
jmoekamp@client:/s10$

However the user is definitely in both groups:

jmoekamp@client:/s10$ groups jmoekamp
g1 g2 g3 g4 g5 g6 g7 g8 g9 g10 g11 g12 g13 g14 g15 g16 g17 g18 g19 g20 g21 g22

So why does one request yield only a permission denied while the other gives you access to the file, despite the fact that the user is in both groups and that /etc/group and /etc/passwd are absolutely identical? Simply put, the problem is that the NFS server doesn’t use its own /etc/group to allow or deny access. It uses the data in the AUTH_SYS structure I’ve shown you earlier. Let’s look into it by using snoop -v -d net0 host 192.168.1.147 and host 192.168.1.150:

RPC:  ----- SUN RPC Header -----
RPC:
RPC:  Record Mark: last fragment, length = 164
RPC:  Transaction id = 516980345
RPC:  Type = 0 (Call)
RPC:  RPC version = 2
RPC:  Program = 100003 (NFS), version = 3, procedure = 1
RPC:  Credentials: Flavor = 1 (Unix), len = 88 bytes
RPC:     Time = 23-Aug-13 23:24:57
RPC:     Hostname = ca
RPC:     Uid = 100, Gid = 100
RPC:     Groups = 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115
RPC:  Verifier   : Flavor = 0 (None), len = 0 bytes

As you have surely noticed, the gid of g21 (120) isn’t in the credentials, whereas the gid of g5 (104) is. So for the NFS server, the user is simply not in the group g21, and it denies access.
Of course this behaviour is broken; however, it obeys the standard. This is the reason why Solaris 10 prints the following warning when starting up:

Aug 23 21:29:22 unknown unix: [ID 953839 kern.warning] WARNING: ngroups_max of 1024 > 16, NFS AUTH_SYS will not work properly
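The effect seen in the trace can be played through in a few lines of shell. This is only a sketch under the assumption that the client simply sends the first 16 gids of the user, which is what the trace suggests; the function names authsys_gids and gid_in_list are made up for this illustration.

```shell
# authsys_gids GID...
# Print only the gids that fit into an AUTH_SYS credential (the first 16).
authsys_gids() {
    out=""
    n=0
    for gid in "$@"; do
        if [ "$n" -ge 16 ]; then
            break
        fi
        out="${out:+$out }$gid"
        n=$((n + 1))
    done
    echo "$out"
}

# gid_in_list WANTED GID...
# Succeed if WANTED is among the following gids.
gid_in_list() {
    want=$1
    shift
    for gid in "$@"; do
        if [ "$gid" = "$want" ]; then
            return 0
        fi
    done
    return 1
}

# All 22 gids of jmoekamp: primary group 100 plus g2..g22 (101..121)
ALL_GIDS="100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121"
SENT=$(authsys_gids $ALL_GIDS)
echo "$SENT"
```

With this list, gid_in_list 104 $SENT succeeds and gid_in_list 120 $SENT fails, which mirrors what the Solaris 10 server decided for g5 and g21.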

Situation with a Solaris 11.1 NFS server

The situation has been different since Solaris 11.1, because it introduced a mechanism to work with NFS and AUTH_SYS with more than 16 groups per user without breaking the standard.

jmoekamp@client:/s11$ ls -l
total 1
-rw-rw----   1 root     g21            0 Aug 23 21:29 file
jmoekamp@client:/s11$ cat file
jmoekamp@client:/s11$ 

You can access the file. Perhaps it was just luck. Let’s try another group.

root@s11:/# chgrp g22 /nfsshare/file

Another test:

jmoekamp@client:/s11$ ls -l
total 1
-rw-rw----   1 root     g22            0 Aug 23 21:29 file
jmoekamp@client:/s11$ cat file
jmoekamp@client:/s11$

With Solaris 11.1 your user can be in more than 16 groups and AUTH_SYS still works, without the issue shown with Solaris 10. It’s not that the protocol has been extended to carry more groups, as this could break compatibility and would not work with other NFS clients unaware of such extensions. When you look at the output of snoop, it’s pretty much the same as with Solaris 10.

RPC:  ----- SUN RPC Header -----
RPC:
RPC:  Record Mark: last fragment, length = 164
RPC:  Transaction id = 1422950009
RPC:  Type = 0 (Call)
RPC:  RPC version = 2
RPC:  Program = 100003 (NFS), version = 3, procedure = 1
RPC:  Credentials: Flavor = 1 (Unix), len = 88 bytes
RPC:     Time = 24-Aug-13 02:46:16
RPC:     Hostname = ca
RPC:     Uid = 100, Gid = 100
RPC:     Groups = 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115
RPC:  Verifier   : Flavor = 0 (None), len = 0 bytes
RPC:

So, why does it work with Solaris 11.1? The answer is in the message that appears when you start a Solaris 11.1 system with ngroups_max larger than 16:

Aug 23 21:18:03 client unix: [ID 489543 kern.warning] WARNING: ngroups_max of 1024 > 16, NFS AUTH_SYS will use look-aside groups

That’s different from the output in Solaris 10. The “trick” is pretty straightforward: as long as you haven’t touched ngroups_max, or when the AUTH_SYS user credentials contain fewer than 16 groups, nothing changes. The NFS server will use the group information in the user credentials delivered by AUTH_SYS. However, if ngroups_max is larger than 16 and there are exactly 16 groups in the credentials transmitted by AUTH_SYS, the server will resolve the user ID to a username and look up the group memberships on its own from the configured name services. Obviously you need identical user and group information on all hosts; however, when using AUTH_SYS it should be that way anyway, as a given user ID should denote the same user on both systems. So using NIS or LDAP is a really good idea for such an environment.
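The decision described above can be sketched in a few lines of shell. This is not how the kernel implements it, just the logic as I understand it: the threshold (ngroups_max larger than 16, as the boot warning says) is an assumption, and the names effective_gids and lookup_gids are made up. lookup_gids stands in for the name service lookup and here simply uses getent and id -G.

```shell
# lookup_gids UID
# Hypothetical stand-in for the server's name service lookup:
# resolve the numeric uid to a login name, then list that user's gids.
lookup_gids() {
    name=$(getent passwd "$1" | cut -d: -f1)
    id -G "$name"
}

# effective_gids NGROUPS_MAX UID GID...
# Decide which gids the server would use for the access check.
effective_gids() {
    max=$1
    uid=$2
    shift 2
    # Assumption: look-aside only happens with ngroups_max > 16 and a
    # credential that carries exactly 16 gids (it may have been truncated).
    if [ "$max" -gt 16 ] && [ "$#" -eq 16 ]; then
        lookup_gids "$uid"
    else
        echo "$*"
    fi
}
```

Called with 1024, the uid 100 and the 16 gids from the trace, effective_gids would take the look-aside branch; with fewer gids in the credential, or with ngroups_max left at 16, it just returns what the client sent.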

Conclusion

So when you have users belonging to more than 16 groups and still want or have to use AUTH_SYS, Solaris 11.1 gives you the necessary mechanism to do so.

Do you want to learn more?

docs.oracle.com - ngroups_max