Archive for February, 2019

Let’s talk micro kernels

Thursday, February 14th, 2019

Over four years ago I wrote about Why monolithic kernels are fail, and last night I live-streamed a talk about micro kernels.

And in my famous prior-art, patent-voiding series I wanted to formally disclose a potentially novel system call implementation idea. In multi-server micro kernels, context switches happen more frequently than in a simple, shared address space monolithic kernel (such as Linux). This is due to device drivers (and filesystems, network stacks and hopefully everything else, ..!) running in regular, isolated processes. This is obviously great for security and stability, but naturally requires way more context switches, which can hurt performance. Now slim and lightweight micro kernels (such as L4) are usually quite optimized and often need fewer instruction cycles per inter-address-space context switch, however such a horde of servers will execute many times more of them.

Now my (novel?) idea to further minimize the cost of context switches is to bundle system calls in batches (or molecules? ;-) so that an application (or device driver, after all they should now be regular processes, …!) only context switches once for a couple of system calls. The exact ABI, calling convention and format have to be defined by the implementor, for simplicity let’s assume some kind of “protocol buffer”, so the read, seek, read part of an open, read, seek, read sequence could look like:

syscallnr | arg0 | arg1 | arg2 | …
read | fd | buffer | 128
seek | fd | -128 | SEEK_END
read | fd | buffer | 128

The same concept obviously applies to all kinds of similar operations, and is somewhat similar to the existing Unix concept of writev() and its iovec structure – just way more generic and flexible.
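
For comparison, this is roughly what the existing writev() mechanism looks like with plain POSIX calls. The helper names write_two_parts and write_two_calls are made up for this sketch. The first one hands two buffers to the kernel in a single call, the second one does the same work with two separate write() calls, which is the kind of extra kernel entry that batching tries to avoid.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Writes two separate strings to fd with a single writev() call. */
static ssize_t write_two_parts(int fd, const char *first, const char *second)
{
    struct iovec iov[2];

    assert(first != NULL && second != NULL);
    iov[0].iov_base = (void *)first;   /* iov_base is not const-qualified */
    iov[0].iov_len = strlen(first);
    iov[1].iov_base = (void *)second;
    iov[1].iov_len = strlen(second);

    /* One kernel entry, two buffers, written in array order. */
    return writev(fd, iov, 2);
}

/* Same result, but with two kernel entries instead of one. */
static ssize_t write_two_calls(int fd, const char *first, const char *second)
{
    ssize_t a, b;

    assert(first != NULL && second != NULL);
    a = write(fd, first, strlen(first));
    if (a < 0)
        return a;
    b = write(fd, second, strlen(second));
    return b < 0 ? b : a + b;
}
```

Note that writev() only batches buffers for one operation on one file descriptor, while the idea above batches arbitrary, different system calls.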

As a bonus point for more flexibility, one could either implement it so that the syscalls are executed one-by-one until the first error, or optionally –for a little more flexibility– introduce flags / tags to form groups and to choose whether a group executes until an error occurs or not. E.g. the operations might be related to two different file descriptors, so the other tagged group could still be executed even when one of them failed. Obviously error checking for the calling application becomes a multi-return-value operation, instead of just a single if (!read/write/…) {} block.
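
The tagged variant could be sketched like this, again purely as an illustration and not as a defined ABI. The names tagged_entry, batch_run_grouped, the BATCH_* constants and the limit of 32 groups are all assumptions, as is the choice to report failures as a negative errno value and to mark skipped entries with -ECANCELED. The executor is a user-space stand-in built on the ordinary POSIX read() and write().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

enum { BATCH_READ, BATCH_WRITE };   /* hypothetical call numbers */
enum { BATCH_MAX_GROUPS = 32 };     /* arbitrary limit for this sketch */

struct tagged_entry {
    int nr;
    unsigned group;     /* entries with the same tag stop together on error */
    intptr_t arg[3];
    long result;        /* >= 0 on success, -errno on failure,
                           -ECANCELED if skipped after an earlier failure */
};

/*
 * Runs every entry in order. Once an entry fails, later entries with the
 * same group tag are skipped, while other groups keep running.
 * Returns the number of entries that failed or were skipped.
 */
static size_t batch_run_grouped(struct tagged_entry *batch, size_t n)
{
    uint32_t failed = 0;   /* bit g is set once group g has failed */
    size_t not_done = 0;

    for (size_t i = 0; i < n; i++) {
        struct tagged_entry *e = &batch[i];
        uint32_t bit;
        long r;

        assert(e->group < BATCH_MAX_GROUPS);
        bit = UINT32_C(1) << e->group;
        if (failed & bit) {
            e->result = -ECANCELED;
            not_done++;
            continue;
        }
        switch (e->nr) {
        case BATCH_READ:
            r = read((int)e->arg[0], (void *)e->arg[1], (size_t)e->arg[2]);
            break;
        case BATCH_WRITE:
            r = write((int)e->arg[0], (const void *)e->arg[1],
                      (size_t)e->arg[2]);
            break;
        default:
            r = -1;
            errno = ENOSYS;   /* unknown call number */
            break;
        }
        if (r < 0) {
            e->result = -errno;
            failed |= bit;
            not_done++;
        } else {
            e->result = r;
        }
    }
    return not_done;
}
```

The calling application then checks the return value first and, if it is not zero, walks the result fields to find out which entries failed and which ones were merely skipped.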