Kernel Hacking

OK, it is 6:00 AM again, but I feel great because I just completed this damn project. I want to write this blog post while the pain is still fresh.

About this project

This is a project from my OS course. It requires me to compare the SLAB and SLUB allocators in the kernel by measuring program execution time.
To do detailed kernel measurements, I will compare the performance of cache allocation, the total time in the kernel, and the total time in user space. In particular, clock()-style time measurement in the Linux library is not precise enough to observe the differences in kernel execution.
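
To put a rough number on the precision problem, here is a minimal sketch that converts one clock() tick into nanoseconds. The helper name clock_tick_ns is mine, not something from the project. On Linux, CLOCKS_PER_SEC is 1,000,000, so one tick is 1000 ns, while a single cache allocation is likely much shorter than that.

```c
#include <time.h>

/* Hypothetical helper: length of one clock() tick in nanoseconds. */
long clock_tick_ns(void)
{
	return 1000000000L / (long)CLOCKS_PER_SEC;
}
```

Printing clock_tick_ns() on the test machine shows the smallest time difference clock() can report there.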

Build Linux Kernel

  1. Use git to clone the Linux source code. In this project I used 3.18.5. The proc filesystem creation API in 3.18.5 differs from the one in older versions.
  2. make menuconfig: sometimes the ncurses library is missing, which causes an error. Fix it with apt-get install libncurses5-dev.
  3. make -j9: -jx runs x compile jobs in parallel. Usually x = number of cores + 1.
  4. make modules: builds the kernel modules. REMEMBER to take a snapshot after this step. The most annoying part of this project is that once you install the built kernel and reboot, it may crash your system so badly that it can't boot up anymore. A snapshot gives you a chance to undo what you have done. Remember, the snapshot only helps if you take it before you copy the new kernel.
  5. make modules_install: this step copies the built kernel modules into /lib/modules on the running system. If you run this step before taking the snapshot, the snapshot won't get you back to a clean system.
  6. make install
  7. reboot

Approach of measurement

Measuring the time elapsed in user space is quite easy.
Measuring the time elapsed in the kernel is tricky. How can we determine which block of code to measure? printk can show the result directly, but after the measurement, how can we pass the result to user space? In the previous class, we learned to use the proc filesystem to communicate between user space and kernel space. There we go. Proc can be the bridge between kernel and user code.

Kernel code

The measurement result has to be tied to a specific process or thread.
The task_struct data structure describes a process or task in the system.
So, in my understanding, each thread or process has a unique task_struct, which is created with the process and lives until the process terminates, and current points to the process that is executing the system call. That means we can store the measurement result in the task_struct. Therefore,

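struct task_struct {
	/* ... existing members ... */
	ull_t time_start;	/* cycle count when the measured code starts */
	ull_t time_end;		/* cycle count when the measured code ends */
};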

So these members need to be added to task_struct in include/linux/sched.h.
After we measure the time, we can record it in the process with current->time_end = time_data;

DO NOT USE printk
A huge number of process creations and cache allocations happen while kernel code runs. Calling printk on each of them will slow the kernel down extremely. There is a big chance you won't even be able to boot into the system.

Proc

Create a proc file, and output current->time_end - current->time_start when the file is read. proc is a virtual file system, so the value of current changes when different processes access the same /proc/slab file. That means when process 1009 reads /proc/slab, it gets process 1009's allocator time, and when process 1008 reads it, it gets 1008's.
In this case, each thread of the user program gets its own result when it reads the proc file.
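
A minimal sketch of such a proc module for a 3.18-era kernel, assuming task_struct has already been patched with the time_start and time_end fields shown earlier. The function names (slab_show, slab_open, slab_proc_init, slab_proc_exit) are placeholders of mine, and this is not necessarily how the course module was written.

```c
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/sched.h>

/* Runs in the context of the reading task, so current is the reader.
 * time_start and time_end are the fields patched into task_struct. */
static int slab_show(struct seq_file *m, void *v)
{
	seq_printf(m, "%llu\n",
		   (unsigned long long)(current->time_end - current->time_start));
	return 0;
}

static int slab_open(struct inode *inode, struct file *file)
{
	return single_open(file, slab_show, NULL);
}

static const struct file_operations slab_fops = {
	.owner   = THIS_MODULE,
	.open    = slab_open,
	.read    = seq_read,
	.llseek  = seq_lseek,
	.release = single_release,
};

static int __init slab_proc_init(void)
{
	/* proc_create() returns NULL on failure */
	if (!proc_create("slab", 0444, NULL, &slab_fops))
		return -ENOMEM;
	return 0;
}

static void __exit slab_proc_exit(void)
{
	remove_proc_entry("slab", NULL);
}

module_init(slab_proc_init);
module_exit(slab_proc_exit);
MODULE_LICENSE("GPL");
```

Because the show callback runs inside the read() system call of whoever opened the file, no process ID has to be passed in: current already identifies the caller.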

User code

In the user code, the program reads the proc file as a normal text file right after the observed code executes.

malloc(sizeof(int)*25);
cycles_count = read_slab();
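
The body of read_slab() is not shown here, so the following is only a sketch of what such a reader could look like. The name read_cycles_file and its signature are hypothetical. It takes the path as a parameter so it can be tried on an ordinary text file before pointing it at /proc/slab.

```c
#include <stdio.h>

/* Hypothetical helper: read one unsigned cycle count from a proc-style
 * text file. Returns 0 on success, -1 on any failure. */
int read_cycles_file(const char *path, unsigned long long *out)
{
	FILE *fp = fopen(path, "r");
	int ok;

	if (!fp)
		return -1;
	ok = (fscanf(fp, "%llu", out) == 1);
	fclose(fp);
	return ok ? 0 : -1;
}
```

With this helper, the call after malloc would be read_cycles_file("/proc/slab", &cycles_count), and the return value should be checked before the number is used.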

Time precision

To compare kernel time and user time, we should use the same unit to measure both. get_cycles() is used in the kernel part, but user code can't include the Linux kernel headers. So get_cycles() is defined here as an inline function, adapted from the Linux kernel source code.

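/* the kernel defines cycles_t in its own headers; user code needs a typedef */
typedef unsigned long long cycles_t;

static inline cycles_t get_cycles(void)
{
	unsigned int eax, edx;

	/* cpuid serializes execution so rdtsc is not run early */
	__asm__ __volatile__("cpuid" : : : "eax", "ebx", "ecx", "edx");
	/* rdtsc returns the counter in edx:eax (high:low) */
	__asm__ __volatile__("rdtsc" : "=a" (eax), "=d" (edx));
	return ((cycles_t)edx << 32) | eax;
}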
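
One detail worth spelling out: rdtsc leaves the low 32 bits of the counter in EAX and the high 32 bits in EDX. The low half alone wraps every 2^32 cycles, which is roughly 1.4 seconds on a 3 GHz machine (an assumed clock rate), so both halves are needed for anything longer. A minimal sketch, where tsc_join and read_tsc are hypothetical helper names of mine:

```c
#include <stdint.h>

/* Join the two 32-bit halves that rdtsc leaves in EDX (hi) and EAX (lo). */
uint64_t tsc_join(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

#if defined(__x86_64__) || defined(__i386__)
/* Read the full 64-bit time-stamp counter (x86 only). */
uint64_t read_tsc(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
	return tsc_join(lo, hi);
}
#endif
```

To time a region in user space, call read_tsc() before and after it and subtract the two values.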

Where to measure

Sleepy, so I'll keep it short. I will finish this part soon.

Apparently, the most difficult part is measuring kernel time and allocator time. I used strace to trace the test program. malloc pointed me to mmap, which maps files or devices into memory.

However, after tracing all the APIs, I went to look at kmem_cache_alloc and kmem_cache_free in mm/slab.c and mm/slub.c. These are not syscalls. They are the kernel functions that actually manage the cache.

In kernel/fork.c, do_fork() is, I believe, the starting point of process creation.

Thread Program(User code and proc module)

Linux kernel Repo