OS kernel basics
Michal Sojka, Czech Technical University in Prague, Faculty of Electrical Engineering
April 15, 2015
Course A3B33OSD, task E2
Based on exercises by Benjamin Engel from TU Dresden.

Roadmap

Last week
- System calls from the user space perspective

This week
- Switching from kernel to user space
- Kernel side of system calls
- Virtual memory basics

NOVA microhypervisor

- Research project of TU Dresden (< 2012) and Intel Labs (≥ 2012).
- http://hypervisor.org/, x86, GPL.
- We will use a stripped-down version of the microhypervisor (kernel).

Getting started

    tar xf osd-e2.tar.gz
    cd osd-e2
    make        # Compile everything
    make run    # Run it in the Qemu emulator

- Stdout will show the serial line of the emulated machine.
- user/: user space code
- kern/: stripped-down NOVA kernel

Assignment overview

[Figure: execution timeline. Privileged (kernel) mode: CPU reset, BIOS, kernel boot, syscall_handler. User mode: usercode. Transitions between the modes: iret, sysenter, sysexit.]

Booting and starting user space

Booting in a nutshell
1. CPU reset, BIOS executes.
2. The bootloader loads the kernel binary into memory.
3. We use a small trick: the kernel image also contains the user space code (user/usercode.c).
4. The kernel starts executing and initializes the CPU ...
5. ... and paging (virtual memory).
6. The kernel prepares the user space mapping; the user space code expects to be loaded at address 0x2000.
7. User code is started (first switch from kernel to user mode): YOUR TASK

[Figure: physical memory (kernel code/data and user code/data from kern/build/hypervisor, plus the user stack) mapped into virtual memory, which spans 0 to 4G. User space lies below 3G (0xC0000000), with the user code at 0x2000; kernel space lies above 3G. An EIP marker shows the current instruction pointer.]

Prerequisites

What do you need to know?
- NOVA is implemented in C++ (and assembler).
- Each user "program" is represented by an execution context data structure.
- The first executed program is called the root task.
- How is the user program mapped into virtual memory, i.e. what are the virtual addresses of code, data and stack?
- Intel Instruction Set Reference (link)
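The 3G split from the memory layout figure can be illustrated with a small address check. This is a minimal sketch, not NOVA code: the names USER_END, PAGE_SIZE, is_user_address, page_base and page_address are hypothetical, and the 4 KiB page size is an assumption (standard x86 paging without large pages).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants for illustration; not taken from the NOVA sources.
constexpr std::uint32_t USER_END  = 0xC0000000u; // 3G boundary from the figure
constexpr std::uint32_t PAGE_SIZE = 0x1000u;     // assumed 4 KiB x86 page

// True if the virtual address lies in the user part of the address space.
constexpr bool is_user_address(std::uint32_t vaddr)
{
    return vaddr < USER_END;
}

// Round a virtual address down to the start of its page.
constexpr std::uint32_t page_base(std::uint32_t vaddr)
{
    return vaddr & ~(PAGE_SIZE - 1);
}

// Virtual address of the n-th page (n = 2 gives the code page at 0x2000).
inline std::uint32_t page_address(std::uint32_t n)
{
    assert(n < 0x100000u); // only 2^20 pages fit into a 32-bit address space
    return n * PAGE_SIZE;
}

// The user code entry point is a page-aligned address in user space.
static_assert(is_user_address(0x2000u), "0x2000 is below 3G");
static_assert(!is_user_address(0xC0000000u), "3G and above is kernel space");
static_assert(page_base(0x2000u) == 0x2000u, "0x2000 is page aligned");
```

With these helpers, page_address(2) gives the address where the user code is expected, and any address at or above 0xC0000000 is classified as kernel space.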
Prerequisites: Execution context

- In NOVA, an execution context (Ec) represents a thread of execution (similar to tasks in other OSes).
- Data stored in the execution context:

    class Ec {
        void (*cont)();       // Continuation address
        Exc_regs regs;        // Registers
        static Ec * current;  // Currently running Ec
    };

- Ec::regs stores the user space registers (i.e. syscall parameters).
- Ec::current is a (global) pointer to the currently executing Ec.
- The first Ec is created in bootstrap(), init.cc:

    // Create a new Ec with Ec::root_invoke as entry point
    Ec::current = new Ec (Ec::root_invoke, addr);
    // Start executing the new "task" (in kernel space)
    Ec::current->make_current();
    UNREACHED; // This is never executed.

- Ec::root_invoke is responsible for steps 6 and 7 from the "booting" slide.

Prerequisites: Root task

- First user space "task" invoked by the kernel.
- Similar to the UNIX init process.
- Our user space code expects the following memory layout (see user/linker.ld):

    Address  Contents
    0x0000   Not present page
    0x1000   Stack
    0x2000   Text (code), entry point
    0x3000   Data

- We also need a stack: let's put it before the code page.
- The first page is left "not present" to catch NULL pointer dereference errors.

Prerequisites: Mapping of root task memory

- The symbol _binary_usercode_bin_start is the address where the linker puts our usercode.bin.
- Page table manipulation will be the topic of next week.
- The mapping is set up in Ec::root_invoke():

    void Ec::root_invoke()
    {
        // Allocate one page for stack
        void *stack = Kalloc::allocator.alloc_page(1);
        // Map the stack page at address 0x1000
        Ptab::insert_mapping(1 * PAGE_SIZE, Kalloc::virt2phys(stack),
                             Ptab::PRESENT | Ptab::RW | Ptab::USER);
        // Map our user space code at 0x2000
        Ptab::insert_mapping(2 * PAGE_SIZE, Kalloc::virt2phys(&_binary_usercode_bin_start),
                             Ptab::PRESENT | Ptab::USER);
        // Map our user space data at 0x3000
        Ptab::insert_mapping(3 * PAGE_SIZE, Kalloc::virt2phys(&_binary_usercode_bin_start+PAGE_SIZE),
                             Ptab::PRESENT | Ptab::RW | Ptab::USER);
        // (the function continues with the switch to user space)

Switch to user space

First switch to user space: your task
- After mapping the memory to the right place, we can start executing the code.
- Use the iret instruction to exit kernel mode and continue in user mode.
- iret takes its operands from the stack!
- Prepare an array with 5 elements:
  - 0x2000: user instruction pointer (EIP) to return to
  - SEL_USER_CODE: new CS (include/selectors.h)
  - 0x200: EFLAGS, just set the interrupt enable flag
  - 0x2000: new stack pointer (ESP)
  - SEL_USER_DATA: new stack segment (SS)
- Layout of the array in memory (byte offset from ESP):

    Offset  Element
    16      SS
    12      ESP
    8       EFLAGS
    4       CS
    0       EIP

- Point ESP to the array and execute the iret instruction.

In the user space

- After a successful exit to user space you should see:

    Ec::handle_exc Page Fault (eip=0x2000 cr2=0x42)

- This says that the instruction at address 0x2000 tried to access address 0x42, but no page was mapped there.
- This is expected. See objdump -d user/usercode
- The output will be:

    00002000 <_start>:
        2000: c6 05 42 00 00 00 12    movb   $0x12,0x42
        2007: 0f 0b                   ud2
        2009: eb fe                   jmp    2009 <_start+0x9>

- Now we can look at NOVA's system calls.
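To see why the fault reports cr2=0x42, the first instruction in the objdump listing can be decoded by hand. The sketch below is illustrative only, and the names MovbAbs and decode_movb_abs are hypothetical. It handles just this one encoding: opcode C6 with ModRM byte 05, which in 32-bit mode means "store an 8-bit immediate to the byte at a 32-bit absolute address".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decoded form of `movb $imm8, addr32` (hypothetical helper type).
struct MovbAbs {
    std::uint32_t address;   // memory operand (absolute address)
    std::uint8_t  immediate; // value to store
};

// Decode the 7-byte encoding C6 05 <addr32, little-endian> <imm8>.
// Returns false for any other byte sequence.
inline bool decode_movb_abs(const std::uint8_t *code, std::size_t len, MovbAbs &out)
{
    if (len < 7 || code[0] != 0xC6 || code[1] != 0x05)
        return false;
    // The 32-bit address is stored least significant byte first.
    out.address = static_cast<std::uint32_t>(code[2])
                | static_cast<std::uint32_t>(code[3]) << 8
                | static_cast<std::uint32_t>(code[4]) << 16
                | static_cast<std::uint32_t>(code[5]) << 24;
    out.immediate = code[6];
    return true;
}
```

Feeding it the bytes c6 05 42 00 00 00 12 yields address 0x42 and immediate 0x12. The write goes to the first page, which is left "not present", so the CPU raises a page fault and records 0x42 in cr2.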
Kernel side of system calls

- CPU initialization
- Kernel entry code
- Syscall handler
- Kernel exit code

Kernel side of system calls: CPU initialization

- Set Model-Specific Registers (MSRs) to tell the CPU what to do when user space invokes the sysenter instruction (see init.cc, init()):

    Msr::write<mword>(Msr::IA32_SYSENTER_CS, SEL_KERN_CODE);
    Msr::write<mword>(Msr::IA32_SYSENTER_ESP, reinterpret_cast<mword>(&Tss::run.sp0));
    Msr::write<mword>(Msr::IA32_SYSENTER_EIP, reinterpret_cast<mword>(&entry_sysenter));

- The CS (code segment) register will be set to the kernel code segment.
  - Note that the code segment descriptor determines the privilege level of the executing code.
- ESP (stack pointer) will point to the sp0 member of the Tss::run global variable (see tss.h).
- EIP (instruction pointer) will be set to entry_sysenter (see entry.S).

Kernel side of system calls: Syscall entry

    1  entry_sysenter:
    2      cld
    3      pop %esp
    4      lea -44(%esp), %esp
    5      pusha
    6      mov $(KSTCK_ADDR + PAGE_SIZE), %esp
    7      jmp syscall_handler

- Line 3: Load ESP with the address just past the end of Ec::current->regs (see Ec::make_current() in ec.h).
- Line 4: Decrease ESP by 44 bytes to skip the 11 registers that are used only during exception handling (Exc_regs).
- Line 5: Store the 8 general purpose registers (syscall arguments) to Ec::current->regs.
- Line 6: Set ESP to the top of the kernel stack.
- Line 7: Jump to Ec::syscall_handler.

Kernel side of system calls: Syscall implementation

- Ec::syscall_handler: a C++ function implementing the syscalls.
- Where do we get the number argument?

    void Ec::syscall_handler (uint8 number)
    {
        switch (number) {
            case 0: ...
            case 1: ...
        }
        ret_user_sysexit();
        UNREACHED; // Tell the compiler not to generate the function epilogue
    }
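One way to picture the dispatch is a user-space simulation of the handler. This is a sketch under stated assumptions, not the kernel code: SavedRegs, serial and dispatch are hypothetical stand-ins, and the register assignment (number in AL, arguments in EDI and ESI) follows the ABI given in the assignment. Only the "add" call is simulated, because "write" needs a real user space pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the saved user registers (not NOVA's Exc_regs).
struct SavedRegs {
    std::uint32_t eax; // low byte (AL) carries the syscall number; result goes here
    std::uint32_t edi; // first argument
    std::uint32_t esi; // second argument
};

// Simulated serial line so the sketch can run outside the kernel.
inline std::string &serial()
{
    static std::string out;
    return out;
}

// Dispatch on the syscall number, mimicking the switch in the handler.
inline void dispatch(SavedRegs &regs)
{
    // Only AL, the lowest byte of EAX, selects the system call.
    const std::uint8_t number = static_cast<std::uint8_t>(regs.eax);
    switch (number) {
    case 2: { // add: EDI = a, ESI = b
        const std::int32_t sum = static_cast<std::int32_t>(regs.edi + regs.esi);
        serial() += std::to_string(sum);            // print the result in ASCII
        regs.eax = static_cast<std::uint32_t>(sum); // value seen by user space
        break;
    }
    default:
        serial() += "unknown syscall";
        break;
    }
}
```

Writing the result into the saved EAX mirrors how a return value would reach user space if the saved registers are restored on the way out; the real handler operates on Ec::current->regs rather than on a struct passed by reference.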
Kernel side of system calls: Returning to user space

    1  void Ec::ret_user_sysexit()
    2  {
    3      asm volatile ("lea %0, %%esp;"
    4                    "popa;"
    5                    "sti;"
    6                    "sysexit"
    7                    : : "m" (current->regs) : "memory");
    8      UNREACHED;
    9  }

- Line 3: Set ESP to point to Ec::current->regs.
- Line 4: Restore the 8 general purpose registers from there.
- Line 5: Enable interrupts.
- Line 6: Return to user space.

Kernel side of system calls: sysenter/sysexit

- Faster alternative to int 0x80 and iret.
- Does not use the stack to store the return address.
- sysexit sets EIP←EDX, ESP←ECX and decreases the privilege level.
- Therefore the user space syscall wrapper must differ from the "int 0x80" variant:

    unsigned syscall1 (unsigned w0)
    {
        asm volatile (
            "   mov %%esp,%%ecx;"
            "   mov $1f,%%edx;"   // set edx to the addr. of label 1:
            "   sysenter;"
            "1:"                  // continue here after sysexit
            : "+a" (w0) : : "ecx", "edx", "memory");
        return w0;
    }

Assignment

1. In Ec::root_invoke, use iret to exit the kernel and run the user space code (usercode.c).
2. In the kernel, implement the "write" system call with the following prototype:

       void write(char *buf, int len)

   It sends len bytes pointed to by buf to the serial line (printf() function in the kernel).
3. Implement the "add" system call that adds two integer arguments, returns the result and prints it in ASCII to the serial line. The prototype of the add system call is:

       int add(int a, int b)

4. Invoke these two system calls from usercode.c. Use if to check that the result of add is correct.

ABI:
- write: AL=1, EDI=buf, ESI=len
- add: AL=2, EDI=a, ESI=b

[Figure: execution timeline, as on the assignment overview slide. Privileged (kernel) mode: CPU reset, BIOS, kernel boot, syscall_handler. User mode: usercode. Transitions: iret, sysenter, sysexit.]
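For step 1 of the assignment, the five-word frame that iret consumes can be laid out as a plain struct. This is only a sketch of the data layout: IretFrame, EFLAGS_IF and make_iret_frame are hypothetical names, the selector values are passed in as parameters because the real SEL_USER_CODE and SEL_USER_DATA are defined in include/selectors.h, and the inline assembly that points ESP at the frame and executes iret is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Frame popped by iret when returning to user mode, lowest address first
// (hypothetical type for illustration).
struct IretFrame {
    std::uint32_t eip;    // offset 0:  user instruction pointer
    std::uint32_t cs;     // offset 4:  user code segment selector
    std::uint32_t eflags; // offset 8:  flags for user mode
    std::uint32_t esp;    // offset 12: user stack pointer
    std::uint32_t ss;     // offset 16: user stack segment selector
};

// The offsets must match the table on the "switch to user space" slide.
static_assert(offsetof(IretFrame, eip) == 0, "EIP at offset 0");
static_assert(offsetof(IretFrame, cs) == 4, "CS at offset 4");
static_assert(offsetof(IretFrame, eflags) == 8, "EFLAGS at offset 8");
static_assert(offsetof(IretFrame, esp) == 12, "ESP at offset 12");
static_assert(offsetof(IretFrame, ss) == 16, "SS at offset 16");
static_assert(sizeof(IretFrame) == 20, "five 32-bit words");

constexpr std::uint32_t EFLAGS_IF = 0x200u; // interrupt enable flag (bit 9)

// Fill the frame with the values from the slide. The selectors are
// parameters because their numeric values are not given in the slides.
inline IretFrame make_iret_frame(std::uint32_t user_cs, std::uint32_t user_ss)
{
    IretFrame f{};
    f.eip    = 0x2000u;   // entry point of the user code
    f.cs     = user_cs;
    f.eflags = EFLAGS_IF;
    f.esp    = 0x2000u;   // top of the stack page mapped at 0x1000
    f.ss     = user_ss;
    return f;
}
```

Using a struct with static_assert checks instead of a bare array makes a mistake in the element order show up at compile time rather than as a fault at run time.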