Processes

This lecture is about the first process in xv6.

Processes

Recall the goals of processes:
- Give each process a private memory area for code, data, stack.
- Prevent one process from reading/writing outside its address space.
- Allow sharing when needed.
- Bring a program to live
Usually the implementation is split between the O/S and the hardware.
The O/S manages processes:
- Allocate physical memory for them (for creation, growth, deletion).
- Keep track of them when they are not executing.
- Switch between them (to switch processes).
- Configure the hardware.
The hardware performs address translation and protection:
- Translate user addresses to physical addresses.
- Detect and prevent attempts to use memory outside the address space.
- Allow cross-space transfers (system calls, interrupts).
Also:
- O/S has its own address space.
- O/S must be able to conveniently read/write user memory.
Hardware support may or may not correspond well to what the O/S wants.
Two main approaches: segments and page tables. Paging has won: most O/S are designed for paging, most modern CPU designs support only paging. BUT x86 provides many features only via segmentation h/w (interrupts, protection), so we must learn about x86 segments, a little bit. Also xv6 uses segments, not paging.

Example hardware for address spaces: x86 segments

The operating system can switch the x86 to protected mode, which supports virtual and physical addresses, and allows the O/S to set up address spaces so that user processes can't change them. Translation in protected mode is as follows:

selector:offset (virtual / logical addr)
==SEGMENTATION==>
linear address
==PAGING ==>
physical address

Next lecture covers paging; now we focus on segmentation.

Protected-mode segmentation works as follows (see handout):

segment register holds segment selector
selector: 13 bits of index, local vs global flag, 2-bit RPL
selector indexes into global descriptor table (GDT)
segment descriptor holds 32-bit base, limit, type, protection
la = va + base ; assert(va < limit);
choice of seg register usually implicit in instruction
- ESP uses SS, EIP uses CS, others (mostly) use DS
- some instructions can take far addresses:
  - ljmp $selector, $offset
GDT lives in memory, CPU's GDTR register points to base of GDT
LGDT instruction loads GDTR
you turn on protected mode by setting PE bit in CR0 register
What about protection?
- instructions can only r/w/x memory reachable through seg regs
- not before base, not after limit
- can my program change a segment register? yes, but... to one of the permitted (accessible) descriptors in the GDT
- can my program re-load GDTR? no!
- how does h/w know if user or kernel?
- Current privilege level (CPL) is in the low 2 bits of CS
- CPL=0 is privileged O/S, CPL=3 is user
- why can't app modify the descriptors in the GDT? it's in memory...
- what about system calls? how do they transfer to kernel?
- app cannot just lower the CPL

Case study (xv6)

xv6 is a reimplementation of Unix 6th edition.

v6 is an early Unix operating system for DEC PDP11
- Thompson and Ritchie, 1976
- PDP11: 16 bit data and addresses, 18 bit physical addresses
- ancestor of Linux &c but much smaller
- recognizable: shell, multi-user, directories
- written in C
- Unix papers.
xv6 written for the OS course at MIT (6.828):
- even smaller than v6, maybe not useable as is
- preserves basic structure (processes, files, pipes, &c)
- you don't have to learn PDP11 and x86
- runs on multi-processor PCs.

Newer Unixs have inherited many of the conceptual ideas even though they added paging, networking, graphics, improve performance, etc.

You will need to read most of the source code multiple times. Your goal is to explain every line to yourself. The chapters published on the schedule page (by Cox, Kaashoek, and Morris) are helpful and also required reading before the lectures!

Overview of processes in xv6

In today's lecture we see how xv6 creates the kernel address spaces, and the first user process. A process consists of an address space and one thread of control (to run the program) in xv6. The kernel address space is the only address space with multiple threads of control. We will study context switching and process management in detail next weeks; creation of the first user process (init) will get you a first flavor.

The process chapter covers the material below in more detail.

xv6 uses only the segmentation hardware on the x86; it doesn't use paging. (In pintos you will use page-table hardware too, which we cover in next lecture.)

The kernel address space:

  the code segment runs from 0 to 2^32 and is mapped X and R
  the data segment runs from 0 to 2^32 but is mapped W (read and write).

Each process has an address space, laid out as follows starting at virtual address zero:
```
  text
  original data and bss
  fixed-size stack
  expandable heap
```
A process's code, data, and stack segments all map this virtual address space to the same range of linear addresses. That is, all three segments are the same.

The x86 designers probably had in mind more interesting uses of segments. What might they have been?

xv6 process structure

we're about to look at how the first XV6 process starts up
  it will run initcode.S, which does exec("/init")
  /init is a program that starts up a shell we can type to

what's the important state of an xv6 process?
  kernel proc[] table has an entry for each process
    p->mem points to user mem phys address
    p->kstack points to kern stack phys address
    struct context holds saved kernel registers
      EIP, ESP, EAX, &c
      for when a system call is waiting for input
  user half: user memory
    user process sees memory as starting at zero
    instructions, data, stack, expandable heap
  kernel half: kernel executing a system call for a process
    on the process's kernel stack

xv6 has two kinds of transitions
  trap + return: user->kernel, kernel->user
    system calls, interrupts, divide-by-zero, &c
    save user process state ... run in kernel ... restore state
  process switch: between kernel halves
    one process is waiting for input, run another
      or time-slicing between compute-bound processes
    save p1's kernel-half state ... restore p2's kernel-half state
  setting up first process involves manually initializing this state

saved state for trap
  during trap, the CPU:
    switches to process's kernel stack
    pushes SS, ESP, EFLAGS, CS, EIP onto kernel stack
    jumps into kernel
  kernel then pushes the other user registers
  this is struct trapframe
  trap return reverses this, resuming at saved user EIP
  for first process:
    manually set up these "saved" registers on the kernel stack
    EIP 0, ESP top of user memory, &c

saved state for process switch
  save registers (EIP, ESP, EAX, &c) in oldp->context
  restore registers from newp->context
  now we are at the EIP of newp, and using its kernel stack
  this is the only way xv6 switches among processes
    there is no direct user->user process switch
    instead, user TRAP kernel PROCESS-SWITCH kernel TRAP-RETURN user
  for first process:
    manually set up EIP and ESP to run forkret, which returns from trap

Since an xv6 process's address space is essentially a single segment, a process's physical memory must be contiguous. So xv6 may run into fragmentation if process sizes are a significant fraction of physical memory.

xv6 kernel address space

Let's see how xv6 creates the kernel address space by tracing xv6 from when it boots, focusing on address space management.

Start with ksegment(), which is called from main()
look at GDT after lgdt with info gdt
How are text, data, and stack mapped?
What do the protection bits mean?
How does address translation change right after lgdt completes? (what happened during boot?)

main() calls userinit() to create first process

then scheduler() to start running processes

creating the first process

userinit()
What is in binary_initcode?
How does the user stack look like?
allocproc()
how does the kernel stack look for the new process? what is on it?

running the first process

remember that main calls scheduler() after userinit()
we'll see how scheduler() works in a few lectures, overview now
usegment---how is the address for the user space setup? protection?
swtch
forkret
Which stack is the kernel using now?
What is on this stack?
trapret?
What is on the stack?
In which address space is processor after executing iret?
What is on the stack in that address space?

Timeline of a CPU: proc-switch-scheduler-switch-proc2-...

swtch called by interrupt handler on behalf of the current running process. iret may be called on the behalf of another process

All switched-out processes (state=RUNNABLE) will have eip value somewhere in the swtch function, unless they are newly created in which case they will have eip pointing to forkret. When is forkret called? (only on new process creation). Why does it need to release the ptable lock? (how does it get released on a regular context switch?)

Each process has two halves: user and kernel. Transitions from user to kernel: interrupts, exceptions, int instruction. Transitions from kernel to user: iret instruction.

Process switch: User->Kernel->Kernel2->User2. The switching between user and kernel involves switching the registers and the address space. More on address space switching later.

trapret uses iret instruction to return to user mode.

Question: Why can't we release the ptable lock in allocproc itself? Answer: because we are not done with the PCB structure or the PCBs themselves. We still need to keep the lock till the process has actually started running.

main topic: virtual memory (VM)
  how the kernel controls what memory addresses mean
  traditional basis for memory multiplexing and protection
  source of many neat tricks (later)
  you'll play with VM a lot in the labs
    though JOS uses VM quite differently than xv6
  
state of play
  we were powering on ("booting") the machine
  nothing in RAM, need to read kernel from disk
  BIOS copied first ("boot") sector to 0x7c00 in mem
  boot sector = bootasm.S + bootmain.c
  executing at end of bootasm.S
  stack below 0x7c00, so we can call into C
  call bootmain

bootmain (sheet 83)
  two jobs:
  1. copy kernel from disk to mem
  2. jump to kernel's first instruction

where is the kernel on disk?
  starts at 2nd sector of disk 0, just after boot sector
  [diagram of disk]
  *not* in a file; written directly to the raw disk

what format?
  the result of running the compiler and linker on xv6 source
  they produce instructions+data, in machine format
  linker writes a file in ELF format (same as the JOS kernel)
  ELF headers indicate offset and size of data
    and the start address!

to where in memory should bootmem copy the kernel?
  where to store it -- in DRAM -- physical memory
  bootmain loads kernel starting at 0x100000
  [diagram of phys mem]

why 0x100000?
  that's 1 MB
  why not 0x0?
    there are memory-mapped devices at phys addrs < 1 MB
  why not somewhere else, e.g. at 2 MB?
    that would work too
    it actually doesn't matter much where bootmain loads the kernel!
    b/c kernel will use VM hardware to put itself at predictable virt addr
  only real constraints on where bootmain loads the kernel:
    1. there must be DRAM at that phys addr
    2. kernel must be able to find itself after it starts
  0x100000 satisfies both those constraints
  
why doesn't 0x100000 appear in bootmain.c?
  it's actually in the ELF header
  the linker put it there, in response to a config option
  % i386-jos-elf-objdump -p kernel

to where should bootmain jump?
  where is the kernel's first instruction?
  not actually 0x100000!
  ELF header specifies start address, for flexibility
  % i386-jos-elf-objdump -f kernel
  0x10000c -- this is "start", in entry.S, sheet 10
  we told the linker to put 0x10000c in the ELF header

let's skip details of bootmain.c and enter the kernel
  (gdb) b *0x10000c
  sheet 10, entry.S

kernel is about to start
  it must set up the hardware
  and set up its own sub-systems (processes, files, &c)
  let's start with how xv6 configures the MMU
    i.e. how xv6 manages memory and addressing
    one of the most interesting parts of kernel design

how is addressing currently configured?
  program addresses refer directly to physical memory
  MMU segments set for identity mapping
  MMU paging is disabled

are we going to be happy with this arrangement?
  for example, once xv6 starts running processes?
  no: protect kernel memory from user writes
  no: protect process A's memory from process B's writes
  no: give user code memory at predictable addresses
  no: let user code allocate big contiguous memory blocks
  there are many ways to solve these problems...

xv6 uses x86 MMU's paging hardware to fix these problems
  paging provides a level of indirection for addressing
  s/w can only ld/st to virtual addresses, not physical
  kernel tells MMU how to map each virtual address to a physical address
    MMU essentially has a table, indexed by va, yielding pa
    called a "page table"
  MMU can restrict what virtual addresses user code can use

big picture of xv6's virtual addressing scheme
  [diagram]
  0x00000000:0x80000000 -- user addresses below KERNBASE
  0x80000000:0x80100000 -- map low 1MB devices (for kernel)
  0x80100000:?          -- kernel instructions/data
  ?         :0x8E000000 -- 224 MB of DRAM mapped here
  0xFE000000:0x00000000 -- more memory-mapped devices

where does the paging h/w map these regions, in phys mem?
  [diagram]
  note double-mapping of user pages

this is an "address space"
  each process has its own address space
    and its own page table
  all processes have the same kernel (high memory) mappings
  kernel switches page tables when switching processes

Q: what is good about this arrangement?
  user memory always starts at zero
    of course user va 0 maps to different pa for each process
  lots of room for user to grow contiguously
  terms: "low memory" and "high memory"
  both kernel and user mapped -- easy to switch for syscall, interrupt
  kernel mapped at same place for all processes
    eases switching between processes
  easy for kernel to r/w user memory
    using user addresses, e.g. sys call arguments
  easy for kernel to r/w physical memory
    pa x mapped at va x+0x80000000

Q: what's the largest process this scheme can accomodate?

Q: could we increase that by increasing/decreasing 0x80000000?

To think about: does the kernel have to map all of phys mem
  into its virtual address space?
  
reminder:
  we're about to execute first instruction in kernel
  paging hardware is disabled, so pa = va
  bootmain loaded kernel at 0x100000
  bootmain jumped to 0x10000c

what's going to happen when the kernel enables paging?
  i.e. the above virtual address translation comes into effect?
  presumably the enabling instruction has address 0x100xxx
  0x100xxx maps to user memory, not kernel instructions!

solution:
  temporary page table with kernel mapped at both 0x80100000 and 0x100000
  that's what the first few kernel instructions set up

entry, sheet 10:
  (gdb) si ...
  CR4_PSE
  entrypgdir -- sheet 13
  page directory
    MMU reads it to map va to pa
    page dir lives in RAM
    1024 32-bit PDEs (page directory entry)
    each PDE describes 4 MB of virtual address space
    MMU uses top 10 bits of va as index
      replaces top bits with top bits of PDE
      this is virtual to physical translation
    this is a simple pagetable in special 4MB mode, more complex later
  what's in entrypgdir?
    x/x 0x10a000
    x/x 0x10a000+4*(0x80000000>>22)
    [diagram]
  why V2P(entrypgdir)?
    i386-jos-elf-nm kernel | grep entrypgdir
    V2P subtracts 0x8000000 -- sheet 02
  what is %cr3?
  what is %cr0?
  CR0_PG
  before enabling:
    x/x 0x10a000
    x/x 0x8010a000
  after enabling:
    x/x 0x8010a000
  si to jmp *%eax
  info reg
  print main
  step into main

main
  long sequence of calls to initialize subsystems
  first -- call kvmalloc to generate *another* page table

why another page table?
  map kernel only once, leaving space for low user memory
  map more physical memory, not just first 4 MB
  use 4KB pages instead of 4 MB pages

4 KB pages
  entrypgdir was simple: array of 1024 PDEs, each mapping 4 MB
  but 4 MB is too large a page size!
    very wasteful if you have small processes
    xv6 programs are a few dozen kilobytes
    4 MB pages require allocating full 4 MB of phys mem
  solution: x86 MMU supports 4 KB pages

how much space required for a flat page table w/ 4 KB pages?
  there are a million of them on a 32-bit machine
  32 bits per entry, so 4 MB for a full page table
  that also wastes lots of memory for small programs!
    you only need mappings for a few hundred pages
    so the rest of the million entries would be there but not needed
   
detailed structure of x86 page table
  see handout
  page table translates va to pa
    replacing top 20 bits in va with a [different] pa
    leaving bottom 12 bits alone
  logical view:
    a table of 32-bit PTEs
    2^20 PTEs, one for each 4096 bytes of address
    each PTE holds 20-bit pa of page
      or is marked invalid
    flags in low 12 bits
  implementation view:
    split page table into 4096-byte chunks ("page table pages")
    diagram
    each holds 1024 32-bit PTEs
    omit parts you don't need
    how to know where the parts of the table are? and which are valid?
  page-directory page
    1024 32-bit entries (PDE)
    each entry holds pa of a page-table page
      with entries for a range of 1024 pages
      or is blank (invalid)
    really top 20 bits of pa of page-table page
      pages are aligned, so bottom 12 bits always zero
    PDE holds some flags in those 12 bits
  %cr3 holds phys addr of page directory

how does x86 paging hardware translate a va?
  need to find the right PTE
  top 10 bits index PD to get address of PT
  next 10 bits index PT to get PTE
  flags in PTE
    P, W, U
    xv6 uses U to forbid user from using anything above 640 K



the page table is stored in DRAM
  thus xv6 has to allocate physical memory for PD and PTs
  each PD and PT fits in a physical page (4096 bytes) of memory

back to vm.c (sheet 17)
  main calls kvmalloc
  kvmalloc calls setupkvm

setupkvm
  creates a page table
  install mappings the kernel will need
  will be called once for each newly created process

setupkvm
  must allocate PD, allocate some PTs, put mappings in PTEs
  alloc allocates a physical page for the PD
    returns that page's virtual address above 0x8000000
    so we'll have to call V2P before installing in cr3
  memset so that default is no translation (no P bit)
  a call to mappages for each entry in kmap[]

what is kmap[]?
  an entry for each region of kernel virtual address space
  same as address space setup from earlier in lecture
  0x80000000 -> 0x00000000 (memory-mapped devices)
  0x80100000 -> 0x00100000 (kernel instructions and data)
  data -> data-0x80000000  (phys mem after where kernel was loaded)
  DEVSPACE -> DEVSPACE     (more memory mapped devices)
  note no mappings below va 0x80000000 -- future user memory there

mappages (sheet 16)
  (called by setupkvm for each kernel mapping range)
  arguments are PD, va, size, pa, perm
  adds mappings from a range of va's to corresponding pa's
  rounds b/c other uses pass in non-page-aligned addresses
  for each page-aligned address in the range
  call walkpgdir to find address of PTE
    need the PTE's address (not just content) b/c we want to modify

diagram of PD &c, as following steps build it

walkpgdir
  mimics how the paging h/w finds the PTE for an address
  refer to the handout
  a little complex b/c xv6 allocates page-table pages lazily
    might not be present, might have to allocate
  PDX extracts top ten bits
  &pgdir[PDX(va)] is the address of the relevant PDE
  now *pde is the PDE
  if PTE_P
    the relevant page-table page already exists
    PTE_ADDR extracts the PPN from the PDE
    p2v() adds 0x80000000, since PTE holds physical address
  if not PTE_P
    alloc a page-table page
    fill in PDE with PPN -- thus v2p
  now the PTE we want is in the page-table page
    at offset PTX(va)
    which is 2nd 10 bits of va

back to mappages
  put the desired pa into the PTE
  mark PTE as valid w/ PTE_P
  repeat for all pages in the range