OS NOTES

Garbage Collection

reference counting

tracing

mark-and-sweep
mark-and-copy

ELF

portable format!
file header: info like 64 vs 32bit, target OS (Linux, FreeBSD, Solaris, System V, ...)
sections: actual data or metadata: .text, .data, .rdata
section header: refrences the sections. used by linker (and dynamic linker). describes sections
program header: refrences the sections. used by OS/loader. describes memory segments

a memory segment contains 1 or more sections

Program header:

describes runtime execution (memory segments)
tells the system how to create a process image
specifies memory segments (size in file and in memory, offset in file and in virtual memory)
etc...

Section header:

used by linker/dynamic-linker/relocation
refrences 0 or more sections

Loader

uses mmap to lazily load binary from disk and uses the page cache (see below)

binaries/libraries are shared between process automagically
unused parts of binaries/libraries are automatically flushed out of memory

Virtual Memory

MMAP

lazy by default
can be set to eager by passing POPULATE flag, or
using mlock (pages are guaranteed to always be resident in RAM)
a usecase: reading files... faster because it avoids many read syscalls

shared vs private:

shared: Share this mapping. Updates to the mapping are visible to
other processes mapping the same region, and (in the case of
file-backed mappings) are carried through to the underlying
file.
usecases: memorymapped I/O and IPC
private: Create a private copy-on-write mapping. Updates to the
mapping are not visible to other processes mapping the same
file, and are not carried through to the underlying file

anonymous vs file-backed:

anonymous is:

not backed by any file
its contents are initialized to zero
malloc uses anon mmap!

Page cache/buffer cache

see filesystem cache below!
swap has nothing to do with the page cache: program data saved to disk. Completely different!
non-dirty page cache pages are the first to go if system is tight on memory. They're just discarded since they're already on disk.
most file I/O is driven by virtual memory's page cache: try running free -h before and after running python -c "print 'a'*(4000*2**20)" > ~/f.bin
flush page cache: echo 1 > /proc/sys/vm/drop_caches

Swap

store processes data on disk
free -h

should be able to explain most of /proc/meminfo
https://github.com/torvalds/linux/blob/master/Documentation/filesystems/proc.txt

shared memory (shmem):

shared memory + tmpfs (df to see all tmpfs mounts)
shared column in free
tmpfs is basically ram disk

File systems (FS)

File:

regular file
special file
- block
- character
- named pipe (aka. FIFO file): in-memory like regular pipes, but have a file discriptor, so can be shared by processes
- socket
  - Unix Domain Socket
  - Network Socket (TCP/stream, UDP/datagram, RAW)

Some steps required to write data to end of some file:

find empty disk blocks and mark them as in use
associate these blocks with the file
adjust file size
actually copy the data to the blocks

at least 3 datastructures are needed:

for tracking free disk blocks
for tracking which data blocks belong to a file (Inode!)
the data blocks themselves

Inode (aka index inode)

An inode:

each inode is named and located by a number: ls -i
stores location of file's data blocks on disk
stores file metadata: permissions, various timestamps

ext4:

amount of inodes is determined at format time!
inode takes space on disk (256 bytes == 1/16th of a block)
tradoff: more inodes == more files, but less space for data blocks
default: 1 inode for every 16 KB of data blocks (configurable at format time)

How to retrieve (a specific) disk block given an inode and offset?

multilevel indexing:
inode points to data blocks and an "indirect block", indirect blocks points to more data blocks and an indirect block, so on. This works well (time & space) for large and small files.

Directory

contains (filename --> inode number) mappings (aka dentry??)

steps for open("/etc/f.txt"):

"/" probably has hardcoded inode num (2)
fetch inode #2, lookup inode num associated with "etc" in the contents: x
fetch inode #x, lookup inode num associated with "f.txt" in the contents: y
now we know the inode num for target file!

hard vs soft (symoblic) link

hard link:

filename to inode number (just like in directory)
doesn't work cross-filesystems: because the inode num we map to is on the same fs.
if original file is deleted or moved, hardlinks are unaffected and the file can still be accessed!

soft link:

name --> name mapping
works cross filesystems

Caching

  applications  
      |  
     vfs  
    /   \  
   fs1  fs2   
    \   /  
     disk

where to stick the cache?
above vfs:

sees file contents only
doesn't see fs metadata (e.g. inodes)
cache (aka pagecache) in /proc/meminfo, free, and vmstat

below actual filesystem:

sees disk blocks, i.e. file contents and metadata, but to avoid storing file contents twice (in buffer and in cache), linux only stores file contents in cache, while the buffer points to cache
buffers in /proc/meminfo, free, and vmstat

Caching policy:

write-through:
writes to cache immediately make it to disk: slow writes, fast reads, safe
write-back:
writes to cache make it to disk asynchronously: fast writes and reads, risky
sync/fsync flush cache to disk

Recovering from incosistencies:
(writing a single disk block is assumed to be atomic!)

fsck checks entire file system: slow!
journaling:
- record changes to be made in a journal (a circular log)
- check them off as writes make it to disk (aka checkpointing)
- Everything before the last checkpoint is assumed to have safely made it to disk
- Anything after the last ckeckpoint may/may not have made it. We only need to check entries since last checkpoint
- much faster!
- ext4 is a journaling fs

stracing `ls /home/sam`

stat("/home/sam") returns stats, like inode num (x)
fd = openat(x) opens directory using its inode num x
fstat(fd) get the same stats again, this time using the open fd (why??)
getdents(fd) returns dentries in directory: [(filename, inode), (filename, inode), ...]
write(1, "...") write out all filenames

Misc.

poll/select:

Need to read from (say) 3 file descriptors, each may block of it's not currently readable/writable (aka. ready) naively:

read(fd1)
read(fd2)
read(fd3)

can't block on more than one fd at once (assuming 1 thread). e.g. Might block on fd1 while fd2 is actually ready and a read wouldn't block.

Solution, nonblocking IO:
read(fd) returns error without blocking if fd currently not readable. Application needs to continuously poll, wasting cpu.
Solution, multiplexed IO (select/poll):
blocks on all fd's at once, then issue nonblocking read on the ready fd's:

select(fd1,fd2,...,timeout)

sleeps until an fd is ready, or a time out, then issue a read on the ready fd, which won't block.
in other words: blocking (if we choose to) select on multiple fd's at once, followed by nonblocking read.
note:

depending on the timeout, select can block forever, return immediately, or until the timeout
select/poll is O(n) in number of fds, epoll is more efficient!

level-triggred vs edge-triggred

level-triggered: get a list of every file descriptor you’re interested in that is readable
edge-triggered: get notifications every time a file descriptor becomes readable

Zombies

parent process terminates before child ---> child's parent becomes init
When a process terminates, it is not immediately removed from the system but parts of the it are kept resident in memory to allow the parent to inquire about its status upon terminating (aka wait)
once parent wait()ed on the child process, it's fully destroyed
zombie: process that terminated but hasn't yet been waited upon
init routinely waits on all of its children

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OS NOTES

Garbage Collection

reference counting

tracing

ELF

Program header:

Section header:

Loader

Virtual Memory

MMAP

shared vs private:

anonymous vs file-backed:

Page cache/buffer cache

Swap

File systems (FS)

Inode (aka index inode)

Directory

hard vs soft (symoblic) link

Caching

stracing `ls /home/sam`

Misc.

poll/select:

level-triggred vs edge-triggred

Zombies

FilesExpand file tree

OS.md

Latest commit

History

OS.md

File metadata and controls

OS NOTES

Garbage Collection

reference counting

tracing

ELF

Program header:

Section header:

Loader

Virtual Memory

MMAP

shared vs private:

anonymous vs file-backed:

Page cache/buffer cache

Swap

File systems (FS)

Inode (aka index inode)

Directory

hard vs soft (symoblic) link

Caching

stracing ls /home/sam

Misc.

poll/select:

level-triggred vs edge-triggred

Zombies

stracing `ls /home/sam`