CS5460: Operating Systems

Lecture 19: File System Implementation (Ch. 11-12)
File Allocation Strategies
 
Contiguous allocation
–  Files allocated (only) in contiguous blocks on disk
–  Analogous to base-and-bounds memory management
 
Linked file allocation
–  Maintain a linked list of blocks used to contain file
–  At end of each block, add a (hidden) pointer to the next block
 
Indexed file allocation
–  Maintain array of block numbers in inode
 
Multi-level indexed file allocation
–  Maintain array of block numbers in inode
–  Also keep in the inode pointers to blocks that are full of more block numbers
(indirect blocks, double-indirect blocks, …)
 
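Extents and multi-level extents
–  Describe runs of contiguous blocks as (start block, length) pairs instead of one pointer per block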
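To make the extent idea concrete, here is a minimal Python sketch. It does not follow any particular file system's on-disk format: a file's blocks are described by a sorted list of hypothetical (first logical block, first disk block, length) triples, and a lookup binary-searches for the extent that covers a given logical block.

```python
import bisect

# Hypothetical extent list, sorted by first logical block:
# (first logical block, first disk block, number of blocks)
EXAMPLE_EXTENTS = [(0, 5000, 8), (8, 9100, 4), (12, 200, 100)]

def extent_lookup(extents, lbn):
    """Return the disk block holding logical block `lbn`, or None if unmapped."""
    starts = [first for first, _, _ in extents]
    i = bisect.bisect_right(starts, lbn) - 1   # last extent starting at or before lbn
    if i >= 0:
        first, disk_start, length = extents[i]
        if lbn < first + length:
            return disk_start + (lbn - first)
    return None  # hole in the file or past the last extent
```

Here three extents describe 112 blocks; per-block indexing would need 112 separate pointers for the same file.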
Multi-level Indexed File Allocation
 
Inode contains:
–  Fixed-size array of direct blocks
–  Small array of indirect blocks
–  (Optional) double/triple indirect
 
Good points
–  Simple offset→block computation for sequential or random access
–  Allows incremental growth/shrinkage
–  Fixed size (small) inodes
–  Very fast access to (common) small files
 
Bad points
–  Indirection adds overhead to random access to large files
–  Blocks can be spread all over disk → more seeks
Multi-level Indexed File Allocation
 
Example: 4.3 BSD file system
–  Inode contains 12 direct block addresses
–  Inode contains 1 indirect block address
–  Inode contains 1 double-indirect block address
 
If block addresses are 4 bytes and blocks are 1024 bytes, what is the maximum file size?
–  Number of block addresses per block = 1024/4 = 256
–  Number of blocks mapped by direct blocks → 12
–  Number of blocks mapped by indirect block → 256
–  Number of blocks mapped by double-indirect block → 256² = 65536
–  Max file size: (12 + 256 + 65536) * 1024 = 67,383,296 bytes ≈ 64 MiB
 
Modern file systems have triple-indirect blocks
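The maximum-size arithmetic above generalizes to any number of indirect levels. This Python sketch assumes one pointer per indirect level, as in the 4.3 BSD example, and sums the blocks each pointer can map.

```python
def max_file_size(block_size, addr_size, ndirect, levels):
    """Largest file (in bytes) mappable by `ndirect` direct pointers plus one
    pointer per indirect level (levels=2 means single + double indirect)."""
    per_block = block_size // addr_size
    blocks = ndirect + sum(per_block ** k for k in range(1, levels + 1))
    return blocks * block_size

print(max_file_size(1024, 4, 12, 2))   # 67383296, the 4.3 BSD figure above
```

Calling it with `levels=3` shows how much headroom a triple-indirect block adds.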
Links
 
Links let us have multiple names for the same file
 
Hard links:  ln /foo/bar /tmp/moo
[Figure: the /foo directory entry "bar" and the /tmp directory entry "moo" hold the same inode#; the inode's link count is 2]
–  Two directory entries point to the same inode
–  Link count tracks connections
»  Decrement link count on delete
»  Only delete file when last connection is deleted
–  Problems (if hard links to directories were allowed): loops, unreachable directories, unreachable files
 
Soft links:  ln -s /foo/bar /tmp/moo
[Figure: the /foo directory entry "bar" holds the inode#; the /tmp directory entry "moo" holds the path /foo/bar; the inode's link count is 1]
–  Adds a symbolic pointer (a path name) to the file
–  Special flag in directory entry
–  Only one real link to file
»  File goes away when it is deleted, leaving the soft link dangling
–  Problems: infinite loops
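The hard/soft link behavior can be observed from user space with Python's `os` module (`os.link`, `os.symlink`, `os.stat`, `os.unlink`). This sketch assumes a POSIX system that permits both kinds of links inside a temporary directory; the file names are made up for illustration.

```python
import os
import tempfile

def demo_links():
    """Mimic `ln bar moo_hard` and `ln -s bar moo_soft`, then delete bar."""
    with tempfile.TemporaryDirectory() as d:
        bar = os.path.join(d, "bar")
        hard = os.path.join(d, "moo_hard")
        soft = os.path.join(d, "moo_soft")
        with open(bar, "w") as f:
            f.write("hello")
        os.link(bar, hard)      # second directory entry for the same inode
        os.symlink(bar, soft)   # separate file whose contents are the path to bar
        result = {
            "same_inode": os.stat(bar).st_ino == os.stat(hard).st_ino,
            "nlink_before": os.stat(bar).st_nlink,   # the symlink does not count
        }
        os.unlink(bar)          # removes one name; link count drops to 1
        result["nlink_after"] = os.stat(hard).st_nlink
        with open(hard) as f:
            result["hard_data"] = f.read()           # data survives via the hard link
        # islink() inspects the link itself; exists() follows it to the missing target
        result["soft_dangles"] = os.path.islink(soft) and not os.path.exists(soft)
        return result
```

After `bar` is unlinked, the hard link still reaches the data, while the soft link is left pointing at a path that no longer exists.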
Mounting a Filesystem
 
Locate superblock(s)
 
Read file system format information
 
Initialize inode cache
 
Initialize buffer cache
 
Initialize name cache
 
Optional: perform sanity checks (more detail later)
–  UNIX / Linux / Mac OS X: fsck
–  Windows: ScanDisk / CHKDSK
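As one concrete instance of "locate superblock(s)" and "read file system format information", the sketch below decodes a few fields of an ext2-family superblock from a raw image held in memory. ext2 is an assumption chosen for illustration: its primary superblock starts at byte 1024, its fields are little-endian, and it carries the magic number 0xEF53 at offset 56 within the superblock. Only three fields are decoded; a real mount then goes on to set up the caches listed above.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # ext2: primary superblock begins 1024 bytes into the volume
SUPERBLOCK_SIZE = 1024
EXT2_MAGIC = 0xEF53

def read_ext2_superblock(image):
    """Decode a few ext2 superblock fields from a raw volume image (bytes)."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE]
    if len(sb) < SUPERBLOCK_SIZE:
        raise ValueError("image too small to hold a superblock")
    (magic,) = struct.unpack_from("<H", sb, 56)            # s_magic
    if magic != EXT2_MAGIC:
        raise ValueError("bad magic: not an ext2-family file system")
    inodes, blocks = struct.unpack_from("<II", sb, 0)      # s_inodes_count, s_blocks_count
    (log_block_size,) = struct.unpack_from("<I", sb, 24)   # s_log_block_size
    return {"inodes": inodes, "blocks": blocks, "block_size": 1024 << log_block_size}
```

Checking the magic number first is a cheap way to refuse to mount a volume whose format the kernel does not understand.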
File System Optimizations
Goal: Reduce or hide expensive disk operations

Technique               Effect
Disk buffer cache       Eliminates disk access (on a cache hit)
Aggregated disk I/O     Reduces seeks
Prefetching             Overlaps/hides disk access
Disk head scheduling    Reduces seeks
Disk interleaving       Reduces rotational latency
(Techniques run from modern at the top to historic at the bottom.)
Buffer/Page Cache
 
Idea: Keep recently used disk blocks in kernel memory
 
Process reads from a file:
–  If blocks are not in buffer cache
»  Allocate space in buffer cache
»  Initiate a disk read
»  Block the process until disk operations complete
–  Copy data from buffer cache to process memory
–  Finally, system call returns
 
Q: If the cache is full, what do we purge to make space, and how?
 
Usually, a process does not see the buffer cache directly
 
Exception: mmap() maps buffer cache pages directly into the process's address space
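One common answer to "what do we purge?" is least-recently-used (LRU). The Python sketch below is a toy read-through buffer cache keyed by block number. The `disk` mapping stands in for the device, and eviction ignores dirty blocks, which a real cache would have to write back first.

```python
from collections import OrderedDict

class BufferCache:
    """Toy LRU buffer cache keyed by disk block number (hypothetical sketch)."""
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk                # any mapping: block number -> bytes
        self.blocks = OrderedDict()     # cached blocks, least recently used first
        self.disk_reads = 0

    def read_block(self, blockno):
        if blockno in self.blocks:
            self.blocks.move_to_end(blockno)   # hit: mark as most recently used
            return self.blocks[blockno]
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)    # full: purge the least recently used block
        data = self.disk[blockno]              # stands in for "initiate disk read and block"
        self.disk_reads += 1
        self.blocks[blockno] = data
        return data
```

With room for two blocks, the access sequence 0, 1, 0, 2, 1 costs four disk reads: block 1 is purged to make room for 2, and is then read again.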
Buffer/Page Cache
 
Process writes to a file:
–  If blocks are not in the buffer cache
»  Allocate pages
»  Initiate disk read (needed when the write covers only part of a block)
»  Block process until disk operations complete
–  Copy written data from process RAM to buffer cache
 
Default: writes create dirty pages in the cache, then the system call returns
–  Data gets written to the device in the background
–  What if the file is unlinked before it goes to disk?
 
Optional: synchronous writes, which go to disk before the system call returns
–  Really slow!
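A toy model of the write path: by default a write only creates a dirty block in memory, a later flush sends dirty blocks to the "disk", and a synchronous write flushes immediately. The class and method names are hypothetical; `discard` models the unlinked-before-write-back case, where the dirty data never needs to reach the device at all.

```python
class WriteBackCache:
    """Toy write-back cache (hypothetical sketch, not a real kernel API)."""
    def __init__(self, disk):
        self.disk = disk         # dict standing in for the device: block -> bytes
        self.cache = {}          # block -> bytes currently in memory
        self.dirty = set()       # blocks modified but not yet written to disk
        self.disk_writes = 0

    def write_block(self, blockno, data, sync=False):
        self.cache[blockno] = data
        self.dirty.add(blockno)          # default: mark dirty and return right away
        if sync:
            self.flush_block(blockno)    # synchronous write: wait for the device

    def flush_block(self, blockno):
        if blockno in self.dirty:
            self.disk[blockno] = self.cache[blockno]
            self.dirty.discard(blockno)
            self.disk_writes += 1

    def flush_all(self):
        for blockno in sorted(self.dirty):   # sorted() copies, so discarding is safe
            self.flush_block(blockno)

    def discard(self, blocknos):
        """File unlinked before write-back: drop its dirty blocks with no I/O."""
        for blockno in blocknos:
            self.dirty.discard(blockno)
            self.cache.pop(blockno, None)
```

Deferring the write also means repeated writes to the same block before a flush cost only one disk write.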
Performing Large File I/Os
 
Idea: Try to allocate contiguous chunks of a file in large contiguous regions of the disk
–  Disks have excellent bandwidth but poor latency
–  Amortize expensive seeks over many block reads/writes
 
Question: How?
–  Maintain a free-block bitmap (cache parts of it in memory)
–  When allocating blocks, use a modified best-fit algorithm rather than allocating one block at a time (even pre-allocate)
 
Problem: Hard to do this when the disk is full/fragmented
–  Solution A: Keep a reserve (e.g., 10%) available at all times
–  Solution B: Run a disk defragmenter occasionally
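A minimal sketch of allocating from a free-block bitmap with plain best fit. The slide's "modified" variant is not specified, so this is the unmodified version: find the smallest run of free blocks that is at least as long as the request, then mark it used. Here 0 means free and 1 means allocated.

```python
def best_fit(bitmap, n):
    """Return the start of the smallest free run of at least n blocks, or -1."""
    best_start, best_len = -1, None
    i, size = 0, len(bitmap)
    while i < size:
        if bitmap[i] == 0:
            start = i
            while i < size and bitmap[i] == 0:   # measure this free run
                i += 1
            length = i - start
            if length >= n and (best_len is None or length < best_len):
                best_start, best_len = start, length
        else:
            i += 1
    return best_start

def allocate(bitmap, n):
    """Allocate n contiguous blocks in place; return the start block or -1."""
    start = best_fit(bitmap, n)
    if start >= 0:
        for b in range(start, start + n):
            bitmap[b] = 1
    return start
```

Choosing the tightest run that fits leaves the larger free runs intact for future large files, which is the point of allocating in contiguous chunks.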
Prefetching
 
Idea: Read blocks from disk ahead of user request
 
Goal: Reduce number of seeks visible to user
–  If block read before request à hits in file buffer cache
User
Read 0
File System
Read 0
Read 1
Read 1
Read 2
 
Read 2
Problem: What blocks should we prefetch?
–  Easy: Detect sequential access and prefetch ahead N blocks
–  Harder: Detect periodic/predictable random accesses
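The "easy" case, detecting sequential access and prefetching N blocks ahead, might look like the Python sketch below. It is a simplification under stated assumptions: the cache is unbounded, a block counts as cached once it has been fetched, fetches complete instantly, and the window size N is a hypothetical tuning parameter.

```python
class SequentialPrefetcher:
    """Toy read path that prefetches `window` blocks once it sees sequential access."""
    def __init__(self, window):
        self.window = window
        self.last_block = None
        self.cached = set()      # blocks already in the (unbounded) buffer cache
        self.disk_reads = []     # order in which blocks were fetched from disk

    def read(self, blockno):
        """Return True on a buffer cache hit, False on a miss."""
        hit = blockno in self.cached
        if not hit:
            self._fetch(blockno)
        if self.last_block is not None and blockno == self.last_block + 1:
            # sequential pattern detected: fetch the next `window` blocks ahead
            for b in range(blockno + 1, blockno + 1 + self.window):
                if b not in self.cached:
                    self._fetch(b)
        self.last_block = blockno
        return hit

    def _fetch(self, blockno):
        self.cached.add(blockno)
        self.disk_reads.append(blockno)
```

Reading blocks 0 through 3 with a window of 2 misses on blocks 0 and 1 only; after that, every request is already in the cache.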
Disk Scheduling
 
Idea: Permute order of disk requests to reduce seeks
 
Some policies:
 
–  First-come, first-served
–  SCAN (0 à 100, 100 à 0, 0 à 100, …)
–  Shortest seek time first
–  C-SCAN (0 à 100, 0 à 100, …)
Example: head @ 30, requests 61, 40, 18, 78
–  FCFS: 30 à 61 à 40 à 18 à 78 = 31 + 21 + 32 + 60 à 134 tracks
»  Discussion: Lots of unnecessary seeks under load
–  SSTF: 30 à 40 à 61 à 78 à 18 = 10 + 21 + 17 + 60 à 108 tracks
»  Discussion : Starvation (How?), high variance
–  SCAN: 30 à 18 à 40 à 61 à 78 = 12 + 22 + 21 + 17 à 72 tracks
»  Discussion : Handles heavy load well, but not in middle enough
–  C-SCAN: 30 à 18 à 78 à 61 à 40 = 12 + 60 + 17 + 21 à 110 tracks
»  Discussion: Elevator-like, similar to SCAN, but more fair
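The seek totals in the example can be checked with a short Python script. Note the assumptions, chosen to match the example: SCAN here reverses at the last pending request rather than at track 0 (the variant often called LOOK), both SCAN and C-SCAN begin by moving toward lower track numbers, and the C-SCAN wrap-around jump is counted as seek distance.

```python
def fcfs(head, reqs):
    return list(reqs)                      # serve in arrival order

def sstf(head, reqs):
    pending, order, pos = list(reqs), [], head
    while pending:
        nxt = min(pending, key=lambda r: abs(r - pos))   # closest request next
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

def scan_down(head, reqs):
    """Sweep toward lower tracks, then reverse and sweep upward."""
    below = sorted((r for r in reqs if r <= head), reverse=True)
    above = sorted(r for r in reqs if r > head)
    return below + above

def cscan_down(head, reqs):
    """Sweep toward lower tracks, then jump to the highest request and sweep down again."""
    below = sorted((r for r in reqs if r <= head), reverse=True)
    above = sorted((r for r in reqs if r > head), reverse=True)
    return below + above

def total_seek(head, order):
    dist, pos = 0, head
    for r in order:
        dist += abs(r - pos)
        pos = r
    return dist

reqs = [61, 40, 18, 78]
for policy in (fcfs, sstf, scan_down, cscan_down):
    order = policy(30, reqs)
    print(policy.__name__, order, total_seek(30, order))
```

Changing the head position or request list makes it easy to find cases where SSTF leaves a distant request waiting.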
 
Disk scheduling used to be done by the OS
–  Disks lacked the onboard processing power to do this
–  There were relatively few disk models, so it wasn't too hard for OSes to understand disk geometry
 
Now, disk scheduling is done on the disk
–  Disk hardware and firmware have gotten quite complicated
–  The OS gives the disk a batch of requests, which complete in an order chosen by the disk unless the OS forces sequential accesses
Example Operation: Open /tmp/foo
 
Open /tmp/foo:
1.  check whether we already know the inode for:
»  /tmp/foo (goto [B])
»  /tmp (goto [A])
2.  check to see if root inode is in inode cache (else read root inode)
3.  check permissions for user on root directory
4.  determine location of blocks containing root directory
5.  check to see if each block is in buffer cache
6.  load ones that are not and place in cache
7.  search root directory for entry matching `tmp' and extract inode number
8.  [A] check to see if inode is in inode cache (else read it)
9.  check permissions for user on directory /tmp
10. determine location of blocks containing /tmp directory
11. check to see if each block is in buffer cache
12. load ones that are not and place in cache
13. search directory for entry matching `foo' and extract inode number
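Steps 2 through 13 are one loop repeated per path component. The Python sketch below models that loop with a hypothetical in-memory `Inode` class: directory contents are a name-to-inode-number dict standing in for directory blocks, and `perms_ok` stands in for the real permission check. Root inode number 2 is an assumption (it is what ext2 uses), and the name cache of step 1 is left out.

```python
ROOT_INO = 2  # assumed root inode number (ext2's choice)

class Inode:
    """Hypothetical in-memory inode used only for this sketch."""
    def __init__(self, ino, is_dir, perms_ok=True, entries=None):
        self.ino = ino
        self.is_dir = is_dir
        self.perms_ok = perms_ok        # stands in for a real permission check
        self.entries = entries or {}    # directories only: name -> inode number

def lookup(path, inode_table, inode_cache):
    """Resolve an absolute path one component at a time, caching inodes."""
    def get_inode(ino):
        if ino not in inode_cache:      # "check inode cache (else read it)"
            inode_cache[ino] = inode_table[ino]
        return inode_cache[ino]

    cur = get_inode(ROOT_INO)
    for name in [c for c in path.split("/") if c]:
        if not cur.is_dir:
            raise NotADirectoryError(name)
        if not cur.perms_ok:            # permission check on each directory searched
            raise PermissionError(name)
        if name not in cur.entries:     # "search directory for entry matching name"
            raise FileNotFoundError(name)
        cur = get_inode(cur.entries[name])
    if not cur.perms_ok:                # permission check on the final file
        raise PermissionError(path)
    return cur.ino
```

Resolving /tmp/foo touches three inodes (root, /tmp, /tmp/foo), which is why the inode and buffer caches matter so much for path lookup.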
Example Operation: Open /tmp/foo
 
Open /tmp/foo (cont'd)
14.  [B] check to see if inode is in inode cache (else read it)
15.  check permissions for user on file /tmp/foo
16.  use inode number to determine if /tmp/foo is already open (i.e., has an entry in the open file table):
»  if not, allocate an entry in the open file table, mark it as being for that inode number, and add a link to the now in-core inode for /tmp/foo
17.  find free slot in per-process open file table
»  return error if no space
»  else, initialize entry and add a link to the appropriate entry in the system-wide open file table
18.  initialize entry
19.  return index of entry in the per-process open file table to user as fd
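Steps 16 through 19 involve two tables: a system-wide table with one entry per open inode, and a per-process table whose index is the fd. This Python sketch uses hypothetical class names and follows the slide's model, in which the seek offset lives in the per-process entry. One deliberate change in ordering: it finds a free per-process slot before touching the system-wide table, so a full fd table does not leave a stray reference behind.

```python
import errno

class SystemOpenFileTable:
    """System-wide table: one entry per open inode (hypothetical model)."""
    def __init__(self):
        self.entries = {}   # inode number -> entry dict

    def open_inode(self, ino):
        entry = self.entries.setdefault(ino, {"ino": ino, "refcount": 0})
        entry["refcount"] += 1   # one more per-process entry links here
        return entry

class Process:
    """Per-process open file table; the slot index is the fd."""
    def __init__(self, max_fds=4):
        self.fd_table = [None] * max_fds

    def open(self, ino, sys_table):
        for fd, slot in enumerate(self.fd_table):
            if slot is None:                                   # step 17: free slot
                self.fd_table[fd] = {
                    "sys_entry": sys_table.open_inode(ino),    # step 16: link to system entry
                    "offset": 0,                               # seek offset kept per process
                }
                return fd                                      # step 19
        raise OSError(errno.EMFILE, "too many open files")     # step 17: no space
```

Opening the same inode twice yields two fds that share one system-wide entry with a reference count of 2.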
Example: Seek to offset 10,000
 
fseek(fd, 10000, …)
1.  Check that fd is a valid open file (return error if not)
2.  Update seek_offset in process open file table
3.  Return
–  Optimization: Initiate prefetch at new file offset à why?
Example: Read 1000 bytes
 
fread(fd, buffer, 1000, …)
1.  Check that fd is a valid open file (return error if not)
2.  Check that [buffer, buffer+1000) is a valid user buffer
3.  Determine which disk block(s) are already in buffer cache
4.  If any blocks not in buffer cache:
1.  Determine disk addresses of block(s) that need to be read → how?
2.  Initiate disk read operations to read necessary block(s)
3.  Put process on disk queue awaiting block read completion
5.  Copy requested data from buffer cache to user buffer
6.  Return
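Steps 3 through 5 hinge on turning a byte range into logical blocks. A minimal Python sketch, assuming a flat `block_map` list that translates logical blocks to disk addresses (standing in for the inode lookup) and plain dicts for the buffer cache and the disk; end-of-file handling and blocking the process are left out.

```python
BLOCK_SIZE = 1024  # assumed block size

def blocks_for_range(offset, nbytes, block_size=BLOCK_SIZE):
    """Logical block numbers covering bytes [offset, offset + nbytes)."""
    if nbytes <= 0:
        return []
    first = offset // block_size
    last = (offset + nbytes - 1) // block_size
    return list(range(first, last + 1))

def file_read(offset, nbytes, block_map, buffer_cache, disk, block_size=BLOCK_SIZE):
    """Return (data, disk addresses fetched) for a read of nbytes at offset."""
    blocks = blocks_for_range(offset, nbytes, block_size)
    fetched = []
    for fb in blocks:
        addr = block_map[fb]                # logical block -> disk address (via the inode)
        if addr not in buffer_cache:        # step 4: miss, so "read" it from disk
            buffer_cache[addr] = disk[addr]
            fetched.append(addr)
    data = b"".join(buffer_cache[block_map[fb]] for fb in blocks)
    start = offset % block_size             # skip bytes before offset in the first block
    return data[start:start + nbytes], fetched   # step 5: copy out the requested bytes
```

With 1024-byte blocks, the 1000-byte read at offset 10,000 from the seek example touches logical blocks 9 and 10, so it can need two disk reads even though it is smaller than one block.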
Exercises
 
fwrite(fd, buffer, 4096, …)
 
fclose(fd)
 
rename(oldname, newname) à tricky!
 
unlink( /tmp/foo ) à delete file
 
link(existing, new) à create hard link
 
symlink (existing, new) à create soft link
Questions?
