VO Betriebssysteme (Operating Systems Lecture): Windows

Transcription

VO Operating Systems
Windows - A Case Study
operating system:
software that controls the
operation of a computer and
directs the processing of
programs.
(Merriam-Webster OnLine)
Andreas Schabus
Academic Relations
Microsoft Österreich GmbH
http://blogs.msdn.com/msdnat
VO Introduction
Operating System Structure
An operating system often consists of the following components:
• Process management
• Main memory management
• Secondary storage management
• Network management
• Protection mechanism
• Command interpreter system
Computer scientists should be able to map those
concepts to real systems
Different implementations lead to different behavior
Understanding the implications prevents problems
Agenda
• Why Engage in Operating Systems
• Windows Evolution
• Windows Architecture
• Memory Management Fundamentals
• Virtual Address Translation
Why Engage in OS?
How to design an OS?
1. Make hardware assumptions
2. Identify problems to be solved
3. Determine the architecture
   - Abstractions and layers
The higher an abstraction in the software stack, the more it is circumvented, ignored, or upcalled
[Figure: stack of abstraction layers, higher abstractions on top]
OS - Traditional Picture
[Figure: layer stack: USER, Apps, OS, Hardware]
• Hardware/OS is the platform
• Goal is to sell (proprietary) hardware
• Purpose is to run specific 'business' applications
• OS and other system software is incidental/subsidiary
• Economics allows apps to target particular HW/OS
• HW/OS often single-vendor (IBM mainframes)
  - Model also used by Burroughs, Digital, SUN, HP, Apple...
• Revenues sweetened through service & education
OS - Windows-like Ecosystem
[Figure: layer stack: USER, OS, Apps, PC Hardware]
• OS is the platform, hardware is commoditized
• Goal is to grow the ecosystem
  - MS sells OS & Apps, OEMs/IHVs sell HW, ISVs sell Apps
• Purpose is consumer experience, running business apps
• OS is core
• Apps target the OS
• OEMs are the principal integrators
• Service & education just more elements in ecosystem
Windows And Linux Evolution
Windows and Linux kernels are based on foundations
developed in the mid-1970s
[Figure: two timelines, 1970 to 2000, for the Windows and Linux kernel lineages]
(see http://www.levenez.com for diagrams showing history of Windows & Unix)
OS Design Environments
                     UNIX (1970s)        NT (1980/1990s)        ?? (2000/2010s)
Address space        16b, swapping       32-bit, linear VM      64-bit, ??
CPU perf             KIPS                MIPS                   GIPS
Synchr               IRQL                Test&Set, Comp&Swap    Transactional memory?
Memory size          KBytes              MBytes                 GBytes
Hard concurrency     none                SMP                    High-Multicore
Mass storage         Kbytes, slow seek   Mbytes, slow seeks     GBytes, no seeks; TBytes
Distrib. computing   Tape                Client/server          Peer-to-peer
Old OS designs can (of course) be ported, but:
How would you design an OS on blank paper?
How should the CPU system architecture evolve?
What problems do we need to solve in
future operating systems?
Well-understood problems (sort-of)
• Processes
• Threads
• Virtual Memory
• Input/output
• Local file systems
• Client/server computing
• Virtual machines
• Network protocols (quasi-secure)
• Security technologies (ACLs, authentication)
What problems do we need to solve in
future operating systems?
Difficult problems
• Security models
• Application model, System Extensibility
• Configuration/state management
• System Extensibility
• Compatibility/Fragility, Versioning
• Data Management
• Federated Computing
• Industrial Design
• Ecosystem
Requirements and Design Goals
• Provide a true 32-bit, preemptive, reentrant, virtual memory operating system
• Run and scale well on symmetric multiprocessing systems
• Be a great distributed computing platform (Client & Server)
• Run most existing 16-bit MS-DOS and Microsoft Windows 3.1 applications
• Meet government requirements for POSIX 1003.1 compliance
• Meet government and industry requirements for operating system security
• Be easily adaptable to the global market by supporting Unicode
Requirements and Design Goals
Extensibility
- Code must be able to grow and change as market requirements change.
Portability
- The system must be able to run on multiple hardware architectures and must
  be able to move with relative ease to new ones as market demands dictate.
Reliability and Robustness
- Protection against internal malfunction and external tampering.
- Applications should not be able to harm the OS or other running applications.
Compatibility
- User interface and APIs should be compatible with older versions of Windows
  as well as older operating systems such as MS-DOS.
- It should also interoperate well with UNIX, OS/2, and NetWare.
Performance
- Within the constraints of the other design goals, the system should be as fast
  and responsive as possible on each hardware platform.
History of NT
• Team forms November 1988
• Developers from DEC and Microsoft
• Build from the ground up
  - Advanced Commercial Operating System
  - Designed for desktops and servers
  - Secure, scalable SMP design
  - All new code
• Initial effort targeted at the Intel i860, code-named N10, hence the name NT, which doubled as N-Ten and New Technology
Overview of Windows Architecture
• Heritage is RSX-11, VMS - not UNIX
• Kernel-based, microkernel-like
  - OS personalities in subsystems (e.g., for POSIX, OS/2, Win32)
  - Kernel focused on memory, processes, threads, IPC, I/O
  - Kernel implementation organized around the object manager
  - Win32 and other subsystems built on native NT APIs
  - System functionality heavily based on client/server computing
• Primary supported programming interfaces: Win32 (and .NET)
• NT APIs
  - Generally not documented (except for DDK)
  - NT APIs are rich (many parameters)
• NTOS (kernel)
  - Implements the NT API
  - Drivers, file systems, protocol stacks not in NTOS
  - Dynamic loading of drivers (.sys DLLs) is the extension model
Windows Kernel Evolution
• Basic kernel architecture has remained stable while the system has evolved
  - Windows 2000: major changes in I/O subsystem (plug & play,
    power management, WDM), but rest similar to NT4
  - Windows XP & Server 2003: modest upgrades as compared
    to the changes from NT4 to Windows 2000
• Internal version numbers confirm this:
  - Windows 2000 was 5.0
  - Windows XP 32-bit and IA64 editions are 5.1
    • So is XP Embedded
  - Windows Server 2003 is 5.2
  - Windows XP 64-bit Edition for x64 is also 5.2
    • Based on the Windows Server 2003 SP1 kernel
  - Windows Vista is 6.0
NT Timeline: first 17 years
2/1989    Coding Begins
7/1993    NT 3.1
9/1994    NT 3.5
5/1995    NT 3.51
7/1996    NT 4.0
12/1999   NT 5.0 Windows 2000
8/2001    NT 5.1 Windows XP
3/2003    NT 5.2 Server 2003
8/2004    NT 5.1 Windows XP SP2
4/2005    NT 5.2 Windows XP 64 Bit Edition (& WS03SP1)
2006      NT 6.0 Windows Vista (client)
Slides are based on materials of the Windows Operating System Internals Curriculum
Development Kit, developed by David A. Solomon and Mark E. Russinovich with Andreas Polze.
http://www.microsoft.com/resources/sharedsource/Licensing/WindowsAcademic.mspx
Simplified OS Architecture
[Figure: simplified architecture.
 User mode: system support processes, service processes, user applications,
 environment subsystems, subsystem DLLs.
 Kernel mode: Executive, Kernel, device drivers, windowing and graphics,
 Hardware Abstraction Layer (HAL).]
Windows Kernel Attributes
Reentrant
- Kernel functions can be invoked by multiple threads simultaneously
- No serialization of user threads when performing system calls
Asynchronous
- I/O system works fully asynchronously
- Asynchronous I/O improves application's throughput
- Synchronous wrapper functions provide ease of programming
Multiprocessor support
Microkernel OS?
Is Windows a microkernel-based OS?
- No - not using the academic definition (OS components and drivers run in
  their own private address spaces, layered on a primitive microkernel)
- All kernel components live in a common shared address space
Why not pure microkernel?
- Performance - separate address spaces would mean context switching to call
  basic OS services
- Most other commercial OSs (Unix, VMS, etc.) have the same design
Microkernel OS?
But it does have some attributes of a
microkernel OS
- OS personalities running in user space as separate processes
- Kernel-mode components don't reach into one another's data structures
  • Use formal interfaces to pass parameters and access and/or modify data structures
- Therefore the term "modified microkernel"
Demo
User vs. Kernel Mode
Windows Architecture
[Figure: Windows architecture.
 User mode: applications, subsystem servers, DLLs, system services, critical
 services, Login/GINA, Kernel32, User32 / GDI, ntdll / run-time library.
 Boundary: trap interface / LPC.
 Kernel mode: security reference monitor, I/O Manager (net devices, net
 protocols, net interfaces, file filters, file systems, volume managers,
 device stacks), virtual memory, processes & threads, Win32 GUI, file system
 run-time, scheduler, cache manager, synchronization, Object Manager /
 Configuration Management (registry), kernel run-time / Hardware Adaptation Layer.]
HAL - Hardware Abstraction Layer
• Responsible for a small part of "hardware abstraction"
  - Components on the motherboard not handled by drivers
    • System timers, cache coherency, and flushing
    • SMP support, hardware interrupt priorities
• Subroutine library for the kernel & device drivers
  - Isolates Kernel and Executive from platform-specific details
  - Presents uniform model of I/O hardware interface to drivers
• Reduced role as of Windows 2000
  - Bus support moved to bus drivers
  - Majority of HALs are vendor-independent
• HAL also implements some functions that appear to be in the Executive and Kernel
• Selected at installation time
  - See \windows\repair\setup.log to find out which one
  - Can select manually at boot time with /HAL= in boot.ini
• HAL kit
  - Special kit only for vendors that must write custom HALs (requires
    approval from Microsoft)
Kernel
• Lower layers of the operating system
  - Implements processor-dependent functions (x86 vs. Itanium, etc.)
  - Also implements many processor-independent functions that are closely
    associated with processor-dependent functions
• Main services
  - Thread waiting, scheduling & context switching
  - Exception and interrupt dispatching
  - Operating system synchronization primitives (different for MP vs. UP)
  - A few of these are exposed to user mode
• Not a classic "microkernel"
  - Shares address space with rest of kernel-mode components
Executive
• Upper layer of the operating system
• Provides "generic operating system" functions ("services")
  - Process Manager
  - Object Manager
  - Cache Manager
  - LPC (local procedure call) Facility
  - Configuration Manager
  - Memory Manager
  - Security Reference Monitor
  - I/O Manager
  - Power Manager
  - Plug-and-Play Manager
• Almost completely portable C code
• Runs in kernel ("privileged", ring 0) mode
• Most interfaces to executive services not documented
NTDLL.DLL
Support library for use of subsystem DLLs:
• System service dispatch stubs to NT executive system services
  - NtCreateFile, NtSetEvent
  - More than 200
  - Most of them are accessible through Win32
  - Stubs call service-dispatcher/kernel-mode service in NTOSKRNL.EXE
• Support functions used by subsystems
  - Image loader (Ldr...)
  - Heap manager
  - Win32 subsystem communication functions (Csr...)
  - Runtime library functions (Rtl...)
  - User-mode asynchronous procedure call (APC) dispatcher, exception dispatcher
Device Drivers
• Loadable kernel modules
• Don't manipulate hardware directly, but call parts of the HAL
  - Written in C/C++ typically
  - Source code portable across CPU architectures
Types:
• Hardware device drivers: implement device/network I/O
• File system drivers: file I/O <-> device I/O
• Filter drivers: disk mirroring, encryption
• Network redirectors and servers: send/receive remote I/O requests
I/O Objects
• Driver Object: represents loaded driver
  - Creates device objects for the devices it manages
• Device Object: represents an instance of a device
  - Can have names for direct access from applications and other drivers
• File Object: represents open instance of a device
  - Created by I/O Manager
  - Process handle table entries for open files/devices point at file objects
I/O Request Flow
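[Figure: in user mode, a process calls DeviceIoControl. In kernel mode,
 NtDeviceIoControlFile resolves the handle through the process handle table to
 a file object, which leads to the device object and its driver object. An IRP
 is passed through the driver's dispatch table to
 DispatchDeviceControl( DeviceObject, Irp ) in the driver code.]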
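The request flow above can be sketched in portable C. This is only an illustrative model, not the real kernel interface: every `sk_` name, the two request codes, and the structure layouts are hypothetical stand-ins for the file object, device object, driver object, dispatch table, and IRP named on the slide.

```c
#include <stddef.h>

/* Hypothetical request types; the real I/O manager defines many more. */
enum { SK_MJ_CREATE = 0, SK_MJ_DEVICE_CONTROL = 1, SK_MJ_COUNT = 2 };

struct sk_device;                        /* forward declaration */

/* Simplified I/O request packet. */
typedef struct sk_irp {
    int major_function;                  /* selects the dispatch-table slot */
    unsigned long control_code;          /* payload of a device-control request */
    int status;                          /* filled in by the driver */
} sk_irp;

typedef int (*sk_dispatch_fn)(struct sk_device *dev, sk_irp *irp);

typedef struct sk_driver {
    sk_dispatch_fn dispatch[SK_MJ_COUNT];    /* the "dispatch table" */
} sk_driver;

typedef struct sk_device {
    sk_driver *driver;                   /* device object refers to its driver */
} sk_device;

typedef struct sk_file {
    sk_device *device;                   /* file object: open instance of a device */
} sk_file;

/* Example driver routine: reports the control code back as the status. */
int sk_dispatch_device_control(struct sk_device *dev, sk_irp *irp)
{
    (void)dev;
    irp->status = (int)irp->control_code;
    return irp->status;
}

/* I/O-manager side: follow file -> device -> driver and call the handler.
   Returns -1 when the driver has no routine for this request type. */
int sk_call_driver(sk_file *file, sk_irp *irp)
{
    sk_dispatch_fn fn;

    if (irp->major_function < 0 || irp->major_function >= SK_MJ_COUNT)
        return -1;
    fn = file->device->driver->dispatch[irp->major_function];
    if (fn == NULL)
        return -1;
    return fn(file->device, irp);
}
```

A caller would fill a `sk_driver` with `sk_dispatch_device_control` in the `SK_MJ_DEVICE_CONTROL` slot, link a `sk_device` and `sk_file` to it, and pass an `sk_irp` to `sk_call_driver`. The indirection through a function-pointer table is what lets the I/O manager stay independent of any particular driver.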
Driver Layering and Filtering
To divide functionality across drivers, provide added value, etc.
- Only the lowest layer talks to the I/O hardware
"Filter drivers" attach their devices to other devices
- They see all requests first and can manipulate them
- Example filter drivers:
  • File system filter driver
  • Bus filter driver
[Figure: an IRP travels from a user-mode process through system services and
 the I/O Manager in kernel mode to the file system driver, the volume manager
 driver, and the disk driver]
Vista I/O Enhancements
• I/O priorities: device drivers that use the I/O Manager
  for device queues will prioritize IRPs
  - Based on the priority of the issuing thread or the explicitly set I/O priority
  - Stored in IRP flags
  - 6 priority levels (0-5)
• Cancellable synchronous I/O: synchronous I/O operations,
  including "open", can be cancelled
  - Explorer hangs on network resources can be aborted
• I/O completion no longer requires Asynchronous Procedure Calls
  - Significant performance improvement on > 4-way systems
Security Reference Monitor
• Implements common object access model shared by all kernel subsystems
  - Exposes model for use by applications
• Performs object access checks (authorization), manipulates privileges,
  and generates audit messages
  - Core function: SeAccessCheck (user-mode version: AccessCheck)
Windows API Memory Management
Architecture
Windows Program
C library: malloc, free
Heap API:
• HeapCreate, HeapDestroy,
• HeapAlloc, HeapFree
Memory-Mapped Files API:
• CreateFileMapping,
• MapViewOfFile
Virtual Memory API
Windows Kernel with Virtual Memory Manager
Physical Memory; Disk & File System
Windows Memory Management
Fundamentals
• Classical virtual memory management
  - Flat virtual address space per process
  - Private process address space
  - Global system address space
  - Per session address space
• Object based
  - Section object and object-based security (ACLs...)
• Demand paged virtual memory
  - Pages are read in on demand & written out when necessary
    (to make room for other memory needs)
• Provides flat virtual address space
  - 32-bit: 4 GB, 64-bit: 16 exabytes (theoretical)
Windows Memory Management
Fundamentals
• Lazy evaluation
  - Sharing - usage of prototype PTEs (page table entries)
  - Extensive usage of copy-on-write
  - ...whenever possible
• Shared memory with copy on write
• Mapped files (fundamental primitive)
  - Provides basic support for file system cache manager
Memory Manager Components
• System services for allocating, deallocating, and managing virtual memory
• An access-fault trap handler for resolving hardware-detected
  memory management exceptions and making virtual pages
  resident on behalf of a process
• Six system threads
  - Working set manager (priority 16): drives overall memory
    management policies, such as working set trimming, aging, and
    modified page writing
  - Process/stack swapper (priority 23): performs both process
    and kernel thread stack inswapping and outswapping
  - Modified page writer (priority 17): writes dirty pages on the
    modified list back to the appropriate paging files
  - Mapped page writer (priority 17): writes dirty pages from
    mapped files to disk
  - Dereference segment thread (priority 18): is responsible for
    cache and page file growth and shrinkage
  - Zero page thread (priority 0): zeros out pages on the free list
Protecting Memory
Attribute                 Description
PAGE_NOACCESS             Read/write/execute causes access violation
PAGE_READONLY             Write/execute causes access violation; read permitted
PAGE_READWRITE            Read/write accesses permitted
PAGE_EXECUTE              Any read/write causes access violation; execution of code is
                          permitted (relies on special processor support)
PAGE_EXECUTE_READ         Read/execute access permitted (relies on special processor support)
PAGE_EXECUTE_READWRITE    All accesses permitted (relies on special processor support)
PAGE_WRITECOPY            Write access causes the system to give process a private copy of
                          this page; attempts to execute code cause access violation
PAGE_EXECUTE_WRITECOPY    Write access causes creation of private copy of page
PAGE_GUARD                Any read/write attempt raises EXCEPTION_GUARD_PAGE and
                          turns off guard page status
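The protection table can be read as a decision function. The sketch below models it in portable C rather than calling the Windows API: the numeric values match the documented memory-protection constants, but the `SK_` names, the `sk_access` type, and the function are hypothetical. It also assumes hardware that enforces no-execute, so execute is refused on PAGE_READWRITE and PAGE_WRITECOPY pages. PAGE_GUARD is a modifier and is left out.

```c
#include <stdbool.h>

/* Values mirror the documented PAGE_* constants; names are illustrative. */
enum {
    SK_PAGE_NOACCESS          = 0x01,
    SK_PAGE_READONLY          = 0x02,
    SK_PAGE_READWRITE         = 0x04,
    SK_PAGE_WRITECOPY         = 0x08,
    SK_PAGE_EXECUTE           = 0x10,
    SK_PAGE_EXECUTE_READ      = 0x20,
    SK_PAGE_EXECUTE_READWRITE = 0x40,
    SK_PAGE_EXECUTE_WRITECOPY = 0x80
};

typedef enum { SK_READ, SK_WRITE, SK_EXECUTE } sk_access;

/* Returns true when the access proceeds, false when it would raise an
   access violation according to the table above. */
bool sk_access_allowed(unsigned protection, sk_access access)
{
    switch (protection) {
    case SK_PAGE_READONLY:          return access == SK_READ;
    case SK_PAGE_READWRITE:         return access != SK_EXECUTE;
    case SK_PAGE_WRITECOPY:         return access != SK_EXECUTE; /* a write yields a private copy */
    case SK_PAGE_EXECUTE:           return access == SK_EXECUTE;
    case SK_PAGE_EXECUTE_READ:      return access != SK_WRITE;
    case SK_PAGE_EXECUTE_READWRITE: return true;
    case SK_PAGE_EXECUTE_WRITECOPY: return true;                 /* a write yields a private copy */
    case SK_PAGE_NOACCESS:
    default:                        return false;
    }
}
```

For example, `sk_access_allowed(SK_PAGE_READONLY, SK_WRITE)` is false, which corresponds to the access violation listed for PAGE_READONLY.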
Physical Memory Limits (in GB)
                         x86    x64 32-bit   x64 64-bit   I64 64-bit
XP Home                  4      4            n/a          n/a
XP Professional          4      4            16           n/a
Server 2003 Web Edition  2      2            n/a          n/a
Server 2003 Standard     4      4            16           n/a
Server 2003 Enterprise   32     32           64           64
Server 2003 Datacenter   64     128          1024         1024
Virtual Memory
Concepts
• Application always references "virtual addresses"
• Hardware and software translate, or map, virtual addresses to physical
• Not all of an application's virtual address space is in
  physical memory at one time...
  - ...But hardware and software fool the application into thinking that it is
  - The rest is kept on disk, and is brought into physical memory
    automatically as needed
Virtual address descriptors (VADs)
• Memory manager uses demand paging algorithm
• Lazy evaluation is also used to construct page tables
  - Reserved vs. committed memory
  - Even for committed memory, page tables are constructed on demand
• Memory manager maintains VAD structures to keep
  track of reserved virtual addresses
  - Self-balancing binary tree
• VADs store:
  - Range of addresses being reserved
  - Whether range will be shared or private
  - Whether child process can inherit contents of the range
  - Page protection applied to pages within the address range
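A lookup in such a tree of address ranges can be sketched as follows. This is a minimal illustration under stated assumptions: the `sk_vad` layout and both functions are hypothetical, ranges are assumed not to overlap, and a plain (unbalanced) binary search tree stands in for the self-balancing tree the memory manager uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor holding the fields listed above. */
typedef struct sk_vad {
    uint32_t start;             /* first reserved address (inclusive) */
    uint32_t end;               /* last reserved address (inclusive) */
    int is_private;             /* private vs. shared range */
    unsigned protection;        /* page protection for the range */
    struct sk_vad *left;
    struct sk_vad *right;
} sk_vad;

/* Inserts node keyed by start address and returns the (possibly new) root.
   No rebalancing is done in this sketch. */
sk_vad *sk_vad_insert(sk_vad *root, sk_vad *node)
{
    if (root == NULL) {
        node->left = NULL;
        node->right = NULL;
        return node;
    }
    if (node->start < root->start)
        root->left = sk_vad_insert(root->left, node);
    else
        root->right = sk_vad_insert(root->right, node);
    return root;
}

/* Returns the descriptor covering addr, or NULL if addr is not reserved. */
const sk_vad *sk_vad_find(const sk_vad *root, uint32_t addr)
{
    while (root != NULL) {
        if (addr < root->start)
            root = root->left;
        else if (addr > root->end)
            root = root->right;
        else
            return root;
    }
    return NULL;
}
```

A NULL result corresponds to the case where a faulting address was never reserved, so the fault cannot be resolved by building page tables.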
Mapping Virtual to
Physical Pages
[Figure: virtual pages of an address space with boundaries 00000000, 7FFFFFFF,
 80000000, C0000000, C1000000, FFFFFFFF, mapped through page table entries to
 physical memory]
Successive page table entries describe successive virtual pages,
pointing to "scattered" (i.e., not physically contiguous) physical pages
Sample - PDE Definition
typedef struct _HARDWARE_PTE_X86 {
    ULONG Valid : 1;
    ULONG Write : 1;
    ULONG Owner : 1;
    ULONG WriteThrough : 1;
    ULONG CacheDisable : 1;
    ULONG Accessed : 1;
    ULONG Dirty : 1;
    ULONG LargePage : 1;
    ULONG Global : 1;
    ULONG CopyOnWrite : 1;      // software field
    ULONG Prototype : 1;        // software field
    ULONG reserved : 1;         // software field
    ULONG PageFrameNumber : 20;
} HARDWARE_PTE_X86, *PHARDWARE_PTE_X86;
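Because the placement of C bit-fields is compiler-dependent, a portable way to experiment with this layout is to use explicit masks on a 32-bit value. The sketch below assumes the bit order of the structure above (Valid in bit 0, PageFrameNumber in bits 31-12); the `SK_` macros and helper functions are hypothetical names, not part of any Windows header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag positions follow the field order of the structure above. */
#define SK_PTE_VALID     (1u << 0)
#define SK_PTE_WRITE     (1u << 1)
#define SK_PTE_OWNER     (1u << 2)
#define SK_PTE_ACCESSED  (1u << 5)
#define SK_PTE_DIRTY     (1u << 6)
#define SK_PTE_LARGE     (1u << 7)

/* Builds an entry from a 20-bit page frame number and 12 bits of flags. */
uint32_t sk_pte_make(uint32_t pfn, uint32_t flags)
{
    return (pfn << 12) | (flags & 0xFFFu);
}

/* True when the hardware would use this entry for translation. */
bool sk_pte_is_valid(uint32_t pte)
{
    return (pte & SK_PTE_VALID) != 0;
}

/* Extracts the page frame number (upper 20 bits). */
uint32_t sk_pte_pfn(uint32_t pte)
{
    return pte >> 12;
}
```

For instance, `sk_pte_make(0x12345, SK_PTE_VALID | SK_PTE_WRITE | SK_PTE_DIRTY)` yields `0x12345043`.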
Address Translation
Mapping virtual addresses to physical memory
• Mapping via page table entries
• Indirect relationship between virtual pages and physical memory
[Figure: user and system virtual pages mapped through page table entries to
 physical memory]
x86 virtual address layout:
  bits 31-22: page directory index (10 bits)
  bits 21-12: page table index (10 bits)
  bits 11-0:  byte index (12 bits)
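The 10/10/12 split of a 32-bit x86 virtual address is plain shifting and masking. A small sketch, with a hypothetical struct and function name:

```c
#include <stdint.h>

/* The three fields of a 32-bit x86 virtual address. */
typedef struct {
    uint32_t directory_index;   /* bits 31-22, selects a page directory entry */
    uint32_t table_index;       /* bits 21-12, selects a page table entry */
    uint32_t byte_index;        /* bits 11-0, offset within the 4 KB page */
} sk_x86_va;

sk_x86_va sk_split_x86(uint32_t va)
{
    sk_x86_va parts;
    parts.directory_index = va >> 22;
    parts.table_index = (va >> 12) & 0x3FFu;
    parts.byte_index = va & 0xFFFu;
    return parts;
}
```

Splitting `0x80401234` gives directory index `0x201` (513), table index `1`, and byte index `0x234`.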
Shared and Private Pages
[Figure: address spaces of Process A and Process B (boundaries 00000000,
 7FFFFFFF, 80000000, C0000000, C1000000, FFFFFFFF) mapped onto physical memory]
• For shared pages, multiple processes' PTEs point to the same physical pages
32-bit x86 Address Space
Default: 2 GB user process space, 2 GB system space
3 GB user space option: 3 GB user process space, 1 GB system space
Increased Limits in
64-bit Windows
                      Itanium    x64        x86
User Address Space    7152 GB    8192 GB    2-3 GB
Page file limit       16 TB      16 TB      4095 MB (PAE: 16 TB)
Max page file space   256 TB     256 TB     ~64 GB
System PTE Space      128 GB     128 GB     1.2 GB
System Cache          1 TB       1 TB       960 MB
Paged pool            128 GB     128 GB     470-650 MB
Non-paged pool        128 GB     128 GB     256 MB
32-bit x86
Virtual Address Space
[Figure: address map with boundaries 00000000, 7FFFFFFF, 80000000, C0000000, FFFFFFFF]
User space (00000000-7FFFFFFF): unique per process, accessible in user or kernel mode
  Code: EXE/DLLs
  Data: EXE/DLL static storage, per-thread user mode stacks, process heaps, etc.
System space (80000000-FFFFFFFF): accessible only in kernel mode, partly per process and partly system wide
  Code: NTOSKRNL, HAL, drivers
  Data: kernel stacks
  Process page tables, hyperspace
  File system cache
  Non-paged pool, paged pool
2 GB per-process
- Address space of one process is not directly reachable from other processes
2 GB system-wide
- The operating system is loaded here, and appears in every process's address space
- The operating system is not a process (though there are processes that do
  things for the OS, more or less in "background")
3 GB user space option & Address Windowing Extensions (AWE) described later
Address Translation 32-bit Windows
Hardware Support Intel x86
• Intel x86 provides two levels of address translation
  - Segmentation (mandatory, since 8086)
  - Paging (optional, since 80386)
• Segmentation: first level of address translation
  - Intel: logical address (selector:offset) to linear address (32 bits)
  - Windows virtual address is Intel linear address (32 bits)
• Paging: second level of address translation
  - Intel: linear address (32 bits) to physical address
  - Windows: virtual address (32 bits) to physical address
  - Physical address: 32 bits (4 GB) all Windows versions, 36 bits (64 GB) PAE
  - Page size:
    • 4 KB since 80386 (all Windows versions)
    • 4 MB since Pentium Pro (supported in NT 4, Windows 2000/XP/2003)
Intel x86
Segmentation
[Figure: an Intel logical address consists of a 16-bit segment selector (index
 in bits 15-3, table indicator TI=0 in bit 2, RPL in bits 1-0) and a 32-bit
 offset. The selector picks a Global Descriptor Table (GDT) entry with
 Base Address = 0 and Limit = 0xfffff. Base + offset gives the Intel linear
 address, which is the Windows virtual address (0 to 0xffffffff).]
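The selector fields and the base-plus-offset step of segmentation can be written out as below. The struct and function names are hypothetical; the bit positions are the ones shown for the selector (index in bits 15-3, TI in bit 2, RPL in bits 1-0). With a flat segment base of 0, the linear address simply equals the offset.

```c
#include <stdint.h>

/* Decoded fields of a 16-bit x86 segment selector. */
typedef struct {
    uint16_t index;     /* bits 15-3: descriptor table index */
    unsigned ti;        /* bit 2: table indicator (0 = GDT) */
    unsigned rpl;       /* bits 1-0: requested privilege level */
} sk_selector;

sk_selector sk_decode_selector(uint16_t selector)
{
    sk_selector s;
    s.index = (uint16_t)(selector >> 3);
    s.ti = (selector >> 2) & 1u;
    s.rpl = selector & 3u;
    return s;
}

/* First translation level: linear address = segment base + offset. */
uint32_t sk_linear_address(uint32_t segment_base, uint32_t offset)
{
    return segment_base + offset;
}
```

As an example, selector value `0x1B` decodes to index 3, TI 0, RPL 3, and `sk_linear_address(0, 0x00401000)` returns `0x00401000`.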
Intel x86 Paging
Address Translation
[Figure: CR3 holds the physical address of the page directory (1024 x 4-byte
 entries, one per process). Bits 31-22 of the Intel linear address (the Windows
 virtual address) select a PDE. A 4 KB PDE points to a page table (1024
 entries), where bits 21-12 select the PTE of a 4 KB page frame and bits 11-0
 give the byte offset of the operand. A 4 MB PDE maps a 4 MB page frame
 directly, with a 22-bit offset. Physical memory is tracked in the Windows
 Page Frame Number (PFN) database.]
Page tables are created on demand
Interpreting a Virtual Address
x86 32-bit
  bits 31-22: page table selector (10 bits)
  bits 21-12: page table entry selector (10 bits)
  bits 11-0:  byte within page (12 bits)
x64 64-bit (48-bit in today's processors)
  bits 47-39: page map level 4 selector (9 bits)
  bits 38-30: page directory pointer selector (9 bits)
  bits 29-21: page table selector (9 bits)
  bits 20-12: page table entry selector (9 bits)
  bits 11-0:  byte within page (12 bits)
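The x64 case uses four 9-bit selectors plus a 12-bit byte offset, taken from the low 48 bits of the address. A small sketch with hypothetical names:

```c
#include <stdint.h>

/* Fields of a 48-bit x64 virtual address (4-level paging, 4 KB pages). */
typedef struct {
    unsigned pml4_index;                /* bits 47-39 */
    unsigned directory_pointer_index;   /* bits 38-30 */
    unsigned table_index;               /* bits 29-21: page table selector */
    unsigned entry_index;               /* bits 20-12: page table entry selector */
    unsigned byte_offset;               /* bits 11-0 */
} sk_x64_va;

sk_x64_va sk_split_x64(uint64_t va)
{
    sk_x64_va parts;
    parts.pml4_index = (unsigned)((va >> 39) & 0x1FFu);
    parts.directory_pointer_index = (unsigned)((va >> 30) & 0x1FFu);
    parts.table_index = (unsigned)((va >> 21) & 0x1FFu);
    parts.entry_index = (unsigned)((va >> 12) & 0x1FFu);
    parts.byte_offset = (unsigned)(va & 0xFFFu);
    return parts;
}
```

Bits 63-48 are ignored here; on real processors they must repeat bit 47, a detail this sketch does not check.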
Windows Virtual Memory Use
Performance Counters
Performance counter | System variable | Description
Memory: Committed Bytes | MmTotalCommittedPages | Amount of committed private address space that has a backing store
Memory: Commit Limit | MmTotalCommitLimit | Amount of memory (in bytes) that can be committed without increasing size of paging file
Memory: % Committed Bytes In Use | MmTotalCommittedPages / MmTotalCommitLimit | Ratio of committed bytes to commit limit
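The third counter is a ratio of the other two. As a small worked example (the function name is hypothetical, and returning 0 for a zero limit is a choice made for this sketch):

```c
#include <stdint.h>

/* Percentage of the commit limit currently in use.
   Both arguments are page counts, as in the system variables above. */
double sk_commit_percent(uint64_t committed_pages, uint64_t commit_limit_pages)
{
    if (commit_limit_pages == 0)
        return 0.0;     /* avoid division by zero in this sketch */
    return 100.0 * (double)committed_pages / (double)commit_limit_pages;
}
```

With 250 committed pages and a limit of 1000 pages, the result is 25 percent.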
x86 Virtual Address Translation
[Figure: CR3 holds the physical address of the page directory (one per
 process, 1024 entries). The page table selector (starting at bit 31) indexes
 the page directory to find a page table (up to 512 per process, plus up to
 512 system-wide). The page table entry selector indexes that page table to
 find the physical page number ("page frame number" or "PFN"). The byte within
 page selects the byte inside one of the physical pages (up to 2^20, numbered
 from PFN 0).]
Virtual Address Translation
The hardware converts each valid virtual address to a physical address
[Figure: a virtual address is split into virtual page number and byte within
 page. Address translation (hardware) maps the virtual page number through the
 page directory and page tables, using the Translation Lookaside Buffer (a
 cache of recently used page table entries), to a physical page number. Physical
 page number plus byte within page form the physical address. If the page is
 not valid: page fault (exception, handled by software).]
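The two-level walk, including the "not valid" exit, can be simulated in a few lines. This is a sketch under simplifying assumptions: "physical memory" is modeled as an array of 1024-entry tables indexed by page frame number, only the valid bit and the frame number of each entry are used, 4 MB pages are ignored, and no bounds checks are made on frame numbers. All `sk_`/`SK_` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_ENTRIES 1024u
#define SK_VALID   1u

/* One 4 KB frame viewed as a table of 1024 32-bit entries. */
typedef struct {
    uint32_t entry[SK_ENTRIES];
} sk_table;

/* Translates va using the page directory in frame cr3_pfn.
   Returns true and stores the physical address on success; returns false
   where real hardware would raise a page fault for software to handle. */
bool sk_translate(const sk_table *frames, uint32_t cr3_pfn,
                  uint32_t va, uint32_t *pa)
{
    uint32_t pde;
    uint32_t pte;

    pde = frames[cr3_pfn].entry[va >> 22];                  /* first lookup */
    if (!(pde & SK_VALID))
        return false;

    pte = frames[pde >> 12].entry[(va >> 12) & 0x3FFu];     /* second lookup */
    if (!(pte & SK_VALID))
        return false;

    *pa = (pte & 0xFFFFF000u) | (va & 0xFFFu);              /* frame + byte offset */
    return true;
}
```

A caller sets up one frame as page directory and one as page table, marks the chosen entries valid, and calls `sk_translate`. Any address whose directory or table entry is left zero takes the page-fault path.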
System and
process-private page tables
[Figure: Process 1 and Process 2 each have their own page directory. PDE 0 to
 PDE 511 point to that process's private page tables (PTE 0 ...). PDE 512 to
 PDE n of both directories point to the shared system page tables
 (Sys PTE 0 to Sys PTE n).]
• On process creation, system space page directory entries point to
  existing system page tables
• Not all processes have the same view of system space (after allocation
  of new page tables)
Page Table Entries
• Page tables are arrays of Page Table Entries (PTEs)
• Valid PTEs have two fields:
  - Page Frame Number (PFN)
  - Flags describing state and protection of the page
Layout of a valid x86 PTE:
  bits 31-12: Page frame number
  bit 11 (U):  Reserved (writable on MP systems)
  bit 10 (P):  Reserved
  bit 9 (Cw):  Reserved
  bit 8 (Gi):  Global
  bit 7 (L):   Reserved (large page if PDE)
  bit 6 (D):   Dirty
  bit 5 (A):   Accessed
  bit 4 (Cd):  Cache disabled
  bit 3 (Wt):  Write through
  bit 2 (O):   Owner
  bit 1 (W):   Write (writable on MP systems)
  bit 0 (V):   Valid
Reserved bits are used only when PTE is not valid
PTE Status and Protection Bits
(Intel x86 only)
Name of Bit      Meaning on x86
Accessed         Page has been read
Cache disabled   Disables caching for that page
Dirty            Page has been written to
Global           Translation applies to all processes
                 (a translation buffer flush won't affect this PTE)
Large page       Indicates that PDE maps a 4 MB page (used to map kernel)
Owner            Indicates whether user-mode code can access the page or
                 whether the page is limited to kernel mode access
Valid            Indicates whether translation maps to page in physical memory
Write through    Disables caching of writes; immediate flush to disk
Write            Uniprocessor: indicates whether page is read/write or read-only;
                 Multiprocessor: indicates whether page is writeable (write bit
                 kept in a reserved bit)
Translation Look-Aside Buffer (TLB)
• Address translation requires two lookups:
  - Find right table in page directory
  - Find right entry in page table
• Most CPUs cache address translations
  - Array of associative memory: translation look-aside buffer (TLB)
  - TLB: virtual-to-physical page mappings of most recently used pages
[Figure: virtual page #17 is presented to the TLB; simultaneous read and
 compare against all entries:]
  Virtual page #: 5    Page frame 290
  Virtual page #: 64   Invalid
  Virtual page #: 17   Page frame 1004
  Virtual page #: 7    Invalid
  Virtual page #: 65   Page frame 801
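The lookup in the figure can be modeled as a search over a small array of entries. The hardware compares all entries at once; this sketch uses a sequential loop instead, and the entry layout and function name are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One cached virtual-to-physical page mapping. */
typedef struct {
    uint32_t vpn;       /* virtual page number (the tag) */
    uint32_t pfn;       /* page frame number */
    bool valid;         /* false for invalidated entries */
} sk_tlb_entry;

/* Returns true on a TLB hit and stores the page frame number.
   On a miss the caller would fall back to the two page-table lookups. */
bool sk_tlb_lookup(const sk_tlb_entry *tlb, size_t count,
                   uint32_t vpn, uint32_t *pfn)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *pfn = tlb[i].pfn;
            return true;
        }
    }
    return false;
}
```

Filled with the five entries from the figure, a lookup of virtual page 17 hits and returns page frame 1004, while virtual page 64 misses because its entry is invalid.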
Page Fault Handling
• Reference to invalid page is called a page fault
• Kernel trap handler dispatches:
  - Memory manager fault handler (MmAccessFault) called
  - Runs in context of thread that incurred the fault
  - Attempts to resolve the fault or raises exception
• Page faults can be caused by a variety of conditions
• Four basic kinds of invalid Page Table Entries (PTEs)
In-Paging I/O due to Access Faults
• Accessing a page that is not resident in memory but
  on disk in page file/mapped file
  - Allocate memory and read page from disk into working set
• Occurs when read operation must be issued to a file
  to satisfy page fault
  - Page tables are pageable -> additional page faults possible
• In-page I/O is synchronous
  - Thread waits until I/O completes
  - Not interruptible by asynchronous procedure calls
In-Paging I/O due to Access Faults
• During in-page I/O: faulting thread does not own critical
  memory management synchronization objects
Other threads in process may issue VM functions, but:
  - Another thread could have faulted same page: collided page fault
  - Page could have been deleted (remapped) from virtual address space
  - Protection on page may have changed
  - Fault could have been for prototype PTE and page that maps
    prototype PTE could have been out of working set
Other reasons for access faults
• Accessing page that is on standby or modified list
  - Transition the page to process or system working set
• Accessing page that has no committed storage
  - Access violation
• Accessing kernel page from user-mode
  - Access violation
• Writing to a read-only page
  - Access violation
Reasons for access faults (contd.)
• Writing to a guard page
  - Guard page violation (if a reference to a user-mode stack,
    perform automatic stack expansion)
• Writing to a copy-on-write page
  - Make process-private copy of page and replace original in
    process or system working set
• Referencing a page in system space that is valid but not
  in the process page directory
  (if paged pool expanded after process directory was created)
  - Copy page directory entry from master system page directory
    structure and dismiss exception
• On a multiprocessor system: writing to valid page that
  has not yet been written to
  - Set dirty bit in PTE
Invalid PTEs and their structure
Page file: desired page resides in paging file
- In-page operation is initiated
PTE layout:
  bits 31-12: Page file offset
  bit 11:     Transition
  bit 10:     Prototype
  bits 9-5:   Protection
  bits 4-1:   Page file number
  bit 0:      Valid (0)
Demand Zero: pager looks at zero page list;
if list is empty, pager takes a page from the standby list and zeros it;
PTE format as shown above, but page file number and offset are zeros
Invalid PTEs and their structure (contd.)
Transition: the desired page is in memory on either the standby,
modified, or modified-no-write list
- Page is removed from the list and added to working set
PTE layout:
  bits 31-12: Page Frame Number
  bit 11:     Transition (1)
  bit 10:     Prototype
  bits 9-5:   Protection
  bit 4:      Cache disable
  bit 3:      Write through
  bit 2:      Owner
  bit 1:      Write
  bit 0:      Valid (0)
Unknown: the PTE is zero, or the page table does not yet exist
- Examine virtual address descriptors (VADs) to see whether this
  virtual address has been reserved
- Build page tables to represent newly committed space
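The kinds of invalid PTE described on these two slides can be told apart by inspecting a few bits. The sketch below is a simplified model, not the memory manager's actual code: it uses the bit positions from the layouts (valid in bit 0, page file number in bits 4-1, prototype in bit 10, transition in bit 11, offset or frame number in bits 31-12), assumes the prototype and transition bits are clear in page-file and demand-zero PTEs, and reports prototype PTEs as a separate kind whose format is not shown here. The enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    SK_KIND_VALID,          /* valid bit set: not an invalid PTE at all */
    SK_KIND_UNKNOWN,        /* PTE is zero: consult the VADs */
    SK_KIND_PROTOTYPE,      /* prototype bit set (format not covered here) */
    SK_KIND_TRANSITION,     /* page still in memory on a page list */
    SK_KIND_DEMAND_ZERO,    /* page file number and offset are both zero */
    SK_KIND_PAGE_FILE       /* page must be read from a paging file */
} sk_pte_kind;

sk_pte_kind sk_classify_pte(uint32_t pte)
{
    uint32_t file_number;
    uint32_t file_offset;

    if (pte & 1u)
        return SK_KIND_VALID;
    if (pte == 0u)
        return SK_KIND_UNKNOWN;
    if (pte & (1u << 10))
        return SK_KIND_PROTOTYPE;
    if (pte & (1u << 11))
        return SK_KIND_TRANSITION;

    file_number = (pte >> 1) & 0xFu;    /* bits 4-1 */
    file_offset = pte >> 12;            /* bits 31-12 */
    if (file_number == 0u && file_offset == 0u)
        return SK_KIND_DEMAND_ZERO;     /* only protection bits are set */
    return SK_KIND_PAGE_FILE;
}
```

A fault handler built on this would read the page from disk for `SK_KIND_PAGE_FILE`, hand out a zeroed page for `SK_KIND_DEMAND_ZERO`, move the page back into the working set for `SK_KIND_TRANSITION`, and consult the VADs for `SK_KIND_UNKNOWN`.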
Understanding the implications prevents problems
Windows - A Case Study
"Modern" Operating Systems
[Figure: timeline 1960 to 1990 placing Multics, Unix, VMS, Windows (NT), and Linux]
Design parameters
- scarce resources
- benign environment
- knowledgeable and trained users
Design parameters?
- malicious environment
- Safe Micro-Kernel (e.g. Singularity)
- untrained users
- Virtualization
- Convergence DB & OS
Works Cited
(iStockphoto) http://www.istockphoto.com
(Merriam-Webster) "Merriam-Webster OnLine Search".
http://www.merriam-webster.com/dictionary/operating%20system
(Probe) Probe, Dave. "Microsoft Academic Days Toronto".
(Solomon) Solomon, David A., Russinovich, Mark E., Polze, Andreas.
"Windows Operating System Internals Curriculum".
http://www.microsoft.com/resources/sharedsource/Licensing/WindowsAcademic.mspx
Additional Readings
(Russinovich 2005) Russinovich, Mark E., Solomon, David A. Microsoft Windows Internals.
Redmond, WA: Microsoft Press, 2005.
(Zachary) Zachary, G. Pascal. Show Stopper!: The Breakneck Race to Create Windows NT.
© 2007 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.