UNIT ONE: DEFINITIONS AND COMPUTER FUNDAMENTALS

Systems programming
Development of computer software that is part of a computer operating system or other control
program, especially as used in computer networks. Systems programming covers data and program
management, including operating systems, control programs, network software, and database
management systems.
Computer Data
To help understand computers it is best to first learn about computer data. Computer data is
information required by the computer to be able to operate. It is used to:

Run programs - This is the actual executable program data that the computer will execute to run
the program such as Microsoft Word.

Store program or system configuration information.

Store information that the computer user needs such as text files or other files that are
associated with the program the computer user is running. A common example of a program the
computer user is running is the Microsoft Office suite of products which include Microsoft Word,
Microsoft Excel, and others. These programs are also known as applications.
Data Structure
Computer data is in what is called binary format. This means that it is always a 0 or a 1. It only
has these two states and must be in one of them.
There are several fundamental data units which include:

Bit - A data unit which must be in one of the two binary states described above. It is the smallest
data unit that exists.

Byte - 8 bits of data which has a possible value from 0 to 255.

Word - Two bytes or 16 bits of data with a possible unsigned value from 0 to 65535 (see the sketch below).
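To make these ranges concrete, here is a minimal sketch in C (assuming a typical platform where a byte is 8 bits) that prints the largest values a byte and a 16-bit word can hold.

/* A minimal sketch of the fundamental data units, assuming an 8-bit byte. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t  byte_max = 0xFF;    /* 8 bits:  values 0 to 255   */
    uint16_t word_max = 0xFFFF;  /* 16 bits: values 0 to 65535 */

    printf("Largest value of a byte: %u\n", (unsigned)byte_max);
    printf("Largest value of a 16-bit word: %u\n", (unsigned)word_max);
    return 0;
}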
Computer Hardware
The term computer hardware refers to the various electronic components that are required for you
to use a computer along with the hardware components inside the computer case. As you know your
computer equipment is made of several common components. These include:

The main computer box.

A monitor - Looks like a television screen.

A keyboard.

A mouse.

Speakers.

An optional printer
The main computer box is the main component of the computer. It has computer hardware parts inside
that perform the following functions:

Temporary storage of information (known as data in more technical terms) - This function is
done by memory.

Permanent storage of information - This function is done by a hard disk, floppy disk, or CD ROM.

Manipulation or processing of data - Used to determine where data is stored and perform
calculations which support operations that the user is doing.

Interfacing to the outside components or to the outside world - This supports the ability for the
user to communicate with the computer and know how the computer is responding to
commands which are done primarily through the monitor, keyboard, and mouse along with their
interface components in the main computer box.

A power supply which provides the electrical power to the components in the computer box.
The Main Computer Architecture
The main computer box is made of several computer hardware components and subcomponents
which include:

The case - The outside component which provides protection for the parts inside and provides a
fan and power supply which are used to both cool the working parts inside and provide power to
them.

The motherboard - Holds the following computer hardware subcomponents:
o Memory - Used to provide temporary storage of information as discussed earlier.
o Microprocessor - Used to provide the processing of data function as discussed earlier.
o Video interface card, also called the video card - This card is an interface between the computer monitor and the motherboard and its subcomponents such as the microprocessor and memory. This card may be included as part of the motherboard or it may plug into a card slot on the motherboard.
o Sound card - An interface between the computer speakers and the motherboard and its subcomponents such as the microprocessor and memory. This card may be included as part of the motherboard or it may plug into a card slot on the motherboard.

One or more permanent storage devices some of which may be optional:
o Hard disk - Most computers today have a hard disk (sometimes called a hard drive), which is the component most commonly used to provide permanent storage of data. Hard disks are usually permanently installed in a computer.
o CD ROM drive or DVD drive - Used to provide permanent storage of data, but this type of drive is used to bring information into the computer more commonly than it is used to store information from the computer. Sometimes this type of drive is used to back up data from the hard drive so data is not lost if a hard drive breaks. A DVD drive holds more data than a CD ROM drive, and DVDs have enough storage capacity that they may be used to play or store movies. The storage media, the CD ROM or DVD, may be removed from the computer.
o Floppy drive - A low capacity storage device which can be written to as easily as it is read. The floppy disk may be easily removed from the computer. It is called a floppy because the part of the media that holds the data is on a material that is not rigid, but it is enclosed in a more rigid case to give it durability.
There are also other minor computer hardware components inside the case, including cables which may be used to hook internal parts together and to connect interfaces on the case for printers and other devices, such as the high speed serial bus called USB. (A serial bus simply refers to the fact that data is sent in a stream, one bit at a time.)
The Case
The drawing below shows a typical case. It may help you understand where your connections for
your monitor, keyboard, mouse, and other devices are if you should need to hook them up. For more
specific information you should refer to your computer owner's manual.
Fig 1.1: A typical computer with tower case
The drawing below shows a typical layout of the components inside your computer case.
Fig 1.2: A diagram showing the inside layout of a computer
Software and Hardware
Hardware
The term hardware describes the physical parts of your computer which you can physically touch
or see such as your monitor, case, disk drives, microprocessor and other physical parts.
Software
The term software describes the programs that run on your system. This includes your computer
operating system and other computer programs which run. Software is written in a computer language
(such as Basic, C, Java, or others) by programmers. The computer language is in a text format and can be
read by a person although if you do not understand the structure and rules of the language you may not
understand it very well. Once a program is written, an operation is performed on it which is called
compiling. Compiling is the process of changing the textual written language into a binary language
which can be understood by the computer.
Writing these text files and converting them to computer readable files is the way operating systems and
most application programs are created.
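As a small illustration of this edit-and-compile cycle, the sketch below shows a minimal C source file; compiling it (for example with a Unix-style compiler invoked as cc) turns the readable text into a binary executable. The file name hello.c and the compiler command are only illustrative assumptions.

/* hello.c - a minimal program written in the C language.
   Compiling it, for example with "cc hello.c -o hello" on a
   Unix-style system, translates this human-readable text into
   a binary executable that the computer can run. */
#include <stdio.h>

int main(void)
{
    printf("Hello from a compiled program!\n");
    return 0;
}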
BIOS (Basic Input/Output System)
BIOS is a low level program used by your system to interface to computer devices such as your
video card, keyboard, mouse, hard drive, and other devices. What BIOS programs provide in the
computer are very simple function calls or small subprograms which can be used by higher level
programs to perform simple operations on computer devices. For example a BIOS program would
provide the ability to write a character to memory on a video card.
BIOS is normally written in a low level computer language and is permanently or semi-permanently
written into the computer system. This type of computer program is commonly referred to as firmware
since it was historically written permanently into computer systems. Although BIOS is a program,
because of its permanent state, it was not quite considered to be software so the term firmware is used
to describe it.
Historically BIOS programs were written into a type of memory called ROM (read only memory).
This type of memory would not lose its data when the computer lost power thus ensuring these BIOS
programs would always be available. There were different variants of ROM memory, some of which could be written multiple times, but this memory could not normally be changed or re-programmed once the computer system was sold to the customer: once in the customer's possession, ROM memory could only be read. In more recent years a more flexible form of memory called flash ROM was developed, which allows the ROM to be re-written after the computer system is in the possession of the customer.
UNIT TWO: OPERATING SYSTEM
Definition: Operating System
The operating system is the core software component of your computer. It performs many
functions and is, in very basic terms, an interface between your computer and the outside world. In the
section about hardware, a computer is described as consisting of several component parts including
your monitor, keyboard, mouse, and other parts. The operating system provides an interface to these
parts using what is referred to as "drivers". This is why sometimes when you install a new printer or
other piece of hardware, your system will ask you to install more software called a driver.
An operating system (OS) is software, consisting of programs and data, that runs on computers,
manages computer hardware resources, and provides common services for execution of various
application software. The operating system is the most important type of system software in a computer
system. Without an operating system, a user cannot run an application program on their computer,
unless the application program is self booting.
For hardware functions such as input and output and memory allocation, the operating system
acts as an intermediary between application programs and the computer hardware, although the
application code is usually executed directly by the hardware and will frequently call the OS or be
interrupted by it. Operating systems are found on almost any device that contains a computer—from
cellular phones and video game consoles to supercomputers and web servers.
Examples of popular modern operating systems include Android, iOS, Linux, Mac OS X, and Microsoft
Windows.
Fig 2.1: A typical structure of an operating system
Function of a Driver
A driver is a specially written program which understands the operation of the device it
interfaces to, such as a printer, video card, sound card or CD ROM drive. It translates commands from
the operating system or user into commands understood by the component computer part it interfaces
with. It also translates responses from the component computer part back to responses that can be
understood by the operating system, application program, or user. The below diagram gives a graphical
depiction of the interfaces between the operating system and the computer component.
Fig 2.2: How an OS controls peripheral devices
Other Operating System Functions
The operating system provides for several other functions including:

System tools (programs) used to monitor computer performance, debug problems, or maintain
parts of the system.

A set of libraries or functions which programs may use to perform specific tasks especially
relating to interfacing with computer system components.
The operating system makes these interfacing functions along with its other functions operate smoothly
and these functions are mostly transparent to the user.
Types of Operating Systems
Real-time
A real-time operating system is a multitasking operating system that aims at executing real-time
applications. Real-time operating systems often use specialized scheduling algorithms so that
they can achieve a deterministic nature of behavior. The main objective of real-time operating
systems is their quick and predictable response to events. They have an event-driven or timesharing design and often aspects of both. An event-driven system switches between tasks based
on their priorities or external events while time-sharing operating systems switch tasks based on
clock interrupts.
Multi-user vs. Single-user
A multi-user operating system allows multiple users to access a computer system concurrently.
Time-sharing systems can be classified as multi-user systems, as they enable multiple users to access a computer through the sharing of time. Single-user operating systems, as opposed to multi-user operating systems, are usable by a single user at a time. Being able to have multiple accounts on a Windows operating system does not make it a multi-user system. Rather, only the network administrator is the real user. But for a Unix-like operating system, it is possible for two users to log in at a time, and this capability of the OS makes it a multi-user operating system.
Multi-tasking vs. Single-tasking
When only a single program is allowed to run at a time, the system is classified as a single-tasking system, while if the operating system allows the execution of multiple tasks at one time, it is classified as a multi-tasking operating system. Multi-tasking can be of two types, namely pre-emptive or co-operative. In pre-emptive multitasking, the operating system slices the CPU time and dedicates one slot to each of the programs. Unix-like operating systems such as Solaris and
and dedicates one slot to each of the programs. Unix-like operating systems such as Solaris and
Linux support pre-emptive multitasking. Cooperative multitasking is achieved by relying on each
process to give time to the other processes in a defined manner. MS Windows prior to Windows
95 used to support cooperative multitasking.
Distributed
A distributed operating system manages a group of independent computers and makes them
appear to be a single computer. The development of networked computers that could be linked
and communicate with each other, gave rise to distributed computing. Distributed computations
are carried out on more than one machine. When computers in a group work in cooperation,
they make a distributed system.
Embedded
Embedded operating systems are designed to be used in embedded computer systems. They are
designed to operate on small machines like PDAs with less autonomy. They are able to operate
with a limited number of resources. They are very compact and extremely efficient by design.
Windows CE and Minix 3 are some examples of embedded operating systems.
Evolutionary Summary
Early computers were built to perform a series of single tasks, like a calculator. Operating
systems did not exist in their modern and more complex forms until the early 1960s. Some operating
system features were developed in the 1950s, such as monitor programs that could automatically run
different application programs in succession to speed up processing. Hardware features were added that
enabled use of runtime libraries, interrupts, and parallel processing. When personal computers by
companies such as Apple Inc., Atari, IBM and Amiga became popular in the 1980s, vendors added
operating system features that had previously become widely used on mainframe and mini computers.
Later, many features such as graphical user interface were developed specifically for personal computer
operating systems.
An operating system consists of many parts. One of the most important components is the
kernel, which controls low-level processes that the average user usually cannot see: it controls how
memory is read and written, the order in which processes are executed, how information is received and
sent by devices like the monitor, keyboard and mouse, and decides how to interpret information
received from networks. The user interface is a component that interacts with the computer user
directly, allowing them to control and use programs. The user interface may be graphical with icons and
a desktop, or textual, with a command line. Application programming interfaces provide services and
code libraries that let applications developers write modular code reusing well defined programming
sequences in user space libraries or in the operating system itself. Which features are considered part of
the operating system is defined differently in various operating systems. For example, Microsoft
Windows considers its user interface to be part of the operating system, while many versions of Linux do
not.
Fig 2.3: A typical Mainframe Computer
The history also includes:
a. Mainframe operating systems:

Burroughs MCP – B5000, 1961 to Unisys Clearpath/MCP, present.

IBM OS/360 – IBM System/360, 1966 to IBM z/OS, present.

IBM CP-67 – IBM System/360, 1967 to IBM z/VM, present.

UNIVAC EXEC 8 – UNIVAC 1108, 1967, to OS 2200 Unisys Clearpath Dorado, present.
b. Microcomputers
PC-DOS was an early personal computer OS that featured a command line interface.
Fig 2.4 a and b: Screenshots of DOS and command line environments
Fig 2.5 a, b, c and d: Screenshots of a connected system, Mac OS, Ubuntu Linux and Google Chrome OS
Mac OS by Apple Computer became the first widespread OS to feature a graphical user interface. Many
of its features such as windows and icons would later become commonplace in GUIs.
The first microcomputers did not have the capacity or need for the elaborate operating systems
that had been developed for mainframes and minis; minimalistic operating systems were developed,
often loaded from ROM and known as Monitors. One notable early disk-based operating system was
CP/M, which was supported on many early microcomputers and was closely imitated in MS-DOS, which became wildly popular as the operating system chosen for the IBM PC (IBM's version of it was called IBM DOS or PC DOS) and its successors, making Microsoft one of the world's most profitable companies. In the '80s Apple Computer Inc. (now Apple Inc.)
abandoned its popular Apple II series of microcomputers to introduce the Apple Macintosh computer
with an innovative Graphical User Interface (GUI) to the Mac OS.
The introduction of the Intel 80386 CPU chip with 32-bit architecture and paging capabilities,
provided personal computers with the ability to run multitasking operating systems like those of earlier
minicomputers and mainframes. Microsoft responded to this progress by hiring Dave Cutler, who had
developed the VMS operating system for Digital Equipment Corporation. He would lead the
development of the Windows NT operating system, which continues to serve as the basis for Microsoft's
operating systems line. Steve Jobs, a co-founder of Apple Inc., started NeXT Computer Inc., which
developed the Unix-like NEXTSTEP operating system. NEXTSTEP would later be acquired by Apple Inc.
and used, along with code from FreeBSD as the core of Mac OS X.
c. Others include a subgroup of the UNIX family, a Unix Berkeley Software Distribution family
(FreeBSD, NetBSD, and OpenBSD).
Google Chrome OS
Chrome OS is an operating system based on the Linux kernel and designed by Google. Since Chrome
OS targets computer users who spend most of their time on the Internet, it is mainly a web browser with
no ability to run applications. It relies on Internet applications (or Web apps) used in the web browser to
accomplish tasks such as word processing and media viewing, as well as online storage for storing most
files.
AmigaOS
Fig 2.6: A screenshot of AmigaOS 4.1 Update 2.
AmigaOS is the default native operating system of the Amiga personal computer. It was
developed first by Commodore International, and initially introduced in 1985 with the Amiga 1000. Early
versions (1.0-3.9) run on the Motorola 68k series of 16-bit and 32-bit microprocessors, while the newer
AmigaOS 4 runs only on PowerPC microprocessors. On top of a preemptive multitasking kernel called
Exec, it includes an abstraction of the Amiga's unique hardware, a disk operating system called
AmigaDOS, a windowing system API called Intuition and a graphical user interface called Workbench. A
command line interface called AmigaShell is also available and integrated into the system. The GUI and
the CLI complement each other and share the same privileges. The current holder of the Amiga
intellectual properties is Amiga Inc. They oversaw the development of AmigaOS 4 but did not develop it
themselves, contracting it instead to Hyperion Entertainment. On 20 December 2006, Amiga Inc
terminated Hyperion's license to continue development of AmigaOS 4. However, on 30 September 2009,
Hyperion was granted an exclusive, perpetual, worldwide right to AmigaOS 3.1 in order to use, develop,
modify, commercialize, distribute and market AmigaOS 4.x and subsequent versions of AmigaOS
(including AmigaOS 5).
Microsoft Windows
Fig 2.7: Windows 7, shown here, is the newest release of Windows.
Microsoft Windows is a family of proprietary operating systems designed by Microsoft Corporation
and primarily targeted to Intel architecture based computers, with an estimated 88.9 percent total
usage share on Web connected computers. The most common versions of the Microsoft Windows suite of operating systems include, from most recent to oldest:

Windows XP Professional Edition - A version used by many businesses on workstations. It has the
ability to become a member of a corporate domain.

Windows XP Home Edition - A lower cost version of Windows XP which is for home use only and
should not be used at a business.

Windows 2000 - A better version of the Windows NT operating system which works well both at
home and as a workstation at a business. It includes technologies which allow hardware to be
automatically detected and other enhancements over Windows NT.

Windows ME - An upgraded version of Windows 98, but it has historically been plagued with programming errors which may be frustrating for home users.

Windows 98 - This was produced in two main versions. The first Windows 98 version was plagued
with programming errors but the Windows 98 Second Edition which came out later was much
better with many errors resolved.

Windows NT - A version of Windows made specifically for businesses offering better control over
workstation capabilities to help network administrators.

Windows 95 - The first version of Windows after the older Windows 3.x versions offering a better
interface and better library functions for programs.
Components of Operating System
The components of an operating system all exist in order to make the different parts of a
computer work together. All software—from financial databases to film editors—needs to go through
the operating system in order to use any of the hardware, whether it be as simple as a mouse or keyboard or as complex as an Internet connection.
Kernel
Fig 2.8: A kernel connects the application software to the hardware of a computer.
With the aid of the firmware and device drivers, the kernel provides the most basic level of
control over all of the computer's hardware devices. It manages memory access for programs in the
RAM, it determines which programs get access to which hardware resources, it sets up or resets the
CPU's operating states for optimal operation at all times, and it organizes the data for long-term nonvolatile storage with file systems on such media as disks, tapes, flash memory, etc.
Program execution
The operating system provides an interface between an application program and the computer
hardware, so that an application program can interact with the hardware only by obeying rules and
procedures programmed into the operating system. The operating system is also a set of services which
simplify development and execution of application programs. Executing an application program involves
the creation of a process by the operating system kernel which assigns memory space and other
resources, establishes a priority for the process in multi-tasking systems, loads program binary code into
memory, and initiates execution of the application program which then interacts with the user and with
hardware devices.
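The sketch below illustrates these steps on a POSIX-style system, using the fork(), execv() and waitpid() calls: the kernel creates a new process, loads a program binary into its memory, and runs it while the parent waits. The choice of /bin/ls as the program to run is only an illustrative assumption.

/* A sketch of program execution on a POSIX system: the kernel creates
   a new process (fork), loads a program binary into its memory and
   starts it (execv), and the parent waits for it to finish. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();               /* ask the kernel for a new process */
    if (pid == 0) {
        char *argv[] = { "ls", "-l", NULL };
        execv("/bin/ls", argv);       /* kernel loads and runs /bin/ls    */
        perror("execv");              /* reached only if execv failed     */
        return 1;
    }
    waitpid(pid, NULL, 0);            /* parent waits for the child       */
    printf("Child program finished.\n");
    return 0;
}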
Interrupts Handling
Interrupts are central to operating systems, as they provide an efficient way for the operating
system to interact with and react to its environment. The alternative — having the operating system
"watch" the various sources of input for events (polling) that require action — can be found in older
systems with very small stacks (50 or 60 bytes) but are unusual in modern systems with large stacks.
Interrupt-based programming is directly supported by most modern CPUs. Interrupts provide a
computer with a way of automatically saving local register contexts, and running specific code in
response to events. Even very basic computers support hardware interrupts, and allow the programmer
to specify code which may be run when that event takes place.
When an interrupt is received, the computer's hardware automatically suspends whatever
program is currently running, saves its status, and runs computer code previously associated with the
interrupt; this is analogous to placing a bookmark in a book in response to a phone call. In modern
operating systems, interrupts are handled by the operating system's kernel. Interrupts may come from
either the computer's hardware or from the running program.
When a hardware device triggers an interrupt, the operating system's kernel decides how to deal
with this event, generally by running some processing code. The amount of code being run depends on
the priority of the interrupt (for example: a person usually responds to a smoke detector alarm before
answering the phone). The processing of hardware interrupts is a task that is usually delegated to
software called a device driver, which may be either part of the operating system's kernel, part of another
program, or both. Device drivers may then relay information to a running program by various means.
A program may also trigger an interrupt to the operating system. If a program wishes to access
hardware for example, it may interrupt the operating system's kernel, which causes control to be passed
back to the kernel. The kernel will then process the request. If a program wishes additional resources (or
wishes to shed resources) such as memory, it will trigger an interrupt to get the kernel's attention.
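As a rough user-space analogy (not real interrupt-handler code, which lives in the kernel), the C sketch below registers a POSIX signal handler instead of polling: the program sleeps until the kernel delivers an event, here the SIGINT signal raised by pressing Ctrl-C.

/* A user-space analogy for interrupt handling using POSIX signals:
   instead of polling, the program registers a handler and the kernel
   runs it when the event occurs (here, SIGINT from pressing Ctrl-C). */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_event = 0;

static void on_interrupt(int signo)
{
    (void)signo;
    got_event = 1;                  /* just record the event; keep the handler small */
}

int main(void)
{
    signal(SIGINT, on_interrupt);   /* register the handler with the kernel */
    printf("Working; press Ctrl-C to raise the event...\n");
    while (!got_event)
        pause();                    /* sleep until some signal arrives */
    printf("Handler ran in response to the event.\n");
    return 0;
}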
Modes Protection and Supervision
Fig 2.9: A description of kernel and protection modes
Privilege rings for the x86 available in protected mode. Operating systems determine which
processes run in each mode. Modern CPUs support multiple modes of operation. CPUs with this
capability use at least two modes: protected mode and supervisor mode. The supervisor mode is used
by the operating system's kernel for low level tasks that need unrestricted access to hardware, such as
controlling how memory is written and erased, and communication with devices like graphics cards.
Protected mode, in contrast, is used for almost everything else. Applications operate within protected
mode, and can only use hardware by communicating with the kernel, which controls everything in
supervisor mode. CPUs might have other modes similar to protected mode as well, such as the virtual
modes in order to emulate older processor types, such as 16-bit processors on a 32-bit one, or 32-bit
processors on a 64-bit one. When a computer first starts up, it is automatically running in supervisor
mode. The first few programs to run on the computer, being the BIOS or EFI, bootloader, and the
operating system have unlimited access to hardware - and this is required because, by definition,
initializing a protected environment can only be done outside of one. However, when the operating
system passes control to another program, it can place the CPU into protected mode.
In protected mode, programs may have access to a more limited set of the CPU's instructions. A user
program may leave protected mode only by triggering an interrupt, causing control to be passed back to
the kernel. In this way the operating system can maintain exclusive control over things like access to
hardware and memory.
The term "protected mode resource" generally refers to one or more CPU registers, which
contain information that the running program isn't allowed to alter. An attempt to alter these resources
generally causes a switch to supervisor mode, where the operating system can deal with the illegal
operation the program was attempting (for example, by killing the program).
Memory management
Among other things, a multiprogramming operating system kernel must be responsible for
managing all system memory which is currently in use by programs. This ensures that a program does
not interfere with memory already used by another program. Since programs time share, each program
must have independent access to memory.
Cooperative memory management, used by many early operating systems, assumes that all
programs make voluntary use of the kernel's memory manager, and do not exceed their allocated
memory. This system of memory management is almost never seen any more, since programs often
contain bugs which can cause them to exceed their allocated memory. If a program fails, it may cause
memory used by one or more other programs to be affected or overwritten. Malicious programs or
viruses may purposefully alter another program's memory, or may affect the operation of the operating
system itself. With cooperative memory management, it takes only one misbehaved program to crash
the system.
Memory protection enables the kernel to limit a process' access to the computer's memory.
Various methods of memory protection exist, including memory segmentation and paging. All methods
require some level of hardware support (such as the 80286 MMU), which doesn't exist in all computers.
In both segmentation and paging, certain protected mode registers specify to the CPU what memory
address it should allow a running program to access. Attempts to access other addresses will trigger an
interrupt which will cause the CPU to re-enter supervisor mode, placing the kernel in charge. This is
called a segmentation violation or Seg-V for short, and since it is both difficult to assign a meaningful
result to such an operation, and because it is usually a sign of a misbehaving program, the kernel will
generally resort to terminating the offending program, and will report the error.
Windows 3.1-Me had some level of memory protection, but programs could easily circumvent the need
to use it. A general protection fault would be produced, indicating a segmentation violation had
occurred; however, the system would often crash anyway.
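The short C program below deliberately violates memory protection by writing through an invalid pointer; on a typical protected-mode system the hardware traps to the kernel, which terminates the program and reports a segmentation fault. The exact behavior is platform-dependent, so treat this only as an illustration.

/* A deliberate memory-protection violation: the write below touches an
   address the kernel never granted to this process, so the protection
   hardware traps to supervisor mode and the kernel terminates the
   program (typically reported as "Segmentation fault"). */
#include <stdio.h>

int main(void)
{
    volatile int *bad = (volatile int *)0;   /* address 0 is not mapped for this process */
    printf("About to access memory that was never allocated...\n");
    *bad = 42;                               /* the fault is triggered here */
    printf("This line is never reached.\n");
    return 0;
}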
Virtual memory
Fig 2.10: A typical virtual memory management
Many operating systems can "trick" programs into using memory scattered around the hard disk
and RAM as if it is one continuous chunk of memory, called virtual memory.
The use of virtual memory addressing (such as paging or segmentation) means that the kernel
can choose what memory each program may use at any given time, allowing the operating system to
use the same memory locations for multiple tasks. If a program tries to access memory that isn't in its
current range of accessible memory, but nonetheless has been allocated to it, the kernel will be
interrupted in the same way as it would if the program were to exceed its allocated memory. (See
section on memory management.) Under UNIX this kind of interrupt is referred to as a page fault. When
the kernel detects a page fault it will generally adjust the virtual memory range of the program which
triggered it, granting it access to the memory requested. This gives the kernel discretionary power over
where a particular application's memory is stored, or even whether or not it has actually been allocated
yet.
In modern operating systems, memory which is accessed less frequently can be temporarily
stored on disk or other media to make that space available for use by other programs. This is called
swapping, as an area of memory can be used by multiple programs, and what that memory area
contains can be swapped or exchanged on demand.
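The sketch below illustrates demand paging on a POSIX-style system: mmap() reserves a large anonymous region of virtual address space, and physical pages are typically supplied by the kernel only as the program touches them. The 64 MB size is an arbitrary illustrative choice.

/* A sketch of virtual memory in action on a POSIX system: mmap() asks
   the kernel for a large anonymous region.  Only address space is set
   up at first; physical pages are typically supplied on demand, one
   page fault at a time, as the region is written to. */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t length = 64 * 1024 * 1024;   /* 64 MB of virtual address space */
    char *region = mmap(NULL, length, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memset(region, 0xAB, length);       /* touching the pages faults them in */
    printf("Wrote to %zu bytes of demand-paged memory.\n", length);
    munmap(region, length);
    return 0;
}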
Multitasking and Process Management
Multitasking refers to the running of multiple independent computer programs on the same
computer; giving the appearance that it is performing the tasks at the same time. Since most computers
can do at most one or two things at one time, this is generally done via time-sharing, which means that
each program uses a share of the computer's time to execute. An operating system kernel contains a
piece of software called a scheduler which determines how much time each program will spend
executing, and in which order execution control should be passed to programs. Control is passed to a
process by the kernel, which allows the program access to the CPU and memory. Later, control is
returned to the kernel through some mechanism, so that another program may be allowed to use the
CPU. This so-called passing of control between the kernel and applications is called a context switch.
An early model which governed the allocation of time to programs was called cooperative multitasking.
In this model, when control is passed to a program by the kernel, it may execute for as long as it wants
before explicitly returning control to the kernel. This means that a malicious or malfunctioning program
may not only prevent any other programs from using the CPU, but it can hang the entire system if it
enters an infinite loop. Modern operating systems extend the concepts of application preemption to
device drivers and kernel code, so that the operating system has preemptive control over internal runtimes as well.
The philosophy governing preemptive multitasking is that of ensuring that all programs are given
regular time on the CPU. This implies that all programs must be limited in how much time they are
allowed to spend on the CPU without being interrupted. To accomplish this, modern operating system
kernels make use of a timed interrupt. A protected mode timer is set by the kernel which triggers a
return to supervisor mode after the specified time has elapsed.
On many single user operating systems cooperative multitasking is perfectly adequate, as home
computers generally run a small number of well tested programs. Windows NT was the first version of
Microsoft Windows which enforced preemptive multitasking, but it didn't reach the home user market
until Windows XP (since Windows NT was targeted at professionals).
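A minimal sketch of preemptive time-sharing, assuming a POSIX system: two processes created with fork() print messages concurrently, and the kernel's scheduler interleaves them without either process explicitly yielding.

/* A sketch of preemptive multitasking: two processes run concurrently
   and the kernel's scheduler interleaves their output without either
   process explicitly yielding the CPU. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

static void busy_print(const char *name)
{
    for (int i = 0; i < 5; i++) {
        printf("%s: step %d\n", name, i);
        usleep(100000);            /* 0.1 second, so the interleaving is visible */
    }
}

int main(void)
{
    pid_t pid = fork();            /* create a second process */
    if (pid == 0) {
        busy_print("child");
        return 0;
    }
    busy_print("parent");
    waitpid(pid, NULL, 0);         /* wait for the child to finish */
    return 0;
}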
Disk access and file systems
Fig 2.11: A screenshot of a file accessing environment
Filesystems allow users and programs to organize and sort files on a computer, often through the use of directories (or "folders").
Access to data stored on disks is a central feature of all operating systems. Computers store data
on disks using files, which are structured in specific ways in order to allow for faster access, higher
reliability, and to make better use out of the drive's available space. The specific way in which files are
stored on a disk is called a file system, and enables files to have names and attributes. It also allows
them to be stored in a hierarchy of directories or folders arranged in a directory tree.
Early operating systems generally supported a single type of disk drive and only one kind of file
system. Early file systems were limited in their capacity, speed, and in the kinds of file names and
directory structures they could use. These limitations often reflected limitations in the operating
systems they were designed for, making it very difficult for an operating system to support more than
one file system.
While many simpler operating systems support a limited range of options for accessing storage
systems, operating systems like UNIX and GNU/Linux support a technology known as a virtual file system
or VFS. An operating system such as UNIX supports a wide array of storage devices, regardless of their
design or file systems, allowing them to be accessed through a common application programming
interface (API). This makes it unnecessary for programs to have any knowledge about the device they
are accessing. A VFS allows the operating system to provide programs with access to an unlimited
number of devices with an infinite variety of file systems installed on them, through the use of specific
device drivers and file system drivers.
A connected storage device, such as a hard drive, is accessed through a device driver. The device
driver understands the specific language of the drive and is able to translate that language into a
standard language used by the operating system to access all disk drives. On UNIX, this is the language
of block devices.
When the kernel has an appropriate device driver in place, it can then access the contents of the
disk drive in raw format, which may contain one or more file systems. A file system driver is used to
translate the commands used to access each specific file system into a standard set of commands that
the operating system can use to talk to all file systems. Programs can then deal with these file systems
on the basis of filenames, and directories/folders, contained within a hierarchical structure. They can
create, delete, open, and close files, as well as gather various information about them, including access
permissions, size, free space, and creation and modification dates.
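The sketch below shows this name-based view from a program's perspective on a POSIX-style system: the program creates a file by name and then asks the operating system about its size and permissions with stat(), without knowing anything about the underlying disk or file system. The file name example.txt is an illustrative assumption.

/* A sketch of name-based file access through the operating system: the
   program deals only with a file name, and the kernel's file system
   driver decides where and how the bytes live on disk. */
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    const char *name = "example.txt";        /* illustrative file name */
    FILE *f = fopen(name, "w");              /* create or open by name */
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    fputs("hello, file system\n", f);
    fclose(f);

    struct stat info;                        /* ask the OS about the file */
    if (stat(name, &info) == 0)
        printf("%s: %lld bytes, permissions %o\n",
               name, (long long)info.st_size, (unsigned)(info.st_mode & 0777));
    return 0;
}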
Various differences between file systems make supporting all file systems difficult. Allowed
characters in file names, case sensitivity, and the presence of various kinds of file attributes make the
implementation of a single interface for every file system a daunting task. Operating systems tend to
recommend using (and so support natively) file systems specifically designed for them; for example,
NTFS in Windows and ext3 and ReiserFS in GNU/Linux. However, in practice, third party drivers are
usually available to give support for the most widely used file systems in most general-purpose
operating systems (for example, NTFS is available in GNU/Linux through NTFS-3g, and ext2/3 and
ReiserFS are available in Windows through FS-driver and rfstool).
Support for file systems is highly varied among modern operating systems, although there are
several common file systems which almost all operating systems include support and drivers for.
Operating systems vary on file system support and on the disk formats they may be installed on. Under
Windows, each file system is usually limited in application to certain media; for example, CDs must use
ISO 9660 or UDF, and as of Windows Vista, NTFS is the only file system which the operating system can
be installed on. It is possible to install GNU/Linux onto many types of file systems. Unlike other operating
systems, GNU/Linux and UNIX allow any file system to be used regardless of the media it is stored in,
whether it is a hard drive, a disc (CD,DVD...), a USB flash drive, or even contained within a file located on
another file system.
Device drivers
A device driver is a specific type of computer software developed to allow interaction with
hardware devices. Typically this constitutes an interface for communicating with the device, through the
specific computer bus or communications subsystem that the hardware is connected to, providing
commands to and/or receiving data from the device, and on the other end, the requisite interfaces to
the operating system and software applications. It is a specialized hardware-dependent computer
program which is also operating system specific that enables another program, typically an operating
system or applications software package or computer program running under the operating system
kernel, to interact transparently with a hardware device, and usually provides the requisite interrupt
handling necessary for any asynchronous time-dependent hardware interfacing needs.
The key design goal of device drivers is abstraction. Every model of hardware (even within the
same class of device) is different. Newer models also are released by manufacturers that provide more
reliable or better performance and these newer models are often controlled differently. Computers and
their operating systems cannot be expected to know how to control every device, both now and in the
future. To solve this problem, operating systems essentially dictate how every type of device should be
controlled. The function of the device driver is then to translate these operating system mandated
function calls into device specific calls. In theory a new device, which is controlled in a new manner,
should function correctly if a suitable driver is available. This new driver will ensure that the device
appears to operate as usual from the operating system's point of view.
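A simplified, hypothetical C sketch of this abstraction idea is shown below: the "operating system" dictates one table of operations, and each driver supplies device-specific implementations behind it. All of the names (device_ops, fake_printer_open, and so on) are invented for the example and do not belong to any real kernel.

/* A simplified, hypothetical sketch of driver abstraction: the operating
   system dictates one interface (device_ops) and each device's driver
   supplies its own implementations behind it.  Every name here is
   invented for the example. */
#include <stdio.h>

struct device_ops {                          /* the OS-mandated interface */
    int (*open_dev)(void);
    int (*write_dev)(const char *data, int len);
};

static int fake_printer_open(void)
{
    printf("printer: opened\n");
    return 0;
}

static int fake_printer_write(const char *data, int len)
{
    printf("printer: sending %d bytes: %.*s\n", len, len, data);
    return len;
}

/* The hypothetical printer driver fills in the generic interface. */
static struct device_ops printer_driver = { fake_printer_open, fake_printer_write };

int main(void)
{
    /* The rest of the system uses only the generic calls. */
    printer_driver.open_dev();
    printer_driver.write_dev("hello", 5);
    return 0;
}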
Under versions of Windows before Vista and versions of Linux before 2.6, all driver execution
was co-operative, meaning that if a driver entered an infinite loop it would freeze the system. More
recent revisions of these operating systems incorporate kernel preemption, where the kernel interrupts
the driver to give it tasks, and then separates itself from the process until it receives a response from the
device driver, or gives it more tasks to do.
Networking
Currently most operating systems support a variety of networking protocols, hardware, and
applications for using them. This means that computers running dissimilar operating systems can
participate in a common network for sharing resources such as computing, files, printers, and scanners
using either wired or wireless connections. Networks can essentially allow a computer's operating
system to access the resources of a remote computer to support the same functions as it could if those
resources were connected directly to the local computer. This includes everything from simple
communication, to using networked file systems or even sharing another computer's graphics or sound
hardware. Some network services allow the resources of a computer to be accessed transparently, such
as SSH which allows networked users direct access to a computer's command line interface.
Client/server networking involves a program on a computer somewhere which connects via a
network to another computer, called a server. Servers offer (or host) various services to other network
computers and users. These services are usually provided through ports or numbered access points
beyond the server's network address. Each port number is usually associated with a maximum of one
running program, which is responsible for handling requests to that port; such a program is often called a daemon. A daemon, being a user program, can in turn access the local hardware resources of that computer by passing requests to the
operating system kernel. Many operating systems support one or more vendor-specific or open
networking protocols as well, for example, SNA on IBM systems, DECnet on systems from Digital
Equipment Corporation, and Microsoft-specific protocols (SMB) on Windows. Specific protocols for
specific tasks may also be supported such as NFS for file access. Protocols like ESound, or esd can be
easily extended over the network to provide sound from local applications, on a remote system's sound
hardware.
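As a small client-side sketch of this model, assuming a POSIX sockets API, the program below asks the operating system for a TCP socket and connects to a server at 127.0.0.1 on port 8080; both the address and the port are illustrative assumptions, and a real service must be listening there for the connection to succeed.

/* A minimal sketch of client/server networking from the client side:
   the program asks the operating system for a socket and connects to a
   server at an assumed address and port, then sends a message. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* ask the OS for a TCP socket */
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_in server;
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_port = htons(8080);              /* numbered access point (port) */
    inet_pton(AF_INET, "127.0.0.1", &server.sin_addr);

    if (connect(fd, (struct sockaddr *)&server, sizeof(server)) == 0) {
        const char *msg = "hello over the network\n";
        write(fd, msg, strlen(msg));            /* send data through the kernel */
        printf("Connected and sent a message.\n");
    } else {
        perror("connect");                      /* likely no server listening */
    }
    close(fd);
    return 0;
}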
Security
A computer being secure depends on a number of technologies working properly. A modern
operating system provides access to a number of resources, which are available to software running on
the system, and to external devices like networks via the kernel.
The operating system must be capable of distinguishing between requests which should be
allowed to be processed, and others which should not be processed. While some systems may simply
distinguish between "privileged" and "non-privileged", systems commonly have a form of requester
identity, such as a user name. To establish identity there may be a process of authentication. Often a
username must be quoted, and each username may have a password. Other methods of authentication,
such as magnetic cards or biometric data, might be used instead. In some cases, especially connections
from the network, resources may be accessed with no authentication at all (such as reading files over a
network share). Also covered by the concept of requester identity is authorization; the particular
services and resources accessible by the requester once logged into a system are tied to either the
requester's user account or to the variously configured groups of users to which the requester belongs.
In addition to the allow/disallow model of security, a system with a high level of security will also offer
auditing options. These would allow tracking of requests for access to resources (such as, "who has been
reading this file?"). Internal security, or security from an already running program is only possible if all
possibly harmful requests must be carried out through interrupts to the operating system kernel. If
programs can directly access hardware and resources, they cannot be secured.
External security involves a request from outside the computer, such as a login at a connected
console or some kind of network connection. External requests are often passed through device drivers
to the operating system's kernel, where they can be passed onto applications, or carried out directly.
Security of operating systems has long been a concern because of highly sensitive data held on
computers, both of a commercial and military nature. The United States Government Department of
Defense (DoD) created the Trusted Computer System Evaluation Criteria (TCSEC) which is a standard that
sets basic requirements for assessing the effectiveness of security. This became of vital importance to
operating system makers, because the TCSEC was used to evaluate, classify and select computer systems
being considered for the processing, storage and retrieval of sensitive or classified information.
Network services include offerings such as file sharing, print services, email, web sites, and file
transfer protocols (FTP), most of which can have compromised security. At the front line of security are
hardware devices known as firewalls or intrusion detection/prevention systems. At the operating system
level, there are a number of software firewalls available, as well as intrusion detection/prevention
systems. Most modern operating systems include a software firewall, which is enabled by default. A
software firewall can be configured to allow or deny network traffic to or from a service or application
running on the operating system. Therefore, one can install and be running an insecure service, such as
Telnet or FTP, and not have to be threatened by a security breach because the firewall would deny all
traffic trying to connect to the service on that port.
An alternative strategy, and the only sandbox strategy available in systems that do not meet the
Popek and Goldberg virtualization requirements, is for the operating system not to run user programs as native code, but instead to either emulate a processor or provide a host for a p-code based system such as Java.
Internal security is especially relevant for multi-user systems; it allows each user of the system to
have private files that the other users cannot tamper with or read. Internal security is also vital if
auditing is to be of any use, since a program can potentially bypass the operating system, inclusive of
bypassing auditing.
User interface
Fig 2.12 a: A screenshot of the Bourne Again Shell command line. Each command is typed out after the 'prompt', and then its output appears
below, working its way down the screen. The current command prompt is at the bottom.
b) A screenshot of the KDE graphical user interface. Programs take the form of images on the screen, and the files, folders (directories), and
applications take the form of icons and symbols. A mouse is used to navigate the computer.
Every computer that is to be operated by an individual requires a user interface. The user
interface is not actually a part of the operating system—it generally runs in a separate program usually
referred to as a shell, but is essential if human interaction is to be supported. The user interface requests
services from the operating system that will acquire data from input hardware devices, such as a
keyboard, mouse or credit card reader, and requests operating system services to display prompts,
status messages and such on output hardware devices, such as a video monitor or printer. The two most
common forms of a user interface have historically been the command-line interface, where computer
commands are typed out line-by-line, and the graphical user interface, where a visual environment
(most commonly with windows, buttons, icons and a mouse pointer) is present.
Graphical user interfaces
Most of the modern computer systems support graphical user interfaces (GUI), and often include
them. In some computer systems, such as the original implementation of Mac OS, the GUI is integrated
into the kernel.
While technically a graphical user interface is not an operating system service, incorporating
support for one into the operating system kernel can allow the GUI to be more responsive by reducing
the number of context switches required for the GUI to perform its output functions. Other operating
systems are modular, separating the graphics subsystem from the kernel and the Operating System. In
the 1980s UNIX, VMS and many others had operating systems that were built this way. GNU/Linux and
Mac OS X are also built this way. Modern releases of Microsoft Windows such as Windows Vista
implement a graphics subsystem that is mostly in user-space; however the graphics drawing routines of
versions between Windows NT 4.0 and Windows Server 2003 exist mostly in kernel space. Windows 9x
had very little distinction between the interface and the kernel.
Many computer operating systems allow the user to install or create any user interface they
desire. The X Window System in conjunction with GNOME or KDE is a commonly found setup on most
UNIX and UNIX-like (BSD, GNU/Linux, Solaris) systems. A number of Windows shell replacements have
been released for Microsoft Windows, which offer alternatives to the included Windows shell, but the
shell itself cannot be separated from Windows.
Numerous Unix-based GUIs have existed over time, most derived from X11. Competition among
the various vendors of Unix (HP, IBM, Sun) led to much fragmentation, though an effort to standardize in
the 1990s to COSE and CDE failed for various reasons, and were eventually eclipsed by the widespread
adoption of GNOME and KDE. Prior to free software-based toolkits and desktop environments, Motif
was the prevalent toolkit/desktop combination (and was the basis upon which CDE was developed).
Graphical user interfaces evolve over time. For example, Windows has modified its user interface almost
every time a new major version of Windows is released, and the Mac OS GUI changed dramatically with
the introduction of Mac OS X in 1999.
Real-time operating systems
A real-time operating system (RTOS) is a multitasking operating system intended for applications
with fixed deadlines (real-time computing). Such applications include some small embedded systems,
automobile engine controllers, industrial robots, spacecraft, industrial control, and some large-scale
computing systems. An early example of a large-scale real-time operating system was Transaction
Processing Facility developed by American Airlines and IBM for the Sabre Airline Reservations System.
Embedded systems that have fixed deadlines use a real-time operating system such as VxWorks, PikeOS,
eCos, QNX, MontaVista Linux and RTLinux. Windows CE is a real-time operating system that shares
similar APIs to desktop Windows but shares none of desktop Windows' codebase. Symbian OS also has
an RTOS kernel (EKA2) starting with version 8.0b.
Some embedded systems use operating systems such as Palm OS, BSD, and GNU/Linux, although
such operating systems do not support real-time computing.
Operating system development as a hobby
Operating system development is one of the most complicated activities in which a computing
hobbyist may engage. A hobby operating system may be classified as one whose code has not been
directly derived from an existing operating system, and has few users and active developers. [22]
In some cases, hobby development is in support of a "homebrew" computing device, for example, a
simple single-board computer powered by a 6502 microprocessor. Or, development may be for an
architecture already in widespread use. Operating system development may come from entirely new
concepts, or may commence by modeling an existing operating system. In either case, the hobbyist is
his/her own developer, or may interact with a small and sometimes unstructured group of individuals
who have like interests. Examples of a hobby operating system include ReactOS and Syllable.
Diversity of operating systems and portability
Application software is generally written for use on a specific operating system, and sometimes
even for specific hardware. When porting the application to run on another OS, the functionality
required by that application may be implemented differently by that OS (the names of functions,
meaning of arguments, etc.) requiring the application to be adapted, changed, or otherwise maintained.
This cost in supporting operating systems diversity can be avoided by instead writing applications against
software platforms like Java, or Qt for web browsers. These abstractions have already borne the cost of
adaptation to specific operating systems and their system libraries. Another approach is for operating
system vendors to adopt standards. For example, POSIX and OS abstraction layers provide
commonalities that reduce porting costs.
Operating System Concerns
As mentioned previously, an operating system is a computer program. Operating systems are written
by human programmers who make mistakes. Therefore there can be errors in the code even though
there may be some testing before the product is released. Some companies have better software quality
control and testing than others so you may notice varying levels of quality from operating system to
operating system. Errors in operating systems cause three main types of problems:

System crashes and instabilities - These can happen due to a software bug typically in the
operating system, although computer programs being run on the operating system can make the
system more unstable or may even crash the system by themselves. This varies depending on the
type of operating system. A system crash is the act of a system freezing and becoming
unresponsive which would cause the user to need to reboot.

Security flaws - Some software errors leave a door open for the system to be broken into by
unauthorized intruders. As these flaws are discovered, unauthorized intruders may try to use
these to gain illegal access to your system. Patching these flaws often will help keep your
computer system secure. How this is done will be explained later.

Sometimes errors in the operating system will cause the computer not to work correctly with
some peripheral devices such as printers.
Command processing
The command interpreter for DOS runs when no application programs are running. When an
application exits, if the transient portion of the command interpreter in memory was overwritten, DOS
will reload it from disk. Some commands are internal and built into COMMAND.COM, others are
external commands stored on disk. When the user types a line of text at the operating system command
prompt, COMMAND.COM will parse the line and attempt to match a command name to a built-in
command or to the name of an executable program file or batch file on disk. If no match is found, an
error message is printed and the command prompt is refreshed.
External commands were too large to keep in the command processor or were less frequently used.
Such utility programs would be stored on disk and loaded just like regular application programs but were
distributed with the operating system. Copies of these utility command programs had to be on an
accessible disk, either on the current drive or on the command path set in the command interpreter.
In the list Appendix I, commands that can accept more than one filename, or a filename including
wildcards (* and ?), are said to accept a filespec parameter. Commands that can accept only a single
filename are said to accept a filename parameter. Additionally, command line switches, or other
parameter strings, can be supplied on the command line. Spaces and symbols such as a "/" or a "-" may
be used to allow the command processor to parse the command line into file names, file specifications,
and other options.
The command interpreter preserves the case of whatever parameters are passed to commands but
the command names themselves and filenames are case-insensitive.
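The toy C sketch below mimics what such a command interpreter conceptually does: read a line, split it into a command name and parameters, handle a built-in command itself, and otherwise load an external program. The POSIX fork()/execvp() calls stand in for DOS's own program-loading mechanism; this is not the actual COMMAND.COM logic.

/* A toy command interpreter loop, sketching what a shell such as
   COMMAND.COM conceptually does: read a line, split it into a command
   name and parameters, handle built-ins itself, and otherwise load an
   external program. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    char line[256];
    for (;;) {
        printf("> ");
        fflush(stdout);
        if (fgets(line, sizeof(line), stdin) == NULL)
            break;                               /* end of input */

        char *argv[16];
        int argc = 0;
        for (char *tok = strtok(line, " \t\n");
             tok != NULL && argc < 15;
             tok = strtok(NULL, " \t\n"))
            argv[argc++] = tok;                  /* split into name and parameters */
        argv[argc] = NULL;
        if (argc == 0)
            continue;                            /* empty line */

        if (strcmp(argv[0], "exit") == 0)        /* an internal (built-in) command */
            break;

        pid_t pid = fork();                      /* otherwise run an external command */
        if (pid == 0) {
            execvp(argv[0], argv);               /* search the command path */
            fprintf(stderr, "Bad command or file name\n");
            exit(1);
        }
        waitpid(pid, NULL, 0);
    }
    return 0;
}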
While many commands are the same across many DOS systems (MS-DOS, PC DOS, DR-DOS,
FreeDOS, etc.), some differ in command syntax or name. See Appendix I for a detailed list of MS-DOS and Linux commands.
UNIT THREE: INSTRUCTION FORMAT
Introduction
The instruction format of an instruction is usually depicted in a rectangular box symbolizing the bits
of the instruction as they appear in memory words or in a control register. An instruction format defines
the layout of the bits of an instruction, in terms of its constituent parts. The bits of an instruction are
divided into groups called fields. The most common fields found in instruction formats are:

An operation code field that specifies the operation to be performed.

An address field that designates a memory address or a processor register.

A mode field that specifies the way the operand or the effective address is determined.
Opcode | Mode | Address
(Instruction fields)
Other special fields are sometimes employed under certain circumstances. The operation code
field of an instruction is a group of bits that define various processor operations, such as add, subtract,
complement, and shift. Address fields contain either a memory address or a register address. Mode
fields offer a variety of ways in which an operand is chosen.
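One informal way to picture these fields is as a packed record. The C sketch below uses bit-fields for a made-up 16-bit format with a 4-bit opcode, a 2-bit mode, and a 10-bit address; the widths are invented for illustration and do not describe any particular machine.

#include <stdio.h>

/* A made-up 16-bit instruction layout: 4-bit opcode, 2-bit mode, 10-bit address. */
struct instr {
    unsigned int address : 10;   /* memory address or register designator   */
    unsigned int mode    : 2;    /* how the effective address is determined */
    unsigned int opcode  : 4;    /* which operation to perform              */
};

int main(void)
{
    struct instr i = { .opcode = 0x3, .mode = 0x1, .address = 0x1F4 };
    printf("opcode=%u mode=%u address=%u\n",
           (unsigned)i.opcode, (unsigned)i.mode, (unsigned)i.address);
    return 0;
}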
There are mainly four types of instruction formats:

Three address instructions

Two address instructions

One address instructions

Zero address instructions
Three address instructions
Computers with three address instructions formats can use each address field to specify either a
processor register or a memory operand. The program in assembly language that evaluates X=
(A+B)*(C+D) is shown below, together with comments that explain the register transfer operation of
each instruction.
ADD R1, A, B      R1 <- M[A] + M[B]
ADD R2, C, D      R2 <- M[C] + M[D]
MUL X, R1, R2     M[X] <- R1 * R2
It is assumed that the computer has two processor registers, R1 and R2. The symbol M[A]
denotes the operand at the memory address symbolized by A. The advantage of the three address format is
that it results in short programs when evaluating arithmetic expressions. The disadvantage is that the
binary coded instructions require too many bits to specify three addresses. An example of a
commercial computer that uses three address instructions is the Cyber 170. The instruction formats in
the Cyber computer are restricted to either three register address fields or two register address fields
and one memory address field.
Two address instructions
Two address instructions are the most common in commercial computers. Here again each
address field can specify either a processor register or a memory word. The program to evaluate X=
(A+B)*(C+D) is as follows:
MOV R1, A       R1 <- M[A]
ADD R1, B       R1 <- R1 + M[B]
MOV R2, C       R2 <- M[C]
ADD R2, D       R2 <- R2 + M[D]
MUL R1, R2      R1 <- R1 * R2
MOV X, R1       M[X] <- R1
The MOV instruction moves or transfers the operands to and from memory and processor registers. The
first symbol listed in an instruction is assumed to be both a source and the destination where the result
of the operation is transferred.
One address instructions
One address instructions use an implied accumulator (AC) register for all data manipulation. For
multiplication and division there is a need for a second register. However, here we will neglect the
second register and assume that the AC contains the result of all operations. The program to evaluate X=
(A+B)*(C+D) is
LOAD  A         AC <- M[A]
ADD   B         AC <- AC + M[B]
STORE T         M[T] <- AC
LOAD  C         AC <- M[C]
ADD   D         AC <- AC + M[D]
MUL   T         AC <- AC * M[T]
STORE X         M[X] <- AC
All operations are done between the AC register and a memory operand. T is the address of a temporary
memory location required for storing the intermediate result. Commercially available computers also
use this type of instruction format.
Zero address instructions
A stack organized computer does not use an address field for the instructions ADD and MUL. The
PUSH and POP instructions, however, need an address field to specify the operand that communicates
with the stack. The following program shows how X=(A+B)*(C+D) will be written for a stack organized
computer.(TOS stands for top of stack.)
PUSH A          TOS <- A
PUSH B          TOS <- B
ADD             TOS <- (A + B)
PUSH C          TOS <- C
PUSH D          TOS <- D
ADD             TOS <- (C + D)
MUL             TOS <- (C + D) * (A + B)
POP  X          M[X] <- TOS
To evaluate arithmetic expressions in a stack computer, it is necessary to convert the expression into
reverse Polish notation. The name “zero address” is given to this type of computer because of the
absence of an address field in computational instructions.
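The stack discipline can be mimicked directly in a higher-level language. The following C sketch evaluates X = (A+B)*(C+D) with an explicit stack, mirroring the PUSH/ADD/MUL/POP sequence above; the operand values are made up for the example.

#include <stdio.h>

static int stack[16];
static int sp = 0;                       /* next free slot; stack[sp - 1] is TOS */

static void push(int v) { stack[sp++] = v; }
static int  pop(void)   { return stack[--sp]; }

/* Each zero-address ADD or MUL pops two operands and pushes one result. */
static void add(void) { int b = pop(), a = pop(); push(a + b); }
static void mul(void) { int b = pop(), a = pop(); push(a * b); }

int main(void)
{
    int A = 2, B = 3, C = 4, D = 5, X;   /* made-up operand values */

    push(A);     /* PUSH A */
    push(B);     /* PUSH B */
    add();       /* TOS <- (A + B) */
    push(C);     /* PUSH C */
    push(D);     /* PUSH D */
    add();       /* TOS <- (C + D) */
    mul();       /* TOS <- (C + D) * (A + B) */
    X = pop();   /* POP X */

    printf("X = %d\n", X);               /* (2 + 3) * (4 + 5) = 45 */
    return 0;
}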
ADDRESSING MODES
Introduction
The operation field of an instruction specifies the operation to be performed. The operation must be
executed on some data stored in computer registers or memory words. The way the operands are
chosen during program execution is dependent on the addressing mode of the instruction. The
addressing mode specifies a rule for interpreting or modifying the address field of the instruction before
the operand is actually referenced. Computers use addressing mode techniques for the purpose of
accommodating one or both of the following provisions:

To give programming versatility to the user by providing such facilities as pointers to memory, counters for loop control, indexing of data, and program relocation.

To reduce the number of bits in the addressing field of the instruction.
Addressing Modes
• Implied addressing mode
• Immediate addressing mode
• Direct addressing mode
• Indirect addressing mode
• Register addressing mode
• Register indirect addressing mode
• Autoincrement or autodecrement addressing mode
• Relative addressing mode
• Indexed addressing mode
• Base register addressing mode
Implied addressing mode
In this mode the operands are specified implicitly in the definition of the instruction. For example the
‘complement accumulator’ instruction is an implied mode instruction because the operand in the
accumulator register is implied in the definition of the instruction itself. All register reference
instructions that use an accumulator are implied mode instructions. Zero address instructions in a stack
organized computer are implied mode instructions since the operands are implied to be on the top of
the stack.
Implied addressing mode diagram: the instruction consists only of the opcode (e.g. CMA).
Immediate addressing mode
In this mode the operand is specified in the instruction itself. In other words, an immediate mode
instruction has an operand field rather than an address field. The operand field contains the actual
operand to be used in conjunction with the operation specified in the instruction. Immediate mode
instructions are useful for initializing registers to a constant value.
Example: ADD 5
• Add 5 to the contents of the accumulator
• 5 is the operand
Advantages and disadvantages
• No memory reference to fetch data
• Fast
• Limited range
Immediate addressing mode diagram: the instruction holds the opcode (ADD) and the operand itself (5).
Direct addressing mode
In this mode the effective address is equal to the address part of the instruction. The operand
resides in memory and its address is given directly by the address field of instruction. In a branch type
instruction the address field specifies the actual branch address.
Effective address (EA) = address field (A)
e.g. LDA A
• Look in memory at address A for the operand, which is to be loaded into the accumulator.
• Load the contents of cell A into the accumulator.
Advantages and disadvantages
• Single memory reference to access data
• No additional calculations to work out the effective address
• Limited address space
Direct addressing mode diagram: the instruction holds the opcode and address A; memory location A holds the operand.
Indirect addressing mode
In this mode the address field of the instruction gives the address where the effective address is
stored in memory. Control fetches the instruction from memory and uses its address part to access
memory again to read the effective address.
EA = address contained in memory location M
• Look in M, find the address contained in M, and look there for the operand.
For example
ADD @M
• Add the contents of the memory location pointed to by the contents of M to the accumulator.
Indirect addressing mode diagram: the instruction holds the opcode and address M; location M holds a pointer to the operand, which is elsewhere in memory.
Register addressing mode
In this mode the operands are in the registers that reside within the CPU.
EA = R
Example ADD B
Advantages and disadvantages
• No memory access, so very fast execution.
• Very small address field needed.
  - Shorter instructions
  - Faster instruction fetch
• Limited number of registers.
• Multiple registers help performance.
  - Requires good assembly programming or compiler writing
Register addressing mode diagram: the instruction holds the opcode and a register address R; the operand is in register R.
Register indirect addressing mode
In this mode the instruction specifies a register in the CPU whose contents give the address of
the operand in the memory. In other words, the selected register contains the address of the operand
rather than the operand itself. Before using a register indirect mode instruction, the programmer must
ensure that the memory address of the operand is placed in the processor register with a previous
instruction. The advantage of a register indirect mode instruction is that the address field of the
instruction uses fewer bits to select a register than would have been required to specify a memory
address directly.
Therefore EA = the address stored in the register R
• The operand is in the memory cell pointed to by the contents of register R.
• Example: LDAX B
Advantages
• Fewer bits are required to specify the register.
• One fewer memory access than indirect addressing.
Register indirect addressing mode diagram: the instruction holds the opcode and register address R; register R holds a pointer to the operand in memory.
Autoincrement or autodecrement addressing mode
This is similar to the register indirect mode except that the register is incremented or
decremented after or before its value is used to access memory. When the address stored in the register
refers to a table of data in memory, it is necessary to increment or decrement the register after every
access to the table. This can be achieved by using the increment or decrement instruction.
However, because this is such a common requirement, many computers incorporate a special mode
that automatically increments or decrements the contents of the register after the data access.
• This mode is especially useful when we want to access a table of data, as the C sketch below illustrates.
• For example, INR R1 will increment the register R1, and DCR R2 will decrement the register R2.
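C's pointer post-increment is the high-level counterpart of this mode. The short sketch below sums a table of data using *p++, which a compiler targeting a machine with autoincrement addressing could map onto that mode; the table contents are invented.

#include <stdio.h>

int main(void)
{
    int table[5] = { 10, 20, 30, 40, 50 };   /* made-up table of data                */
    int *p = table;                          /* "register" holding the table address */
    int sum = 0;

    for (int i = 0; i < 5; i++)
        sum += *p++;                         /* use the address, then increment it   */

    printf("sum = %d\n", sum);               /* 150 */
    return 0;
}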
Autoincrement or autodecrement addressing mode diagrams: as in register indirect mode, the instruction holds the opcode and register address R, and the register points to the operand in memory; the register's value is incremented (or decremented) after (or before) it is used to access the operand.
Relative addressing mode
In this mode the content of the program counter is added to the address part of the instruction
in order to obtain the effective address. Effective address is defined as the memory address obtained
from the computation dictated by the given addressing mode. The address part of the instruction is
usually a signed number (in 2’s complement representation) which can be either positive or negative.
When this number is added to the content of the program counter, the result produces an effective
address whose position in memory is relative to the address of the next instruction. Relative addressing
is often used with branch type instructions when the branch address is in the area surrounding the
instruction word itself. It results in a shorter address field in the instruction format since the relative
address can be specified with a smaller number of bits compared to the bits required to designate the
entire memory address.
EA = A + contents of PC
Example: PC contains 825 and address part of instruction contains 24.
After the instruction is read from location 825, the PC is incremented to 826. So EA=826+24=850. The
operand will be found at location 850 i.e. 24 memory locations forward from the address of the next
instruction.
Relative addressing mode diagram: the instruction holds the opcode and address A; A is added to the contents of the program counter to locate the operand in memory.
Indexed addressing mode
In this mode the content of an index register is added to the address part of the instruction to
obtain the effective address. The index register is a special CPU register that contains an index value. The
address field of the instruction defines the beginning address of a data array in memory. Each operand
in the array is stored in memory relative to the beginning address. The distance between the beginning
address and the address of the operand is the index value stored in the index register. Any operand in
the array can be accessed with the same instruction provided that the index register contains the correct
index value. The index register can be incremented to facilitate access to consecutive operands. Note
that if an index type instruction does not include an address field in its format, then the instruction
converts to the register indirect mode of operation.
• Therefore EA = A + IR
• Example: MOV AL, DS:disp[SI]
Advantage
• Good for accessing arrays.
Indexed addressing mode diagram: the instruction holds the opcode and address A; A is added to the contents of the index register (IR) to locate the operand in memory.
Base register addressing mode
In this mode the content of the base register is added to the address part of the instruction to obtain
the effective address. This is similar to the indexed addressing mode except that the register is now
called a base register instead of an index register. The difference between the two modes is in the way
they are used rather than in the way that they are computed. An index register is assumed to hold an
index number that is relative to the address part of the instruction. A base register is assumed to hold a
base address and the address field of the instruction gives a displacement relative to this base address.
The base register addressing mode is used in computers to facilitate the relocation of the programs in
memory. When programs and data are moved from one segment of memory to another, as required in
multiprogramming systems, the address values of instructions must reflect this change of position. With
a base register, the displacement values of instructions do not have to change. Only the value of the
base register requires updating to reflect the beginning of a new memory segment.
• Therefore EA = A + BR
• For example: MOV AL, disp[BX]
• The segment registers in the 8086 are used in this way.
Base register addressing mode diagram: the instruction holds the opcode and displacement A; A is added to the value of the base register (BR) to locate the operand in memory.
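To pull the modes together, here is a minimal C sketch of a toy machine that returns the effective address for several of the modes above, given an address field A, a register number, the program counter, an index register, and a base register. The memory size, register values, and constants are invented for illustration; the relative-mode call reproduces the 826 + 24 = 850 example.

#include <stdio.h>
#include <stdint.h>

enum mode { DIRECT, INDIRECT, REG_INDIRECT, RELATIVE, INDEXED, BASE };

#define MEMSIZE 1024
static int32_t M[MEMSIZE];        /* toy memory                       */
static int32_t R[8];              /* toy register file                */
static int32_t PC = 826;          /* address of the next instruction  */
static int32_t IX = 8;            /* index register (invented value)  */
static int32_t BR = 512;          /* base register (invented value)   */

/* Return the effective address for the given mode, address field A and register r. */
static int32_t effective_address(enum mode m, int32_t A, int r)
{
    switch (m) {
    case DIRECT:       return A;          /* EA = A                      */
    case INDIRECT:     return M[A];       /* EA = contents of M[A]       */
    case REG_INDIRECT: return R[r];       /* EA = contents of register r */
    case RELATIVE:     return PC + A;     /* EA = A + PC                 */
    case INDEXED:      return A + IX;     /* EA = A + index register     */
    case BASE:         return A + BR;     /* EA = A + base register      */
    }
    return -1;
}

int main(void)
{
    M[100] = 300;                         /* a pointer stored in memory for the indirect case */
    R[2]   = 400;

    printf("direct            EA = %d\n", (int)effective_address(DIRECT, 100, 0));       /* 100 */
    printf("indirect          EA = %d\n", (int)effective_address(INDIRECT, 100, 0));     /* 300 */
    printf("register indirect EA = %d\n", (int)effective_address(REG_INDIRECT, 0, 2));   /* 400 */
    printf("relative          EA = %d\n", (int)effective_address(RELATIVE, 24, 0));      /* 826 + 24 = 850 */
    return 0;
}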
INSTRUCTION FORMAT AND TYPES
Introduction
The MIPS R2000/R3000 ISA has fixed-width 32-bit instructions. Fixed-width instructions are
common for RISC processors because they make it easy to fetch an instruction without first decoding it
to find out how long it is. These instructions must be stored at word-aligned addresses (i.e., addresses
divisible by 4).
The MIPS ISA instructions fall into three categories: R-type, I-type, and J-type. Not all ISAs divide their
instructions this neatly, which is one reason to study MIPS as a first assembly language: the format is
simple.
R-type
R-type instructions refer to register type instructions. Of the three formats, the R-type is the most
complex.
This is the format of the R-type instruction, when it is encoded in machine code.
B31-26    B25-21      B20-16      B15-11      B10-6         B5-0
opcode    register s  register t  register d  shift amount  function
The prototypical R-type instruction is:
add $rd, $rs, $rt
where $rd refers to some register d (d is shown as a variable; to use the instruction, you must
put a number between 0 and 31, inclusive, for d). $rs and $rt are also registers.
The semantics of the instruction are:
R[d] = R[s] + R[t]
where the addition is signed addition.
You will notice that the order of the registers in the instruction is the destination register ($rd), followed
by the two source registers ($rs and $rt).
However, the actual binary format (shown in the table above) stores the two source registers first, then
the destination register. Thus, how the assembly language programmer uses the instruction, and how
the instruction is stored in binary, do not always have to match.
Let's explain each of the fields of the R-type instruction.

opcode (B31-26)
Opcode is short for "operation code". The opcode is a binary encoding for the instruction.
Opcodes are seen in all ISAs. In MIPS, there is an opcode for add.
The opcode in MIPS ISA is only 6 bits. Ordinarily, this means there are only 64 possible
instructions. Even for a RISC ISA, which typically has few instructions, 64 is quite small. For R-type
instructions, an additional 6 bits are used (B5-0) called the function. Thus, the 6 bits of the opcode
and the 6 bits of the function specify the kind of instruction for R-type instructions.

rs (B25-21)
This is the first source register. The source register is the register that holds one of the arguments
of the operation.

rt (B20-16)
This is the second source register.

rd (B15-11)
This is the destination register. The destination register is the register where the result of the
operation is stored.

shift amount (B10-6)
The amount of bits to shift. Used in shift instructions.

function (B5-0)
An additional 6 bits used to specify the operation, in addition to the opcode.
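Packing the fields above into a 32-bit word is only shifting and masking. The C sketch below encodes add $8, $9, $10 (opcode 0x00 and funct 0x20, as listed in the opcode table later in this unit); it is an illustration of the field layout, not assembler source.

#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>

/* Pack the R-type fields: opcode | rs | rt | rd | shift amount | function. */
static uint32_t encode_rtype(unsigned opcode, unsigned rs, unsigned rt,
                             unsigned rd, unsigned shamt, unsigned funct)
{
    return (opcode & 0x3Fu) << 26 |
           (rs     & 0x1Fu) << 21 |
           (rt     & 0x1Fu) << 16 |
           (rd     & 0x1Fu) << 11 |
           (shamt  & 0x1Fu) <<  6 |
           (funct  & 0x3Fu);
}

int main(void)
{
    /* add $8, $9, $10  ->  opcode 0x00, rs = 9, rt = 10, rd = 8, shamt = 0, funct 0x20 */
    printf("0x%08" PRIx32 "\n", encode_rtype(0x00, 9, 10, 8, 0, 0x20));   /* 0x012a4020 */
    return 0;
}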
I-type instructions
I-type is short for "immediate type". The format of an I-type instruction looks like:
B31-26    B25-21      B20-16      B15-0
opcode    register s  register t  immediate
The prototypical I-type instruction looks like:
addi $rt, $rs, immed
In this case, $rt is the destination register, and $rs is the only source register. It is unusual that $rd is not
used at all, and that the destination register does not occupy the same bit positions in R-type and I-type
instructions. Presumably, the designers of the MIPS ISA had their reasons for not placing the destination
register at a fixed location across R-type and I-type.
The semantics of the addi instruction are:
R[t] = R[s] + (IR15)^16 || IR15-0
where IR refers to the instruction register, the register where the current instruction is stored. (IR15)^16
means that bit B15 of the instruction register (which is the sign bit of the immediate value) is repeated 16
times. This is then followed by IR15-0, which is the 16 bits of the immediate value.
Basically, the semantics says to sign-extend the immediate value to 32 bits, add it (using signed addition)
to register R[s], and store the result in register $rt.
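The sign-extension step can be written out in a few lines of C. The sketch below takes a 32-bit instruction word, extracts the 16-bit immediate field, and repeats bit 15 into the upper half; the example instruction words are made up.

#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>

/* Sign-extend the low 16 bits (the immediate field) of an instruction word to 32 bits. */
static int32_t sign_extend_imm(uint32_t ir)
{
    uint32_t imm = ir & 0xFFFFu;            /* IR15-0                                 */
    if (imm & 0x8000u)                      /* IR15 set: repeat the sign bit 16 times */
        imm |= 0xFFFF0000u;
    return (int32_t)imm;
}

int main(void)
{
    /* made-up instruction words: immediate fields 0xFFFD and 0x0005 */
    printf("%" PRId32 "\n", sign_extend_imm(0x2108FFFDu));   /* -3 */
    printf("%" PRId32 "\n", sign_extend_imm(0x21080005u));   /*  5 */
    return 0;
}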
J-type instructions
J-type is short for "jump type". The format of a J-type instruction looks like:
B31-26    B25-0
opcode    target
The prototypical J-type instruction looks like:
j target
The semantics of the j instruction (j means jump) are:
PC <- PC31-28 || IR25-0 || 00
where PC is the program counter, which stores the current address of the instruction being executed.
You update the PC by using the upper 4 bits of the program counter, followed by the 26 bits of the
target (which is the lower 26 bits of the instruction register), followed by two 0's, which creates a 32 bit
address. The jump instruction will be explained in more detail in a future set of notes.
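The PC update for j can be expressed with masks and shifts in C. The sketch below assumes a 32-bit PC and takes the 26-bit target field from the instruction word; the sample values are chosen to be consistent with the jump example worked later in this unit.

#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>

/* PC <- PC31-28 || IR25-0 || 00 */
static uint32_t jump_pc(uint32_t pc, uint32_t ir)
{
    uint32_t target = ir & 0x03FFFFFFu;            /* lower 26 bits of the instruction   */
    return (pc & 0xF0000000u) | (target << 2);     /* keep upper 4 bits of PC, append 00 */
}

int main(void)
{
    /* sample values: a j instruction word 0x08403fc1 with the PC in the 0x0100xxxx region */
    printf("0x%08" PRIx32 "\n", jump_pc(0x0100ACC4u, 0x08403FC1u));   /* 0x0100ff04 */
    return 0;
}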
Why Five Bits?
If you look at the R-type and I-type instructions, you will see 5 bits reserved for each register. You might
wonder why.
MIPS supports 32 integer registers. To specify each register, the registers are identified with a number
from 0 to 31. It takes log2 32 = 5 bits to specify one of 32 registers.
If MIPS had 64 registers, you would need 6 bits to specify a register.
The register number is specified using unsigned binary. Thus, 00000 refers to $r0 and 11111 refers to
register $r31.
Opcodes Table
The following table contains a listing of MIPS instructions and the corresponding opcodes. Opcode and
funct numbers are all listed in hexadecimal.
Mnemonic  Meaning                                   Type  Opcode  Funct
Add       Add                                       R     0x00    0x20
Addi      Add Immediate                             I     0x08    NA
Addiu     Add Unsigned Immediate                    I     0x09    NA
Addu      Add Unsigned                              R     0x00    0x21
And       Bitwise AND                               R     0x00    0x24
Andi      Bitwise AND Immediate                     I     0x0C    NA
Beq       Branch if Equal                           I     0x04    NA
Bne       Branch if Not Equal                       I     0x05    NA
Div       Divide                                    R     0x00    0x1A
Divu      Unsigned Divide                           R     0x00    0x1B
J         Jump to Address                           J     0x02    NA
Jal       Jump and Link                             J     0x03    NA
Jr        Jump to Address in Register               R     0x00    0x08
Lbu       Load Byte Unsigned                        I     0x24    NA
Lhu       Load Halfword Unsigned                    I     0x25    NA
Lui       Load Upper Immediate                      I     0x0F    NA
Lw        Load Word                                 I     0x23    NA
mfc0      Move from Coprocessor 0                   R     0x10    NA
Mfhi      Move from HI Register                     R     0x00    0x10
Mflo      Move from LO Register                     R     0x00    0x12
Mult      Multiply                                  R     0x00    0x18
Multu     Unsigned Multiply                         R     0x00    0x19
Nor       Bitwise NOR (NOT-OR)                      R     0x00    0x27
Or        Bitwise OR                                R     0x00    0x25
Ori       Bitwise OR Immediate                      I     0x0D    NA
Sb        Store Byte                                I     0x28    NA
Sh        Store Halfword                            I     0x29    NA
Sll       Logical Shift Left                        R     0x00    0x00
Slt       Set to 1 if Less Than                     R     0x00    0x2A
Slti      Set to 1 if Less Than Immediate           I     0x0A    NA
Sltiu     Set to 1 if Less Than Unsigned Immediate  I     0x0B    NA
Sltu      Set to 1 if Less Than Unsigned            R     0x00    0x2B
Sra       Arithmetic Shift Right (sign-extended)    R     0x00    0x03
Srl       Logical Shift Right (0-extended)          R     0x00    0x02
Sub       Subtract                                  R     0x00    0x22
Subu      Unsigned Subtract                         R     0x00    0x23
Sw        Store Word                                I     0x2B    NA
Xor       Bitwise XOR (Exclusive-OR)                R     0x00    0x26
UNIT FOUR: ASSEMBLY AND ASSEMBLY PROCESS
ASSEMBLY PROCESS
A computer understands machine code. People (and compilers) write assembly language.
assembly           -----------------          machine
source      -->   |   assembler   |   -->     code
code               -----------------
An assembler is a program (a very deterministic program). It translates each instruction to its machine
code. In the past, there was a one-to-one correspondence between assembly language instructions and
machine language instructions.
This is no longer the case. Assemblers nowadays are more powerful, and can "rework" code,
doing further translation on assembly language code in a manner that would once have been done only
by a compiler.
The Translation of MAL to TAL

MAL -- the instructions accepted by the assembler

TAL -- a subset of MAL. These are instructions that can be directly turned into machine code.
There are lots of MAL instructions that have no direct TAL equivalent. They will be translated
(composed, synthesized) into one or more TAL instructions. The MAL instructions that need to be
translated are often called pseudoinstructions.
How to determine whether an instruction is a TAL instruction or not: look in the list of TAL instructions. If
the instruction is there, then it is a TAL instruction!
The assembler takes MAL instructions that are not MIPS instructions (pseudoinstructions) and
synthesizes them from one or more MIPS instructions. Here are a bunch of examples.
Multiplication and Division Instructions
mul $8, $17, $20
becomes
mult $17, $20
mflo $8
Why? 32-bit multiplication produces a 64-bit result. To deal with this larger result, the MIPS architecture
has 2 registers that hold results for integer multiplication and division. They are called HI and LO. Each is
a 32 bit register.
mult places the least significant 32 bits of its result into LO, and the most significant into HI. Note that
this can lead to an incorrect product being used as the result, in the case that more than 32 bits are
required to represent the correct product.
Then, more TAL instructions are needed to move data into or out of registers HI and LO:
mflo, mtlo, mfhi, mthi. The names decode letter by letter:
m  - move
f  - from   (or t - to)
lo - register LO   (or hi - register HI)
Data is moved into or out of register HI or LO.
One operand is needed to tell where the data is coming from or going to.
Integer division also uses register HI and LO, since it generates both a quotient and remainder as a
result.
div $rd, $rs, $rt        # MAL
becomes
div $rs, $rt             # TAL
mflo $rd                 # quotient is in register LO
and
rem $rd, $rs, $rt        # MAL
becomes
div $rs, $rt             # TAL
mfhi $rd                 # remainder is in register HI
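A small C model of the HI and LO registers makes the synthesis above concrete. The sketch below computes the full 64-bit product for mult and the quotient/remainder pair for div; it is an illustration of the register usage, not the exact MIPS definition (for instance, it ignores division by zero).

#include <stdio.h>
#include <stdint.h>

static int32_t HI, LO;                            /* model of the HI and LO registers  */

static void mult(int32_t rs, int32_t rt)
{
    int64_t p = (int64_t)rs * (int64_t)rt;        /* full 64-bit product               */
    LO = (int32_t)(p & 0xFFFFFFFF);               /* least significant 32 bits into LO */
    HI = (int32_t)(p >> 32);                      /* most significant 32 bits into HI  */
}

static void divide(int32_t rs, int32_t rt)        /* assumes rt != 0                   */
{
    LO = rs / rt;                                 /* quotient  -> LO (read with mflo)  */
    HI = rs % rt;                                 /* remainder -> HI (read with mfhi)  */
}

int main(void)
{
    mult(100000, 100000);                         /* product needs more than 32 bits   */
    printf("mult: HI = %d  LO = %d\n", (int)HI, (int)LO);
    divide(17, 5);
    printf("div:  LO (quotient) = %d  HI (remainder) = %d\n", (int)LO, (int)HI);
    return 0;
}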
Load and Store Instructions
lw $8, label
becomes
la $8, label
lw $8, 0($8)
which becomes
lui $8, 0xMSpart of label
# label represents an address
ori $8, $8, 0xLSpart of label
lw $8, 0($8)
or
lui $8, 0xMSpart of label
lw $8, 0xLSpart of label($8)
Note that this 2-instruction sequence only works if the most significant bit of the LSpart of label is a 0.
The la instruction is also a pseudoinstruction (MAL, but not TAL). Its synthesis is accomplished with the 2
instruction sequence of lui followed by ori as given above. The lui instruction places the most significant
16 bits of the desired address into a register, and the ori sets the least significant 16 bits of the register.
For example, assume that the label X has been assigned by the assembler to be the address 0xaabb00cc.
The MAL instruction
la $12, X
becomes
lui $12, 0xaabb
ori $12, $12, 0x00cc
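The split performed by the assembler is simple bit manipulation. The C sketch below takes the 32-bit address 0xaabb00cc and produces the 16-bit halves used as the lui and ori operands; it is an illustration only.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t label = 0xAABB00CCu;                  /* address assigned to X by the assembler  */
    unsigned ms = (label >> 16) & 0xFFFFu;         /* most significant half: the lui operand  */
    unsigned ls = label & 0xFFFFu;                 /* least significant half: the ori operand */

    printf("lui $12, 0x%04x\n", ms);               /* lui $12, 0xaabb      */
    printf("ori $12, $12, 0x%04x\n", ls);          /* ori $12, $12, 0x00cc */
    return 0;
}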
A store instruction which implies the use of a la pseudoinstruction in its synthesis needs to place the
address in a register. For example, consider the code
sw $12, X
This synthesis may not use register $12 as the place to temporarily hold the address of X as the above
example for the lw did. Using $12 would overwrite the value that is to be stored to memory. In this case,
and other cases like this, the assembler requires the use of an extra register to complete the synthesis.
Register $1 on the MIPS processor is set aside (by convention) for exactly this type of situation. The
synthesis for this sw example becomes
lui $1, 0xaabb
ori $1, $1, 0x00cc
sw $12, 0($1)
Instructions with Immediates
MAL instructions whose last operand is a constant are synthesized with TAL instructions that take an
immediate value as the last operand.
add $sp, $sp, 4
becomes
addi $sp, $sp, 4
An add instruction requires 3 operands in registers. addi has one operand that must be an immediate.
These instructions are classified as immediate instructions. On the MIPS, they include: addi, addiu, andi,
lui, ori, xori.
Instructions with Too Few Operands
add $12, $18
is expanded back out to be
add $12, $12, $18
I/O Instructions
putc $18
becomes
li $2, 11          # MAL
move $4, $18       # MAL
syscall
which becomes
addi $2, $0, 11
add $4, $18, $0
syscall

getc $11
becomes
li $2, 12
syscall
move $11, $2
which becomes
addi $2, $0, 12
syscall
add $11, $2, $0

puts $13
becomes
li $2, 4
move $4, $13
syscall
which becomes
addi $2, $0, 4
add $4, $13, $0
syscall

done
becomes
li $2, 10
syscall
which becomes
addi $2, $0, 10
syscall
ASSEMBLY
The assembler's job is to
1. assign addresses
2. generate machine code
A modern assembler will

on the fly, translate (synthesize) from the accepted assembly language to the instructions
available in the architecture

assign addresses

generate machine code

generate an image (the memory image) of what memory must look like for the program to be
executed.
A simple assembler will make 2 complete passes over the data to complete this task.
Pass 1: create a complete symbol table; generate machine code for instructions other than branches,
jumps, jal, la, etc. (those instructions that rely on an address for their machine code).
Pass 2: complete the machine code for instructions that did not get finished in pass 1.
A symbol table is a table, listing address assignments (made by the assembler) for all labels.
The assembler starts at the top of the source code program, and scans. It looks for

directives (.data .text .space .word .byte .float )

instructions
An important detail: there are separate memory spaces for data and instructions. The assembler
allocates each in sequential order as it scans through the source code program.
The starting addresses are fixed -- every program is assembled with its data and its instructions
starting at the same fixed addresses.
EXAMPLE (given in little endian order)
.data
a1: .word 3
a2: .byte '\n'
a3: .space 5
address         contents
0x00001000      0x00000003
0x00001004      0x??????0a      (the 3 MS bytes are not part of the declaration)
0x00001008      0x????????
0x0000100c      0x????????
Note: Our assembler (in the 354 simulator) will align data to word addresses unless you specify
otherwise!
Machine Code Generation
Simple example of machine code generation for simple instruction:
assembly language:

addi   $8,   $20,   15
 ^      ^      ^     ^
 |      |      |     |
opcode  rt     rs    immediate

machine code format:

31                                                    15                 0
---------------------------------------------------------------------------
| opcode |  rs  |  rt  |                  immediate                        |
---------------------------------------------------------------------------

opcode is 6 bits -- it is defined to be 001000
rs is 5 bits, the encoding of 20: 10100
rt is 5 bits, the encoding of 8:  01000

so, the 32-bit instruction for addi $8, $20, 15 is

001000 10100 01000 0000000000001111

re-spaced:

0010 0010 1000 1000 0000 0000 0000 1111

OR

0x   2    2    8    8    0    0    0    f
A Detailed MIPS R2000 Assembly Example
The Source Code:
.data
a1: .word 3
a2: .word 16:4
a3: .word 5
.text
__start: la $6, a2          # MAL code fragment
loop:    lw $7, 4($6)
         mult $9, $10
         b loop
         done
The Symbol Table:
symbol      address
---------   ---------
a1          0040 0000
a2          0040 0004
a3          0040 0014
__start     0080 0000
loop        0080 0008
Memory Map of the Data Section:
address      contents (hex)   contents (binary)
0040 0000    0000 0003        0000 0000 0000 0000 0000 0000 0000 0011
0040 0004    0000 0010        0000 0000 0000 0000 0000 0000 0001 0000
0040 0008    0000 0010        0000 0000 0000 0000 0000 0000 0001 0000
0040 000c    0000 0010        0000 0000 0000 0000 0000 0000 0001 0000
0040 0010    0000 0010        0000 0000 0000 0000 0000 0000 0001 0000
0040 0014    0000 0005        0000 0000 0000 0000 0000 0000 0000 0101
Translation to TAL Code:
.text
__start:  lui $6, 0x0040          # la $6, a2
          ori $6, $6, 0x0004
loop:     lw $7, 4($6)
          mult $9, $10
          beq $0, $0, loop        # b loop
          ori $2, $0, 10          # done
          syscall
Memory Map of the Text Section:
address      contents (hex)   contents (binary)
0080 0000    3c06 0040        0011 1100 0000 0110 0000 0000 0100 0000   (lui)
0080 0004    34c6 0004        0011 0100 1100 0110 0000 0000 0000 0100   (ori)
0080 0008    8cc7 0004        1000 1100 1100 0111 0000 0000 0000 0100   (lw)
0080 000c    012a 0018        0000 0001 0010 1010 0000 0000 0001 1000   (mult)
0080 0010    1000 fffd        0001 0000 0000 0000 1111 1111 1111 1101   (beq)
0080 0014    3402 000a        0011 0100 0000 0010 0000 0000 0000 1010   (ori)
0080 0018    0000 000c        0000 0000 0000 0000 0000 0000 0000 1100   (syscall)
The Process of Assembly:
The assembler starts at the beginning of the ASCII source code. It scans for tokens, and takes action
based on those tokens.

For a token of .data:
This directive tells the assembler that what comes next is to be placed in the data
portion of memory.

For a token of a1::
This is a label. Put it in the symbol table and assign it an address. Assume that the program data
starts at address 0x0040 0000.
Branch Offset Computation
At execution time (for a taken branch):
contents of PC + (sign-extended offset field || 00) --> PC
The PC points to the instruction after the beq when the offset is added.
At assembly time (for the displacement or offset field of the beq in the above example):

byte offset = target addr - ( 4 + beq addr )
            = 00800008 - ( 00000004 + 00800010 )     (hex)

(ordered to give a POSITIVE result)

    0000 0000 1000 0000 0000 0000 0001 0100
  - 0000 0000 1000 0000 0000 0000 0000 1000
  ------------------------------------------
    0000 0000 0000 0000 0000 0000 0000 1100    (byte offset)

(compute the additive inverse)

    1111 1111 1111 1111 1111 1111 1111 0011
  +                                        1
  ------------------------------------------
    1111 1111 1111 1111 1111 1111 1111 0100    (-12)

We have a 16-bit offset field. Throw away the least significant 2 bits
(they should always be 0, and they are added back at execution time):

    1111 1111 1111 1111 1111 1111 1111 0100    (byte offset)
becomes
    1111 1111 1111 1101                        (offset field, 0xfffd)
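The same computation can be sketched in C: subtract the address of the instruction after the beq from the target address, divide by 4 to drop the two low bits, and keep the low 16 bits. The addresses below are the ones from this example.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t beq_addr    = 0x00800010u;           /* address of the beq        */
    uint32_t target_addr = 0x00800008u;           /* address of the label loop */

    int32_t byte_offset = (int32_t)target_addr - (int32_t)(beq_addr + 4);   /* -12 */
    int32_t word_offset = byte_offset / 4;        /* drop the two low bits: -3 */
    unsigned field = (uint16_t)word_offset;       /* keep the low 16 bits      */

    printf("offset field = 0x%04x\n", field);     /* 0xfffd                    */
    return 0;
}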
Jump Target Computation
At execution time:
most significant 4 bits of PC || target field (26 bits) || 00 --> PC
at assembly time, to get the target field:

take 32 bit target address,

eliminate least significant 2 bits (to make it a word-aligned address!)

eliminate most significant 4 bits
What remains is 26 bits, and it goes in the target field.
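At assembly time the steps above can be sketched in C: check that the top 4 bits match, then take bits 27..2 of the target address as the 26-bit field and combine it with the opcode. The addresses used are the ones from the example that follows.

#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t j_addr = 0x0100ACC0u;                   /* address of the j instruction     */
    uint32_t target = 0x0100FF04u;                   /* address assigned to the label L2 */

    if ((j_addr & 0xF0000000u) != (target & 0xF0000000u)) {
        printf("top 4 bits differ: synthesize la/jr instead\n");
        return 1;
    }

    uint32_t field = (target >> 2) & 0x03FFFFFFu;    /* bits 27..2 of the target address */
    uint32_t mcode = (0x02u << 26) | field;          /* opcode 000010 for j              */
    printf("machine code = 0x%08" PRIx32 "\n", mcode);   /* 0x08403fc1                   */
    return 0;
}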
An example of machine code generated for a jump instruction:
        .
        .
        .
        j    L2
        .
        .
L2:     # another instruction here

Assume that the j instruction is to be placed at address 0x0100acc0.
Assume that the assembler assigns address 0x0100ff04 for label L2.

Then, when the assembler is generating machine code for the j instruction:

1. The assembler checks that the most significant 4 bits of the address of the
   jump instruction are the same as the most significant 4 bits of the address
   for the target (L2).

       instruction address   0000 0001 0000 0000   (m.s. 16 bits)
       L2 address            0000 0001 0000 0000   (m.s. 16 bits)
                             ^^^^
       These 4 bits ARE the same, so proceed.

2. Extract bits 27..2 of the target address for the machine code.

       L2   0000 0001 0000 0000 1111 1111 0000 0100
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

3. The machine code for the j instruction:

       000010   0001 0000 0000 1111 1111 0000 01
       opcode   26-bit partial address

   Given in hexadecimal:

       0000 1000 0100 0000 0011 1111 1100 0001

       0x   0    8    4    0    3    f    c    1
In the first step, if the address of the jump instruction and the target address differ in their 4 most
significant bits, then the assembler must translate to different TAL code.
One possible translation:
j L3            # assume j will be placed at address 0x0400 0088
   .
   .
   .
L3:             # assume L3 is at address 0xab00 0040

becomes

la $1, L3
jr $1

which in TAL, would be

lui $1, 0xab00
ori $1, $1, 0x0040
jr  $1
UNIT FIVE: LINKING AND LOADING
Functions of Linkers and loaders
The basic job of any linker or loader is simple: it binds more abstract names to more concrete
names, which permits programmers to write code using the more abstract names. That is, it takes a
name written by a programmer such as getline and binds it to ``the location 612 bytes from the
beginning of the executable code in module iosys.'' Or it may take a more abstract numeric address such
as ``the location 450 bytes beyond the beginning of the static data for this module'' and bind it to a
numeric address.
Fig. 5.1: A diagram showing the functions of a linker
Address binding: a historical perspective
A useful way to get some insight into what linkers and loaders do is to look at their part in the
development of computer programming systems.
The earliest computers were programmed entirely in machine language. Programmers would write out
the symbolic programs on sheets of paper, hand assemble them into machine code and then toggle the
machine code into the computer, or perhaps punch it on paper tape or cards. (Real hot-shots could
compose code directly at the switches.) If the programmer used symbolic addresses at all, the symbols
were bound to addresses as the programmer did his or her hand translation. If it turned out that an
instruction had to be added or deleted, the entire program had to be hand-inspected and any addresses
affected by the added or deleted instruction adjusted.
The problem was that the names were bound to addresses too early. Assemblers solved that
problem by letting programmers write programs in terms of symbolic names, with the assembler binding
the names to machine addresses. If the program changed, the programmer had to reassemble it, but the
work of assigning the addresses is pushed off from the programmer to the computer.
Libraries of code compound the address assignment problem. Since the basic operations that
computers can perform are so simple, useful programs are composed of subprograms that perform
higher level and more complex operations. Computer installations keep a library of pre-written and
debugged subprograms that programmers can draw upon to use in new programs they write, rather
than requiring programmers to write all their own subprograms. The programmer then loads the
subprograms in with the main program to form a complete working program.
Programmers were using libraries of subprograms even before they used assemblers. By 1947,
John Mauchly, who led the ENIAC project, wrote about loading programs along with subprograms
selected from a catalog of programs stored on tapes, and of the need to relocate the subprograms' code
to reflect the addresses at which they were loaded. Perhaps surprisingly, these two basic linker
functions, relocation and library search, appear to predate even assemblers, as Mauchly expected both
the program and subprograms to be written in machine language. The relocating loader allowed the
authors and users of the subprograms to write each subprogram as though it would start at location
zero, and to defer the actual address binding until the subprograms were linked with a particular main
program.
With the advent of operating systems, relocating loaders separate from linkers and libraries
became necessary. Before operating systems, each program had the machine's entire memory at its
disposal, so the program could be assembled and linked for fixed memory addresses, knowing that all
addresses in the computer would be available. But with operating systems, the program had to share
the computer's memory with the operating system and perhaps even with other programs. This means
that the actual addresses at which the program would be running weren't known until the operating
system loaded the program into memory, deferring final address binding past link time to load time.
Linkers and loaders now divided up the work, with linkers doing part of the address binding, assigning
relative addresses within each program, and the loader doing a final relocation step to assign actual
addresses.
As systems became more complex, they called upon linkers to do more and more complex name
management and address binding. Fortran programs used multiple subprograms and common blocks,
areas of data shared by multiple subprograms, and it was up to the linker to lay out storage and assign
the addresses both for the subprograms and the common blocks. Linkers increasingly had to deal with
object code libraries, including both application libraries written in Fortran and other languages, and
compiler support libraries called implicitly from compiled code to handle I/O and other high-level
operations.
Programs quickly became larger than available memory, so linkers provided overlays, a technique
that let programmers arrange for different parts of a program to share the same memory, with each
overlay loaded on demand when another part of the program called into it. Overlays were widely used
on mainframes from the advent of disks around 1960 until the spread of virtual memory in the mid-1970s, then reappeared on microcomputers in the early 1980s in exactly the same form, and faded as
virtual memory appeared on PCs in the 1990s. They're still used in memory limited embedded
environments, and may yet reappear in other places where precise programmer or compiler control of
memory usage improves performance.
With the advent of hardware relocation and virtual memory, linkers and loaders actually got less
complex, since each program could again have an entire address space. Programs could be linked to be
loaded at fixed addresses, with hardware rather than software relocation taking care of any load-time
relocation. But computers with hardware relocation invariably run more than one program, frequently
multiple copies of the same program. When a computer runs multiple instances of one program, some
parts of the program are the same among all running instances (the executable code, in particular), while
other parts are unique to each instance. If the parts that don't change can be separated out from the
parts that do change, the operating system can use a single copy of the unchanging part, saving
considerable storage. Compilers and assemblers were modified to create object code in multiple
sections, with one section for read-only code and another section for writable data. The linker then had to
be able to combine all of the sections of each type so that the linked program would have all the code in one
place and all of the data in another. This didn't delay address binding any more than it already was, since
addresses were still assigned at link time, but more work was deferred to the linker to assign addresses
for all the sections.
Even when different programs are running on a computer, those different programs usually turn
out to share a lot of common code. For example, nearly every program written in C uses routines such as
fopen and printf, database applications all use a large access library to connect to the database, and
programs running under a GUI such as X Window, MS Windows, or the Macintosh all use pieces of the
GUI library. Most systems now provide shared libraries for programs to use, so that all the programs that
use a library can share a single copy of it. This both improves runtime performance and saves a lot of
disk space; in small programs the common library routines often take up more space than the program
itself.
In the simpler static shared libraries, each library is bound to specific addresses at the time the
library is built, and the linker binds program references to library routines to those specific addresses at
link time. Static libraries turn out to be inconveniently inflexible, since programs potentially have to be
relinked every time any part of the library changes, and the details of creating static shared libraries turn
out to be very tedious. Systems added dynamically linked libraries in which library sections and symbols
aren't bound to actual addresses until the program that uses the library starts running. Sometimes the
binding is delayed even farther than that; with full-fledged dynamic linking, the addresses of called
procedures aren't bound until the first call. Furthermore, programs can bind to libraries as the programs
are running, loading libraries in the middle of program execution. This provides a powerful and high-performance way to extend the function of programs. Microsoft Windows in particular makes extensive
use of runtime loading of shared libraries (known as DLLs, Dynamically Linked Libraries) to construct and
extend programs.
Linking and loading
Linkers and loaders perform several related but conceptually separate actions.

Program loading: Copy a program from secondary storage (which since about 1968 invariably
means a disk) into main memory so it's ready to run. In some cases loading just involves copying
the data from disk to memory, in others it involves allocating storage, setting protection bits, or
arranging for virtual memory to map virtual addresses to disk pages.

Relocation: Compilers and assemblers generally create each file of object code with the program
addresses starting at zero, but few computers let you load your program at location zero. If a
program is created from multiple subprograms, all the subprograms have to be loaded at non-overlapping addresses. Relocation is the process of assigning load addresses to the various parts
of the program, adjusting the code and data in the program to reflect the assigned addresses. In
many systems, relocation happens more than once. It's quite common for a linker to create a
program from multiple subprograms, and create one linked output program that starts at zero,
with the various subprograms relocated to locations within the big program. Then when the
program is loaded, the system picks the actual load address and the linked program is relocated
as a whole to the load address.

Symbol resolution: When a program is built from multiple subprograms, the references from one
subprogram to another are made using symbols; a main program might use a square root routine
called sqrt, and the math library defines sqrt. A linker resolves the symbol by noting the location
assigned to sqrt in the library, and patching the caller's object code so that the call instruction
refers to that location.
Although there's considerable overlap between linking and loading, it's reasonable to define a
program that does program loading as a loader, and one that does symbol resolution as a linker.
Either can do relocation, and there have been all-in-one linking loaders that do all three
functions.
The line between relocation and symbol resolution can be fuzzy. Since linkers already can resolve
references to symbols, one way to handle code relocation is to assign a symbol to the base
address of each part of the program, and treat relocatable addresses as references to the base
address symbols.
One important feature that linkers and loaders share is that they both patch object code, the
only widely used programs to do so other than perhaps debuggers. This is a uniquely powerful
feature, albeit one that is extremely machine specific in the details, and can lead to baffling bugs
if done wrong.
Two-pass linking
Now we turn to the general structure of linkers. Linking, like compiling or assembling, is fundamentally a
two pass process. A linker takes as its input a set of input object files, libraries, and perhaps command
files, and produces as its result an output object file, and perhaps ancillary information such as a load
map or a file containing debugger symbols, Figure 5.2.
Figure 5.2: The linker process (a linker taking input files and producing an output file, perhaps with ancillary information)
Each input file contains a set of segments, contiguous chunks of code or data to be placed in the
output file. Each input file also contains at least one symbol table. Some symbols are exported, defined
within the file for use in other files, generally the names of routines within the file that can be called
from elsewhere. Other symbols are imported, used in the file but not defined, generally the names of
routines called from but not present in the file.
When a linker runs, it first has to scan the input files to find the sizes of the segments and to
collect the definitions and references of all of the symbols. It creates a segment table listing all of the
segments defined in the input files, and a symbol table with all of the symbols imported or exported.
Using the data from the first pass, the linker assigns numeric locations to symbols, determines the sizes
and location of the segments in the output address space, and figures out where everything goes in the
output file.
The second pass uses the information collected in the first pass to control the actual linking
process. It reads and relocates the object code, substituting numeric addresses for symbol references,
and adjusting memory addresses in code and data to reflect relocated segment addresses, and writes
the relocated code to the output file. It then writes the output file, generally with header information,
the relocated segments, and symbol table information. If the program uses dynamic linking, the symbol
table contains the info the runtime linker will need to resolve dynamic symbols. In many cases, the linker
itself will generate small amounts of code or data in the output file, such as "glue code" used to call
routines in overlays or dynamically linked libraries, or an array of pointers to initialization routines that
need to be called at program startup time.
Whether or not the program uses dynamic linking, the file may also contain a symbol table for
relinking or debugging that isn't used by the program itself, but may be used by other programs that
deal with the output file.
Some object formats are relinkable, that is, the output file from one linker run can be used as the
input to a subsequent linker run. This requires that the output file contain a symbol table like one in an
input file, as well as all of the other auxiliary information present in an input file.
Nearly all object formats have provision for debugging symbols, so that when the program is run
under the control of a debugger, the debugger can use those symbols to let the programmer control the
program in terms of the line numbers and names used in the source program. Depending on the details
of the object format, the debugging symbols may be intermixed in a single symbol table with symbols
needed by the linker, or there may be one table for the linker and a separate, somewhat redundant
table for the debugger.
A few linkers appear to work in one pass. They do that by buffering some or all of the contents of
the input file in memory or disk during the linking process, then reading the buffered material later.
Since this is an implementation trick that doesn't fundamentally affect the two-pass nature of linking,
we don't address it further here.
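As a very rough sketch (invented field names and layout, not any real linker's data structures), the information collected in the first pass might be held in tables like these; the sample numbers loosely echo the executable shown at the end of this unit.

#include <stdio.h>
#include <stdint.h>

/* Hypothetical pass-one tables for a toy two-pass linker (invented layout). */
struct segment { char name[8];  uint32_t size; uint32_t base; };
struct symbol  { char name[16]; int seg;       uint32_t offset; };

int main(void)
{
    /* Pass 1 result: segment sizes/locations and exported symbol definitions. */
    struct segment segs[] = { { ".text", 0x0FE0, 0x1020 },
                              { ".data", 0x1000, 0x2000 } };
    struct symbol  syms[] = { { "_main",  0, 0x84 },
                              { "string", 1, 0x24 } };

    /* Pass 2 uses the tables to patch references: address = segment base + offset. */
    for (int i = 0; i < 2; i++)
        printf("%-8s -> 0x%04x\n", syms[i].name,
               (unsigned)(segs[syms[i].seg].base + syms[i].offset));
    return 0;
}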
Object code libraries
All linkers support object code libraries in one form or another, with most also providing support
for various kinds of shared libraries.
The basic principle of object code libraries is simple enough, Figure 5.3. A library is little more
than a set of object code files. (Indeed, on some systems you can literally concatenate a bunch of object
files together and use the result as a link library.) After the linker processes all of the regular input files,
if any imported names remain undefined, it runs through the library or libraries and links in any of the
files in the library that export one or more undefined names.
Figure 5.3: Object code libraries (object files fed into the linker, with libraries containing many files following along)
Shared libraries complicate this task a little by moving some of the work from link time to load
time. The linker identifies the shared libraries that resolve the undefined names in a linker run, but
rather than linking anything into the program, the linker notes in the output file the names of the
libraries in which the symbols were found, so that the shared library can be bound in when the program
is loaded.
Relocation and code modification
The heart of a linker or loader's actions is relocation and code modification. When a compiler or
assembler generates an object file, it generates the code using the unrelocated addresses of code and
data defined within the file, and usually zeros for code and data defined elsewhere. As part of the linking
process, the linker modifies the object code to reflect the actual addresses assigned. For example,
consider this snippet of x86 code that moves the contents of variable a to variable b using the eax
register.
mov a,%eax
mov %eax,b
If a is defined in the same file at location 1234 hex and b is imported from somewhere else, the
generated object code will be:
A1 34 12 00 00 mov a,%eax
A3 00 00 00 00 mov %eax,b
Each instruction contains a one-byte operation code followed by a four-byte address. The first
instruction has a reference to 1234 (byte reversed, since the x86 uses a right to left byte order) and the
second a reference to zero since the location of b is unknown.
Now assume that the linker links this code so that the section in which a is located is relocated by hex
10000 bytes, and b turns out to be at hex 9A12. The linker modifies the code to be:
A1 34 12 01 00 mov a,%eax
A3 12 9A 00 00 mov %eax,b
That is, it adds 10000 to the address in the first instruction so now it refers to a's relocated address
which is 11234, and it patches in the address for b. These adjustments affect instructions, but any
pointers in the data part of an object file have to be adjusted as well.
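The patching itself is byte-level arithmetic. This C sketch applies the two adjustments from the example above to the five-byte mov instructions, treating each 32-bit address as little-endian; it illustrates the idea rather than any real object format's relocation records.

#include <stdio.h>
#include <stdint.h>

/* Read and write a little-endian 32-bit value inside a code buffer. */
static uint32_t get32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static void put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

int main(void)
{
    /* A1 34 12 00 00   mov a,%eax        A3 00 00 00 00   mov %eax,b */
    uint8_t code[10] = { 0xA1, 0x34, 0x12, 0x00, 0x00,
                         0xA3, 0x00, 0x00, 0x00, 0x00 };

    put32(code + 1, get32(code + 1) + 0x10000);   /* relocate a's section by hex 10000  */
    put32(code + 6, 0x9A12);                      /* resolve b to its address, hex 9A12 */

    for (int i = 0; i < 10; i++)
        printf("%02X ", code[i]);                 /* A1 34 12 01 00 A3 12 9A 00 00      */
    printf("\n");
    return 0;
}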
On older computers with small address spaces and direct addressing, the modification process is
fairly simple, since there are only one or two address formats that a linker has to handle. Modern
computers, including all RISCs, require considerably more complex code modification. No single
instruction contains enough bits to hold a direct address, so the compiler and linker have to use
complicated addressing tricks to handle data at arbitrary addresses. In some cases, it's possible to
concoct an address using two or three instructions, each of which contains part of the address, and use
bit manipulation to combine the parts into a full address. In this case, the linker has to be prepared to
modify each of the instructions appropriately, inserting some of the bits of the address into each
instruction. In other cases, all of the addresses used by a routine or group of routines are placed in an
array used as an ``address pool'', initialization code sets one of the machine registers to point to that
array, and code loads pointers out of the address pool as needed using that register as a base register.
The linker may have to create the array from all of the addresses used in a program, then modify
the instructions so that they refer to the appropriate address pool entry. Some systems require position
independent code that will work correctly regardless of where in the address space it is loaded. Linkers
generally have to provide extra tricks to support that, separating out the parts of the program that can't
be made position independent, and arranging for the two parts to communicate.
Compiler Drivers
In most cases, the operation of the linker is invisible to the programmer or nearly so, because it's run
automatically as part of the compilation process. Most compilation systems have a compiler driver that
automatically invokes the phases of the compiler as needed. For example, if the programmer has two C
language source files, the compiler driver will run a sequence of programs like this on a UNIX system:

C preprocessor on file A, creating preprocessed A

C compiler on preprocessed A, creating assembler file A

Assembler on assembler file A, creating object file A

C preprocessor on file B, creating preprocessed B

C compiler on preprocessed B, creating assembler file B

Assembler on assembler file B, creating object file B

Linker on object files A and B, and system C library
That is, it compiles each source file to assembler and then object code, and links the object code
together, including any needed routines from the system C library.
Compiler drivers are often much cleverer than this. They often compare the creation dates of
source and object files, and only recompile source files that have changed. (The UNIX make program is
the classic example.) Particularly when compiling C++ and other object oriented languages, compiler
drivers can play all sorts of tricks to work around limitations in linkers or object formats. For example,
C++ templates define a potentially infinite set of related routines, so to find the finite set of template
routines that a program actually uses, a compiler driver can link the programs' object files together with
no template code, read the error messages from the linker to see what's undefined, call the C++
compiler to generate object code for the necessary template routines and re-link.
Linker command languages
Every linker has some sort of command language to control the linking process. At the very least
the linker needs the list of object files and libraries to link. Generally there is a long list of possible
options: whether to keep debugging symbols, whether to use shared or unshared libraries, which of
several possible output formats to use. Most linkers permit some way to specify the address at which
the linked code is to be bound, which comes in handy when using a linker to link a system kernel or
other program that doesn't run under control of an operating system. In linkers that support multiple
code and data segments, a linker command language can specify the order in which segments are to be
linked, special treatment for certain kinds of segments, and other application-specific options.
There are four common techniques to pass commands to a linker:

Command line: Most systems have a command line or the equivalent, via which one can pass a
mixture of file names and switches. This is the usual approach for UNIX and Windows linkers. On
systems with limited length command lines, there's usually a way to direct the linker to read
commands from a file and treat them as though they were on the command line.

Intermixed with object files: Some linkers, such as IBM mainframe linkers, accept alternating
object files and linker commands in a single input file. This dates from the era of card decks,
when one would pile up object decks and hand-punched command cards in a card reader.

Embedded in object files: Some object formats, notably Microsoft's, permit linker commands to
be embedded inside object files. This permits a compiler to pass any options needed to link an
object file in the file itself. For example, the C compiler passes commands to search the standard
C library.

Separate configuration language: A few linkers have a full fledged configuration language to
control linking. The GNU linker, which can handle an enormous range of object file formats,
machine architectures, and address space conventions, has a complex control language that lets
a programmer specify the order in which segments should be linked, rules for combining similar
segments, segment addresses, and a wide range of other options. Other linkers have less
complex languages to handle specific features such as programmer-defined overlays.
Linking: a true-life example
We complete our introduction to linking with a small but real linking example. The snippet below
shows a pair of C language source files, m.c with a main program that calls a routine named a, and a.c
that contains the routine with a call to the library routines strlen and printf.
Figure 5.4: Source files
Source file m.c

extern void a(char *);

int main(int ac, char **av)
{
    static char string[] = "Hello, world!\n";
    a(string);
}

Source file a.c

#include <unistd.h>
#include <string.h>

void a(char *s)
{
    write(1, s, strlen(s));
}
The main program m.c compiles, on my Pentium with GCC, into a 165 byte object file in the classic a.out
object format, in the next code segment. That object file includes a fixed length header, 16 bytes of
"text" segment, containing the read only program code, and 16 bytes of "data" segment, containing the
string. Following that are two relocation entries, one that marks the pushl instruction that puts the
address of the string on the stack in preparation for the call to a, and one that marks the call instruction
that transfers control to a. The symbol table exports the definition of _main, imports _a, and contains a
couple of other symbols for the debugger. Note that the pushl instruction refers to location 10 hex, the
tentative address for the string, since it's in the same object file, while the call refers to location 0 since
the address of _a is unknown.
Figure 5.5: Object code for m.o
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000010 00000000 00000000 00000020 2**3
1 .data 00000010 00000010 00000010 00000030 2**3
Disassembly of section .text:
00000000 <_main>:
0: 55 pushl %ebp
1: 89 e5 movl %esp,%ebp
3: 68 10 00 00 00 pushl $0x10
4: 32 .data
8: e8 f3 ff ff ff call 0
9: DISP32 _a
d: c9 leave
e: c3 ret
...
The subprogram file a.c compiles into a 160 byte object file, Figure 5.6, with the header, a 28 byte text
segment, and no data. Two relocation entries mark the calls to strlen and write, and the symbol table
exports _a and imports _strlen and _write.
Figure 5.6: Object code for a.o
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000001c 00000000 00000000 00000020 2**2
CONTENTS, ALLOC, LOAD, RELOC, CODE
1 .data 00000000 0000001c 0000001c 0000003c 2**2
CONTENTS, ALLOC, LOAD, DATA
Disassembly of section .text:
00000000 <_a>:
0: 55 pushl %ebp
1: 89 e5 movl %esp,%ebp
3: 53 pushl %ebx
4: 8b 5d 08 movl 0x8(%ebp),%ebx
7: 53 pushl %ebx
8: e8 f3 ff ff ff call 0
9: DISP32 _strlen
d: 50 pushl %eax
e: 53 pushl %ebx
f: 6a 01 pushl $0x1
11: e8 ea ff ff ff call 0
12: DISP32 _write
16: 8d 65 fc leal -4(%ebp),%esp
19: 5b popl %ebx
1a: c9 leave
1b: c3 ret
To produce an executable program, the linker combines these two object files with a standard startup
initialization routine for C programs, and necessary routines from the C library, producing an executable
file displayed in part in Figure 5.7.
Figure 5.7: Selected parts of executable
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000fe0 00001020 00001020 00000020 2**3
1 .data 00001000 00002000 00002000 00001000 2**3
2 .bss 00000000 00003000 00003000 00000000 2**3
Disassembly of section .text:
00001020 <start-c>:
...
1092: e8 0d 00 00 00 call 10a4 <_main>
...
000010a4 <_main>:
10a4: 55 pushl %ebp
10a5: 89 e5 movl %esp,%ebp
10a7: 68 24 20 00 00 pushl $0x2024
10ac: e8 03 00 00 00 call 10b4 <_a>
10b1: c9 leave
10b2: c3 ret
...
000010b4 <_a>:
10b4: 55 pushl %ebp
10b5: 89 e5 movl %esp,%ebp
10b7: 53 pushl %ebx
10b8: 8b 5d 08 movl 0x8(%ebp),%ebx
10bb: 53 pushl %ebx
10bc: e8 37 00 00 00 call 10f8 <_strlen>
10c1: 50 pushl %eax
10c2: 53 pushl %ebx
10c3: 6a 01 pushl $0x1
10c5: e8 a2 00 00 00 call 116c <_write>
10ca: 8d 65 fc leal -4(%ebp),%esp
10cd: 5b popl %ebx
10ce: c9 leave
10cf: c3 ret
...
000010f8 <_strlen>:
...
0000116c <_write>:
...
The linker combined corresponding segments from each input file, so there is one combined text
segment, one combined data segment and one bss segment (zero-initialized data, which the two input
files didn't use). Each segment is padded out to a 4K boundary to match the x86 page size, so the text
segment is 4K (minus a 20 byte a.out header present in the file but not logically part of the segment), and
the data and bss segments are also each 4K.
The combined text segment contains the text of library startup code called start-c, then text from
m.o relocated to 10a4, a.o relocated to 10b4, and routines linked from the C library, relocated to higher
addresses in the text segment. The data segment, not displayed here, contains the combined data
segments in the same order as the text segments. Since the code for _main has been relocated to
address 10a4 hex, that address is patched into the call instruction in start-c. Within the main routine, the
reference to the string is relocated to 2024 hex, the string's final location in the data segment, and the
call is patched to 10b4, the final address of _a. Within _a, the calls to _strlen and _write are patched to
the final addresses for those two routines.
The executable also contains about a dozen other routines from the C library, not displayed here,
that are called directly or indirectly from the startup code or from _write (error routines, in the latter
case.) The executable contains no relocation data, since this file format is not relinkable and the
operating system loads it at a known fixed address. It contains a symbol table for the benefit of a
debugger, although the executable doesn't use the symbols and the symbol table can be stripped off to
save space.
Exercises
What is the advantage of separating a linker and loader into separate programs? Under what
circumstances would a combined linking loader be useful?
Nearly every programming system produced in the past 50 years includes a linker. Why?
In this chapter we've discussed linking and loading assembled or compiled machine code. Would a linker
or loader be useful in a purely interpretive system that directly interprets source language code? How
about in an interpretive system that turns the source into an intermediate representation like P-code or
the Java Virtual Machine?
UNIT SIX: MACROS
Macros
A macro is a fragment of code which has been given a name. Whenever the name is used, it is
replaced by the contents of the macro. There are two kinds of macros. They differ mostly in what they
look like when they are used. Object-like macros resemble data objects when used, while function-like
macros resemble function calls.
You may define any valid identifier as a macro, even if it is a C keyword. The preprocessor does
not know anything about keywords. This can be useful if you wish to hide a keyword such as const from
an older compiler that does not understand it. However, the preprocessor operator defined can never
be defined as a macro, and C++'s named operators cannot be macros when you are compiling C++.
Object-like Macros
An object-like macro is a simple identifier which will be replaced by a code fragment. It is called
object-like because it looks like a data object in code that uses it. They are most commonly used to give
symbolic names to numeric constants.
You create macros with the ‘#define’ directive. ‘#define’ is followed by the name of the macro and
then the token sequence it should be an abbreviation for, which is variously referred to as the macro's
body, expansion or replacement list. For example,
#define BUFFER_SIZE 1024
defines a macro named BUFFER_SIZE as an abbreviation for the token 1024. If somewhere after this
‘#define’ directive there comes a C statement of the form
foo = (char *) malloc (BUFFER_SIZE);
then the C preprocessor will recognize and expand the macro BUFFER_SIZE. The C compiler will see the
same tokens as it would if you had written
foo = (char *) malloc (1024);
By convention, macro names are written in uppercase. Programs are easier to read when it is possible to
tell at a glance which names are macros.
The macro's body ends at the end of the ‘#define’ line. You may continue the definition onto multiple
lines, if necessary, using backslash-newline. When the macro is expanded, however, it will all come out
on one line. For example,
#define NUMBERS 1, \
2, \
3
int x[] = { NUMBERS };
==> int x[] = { 1, 2, 3 };
The most common visible consequence of this is surprising line numbers in error messages.
There is no restriction on what can go in a macro body provided it decomposes into valid preprocessing
tokens. Parentheses need not balance, and the body need not resemble valid C code. (If it does not, you
may get error messages from the C compiler when you use the macro.)
The C preprocessor scans your program sequentially. Macro definitions take effect at the place you write
them. Therefore, the following input to the C preprocessor
foo = X;
#define X 4
bar = X;
produces
foo = X;
bar = 4;
When the preprocessor expands a macro name, the macro's expansion replaces the macro invocation,
then the expansion is examined for more macros to expand. For example,
#define TABLESIZE BUFSIZE
#define BUFSIZE 1024
TABLESIZE
==> BUFSIZE
==> 1024
TABLESIZE is expanded first to produce BUFSIZE, then that macro is expanded to produce the final result,
1024.
Notice that BUFSIZE was not defined when TABLESIZE was defined. The ‘#define’ for TABLESIZE uses
exactly the expansion you specify—in this case, BUFSIZE—and does not check to see whether it too
contains macro names. Only when you use TABLESIZE is the result of its expansion scanned for more
macro names.
This makes a difference if you change the definition of BUFSIZE at some point in the source file.
TABLESIZE, defined as shown, will always expand using the definition of BUFSIZE that is currently in
effect:
#define BUFSIZE 1020
#define TABLESIZE BUFSIZE
#undef BUFSIZE
#define BUFSIZE 37
Now TABLESIZE expands to 37, the definition of BUFSIZE that is in effect at the point where TABLESIZE is used.
Function-like Macros
You can also define macros whose use looks like a function call. These are called function-like
macros. To define a function-like macro, you use the same ‘#define’ directive, but you put a pair of
parentheses immediately after the macro name. For example,
#define lang_init() c_init()
lang_init()
==> c_init()
A function-like macro is only expanded if its name appears with a pair of parentheses after it. If you
write just the name, it is left alone. This can be useful when you have a function and a macro of the
same name, and you wish to use the function sometimes.
extern void foo(void);
#define foo() /* optimized inline version */
...
foo();
funcptr = foo;
Here the call to foo() will use the macro, but the function pointer will get the address of the real
function. If the macro were to be expanded, it would cause a syntax error.
If you put spaces between the macro name and the parentheses in the macro definition, that does not
define a function-like macro, it defines an object-like macro whose expansion happens to begin with a
pair of parentheses.
#define lang_init () c_init()
lang_init()
==> () c_init()()
The first two pairs of parentheses in this expansion come from the macro. The third is the pair that was
originally after the macro invocation. Since lang_init is an object-like macro, it does not consume those
parentheses.
Macro Arguments
Function-like macros can take arguments, just like true functions. To define a macro that uses
arguments, you insert parameters between the pair of parentheses in the macro definition that make
the macro function-like. The parameters must be valid C identifiers, separated by commas and
optionally whitespace.
To invoke a macro that takes arguments, you write the name of the macro followed by a list of actual
arguments in parentheses, separated by commas. The invocation of the macro need not be restricted to
a single logical line—it can cross as many lines in the source file as you wish. The number of arguments
you give must match the number of parameters in the macro definition. When the macro is expanded,
each use of a parameter in its body is replaced by the tokens of the corresponding argument. (You need
not use all of the parameters in the macro body.)
As an example, here is a macro that computes the minimum of two numeric values, as it is defined in
many C programs, and some uses.
#define min(X, Y) ((X) < (Y) ? (X) : (Y))
x = min(a, b);
==> x = ((a) < (b) ? (a) : (b));
y = min(1, 2);
==> y = ((1) < (2) ? (1) : (2));
z = min(a + 28, *p);
==> z = ((a + 28) < (*p) ? (a + 28) : (*p));
(In this small example you can already see several of the dangers of macro arguments.)
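Two of those dangers, operator precedence and duplicated side effects, can be seen in the hedged sketch
below; BAD_MIN is an illustrative name invented here, not a standard macro:
#include <stdio.h>

#define BAD_MIN(X, Y) X < Y ? X : Y            /* unparenthesized body: precedence trap */
#define min(X, Y) ((X) < (Y) ? (X) : (Y))      /* the safer version from the text */

int main(void)
{
    int i = 0;
    printf("%d\n", 2 * BAD_MIN(3, 4));  /* expands to 2 * 3 < 4 ? 3 : 4, which prints 4, not 6 */
    printf("%d\n", min(i++, 10));       /* i++ appears twice in the expansion, so i is incremented twice */
    printf("i = %d\n", i);              /* prints i = 2, probably not what the caller expected */
    return 0;
}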
Leading and trailing whitespace in each argument is dropped, and all whitespace between the tokens of
an argument is reduced to a single space. Parentheses within each argument must balance; a comma
within such parentheses does not end the argument. However, there is no requirement for square
brackets or braces to balance, and they do not prevent a comma from separating arguments. Thus,
macro (array[x = y, x + 1])
passes two arguments to macro: array[x = y and x + 1]. If you want to supply array[x = y, x +
1] as an argument, you can write it as array[(x = y, x + 1)], which is equivalent C code.
All arguments to a macro are completely macro-expanded before they are substituted into the macro
body. After substitution, the complete text is scanned again for macros to expand, including the
arguments. This rule may seem strange, but it is carefully designed so you need not worry about
whether any function call is actually a macro invocation. You can run into trouble if you try to be too
clever, though.
For example, min (min (a, b), c) is first expanded to
min (((a) < (b) ? (a) : (b)), (c))
and then to
((((a) < (b) ? (a) : (b))) < (c)
? (((a) < (b) ? (a) : (b)))
: (c))
(Line breaks shown here for clarity would not actually be generated.)
You can leave macro arguments empty; this is not an error to the preprocessor (but many macros will
then expand to invalid code). You cannot leave out arguments entirely; if a macro takes two arguments,
there must be exactly one comma at the top level of its argument list. Here are some silly examples
using min:
min(, b)   ==> ((   ) < (b) ? (   ) : (b))
min(a, )   ==> ((a  ) < ( ) ? (a  ) : ( ))
min(,)     ==> ((   ) < ( ) ? (   ) : ( ))
min((,),)  ==> (((,)) < ( ) ? ((,)) : ( ))
min()      error--> macro "min" requires 2 arguments, but only 1 given
min(,,)    error--> macro "min" passed 3 arguments, but takes just 2
Whitespace is not a preprocessing token, so if a macro foo takes one argument, foo () and foo ( ) both
supply it an empty argument. Previous GNU preprocessor implementations and documentation were
incorrect on this point, insisting that a function-like macro that takes a single argument be passed a
space if an empty argument was required.
Macro parameters appearing inside string literals are not replaced by their corresponding actual
arguments.
#define foo(x) x, "x"
foo(bar)
==> bar, "x"
Stringification
Sometimes you may want to convert a macro argument into a string constant. Parameters are
not replaced inside string constants, but you can use the ‘#’ preprocessing operator instead. When a
macro parameter is used with a leading ‘#’, the preprocessor replaces it with the literal text of the actual
argument, converted to a string constant. Unlike normal parameter replacement, the argument is not
macro-expanded first. This is called stringification.
There is no way to combine an argument with surrounding text and stringify it all together. Instead, you
can write a series of adjacent string constants and stringified arguments. The preprocessor will replace
the stringified arguments with string constants. The C compiler will then combine all the adjacent string
constants into one long string.
Here is an example of a macro definition that uses stringification:
#define WARN_IF(EXP) \
do { if (EXP) \
fprintf (stderr, "Warning: " #EXP "\n"); } \
while (0)
WARN_IF (x == 0);
==> do { if (x == 0)
fprintf (stderr, "Warning: " "x == 0" "\n"); } while (0);
The argument for EXP is substituted once, as-is, into the if statement, and once, stringified, into the
argument to fprintf. If x were a macro, it would be expanded in the if statement, but not in the string.
The do and while (0) are a kludge to make it possible to write WARN_IF (arg);, which the resemblance of
WARN_IF to a function would make C programmers want to do.
Stringification in C involves more than putting double-quote characters around the fragment. The
preprocessor backslash-escapes the quotes surrounding embedded string constants, and all backslashes
within string and character constants, in order to get a valid C string constant with the proper contents.
Thus, stringifying p = "foo\n"; results in "p = \"foo\\n\";". However, backslashes that are not inside string
or character constants are not duplicated: ‘\n’ by itself stringifies to "\n".
All leading and trailing whitespace in text being stringified is ignored. Any sequence of whitespace in the
middle of the text is converted to a single space in the stringified result. Comments are replaced by
whitespace long before stringification happens, so they never appear in stringified text.
There is no way to convert a macro argument into a character constant.
If you want to stringify the result of expansion of a macro argument, you have to use two levels of
macros.
#define xstr(s) str(s)
#define str(s) #s
#define foo 4
str (foo)
==> "foo"
xstr (foo)
==> xstr (4)
==> str (4)
==> "4"
s is stringified when it is used in str, so it is not macro-expanded first. But s is an ordinary argument to
xstr, so it is completely macro-expanded before xstr itself is expanded. Therefore, by the time str gets to
its argument, it has already been macro-expanded.
Concatenation
It is often useful to merge two tokens into one while expanding macros. This is called token
pasting or token concatenation. The ‘##’ preprocessing operator performs token pasting. When a macro
is expanded, the two tokens on either side of each ‘##’ operator are combined into a single token, which
then replaces the ‘##’ and the two original tokens in the macro expansion. Usually both will be
identifiers, or one will be an identifier and the other a preprocessing number. When pasted, they make a
longer identifier. This isn't the only valid case. It is also possible to concatenate two numbers (or a
number and a name, such as 1.5 and e3) into a number. Also, multi-character operators such as += can
be formed by token pasting.
However, two tokens that don't together form a valid token cannot be pasted together. For
example, you cannot concatenate x with + in either order. If you try, the preprocessor issues a warning
and emits the two tokens. Whether it puts white space between the tokens is undefined. It is common
to find unnecessary uses of ‘##’ in complex macros. If you get this warning, it is likely that you can simply
remove the ‘##’.
Both the tokens combined by ‘##’ could come from the macro body, but you could just as well
write them as one token in the first place. Token pasting is most useful when one or both of the tokens
comes from a macro argument. If either of the tokens next to an ‘##’ is a parameter name, it is replaced
by its actual argument before ‘##’ executes. As with stringification, the actual argument is not macro-expanded first. If the argument is empty, that ‘##’ has no effect.
Keep in mind that the C preprocessor converts comments to whitespace before macros are even
considered. Therefore, you cannot create a comment by concatenating ‘/’ and ‘*’. You can put as much
whitespace between ‘##’ and its operands as you like, including comments, and you can put comments
in arguments that will be concatenated. However, it is an error if ‘##’ appears at either end of a macro
body.
Consider a C program that interprets named commands. There probably needs to be a table of
commands, perhaps an array of structures declared as follows:
struct command
{
char *name;
void (*function) (void);
};
struct command commands[] =
{
{ "quit", quit_command },
{ "help", help_command },
...
};
It would be cleaner not to have to give each command name twice, once in the string constant and once
in the function name. A macro which takes the name of a command as an argument can make this
unnecessary. The string constant can be created with stringification, and the function name by
concatenating the argument with ‘_command’. Here is how it is done:
#define COMMAND(NAME) { #NAME, NAME ## _command }
struct command commands[] =
{
COMMAND (quit),
COMMAND (help),
...
};
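A hedged sketch of how such a table might be filled in and searched follows; quit_command, help_command,
and the dispatch loop are illustrative additions, not part of the original example:
#include <stddef.h>
#include <string.h>

void quit_command(void) { /* ... */ }
void help_command(void) { /* ... */ }

#define COMMAND(NAME) { #NAME, NAME ## _command }

struct command
{
    char *name;
    void (*function) (void);
};

struct command commands[] =
{
    COMMAND (quit),   /* expands to { "quit", quit_command } */
    COMMAND (help),   /* expands to { "help", help_command } */
};

void dispatch(const char *name)
{
    /* look the command name up in the table and call its handler */
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++)
        if (strcmp(commands[i].name, name) == 0)
            commands[i].function();
}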
Variadic Macros
A macro can be declared to accept a variable number of arguments much as a function can. The
syntax for defining the macro is similar to that of a function. Here is an example:
#define eprintf(...) fprintf (stderr, __VA_ARGS__)
This kind of macro is called variadic. When the macro is invoked, all the tokens in its argument list after
the last named argument (this macro has none), including any commas, become the variable argument.
This sequence of tokens replaces the identifier __VA_ARGS__ in the macro body wherever it appears.
Thus, we have this expansion:
eprintf ("%s:%d: ", input_file, lineno)
==> fprintf (stderr, "%s:%d: ", input_file, lineno)
The variable argument is completely macro-expanded before it is inserted into the macro expansion, just
like an ordinary argument. You may use the ‘#’ and ‘##’ operators to stringify the variable argument or
to paste its leading or trailing token with another token. (But see below for an important special case for
‘##’.)
If your macro is complicated, you may want a more descriptive name for the variable argument than
__VA_ARGS__.
CPP permits this, as an extension. You may write an argument name immediately before
the ‘...’; that name is used for the variable argument. The eprintf macro above could be written
#define eprintf(args...) fprintf (stderr, args)
using this extension. You cannot use __VA_ARGS__ and this extension in the same macro.
You can have named arguments as well as variable arguments in a variadic macro. We could define
eprintf like this, instead:
#define eprintf(format, ...) fprintf (stderr, format, __VA_ARGS__)
This formulation looks more descriptive, but unfortunately it is less flexible: you must now supply at
least one argument after the format string. In standard C, you cannot omit the comma separating the
named argument from the variable arguments. Furthermore, if you leave the variable argument empty,
you will get a syntax error, because there will be an extra comma after the format string.
eprintf("success!\n", );
==> fprintf(stderr, "success!\n", );
GNU CPP has a pair of extensions which deal with this problem. First, you are allowed to leave the
variable argument out entirely:
eprintf ("success!\n")
==> fprintf(stderr, "success!\n", );
Second, the ‘##’ token paste operator has a special meaning when placed between a comma and a
variable argument. If you write
#define eprintf(format, ...) fprintf (stderr, format, ##__VA_ARGS__)
and the variable argument is left out when the eprintf macro is used, then the comma before the ‘##’ will
be deleted. This does not happen if you pass an empty argument, nor does it happen if the token
preceding ‘##’ is anything other than a comma.
eprintf ("success!\n")
==> fprintf(stderr, "success!\n");
The above explanation is ambiguous about the case where the only macro parameter is a variable
arguments parameter, as it is meaningless to try to distinguish whether no argument at all is an empty
argument or a missing argument. In this case the C99 standard is clear that the comma must remain;
however, the existing GCC extension used to swallow the comma. So CPP retains the comma when
conforming to a specific C standard, and drops it otherwise. C99 mandates that the only place the
identifier __VA_ARGS__ can appear is in the replacement list of a variadic macro. It may not be used as a
macro name, macro argument name, or within a different type of macro. It may also be forbidden in
open text; the standard is ambiguous. We recommend you avoid using it except for its defined purpose.
Variadic macros are a new feature in C99. GNU CPP has supported them for a long time, but only with a
named variable argument (‘args...’, not ‘...’ and __VA_ARGS__). If you are concerned with
portability to previous versions of GCC, you should use only named variable arguments. On the other
hand, if you are concerned with portability to other conforming implementations of C99, you should use
only __VA_ARGS__.
Previous versions of CPP implemented the comma-deletion extension much more generally. We
have restricted it in this release to minimize the differences from C99. To get the same effect with both
this and previous versions of GCC, the token preceding the special ‘##’ must be a comma, and there
must be white space between that comma and whatever comes immediately before it:
#define eprintf(format, args...) fprintf (stderr, format , ##args)
Standard Predefined Macros
The standard predefined macros are specified by the relevant language standards, so they are
available with all compilers that implement those standards. Older compilers may not provide all of
them. Their names all start with double underscores.
__FILE__
This macro expands to the name of the current input file, in the form of a C string constant. This is the
path by which the preprocessor opened the file, not the short name specified in ‘#include’ or as the
input file name argument. For example, "/usr/local/include/myheader.h" is a possible
expansion of this macro.
__LINE__
This macro expands to the current input line number, in the form of a decimal integer constant. While we
call it a predefined macro, it's a pretty strange macro, since its “definition” changes with each new line of
source code.
__FILE__ and __LINE__ are useful in generating an error message to report an inconsistency detected by the
program; the message can state the source line at which the inconsistency was detected. For example,
fprintf (stderr, "Internal error: "
"negative string length "
"%d at %s, line %d.",
length, __FILE__, __LINE__);
An ‘#include’ directive changes the expansions of __FILE__ and __LINE__ to correspond to the
included file. At the end of that file, when processing resumes on the input file that contained the
‘#include’ directive, the expansions of __FILE__ and __LINE__ revert to the values they had before
the ‘#include’ (but __LINE__ is then incremented by one as processing moves to the line after the
‘#include’).
A ‘#line’ directive changes __LINE__, and may change __FILE__ as well.
C99 introduces __func__, and GCC has provided __FUNCTION__ for a long time. Both of these are
strings containing the name of the current function (there are slight semantic differences; see the GCC
manual). Neither of them is a macro; the preprocessor does not know the name of the current function.
They tend to be useful in conjunction with __FILE__ and __LINE__, though.
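For instance, __FILE__ and __LINE__ are often combined with stringification in a home-grown assertion
macro. The sketch below shows the common pattern only; it is not the standard assert from <assert.h>:
#include <stdio.h>
#include <stdlib.h>

#define MY_ASSERT(EXP) \
    do { if (!(EXP)) { \
        fprintf (stderr, "%s:%d: assertion failed: %s\n", \
                 __FILE__, __LINE__, #EXP); \
        abort (); \
    } } while (0)

int main(void)
{
    int n = 3;
    MY_ASSERT (n > 0);   /* passes silently */
    MY_ASSERT (n > 5);   /* prints the file name, line number, and the text "n > 5", then aborts */
    return 0;
}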
__DATE__
This macro expands to a string constant that describes the date on which the preprocessor is being run.
The string constant contains eleven characters and looks like "Feb 12 1996". If the day of the month
is less than 10, it is padded with a space on the left.
If GCC cannot determine the current date, it will emit a warning message (once per compilation) and
__DATE__ will expand to "??? ?? ????".
__TIME__
This macro expands to a string constant that describes the time at which the preprocessor is being run.
The string constant contains eight characters and looks like "23:59:01".
If GCC cannot determine the current time, it will emit a warning message (once per compilation) and
__TIME__ will expand to "??:??:??".
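Since adjacent string constants are concatenated by the compiler, __DATE__ and __TIME__ can be spliced
directly into a larger message; a small sketch:
#include <stdio.h>

int main(void)
{
    /* __DATE__ and __TIME__ expand to string constants, so they concatenate with the literals around them */
    printf ("Built on " __DATE__ " at " __TIME__ "\n");
    return 0;
}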
__STDC__
In normal operation, this macro expands to the constant 1, to signify that this compiler conforms to ISO
Standard C. If GNU CPP is used with a compiler other than GCC, this is not necessarily true; however, the
preprocessor always conforms to the standard unless the -traditional-cpp option is used.
This macro is not defined if the -traditional-cpp option is used.
On some hosts, the system compiler uses a different convention, where __STDC__ is normally 0, but is 1
if the user specifies strict conformance to the C Standard. CPP follows the host convention when
processing system header files, but when processing user files __STDC__ is always 1. This has been
reported to cause problems; for instance, some versions of Solaris provide X Windows headers that expect
__STDC__ to be either undefined or 1.
__STDC_VERSION__
This macro expands to the C Standard's version number, a long integer constant of the form yyyymmL
where yyyy and mm are the year and month of the Standard version. This signifies which version of the C
Standard the compiler conforms to. Like __STDC__, this is not necessarily accurate for the entire
implementation, unless GNU CPP is being used with GCC.
The value 199409L signifies the 1989 C standard as amended in 1994, which is the current default; the
value 199901L signifies the 1999 revision of the C standard. Support for the 1999 revision is not yet
complete.
This macro is not defined if the -traditional-cpp option is used, nor when compiling C++ or
Objective-C.
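A typical use of __STDC_VERSION__ is conditional compilation that checks whether C99 features may be
assumed. In this sketch the fallback spelling is a GCC-style assumption, not something the standard
guarantees:
/* Use the C99 inline keyword where available; otherwise fall back to a
   compiler-specific spelling (shown here for GCC-like compilers). */
#if defined __STDC_VERSION__ && __STDC_VERSION__ >= 199901L
#define INLINE inline
#else
#define INLINE __inline__
#endif

static INLINE int twice (int x) { return 2 * x; }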
__STDC_HOSTED__
This macro is defined, with value 1, if the compiler's target is a hosted environment. A hosted
environment has the complete facilities of the standard C library available.
__cplusplus
This macro is defined when the C++ compiler is in use. You can use __cplusplus to test whether a
header is compiled by a C compiler or a C++ compiler. This macro is similar to __STDC_VERSION__, in
that it expands to a version number. A fully conforming implementation of the 1998 C++ standard will
define this macro to 199711L. The GNU C++ compiler is not yet fully conforming, so it uses 1 instead. It
is hoped to complete the implementation of standard C++ in the near future.
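One common use of __cplusplus is wrapping the declarations in a C header so that the same header can be
included from C++; a minimal sketch, with the header and function names chosen only for illustration:
/* myheader.h (illustrative name) */
#ifdef __cplusplus
extern "C" {             /* give the declarations C linkage when compiled as C++ */
#endif

void my_c_function (int x);

#ifdef __cplusplus
}
#endif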
__OBJC__
This macro is defined, with value 1, when the Objective-C compiler is in use. You can use __OBJC__ to
test whether a header is compiled by a C compiler or an Objective-C compiler.
__ASSEMBLER__
This macro is defined with value 1 when preprocessing assembly language.
Undefining and Redefining Macros
If a macro ceases to be useful, it may be undefined with the ‘#undef’ directive. ‘#undef’ takes a
single argument, the name of the macro to undefine. You use the bare macro name, even if the macro is
function-like. It is an error if anything appears on the line after the macro name. ‘#undef’ has no effect if
the name is not a macro.
#define FOO 4
x = FOO;        ==> x = 4;
#undef FOO
x = FOO;        ==> x = FOO;
Once a macro has been undefined, that identifier may be redefined as a macro by a subsequent ‘#define’
directive. The new definition need not have any resemblance to the old definition.
However, if an identifier which is currently a macro is redefined, then the new definition must be
effectively the same as the old one. Two macro definitions are effectively the same if:

Both are the same type of macro (object- or function-like).

All the tokens of the replacement list are the same.

If there are any parameters, they are the same.

Whitespace appears in the same places in both. It need not be exactly the same amount of
whitespace, though. Remember that comments count as whitespace.
These definitions are effectively the same:
#define FOUR (2 + 2)
#define FOUR (2    +    2)
#define FOUR (2 /* two */ + 2)
but these are not:
#define FOUR (2 + 2)
#define FOUR ( 2+2 )
#define FOUR (2 * 2)
#define FOUR(score,and,seven,years,ago) (2 + 2)
If a macro is redefined with a definition that is not effectively the same as the old one, the preprocessor
issues a warning and changes the macro to use the new definition. If the new definition is effectively the
same, the redefinition is silently ignored. This allows, for instance, two different headers to define a
common macro. The preprocessor will only complain if the definitions do not match.
Directives Within Macro Arguments
Occasionally it is convenient to use preprocessor directives within the arguments of a macro. The
C and C++ standards declare that behavior in these cases is undefined.
Versions of CPP prior to 3.2 would reject such constructs with an error message. This was the
only syntactic difference between normal functions and function-like macros, so it seemed attractive to
remove this limitation, and people would often be surprised that they could not use macros in this way.
Moreover, sometimes people would use conditional compilation in the argument list to a normal library
function like ‘printf’, only to find that after a library upgrade ‘printf’ had changed to be a function-like
macro, and their code would no longer compile. So from version 3.2 we changed CPP to successfully
process arbitrary directives within macro arguments in exactly the same way as it would have processed
the directive were the function-like macro invocation not present.
If, within a macro invocation, that macro is redefined, then the new definition takes effect in time for
argument pre-expansion, but the original definition is still used for argument replacement. Here is a
pathological example:
#define f(x) x x
f (1
#undef f
#define f 2
f)
which expands to
1 2 1 2
with the semantics described above.
Macro Pitfalls
In this section we describe some special rules that apply to macros and macro expansion, and point
out certain cases in which the rules have counter-intuitive consequences that you must watch out for.

Misnesting

Operator Precedence Problems

Swallowing the Semicolon

Duplication of Side Effects

Self-Referential Macros

Argument Prescan

Newlines in Arguments
UNIT SEVEN: COMPILATION PROCESS
Compilation process
A compiler is a computer program (or set of programs) that transforms source code written in a
programming language (the source language) into another computer language (the target language,
often having a binary form known as object code). The most common reason for wanting to transform
source code is to create an executable program.
The name "compiler" is primarily used for programs that translate source code from a high-level
programming language to a lower level language (e.g., assembly language or machine code). If the
compiled program can run on a computer whose CPU or operating system is different from the one on
which the compiler runs, the compiler is known as a cross-compiler. A program that translates from a
low level language to a higher level one is a decompiler. A program that translates between high-level
languages is usually called a language translator, source to source translator, or language converter. A
language rewriter is usually a program that translates the form of expressions without a change of
language.
A compiler is likely to perform many or all of the following operations: lexical analysis,
preprocessing, parsing, semantic analysis (Syntax-directed translation), code generation, and code
optimization.
Program faults caused by incorrect compiler behavior can be very difficult to track down and work
around; therefore, compiler implementors invest a lot of time ensuring the correctness of their
software.
The term compiler-compiler is sometimes used to refer to a parser generator, a tool often used
to help create the lexer and parser.
History
Software for early computers was primarily written in assembly language for many years. Higher
level programming languages were not invented until the benefits of being able to reuse software on
different kinds of CPUs started to become significantly greater than the cost of writing a compiler. The
very limited memory capacity of early computers also created many technical problems when
implementing a compiler.
Towards the end of the 1950s, machine-independent programming languages were first
proposed. Subsequently, several experimental compilers were developed. The first compiler was written
by Grace Hopper, in 1952, for the A-0 programming language. The FORTRAN team led by John Backus at
IBM is generally credited as having introduced the first complete compiler in 1957. COBOL was an early
language to be compiled on multiple architectures, in 1960.
In many application domains the idea of using a higher level language quickly caught on. Because
of the expanding functionality supported by newer programming languages and the increasing
complexity of computer architectures, compilers have become more and more complex.
Early compilers were written in assembly language. The first self-hosting compiler — capable of
compiling its own source code in a high-level language — was created for Lisp by Tim Hart and Mike
Levin at MIT in 1962. Since the 1970s it has become common practice to implement a compiler in the
language it compiles, although both Pascal and C have been popular choices for implementation
language. Building a self-hosting compiler is a bootstrapping problem—the first such compiler for a
language must be compiled either by a compiler written in a different language, or (as in Hart and
Levin's Lisp compiler) compiled by running the compiler in an interpreter.
Compilers in education
Compiler construction and compiler optimization are taught at universities and schools as part of the
computer science curriculum. Such courses are usually supplemented with the implementation of a
compiler for an educational programming language. A well-documented example is Niklaus Wirth's PL/0
compiler, which Wirth used to teach compiler construction in the 1970s. In spite of its simplicity, the
PL/0 compiler introduced several influential concepts to the field:
1. Program development by stepwise refinement (also the title of a 1971 paper by Wirth)
2. The use of a recursive descent parser
3. The use of EBNF to specify the syntax of a language
4. A code generator producing portable P-code
5. The use of T-diagrams in the formal description of the bootstrapping problem
Compilation
Compilers enabled the development of programs that are machine-independent. Before the
development of FORTRAN (FORmula TRANslator), the first higher-level language, in the 1950s, machine-dependent
assembly language was widely used. While assembly language produces more reusable and
relocatable programs than machine code on the same architecture, it has to be modified or rewritten if
the program is to be executed on a different hardware architecture.
With the advance of the high-level programming languages that soon followed FORTRAN, such as COBOL, C,
and BASIC, programmers could write machine-independent source programs. A compiler translates the
high-level source program into a target program in the machine language of the specific hardware. Once
the target program is generated, the user can execute the program.
The structure of a compiler
Compilers bridge source programs in high-level languages with the underlying hardware. A
compiler must handle
1) determining the correctness of the syntax of programs,
2) generating correct and efficient object code,
3) run-time organization, and
4) formatting output according to assembler and/or linker conventions.
Components of a Compiler
A compiler consists of three main parts: the frontend, the middle-end, and the backend.
The front end checks whether the program is correctly written in terms of the programming language
syntax and semantics. Here legal and illegal programs are recognized. Errors are reported, if any, in a
useful way. Type checking is also performed by collecting type information. The frontend then generates
an intermediate representation or IR of the source code for processing by the middle-end.
The middle end is where optimization takes place. Typical transformations for optimization are removal
of useless or unreachable code, discovery and propagation of constant values, relocation of computation
to a less frequently executed place (e.g., out of a loop), or specialization of computation based on the
context. The middle-end generates another IR for the following backend. Most optimization efforts are
focused on this part.
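As a hedged illustration of two of these transformations, constant propagation and removal of unreachable
code, a middle end could reduce a function like the one below; the exact result depends on the compiler,
and slow_logging_path is an invented name used only for the example:
int slow_logging_path (void);      /* illustrative helper, declared but deliberately never reached */

int f (void)
{
    int debug = 0;                 /* the constant 0 can be propagated to the test below */
    int x = 6 * 7;                 /* constant folding computes 42 at compile time */
    if (debug)                     /* known to be false after propagation ... */
        x = slow_logging_path ();  /* ... so this unreachable call can be removed */
    return x;                      /* the whole body typically reduces to: return 42; */
}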
The back end is responsible for translating the IR from the middle-end into assembly code. The
target instruction(s) are chosen for each IR instruction. Register allocation assigns processor registers for
the program variables where possible. The backend utilizes the hardware by figuring out how to keep
parallel execution units busy, filling delay slots, and so on. Although most algorithms for optimization are
in NP, heuristic techniques are well-developed.
Compiler output
One classification of compilers is by the platform on which their generated code executes. This is
known as the target platform.
A native or hosted compiler is one whose output is intended to run directly on the same type of
computer and operating system that the compiler itself runs on. The output of a cross compiler is
designed to run on a different platform. Cross compilers are often used when developing software for
embedded systems that are not intended to support a software development environment.
The output of a compiler that produces code for a virtual machine (VM) may or may not be
executed on the same platform as the compiler that produced it. For this reason such compilers are not
usually classified as native or cross compilers.
The lower-level language that is the target of a compiler may itself be a high-level programming language. C,
often viewed as a sort of portable assembly language, can also be the target language of a compiler:
Cfront, the original compiler for C++, used C as its target language.
Compiled versus interpreted languages
Higher-level programming languages are generally divided for convenience into compiled
languages and interpreted languages. However, in practice there is rarely anything about a language
that requires it to be exclusively compiled or exclusively interpreted, although it is possible to design
languages that rely on re-interpretation at run time. The categorization usually reflects the most popular
or widespread implementations of a language — for instance, BASIC is sometimes called an interpreted
language, and C a compiled one, despite the existence of BASIC compilers and C interpreters.
Modern trends toward just-in-time compilation and bytecode interpretation at times blur the traditional
categorizations of compilers and interpreters.
Some language specifications spell out that implementations must include a compilation facility;
for example, Common Lisp. However, there is nothing inherent in the definition of Common Lisp that
stops it from being interpreted. Other languages have features that are very easy to implement in an
interpreter, but make writing a compiler much harder; for example, APL, SNOBOL4, and many scripting
languages allow programs to construct arbitrary source code at runtime with regular string operations,
and then execute that code by passing it to a special evaluation function. To implement these features in
a compiled language, programs must usually be shipped with a runtime library that includes a version of
the compiler itself.
Hardware compilation
The output of some compilers may target hardware at a very low level, for example a Field
Programmable Gate Array (FPGA) or structured Application-specific integrated circuit (ASIC). Such
compilers are said to be hardware compilers or synthesis tools because the source code they compile
effectively controls the final configuration of the hardware and how it operates; the output of the
compilation is not instructions that are executed in sequence, but only an interconnection of transistors
or lookup tables. For example, XST is the Xilinx Synthesis Tool used for configuring FPGAs. Similar tools
are available from Altera, Synplicity, Synopsys and other vendors.
Compiler construction
In the early days, the approach taken to compiler design used to be directly affected by the
complexity of the processing, the experience of the person(s) designing it, and the resources available.
A compiler for a relatively simple language written by one person might be a single, monolithic piece of
software. When the source language is large and complex, and high quality output is required, the
design may be split into a number of relatively independent phases. Having separate phases means
development can be parceled up into small parts and given to different people. It also becomes much
easier to replace a single phase by an improved one, or to insert new phases later (e.g., additional
optimizations).
The division of the compilation processes into phases was championed by the Production Quality
Compiler-Compiler Project (PQCC) at Carnegie Mellon University. This project introduced the terms front
end, middle end, and back end.
All but the smallest of compilers have more than two phases. However, these phases are usually
regarded as being part of the front end or the back end. The point at which these two ends meet is open
to debate. The front end is generally considered to be where syntactic and semantic processing takes
place, along with translation to a lower level of representation (than source code).
The middle end is usually designed to perform optimizations on a form other than the source code or
machine code. This source code/machine code independence is intended to enable generic
optimizations to be shared between versions of the compiler supporting different languages and target
processors.
The back end takes the output from the middle. It may perform more analysis, transformations
and optimizations that are for a particular computer. Then, it generates code for a particular processor
and OS. This front-end/middle/back-end approach makes it possible to combine front ends for different
languages with back ends for different CPUs. Practical examples of this approach are the GNU Compiler
Collection, LLVM, and the Amsterdam Compiler Kit, which have multiple front-ends, shared analysis and
multiple back-ends.
One-pass versus multi-pass compilers
Classifying compilers by number of passes has its background in the hardware resource
limitations of computers. Compiling involves performing lots of work and early computers did not have
enough memory to contain one program that did all of this work. So compilers were split up into smaller
programs which each made a pass over the source (or some representation of it) performing some of
the required analysis and translations.
The ability to compile in a single pass has classically been seen as a benefit because it simplifies
the job of writing a compiler, and one-pass compilers generally perform compilations faster than multi-pass
compilers. Thus, partly driven by the resource limitations of early systems, many early languages
were specifically designed so that they could be compiled in a single pass (e.g., Pascal).
In some cases the design of a language feature may require a compiler to perform more than one
pass over the source. For instance, consider a declaration appearing on line 20 of the source which
affects the translation of a statement appearing on line 10. In this case, the first pass needs to gather
information about declarations appearing after statements that they affect, with the actual translation
happening during a subsequent pass.
The disadvantage of compiling in a single pass is that it is not possible to perform many of the
sophisticated optimizations needed to generate high quality code. It can be difficult to count exactly
how many passes an optimizing compiler makes. For instance, different phases of optimization may
analyse one expression many times but only analyse another expression once.
Splitting a compiler up into small programs is a technique used by researchers interested in
producing provably correct compilers. Proving the correctness of a set of small programs often requires
less effort than proving the correctness of a larger, single, equivalent program.
While the typical multi-pass compiler outputs machine code from its final pass, there are several other
types:

A "source-to-source compiler" is a type of compiler that takes a high level language as its input
and outputs a high level language. For example, an automatic parallelizing compiler will
frequently take in a high level language program as an input and then transform the code and
annotate it with parallel code annotations (e.g. OpenMP) or language constructs (e.g. Fortran's
DOALL statements).

Stage compiler that compiles to assembly language of a theoretical machine, like some Prolog
implementations
o
This Prolog machine is also known as the Warren Abstract Machine (or WAM).
o
Bytecode compilers for Java, Python, and many more are also a subtype of this.

Just-in-time compiler, used by Smalltalk and Java systems, and also by Microsoft .NET's Common
Intermediate Language (CIL)
o
Applications are delivered in bytecode, which is compiled to native machine code just
prior to execution.
Front end
The front end analyzes the source code to build an internal representation of the program, called the
intermediate representation or IR. It also manages the symbol table, a data structure mapping each
symbol in the source code to associated information such as location, type and scope. This is done over
several phases, which includes some of the following:
1. Line reconstruction. Languages which strop their keywords or allow arbitrary spaces within
identifiers require a phase before parsing, which converts the input character sequence to a
canonical form ready for the parser. The top-down, recursive-descent, table-driven parsers used
in the 1960s typically read the source one character at a time and did not require a separate
tokenizing phase. Atlas Autocode and Imp (and some implementations of ALGOL and Coral 66)
are examples of stropped languages whose compilers would have a line reconstruction phase.
2. Lexical analysis breaks the source code text into small pieces called tokens. Each token is a single
atomic unit of the language, for instance a keyword, identifier or symbol name. The token syntax
is typically a regular language, so a finite state automaton constructed from a regular expression
can be used to recognize it. This phase is also called lexing or scanning, and the software doing
lexical analysis is called a lexical analyzer or scanner.
3. Preprocessing. Some languages, e.g., C, require a preprocessing phase which supports macro
substitution and conditional compilation. Typically the preprocessing phase occurs before
syntactic or semantic analysis; e.g. in the case of C, the preprocessor manipulates lexical tokens
rather than syntactic forms. However, some languages such as Scheme support macro
substitutions based on syntactic forms.
4. Syntax analysis involves parsing the token sequence to identify the syntactic structure of the
program. This phase typically builds a parse tree, which replaces the linear sequence of tokens
with a tree structure built according to the rules of a formal grammar which define the
language's syntax. The parse tree is often analyzed, augmented, and transformed by later phases
in the compiler.
5. Semantic analysis is the phase in which the compiler adds semantic information to the parse tree
and builds the symbol table. This phase performs semantic checks such as type checking
(checking for type errors), or object binding (associating variable and function references with
their definitions), or definite assignment (requiring all local variables to be initialized before use),
rejecting incorrect programs or issuing warnings. Semantic analysis usually requires a complete
parse tree, meaning that this phase logically follows the parsing phase, and logically precedes the
code generation phase, though it is often possible to fold multiple phases into one pass over the
code in a compiler implementation.
Back end
The term back end is sometimes confused with code generator because of the overlapped
functionality of generating assembly code. Some literature uses middle end to distinguish the generic
analysis and optimization phases in the back end from the machine-dependent code generators.
The main phases of the back end include the following:
1. Analysis: This is the gathering of program information from the intermediate representation
derived from the input. Typical analyses are data flow analysis to build use-define chains,
dependence analysis, alias analysis, pointer analysis, escape analysis etc. Accurate analysis is the
basis for any compiler optimization. The call graph and control flow graph are usually also built
during the analysis phase.
2. Optimization: the intermediate language representation is transformed into functionally
equivalent but faster (or smaller) forms. Popular optimizations are inline expansion, dead code
elimination, constant propagation, loop transformation, register allocation and even automatic
parallelization.
3. Code generation: the transformed intermediate language is translated into the output language,
usually the native machine language of the system. This involves resource and storage decisions,
such as deciding which variables to fit into registers and memory and the selection and
scheduling of appropriate machine instructions along with their associated addressing modes
(see also Sethi-Ullman algorithm).
Phases of Compilation
There are six phases a typical compiler will implement. The major reason for separating a compiler into
phases is simplicity. Compiler design is thus full of "divide and conquer" strategy,
component-level design, reusability, and performance optimization.

Lexical Analysis

Syntax Analysis

Error Recovery

Scope Analysis

Type Analysis

Code Generation
The program that performs lexical analysis is called a lexical analyzer, scanner, or tokenizer. Its purpose is to
break a sequence of characters into subsequences called tokens. The syntax analysis phase, performed by the
parser, reads tokens and validates them in accordance with a grammar. The vocabulary, i.e., the set of
predefined tokens, is composed of word symbols (reserved words), names (identifiers), numerals (constants),
and special symbols (operators). During compilation, a compiler will find errors such as lexical, syntax,
semantic, and logical errors. If a token does not belong to the vocabulary, it is a lexical error. A grammar
dictates the syntax of a language; if a sentence does not follow the syntax, it is a syntax error. A semantic
error is, for example, assigning a value of an incompatible type to a variable. A logical error means the
program logic is not correct, even though the program is syntactically and semantically correct.
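Hedged one-line illustrations of the four error classes follow; the names are arbitrary, and each line
deliberately contains the error it names, so the fragment will not compile as written:
int a = 3 @ 4;              /* lexical error: '@' is not a valid C token                    */
int b = (3 + 4;             /* syntax error: the parenthesis is never closed                */
double c = &a;              /* semantic error: a pointer cannot be assigned to a double     */
double average (double x, double y)
{
    return (x + y) / 3.0;   /* logical error: compiles and runs, but divides by 3 instead of 2 */
}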
Compiler analysis is the prerequisite for any compiler optimization, and the two work tightly
together. For example, dependence analysis is crucial for loop transformation. In addition, the scope of
compiler analysis and optimization varies greatly, from as small as a basic block to the
procedure/function level, or even over the whole program (interprocedural optimization). Obviously, a
compiler can potentially do a better job using a broader view. But that broad view is not free: large
scope analysis and optimizations are very costly in terms of compilation time and memory space; this is
especially true for interprocedural analysis and optimizations.
Interprocedural analysis and optimizations are common in modern commercial compilers from
HP, IBM, SGI, Intel, Microsoft, and Sun Microsystems. The open source GCC was criticized for a long time
for lacking powerful interprocedural optimizations, but it is changing in this respect. Another open
source compiler with full analysis and optimization infrastructure is Open64, which is used by many
organizations for research and commercial purposes.
Due to the extra time and space needed for compiler analysis and optimizations, some compilers skip
them by default. Users have to use compilation options to explicitly tell the compiler which
optimizations should be enabled.
Just-in-Time Compilation
In computing, just-in-time compilation (JIT), also known as dynamic translation, is a method to
improve the runtime performance of computer programs. Traditionally, computer programs had two
modes of runtime operation, either interpreted or static (ahead-of-time) compilation. Interpreted code
is translated from a high-level language to a machine code continuously during every execution,
whereas statically compiled code is translated into machine code before execution, and only requires
this translation once.
JIT compilers represent a hybrid approach, with translation occurring continuously, as with
interpreters, but with caching of translated code to minimize performance degradation. It also offers
other advantages over statically compiled code at development time, such as handling of late-bound
data types and the ability to enforce security guarantees.
JIT builds upon two earlier ideas in run-time environments: bytecode compilation and dynamic
compilation. It converts code at runtime prior to executing it natively, for example bytecode into native
machine code.
Several modern runtime environments, such as Microsoft's .NET Framework and most
implementations of Java, rely on JIT compilation for high-speed code execution.
JIT code generally offers far better performance than interpreters. In addition, it can in some cases offer
better performance than static compilation, as many optimizations are only feasible at run-time:
1. The compilation can be optimized to the targeted CPU and the operating system model where
the application runs. For example, JIT can choose SSE2 CPU instructions when it detects that the
CPU supports them (a sketch of this run-time dispatch idea follows this list). To obtain this level
of optimization specificity with a static compiler, one must either compile a binary for each
intended platform/architecture, or else include multiple versions of portions of the code within a
single binary.
2. The system is able to collect statistics about how the program is actually running in the
environment it is in, and it can rearrange and recompile for optimum performance. However,
some static compilers can also take profile information as input.
3. The system can do global code optimizations (e.g. inlining of library functions) without losing the
advantages of dynamic linking and without the overheads inherent to static compilers and
linkers. Specifically, when performing global inline substitutions, a static compiler may need to
insert run-time checks to ensure that a virtual call still occurs if the actual class of the object
overrides the inlined method, and bounds checks on array accesses may need to be performed
inside loops. With just-in-time compilation, in many cases this processing can be moved out of
loops, often giving large increases in speed.
4. Although this is possible with statically compiled garbage collected languages, a bytecode system
can more easily rearrange executed code for better cache utilization.
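The following minimal sketch shows the run-time dispatch idea from item 1 above. It assumes GCC or Clang on x86, since __builtin_cpu_supports() is a compiler-specific builtin; the function names are purely illustrative:

    /* dispatch.c -- choose a code path at run time based on CPU features,
     * similar in spirit to a JIT emitting SSE2 instructions only on CPUs
     * that support them.  __builtin_cpu_supports() is a GCC/Clang builtin
     * (x86 only); other compilers need a different feature check. */
    #include <stdio.h>

    static void sum_generic(const float *a, const float *b, float *out, int n) {
        for (int i = 0; i < n; i++)
            out[i] = a[i] + b[i];          /* portable scalar fallback */
    }

    /* In a real dispatcher this would be a hand-vectorized or JIT-emitted
     * SSE2 version; here it merely stands in for that specialized path. */
    static void sum_sse2(const float *a, const float *b, float *out, int n) {
        sum_generic(a, b, out, n);
    }

    int main(void) {
        float a[4] = {1, 2, 3, 4}, b[4] = {4, 3, 2, 1}, out[4];

        if (__builtin_cpu_supports("sse2"))    /* run-time CPU feature test */
            sum_sse2(a, b, out, 4);
        else
            sum_generic(a, b, out, 4);

        printf("%.1f %.1f %.1f %.1f\n", out[0], out[1], out[2], out[3]);
        return 0;
    }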
Startup delay and optimizations
JIT typically causes a slight delay in initial execution of an application, due to the time taken to
load and compile the bytecode. Sometimes this delay is called "startup time delay". In general, the more
optimization JIT performs, the better the code it will generate, but the initial delay will also increase. A
JIT compiler therefore has to make a trade-off between the compilation time and the quality of the code
it hopes to generate. However, it seems that much of the startup time is sometimes due to IO-bound
operations rather than JIT compilation (for example, the rt.jar class data file for the Java Virtual Machine
is 40 MB and the JVM must seek a lot of data in this contextually huge file).
One possible optimization, used by Sun's HotSpot Java Virtual Machine, is to combine
interpretation and JIT compilation. The application code is initially interpreted, but the JVM monitors
which sequences of bytecode are frequently executed and translates them to machine code for direct
execution on the hardware. For bytecode which is executed only a few times, this saves the compilation
time and reduces the initial latency; for frequently executed bytecode, JIT compilation is used to run at
high speed, after an initial phase of slow interpretation. Additionally, since a program spends most time
executing a minority of its code, the reduced compilation time is significant. Finally, during the initial
code interpretation, execution statistics can be collected before compilation, which helps to perform
better optimization.
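A toy sketch of this mixed-mode policy, assuming a simple per-method invocation counter (the names and the threshold are illustrative, not HotSpot internals):

    /* hot_sketch.c -- interpret a "method" until its invocation count crosses
     * a threshold, then pretend to JIT-compile it and use the cached compiled
     * form afterwards.  Real VMs use far richer heuristics. */
    #include <stdio.h>

    #define COMPILE_THRESHOLD 3   /* assumed value, for illustration only */

    struct method {
        const char *name;
        int call_count;
        int compiled;             /* 0 = interpret, 1 = use cached native code */
    };

    static void execute(struct method *m) {
        m->call_count++;
        if (!m->compiled && m->call_count >= COMPILE_THRESHOLD) {
            m->compiled = 1;      /* hot enough: "JIT-compile" and cache it */
            printf("compiling %s after %d calls\n", m->name, m->call_count);
        }
        printf("%s %s\n", m->compiled ? "running compiled" : "interpreting", m->name);
    }

    int main(void) {
        struct method m = {"work", 0, 0};
        for (int i = 0; i < 5; i++)
            execute(&m);
        return 0;
    }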
The correct tradeoff can vary due to circumstances. For example, Sun's Java Virtual Machine has
two major modes—client and server. In client mode, minimal compilation and optimization is
performed, to reduce startup time. In server mode, extensive compilation and optimization is
performed, to maximize performance once the application is running by sacrificing startup time. Other
Java just-in-time compilers have used a runtime measurement of the number of times a method has
executed combined with the bytecode size of a method as a heuristic to decide when to compile. Still
another uses the number of times executed combined with the detection of loops. In general, it is much
harder to accurately predict which methods to optimize in short-running applications than in long-running ones.
Native Image Generator (Ngen) by Microsoft is another approach to reducing the initial delay.
Ngen pre-compiles (or "pre-jits") bytecode in a Common Intermediate Language image into native
machine code. As a result, no runtime compilation is needed. The .NET Framework 2.0 shipped with Visual
Studio 2005 runs Ngen on all of the Microsoft library DLLs right after installation. Pre-jitting provides
a way to improve startup time. However, the quality of the code it generates might not be as good as
JIT-compiled code, for the same reason that statically compiled code without profile-guided
optimization cannot match JIT-compiled code in the extreme case: the lack of profiling data to
drive, for instance, inline caching.
There also exist Java implementations that combine an AOT (ahead-of-time) compiler with either
a JIT compiler (Excelsior JET) or interpreter (GNU Compiler for Java.)
Tombstone diagrams (or T-diagrams) consist of a set of “puzzle pieces” representing languages of
language processors and programs. They are used to illustrate and reason about transformations from a
source language A to a target language B realised in an implementation language I. They are most
commonly found describing complicated processes for bootstrapping, porting, and self-compiling of
compilers, interpreters, and macro-processors.
Tombstone diagram
Figure: a tombstone diagram representing an Ada compiler written in C that produces machine code.
Figure: representation of the process of bootstrapping a C compiler written in C, by compiling it using another
compiler written in machine code.
Bootstrapping: In computer science, bootstrapping is the process of writing a compiler (or assembler) in
the target programming language which it is intended to compile. Applying this technique leads to a self-hosting compiler.
A large proportion of programming languages are bootstrapped, including BASIC, C, Pascal, Factor,
Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme, Python and more.
Advantages
Bootstrapping a compiler has the following advantages:
• it is a non-trivial test of the language being compiled;
• compiler developers only need to know the language being compiled;
• compiler development can be done in the higher level language being compiled;
• improvements to the compiler's back-end improve not only general purpose programs but also the compiler itself; and
• it is a comprehensive consistency check, as it should be able to reproduce its own object code.
Porting: In computer science, porting is the process of adapting software so that an executable program
can be created for a computing environment that is different from the one for which it was originally
designed (e.g. a different CPU, operating system, or third-party library). The term is also used when
software or hardware is changed to make it usable in a different environment.
• Software is portable when the cost of porting it to a new platform is less than the cost of writing
  it from scratch. The lower the cost of porting software, relative to its implementation cost, the
  more portable it is said to be.
Etymology
The term 'port' is derived from the Latin portare, meaning 'to carry'.
When code is not
compatible with a particular OS or architecture, the code must be 'carried' to the new system.
The term is not generally applied to the process of adapting software to run with less memory on the
same CPU and operating system, nor is it applied to the rewriting of source code in a different language
(i.e. language conversion or translation).
Software developers often claim that the software they write is portable, meaning that little
effort is needed to adapt it to a new environment. The amount of effort actually needed depends on
several factors, including the extent to which the original environment (the source platform) differs from
the new environment (the target platform), the experience of the original authors in knowing which
programming language constructs and third party library calls are unlikely to be portable, and the
amount of effort invested by the original authors in only using portable constructs (platform specific
constructs often provide a cheaper solution).
APPENDIX I
List of DOS Commands
ACALC
External - PC DOS 7
Calculates the value of a mathematical expression.
ACALC [/T[:]format] expression
/T[:]format  Specifies the output format type:
             D=Decimal (default)
             B=Binary
             O=Octal
             X=heXadecimal
             A=All (decimal, binary, octal, and hexadecimal)
expression   Specifies a valid numeric expression. Numbers prefixed with 'b', 'o',
             and 'x' are assumed to be binary, octal, and hexadecimal respectively.
             Decimal numbers are not prefixed.
APPEND
External - DOS 3.3 and above
Allows programs to open data files in specified directories as if they were in the current directory.
APPEND [[drive:]path[;...]] [/X[:ON | :OFF]] [/PATH:ON | /PATH:OFF] [/E]
APPEND ;
[drive:]path Specifies a drive and directory to append.
/X:ON
Applies appended directories to file searches and
application execution.
/X:OFF
Applies appended directories only to requests to open files.
/X:OFF is the default setting.
/PATH:ON
Applies appended directories to file requests that already
specify a path. /PATH:ON is the default setting.
/PATH:OFF
Turns off the effect of /PATH:ON.
/E
Stores a copy of the appended directory list in an environment
variable named APPEND. /E may be used only the first time
you use APPEND after starting your system.
Type APPEND ; to clear the appended directory list.
Type APPEND without parameters to display the appended directory list.
ASSIGN
External - DOS 2.0 and above
Further information: Drive letter assignment
Redirects requests for disk operations on one drive to a different drive.
ASSIGN [x[:]=y[:][...]]
ASSIGN /STATUS
x        Specifies the drive letter to reassign.
y        Specifies the drive that x: will be assigned to.
/STATUS  Displays current drive assignments.
Type ASSIGN without parameters to reset all drive letters to original
assignments.
ATTRIB
External - DOS 3.0 and above
Displays or changes file attributes.
ATTRIB [+R | -R] [+A | -A] [+S | -S] [+H | -H] [[drive:][path]filename] [/S]
+   Sets an attribute.
-   Clears an attribute.
R   Read-only file attribute.
A   Archive file attribute.
S   System file attribute.
H   Hidden file attribute.
/S Processes files in all directories in the specified path.
+S, -S, +H, and -H are available in DOS 5.0 and above.
BACKUP
External - DOS 2.0 and above
Backs up one or more files from one disk to another.
BACKUP source destination-drive: [/S] [/M] [/A] [/F[:size]]
[/D:date[/T:time]] [/L[:[drive:][path]logfile]]
source                      Specifies the file(s), drive, or directory to back up.
destination-drive:          Specifies the drive to save backup copies onto.
/S                          Backs up contents of subdirectories.
/M                          Backs up only files that have changed since the last backup.
/A                          Adds backup files to an existing backup disk.
/F:[size]                   Specifies the size of the disk to be formatted.
/D:date                     Backs up only files changed on or after the specified date.
/T:time                     Backs up only files changed at or after the specified time.
/L[:[drive:][path]logfile]  Creates a log file and entry to record the backup operation.
BREAK
Internal - DOS 2.0 and above
Sets or clears extended CTRL+C checking.
BREAK [ON | OFF]
Type BREAK without a parameter to display the current BREAK setting.
CALL
Internal - DOS 3.3 and above
Calls one batch program from another.
CALL [drive:][path]filename [batch-parameters]
batch-parameters
Specifies any command-line information required by the
batch program.
CHCP
Internal - DOS 3.3 and above
Displays or sets the active code page number.
CHCP [nnn]
nnn
Specifies a code page number.
Type CHCP without a parameter to display the active code page number.
CHDIR or CD
Internal - DOS 2.0 and above
Displays the name of or changes the current directory.
CHDIR [drive:][path]
CHDIR[..]
CD [drive:][path]
CD[..]
..
Specifies that you want to change to the parent directory.
Type CD drive: to display the current directory in the specified drive.
Type CD without parameters to display the current drive and directory.
CHKDSK
External - DOS 1.0 and above
Checks a disk and displays a status report.
CHKDSK [drive:][[path]filename] [/F] [/V]
[drive:][path]  Specifies the drive and directory to check.
filename        Specifies the file(s) to check for fragmentation.
/F              Fixes errors on the disk.
/V              Displays the full path and name of every file on the disk.
Type CHKDSK without parameters to check the current disk.
CHKDSK originated as an external command in 86-DOS.
CHOICE
External - DOS 6.0 and above
Waits for you to choose one of a set of choices.
CHOICE [/C[:]choices] [/N] [/S] [/T[:]c,nn] [text]
/C[:]choices  Specifies allowable keys. Default is YN.
/N            Does not display choices and ? at end of prompt string.
/S            Treats choice keys as case sensitive.
/T[:]c,nn     Defaults choice to c after nn seconds.
text          Prompt string to display.
ERRORLEVEL is set to offset of key you press in choices.
CLS
Internal - DOS 2.0 and above
Clears the screen.
CLS
COMMAND
External - DOS 1.0 and above
Starts a new copy of the DOS Command Interpreter.
COMMAND [[drive:]path] [device] [/E:nnnnn] [/P [/MSG]]
[/H] [/O] [/Y [/C command | /K command]]
[drive:]path  Specifies the directory containing the COMMAND.COM file.
device        Specifies the device to use for command input and output.
/E:nnnnn      Sets the initial environment size to nnnnn bytes.
/P            Makes the new Command Interpreter permanent (can't exit).
/MSG          Stores all error messages in memory (requires /P).
/H            Loads the Command Interpreter into a UMB if available.
/O            Disables overwrite prompt on COPY, XCOPY, and MOVE commands.
/Y            Steps through the batch program specified by /C or /K.
/C command    Executes the specified command and returns.
/K command    Executes the specified command and continues running.
The /P and /MSG switches may be used only when COMMAND is started by using
the SHELL command in the CONFIG.SYS file.
/O and /Y are only available in DOS 6 and above. /H is only available in PC DOS 7.
COMMAND.COM originated in 86-DOS.
COMP
External - DOS 1.0 and above
Compares the contents of two files or sets of files.
COMP [data1] [data2] [/D] [/A] [/L] [/N=number] [/C]
data1      Specifies location and name(s) of first file(s) to compare.
data2      Specifies location and name(s) of second files to compare.
/D         Displays differences in decimal format. This is the default setting.
/A         Displays differences in ASCII characters.
/L         Displays line numbers for differences.
/N=number  Compares only the first specified number of lines in each file.
/C         Disregards case of ASCII letters when comparing files.
To compare sets of files, use wildcards in data1 and data2 parameters.
COPY
Internal - DOS 1.0 and above
Copies one or more files to another location.
COPY [/A | /B] source [/A | /B] [+ source [/A | /B] [+ ...]] [destination
[/A | /B]] [/V] [/Y | /-Y]
source       Specifies the file or files to be copied.
/A           Indicates an ASCII text file.
/B           Indicates a binary file.
destination  Specifies the directory and/or filename for the new file(s).
/V           Verifies that new files are written correctly.
/Y           Suppresses prompting to confirm you want to overwrite an
             existing destination file.
/-Y          Causes prompting to confirm you want to overwrite an
             existing destination file.
The switch /Y may be preset in the COPYCMD environment variable.
To append files, specify a single file for destination, but multiple files
for source (using wildcards or file1+file2+file3 format).
/Y and /-Y are only available in DOS 6 and above.
COPY originated as an internal command in 86-DOS.
CTTY
Internal - DOS 2.0 and above
Changes the terminal device used to control your system.
CTTY device
device
The terminal device you want to use, such as COM1.
DATE
External - DOS 1.0
Internal - DOS 1.1 and above
Displays or sets the date.
DATE [date]
Type DATE without parameters to display the current date setting and
a prompt for a new one. Press ENTER to keep the same date.
DEBUG
External - DOS 1.0 and above
Runs Debug, a program testing and editing tool.
DEBUG [[drive:][path]filename [testfile-parameters]]
[drive:][path]filename  Specifies the file you want to test.
testfile-parameters     Specifies command-line information required by
                        the file you want to test.
After Debug starts, type ? to display a list of debugging commands.
DEBUG originated as an external command in 86-DOS.
DEFRAG
External - DOS 6.0 and above
Further information: Defragmentation
Reorganizes files on disks to optimize performance.
DEFRAG [drive:] [/F] [/Sorder] [/B] [/SKIPHIGH] [/LCD | /BW | /G0]
DEFRAG [drive:] [/U] [/B] [/SKIPHIGH] [/LCD | /BW | /G0]
[drive:]   Drive letter of disk to be optimized.
/F         Fully optimizes specified disk.
/U         Unfragments files, leaving space between files.
/Sorder    Sort files by specified order:
           N  By name (alphabetic)              E  By extension (alphabetic)
           D  By date & time (earliest first)   S  By size (smallest first)
           -  Suffix to reverse order
/B         Restarts your computer after optimization.
/SKIPHIGH  Prevents Defrag from using extended or upper memory.
/LCD       Runs Defrag using an LCD color scheme.
/BW        Runs Defrag using a black and white color scheme.
/G0        Disable the graphic mouse and graphic character set.
DEFRAG is a licensed version of Norton Utilities Speed Disk.
DEL or ERASE
Internal - DOS 1.0 and above
Deletes one or more files.
DEL [drive:][path]filename [/P]
ERASE [drive:][path]filename [/P]
[drive:][path]filename
Specifies the file(s) to delete. Specify multiple
files by using wildcards.
/P
Prompts for confirmation before deleting each file.
/Q
Quiet mode, do not ask if ok to delete on global wildcard
/P is only available in DOS 5 and above.
ERASE (but not its alias DEL) originated as an internal command in 86-DOS. All versions of DR-DOS support the ERA command as an alias to ERASE /
DEL and add ERAQ / DELQ shortcuts identical to the DR-DOS ERA / ERASE / DEL command with the /Q (Query) option given to prompt the user for
confirmation.
DELTREE
External - DOS 6.0 and above
Deletes a directory and all the subdirectories and files within it.
To Delete one or more directories:
DELTREE [/Y] [drive:]path [[drive:]path[...]]
/Y            Suppresses prompting to confirm whether you want to
              delete the subdirectory.
[drive:]path  Specifies the name of the directory you want to delete.
Note: Use DELTREE with caution. Every file and subdirectory within the
specified directory will be deleted.
DIR
Internal - DOS 1.0 and above
Displays a list of files and subdirectories in a directory.
DIR [drive:][path][filename] [/P] [/W] [/A[[:]attribs]] [/O[[:]sortord]]
[/S] [/B] [/L]
[drive:][path][filename]
Specifies drive, directory, and files to list.
/P
Pauses after each full screen of information.
/W
Uses wide list format.
/A       Displays files with specified attributes.
attribs  D  Directories                R  Read-only files
         H  Hidden files               S  System files
         A  Files ready to archive     -  Prefix meaning "not"
/O       Lists by files in sorted order.
sortord  N  By name (alphabetic)       S  By size (smallest first)
         E  By extension (alphabetic)  D  By date & time (earliest first)
         G  Group directories first    -  Prefix to reverse order
/S
Displays files in specified directory and all subdirectories.
/B
Uses bare format (no heading information or summary).
/L
Uses lowercase.
Switches may be preset in the DIRCMD environment variable. Override
preset switches by prefixing any switch with - (hyphen)--for example, /-W.
To remove the commas from the DIR output, use the NO_SEP environment variable.
Only /P and /W are available prior to DOS 5.
DIR originated as an internal command in 86-DOS.
DISKCOMP
External - DOS 1.0 and above
Compares the contents of two floppy disks.
DISKCOMP [drive1: [drive2:]] [/1] [/8]
/1
Compares the first side of the disks.
/8
Compares only the first eight sectors of each track.
DISKCOPY
External - DOS 1.0 and above
Copies the contents of one floppy disk to another.
DISKCOPY [drive1: [drive2:]] [/1] [/V] [/M]
/1  Copies only the first side of the disk.
/V  Verifies that the information is copied correctly.
/M  Force multi-pass copy using memory only.
The two floppy disks must be the same type.
You may specify the same drive for drive1 and drive2.
DOSKEY
External - DOS 5.0 and above
Edits command lines, recalls DOS commands, and creates macros.
DOSKEY [/REINSTALL] [/BUFSIZE=size] [/MACROS] [/HISTORY]
[/INSERT | /OVERSTRIKE] [macroname=[text]]
/REINSTALL     Installs a new copy of Doskey.
/BUFSIZE=size  Sets size of command history buffer.
/MACROS        Displays all Doskey macros.
/HISTORY       Displays all commands stored in memory.
/INSERT        Specifies that new text you type is inserted in old text.
/OVERSTRIKE    Specifies that new text overwrites old text.
macroname      Specifies a name for a macro you create.
text           Specifies commands you want to record.
UP and DOWN ARROWS recall commands; ESC clears command line; F7 displays
command history; ALT+F7 clears command history; F8 searches command
history; F9 selects a command by number; ALT+F10 clears macro definitions.
The following are some special codes in Doskey macro definitions:
$T
Command separator. Allows multiple commands in a macro.
$1-$9 Batch parameters. Equivalent to %1-%9 in batch programs.
$*
Symbol replaced by everything following macro name on command line.
DRVLOCK
External - PC DOS 5.02 and above
Locks the drive or socket so that media cannot be removed.
DRVLOCK [drive: | socket:] [/ON | /OFF]
/ON
Sets the lock on.
/OFF
Sets the lock off.
DRVLOCK is only included with PC DOS versions.
DYNALOAD
External - PC DOS 7
Load a device driver after system startup.
DYNALOAD filename [parameters]
filename    Specifies the name of the device driver to load.
parameters  Specifies any parameters for the device driver.
E
External - PC DOS 6.1 and above
Starts PC DOS editor, which creates and changes ASCII files.
E [/Q] [/B] [/S] [/D] [/I] [/C] [/A] [/DM] [/80 |/132] [d:][path][filespec]
[=filespec] ['command']
/Q  Turns off display of "Loading .." message.
/B  Displays files in browse (read-only) mode.
/S  Uses EMS (or hardfile if no EMS is available) to edit files
    too large for conventional memory.
/D
Forces file to be loaded from disk.
/I
Edits STACKER.INI file.
/C
Edits CONFIG.SYS file.
/A
Edits AUTOEXEC.BAT file.
/DM
Disables Mouse.
/80
Enables 80 column, 16 color text video (CGA/EGA/MCGA/VGA/SVGA/XGA).
/132
Enables 132 column, 16 color text video (XGA).
[d:][path][filespec] Specifies drive, directory, and file to edit.
=
Is shorthand for "same path as last specified" at the DOS
prompt, or "same as current file's" at the editor commandline.
For example, E \PAS\LANG\FOO.PAS =FOO.BAK will load the two
files FOO.PAS and FOO.BAK, both from the directory \PAS\LANG.
'command' Specifies a startup command. For example, E \FOO.PAS 'ALL /IF'
will load the file FOO.PAS and then execute the ALL command
on this file.
Switches may be preset in the E environment variable.
E is only included with PC DOS versions and replaces the MS-DOS editor EDIT.
ECHO
Internal - DOS 2.0 and above
Displays messages, or turns command-echoing on or off.
ECHO [ON | OFF]
ECHO [message]
Type ECHO without parameters to display the current echo setting.
EDIT
External - MS-DOS 5.0 and above
Starts the MS-DOS editor, which creates and changes ASCII files.
EDIT [[drive:][path]filename] [/B] [/G] [/H] [/NOHI]
[drive:][path]filename Specifies the ASCII file to edit.
/B
Allows use of a monochrome monitor with a color graphics card.
/G
Provides the fastest update of a CGA screen.
/H
Displays the maximum number of lines possible for your hardware.
/NOHI
Allows the use of a monitor without high-intensity support.
PC DOS 6.1 and later use the E editor.
EDLIN
External - DOS 1.0 and above
Line-oriented text editor.
EDLIN [:][path]filename [/B]
/B
Ignores end-of-file (CTRL+Z) characters.
EDLIN originated as an external command in 86-DOS.
EJECT
External - PC DOS 5.02 and above
Ejects the media from a drive.
EJECT [drive:]
EJECT is only included with PC DOS versions.
EMM386
External - DOS 5.0 and above
Turns on or off EMM386 expanded memory support.
EMM386 [ON | OFF | AUTO] [W=ON | W=OFF]
ON | OFF | AUTO
Activates or suspends EMM386.EXE device driver,
or places it in auto mode.
W=ON | OFF
Turns on or off Weitek coprocessor support.
EMM386.EXE must be loaded as a device driver in CONFIG.SYS in order to use this command.
EXE2BIN
External - DOS 1.0 and above
Converts .EXE (executable) files to binary format.
EXE2BIN [drive1:][path1]input-file [[drive2:][path2]output-file]
input-file
Specifies the .EXE file to be converted.
output-file Specifies the binary file to be created.
EXIT
Internal - DOS 2.0 and above
Quits the COMMAND.COM program (command interpreter).
EXIT
FASTOPEN
External - DOS 3.3 to DOS 6.3
Decreases the amount of time needed to open frequently used files and directories.
FASTOPEN drive:[[=]n] [drive:[[=]n][ ...]] [/X]
drive:  Specifies the hard disk drive you want Fastopen to work with.
n       Specifies the maximum number of file locations Fastopen retains
        in its filename cache.
/X      Creates the filename cache in expanded memory.
FC
External - DOS 3.3 and above
Compares two files or sets of files and displays the differences between them.
FC [/A] [/C] [/L] [/LBn] [/N] [/T] [/W] [/nnnn] [drive1:][path1]filename1
[drive2:][path2]filename2
FC /B [drive1:][path1]filename1 [drive2:][path2]filename2
/A     Displays only first and last lines for each set of differences.
/B     Performs a binary comparison.
/C     Disregards the case of letters.
/L     Compares files as ASCII text.
/LBn   Sets the maximum consecutive mismatches to the specified number of lines.
/N     Displays the line numbers on an ASCII comparison.
/T     Does not expand tabs to spaces.
/W     Compresses white space (tabs and spaces) for comparison.
/nnnn  Specifies the number of consecutive lines that must match after a mismatch.
FDISK
External - DOS 2.0 and above
Configures a hard disk for use with DOS.
FDISK [/STATUS]
/STATUS
Displays the status of the fixed disk drive
The undocumented /MBR switch replaces the Master Boot Record. The partition entries in it will remain intact.
FIND
External - DOS 2.0 and above
Searches for a text string in a file or files.
FIND [/V] [/C] [/N] [/I] [/S] "string" [[drive:][path]filename[ ...]]
/V
Displays all lines NOT containing the specified string.
/C
Displays only the count of lines containing the string.
/N
Displays line numbers with the displayed lines.
/I
Ignores the case of characters when searching for the string.
/S
Search subdirectories also.
"string" Specifies the text string to find.
[drive:][path]filename
Specifies a file or files to search.
If a pathname is not specified, FIND searches the text typed at the prompt
or piped from another command.
/S is only available in PC DOS 7. Also PC DOS 7 allows the use of wildcards in filenames while prior versions do not.
FOR
Internal - DOS 2.0 and above
Runs a specified command for each file in a set of files.
FOR %variable IN (set) DO command [command-parameters]
%variable           Specifies a replaceable parameter.
(set)               Specifies a set of one or more files. Wildcards may be used.
command             Specifies the command to carry out for each file.
command-parameters  Specifies parameters or switches for the specified command.
To use the FOR command in a batch program, specify %%variable instead of %variable.
FORMAT
External - DOS 1.0 and above
Formats a disk for use with DOS.
FORMAT drive: [/V[:label]] [/Q] [/U] [/F:size] [/B | /S] [/C]
FORMAT drive: [/V[:label]] [/Q] [/U] [/T:tracks /N:sectors] [/B | /S] [/C]
FORMAT drive: [/V[:label]] [/Q] [/U] [/1] [/4] [/B | /S] [/C]
FORMAT drive: [/Q] [/U] [/1] [/4] [/8] [/B | /S] [/C]
/V[:label]  Specifies the volume label.
/Q          Performs a quick format.
/U          Performs an unconditional format.
/F:size     Specifies the size of the floppy disk to format (such
            as 160, 180, 320, 360, 720, 1.2, 1.44, 2.88).
/B
Allocates space on the formatted disk for system files.
/S
Copies system files to the formatted disk.
/T:tracks
Specifies the number of tracks per disk side.
/N:sectors Specifies the number of sectors per track.
/1
Formats a single side of a floppy disk.
/4
Formats a 5.25-inch 360K floppy disk in a high-density drive.
/8
Formats eight sectors per track.
/C
Revert to less conservative handling of bad blocks.
/Q and /U are only available in DOS 5 and above. /C is only available in DOS 6 and above.
FORMAT replaced the internal command CLEAR in 86-DOS.
GOTO
Internal - DOS 2.0 and above
Directs DOS to a labelled line in a batch program.
GOTO label
label
Specifies a text string used in the batch program as a label.
You type a label on a line by itself, beginning with a colon.
GRAFTABL
External - DOS 3.0 and above
Enables DOS to display an extended character set in graphics mode.
GRAFTABL [xxx]
GRAFTABL /STATUS
xxx
Specifies a code page number.
/STATUS Displays the current code page selected for use with GRAFTABL.
GRAPHICS
External - DOS 2.0 and above
Loads a program that can print graphics.
GRAPHICS [type] [[drive:][path]filename] [/R] [/B] [/LCD]
[/PRINTBOX:STD | /PRINTBOX:LCD]
type
Specifies a printer type.
[drive:][path]filename
Specifies the file containing information on supported printers.
/R
Prints white on black as seen on the screen.
/B
Prints the background in color for COLOR4 and COLOR8 printers.
/LCD
Prints using LCD aspect ratio.
/PRINTBOX:STD | /PRINTBOX:LCD
Specifies the print-box size, either STD or LCD.
This command uses the file GRAPHICS.PRO which contains information about the supported printer types.
HELP
External - DOS 5.0 and above
Displays command help.
HELP [topic]
• MS-DOS versions use QBASIC.EXE and QuickHelp files for their help system.
• PC DOS versions use VIEW.EXE and Information Presentation Facility files for their help system.
IF
Internal - DOS 2.0 and above
Performs conditional processing in batch programs.
IF [NOT] ERRORLEVEL number command
IF [NOT] string1==string2 command
IF [NOT] EXIST filename command
NOT
Specifies that DOS should carry out the command only
if the condition is false.
ERRORLEVEL number Specifies a true condition if the last program run returned
an exit code equal to or greater than the number specified.
command
Specifies the command to carry out if the condition is
met.
string1==string2 Specifies a true condition if the specified text strings
match.
EXIST filename
Specifies a true condition if the specified filename
exists.
INTERLNK
External - DOS 5.02 and above
Displays status of INTERLNK-INTERSVR redirected drives.
INTERLNK [client=[server]]
client  Specifies a client drive to redirect to a server drive.
        Cancels redirection if no server drive is specified.
server  Specifies a server drive to redirect to a client drive.
Type INTERLNK with no parameters to show INTERLNK status.
INTERLNK.EXE must be loaded as a device driver in CONFIG.SYS in order to use this command.
INTERSVR
External - DOS 5.02 and above
Provides serial or parallel file transfer and printing capabilities via redirected drives.
INTERSVR [drive:[...]] [/X=drive:[...]] [/LPT[:][n | address]]
[/COM[:][n | address]] [/baud:rate] [/v] [/b]
drive:
Specifies the drive(s) to redirect
(by default, all drives are redirected).
/X=drive:
Specifies the drive(s) to exclude.
/LPT[n]
Specifies a port to scan. (/LPT scans all LPT ports).
/LPT[address] Specifies a port address to scan.
/COM[n]
Specifies a port to scan. (/COM scans all COM ports).
/COM[address] Specifies a port address to scan.
/BAUD:rate
Set a maximum serial baud rate.
/B
Displays the INTERLNK server screen in black and white.
/V
Prevents conflicts with a computer's timer. Specify this
switch if you have a serial connection between computers and
one of them stops running when you use INTERLNK.
INTERSVR /RCOPY
Copies INTERLNK files from one computer to another, provided that the
computers' serial ports are connected with a 7-wire null-modem cable.
JOIN
External - DOS 3.1 and above
Joins a disk drive to a directory on another drive.
JOIN [drive1: [drive2:]path]
JOIN drive1: /D
drive1:  Specifies a disk drive that will appear as a directory on drive2.
drive2:  Specifies a drive to which you want to join drive1.
path     Specifies the directory to which you want to join drive1. It
         must be empty and cannot be the root directory.
/D       Cancels any previous JOIN commands for the specified drive.
Type JOIN without parameters to list currently joined drives.
KEYB
External - DOS 3.3 and above
Configures a keyboard for a specific language.
KEYB [xx[,[yyy][,[drive:][path]filename]]] [/E] [/ID:nnn]
xx
Specifies a two-letter keyboard code.
yyy
Specifies the code page for the character set.
[drive:][path]filename Specifies the keyboard definition file.
/E
Specifies that an enhanced keyboard is installed.
/ID:nnn
Specifies the keyboard in use.
KEYB replaces the commands KEYBFR, KEYBGR, KEYBIT, KEYBSP and KEYBUK from DOS 3.0 to 3.2.
LABEL
External - DOS 3.0 and above
Creates, changes, or deletes the volume label of a disk.
LABEL [drive:][label]
LOADFIX
External - DOS 5.0 and above
Loads a program above the first 64K of memory, and runs the program.
LOADFIX [drive:][path]filename
Use LOADFIX to load a program if you have received the message
"Packed file corrupt" when trying to load it in low memory.
LOADHIGH or LH
Internal - DOS 5.0 and above
Loads a program into the upper memory area.
LOADHIGH [drive:][path]filename [parameters]
LOADHIGH [/L:region1[,minsize1][;region2[,minsize2]...]]
[drive:][path]filename [parameters]
/L:region1[,minsize1][;region2[,minsize2]]...
Specifies the region(s) of memory into which to load
the program. Region1 specifies the number of the first
memory region; minsize1 specifies the minimum size, if
any, for region1. Region2 and minsize2 specify the
number and minimum size of the second region, if any.
You can specify as many regions as you want.
[drive:][path]filename
Specifies the location and name of the program.
parameters
Specifies any command-line information required by
the program.
/L is only available in DOS 6 and above.
MEM
External - DOS 4.0 and above
Displays the amount of used and free memory in your system.
MEM [/CLASSIFY | /DEBUG | /FREE | /MODULE modulename] [/PAGE]
/CLASSIFY or /C  Classifies programs by memory usage. Lists the size of
                 programs, provides a summary of memory in use, and lists
                 the largest memory block available.
/DEBUG or /D     Displays status of all modules in memory, internal drivers,
                 and other information.
/FREE or /F      Displays information about the amount of free memory left
                 in both conventional and upper memory.
/MODULE or /M    Displays a detailed listing of a module's memory use.
                 This option must be followed by the name of a module,
                 optionally separated from /M by a colon.
/PAGE or /P      Pauses after each full screen of information.
MIRROR
External - DOS 5.0 and above
Records information about one or more disks.
MIRROR [drive:[ ...]] [/1] [/Tdrive[-entries][ ...]]
MIRROR [/U]
MIRROR [/PARTN]
drive:  Specifies the drive for which you want to save information.
/1      Saves only the latest disk information (does not back up
        previous information).
/Tdrive
Loads the deletion-tracking program for the specified drive.
-entries
Specifies maximum number of entries in the deletion-tracking
file.
/U
Unloads the deletion-tracking program.
/PARTN
Saves hard disk partition information to a floppy diskette.
MIRROR is licensed from Central Point Software PC Tools
MKDIR or MD
Internal - DOS 2.0 and above
Creates a directory.
MKDIR [drive:]path
MD [drive:]path
MODE
External - DOS 1.0 and above
Configures system devices.
Printer port:
MODE LPTn[:] [COLS=c] [LINES=l] [RETRY=r]
Serial port:
MODE COMm[:] [BAUD=b] [PARITY=p] [DATA=d] [STOP=s] [RETRY=r]
Device Status:
MODE [device] [/STATUS]
Redirect printing: MODE LPTn[:]=COMm[:]
Prepare code page: MODE device CP PREPARE=((yyy[...]) [drive:][path]filename)
Select code page: MODE device CP SELECT=yyy
Refresh code page: MODE device CP REFRESH
Code page status: MODE device CP [/STATUS]
Display mode:
MODE [display-adapter][,n]
MODE CON[:] [COLS=c] [LINES=n]
Typematic rate:
MODE CON[:] [RATE=r DELAY=d]
MORE
External - DOS 2.0 and above
Displays output one screen at a time.
MORE < [drive:][path]filename
command-name | MORE
[drive:][path]filename Specifies a file to display one screen at a time.
command-name
Specifies a command whose output will be displayed.
MOVE
External - DOS 6.0 and above
Moves files and renames files and directories.
To move one or more files:
MOVE [/Y | /-Y] [drive:][path]filename1[,...] destination
To rename a directory:
MOVE [drive:][path]dirname1 dirname2
/Y   Suppresses prompting to confirm overwriting of the destination.
/-Y  Causes prompting to confirm overwriting of the destination.
The switch /Y may be preset in the COPYCMD environment variable.
This may be overridden with /-Y on the command line.
[drive:][path]filename1 Specifies the location and name of the file
or files you want to move.
destination
Specifies the new location of the file. Destination
can consist of a drive letter and colon, a directory
name, or a combination. If you are moving only one
file, you can also include a filename if you want
to rename the file when you move it.
[drive:][path]dirname1 Specifies the directory you want to rename.
dirname2
Specifies the new name of the directory.
MSCDEX
External - DOS 6.0 and above
Loads the CD-ROM support utility.
MSCDEX /D:driver ... [/E] [/K] [/L:letter] [/M:buffers] [/S] [/V]
/D:driver   Specifies name of CD-ROM driver
/E          Load buffers in expanded memory (EMS)
/K          Load Kanji support
/L:letter   Specifies first drive letter to use
/M:buffers  Specifies number of sector buffers
/S          Load server environment support
/V          Display verbose memory usage statistics
MSD
External - MS-DOS 6.0 and above
Provides detailed technical information about your computer.
MSD [/I] [/F[drive:][path]filename] [/P[drive:][path]filename]
[/S[drive:][path][filename]]
MSD [/B][/I]
/B                        Runs MSD using a black and white color scheme.
/I                        Bypasses initial hardware detection.
/F[drive:][path]filename  Requests input and writes an MSD report to the
                          specified file.
/P[drive:][path]filename
Writes an MSD report to the specified file
without first requesting input.
/S[drive:][path][filename] Writes a summary MSD report to the specified
file. If no filename is specified, output is to
the screen.
MSD is only included with MS-DOS versions, PC DOS uses QCONFIG.
NLSFUNC
External - DOS 3.3 and above
Loads country-specific information.
NLSFUNC [[drive:][path]filename]
[drive:][path]filename
Specifies the file containing country-specific
information.
PATH
Internal - DOS 2.0 and above
Displays or sets a search path for executable files.
PATH [[drive:]path[;...]]
PATH ;
Type PATH ; to clear all search-path settings and direct DOS to search
only in the current directory.
Type PATH without parameters to display the current path.
PAUSE
Internal - DOS 1.0 and above
Suspends processing of a batch program and displays the message "Press any key to continue...." (For DOS 4.01 and above) or "Strike a key when
ready..." (For DOS 4.0 and below).
PAUSE
PAUSE originated as an internal command in 86-DOS.
POWER
External - DOS 5.02 and above
Reduces power used by your computer.
POWER [ADV[:MAX | REG | MIN] | STD | OFF]
ADV[:MAX | REG | MIN] -- Reduces power by monitoring applications
and hardware devices. MAX provides the most power conservation,
REG provides average power conservation, and MIN provides the
least conservation.
STD -- Reduces power by monitoring hardware devices.
OFF -- Turns off power management.
POWER.EXE must be loaded as a device driver in CONFIG.SYS in order to use this command.
PRINT
External - DOS 2.0 and above
Prints a text file while you are using other DOS commands.
PRINT [/D:device] [/B:size] [/U:ticks1] [/M:ticks2] [/S:ticks3]
[/Q:qsize] [/T] [[drive:][path]filename[ ...]] [/C] [/P]
/D:device  Specifies a print device.
/B:size    Sets the internal buffer size, in bytes.
/U:ticks1  Waits the specified maximum number of clock ticks for the printer
           to be available.
/M:ticks2  Specifies the maximum number of clock ticks it takes to print a
           character.
/S:ticks3  Allocates the scheduler the specified number of clock ticks for
           background printing.
/Q:qsize   Specifies the maximum number of files allowed in the print queue.
/T         Removes all files from the print queue.
/C         Cancels printing of the preceding filename and subsequent filenames.
/P         Adds the preceding filename and subsequent filenames to the print queue.
Type PRINT without parameters to display the contents of the print queue.
PROMPT
Internal - DOS 2.0 and above
Changes the DOS command prompt.
PROMPT [text]
text
Specifies a new command prompt.
Prompt can be made up of normal characters and the following special codes:
$Q   = (equal sign)
$$   $ (dollar sign)
$T   Current time
$D   Current date
$P   Current drive and path
$V   DOS version number
$N   Current drive
$G   > (greater-than sign)
$L   < (less-than sign)
$B   | (pipe)
$H   Backspace (erases previous character)
$E   Escape code (ASCII code 27)
$_   Carriage return and linefeed
Type PROMPT without parameters to reset the prompt to the default setting.
QBASIC
External - MS-DOS 5.0 and above
Further information: QBASIC
Starts the MS-DOS QBasic programming environment.
QBASIC [/B] [/EDITOR] [/G] [/H] [/MBF] [/NOHI] [[/RUN] [drive:][path]filename]
/B       Allows use of a monochrome monitor with a color graphics card.
/EDITOR  Starts the MS-DOS editor.
/G       Provides the fastest update of a CGA screen.
/H       Displays the maximum number of lines possible for your hardware.
/MBF     Converts the built-in functions MKS$, MKD$, CVS, and CVD to
         MKSMBF$, MKDMBF$, CVSMBF, and CVDMBF, respectively.
/NOHI
Allows the use of a monitor without high-intensity support.
/RUN
Runs the specified Basic program before displaying it.
[[drive:][path]filename] Specifies the program file to load or run.
QBASIC replaces GW-BASIC from earlier versions of DOS.
QCONFIG
External - PC DOS 6.1 and above
Displays detailed technical information about your computer.
QCONFIG [/?][/A][/C][/D][/E][/I][/O[file]][/P][/Q] [key="text"]
/?          Displays this help information.
/A          Displays all Micro Channel adapters supported by QCONFIG.
/C          Displays additional detail on async ports.
/D          Displays a detailed listing of hardware.
/E          Displays current environment.
/I          Displays CONFIG.SYS & AUTOEXEC.BAT.
/O          Redirects output to file QCONFIG.OUT.
/Ofile      Redirects output to file (anyname).
/P          Pauses the output between screens.
/Q          Does not display redirect message.
key="text"  Defines key with text to appear in output (must be last option).
QCONFIG is only included with PC DOS versions and replaces the Microsoft MSD utility.
RECOVER
External - DOS 2.0 to DOS 5.0
Recovers readable information from a bad or defective disk.
RECOVER [drive:][path]filename
RECOVER drive:
REM
Internal - DOS 1.0 and above
Records comments (remarks) in a batch file or CONFIG.SYS.
REM [comment]
RENAME or REN
Internal - DOS 1.0 and above
Renames a file or files.
RENAME [drive:][path]filename1 filename2
REN [drive:][path]filename1 filename2
Note that you cannot specify a new drive or path for your destination file.
Use MOVE to move files from one directory to another, or to rename a directory.
RENAME (but not its alias REN) originated as an internal command in 86-DOS.
REPLACE
External - DOS 3.2 and above
Replaces files.
REPLACE [drive1:][path1]filename [drive2:][path2] [/A] [/P] [/R] [/W]
REPLACE [drive1:][path1]filename [drive2:][path2] [/P] [/R] [/S] [/W] [/U]
[drive1:][path1]filename Specifies the source file or files.
[drive2:][path2]
Specifies the directory where files are to be
replaced.
/A
Adds new files to destination directory. Cannot
use with /S or /U switches.
/P
Prompts for confirmation before replacing a file or
adding a source file.
/R
Replaces read-only files as well as unprotected
files.
/S
Replaces files in all subdirectories of the
destination directory. Cannot use with the /A
switch.
/W
Waits for you to insert a disk before beginning.
/U
Replaces (updates) only files that are older than
source files. Cannot use with the /A switch.
RESTORE
External - DOS 2.0 and above
Restores files that were backed up by using the BACKUP command.
RESTORE drive1: drive2:[path[filename]] [/S] [/P] [/B:date] [/A:date] [/E:time]
[/L:time] [/M] [/N] [/D]
drive1: Specifies the drive on which the backup files are stored.
drive2:[path[filename]]
Specifies the file(s) to restore.
/S
Restores files in all subdirectories in the path.
/P
Prompts before restoring read-only files or files changed since
the last backup (if appropriate attributes are set).
/B
Restores only files last changed on or before the specified date.
/A
Restores only files changed on or after the specified date.
/E
Restores only files last changed at or earlier than the specified
time.
/L
Restores only files changed at or later than the specified time.
/M
Restores only files changed since the last backup.
/N
Restores only files that no longer exist on the destination disk.
/D
Displays files on the backup disk that match specifications.
REXX
External - PC DOS 7
Further information: REXX
Execute a REXX program.
REXX filename [parameters]
filename
Specifies the name of the REXX program to execute.
parameters Specifies any parameters for the REXX program.
REXXDUMP
External - PC DOS 7
Dump the variables of an active REXX procedure.
REXXDUMP
RMDIR or RD
Internal - DOS 2.0 and above
Removes (deletes) a directory.
RMDIR [drive:]path
RD [drive:]path
SCANDISK
External - MS-DOS 6.2 and above
Runs the ScanDisk disk-repair program.
To check and repair a drive, use the following syntax:
SCANDISK [drive: | /ALL] [/CHECKONLY | /AUTOFIX [/NOSAVE]] [/SURFACE]
To check and repair an unmounted DriveSpace compressed volume file, use:
SCANDISK drive:\DRVSPACE.nnn [/CHECKONLY | /AUTOFIX[/NOSAVE]]
To examine a file for fragmentation, use the following syntax:
SCANDISK /FRAGMENT [drive:][path]filename
To undo repairs you made previously, use the following syntax:
SCANDISK /UNDO [drive:]
For [drive:], specify the drive containing your Undo disk.
/ALL        Checks and repairs all local drives.
/AUTOFIX    Fixes damage without prompting.
/CHECKONLY  Checks a drive, but does not repair any damage.
/CUSTOM     Configures and runs ScanDisk according to SCANDISK.INI settings.
/NOSAVE     With /AUTOFIX, deletes lost clusters rather than saving as files.
/NOSUMMARY  With /CHECKONLY or /AUTOFIX, prevents ScanDisk from stopping at
            summary screens.
/SURFACE    Performs a surface scan after other checks.
/MONO       Configures ScanDisk for use with a monochrome display.
To check and repair the current drive, type SCANDISK without parameters.
Scandisk is only included with MS-DOS versions.
SET
Internal - DOS 2.0 and above
Displays, sets, or removes DOS environment variables.
SET [variable=[string]]
variable  Specifies the environment-variable name.
string    Specifies a series of characters to assign to the variable.
Type SET without parameters to display the current environment variables.
SETVER
External - DOS 5.0 and above
Sets the version number that DOS reports to a program.
Display current version table: SETVER [drive:path]
Add entry:
SETVER [drive:path] filename n.nn
Delete entry:
SETVER [drive:path] filename /DELETE [/QUIET]
[drive:path]   Specifies location of the SETVER.EXE file.
filename       Specifies the filename of the program.
n.nn           Specifies the DOS version to be reported to the program.
/DELETE or /D  Deletes the version-table entry for the specified program.
/QUIET         Hides the message typically displayed during deletion of
               version-table entry.
SETVER.EXE must be loaded as a device driver in CONFIG.SYS in order to use this command. While the internal version setting functionality was present in
DOS 4, the SETVER command did not appear until DOS 5.
SHARE
External - DOS 3.0 and above
Installs file-sharing and locking capabilities on your hard disk.
SHARE [/F:space] [/L:locks] [/NOHMA]
/F:space
Allocates file space (in bytes) for file-sharing information.
/L:locks
Sets the number of files that can be locked at one time.
/NOHMA
Don't load code into the HMA.
/NOHMA is only available in PC DOS 7.
SHIFT
Internal - DOS 2.0 and above
Changes the position of replaceable parameters in a batch file.
SHIFT
SMARTDRV
External - DOS 6.0 and above
Installs and configures the SMARTDrive disk-caching utility.
SMARTDRV [/X] [[drive[+|-]]...] [/U] [/C | /R] [/L] [/V | /Q | /S]
[InitCacheSize [WinCacheSize]] [/E:ElementSize] [/B:BufferSize]
/X              Disables write-behind caching for all drives.
drive           Sets caching options on specific drive(s). The specified
                drive(s) will have write-caching disabled unless you add +.
+               Enables write-behind caching for the specified drive.
-               Disables all caching for the specified drive.
/U              Do not load CD-ROM caching module.
/C              Writes all information currently in write-cache to hard disk.
/R              Clears the cache and restarts SMARTDrive.
/L              Prevents SMARTDrive from loading itself into upper memory.
/V              Displays SMARTDrive status messages when loading.
/Q              Does not display status information.
/S              Displays additional information about SMARTDrive's status.
InitCacheSize   Specifies XMS memory (KB) for the cache.
WinCacheSize    Specifies XMS memory (KB) for the cache with Windows.
/E:ElementSize  Specifies how many bytes of information to move at one time.
/B:BufferSize   Specifies the size of the read-ahead buffer.
SORT
External - DOS 2.0 and above
Sorts input and writes results to the screen, a file, or another device.
SORT [/R] [/+n] < [drive1:][path1]filename1 [> [drive2:][path2]filename2]
[command |] SORT [/R] [/+n] [> [drive2:][path2]filename2]
/R                         Reverses the sort order; that is, sorts Z to A,
                           then 9 to 0.
/+n                        Sorts the file according to characters in column n.
[drive1:][path1]filename1  Specifies a file to be sorted.
[drive2:][path2]filename2  Specifies a file where the sorted input is to be stored.
command                    Specifies a command whose output is to be sorted.
SUBST
External - DOS 3.1 and above
Associates a path with a drive letter.
SUBST [drive1: [drive2:]path]
SUBST drive1: /D
drive1:        Specifies a virtual drive to which you want to assign a path.
[drive2:]path  Specifies a physical drive and path you want to assign to
               a virtual drive.
/D             Deletes a substituted (virtual) drive.
Type SUBST with no parameters to display a list of current virtual drives.
SYS
External - DOS 1.0 and above
Copies DOS system files and command interpreter to a disk you specify.
SYS [drive1:][path] drive2:
[drive1:][path] Specifies the location of the system files.
drive2:
Specifies the drive the files are to be copied to.
SYS originated as an external command in 86-DOS.
TIME
External - DOS 1.0
Internal - DOS 1.1 and above
Displays or sets the system time.
TIME [time]
Type TIME with no parameters to display the current time setting and a prompt
for a new one. Press ENTER to keep the same time.
TREE
External - DOS 2.0 and above
Graphically displays the directory structure of a drive or path.
TREE [drive:][path] [/F] [/A]
/F
Displays the names of the files in each directory.
/A
Uses ASCII instead of extended characters.
TRUENAME
Internal - DOS 4.0 and above
Returns a fully qualified filename.
TRUENAME [drive:][path]filename
This command was undocumented in DOS 3.x.
TYPE
Internal - DOS 1.0 and above
Displays the contents of a text file.
TYPE [drive:][path]filename
TYPE originated as an internal command in 86-DOS.
UNDELETE
External - DOS 5.0 and above
Restores files previously deleted with the DEL command.
UNDELETE [[drive:][path]filename] [/DT | /DS | /DOS]
UNDELETE [/LIST | /ALL | /PURGE[DRIVE] | /STATUS | /LOAD | /UNLOAD
/S[DRIVE] | /T[DRIVE]-entrys ]]
/LIST
Lists the deleted files available to be recovered.
/ALL
Recovers files without prompting for confirmation.
/DOS
Recovers files listed as deleted by MS-DOS.
/DT
Recovers files protected by Delete Tracker.
/DS
Recovers files protected by Delete Sentry.
/LOAD
Loads Undelete into memory for delete protection.
/UNLOAD
Unloads Undelete from memory.
/PURGE[drive]
Purges all files in the Delete Sentry directory.
/STATUS
Display the protection method in effect for each drive.
/S[drive]
Enables Delete Sentry method of protection.
/T[drive][-entrys]
Enables Delete Tracking method of protection.
UNDELETE is licensed from Central Point Software PC Tools
UNFORMAT
External - DOS 5.0 and above
Restores a disk erased by the FORMAT command.
UNFORMAT drive: [/J]
UNFORMAT drive: [/U] [/L] [/TEST] [/P]
UNFORMAT /PARTN [/L]
drive:  Specifies the drive to unformat.
/J      Verifies that the mirror files agree with the system information
        on the disk.
/U
Unformats without using MIRROR files.
/L
Lists all file and directory names found, or, when used with the
/PARTN switch, displays current partition tables.
/TEST
Displays information but does not write changes to disk.
/P
Sends output messages to printer connected to LPT1.
/PARTN
Restores disk partition tables.
UNFORMAT is licensed from Central Point Software PC Tools
VER
Internal - DOS 2.0 and above
Displays the DOS version.
VER
The undocumented /R switch displays the revision level and where DOS is loaded (low, HMA or ROM) in DOS 5 and above.
Version returned:
• MS-DOS up to 6.22 typically derives the DOS version from the DOS kernel. This may be different from the string it prints when it starts.
• PC DOS typically derives the version from an internal string in COMMAND.COM (so PC DOS 6.1 COMMAND.COM reports the version as 6.10,
  although the kernel version is 6.00).
• DR-DOS reports whatever value the reserved environment variable VER holds.
VERIFY
Internal - DOS 2.0 and above
Tells DOS whether to verify that your files are written correctly to a disk.
VERIFY [ON | OFF]
Type VERIFY without a parameter to display the current VERIFY setting.
VOL
Internal - DOS 2.0 and above
Displays the disk volume label and serial number, if they exist.
VOL [drive:]
XCOPY
External - DOS 3.2 and above
Copy entire directory trees.
XCOPY [/Y|/-Y] source [destination] [/A|/M] [/D:date] [/P] [/S] [/E] [/V] [/W]
source
Specifies the file(s) to copy.
destination Specifies the location and/or name of new files.
/A
Copies files with the archive attribute set,
doesn't change the attribute.
/M
Copies files with the archive attribute set,
turns off the archive attribute.
/D:date
Copies files changed on or after the specified date.
/P
Prompts you before creating each destination file.
/S
Copies directories and subdirectories except empty ones.
/E
Copies any subdirectories, even if empty.
/V
Verifies each new file.
/W
Prompts you to press a key before copying.
/Y
Suppresses prompting to confirm you want to overwrite an
existing destination file.
/-Y
Causes prompting to confirm you want to overwrite an
existing destination file.
The switch /Y may be preset in the COPYCMD environment variable.
This may be overridden with /-Y on the command line
/Y and /-Y are only available in DOS 6 and above.
If the XCOPY program file is renamed to MCOPY under MS-DOS 3.2, the command will no longer ask for confirmation of whether the target is meant as a file or a
directory. Instead, it will automatically assume the target to be a directory if the source was given as a directory or as multiple files, or ended with "\".
Linux Commands
• apropos whatis
  Show commands pertinent to string. See also threadsafe
• man -t ascii | ps2pdf - > ascii.pdf
  Make a pdf of a manual page
• which command
  Show full path name of command
• time command
  See how long a command takes
• time cat
  Start stopwatch. Ctrl-d to stop. See also sw
dir navigation
• cd -
  Go to previous directory
• cd
  Go to $HOME directory
• (cd dir && command)
  Go to dir, execute command and return to current dir
• pushd .
  Put current dir on stack so you can popd back to it
file searching
• alias l='ls -l --color=auto'
  Quick dir listing
• ls -lrt
  List files by date. See also newest and find_mm_yyyy
• ls /usr/bin | pr -T9 -W$COLUMNS
  Print in 9 columns to width of terminal
• find -name '*.[ch]' | xargs grep -E 'expr'
  Search 'expr' in this dir and below. See also findrepo
• find -type f -print0 | xargs -r0 grep -F 'example'
  Search all regular files for 'example' in this dir and below
• find -maxdepth 1 -type f | xargs grep -F 'example'
  Search all regular files for 'example' in this dir
• find -maxdepth 1 -type d | while read dir; do echo $dir; echo cmd2; done
  Process each item with multiple commands (in while loop)
• find -type f ! -perm -444
  Find files not readable by all (useful for web site)
• find -type d ! -perm -111
  Find dirs not accessible by all (useful for web site)
• locate -r 'file[^/]*\.txt'
  Search cached index for names. This re is like glob *file*.txt
• look reference
  Quickly search (sorted) dictionary for prefix
• grep --color reference /usr/share/dict/words
  Highlight occurrences of regular expression in dictionary
archives and compression
gpg -c file
Encrypt file
gpg file.gpg
Decrypt file
tar -c dir/ | bzip2 > dir.tar.bz2
Make compressed archive of dir/
bzip2 -dc dir.tar.bz2 | tar -x
Extract archive (use gzip instead of bzip2 for tar.gz files)
tar -c dir/ | gzip | gpg -c | ssh user@remote 'dd of=dir.tar.gz.gpg'
Make encrypted archive of dir/ on remote machine
find dir/ -name '*.txt' | tar -c --files-from=- | bzip2 > dir_txt.tar.bz2
Make archive of subset of dir/ and below
find dir/ -name '*.txt' | xargs cp -a --target-directory=dir_txt/ --parents
Make copy of subset of dir/ and below
( tar -c /dir/to/copy ) | ( cd /where/to/ && tar -x -p )
Copy (with permissions) copy/ dir to /where/to/ dir
( cd /dir/to/copy && tar -c . ) | ( cd /where/to/ && tar -x -p )
Copy (with permissions) contents of copy/ dir to /where/to/
( tar -c /dir/to/copy ) | ssh -C user@remote 'cd /where/to/ && tar -x -p'
Copy (with permissions) copy/ dir to remote:/where/to/ dir
dd bs=1M if=/dev/sda | gzip | ssh user@remote 'dd of=sda.gz'
Backup harddisk to remote machine
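A minimal round trip, with dir/ and the archive name as placeholders:
tar -c dir/ | gzip > dir.tar.gz     # create a gzip-compressed archive
gzip -dc dir.tar.gz | tar -x        # extract it again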
rsync (Network efficient file copier: Use the --dry-run option for testing)
rsync -P rsync://rsync.server.com/path/to/file file
Only get diffs. Do multiple times for troublesome downloads
rsync --bwlimit=1000 fromfile tofile
Locally copy with rate limit. It's like nice for I/O
rsync -az -e ssh --delete ~/public_html/ remote.com:'~/public_html'
Mirror web site (using compression and encryption)
rsync -auz -e ssh remote:/dir/ . && rsync -auz -e ssh . remote:/dir/
Synchronize current directory with remote one
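As suggested above, a --dry-run pass previews what would change before the real transfer (remote.com is a placeholder host):
rsync -az --dry-run -e ssh --delete ~/public_html/ remote.com:'~/public_html'
rsync -az -e ssh --delete ~/public_html/ remote.com:'~/public_html'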
ssh (Secure SHell)
ssh $USER@$HOST command
Run command on $HOST as $USER (default command=shell)
• ssh -f -Y $USER@$HOSTNAME xeyes
Run GUI command on $HOSTNAME as $USER
scp -p -r $USER@$HOST: file dir/
Copy with permissions to $USER's home directory on $HOST
scp -c arcfour $USER@$LANHOST: bigfile
Use faster crypto for local LAN. This might saturate GigE
ssh -g -L 8080:localhost:80 root@$HOST
Forward connections to $HOSTNAME:8080 out to $HOST:80
ssh -R 1434:imap:143 root@$HOST
Forward connections from $HOST:1434 in to imap:143
ssh-copy-id $USER@$HOST
Install public key for $USER@$HOST for passwordless login
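A sketch of using the local port forward above ($HOST is a placeholder):
ssh -g -L 8080:localhost:80 root@$HOST     # start the tunnel and leave it running
wget -q -O- http://localhost:8080/ | head  # fetch $HOST's web page through the tunnel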
wget (multi purpose download tool)
• (cd dir/ && wget -nd -pHEKk http://www.pixelbeat.org/cmdline.html)
Store local browsable version of a page to the current dir
wget -c http://www.example.com/large.file
Continue downloading a partially downloaded file
wget -r -nd -np -l1 -A '*.jpg' http://www.example.com/dir/
Download a set of files to the current directory
wget ftp://remote/file[1-9].iso/
FTP supports globbing directly
• wget -q -O- http://www.pixelbeat.org/timeline.html | grep 'a href' | head
Process output directly
echo 'wget url' | at 01:00
Download url at 1AM to current dir
wget --limit-rate=20k url
Do a low priority download (limit to 20KB/s in this case)
wget -nv --spider --force-html -i bookmarks.html
Check links in a file
wget --mirror http://www.example.com/
Efficiently update a local copy of a site (handy from cron)
networking (Note ifconfig, route, mii-tool, nslookup commands are obsolete)
ethtool eth0
Show status of ethernet interface eth0
ethtool --change eth0 autoneg off speed 100 duplex full
Manually set ethernet interface speed
iwconfig eth1
Show status of wireless interface eth1
iwconfig eth1 rate 1Mb/s fixed
Manually set wireless interface speed
• iwlist scan
List wireless networks in range
• ip link show
List network interfaces
ip link set dev eth0 name wan
Rename interface eth0 to wan
ip link set dev eth0 up
Bring interface eth0 up (or down)
• ip addr show
List addresses for interfaces
ip addr add 1.2.3.4/24 brd + dev eth0
Add (or del) ip and mask (255.255.255.0)
• ip route show
List routing table
ip route add default via 1.2.3.254
Set default gateway to 1.2.3.254
• host pixelbeat.org
Lookup DNS ip address for name or vice versa
• hostname -i
Lookup local ip address (equivalent to host `hostname`)
• whois pixelbeat.org
Lookup whois info for hostname or ip address
• netstat -tupl
List internet services on a system
• netstat -tup
List active connections to/from system
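A sketch of a basic static setup using the ip entries above (addresses and interface name are placeholders, run as root):
ip link set dev eth0 up
ip addr add 1.2.3.4/24 brd + dev eth0
ip route add default via 1.2.3.254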
windows networking (Note samba is the package that provides all this windows specific networking support)
• smbtree
Find windows machines. See also findsmb
nmblookup -A 1.2.3.4
Find the windows (netbios) name associated with ip address
smbclient -L windows_box
List shares on windows machine or samba server
mount -t smbfs -o fmask=666,guest //windows_box/share /mnt/share
Mount a windows share
echo 'message' | smbclient -M windows_box
Send popup to windows machine (off by default in XP sp2)
text manipulation (Note sed uses stdin and stdout. Newer versions support in-place editing with the -i option)
sed 's/string1/string2/g'
Replace string1 with string2
sed 's/\(.*\)1/\12/g'
Modify anystring1 to anystring2
sed '/^ *#/d; /^ *$/d'
Remove comments and blank lines
sed ':a; /\\$/N; s/\\\n//; ta'
Concatenate lines with trailing \
sed 's/[ \t]*$//'
Remove trailing spaces from lines
sed 's/\([`"$\]\)/\\\1/g'
Escape shell metacharacters active within double quotes
• seq 10 | sed "s/^/      /; s/ *\(.\{7,\}\)/\1/"
Right align numbers
• seq 10 | sed p | paste - -
Duplicate a column
sed -n '1000{p;q}'
Print 1000th line
sed -n '10,20p;20q'
Print lines 10 to 20
sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q'
Extract title from HTML web page
sed -i 42d ~/.ssh/known_hosts
Delete a particular line
sort -t. -k1,1n -k2,2n -k3,3n -k4,4n
Sort IPV4 ip addresses
• echo 'Test' | tr '[:lower:]' '[:upper:]'
Case conversion
• tr -dc '[:print:]' < /dev/urandom
Filter non printable characters
• tr -s '[:blank:]' '\t' </proc/diskstats | cut -f4
cut fields separated by blanks
• history | wc -l
Count lines
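With a sed new enough to support -i, the first substitution above can be applied in place (file.txt is a placeholder; the .bak suffix keeps a backup copy):
sed -i.bak 's/string1/string2/g' file.txt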
set operations (Note you can export LANG=C for speed. Also these assume no duplicate lines within a file)
sort file1 file2 | uniq
Union of unsorted files
sort file1 file2 | uniq -d
Intersection of unsorted files
sort file1 file1 file2 | uniq -u
Difference of unsorted files (file2 - file1)
sort file1 file2 | uniq -u
Symmetric Difference of unsorted files
join -t'\0' -a1 -a2 file1 file2
Union of sorted files
join -t'\0' file1 file2
Intersection of sorted files
join -t'\0' -v2 file1 file2
Difference of sorted files
join -t'\0' -v1 -v2 file1 file2
Symmetric Difference of sorted files
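A small worked example of the unsorted-file operations (file contents chosen for illustration):
printf '%s\n' a b c > file1
printf '%s\n' b c d > file2
sort file1 file2 | uniq -d    # intersection: prints b and c
sort file1 file2 | uniq -u    # symmetric difference: prints a and d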
math
• echo '(1 + sqrt(5))/2' | bc -l
Quick math (Calculate φ). See also bc
• seq -f '4/%g' 1 2 99999 | paste -sd-+ | bc -l
Calculate π the unix way
• echo 'pad=20; min=64; (100*10^6)/((pad+min)*8)' | bc
More complex (int) e.g. This shows max FastE packet rate
• echo 'pad=20; min=64; print (100E6)/((pad+min)*8)' | python
Python handles scientific notation
• echo 'pad=20; plot [64:1518] (100*10**6)/((pad+x)*8)' | gnuplot -persist
Plot FastE packet rate vs packet size
• echo 'obase=16; ibase=10; 64206' | bc
Base conversion (decimal to hexadecimal)
• echo $((0x2dec))
Base conversion (hex to dec) (shell arithmetic expansion)
• units -t '100m/9.58s' 'miles/hour'
Unit conversion (metric to imperial)
• units -t '500GB' 'GiB'
Unit conversion (SI to IEC prefixes)
• units -t '1 googol'
Definition lookup
• seq 100 | (tr '\n' +; echo 0) | bc
Add a column of numbers. See also add and funcpy
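For instance, the two base-conversion entries above evaluate as follows:
echo 'obase=16; ibase=10; 64206' | bc    # prints FACE
echo $((0x2dec))                         # prints 11756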
calendar
• cal -3
Display a calendar
• cal 9 1752
Display a calendar for a particular month year
• date -d fri
What date is it this friday. See also day
• [ $(date -d '12:00 +1 day' +%d) = '01' ] || exit
exit a script unless it's the last day of the month
• date --date='25 Dec' +%A
What day does xmas fall on, this year
• date --date='@2147483647'
Convert seconds since the epoch (1970-01-01 UTC) to date
• TZ='America/Los_Angeles' date
What time is it on west coast of US (use tzselect to find TZ)
• date --date='TZ="America/Los_Angeles" 09:00 next Fri'
What's the local time for 9AM next Friday on west coast US
locales
• printf "%'d\n" 1234
Print number with thousands grouping appropriate to locale
• BLOCK_SIZE=\'1 ls -l
Use locale thousands grouping in ls. See also l
• echo "I live in `locale territory`"
Extract info from locale database
• LANG=en_IE.utf8 locale int_prefix
Lookup locale info for specific country. See also ccodes
• locale -kc $(locale | sed -n 's/\(LC_.\{4,\}\)=.*/\1/p') | less
List fields available in locale database
recode (Obsoletes iconv, dos2unix, unix2dos)
• recode -l | less
Show available conversions (aliases on each line)
recode windows-1252.. file_to_change.txt
Windows "ansi" to local charset (auto does CRLF conversion)
recode utf-8/CRLF.. file_to_change.txt
Windows utf8 to local charset
recode iso-8859-15..utf8 file_to_change.txt
Latin9 (western europe) to utf8
recode ../b64 < file.txt > file.b64
Base64 encode
recode /qp.. < file.qp > file.txt
Quoted printable decode
recode ..HTML < file.txt > file.html
Text to HTML
• recode -lf windows-1252 | grep euro
Lookup table of characters
• echo -n 0x80 | recode latin-9/x1..dump
Show what a code represents in latin-9 charmap
• echo -n 0x20AC | recode ucs-2/x2..latin-9/x
Show latin-9 encoding
• echo -n 0x20AC | recode ucs-2/x2..utf-8/x
Show utf-8 encoding
CDs
gzip < /dev/cdrom > cdrom.iso.gz
Save copy of data cdrom
mkisofs -V LABEL -r dir | gzip > cdrom.iso.gz
Create cdrom image from contents of dir
mount -o loop cdrom.iso /mnt/dir
Mount the cdrom image at /mnt/dir (read only)
cdrecord -v dev=/dev/cdrom blank=fast
Clear a CDRW
gzip -dc cdrom.iso.gz | cdrecord -v dev=/dev/cdrom -
Burn cdrom image (use dev=ATAPI -scanbus to confirm dev)
cdparanoia -B
Rip audio tracks from CD to wav files in current dir
cdrecord -v dev=/dev/cdrom -audio -pad *.wav
Make audio CD from all wavs in current dir (see also cdrdao)
oggenc --tracknum=$track track.cdda.wav -o track.ogg
Make ogg file from wav file
disk space (See also FSlint)
• ls -lSr
Show files by size, biggest last
• du -s * | sort -k1,1rn | head
Show top disk users in current dir. See also dutop
• du -hs /home/* | sort -k1,1h
Sort paths by easy to interpret disk usage
• df -h
Show free space on mounted filesystems
• df -i
Show free inodes on mounted filesystems
• fdisk -l
Show disks partitions sizes and types (run as root)
• rpm -q -a --qf '%10{SIZE}\t%{NAME}\n' | sort -k1,1n
List all packages by installed size (Bytes) on rpm distros
• dpkg-query -W -f='${Installed-Size;10}\t${Package}\n' | sort -k1,1n
List all packages by installed size (KBytes) on deb distros
• dd bs=1 seek=2TB if=/dev/null of=ext3.test
Create a large test file (taking no space). See also truncate
• > file
truncate data of file or create an empty file
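To see that the dd test file above is sparse, compare its apparent size with its actual disk usage:
dd bs=1 seek=2TB if=/dev/null of=ext3.test
ls -lh ext3.test    # apparent size: about 2TB
du -h ext3.test     # actual disk usage: close to zero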
monitoring/debugging
• tail -f /var/log/messages
Monitor messages in a log file
• strace -c ls >/dev/null
Summarise/profile system calls made by command
• strace -f -e open ls >/dev/null
List system calls made by command
• strace -f -e trace=write -e write=1,2 ls >/dev/null
Monitor what's written to stdout and stderr
• ltrace -f -e getenv ls >/dev/null
List library calls made by command
• lsof -p $$
List paths that process id has open
• lsof ~
List processes that have specified path open
• tcpdump not port 22
Show network traffic except ssh. See also tcpdump_not_me
• ps -e -o pid,args --forest
List processes in a hierarchy
• ps -e -o pcpu,cpu,nice,state,cputime,args --sort pcpu | sed '/^ 0.0 /d'
List processes by % cpu usage
• ps -e -orss=,args= | sort -b -k1,1n | pr -TW$COLUMNS
List processes by mem (KB) usage. See also ps_mem.py
• ps -C firefox-bin -L -o pid,tid,pcpu,state
List all threads for a particular process
• ps -p 1,$$ -o etime=
List elapsed wall time for particular process IDs
• last reboot
Show system reboot history
• free -m
Show amount of (remaining) RAM (-m displays in MB)
• watch -n.1 'cat /proc/interrupts'
Watch changeable data continuously
• udevadm monitor
Monitor udev events to help configure rules
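For example, the log monitor above can be filtered on the fly (the log file name varies by distribution, e.g. /var/log/syslog on Debian-based systems):
tail -f /var/log/messages | grep -i error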
system information (see also sysinfo) ('#' means root access is required)
• uname -a
Show kernel version and system architecture
• head -n1 /etc/issue
Show name and version of distribution
• cat /proc/partitions
Show all partitions registered on the system
• grep MemTotal /proc/meminfo
Show RAM total seen by the system
• grep "model name" /proc/cpuinfo
Show CPU(s) info
• lspci -tv
Show PCI info
• lsusb -tv
Show USB info
• mount | column -t
List mounted filesystems on the system (and align output)
• grep -F capacity: /proc/acpi/battery/BAT0/info
Show state of cells in laptop battery
# dmidecode -q | less
Display SMBIOS/DMI information
# smartctl -A /dev/sda | grep Power_On_Hours
How long has this disk (system) been powered on in total
# hdparm -i /dev/sda
Show info about disk sda
# hdparm -tT /dev/sda
Do a read speed test on disk sda
# badblocks -s /dev/sda
Test for unreadable blocks on disk sda
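A small variation on the cpuinfo entry above counts the logical CPUs seen by the kernel:
grep -c "model name" /proc/cpuinfo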
interactive
• readline
Line editor used by bash, python, bc, gnuplot, ...
• screen
Virtual terminals with detach capability, ...
• mc
Powerful file manager that can browse rpm, tar, ftp, ssh, ...
• gnuplot
Interactive/scriptable graphing
• links
Web browser
• xdg-open .
open a file or url with the registered desktop application