Class notes

Mathematics 1264: C/C++ programming, Hilary 2015
Colm Ó Dúnlaing
January 27, 2015
Note to student. The course notes will be published on the web, section by section: the exam
syllabus will be included some weeks before the end of the semester.
Plan of course (tentative schedule, subject to change)
• Week 1:
Hexadecimal numbers, machine code, assembler code, languages. C and C++.
Hello world in C and in C++
char, short, int, long, float, double, pointer, arrays, unsigned types. Sign extension.
• Week 2:
Programming assignment 1
Integer arithmetic
Variables
Assignment statements — day of week
Quiz 1
• Week 3:
Arrays and initialisation
Programming assignment 2
for-loops and output
While-loops
Command-line arguments
Input through scanf() and fgets() and >>
• Week 4:
Programming assignment 3
If-statements
Functions and subroutines
Simulating functions and subroutines
Quiz 2
• Week 5:
Programming assignment 4
Functions and subroutines continued: more about variables.
2-dimensional arrays
C string library
Pointers, malloc() and calloc().
• Week 6:
Programming assignment 5
Structured types in C and C++
Matrix example
Quiz 3
• Week 7 is Reading Week.
• Week 8:
Programming assignment 6
Classes in C++
C++ Standard Template Library (STL)
• Week 9:
Programming assignment 7
STL continued
Quiz 4
• Week 10:
Programming assignment 8
Matrix example again
St. Patrick’s Day
C++ armadillo linear algebra library
• Week 11:
Last Programming assignment 9?
Files
The cut-and-paste principle
Quiz 5
• Week 12:
Review
Good Friday
1 Hex numbers, machine code, assemblers, languages
1.1 Octal and hexadecimal numbers
Our decimal number system is derived from the human hand.
All computer data is stored as patterns of 0s and 1s. A ‘bit’ is a binary digit, i.e., 0 or 1, or an
object which can take these values. There is a multiplicative effect, so that 8 bits combined together
can take 2^8 = 256 different values.
A byte is a group of 8 bits.
The binary string 01001000 represents
0 + 0 × 2 + 0 × 2^2 + 1 × 2^3 + 0 × 2^4 + 0 × 2^5 + 1 × 2^6 + 0 × 2^7 = 8 + 64 = 72
(that is, 72 in decimal, of course).
It is easy to list the binary strings of length 3 in ascending order:
000, 001, 010, 011,
100, 101, 110, 111
The rightmost bit is called the ‘low-order bit.’ The ‘low order bit’ changes most often; the next bit
changes half as often; the high-order bit changes only once.
It is easy to convert a bitstring into an octal string. Simply put it in groups of 3, starting from the
right. Thus
    01001000
    01 001 000
     1   1   0
On the other hand, interpreting 110 as an octal string we get
0 + 1 × 8 + 1 × 8^2 = 72
(again, 72 in decimal).
It is a coincidence that all the octal digits in 01 001 000 are 0 or 1, so it ‘looks’ like a binary
number. To correct the ambiguity, one can use (...)_b to indicate ‘to base b’.
Then without ambiguity
(01001000)_2 = (110)_8 = (72)_10
Octal numbers give a compact way to represent bitstrings. So do hexadecimal numbers, which
are numbers to base 16. We need 16 hex digits to form hexadecimal numbers. One uses a,b,c,d,e,f
(or A,B,C,D,E,F) for the digits ≥ 10. Every hex digit equals four binary digits. The hex digits
convert to octal, binary, and decimal as follows
hex   octal   binary   decimal
0     0       0000     0
1     1       0001     1
2     2       0010     2
3     3       0011     3
4     4       0100     4
5     5       0101     5
6     6       0110     6
7     7       0111     7
8     10      1000     8
9     11      1001     9
a     12      1010     10
b     13      1011     11
c     14      1100     12
d     15      1101     13
e     16      1110     14
f     17      1111     15
There are procedures for addition, subtraction, multiplication, and division, in binary, octal, and
hex. Addition is easy. For example (each calculation is ‘staggered’ from right to left to show the
‘carries’).
binary
  10100011
 +11010101
 ---------
        10
       10
      10
      1
     1
    1
   1
 10
 ---------
That is
  10100011
 +11010101
 ---------
 101111000

octal
  243
 +325
 ----
   10
   7
  5
 ----
i.e.
  243
 +325
 ----
  570

hex
  a3
 +d5
 ---
   8
 17
 ---
i.e.
  a3
 +d5
 ---
 178

decimal
  163
 +213
 ----
  376
Multiplication and division in binary involve a fairly large number of very simple steps. Except
for trivial cases, they are always ‘long multiplication’ and ‘long division.’ We shall not attempt them
by hand.
1.2 Features of a computer
A computer has several components, including central memory, central processor, hard disc, and
terminal (or monitor).
Long-term data is on the hard disc; the central processor works on short-term data in the central
memory.
Here is a C program
#include <stdio.h>
main()
{
    printf("Hello\n");
    printf("there\n");
}
Create a file hello.c containing the above lines, then run
gcc hello.c
This will create a file a.out which the computer can run as a program. Typing
aturing% a.out
(the ‘aturing% ’ is a ‘command-line prompt’) will cause the message
Hello
there
to be written to the terminal.
Question: what’s the ’\n’ for?
a.out is in machine code. A computer accepts instructions in a very compact form called its
machine code. A machine program (also called a ‘binary’ or ‘executable’) is a list of instructions in
machine code. In the 1970s, with small microprocessors, it was common to write programs directly
in machine code. Nowadays, most machine-code programs have thousands of lines like the ones
below. Here is an example of machine code on an Intel computer (I think that this tabulates the
a.out file compiled from the above program, but it may be something different). The instructions
are given in hex.
Memory
Address     Machine instructions
-----------------------------------------------------------
00000210    69 6e 5f 75 73 65 64 00 5f 5f 6c 69 62 63 5f 73
00000220    74 61 72 74 5f 6d 61 69 6e 00 47 4c 49 42 43 5f
00000230    32 2e 30 00 00 00 02 00 02 00 01 00 00 00 00 00
00000240    01 00 01 00 24 00 00 00 10 00 00 00 00 00 00 00
00000250    10 69 69 0d 00 00 02 00 56 00 00 00 00 00 00 00
00000260    d8 95 04 08 06 05 00 00 d0 95 04 08 07 01 00 00
00000270    d4 95 04 08 07 02 00 00 55 89 e5 83 ec 08 e8 61
00000280    00 00 00 e8 c8 00 00 00 e8 f3 01 00 00 c9 c3 00
00000290    ff 35 c8 95 04 08 ff 25 cc 95 04 08 00 00 00 00
000002a0    ff 25 d0 95 04 08 68 00 00 00 00 e9 e0 ff ff ff
000002b0    ff 25 d4 95 04 08 68 08 00 00 00 e9 d0 ff ff ff
000002c0    31 ed 5e 89 e1 83 e4 f0 50 54 52 68 20 84 04 08
000002d0    68 c0 83 04 08 51 56 68 84 83 04 08 e8 bf ff ff
(1.1) Although letters on the terminal look like ordinary newsprint, say, under close inspection the
letters spelling Hello are just patterns of dots.
How are these letters stored on a computer? They could be stored as 7 × 5 patterns of 0s and
1s, where 0 means ‘no dot’ and 1 means ‘dot.’ This would require 35 bits per letter. Instead, all
characters are stored as 8-bit patterns under an internationally agreed code, the ASCII code. To learn
more, type
man ascii
ASCII code for H is 01001000, for e is 01100101, and so on.
Question. The ASCII code for H has octal value 110. The ASCII code for e is 01100101 as a
bitstring. What is it in octal? in decimal?
Figure 1 shows the basic computer components.
Conclusions. The computer stores all data as patterns of 0s and 1s, called bitstrings. All characters appear on the screen as patterns of dots.
Central memory, processor, hard disc. When you have edited and saved your program hello.c,
it is now stored on the hard disc (in ASCII, of course). It is data.
It is the processor which does the work of the computer. Its job is to read instructions from
central memory and execute them. The instructions are contained in executable programs.
When you type
gcc hello.c
the computer copies an executable program called gcc into central memory, then executes that program on the data contained in hello.c. It produces a new executable program which is usually
called a.out and stores it on disc.
When you type (on ‘jbell’, say)
%jbell a.out
the computer copies a.out into central memory and executes it, with the results as described.
[Figure 1: Parts of a computer. Hard disc, central memory, processor, and terminal. The bit
patterns shown in the figure are the ASCII codes for Hello and there, which appear on the terminal.]
2 Anatomy of a C program, and a C++ program
Here is a Hello, World program in C.
#include <stdio.h>
main()
{
    printf("Hello, World\n");
}
• The printf() statement prints the message on the terminal (screen, monitor). This action is
called output.
• The statement printf() is not ‘part’ of the C language; it is a separate routine whose general
properties are in the file stdio.h which is stored in some recognised place in the computer.
The #include statement is necessary; otherwise gcc will not recognise the printf()
statement.
The file stdio.h is called a ‘header file.’ Hence the suffix .h.
• The real business of the program is in the
main ()
{ .... }
Every C program must contain this — called the ‘main routine.’
• The C program should be stored in a file hello.c or something: the .c suffix shows it is a C
program.
• gcc hello.c produces an executable file a.out as already discussed.
And here is one in C++
#include <iostream>
using namespace std;
int main ()
{
    cout << "Hello, World" << endl;
    return 0;
}
• This time there is no printf() statement; the output is produced by cout << ... etcetera.
cout represents the terminal, and the important facts are stored in the file iostream (no .h).
The endl is ‘end-of-line.’ One can use "\n" as in C, or put the \n after ‘World.’
• using namespace std; has to be there (semicolon and all). It involves some complicated
ideas which we ignore for now.
• The main() routine is presented slightly differently, with the int and the return 0;. This
is not of much interest.
• This should be stored in a file like hello.cpp. The .cpp suffix indicates a C++ program.
To compile it, use g++ rather than gcc:
aturing% g++ hello.cpp
• The executable program will be in a.out, just as with the C program.
3 Various types of computer data
Machine instructions generally manipulate data stored in central memory. Data is organised as follows (there is some repetition here).
• The fundamental unit of data is a bit, something which can have two values, 0 or 1. Central
memory is a very large collection of bits, possibly billions.
• Before the 1970s central memory was composed of many (about a million) small doughnut-shaped magnets threaded together with copper wire and called magnetic core memory. Hence
the word ‘core’ used to mean central memory, and ‘core dump’ for a display of the contents
of central memory (usually following a program crash). Nowadays, billions of bits of memory
are stored on a single chip.
• Bits are never read singly. Memory is grouped into 8-bit units called bytes. Each byte then can
have 2^8 = 256 values. A byte then corresponds to a number in the range 0 (00000000) to 255
(11111111).
• The ASCII character set maps all printable characters, such as 0, a, &, *, to byte values. It also
covers nonprintable characters such as carriage return, backspace, ctrl-U, etcetera. (Ctrl-G is 07 in hex.
It should make a sound when pressed, or printed.)
• As far as I know, the smallest piece of data in C is a single byte, and the keyword is char
because of the ascii conventions. In other words, when you need to present data byte-by-byte
in a C program, you will use the word char.
• Next is short (short integer). In our system this appears to be two bytes with 65536 different
values.
In the 1990s the default integer length was 16 bits (short). Now that memory is much more
abundant, the default is 32 bits.
• Next is int (integer), 32 bits. The range is from −2147483648 to 2147483647. About ±2
billion.
• Next is long. On 32-bit machines this appears to be 4 bytes, on 64-bit machines this is 8
bytes.
• Memory addresses are important in C. There is no special keyword for ‘memory address’
(they are introduced in another way), but all memory addresses occupy 4 bytes or 8 bytes.
On 32-bit machines the range is 0 ... 2^32 − 1. The highest memory address is 4294967295, just
under 4 gigabytes.
• Next is float. In our system this appears to be a 4-byte representation of floating-point
numbers.
• Next is double. In our system this appears to be an 8-byte representation of floating-point
numbers.
The following program shows the size (number of bytes) in each data type. It uses features of C
which will not be introduced until late in the term.
#include <stdio.h>
main()
{
    /* sizeof yields an unsigned value; the (int) cast matches it to %d */
    printf("char %d bytes\n", (int) sizeof(char));
    printf("short %d bytes\n", (int) sizeof(short));
    printf("int %d bytes\n", (int) sizeof(int));
    printf("float %d bytes\n", (int) sizeof(float));
    printf("long %d bytes\n", (int) sizeof(long));
    printf("double %d bytes\n", (int) sizeof(double));
    /*
     * Working with addresses is an advanced topic.
     * Just to give a foretaste,
     * ‘short *’ means ‘address of a short integer’,
     * ‘int *’ means ‘address of an int’, and
     * so on. All these addresses are 4 or 8 bytes,
     * depending on the machine.
     */
    printf("address of char %d bytes\n", (int) sizeof(char * ));
    printf("address of short %d bytes\n", (int) sizeof(short * ));
    printf("address of int %d bytes\n", (int) sizeof(int * ));
    printf("address of float %d bytes\n", (int) sizeof(float * ));
    printf("address of long %d bytes\n", (int) sizeof(long * ));
    printf("address of double %d bytes\n", (int) sizeof(double * ));
}
Output when run on my 32-bit office PC:
char 1 bytes
short 2 bytes
int 4 bytes
float 4 bytes
long 4 bytes
double 8 bytes
address of char 4 bytes
address of short 4 bytes
address of int 4 bytes
address of float 4 bytes
address of long 4 bytes
address of double 4 bytes
Output when run on the 64-bit machine aturing:
char 1 bytes
short 2 bytes
int 4 bytes
float 4 bytes
long 8 bytes
double 8 bytes
address of char 8 bytes
address of short 8 bytes
address of int 8 bytes
address of float 8 bytes
address of long 8 bytes
address of double 8 bytes
Here are the internal representations of various numbers (Most of them need explanation)
char z:               7a 00 00 00 00 00 00 00 00 00
short 43:             2b 00 00 00 00 00 00 00 00 00
short -43:            d5 ff 00 00 00 00 00 00 00 00
short -9:             f7 ff 00 00 00 00 00 00 00 00
short -32768:         00 80 00 00 00 00 00 00 00 00
short 32767:          ff 7f 00 00 00 00 00 00 00 00
int -2:               fe ff ff ff 00 00 00 00 00 00
int 300:              2c 01 00 00 00 00 00 00 00 00
int -300:             d4 fe ff ff 00 00 00 00 00 00
int 70000:            70 11 01 00 00 00 00 00 00 00
int -70000:           90 ee fe ff 00 00 00 00 00 00
int -2147483648:      00 00 00 80 00 00 00 00 00 00
int 2147483647:       ff ff ff 7f 00 00 00 00 00 00
long -3:              fd ff ff ff 00 00 00 00 00 00
float 1234.560059:    ec 51 9a 44 00 00 00 00 00 00
double 1234.560000:   0a d7 a3 70 3d 4a 93 40 00 00
string hello:         68 65 6c 6c 6f 00 00 00 00 00
4 Integer arithmetic
(4.1) 2s complement. A short integer is 4 hex digits, 16 bits, or 2 bytes long so it can represent at
most 2^16 = 65536 different integers. We might expect it to take values 0 to 65535, but instead half of
the values are negative. The range of values is from −32768 to 32767.
Notice that 43 decimal is represented as 2b 00 hex. This shows that on our machines the first
byte is low-order, the second is high-order. It is said humorously that on Intel processors, numbers
are stored little-endian, meaning that the low-order byte (but not the low-order bit) is stored before
the high-order byte. We should preferably write it with high-order byte first:
00 2b
This represents 2 × 16 + 11 = 43, as expected.
Next, −9 is represented as f7 ff, or, high-order byte first, ff f7. Normally ff ff would
represent 2^16 − 1 and ff f7 would be 2^16 − 9. The general rules are as follows.
• Let N = 2^15. (The same idea holds for 32-bit and 64-bit integers, except that then
N = 2^31 = 2,147,483,648
N = 2^63 = 9,223,372,036,854,775,808
respectively.)
• An integer x is in short integer range if
−N ≤ x ≤ N − 1.
• If −N ≤ x ≤ N − 1, then the 2s-complement form of x is
    x        if 0 ≤ x ≤ N − 1
    2N + x   if −N ≤ x ≤ −1
Thus a 2s-complement short integer has ‘face value’ between 0000 and ffff (hexadecimal) or
0 and 65535 (decimal), and the signed integer it represents is in the range −32768 ... 32767.
• If x′ and y′ are 2s-complement integers, then their 2s-complement sum is
(x′ + y′) mod 2N
i.e., (x′ + y′) mod 65536.
• Modular arithmetic: x mod y is the remainder on dividing x by y. For example, 11 mod 4 = 3.
(4.2) Proposition Let x and y be two integers within the range of short integers, i.e., −32768 ≤
x, y ≤ 32767. If x + y is also in this range, then 2s-complement addition will produce the correct
answer in 2s-complement form.
Partial proof. If x and y are both nonnegative, then 0 ≤ x + y < 2^15 and there is no carry to the
16th bit.
If x and y are both negative, and x + y is in range, then 2^16 > 2^16 + x + y ≥ 2^15. But (2^16 + x) +
(2^16 + y) = 2^16 + (2^16 + x + y). The remainder modulo 2^16 is 2^16 + x + y, and it is between 2^15 and
2^16 − 1, which is correct.
Case one negative, the other nonnegative: skipped.
(4.3) Converting decimal to short. Positive numbers can be converted to hexadecimal by repeatedly
dividing by 16. For example, to convert 12345 to hex,
12345 ÷ 16 = 771 remainder 9, i.e., 12345 = 16 × 771 + 9
771 ÷ 16 = 48 remainder 3, i.e., 771 = 16 × 48 + 3
48 ÷ 16 = 3 remainder 0, i.e., 48 = 16 × 3 + 0
12345 = 16 × (16 × (16 × 3 + 0) + 3) + 9
12345 = 16^3 × 3 + 16^2 × 0 + 16 × 3 + 9
(12345)_10 = (3039)_16
To convert a negative integer x to 2s-complement, first convert |x| to hex, then subtract from
ffff, then add 1. This is the same as subtracting from 2^16, as required.
For example, to convert −12345 to 2s-complement short integer,
(12345)_10 = (3039)_16
ffff − 3039 = cfc6
cfc6 + 1 = cfc7
Little endian: c7 cf
Negatives. If x is in short integer range, and so is −x, and the short integer representation of x is
y, then −x is represented as (2^16 − y) mod 2^16, whether x is positive or negative.
The only case where x is in range, but not −x, is x = −32768, where y = 2^16 − y = 32768.
(4.4) Floating point numbers are the computerese version of high-precision decimal numbers.
They can be broken down into exponent e and mantissa m (both integers) and represent m × 2^e.
Further details later.
(4.5) To distinguish between hex and decimal, we write, for example, (23)_16 = (35)_10.
5 Variables
An integer variable in C or C++ is a named item stored as an integer. It must be declared. Its value
can be altered through assignment statements.
#include <stdio.h>
main()
{
    int x,y;
    x = 1; y = 2;
    printf("x is %d, y is %d, x+y is %d\n", x, y, x+y);
}
This example shows two integer variables. To output any information about them, you need the
formatting in the printf() statement. It is easy to guess what it prints.
The string
"x is %d, y is %d, x+y is %d\n"
is called a format string. The values of x, y, x + y are inserted into the three places where %d occurs.
There are two variants of the %d format.
• %8d will insert the number in a field at least 8 characters wide, right-justified and padded on
the left with blanks. If the number is already at least 8 digits long, the 8 in %8d has no effect.
• %08d is like %8d, but it pads with zeroes, not blanks.
6 For loops
Here is a simple program.
#include <stdio.h>
main()
{
    int i;
    for ( i=0; i<5; i = i+1 )
        printf( "hello\n");
}
The output is
hello
hello
hello
hello
hello
• Every C program must contain one section main(){ ...}.
• This program uses one variable, an integer i.
• printf() prints to the terminal. It is essential, but it is not part of the C language proper.
The line
#include <stdio.h>
tells gcc that there is a file (somewhere) called stdio.h which needs to be included. It helps
explain the printf() statement.
• The text "hello\n" is called a character-string constant. It includes the newline (or carriage-return) character \n.
• The statement
i = i+1;
means replace the variable i (stored somewhere in central memory) by the new value i+1.
There is a shorter way to write this:
++i;
This abbreviation should be used with care: it is more complicated than it looks.
• The for (...) ... statement is called a for-loop. It operates as follows.
• i is set to 0, then compared to 5.
• 0 < 5, so the statement printf("hello\n"); is executed.
• i is incremented to 1, and again compared to 5.
• 1 < 5, so the print statement is executed.
• And so on, with i = 0, 1, 2, 3, 4. Then i is incremented to 5, 5 is not < 5, so the loop
terminates and the program terminates.
TEMPLATE for a for-loop
for ( <initial action> ; <while condition holds true> ; <between-step action> )
    <statement> ;
OR
for ( <initial action> ; <while condition holds true> ; <between-step action> )
{
    <group of statements>
}
Indentation. A group of statements should be indented further than the curly braces, which
should be level with the ‘for.’ A single statement should be indented further than the ‘for.’ (Indentation makes it easier to understand the program structure.)
We can have a single statement
printf("hello\n");
or a group of statements, each terminated by semicolon, and the group between braces — see
below.
BEST PRACTICE. It is better to group statements between braces, even when there is only one.
Semicolons. There must be a semicolon after each statement, including the last in a group.
While condition is true: Only the condition is given, such as i<7.
The symbol < means ‘less than,’ of course. Other relations include
Mathematical form    C form
≤                    <=
=                    ==
≥                    >=
>                    >
≠                    !=
Here is another example.
#include <stdio.h>
main()
{
    int i, j;
    for ( i=0; i<5; ++i )
    {
        for ( j=0; j<i; ++j )
            printf ( " " );
        printf( "hello\n");
    }
}
The output is
hello
 hello
  hello
   hello
    hello
Printing strings and integers. The general printf statement has the form
printf ( <format control string>, item_1, ... item_n );
The minimal possibility is where no items are printed, just the format string. This was used in
printf("hello\n");.
More generally, the items are matched with parts of the control string to produce a formatted
output. (Hence the f, for formatted, in printf.) If the item is a character string like "hello\n",
it should be matched by %s. If it is an integer (or a short integer), it should be matched by %d.
For example, the following code prints out a multiplication table.
#include <stdio.h>
main()
{
    int n, i;
    n = 7;
    printf("%d times table\n\n", n);
    for ( i=0; i<10; ++i )
        printf("%d times %d is %d\n", n, i, n*i );
}
Note: n*i means n × i.
The output is
7 times table

7 times 0 is 0
7 times 1 is 7
7 times 2 is 14
7 times 3 is 21
7 times 4 is 28
7 times 5 is 35
7 times 6 is 42
7 times 7 is 49
7 times 8 is 56
7 times 9 is 63
It would look better if the 0 and 7 in the first two lines were aligned with the right-hand ends of
the lines below them. This can be done with the statement
printf("%d times %d is %2d\n", n, i, n*i );
The general %d-format rules are
• %d causes an integer (or short integer) value to be converted to the shortest possible ASCII
string and printed.
• %5d causes an integer to be converted to an ASCII string of length ≥ 5 and printed. If (counting
digits and possible minus sign) there are < 5 ASCII characters, it is padded on the left with
blanks.
There are two more variations.
• %07d causes an integer to be converted to an ASCII string of length ≥ 7 and printed. If
necessary, it is padded with zeroes on the left. If negative, the zeroes come after the minus
sign, of course.
• %-07d is like %07d except padding is with blanks on the right.
The minus sign is about alignment, and has nothing to do with the fact that numerical data is
being printed. Note that it cancels out the zero-padding!
• Summary. Integers are formatted in printf statements by including the following in the
format control string. Special notation: the square brackets are for optional items. The
angle brackets are for descriptions.
%[-][0][<minimum width>]d
Examples:
%d
%3d
%-3d
%010d
%-10d
Rules for formatting a character string are simpler. Strings are formatted in printf statements
by
%[-][<minimum width>]s
Examples:
%s
%10s
%-10s
7 Printf formats tabulated, first draft
%d      signed decimal output
%8d     right justify with blanks
%08d    right justify with zeroes
%-8d    left justify
%c      (ascii) character
%s      string
%8s     right justify
%-8s    left justify
%x      hexadecimal
%o      octal
8 Assignment statements with modular arithmetic
Arithmetic assignment statements use the following operators (and more):
+   -   *   /   %
• ‘*’ stands for multiplication, of course
• ‘/’, when applied to integers, means integer division, that is, it is rounded to an integer
• ‘%’ means integer remainder on division, and is a variant of the mathematical ‘remainder modulo’ operator.
• When m and n are both positive, then m/n is the quotient (rounded down) and m%n is the
remainder, which is nonnegative.
• When m is negative and n positive, then m/n is rounded up (towards zero) and m%n is negative
or zero, which is not the same as the mathematical form.
• One needs to allow for this. For example, assuming n is positive and m is nonnegative,
(m − 1) mod n should be converted to
(m+n-1)%n
• Division and remainder when n is negative — whatever the rules are, they are not worth remembering; there is no reason one should ever want to perform integer division by a negative
number.
Example. Given a date in the form dd mm yy, where these three numbers are positive integers
in the correct ranges, and it is understood that the date is in this century, then the following expression
gives a number between 0 and 6 where 0 means Sunday and 6 means Saturday:
( yy + ⌊yy/4⌋ + offset[mm − 1] + dd + C ) mod 7
where offset[] is an integer array. This hasn’t been introduced yet, but its usage is rather intuitive.
int offset[12] = {0,3,3,6,1,4,6,2,5,0,3,5};
and C is a correction for leap years: in January and February of a leap year, subtract 1, because the
extra day only ‘kicks in’ in March.
Allowing some leeway with integer arrays, you know enough now to convert the above formula to
C or C++ code, except for the ‘correction term’ C, which requires an if-statement, not yet introduced.
9 Command-line arguments
To be able to supply your program with command-line arguments, add a bit to your main() section.
First, more notation about characters and character strings.
• A single character (as opposed to a ‘string’ of characters) is represented with a single quote,
such as
’a’, ’A’, ’\n’, ’\0’
• The null character is represented as ’\0’, 8 zero-bits or 00000000
• A character string is an array of characters, terminated with a null character. For example,
"hello\n"
is stored as an array of seven characters, including the final null character.
• For technical reasons, a character string may be declared using a * rather than a [] notation,
e.g.,
char * x
#include <stdio.h>
main ( int argc, char * argv[] )
{
    int i;
    for (i=0; i<argc; ++i)
        printf ( "%s\n", argv[i] );
}
If one compiles this program and types
a.out a quick brown fox
the four character strings "a", "quick", "brown", and "fox" are called command-line arguments. The result is
a.out
a
quick
brown
fox
Partial explanation. You can, as shown, use argc as you would use an integer variable. It
means the number of character strings on the ‘command line,’ including the a.out. The minimum
value is 1.
The variable argv is an array of character strings. Its size is not given, but argv[i] is the i-th
command argument, valid for i between 0 and argc-1.
The command-line arguments are character strings, but they can be converted to integers, etcetera,
through another #include:
#include <stdlib.h>
If x is a character string, then
atoi (x)
is the integer value of x. If x does not represent an integer then atoi(x) is just zero.
For example,
#include <stdio.h>
#include <stdlib.h>
main ( int argc, char * argv[] )
{
    int dd, mm, yy;
    dd = atoi ( argv[1] );
    mm = atoi ( argv[2] );
    yy = atoi ( argv[3] );
    printf ("Date is %02d/%02d/%02d\n", dd, mm, yy);
}
10 If-statements
(10.1) Conditions and if-statements. An if-statement has the form (mind the INDENTATION)
if ( <condition> )
    <statement; or {group}>
and --- optionally ---
else
    <statement; or {group}>
The condition must be in parentheses.
if ( <condition> ) . . .
Programming languages usually use the word ‘then.’ C doesn’t. The condition is in parentheses
and ‘then’ is understood.
Statement or group of statements? It is best practice to use curly brackets always, as otherwise
one gets into a mess. (If I forget to do so, remind me.)
if ( x == 1 )
{
    printf ("hello\n");
}
else
{
    printf ("goodbye\n");
}
Conditions are converted to integers. When a condition such as argc == 2 is tested, an
integer is produced: 1 for true and 0 for false. More generally, any integer value can be used as a
condition; nonzero is treated as true and zero as false.
Complex if-statements. The basic ‘if-statement’ relations are
==, <, <=, >, >=, !=
They can be grouped into more complex statements using
&& for ‘and,’
|| for ‘or,’ and
! for ‘not.’
For example, to test if a 4-digit year is a leap-year,
if ( yy % 400 == 0 || ( yy % 4 == 0 && yy % 100 != 0 ) )
Every fourth year is a leap year, except for centuries; every fourth century is a leap year.
More complex conditions can be constructed with
&& || !
for and, or, not. The DOUBLE ampersand and double bar are important; single ampersand and
single bar have a different meaning.
For example, suppose yy represents a year, including the century, not just the last two digits.
According to the Gregorian calendar, a leap year is
• divisible by 4, and
• either is not divisible by 100 or is divisible by 400.
Meaning that only one century in 4 is a leap-year; so on average the year is
365 + 97/400 = 365.2425
days long, apparently a good approximation.
This can be expressed in C:
... int leapyear, yy; ....
leapyear =
    yy % 4 == 0
    &&
    ( yy % 100 != 0 || yy % 400 == 0 )
    ;
if ( leapyear ) ....
There are rules about the order of evaluation in the expression
yy % 4 == 0 &&
( yy % 100 != 0 || yy % 400 == 0 )
To be really sure, you can fully parenthesise the expression, getting
(yy % 4 == 0) &&
( (yy % 100 != 0) || (yy % 400 == 0) )
The rules about order of evaluation are hard to remember in full. Better safe than
sorry.