General Information & Communication Technology
350101 GenICT I & II 2015: Partial Lecture Notes
Michael Kohlhase
School of Engineering & Science, Jacobs University, Bremen, Germany
[email protected]
March 24, 2015

Preface

This Document
This document contains the course notes for those parts of the course General Information & Communication Technology I & II held at Jacobs University Bremen in the academic year 2014.
Contents: The document mixes the slides presented in class with comments of the instructor to give students a more complete background reference.
Caveat: This document is made available for the students of this course only. It is still a draft and will develop over the course of the current course and in coming academic years.
Licensing: This document is licensed under a Creative Commons license that requires attribution, allows commercial use, and allows derivative works as long as these are licensed under the same license.

Course Concept
Aims: The course 350101 "General Information & Communication Technology I/II" (GenICT) is a two-semester course that introduces Computer Science concepts to non-CS students. The course is co-taught by four Jacobs Computer Science faculty members, each covering a quarter of the material.

Course Contents
Goal: We want to present the theoretical foundations of Computer Science, and we want to provide practical knowledge that helps students understand and handle computers, electronic documents and data, and the Web. Roughly the first half of the first semester is devoted to theoretical foundations and core concepts (Kohlhase and Jaeger), and the second half of the semester to practical, real-world topics (Schönwälder and Baumann). Throughout the semester, students will be introduced step by step to one of the main programming languages of today, Python.

Acknowledgments
Materials: The presentation of the programming language Python uses materials prepared by Dr. Heinrich Stamerjohanns and Dr. Florian Rabe for the ESM Python modules.
GenICT Students: The following students have submitted corrections and suggestions to this and earlier versions of the notes: Kim Philipp Jablonski, Tom Wiesing.

Contents
Preface
  This Document
  Course Concept
  Course Contents
  Acknowledgments
I GenICT 2: Dependable and Secure Software
1 Introduction to Dependability and Security
2 Software Errors
3 Software Testing
  3.1 Software Testing Introduction
  3.2 Functional (Black-Box) Testing
    3.2.1 Unit Testing
    3.2.2 Integration Testing
  3.3 Structural (White-Box) Testing
4 Software Maintenance
  4.1 Motivation
  4.2 Revision Control Systems
    4.2.1 Introduction/Motivation
    4.2.2 Centralized Version Control
    4.2.3 Distributed Revision Control
  4.3 Bug/Issue Tracking Systems
5 Security by Encryption
  5.1 Introduction to Crypto-Systems
  5.2 Public Key Encryption
  5.3 Internet Security by Encryption

Part I GenICT 2: Dependable and Secure Software

Chapter 1 Introduction to Dependability and Security

Dependable & Secure Software
Definition 1.0.1 (Dependability) A system is called dependable if it can maintain its working capacity (work as specified) for a certain time or until the completion of a specified amount of work under given service conditions without forced interruptions.
Definition 1.0.2 (Security) Information security, sometimes shortened to InfoSec, is the practice of defending information from unauthorized access, use, disclosure, disruption, modification, perusal, inspection, recording or destruction.
Observation: We want our software systems to be dependable and secure (D&S).
This module: But how can we ensure this? (And what are the consequences if we do not?)
©: Michael Kohlhase

D&S Terminology/Taxonomy
Dependability & security of systems is analyzed in terms of
- attributes: what do we want to achieve,
- threats: what can go wrong,
- means: what can we do about that.
These are for general systems (we adapt them to software systems).
[Figure: D&S taxonomy tree. Attributes: availability, reliability, safety, confidentiality, integrity, maintainability. Threats: fault/bug, error, failure. Means: prevention, removal, tolerance, forecasting.]

Dependability & Security Attributes
Definition 1.0.3 D&S attributes are qualities of a system that affect its overall D&S. We have
- availability: readiness for correct service,
- safety: absence of catastrophic consequences on the user(s) and the environment,
- integrity: absence of improper system alterations,
- confidentiality: absence of unauthorized disclosure of information,
- reliability: continuity of correct service,
- maintainability: ability to undergo modifications and repairs.
Availability, reliability, safety, integrity and maintainability contribute to dependability; confidentiality, integrity and availability contribute to security.
We will concentrate on "correct service" in this module.

Dependability & Security Threats
Definition 1.0.4 D&S threats are things that can affect a system and cause a drop in its D&S attributes.
- A fault (also called a bug) is a defect in a system.
- An error is a discrepancy between the intended behavior of a system and its actual behavior inside the system boundary.
- A failure is an instance in time when a system displays behavior that is contrary to its specification.
Observation: The presence of a fault in a system may (or may not) lead to a failure.
Example 1.0.5 Input and state conditions may never cause this fault to be executed ⇝ no error ⇝ no failure.
Observation: Errors occur at runtime when the system enters an unexpected state due to the activation of a fault. (We need debuggers or logs to find them.)
Observation: An error may not necessarily cause a failure.
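The fault-error-failure chain can be made concrete with a small Python sketch. The functions average, report and tolerant_report are hypothetical, made up for illustration: average contains a fault that is only activated by an empty list, and whether the resulting error turns into a failure depends on whether it is handled.

```python
def average(values):
    """Hypothetical function with a fault: it does not handle an empty list."""
    return sum(values) / len(values)  # fault: len(values) may be 0

def report(values):
    """No fault tolerance: the error (an exception) propagates to the caller."""
    return "average: {}".format(average(values))

def tolerant_report(values):
    """Fault tolerance: the error is caught and handled, so no failure occurs."""
    try:
        return "average: {}".format(average(values))
    except ZeroDivisionError:
        return "average: n/a (no values)"

print(report([2, 4]))       # fault present but not activated: no error, no failure
print(tolerant_report([]))  # error occurs, but is handled: no failure
try:
    report([])              # error occurs and report does not handle it: failure
except ZeroDivisionError as error:
    print("failure:", error)
```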
Example 1.0.6 An exception may be thrown by a system, but it may be caught and handled using fault tolerance techniques. (The system works correctly.)

The first "Bug": 1945
Moth found trapped between points at Relay #70, Panel F, of the Mark II Aiken Relay Calculator while it was being tested at Harvard University, September 9, 1945.

Dependability & Security Means
Definition 1.0.7 D&S means are measures that break the fault-error-failure chain to increase the D&S of a system. They include
- prevention: stop faults from being incorporated into a system,
- removal: remove faults from a system (during development & use),
- forecasting: predict likely faults so that they can be removed or their effects can be circumvented,
- tolerance: mechanisms that allow a system to still deliver the required service in the presence of faults.
For this module we will concentrate on fault removal:
- testing as a means for finding bugs during development,
- verification as a means for showing the absence of (certain) faults,
- bug/issue tracking as a means for reporting faults during use.
For the D&S attributes confidentiality and integrity: encryption as a means for ensuring confidentiality and authentication.

Chapter 2 Software Errors

Myths about Software Bugs
- Benign Bug Hypothesis: Bugs are nice, tame, and logical.
- Bug Locality Hypothesis: A bug discovered within a component affects only that component's behavior.
- Control Bug Dominance: Most bugs are in the control structure of programs.
- Corrections Abide: A corrected bug remains correct.
- Silver Bullets: A language, design method, or environment grants immunity from bugs.
- Sadism Suffices: All bugs can be caught using low cunning and intuition.

Sources of Software Errors
- Requirements Definition: erroneous, incomplete, inconsistent requirements.
- Design: fundamental design flaws in the software.
- Implementation: mistakes in chip fabrication, wiring, programming faults, malicious code.
- Support Systems: poor programming languages, faulty compilers and debuggers, misleading development tools.
- Inadequate Testing of Software: incomplete testing, poor verification, mistakes in debugging.
- Evolution: sloppy redevelopment or maintenance, introduction of new flaws in attempts to fix old flaws, incremental escalation to inordinate complexity.

Effects of Software Bugs (Examples)
Military Aviation Problems
- An F-18 crashed because of a missing exception condition: an if ... then ... without the else clause that was thought could not possibly arise.
- In simulation, an F-16 program bug caused the virtual plane to flip over whenever it crossed the equator, as a result of a missing minus sign to indicate south latitude.
Year Ambiguities
- In 1992, Mary Bandar received an invitation to attend a kindergarten in Winona, Minnesota, along with others born in '88. (Mary was 104 years old.)
- Mr. Blodgett's auto insurance rate tripled when he turned 101 (he was the first driver over 100): his age was interpreted as 1. (For the program, a teenager is someone under 20!)
Dates, Times, and Integers (32,768 = 2^15 overflows signed 16-bit words)
- A Washington D.C. hospital computer system collapsed on September 19, 1989, 2^15 days after January 1, 1900, forcing a lengthy period of manual operation.
- COBOL uses a two-character date field . . .
- The Linux term program died worldwide on October 26, 1993.
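The arithmetic behind these date examples can be checked with a short Python sketch. It assumes, as the hospital example suggests but does not state, that the system counted days since January 1, 1900 in a signed 16-bit integer. The function two_digit_age is a hypothetical reconstruction of an age computation that only keeps two-digit years, not the actual code of any of the systems mentioned; Mary's birth year 1888 is inferred from "born in '88" and "104 years old in 1992".

```python
from datetime import date

def to_int16(n):
    """Interpret n as a signed 16-bit two's-complement integer (wrap-around)."""
    n &= 0xFFFF                       # keep only the low 16 bits
    return n - 0x10000 if n >= 0x8000 else n

def two_digit_age(birth_year, current_year):
    """Hypothetical faulty age computation that only keeps two-digit years."""
    return current_year % 100 - birth_year % 100

# Days from January 1, 1900 to the day the hospital system collapsed
days = (date(1989, 9, 19) - date(1900, 1, 1)).days
print(days, days == 2**15)            # 32768 True
print(to_int16(days))                 # -32768: a signed 16-bit day counter wraps around
print(two_digit_age(1888, 1992))      # 4: a 104-year-old looks like a kindergartner
```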
Shaky Math
A program fault in an earthquake simulation program ⇝ 5 US nuclear power plants were shut down in 1979 (the fault was discovered after the power plants were built).
Problem: the program used a sum instead of a sum of absolute values (so the plants were too weak for a large quake).

Therac-25 Radiation "Therapy"
- In Texas, 1986, a man received between 16,500 and 25,000 rads in less than 1 sec, over an area of about 1 cm. (He lost his arm and later died.)
- In Texas, 1986, a man received at least 4,000 rads to the brain. (He died.)
- In Washington, 1987, a patient received 8,000-10,000 rads instead of the prescribed 86 rads. (The patient died.)

Bank Generosity
- A Norwegian bank ATM dispensed 10× the requested amount. (Long lines, great joy.)
- A software flaw caused a UK bank to duplicate every transfer payment request for half an hour. (Initial loss: 2 × 10^9 £; after recovery: 5 × 10^6 £.)

Making Rupee!
An Australian man purchased 104,500 $ worth of Sri Lankan rupees. The first bank's software had displayed a bogus exchange rate in the rupee position! A judge ruled that the man had acted without intended fraud and could keep the extra 335,758 $. The next day he sold the rupees to another bank for 440,258 $.

Bug in BoNY Software
The Bank of New York (BoNY) had a 3.2 × 10^10 $ overdraft as the result of a 16-bit integer counter that went unchecked. BoNY was stuck, while the NY Federal Reserve debited BoNY's cash account. BoNY had to borrow 2.4 × 10^10 $ to cover itself for 1 day until a fix; the bug cost BoNY 5 × 10^6 $ in interest payments.

Chapter 3 Software Testing

3.1 Software Testing Introduction

Software Testing (Intro)
Definition 3.1.1 Software testing is the process of reviewing or exercising a program with the specific intent of finding errors prior to delivery to the end user.
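The phrase "with the specific intent of finding errors" matters: test cases should be chosen to break the program, not to confirm it. A much simplified, hypothetical sketch of the "sum instead of sum of absolute values" fault above (not the actual simulation code, which we do not have) shows how a friendly test case misses the fault while an error-seeking one exposes it:

```python
def combined_load_faulty(components):
    """Hypothetical faulty version: opposite-signed components cancel out."""
    return sum(components)

def combined_load(components):
    """Intended version: sum of absolute values (magnitudes)."""
    return sum(abs(c) for c in components)

def passes(f, components, expected):
    """A single test case: does f produce the expected result on this input?"""
    return f(components) == expected

# A "friendly" test case with only positive components cannot tell the versions apart ...
print(passes(combined_load_faulty, [3.0, 2.0], 5.0))   # True
# ... a test case chosen to find errors (mixed signs) exposes the fault
print(passes(combined_load_faulty, [3.0, -2.0], 5.0))  # False (the result is 1.0)
print(passes(combined_load, [3.0, -2.0], 5.0))         # True
```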
Test Feature Space
Testing is a complex and multi-faceted area (we will cover some of it).
[Figure: the test feature space, spanned by test level (unit, integration, system, acceptance, regression, …), the quality tested (correctness, reliability, robustness, safety, security, performance, usability, accessibility, maintainability, portability, interoperability, …), box type (white box, grey box, black box) and degree of automation (manual, semi-automatic, automatic). Source: 320312 Software Engineering (P. Baumann).]

The Significance of Testing
- Testing is the most widely used activity for ensuring that software systems satisfy the specified requirements.
- It consumes substantial project resources. Some estimates: ∼50% of development costs.
- NIST Study 2002: The annual cost of inadequate testing in the US can be as much as 59 billion US dollars.

Limitations of Testing
- Testing cannot occur until after the code is written.
- The problem is big! Testing is perhaps the least understood major software engineering activity.
- Exhaustive testing is not practical even for the simplest programs. WHY? (A function with just two 32-bit integer arguments already has 2^64 ≈ 1.8 × 10^19 possible inputs.)
- Even if we "exhaustively" test all execution paths of a program, we cannot guarantee its correctness. The best we can do is increase our confidence!
- "Testing can show the presence of bugs, not their absence." (Edsger W. Dijkstra)
- Testers do not have immunity to bugs.
- Even the slightest modifications after a program has been tested invalidate (some or even all of) our previous testing efforts.
- Automation is critically important. Unfortunately, there are only a few good tools, and in general, effective use of these good tools is very limited.

Testing Methods (General)
Definition 3.1.2 Software testing by inspecting the code (automatically or manually) is called static testing.
Definition 3.1.3 Software testing by executing code on a given set of test cases is referred to as dynamic testing.
Definition 3.1.4 Black-box testing (or functional testing) treats the software as a "black box", examining functionality without any knowledge of internal implementation.
Definition 3.1.5 White-box testing (or structural testing) is a software testing method that takes internal structures or workings of an application into account.

Specification-based Testing
Definition 3.1.6 Specification-based testing aims to test the functionality of software according to the applicable requirements, usually given as a test suite.
Definition 3.1.7 A test suite is a set of test cases: sets of inputs, execution conditions and expected results for a particular test objective. A test case is the smallest entity that is always executed as a unit, from beginning to end.
A test suite can be seen as a form of specification.
Definition 3.1.8 A software requirements specification (SRS, or just specification) is a description of the structure and behavior of a software system, laying out functional and non-functional requirements. An SRS may include a set of use/test cases that describe interactions the users will have with the software.

Testing Levels (in the Project Workflow)
Testing occurs at different levels (and should be integrated into the design process).
[Figure: Testing & the design cycle. Each stage of the project workflow is paired with a level of dynamic testing: what users really need ↔ acceptance testing, requirements ↔ system testing, design ↔ integration testing, code ↔ unit testing. Source: 320312 Software Engineering (P. Baumann).]

3.2 Functional (Black-Box) Testing

3.2.1 Unit Testing

Unit Testing
Definition 3.2.1 Unit testing is a specification-based testing method that specifically tests a single "unit" of code in isolation. A unit can be an entire module, a single class or function, or almost anything in between, as long as the code is isolated from other code not under test (which itself could have errors and would thus confuse test results).
Unit testing is usually supported via a test harness that automates running test cases (e.g. upon save); most programming languages have frameworks for unit testing nowadays.
Benefits of unit testing:
- tests as specification: write tests before coding; tests pass ⇝ code complete,
- regression testing: run unit tests after every change,
- tests as documentation: test cases document what is critical about the unit,
- simplified integration testing: rely on thoroughly tested units.

Unit Testing in Python (after [Knu])
Python has a unit testing framework: unittest (in the standard library).
Running Example: file prime1.py

    def is_prime(number):
        """Return True if *number* is prime."""
        for element in range(number):
            if number % element == 0:
                return False
        return True

    def print_next_prime(number):
        """Print the closest prime number larger than *number*."""
        index = number
        while True:
            index += 1
            if is_prime(index):
                print(index)
                return  # stop after the first prime found

A first Unit Test
A first unit test for prime1.py in file test_prime1.py:

    import unittest
    from prime1 import is_prime

    class PrimesTestCase(unittest.TestCase):
        """Tests for `prime1.py`."""

        def test_is_five_prime(self):
            """Is five successfully determined to be prime?"""
            self.assertTrue(is_prime(5))

    if __name__ == '__main__':
        unittest.main()

This is a unit test with a single test case: test_is_five_prime.
In unittest, any method whose name starts with test in a class derived from unittest.TestCase is a unit test case. The test cases are run and their assertions checked by unittest.main().
Run this by python test_primes.py and obtain

    $ python test_primes.py
    E
    ======================================================================
    ERROR: test_is_five_prime (__main__.PrimesTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_primes.py", line 8, in test_is_five_prime
        self.assertTrue(is_prime(5))
      File "/home/jknupp/code/github_code/blug_private/primes.py", line 4, in is_prime
        if number % element == 0:
    ZeroDivisionError: integer division or modulo by zero
    ----------------------------------------------------------------------
    Ran 1 test in 0.000s

The E is the result of running the single test case: unittest prints . for a test that passes, F for a failed assertion, and E for an error (an unexpected exception).
The error message points to the line and the problem: Python's range(number) starts at zero, so the first iteration computes number % 0 ⇝ division by zero.
Here the error occurs inside is_prime, before the test can even check its assertion!
Fix line 3 to

    for element in range(2, number):

and test again:

    $ python test_primes.py
    .
    ----------------------------------------------------------------------
    Ran 1 test in 0.000s

All is OK.

Assertions in Python
A unit test consists of one or more assertions (statements that assert that some property of the code being tested is true).
Example 3.2.2 Asserting that 5 is prime: self.assertTrue(is_prime(5))
Other assertions (details at https://docs.python.org/3/library/unittest.html):

    Method                       checks that              new in
    assertEqual(a, b)            a == b
    assertNotEqual(a, b)         a != b
    assertTrue(x)                bool(x) is True
    assertFalse(x)               bool(x) is False
    assertIs(a, b)               a is b                   3.1
    assertIsNot(a, b)            a is not b               3.1
    assertIsNone(x)              x is None                3.1
    assertIsNotNone(x)           x is not None            3.1
    assertIn(a, b)               a in b                   3.1
    assertNotIn(a, b)            a not in b               3.1
    assertIsInstance(a, b)       isinstance(a, b)         3.2
    assertNotIsInstance(a, b)    not isinstance(a, b)     3.2

All of them accept an optional msg keyword argument that is used as the error message on failure.

More Unit Tests for is_prime
test_is_five_prime worked for a generic prime number.
Test negative cases by adding a method to the PrimesTestCase class:

    def test_is_four_non_prime(self):
        """Is four correctly determined not to be prime?"""
        self.assertFalse(is_prime(4), msg='Four is not prime!')

assertFalse specifies that we expect 4 not to be prime (4 is composite). The msg message outputs additional information if the unit test fails.

Testing Edge Cases
Errors usually occur in edge cases: here 0, 1, and negative integers.
Testing the zero case:

    def test_is_zero_not_prime(self):
        """Is zero correctly determined not to be prime?"""
        self.assertFalse(is_prime(0))

gives the result

    $ python test_primes.py
    ..F
    ======================================================================
    FAIL: test_is_zero_not_prime (__main__.PrimesTestCase)
    Is zero correctly determined not to be prime?
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_primes.py", line 17, in test_is_zero_not_prime
        self.assertFalse(is_prime(0))
    AssertionError: True is not false
    ----------------------------------------------------------------------
    Ran 3 tests in 0.000s

    FAILED (failures=1)

Right: for 0 and 1, range(2, number) is empty, so the loop never runs and is_prime returns True. Let's fix that (then the tests pass):

    def is_prime(number):
        """Return True if *number* is prime."""
        if number in (0, 1):
            return False
        for element in range(2, number):
            if number % element == 0:
                return False
        return True

Testing for Negative Numbers (a whole range)
Let's test a whole range (programmatically, in Python):

    def test_negative_number(self):
        """Is a negative number correctly determined not to be prime?"""
        for index in range(-1, -10, -1):
            self.assertFalse(is_prime(index))

The test fails, but we do not get enough information:

    $ python test_primes.py
    ...F
    ======================================================================
    FAIL: test_negative_number (__main__.PrimesTestCase)
    Is a negative number correctly determined not to be prime?
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_primes.py", line 22, in test_negative_number
        self.assertFalse(is_prime(index))
    AssertionError: True is not false
    ----------------------------------------------------------------------
    Ran 4 tests in 0.000s

    FAILED (failures=1)

Which negative number did it fail on? (unittest does not tell us.) We can fix this with a better message:

    def test_negative_number(self):
        """Is a negative number correctly determined not to be prime?"""
        for index in range(-1, -10, -1):
            self.assertFalse(is_prime(index),
                             msg='{} should not be determined to be prime'.format(index))

This gives

    $ python test_primes.py
    ...F
    ======================================================================
    FAIL: test_negative_number (test_primes.PrimesTestCase)
    Is a negative number correctly determined not to be prime?
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "./test_primes.py", line 22, in test_negative_number
        self.assertFalse(is_prime(index), msg='{} should not be determined to be prime'.format(index))
    AssertionError: True is not false : -1 should not be determined to be prime
    ----------------------------------------------------------------------
    Ran 4 tests in 0.000s

    FAILED (failures=1)

The End Result
A better implementation: prime3.py

    def is_prime(number):
        """Return True if *number* is prime."""
        if number <= 1:
            return False
        for element in range(2, number):
            if number % element == 0:
                return False
        return True

A (somewhat) comprehensive test suite: test_prime3.py

    import unittest
    from prime3 import is_prime

    class PrimesTestCase(unittest.TestCase):
        """Tests for `prime3.py`."""

        def test_is_five_prime(self):
            """Is five successfully determined to be prime?"""
            self.assertTrue(is_prime(5))

        def test_is_four_non_prime(self):
            """Is four correctly determined not to be prime?"""
            self.assertFalse(is_prime(4), msg='Four is not prime!')

        def test_is_zero_not_prime(self):
            """Is zero correctly determined not to be prime?"""
            self.assertFalse(is_prime(0))

        def test_negative_number(self):
            """Is a negative number correctly determined not to be prime?"""
            for index in range(-1, -10, -1):
                self.assertFalse(is_prime(index), msg='{} is not prime'.format(index))

    if __name__ == '__main__':
        unittest.main()

What have we learnt?
While developing tests for is_prime we learnt that unit testing
- can be used for black-box testing of functions (the tests never looked at the implementation),
- finds errors early (division by zero),
- finds regressions (when we changed to range(2, number)),
- documents edge cases (are 0 and 1 prime? a priori unclear),
- is quite a lot of work (the test suite is longer than the function).
Still to do:
- automation of test application (scripts, IDE, continuous integration framework),
- did we test enough?
- testing print_next_prime (difficult, it uses is_prime; maybe 1234567891 is not found to be prime?).

Unit Testing with Mocks/Stubs
Recall: Unit testing is about testing a module in isolation.
Problem: What if the module (e.g. a function) calls others? Any fault in those could cause test failure.
Example 3.2.3 Testing networking code (the network may be down).
Example 3.2.4 Testing modules with database access (really use the production DB?).
Solution: Replace subordinate modules with specially constructed stubs or mocks (mockups) that always succeed or give controlled results. There are various ways to do this (they vary in complexity and effectiveness).
Definition 3.2.5 A monkey patch is a way for a program to extend or modify supporting system software locally (affecting only the running instance of the program).

Simple Mocking by Monkey Patching in Python
Remember our running example?
(the second function)

    def print_next_prime(number):
        """Print the closest prime number larger than *number*."""
        index = number
        while True:
            index += 1
            if is_prime(index):
                print(index)
                return  # stop after the first prime found

It calls two external functions, is_prime and print, so it is not isolated.
Idea: Monkey patch them. Python has a framework for this, unittest.mock, which gives us the @patch('foo') decorator that marks foo for patching.

Monkey Patching in Python: Test Case
Towards a test case for the print_next_prime function:
First import the unittest and unittest.mock frameworks, then import the is_prime and print_next_prime functions (so that we can test/patch them).

    import unittest
    from unittest.mock import patch

    from primes import is_prime
    from primes import print_next_prime

The test case proper, e.g. the next prime after four (is five!):

    class NextPrimeTestCase(unittest.TestCase):
        """Testing the print_next_prime function"""

        @patch('builtins.print')
        @patch('primes.is_prime')
        def test_is_five_after_4(self, is_prime, print):
            """Is 5 the next prime after four?"""
            is_prime.return_value = True
            print_next_prime(4)
            print.assert_called_with(5)

    if __name__ == "__main__":
        unittest.main()

Used as a decorator without a replacement object, @patch('foo') passes the mock it creates for foo as an extra argument to the test method, so the method must accept it. Stacked decorators supply their mocks bottom-up: the lowest decorator (here primes.is_prime) provides the first argument.

3.2.2 Integration Testing

Integration Testing
Definition 3.2.6 Integration testing is the phase in software testing in which individual software units are combined and tested as a group.
Things tested in integration testing include
- import/export type compatibility,
- representation compatibility,
- out-of-range errors.
Example 3.2.7 The NASA Mars Climate Orbiter unexpectedly turned into a lander because NASA's software expected thruster data in SI units (newton seconds), while the contractor's software delivered them in US customary units (pound-force seconds).
Integration testing strategies: big-bang, top-down, bottom-up, sandwich (see next).
Definition 3.2.8 In big-bang integration testing, all or most of the units are coupled together to form a complete software system, which is then used for integration testing. This saves time and is great if it works, but gives little information if it does not.

Top-Down Integration Testing
Definition 3.2.9 In top-down integration testing, the top module is tested with stubs for its subordinate modules; the stubs are replaced depth-first by the real modules, and the tests are re-run after each replacement.
[Figure: example module hierarchy with top module A and subordinate modules B to G.]
Advantage: linking faults to user-visible behaviors (failures).

Bottom-Up Integration Testing
Definition 3.2.10 In bottom-up integration testing, tested units are grouped into modules that are exercised by drivers; the drivers are replaced depth-first by the real modules, and the unit tests are re-run with each replacement.
[Figure: the same module hierarchy with modules A to G.]
Advantage: finding errors early (the modules below are always tested).
Definition 3.2.11 In sandwich integration testing, top-down and bottom-up integration testing are combined.

System Testing
Definition 3.2.12 System testing is a functional testing method which is conducted on a complete, integrated system to evaluate the system's compliance with its specified requirements.
Here "integrated system" means integrated hardware and software (after integration testing).
- Focus on the use & interaction of system functionalities rather than details of implementations.
- Should be carried out by a group independent of the code developers:
  alpha testing: end users at the developer's site,
  beta testing: at the end users' site, without the developer involved!
Concerns in system testing (including, but not limited to):
- GUI and usability (can users interact with the system?),
- compatibility & security (does it play nice with the client's other systems?),
- load, volume, scalability (can the system handle the required load and data volumes?).
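Top-down integration (Definition 3.2.9) can be sketched with the stubs provided by unittest.mock. In this minimal, hypothetical example (order_total, parse_order and unit_price are made up for illustration), the top module A receives its subordinate units B and C as parameters, so each one can start out as a stub and later be replaced by the real unit. The three test methods mirror the successive replacement steps.

```python
import unittest
from unittest import mock

def parse_order(text):
    """Hypothetical lower-level unit B: turn 'apple, 3' into ('apple', 3)."""
    name, quantity = text.split(",")
    return name.strip(), int(quantity)

def unit_price(name):
    """Hypothetical lower-level unit C: look up the price of an item."""
    return {"apple": 2, "pear": 3}[name]

def order_total(text, parse=parse_order, price=unit_price):
    """Hypothetical top module A; subordinate units are parameters so they can be stubbed."""
    name, quantity = parse(text)
    return quantity * price(name)

class TopDownIntegration(unittest.TestCase):
    def test_step1_top_module_with_stubs(self):
        # B and C are stubs returning fixed, controlled results
        parse_stub = mock.Mock(return_value=("apple", 3))
        price_stub = mock.Mock(return_value=2)
        self.assertEqual(order_total("ignored", parse_stub, price_stub), 6)
        price_stub.assert_called_once_with("apple")  # A passes B's result on to C

    def test_step2_real_parser_stubbed_price(self):
        # the stub for B is replaced by the real unit; C is still a stub
        price_stub = mock.Mock(return_value=2)
        self.assertEqual(order_total("apple, 3", price=price_stub), 6)

    def test_step3_all_real_units(self):
        # all stubs replaced: A, B and C are integrated
        self.assertEqual(order_total("pear, 4"), 12)

if __name__ == "__main__":
    unittest.main(argv=["top_down"], exit=False)
```

Passing the subordinate units as parameters is only one way to make them replaceable. The @patch decorator shown above achieves the same without changing the signature of the top module.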
Acceptance Testing
Definition 3.2.13 In user acceptance testing (UAT) the client tests whether a system meets the contractual requirements so that transfer of ownership can take place.
UAT is sometimes mixed up or confused with beta testing.
Problems: UAT is a crucial step in the software life-cycle and should be planned well ahead.
Risk: the customer may demand new functionality when exposed to the system.
Remedy: agree on the UAT test suite in the contract.

3.3 Structural (White-Box) Testing

Coverage Analysis vs. Path-Based Testing
How good are our tests? Do they test all of the possible program behaviors? We have to look at the program to find out (white-box testing).
Path-based testing: Analyze program behaviors (paths through the program) and generate conditions for test cases that cover these paths.
Definition 3.3.1 Control-flow testing is a structural testing method that takes the control flow of the program as a model to determine suitable test cases.

Control Flow Graphs
Definition 3.3.2 The control flow graph of a program is a graph that models its control structure. Its nodes are labeled with
- process blocks: sequences of statements without control operators, and
- decisions: Boolean expressions in control structures.
Two nodes are connected by an edge iff they can be executed one after the other. We call a node with in-degree greater than 1 a junction.
Example 3.3.3 Consider the function that raises x to the power of y.
    def power(x, y):
        if y < 0:
            p = -y
        else:
            p = y
        z = 1.0
        while p != 0:
            z = z * x
            p -= 1
        if y < 0:
            return 1.0 / z
        else:
            return z

[Figure: control flow graph of power. The decision "if y<0" branches via edge a (true) to "p = -y" and via edge b (false) to "p = y"; edges c and d lead on to "z = 1.0"; edge e leads to the loop decision "while p!=0", whose true edge f enters the block "z = z*x; p -= 1", from which edge g returns to the loop test; the false edge h leads to the second "if y<0", whose true edge i leads to "return 1/z" and whose false edge k leads to "return z".]

Paths in Programs
Definition 3.3.4 Let G be a control flow graph; then we call a path in G complete if it starts at the unit's entry and ends at its exit (a return).
Complete paths are useful for testing because
- it is difficult to set up and execute paths that start at an arbitrary statement,
- it is difficult to stop at an arbitrary statement without changing the code being tested,
- we think of routines as input/output paths.
There are many paths between the entry and exit points of a typical routine. Even a small routine can have a large (even infinite) number of paths.

How many Paths are enough for Testing?
In principle, we would have to test all paths (infeasible).
Definition 3.3.5 In testing we speak of
- path coverage, iff all paths are tested,
- statement coverage, iff all statements are covered,
- decision coverage, iff all decisions are tested in both directions.
Path coverage ⇒ decision coverage ⇒ statement coverage (at least without goto).
Definition 3.3.6 (Testing Strategies) If we strive for
- path coverage, we speak of path testing,
- statement coverage, we speak of statement testing,
- decision coverage, we speak of branch testing.

Path Predicates
Intuition: Every path corresponds to a succession of true or false values for the predicates traversed on that path.
Definition 3.3.7 A path predicate is a Boolean expression that characterizes the set of input values that will cause a path to be traversed.
Definition 3.3.8 Any set of input values that satisfies all of the conditions of the path predicate will force the routine through that path; we say that they achieve the path. If there is no such set of inputs, the path is unachievable.
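Which path a given input achieves can be observed mechanically. The following sketch instruments the power function from Example 3.3.3 so that it also returns the sequence of control-flow edges it traversed. The edge labels are an assumption based on the path expression (ac|bd)e(fg)*h(i|k) used for this function in the path-testing example; power_traced is a hypothetical name.

```python
def power_traced(x, y):
    """power from Example 3.3.3, also returning the CFG edges it traversed."""
    if y < 0:
        path = "ac"           # true branch: p = -y, then on to z = 1.0
        p = -y
    else:
        path = "bd"           # false branch: p = y, then on to z = 1.0
        p = y
    z = 1.0
    path += "e"               # reach the loop test "while p != 0"
    while p != 0:
        path += "fg"          # one pass through the loop body and back to the test
        z = z * x
        p -= 1
    path += "h"               # leave the loop, reach the second "if y < 0"
    if y < 0:
        return 1.0 / z, path + "i"
    else:
        return z, path + "k"

for x, y in [(1, -1), (0, 0), (2, 2)]:
    print((x, y), power_traced(x, y))
# (1, -1) (1.0, 'acefghi')
# (0, 0) (1.0, 'bdehk')
# (2, 2) (4.0, 'bdefgfghk')
```

Both decisions test the same y, which is never changed, so no input can produce a trace that starts with ac and ends with k, or one that starts with bd and ends with i.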
Definition 3.3.9 The act of finding a set of solutions to the path predicate expression is called path sensitization.
Path Testing Absolute Value
Example 3.3.10 Consider the following (very simple) function:
    def abs(x):
        if x < 0:
            x = -x
        return x
[Figure: control flow graph of abs: edge a (T) leads from the decision “if x<0” to x = -x, and edge c from there to return x; edge b (F) leads directly to return x.]
We have two paths: ac and b (we can test exhaustively here).
path | path predicate | test input | output | comment
ac   | x < 0          | −3         | 3      |
b    | x ≥ 0          | 0          | 0      | edge case
b    | x ≥ 0          | 3          | 3      | regular
Path Testing Power
Example 3.3.11 Recall the (simple) power function from Example 3.3.3 and its control flow graph.
We have infinitely many paths: (ac|bd)e(fg)∗h(i|k) (path testing is impossible).
Remark: the paths acehk and bdehi are unachievable, since both decisions test the same predicate y < 0.
Example 3.3.12 (Statement/Branch Testing) Choose complete paths that exercise all statements/branches (here the same):
path      | path predicate      | test input (x, y) | output
acefghi   | y < 0, p = −y = 1   | (1, −1)           | 1
bdehk     | y ≥ 0, p = y = 0    | (0, 0)            | 1
bdefgfghk | y ≥ 0, p = y = 2    | (2, 2)            | 4
Coverage Testing in python
python has a coverage testing tool: coverage.
Install it with pip install coverage.
Run it from the command line: coverage run test_prime3.py, or even coverage run --branch test_prime3.py (for branch coverage).
Print a summary with coverage report, or generate a nice HTML page with coverage html (see it in a browser).
Chapter 4 Software Maintenance
Motivation: Software Maintenance
Programs and software systems are long-lived objects (often decades):
they are continually improved (that introduces new bugs)
and adapted to changing requirements (that as well).
[?] claims that 90% of costs are in maintenance.
[Figure: bar chart of software costs per life-cycle phase: requirements, design, implementation, testing, maintenance.]
4.1 Lehman’s Laws of Software Evolution
Context: E-type programs, i.e. programs that are written to perform some real-world activity; how they should behave is strongly linked to the environment in which they run.
Lehman et al. [?] identify a set of 8 laws about software evolution, including
Continuing Change – a program must be continually adapted, or it becomes progressively less satisfactory.
Invariant Work Rate – the average effective global activity rate in an evolving program is invariant over the product’s lifetime.
Continuing Growth – the functional content of an E-type system must be continually increased to maintain user satisfaction over its lifetime.
Declining Quality – the quality of a program will appear to be declining unless it is rigorously maintained and adapted to operational environment changes.
Feedback System – program evolution processes constitute multi-level, multi-loop, multi-agent feedback systems and must be treated as such to achieve significant improvement over any reasonable base.
Lessons from Lehman’s Laws
Software maintenance is the elephant in the room: we need to get the maintenance phase right.
Continuing Change ⇝ we need to manage software over time:
release management (distribute regular updates to the program)
revision management (keep access to all versions, track changes)
regression testing (compare old behavior to new one)
Feedback Cycle ⇝ we need to manage user feedback:
find out what users really need/want,
allow users to report failures.
Solution: Software Lifecycle Management Systems.
Example 4.1.1 GitHub or GitLab offer revision control, issue tracking and project planning features.
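Regression testing, as listed above, compares the old behavior of a program with the new one. A minimal sketch, assuming we keep the old version of the power function from Example 3.3.3 and replace it by a hypothetical faster version based on repeated squaring; the function names power_v1, power_v2, regressions and the list of test inputs are made up for illustration.

```python
import math


def power_v1(x, y):
    """Old version: the multiplication loop of Example 3.3.3."""
    p = -y if y < 0 else y
    z = 1.0
    while p != 0:
        z = z * x
        p -= 1
    return 1.0 / z if y < 0 else z


def power_v2(x, y):
    """Hypothetical new version: square-and-multiply, O(log |y|) multiplications."""
    p = -y if y < 0 else y
    z, base = 1.0, float(x)
    while p > 0:
        if p % 2 == 1:
            z *= base
        base *= base
        p //= 2
    return 1.0 / z if y < 0 else z


def regressions(old, new, cases):
    """Return the inputs on which the new version disagrees with the old one."""
    return [(x, y) for x, y in cases
            if not math.isclose(old(x, y), new(x, y))]


CASES = [(1, -1), (0, 0), (2, 2), (3, 5), (-2, 3), (2, -3), (1.5, 4)]
print(regressions(power_v1, power_v2, CASES))  # an empty list means no regressions
```

The old version serves as the reference for the new one, which is only possible because revision management keeps access to past versions. math.isclose is used instead of == so that harmless floating-point rounding differences between the two algorithms are not reported as regressions.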
4.2 Revision Control Systems
We address a very important topic for document management: supporting the document life-cycle as a collaborative process. In this section we discuss how a set of tools that have been developed for supporting the collaborative development of large program collections can be used for document management. We will first introduce the problems and current attempts at solutions, and then introduce two classes of revision control systems and discuss their paradigmatic systems.
4.2.1 Introduction/Motivation
Lifecycle Management for Digital Documents
Documents may have a non-trivial life-cycle involving multiple actors.
Example 4.2.1 For a novel we have the following stages:
1. skeleton/layout (chapters, characters, interactions)
2. first complete draft (given out to test readers)
3. private editing cycle ⇝ accepted draft (testing with more readers, refining/condensing the story)
4. publisher’s editing cycle ⇝ final draft (professional editor proposes refinements to the draft)
5. copyediting for spelling and adherence to the publisher’s house style
6. adding artwork/cover ⇝ first published edition
7. e-dition (eBook) etc. (different artwork, links, interactivity)
Example 4.2.2 For technical books, multiple editions follow to adapt them to a changing domain or to correct errors.
Document Lifecycle Mgmt. & Collaboration Approaches
Practice: send around MS Word documents by e-mail (dates in the file name).
Characteristics/Problems:
++ well-understood technology (no training needed)
– version tracking as a social process (error-prone)
– merging diverging versions is annoying (manual process)
– archiving past versions optional/manual (storage problems)
– no multifile support, no snapshots
Summary: only supports serial collaboration, no multifile support.
[Figure: timeline from start to finish in which the versions D1, D2, …, Dn are produced one after the other by the edits δ1, δ2, …, δn.]
Larger teams ⇝ more time wasted.
Document Lifecycle Mgmt. & Collaboration Approaches
Practice: put your documents on Dropbox or MS Sharepoint.
Characteristics/Problems:
– local install of (proprietary) software
+ auto-synchronization between cloud and user copies upon save
+ auto-archiving past versions in the cloud
– merging diverging versions unsupported (manual process)
– no multifile support, no snapshots
Summary: only supports serial collaboration.
[Figure: the same serial timeline D1, …, Dn.]
Larger teams ⇝ more time wasted.
Document Lifecycle Mgmt. & Collaboration Approaches
Practice: use etherpad, Google Docs or Office 365 for collaborative editing.
Characteristics/Problems:
+ browser-based, no installation necessary
+ real-time auto-synchronization between cloud and user copies
+ auto-archiving past versions in the cloud
+ no diverging versions
– no multifile support, no snapshots
Summary: only supports serial collaboration.
[Figure: the same serial timeline D1, …, Dn.]
Larger teams ⇝ more time wasted.
Document Lifecycle Mgmt. & Collaboration Approaches
Practice: use a version control system (for ASCII-based file formats).
Characteristics/Problems:
– special install, training necessary
– restricted to character/line-based formats
+ user-initiated synchronization between cloud and user copies
+ auto-archiving past versions on the server
++ multifile support, snapshots, merging support, tagging
Summary: supports parallel, branching collaboration.
[Figure: timeline from start to finish in which the versions D1, …, Dn are derived by the edits δ1, …, δn along several parallel branches that are merged again.]
Larger teams ⇝ large-scale parallelization/experimentation.
4.2.2 Centralized Version Control
Computing and Managing Differences with diff & patch
Definition 4.2.3 diff is a file comparison utility that computes the differences between two files f1 and f2.
Differences are output linewise in a diff file (also called a patch), which can be applied to f1 to obtain f2 via the patch utility.
Example 4.2.4 For the files
    f1:                    f2:
    The quick brown        The quack brown
    fox jumps over
    the lazy dog           fox jumps over
                           the loozy dog
diff f1 f2 outputs
    1c1,2
    < The quick brown
    ---
    > The quack brown
    >
    3c4
    < the lazy dog
    ---
    > the loozy dog
Definition 4.2.5 A diff file consists of a sequence of hunks that in turn consist of a locator, which contrasts the source and target locations (in terms of line numbers), followed by the added/deleted lines.
Merging Differences with merge3
There are basically two ways of merging the differences of files into one.
Definition 4.2.6 In two-way merge, an automated procedure tries to combine two different files by copying over the differences, resolving conflicts by guessing or by asking the user.
Definition 4.2.7 In three-way merge, the files are assumed to have been created by editing a joint original (the parent). The merge3 tool examines the differences and patterns appearing in the changes between both files as well as the parent, building a relationship model to generate a new revision. Usually, non-conflicting differences (affecting only one of the files) can directly be copied over.
Definition 4.2.8 A revision control system is a software system that tracks the change process of sets of files via a repository that stores the files’ revisions – the content of the files at the time of a commit. Users do not directly work on the repository, but on a working copy that is synchronized with the repository by revision control actions:
• checkout: creates a new working copy from the repository
• update: merges the differences between the base revision of the working copy and the revision of the repository into the working copy.
• commit: transmits the differences between the repository revision and the working copy to the repository, which registers them, patches the repository revision, and makes this the new head revision.
Version Control with Subversion
Definition 4.2.9 Subversion is a centralized revision control system that features
a central repository (for the current revision and reverse diffs),
local working copies (asynchronous checkouts, updates, commits).
They are kept synchronized by passing around diff differences and patching the repository and working copies. Conflicts are resolved by (three-way) merge.
[Figure: working copies LC1 (∅), LC2 (O) and LC3 (O + δ2) synchronize with the repository: checkout of O, commit of δ1, update with δ1, and a merge of δ1 whose conflict resolution cr(δ1, δ2) is committed.]
Collaboration with Subversion
Idea: we can use the same technique for collaboration between multiple working copies.
[Figure: diff-based collaboration: working copies WC1 (O17), …, WCn (O19) update (up) from and commit (ci) to the repository R19.]
The Subversion system takes care of the synchronization:
you can only commit if your revision is HEAD (otherwise update first),
update merges the changes into your working copy,
if there are changes on the same line, you have a conflict.
4.2.3 Distributed Revision Control
Centralized vs. Distributed Version Control
Problem with Subversion: we can only commit when online! All collaboration goes via the repository.
Idea: distribute the repositories and move differences between them.
[Figure: each working copy WC1 (O17), …, WCn (O19) checks out from and commits to its own local repository R1 (O17), …, R′1 (O19); the repositories pull from and push to each other and to R19; no repository is distinguished (headless).]
Distributed Version Control with git
Definition 4.2.10 git is a distributed version control system that features
local repositories (containing the head and reverse diffs; local commits),
multiple remote repositories (branches/forks): changes from a remote repository can be pulled into the local one, and local changes can be pushed to a remote repository,
local working copies.
Definition 4.2.11 There are various repository management systems that facilitate providing repositories, e.g.
GitHub, a repository hosting service at http://GitHub.com (free public repositories),
GitLab, an open source repository management system (http://gitlab.org).
GitFlow: An Elaborate Development Model based on GIT
[?] suggests a development model with feature branches, . . .
4.3 Bug/Issue Tracking Systems
Bug/Issue Tracking Systems
Definition 4.3.1 A bug tracking system (also called bugtracker or issue tracking system) is a software application that keeps track of reported issues – i.e. software bugs and feature requests – in software development projects.
Example 4.3.2 There are many open-source and commercial bugtrackers:
bugzilla: http://bugzilla.org (Mozilla’s bugtracker)
GitHub: http://github.com (simple Markdown syntax)
GitLab: http://gitlab.com (open source alternative to GitHub)
TRAC: http://trac.edgewall.org (+Wiki +Mgt. features, mostly for SVN)
JIRA: https://www.atlassian.com/software/jira (proprietary)
The Anatomy of an Issue (How to Write a Good One)
Components of a bug report:
title: a short and descriptive overview (one line)
description: a precise description of the expected and actual behavior, giving exact reference to the component, version, and environment in which the bug occurs (bugs must be reproducible and localizable)
attachment: e.g. a screen shot, set of inputs, etc.
Example 4.3.3 (A bad bug report description) My browser crashed. I think I was on foo.com. I think that this is a really bad problem and you should fix it or else nobody will use your browser.
Example 4.3.4 (A good one) I crash each time I go to foo.com (Mozilla build 20000609, Win NT 4.0SP5).
This link will crash Mozilla reproducibly unless you remove the border=0 attribute:
<IMG SRC="http://foo.com/topicfoos.gif" width=34 border=0 alt="News">
Remember: developers are also human (try to minimize their work).
Components of a feature request: like a bug report, but only the expected behavior.
Bugtracker Workflow
Typical workflow (supported by bugtrackers):
user reports issue (files report in the system)
other users extend/discuss/up/downvote issue
QA engineer triages issues – classification, remove duplicates, identify dependencies, tie to component, . . .
developer accepts or re-assigns issue (fixes who is primarily responsible)
project planning by identification of sub-issues, dependencies (new issues)
bug fixing (design, implementation, testing)
issue landing (sign-off, integration into code base)
bug closure
release of the fix (in the next revision)
Administrative metadata (to make these workflows work):
issue number: for referencing with e.g. #15
comments: a discussion thread focused on this issue
labels: for specializing bug search
assignee: a developer currently responsible
participants: people who get notified of changes/comments
status: e.g. one of new, assigned, fixed/closed, reopened
resolution for fixed bugs:
FIXED: source updated and tested
INVALID: not a bug in the code
WONTFIX: “feature”, not a bug
DUPLICATE: already reported elsewhere; include reference
WORKSFORME: couldn’t reproduce issue
dependencies: which bugs does this one depend on/block?
Dependency Graph of a Firefox Issue in Bugzilla [Figure]
Chapter 5 Security by Encryption
5.1 Introduction to Crypto-Systems
There are various ways to ensure security on networks: one is just to cease all traffic (not a very attractive one), another is to make the information on the network inaccessible by physical means (e.g. shielding the wires electrically and guarding them by a large police force).
Here we want to look into “security by encryption”, which makes the content of Internet packets unreadable by unauthorized parties. We will start by reviewing the basics, and work our way towards a secure network infrastructure via the mathematical foundations and special protocols.
Security by Encryption
Problem: in open packet-switched networks like the Internet, anyone can
inspect the packets (and see their contents via packet sniffers),
create arbitrary packets (and forge their metadata),
combine both to falsify communication (man-in-the-middle attack).
In “dedicated line networks” (e.g. the old telephone network) you needed switch room access.
But there are situations where we want our communication to be confidential:
Internet banking (obviously, other criminals would like access to your account)
login to Campus.net (wouldn’t you like to know my password to “correct” grades?)
whistle-blowing (your employer should not know what you sent to WikiLeaks)
The situation: Alice wants to communicate with Bob privately, but Eve(sdropper) can listen in.
[Figure: Alice and Bob communicating, with Eve listening in.]
Idea: encrypt the packet content (so that only the recipients can decrypt it) and build this into the fabric of the Internet (so that users don’t have to know).
Encryption: Terminology & Examples
Definition 5.1.1 Encryption is the process of transforming information (referred to as plaintext) using an algorithm to make it unreadable to anyone except those possessing special knowledge, usually referred to as a key. The result of encryption is called ciphertext, and the reverse process that transforms ciphertext to plaintext is called decryption. We call a method for encryption/decryption a cipher.
Definition 5.1.2 The corresponding science is called cryptology; it has two areas of study: cryptography (encryption/decryption via ciphers) and code breaking/cryptanalysis: decrypting ciphertexts without a key or recovering keys from ciphertexts.
Example 5.1.3 (Spartan encryption (since ca. 700 BC)) The oldest (military) encryption method is the scytale – a wooden stick of defined diameter, onto which a strip of parchment with letters can be wrapped to reveal the plaintext. Here the stick is the key and the parchment strip the ciphertext.
Example 5.1.4 (The Caesar Cipher) Shift the letters of the alphabet by n letters to the right. Julius Caesar (first mention) used n = 3, Augustus n = 1. This can be supported by hardware.
Example 5.1.5 (Don’t forget your Bank Card PIN) Write the encoded PIN number on the card; here, complete each digit to 9. PIN = 5315 is written as 4684.
Code-Breaking, e.g. by Frequency Analysis
Letters (bigrams, trigrams, . . . ) in English come in characteristic frequencies (ETAOINSHRDLU).
Use those to decode a ciphertext: the most frequent character represents an “E”, the second most frequent a “T”, . . .
This works well for simple substitution ciphers.
Data paradox: deciphering longer texts is often easier than deciphering short ones.
Lesson for encryption: change your cipher often (minimize data).
The simplest form of encryption (and the way we know from spy stories) uses the same key for encryption and decryption.
Symmetric Key Encryption
Definition 5.1.6 Symmetric-key cryptosystems are a class of cryptographic algorithms that use essentially identical keys for both encryption and decryption.
Example 5.1.7 Permute the ASCII table by a bijective function ϕ : {0, . . . , 127} → {0, . . . , 127} (ϕ is the shared key).
Example 5.1.8 The AES algorithm (Advanced Encryption Standard) [AES01] is a widely used symmetric-key algorithm that is approved by US government organs for transmitting top-secret information (efficient but safe).
AES is safe: for AES-128/192/256, recovering the key takes $2^{126.1}$/$2^{189.7}$/$2^{254.4}$ steps respectively (numbers with 38/58/77 decimal digits).
Note: for trusted communication, sender and recipient need access to the shared key.
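The Caesar cipher of Example 5.1.4 and its weakness against frequency analysis fit in a few lines of Python. Like every symmetric cipher, it uses the same key (the shift n) for encryption and decryption. This sketch assumes lowercase English text (other characters are passed through unchanged) and simply guesses that the most frequent ciphertext letter encrypts “e”; the function names caesar and crack are hypothetical.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase


def caesar(text, n):
    """Shift every lowercase letter n places to the right (mod 26)."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + n) % 26] if ch in ALPHABET else ch
        for ch in text
    )


def crack(ciphertext):
    """Frequency analysis: assume the most frequent letter stands for 'e'."""
    counts = Counter(ch for ch in ciphertext if ch in ALPHABET)
    most_frequent = counts.most_common(1)[0][0]
    shift = (ALPHABET.index(most_frequent) - ALPHABET.index("e")) % 26
    return shift, caesar(ciphertext, -shift)


secret = caesar("meet me here at the seven trees near the green gate", 3)
print(secret)
print(crack(secret))
```

Decryption is encryption with the negated shift, caesar(c, -n). The crack succeeds here because “e” is by far the most frequent letter of the sample sentence. On very short or atypical texts the guess can fail, which is the data paradox mentioned above.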
Problem: how to initiate safe communication over the Internet (far, far apart)? We need to exchange the shared key (chicken-and-egg problem).
Pipe dream: wouldn’t it be nice if I could just publish a key publicly and use that?
Actually: this works, just (obviously) not with symmetric-key encryption.
5.2 Public Key Encryption
To get around the chicken-and-egg problem of secure communication we identified above, we will introduce a more general way of encryption: one where we allow the keys for encryption and decryption to be different. This liberalization allows us to enter into a whole new realm of applications.
The following presentation is based on the one in [Con12].
Diffie/Hellman Key Exchange 1c
Agree on a joint base color.
[Figure: Alice and Bob share the joint base color; Eve sees it too.]
Diffie/Hellman Key Exchange 2c
Randomly pick a private color (one for Alice, one for Bob).
[Figure: Alice and Bob each hold a private color in addition to the joint one; Eve only knows the joint color.]
Diffie/Hellman Key Exchange 3c
Mix your private color into the joint base color (to disguise it) and send the mixture to the partner.
[Figure: Eve sees the joint color and both mixtures, but not the key (??).]
Diffie/Hellman Key Exchange 4c
Mix your own private color into the mixture received from the partner to get the key.
[Figure: Alice and Bob arrive at the same key color; Eve’s key remains ??.]
Note: Eve cannot determine the shared key, since she would need one of the private colors.
Two success factors to this trick:
mixing colors is associative and commutative (order/grouping irrelevant),
mixing colors is much simpler than getting the original colors back from a mixture.
A numeric one-way function
We need a one-way function for numbers to compute numeric keys.
Idea: use discrete exponentiation, whose inverse is the discrete logarithm.
Definition 5.2.1 (Recap) We say that a is congruent to b modulo m (written a ≡ b mod m), iff m divides a − b. If additionally 0 ≤ b < m, then b is the remainder of a modulo m, written a mod m = b.
Idea: we can do arithmetic modulo m: 5 + 4 ≡ 2 mod 7 or $3^4 \equiv 1 \bmod 8$.
Theorem 5.2.2 (A useful Fact) If p is prime and b is a primitive root modulo p, then the values $b^x \bmod p$ for 0 ≤ x < p − 1 are exactly the numbers 1, . . . , p − 1, each occurring once; i.e. they are distributed evenly.
Definition 5.2.3 Let p be a prime number, b a primitive root modulo p, and $b^x \equiv y \bmod p$; then we call x the discrete logarithm of y modulo p for the base b.
Observation 5.2.4 The discrete logarithm is very hard to compute: by generate and test, it takes essentially p times the steps needed for the discrete power.
Corollary 5.2.5 Discrete exponentiation is a one-way function: its inverse, the discrete logarithm, is hard to compute (for large p).
Diffie/Hellman Key Exchange 1
Agree on a modulus p and a base b, e.g. p = 17, b = 3.
[Figure: Alice, Bob and Eve all know p = 17, b = 3.]
Diffie/Hellman Key Exchange 2
Randomly pick a private exponent e, the private key (Alice: e = 54, Bob: e = 24).
[Figure: Alice holds 54, Bob holds 24; Eve only knows p = 17, b = 3.]
Diffie/Hellman Key Exchange 3
Send $b^e \bmod p$ (the public key) to the partner ($3^{54} \equiv 15 \bmod 17$ and $3^{24} \equiv 16 \bmod 17$).
[Figure: Alice receives Bob’s 16, Bob receives Alice’s 15; Eve sees p = 17, b = 3, 15 and 16, but not the key (??).]
Diffie/Hellman Key Exchange 4
Raise the partner’s public key to your private key:
Alice: $16^{54} \equiv (3^{24})^{54} \equiv 3^{24 \cdot 54} \equiv 1 \bmod 17$
Bob: $15^{24} \equiv (3^{54})^{24} \equiv 3^{54 \cdot 24} \equiv 1 \bmod 17$
[Figure: Alice and Bob both hold the key 1; Eve’s key remains ??.]
Note: Eve cannot determine the shared key 1, since she would need one of the private keys.
Two success factors to this trick:
discrete exponentiation is associative and commutative (order/grouping irrelevant: $(b^x)^y \equiv (b^y)^x$),
discrete exponentiation is a one-way function (the discrete logarithm is hard to compute).
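The numeric Diffie/Hellman exchange above can be replayed with Python’s built-in three-argument pow(b, e, p), which computes discrete powers $b^e \bmod p$ efficiently. The sketch also checks Theorem 5.2.2 for b = 3, p = 17 and adds a hypothetical generate-and-test discrete_log (not from the notes) to show why p must be large: for p = 17, Eve finds an exponent that works as well as Alice’s private key at once.

```python
# Public parameters and private keys from the slides
p, b = 17, 3
alice_private, bob_private = 54, 24

# Step 3: each side sends b^e mod p (the public key)
alice_public = pow(b, alice_private, p)
bob_public = pow(b, bob_private, p)

# Step 4: raise the partner's public key to your own private key
alice_key = pow(bob_public, alice_private, p)
bob_key = pow(alice_public, bob_private, p)
print(alice_public, bob_public, alice_key, bob_key)


def discrete_log(y, base, modulus):
    """Generate and test: return the smallest x with base^x = y (mod modulus)."""
    for x in range(modulus - 1):
        if pow(base, x, modulus) == y:
            return x
    return None


# 3 is a primitive root mod 17: its powers hit every residue 1..16 exactly once
assert sorted(pow(b, x, p) for x in range(p - 1)) == list(range(1, p))

# Eve's brute-force attack only works because p is tiny
eve_exponent = discrete_log(alice_public, b, p)
print(eve_exponent, pow(bob_public, eve_exponent, p))
```

Since 3 is a primitive root modulo 17, exponents only matter modulo 16, so Eve’s exponent 6 is equivalent to Alice’s 54 and yields the same shared key. For primes with hundreds of digits, the loop in discrete_log becomes hopeless, while pow stays fast.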
Public Key Encryption
Definition 5.2.6 In an asymmetric-key cryptosystem, the key needed to encrypt a message is different from the key for decryption. Such a method is called a public-key cryptosystem if the decryption key (the private key) is very difficult to reconstruct from the encryption key (called the public key). We speak of a (cryptographic) key pair.
Asymmetric cryptosystems are based on trapdoor functions: one-way functions that can (only) be inverted with a suitable key.
Trapdoor functions are usually based on prime factorization.
Applications of Public-Key Cryptosystems
Preparation: create a cryptographic key pair and publish the public key (always keep the private key confidential!).
Application: Confidential Messaging: to send a confidential message, the sender encrypts it using the intended recipient’s public key; to decrypt the message, the recipient uses the private key.
Application: Digital Signatures: a digital signature consists of a plaintext together with its ciphertext – encoded with the sender’s private key. A message signed with a sender’s private key can be verified by anyone who has access to the sender’s public key, thereby proving that the sender had access to the private key (and therefore is likely to be the person associated with the public key used), and that the message has not been tampered with.
The confidential messaging is analogous to a locked mailbox with a mail slot. The mail slot is exposed and accessible to the public; its location (the street address) is in essence the public key. Anyone knowing the street address can go to the door and drop a written message through the slot; however, only the person who possesses the key can open the mailbox and read the message. An analogy for digital signatures is the sealing of an envelope with a personal wax seal.
The message can be opened by anyone, but the presence of the seal authenticates the sender.
Note: for both applications (confidential messaging and digitally signed documents) we have only stated the basic idea. Technical realizations are more elaborate to be more efficient. One measure, for instance, is not to encrypt the whole message and compare the result of decrypting it, but only a well-chosen excerpt.
Let us now look at the mathematical foundations of encryption. It is all about the existence of natural-number functions with specific properties. Indeed, cryptography has been a big and somewhat unexpected application of mathematical methods from number theory (which was previously thought to be the ultimate pinnacle of “pure math”).
Encryption by Trapdoor Functions
Idea: mathematically, encryption can be seen as an injective function. Use functions for which the inverse (decryption) is difficult to compute.
Definition 5.2.7 A one-way function is a function that is “easy” to compute on every input, but “hard” to invert given the image of a random input.
In theory: “easy” and “hard” are understood wrt. computational complexity theory, specifically the theory of polynomial time problems. E.g. “easy” ≙ $O(n)$ and “hard” ≙ $\Omega(2^n)$.
Remark: it is open whether one-way functions exist (their existence would imply P ≠ NP).
In practice: “easy” is typically interpreted as “cheap enough for the legitimate users” and “hard” as “prohibitively expensive for any malicious agents”.
Definition 5.2.8 A trapdoor function is a one-way function that is easy to invert given a piece of information called the trapdoor.
Example 5.2.9 Consider a padlock: it is easy to change from “open” to “closed”, but very difficult to change from “closed” to “open” unless you have a key (the trapdoor).
Of course, we need to have one-way or trapdoor functions to get public key encryption to work. Fortunately, there are multiple candidates we can choose from.
Which one eventually makes it into the algorithms depends on various details; any of them would work in principle.
Candidates for one-way/trapdoor functions
Multiplication and Factoring: the function f takes as inputs two prime numbers p and q in binary notation and returns their product. This function can be computed in $O(n^2)$ time, where n is the total length (number of digits) of the inputs. Inverting this function requires finding the factors of a given integer N. The best factoring algorithms known for this problem run in time $2^{O((\log N)^{1/3} (\log\log N)^{2/3})}$ (used in RSA encryption).
Modular squaring and square roots: the function f takes two positive integers x and N, where N is the product of two primes p and q, and outputs $x^2 \bmod N$. Inverting this function requires computing square roots modulo N; that is, given y and N, find some x such that $x^2 \bmod N = y$. It can be shown that the latter problem is computationally equivalent to factoring N (in the sense of polynomial-time reduction).
Discrete exponential and logarithm: the function f takes a prime number p and an integer x between 0 and p − 1, and returns $2^x \bmod p$. This discrete exponential function can be easily computed in time $O(n^3)$, where n is the number of bits in p. Inverting this function requires computing the discrete logarithm modulo p; namely, given a prime p and an integer y between 0 and p − 1, find x such that $2^x \bmod p = y$.
To see whether these trapdoor function candidates really behave as expected, RSA Laboratories, one of the first security companies specializing in public key encryption, has established a series of prime factorization challenges to test the assumptions underlying public key cryptography.
Example: RSA-129 problem
Definition 5.2.10 Call a number semi-prime, iff it has exactly two prime factors. These are exactly the numbers involved in RSA encryption.
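The notes do not spell out RSA itself, so the following is only a sketch of the standard textbook construction with deliberately tiny primes (no padding, completely insecure). It shows the semi-prime n as the public modulus, the private exponent d as the trapdoor that requires knowing the factors p and q, confidential messaging and signing with the two halves of a key pair, and how factoring n gives the private key away. The function names make_keys, apply_key and factor are hypothetical.

```python
import math


def make_keys(p, q, e=17):
    """Textbook RSA key pair from primes p, q: ((n, e), (n, d))."""
    n = p * q                        # the public semi-prime
    phi = (p - 1) * (q - 1)
    if math.gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p-1)(q-1)")
    d = pow(e, -1, phi)              # modular inverse (Python 3.8+); needs p and q
    return (n, e), (n, d)


def apply_key(key, m):
    """Encrypt/decrypt/sign/verify are all m^k mod n for the right key."""
    n, k = key
    return pow(m, k, n)


def factor(n):
    """Trial division: instant for tiny n, hopeless for RSA-size semi-primes."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    return None


public, private = make_keys(61, 53)
message = 65

# Confidential messaging: encrypt with the public key, decrypt with the private one
ciphertext = apply_key(public, message)
assert apply_key(private, ciphertext) == message

# Digital signature: "encrypt" with the private key, anyone verifies with the public one
signature = apply_key(private, message)
assert apply_key(public, signature) == message

# Eve factors the public modulus and recomputes the private key
p, q = factor(public[0])
_, eve_private = make_keys(p, q)
print(public, private, (p, q), eve_private == private)
```

The whole security rests on the last three lines being infeasible: computing d from the public (n, e) is easy once the factors of n are known, which is why the RSA challenges below test how hard factoring large semi-primes really is.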
RSA Laboratories initiated the RSA challenge to see whether multiplication is indeed a “practical” trapdoor function.
Example 5.2.11 (The RSA-129 Challenge) is to factor the semi-prime number on the right.
So far, the challenges up to ca. 200 decimal digits have been factored, but all within the expected complexity bounds.
But: would you report an algorithm that factors numbers in low complexity?
Note that all of these tests are run on conventional hardware (von Neumann architectures); there have been claims that other computing hardware, most notably quantum computing or DNA computing, might have completely different complexity theories, which might render these factorization problems tractable. Up to now, nobody has been able to actually build alternative computation hardware that can even attempt to solve such factorization problems (or they are not telling).
Classical- and Quantum Computers for RSA-129 [Figure]
This concludes our excursion into theoretical aspects of encryption; we will now turn to the task of building these ideas into the existing infrastructure of the Internet and the WWWeb. The most obvious thing we need to do is to publish public keys in a way that it can be verified to whom they belong.
5.3 Internet Security by Encryption
Public Key Certificates
Definition 5.3.1 A public key certificate is an electronic document which uses a digital signature to bind a public key with an identity, e.g. the name of a person or an organization.
Idea: if we trust the signatory’s signature, then we can use the certificate to verify that a public key belongs to an individual. Otherwise, we verify the signature using the signatory’s public key certificate.
Problem: we can ascend the ladder of trust, but in the end we have to trust someone!
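The “ladder of trust” can be made concrete with a toy certificate chain. The sketch below uses textbook RSA with tiny keys: a certificate (a Python dict in a hypothetical format, not X.509) binds a subject name to a public key and carries the issuer’s signature on a SHA-256 digest of that binding, reduced modulo the issuer’s n. Verification walks up the chain until it reaches an issuer whose key we already trust directly; that root is the “someone” we have to trust in the end. All names, primes and the dict layout are made up for illustration.

```python
import hashlib


def toy_keys(p, q, e=17):
    """Tiny textbook-RSA key pair ((n, e), (n, d)); insecure, for illustration only."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)


def digest(subject, key, n):
    """Hash the (subject, public key) binding into the range 0..n-1."""
    data = f"{subject}|{key[0]}|{key[1]}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def issue(subject, subject_key, issuer, issuer_private):
    """The issuer signs the binding of subject to subject_key with its private key."""
    n, d = issuer_private
    signature = pow(digest(subject, subject_key, n), d, n)
    return {"subject": subject, "key": subject_key,
            "issuer": issuer, "signature": signature}


def verify_chain(chain, trusted_roots):
    """chain[0] is the end-entity certificate; chain[i+1] certifies chain[i]'s issuer."""
    for i, cert in enumerate(chain):
        if i + 1 < len(chain):
            issuer_cert = chain[i + 1]
            if issuer_cert["subject"] != cert["issuer"]:
                return False
            issuer_key = issuer_cert["key"]
        else:
            # top of the ladder: the issuer must be someone we already trust
            if cert["issuer"] not in trusted_roots:
                return False
            issuer_key = trusted_roots[cert["issuer"]]
        n, e = issuer_key
        if pow(cert["signature"], e, n) != digest(cert["subject"], cert["key"], n):
            return False
    return True


root_pub, root_priv = toy_keys(61, 53)
ca_pub, ca_priv = toy_keys(89, 97)
server_pub, _ = toy_keys(101, 113)

ca_cert = issue("Example CA", ca_pub, "Example Root", root_priv)
server_cert = issue("www.example.org", server_pub, "Example CA", ca_priv)
chain = [server_cert, ca_cert]

print(verify_chain(chain, {"Example Root": root_pub}))  # root trusted
print(verify_chain(chain, {}))                          # nobody trusted
```

With the root key trusted the chain verifies; with no trusted root every chain fails, however many valid signatures it contains. In a public key infrastructure, the trusted roots are the keys of certificate authorities.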
In a typical public key infrastructure (PKI) scheme, the signature is that of a certificate authority (CA), an organization chartered to verify identities and issue public key certificates. In a “web of trust” scheme, the signature is either that of the user (a self-signed certificate) or those of other users (“endorsements”), e.g. in PGP ≙ Pretty Good Privacy.
On a UNIX system, you can create a self-signed certificate (and the associated private key) with OpenSSL (similar on Windows ⇝ Google), e.g. with
  openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365
A CA signs a certificate request req.pem e.g. with
  openssl ca -in req.pem -out newcert.pem
Building on the notion of a public key certificate, we can build secure variants of the application-level protocols. Of course, we could do this individually for every protocol, but this would duplicate efforts. A better way is to leverage the layered infrastructure of the Internet and build a generic secure transport-layer protocol that can be used by all protocols that normally build on TCP or UDP (TLS itself runs over TCP; its datagram variant DTLS serves UDP).
Building Security into the WWWeb Infrastructure
Idea: Build encryption into the WWWeb infrastructure (make it easy to use) ⇝ secure variants of the application-level protocols that encrypt contents.
Definition 5.3.2 Transport Layer Security (TLS) is a cryptographic protocol that encrypts the segments of network connections at the transport layer, using asymmetric cryptography for key exchange, symmetric encryption for privacy, and message authentication codes for message integrity. TLS can be used to make application-level protocols secure.
Let us now look a bit more closely at the structure of the TLS handshake, the part of the TLS protocol that initiates encrypted communication.
A TLS Handshake between Client and Server
Definition 5.3.3 A TLS handshake authenticates a server and provides a shared key for symmetric-key encryption. It has the following steps:
1. Client presents a list of supported encryption methods.
2. Server picks the strongest one and tells the client (C/S agree on a method).
3. Server sends back its public key certificate (name and public key).
4. Client checks the certificate against the CA’s public key (authenticates the server if successful).
5. Client picks a random number, encrypts it (with the server’s public key), and sends it to the server.
6. Only the server can decrypt it (using its private key).
7. Now they both have a shared secret (the random number).
8. From the random number, both parties generate key material.
(Steps 5 to 8 describe key exchange by RSA encryption; TLS alternatively supports Diffie-Hellman key exchange.)
Definition 5.3.4 A TLS connection is a transport-layer connection secured by symmetric-key encryption. Authentication and keys are established by a TLS handshake, and the connection is encrypted until it closes.
The reason we switch from public key to symmetric encryption after communication has been initiated and keys have been exchanged is that symmetric encryption is computationally much more efficient without being intrinsically less secure.
But there is more to integrating encryption into the WWWeb than just enabling secure transport protocols. We need to extend web servers and web browsers to implement the secure protocols (of course), and we need to set up a system of certificate authorities whose public keys are built into web browsers (so that these can check the signatures on server certificates). Moreover, we need user interfaces that allow users to inspect certificates and grant exceptions if needed.
Building Security into the WWWeb Infrastructure
Definition 5.3.5 HTTP Secure (HTTPS) is a variant of HTTP that uses TLS for transport. HTTPS URIs start with https://
Server Integration: All common web servers support HTTPS on port 443 (default), but need a public key certificate (self-sign one or buy one from a CA).
Browser Integration: All common web browsers support HTTPS and give access to certificates.
Confidential E-Mail with Digital Signatures
Hey, that was a nice theoretical exercise, but how can I use it in practice?
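Returning for a moment to the TLS handshake: steps 5 to 8 of Definition 5.3.3, followed by the symmetric encryption of Definition 5.3.4, can be simulated in a few lines of Python. This is a toy sketch under loud assumptions: the server key is textbook RSA with tiny primes, the certificate check (steps 3 and 4) is skipped, and the functions derive_key_material and keystream_xor are hypothetical stand-ins for the padding, key-derivation function, and ciphers that real TLS uses. Only hashlib, hmac, and secrets are real Python standard-library modules; everything else is made up for illustration.

```python
import hashlib
import hmac
import secrets

# Toy server key pair (textbook RSA, tiny primes): illustration only, NOT secure.
P, Q, E = 1009, 1013, 65537
N = P * Q                          # server's public key is (N, E)
D = pow(E, -1, (P - 1) * (Q - 1))  # server's private exponent

def derive_key_material(secret):
    """Hypothetical KDF: hash the shared secret into an encryption and a MAC key."""
    seed = secret.to_bytes(8, "big")
    return (hashlib.sha256(b"enc" + seed).digest(),
            hashlib.sha256(b"mac" + seed).digest())

def keystream_xor(key, data):
    """Toy symmetric cipher: XOR with a SHA-256 keystream (block offset as counter).
    Applying it twice with the same key restores the plaintext."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)

# Steps 3 and 4 (certificate check) are skipped: the client is assumed to trust (N, E).
client_secret = secrets.randbelow(N - 2) + 2   # step 5: client picks a random number,
encrypted = pow(client_secret, E, N)           #         encrypts it with the server's public key
server_secret = pow(encrypted, D, N)           # step 6: only the server can decrypt it
assert server_secret == client_secret          # step 7: both now share the secret

client_keys = derive_key_material(client_secret)  # step 8: both derive the same
server_keys = derive_key_material(server_secret)  #         key material independently

# The TLS connection proper: symmetric encryption plus a MAC for integrity.
enc_key, mac_key = client_keys
message = b"GET /index.html"
record = keystream_xor(enc_key, message)
tag = hmac.new(mac_key, record, hashlib.sha256).digest()

# Server side: check integrity first, then decrypt.
s_enc_key, s_mac_key = server_keys
expected = hmac.new(s_mac_key, record, hashlib.sha256).digest()
assert hmac.compare_digest(tag, expected)
print(keystream_xor(s_enc_key, record))  # b'GET /index.html'
```

The expensive public-key operation (pow with the private exponent D) happens once per connection; everything afterwards uses cheap hashing and XOR, which is the efficiency argument for switching to symmetric encryption.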
Example 5.3.6 (Secure E-Mail) Adding PGP (Pretty Good Privacy, standardized as OpenPGP) to Thunderbird (Tools ⇝ Add-ons ⇝ Get Add-ons):
  add the Enigmail add-on to Thunderbird,
  let it generate a public/private key pair for your e-mail account (give a password),
  let it install GnuPG (the open source program that does the actual cryptography).
Done! Small sign/encrypt buttons appear in the lower right of your composition window.
[Screenshot: your mail as visible to strangers (here a test mail to myself)]
Note that the transport metadata at the top are not encrypted.
[Screenshot: your mail after authentication (here a test mail to myself)]
Note the verified signature shown at the top.
The Web of Trust
Recap: We can only verify a signature if we have a PKI certificate.
Cost Problem: PKI certificates are expensive! (and the authority needs to know you)
Centrality Problem: What happens if a PKI authority has been compromised?
Idea: Instead of relying on PKI certificates from a central authority, users mutually sign each other’s certificates in a “Web of Trust”:
  costs are minimal (we already know each other),
  there is no central point of failure (more resilient).
Acknowledgement: The following presentation is adapted from [Rya].
The Web of Trust for the Three Musketeers
D’Artagnan arrives in Paris and has a duel with Athos, Porthos, and Aramis, but they learn to trust each other against the guards of Cardinal Richelieu. To seal their friendship, they decide to exchange their public keys.
Definition 5.3.7 A key is called valid iff it belongs to the individual it claims to belong to.
To certify validity, the four friends also sign each other’s public keys.
Example 5.3.8 d’Artagnan signs Porthos’ key to say “I, d’Artagnan, vouch that this key belongs to Porthos”.
The musketeers also trust each other to make introductions.
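The validity bookkeeping in the Musketeer example can be mechanized. The following Python sketch is a simplification under stated assumptions: trust levels are weighted so that one signature from a fully trusted, valid key or three signatures from marginally trusted, valid keys make a key fully valid (roughly mirroring GnuPG’s defaults for its --completes-needed and --marginals-needed options), only fully valid keys count as introducers, and path-length limits are ignored. The function compute_validity is hypothetical, and so are the signatures on Grimaud’s, Mousqueton’s, and Bazin’s keys, which we assume were made by their musketeers.

```python
# Trust is set personally by the key owner; validity is computed from signatures.
# Assumed weights: one full (or ultimate) introducer, or three marginal ones.
TRUST_WEIGHT = {"ultimate": 3, "full": 3, "marginal": 1, "unknown": 0}
NEEDED = 3
RANK = {None: 0, "marginal": 1, "full": 2, "ultimate": 3}

def compute_validity(owner, trust, signatures):
    """Return a dict mapping each reachable key to 'ultimate', 'full' or 'marginal'.

    trust:      key -> trust level the owner assigned to that key's holder
    signatures: key -> set of keys that signed it
    """
    validity = {owner: "ultimate"}
    changed = True
    while changed:  # iterate until no validity level rises any more
        changed = False
        for key, signers in signatures.items():
            # Only keys that are themselves fully valid may act as introducers.
            introducers = [s for s in signers
                           if validity.get(s) in ("ultimate", "full")]
            score = sum(TRUST_WEIGHT[trust.get(s, "unknown")] for s in introducers)
            level = "full" if score >= NEEDED else ("marginal" if score > 0 else None)
            if RANK[level] > RANK[validity.get(key)]:
                validity[key] = level
                changed = True
    return validity

trust = {"d'Artagnan": "ultimate",
         "Athos": "full", "Porthos": "full", "Aramis": "full",
         "Grimaud": "marginal", "Mousqueton": "marginal",
         "Planchet": "marginal", "Bazin": "marginal"}

signatures = {
    "Athos": {"d'Artagnan"}, "Porthos": {"d'Artagnan"}, "Aramis": {"d'Artagnan"},
    "Planchet": {"Porthos", "d'Artagnan"},
    # Hypothetical: each valet's key was signed by his own musketeer.
    "Grimaud": {"Athos"}, "Mousqueton": {"Porthos"}, "Bazin": {"Aramis"},
    "Bonacieux": {"Planchet"},  # Planchet introduces the landlord
}

before = compute_validity("d'Artagnan", trust, signatures)
print(before["Bonacieux"])  # marginal  (no escalation of trust levels)

# All other valets also sign M. Bonacieux's key.
signatures["Bonacieux"] |= {"Grimaud", "Mousqueton", "Bazin"}
after = compute_validity("d'Artagnan", trust, signatures)
print(after["Bonacieux"])   # full  (3 marginal = 1 full)
```

Validity is recomputed until nothing changes, whereas trust is an input the owner sets by hand; this matches the distinction between computed validity and personally set trust that the example draws.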
The situation from d’Artagnan’s perspective (t = trust, v = validity):
  Key          Trust (t)   Validity (v)
  d’Artagnan   ultimate    ultimate
  Athos        full        full
  Porthos      full        full
  Aramis       full        full
Porthos sends over Planchet as his new valet. d’Artagnan can verify that Planchet is who he says he is, because Planchet’s key bears Porthos’ signature (and d’Artagnan has full trust in Porthos). d’Artagnan signs Planchet’s key:
  Key          Trust (t)   Validity (v)
  d’Artagnan   ultimate    ultimate
  Athos        full        full
  Porthos      full        full
  Aramis       full        full
  Planchet     unknown     full
Setting (personal) Trust for other Keys
Validity can be computed, but trust (in the validity of introduced keys) must be set personally. d’Artagnan sets the trust in all the valets to “marginal”:
  Key          Trust (t)   Validity (v)
  d’Artagnan   ultimate    ultimate
  Athos        full        full
  Grimaud      marginal    full
  Porthos      full        full
  Mousqueton   marginal    full
  Aramis       full        full
  Planchet     marginal    full
  Bazin        marginal    full
Understanding Marginal Trust
Rule 5.3.9 No escalation of trust levels (a key introduced only by a marginally trusted key is itself only marginally valid).
Planchet introduces d’Artagnan to their landlord M. Bonacieux:
  Key          Trust (t)   Validity (v)
  d’Artagnan   ultimate    ultimate
  Athos        full        full
  Grimaud      marginal    full
  Porthos      full        full
  Mousqueton   marginal    full
  Aramis       full        full
  Planchet     marginal    full
  Bazin        marginal    full
  Bonacieux    unknown     marginal
Understanding Marginal Trust
All other valets also vouch for M. Bonacieux by signing his key.
Rule 5.3.10 3 marginal ≙ 1 full (three signatures from marginally trusted keys count as much as one signature from a fully trusted key).
  Key          Trust (t)   Validity (v)
  d’Artagnan   ultimate    ultimate
  Athos        full        full
  Grimaud      marginal    full
  Porthos      full        full
  Mousqueton   marginal    full
  Aramis       full        full
  Planchet     marginal    full
  Bazin        marginal    full
  Bonacieux    unknown     full
[Figure: But Technical Aspects of Security are not the only ones…]
Bibliography
[AES01] Announcing the ADVANCED ENCRYPTION STANDARD (AES), 2001.
[Bro75] Fred Brooks. The Mythical Man-Month. Addison-Wesley, 1975.
[Con12] Jamie Condliffe. Easily understand encryption using... paint and clocks? Gizmodo, 2012.
[Dri10] Vincent Driessen. A successful git branching model. Online at http://nvie.com/posts/a-successful-git-branching-model/, 2010.
[Knu] Jeff Knupp. Improve your python: Understanding unit testing. Web tutorial at http://www.jeffknupp.com/blog/2013/12/09/improve-your-python-understanding-unit-testing/.
[LRW+97] Meir M. Lehman, J. F. Ramil, P. D. Wernick, D. E. Perry, and W. M. Turski. Metrics and laws of software evolution – the nineties view. In Proc. 4th International Software Metrics Symposium (METRICS ’97), pages 20–32, 1997.
[Rya] Konstantin Ryabitsev. PGP Web of Trust: Core concepts behind trusted communication. http://www.linux.com/learn/tutorials/760909-pgp-web-of-trust-core-concepts, seen 2014-09-20.