How to Solve it using Python Release 0.1.1 Senthil Kumaran February 11, 2010
Transcription
How to Solve it using Python Release 0.1.1 Senthil Kumaran February 11, 2010
How to Solve it using Python Release 0.1.1 Senthil Kumaran February 11, 2010 CONTENTS 1 Hello! 2 Lets recollect certain common Programming Paradigms 2.1 Regular Expressions . . . . . . . . . . . . . . . . . 2.2 Exception Handling . . . . . . . . . . . . . . . . . 2.3 Iterators . . . . . . . . . . . . . . . . . . . . . . . . 2.4 List Comprehensions . . . . . . . . . . . . . . . . . 2.5 Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 3 4 4 4 5 Interesting Modules that ease our tasks 3.1 Collections module . . . . . . . . . 3.2 itertools module . . . . . . . . . . 3.3 Creating new iterators . . . . . . . 3.4 Selecting elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 7 8 8 9 3 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Let’s start with strings 11 5 Files - we handle them often 13 6 Date and time related 15 7 Processing XML 17 8 Dealing with Databases 8.1 pickle module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 20 9 Process Handling 9.1 subprocess module . . . . . . . . . . . . . . . . . . . . 9.2 subprocess.call(*popenargs, **kwargs) 9.3 Popen method . . . . . . . . . . . . . . . . . . . . . . 9.4 Writing a Task Scheduler . . . . . . . . . . . . . . . . . . . . 23 23 23 23 24 10 Network Programming 10.1 socket module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.2 Connecting to IRC and logging the messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 27 28 11 Web Programming 11.1 Example of Smart Redirect Handler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 31 12 Unit test - Super Cool stuff 12.1 How to write Unit tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 33 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i 12.2 Test Discovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 13 Useful modules 13.1 Performance measurements using timeit module . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13.2 2to3 tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 35 35 14 References 37 ii CHAPTER ONE HELLO! Welcome to the tutorial - “How to Solve it using Python”. This is an intermediate to advanced level tutorial, wherein we will discuss commonly occuring programming problems and solutions. My aim is to help you to get started into development path and start writing the building blocks of python applications. The tutorial slides and source code will be available at http://uthcode.sarovar.org Tutorial Notes serve as a handy reference material and is a sub-set of the deck. The complete deck of slides along with the source code can downloaded from http://uthcode.sarovar.org Happy Learning! Senthil Kumaran 1 How to Solve it using Python, Release 0.1.1 2 Chapter 1. Hello! CHAPTER TWO LETS RECOLLECT CERTAIN COMMON PROGRAMMING PARADIGMS 2.1 Regular Expressions Compiling Regular Expressions Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions. >>> import re >>> p = re.compile(’ab*’) >>> print p <_sre.SRE_Pattern object at 80b4150> re.compile() also accepts an optional flags argument, used to enable various special features and syntax variations.: >>> p = re.compile(’ab*’, re.IGNORECASE) Methods for searching and matching on RegexObject. Method/Attribute match() search() findall() finditer() Purpose Determine if the RE matches at the beginning of the string. Scan through a string, looking for any location where this RE matches. Find all substrings where the RE matches, and returns them as a list. Find all substrings where the RE matches, and returns them as an iterator. ToAvoid the backslash plague, a raw string is used. prefix the string with r’‘ You can also do search and replacement using the sub method. In the following example, the replacement function translates decimals into hexadecimal: >>> def hexrepl( match ): ... "Return the hex string for a decimal number" ... value = int( match.group() ) ... return hex(value) ... >>> p = re.compile(r’\d+’) >>> p.sub(hexrepl, ’Call 65490 for printing, 49152 for user code.’) ’Call 0xffd2 for printing, 0xc000 for user code.’ 3 How to Solve it using Python, Release 0.1.1 2.2 Exception Handling try: print ’Press Return or Ctrl-C:’, ignored = raw_input() except Exception, err: print ’Caught exception:’, err except KeyboardInterrupt, err: print ’Caught KeyboardInterrupt’ else: print ’No exception’ 2.3 Iterators An iterator is an object representing a stream of data; this object returns the data one element at a time. A Python iterator must support a method called next() (Python2) and __next__() (Python3) that takes no arguments and always returns the next element of the stream. If there are no more elements in the stream, next()/ __next__() must raise the StopIteration exception. Iterators don’t have to be finite, though; it’s perfectly reasonable to write an iterator that produces an infinite stream of data. You can only go forward in an iterator; there’s no way to get the previous element, reset the iterator, or make a copy of it. 2.4 List Comprehensions Two common operations on an iterator’s output are 1) performing some operation for every element, 2) selecting a subset of elements that meet some condition. List comprehensions and generator expressions (short form: “listcomps” and “genexps”) You can strip all the whitespace from a stream of strings with the following code: line_list = [’ line 1\n’, ’line 2 \n’, ...] # Generator expression -- returns iterator stripped_iter = (line.strip() for line in line_list) # List comprehension -- returns list stripped_list = [line.strip() for line in line_list] You can select only certain elements by adding an "if" condition: stripped_list = [line.strip() for line in line_list if line != ""] With a list comprehension, you get back a Python list; stripped_list is a list containing the resulting lines, not an iterator. Generator expressions return an iterator that computes the values as necessary, not needing to materialize all the values at once. This means that list comprehensions aren’t useful if you’re working with iterators that return an infinite stream or a very large amount of data. Generator expressions are preferable in these situations. To avoid introducing an ambiguity into Python’s grammar, if expression is creating a tuple, it must be surrounded with parentheses. The first list comprehension below is a syntax error, while the second one is correct: # Syntax error [ x,y for x in seq1 for y in seq2] 4 Chapter 2. Lets recollect certain common Programming Paradigms How to Solve it using Python, Release 0.1.1 # Correct [ (x,y) for x in seq1 for y in seq2] 2.5 Generators Generators are a special class of functions that simplify the task of writing iterators. Regular functions compute a value and return it, but generators return an iterator that returns a stream of values. Generators can they can be thought of as resumable functions. Here’s the simplest example of a generator function: def generate_ints(N): for i in range(N): yield i Any function containing a yield keyword is a generator function; this is detected by Python’s bytecode compiler which compiles the function specially as a result. Inside a generator function, the return statement can only be used without a value, and signals the end of the procession of values; after executing a return the generator cannot return any further values. return with a value, such as return 5, is a syntax error inside a generator function. The end of the generator’s results can also be indicated by raising StopIteration manually, or by just letting the flow of execution fall off the bottom of the function. 2.5. Generators 5 How to Solve it using Python, Release 0.1.1 6 Chapter 2. Lets recollect certain common Programming Paradigms CHAPTER THREE INTERESTING MODULES THAT EASE OUR TASKS 3.1 Collections module Collections module implements high-performance container datatypes. Currently, there are four datatypes, Counter, deque, OrderedDict and defaultdict, and one datatype factory function, namedtuple(). The specialized containers provided in this module provide alternatives to Python’s general purpose built-in containers, dict, list, set, and tuple. In addition to containers, the collections module provides some ABCs (abstract base classes) that can be used to test whether a class provides a particular interface, for example, whether it is hashable or a mapping. A counter tool is provided to support convenient and rapid tallies. For example: >>> # Tally occurrences of words in a list >>> cnt = Counter() >>> for word in [’red’, ’blue’, ’red’, ’green’, ’blue’, ’blue’]: ... cnt[word] += 1 >>> cnt Counter({’blue’: 3, ’red’: 2, ’green’: 1}) >>> # Find the ten most common words in Hamlet >>> import re >>> words = re.findall(’\w+’, open(’hamlet.txt’).read().lower()) >>> Counter(words).most_common(10) [(’the’, 1143), (’and’, 966), (’to’, 762), (’of’, 669), (’i’, 631), (’you’, 554), (’a’, 546), (’my’, 514), (’hamlet’, 471), (’in’, 451)] deque([iterable[, maxlen]]) Deques are a generalization of stacks and queues (the name is pronounced “deck” and is short for “doubleended queue”). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction. def moving_average(iterable, n=3): # moving_average([40, 30, 50, 46, 39, 44]) --> 40.0 42.0 45.0 43.0 # http://en.wikipedia.org/wiki/Moving_average it = iter(iterable) d = deque(itertools.islice(it, n-1)) d.appendleft(0) s = sum(d) 7 How to Solve it using Python, Release 0.1.1 for elem in it: s += elem - d.popleft() d.append(elem) yield s / float(n) Default Dict Using list as the default_factory, it is easy to group a sequence of key-value pairs into a dictionary of lists: >>> s = [(’yellow’, 1), (’blue’, 2), (’yellow’, 3), (’blue’, 4), (’red’, 1)] >>> d = defaultdict(list) >>> for k, v in s: ... d[k].append(v) ... >>> d.items() [(’blue’, [2, 4]), (’red’, [1]), (’yellow’, [1, 3])] Namedtuple named tuples assign meaning to each position in a tuple and allow for more readable, self-documenting code. They can be used wherever regular tuples are used, and they add the ability to access fields by name instead of position index.: >>> from collections import namedtuple >>> Point = namedtuple(’Point’, ’x y’, verbose=True) Ordered Dictionary Ordered dictionaries are just like regular dictionaries but they remember the order that items were inserted. When iterating over an ordered dictionary, the items are returned in the order their keys were first added. 3.2 itertools module The itertools module contains a number of commonly-used iterators as well as functions for combining several iterators.The module’s functions fall into a few broad classes: • Functions that create a new iterator based on an existing iterator. • Functions for treating an iterator’s elements as function arguments. • Functions for selecting portions of an iterator’s output. • A function for grouping an iterator’s output. 3.3 Creating new iterators itertools.count(n) returns an infinite stream of integers, increasing by 1 each time. You can optionally supply the starting number, which defaults to 0: itertools.count() => 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... itertools.count(10) => 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... 8 Chapter 3. Interesting Modules that ease our tasks How to Solve it using Python, Release 0.1.1 itertools.cycle(iter) saves a copy of the contents of a provided iterable and returns a new iterator that returns its elements from first to last. The new iterator will repeat these elements infinitely. itertools.cycle([1,2,3,4,5]) => 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ... itertools.repeat(elem, [n]) returns the provided element n times, or returns the element endlessly if n is not provided. itertools.repeat(’abc’) => abc, abc, abc, abc, abc, abc, abc, abc, abc, abc, ... itertools.repeat(’abc’, 5) => abc, abc, abc, abc, abc itertools.chain(iterA, iterB, ...) takes an arbitrary number of iterables as input, and returns all the elements of the first iterator, then all the elements of the second, and so on, until all of the iterables have been exhausted. itertools.chain([’a’, ’b’, ’c’], (1, 2, 3)) => a, b, c, 1, 2, 3 itertools.izip(iterA, iterB, ...) takes one element from each iterable and returns them in a tuple: itertools.izip([’a’, ’b’, ’c’], (1, 2, 3)) => (’a’, 1), (’b’, 2), (’c’, 3) 3.4 Selecting elements Another group of functions chooses a subset of an iterator’s elements based on a predicate. itertools.ifilter(predicate, iter) returns all the elements for which the predicate returns true: def is_even(x): return (x % 2) == 0 itertools.ifilter(is_even, itertools.count()) => 0, 2, 4, 6, 8, 10, 12, 14, ... itertools.ifilterfalse(predicate, iter) is the opposite, returning all elements for which the predicate returns false: itertools.ifilterfalse(is_even, itertools.count()) => 1, 3, 5, 7, 9, 11, 13, 15, ... itertools.takewhile(predicate, iter) returns elements for as long as the predicate returns true. Once the predicate returns false, the iterator will signal the end of its results. def less_than_10(x): return (x < 10) itertools.takewhile(less_than_10, itertools.count()) => 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 itertools.takewhile(is_even, itertools.count()) => 0 3.4. Selecting elements 9 How to Solve it using Python, Release 0.1.1 itertools.dropwhile(predicate, iter) discards elements while the predicate returns true, and then returns the rest of the iterable’s results. itertools.dropwhile(less_than_10, itertools.count()) => 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... itertools.dropwhile(is_even, itertools.count()) => 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... • There is more magic in both collections and itertools. Please refer to the standard library documentation. 10 Chapter 3. Interesting Modules that ease our tasks CHAPTER FOUR LET’S START WITH STRINGS The main tool Python gives us to process text is strings - immutable sequence of characters.In Python2, there are two kinds of strings: plain strings, which contain 8-bit (ASCII) characters; and Unicode Strings, which contain Unicode characters.In Python3, there is only one string - which is unicode string. Strings are immutable, which means that no matter what operations you do on a string, you will always produce a new string object, rather than mutating the existing string. In general, you can do anything to a string that you can do to any other sequence, as long as it doesn’t require changing the sequence, since the strings are immutable. list_of_lines = one_large.string.splitlines() one_large_string = ’\n’.join(list_of_lines) Characters of a string. list("python") map(somefunc,"python") map(lambda x:x,"python") sets.Set("python") Reversing For sequences, the extended slicing with negative step can be used for reversing. [::-1] There is reversed built-in function which can also be used for reversing the words in a sentence. But for reversing the characters in a string, if reversed is used, then a ‘’.join needs to be called. The reversed returns an iterator, suitable for looping on or for passing to some “accumulator” callable such as ‘’.join. It does not return a ready made string. string functions string.translate(s, table[, deletechars]) Delete all characters from s that are in deletechars (if present), and then translate the characters using table, which must be a 256-character string giving the translation for each character value, indexed by its ordinal. If table is None, then only the character deletion step is performed. string.maketrans(from, to) Return a translation table suitable for passing to translate(), that will map each character in from into the character at the same position in to; from and to must have the same length. Note: Don’t use strings derived from lowercase and uppercase as arguments; in some locales, these don’t have the same length. For case conversions, always use str.lower() and str.upper(). • Look at the string examples for scripts that demonstrate these functions. Examples of string.Template method 11 How to Solve it using Python, Release 0.1.1 import string # make a template from string where some identifiers are marked with $ template = string.Template(’this is a $template’) print template.substitute({’template’:5}) print template.substitute({’template’:’five’}) # even keyword arguments is possible print template.substitute(template=5) print template.substitute(template=’five’) builtin repr ord chr unichr python2 obj to str c to int i to c, 0 <= i < 256, i to u, 0 <= i < 0x10ffff python3 obj to str c to int, valid surrogate pair accepted. i to u , 0 <= i < 0x10ffff NA string, unicode string, bytes, bytestring Strings are sequence of characters (ascii in python2 and unicode - python3), e.g. an HTML Document.Bytes are not characters an JPEG Document. A general rule of thumb is Bytesting contains encoded data and a unicode object contains unencoded data. • bytes object have a decode() method that takes a character encoding and returns a string. • string object has a encode method that takes a a character encoding and returns a bytes object. Python 2.0 had strings and Unicode Strings. Python 3.0 has strings. That is it.But you wont miss 8 bit strings which acted as bytes in python2, because there is a separate bytes datatype. Reindent a particular string def reindent(s, numSpaces=1): leading_space = numSpaces * ’ ’ lines = [ leading_space + line.strip() for line in s.splitlines() ] return ’\n’.join(lines) sentence = """ This is a sentence which is not formatted. You this right? How would it be good be really well formattted? Let us try it? """ print sentence print ’-’*80 print reindent(sentence) 12 Chapter 4. Let’s start with strings CHAPTER FIVE FILES - WE HANDLE THEM OFTEN • The builtin function open opens the file and returns a stream. It raises an IOError upon failure. • A variant of ‘r’ that is sometimes precious is ‘rU’, which tells Python to read the file in text mode with “universal newlines”: mode ‘rU’ can read text files independently of the line termination convention the files are using, be it the Unix way, the Windows way or even the (Old) Mac way. (Mac OS X today uses Unix for all intents and purposes, but releases of Mac OS 9 and earlier were different). • When read is called with an integer argument N, it reads and returns the next N bytes (or all the remaining bytes if less than N bytes remain) • Files have other writing related methods such as flush, to send any data being buffered, and writelines, to write a sequence of strings in a single call. However, write is by far the most commonly used method. • Other methods worth mentioning are seek and tell, which support random access to files. These methods are normally used with binary files made up of fixed-length records. • StringIO objects are plug-and-play compatible with file objects, so scanner takes its three lines of text from an in-memory string object, rather than a true external file.This shows that Everywhere in Python, object interfaces, rather than specific data types are units of coupling. • Often the data you want to write is not in one bit string, but in a list (or other sequence) of strings. In this case, you should use the writelines method (which despite its name, is not limited to lines and works just as well with binary data as well as text files). Calling writelines is much faster than the alternatives of joining the strings into one big string (e.g, with the ‘’.join) and then calling write or calling write repeatedly in a loop. Calling close is even more advisable when you are writing to a file than you are reading from a file. • With ZipFile, the flag is not used the same way when opening a file, and rb is not recognized. The r flag handles the inherently binary nature of all zip files on all platforms. StringIO object as a fileobject from cStringIO import StringIO def scanner(fileobject, linehandler): for line in fileobject: linehandler(line) def firstword(line): print line.split()[0] string = StringIO(’one\ntwo xxx\nthree\n’) scanner(string, firstword) 13 How to Solve it using Python, Release 0.1.1 14 Chapter 5. Files - we handle them often CHAPTER SIX DATE AND TIME RELATED • One of the most used function is from the module time which is time.time. This returns the number of seconds passed since a fixed instant called the epoch, it is usually the midnight of 1 Jan 1970. To find out which epoch, your platform uses: >>> import time >>> print time.asctime(time.gmtime(0)) • time.gmtime - converts any timestamp into a tuple without TZ convertion. • time.asctime- represents it in human readable way. Here is a way to unpack the tuple representing the current local time: >>> time.localtime() time.struct_time(tm_year=2010, tm_mon=2, tm_mday=10, tm_hour=20, tm_min=43, tm_sec=38, tm_wday=2, tm_yday=41, tm_isdst=0) >>> year, month, mday, hour, minute, sec, wday, yday, isdst = time.localtime() >>> time.localtime().tm_mday, time.localtime().tm_mon, time.localtime().tm_year (10, 2, 2010) • calling time.localtime, time.gmtime, time.asctime without any argument, each of them conveniently defaults to using the current time. • time.strftime - builds a string from a time tuple. • time.strptime - produces a time tuple from a string. • time.sleep - helps you introduce delays in your python programs. The POSIX sleep accepts only the seconds delay, i.e. the integer value, but Python version accepts float and allows for sub-second delays. • The datetime module provides better abstractions to deal with dates and times.: >>> today = datetime.date.today() >>> print(today) 2010-02-10 >>> birthday = datetime.date(1987,3,9) >>> print(birthday) 1987-03-09 >>> currenttime = datetime.datetime.now().time() >>> lunchtime = datetime.time(13,00) 15 How to Solve it using Python, Release 0.1.1 >>> now = datetime.datetime.now() >>> epoch = datetime.datetime(1970,1,1) >>> meeting = datetime.datetime(2010,2,17,15,30) >>> print(meeting) 2010-02-17 15:30:00 Parse the RFC 1123 date format >>> >>> >>> >>> >>> datereturned = "Thu, 01 Dec 1994 16:00:00 GMT" dateexpired = "Sun, 05 Aug 2007 03:25:42 GMT" obj1 = datetime.datetime(*time.strptime(datereturned, "%a, %d %b %Y %H:%M:%S %Z")[0:6]) obj2 = datetime.datetime(*time.strptime(dateexpired, "%a, %d %b %Y %H:%M:%S %Z")[0:6]) if obj1 == obj2: print "Equal" elif obj1 > obj2: print datereturned elif obj1 < obj2: print dateexpired • The datetime objects are immutable. Useful when using in sets and dictionaries.: >>> today = datetime.date.today() >>> next_year = today.replace(year=today.year+1) SyntaxError: invalid syntax >>> print(next_year) 2011-02-10 • dateutil and mxDatetime are two third party utils that are worth looking at too. 16 Chapter 6. Date and time related CHAPTER SEVEN PROCESSING XML • XML is the open standards way of exchanging information. • Dealing with XML is not very seamless, there is always a requirement to write code that reads (i.e, deserializes or parses) and writes (i.e, serializes) XML. • It is important to note that modules in the xml package require that there be at least one SAX-compliant XML parser available. Starting with Python 2.3, the Expat parser is included with Python, so the xml.parsers.expat module will always be available. * xml.dom and xml.sax packages are the definition of the Python bindings for the DOM and SAX interfaces. Parsing XML using xml.etree module First of all understand that Element Tree is a tree datastructure. It represents the XML document as a Tree. The XML Nodes are Elements. (Thus Element Tree) Now, if I were to structure an html document as a element tree.: <html> | <head> ------/ \ | <title> <meta> <body> / | \ <h1> <h2> <para> / \ <li> <li> The Element type is a flexible container object, designed to store hierarchical data structures in memory. The type can be described as a cross between a list and a dictionary. The C implementation of xml.etree.ElementTree is available as xml.etree.cElementTree: <html> <head> <title>Example page</title> </head> <body> <p>Moved to <a href="http://example.org/">example.org</a> or <a href="http://example.com/">example.com</a>.</p> </body> </html> Example of changing the attribute “target” of every link in first paragraph: 17 How to Solve it using Python, Release 0.1.1 >>> from xml.etree.ElementTree import ElementTree >>> tree = ElementTree() >>> tree.parse("index.xhtml") <Element html at b7d3f1ec> >>> p = tree.find("body/p") # Finds first occurrence of tag p in body >>> p <Element p at 8416e0c> >>> links = p.getiterator("a") # Returns list of all links >>> links [<Element a at b7d4f9ec>, <Element a at b7d4fb0c>] >>> for i in links: # Iterates through all found links ... i.attrib["target"] = "blank" >>> tree.write("output.xhtml") 18 Chapter 7. Processing XML CHAPTER EIGHT DEALING WITH DATABASES There are only two kinds of computer programs: toy programs and programs that interact with some kind of persistent databases. • sqllite database is included as part of Python’s standard library. • Python provides a number of built-in facilities for storing and retrieving data. pickle module should be used always and marshal exists for the purposes of ‘pyc’ files. using sqlite3 SQLite is a C library that provides a lightweight disk-based database that doesn’t require a separate server process and allows accessing the database using a nonstandard variant of the SQL query language. Some applications can use SQLite for internal data storage. It’s also possible to prototype an application using SQLite and then port the code to a larger database such as PostgreSQL or Oracle. To use the module, you must first create a Connection object that represents the database. Here the data will be stored in the /tmp/example file: conn = sqlite3.connect(’/tmp/example’) You can also supply the special name :memory: to create a database in RAM. Once you have a Connection, you can create a Cursor object and call its execute() method to perform SQL commands: c = conn.cursor() # Create table c.execute(’’’create table stocks (date text, trans text, symbol text, qty real, price real)’’’) # Insert a row of data c.execute("""insert into stocks values (’2006-01-05’,’BUY’,’RHAT’,100,35.14)""") # Save (commit) the changes conn.commit() # We can also close the cursor if we are done with it c.close() 19 How to Solve it using Python, Release 0.1.1 8.1 pickle module In python3, The pickle module has an transparent optimizer (_pickle) written in C. It is used whenever available. Otherwise the pure Python implementation is used. In python2, there is a pure python pickle and a C implementation cPickle. There are currently 4 different protocols which can be used for pickling. • Protocol version 0 is the original human-readable protocol and is backwards compatible with earlier versions of Python. • Protocol version 1 is the old binary format which is also compatible with earlier versions of Python. • Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. • Protocol version 3 was added in Python 3.0. It has explicit support for bytes and cannot be unpickled by Python 2.x pickle modules. This is the current recommended protocol, use it whenever it is possible. Refer to PEP 307 for information about improvements brought by protocol 2. See pickletools‘s source code for extensive comments about opcodes used by pickle protocols. The following can be pickled. • None, True, and False • integers, floating point numbers, complex numbers • strings, bytes, bytearrays • tuples, lists, sets, and dictionaries containing only picklable objects • functions defined at the top level of a module • built-in functions defined at the top level of a module • classes that are defined at the top level of a module • instances of such classes whose __dict__ or __setstate__() is picklable (see section Pickling Class Instances for details) For the simplest code, use the dump() and load() functions. #!/usr/bin/python3.1 import pickle # An arbitrary collection of objects supported by pickle. data = { ’a’: [1, 2.0, 3, 4+6j], ’b’: ("character string", b"byte string"), ’c’: set([None, True, False]) } with open(’data.pickle’, ’wb’) as f: # Pickle the ’data’ dictionary using the highest protocol available. pickle.dump(data, f, pickle.HIGHEST_PROTOCOL) The following example reads the resulting pickled data. #!/usr/bin/python3.1 import pickle 20 Chapter 8. Dealing with Databases How to Solve it using Python, Release 0.1.1 with open(’data.pickle’, ’rb’) as f: # The protocol version used is detected automatically, so we do not # have to specify it. data = pickle.load(f) 8.1. pickle module 21 How to Solve it using Python, Release 0.1.1 22 Chapter 8. Dealing with Databases CHAPTER NINE PROCESS HANDLING 9.1 subprocess module The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. 9.2 subprocess.call(*popenargs, **kwargs) This is convenience function provided by subprocess module which executes the command given by the argument, when shell=True is the shell variables are expanded in the command line. # using os.system() you could call the external operating system calls # That is equivalent to call() method from subprocess. # Optional shell argument provides the facility to pass the shell variables. # Doing it os.system way import os import subprocess os.system(’date’) # Doing it subprocess way subprocess.call(’date’) # Accessing shell variables # Since we set shell=True, the shell variables are expanded in the command line subprocess.call(’echo $PATH’,shell=True) 9.3 Popen method subprocess module defines a class called Popen. :: class subprocess.Popen(args, bufsize=0, executable=None, stdin=None, stdout=None, stderr=None, preexec_fn=None, close_fds=False, shell=False, cwd=None, env=None, universal_newlines=False, startupinfo=None, creationflags=0) • subprocess.PIPE: Special value that can be used as the stdin, stdout or stderr argument to Popen and indicates that a pipe to the standard stream should be opened. 23 How to Solve it using Python, Release 0.1.1 • subprocess.STDOUT: Special value that can be used as the stderr argument to Popen and indicates that standard error should go into the same handle as standard output. import subprocess print ’subprocess demo: reading output of child process’ proc = subprocess.Popen(’echo "Hello,World"’,shell=True, stdout=subprocess.PIPE) cout = proc.communicate()[0] print cout print ’subprocess demo: writing to the input of a pipe’ proc = subprocess.Popen(’cat -’, shell=True, stdin=subprocess.PIPE) proc.communicate(’Hello, World’) print print ’subprocess replacement of popen2, reading and writing through pipes’ proc = subprocess.Popen(’cat -’, shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE) cout = proc.communicate(""" We were so poor we couldn’t afford a watchdog. we’d bark ourselves. -- Crazy Jimmy """)[0] If we heard a noise at night, print cout print ’subprocess replacement of popen3, handling stdin, stdout and stderr’ proc = subprocess.Popen(’cat -;echo "This will be an Error Message" 1>&2’, shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE ) cout, cerr = proc.communicate(’This is the Input Message’) print cout print cerr 9.4 Writing a Task Scheduler # Scheduling Commands. # Credit: Peter Cogolo import import import import time os sys sched schedule = sched.scheduler(time.time, time.sleep) def perform_command(cmd, inc): schedule.enter(inc, 0, perform_command, (cmd, inc)) 24 Chapter 9. Process Handling How to Solve it using Python, Release 0.1.1 os.system(cmd) def main(cmd, inc=60): schedule.enter(0, 0, perform_command, (cmd, inc)) schedule.run() if __name__ == ’__main__’: numargs = len(sys.argv) - 1 if numargs < 1 or numargs > 2: print "usage: " + sys.argv[0] + " command [seconds_delay]" sys.exit(1) cmd = sys.argv[1] if numargs < 3: main(cmd) else: inc = int(sys.argv[2]) main(cmd, inc) 9.4. Writing a Task Scheduler 25 How to Solve it using Python, Release 0.1.1 26 Chapter 9. Process Handling CHAPTER TEN NETWORK PROGRAMMING 10.1 socket module The Python socket module provides direct access to the standard BSD socket interface, which is available on most modern computer systems. The advantage of using Python for socket programming is that socket addressing is simpler and much of the buffer allocation is done automatically. In addition, it is easy to create secure sockets and several higher-level socket abstractions are available. To create a server, you need to: 1. create a socket 2. bind the socket to an address and port 3. listen for incoming connections 4. wait for clients 5. accept a client 6. send and receive data To create a client, you need to: 1. create a socket 2. connect to the server 3. send and receive data import socket host = ’127.0.0.1’ port = 50000 size = 1024 s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.connect((host, port)) s.send("Hello,World") data = s.recv(size) s.close() print ’Received:’, data import socket host = ’127.0.0.1’ 27 How to Solve it using Python, Release 0.1.1 port = 50000 backlog = 5 size = 1024 s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.bind((host, port)) s.listen(backlog) while True: client, address = s.accept() data = client.recv(size) if data: client.send(data) client.close() 10.2 Connecting to IRC and logging the messages import socket SERVER = ’irc.freenode.net’ PORT = 6667 NICKNAME = ’phoe6’ # REPLACE WITH YOUR USERNAME CHANNEL = ’#python’ # CHANGE CHANNEL IF DESIRED IRC = socket.socket(socket.AF_INET, socket.SOCK_STREAM) def irc_conn(): IRC.connect((SERVER,PORT)) def send_data(command): IRC.send((command + ’\r\n’).encode(’utf-8’)) def join(channel): send_data("JOIN %s" % channel) def login(nickname, username=’phoe6’, password=None, realname=’Senthil’, hostname=’freenode’, servername=’Server’): import getpass password = getpass.getpass(’Enter password for %s:’ % nickname) send_data("PASS %s" % password) send_data("USER %s %s %s %s" % (username, hostname, servername, realname)) send_data("NICK %s" % nickname) def part(): send_data("PART") irc_conn() login(NICKNAME) join(CHANNEL) try: while True: buffer = IRC.recv(1024) msg = buffer.split() print(msg) if msg[0] == "PING": # answer PING with PONG, as RFC 1459 specifies send_data("PONG %s" % msg[1]) if msg[1] == ’PRIVMSG’: 28 Chapter 10. Network Programming How to Solve it using Python, Release 0.1.1 nick_name = msg[0][:msg[0].find("!")] message = ’ ’.join(msg[3:]) print(nick_name.lstrip(’:’), ’->’, message.lstrip(’:’)) finally: part() 10.2. Connecting to IRC and logging the messages 29 How to Solve it using Python, Release 0.1.1 30 Chapter 10. Network Programming CHAPTER ELEVEN WEB PROGRAMMING urllib module is available for doing a variety of web-related stuff. urllib.request - request an url. urllib.parse - parse an url. urllib.error - handle errors urllib.robotparser - handles robots.txt file. 11.1 Example of Smart Redirect Handler import urllib2 class SmartRedirectHandler(urllib2.HTTPRedirectHandler): def http_error_302(self, req, fp, code, msg, headers): result = urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers) result.status = code return result request = urllib2.Request("http://localhost/index.html") opener = urllib2.build_opener(SmartRedirectHandler()) obj = opener.open(request) print ’I capture the http redirect code:’, obj.status print ’Its been redirected to:’, obj.url 31 How to Solve it using Python, Release 0.1.1 32 Chapter 11. Web Programming CHAPTER TWELVE UNIT TEST - SUPER COOL STUFF 12.1 How to write Unit tests The unittest module provides a rich set of tools for constructing and running tests. This section demonstrates that a small subset of the tools suffice to meet the needs of most users. Here is a short script to test three functions from the random module: import random import unittest class TestSequenceFunctions(unittest.TestCase): def setUp(self): self.seq = list(range(10)) def test_shuffle(self): # make sure the shuffled sequence does not lose any elements random.shuffle(self.seq) self.seq.sort() self.assertEqual(self.seq, list(range(10))) def test_choice(self): element = random.choice(self.seq) self.assert_(element in self.seq) def test_sample(self): self.assertRaises(ValueError, random.sample, self.seq, 20) for element in random.sample(self.seq, 5): self.assert_(element in self.seq) if __name__ == ’__main__’: unittest.main() The unittest module can be used from the command line to run tests from modules, classes or even individual test methods: python -m unittest test_module1 test_module2 python -m unittest test_module.TestClass python -m unittest test_module.TestClass.test_method You can pass in a list with any combination of module names, and fully qualified class or method names. 33 How to Solve it using Python, Release 0.1.1 You can run tests with more detail (higher verbosity) by passing in the -v flag: python-m unittest -v test_module For a list of all the command line options: python -m unittest -h The command line can also be used for test discovery, for running all of the tests in a project or just a subset. 12.2 Test Discovery unittest supports simple test discovery. For a project’s tests to be compatible with test discovery they must all be importable from the top level directory of the project; i.e. they must all be in Python packages. Test discovery is implemented in TestLoader.discover(), but can also be used from the command line. The basic command line usage is: cd project_directory python -m unittest discover The discover sub-command has the following options: -v, --verbose Verbose output -s directory Directory to start discovery (‘.’ default) -p pattern Pattern to match test files (‘test*.py’ default) -t directory Top level directory of project (default to start directory) The -s, -p, & -t options can be passsed in as positional arguments. The following two command lines are equivalent: python -m unittest -s project_directory -p ’*_test.py’ python -m unittest project_directory ’*_test.py’ 34 Chapter 12. Unit test - Super Cool stuff CHAPTER THIRTEEN USEFUL MODULES 13.1 Performance measurements using timeit module $ python -mtimeit -s’import random; x=range(1000); random.shuffle(x)’ ’y=list(x); y.sort()’ 1000 loops, best of 3: 452 usec per loop $ python -mtimeit -s’import random; x=range(1000); random.shuffle(x)’ ’x.sort()’ 10000 loops, best of 3: 37.4 usec per loop $ python -mtimeit -s’import random; x=range(1000); random.shuffle(x)’ ’sorted(x)’ 1000 loops, best of 3: 462 usec per loop $ 13.2 2to3 tool Run ./2to3 to convert stdin (-), files or directories given as arguments. 2to3 must be run with at least Python 2.6. The intended path for migrating to Python 3.x is to first migrate to 2.6 (in order to take advantage of Python 2.6’s runtime compatibility checks). • In the tutorial source files, run 2to3 example_for_2to3.py and analyze the output. 35 How to Solve it using Python, Release 0.1.1 36 Chapter 13. Useful modules CHAPTER FOURTEEN REFERENCES • http://docs.python.org - Python Tutorial and Library Documentations • Python Cookbook - by Alex Martelli, Anna Martelli and David Ascher. • Python Community Recipes at ActiveState. 37