PDF document - New Mexico Institute of Mining and Technology
pycbc: A Python interface to the Christmas Bird Count database

John W. Shipman
2016-03-08 17:52
New Mexico Tech Computer Center

Abstract

Describes a database to represent data from the Audubon Christmas Bird Counts, and a Python-language interface to that database.

This publication is available in Web form [1] and also as a PDF document [2]. Please forward any comments to [email protected].

[1] http://www.nmt.edu/~shipman/z/cbc/pycbc/
[2] http://www.nmt.edu/~shipman/z/cbc/pycbc/pycbc.pdf

Table of Contents

1. Introduction and scope
2. Downloadable files
3. Glossary
    3.1. Count
    3.2. Circle
    3.3. Year number
    3.4. Circle-year
    3.5. Year key
    3.6. Kind of bird
    3.7. Count week birds
4. General design notes
    4.1. Attributes of the principal entities
    4.2. SQL considerations
5. Using the pycbc interface
    5.1. The CBCData class
    5.2. The Nation class
    5.3. The Region class
    5.4. The Physio class
    5.5. The Circle class
    5.6. The Effort class
    5.7. The Census class
6. The SQL schema
    6.1. pycbc.py: Prologue
    6.2. Imports
    6.3. Manifest constants
    6.4. class CBCData: The database interface
    6.5. The nations table
    6.6. The regions table
    6.7. The physios table
    6.8. The circles table
    6.9. The cir_reg table
    6.10. The cir_physio table
    6.11. The efforts table
    6.12. The censuses table
    6.13. Object-relational mapping
    6.14. CBCData.__init__(): Constructor
    6.15. CBCData.genNations()
    6.16. CBCData.getNation()
    6.17. CBCData.genRegions()
    6.18. CBCData.getRegion()
    6.19. CBCData.genPhysios()
    6.20. CBCData.getPhysio()
    6.21. CBCData.getRegionCircle()
    6.22. CBCData.genCircles()
    6.23. CBCData.genCirclesByName()
    6.24. CBCData.genRegionCircles()
    6.25. CBCData.genPrimaryRegionCircles()
    6.26. CBCData.genCirclesByPhysio()
    6.27. CBCData.getCircle(): Retrieve a specific circle
    6.28. CBCData.genEfforts()
    6.29. CBCData.getEffort(): Retrieve a specific effort record
    6.30. CBCData.overlappers(): Find overlapping circles
    6.31. CBCData.degMinAdd(): Lat/long arithmetic
    6.32. CBCData.overlapCheck(): Do these circles overlap?
    6.33. CBCData.__circleSep(): Compute the separation of two circles
    6.34. CBCData.__terraCircle(): Convert a circle center to a terrestrial position
7. The staticloader script: Populate the static tables
    7.1. staticloader: Prologue
    7.2. staticloader: main()
    7.3. staticloader: loadNations()
    7.4. staticloader: addNation
    7.5. staticloader: loadRegions()
    7.6. staticloader: addRegion()
    7.7. staticloader: loadPhysios()
    7.8. staticloader: addPhysio()
    7.9. staticloader: check()
    7.10. staticloader: Epilogue
8. Conversion from the old MySQL database
    8.1. Schema of the 1998 database
    8.2. mycbc.py: Interface to the 1998 database
    8.3. class MyCBC: Interface to the old database
    8.4. MyCBC.__init__()
    8.5. MyCBC.__mapTable: Locate and bind a table
    8.6. MyCBC.genCirs(): Generate all circles
    8.7. MyCBC.genStnds(): Generate all the circle-years for a given circle
    8.8. MyCBC.getEff(): Retrieve the eff row for a given circle-year
    8.9. MyCBC.getAsPub(): Retrieve the aspub row for a circle-year
    8.10. MyCBC.genCens(): Generate census records for one circle-year
9. transloader: Copy over the MySQL database
    9.1. transloader: Prologue
    9.2. transloader: main()
    9.3. transloader: readPassword()
    9.4. transloader: dbCopy()
    9.5. transloader: copyCir(): Copy data for one circle
    9.6. transloader: addCircle()
    9.7. transloader: addCircleYear()
    9.8. transloader: addCensus()
    9.9. transloader: Epilogue
10. Static data files

1.
Introduction and scope

The National Audubon Society has been conducting the Christmas Bird Count (CBC) since 1900. The author has been working with digital representations of this database since 1975. This document represents a complete redesign of a previous version of the database.

• This database is an older version of the current database [3] maintained by the National Audubon Society. The author's current work is not an attempt to provide a parallel database. It is mainly of interest as an example of contemporary database design, and it also supports the author's work as the New Mexico regional editor of the CBC.
• The starting point for the current work was the database design documented in the 1998 database specification [4]. The current effort is a reimplementation of the data in this older database, with an improved design based on third normal form database normalization [5].
• The current work is also a case study in database implementation using the Python programming language [6] and the SQLAlchemy [7] object-relational database mapping system. It will be implemented using the PostgreSQL [8] database engine, and the data will be loaded from a representation of the old database that uses the MySQL [9] database engine. Details of this translation process are discussed in Section 8, “Conversion from the old MySQL database”.

2. Downloadable files

• pycbc.py [10]: The Python module defined here. For the user interface, see Section 5.1, “The CBCData class”. For internals, see Section 6, “The SQL schema”.
• nationlist [11]: The file that defines the nation codes; see Section 10, “Static data files”.
• regionlist [12]: The file that defines the region (state or province) codes; see Section 10, “Static data files”.
• physiolist [13]: The file that defines the physiographic region code system of the USGS Bird Banding Lab. A slightly edited version of the list scraped from the Bird Banding Lab's page [14]. See Section 10, “Static data files”.
• staticloader [15]: A script that initializes the database and loads the three static tables. See Section 7, “The staticloader script: Populate the static tables”.
• transloader [16]: A script that reads the old MySQL database and reformats it as the current PostgreSQL database. See Section 9, “transloader: Copy over the MySQL database”.
• This document was written using DocBook 4.3 [17], according to the New Mexico Tech Computer Center's publication, Writing documentation with DocBook-XML 4.3 [18]. You can examine the XML source for the present document [19].

[3] http://www.audubon.org/bird/cbc/index.html
[4] http://www.nmt.edu/~shipman/z/cbc/db_spec.html
[5] http://en.wikipedia.org/wiki/3NF
[6] http://www.python.org/
[7] http://www.sqlalchemy.org/
[8] http://en.wikipedia.org/wiki/Postgresql
[9] http://en.wikipedia.org/wiki/Mysql
[10] http://www.nmt.edu/~shipman/z/cbc/pycbc/pycbc.py
[11] http://www.nmt.edu/~shipman/z/cbc/pycbc/nationlist
[12] http://www.nmt.edu/~shipman/z/cbc/pycbc/regionlist

3. Glossary

Because the selection of good names is so important, let's start by defining some terms, and also discussing some terms that are problematic.

3.1. Count

This is a highly problematic term. It may refer to: the entire institution of the CBC; only those circles counted within a given year; only one circle counted in one year; or the number of birds of a given kind. Consequently, the use of this term without abundant context is discouraged.

3.2. Circle

This term generally refers to one of the 15-mile-diameter circles that are the standard unit of counting. However, in many cases, especially pelagic transects, the term may refer to areas of some other shape.

3.3.
Year number

The first Christmas Bird Count was run on Christmas Day in 1900. Later, the count period was extended to allow counts on dates from mid-December through early January. Each yearly cycle is published separately.

We will use the term year number to mean 1 for the first year of the CBC, 2 for the second, and so on. In general, year number N includes dates in December of year 1899+N and January of year 1900+N.

3.4. Circle-year

In the published data, each year is published in a single periodical. We use the term “circle-year” to mean one circle counted in one year.

[13] http://www.nmt.edu/~shipman/z/cbc/pycbc/physiolist
[14] http://www.mbr-pwrc.usgs.gov/bbs/physio.html
[15] http://www.nmt.edu/~shipman/z/cbc/pycbc/staticloader
[16] http://www.nmt.edu/~shipman/z/cbc/pycbc/transloader
[17] http://en.wikipedia.org/wiki/DocBook
[18] http://www.nmt.edu/tcc/help/pubs/docbook/
[19] http://www.nmt.edu/~shipman/z/cbc/pycbc/pycbc.xml

3.5. Year key

We use the term year key to mean any value that uniquely identifies a circle within one year number.

• For year numbers 1–90 in the database, circles are usually numbered starting from 1 up to the total number of circles counted that year. For example, in the 88th CBC, there were 1502 circles counted, numbered 0001-1502 in the database. However, there are a few published counts that were added to the sequence late in the publication cycle. For example, in year number 075, between circles number 0027 and 0028 there are two counts numbered 0027a and 0027b. In the actual publication, these numbers started appearing only around the mid-1960s. The author hand-numbered his copies for earlier years.
• For years 91 and later in the database, the year key is a four-letter code of the form RRKK, where RR is the two-letter region code and KK is a two-letter code unique within that region. For example, the Zuñi, NM circle has code “NMZU”.

3.6.
Kind of bird

In order to understand how we represent the kinds of bird seen, see A system for representing bird taxonomy [20], which describes a system of six-letter codes for describing kinds of birds. That system allows for single kinds of birds, “species pairs” (e.g., we think it was either a Hammond's or Dusky flycatcher), or hybrids (e.g., it looked like a hybrid of Blue-winged and Cinnamon teal).

In general, the representation is a triple (form, rel, altForm). The form is the first or only six-letter code. The rel value is the relationship code: blank for single forms, “×” for hybrids, and “/” for species pairs. If the relationship code is not blank, the altForm value is the second form's six-letter code. If there are two codes, we stipulate that form < altForm, so we don't have to search twice to pick up a given hybrid or pair.

Additionally, this database sometimes represents the sighting's age category (adult, immature, or female/immature) or sex category (male or female).

3.7. Count week birds

Born from the frustration of participants who find interesting birds in advance of the official day but cannot locate them on the official day, Audubon has long been publishing records “seen count week but not count day”. Careful researchers who use weighted analyses such as “birds per party-hour” will want to exclude these records from their data sets, but other consumers of the CBC data will be quite interested in them.

4. General design notes

The first step in redesigning the older database [21] is to represent it as an entity-relationship model [22].

[20] http://www.nmt.edu/~shipman/xnomo
[21] http://www.nmt.edu/~shipman/z/cbc/db_spec.html
[22] http://en.wikipedia.org/wiki/Entity-relationship_diagram

circle
    Describes one 15-mile-diameter count circle by the latitude and longitude of its center.
region
    One political region: a U.S.
state, a Canadian province, or the French islands of Saint Pierre and Miquelon, the only other nation represented in this database, where a single circle is located.
nation
    One country.
physio
    A physiographic region stratum code as defined by the U.S. Fish & Wildlife Service's Bird Banding Lab. For the authority file, see Section 2, “Downloadable files”. This code is useful for grouping circles by their biogeographic similarity. Many circle records have not been coded for physiographic strata. A circle may have up to two physiographic stratum codes; when it has two, the first is the major stratum and the second the minor stratum, so the ordering of the two codes is important.
effort
    This entity describes one year in which there is a published census of the circle.
kind of bird
    This entity represents a specific kind of bird seen in one year, and the number of individuals of that kind that the counters saw. Note that there may be many records for a given species within one effort entity, differing by several details: age; sex; whether seen count day or only during count week; or whether the identification is in question. Note also that on a few occasions an effort has resulted in zero birds (mainly in remote parts of Alaska and Canada), but this is still considered a valid count.

4.1. Attributes of the principal entities

The tables that represent the entities described above will carry the names of those entities, as plurals. Here are the attributes of these tables, and some discussion of how they are derived from the old database.

4.1.1. Attributes of the nations table

For the script that loads this table, see Section 7, “The staticloader script: Populate the static tables”.

nation_code
    Three-character code for the country.
nation_name
    Full name of the country.

4.1.2.
Attributes of the regions table

For the script that loads this table, see Section 7, “The staticloader script: Populate the static tables”.

reg_nation
    National code for this region, defined in Section 4.1.1, “Attributes of the nations table”.
reg_code
    Two-character postal code, e.g., HI or YT.
reg_name
    Conventional name of the region, e.g., “West Virginia” or “Province Quebec”.

4.1.3. Attributes of the circles table

lat
    North latitude of the circle's center in degrees and minutes as ddmm.
lon
    West longitude of the circle's center in degrees and minutes as dddmm.
water
    Describes whether salt water occurs in the circle. This attribute is not always properly encoded, so the lack of a code does not imply a lack of salt water. Codes are:
        (blank)  Unknown or no salt water.
        p        Pelagic: the entire circle is in open ocean.
        o        Some open ocean is included in the circle.
        e        Some ocean estuary is included in the circle, but no open ocean.
odd
    Code to indicate an area that is not the standard 15-mile-diameter circle. As with the water attribute, not all circles were properly encoded. Code values may be any of:
        (blank)  Standard circle or unknown shape.
        p        Pelagic-only transect.
        x        Not a pelagic-only transect, and not a standard circle.
cir_name
    The published name of the circle. Many circles have changed their names; generally the attribute in the circles table is the last name used for that particular center. A few standard abbreviations are used:
        M.A.     Management Area
        N.M.     National Monument
        N.P.     National Park
        N.W.R.   National Wildlife Refuge
        P.P.     Provincial Park
        S.P.     State Park
        W.M.A.   Wildlife Management Area

4.1.4. Attributes of the physios table

physio_code
    Two-digit code for the physiographic stratum, with left zero fill.
physio_name
    Description of this physiographic stratum, e.g., “Southern Rockies”.

4.1.5.
Attributes of the cir_reg table

This table represents the many-to-many relation between circles and regions.

lat, lon
    Link to the circles table.
reg_pos
    Position of this region within the list of regions for the circle. This value is necessary because the regions are ordered. Values are 0 for the first or only region; 1 for the second region; 2 for the third region.
reg_code
    Link to the regions table.

4.1.6. Attributes of the cir_physio table

This table represents the many-to-many relation between circles and physiographic strata.

lat, lon
    Link to the circles table.
physio_pos
    Position of this stratum code: 0 for the first or only stratum, 1 for the second.

4.1.7. Attributes of the efforts table

Each row of this table represents the censusing of one circle in a given year number.

lat
    Latitude, encoded as in Section 4.1.3, “Attributes of the circles table”.
lon
    Longitude, encoded as in Section 4.1.3, “Attributes of the circles table”.
year_no
    The year number, three digits, with left zero fill. For example, “043” for the Forty-third CBC (December 1942 and January 1943).
year_key
    A five-character key that uniquely identifies an effort within a year. In particular, this field can help a researcher rapidly find the published data in the original periodical. See Section 3.5, “Year key” for a discussion of what actually appeared in the published data.
    For year numbers 1–90, this column has format NNNNX, where NNNN is the serial number within the year, and X is either blank or a lowercase letter. For years 91 through the present, this column always holds the four-character RRKK form described in Section 3.5.
    See Section 8, “Conversion from the old MySQL database” for more information about the origin of this field.
yyyymmdd
    The date of the count, if known. For many records this is the date of Christmas because the true date was not recorded in the old database.
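The year_no and year_key encodings described above can be illustrated with a short sketch. The helper names format_year_no, calendar_years, and classify_year_key are hypothetical and are not part of pycbc. The two patterns are assumptions derived only from the formats stated in this document (four digits plus an optional lowercase letter for year numbers 1–90, four uppercase letters for later years); how a shorter key is padded to five characters is not specified here, so the sketch simply strips trailing blanks.

```python
import re

# Hypothetical helpers for illustration; none of these names exist in pycbc.

def format_year_no(n):
    """Render a year number as three digits with left zero fill."""
    if n < 1:
        raise ValueError("year numbers start at 1")
    return "%03d" % n

def calendar_years(n):
    """Return (December year, January year) covered by year number n."""
    return (1899 + n, 1900 + n)

# Assumed patterns, taken from the formats described in the text.
SERIAL_KEY = re.compile(r"\d{4}[a-z]?")   # serial number, optional letter
LETTER_KEY = re.compile(r"[A-Z]{4}")      # region code + two-letter code

def classify_year_key(year_key):
    """Classify a year key as 'serial' or 'letters'."""
    key = year_key.rstrip()               # padding is an assumption
    if SERIAL_KEY.fullmatch(key):
        return "serial"
    if LETTER_KEY.fullmatch(key):
        return "letters"
    raise ValueError("unrecognized year key: %r" % year_key)
```

For example, format_year_no(43) gives the string used for the Forty-third CBC, and calendar_years(43) gives the December and January years that count covers.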
as_lat
    Published latitude, if known. May be null.
as_lon
    Published longitude, if known. May be null.
as_name
    Circle name as published. The old database also tracked the region codes as published, but there is no strong reason to retain these data.
n_obs
    Number of observers; an integer, one or greater.
ph_tot
    Total party-hours, to tenths.
    Note: This attribute and all the remaining attributes in this table may be null.
ph_foot
    Party-hours on foot.
ph_car
    Party-hours by car.
ph_other
    Party-hours by means other than foot or car.
h_fd
    Hours (not party-hours) by feeder-watchers.
h_owl
    Hours “owling” or, as it was later known, nocturnal birding.
pm_tot
    Total party-miles, to tenths.
pm_other
    Party-miles by means other than foot or car.
m_owl
    Miles owling or other nocturnal birding.

4.1.8. Attributes of the censuses table

lat
    Latitude, encoded as in Section 4.1.3, “Attributes of the circles table”.
lon
    Longitude, encoded as in Section 4.1.3, “Attributes of the circles table”.
year_no
    The year number, encoded as in Section 4.1.7, “Attributes of the efforts table”.
seq_no
    Sequence number of this record within the circle and year. Audubon chose to discard this field in their database, but it is vital to the Christmas Bird Count Database corrections project [23], because the records must be in the original order to be proofread efficiently against the original publication.
form
    First or only form code describing the type of bird. Example: AMEROB for American Robin. These codes are defined in the nomenclature system specification [24].
rel
    Relationship code:
        (blank)  Not a species pair or hybrid.
        /        Species pair; the code for the second alternative is in the alt_form attribute.
        x        Hybrid; the code for the second assumed parent form is in the alt_form attribute.
alt_form
    Second form code when the rel attribute is not blank; null when rel is blank.
    Because, for example, “Downy Woodpecker/Hairy Woodpecker” is the same kind of bird as “Hairy Woodpecker/Downy Woodpecker”, the form codes are always ordered so that, lexically, the form code is less than the alt_form code. A given species pair or hybrid therefore always has the same representation.
age
    Age code.
        (blank)  Unknown age class.
        a        Adult.
        i        Immature, subadult, or juvenal plumage.
        p        Female or immature. Yes, female is a sex and not an age class, but we arbitrarily place this category under age.
sex
    Sex code.
        (blank)  Sex unknown.
        m        Male.
        f        Female.
plus
    Count-week indicator (see Section 3.7, “Count week birds”). Normally blank; contains “+” for count-week birds that were not seen count day.
q
    Questionable ID flag. Normally blank; contains “q” if the editor indicated some doubt as to whether this species occurred in the circle at all. Records that are in question due to abnormally high numbers are not flagged here.
census
    Number of individuals. This is often encoded as “-1” when the number is unknown. Audubon's version of the database uses zero when the number is unknown. For count week birds, this attribute is generally -1, but a few such records contain actual numbers.

[23] http://www.nmt.edu/~shipman/z/cbc/proof.html
[24] http://www.nmt.edu/~shipman/xnomo

4.2. SQL considerations

Each of the major tables shown in the diagram in Section 4, “General design notes” will become a table in the SQL representation. Two more tables are required to manage many-to-many relationships.

• Table cir_reg will represent the many-to-many relation between circles and region codes. The left-hand key, linking to a circle, is lat + lon; the right-hand key, linking to a region, is reg_code. Furthermore, the relation includes ordering information: for a circle that overlaps multiple regions, there is a primary region, a secondary region, and possibly a tertiary region.
  Hence, the intermediate table that defines this relation must carry an additional column: 0 for the primary region, 1 for the secondary region, and 2 for the tertiary region. If anyone ever censuses a circle that overlaps four or more states, we can use the numbers 3 or more. We'll call this column reg_pos, the region's position.
  The primary key for this table will be the concatenation of lat + lon + reg_pos, so that retrieval will produce the region codes in the correct order. We'll need to support two other queries on this table.
    • In order to produce a report showing all the circles that are listed primarily under a given state, we'll index on reg_code + reg_pos.
    • That same index, with reg_pos a wild card, will produce a report showing all the circles that occur even partially in a given state.
• Table cir_physio will represent the many-to-many relation between circles and physiographic strata. The left-hand key is lat + lon and the right-hand key is physio_code. Again, there is an ordering: some circles have two stratum codes, but the first one is the principal stratum code. We will add a column named physio_pos to indicate the position of the physiographic stratum code for a given circle, with values of 0 or 1.
  The primary key for this table will be lat + lon + physio_pos, which produces the physiographic stratum codes with the primary code first.

Here is a revision of our original entity-relationship model showing the final tables and their relations. Primary key columns are indicated with an asterisk “*”, and the arrows show the foreign key relations.

For the censuses table we have a choice of two unique keys. The concatenation of lat + lon + year_no + year_key + seq_no is the unique primary key.

Note: It would be nice if every circle were counted exactly once in a given year number, but there are hundreds of exceptions.
Audubon stipulated starting in the mid-1930s that counts within one year number should not overlap, but exceptions persisted until the 55th year. This is why the year_key must be part of the primary key.

We'll need another index to search for kinds of birds. Within a given circle-year, we must concatenate seven columns to ensure uniqueness: form + rel + alt_form + age + sex + plus + q. So the secondary index will include these seven fields, plus lat + lon + year_no + year_key + seq_no.

5. Using the pycbc interface

To access this database from a Python script, import the pycbc module and call the CBCData() constructor like this:

    import pycbc
    db = pycbc.CBCData(password)

If you pass the correct database password as an argument, you will be granted read-write access; otherwise any attempt to retrieve data will fail.

5.1. The CBCData class

Here are the attributes and methods on this class.

.engine
    The sqlalchemy.engine.Engine instance connecting to the PostgreSQL engine where the CBC database lives.
.meta
    The schema as a sqlalchemy.schema.MetaData instance. This is not documented in the SQLAlchemy reference materials, but a MetaData instance contains a .tables attribute that is a dictionary whose keys are the names of mapped tables, and each corresponding value is the actual Table instance for that table.
.Session
    A constructor for an SQLAlchemy session.
.s
    A Session instance. Use this only for single-threaded applications. For multi-threaded applications, create a new Session for each thread.
.Nation
    The mapped class for the nations table. See Section 5.2, “The Nation class”.
.Region
    The mapped class for the regions table. See Section 5.3, “The Region class”.
.Physio
    The mapped class for the physios table. See Section 5.4, “The Physio class”.
.Circle
    The mapped class for the circles table. See Section 5.5, “The Circle class”.
.Effort
    The mapped class for the efforts table. See Section 5.6, “The Effort class” (p. 16).
.Census
    The mapped class for the censuses table. See Section 5.7, “The Census class” (p. 17).
.nations_table, .regions_table, .physios_table, .circles_table, .efforts_table, .censuses_table
    The actual Table instances.
.genNations()
    Generate a sequence of the Nation instances in ascending order by nation name.
.getNation(nationCode)
    Return the Nation instance with a given nation code, or raise KeyError if there is no such code.
.genRegions()
    Generate a sequence of the Region instances in ascending order by region name.
.getRegion(regionCode)
    Return the Region instance with a given region code, or raise KeyError if there is no such code.
.genPhysios()
    Generate the Physio instances in self, in ascending order by code.
.getPhysio(code)
    Return the Physio instance for a given physiographic stratum code, or raise KeyError if it is not found.
.getRegionCircle(regionCode, cirName)
    If there is a circle whose name is exactly cirName and that occurs all or partly in the region whose code is regionCode, return the corresponding Circle object; otherwise raise KeyError.
.genCircles()
    Generate the Circle instances in ascending order by latitude + longitude.
.genCirclesByName(prefix)
    Generate all the Circle instances whose names begin with the given prefix, in ascending order by circle name.
.genRegionCircles(regionCode)
    Generate Circle instances for all the circles that occur in any part of the region with a given code, in ascending order by circle name.
.genPrimaryRegionCircles(regionCode)
    Generate Circle instances for all the circles that are listed under the region with a given code; that is, the circles for which the given region code is the first one displayed. To find all the circles that occur even partially in a given region, use the .circles attribute of a Region instance.
.genCirclesByPhysio(physioCode)
    Generate all the Circle instances that are associated with the physiographic stratum whose code is physioCode.
.getCircle(lat, lon)
    Given a latitude as a string "ddmm" and a longitude as "dddmm", return the corresponding Circle instance. If that center is not in the database, the method will raise a KeyError.
.genEfforts()
    Generate all the Effort records in self in primary key order.
.overlappers(fromCircle)
    Use this method to find other circles that overlap a given Circle instance fromCircle. The return value is a list of tuples (pct, c), where pct is the percentage of their areas that overlap, in the open interval (0.0, 100.0), and c is the overlapping Circle instance. The list is in descending order by the pct value.
.getEffort(year_no, year_key)
    Return the Effort instance for the given year number and year key.

5.2. The Nation class

An instance of this class represents one nation.

.nation_code
    National code, e.g., “USA”.
.nation_name
    Full name of the nation, e.g., “United States of America”.
.regions
    An iterator that produces all the Region instances in this nation, in ascending order by region code.

5.3. The Region class

An instance of this class represents one state or province or the islands of St. Pierre et Miquelon.

.reg_code
    Region code, e.g., “NM” for New Mexico.
.reg_name
    Region name, e.g., “New Mexico”.
.nation
    The Nation instance for the nation containing this region.
.circles
    An iterator that produces all the circles that have at least some area in this region.

5.4. The Physio class

Mapped to the table of physiographic strata.

.physio_code
    Two-digit code for this stratum, as a string.
.physio_name
    Full description of the stratum, e.g., “Closed Boreal Forest”.
.circles
    An iterator that produces Circle instances for all the circles that contain the stratum.

5.5. The Circle class

Each instance represents a circle with its center at a specific latitude and longitude, to the nearest minute.

.lat
    Latitude as a string of four digits, "ddmm".
.lon
    Longitude as a string of five digits, "dddmm".
.water
    Water-body code: " ", "p", "o", or "e".
.odd
    Odd-shape code: " ", "p", or "x".
.cir_name
    Full name of the circle, as Unicode.
.regions
    An iterator that produces the Region instances for this circle's regions, in standard order as published.
.physios
    An iterator that produces the Physio instances for this circle's physiographic strata. If there are two, the first is the major stratum and the second is the minor stratum.
.efforts
    An iterator that produces the Effort instances for the circle-years this circle was counted, in chronological order.
.allRegions()
    Returns a string of the form “R0[-R1[-R2]]”, where each Ri is one of the region codes for this circle.
.fullName()
    Returns a string of the form “ddmmn dddmmw R0[-R1[-R2]]: cir_name”. Example: "3843n 08528w IN-KY: Hanover-Madison".
.unicode()
    Returns a string like “38°43′N 085°28′W IN-KY: Hanover-Madison”.

5.6. The Effort class

There is one instance for each circle-year.

.lat, .lon
    These two fields are the composite key used to relate a circle-year to a circle.
.year_no
    The year number as a string of three digits with left zero fill, e.g., “008”. See Section 3.3, “Year number” (p. 4).
.year_key
    The year key for this circle-year, left-justified and blank-filled to length 5. See the discussion of the year_key attribute in Section 4.1.7, “Attributes of the efforts table” (p. 8). Example values (where _ represents a space): 0001_ 0027_ 0027b NMZU_
.yyyymmdd
    Date as a datetime.date object.
.n_obs
    Number of observers as an int.
.ph_tot, .ph_foot, .ph_car, .ph_other, .h_fd, .h_owl, .pm_tot, .pm_foot, .pm_car, .pm_other, and .m_owl
    All these hour- and mile-based quantities use Python's decimal.Decimal type [25], with a precision of one digit after the decimal point. Note that quantities in this type can be formatted using a “%f” format, as in this conversational example.

    [25] http://docs.python.org/library/decimal.html

    >>> d1=decimal.Decimal('1.4')
    >>> d2=decimal.Decimal('3.47')
    >>> d3=d1+d2
    >>> d3
    Decimal('4.87')
    >>> "%6.1f" % d3
    '   4.9'
    >>> "%6.3f" % d3
    ' 4.870'

.circle
    The Circle instance for this circle-year.
.censuses
    An iterator that produces all the Census instances for this circle-year.

5.7. The Census class

Each instance represents one kind of bird seen within a circle-year.

.lat, .lon, .year_no, .year_key
    The concatenation of these columns relates each instance to the efforts table.
.seq_no
    An integer value that orders census records within a circle-year according to their published order.
.form, .rel, .alt_form
    These three values identify the type of bird. See Section 3.6, “Kind of bird” (p. 5). The .form value is always uppercase and right-blank-padded to a length of six. The .rel value is " ", "x", or "/". If the .rel is not blank, the .alt_form attribute is the second bird code, also uppercased and right-blank-padded to six characters.
.age
    Age code: " ", "a", or "i".
.sex
    Sex code: " ", "m", or "f".
.plus
    Count-week flag: " " or "+".
.q
    Questionable status: " " or "q".
.census
    Count of birds as an int. The value may be −1 to signify an unknown count. There used to be at least one row in this table that had a zero census, which is an error. (Audubon's database, by contrast, uses zero for count-week birds and never shows a value of −1.)

6. The SQL schema

Here we begin the actual code inside pycbc.py.
The code is presented in lightweight literate programming [26] style, with the name of the destination file displayed above the top right side of each code block.

[26] http://www.nmt.edu/~shipman/soft/litprog/

6.1. pycbc.py: Prologue

The pycbc.py file starts with a brief comment that points back at this documentation.

pycbc.py
    '''pycbc.py: SQLAlchemy Postgres model for Christmas Bird Count

      For complete documentation:
        http://www.nmt.edu/~shipman/z/cbc/pycbc/
    '''

6.2. Imports

pycbc.py
    # - - - - -   I m p o r t s

SQLAlchemy has a number of sub-modules that we'll need.

• The sqlalchemy.schema module supplies schema classes such as Table and Column.
• The sqlalchemy.types module defines column types.
• The sqlalchemy.orm module controls the object-relational mapper (ORM).
• The sqlalchemy.exc module defines the exception classes raised by SQLAlchemy.
• The sqlalchemy.engine module has the create_engine() function necessary for connecting to the database backend.

pycbc.py
    from sqlalchemy import schema, types, orm, engine, exc

To handle geographical calculations, we will use the author's Python mapping package [27], as well as the standard math package for trig functions.

[27] http://www.nmt.edu/~john/tcc/python/mapping/doc/

pycbc.py
    import math
    import terrapos

6.3. Manifest constants

pycbc.py
    # - - - - -   M a n i f e s t   c o n s t a n t s

6.3.1. WATER_BLANK

The value of the Circle.water field when no ocean is involved.

pycbc.py
    WATER_BLANK = ' '

6.3.2. WATER_PELAGIC

Value of Circle.water for purely pelagic transects.

pycbc.py
    WATER_PELAGIC = 'p'

6.3.3. WATER_OCEAN

Value of Circle.water when some open ocean is included.

pycbc.py
    WATER_OCEAN = 'o'

6.3.4. WATER_ESTUARY

Value of Circle.water when some salt-water estuary is included but no open ocean.

pycbc.py
    WATER_ESTUARY = 'e'

6.3.5. ODD_BLANK

The value of Circle.odd when the circle has a normal or unknown shape and size.

pycbc.py
    ODD_BLANK = ' '

6.3.6. ODD_PELAGIC

Value of Circle.odd for pelagic-only transects.

pycbc.py
    ODD_PELAGIC = 'p'

6.3.7. ODD_NONSTANDARD

Value of Circle.odd for non-pelagic circles not having a standard size and shape, where this is known.

pycbc.py
    ODD_NONSTANDARD = 'x'

6.3.8. AGE_UNK

The value of the Census.age field when the age class is unknown.

pycbc.py
    AGE_UNK = ' '

6.3.9. AGE_ADULT

Value of Census.age for adults.

pycbc.py
    AGE_ADULT = 'a'

6.3.10. AGE_IMM

Value of Census.age for immatures.

pycbc.py
    AGE_IMM = 'i'

6.3.11. AGE_PHI

Value of Census.age for the female or immature age class.

pycbc.py
    AGE_PHI = 'p'

6.3.12. SEX_UNK

The value of the Census.sex field when the sex is unknown.

pycbc.py
    SEX_UNK = ' '

6.3.13. SEX_M

Value of Census.sex for males.

pycbc.py
    SEX_M = 'm'

6.3.14. SEX_F

Value of Census.sex for females.

pycbc.py
    SEX_F = 'f'

6.3.15. PLUS_CW

The value of the Census.plus field for count-week records.

pycbc.py
    PLUS_CW = '+'

6.3.16. Q_Q

The value of the Census.q field for questionable records.

pycbc.py
    Q_Q = '?'

6.3.17. URL_FORMAT

SQLAlchemy uses a URL to connect to the database engine.

pycbc.py
    URL_FORMAT = "%s://%s:%s@%s/%s"
    #             ^    ^  ^  ^  ^
    #             |    |  |  |  +-- Database name
    #             |    |  |  +-- Host name
    #             |    |  +-- Password
    #             |    +-- User name
    #             +-- Protocol
    #

6.3.18. PROTOCOL

The protocol part of the URL for Postgresql.

pycbc.py
    PROTOCOL = "postgresql"

6.3.19. DB_USER

Database user name.

pycbc.py
    DB_USER = "john"

6.3.20. DB_HOST

pycbc.py
    DB_HOST = "dbhost.nmt.edu"

6.3.21. DB_NAME

Name of the database.

pycbc.py
    DB_NAME = "john"

6.3.22. DEGREE

Unicode for the degree (°) symbol.

pycbc.py
    DEGREE = u'\u00b0'

6.3.23. PRIME

Unicode for the prime (′) symbol, used for minutes in latitudes and longitudes.
pycbc.py
    PRIME = u'\u2032'

6.3.24. CIRCLE_DIAMETER

Diameter of a CBC circle in miles.

pycbc.py
    CIRCLE_DIAMETER = 15.0

6.3.25. FEET_PER_MILE

pycbc.py
    FEET_PER_MILE = 5280.0

6.3.26. OVERLAP_MINUTES

This constant defines how close two circles have to be to each other, in minutes of latitude or longitude, before they are in danger of overlapping. This constant is used in Section 6.30, “CBCData.overlappers(): Find overlapping circles” (p. 35).

pycbc.py
    OVERLAP_MINUTES = 14

6.4. class CBCData: The database interface

The actual class declaration begins here. All the metadata (schema, tables, and object-relational mappings) are declared inside this class. The mapped classes are also inside this class; the author found this somewhat disturbing at first, but it makes perfect sense. For example, if a user has an instance db of class CBCData, they can refer to the circles table as db.circles_table or as CBCData.circles_table. Similarly, they can refer to the constructor of the class mapped to that table either as db.Circle or as CBCData.Circle.

pycbc.py
    # - - - - -   c l a s s   C B C D a t a

    class CBCData(object):
        '''Represents the entire Christmas Bird Count database.
        '''

6.5. The nations table

This table defines the nation codes used in Section 6.6, “The regions table” (p. 23). The script used to load (or reload) this table is shown in Section 7, “The staticloader script: Populate the static tables” (p. 40).

Next we create an instance of the MetaData class to hold all the schema definitions.
pycbc.py
        #================================================================
        # Table declarations
        #----------------------------------------------------------------

        meta = schema.MetaData()

        nations_table = schema.Table('nations', meta,
            schema.Column('nation_code', types.CHAR(3), primary_key=True),
            schema.Column('nation_name', types.VARCHAR(30)))

        class Nation(object):
            '''Class within a class: mapped class for the nations table.
            '''
            def __init__(self, nation_code, nation_name):
                self.nation_code = nation_code
                self.nation_name = nation_name

            def __repr__(self):
                return ( "<Nation(%s: %s)>" %
                         (self.nation_code, self.nation_name) )

6.6. The regions table

For each region code, this table gives the full name of the associated state or province, as well as the nation code, which is defined in Section 6.5, “The nations table” (p. 22).

pycbc.py
        regions_table = schema.Table('regions', meta,
            schema.Column('reg_code', types.CHAR(2), primary_key=True),
            schema.Column('reg_nation', types.CHAR(3),
                schema.ForeignKey(nations_table.c.nation_code)),
            schema.Column('reg_name', types.VARCHAR(30)))

        class Region(object):
            def __init__(self, reg_nation, reg_code, reg_name):
                self.reg_nation = reg_nation
                self.reg_code = reg_code
                self.reg_name = reg_name

            def __repr__(self):
                return ( "<Region(%s(%s)%s)>" %
                         (self.reg_code, self.reg_nation, self.reg_name) )

6.7. The physios table

This table defines the codes for physiographic strata.

pycbc.py
        physios_table = schema.Table('physios', meta,
            schema.Column('physio_code', types.CHAR(2), primary_key=True),
            schema.Column('physio_name', types.VARCHAR(30)))

        class Physio(object):
            def __init__(self, physio_code, physio_name):
                self.physio_code = physio_code
                self.physio_name = physio_name

            def __repr__(self):
                return ( "<Physio(%s=%s)>" %
                         (self.physio_code, self.physio_name) )

6.8. The circles table

This table's name will be circles_table. For attributes, see Section 4.1.3, “Attributes of the circles table” (p. 7).

pycbc.py
        circles_table = schema.Table('circles', meta,
            schema.Column('lat', types.CHAR(4)),
            schema.Column('lon', types.CHAR(5)),
            schema.Column('water', types.CHAR(1)),
            schema.Column('odd', types.CHAR(1)),
            schema.Column('cir_name', types.VARCHAR(80), nullable=False),
            schema.PrimaryKeyConstraint('lat', 'lon'))

Instances of class Circle will represent these rows in SQLAlchemy.

pycbc.py
        class Circle(object):
            def __init__(self, lat, lon, water, odd, cir_name):
                self.lat = lat
                self.lon = lon
                self.water = water
                self.odd = odd
                self.cir_name = cir_name

            def __repr__(self):
                return ( "<Circle(%sn %sw %s)>" %
                         (self.lat, self.lon, self.cir_name) )

            def __cmp__(self, other):
                return cmp(self.cir_name, other.cir_name)

The .fullName() function returns a string representation with the region codes filled in. Note that the region codes are available only through the .regions attribute that is added in Section 6.13, “Object-relational mapping” (p. 28). Just the region part is available as .allRegions(), and a Unicode rendering of the full name, with degree and prime symbols, is available as .unicode().

pycbc.py
            def allRegions(self):
                return '-'.join ( [ reg.reg_code
                                    for reg in self.regions ] )

            def fullName(self):
                return ( "%sn %sw %s: %s" %
                         (self.lat, self.lon, self.allRegions(),
                          self.cir_name) )

            def unicode(self):
                return ( u"%s%s%s%sN %s%s%s%sW %s: %s" %
                         (self.lat[:2], DEGREE, self.lat[2:], PRIME,
                          self.lon[:3], DEGREE, self.lon[3:], PRIME,
                          self.allRegions(), self.cir_name) )

6.9. The cir_reg table

This table is the intermediate table in the many-to-many relation between circles and regions.
pycbc.py
        cir_reg_table = schema.Table('cir_reg', meta,
            schema.Column('lat', types.CHAR(4)),
            schema.Column('lon', types.CHAR(5)),
            schema.Column('reg_pos', types.SMALLINT, nullable=False),
            schema.Column('reg_code', types.CHAR(2),
                schema.ForeignKey("regions.reg_code", name="cir_reg_reg_x")),
            schema.PrimaryKeyConstraint('lat', 'lon', 'reg_pos'),
            schema.ForeignKeyConstraint(('lat', 'lon'),
                ('circles.lat', 'circles.lon')))

        class CirReg(object):
            def __init__(self, lat, lon, reg_pos, reg_code):
                self.lat = lat
                self.lon = lon
                self.reg_pos = reg_pos
                self.reg_code = reg_code

            def __repr__(self):
                return ( "<CirReg(%sn %sw [%d] %s)>" %
                         (self.lat, self.lon, self.reg_pos, self.reg_code) )

6.10. The cir_physio table

This table is the intermediate table in the many-to-many relation between circles and physios. As described in the design notes, its primary key is lat + lon + physio_pos, so that a circle's stratum codes are retrieved with the principal code first.

pycbc.py
        cir_physio_table = schema.Table('cir_physio', meta,
            schema.Column('lat', types.CHAR(4)),
            schema.Column('lon', types.CHAR(5)),
            schema.Column('physio_pos', types.SMALLINT, nullable=False),
            schema.Column('physio_code', types.CHAR(2),
                schema.ForeignKey('physios.physio_code')),
            schema.PrimaryKeyConstraint('lat', 'lon', 'physio_pos'),
            schema.ForeignKeyConstraint(('lat', 'lon'),
                ('circles.lat', 'circles.lon')))

        class CirPhysio(object):
            def __init__(self, lat, lon, physio_pos, physio_code):
                self.lat = lat
                self.lon = lon
                self.physio_pos = physio_pos
                self.physio_code = physio_code

            def __repr__(self):
                return ( "<CirPhysio(%sn %sw[%d]%s)>" %
                         (self.lat, self.lon, self.physio_pos,
                          self.physio_code) )

6.11. The efforts table

Each row in this table represents a published count for a specific circle in a specific year number. The lat and lon columns have a foreign-key relation to the circles table. Many of the columns will have null values, especially in the early years.
However, the year number (year_no), date (yyyymmdd), and number of observers (n_obs) are always present.

• The measures of effort are variously prefixed with “ph_” for party-hours, “h_” for hours, “pm_” for party-miles, and “m_” for miles.
• Column suffixes are “tot” for total, “foot” for observers on foot, “car” for observers in vehicles, “other” for observers not on foot or in vehicles, “fd” for feeder watchers, and “owl” for nocturnal birding.
• Quantities in hours or miles use fixed-precision representations with one digit to the right of the decimal point.

pycbc.py
        efforts_table = schema.Table('efforts', meta,
            schema.Column('lat', types.CHAR(4)),
            schema.Column('lon', types.CHAR(5)),
            schema.Column('year_no', types.CHAR(3), nullable=False),
            schema.Column('year_key', types.CHAR(5), nullable=False),
            schema.Column('yyyymmdd', types.DATE, nullable=False),
            schema.Column('as_lat', types.CHAR(4)),
            schema.Column('as_lon', types.CHAR(5)),
            schema.Column('as_name', types.VARCHAR(80)),
            schema.Column('n_obs', types.SMALLINT, nullable=False),
            schema.Column('ph_tot', types.NUMERIC(6,2)),
            schema.Column('ph_foot', types.NUMERIC(6,2)),
            schema.Column('ph_car', types.NUMERIC(6,2)),
            schema.Column('ph_other', types.NUMERIC(6,2)),
            schema.Column('h_fd', types.NUMERIC(6,2)),
            schema.Column('h_owl', types.NUMERIC(6,2)),
            schema.Column('pm_tot', types.NUMERIC(6,2)),
            schema.Column('pm_foot', types.NUMERIC(6,2)),
            schema.Column('pm_car', types.NUMERIC(6,2)),
            schema.Column('pm_other', types.NUMERIC(6,2)),
            schema.Column('m_owl', types.NUMERIC(6,2)),
            schema.PrimaryKeyConstraint('lat', 'lon', 'year_no', 'year_key'),
            schema.ForeignKeyConstraint(('lat', 'lon'),
                ('circles.lat', 'circles.lon')))

        schema.Index('eff_key_x', efforts_table.c.year_no,
                     efforts_table.c.year_key)

        class Effort(object):
            def __init__(self, lat, lon, year_no, year_key, yyyymmdd,
                         as_lat, as_lon, as_name, n_obs,
                         ph_tot=None, ph_foot=None, ph_car=None,
                         ph_other=None, h_fd=None, h_owl=None,
                         pm_tot=None, pm_foot=None, pm_car=None,
                         pm_other=None, m_owl=None):
                self.lat = lat
                self.lon = lon
                self.year_no = year_no
                self.year_key = year_key
                self.as_lat = as_lat
                self.as_lon = as_lon
                self.as_name = as_name
                self.yyyymmdd = yyyymmdd
                self.n_obs = n_obs
                self.ph_tot = ph_tot
                self.ph_foot = ph_foot
                self.ph_car = ph_car
                self.ph_other = ph_other
                self.h_fd = h_fd
                self.h_owl = h_owl
                self.pm_tot = pm_tot
                self.pm_foot = pm_foot
                self.pm_car = pm_car
                self.pm_other = pm_other
                self.m_owl = m_owl

            def __repr__(self):
                return ( "<Effort(%sn %sw[%s-%s])>" %
                         (self.lat, self.lon, self.year_no, self.year_key) )

6.12. The censuses table

Represents a number of birds of the same kind seen in a circle-year.

pycbc.py
        censuses_table = schema.Table('censuses', meta,
            schema.Column('lat', types.CHAR(4)),
            schema.Column('lon', types.CHAR(5)),
            schema.Column('year_no', types.CHAR(3), nullable=False),
            schema.Column('year_key', types.CHAR(5), nullable=False),
            schema.Column('seq_no', types.SMALLINT, nullable=False),
            schema.Column('form', types.CHAR(6), nullable=False),
            schema.Column('rel', types.CHAR(1)),
            schema.Column('alt_form', types.CHAR(6)),
            schema.Column('age', types.CHAR(1)),
            schema.Column('sex', types.CHAR(1)),
            schema.Column('plus', types.CHAR(1)),
            schema.Column('q', types.CHAR(1)),
            schema.Column('census', types.INTEGER, nullable=False),
            schema.PrimaryKeyConstraint('lat', 'lon', 'year_no',
                'year_key', 'seq_no'),
            schema.ForeignKeyConstraint(
                ('lat', 'lon', 'year_no', 'year_key'),
                ('efforts.lat', 'efforts.lon', 'efforts.year_no',
                 'efforts.year_key'),
                name='cen_eff_x'))

        schema.Index('cen_form_x', censuses_table.c.form,
                     censuses_table.c.rel, censuses_table.c.alt_form)

        class Census(object):
            def __init__(self, lat, lon, year_no, year_key, seq_no,
                         form, rel, alt_form, age, sex, plus, q, census):
                self.lat = lat
                self.lon = lon
                self.year_no = year_no
                self.year_key = year_key
                self.seq_no = seq_no
                self.form = form
                self.rel = rel
                self.alt_form = alt_form
                self.age = age
                self.sex = sex
                self.plus = plus
                self.q = q
                self.census = census

            def __repr__(self):
                formKey = ( self.form + (self.rel or "") +
                            (self.alt_form or "") )
                suffixes = ((self.age or "") + (self.sex or "") +
                            (self.plus or "") + (self.q or ""))
                return ( "<Census(%sn %sw[%s-%s.%s] %s%d%s)>" %
                         (self.lat, self.lon, self.year_no, self.year_key,
                          self.seq_no, formKey, self.census, suffixes) )

6.13. Object-relational mapping

This section of the pycbc.py file describes the mapping between objects and database tables, and the relations between the tables. A number of new attributes are added to the mapped classes in this section:

Table        Attribute      Function
nations      .regions       Iterates over the Region instances for this nation.
regions      .nation        The related Nation instance for this region.
             .circles       Iterates over the Circle instances that are all or partly within this region.
             .cir_regs      Iterates over the CirReg instances that are related to this region. This attribute is necessary because the CirReg instance has a column reg_pos that is not found in either the circles or the regions table.
cir_reg      .region        The related Region instance for this row.
             .circle        The related Circle instance for this row.
physios      .circles       Iterates over the circles that are related to this stratum.
             .cir_physios   Iterates over the CirPhysio instances for this stratum.
cir_physio   .circle        The related Circle.
             .physio        The related Physio.
circles      .regions       Iterates over the Region instances for this circle.
             .cir_regs      Iterates over the CirReg instances for this circle.
             .physios       Iterates over the Physio instances for this circle.
             .cir_physios   Iterates over the CirPhysio instances for this circle.
             .efforts       Iterates over the Effort instances for this circle.
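The circle-to-region relations summarized above can be tried out without the Postgresql server. What follows is a minimal sketch and not part of pycbc.py: it uses Python's standard sqlite3 module in place of SQLAlchemy, with cut-down copies of the circles and cir_reg tables, to show how reg_pos orders a circle's regions and singles out the primary region. The helper names all_regions() and region_circles() and the index name cir_reg_code_x are hypothetical, and the coordinates given for Cumberland Gap are placeholders rather than real data.

```python
import sqlite3

# In-memory stand-in for the real database; only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE circles (
        lat CHAR(4), lon CHAR(5), cir_name VARCHAR(80) NOT NULL,
        PRIMARY KEY (lat, lon));
    CREATE TABLE cir_reg (
        lat CHAR(4), lon CHAR(5),
        reg_pos SMALLINT NOT NULL, reg_code CHAR(2),
        PRIMARY KEY (lat, lon, reg_pos),
        FOREIGN KEY (lat, lon) REFERENCES circles (lat, lon));
    -- Hypothetical name for the reg_code + reg_pos index in the design notes.
    CREATE INDEX cir_reg_code_x ON cir_reg (reg_code, reg_pos);
""")

conn.executemany("INSERT INTO circles VALUES (?, ?, ?)", [
    ("3843", "08528", "Hanover-Madison"),
    ("0001", "00001", "Cumberland Gap"),    # placeholder coordinates
])
# Rows are deliberately inserted out of order; reg_pos restores the order.
conn.executemany("INSERT INTO cir_reg VALUES (?, ?, ?, ?)", [
    ("3843", "08528", 1, "KY"),
    ("3843", "08528", 0, "IN"),
    ("0001", "00001", 2, "VA"),
    ("0001", "00001", 0, "KY"),
    ("0001", "00001", 1, "TN"),
])

def all_regions(lat, lon):
    """Return a circle's region codes joined with '-', primary first."""
    rows = conn.execute(
        "SELECT reg_code FROM cir_reg WHERE lat = ? AND lon = ? "
        "ORDER BY reg_pos", (lat, lon))
    return "-".join(code for (code,) in rows)

def region_circles(reg_code, primary_only=False):
    """Return names of circles in a region, ascending by circle name."""
    sql = ("SELECT c.cir_name FROM circles c "
           "JOIN cir_reg r ON r.lat = c.lat AND r.lon = c.lon "
           "WHERE r.reg_code = ?")
    if primary_only:
        sql += " AND r.reg_pos = 0"     # region listed first for the circle
    sql += " ORDER BY c.cir_name"
    return [name for (name,) in conn.execute(sql, (reg_code,))]

print(all_regions("3843", "08528"))
print(region_circles("KY"))
print(region_circles("KY", primary_only=True))
```

With this sample data, Hanover-Madison's regions come back as IN then KY, both circles are reported for KY, and only Cumberland Gap is reported when KY must be the primary region.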
The first table to be mapped is the nations table, which has a one-to-many relation with the regions table.

pycbc.py
        #================================================================
        # Mapper configuration
        #----------------------------------------------------------------

        orm.mapper(Nation, nations_table, properties={
            'regions': orm.relation(Region, backref='nation')})

The regions table has a many-to-many relation to the circles table, through the secondary table cir_reg.

pycbc.py
        orm.mapper(Region, regions_table, properties={
            'cir_regs': orm.relation(CirReg, backref='region'),
            'circles': orm.relation(Circle, secondary=cir_reg_table,
                                    backref='regions')})
        orm.mapper(CirReg, cir_reg_table)

The physios table has a many-to-many relation with the circles table, through the secondary table cir_physio.

pycbc.py
        orm.mapper(Physio, physios_table, properties={
            'circles': orm.relation(Circle, secondary=cir_physio_table,
                                    backref='physios'),
            'cir_physios': orm.relation(CirPhysio, backref='physio')})
        orm.mapper(CirPhysio, cir_physio_table)

The circles table has a one-to-many relation with the efforts table. (Its two many-to-many relations were set up above.)

pycbc.py
        orm.mapper(Circle, circles_table, properties={
            'cir_regs': orm.relation(CirReg, backref='circle'),
            'cir_physios': orm.relation(CirPhysio, backref='circle'),
            'efforts': orm.relation(Effort, backref='circle')})

The one-to-many relation between the efforts table and the censuses table is mapped here.

pycbc.py
        orm.mapper(Effort, efforts_table, properties={
            'censuses': orm.relation(Census, backref='effort')})
        orm.mapper(Census, censuses_table)

6.14. CBCData.__init__(): Constructor

Here is the constructor for the CBCData class, which establishes a connection to the Postgresql engine.

pycbc.py
    # - - -   C B C D a t a . _ _ i n i t _ _

        def __init__ ( self, password ):
            '''Constructor: connect to the database.
            '''
            #-- 1 --
            # [ password is a string ->
            #     if the Postgresql server is available ->
            #       self.engine  :=  an sqlalchemy.engine.Engine
            #           instance connected to that engine
            #       CBCData.meta  :=  CBCData.meta bound to that engine
            #     else -> raise an sqlalchemy.exc.SQLAlchemyError ]
            url = ( URL_FORMAT %
                    (PROTOCOL, DB_USER, password, DB_HOST, DB_NAME) )
            self.engine = engine.create_engine ( url )
            CBCData.meta.bind = self.engine

Then we create the Session constructor. The autoflush=True option causes pending changes to be flushed to the database before each query is issued. The autocommit=False option means no commit is done after a flush. The expire_on_commit=True option causes cached values to be invalidated after a commit, so that subsequent operations go out to the database.

pycbc.py
            #-- 2 --
            # [ self.Session  :=  a class constructor that creates a
            #       new session using self.engine
            #   self.s  :=  an instance of that class ]
            self.Session = orm.sessionmaker(bind=self.engine,
                autoflush=True, autocommit=False,
                expire_on_commit=True )
            self.s = self.Session()

6.15. CBCData.genNations()

pycbc.py
    # - - -   C B C D a t a . g e n N a t i o n s

        def genNations ( self ):
            '''Generate the nations in self, ascending by name.
            '''
            #-- 1 --
            for row in self.s.query(self.Nation).order_by(
                    self.Nation.nation_name):
                yield row

6.16. CBCData.getNation()

The query must be filtered on the nation code; without the filter, .one() would fail whenever the table holds more than one nation.

pycbc.py
    # - - -   C B C D a t a . g e t N a t i o n

        def getNation(self, nationCode):
            '''Look up a nation_code.
            '''
            try:
                result = self.s.query(self.Nation).filter_by(
                    nation_code=nationCode).one()
                return result
            except exc.SQLAlchemyError, detail:
                raise KeyError("No such nation code, '%s': %s" %
                               (nationCode, detail))

FIXME: The current SQLAlchemy exception raised in this case is exc.NoResultFound. However, infohost has an older version that lacks that exception, so we catch the more generic exc.SQLAlchemyError instead. Fix this also in Section 6.18, “CBCData.getRegion()” (p. 31) and Section 6.20, “CBCData.getPhysio()” (p. 32).

6.17. CBCData.genRegions()

pycbc.py
    # - - -   C B C D a t a . g e n R e g i o n s

        def genRegions ( self ):
            '''Generate the Region instances in self.
            '''
            #-- 1 --
            for row in self.s.query(self.Region):
                yield row

6.18. CBCData.getRegion()

pycbc.py
    # - - -   C B C D a t a . g e t R e g i o n

        def getRegion(self, regionCode):
            '''Look up a region code.
            '''
            try:
                result = self.s.query(self.Region).filter_by(
                    reg_code=regionCode).one()
                return result
            except exc.SQLAlchemyError, detail:
                raise KeyError("No such region code, '%s': %s" %
                               (regionCode, detail))

6.19. CBCData.genPhysios()

pycbc.py
    # - - -   C B C D a t a . g e n P h y s i o s

        def genPhysios ( self ):
            '''Generate the physiographic strata in self, ascending by code.
            '''
            #-- 1 --
            for row in self.s.query(self.Physio):
                yield row

6.20. CBCData.getPhysio()

pycbc.py
    # - - -   C B C D a t a . g e t P h y s i o

        def getPhysio(self, physioCode):
            '''Look up a physiographic stratum code.
            '''
            try:
                result = self.s.query(self.Physio).filter_by(
                    physio_code=physioCode).one()
                return result
            except exc.SQLAlchemyError, detail:
                raise KeyError("No such physio code, '%s': %s" %
                               (physioCode, detail))

6.21. CBCData.getRegionCircle()

pycbc.py
    # - - -   C B C D a t a . g e t R e g i o n C i r c l e

        def getRegionCircle(self, regionCode, cirName):
            '''Get the circle in the given region with the given name.
            '''
            circleList = [ circle
                           for circle in self.genCirclesByName(cirName) ]
            for circle in circleList:
                regionList = [ region for region in circle.regions ]
                for region in regionList:
                    if region.reg_code == regionCode:
                        return circle

            raise KeyError("No circle named '%s' in region '%s'." %
                           (cirName, regionCode))

6.22. CBCData.genCircles()

pycbc.py
    # - - -   C B C D a t a . g e n C i r c l e s

        def genCircles(self):
            '''Generate all the circles.
            '''
            for row in self.s.query(self.Circle):
                yield row

6.23. CBCData.genCirclesByName()

pycbc.py
    # - - -   C B C D a t a . g e n C i r c l e s B y N a m e

        def genCirclesByName(self, prefix):
            '''Generate circles whose names start with (prefix).
            '''
            q = self.s.query(self.Circle).filter(
                self.Circle.cir_name.like("%s%%" % prefix))
            for row in q:
                yield row

6.24. CBCData.genRegionCircles()

pycbc.py
    # - - -   C B C D a t a . g e n R e g i o n C i r c l e s

        def genRegionCircles(self, regionCode):
            '''Generate circles that use the given regionCode
            '''

This query requires that we refer to columns in a related table, cir_reg, and filter out circles that are not related to the given regionCode. The .join() method adds the columns from the cir_reg table, and the .filter_by() method includes only those rows in the joined table whose reg_code column equals regionCode. The .order_by() method sorts the resulting rows by circle name.

pycbc.py
            q = ( self.s.query(self.Circle)
                  .join(self.Circle.cir_regs)
                  .filter_by(reg_code=regionCode)
                  .order_by('cir_name') )
            for circle in q:
                yield circle

6.25. CBCData.genPrimaryRegionCircles()

pycbc.py
    # - - -   C B C D a t a . g e n P r i m a r y R e g i o n C i r c l e s

        def genPrimaryRegionCircles(self, regionCode):
            '''Generate circles that have the given regionCode first.
            '''

This method is only slightly different from Section 6.24, “CBCData.genRegionCircles()” (p. 33): it finds only those circles for which the given regionCode is the first listed region. For example, if regionCode is "KY", it will find circle “KY-TN-VA: Cumberland Gap”, but not “IN-IL-KY: Posey County”.

In database terms, we need to know whether the cir_reg row that relates a region to a circle has a reg_pos (region position) value of 0. For example, there are two records for “IN-KY: Evansville”. The first has reg_pos==0 and reg_code=="IN"; the second has reg_pos==1 and reg_code=="KY".
pycbc.py q = ( self.s.query(self.Circle) .join(self.Circle.cir_regs) .filter_by(reg_code=regionCode) .filter_by(reg_pos=0) .order_by('cir_name') ) for circle in q: yield circle 6.26. CBCData.genCirclesByPhysio() pycbc.py # - - - C B C D a t a . g e n C i r c l e s B y P h y s i o def genCirclesByPhysio(self, physioCode): '''Find all circles that contain a given stratum. ''' q = self.s.query(self.CirPhysio).filter_by( physio_code=physioCode) for cirPhysio in q: yield cirPhysio.circle 6.27. CBCData.getCircle(): Retrieve a specific circle pycbc.py # - - - C B C D a t a . g e t C i r c l e def getCircle(self, lat, lon): '''Retrieve a specific circle row. ''' row = self.s.query(self.Circle).get((lat, lon)) if row is None: raise KeyError("Unknown circle center: %sn %sw" % (lat, lon)) else: return row 34 pycbc: Python interface for the CBC database New Mexico Tech Computer Center 6.28. CBCData.genEfforts() pycbc.py # - - - C B C D a t a . g e n E f f o r t s def genEfforts(self): '''Generate all the effort records in primary key order. ''' q = self.s.query(self.Effort) for eff in q: yield eff 6.29. CBCData.getEffort(): Retrieve a specific effort record pycbc.py # - - - C B C D a t a . g e t E f f o r t def getEffort(self, year_no, year_key): '''Retrieve one effort record. ''' try: row = (self.s.query(self.Effort) .filter_by(year_no=year_no, year_key=year_key) .one()) except exc.NoResultFound: raise KeyError("Unknown effort: %s-%s" % (year_no, year_key)) except exc.MultipleResultsFound: raise KeyError("Not unique: %s-%s" % (year_no, year_key)) return row 6.30. CBCData.overlappers(): Find overlapping circles pycbc.py # - - - C B C D a t a . o v e r l a p p e r s def overlappers(self, fromCircle): '''Find circles that overlap fromCircle. 
          [ fromCircle is a Circle instance ->
              return a list of tuples (pct,c) representing all the
              circles in self that overlap fromCircle, such that
              c is the overlapping Circle instance and pct is the
              percentage of their areas that overlap in the interval
              (0.0, 100.0) ]
        '''

For performance reasons, we would like to avoid comparing the fromCircle with every circle in the database. To reduce the number of candidates, we will start by considering how big a difference in latitude or longitude is necessary to guarantee no overlap, that is, how many minutes of latitude or longitude are guaranteed to be greater than 15 miles at any location on the globe.

A given difference in degrees of latitude is always the same distance in miles. However, for a given difference in degrees of longitude, the distance in miles is largest at the equator. The author has written a package to perform mapping calculations; see A Python mapping package [28] for particulars. Here is a conversational session using this package. Point zero is the intersection of the 0° meridian and the equator. Point e15 is fifteen miles east along the equator, and point n15 is fifteen miles north along the meridian. The .offsetFeet() method produces the new location along a given bearing (the first argument, in radians), and the second argument is the distance along that bearing in feet.

    >>> from math import pi
    >>> from terrapos import *
    >>> zero=LatLon(0.0, 0.0)
    >>> fifteen=5280.0 * 15    # Fifteen miles in feet
    >>> e15=zero.offsetFeet(pi/2, fifteen)    # Fifteen miles east
    >>> print e15.lonDeg*60    # Longitude in minutes
    13.025057227
    >>> n15=zero.offsetFeet(0, fifteen)    # Fifteen miles north
    >>> print n15.latDeg*60    # Latitude in minutes
    13.025057227

Hence, we can guarantee that any circle whose latitude or longitude is 14 or more minutes away cannot overlap fromCircle. This value is defined in Section 6.3.26, “OVERLAP_MINUTES” (p. 22).
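The figure of roughly thirteen minutes can be cross-checked without the terrapos package. The sketch below is an approximation under a stated assumption, a spherical Earth with a mean radius of 3958.8 statute miles; the function name minutes_of_arc is hypothetical and is not part of pycbc or terrapos.

```python
import math

# Assumption: spherical Earth, mean radius in statute miles.
EARTH_RADIUS_MILES = 3958.8


def minutes_of_arc(miles, lat_deg=0.0, along="lat"):
    """Minutes of latitude or longitude spanned by `miles`.

    along="lat" measures north-south; along="lon" measures east-west
    at latitude lat_deg.
    """
    radians = miles / EARTH_RADIUS_MILES
    if along == "lon":
        # A degree of longitude covers less ground away from the
        # equator, by a factor of cos(latitude).
        radians /= math.cos(math.radians(lat_deg))
    return math.degrees(radians) * 60.0


for lat in (0.0, 35.0, 60.0):
    print("lat %4.1f: %6.2f minutes of longitude per 15 miles"
          % (lat, minutes_of_arc(15.0, lat, "lon")))
print("any latitude: %6.2f minutes of latitude per 15 miles"
      % minutes_of_arc(15.0))
```

At the equator both directions come out at about 13.03 minutes, in line with the session above. Under this model the longitude figure is smallest at the equator and grows with latitude, since fifteen miles spans more minutes of longitude where the meridians are closer together.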
To convert this condition into a range test of latitude and longitude in our database query, we compute the lowest and highest latitude and longitude that a candidate circle's center may have. See Section 6.31, “CBCData.degMinAdd(): Lat/long arithmetic” (p. 37) for the method that does arithmetic on latitudes and longitudes in character form. That method always returns the new quantity as “dddmm”, so for latitudes we discard the leading digit to get “ddmm”.

pycbc.py
        #-- 1 --
        # [ loLat := fromCircle.lat minus OVERLAP_MINUTES, as "ddmm"
        #   loLon := fromCircle.lon minus OVERLAP_MINUTES, as "dddmm"
        #   hiLat := fromCircle.lat plus OVERLAP_MINUTES, as "ddmm"
        #   hiLon := fromCircle.lon plus OVERLAP_MINUTES, as "dddmm" ]
        loLat = self.degMinAdd(fromCircle.lat, -OVERLAP_MINUTES)[1:]
        loLon = self.degMinAdd(fromCircle.lon, -OVERLAP_MINUTES)
        hiLat = self.degMinAdd(fromCircle.lat, OVERLAP_MINUTES)[1:]
        hiLon = self.degMinAdd(fromCircle.lon, OVERLAP_MINUTES)

Now we can do a query on the circle table and constrain it as a range of latitudes and longitudes.

pycbc.py
        #-- 2 --
        # [ candidates := Circle instances in self whose lat and lon
        #       are in the range [loLatLon, hiLatLon]
        #   result := a new, empty list ]
        candidates = ( self.s.query(self.Circle)
                       .filter(self.Circle.lat >= loLat)
                       .filter(self.Circle.lon >= loLon)
                       .filter(self.Circle.lat <= hiLat)
                       .filter(self.Circle.lon <= hiLon)
                       .all() )
        result = []

Next we add to result tuples (pct, c) for circles that actually overlap. This check is performed in Section 6.32, “CBCData.overlapCheck(): Do these circles overlap?” (p. 38).
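The window computed in steps 1 and 2 can be tried outside the database. This is a minimal sketch: deg_min_add is a standalone stand-in for the degMinAdd method, overlap_window is a hypothetical helper, OVERLAP_MINUTES is set to the 14 minutes stated earlier, and the sample center "3404"/"10654" is made up for illustration.

```python
OVERLAP_MINUTES = 14   # the value stated in the text


def deg_min_add(ddmm, plus_minutes):
    """Add plus_minutes to a 'ddmm' or 'dddmm' string; return 'dddmm'."""
    total = int(ddmm[:-2]) * 60 + int(ddmm[-2:]) + plus_minutes
    degrees, minutes = divmod(total, 60)
    return "%03d%02d" % (degrees, minutes)


def overlap_window(lat, lon):
    """Return (loLat, loLon, hiLat, hiLon) for a center at lat, lon.

    Latitudes come back as 'ddmm' and longitudes as 'dddmm'.
    """
    lo_lat = deg_min_add(lat, -OVERLAP_MINUTES)[1:]
    lo_lon = deg_min_add(lon, -OVERLAP_MINUTES)
    hi_lat = deg_min_add(lat, OVERLAP_MINUTES)[1:]
    hi_lon = deg_min_add(lon, OVERLAP_MINUTES)
    return (lo_lat, lo_lon, hi_lat, hi_lon)


lo_lat, lo_lon, hi_lat, hi_lon = overlap_window("3404", "10654")
print(lo_lat, lo_lon, hi_lat, hi_lon)

# Fixed-width, zero-filled strings sort the same way as the numbers
# they encode, so plain string comparison selects the window.
print(lo_lat <= "3410" <= hi_lat and lo_lon <= "10700" <= hi_lon)
```

Because every value has the same width and is zero-filled, comparing the strings gives the same answer as comparing the angles, which is what lets the query filter directly on the character columns.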
[28] http://www.nmt.edu/~john/tcc/python/mapping/py-mapping.pdf

pycbc.py
        #-- 3 --
        # [ result +:= tuples (pct, c) for circles in candidates
        #       that overlap fromCircle to any nonzero degree ]
        for toCircle in candidates:
            #-- 3 body --
            # [ if toCircle and fromCircle are distinct circles that
            #   overlap to any nonzero degree ->
            #     result +:= a tuple (pct, toCircle) where pct is the
            #         percentage area overlap
            #   else -> I ]
            if fromCircle is not toCircle:
                pct = self.overlapCheck(fromCircle, toCircle)
                if pct > 0.0:
                    result.append ( (pct, toCircle) )

Finally, sort the list in descending order by percent overlap, and return it.

pycbc.py
        #-- 4 --
        result.sort()
        result.reverse()
        return result

6.31. CBCData.degMinAdd(): Lat/long arithmetic

pycbc.py
# - - -   C B C D a t a . d e g M i n A d d

    def degMinAdd(self, ddmm, plusMinutes):
        '''Arithmetic on 'ddmm' and 'dddmm' quantities.

          [ (ddmm is a string of four or five digits such that
            the last two are minutes and the rest are degrees) and
            (plusMinutes is an int) ->
              return a string of the form "dddmm" representing
              ddmm + plusMinutes as degrees and minutes ]
        '''

This function is used by Section 6.30, “CBCData.overlappers(): Find overlapping circles” (p. 35) to compute the range of latitudes and longitudes that are within a given distance of a circle's center.

pycbc.py
        #-- 1 --
        # [ dd := degrees part of ddmm as an int
        #   mm := minutes part of ddmm as an int ]
        dd = int(ddmm[:-2])
        mm = int(ddmm[-2:])

        #-- 2 --
        # [ minutes := sum of dd degrees, mm minutes, and plusMinutes ]
        minutes = dd*60 + mm + plusMinutes

        #-- 3 --
        # [ return minutes as degrees and minutes in "dddmm" form ]
        ddNew, mmNew = divmod(minutes, 60)
        return "%03d%02d" % (ddNew, mmNew)

6.32. CBCData.overlapCheck(): Do these circles overlap?

pycbc.py
# - - -   C B C D a t a .
o v e r l a p C h e c k

    def overlapCheck ( self, fromCircle, toCircle):
        '''Do these circles overlap?

          [ fromCircle and toCircle are Circle instances ->
              if the circles overlap ->
                return the percentage of area that they overlap
                in (0.0, 100.0)
              else -> return 0.0 ]
        '''

This will require a bit of applied geometry. We don't care about the diameter of the circles, just the degree to which they overlap, so we'll use unit circles of radius 1. Here is a picture of two overlapping circles.

[Figure: two overlapping unit circles with centers C and D. The circles intersect at points F and G, and chord FG crosses segment CD at its midpoint E. The region between chord FG and one circle's perimeter is shaded.]

In this figure, C and D are the centers of the two circles of radius 1. If the length of CD is 2 or greater, there is no overlap. If there is overlap, the area of overlap is twice the area of the shaded portion of the figure. This area is called a segment of a circle, meaning the area bounded by a chord and the circle's perimeter. The CRC Standard Mathematical Tables gives this formula for the area of the segment subtended by a given angle θ.

    (1)    A = \frac{1}{2} R^{2} (\theta - \sin\theta)

Here R is 1 by definition, so this formula simplifies to:

    (2)    A = \frac{1}{2} (\theta - \sin\theta)

The angle θ is angle FCG in the figure, which is twice angle FCE. We know that length CF is 1 because it is the radius of a unit circle. We also know that length CE is exactly half of length CD. By simple trig, the cosine of angle FCE is the adjacent side (CE) divided by the hypotenuse (CF), which is 1, so the cosine equals length CE. The diameter of a unit circle is 2, so the separation CD measured in diameters is CD/2, which is also length CE. So, if S is the separation between the circles in diameters, the cosine of angle FCE is S, and θ is given by:

    (3)    \theta = 2 \cos^{-1} S

Now, on to the code. First, find the separation between the two circles in terms of the standard circle diameter. This is handled in Section 6.33, “CBCData.__circleSep(): Compute the separation of two circles” (p. 39). The result is expressed in diameters, so a result of 1.0 or greater means no overlap.
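To get a feel for formulas (2) and (3), here is a small standalone sketch that evaluates them for a few separations. The function name overlap_percent is hypothetical; it illustrates the geometry only and is not the method itself.

```python
import math


def overlap_percent(sep):
    """Percent of a circle's area shared with an equal circle whose
    center is `sep` diameters away (sep >= 0)."""
    if sep >= 1.0:
        return 0.0
    # Formula (3): the angle subtending the segment.
    theta = 2.0 * math.acos(sep)
    # Twice formula (2): two segments make up the overlap.
    overlap_area = theta - math.sin(theta)
    # A unit circle has area pi.
    return 100.0 * overlap_area / math.pi


for sep in (0.0, 0.25, 0.5, 0.75, 1.0):
    print("separation %.2f diameters: %6.2f%% overlap"
          % (sep, overlap_percent(sep)))
```

Coincident circles (separation 0) overlap completely, circles one full diameter apart just touch, and circles half a diameter apart share a little over 39% of their area, so the overlap falls off faster than linearly with separation.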
pycbc.py
        #-- 1 --
        # [ sep := separation between fromCircle and toCircle
        #       as a fraction of CIRCLE_DIAMETER ]
        sep = self.__circleSep ( fromCircle, toCircle )

        #-- 2 --
        if sep >= 1.0:
            return 0.0

The value of sep is the S in the formula above, so now we can compute θ, then the area of the segment. The area of the overlap is twice the area of the segment.

pycbc.py
        #-- 3 --
        # [ theta := angle subtending the segment where two
        #       unit circles overlap if their separation is
        #       (sep) diameters ]
        theta = 2.0 * math.acos(sep)

        #-- 4 --
        # [ overlapArea := twice the area of the segment of a
        #       unit circle subtended by angle theta ]
        overlapArea = theta - math.sin(theta)

Finally, the fraction of overlap is the area of the overlap divided by the area of a unit circle, which is exactly π. We convert this fraction to a percentage by multiplying by 100.

pycbc.py
        #-- 5 --
        return 100.0 * overlapArea / math.pi

6.33. CBCData.__circleSep(): Compute the separation of two circles

pycbc.py
# - - -   C B C D a t a . _ _ c i r c l e S e p

    def __circleSep ( self, fromCircle, toCircle):
        '''How many circle diameters separate these circles?

          [ fromCircle and toCircle are Circle instances ->
              return the distance between their centers as a
              fraction of CIRCLE_DIAMETER ]
        '''

For the interface to the author's mapping package, refer to Section 6.2, “Imports” (p. 18). The terrapos.LatLon() constructor accepts two arguments, each of which is a tuple (D, M) where D is degrees and M is minutes. For the conversion from string coordinates to positions in the terrapos package, see Section 6.34, “CBCData.__terraCircle(): Convert a circle center to a terrestrial position” (p. 40).
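For readers without the terrapos package, the same quantity can be approximated with the haversine formula. This is a substitute technique, not the author's method: it assumes a spherical Earth of radius 3958.8 miles, takes the 15-mile circle diameter from the discussion of overlappers(), and uses hypothetical helper names and made-up coordinates.

```python
import math

EARTH_RADIUS_MILES = 3958.8   # assumption: spherical Earth
CIRCLE_DIAMETER = 15.0        # miles, per the 15-mile figure in the text


def ddmm_to_degrees(ddmm):
    """Convert a 'ddmm' or 'dddmm' string to decimal degrees."""
    return int(ddmm[:-2]) + int(ddmm[-2:]) / 60.0


def circle_sep(lat1, lon1, lat2, lon2):
    """Separation of two centers, in circle diameters (haversine)."""
    phi1 = math.radians(ddmm_to_degrees(lat1))
    phi2 = math.radians(ddmm_to_degrees(lat2))
    d_phi = phi2 - phi1
    d_lam = math.radians(ddmm_to_degrees(lon2) - ddmm_to_degrees(lon1))
    a = (math.sin(d_phi / 2.0) ** 2 +
         math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2.0) ** 2)
    miles = 2.0 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))
    return miles / CIRCLE_DIAMETER


# Two centers on the same meridian, 13 and then 14 minutes apart.
print(circle_sep("3400", "10600", "3413", "10600"))
print(circle_sep("3400", "10600", "3414", "10600"))
```

Under these assumptions, centers 13 minutes of latitude apart are just under one diameter apart and would overlap slightly, while centers 14 minutes apart are more than one diameter apart.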
pycbc.py #-- 1 -# [ fromPos := fromCircle's center as a terrapos.LatLon # toPos := toCircle's center as a terrapos.LatLon ] fromPos = self.__terraCircle ( fromCircle ) toPos = self.__terraCircle ( toCircle ) In the terrapos package, the method LatLon.crowFeet() gives the distance in feet between two positions. We divide that by the number of feet in a circle diameter to normalize the value to diameters. pycbc.py #-- 2 -# [ sep := (distance between fromPos and toPos in miles) / # CIRCLE_DIAMETER ] sep = (fromPos.crowFeet(toPos) / FEET_PER_MILE)/CIRCLE_DIAMETER #-- 3 -return sep 6.34. CBCData.__terraCircle(): Convert a circle center to a terrestrial position pycbc.py # - - - C B C D a t a . _ _ t e r r a C i r c l e def __terraCircle(self, circle): '''Convert a circle center to a terrestrial position. [ circle is a Circle instance -> return circle's center as a terrapos.LatLon instance ] ''' latDeg = int(circle.lat[:2]) latMin = int(circle.lat[2:]) lonDeg = int(circle.lon[:3]) lonMin = int(circle.lon[3:]) return terrapos.LatLon ( (latDeg, latMin), (lonDeg, lonMin) ) 7. The staticloader script: Populate the static tables Warning This script will destroy the entire database and rebuild it. Exercise caution with live databases! This standalone script starts out by dropping all the tables in the Postgresql database and recreating them according to the new schema. Then it loads up the nations, regions, and physios tables from the static files displayed in Section 10, “Static data files” (p. 61). 40 pycbc: Python interface for the CBC database New Mexico Tech Computer Center Warning The current design assumes that these tables are essentially static. However, keep in mind that one province, Nunavut, was added as recently as 1999. Once the rest of the database has been loaded, this script cannot be rerun: dropping the nations and regions tables would break foreign key constraints. 
If a nation, region or physiographic stratum must be added or changed, it will be necessary to write either a quick one-off script to do that, or perhaps write a GUI application to maintain these tables. 7.1. staticloader: Prologue The script starts off with the usual line to make it self-executing under Linux. The sys module is imported for input and output, and the pycbc module to connect to the database. staticloader #!/usr/bin/env python #================================================================ # staticloader: Load the 'nations' and 'regions' tables. # For documentation, see: # http://www.nmt.edu/~shipman/z/cbc/pycbc/ #---------------------------------------------------------------#================================================================ # Imports #---------------------------------------------------------------from timer import Timer t0 = Timer('Imports') import sys import pycbc print t0 Constants include the name of the file where the password lives, and the name of the data files for the nations and regions. staticloader #================================================================ # Manifest constants #---------------------------------------------------------------PASS_FILE = 'pspass' NATIONS_FILE = 'nationlist' REGIONS_FILE = 'regionlist' PHYSIOS_FILE = 'physiolist' 7.2. staticloader: main() staticloader # - - - m a i n def main(): '''Main program. 
New Mexico Tech Computer Center pycbc: Python interface for the CBC database 41 [ (the Postgresql server is available) and (PASS_FILE names a readable file containing the database password for the CBC database) and (NATIONS_FILE names a readable, valid data file for the nations table) and (REGIONS_FILE names a readable, valid data file for the regions table whose nation codes are all defined in NATIONS_FILE) -> that database := that database with the nations table and regions table dropped and recreated with data from NATIONS_FILE and REGIONS_FILE, respectively ] ''' The main program starts by connecting to the database. The password is stored in a file readable only by the author, so that it does not appear here. staticloader #-- 1 -# [ (the Postgresql server is available) and # (PASS_FILE names a readable file containing the database # password for the CBC database) -> # db := a pycbc.py.CBCData instance connected # to that database ] t0 = Timer('Connecting') passFile = file ( PASS_FILE ) password = passFile.readline().strip() passFile.close() db = pycbc.CBCData(password) print t0 Next we drop all the tables and recreate them according to the schema. staticloader #-- 2 -# [ db := db with all tables dropped ] t1 = Timer('Dropping and recreating tables') db.meta.drop_all(checkfirst=True) #-- 3 -# [ db := db with all tables created according to db.meta ] db.meta.create_all() print t1 Loading of the nations file is handled in Section 7.3, “staticloader: loadNations()” (p. 43); for the regions file, see Section 7.5, “staticloader: loadRegions()” (p. 44). 
staticloader
    #-- 4 --
    # [ db := db with the nations table populated from the
    #       file named by NATIONS_FILE ]
    t2 = Timer('Loading static tables')
    loadNations ( db )

    #-- 5 --
    # [ db := db with the regions table populated from the file
    #       named by REGIONS_FILE ]
    loadRegions ( db )

    #-- 6 --
    # [ db := db with the physios table populated from the file
    #       named by PHYSIOS_FILE ]
    loadPhysios ( db)
    print t2

To check that the database was properly loaded, Section 7.9, “staticloader: check()” (p. 46) prints a report showing all the nations and regions.

staticloader
    #-- 7 --
    # [ sys.stdout +:= report showing nation and region tables
    #       from db ]
    check ( db )

7.3. staticloader: loadNations()

This function handles loading of the nations table. It is pretty straightforward: it opens the input file, converts each line of that file into a Nation object, and adds it to the database.

staticloader
# - - -   l o a d N a t i o n s

def loadNations ( db ):
    '''Load the nations table.

      [ (db is a CBCData instance) and
        (NATIONS_FILE names a readable, valid data file for the
        nations table) ->
          db := db with the nations table populated from the
                file named by NATIONS_FILE ]
    '''
    #-- 1 --
    # [ inFile := a readable file for NATIONS_FILE ]
    inFile = file ( NATIONS_FILE )

    #-- 2 --
    # [ db.s +:= new nations rows made from the lines of inFile ]
    for rawLine in inFile:
        #-- 2 body --
        # [ rawLine is a valid nations file line ->
        #     db.s +:= a new nations row made from rawLine ]
        addNation ( db, rawLine )

    #-- 3 --
    # [ db := db with the transaction in db.s committed ]
    db.s.commit()

7.4. staticloader: addNation

staticloader
# - - -   a d d N a t i o n

def addNation ( db, rawLine ):
    '''Add one row to the nations table.
[ (db is a CBCData instance) and (rawLine is a valid nations file line) -> db.s +:= a new nations row made from rawLine ] Line format: 0 1 2 3 0123456789012345678901234567890 CAN Canada ''' #-- 1 -# [ nation_code := the nation code field from rawLine # nation_name := the nation name field from rawLine ] nation_code = rawLine[:3] nation_name = unicode(rawLine[4:].strip()) #-- 2 -# [ db.s +:= a new Nation row added using nation_code and # nation_name ] db.s.add ( db.Nation ( nation_code, nation_name) ) 7.5. staticloader: loadRegions() staticloader # - - - l o a d R e g i o n s def loadRegions ( db ): '''Load the regions table. [ (db is a CBCData instance) and (REGIONS_FILE names a readable, valid data file for the regions table) -> db := db with the regions table populated from the file named by REGIONS_FILE ] ''' Quite similar to Section 7.3, “staticloader: loadNations()” (p. 43). staticloader #-- 1 -# [ inFile := a readable file for REGIONS_FILE ] inFile = file ( REGIONS_FILE ) #-- 2 -# [ db.s +:= new regions rows made from the lines of inFile ] for rawLine in inFile: #-- 2 body -- 44 pycbc: Python interface for the CBC database New Mexico Tech Computer Center # [ rawLine is a valid regions file line -> # db.s +:= a new regions row made from rawLine ] addRegion ( db, rawLine ) #-- 3 -# [ db := db with the transaction in db.s committed ] db.s.commit() inFile.close() 7.6. staticloader: addRegion() Similar to Section 7.4, “staticloader: addNation” (p. 44). staticloader # - - - a d d R e g i o n def addRegion ( db, rawLine ): '''Add one row to the regions table. 
[ (db is a CBCData instance) and (rawLine is a valid regions file line) -> db.s +:= a new regions row made from rawLine ] Line format: 0 1 2 3 0123456789012345678901234567890 CAN NT North West Territories ''' #-- 1 -# [ reg_nation := the nation code field from rawLine # reg_code := the region code field from rawLine # reg_name := the region name field from rawLine ] reg_nation = rawLine[:3] reg_code = rawLine[4:6] reg_name = unicode(rawLine[7:].strip()) #-- 2 -# [ db.s +:= a new Region row added using region_code # and region_name ] db.s.add ( db.Region ( reg_nation, reg_code, reg_name) ) 7.7. staticloader: loadPhysios() staticloader # - - - l o a d P h y s i o s def loadPhysios ( db ): '''Load the table of physiographic strata. [ (db is a CBCData instance) and (PHYSIOS_FILE names a readable, valid data file for the physios table) -> New Mexico Tech Computer Center pycbc: Python interface for the CBC database 45 db := db with the physios table populated from that file ] ''' #-- 1 -# [ inFile := a readable file for PHYSIOS_FILE ] inFile = file ( PHYSIOS_FILE ) #-- 2 -# [ db.s +:= new physios rows made from the lines of inFile ] for rawLine in inFile: #-- 2 body -# [ rawLine is a valid physios file line -> # db := db with a new physios row made from rawLine ] addPhysio ( db, rawLine ) #-- 3 -# [ db := db with the transaction in db.s committed ] db.s.commit() 7.8. staticloader: addPhysio() staticloader # - - - a d d P h y s i o def addPhysio ( db, rawLine ): '''Add one row to the physios table. 
[ (db is a CBCData instance) and (rawLine is a valid physios file line) -> db := db with a new physios row made from rawLine ] Line format: 0 1 2 012345678901234567890 --------------------05 Mississippi Alluvial Plain ''' #-- 1 -# [ physio_code := the physio code field from rawLine # physio_name := the physio name field from rawLine ] physio_code = rawLine[:2] physio_name = unicode(rawLine[3:].strip()) #-- 2 -# [ db.s +:= a new Physio row added using physio_code and # physio_name ] db.s.add ( db.Physio ( physio_code, physio_name ) ) 7.9. staticloader: check() staticloader # - - - 46 r e p o r t pycbc: Python interface for the CBC database New Mexico Tech Computer Center def check ( db ): '''Display regions by nation. [ sys.stdout +:= report showing nation and region tables from db ] ''' for nation in db.genNations(): print "\n=====", nation.nation_name for region in nation.regions: print " ", region.reg_code, region.reg_name print "\n===== physiographic strata" for physio in db.genPhysios(): print physio.physio_code, physio.physio_name 7.10. staticloader: Epilogue The last lines of the script initiate execution of the main program. staticloader #================================================================ # Epilogue #---------------------------------------------------------------if __name__ == "__main__": main() 8. Conversion from the old MySQL database The previous database schema has been unchanged since 1998. This database currently exists as a MySQL database, and will provide the initial values for the current (Postgresql) version. The initial setup of the Postgresql database proceeds in these steps. 1. The script described in Section 7, “The staticloader script: Populate the static tables” (p. 40) drops any existing tables, recreates the database, and then populates the nations and regions tables. 2. Section 9, “transloader: Copy over the MySQL database” (p. 55) describes the script that populates the rest of the tables from the MySQL database. 8.1. 
Schema of the 1998 database

The old MySQL database [29] had a very different structure, and was poorly normalized. In particular, there was an unnecessary level of relation in the stnd table, which mapped a key called count ID to a circle, and three other tables used the count ID to link to the stnd table, rather than directly to the circle table.

The count ID is a composite of two different fields, and has two different formats depending on the route the data took to get into the database originally:

    Count year     Count ID format
    001–090        YYYNNNNX
    091–present    YYYSSKK

The YYY portion is the count year, with left zero fill. The rest of this field is the “year key” discussed in Section 3.5, “Year key” (p. 5). This year key, that is, everything after the YYY portion, is preserved in the new database, in the year_key column of the efforts table.

Here, then, is an entity-relationship model for the old database. The count ID is used as the key for all the relations shown here, except that the relation from stnd to cir uses the lat_lon column.

[Figure: entity-relationship diagram. Each cir row relates to one or more (1,n) stnd rows; each stnd row relates to exactly one aspub row, exactly one eff row, and zero or more (0,n) cen rows.]

Note the two one-to-one relationships. There is one stnd row for each circle-year, and there is no compelling reason to distribute the attributes of a circle-year over two other tables (aspub and eff). So, converting the old schema to the new will lump the old stnd, aspub, and eff tables into the new efforts table.

Refer to the 1998 database specification [30] for a general description of the older schema. Here is a table showing the actual MySQL column names and types.

[29] http://www.nmt.edu/~shipman/z/cbc/db_spec.html
    Table name   Column name   Column type
    ----------   -----------   ------------
    cir          lat_lon       CHAR(9)
                 physio        VARCHAR(4)
                 water         CHAR(1)
                 odd           CHAR(1)
                 regions       VARCHAR(6)
                 name          VARCHAR(80)
    stnd         lat_lon       CHAR(9)
                 count_id      CHAR(8)
    aspub        count_id      CHAR(8)
                 as_lat_lon    CHAR(9)
                 as_regions    VARCHAR(6)
                 as_name       VARCHAR(80)
    eff          count_id      CHAR(8)
                 yyyymmdd      CHAR(8)
                 n_obs         INT
                 ph_tot        DECIMAL(5,1)
                 ph_foot       DECIMAL(5,1)
                 ph_car        DECIMAL(5,1)
                 ph_o          DECIMAL(5,1)
                 h_fd          DECIMAL(5,1)
                 h_owl         DECIMAL(5,1)
                 pm_tot        DECIMAL(5,1)
                 pm_f          DECIMAL(5,1)
                 pm_c          DECIMAL(5,1)
                 pm_o          DECIMAL(5,1)
                 m_owl         DECIMAL(5,1)
    cen          count_id      CHAR(8)
                 seq_no        CHAR(3)
                 form          CHAR(6)
                 rel           CHAR(1)
                 alt_form      CHAR(6)
                 age           CHAR(1)
                 sex           CHAR(1)
                 plus          CHAR(1)
                 q             CHAR(1)
                 census        INT

[30] http://www.nmt.edu/~shipman/z/cbc/db_spec.html

8.2. mycbc.py: Interface to the 1998 database

Here we begin file mycbc.py, a module that interfaces to the 1998 MySQL database. The interface is generally similar to the new interface described in Section 5, “Using the pycbc interface” (p. 12), so the comments here will be minimal. The main difference is that the schema is determined through reflection, that is, letting SQLAlchemy probe that database for its structure.

mycbc.py
'''mycbc.py: Interface to the 1998-format CBC database

  For complete documentation:
    http://www.nmt.edu/~shipman/z/cbc/pycbc/
'''
#================================================================
# Imports
#----------------------------------------------------------------
from sqlalchemy import schema, types, orm, engine

The protocol is MySQL instead of Postgresql, and the host is the TCC general database host.

mycbc.py
#================================================================
# Manifest constants
#----------------------------------------------------------------
# The engine is mysql at dbhost.nmt.edu.
#-URL_FORMAT = "%s://%s:%s@%s/%s" PROTOCOL = "mysql" DB_USER = "john" DB_HOST = "dbhost.nmt.edu" DB_NAME = "john" #-# Names of the tables. #-CIR_NAME = "cir" STND_NAME = "stnd" AS_PUB_NAME = "aspub" EFF_NAME = "eff" CEN_NAME = "cen" 8.3. class MyCBC: Interface to the old database In this database, all the exported attributes are the same as in Section 5, “Using the pycbc interface” (p. 12), except for the table names. Also, because it is intended only for use in one single-threaded application, the Session() class constructor is not exported. mycbc.py # - - - - - c l a s s M y C B C class MyCBC(object): '''Interface to the 1998 MySQL CBC database Exports: MyCBC(password): [ password is a string -> if password is the MySQL CBC database password -> return a new MyCBC instance giving read-write access to that database else -> return a new MyCBC instance giving read-only access to that database ] .engine: [ an sqlalchemy.engine.Engine instance connected to the database ] .meta: [ the metadata as sqlalchemy.schema.MetaData instance ] .s: [ a Session connected to self.engine ] .Cir: [ class mapped to the cir table ] .Stnd: [ class mapped to the stnd table ] .AsPub: [ class mapped to the aspub table ] 50 pycbc: Python interface for the CBC database New Mexico Tech Computer Center .Eff: [ class mapped to the eff table ] .cir_table, .stnd_table, .aspub_table, .eff_table: [ the actual Table instances for these classes ] Since the only purpose of this module is to drive the extraction of data from the old database described in Section 9, “transloader: Copy over the MySQL database” (p. 55), rather than set up table relations in the orm, we'll just define a few methods that run simple queries that generate the circle records and then dig down to retrieve all the related rows from the other tables. Note that all these retrieval methods do no error checking, on the assumption that all the foreign key constraints on the MySQL database are true. 
This database was built when MySQL had no foreign key constraints, but the software that loaded it insured them. mycbc.py .genCirs(): [ generate a sequence of Cir instances representing to the rows of the cir table ] .genStnds(lat_lon): [ lat_lon is a lat_lon column value -> generate a sequence of Stnd instances that use that lat_lon ] .getEff(count_id): [ count_id is a count_id column value -> return the Eff instance for that count_id ] .getAsPub(count_id): [ count_id is a count_id column value -> return the AsPub instance for that count_id ] .genCens(count_id): [ (count_id is a count_id column value) -> generate the Cen instances for count_id ] ''' Here are the definitions of the tables and mapped classes, which are all inside the MyCBC class. mycbc.py #================================================================ # Tables and mapped classes #---------------------------------------------------------------meta = schema.MetaData() class Cir(object): def __init__(self, lat_lon, physio, water, odd, regions, name ): self.lat_lon = lat_lon self.physio = physio self.water = water self.odd = odd self.regions = regions self.name = name def __repr__(self): return ( "<Cir(%s %s: %s)>" % (self.lat_lon, self.regions, self.name) ) class Stnd(object): def __init__(self, lat_lon, count_id): self.lat_lon = lat_lon self.count_id = count_id New Mexico Tech Computer Center pycbc: Python interface for the CBC database 51 def __repr__(self): return ( "<Stnd(%s=%s)>" % (self.lat_lon, self.count_id)) class AsPub(object): def __init__(self, count_id, as_lat_lon, as_regions, as_name): self.count_id = count_id self.as_lat_lon = as_lat_lon self.as_regions = as_regions self.as_name = as_name def __repr__(self): return ( "<AsPub(%s %s %s: %s)>" % (self.count_id, self.as_lat_lon, self.as_regions, self.as_name) ) class Eff(object): def __init__(self, count_id, yyyymmdd, n_obs, ph_tot, ph_foot, ph_car, ph_o, h_fd, h_owl, pm_tot, pm_f, pm_c, pm_o, m_owl): self.count_id = count_id self.yyyymmdd 
= yyyymmdd self.n_obs = n_obs self.ph_tot = ph_tot self.ph_foot = ph_foot self.ph_car = ph_car self.ph_o = ph_o self.h_fd = h_fd self.h_owl = h_owl self.pm_tot = pm_tot self.pm_f = pm_f self.pm_c = pm_c self.pm_o = pm_o self.m_owl = m_owl def __repr__(self): return ( "<Eff(%s %s %d)>" % (self.count_id, self.yyyymmdd, self.n_obs) ) class Cen(object): def __init__(self, count_id, seq_no, form, rel, alt_form, age, sex, plus, q, census): self.count_id = count_id self.seq_no = seq_no self.form = form self.rel = rel self.alt_form = alt_form self.age = age self.sex = sex self.plus = plus self.q = q self.census = census 52 pycbc: Python interface for the CBC database New Mexico Tech Computer Center 8.4. MyCBC.__init__() mycbc.py # - - - M y C B C . _ _ i n i t _ _ def __init__(self, password): '''Constructor for MyCBC ''' #-- 1 -# [ if MySQL database is available and accepts # password (password) -> # self.engine := an sqlalchemy.engine.Engine instance # that connects to the database with (password) ] # else -> raise sqlalchemy.exc.SQLAlchemyError ] url = ( URL_FORMAT % (PROTOCOL, DB_USER, password, DB_HOST, DB_NAME) ) self.engine = engine.create_engine(url) #-- 2 -# [ self.meta := metadata reflected from self.engine # as a schema.MetaData instance ] self.meta = schema.MetaData(bind=self.engine, reflect=True) #-- 3 -# [ self.Session := a constructor for sessions using # self.engine # self.s := an instance of that constructor ] self.Session = orm.sessionmaker ( bind=self.engine, autoflush=True, autocommit=False, expire_on_commit=True ) self.s = self.Session() The constructor for this class is generally similar to the one in Section 6.14, “CBCData.__init__(): Constructor” (p. 30). The primary difference is that the schema is obtained by reflection, so we don't need to declare the Table instances. 
Once the metadata is connected to the engine with the reflect=True option, we will use an undocumented feature of SQLAlchemy: the MetaData instance has an attribute .tables, which is a dictionary whose keys are the names of the reflected tables, and each related value is the corresponding Table instance. See Section 8.5, “MyCBC.__mapTable: Locate and bind a table” (p. 54).

mycbc.py
        #-- 4 --
        # [ tables CIR_NAME, STND_NAME, AS_PUB_NAME, EFF_NAME,
        #   and CEN_NAME are in self.meta ->
        #     self.Cir := a class mapped to table CIR_NAME
        #     self.Stnd := a class mapped to table STND_NAME
        #     self.AsPub := a class mapped to table AS_PUB_NAME
        #     self.Eff := a class mapped to table EFF_NAME
        #     self.Cen := a class mapped to table CEN_NAME ]
        self.__mapTable(CIR_NAME, self.Cir)
        self.__mapTable(STND_NAME, self.Stnd)
        self.__mapTable(AS_PUB_NAME, self.AsPub)
        self.__mapTable(EFF_NAME, self.Eff)
        self.__mapTable(CEN_NAME, self.Cen)

8.5. MyCBC.__mapTable: Locate and bind a table

mycbc.py
# - - -   M y C B C . _ _ m a p T a b l e

    def __mapTable(self, tableName, className):
        '''Map one table class

          [ (tableName is a table name found in self.meta) and
            (className is a Class) ->
              orm := orm with className mapped to that table ]
        '''
        #-- 1 --
        # [ if self.meta.tables has a key (tableName) ->
        #     table := the related value
        #     self.(tableName) := the related value
        #   else ->
        #     raise IOError ]
        try:
            table = self.meta.tables[tableName]
            setattr(self, tableName, table)
        except KeyError, detail:
            raise IOError("MySQL database does not contain a table "
                          "named %s: %s." % (tableName, detail) )

        #-- 2 --
        # [ orm := orm with class (className) mapped to table ]
        orm.mapper(className, table)

8.6. MyCBC.genCirs(): Generate all circles

The techniques used in this and the remaining methods are all basic uses of the SQLAlchemy Query class.

mycbc.py
# - - -   M y C B C . g e n C i r s

    def genCirs(self):
        '''Generate all circles, ascending by lat-lon.
''' #-- 1 -# [ q := a Session.Query to retrieve all rows of self.Cir ] q = self.s.query(self.Cir) #-- 2 -# [ generate the rows in q ] for row in q: yield row 54 pycbc: Python interface for the CBC database New Mexico Tech Computer Center 8.7. MyCBC.genStnds(): Generate all the circle-years for a given circle mycbc.py # - - - M y C B C . g e n S t n d s def genStnds(self, lat_lon): '''Generate all stnd rows for a given lat-lon. ''' q = self.s.query(self.Stnd).filter_by(lat_lon=lat_lon) for row in q: yield row 8.8. MyCBC.getEff(): Retrieve the eff row for a given circle-year mycbc.py # - - - M y C B C . g e t E f f def getEff(self, count_id): '''Retrieve the eff row for count_id. ''' eff = self.s.query(self.Eff).get(count_id) return eff 8.9. MyCBC.getAsPub(): Retrieve the aspub row for a circle-year mycbc.py # - - - M y C B C . g e t A s P u b def getAsPub(self, count_id): '''Retrieve the aspub row for a circle-year. ''' asPub = self.s.query(self.AsPub).get(count_id) return asPub 8.10. MyCBC.genCens(): Generate census records for one circle-year mycbc.py # - - - M y C B C . g e n C e n s def genCens(self, count_id): '''Generate the census rows for one circle-year. ''' q = self.s.query(self.Cen).filter_by(count_id=count_id) for row in q: yield row 9. transloader: Copy over the MySQL database This script is run after staticloader (see Section 7, “The staticloader script: Populate the static tables” (p. 40)) to convert the MySQL database to the new form. New Mexico Tech Computer Center pycbc: Python interface for the CBC database 55 9.1. transloader: Prologue transloader #!/usr/bin/env python # transloader: Copy MySQL CBC database to Posgresql # For documentation, see: # http://www.nmt.edu/~shipman/z/cbc/pycbc/ Input is from the mycbc module: see Section 8.2, “mycbc.py: Interface to the 1998 database” (p. 49). Output is to the pycbc module; see Section 5, “Using the pycbc interface” (p. 12). 
transloader

    #================================================================
    # Imports
    #----------------------------------------------------------------

    from timer import Timer
    t0 = Timer('Imports')
    import sys
    import mycbc, pycbc
    print t0

The passwords are kept in external files readable only by the author, so the actual passwords don't need to appear here.

transloader

    #================================================================
    # Manifest constants
    #----------------------------------------------------------------

    MY_PASS = "mypass"    # MySQL password file
    PS_PASS = "pspass"    # Postgresql password file

9.2. transloader: main()

transloader

    # - - - - - m a i n

    def main():
        '''Main program: Copy MySQL CBC database to Postgresql.

          [ (MY_PASS and PS_PASS name files containing passwords) and
            (MySQL and Postgresql CBC databases are available) and
            (all Postgresql tables except regions and nations are
            empty) ->
              Postgresql CBC database  :=  MySQL CBC database ]
        '''

For the function that reads passwords from a file, see Section 9.3, “transloader: readPassword()” (p. 57).

transloader

        #-- 1 --
        # [ myPassword  :=  first line of file MY_PASS, stripped
        #   psPassword  :=  first line of file PS_PASS, stripped ]
        tTotal = Timer("Entire database loaded")
        t0 = Timer("Connect to mysql")
        myPassword = readPassword(MY_PASS)
        psPassword = readPassword(PS_PASS)

        #-- 2 --
        # [ my  :=  a mycbc.MyCBC instance representing the MySQL
        #           CBC database with password (myPassword)
        #   db  :=  a pycbc.CBCData instance representing the
        #           Postgresql CBC database with password (psPassword) ]
        my = mycbc.MyCBC(myPassword)
        print t0
        t1 = Timer("Connect to postgresql")
        db = pycbc.CBCData(psPassword)
        print t1

The main copying and reformatting logic is in Section 9.4, “transloader: dbCopy()” (p. 57).

transloader

        #-- 3 --
        # [ db  :=  db with all data added from my ]
        dbCopy(my, db)
        print tTotal

9.3. transloader: readPassword()

transloader

    # - - - r e a d P a s s w o r d

    def readPassword(fileName):
        '''Read a password from a named file

          [ fileName is a string naming a readable file ->
              return the first line of that file, stripped ]
        '''
        passFile = file(fileName)
        password = passFile.readline().strip()
        passFile.close()
        return password

9.4. transloader: dbCopy()

transloader

    # - - - d b C o p y

    def dbCopy(my, db):
        '''Copy all the MySQL data to Postgresql.

          [ (my is a MyCBC instance) and
            (db is a CBCData instance) ->
              db  :=  db with all data added from my ]
        '''

This process is driven by the old cir table. For each circle, we add a new row to the circles table. Then, for each region, we'll add a cir_reg row relating the circle and region, and for each physiographic stratum code, add a cir_physio row relating the circle and physiographic region.

There is one additional wrinkle: the original cir table has a dummy entry for lat-long “000000000”, which should not be propagated to the new database.

transloader

        cirCount = 0
        for cir in my.genCirs():
            cirCount += 1
            if cir.lat_lon != '000000000':
                copyCir(my, db, cir)

9.5. transloader: copyCir(): Copy data for one circle

transloader

    # - - - c o p y C i r

    def copyCir(my, db, cir):
        '''Copy the data for one circle.

          [ (my is a MyCBC instance) and
            (db is a CBCData instance) and
            (cir is a MyCBC.Cir instance) ->
              db  :=  db + (all data for cir) ]
        '''

To preserve all the data for one circle in the old database, we need to add rows to up to five tables: circles, cir_reg, cir_physio, efforts, and censuses. The first three are built in Section 9.6, “transloader: addCircle()” (p. 59).

transloader

        #-- 1 --
        # [ db.s  +:=  a circle row and related cir_reg and cir_physio
        #       rows added, made from cir ]
        t0 = Timer('Adding circle %s' % cir.lat_lon)
        addCircle(my, db, cir)

That takes care of all the one-per-circle items.
Items related to circle-years are processed in Section 9.7, “transloader: addCircleYear()” (p. 60).

transloader

        #-- 2 --
        # [ db.s  +:=  all circle-year data from my for cir ]
        for stnd in my.genStnds(cir.lat_lon):
            addCircleYear(my, db, stnd)

So far all those added rows are in the session, db.s; now commit them.

transloader

        #-- 3 --
        # [ db  :=  db with transactions in db.s committed ]
        db.s.commit()
        print t0
        sys.stdout.flush()

9.6. transloader: addCircle()

transloader

    # - - - a d d C i r c l e

    def addCircle(my, db, cir):
        '''Add rows to the circles, cir_reg, and cir_physio tables.

          [ (my is a MyCBC instance) and
            (db is a CBCData instance) and
            (cir is a MyCBC.Cir instance) ->
              db  :=  db with a circle row and related cir_reg and
                      cir_physio rows added, made from cir ]
        '''

First we assemble a db.Circle instance. For the constructor, see Section 6.8, “The circles table” (p. 24).

transloader

        #-- 1 --
        # [ db  +:=  a db.Circle instance made from cir ]
        lat = cir.lat_lon[:4]
        lon = cir.lat_lon[4:]
        name = unicode(cir.name)
        circle = db.Circle(lat, lon, cir.water, cir.odd, name)
        db.s.add ( circle )
        db.s.commit()

Next, create cir_reg instances for each region code. The number of region codes is the length of the old database's regions field divided by two. For the constructor, see Section 6.9, “The cir_reg table” (p. 24).

transloader

        #-- 2 --
        # [ db  +:=  db.CirReg instances made from cir's regions,
        #            if any ]
        nRegs = len(cir.regions.strip())/2
        for regx in range(nRegs):
            regCode = cir.regions[regx*2:regx*2+2]
            cirReg = db.CirReg(lat, lon, regx, regCode)
            db.s.add ( cirReg )

Similarly, create cir_physio instances for the physiographic strata, if any. For the CirPhysio constructor, see Section 6.10, “The cir_physio table” (p. 25).
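The region-code loop above and the physiographic-code loop that follows both cut a packed field into two-character codes. The sketch below shows the same idea as a standalone helper. It is not part of transloader: the name split_codes is hypothetical, it is written for Python 3 (where the division must be // to give range() an integer), and it assumes the field has no leading blanks, so that stripping only trailing whitespace keeps the slice positions aligned.

```python
def split_codes(packed, width=2):
    """Split a packed fixed-width field into a list of codes.

    Trailing whitespace is ignored. An incomplete trailing chunk
    (fewer than `width` characters) is dropped, which is what the
    integer division in the original loops also does.
    """
    field = packed.rstrip()
    n_codes = len(field) // width   # '//' keeps this an int in Python 3
    return [field[i * width:(i + 1) * width] for i in range(n_codes)]


if __name__ == "__main__":
    # Illustrative input only: two region codes padded with blanks.
    for seq_no, code in enumerate(split_codes("NMTX  ")):
        print(seq_no, code)
```

Using enumerate() on the result gives the same sequence numbers that regx and physx supply in the loops.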
transloader

        #-- 3 --
        # [ db.s  +:=  db.CirPhysio instances made from cir's
        #       physio codes, if any ]
        nPhysios = len(cir.physio.strip())/2
        for physx in range(nPhysios):
            physioCode = cir.physio[physx*2:physx*2+2]
            cirPhysio = db.CirPhysio(lat, lon, physx, physioCode)
            db.s.add ( cirPhysio )

        #-- 4 --
        db.s.commit()

9.7. transloader: addCircleYear()

transloader

    # - - - a d d C i r c l e Y e a r

    def addCircleYear(my, db, stnd):
        '''Copy all the data for a given circle-year.

          [ (my is a MyCBC instance) and
            (db is a CBCData instance) and
            (stnd is a my.Stnd instance) ->
              db  :=  db + (all circle-year data for stnd) ]
        '''

This function handles the copying of new rows to the efforts and censuses tables. The first order of business is to retrieve the Eff and AsPub instances for this count ID.

transloader

        #-- 1 --
        # [ lat  :=  latitude from stnd
        #   lon  :=  longitude from stnd
        #   year_no  :=  year number from stnd
        #   year_key  :=  year key from stnd
        #   eff  :=  Eff instance from (my) for stnd.count_id
        #   asPub  :=  AsPub instance from (my) for stnd.count_id ]
        lat = stnd.lat_lon[:4]
        lon = stnd.lat_lon[4:]
        year_no = stnd.count_id[:3]
        year_key = stnd.count_id[3:]
        eff = my.getEff(stnd.count_id)
        asPub = my.getAsPub(stnd.count_id)

Now create the new Effort instance. For the constructor, see Section 6.11, “The efforts table” (p. 26).

transloader

        #-- 2 --
        # [ db.s  +:=  a new db.Effort instance representing eff
        #       and asPub ]
        asLat = asPub.as_lat_lon[:4]
        asLon = asPub.as_lat_lon[4:]
        effort = db.Effort(lat, lon, year_no, year_key, eff.yyyymmdd,
            asLat, asLon, asPub.as_name, eff.n_obs,
            eff.ph_tot, eff.ph_foot, eff.ph_car, eff.ph_o,
            eff.h_fd, eff.h_owl,
            eff.pm_tot, eff.pm_f, eff.pm_c, eff.pm_o, eff.m_owl)
        db.s.add ( effort )

Copying of records to the censuses table is handled in Section 9.8, “transloader: addCensus()” (p. 61).
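The fixed-width slicing in step 1 of addCircleYear() can be packaged as a small helper, sketched here for Python 3. The names CircleYearKey and split_keys are hypothetical and do not appear in transloader. The nine-character length check relies on the 'ddmm' and 'dddmm' widths given for latitude and longitude elsewhere in this script; the sample values in the demonstration are made up.

```python
from collections import namedtuple

# Hypothetical container for the four key parts used by the new tables.
CircleYearKey = namedtuple("CircleYearKey", "lat lon year_no year_key")


def split_keys(lat_lon, count_id):
    """Split the old lat_lon and count_id columns into key parts.

    lat_lon is 'ddmm' followed by 'dddmm' (nine characters);
    count_id is a three-character year number followed by a year key.
    """
    if len(lat_lon) != 9:
        raise ValueError("lat_lon must have 9 characters: %r" % lat_lon)
    if len(count_id) < 3:
        raise ValueError("count_id is too short: %r" % count_id)
    return CircleYearKey(lat_lon[:4], lat_lon[4:],
                         count_id[:3], count_id[3:])


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(split_keys("340610654", "098a"))
```

Validating the lengths up front turns a malformed key into an immediate error instead of a silently truncated latitude or year number.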
transloader

        #-- 3 --
        # [ db.s  +:=  new db.Census instances representing rows from
        #       the cen table in my for count_id (count_id) and
        #       year (year_no) ]
        for cen in my.genCens(stnd.count_id):
            #-- 3 body --
            # [ db.s  +:=  a new db.Census instance representing cen ]
            addCensus(db, lat, lon, year_no, year_key, cen)

9.8. transloader: addCensus()

transloader

    # - - - a d d C e n s u s

    def addCensus(db, lat, lon, year_no, year_key, cen):
        '''Add one row to the censuses table.

          [ (db is a CBCData instance) and
            (lat is a latitude as 'ddmm') and
            (lon is a longitude as 'dddmm') and
            (year_no is a year number as 'nnn') and
            (year_key is a year key) and
            (cen is a MyCBC.Cen instance) ->
              db.s  +:=  a new db.Census instance representing cen ]
        '''

For the Census constructor, see Section 6.12, “The censuses table” (p. 27).

transloader

        census = db.Census(lat, lon, year_no, year_key, cen.seq_no,
            cen.form, cen.rel, cen.alt_form, cen.age, cen.sex,
            cen.plus, cen.q, cen.census)
        db.s.add ( census )

9.9. transloader: Epilogue

transloader

    #================================================================
    # Epilogue
    #----------------------------------------------------------------

    if __name__ == "__main__":
        main()

10. Static data files

Here are the files used by Section 7, “The staticloader script: Populate the static tables” (p. 40) to populate the nations and regions tables. For downloads, see the links in Section 2, “Downloadable files” (p. 3).
The nationlist file:

nationlist

    FRA  France
    CAN  Canada
    USA  United States of America

The regionlist file:

regionlist

    FRA  FR  Saint Pierre et Miquelon
    CAN  AB  Alberta
    CAN  BC  British Columbia
    CAN  MB  Manitoba
    CAN  NB  New Brunswick
    CAN  NF  Newfoundland
    CAN  NL  Newfoundland and Labrador
    CAN  NS  Nova Scotia
    CAN  NT  North West Territories
    CAN  NU  Nunavut
    CAN  ON  Ontario
    CAN  PE  Prince Edward Island
    CAN  QC  Quebec
    CAN  SK  Saskatchewan
    CAN  YT  Yukon Territory
    USA  AL  Alabama
    USA  AK  Alaska
    USA  AZ  Arizona
    USA  AR  Arkansas
    USA  CA  California
    USA  CO  Colorado
    USA  CT  Connecticut
    USA  DE  Delaware
    USA  DC  District of Columbia
    USA  FL  Florida
    USA  GA  Georgia
    USA  HI  Hawaii
    USA  ID  Idaho
    USA  IL  Illinois
    USA  IN  Indiana
    USA  IA  Iowa
    USA  KS  Kansas
    USA  KY  Kentucky
    USA  LA  Louisiana
    USA  ME  Maine
    USA  MD  Maryland
    USA  MA  Massachusetts
    USA  MI  Michigan
    USA  MN  Minnesota
    USA  MS  Mississippi
    USA  MO  Missouri
    USA  MT  Montana
    USA  NE  Nebraska
    USA  NV  Nevada
    USA  NH  New Hampshire
    USA  NJ  New Jersey
    USA  NM  New Mexico
    USA  NY  New York
    USA  NC  North Carolina
    USA  ND  North Dakota
    USA  OH  Ohio
    USA  OK  Oklahoma
    USA  OR  Oregon
    USA  PA  Pennsylvania
    USA  RI  Rhode Island
    USA  SC  South Carolina
    USA  SD  South Dakota
    USA  TN  Tennessee
    USA  TX  Texas
    USA  UT  Utah
    USA  VT  Vermont
    USA  VA  Virginia
    USA  WA  Washington
    USA  WV  West Virginia
    USA  WI  Wisconsin
    USA  WY  Wyoming

Here is the physiolist file. Codes 43 and 44 were added because there are old circle files that use them.

physiolist

    01  Subtropical
    02  Floridian
    03  Coastal Flatwoods
    04  Upper Coastal Plain
    05  Mississippi Alluvial Plain
    06  Coastal Prairies
    07  South Texas
    08  East Texas Prairies
    09  Glaciated Coastal Plain
    10  Northern Piedmont
    11  Southern Piedmont
    12  Southern New England
    13  Ridge and Valley
    14  Highland Rim
    15  Lexington Plain
    16  Great Lakes Plain
    17  Driftless Area
    18  St. Lawrence River Plain
    19  Ozark-Ouachita Plateau
    20  Great Lakes Transition
    21  Cumberland Plateau
    22  Ohio Hills
    23  Blue Ridge Mountains
    24  Allegheny Plateau
    25  Open Boreal Forest
    26  Adirondack Mountains
    27  Northern New England
    28  N. Spruce-Hardwoods
    29  Closed Boreal Forest
    30  Aspen Parklands
    31  Till Plains
    32  Dissected Till Plains
    33  Osage Plain-Cross Timbers
    34  High Plains Border
    35  Rolling Red Prairies
    36  High Plains
    37  Drift Prairie
    38  Glaciated Missouri Plateau
    39  Great Plains Roughlands
    40  Black Prairie
    43  Unknown stratum 43
    44  Unknown stratum 44
    53  Edward's Plateau
    54  Rolling Red Plains
    55  Staked Plains
    56  Chihuahuan Desert
    61  Black Hills
    62  Southern Rockies
    63  Fraser Plateau
    64  Central Rockies
    65  Dissected Rockies
    66  Sierra Nevada
    67  Cascade Mountains
    68  Northern Rockies
    80  Great Basin Deserts
    81  Mexican Highlands
    82  Sonoran Desert
    83  Mojave Desert
    84  Pinyon-Juniper Woodlands
    85  Pitt-Klamath Plateau
    86  Wyoming Basin
    87  Intermountain Grasslands
    88  Basin and Range
    89  Columbia Plateau
    90  S. California Grasslands
    91  Central Valley
    92  California Foothills
    93  S. Pacific Rainforests
    94  N. Pacific Rainforests
    95  Los Angeles Ranges
    96  S. Alaska Coast
    98  Willamette Lowlands
    99  Tundra
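Each of the three files above has one record per line: one or two leading codes followed by a name that may itself contain spaces. The sketch below shows one way such lines could be read in Python 3. It is not the staticloader code; the function name parse_line is hypothetical, and it assumes the fields are separated by whitespace rather than placed in fixed columns, which the listings above do not settle.

```python
def parse_line(line, n_codes):
    """Split one static-file line into its codes and its name.

    n_codes is the number of leading code fields: 1 for nationlist
    and physiolist lines, 2 for regionlist lines. The rest of the
    line, including any embedded spaces, is the name.
    """
    parts = line.strip().split(None, n_codes)
    if len(parts) != n_codes + 1:
        raise ValueError("malformed line: %r" % line)
    return tuple(parts)


if __name__ == "__main__":
    print(parse_line("FRA France", 1))
    print(parse_line("CAN NL Newfoundland and Labrador\n", 2))
    print(parse_line("18 St. Lawrence River Plain", 1))
```

Limiting the number of splits is what keeps multi-word names such as “Newfoundland and Labrador” in a single field.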