INDEXING PLACE CHARACTERISTICS

Throughout the term you have learned various basic skills; now you'll apply them to a research project you design.  You've probably seen indexes like the one in Rand McNally's Places Rated Almanac, or formulas to find the best vacation places, best retirement places, etc. (try this site for some on-line "place selectors").  Unless you'd like to design your own (similar) study, you'll use digitized layers prepared by other 391 classes, and other layers you contribute, to find your favorite place among the 26,000 places in the lower 48 states.  You'll also have access to census data for the entire U.S., at the tract, place, city, Metropolitan Statistical Area (MSA), county, and state levels.  You also have the skills to create a new spatial layer by digitizing or geocoding, to adapt an existing spatial layer such as .dlg or TIGER or information available on the Geography Network, or to join a non-spatial attribute table to an existing layer.

STEP ONE
PRINT THE WORKSHEET: http://www.neiu.edu/~ejhowens/391/indxform.htm
You can use this to jot down and work out your ideas.

  1. Write down your requirements and preferences in a conceptual way, like this:

 

CONCEPTUAL | REASON | REQ/PREF
Among educated people | It's important to have a community that is intellectually stimulating | REQ
In a racial/ethnic mix | I enjoy exposure to other cultures | PREF
In a temperate climate | I don't like extreme heat or cold; I like mild weather | REQ
Near family/friends | The easier it is to visit my family and friends, the better | PREF
Near an ocean or large lake | I find large bodies of water calming | REQ
In a middle-income area | I'm middle income and I'd like to be like others like me | PREF
In a place with low cost of living | Affordability | REQ
Where housing is inexpensive | I'd like to buy my own home | REQ
Where the crime rate is low | I'd like to raise a family and need security | REQ
Average population density | I don't like congestion but don't want to feel isolated | PREF

 
Requirements are the kinds of things you select by condition, by location, or by combining selections.  Each place will either qualify (YES) or not qualify (NO).  Applying them will result in a selected set.  Here is an example of a requirement: "I must be at least 100 miles from a major urban center" (anywhere beyond that is fine; anything less is unacceptable).  Other requirements: a temperate climate zone, in Illinois, in Garreau's Breadbasket, a town between 5,000 and 30,000 people, or a county with a moderate population density.

Preferences are not a simple matter of YES/NO; rather they are BETTER/WORSE.  The preferences will be applied to those places which remain after all the requirements have been applied.  Preferences will be used to index the qualifying places.  Here are some examples of preferences: "the warmer the winter, the better" (refer to heating degree days), "the closer to Chicago, the better" (refer to a distance from Chicago variable), "the better educated, the better" (refer to percent over 25 with a college degree).

A variable which you use as a requirement can also be used as a preference.  For example, "I must live within 100 miles of an ocean, but within that zone, the closer to the ocean, the better."  You'd use a condition "within 100 miles of ocean" to eliminate all nonqualifying places, and then a weighted formula (see below) to factor in distance-to-ocean for each qualifying place.
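The two-stage idea can be sketched in Python, outside the GIS.  This is only an illustration: the place names and distances are made up, and only the 100-mile threshold comes from the example above.

```python
# Hypothetical places with made-up distance-to-ocean values, in miles.
places = [
    {"name": "Place A", "miles_to_ocean": 12.0},
    {"name": "Place B", "miles_to_ocean": 250.0},
    {"name": "Place C", "miles_to_ocean": 85.0},
]

MAX_MILES = 100  # the requirement threshold from the example

# Requirement: a YES/NO test that removes nonqualifying places.
qualifying = [p for p in places if p["miles_to_ocean"] <= MAX_MILES]

# Preference: among the qualifiers, closer is better, so sort ascending.
ranked = sorted(qualifying, key=lambda p: p["miles_to_ocean"])

for p in ranked:
    print(p["name"], p["miles_to_ocean"])
```

Place B never reaches the ranking step, because the requirement removed it first.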

Come up with a long list of things you like or do not like.
 

  1. Indicate whether they are a REQUIREMENT (must have/not have) or a PREFERENCE (the more/less the better).
It's possible to have the same variable as a requirement and a preference.  For example, I MUST live within 60 miles of an ocean, but within that zone, the closer to the ocean the better.
  1. Look carefully for highly correlated requirements and highly correlated preferences.  For example, in the list above housing cost and cost of living are likely to be closely related.  Choose one.
You should end up with at least 3 requirements and at least 3 preferences, and at least one more (either requirement or preference).
  1. Choose a geographical layer for your results. Unless you clear it with me, choose place.  Other options include city, county, MSA, or census tract. This is important, as it will affect your entire analysis. If you choose place, your end result will be "best place."  If you choose MSA you'll find the best metropolitan area, etc.
  1. Operationalize these variables.  Look at the data available to you, and think about what else you have access to.  Then translate the conceptual to the operational, as I have done here...

 

CONCEPTUAL | OPERATIONAL | SOURCE | LAYER
Among educated people | Greater than 50% of adults have some college | Census data | Place
In a racial/ethnic mix | Absolute Value of 60-<percent white> | Census data | Place
In a temperate climate | Any "C" Koppen Climate | G&ES 391 | Climate layer
Near family/friends | Distance from county centroid to Chicago Centroid | PREF | County and CityPoint
Near an ocean or large lake | Within 50 miles of coast or Lake over X sq. miles in size | G&ES 391 | Coasts
In a middle-income area | Percentage of households between 50-75,000 annual | Census data | Place
In a place with low cost of living | Cost of Living Index (MSA only)* | G&ES 391 | MSA
Where housing is inexpensive | Average value home | Census data | Place
Where the crime rate is low | Low violent crime rate per 100,000 | G&ES 391 | MSA
Places with an average density | ABSValue(Ideal density - County population density) | Census data | County

*Any data available by MSA can be transferred to all places in that MSA.  This presents two problems: (1) places outside an MSA have missing information -- hence MSA data should be used only when a requirement is that it MUST BE IN AN MSA.  and (2) all places within a particular MSA will have the same score for that variable.  Hence, do not use more than one MSA variable, or you'll be ranking MSAs and not places.
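Both MSA problems show up in a small Python sketch.  The MSA names, place names, and index values below are hypothetical and exist only for illustration.

```python
# Hypothetical MSA-level cost of living values (made up for illustration).
msa_cost_index = {"MSA 1": 104.2, "MSA 2": 91.7}

# Hypothetical places; None means the place is outside any MSA.
places = [
    {"name": "Place A", "msa": "MSA 1"},
    {"name": "Place B", "msa": "MSA 1"},
    {"name": "Place C", "msa": "MSA 2"},
    {"name": "Place D", "msa": None},
]

for p in places:
    # dict.get returns None when the key is not in the dictionary,
    # so places outside an MSA end up with a missing value.
    p["cost_index"] = msa_cost_index.get(p["msa"])

# Problem 1: places outside an MSA have no value at all.
missing = [p["name"] for p in places if p["cost_index"] is None]
print("Missing:", missing)

# Problem 2: every place in the same MSA carries the same value.
print(places[0]["cost_index"] == places[1]["cost_index"])
```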

The requirements, which require a QUALIFY/DISQUALIFY result, are in regular type.  They require a specific characteristic or threshold to result in YES or NO for each place.  The preferences are not qualify/disqualify; instead, each place receives a specific value or score.  In making this list, refer to the list of layers already prepared (below) and think about what you might be able to contribute.
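Two of the operational definitions in the table above score a place by how far it falls from an ideal value.  A minimal Python sketch of that arithmetic follows; the function names and the sample inputs are assumptions made for illustration, and only the 60 percent target comes from the table.

```python
def diversity_gap(percent_white, target=60.0):
    """Absolute difference from the 60 percent target used in the table."""
    return abs(target - percent_white)


def density_gap(county_density, ideal_density):
    """Absolute difference between a county's density and your ideal density."""
    return abs(ideal_density - county_density)


# Made-up inputs: a place that is 85 percent white, and a county with
# 1,200 people per square mile when the ideal is 500.
print(diversity_gap(85.0))
print(density_gap(1200.0, 500.0))
```

For both measures a smaller value is better, so they would enter the index as "the higher the worse" variables.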
 
 
The following layers are ready for your use.  In some cases, they are incomplete (where noted).
CARTOGRAPHIC LAYERS -- probably most useful for display 
  • Latitude and Longitude -- maybe most useful for cartographic display rather than analysis
    • 5 degree (filename 5deggrid)
    • 10 degree (filename 10dgrid)
    • 30 degree (filename 30dgrid)
    • important latitude (filename imp_lat)
  • the lower 48 states (filename 48states)
    • full 1990 census data attached to each state
  • Country -- adjacent countries, maybe most useful for cartographic display rather than analysis (filename country)
NATURAL ENVIRONMENT
  • Climate Regions -- using a modified Koppen climate system
    • for a summary of the climate types try here
  • Natural vegetation (filename: veget)
    • Nine categories, from Goode's World Atlas
      • deciduous forest
      • coniferous forest
      • mixed forest (decid/conif)
      • tropical forest
      • desert shrub
      • low grass savanna
      • grass
      • mediterranean vegetation
      • xerophytic open forest
  • Coasts -- this includes ocean and major lakes, a line layer
    • duplicate this with points to get distance from ...
  • Lakes -- a huge but obviously incomplete area layer of inland lakes (filename lakes)
  • Rivers -- high resolution.  Many many U.S. Rivers (filename riverh)
  • Rivers -- low resolution .  Only the biggest U.S. Rivers (filename riverl)
  • Moisture regions -- like arid, semiarid, humid, etc.  This one has a few bad areas in 2001 (filename moistur)
  • Seismic vulnerability -- a measure of expected seismic activity in the next 50 years (filename seismic)
  • Elevation -- contours -- but there is an obvious flaw in central states (filename elev_con)
  • Landforms (filename: landform) from Goode's World Atlas
      • depression or basins
      • hills or low tablelands
      • plains
      • mountains
      • high tablelands
      • widely spaced mountains
  • Percent possible sunshine Dec-Feb (filename: sun_d-f)
      •    attribute data are in percentages
  • Percent possible sunshine June - August (filename: sun_j-a)
      • attribute data are in percentages
  • Average temperature in January (filename temp_jan)
  • Thermal efficiency -- total potential evapotranspiration in inches, for average year.  Useful for agriculture (filename therm_e)
  • Rainfall in inches per year (filename rainfall) with low, midrange, and high for all area ranges
  • Forest (filename: forest)  from Goode's World Atlas
      • Rocky Mountain Forests
      • Southern Forests
      • Northern Forests
      • Tropical Forests
      • Pacific Forests
      • Hardwood Forests
      • Treeless Region
  • Coal deposits -- natural deposits of coal, with type of coal noted
HUMAN ENVIRONMENT
  • Joel Garreau's NINE NATIONS OF NORTH AMERICA (filename: gerreau)
    • functional U.S. regions -- see the book (in the lab) for info about nations
  • Cancer rate, all cancers, for 40-year old white men age specific, per 100,000 pop annually (filename cancrm40)
  • Heating Degree Days -- a measure of energy needed to heat through winter (filename Hddays)
  • Dominion Energy Index -- a measure of temperateness of climate, similar to heating and cooling degree days (filename Dom_idx)
  • MSAindex -- metropolitan statistical areas (city proper and suburbs) -- useful if you like to be in or near big cities. (filename MSA)
    • with all census data attached to each
    • Also on this layer are indexed scores taken from the Places Rated Almanac -- each score combines multiple measures of one conceptual item.  Read the file MSAindex.txt for more information:
      • crime rate index
      • cost of living index
      • diversity index
      • health care index
      • recreation index
  • Economic Activity -- Major economic activity (filename: econ_act)
    • Forestry
    • Stock raising
    • Agriculture
    • Manufacturing
    • Little or no economic activity
  • Overdraft of water -- areas where the local water supply is overtaxed by local uses (filename overdrft)
  • Religion -- predominant religions in major U.S. regions
    • catholic
    • mormon
    • protestant mix
    • methodist
  • Ozone -- areas with regular ozone air pollution problems
  • Water pollution points (filename w_pol_p) from Goode's World Atlas
    • about 8 categories of water pollution in the attribute table
  • Water Pollution areas (filename: w_pol_a) from Goode's World Atlas
    • All areas are places with water pollution
  • Utility lines -- yep, power transmission lines and underground pipelines (filename utility)
  • EPA offices, geocoded -- just as it sounds.  Want to work for the EPA?  Use this one
    • Or use this good idea to map your own list of headquarters
  • Parks -- 12,154 parks (local, state, national), a polygon file with the name of the park attached
NEW SPRING 2003* -- find SPRING 2003 disk in the cabinet
  • trails -- three major scenic trails (Continental Divide, Appalachian, and Pacific Crest)
  • waterresources -- Percent Recharge?  There appears to be something wrong with this data, look at Arizona
  • tornado risk -- low, moderate, high, very high, maximum
  • surface water -- 0 - 90%+?  This means something, don't yet know what.
  • solar radiation received -- in kilowatt hours/m2/day?
  • physiography -- mountain systems, highlands, plains, etc. 
  • lyme disease -- low, moderate, high
  • corrosion? -- severe, heavy, moderate, mild
  • btu (British Thermal Unit) zones -- 6, 8, 10?
  • air conditioning zone -- 1, 2, 3, 4, 5?
*If yours isn't here, or if I don't have the metadata txt file explaining what the information is, where it came from, date, etc. please see me.

And from Maptitude geographic CD for the U.S. (convert to Shape with Maptitude if it hasn't been done) 

      • County and State Boundaries, with STF1 and STF3 data
        • Use ccstatel and cccntyl for this analysis -- they're quicker than the high-resolution ccstateh and cccntyh
      • Census Tracts (cctract) with all 1990 census data
      • Places (cplace) -- these are actual boundary files, with data, for anything from villages upward
      • Metropolitan Statistical Areas (cmsa). There are just over 200 in the US -- the big cities and their suburbs
      • Cities (points) ccuscity -- State Capitals or cities more than 30,000 pop.  Note that state and national capitals come as already-selected sets.  No data are attached.
      • U.S. Populated Place (ccppl) -- similar to places, but point geography with almost no data attached
        • some include elevation
      • Landmarks (cclmarea) Landmarks, as areas
        • military installation, campground, jail or detention center, federal pen/state prison/prison farm, amusement center, national park/forest, other federal land, state or local park or forest
      • Landmarks (cclndmrk) landmarks, as points
        • airport, building, bridge, cemetery, cave, church, crater, dam, glacier, hospital, island, locale, military, mine, park, school, tower
      • Roads and highways
        • Interstate highways (ccishwy)
        • DOT files (CCHIGHWY)
      • Zip code files -- though these are not created by the census bureau
      • Water features
        • lakes,inlets, rivers, etc.
And from the ArcGIS data CD 
      • Volcanoes (point layer)
        • age of eruption
        • explosiveness of eruption
        • number of eruptions
        • type of volcano, shape and size
      • DMA Arbitron's Rating Company Areas of Dominant Influence (polygon, usually combinations of counties)
        • Dominant viewing areas of commercial TV broadcast
      • Airports (point)
        • attribute table includes total enplanement in 1994
      • U.S. 106th Congressional Districts (area --ID but no data)
        • Rep name and party affiliation
      • Federal Lands (area)
        • Type of Federal Land (national park, wilderness, national forest, army corps, public domain BBL, etc.)
      • Federal (line)
        • parkways, wild and scenic rivers, etc.
      • Rivers -- dense river network -- no attribute data
      • Major Lakes or Lakes (appear to be the same layer)
        • Name and area of lake (square miles?)
      • Major Roads (generalized)
        • Type and class of road, toll Y/N
      • Interstate -- Major roads with more detail on interchanges
        • Route number and length
      • Majordnet (roads) -- rural interstates, expressways, freeways, interstates and arteries
        • Name, toll? Median? Functional class, length
      • Drainage
        • Seven major drainage systems in U.S., names only
      • Hydroln -- linear water features
        • Appears to be a good map of coasts, nothing more
      • Urban -- boundaries of urban areas with population more than 50,000
        • Really appears to be contiguous urbanized areas, with name the only attribute
      • Parks -- National parks, national forests, state and local parks and forests
        • Name and state/national code are only attributes
      • Topoq24, 100, 250 topographic quad map names
        • use this to find the correct topo map
      • Cities and Towns
        • range graded and symbolized by size
      • Quake history
        • Database showing Richter, day/date/time, deaths, property damage
See the back of the Maptitude manual for attribute data available in many of the point and area layers.



ASSIGNMENT PART 1
For next week plan your project.  Use the worksheet to sort out your thoughts and then edit, rename, post and link THIS PAGE, which I will grade.
Here are some questions you should consider.
  1. Rate your preferences.

 

PREFERENCES | RATING
Close to parents' home | 5
Near Coast | 8
Higher income area | 6
These "weights" will be applied later.
 
  1. In ArcMap, pull all the useful layers into one session. The layer you will be selecting and rating will be places.
  1. Prepare your fields
    In the dataview for the place layer you will need to find or create the fields you identified as "operational" in step 2 above -- at this stage focus on the "requirement" variables.  Some you may find ready in shape files or ready to convert to shape files on the Maptitude Geographic Data CD (e.g., Per Capita Income '89, or Median Value Home).  Others you will have to create with a formula field.  For example, if John Doe's conceptual criterion was "lots of single women, because I'd like to get dates" then he would need to create a "percent eligible females" variable by creating this formula:
    (F25-34 Never Married/(F25-34 Never Married +  F25-34 Ever Married))*100

    This will give him "percent of females aged 25-34 who have never been married".  Or you might prefer

    Female Never Married / Male Never Married

    This will give him an eligible female / eligible male ratio, which incorporates male "competition." For another example, if you are concerned about population density, you might try either

    Population / AREA

    for a straightforward measure of per square mile density..., or get a little creative with

    (([HU RentOcc 5-9 HU] + [HU RentOcc 10-19 HU] + [HU RentOcc 20-49 HU] + [HU RentOcc 50+HU])/[HU Renter Occupied]) *100

    This will give you percent of renters living in large apartment buildings -- another measure of density.

    In any case, use calculate in a new field in the attribute table to do this kind of preparation.
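    The same field arithmetic can be sketched in Python to check a formula by hand before typing it into the field calculator.  The function and argument names here are hypothetical stand-ins for the census fields named above, and the sample counts are made up.

```python
def percent(part, whole):
    """Return part/whole as a percentage, or None when whole is zero."""
    if whole == 0:
        return None  # avoid dividing by zero for empty places
    return 100 * part / whole


def pct_never_married(never_married, ever_married):
    """Percent of females aged 25-34 who have never been married."""
    return percent(never_married, never_married + ever_married)


def pct_renters_in_large_buildings(units_5_9, units_10_19, units_20_49,
                                   units_50_plus, renter_occupied):
    """Percent of renter-occupied units in buildings with 5 or more units."""
    large = units_5_9 + units_10_19 + units_20_49 + units_50_plus
    return percent(large, renter_occupied)


# Made-up counts for one place.
print(pct_never_married(300, 700))
print(pct_renters_in_large_buildings(50, 30, 15, 5, 400))
```

    The zero check matters in practice: a place with no renters would otherwise produce a division error or a meaningless value in the new field.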
     

  1. Apply your requirements

    Apply your requirements first, beginning with the simplest and most restrictive ones.  These will remove a lot of places from your qualifying list, and make the preference ranking easier later.  As you decide which requirement to apply first, keep in mind the work each one requires of the GIS.  The easiest thing for the GIS to do is select areas that are inside a polygon or set of polygons on another layer.  Do you have any requirements like that?

    For example, if you require a place in a particular climate, first  select the climate polygon(s) you are interested in -- you might even export them to a new layer or virtual layer (e.g., "Humid Subtropical").  Then select places on the basis of  location, referring to the desired climate polygon(s).

    Continue to apply your requirements, each time exporting the layer to a smaller and smaller qualifying set until you're left with a layer that only contains places which satisfy all the requirements.

    AN IMPORTANT SETTING TO KEEP IN MIND --  Under the selection setting remember to determine whether you want the intersecting places or just those that are entirely enclosed.  In other words, if part of a place is within your desired climate zone, and part isn't, does it qualify or not?
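    The difference between the two settings can be illustrated with simple rectangles in Python.  This is a simplified sketch, not how the GIS does it: real places and climate zones are irregular polygons, and the coordinates below are made up.

```python
# Each rectangle is (xmin, ymin, xmax, ymax).

def intersects(a, b):
    """True if rectangles a and b overlap at all (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def within(a, b):
    """True if rectangle a lies entirely inside rectangle b."""
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]


climate_zone = (0, 0, 10, 10)
place_inside = (2, 2, 4, 4)         # entirely in the zone
place_straddling = (8, 8, 12, 12)   # partly in, partly out
place_outside = (20, 20, 22, 22)    # nowhere near the zone

for name, place in [("inside", place_inside),
                    ("straddling", place_straddling),
                    ("outside", place_outside)]:
    print(name, intersects(place, climate_zone), within(place, climate_zone))
```

    The straddling place qualifies under "intersecting" but not under "entirely enclosed", which is exactly the decision the setting asks you to make.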
     

Learn from my mistake and do your analysis in the right order. I tried to select all census places (N=23,753) by location (specific distance from the nearest census city -- N=1,062). After watching 2.5 million calculations, I realized that I had asked my computer to compute the distances between all 23,753 places and all 1,062 cities: 25,225,686 calculations.  Better to apply the "climate" or "coastline" requirements first!
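The arithmetic behind that mistake is easy to check in Python.  The two counts come from the paragraph above; the figure of 2,000 remaining places is a hypothetical example of what an earlier requirement might leave.

```python
n_places = 23_753  # all census places
n_cities = 1_062   # all census cities

# Every place compared against every city.
all_pairs = n_places * n_cities
print(f"{all_pairs:,}")

# Hypothetical: suppose a climate or coastline requirement, applied first,
# leaves only 2,000 qualifying places.
n_after_first_requirement = 2_000
reduced_pairs = n_after_first_requirement * n_cities
print(f"{reduced_pairs:,}")
```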
 
  1. Prepare your preference fields.
    Now that you've exported all places that you would even consider, you'll probably have no need to use selection tools any more.  Now, it's a matter of preparing your preference fields and applying the formula.

    Make your <qualifying places> layer current and, in the attribute table make any other necessary formula fields.  Now you can use either field calculation or join attribute data to create the necessary data for your preferences.

    Example: Making a variable for "how far off from a preferred amount of rainfall"
    Let's say you like 35 inches of rain -- more would begin to be too wet, less too dry.  You have a column called "inches" in the rainfall layer.  With calculate, bring the "inches" data into your places layer and then apply this formula:  ABS(inches-35).  The bigger the absolute value, the worse the rainfall.
     

  1. Compute your averages.

    Use the column summary to get the average of each preference field across all qualifying places.

  1. Calculate the index

    Now you will apply your preferences and the weights you assigned to them earlier. Use the Calculate function to apply this formula:

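    (((P1-AvgP1)/AvgP1)*W1)+(((P2-AvgP2)/AvgP2)*W2)-(((N1-AvgN1)/AvgN1)*WN1) ...

    where P1, P2, ... are preference fields for which higher is better, N1, ... are preference fields for which higher is worse, each Avg is the average of that field across all qualifying places, and each W is the weight you assigned to that preference.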

For example, say your average PCI was $25,300 and your average % inPoverty was 15%, and you had weighted these 6 and 3 on a scale of 10, respectively. then
((([Per Capita Income] - 25300)/25300)*6)-((([P Below Poverty] - 15)/15)*3)...

The [Per Capita Income] and the [P Below Poverty] are the two variables (the actual values are different for each case, so refer to the variable name), and the other values are fixed (they are the same for every case, so type in the value itself).  Income is added (no initial + sign needed), and Percent Below Poverty is subtracted -- because the higher the worse -- it's a bad thing.  When you add the other preferences in the same way (remember, at least three), you will have your index of favorite places -- the bigger the number, the better the place.  Sort by the index column and you'll see the best (or worst) among the qualifying places.
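The worked example can be sketched in Python to see how the index behaves.  The averages (25,300 and 15) and weights (6 and 3) are the ones from the example above; the three places and their values are made up, and the short field names are hypothetical stand-ins for [Per Capita Income] and [P Below Poverty].  In a real analysis the averages would be computed from your own qualifying places.

```python
# Averages and weights taken from the worked example above.
AVERAGES = {"pci": 25300, "poverty": 15}
WEIGHTS = {"pci": 6, "poverty": 3}
# +1 means higher is better (added); -1 means higher is worse (subtracted).
SIGNS = {"pci": +1, "poverty": -1}


def place_index(row):
    """Sum of weighted relative deviations from each field's average."""
    total = 0.0
    for field, avg in AVERAGES.items():
        total += SIGNS[field] * ((row[field] - avg) / avg) * WEIGHTS[field]
    return total


# Made-up qualifying places.
places = [
    {"name": "Place A", "pci": 25300, "poverty": 15},
    {"name": "Place B", "pci": 50600, "poverty": 7.5},
    {"name": "Place C", "pci": 12650, "poverty": 30},
]

# Bigger index is better, so sort in descending order.
ranked = sorted(places, key=place_index, reverse=True)
for p in ranked:
    print(p["name"], place_index(p))
```

A place that sits exactly at the average on every preference scores zero; places above it score positive and places below it score negative.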
 

  1. In your assignment use plenty of maps and a thorough explanation.  The final result may be a table *and* a map or two.  But you'll probably want to use many maps showing the process step by step.
When you are displaying your places, you might choose a choropleth map, though a simple proportional circle would do well too.  It works well to label your qualifying places -- at least the top ones -- with the actual index score.  If you have many qualifying places, you might select only the top 10 or 50 or whatever works.  Remember that every selection set that is set to Highlight will be shown, even when you use the "show selection only" feature, so you must either delete all "working" selection sets, or at least make them "not highlighted."  Only then will your map show the final qualifiers only.

Note: One weakness of this method is that some values vary from the mean more than others.  For example, percent possible sunshine may average 50 but run anywhere from 5 to 95; whereas the male-female ratio may average 100 (100 men per 100 women) and may vary from 95 to 105.  If you use these two with the formula above, sunshine will be automatically "weighted" much more highly than the M:F ratio.  You should be using "standardized" values instead.  Interested in addressing these issues?  See me about an independent study.
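One common way to standardize is the z-score: subtract the mean and divide by the standard deviation.  The Python sketch below is an illustration of that general technique, not part of the assignment's formula.  The three-value samples are made up to match the ranges mentioned in the note, and the choice of population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def percent_deviation(values):
    """Relative deviation from the mean, as used in the index formula."""
    m = mean(values)
    return [(v - m) / m for v in values]


def z_scores(values):
    """Deviation from the mean in units of standard deviation.

    Uses the population standard deviation; this fails if every value
    is identical, because the standard deviation is then zero.
    """
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]


# Made-up samples spanning the ranges described in the note.
sunshine = [5, 50, 95]
mf_ratio = [95, 100, 105]

# With the index formula, sunshine swings far more than the M:F ratio.
print(percent_deviation(sunshine))
print(percent_deviation(mf_ratio))

# With z-scores, the two variables end up on the same scale.
print(z_scores(sunshine))
print(z_scores(mf_ratio))
```

With standardized values, the weights you assign are the only thing that makes one preference count more than another.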

© 1997-2004 Erick Howenstine
  Here's an excellent example from 2003.