
Introductory Geocoding

Photon and geopy

Juan Shishido, Andrew Chong
School of Information
GSR, D-Lab

Purpose

Obtain coordinates for: street addresses, intersections, place names, or zip codes

Enables

  • Mapping of addresses
  • Linking with other geospatial data
  • Spatial calculations, such as distances
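As an illustration of the distance calculations that geocoded coordinates enable, here is a minimal sketch of a great-circle (haversine) distance. The function name and the 6371 km mean Earth radius are assumptions chosen for this example; the two points are the Google and Photon results for the same address that appear later in these slides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# (latitude, longitude) pairs taken from the sample outputs in these slides
google_point = (37.422245, -122.0840084)
photon_point = (37.4228139, -122.0850862)
distance_km = haversine_km(*google_point, *photon_point)
print(round(distance_km, 3))
```

The haversine formula treats the Earth as a sphere, which is usually adequate for comparing geocoder results; an ellipsoidal method is preferable when high accuracy matters.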

Process

In general, these are the steps in a geocoding project

  • Identify needs
  • Choose geocoder
  • Preprocess data
  • Geocode
  • Verify output
  • Postprocess data

Considerations

Think about

  • Cost
  • Geographic scope
  • Output quality
  • Speed
  • Scale

Two primary options: local and remote

Local Options

In some cases, you may want or need to use a local geocoder

  • Data is confidential or restricted use
  • Too many addresses for remote

Software options: ArcGIS, Postgres/PostGIS

Remote Options

Many options

  • Google Maps Geocoding API
  • SmartyStreets
  • ArcGIS
  • Nominatim
  • DSTK
  • Photon
  • geopy (a Python client for several of these services)

Remote Options

Vary based on

  • Usage limits
  • Methods of use
  • Output quality
  • Coverage

Google Maps Geocoding API

Google Maps Geocoding API

Google Maps Geocoding API has high accuracy but has usage limits:

  • 2,500 free requests per day, 10 per second
  • $0.50/1000 requests, up to 100,000 daily
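To make the limits above concrete, here is a rough sketch of the arithmetic. It assumes that the free quota is deducted first and that billing is proportional to the number of requests rather than rounded up to blocks of 1,000; both are assumptions, and the function name is hypothetical.

```python
FREE_PER_DAY = 2500       # free requests per day, from the slide
PRICE_PER_1000 = 0.50     # dollars per 1,000 additional requests, from the slide
DAILY_CAP = 100000        # maximum requests per day, from the slide
MAX_PER_SECOND = 10       # request rate limit, from the slide


def estimate_daily_cost(n_requests):
    """Estimated cost in dollars for n_requests made in a single day."""
    if n_requests > DAILY_CAP:
        raise ValueError("more requests than the daily cap allows")
    billable = max(0, n_requests - FREE_PER_DAY)
    return billable / 1000 * PRICE_PER_1000


def minimum_seconds(n_requests):
    """Shortest possible run time at the stated rate limit."""
    return n_requests / MAX_PER_SECOND


print(estimate_daily_cost(10000), minimum_seconds(10000))
```

Under these assumptions, 10,000 addresses in one day would leave 7,500 billable requests and take at least 1,000 seconds.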

Street Address to Coordinates

/maps/api/geocode/json

Pass in the address as a parameter

https://maps.googleapis.com/maps/api/geocode/json?address=<_your address_>

For example

https://maps.googleapis.com/maps/api/geocode/json?address=1600 Amphitheatre Pkwy, Mountain View, CA
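Spaces and commas in an address need to be URL-encoded when the request is built in code rather than typed into a browser. The following is a minimal sketch using only the standard library; the function name is hypothetical, and the optional `key` parameter is included on the assumption that you have a Google API key to supply.

```python
from urllib.parse import urlencode

GOOGLE_GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_google_url(address, api_key=None):
    """Return a geocoding request URL with the address safely encoded."""
    params = {"address": address}
    if api_key:
        params["key"] = api_key  # only added when a key is supplied
    return GOOGLE_GEOCODE_URL + "?" + urlencode(params)


url = build_google_url("1600 Amphitheatre Pkwy, Mountain View, CA")
print(url)
```

The resulting URL can be passed to any HTTP client; no request is made by the sketch itself.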

Street Address to Coordinates

{ "results" : ...
"geometry" : {
            "location" : {
               "lat" : 37.422245,
               "lng" : -122.0840084
            },
            "location_type" : "ROOFTOP",
            "viewport" : {
               "northeast" : {
                  "lat" : 37.42359398029149,
                  "lng" : -122.0826594197085
               },
               "southwest" : {
                  "lat" : 37.4208960197085,
                  "lng" : -122.0853573802915
               }
            }
         }...
}
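Once the response has been parsed into a Python dict, the coordinates can be pulled out by key. This sketch assumes that `results` is a list with one entry per match and that each entry holds a `geometry` object shaped like the excerpt above; the function name is hypothetical and the sample dict is trimmed to the fields used.

```python
def extract_google_location(response):
    """Return (lat, lng, location_type) for the first result, or None if there are no results."""
    results = response.get("results", [])
    if not results:
        return None
    geometry = results[0]["geometry"]
    location = geometry["location"]
    return location["lat"], location["lng"], geometry.get("location_type")


# trimmed sample based on the excerpt above
sample_response = {
    "results": [
        {
            "geometry": {
                "location": {"lat": 37.422245, "lng": -122.0840084},
                "location_type": "ROOFTOP",
            }
        }
    ]
}
print(extract_google_location(sample_response))
```

Keeping `location_type` alongside the coordinates is useful later when verifying output, since it indicates how precise the match is.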

Evaluate

Try It

https://maps.googleapis.com/maps/api/geocode/json?address=<_your address_>

What happens when

  • State is omitted
  • Zip code is omitted
  • Commas are removed
  • Case is mixed
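One way to run the experiment above systematically is to generate the address variants in code and submit each one. This is a sketch only: the function names are hypothetical, and the zip code 94043 is supplied for illustration rather than taken from the slides.

```python
def alternate_case(text):
    """Alternate upper and lower case, character by character."""
    return "".join(ch.upper() if i % 2 == 0 else ch.lower() for i, ch in enumerate(text))


def address_variants(street, city, state, zip_code):
    """Build the full address plus the degraded versions listed above."""
    full = f"{street}, {city}, {state} {zip_code}"
    return {
        "full": full,
        "no_state": f"{street}, {city} {zip_code}",
        "no_zip": f"{street}, {city}, {state}",
        "no_commas": full.replace(",", ""),
        "mixed_case": alternate_case(full),
    }


variants = address_variants("1600 Amphitheatre Pkwy", "Mountain View", "CA", "94043")
for label, address in variants.items():
    print(label, "->", address)
```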

Photon

Photon

Photon is free and open source

  • Uses OpenStreetMap data
  • Worldwide coverage
  • Multilingual search
  • Typo tolerance
  • Fast & scalable

However, "extensive usage will be throttled"

Photon API

Search

photon.komoot.de/api/?q=berkeley

Limit number of results

photon.komoot.de/api/?q=berkeley&limit=1

Preferred language

photon.komoot.de/api/?q=berkeley&lang=fr
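The three query parameters above (`q`, `limit`, `lang`) can be combined in a single request. Below is a minimal sketch that builds such a URL; the function name is hypothetical, and the `https://` scheme is an assumption because the slides show the host without one.

```python
from urllib.parse import urlencode

PHOTON_URL = "https://photon.komoot.de/api/"  # scheme assumed; host and path from the slides


def build_photon_url(query, limit=None, lang=None):
    """Return a Photon search URL, adding limit and lang only when given."""
    params = {"q": query}
    if limit is not None:
        params["limit"] = limit
    if lang:
        params["lang"] = lang
    return PHOTON_URL + "?" + urlencode(params)


print(build_photon_url("berkeley"))
print(build_photon_url("berkeley", limit=1, lang="fr"))
```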

Photon API

Pass in the address as a parameter

photon.komoot.de/api/?q=1600 Amphitheatre Pkwy, Mountain View, CA

Photon API

{"features": [{
  "properties": {
    "osm_key":"office",
    "street":"Amphitheatre Parkway",
    "name":"Google Headquaters",
    "osm_id":2192620021,
    "osm_type":"N",
    "osm_value":"commercial",
  },
  "type":"Feature",
  "geometry": {
    "type":"Point",
    "coordinates":[-122.0850862,37.4228139]
  }
}],
"type":"FeatureCollection"}
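Photon returns GeoJSON, which stores coordinates as [longitude, latitude], the reverse of the (latitude, longitude) order used in most other places. A small sketch of extracting the first match in the more familiar order follows; the function name is hypothetical and the sample is trimmed from the output above.

```python
def photon_lat_lon(feature_collection):
    """Return (latitude, longitude) of the first feature, or None if there are no features."""
    features = feature_collection.get("features", [])
    if not features:
        return None
    lon, lat = features[0]["geometry"]["coordinates"]  # GeoJSON order is lon, lat
    return lat, lon


# trimmed sample based on the output above
photon_sample = {
    "features": [
        {
            "type": "Feature",
            "properties": {"street": "Amphitheatre Parkway"},
            "geometry": {"type": "Point", "coordinates": [-122.0850862, 37.4228139]},
        }
    ],
    "type": "FeatureCollection",
}
print(photon_lat_lon(photon_sample))
```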

Full output

Evaluate

Bonus: geopy

geopy

Geocoding with Python

Access to many geocoding services

  • OpenStreetMap Nominatim
  • ESRI ArcGIS
  • Google Geocoding API
  • Baidu Maps
  • Bing Maps

geopy on GitHub

geopy

Example from the docs

$ pip install geopy
>>> import geopy
>>> from geopy.geocoders import Nominatim
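>>> geolocator = Nominatim(user_agent="geocoding-tutorial")  # recent geopy releases require a user_agent; this value is a placeholder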
>>> location = geolocator.geocode("1600 Amphitheatre Pkwy, Mountain View, CA")
>>> print((location.latitude, location.longitude))
(37.4228139, -122.0850862)

You can also reverse geocode, calculate distances, and more

Check out the geopy documentation

JSON

JSON Output

JavaScript Object Notation

  • Format typically used to send data between a server and web app

Convert JSON to CSV

  • Write a script in Python, R, etc.
  • Use other modules, e.g., pandas
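As one possible version of the script option above, here is a minimal sketch that converts Photon output to CSV using only the standard library. The function name and the choice of columns are assumptions for illustration; the sample JSON is trimmed from the Photon output shown earlier.

```python
import csv
import io
import json


def features_to_csv(json_str, out):
    """Write one CSV row per GeoJSON feature to the file-like object out."""
    data = json.loads(json_str)
    writer = csv.writer(out)
    writer.writerow(["name", "street", "latitude", "longitude"])
    for feature in data.get("features", []):
        props = feature.get("properties", {})
        lon, lat = feature["geometry"]["coordinates"]  # GeoJSON order is lon, lat
        writer.writerow([props.get("name", ""), props.get("street", ""), lat, lon])


sample_json = """{"features": [{
  "properties": {"name": "Google Headquaters", "street": "Amphitheatre Parkway"},
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [-122.0850862, 37.4228139]}
}], "type": "FeatureCollection"}"""

buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
features_to_csv(sample_json, buffer)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```

The name is spelled as it appears in the source data; a conversion script should pass values through unchanged and leave corrections to a separate cleaning step.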

Manipulating the JSON object in Python

Go to the Photon API link and paste the output into the command below:


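import json

# paste the raw JSON text returned by the Photon API between the quotes
json_str = '[output from Photon API]'
python_obj = json.loads(json_str)

# the top-level object is a dict: navigate it by key
type(python_obj)
python_obj['features']

# 'features' is a list: navigate it by index
type(python_obj['features'])
python_obj['features'][0]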

Problems in Reference Data

Varies over services (remote & local)

  • Incorrect street ranges, inaccurate or low quality features
  • Inaccurate feature attributes

Year mismatch between geocoded addresses and reference data

  • Missing streets
  • Address changes

Verify Output

Ways to assess quality

  • Compare the input street name to the street_name returned by the geocoder
  • Count missing values
  • Test against sub-sample with known or high-quality coordinates

Because results are based on an underlying database or interpolation method, there will be variation in coordinate quality. In cases where the results are not good enough, consider using another service for those addresses.
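The first two checks above can be sketched as a small summary function. The record layout (keys `input_street`, `street_name`, `latitude`, `longitude`) is hypothetical, and the second and third records are made up purely to exercise the code.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count as mismatches."""
    return " ".join(str(text).lower().split())


def summarize_quality(records):
    """Count records with missing coordinates and records whose street names agree."""
    missing = sum(
        1 for r in records
        if r.get("latitude") is None or r.get("longitude") is None
    )
    street_matches = sum(
        1 for r in records
        if r.get("street_name") and normalize(r["input_street"]) == normalize(r["street_name"])
    )
    return {"total": len(records), "missing_coordinates": missing, "street_matches": street_matches}


records = [
    {"input_street": "Amphitheatre Parkway", "street_name": "amphitheatre  parkway",
     "latitude": 37.4228139, "longitude": -122.0850862},
    {"input_street": "Main St", "street_name": "Main Street",
     "latitude": 1.0, "longitude": 2.0},  # made-up record: abbreviation causes a mismatch
    {"input_street": "Unknown Rd", "street_name": None,
     "latitude": None, "longitude": None},  # made-up record: geocoder returned nothing
]
summary = summarize_quality(records)
print(summary)
```

Exact comparison after normalization is deliberately strict; the "St" versus "Street" case shows why expanding abbreviations during preprocessing may be worthwhile.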

Postprocess

To GeoJSON
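A minimal sketch of writing geocoded points as a GeoJSON FeatureCollection is shown here. The input record layout (`name`, `latitude`, `longitude`) and the function name are assumptions for illustration; note that GeoJSON expects coordinates in [longitude, latitude] order.

```python
import json


def to_geojson(points):
    """Convert a list of dicts with name, latitude, and longitude into a FeatureCollection."""
    features = []
    for p in points:
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [p["longitude"], p["latitude"]]},
            "properties": {"name": p["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


geocoded = [{"name": "Google Headquaters", "latitude": 37.4228139, "longitude": -122.0850862}]
collection = to_geojson(geocoded)
print(json.dumps(collection, indent=2))
```

The printed text can be saved to a `.geojson` file or pasted into a viewer such as geojson.io.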

Link to Census Blocks

http://data.fcc.gov/api/block/find?latitude=<_latitude_>&longitude=<_longitude_>

Block FIPS="060855046011175"

The first two characters (06) indicate the state (CA), the next three (085) indicate the county (Santa Clara), the next six (504601) indicate the census tract (5046.01), and the last four (1175) indicate the census block. The first digit of the block number (1) identifies the block group.
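The positional structure of a 15-character block FIPS code can be expressed as simple string slices. The function name below is hypothetical; the slice positions follow the breakdown described above.

```python
def parse_block_fips(fips):
    """Split a 15-digit census block FIPS code into its components."""
    if len(fips) != 15 or not fips.isdigit():
        raise ValueError("expected a 15-digit block FIPS code")
    return {
        "state": fips[:2],
        "county": fips[2:5],
        "tract": fips[5:11],
        "block_group": fips[11],   # first digit of the block number
        "block": fips[11:15],
    }


parts = parse_block_fips("060855046011175")
print(parts)
```

Keeping the code as a string rather than a number preserves the leading zero in the state component.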

Mapping

Several options

  • Leaflet
  • geojson.io
  • CartoDB
  • ArcGIS/QGIS
  • GeoCanvas
  • Python

Best Practices

Preprocess data

  • Formatting
  • Components
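As a sketch of what formatting and component handling can look like in practice, the function below assembles a single address string from separate fields and cleans up stray whitespace. The function name and field layout are assumptions; real data will usually need additional, dataset-specific cleaning.

```python
import re


def clean_address(street, city, state, zip_code=""):
    """Join address components into one string with consistent spacing and commas."""
    parts = [street, city, f"{state} {zip_code}"]
    cleaned = [re.sub(r"\s+", " ", str(part)).strip() for part in parts]
    return ", ".join(part for part in cleaned if part)


print(clean_address("  1600   Amphitheatre Pkwy ", "Mountain View", "CA"))
print(clean_address("1600 Amphitheatre Pkwy", "Mountain View", "CA", "94043"))
```

Zip codes are passed as strings here so that leading zeros are not lost, which can happen when a spreadsheet or CSV reader treats the column as numeric.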

Sample and test

Use multiple sources

Map results to verify

Tutorial

Clone the repo or download the zip file from:

https://github.com/dlab-geo/geocoding-geopy

Navigate to the directory and start an IPython notebook instance

$ ipython notebook

Let's get to work

We'll create a map of the 44 BART stations in the Bay Area

Thanks!
