First time plotting coordinates on a map using Python

Pankaj Chejara, PhD
3 min read · Mar 5, 2023


This tutorial shows how to plot geospatial data in Python with the GeoPandas library. For this exercise, I use the Boston bike-sharing (Hubway) dataset.

I recently started learning to use mapping libraries with Python, and this tutorial reflects what I have learned so far. I hope it helps others who are learning this for the first time. If you have suggestions or feedback, please leave a comment; it would help me improve.

Loading the dataset

We start by loading our dataset. There are two CSV files, `hubway_trips.csv` and `hubway_stations.csv`. The first file contains information on all bike trips (e.g., bike number, start and end time of the trip, start and end station of the trip, user information, etc.). The second file, `hubway_stations.csv`, contains the geospatial data for the bike stations (e.g., longitude, latitude, municipality).

import pandas as pd

# Bike trip data
trips = pd.read_csv('hubway_trips.csv')
trips.head()
# Bike station data (the separator in this file is ';')
stations = pd.read_csv('hubway_stations.csv', sep=';')
stations.head()

Joining the dataset

We first join the two tables so that a single data frame holds the latitude and longitude of each trip's start station. To do that, we join on `strt_statn` in `trips` and `id` in `stations`.

# Exclude the 'status' column: it exists in both files, and the overlapping
# column name would raise an error in the join.
stations_non_status = stations[['id','lat','lng','station','municipal']]

# Combined data with the geospatial information of the start station
trips_stations = trips.join(stations_non_status.set_index('id'),on='strt_statn')

# Keep only the trips that start in Boston
trips_stations = trips_stations.loc[trips_stations['municipal'] == 'Boston',:]

trips_stations.head()
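
To see what the join does, here is a minimal sketch on made-up data. The toy rows and values are hypothetical; only the column names (`strt_statn`, `id`, `status`, `lat`, `lng`, `municipal`) follow the tutorial. It also reproduces the error that the overlapping `status` column would cause.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the Hubway files.
toy_trips = pd.DataFrame({
    'strt_statn': [1, 2, 1],
    'status': ['Closed', 'Closed', 'Closed'],
})
toy_stations = pd.DataFrame({
    'id': [1, 2],
    'lat': [42.35, 42.37],
    'lng': [-71.06, -71.10],
    'municipal': ['Boston', 'Cambridge'],
    'status': ['Existing', 'Existing'],
})

# Joining while both tables carry a 'status' column raises a ValueError.
try:
    toy_trips.join(toy_stations.set_index('id'), on='strt_statn')
    overlap_error = False
except ValueError:
    overlap_error = True

# Dropping 'status' from the station table lets the join succeed.
toy_joined = toy_trips.join(
    toy_stations.drop(columns='status').set_index('id'),
    on='strt_statn',
)

# Keep only the trips that start in Boston.
toy_boston = toy_joined.loc[toy_joined['municipal'] == 'Boston']
print(overlap_error)
print(toy_boston)
```

Another option would be to pass `lsuffix` or `rsuffix` to `join`, which keeps both `status` columns under different names instead of dropping one.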

Working with GeoPandas library

Next, we prepare our dataset so that it contains the geospatial data that GeoPandas requires.

GeoPandas is a Python library that extends pandas with geospatial data processing and plotting functionality.

First, install GeoPandas if you don’t have it on your computer.

# Installing GeoPandas (run in a notebook cell)
! pip3 install geopandas
import geopandas as gpd
from shapely.geometry import Point

# One Point per record; shapely expects (x, y), i.e. (longitude, latitude)
geo_column = [Point(lng,lat) for lng, lat in zip(trips_stations['lng'],trips_stations['lat'])]

The code above creates a list with one Point object for each record in our joined data. These Point objects give GeoPandas the geospatial information it needs to work with.
Now we create our GeoPandas data frame, which holds our data together with the geospatial information.

# EPSG:4326 = WGS 84 (longitude/latitude in degrees)
crs = 'EPSG:4326'
gdf = gpd.GeoDataFrame(trips_stations,crs=crs,geometry=geo_column)

We now have a GeoPandas data frame that contains all the records from the `trips_stations` data frame, each associated with a geometric point (i.e., its geospatial data).

Here, we have also specified the coordinate reference system (CRS). EPSG:4326 is the WGS 84 system, which expresses locations as longitude and latitude in degrees, matching the `lng` and `lat` columns in our data.

Each GeoPandas data frame needs a geometry column that holds the geospatial data.
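
As an alternative to building the list of Point objects by hand, GeoPandas provides `points_from_xy`, which creates the point geometries directly from two coordinate columns. The sketch below uses hypothetical toy records (only the column names follow the tutorial) and then inspects the geometry column and the CRS.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical toy records; the column names follow the tutorial's data.
toy = pd.DataFrame({
    'strt_statn': [1, 2],
    'lat': [42.35, 42.37],
    'lng': [-71.06, -71.10],
})

# points_from_xy takes x (longitude) first, then y (latitude).
toy_gdf = gpd.GeoDataFrame(
    toy,
    geometry=gpd.points_from_xy(toy['lng'], toy['lat']),
    crs='EPSG:4326',
)

print(toy_gdf.geometry.name)     # name of the active geometry column
print(toy_gdf.crs)               # CRS attached to the geometry column
print(toy_gdf.geometry.iloc[0])  # first point
```

For a large table with one row per trip, this vectorized call is likely to be faster than a Python list comprehension, though I have not measured it on this dataset.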

Plot geospatial data

gdf.plot()

The plot above shows only the points, not Boston’s map. To draw the map, we need map files for Boston. You can download them from here. After loading the files, we convert them to the same CRS that we used earlier.

boston = gpd.read_file('./City_of_Boston_Boundary/')
# Reproject the boundary to the CRS used for the trip points
boston = boston.to_crs('EPSG:4326')
boston.plot()
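
Before overlaying two layers, it can help to confirm that they share a CRS. The sketch below does not use the Boston files; it builds two hypothetical toy layers, a square stored in Web Mercator (EPSG:3857) and a single point in EPSG:4326, and reprojects the square to match the point.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical layers: a square "boundary" in Web Mercator (metres)
# and one point in WGS 84 (degrees).
toy_boundary = gpd.GeoDataFrame(
    geometry=[Polygon([(-7912000, 5213000), (-7908000, 5213000),
                       (-7908000, 5217000), (-7912000, 5217000)])],
    crs='EPSG:3857',
)
toy_points = gpd.GeoDataFrame(geometry=[Point(-71.06, 42.35)], crs='EPSG:4326')

# The two layers start out in different coordinate systems.
same_before = toy_boundary.crs == toy_points.crs

# Reproject the boundary into the CRS of the points.
toy_boundary = toy_boundary.to_crs(toy_points.crs)
same_after = toy_boundary.crs == toy_points.crs

print(same_before, same_after)
print(toy_boundary.total_bounds)  # [minx, miny, maxx, maxy], now in degrees
```

If the two layers were plotted without this step, the points and the boundary would be drawn in different units and would not line up on the same axes.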

Plotting geospatial data over Boston’s map

Now we plot the start stations of the bike trips on the map of Boston.

import matplotlib.pyplot as plt

# Draw the Boston boundary first, then the trip start points on the same axes
fig, ax = plt.subplots(figsize=(10,8))
boston.plot(ax=ax)
gdf.plot(ax=ax,color='red',marker='+',alpha=.5)
plt.show()
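
Because `gdf` holds one point per trip, many markers land on exactly the same station location. One possible refinement, which is not part of the walkthrough above, is to count the trips per start station first and then plot each station once. A minimal sketch on hypothetical toy rows (the name `n_trips` is my own):

```python
import pandas as pd

# Hypothetical toy rows: three trips from station 1, one from station 2.
toy_trips_stations = pd.DataFrame({
    'strt_statn': [1, 1, 2, 1],
    'lat': [42.35, 42.35, 42.36, 42.35],
    'lng': [-71.06, -71.06, -71.07, -71.06],
})

# One row per start station, with the number of trips that began there.
station_counts = (
    toy_trips_stations
    .groupby(['strt_statn', 'lat', 'lng'])
    .size()
    .reset_index(name='n_trips')
)
print(station_counts)
```

The resulting table could then be turned into a GeoDataFrame in the same way as before and plotted with the `markersize` argument scaled by `n_trips`.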

Pankaj Chejara, PhD

Data Scientist@Metrosert AS, Machine Learning, Applied Data Science, Bio-Informatics Python — Tallinn, Estonia