NEON EDUCATION bio photo

NEON EDUCATION

Devoted to open data and open source in science and education.

Tags

R programming (11)
Time Series (10)
Metadata (EML) (1)
Raster Data (1)
Spatial Data & GIS (1)
Hierarchical Data Formats (HDF5) (1)

Tutorial by R Package

dplyr (2)
ggplot (0)
ggplot2 (0)
h5py (0)
lubridate (time series) (0)
maps (0)
maptools (0)
plyr (0)
raster (1)
rasterVis (raster time series) (0)
rgdal (GIS) (1)
rgeos (0)
rhdf5 (0)
sp (0)

View ALL Tutorials

View ALL Workshops




Twitter Youtube Github


Blog.Roll

R Bloggers

Overview

Several factors contributed to extreme flooding that occured in Boulder, Colorado in 2013. In this lesson, we will explore several key variables including:

  • precipipitation (rainfall) data collected by ???NDCD??
  • measures of drought prior to the flooding that explain …
  • increased stream discharge (or flow) that occured due to combination of the drought conditions and extreme rainfall.

###Goals / Objectives After completing this activity, you will:

###Things You’ll Need To Complete This Lesson Please be sure you have the most current version of R and, preferably, R studio to write your code. ####R Libraries to Install:

  • ggplot2: install.packages("ggplot2")
  • lubridate: install.packages("lubridate")

####Data to Download

#

The LiDAR and imagery data used to create the rasters in this dataset were collected over the Harvard and San Joaquin field sites and processed at NEON headquarters. The entire dataset can be accessed by request from the NEON website.

####Recommended Pre-Lesson Reading

Lesson Series

#R Libraries

We will be working with time-series data in this lesson so we will load the lubridate library. We will use ggplot2 to efficiently plot our data.

library(lubridate) #work with time series data
library(ggplot2)  #create efficient, professional plots
library(plotly) #create interactive plots

#Soil Moisture

** we could get this from NDCD as well Boulder station w Soil Moisture: https://www.drought.gov/drought/content/tools/crn-soil-data#

CRN Station ID: 1045 LAT:40.0354 LON:-105.54090000000001 Name: BOULDER 14 W View Data

#Drought Data - the Palmer Drought Index The Palmer Drought Severity Index is an overall index of drought that is useful to understand drought associated with agriculture. It uses temperature and precipitation data and soil moisture to calculate water supply and demand.

#About downloading the data… Get the data here: http://www7.ncdc.noaa.gov/CDO/CDODivisionalSelect.jsp Metrics:

Abbreviation Metric
PCP Precipitation Index
TAVG Temperature Index
TMIN Minimum Temperature Index
TMAX Maximum Temperature Index
PDSI Palmer Drought Severity Index
PHDI Palmer Hydrological Drought Index
ZNDX Palmer Z-Index
PMDI Modified Palmer Drought Severity Index
CDD Cooling Degree Days
HDD Heating Degree Days
SPnn Standard Precipitation Index

01 - Ar Drainage Baisin 02 - CO Drainage Basin 03 - KS Drainage Basin 04- Platte Drainage 05 Rio Grand Stuff about the index will go here

https://www.drought.gov/drought/content/products-current-drought-and-monitoring-drought-indicators/palmer-drought-severity-index

We are interested in looking at the general state of drought in Colorado before the 2013 floods. The data are in the drought directory of your download here: drought/CDODiv8721376888863_CO.txt. To begin, we will read the data into R.

Our data have a header - the first row repsents the variable name for each column. Thus we will use header=TRUE when we import the data to tell R to use that row as a column name rather than a row of data.

#Import State Wide index data
drought <- read.csv("drought/CDODiv8721376888863_CO.txt", header = TRUE)
head(drought)

##   StateCode Division YearMonth  PCP TAVG  PDSI  PHDI  ZNDX  PMDI CDD  HDD
## 1         5        0    199001 0.96 26.4 -3.42 -3.42 -1.69 -3.42   0 1144
## 2         5        0    199002 1.08 27.7 -3.65 -3.65 -1.74 -3.65   0 1038
## 3         5        0    199003 1.99 36.6 -3.65 -3.65 -1.12 -3.65   0  903
## 4         5        0    199004 2.08 45.1 -3.37 -3.37 -0.31 -3.37   0  609
## 5         5        0    199005 1.97 50.5 -3.02 -3.02  0.01 -2.96   0  446
## 6         5        0    199006 0.74 65.3 -3.59 -3.59 -2.64 -3.59  97   73
##    SP01  SP02  SP03  SP06  SP09  SP12  SP24 TMIN TMAX  X
## 1 -0.03 -0.59 -1.40 -1.13 -0.98 -1.34 -1.38 13.8 39.1 NA
## 2  0.06 -0.08 -0.55 -1.22 -0.78 -1.45 -1.26 15.5 39.9 NA
## 3  0.90  0.74  0.52 -0.89 -0.76 -1.07 -1.08 24.7 48.5 NA
## 4  0.51  0.80  0.73 -0.21 -0.51 -0.56 -0.95 31.8 58.3 NA
## 5  0.09  0.32  0.59  0.25 -0.42 -0.33 -1.02 36.6 64.5 NA
## 6 -1.26 -0.62 -0.26 -0.01 -0.76 -0.75 -1.31 48.9 81.6 NA

#view data structure
str(drought)

## 'data.frame':	311 obs. of  21 variables:
##  $ StateCode: int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Division : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ YearMonth: int  199001 199002 199003 199004 199005 199006 199007 199008 199009 199010 ...
##  $ PCP      : num  0.96 1.08 1.99 2.08 1.97 0.74 3.25 1.76 2.1 1.61 ...
##  $ TAVG     : num  26.4 27.7 36.6 45.1 50.5 65.3 66 64.8 61.2 46.7 ...
##  $ PDSI     : num  -3.42 -3.65 -3.65 -3.37 -3.02 -3.59 -2.54 -2.45 -1.83 -1.39 ...
##  $ PHDI     : num  -3.42 -3.65 -3.65 -3.37 -3.02 -3.59 -2.54 -2.45 -1.83 -1.39 ...
##  $ ZNDX     : num  -1.69 -1.74 -1.12 -0.31 0.01 -2.64 2.06 -0.53 1.11 0.75 ...
##  $ PMDI     : num  -3.42 -3.65 -3.65 -3.37 -2.96 -3.59 -1.67 -1.75 -0.67 0.04 ...
##  $ CDD      : int  0 0 0 0 0 97 74 62 35 0 ...
##  $ HDD      : int  1144 1038 903 609 446 73 45 55 141 563 ...
##  $ SP01     : num  -0.03 0.06 0.9 0.51 0.09 -1.26 1.84 -0.36 0.95 0.48 ...
##  $ SP02     : num  -0.59 -0.08 0.74 0.8 0.32 -0.62 0.43 0.95 0.48 0.87 ...
##  $ SP03     : num  -1.4 -0.55 0.52 0.73 0.59 -0.26 0.38 0.12 1.25 0.56 ...
##  $ SP06     : num  -1.13 -1.22 -0.89 -0.21 0.25 -0.01 0.63 0.46 0.56 0.53 ...
##  $ SP09     : num  -0.98 -0.78 -0.76 -0.51 -0.42 -0.76 0.05 0.21 0.68 0.77 ...
##  $ SP12     : num  -1.34 -1.45 -1.07 -0.56 -0.33 -0.75 -0.26 -0.31 -0.04 0.31 ...
##  $ SP24     : num  -1.38 -1.26 -1.08 -0.95 -1.02 -1.31 -0.85 -0.81 -0.77 -0.39 ...
##  $ TMIN     : num  13.8 15.5 24.7 31.8 36.6 48.9 52.2 50.1 47.1 32.1 ...
##  $ TMAX     : num  39.1 39.9 48.5 58.3 64.5 81.6 79.8 79.5 75.3 61.2 ...
##  $ X        : logi  NA NA NA NA NA NA ...

##Date Fields

Let’s have a look at the date field in our data. Viewing the structure, it appears as if it is not in a standard format. It imported into R as an integer.

$ YearMonth: int 199001 199002 199003 199004 199005 ...

We want to convert these numbers into a date field. We can use the as.Date function to convert our string of numbers into a date that R will recognize.

#view structure of date field
str(drought$YearMonth)

##  int [1:311] 199001 199002 199003 199004 199005 199006 199007 199008 199009 199010 ...

#convert to date
drought$YearMonth <- as.Date(drought$YearMonth, format="%Y%m%d")

## Error in as.Date.numeric(drought$YearMonth, format = "%Y%m%d"): 'origin' must be supplied

When we tried to convert our numeric string to a date, R returned an origin error. R needs a day of the month in order to properly convert this to a date class.

We can add this using the paste0 function. Let’s add 01 to each field so R thinks each date represents the first of the month.

#convert date field (YearMonth) to date class

#add a day of the month to each year-month combination
drought$YearMonth <- paste0(drought$YearMonth,"01")

#convert to date
drought$YearMonth <- as.Date(drought$YearMonth, format="%Y%m%d")

We’ve now successfully converted our YearMonth column into a date class. Next, let’s plot the data using ggplot.

#plot Palmer Drought Index
palmer.drought <- ggplot(data=drought,
       aes(YearMonth,PDSI)) +
       geom_bar(stat="identity",position = "identity") +
       xlab("Year / Month") + ylab("Palmer Drought Severity Index") +
       ggtitle("Palmer Drought Severity Index - Colorado\nNA values included")

palmer.drought

Great - we’ve successfully created a plot, but what is going on with the Y axis? It appears as if we have values that extend to -100. Let’s look at a quick summery of our data.

#viw summary stats of the Palmer Drought Severity Index
summary(drought$PDSI)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -99.9900  -2.1900  -0.0600  -0.7354   1.6300   5.0200

#view histogram of the data
hist(drought$PDSI,
     main="Histogram of PDSI value",
     col="wheat3")

It appears as if we have a negative value -99.99 that is throwing off our plot. This value is a no data value. We need to assign this value to NA, which is R’s representation of a null or no data value.

Then we can plot again!

#assign -99.99 to NA in the PDSI column
#note: you may want to do this across the entire data.frame or with other columns.
drought[drought$PDSI==-99.99,] <-  NA

#view the histogram again - does the range look reasonable?
hist(drought$PDSI,
     main="Histogram of PDSI value with -99.99 removed",
     col="springgreen4")

#plot Palmer Drought Index
ggplot(data=drought,
       aes(YearMonth,PDSI)) +
       geom_bar(stat="identity",position = "identity") +
       xlab("Year") + ylab("Palmer Drought Severity Index") +
       ggtitle("Palmer Drought Severity Index - Colorado\n1990 to 2015")

##Plot.ly - Free Online Interactive Plots

We can create an interactive version of our plot using plot.ly.

#plot Palmer Drought Index
ggplot(data=drought,
       aes(YearMonth,PDSI)) +
       geom_bar(stat="identity",position = "identity") +
       xlab("Year") + ylab("Palmer Drought Severity Index") +
       ggtitle("Palmer Drought Severity Index - Colorado\n1990 to 2015")

#view plotly plot in R
#note - plotly doesn't recognize \n to create a second title line
ggplotly()
#publish plotly plot to your plot.ly online account when you are happy with it
#plotly_POST(droughtPlot.plotly)

#NOTE: we will have to manually present the plot.ly plot here as an embed as R won’t knit it out.

##Subsetting our data

Once we have a column of data defined as a date, class in R, we can quickly subset the data to view portions of the data. We can subset the data directly in our ggplot command however that will not translate to plot.ly. So let’s create a new r object that is represents drought data from 2005-2015.

We will then plot that using ggplot and plot.ly.

To subset, we use the subset function, and specify:

  1. the R object that we wish to subset
  2. the date column and start date of the subset
  3. the date column and end date of the subset

Let’s subset our data.

#subset out date between 2005 and 2015 
drought2005.2015 <- subset(drought, 
                        YearMonth >= as.Date('2005-01-01', tz = "America/Denver") &
                        YearMonth <= as.Date('2015-11-01', tz = "America/Denver"))

#plot the data - September-October
droughtPlot.2005.2015 <- ggplot(data=drought2005.2015,
        aes(YearMonth,PDSI)) +
         geom_bar(stat="identity",position = "identity") +
       xlab("Year / Month") + ylab("Palmer Drought Severity Index") +
       ggtitle("Palmer Drought Severity Index - Colorado\n2005 - 2015")
     
droughtPlot.2005.2015 

#view plotly plot in R
ggplotly()
#publish plotly plot to your plot.ly online account when you are happy with it
#plotly_POST(droughtPlot.plotly)

#Need to remember where i got the data from. I have a feeling all could be accessed via the API!! but i can’t remember how i found PDSI.

This is my order url - but it’s expired. http://www1.ncdc.noaa.gov/pub/orders/CDODiv5787606888799.txt

drought <- read.

#Challenge Another dataset - import and plot


Get Lesson Code

(some browsers may require you to right click.)