Assignment 9: Visualization in Base R, Lattice and ggplot2
For this assignment we were tasked with selecting a dataset from the provided list. I chose the airquality dataset, which I have used in other assignments and felt was a good fit here as well. I began by exploring the dataset and realized that if I wanted a time series plot for the lattice graph, I needed a full date, but the data only stores the month and day. Running "?airquality" showed me the year, so I reformatted the data to add it in. The entirety of the code is available on my GitHub here, and the code for all 3 graphs is shown here:
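As a rough illustration of the date step described above, here is a minimal sketch. The column name Date and the variable name air are my own choices for this example and may differ from the code on GitHub. The year 1973 comes from the "?airquality" help page.

```r
# Sketch: build a full Date column from the Month and Day columns.
# Assumption: the year is 1973, as documented in ?airquality.
# "air" and "Date" are illustrative names, not necessarily the originals.
air <- airquality
air$Date <- as.Date(sprintf("1973-%02d-%02d", air$Month, air$Day))

# Quick check of the span the data covers
range(air$Date)
```

With a real Date column, each plotting system can place the observations on a proper time axis instead of a plain row index.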
The first graph, made with base R, was simple and is shown here:
This graph shows the ozone level during the summer of 1973. An issue I found when creating this first graph was that the dataset had a large number of gaps. I dealt with this by removing the rows with NA values and storing the result in a new variable, airClean. I chose a line graph with the date on the x axis and the ozone level on the y axis, labelled both axes, and added one overarching title explaining the goal of the graph, which I believe makes the plot clear.
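A minimal sketch of the cleaning and the base R plot could look like the following. The name airClean is from my write-up, while the Date column, the title text, and the axis labels here are placeholders for illustration. I am assuming na.omit for the cleaning step, which drops a row if any column is missing, not just Ozone.

```r
# Sketch: remove rows with missing values, then draw a base R line graph.
air <- airquality
air$Date <- as.Date(sprintf("1973-%02d-%02d", air$Month, air$Day))

# na.omit drops every row that has an NA in any column
airClean <- na.omit(air)

# Base R references the data explicitly for each variable (airClean$...)
plot(airClean$Date, airClean$Ozone,
     type = "l",                               # "l" draws a line graph
     main = "Ozone Levels, Summer 1973",       # placeholder title
     xlab = "Date",
     ylab = "Ozone (ppb)")
```

One thing to keep in mind is that after the rows are removed, the line simply connects the remaining points, so the gaps are bridged rather than shown as breaks.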
The graph shows frequent spikes, with the highest in August and an especially active period during July and August. May and most of September see low ozone levels. It seems as though ozone levels spike during the warmer months and return to lower levels during the cooler ones.
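To back up a reading like this with numbers, a short monthly summary can help. This is only a sketch and was not part of the original assignment code. The names monthlyMean and monthlyMax are made up for the example.

```r
# Sketch: summarize ozone by month to check what the line graph suggests.
# na.rm = TRUE skips the missing ozone readings within each month.
monthlyMean <- tapply(airquality$Ozone, airquality$Month, mean, na.rm = TRUE)
monthlyMax  <- tapply(airquality$Ozone, airquality$Month, max,  na.rm = TRUE)

# One row per month (5 = May through 9 = September)
round(cbind(mean = monthlyMean, max = monthlyMax), 1)

# Month number containing the single highest reading
names(which.max(monthlyMax))
```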
The second graph chosen was a lattice graph shown here:
This graph is very similar to the previous one but is constructed with a different style of code. Unlike the first plot, I had to use the y ~ x formula, but I could declare the dataset once instead of referencing it on each line. This is really the only major difference, although lattice also generates the image all at once from a single call, which the base R graph does not.
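The formula style described above can be sketched as follows. This assumes the lattice package is installed. The Date column, the title, and the name p_lattice are illustrative and not necessarily what the GitHub code uses.

```r
# Sketch: the same line graph with lattice's formula interface.
library(lattice)

air <- airquality
air$Date <- as.Date(sprintf("1973-%02d-%02d", air$Month, air$Day))
airClean <- na.omit(air)

# y ~ x formula, with the dataset declared once through the data argument
p_lattice <- xyplot(Ozone ~ Date, data = airClean,
                    type = "l",
                    main = "Ozone Levels, Summer 1973",  # placeholder title
                    xlab = "Date",
                    ylab = "Ozone (ppb)")

# xyplot returns a trellis object; printing it draws the whole plot at once
print(p_lattice)
```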
The third graph was made with ggplot2 and is the final version of the graphs requested:
The final graph above is very similar to the previous two, but as you can see the line is thicker. This is the final graph needed for contrast. The differences with this one are how the dataset is referenced, the choice of geom_line to create a line graph rather than another geom, and the ability to choose a theme. ggplot2 uses a layered system to build plots, which makes it more structured and flexible than base R or lattice.
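A minimal sketch of the layered approach is below. It assumes ggplot2 version 3.4.0 or later, where the linewidth argument is available. The specific linewidth value, the theme_minimal choice, and the label text are assumptions for illustration, since the post does not state them.

```r
# Sketch: the same line graph built up in ggplot2 layers.
library(ggplot2)

air <- airquality
air$Date <- as.Date(sprintf("1973-%02d-%02d", air$Month, air$Day))
airClean <- na.omit(air)

p <- ggplot(airClean, aes(x = Date, y = Ozone)) +  # data and mappings declared once
  geom_line(linewidth = 1) +                       # thicker line (value is an assumption)
  labs(title = "Ozone Levels, Summer 1973",        # placeholder title
       x = "Date",
       y = "Ozone (ppb)") +
  theme_minimal()                                  # theme choice is an assumption

print(p)
```

Each piece after a plus sign is its own layer, so swapping the geom or the theme means changing one line without touching the rest.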
I would say that I appreciate ggplot2 the most because it gave me the most control. Everything is segmented, and the ability to manipulate the aesthetics of the graph so easily and in so many ways is worth noting. As I mentioned previously, I found gaps in the data, which I solved by omitting the NA values and storing the cleaned data in a new variable. Aside from that there were no major issues exploring this dataset.
This was another fascinating and enjoyable week. My semester is coming to a close, so the opportunity to do this is always appreciated. I continue to look forward to future tasks and lectures as I approach the end of my college career, and I am soaking in the sunlight while I can. All the graphs and code are available on my GitHub, and links are embedded throughout the post taking you exactly where they are.