TransWikia.com

How to Plot line graph in R with the following Data

Stack Overflow Asked on December 11, 2021

I want to draw a line graph of about 145 observations in R. The data look like this:

Date    Total Confirmed  Total Deceased
3-Mar   6                0
4-Mar   28               0
5-Mar   30               5
...
(141 more observations like this)

I'm new to ggplot2 in R, so I don't know how to make the graph. I tried plotting it, but the dates on the x-axis overlapped and were not readable. I want a line graph of the Total Confirmed column and the Total Deceased column together, with dates on the x-axis. Please also tell me how to colour the lines; I want a colourful graph. Thank you so much for taking the time to help.

The answers to similar questions give me a lot of errors, so I would like an answer for my specific requirements.

One Answer

There are a lot of resources to help you create what you are looking to do - and even quite a few questions already answered here. However, I understand it's tough starting out, so here's a quick example to get you started.

Sample Data:

df <- data.frame(
  dates=c('2020-01-01','2020-02-01','2020-03-03','2020-03-14','2020-04-01'),
  var1=c(13,15,18,29,40),
  var2=c(5,8,11,13,18)
)

If you are plotting by date on your x axis, you need to ensure that df$dates is formatted as a "Date" class (or one of the other date-like classes). You can do that via:

df$dates <- as.Date(df$dates, format='%Y-%m-%d')

The format= argument of as.Date() follows the conventions described in strptime(). Type ?strptime in your console and the help page for that function shows how each of the codes used in format= is defined.
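The sample above uses ISO dates, but the dates in the question look like "3-Mar", with no year. Below is a minimal sketch of converting that format, using only the three rows shown in the question. It assumes the data are from 2020 (the question does not say which year) and an English locale, because %b matches month abbreviations in the current locale. The data frame name covid is just illustrative.

```r
# The three rows shown in the question; Date has a day and a month but no year
covid <- data.frame(
  Date = c("3-Mar", "4-Mar", "5-Mar"),
  `Total Confirmed` = c(6, 28, 30),
  `Total Deceased` = c(0, 0, 5),
  check.names = FALSE  # keep the spaces in the column names
)

# ASSUMPTION: the year is 2020; change it to match your data.
# %d = day of month, %b = abbreviated month name (locale-dependent), %Y = 4-digit year
covid$Date <- as.Date(paste0(covid$Date, "-2020"), format = "%d-%b-%Y")

str(covid)
```

Because these column names contain spaces, refer to them with backticks later on, e.g. aes(y = `Total Confirmed`).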

The next step is very important: recognize that the data are in "wide" format, not "long" format. You will almost always want your data in what is known as Tidy Data format, which is convenient for any analysis and is what ggplot2 and the related packages expect. In your data, the measure is a number of people, and the type of measure is either cases or deaths. So "number of people" is spread over two columns, and the information on "type of measure" is stuck in the column names when it should be a variable in the dataset. Your goal should be to gather() those two columns together and create two new columns: (1) one to indicate whether the number is "cases" or "deaths", and (2) the number itself. In the example I've shown, you can do this via:

library(dplyr)
library(tidyr)
library(ggplot2)

df <- df %>% gather(key='var_name', value='number', -dates)

The result is that the data frame has columns for:

  • dates: unchanged
  • var_name: contains either "var1" or "var2" (a character vector)
  • number: the actual number
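gather() still works, but since tidyr 1.0.0 it has been superseded by pivot_longer(). Here is a sketch of the same reshape with pivot_longer(), rebuilding the sample data so the block runs on its own; the name df_long is just illustrative.

```r
library(tidyr)

df <- data.frame(
  dates = as.Date(c('2020-01-01','2020-02-01','2020-03-03','2020-03-14','2020-04-01')),
  var1 = c(13, 15, 18, 29, 40),
  var2 = c(5, 8, 11, 13, 18)
)

# Stack every column except dates: its name goes to var_name, its value to number
df_long <- pivot_longer(df, cols = -dates, names_to = "var_name", values_to = "number")

df_long
```

The result has the same three columns as the gather() version; only the row order differs (rows are grouped by date rather than by variable), which does not matter to ggplot2. For the question's data, cols = c(`Total Confirmed`, `Total Deceased`) would select the two count columns.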

Finally, for the plot, the code is quite simple. You apply dates to the x aesthetic, number to y, and use var_name to differentiate color for the line geom:

ggplot(df, aes(x=dates, y=number)) +
  geom_line(aes(color=var_name))
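The question also asks about overlapping date labels and about colours. One way to handle both is sketched below on the long-format sample data: scale_x_date() controls how many dates get a label and how they are written, rotating the labels with theme() gives them more room, and scale_color_manual() picks the line colours. The break interval, label format, and colour names are arbitrary choices for illustration, not requirements.

```r
library(ggplot2)

# Long-format sample data (the same values gather() produces above)
df <- data.frame(
  dates = as.Date(rep(c('2020-01-01','2020-02-01','2020-03-03','2020-03-14','2020-04-01'), 2)),
  var_name = rep(c("var1", "var2"), each = 5),
  number = c(13, 15, 18, 29, 40, 5, 8, 11, 13, 18)
)

p <- ggplot(df, aes(x = dates, y = number, color = var_name)) +
  geom_line() +
  # Put a label every two weeks, written like "01-Jan"
  scale_x_date(date_breaks = "2 weeks", date_labels = "%d-%b") +
  # Pick one colour for each value of var_name
  scale_color_manual(values = c(var1 = "steelblue", var2 = "firebrick")) +
  labs(x = "Date", y = "Number of people", color = NULL) +
  # Tilt the date labels so neighbouring labels do not overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```

With about 145 daily observations, widen date_breaks (for example to "1 month") until the labels stop colliding. colors() lists the colour names R recognizes.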


Answered by chemdork123 on December 11, 2021
