Need help with the following assignment

ITS836 Assignment 1: Data Analysis in R

1) Read the income dataset, “zipIncomeAssignment.csv”, into R. (You can find the csv file in iLearn under the Content -> Week 2 folder.)

2) Change the column names of your data frame so that zcta becomes zipCode and meanhouseholdincome becomes income.
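Steps 1 and 2 might be sketched roughly as follows. The real csv lives in iLearn, so this sketch first writes a few made-up rows to a temporary file with the same two column names; the values, the temporary path, and the variable names (toy, csv_path, zip_income) are assumptions for illustration only.

```r
# Made-up rows standing in for the real file; only the two column names
# (zcta, meanhouseholdincome) come from the assignment text.
toy <- data.frame(
  zcta = c(1, 1, 2, 2, 3),
  meanhouseholdincome = c(52000, 61000, 48000, 75000, 250000)
)
csv_path <- file.path(tempdir(), "zipIncomeAssignment.csv")
write.csv(toy, csv_path, row.names = FALSE)

# Step 1: read the csv into a data frame.
zip_income <- read.csv(csv_path)

# Step 2: rename by matching on the old names, so column order does not matter.
names(zip_income)[names(zip_income) == "zcta"] <- "zipCode"
names(zip_income)[names(zip_income) == "meanhouseholdincome"] <- "income"

names(zip_income)
```

With the real file, csv_path would simply point at the downloaded csv instead of the temporary copy.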

3) Analyze the summary of your data. What are the mean and median average incomes?
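For step 3, a minimal sketch on made-up values (not the real dataset) could look like this. summary() reports the mean and median per column, and mean() and median() return them individually; the data frame below is an assumption standing in for the renamed data.

```r
# Made-up stand-in for the renamed data frame from steps 1 and 2.
zip_income <- data.frame(
  zipCode = c(1, 1, 2, 2, 3),
  income  = c(52000, 61000, 48000, 75000, 250000)
)

# Per-column minimum, quartiles, median, mean, and maximum.
summary(zip_income)

# The two values the question asks about, pulled out individually.
mean_income   <- mean(zip_income$income)    # 97200 for these made-up rows
median_income <- median(zip_income$income)  # 61000 for these made-up rows
```

In this toy data the mean sits well above the median because a single large value pulls it up, which is the kind of pattern the later outlier questions look at.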

4) Plot a scatter plot of the data. Although this graph is not too informative, do you see any outlier values? If so, what are they?
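One possible way to approach step 4 with base R is sketched below, again on made-up rows. The plot is written to a temporary PDF only so the sketch runs non-interactively; in an interactive session the pdf() and dev.off() lines can be dropped. The axis labels, title, and variable names are assumptions.

```r
# Made-up stand-in data, including one very high and one very low income.
zip_income <- data.frame(
  zipCode = c(1, 1, 2, 2, 3, 3),
  income  = c(52000, 61000, 48000, 75000, 250000, 5000)
)

# Draw the scatter plot into a temporary PDF file.
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)
plot(zip_income$zipCode, zip_income$income,
     xlab = "Zip code", ylab = "Mean household income",
     main = "Income by zip code")
invisible(dev.off())

# Candidate outliers can also be inspected numerically.
income_range <- range(zip_income$income)
top_rows <- head(zip_income[order(-zip_income$income), ], 3)
top_rows
```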

5) In order to omit outliers, create a subset of the data so that: $7,000 < income < $200,000 (or in R syntax, income > 7000 & income < 200000)

6) What’s your new mean?
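Steps 5 and 6 could be sketched as below, using the bounds of 7,000 and 200,000 from step 5. The rows are made up and the names trimmed and new_mean are hypothetical.

```r
# Made-up stand-in data with one value below 7000 and one above 200000.
zip_income <- data.frame(
  zipCode = c(1, 1, 2, 2, 3, 3),
  income  = c(52000, 61000, 48000, 75000, 250000, 5000)
)

# Step 5: keep only rows strictly inside the two bounds.
trimmed <- subset(zip_income, income > 7000 & income < 200000)

# Step 6: mean of the remaining incomes.
new_mean <- mean(trimmed$income)  # 59000 for these made-up rows
```

The same filter can be written with logical indexing, zip_income[zip_income$income > 7000 & zip_income$income < 200000, ], if subset() is not preferred.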

7) Create a simple box plot of your data. Be sure to add a title and label the axes.

HINT: Take a look at: https://www.tutorialspoint.com/r/r_boxplots.htm (specifically, Creating the Boxplot.) Instead of “mpg ~ cyl”, you want to use “income ~ zipCode”.

In the box plot you created, notice that all of the income data is pushed towards the bottom of the graph because most average incomes tend to be low. Create a new box plot where the y-axis uses a log scale. Be sure to add a title and label the axes.

For the next 2 questions, use the ggplot2 library in R, which enables you to create graphs with several different types of plots layered over each other.
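The two base-R box plots from step 7 might look roughly like this sketch. It uses the formula income ~ zipCode from the hint and the log argument of boxplot() for the log-scaled y-axis. The rows, titles, and labels are made up, and the plots go to a temporary PDF only so the code runs without a display.

```r
# Made-up stand-in for the outlier-free subset: three zip groups, three rows each.
trimmed <- data.frame(
  zipCode = rep(c(1, 2, 3), each = 3),
  income  = c(30000, 42000, 55000, 38000, 61000, 90000, 47000, 120000, 180000)
)

pdf(tempfile(fileext = ".pdf"))

# Plain box plot: one box per zip code.
bp <- boxplot(income ~ zipCode, data = trimmed,
              main = "Mean household income by zip code",
              xlab = "Zip code", ylab = "Income")

# Same plot with a logarithmic y-axis.
bp_log <- boxplot(income ~ zipCode, data = trimmed, log = "y",
                  main = "Mean household income by zip code (log scale)",
                  xlab = "Zip code", ylab = "Income (log scale)")

invisible(dev.off())
```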

8) Make a ggplot that consists of just a scatter plot using the function geom_point() with position = “jitter” so that the data points are grouped by zip code. Be sure to use ggplot’s function for taking the log10 of the y-axis data. (Hint: for geom_point, have alpha=0.2).
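A hedged sketch of step 8, assuming the ggplot2 package is installed and that zip codes repeat across rows so that grouping is meaningful. scale_y_log10() is the ggplot2 function that puts the y-axis on a log10 scale. The data and the name p_points are made up for illustration.

```r
library(ggplot2)

# Made-up stand-in for the outlier-free subset.
trimmed <- data.frame(
  zipCode = rep(c(1, 2, 3), each = 3),
  income  = c(30000, 42000, 55000, 38000, 61000, 90000, 47000, 120000, 180000)
)

# Treat zipCode as a factor so points fall into one column per zip code,
# jitter them to reduce overplotting, and log-scale the y-axis.
p_points <- ggplot(trimmed, aes(x = as.factor(zipCode), y = income)) +
  geom_point(position = "jitter", alpha = 0.2) +
  scale_y_log10()
```

Calling print(p_points), or typing p_points at the console, renders the plot.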

9) Create a new ggplot by adding a box plot layer to your previous graph. To do this, add the ggplot function geom_boxplot(). Also, add color to the scatter plot so that data points between different zip codes are different colors. Be sure to label the axes and add a title to the graph. (Hint: for geom_boxplot, have alpha=0.1 and outlier.size=0).
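Step 9 could then be sketched by layering geom_boxplot() over the jittered points and mapping colour to the zip code. As before, this assumes ggplot2 is installed; the rows, labels, title, and the name p_both are made up.

```r
library(ggplot2)

# Made-up stand-in for the outlier-free subset.
trimmed <- data.frame(
  zipCode = rep(c(1, 2, 3), each = 3),
  income  = c(30000, 42000, 55000, 38000, 61000, 90000, 47000, 120000, 180000)
)

p_both <- ggplot(trimmed, aes(x = as.factor(zipCode), y = income)) +
  # Points coloured by zip code, jittered and mostly transparent.
  geom_point(aes(colour = factor(zipCode)), position = "jitter", alpha = 0.2) +
  # Faint box plot on top; outlier.size = 0 avoids drawing outliers twice.
  geom_boxplot(alpha = 0.1, outlier.size = 0) +
  scale_y_log10() +
  labs(title = "Mean household income by zip code",
       x = "Zip code", y = "Income (log10 scale)", colour = "Zip code")
```

Setting the colour aesthetic inside geom_point() keeps the box plot outlines in the default colour, so only the points vary by zip code.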

10) What can you conclude from this data analysis/visualization?
