GettingCleaningData-Project

Created: 2014-05-19 15:08
Updated: 2014-05-26 21:50
r

README.md

Description of run_analysis.R

The R script contained in this repository is used to process data obtained from this source about human activity recognition using smartphones.

In order to run run_analysis.R, download the UCI Har Dataset folder and the script and place them in the same directory.

Step 1: Read the Data into R.

The following code will read all of the data sets into R to prepare for processing.

xtest <- read.table("UCI Har Dataset/test/X_test.txt")
xtrain <- read.table("UCI Har Dataset/train/X_train.txt")
ytest <- read.table("UCI Har Dataset/test/Y_test.txt")
ytrain <- read.table("UCI Har Dataset/train/Y_train.txt")
subjectTest <- read.table("UCI Har Dataset/test/subject_test.txt")
subjectTrain <- read.table("UCI Har Dataset/train/subject_train.txt")

Step 2: Merging the data sets.

The first step I took is to combine the activities dataset. ytrain and ytest each have a single varible and many observations of it.

activities <- rbind(ytrain, ytest)

In order to create a data set that contains descriptive names for the activities, this for loop will replace each number with the appropriate activity name.

for (i in 1:nrow(activities)) {
    if (activities[i, 1] == 1) {
        activities[i, 1] <- c("Walking")
    } else if (activities[i, 1] == 2) {
        activities[i, 1] <- c("Walking_Upstairs")
    } else if (activities[i, 1] == 3) {
        activities[i, 1] <- c("Walking_Downstairs")
    } else if (activities[i, 1] == 4) {
        activities[i, 1] <- c("Sitting")
    } else if (activities[i, 1] == 5) {
        activities[i, 1] <- c("Standing")
    } else {
        activities[i, 1] <- c("Laying")
    }
}

Now I will merge the rest of the data sets together and combine them with the activities data set.

xy_test <- cbind(xtest, subjectTest)
xy_train <- cbind(xtrain, subjectTrain)
combined <- rbind(xy_train, xy_test)
data <- cbind(combined, activities)

Next we will add names to the columns from the features.txt file in the data

features <- read.table("UCI Har Dataset/features.txt")
features <- features[, 2]
features_vector <- as.vector(features)
features_vector <- append(features_vector, c("subject", "activity"))

Now apply the names to the data frame.

names(data) <- features_vector

Step 3: Subsetting the data to only keep columns referncing mean and standard deivation

Using grep, I will capture what columns to keep through partial matchng of column names.

mean_names <- grep("mean()", colnames(data))
std_names <- grep("std()", colnames(data))
keep_cols <- c(mean_names, std_names, 562, 563)

Then we can subset the data

data_subset <- data[keep_cols]

Step 4: Aggregating the data by subject and actviity

agg_subset <- aggregate(data_subset, list(data_subset$subject, data_subset$activity), 
    FUN = mean)
tidy_data <- agg_subset[1:81]

The following code will clean up the clunky variable names by removing hyphens and parathensis.

variable.names <- c("subject", "activity", colnames(data_subset[1:79]))
variable.names <- gsub("-", ".", variable.names)
variable.names <- gsub("\\(\\)", "", variable.names)
variable.names <- tolower(variable.names)
names(tidy_data) <- variable.names

Finally, we write the data to a file in the working directory called tidy_data.txt

write.table(tidy_data, file = "tidy_data.txt")
Cookies help us deliver our services. By using our services, you agree to our use of cookies Learn more