Created: 2014-05-19 10:23
Updated: 2014-05-25 14:35


The run_analysis.R file contains code to tidy up the Human Activity Recognition raw data files.

At a high level, the run_analysis function will:

  • Read the train and test datasets and combine the rows together.
  • Read column headings from features.txt.
  • Remove parentheses, as they are not valid characters in R column names.
  • Filter the columns so that only mean and standard deviation values are included.
    • Mean measurements are any columns whose name contains the word "mean".
  • Activity data is read from data files and the rows combined.
  • The activity id/name mappings are read from activity_labels.txt.
  • The activity id/name mappings are used to replace the activity IDs in the dataset with names.
    • The activity names in the data file are human readable, so no further translation is required.
  • Subject data is read from data files, the rows combined, and added to the dataset as a new column.
  • The dataset is "melted" using the reshape2 package.
    • Identity columns are subject and activity.
    • "measurement" column: contains the original column name.
    • "value" column: contains the values.
  • The dataset is "cast" using the reshape2 package.
    • The aggregate function mean is used to calculate the mean for each subject/activity combination.
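The steps above can be sketched on a tiny made-up data frame. This is only an illustration, not the code in run_analysis.R: the feature names, values, activity labels, and variable names below are assumptions chosen to keep the example self-contained, and the real script reads its inputs from the data files instead.

```r
library(reshape2)  # assumed to be installed already

# Toy stand-in for the combined train/test measurements (values are made up).
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
measurements <- data.frame(
  V1 = c(0.1, 0.3, 0.5, 0.7),
  V2 = c(1.0, 2.0, 3.0, 4.0),
  V3 = c(9.0, 9.0, 9.0, 9.0)
)

# Strip parentheses from the headings, then keep only mean and std columns.
names(measurements) <- gsub("[()]", "", feature_names)
keep <- grepl("mean|std", names(measurements))
measurements <- measurements[, keep, drop = FALSE]

# Replace activity ids with names using an id/name lookup table (hypothetical).
activity_labels <- data.frame(id = c(1, 2), name = c("WALKING", "SITTING"))
activity_ids <- c(1, 1, 2, 2)
measurements$activity <- activity_labels$name[match(activity_ids, activity_labels$id)]

# Add the subject column (a single hypothetical subject here).
measurements$subject <- c(1, 1, 1, 1)

# Melt: one row per subject/activity/measurement/value.
molten <- melt(measurements,
               id.vars = c("subject", "activity"),
               variable.name = "measurement",
               value.name = "value")

# Cast: one row per subject/activity, averaging each measurement.
tidy <- dcast(molten, subject + activity ~ measurement,
              value.var = "value", fun.aggregate = mean)
print(tidy)
```

The result has one row for each subject/activity pair and one column per retained measurement, each holding the mean of that measurement for the pair.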

A file called "tidy.txt" is produced. This file can be easily read into R using read.table().
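As a rough sketch of reading such a file back, the example below writes a small made-up table to a temporary file and reads it again with read.table(). The column names and values are hypothetical, and header = TRUE assumes the file was written with a header row, which this document does not state.

```r
# Hypothetical stand-in for the script's output, written to a temporary file.
example <- data.frame(
  subject = c(1, 2),
  activity = c("WALKING", "SITTING"),
  tBodyAccMeanX = c(0.2, 0.6)
)
path <- tempfile(fileext = ".txt")
write.table(example, path, row.names = FALSE)

# Read it back; header = TRUE assumes the first line holds the column names.
tidy <- read.table(path, header = TRUE)
str(tidy)
```

For the real output, the same call with "tidy.txt" in place of the temporary path should work if the file includes a header row.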

To run the code:

Pre-install the reshape2 package.

Place all data files under the "UCI HAR Dataset" subdirectory. On a Mac, unzipping the source zip file will automatically create the subdirectory for you.

  1. This helps separate the code from the raw data, so it's easier to focus on the code.
  2. This helps protect the raw data from accidental modifications.