The run_analysis.R file contains code to tidy up the Human Activity Recognition raw data files.
At a high level, the run_analysis function will:
- Read the train and test datasets and combine the rows together.
- Read column headings from features.txt.
  - Remove parentheses, because they are not valid characters in R column names.
- Filter the columns so that only mean and standard deviation values are included.
  - A mean measurement is any column whose name contains the word "mean".
- Activity data is read from the data files and the rows are combined.
  - The activity ID/name mappings are read from activity_labels.txt.
  - The mappings are used to replace the activity IDs in the dataset with names.
  - The activity names in the data file are human readable, so no further translation is required.
- Subject data is read from data files, the rows combined, and added to the dataset as a new column.
- The dataset is "melted" using the reshape2 package.
  - Identity columns: subject and activity
  - "measurement" column: contains the original column name
  - "value" column: contains the measured value
- The dataset is "cast" using the reshape2 package.
  - The aggregate function mean is used to calculate the mean of each measurement for each subject/activity combination.
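
The steps above can be sketched on a toy dataset. This is a minimal illustration, not the code in run_analysis.R: the feature names, values, and variable names are made up, and matching standard deviation columns on "std" is an assumption.

```r
library(reshape2)

# Toy stand-ins for the real files; all values are made up for illustration.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
measurements <- data.frame(
  c(0.1, 0.3, 0.5, 0.7),
  c(1.0, 2.0, 3.0, 4.0),
  c(9.0, 9.0, 9.0, 9.0)
)

# Remove parentheses from the headings, then keep only mean and std columns.
# Matching on "std" is an assumption about how standard deviations are named.
names(measurements) <- gsub("[()]", "", features)
keep <- grepl("mean|std", names(measurements))
dataset <- measurements[, keep, drop = FALSE]

# Replace activity IDs with names by looking up each ID in the mapping.
activity_labels <- data.frame(id = 1:2, name = c("WALKING", "SITTING"))
activity_ids <- c(1, 1, 2, 2)
dataset$activity <- activity_labels$name[match(activity_ids, activity_labels$id)]
dataset$subject <- c(1, 1, 1, 1)

# Melt to long form: one row per subject/activity/measurement/value.
long <- melt(dataset, id.vars = c("subject", "activity"),
             variable.name = "measurement", value.name = "value")

# Cast back to wide form, averaging each measurement per subject/activity.
tidy <- dcast(long, subject + activity ~ measurement,
              value.var = "value", fun.aggregate = mean)
print(tidy)
```

The result has one row per subject/activity combination and one column per retained measurement.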
A file called "tidy.txt" is produced. This file can be easily read into R using read.table().
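
As a rough sketch of reading the output back, the snippet below writes a small stand-in table to a temporary file and reads it with read.table(). It assumes the file has a header row and uses the default whitespace separator; the actual layout of tidy.txt depends on how run_analysis.R writes it.

```r
# Write a small stand-in table to a temporary file (not the real tidy.txt).
path <- file.path(tempdir(), "tidy.txt")
example <- data.frame(subject = c(1, 1),
                      activity = c("SITTING", "WALKING"),
                      stringsAsFactors = FALSE)
write.table(example, path, row.names = FALSE)

# Read it back; header = TRUE assumes the first line holds the column names.
tidy <- read.table(path, header = TRUE)
print(tidy)
```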
To run the code:
- Pre-install the reshape2 package.
- Place all data files under the "UCI HAR Dataset" subdirectory. On a Mac, unzipping the source zip file creates this subdirectory automatically.
  - This separates the code from the raw data, so it is easier to focus on the code.
  - This helps protect the raw data from accidental modifications.
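
The prerequisites above could be checked from an R session before running the script. check_setup is a hypothetical helper written for this sketch, not part of run_analysis.R, and the commented-out call assumes run_analysis() takes no arguments.

```r
# Hypothetical helper: reports whether the prerequisites described above are met.
check_setup <- function(data_dir = "UCI HAR Dataset") {
  c(
    reshape2_installed = requireNamespace("reshape2", quietly = TRUE),
    data_dir_present = dir.exists(data_dir)
  )
}

status <- check_setup()
print(status)

# If reshape2 is missing, it can be installed with:
# install.packages("reshape2")

# Once both entries are TRUE, the script could be loaded and run like this
# (assumes run_analysis() takes no arguments):
# source("run_analysis.R")
# run_analysis()
```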