Created: 2014-05-18 17:36
Updated: 2014-05-18 19:24


Repository for the Getting and Cleaning Data assignment

Instructions on how to run the scripts

  1. Download the Samsung dataset zip file to your working directory.
  2. Unzip the Samsung dataset in your working directory.
  3. Download the run_analysis.R file to your working directory.
  4. In the R terminal, execute > source("run_analysis.R")
  5. The output file will be created as tidy_data.txt.

Important Note before running the script

This script is meant to clean and tidy this particular dataset.
INPUT DATA SET: The contents of the zip file should be extracted to the working directory.
OUTPUT FILE: tidy_data.txt
Format of the output file is described in

The script will quit midway with no results if any of the following files does not exist relative to the working directory.

  1. ./UCI HAR Dataset/features.txt
  2. ./UCI HAR Dataset/activity_labels.txt
  3. ./UCI HAR Dataset/test/subject_test.txt
  4. ./UCI HAR Dataset/test/X_test.txt
  5. ./UCI HAR Dataset/test/y_test.txt
  6. ./UCI HAR Dataset/train/subject_train.txt
  7. ./UCI HAR Dataset/train/X_train.txt
  8. ./UCI HAR Dataset/train/y_train.txt
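
Because the script stops without results when an input is missing, it can help to check for the files first. The following is a minimal sketch, not part of run_analysis.R; the helper name missing_inputs and its base_dir argument are assumptions made for illustration.

```r
# Paths listed above as required, relative to the working directory.
required_files <- c(
  "UCI HAR Dataset/features.txt",
  "UCI HAR Dataset/activity_labels.txt",
  "UCI HAR Dataset/test/subject_test.txt",
  "UCI HAR Dataset/test/X_test.txt",
  "UCI HAR Dataset/test/y_test.txt",
  "UCI HAR Dataset/train/subject_train.txt",
  "UCI HAR Dataset/train/X_train.txt",
  "UCI HAR Dataset/train/y_train.txt"
)

# Hypothetical helper (not in run_analysis.R): returns the required
# paths that do not exist under base_dir.
missing_inputs <- function(base_dir = ".") {
  paths <- file.path(base_dir, required_files)
  required_files[!file.exists(paths)]
}

# Report the result for the current working directory.
missing <- missing_inputs()
if (length(missing) > 0) {
  message("Missing input files:\n", paste(" ", missing, collapse = "\n"))
} else {
  message("All input files found.")
}
```

Run this from the same working directory in which you plan to call source("run_analysis.R").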

run_analysis.R functions

This script contains the following four functions.


This is the main driver function, which returns a tidy data set. It does not write the tidy data set to a file. It calls all the other read functions to read the corresponding files.
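
Since the driver function only returns the data set, writing tidy_data.txt has to happen as a separate step. One way that step might look is sketched below; the helper name write_tidy and the example data frame are assumptions, and the values in it are made up for illustration.

```r
# Hypothetical helper: writes a data frame as a space-separated text
# file with a header row and no row names.
write_tidy <- function(tidy, path = "tidy_data.txt") {
  write.table(tidy, file = path, row.names = FALSE)
  invisible(path)
}

# Stand-in for the driver's return value; these values are made up.
example <- data.frame(
  subject = c(1L, 2L),
  activity = c("WALKING", "SITTING"),
  tBodyAccmeanX = c(0.28, 0.27),
  stringsAsFactors = FALSE
)

# Write to a temporary directory so the sketch leaves no files behind.
out_path <- write_tidy(example, file.path(tempdir(), "tidy_data.txt"))
print(readLines(out_path))
```

A file written this way can be read back with read.table(path, header = TRUE).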


The get_features function returns a tidy list of feature names from the features.txt file. It returns only the column names that contain "mean" or "std" as a substring. It also removes the "(", ")" and "-" characters from the column names.
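
The filtering and cleaning described for get_features can be sketched as follows. This is an illustration under assumptions, not the function from run_analysis.R: the name tidy_feature_names and the returned data frame layout are hypothetical, and the sample names stand in for the contents of features.txt.

```r
# Hypothetical sketch of the filter-and-clean step.
tidy_feature_names <- function(feature_names) {
  # Keep the positions of names containing "mean" or "std" as a substring.
  keep <- grep("mean|std", feature_names)
  # Remove "(", ")" and "-" from the kept names.
  cleaned <- gsub("[()-]", "", feature_names[keep])
  data.frame(index = keep, name = cleaned, stringsAsFactors = FALSE)
}

# Sample input standing in for the names read from features.txt.
sample_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-Y", "tBodyAcc-mad()-Z")
print(tidy_feature_names(sample_names))
```

Keeping the original column index next to each cleaned name makes it possible to select the matching columns from the X files later. Note that a plain substring match also keeps any name that merely contains "mean", such as a mean-frequency feature.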


The get_activity_labels function reads and returns the activity labels from the activity_labels.txt file.
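
Reading the labels might look like the sketch below. It assumes each line of activity_labels.txt holds an integer id and a label separated by whitespace; the function name read_activity_labels and the column names id and label are hypothetical. The example uses a temporary file so that it runs without the dataset.

```r
# Hypothetical sketch; the real get_activity_labels may differ.
read_activity_labels <- function(path = "UCI HAR Dataset/activity_labels.txt") {
  # Each line is expected to hold an id and a label separated by whitespace.
  read.table(path, col.names = c("id", "label"), stringsAsFactors = FALSE)
}

# Demonstrate on a small temporary file with the same layout.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 WALKING", "2 WALKING_UPSTAIRS"), tmp)
labels <- read_activity_labels(tmp)
print(labels)
```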


The get_data function performs the following tasks:

  • First, it reads the subject, X and y files in the test and train directories.
  • Second, it keeps only the columns whose indexes are specified by keepfeatures.
  • Third, it names those columns with the proper feature names from keepfeatures.
  • Fourth, it labels the activities with proper descriptive names.
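
The four steps above can be sketched for a single directory (test or train) as follows. This is a hedged illustration rather than the get_data function itself: the name read_set and all of its arguments are assumptions, and activity_labels is expected to be a data frame with id and label columns.

```r
# Hypothetical sketch of the four steps for one set ("test" or "train").
read_set <- function(base_dir, set, keep_index, keep_names, activity_labels) {
  set_dir <- file.path(base_dir, set)
  # 1. Read the subject, X and y files for this set.
  subject <- read.table(file.path(set_dir, paste0("subject_", set, ".txt")))
  x <- read.table(file.path(set_dir, paste0("X_", set, ".txt")))
  y <- read.table(file.path(set_dir, paste0("y_", set, ".txt")))
  # 2. Keep only the selected feature columns.
  x <- x[, keep_index, drop = FALSE]
  # 3. Name the kept columns with the cleaned feature names.
  names(x) <- keep_names
  # 4. Replace the activity ids with descriptive labels.
  activity <- activity_labels$label[match(y[[1]], activity_labels$id)]
  data.frame(subject = subject[[1]], activity = activity, x,
             stringsAsFactors = FALSE)
}
```

Calling such a function once with "test" and once with "train", then combining the two results with rbind, would give one data frame covering both directories.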