Created: 2014-05-18 23:04
Updated: 2014-05-19 01:32


Data Mining - Sarunkorn Chotvijit 1364721 -

The code is divided into five parts: 1. Installation 2. Data Pre-processing 3. Topic Models and Visualization 4. Classification 5. Clustering

#Reuters dataset installation (Lines 5-15)

  • Run the code line by line from line 5 to line 15

#EX1: Data preprocessing (Lines 20-22)

  • Run the function tdm, which uses the function TermDocumentMatrix: select lines 21 and 22 and run them together (this takes some time)
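
For readers unfamiliar with the idea, a term-document matrix has one row per term and one column per document, and each cell holds how often the term occurs in that document. The sketch below is a minimal Python illustration of that idea only. It is not the R code in the script: the sample documents, the whitespace tokenizer, and the name build_tdm are assumptions made for the example.

```python
from collections import Counter


def build_tdm(docs):
    """Return (terms, matrix) where matrix[i][j] is the count of terms[i] in docs[j]."""
    # Naive tokenization (an assumption for this sketch): lowercase, split on whitespace.
    counts = [Counter(doc.lower().split()) for doc in docs]
    # Vocabulary: every distinct term seen in any document, in alphabetical order.
    terms = sorted(set().union(*counts))
    # One row per term, one column per document.
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix


# Hypothetical sample documents, not taken from the Reuters dataset.
docs = ["oil prices rise", "oil exports fall", "grain prices fall"]
terms, tdm = build_tdm(docs)
for term, row in zip(terms, tdm):
    print(f"{term:8s} {row}")
```

A real pipeline would also strip punctuation, stop words, and numbers before counting; those steps are left out here to keep the sketch short.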

#EX2: Feature representation of documents and news articles (Lines 27-57)

  • Install and require the packages needed for the wordcloud function (lines 28 to 31)
  • Build the word cloud for better visualization by running the code line by line from line 34 to line 38 (this takes some time)
  • Install and require the packages for topic models (lines 41 to 47)
  • Build the topic models with LDA and CTM by running the code line by line from line 50 to line 57 (this takes some time)
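
A word cloud draws each term at a size that grows with how often the term occurs, so the input it needs is a table of term frequencies over the whole corpus. The following Python sketch shows only that counting step, under assumed sample documents and a hypothetical helper name term_frequencies; it does not reproduce the R code in the script.

```python
from collections import Counter


def term_frequencies(docs, top_n=3):
    """Return the top_n (term, count) pairs over all documents."""
    totals = Counter()
    for doc in docs:
        # Naive tokenization (an assumption): lowercase, split on whitespace.
        totals.update(doc.lower().split())
    return totals.most_common(top_n)


# Hypothetical sample documents, not taken from the Reuters dataset.
docs = ["oil prices rise", "oil exports fall", "oil prices fall"]
top = term_frequencies(docs)
for term, count in top:
    print(term, count)
```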

#EX3: Classification (Lines 62-142)

  • Install and require the packages for the classification process (lines 62 to 67)
  • Select all lines from line 69 to line 142 and run them at once (this takes some time)

#EX4: Clustering (Lines 147-169)

  • Install and require the packages for the clustering process (lines 147 to 150)
  • Run lines 153 to 154 to remove the sparse terms and switch from the term-document matrix to the document-term matrix
  • Run k-means at line 157 (this takes some time)
  • Run HAC line by line from line 160 to line 166 (this takes some time)
  • Run the DBSCAN function at line 169 (this takes some time)
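
The preparation step before clustering can be pictured as follows: terms that are absent from most documents are dropped, and the matrix is transposed so that each row is a document, since the clustering algorithms group rows. The Python sketch below illustrates this under stated assumptions: the tiny matrix, the 0.5 sparsity threshold, and the helper names remove_sparse_terms and transpose are all hypothetical and are not taken from the R script.

```python
def remove_sparse_terms(terms, tdm, max_sparsity):
    """Keep only terms whose share of zero cells is below max_sparsity."""
    kept = [
        (term, row)
        for term, row in zip(terms, tdm)
        if row.count(0) / len(row) < max_sparsity
    ]
    return [term for term, _ in kept], [row for _, row in kept]


def transpose(matrix):
    """Turn a term-document matrix into a document-term matrix (or back)."""
    return [list(column) for column in zip(*matrix)]


# Hypothetical term-document matrix: 4 terms (rows) by 3 documents (columns).
terms = ["exports", "fall", "oil", "prices"]
tdm = [
    [0, 1, 0],  # exports: zero in 2 of 3 documents, so it is dropped below
    [0, 1, 1],  # fall
    [1, 1, 1],  # oil
    [1, 0, 1],  # prices
]

kept_terms, kept_tdm = remove_sparse_terms(terms, tdm, max_sparsity=0.5)
dtm = transpose(kept_tdm)  # one row per document, one column per kept term
print(kept_terms)
for row in dtm:
    print(row)
```

Dropping sparse terms first keeps the number of columns small, which matters for the running time of the clustering steps that follow.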