Lab1: Introduction to Data Mining

Course: CSCI-866-001: Data Mining & Knowledge Discovery
Lecturer: Sothea HAS, PhD


Objective: This lab reproduces the overall patterns and connections in the Amazon dataset that we saw in class. You will also investigate some further aspects of this dataset.


1. Amazon dataset

Let’s begin by importing the dataset into our environment.

# To do
   UserId          ProductId   Rating  Timestamp
0  A39HTATAQ9V7YF  0205616461  5.0     1369699200
1  A3JM6GV9MNOF9X  0558925278  3.0     1355443200
2  A1Z513UWSAAO0F  0558925278  5.0     1404691200
3  A1WMRR494NWEWV  0733001998  4.0     1382572800
4  A3IAAVS479H7M7  0737104473  1.0     1274227200
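
One possible way to read the data is sketched below, using only the Python standard library. It assumes the file is a CSV with a header row naming the four columns shown above. The helper name read_ratings is illustrative, and the lab does not state the file path, so the sketch parses the five preview rows from an in-memory string instead.

```python
import csv
import io


def read_ratings(handle):
    """Parse UserId, ProductId, Rating, Timestamp rows from an open CSV file."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "UserId": record["UserId"],
            # Kept as text so that leading zeros in the ID survive.
            "ProductId": record["ProductId"],
            "Rating": float(record["Rating"]),
            # Presumably Unix time in seconds.
            "Timestamp": int(record["Timestamp"]),
        })
    return rows


# The five preview rows, written as CSV text for a quick check.
SAMPLE = """UserId,ProductId,Rating,Timestamp
A39HTATAQ9V7YF,0205616461,5.0,1369699200
A3JM6GV9MNOF9X,0558925278,3.0,1355443200
A1Z513UWSAAO0F,0558925278,5.0,1404691200
A1WMRR494NWEWV,0733001998,4.0,1382572800
A3IAAVS479H7M7,0737104473,1.0,1274227200
"""

ratings = read_ratings(io.StringIO(SAMPLE))
print(len(ratings), ratings[0])
```

For the real file, the same function could be called on a handle opened with open(path, newline=""), where path is wherever you saved the dataset.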

A. Reproduce the result and graph shown on slide 26 of the course.

B. Product Analysis

B.1. Visualize the rating distribution for the most popular product.

# To do
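
A minimal sketch of one approach, assuming "most popular" means the product with the largest number of ratings. The (ProductId, Rating) pairs below are made-up stand-ins for the real data, and the text bars are only a placeholder for a proper bar chart.

```python
from collections import Counter

# Hypothetical (ProductId, Rating) pairs standing in for the real dataset.
pairs = [
    ("P1", 5.0), ("P1", 4.0), ("P1", 5.0),
    ("P2", 3.0), ("P2", 1.0),
    ("P3", 5.0),
]


def rating_distribution(pairs, product_id):
    """Count how often each rating value was given to one product."""
    return Counter(rating for pid, rating in pairs if pid == product_id)


# Popularity is measured here as the number of ratings per product.
ratings_per_product = Counter(pid for pid, _ in pairs)
top_product, n_ratings = ratings_per_product.most_common(1)[0]
distribution = rating_distribution(pairs, top_product)

print(f"Most rated product: {top_product} ({n_ratings} ratings)")
for stars in (1.0, 2.0, 3.0, 4.0, 5.0):
    count = distribution[stars]  # a Counter returns 0 for unseen ratings
    print(f"{stars:.0f} stars | {'#' * count} ({count})")
```

The counts in distribution can be handed to any plotting library as the bar heights.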

B.2. Repeat the previous question for the 2nd and 3rd most popular products.

# To do
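
Products with different numbers of ratings are easier to compare when the counts are turned into shares. The sketch below ranks products by number of ratings (an assumed definition of popularity) and computes the share of each rating value for the 2nd and 3rd ranked products. The data and the helper name rating_shares are illustrative only.

```python
from collections import Counter

# Hypothetical (ProductId, Rating) pairs standing in for the real dataset.
pairs = [
    ("P1", 5.0), ("P1", 5.0), ("P1", 4.0), ("P1", 1.0),
    ("P2", 5.0), ("P2", 3.0), ("P2", 3.0),
    ("P3", 2.0), ("P3", 4.0),
    ("P4", 5.0),
]


def rating_shares(pairs, product_id):
    """Return the fraction of a product's ratings at each star level."""
    counts = Counter(rating for pid, rating in pairs if pid == product_id)
    total = sum(counts.values())
    return {stars: counts[stars] / total for stars in (1.0, 2.0, 3.0, 4.0, 5.0)}


# Rank products by how many ratings they received, then keep ranks 2 and 3.
ranked = [pid for pid, _ in Counter(pid for pid, _ in pairs).most_common(3)]
second, third = ranked[1], ranked[2]
shares = {pid: rating_shares(pairs, pid) for pid in (second, third)}

for pid, dist in shares.items():
    line = ", ".join(f"{stars:.0f}: {share:.0%}" for stars, share in dist.items())
    print(f"{pid} -> {line}")
```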

C. Time Evolution

C.1. Visualize the rating distribution over time.

# To do
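
One way to look at ratings over time is to group them by calendar year. The sketch below assumes that Timestamp holds Unix time in seconds and uses the five (Rating, Timestamp) pairs from the preview at the top of this lab. Grouping by year is a choice made for illustration; months or quarters would work the same way.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

# (Rating, Timestamp) pairs taken from the five preview rows of the dataset.
rows = [
    (5.0, 1369699200),
    (3.0, 1355443200),
    (5.0, 1404691200),
    (4.0, 1382572800),
    (1.0, 1274227200),
]

# For each year, count how many times each rating value occurs.
by_year = defaultdict(Counter)
for rating, timestamp in rows:
    year = datetime.fromtimestamp(timestamp, tz=timezone.utc).year
    by_year[year][rating] += 1

# The yearly mean rating gives a one-number summary of each distribution.
mean_by_year = {
    year: sum(r * n for r, n in counts.items()) / sum(counts.values())
    for year, counts in sorted(by_year.items())
}

for year, counts in sorted(by_year.items()):
    print(year, dict(counts), f"mean={mean_by_year[year]:.2f}")
```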

C.2. Do recent products appear to be rated higher than older ones?
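
The dataset has no release date, so a product's age has to be approximated. The sketch below assumes that the year of a product's earliest rating can stand in for its launch year, then averages the per-product mean ratings within each launch year. The rows are made up for illustration, and the proxy is only one of several reasonable choices.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

# Hypothetical (ProductId, Rating, Timestamp) triples.
rows = [
    ("P1", 4.0, 1274227200),  # 2010
    ("P1", 2.0, 1369699200),  # 2013
    ("P2", 5.0, 1355443200),  # 2012
    ("P2", 4.0, 1404691200),  # 2014
    ("P3", 5.0, 1404691200),  # 2014
]

first_seen = {}                  # ProductId -> earliest rating timestamp
ratings_of = defaultdict(list)   # ProductId -> all of its ratings
for pid, rating, timestamp in rows:
    first_seen[pid] = min(timestamp, first_seen.get(pid, timestamp))
    ratings_of[pid].append(rating)

# Group the per-product mean ratings by the year of the first rating.
by_launch_year = defaultdict(list)
for pid, timestamp in first_seen.items():
    year = datetime.fromtimestamp(timestamp, tz=timezone.utc).year
    by_launch_year[year].append(mean(ratings_of[pid]))

summary = {year: mean(values) for year, values in sorted(by_launch_year.items())}
print(summary)
```

On the real data, an upward trend in summary would support the claim, although older products also have had more time to collect ratings, which is worth keeping in mind when interpreting it.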
