Course: INF-604: Data Analysis
Lecturer: Sothea HAS, PhD
Objective: In this lab, you will explore the columns of a dataset according to their data types. Your task is to employ various techniques, including statistical values and graphical representations, to understand the dataset before conducting deeper analysis.
This dataset contains food delivery times together with factors that influence them, such as distance, weather, traffic conditions, and time of day. It offers a practical and engaging challenge for machine learning practitioners, especially those interested in logistics and operations research. Read and load the data from Kaggle: Food Delivery Dataset.
A. What’s the dimension of the data? Which variables are considered quantitative and which are qualitative?
Answer:
Are there any rows with missing values?
Are there any duplicated data?
Handling missing values is more complicated than you may expect. Here, we can simply drop those rows.
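The checks in part A can be sketched with pandas as follows. This is only a minimal illustration on a toy DataFrame: the column names (`Distance_km`, `Weather`, `Delivery_Time_min`) and all values are hypothetical stand-ins, not the actual Kaggle file, which you would load yourself (for example with `pd.read_csv`).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Kaggle file.
# Column names and values are assumptions made for illustration only.
df = pd.DataFrame({
    "Distance_km": [2.5, 7.1, np.nan, 4.0, 2.5],
    "Weather": ["Clear", "Rainy", "Clear", None, "Clear"],
    "Delivery_Time_min": [20, 45, 30, 33, 20],
})

# Dimension of the data: (number of rows, number of columns).
n_rows, n_cols = df.shape

# First guess at variable types: numeric dtypes are treated as quantitative,
# everything else as qualitative.
quantitative = df.select_dtypes(include="number").columns.tolist()
qualitative = df.select_dtypes(exclude="number").columns.tolist()

# Missing values: count per column, and number of rows with at least one.
missing_per_column = df.isna().sum()
rows_with_missing = int(df.isna().any(axis=1).sum())

# Duplicates: rows that repeat an earlier row exactly.
n_duplicates = int(df.duplicated().sum())

# Simplest treatment of missing values: drop the affected rows.
clean = df.dropna().reset_index(drop=True)

print(f"shape: {n_rows} x {n_cols}")
print("quantitative:", quantitative)
print("qualitative:", qualitative)
print("rows with missing values:", rows_with_missing)
print("duplicated rows:", n_duplicates)
```

Classifying by dtype is only a starting point: a categorical variable stored as integer codes would be listed as quantitative here, so each column should still be checked against its meaning.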
B. Qualitative variables:
Create a statistical summary of the qualitative columns.
Create graphical representations of these qualitative columns to understand them better.
Explain each column based on the statistical values and graphs.
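One possible way to approach part B is sketched below: a `describe` summary, frequency tables, and one bar chart per column. The data and the column names (`Weather`, `Traffic_Level`) are hypothetical, and a bar chart is just one reasonable choice among several (a pie chart or a sorted count table would also work).

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical qualitative columns; names and values are assumptions.
df = pd.DataFrame({
    "Weather": ["Clear", "Rainy", "Clear", "Foggy", "Clear", "Rainy"],
    "Traffic_Level": ["Low", "High", "Medium", "High", "Low", "High"],
})
qual_cols = df.columns.tolist()

# For non-numeric columns, describe() reports count, number of unique
# values, the most frequent value (top) and its frequency (freq).
summary = df[qual_cols].describe()

# Absolute and relative frequencies of each category.
counts = {col: df[col].value_counts() for col in qual_cols}
shares = {col: df[col].value_counts(normalize=True) for col in qual_cols}

# One bar chart per qualitative column.
fig, axes = plt.subplots(1, len(qual_cols), figsize=(4 * len(qual_cols), 3))
for ax, col in zip(axes, qual_cols):
    ax.bar(counts[col].index.astype(str), counts[col].values)
    ax.set_title(col)
    ax.set_ylabel("count")
fig.tight_layout()

print(summary)
```

In a notebook the figure is displayed automatically; in a script, call `plt.show()` or `fig.savefig(...)`. When explaining a column, the points to look for are the number of categories, the dominant category, and whether the categories are balanced.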
C. Quantitative variables:
Create a statistical summary of the quantitative columns.
Create graphical representations of these quantitative columns to understand them better.
Explain each column based on the statistical values and graphs.
Are there any columns with outliers?
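For part C, a minimal sketch might combine `describe`, boxplots, and the 1.5 × IQR rule for flagging outliers. The IQR rule is a common convention chosen here for illustration, not a method prescribed by the lab, and the column names and values below are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical quantitative columns; names and values are assumptions.
df = pd.DataFrame({
    "Distance_km": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "Delivery_Time_min": [20, 22, 25, 27, 30, 32, 35, 120],
})

# count, mean, std, min, quartiles and max for each numeric column.
summary = df.describe()


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values of s lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]


outliers = {col: iqr_outliers(df[col]) for col in df.columns}

# Boxplots: points drawn beyond the whiskers use the same 1.5 * IQR convention.
fig, axes = plt.subplots(1, len(df.columns), figsize=(4 * len(df.columns), 3))
for ax, col in zip(axes, df.columns):
    ax.boxplot(df[col])
    ax.set_title(col)
fig.tight_layout()

print(summary)
for col, values in outliers.items():
    print(col, "->", values.tolist())
```

A flagged value is not necessarily an error: a very long delivery time may be genuine, so whether to keep, cap, or remove it is a judgment to justify in your explanation. Histograms (for example `df.hist()`) complement the boxplots by showing the shape of each distribution.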
2. Cardiovascular Disease dataset
This dataset consists of 70 000 patient records, each with 11 features and a target column indicating the presence or absence of cardiovascular disease. The data can be downloaded from Kaggle using the following link: https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset.