Third Mini Lecture This Week: Introducing Data Visualization and Applied Statistics in R

Did you know that the use of maps in spatial analytics was pioneered by John Snow? (No, not the one from Game of Thrones!) In fact, he changed the way we see disease today. Dr. John Snow demonstrated the spatial clustering of cholera cases in London during the 19th century, which provided strong evidence for his theory that cholera is a water-borne disease. Data visualization has evolved a great deal since then, and with ever-growing volumes of data, it is nearly impossible to tell a story without it.

Data Visualization and Applied Statistics in R were the two topics covered during our 3rd mini lecture last week with Master’s degree students of the University of Indonesia. Our Senior Data Scientist, Hamid Dimyati, facilitated the first session by sharing how to find the story in your data by searching for patterns and interesting insights such as trends, correlations, and outliers. The key takeaway from this session is the importance of knowing your data and the relationships among its variables. He also shared the different types of visualization students can use, such as bubble charts, heat maps, scatter plots, and others. At the end of the session, students were given an individual exercise to create various charts using an R package called ggplot2.
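
To give a feel for that kind of exercise, here is a minimal sketch of a ggplot2 scatter plot. The actual exercise data is not described in this post, so the built-in mtcars dataset and the chosen variables are assumptions made purely for illustration.

```r
library(ggplot2)

# mtcars ships with R; it stands in for the (unspecified) exercise data.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 3) +                                    # one point per car
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # linear trend per cylinder group
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders",
    title = "Fuel economy versus weight"
  )

print(p)
```

Swapping geom_point() for another geom, such as geom_tile() for a heat map, is usually enough to try a different chart type on the same data.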

Natasya Denaya, our Senior Data Scientist, followed up with the next session on Applied Statistics in R. She started by explaining how to perform exploratory data analysis as a first step in analyzing data. Exploratory Data Analysis, commonly known as EDA, is an approach to analyzing data sets to summarize their main characteristics, often with visual methods.

The purpose of EDA is to maximize insight, uncover underlying structure, extract important variables, and detect outliers and anomalies. Students also learned how to formulate statistical hypotheses. They were given an exercise using DataCamp course materials to get practical experience in performing EDA and statistical hypothesis testing in R. At the end of the session, there were some additional quizzes to test the students’ understanding of the topic.
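
As a rough illustration of these steps, the sketch below runs a small EDA and one hypothesis test using only base R. It is not the DataCamp exercise itself: the mtcars dataset, the 1.5 x IQR outlier rule, and the choice of a Welch t-test are all assumptions made for the example.

```r
# mtcars ships with R; it stands in for the (unspecified) course data.
str(mtcars)          # structure: variable names and types
summary(mtcars$mpg)  # five-number summary plus the mean

# Flag horsepower values outside the boxplot whiskers (1.5 x IQR rule).
outliers <- boxplot.stats(mtcars$hp)$out
print(outliers)

# H0: mean mpg is the same for automatic (am = 0) and manual (am = 1) cars.
# t.test() with a formula runs a Welch two-sample t-test by default.
result <- t.test(mpg ~ am, data = mtcars)
print(result$p.value)
```

A small p-value here would suggest rejecting the null hypothesis that the two groups share the same mean fuel economy.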

It was quite a productive and well-spent Saturday! See you all next week 🙂