Data analysis

To recap briefly, the aim of the project is to use machine learning to predict whether a participant can be classified as a control or as a patient with a psychotic disorder, based on different brain modalities. Before starting with the first modality, cortical thickness (CT), the learning problem and the task type should be defined in order to determine which models best suit that aim.

Regarding the task type, the purpose is to assign each sample to one of two categories, control or patient. Hence, the task type is classification. To do so, I want to use the labels that are available for each sample; consequently, the learning problem is supervised.

More specifically, with the data at hand the task type is binary classification. Binary classification refers to classification tasks that have exactly two class labels (in this data set, control and patient). Commonly used algorithms for binary classification include Logistic Regression, k-Nearest Neighbors, Decision Trees, Support Vector Machines and Naive Bayes (for further information click here). In this project, I will focus on two of these algorithms: Logistic Regression and Support Vector Machine.
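To make the setup concrete, here is a minimal sketch of supervised binary classification with the two chosen algorithms. It assumes scikit-learn and uses synthetic data from `make_classification` as a stand-in for the real features; the sample size, the number of features, the linear SVM kernel, the standardization step and the 5-fold cross-validation are illustrative assumptions, not the settings used in this project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption): 100 samples with
# 20 features each, and a binary label y (0 = control, 1 = patient).
X, y = make_classification(
    n_samples=100, n_features=20, n_informative=5, random_state=0
)

# Both models are wrapped in a pipeline so that the features are
# standardized inside each cross-validation fold.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="linear")),
}

# Supervised learning: the known labels y are used to fit and to score
# each model with 5-fold cross-validation.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy")
    for name, model in models.items()
}

for name, fold_scores in scores.items():
    print(f"{name}: mean accuracy = {fold_scores.mean():.2f}")
```

With real data, `X` would be replaced by the feature matrix of the modality under study and `y` by the control/patient labels; the rest of the sketch would stay the same.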