Chemical Data Analysis in Python

Chemical Data Analysis in Python – supervised and unsupervised methods in computational chemistry

The 2-day “Chemical Data Analysis in Python” course is aimed at participants who want to gain practical skills in using Python for chemical data analysis, unsupervised and supervised machine learning, and molecular modeling tasks.

Upon completion of this course, participants will be able to analyze chemical data effectively, build machine learning models, and use basic programming tools for computational chemistry tasks.

The course is held in English (or in Polish, if all participants are Polish).

*Price includes: participation in classes, training materials, certificate of participation.


Day 1: Data Processing and Unsupervised Methods 


Morning Session: 

Data Processing and Preparation 

1. Course Introduction and Official Commencement 

2. Lecture – Data Collection and Processing 

  • The importance of data quality in the analysis of chemical data. 
  • An overview of data sources and data collection methods. 
  • Data processing techniques for handling missing values, duplicates, and outliers. 

3. Code-Along Session with the Instructor (Python) 

  • Techniques for data processing and preparation in Python. 
  • Utilizing specialized libraries for data science. 
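
To give a flavour of this session, here is a minimal sketch of the three cleaning steps named in the lecture (duplicates, missing values, outliers). It assumes pandas as the data science library, and the dataset, column names, and the `clean` helper are made up for illustration; the actual course material may use different tools and rules.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset: compound labels and values are invented.
raw = pd.DataFrame(
    {
        "compound": ["A", "B", "B", "C", "D", "E", "F"],
        "mol_weight": [46.1, 78.1, 78.1, np.nan, 100.2, 58.1, 92.1],
        "logp": [-0.3, 2.1, 2.1, 0.9, -3.2, 45.0, 2.7],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates, impute missing values, and filter outliers."""
    # 1. Remove exact duplicate rows.
    out = df.drop_duplicates()

    # 2. Fill missing numeric values with the column median
    #    (one simple choice among many imputation strategies).
    numeric = out.select_dtypes(include="number").columns
    out = out.fillna(out[numeric].median())

    # 3. Keep rows where every numeric value lies within
    #    1.5 * IQR of the quartiles (the common "IQR rule").
    q1 = out[numeric].quantile(0.25)
    q3 = out[numeric].quantile(0.75)
    iqr = q3 - q1
    inside = (
        (out[numeric] >= q1 - 1.5 * iqr) & (out[numeric] <= q3 + 1.5 * iqr)
    ).all(axis=1)
    return out[inside].reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Whether an extreme value is a measurement error or a genuinely unusual compound is a chemical judgment, so the IQR rule here is only a starting point.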


Afternoon Session: 

Introduction to Chemical Data Analysis and Unsupervised Machine Learning 

1. Lecture – Introduction to Chemical Data Analysis 

  • The specifics of chemical data analysis and the types of chemical data (chemical structure descriptors, properties).
  • Popular chemical databases. 
  • Unsupervised machine learning methods in chemistry with examples. 

2. Code-Along Session with the Instructor (Python) 

  • Using Python and public database APIs to automate the generation of chemical datasets. 
  • Using the generated datasets in multidimensional data analysis, with examples of unsupervised machine learning algorithms (clustering, principal component analysis). 
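
As an illustration of the unsupervised methods listed above, the following sketch combines principal component analysis with k-means clustering. It assumes scikit-learn and uses a synthetic matrix from `make_blobs` as a stand-in for real molecular descriptors; the sizes and the choice of three clusters are arbitrary assumptions, not course content.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a descriptor matrix:
# 150 "molecules" described by 8 "descriptors" (values are random).
X, _ = make_blobs(
    n_samples=150, n_features=8, centers=3, cluster_std=1.0, random_state=0
)

# Descriptors usually live on different scales, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components.
pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)

# Group the projected points into three clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scores)

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Cluster sizes:", np.bincount(labels))
```

With real data the number of clusters is not known in advance and would have to be chosen, for example by inspecting the PCA score plot.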

Day 2: Supervised Methods and QSAR/QSPR Methodology


Morning Session:

Introduction to Supervised Machine Learning and QSAR/QSPR Methodology

  1. Lecture – Supervised Machine Learning – Regression
    • What is supervised learning?
    • How to validate predictive regression models? The concept of overfitting and methods to minimize it.
    • Feature selection in machine learning.
    • Fundamentals of QSAR and QSPR.
    • Regression methods.

  2. Code-Along Session with the Instructor (Python + Scikit-learn)
    • Generating data (molecular structure descriptors) using Python.
    • Feature selection methods.
    • Creating a QSAR/QSPR model (regression).
    • Model interpretation.
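
The workflow of this session can be sketched roughly as follows. The example assumes scikit-learn and replaces real molecular descriptors with synthetic data from `make_regression`; the specific choices (univariate feature selection with `SelectKBest`, ridge regression, 5-fold cross-validation) are illustrative assumptions rather than the course's prescribed method.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 200 "compounds", 30 "descriptors", 5 of them informative.
X, y = make_regression(
    n_samples=200, n_features=30, n_informative=5, noise=5.0, random_state=0
)

# Hold out a test set that is never used during model building.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scaling and feature selection sit inside the pipeline so that they are
# refitted on each cross-validation fold (this avoids information leakage).
model = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_regression, k=5),
    Ridge(alpha=1.0),
)

# Internal validation: 5-fold cross-validated R^2 on the training set.
cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

# External validation: R^2 on the held-out test set.
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

# Interpretation: which descriptor columns were kept, and their coefficients.
selected = model.named_steps["selectkbest"].get_support(indices=True)
coefficients = model.named_steps["ridge"].coef_

print("Cross-validated R^2 per fold:", cv_r2)
print("Test-set R^2:", test_r2)
print("Selected descriptor indices:", selected)
```

A large gap between the training score and the cross-validated or test score is the usual sign of overfitting.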

Afternoon Session:

Classification Methods

  1. Lecture – Supervised Machine Learning – Classifiers
    • Introduction to classification methods.
    • Classification algorithms (decision trees, random forests, SVM).
    • How to validate predictive classification models?

  2. Code-Along Session with the Instructor (Python + Scikit-learn)
    • Preparing data for classifier training.
    • Hyperparameter optimization.
    • Creating a QSAR/QSPR model (classification).
    • Evaluation of classifier metrics.
    • Visualization and interpretation of the model.
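
The steps of this session could look roughly like the sketch below. It assumes scikit-learn, uses synthetic data from `make_classification` in place of real descriptors and activity classes, and picks a random forest with a small, arbitrary hyperparameter grid purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in: 300 "compounds", 20 "descriptors", two activity classes.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Stratified split keeps the class proportions equal in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hyperparameter optimization: exhaustive search over a small grid,
# scored by 5-fold cross-validated balanced accuracy.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="balanced_accuracy",
)
search.fit(X_train, y_train)

# Evaluate the best model on the held-out test set.
y_pred = search.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
bal_acc = balanced_accuracy_score(y_test, y_pred)

# Interpretation: impurity-based importance of each descriptor.
importances = search.best_estimator_.feature_importances_

print("Best hyperparameters:", search.best_params_)
print("Confusion matrix:\n", cm)
print("Balanced accuracy:", bal_acc)
```

Balanced accuracy is used here because chemical activity datasets are often imbalanced; other metrics may suit a given task better.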

Interested? Register now by filling out the registration form.