Chemical Data Analysis in Python

Chemical Data Analysis in Python – supervised and unsupervised methods in computational chemistry

The 2-day “Chemical Data Analysis in Python” course is aimed at participants who want to gain practical skills in using Python for chemical data analysis, unsupervised and supervised learning methods, and molecular modeling tasks.

Upon completion of this course, participants will be able to analyze chemical data effectively, build machine learning models, and use basic programming tools for computational chemistry tasks.

The course is in English (or Polish – if the entire group of participants is Polish).

Price includes: participation in classes, training materials, certificate of participation, coffee catering.


Day 1.

Data curation and unsupervised methods

Morning Session:

Data Curation and Preprocessing

  1. Course Introduction
  2. Data Collection and Curation
    1. Importance of quality data in chemical data analysis.
    2. Introduction to data sources and data collection methods.
    3. Data curation techniques to handle missing values, duplicates, and outliers.
  3. Data Preprocessing in Python
    1. Data preprocessing techniques using Python libraries (NumPy, Pandas).
    2. Data normalization, scaling, and feature selection.

Practical Exercise:

Participants will work on a provided chemical dataset and apply data curation and preprocessing techniques using Python.
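
To give a flavour of the curation and preprocessing steps listed above, here is a minimal sketch using Pandas and NumPy. The dataset, column names, and values are invented for illustration and are not the course dataset. Median imputation and the 1.5 × IQR rule are just one common choice each, assumed here for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset: compound labels, columns, and values are invented.
raw = pd.DataFrame({
    "compound": ["A", "B", "B", "C", "D", "E", "F"],
    "mol_weight": [180.2, 46.1, 46.1, np.nan, 78.1, 58.1, 94.1],
    "logp": [1.2, -0.3, -0.3, 0.9, 2.1, -0.2, 15.0],  # 15.0 is a deliberate outlier
})

# 1. Duplicates: drop rows that exactly repeat an earlier row.
curated = raw.drop_duplicates().copy()

# 2. Missing values: fill with the column median (one simple option among many).
curated["mol_weight"] = curated["mol_weight"].fillna(curated["mol_weight"].median())

# 3. Outliers: keep rows whose logp lies within 1.5 * IQR of the quartiles.
q1, q3 = curated["logp"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = curated["logp"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
curated = curated[in_range].reset_index(drop=True)

# 4. Scaling: standardize numeric columns to zero mean and unit variance.
features = curated[["mol_weight", "logp"]]
scaled = (features - features.mean()) / features.std(ddof=0)

print(curated)
print(scaled.round(2))
```

In this toy case the duplicated row for compound B is dropped, the missing molecular weight is imputed, and compound F is removed as an outlier before scaling.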

Afternoon Session: 

Introduction to Chemical Data Analysis and Unsupervised Learning

  1. Fundamentals of Chemical Data Analysis (practical exercises in Python)
    • Exploration of types of chemical data (molecular structures, properties, descriptors, etc.).
    • Common challenges and preprocessing steps.
  2. Unsupervised Learning Techniques (practical exercises in Python)
    • Unsupervised learning and its applications in computational chemistry.
    • Presentation of various unsupervised algorithms (clustering, Principal Component Analysis, etc.).
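
As an illustration of the unsupervised techniques named above, the following sketch runs Principal Component Analysis followed by k-means clustering with Scikit-learn. The descriptor matrix is synthetic (two artificial groups of "molecules" with five made-up descriptors each), and the choice of two components and two clusters is an assumption made for the example, not a recommendation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a descriptor matrix: 2 groups of 30 samples, 5 descriptors.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=(30, 5))
group_b = rng.normal(loc=6.0, scale=1.0, size=(30, 5))
X = np.vstack([group_a, group_b])

# Standardize first so that no descriptor dominates because of its units.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components.
pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)

# Cluster the samples in the reduced space.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(scores)

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Cluster sizes:", np.bincount(labels))
```

With real chemical data the groups are rarely this well separated, so the number of components and clusters usually has to be chosen by inspecting the explained variance and a cluster-quality measure.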

Day 2.

Supervised Methods and QSAR/QSPR Methodology

Morning Session:

Introduction to Supervised Learning and QSAR/QSPR

  1. Introduction to Supervised Learning
    • Common supervised algorithms (Linear Regression, Decision Trees, Random Forests, etc.).
    • Concept of overfitting and methods to mitigate it.
  2. QSAR/QSPR Methodology
    • Introduction to Quantitative Structure-Activity Relationship (QSAR) and Quantitative Structure-Property Relationship (QSPR) concepts.
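
The overfitting concept from this session can be sketched on a QSPR-style regression, where descriptors are mapped to a property. The data below are synthetic: three made-up descriptors and a made-up property with added noise. Limiting tree depth is shown as one possible way to mitigate overfitting; it is an assumption for this example, not the only method.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: 3 hypothetical descriptors, property = linear signal + noise.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Compare an unrestricted tree with a depth-limited one.
results = {}
for name, depth in [("unrestricted", None), ("max_depth=3", 3)]:
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    # score() returns the coefficient of determination R^2.
    results[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train R2 = {results[name][0]:.2f}, test R2 = {results[name][1]:.2f}")
```

An unrestricted tree fits the training set almost perfectly, noise included, so a large gap between its training and test scores is the typical sign of overfitting.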

Afternoon Session:

Supervised Learning and Model Evaluation 

  1. Building Supervised Learning Models in Python (practical exercises in Python)
    • Implementation of supervised learning models using Python libraries (Scikit-learn).
    • Model training, validation, and hyperparameter tuning.
  2. Model Evaluation and Interpretation
    • Evaluation metrics for regression and classification tasks in chemical data analysis.
    • Interpretation and analysis of modelling results.

Practical Exercise:

Participants will work on a provided chemical dataset, build supervised learning models, and evaluate their performance using Python.
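
A workflow of this kind might look roughly like the sketch below, which trains a Random Forest regressor with Scikit-learn, tunes two hyperparameters by cross-validated grid search, and reports two regression metrics on a held-out test set. The data are synthetic and the parameter grid is deliberately tiny. Both are assumptions for illustration, not the course material.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data: 4 hypothetical descriptors and a nonlinear property with noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))
y = 1.5 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.3, size=150)

# Hold out a test set that is never used during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameter tuning with 3-fold cross-validation on the training set only.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0), param_grid, cv=3, scoring="r2"
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set.
y_pred = search.predict(X_test)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print("Best parameters:", search.best_params_)
print(f"Test R2 = {r2:.2f}, test MAE = {mae:.2f}")
```

Keeping the test set out of the grid search means the reported metrics estimate performance on unseen compounds rather than on data the tuning has already seen.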

The practical exercises will be conducted using customized Jupyter/Colab notebooks and interactive Python environments to enhance participants’ learning experience.

Interested? Register now! Fill out the registration form.