
My Master's Thesis


I earned my Master's degree in Medical Informatics from the Technical University of Deggendorf (European Campus), Germany. 🥳 In this post, I would like to share the abstract of my thesis and a video demo of the prototype tool.

My thesis was supervised internally by Prof. Dr. Georgi Chaltikyan (Program Director, Master of Digital Health) and Prof. Dr. Sasha Kreiskott (Dean of Studies). It was supervised externally by custo med, GmbH, a renowned provider of cardiopulmonary diagnostic solutions in Europe. They provided me with their 'custo diagnostic' software, datasets, and a monthly stipend so that I could focus on my work. I must mention Michael von Rhein (Head of Software Development, MedTec & Science, GmbH) and Dr. Peter Rumm (custo med, GmbH) for their wholehearted cooperation and their regular monitoring and evaluation of the work.

A very special thanks to Prof. Dr. Agnes Nocon, who encouraged me to learn medical statistics to solve real-life statistical problems with the R programming language.

Analysis of the ‘custo diagnostic’ software towards building a prototype analytic tool for 'resting ECG' data.


Introduction and objectives: The electrocardiogram (ECG) is the first-line non-invasive medical test for detecting heart conditions by measuring the heart's electrical activity. Nowadays, these recordings are stored electronically, and automated reports are given to the user based on proprietary algorithms. These services are usually offered by vendors who supply ECG devices together with software. In the long run, a large amount of data accumulates in the customer’s database, which might contain valuable information that could be extracted by statistical analysis and used for better patient outcomes. In this study, an ECG software package was analyzed technically, and a prototype analytical tool was developed with which a user can perform basic analysis and predict Right Axis Deviation (RAD) based on the ‘resting ECG’ parameters.

Methodology: The desktop software ‘custo diagnostic’ was used for this study. The data structure of this software was explored with ‘Squirrel SQL’; the analysis and the prototype were written in R with the help of community-provided packages. Two datasets, one a demo dataset and the other from a real study, with a total of 19691 ECGs, were used as the data source; each ECG has 385 ‘resting ECG’ parameters. Logistic regression, a machine learning classification technique, was used to build the RAD model and predict the condition. 70% of the dataset was used to train the model and the remaining 30% was used for prediction.
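To make the 70/30 split and the logistic regression step more concrete, here is a minimal sketch. It is not the thesis code: the thesis was written in R, while this sketch uses plain Python with only the standard library. The data is synthetic (two made-up features and a made-up labelling rule), and every function name, the learning rate, and the number of epochs are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def linear_score(weights, bias, x):
    """Weighted sum of the features plus the intercept."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def fit_logistic(rows, learning_rate=0.5, epochs=200):
    """Fit logistic regression with full-batch gradient descent."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in rows:
            error = sigmoid(linear_score(weights, bias, x)) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, x, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(linear_score(weights, bias, x)) >= threshold else 0


# Synthetic stand-in for the ECG parameters: two features, and a label
# produced by a made-up noisy linear rule (not real ECG data).
rng = random.Random(42)
rows = []
for _ in range(1000):
    x = (rng.gauss(0, 1), rng.gauss(0, 1))
    score = 2.0 * x[0] - 1.0 * x[1] + rng.gauss(0, 0.5)
    rows.append((x, 1 if score > 0 else 0))

# 70% of the rows train the model; the remaining 30% are held out.
train_rows, test_rows = split_rows(rows, train_fraction=0.7)
weights, bias = fit_logistic(train_rows)

correct = sum(predict(weights, bias, x) == y for x, y in test_rows)
accuracy = correct / len(test_rows)
print(f"held-out accuracy: {accuracy:.3f}")
```

Holding out 30% of the rows means the model is scored on evaluations it never saw during training, which is what makes the reported accuracy, sensitivity, and specificity meaningful.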

Result: ‘Resting ECG’ data from a total of 3483 patients were found in the dataset, of whom 2446 (70.23%) were male and 1029 (29.54%) were female; all age groups were represented. The mean age of the population was 45 (± 17.4) years. Male patients were slightly older than female patients, with mean ages of 45.6 (± 17.3) and 43.9 (± 17.3) years respectively. The mean heart rate was 61 bpm with a standard deviation of 12. The female heart rate was slightly higher than the male, at 63 (± 11.6) and 60 (± 12.2) bpm respectively. The mean QRS axis was 61.1 (± 39.4) degrees. The QRS axis was slightly higher in females than in males, at 64.3 (± 33) and 59.8 (± 41.7) degrees respectively. A total of 561 strongly positive and 154 strongly negative correlations were found among the ECG parameters. 149 evaluations were positive for Right Axis Deviation (RAD), of which 136 (91.3%) were male and 13 (8.7%) were female. P wave duration in leads V3 and aVF, T amplitude in V1 and V4, ST in V1, and S in V1 and V6 were found to be very strongly significant (99.9%) in predicting RAD, along with male sex, heart rate, R amplitude in leads II and aVL, T in V3, and ST in lead I (99%). The model, which was used to predict RAD on 5792 evaluations, had an accuracy of 99.34%, with 53.3% sensitivity and 99.7% specificity. The classification error of the model was 0.66%.
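The four model metrics reported above all derive from a confusion matrix. The sketch below shows the arithmetic in Python (the thesis itself used R). The confusion matrix is not given in this post, so the counts here are hypothetical: they were chosen only because they sum to 5792 evaluations and reproduce the reported percentages after rounding. The function name is also an assumption.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,      # share of all evaluations classified correctly
        "sensitivity": tp / (tp + fn),      # share of true RAD cases that were detected
        "specificity": tn / (tn + fp),      # share of non-RAD cases correctly ruled out
        "error": (fp + fn) / total,         # share of all evaluations misclassified
    }


# Hypothetical counts (NOT from the thesis), chosen to be consistent with
# the reported figures: 45 RAD-positive and 5747 RAD-negative evaluations.
metrics = classification_metrics(tp=24, fn=21, fp=17, tn=5730)

for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```

With counts like these, accuracy stays very high even though almost half of the RAD cases are missed, because RAD-positive evaluations are a tiny share of the test set. For a rare condition, sensitivity is therefore the more informative figure.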

Recommendations: Storing more clinical information about the patient (risk factors, pre-existing conditions, etc.), structuring the reports or findings with a classification system (e.g. SNOMED CT), fixing the data export issue, and avoiding a file-based database in the software might improve the results of the study and open up further scope for analysis.

Conclusion: ECG data gathered over many years for one purpose can be used to extract lifesaving information through statistical analysis and advanced machine learning techniques. Building such analytical tools will help clinicians make decisions rapidly and effectively.

Prototype video demonstration

To view the details of the prototype tool, click here.