
Rohit Sahoo
Seeking Full-Time AI/ML Roles
Ex-Data Scientist @ World Bank, TCS
3+ YOE in ML, LLMs, MLOps, NLP | CS Graduate @ NEU | Building Scalable AI Solutions
About Me
I am a driven and skilled Data Science professional currently pursuing a Master's degree in Computer Science, specializing in Artificial Intelligence and Data Science at Northeastern University.
With a proven track record as a capable Data Scientist, I possess more than 2.5 years of hands-on work experience in developing and deploying advanced machine learning and deep learning models, excelling in time series forecasting, image segmentation, and NLP. Proficient in Python, R, and Java, I am adept at utilizing key data science libraries including TensorFlow, Keras, PyTorch, NumPy, and Pandas, holding the distinction of a Certified TensorFlow Developer. As a published author and researcher, I bring academic excellence and practical contributions to the table, showcasing a well-rounded and innovative approach to Data Science.
Please feel free to reach me at sahoo.ro@northeastern.edu
Research Interests: Machine Learning, Deep Learning, Time Series Forecasting, Natural Language Processing, Generative AI.
Publications
B. Sandwidi, A. Curmally, R. Sahoo, “Introducing an AI system for automated evaluation of ESIA Reports,” 44th Annual Conference of the International Association for Impact Assessment, 2025 (Associated with International Finance Corporation, The World Bank Group) [Accepted]
Research Report: Using machine learning to improve sleep habits in Dementia patients, Data Study Group team, The Alan Turing Institute, London, UK, “Data Study Group Final Report: UK Dementia Research Institute”. Zenodo, Jul. 05, 2022. doi: 10.5281/zenodo.6798769. [DOI]
Research Report: Rapid identification of plankton using machine learning, Data Study Group team, The Alan Turing Institute, London, UK, “Data Study Group Final Report: Centre for Environment, Fisheries and Aquaculture Science”. Zenodo, Jul. 05, 2022. doi: 10.5281/zenodo.6799166. [DOI]
R. Sahoo, V. Naik, S. Singh, S. Malik, “GANs and VAEs as methods of synthetic data generation and augmentation to enhance heart disease prediction,” International Journal of Engineering and Advanced Technology, vol.11, December 2021. [DOI]
R. Sahoo, M. Kubal, C. Kathale, S. Malik; “Auto-Table-Extract: A System to Identify and Extract Tables from PDF to Excel,” International Journal of Scientific & Technology Research, vol. 9, May 2020. [DOI]
V. Naik, R. Sahoo, S. Mahajan, S. Singh; “Exploration-Exploitation problem in Policy-Based Deep Reinforcement Learning for episodic and continuous environments,” International Journal of Engineering and Advanced Technology, vol. 11, December 2021. [DOI]
S. Malik, A. Tyagi, R. Sahoo; “Machine Learning algorithms for Big Data Analytics including Deep Learning,” in Machine Learning Based Blockchain Technologies for IoTs and Big Data, Fundamentals, methods and applications, Ed. London, UK, The Institution of Engineering and Technology, July 2021. [DOI]
S. Kute, A. K. Tyagi, R. Sahoo and S. Malik, "Building a Smart Healthcare System Using Internet of Things and Machine Learning," in Big Data Management in Sensing: Applications in AI and IoT, River Publishers, 2021, pp. 159-178. [DOI]
S. Malik, R. Sahoo, D. Jain, V. Arora; “Image Processing,” in Computational Science and its Applications, Ed. Florida, USA, Apple Academic Press, January 2024. [DOI]
S. Mahajan, V. Naik, G. Bangar, R. Sahoo; “An Overview of Causal Inference and its Applications in Health-care and Finance using methods such as Bayesian Networks and Granger’s Causality,” International Journal of Emerging Technologies and Innovative Research, vol. 8, October 2021. [DOI]
R. Sahoo, V. Naik; “An Intuitive Sky-High View of Recommendation Systems,” International Research Journal of Engineering and Technology, vol. 7, Issue 2, February 2020. [DOI]
Skills
Portfolio

Natural Language Processing / Transformers / Hugging Face
ArguSense: Elevating Argument Evaluation using NLP
Implemented an NLP model using Longformer to identify writing structures such as thesis statements, evidence, and claims in lengthy argumentative essays, and used BERT to classify argumentative elements. On validation, the model reached an F1 score of 0.633 for structure identification and a log loss of 0.65 for argument classification.
GitHub

Time Series Forecasting / Machine Learning / FbProphet
Improving Sleeping Habits in Dementia Patients
Developed a multivariate time series model for the UK Dementia Research Institute to predict the sleep patterns of patients with dementia by forecasting sleep duration and the number of wakeups. Achieved an RMSE of 39.69 for sleep duration and an RMSE of 0.62 for the number of wakeups.
Research Report - DOI

Python / Deep Learning / Machine Learning / Flask
Auto-Table-Extract: Tabular Data Extraction from PDF documents
Developed machine learning-based software that identifies tables in PDF documents and extracts the tabular information into an Excel sheet. The model achieved an F1 score of 0.89 on bordered tables and 0.85 on borderless or partially bordered tables.
Research Paper - DOI GitHub
More Projects at github.com/rohit-sahoo
Work Experience
Tata Consultancy Services Limited
August 2020 - December 2022
Data Scientist
- Developed and deployed a multivariate time series forecasting model that achieved 94% accuracy in sales forecasting, saving $300k and speeding up decision-making by eliminating a 3-week wait for delayed sales counts from an external vendor.
- Improved accuracy of Machine Learning algorithms by tuning the hyperparameters and handling the imbalanced data.
- Optimized application performance by 65%, reducing run time and cloud service overhead costs, and developed dashboards to visualize critical areas of the client's supply chain.
- Implemented and presented a Proof of Concept for a critical issue faced by the client.
University of Mumbai
August 2019 - April 2020
Research Assistant
(Skills: Machine Learning, Deep Learning, Image Segmentation, Research, Mentoring, Communication Skills)
- Conducted research on brain MRI image segmentation for tumor detection, achieving an IoU (Intersection over Union) of 93.2% on a dataset of brain MRI images with manual FLAIR abnormality segmentation masks.
- Co-authored 3 chapters and 3 research papers on Smart Healthcare, Machine Learning Algorithms, GANs, and VAEs.
- Mentored 10 freshman and sophomore students, developing their AI/ML research skills and helping them with their ideas.

Bombay Stock Exchange Technologies
(MarketPlace Technologies Pvt. Ltd.)
June 2018 - July 2018
Summer Intern - Data Engineer
- Implemented a PySpark-based ETL pipeline on Cloudera Hadoop, reducing data transformation time by 30%.
- Developed reusable Python scripts for data preprocessing.
- Collaborated with cross-functional teams, enhancing data quality and sharing insights.
Education

Northeastern University
Khoury College of Computer Sciences
January 2023 - May 2025
Master of Science in Computer Science
Coursework: Programming Design Paradigm, Database Management Systems, Algorithms, Deep Learning, Natural Language Processing

University of Mumbai
Terna Engineering College
August 2016 - October 2020
Bachelor of Engineering in Computer Engineering
- Co-authored 3 chapters and 3 research papers on Smart Healthcare, Machine Learning Algorithms, GANs, and VAEs.
- Mentored 10 freshman and sophomore students as a Research Assistant under Dr. Shaveta Malik, developing their AI/ML research skills and helping them with their ideas.
Certifications
TensorFlow Developer Certificate by TensorFlow, Google (Exam Score: 25/25 test cases); May, 2021 – May, 2024.
Math for Machine Learning Specialization by Coursera and Imperial College London. (Grade: 97.75%); September, 2021.
Machine Learning Specialization by Coursera and University of Washington. (Grade: 97.21%); May, 2021.
Achievements
Received "on-the-spot" awards for on-time project deployments and the "Contextual Masters" award for applying AI/ML knowledge to solve a client's critical problem, Tata Consultancy Services, October 2022.
Received “The Super-Additives” award from The Alan Turing Institute for championing collaboration and knowledge exchange and for working respectfully and productively as a team member, 2021.
3x Expert on the Kaggle platform and amongst the top 5% of Data Scientists worldwide, 2020.
Ranked in the top 10% in “CodeVita,” the world’s largest coding competition, conducted by Tata Consultancy Services Limited (TCS), 2019.
Received a scholarship, “Secure and Private AI Scholarship” by Udacity and Facebook AI, 2019.
Received a scholarship, “Udacity Technology Scholarship” by Udacity and Bertelsmann, 2019.