Senior Staff AI Scientist

About Me

Hi, my name is Sima and I’m a Senior Staff AI Scientist at the Samsung SDS AI Science Lab in San Jose, CA. Prior to Samsung, I worked as a Data Scientist and Probabilistic Engineer for Applied Materials and GE Global Research. My background is in Bayesian Machine Learning, Deep Learning, and Generative AI. I received my PhD from the Georgia Institute of Technology, where I worked on Topology Optimization for Computer-Aided Engineering (CAE) of microstructures under the supervision of Professor Tequila Harris.

Experience

Samsung SDS

Senior Staff AI Scientist

Feb 2019 - Present

www.samsungsds.com

AI Science Lab, San Jose, CA, USA

Responsibilities: Establishing technical goals; leading task forces and collaborations across Samsung and external entities for technology development and technology transfer; developing products, solutions, and services to secure Samsung’s competitive advantages.
Projects:
- Generative AI:
  - Development of tailored multi-modal LLMs for data in text, image, and tabular formats
  - Uncertainty quantification and model calibration for text-image multi-modal and large language models
- Data-efficient Deep Learning and AI:
  - Development of Representation & Unsupervised Learning algorithms to reduce the labeling needs of Deep Learning models
  - Development of data-efficient alignment methods for LLMs
- Computer Vision:
  - Active Learning and Unsupervised accuracy estimation for classification, detection & semantic segmentation
  - Probabilistic multi-view Active Learning framework for Deep Learning-based 3D semantic segmentation
- Predictive data analytics:
  - Developed sales forecasting models for Samsung business units
  - Optimized marketing strategies for sales campaigns

Applied Materials

Senior Data Scientist

Nov 2017 - Feb 2019

www.appliedmaterials.com

Data Science Group, Santa Clara, CA, USA

Development & deployment of AI solutions to increase the performance and productivity of engineering, services & supply chain teams
Leading AI app productization in collaboration with Software, UI, and DevOps teams & internal or external customers

GE Global Research

Research Engineer

Aug 2014 - Nov 2017

www.geaerospace.com

Probabilistic Lab, San Ramon, CA, USA

Development of Probabilistic Methods & Machine Learning models for calibration & validation, uncertainty quantification, optimization, and meta-modeling/surrogate modeling
Development of Deep Learning & Machine Learning models to optimize engineering design & manufacturing process
Development of Computer Vision defect detection software

Education

Georgia Institute of Technology

Ph.D. & M.S., Mechanical Engineering

2009 - 2014

Research topics: CAE, Topological Characterization & Optimization, Image Processing

Developed periodic surface parametric models to represent the topology of microstructures
Developed voxel-based representations for 3D composite microstructures
Conducted topological characterization of 3D models using graph representations
Conducted topology optimization for voxel representations of 3D volumes using a genetic algorithm (GA)

University of Tehran

B.S., Mechanical Engineering

2001 - 2005

Selected Publications

My full publication list can be found on my Scholar profile. Here is a list of my recent publications:
- Improving instruction following in language models through proxy-based uncertainty estimation, arXiv, Accepted at ICML, (2024)
- Bayesian active learning for semantic segmentation, arXiv, (2024)
- Self-Supervised contrastive representation learning for 3D mesh segmentation, arXiv, Accepted at AAAI, (2023)
- Highly efficient representation and active learning framework for imbalanced data and its application to COVID-19 X-Ray classification, arXiv, NeurIPS Data-Centric AI Workshop, (2021)
- Active learning performance in labeling radiology images is 90% effective, Frontiers in Radiology, (2021)
- PatchNet: Unsupervised object discovery based on patch embedding, arXiv, (2021)
- Modeling and optimizing the impact of process and equipment parameters in sputtering deposition systems using a Gaussian process machine learning framework, IEEE Transactions on Semiconductor Manufacturing, (2021)

Selected Patents

Bayesian semantic segmentation active learning with Beta approximation, 20230368507
Object discovery, 20220383105
Unsupervised representation learning and active learning to improve data efficiency, 20220138935
Long short-term memory anomaly detection for multi-sensor equipment monitoring, 20200104639
Correcting component failures in ion implant semiconductor manufacturing tool, 11348813
Chamber matching with neural networks in semiconductor equipment tools, 11133204
Magnetic mixer, 11097236
Magnetic drive for bioreactor, 10335750
System and method for characterizing conditions in a fluid mixing device, 10682618

Awards & Recognitions

Samsung SDS Circle of Excellence Award, granted by the CEO of Samsung SDSA for outstanding performance, creativity, organizational abilities, and teamwork, (2022)
Best Presentation Award at the Future of Information and Communication Conference (FICC), for the paper titled “Medical Image Labeling via Active Learning is 90% Effective”, (2022)
GE Above & Beyond Award for the development of an automated probabilistic testing framework, (2017)
GE Global Research Aero-Thermal Mechanical System (ATMS) Impact Award for outstanding contribution to the Artificial Lift Optimization project, (2016)
ASME’s International Engineering Congress and Exposition outstanding research award, (2012)

Projects

Proxy-Based Uncertainty Estimation for Improving Instruction Following in Language Models

arXiv link

Accepted at ICML 2024

We defined a novel Uncertainty-aware Reward Model (URM) for the preference training of LLMs, which uses a Bayesian approximation to quantify the uncertainty of paired responses. We experimentally demonstrated that using the URM in LLM training improves both instruction-following capability and the policy optimization objective. URM-based fine-tuning surpasses existing methods by a large margin on benchmarks such as Vicuna and MT-bench.

The code & data for the paper are shared at the P-B-U Git Repo.


Sima Didari