Maksim Siniukov

PhD student at the Intelligent Human Perception Lab

Google Scholar Profile
PDF CV

Education

PhD student at the Intelligent Human Perception Lab, USC Institute for Creative Technologies
University of Southern California, Thomas Lord Department of Computer Science, USC Viterbi School of Engineering, Intelligent Human Perception Lab
2023–present

Master of Science in Computer Science, University of Southern California
USC Viterbi School of Engineering
2023–2025

Bachelor of Computer Science, Lomonosov Moscow State University
Applied Mathematics and Computer Science, Department of Intelligent Information Technologies, Graphics and Media Lab
2019–2023

Scientific Research

Publications

DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion
Maksim Siniukov, Di Chang, Minh Tran, Hongkun Gong, Ashutosh Chaubey, Mohammad Soleymani
IEEE/CVF International Conference on Computer Vision (ICCV) 2025
[Project Page] [arXiv]

SEMPI: A Database for Understanding Social Engagement in Video-Mediated Multiparty Interaction
Maksim Siniukov*, Yufeng Yin*, Eli Fast, Yingshan Qi, Aarav Monga, Audrey Kim, Mohammad Soleymani
ACM International Conference on Multimodal Interaction (ICMI) 2024
[Paper]

DIM: Dyadic Interaction Modeling for Social Behavior Generation
Minh Tran*, Di Chang*, Maksim Siniukov, Mohammad Soleymani
European Conference on Computer Vision (ECCV) 2024
[Project Page] [arXiv]

Towards a Generalizable Speech Marker for Parkinson's Disease
Maksim Siniukov, Ellie Xing, Sanaz Attaripour Isfahani, Mohammad Soleymani
Submitted to EURASIP Journal on Audio, Speech, and Music Processing
[arXiv]

Unveiling the Limitations of Novel Image Quality Metrics
Maksim Siniukov, Dmitriy Kulikov, Dmitriy Vatolin
IEEE 25th International Workshop on Multimedia Signal Processing (MMSP) 2023
[Paper]

Hacking VMAF and VMAF NEG: vulnerability to different preprocessing methods
Maksim Siniukov, Anastasia Antsiferova, Dmitriy Kulikov, Dmitriy Vatolin
4th Artificial Intelligence and Cloud Computing Conference (AICCC) 2021
[Paper]

Applicability limitations of differentiable full-reference image-quality metrics
Maksim Siniukov, Dmitriy Kulikov, Dmitriy Vatolin
Data Compression Conference (DCC) 2023
[Paper] [arXiv]

Work Experience

Graduate Researcher (Sep 2024 – Present)
Intelligent Human Perception Lab, USC Institute for Creative Technologies, Playa Vista, Los Angeles
Developed deep learning methods for human interaction modeling from multimodal data, with applications in affective computing and listener behavior synthesis

  • Developed DiTaiListener (ICCV 2025), a state-of-the-art video diffusion model for listener response generation that achieves a 73.8% improvement in photorealism and supports coherent long video generation. Introduced controllable face video generation with a CTM-Adapter for diffusion transformers
  • Developed the SEMPI model (ICMI 2024), a multimodal deep learning model that combines visual, speech, and language modalities to predict human engagement, achieving a 27% CCC improvement in the visual domain
  • Engineered Dyadic Interaction Modeling (ECCV 2024), a generative AI system that models speaker-listener interactions from 1.5 TB of unlabeled data, reaching state-of-the-art performance in listener behavior generation

Graduate Researcher (Sep 2023 – Sep 2024)
Intelligent Human Perception Lab, USC Institute for Creative Technologies, Playa Vista, Los Angeles
Addressed low-resource tasks via self-supervised learning on large-scale unlabeled data and transfer learning

  • Developed a state-of-the-art speech-based Parkinson's disease (PD) detection model that adapts pre-trained speech encoders to the target task through domain adaptation on unlabeled data
  • Achieved a 10.5% improvement in PD detection F1 score by using speech representation learning to build the speech marker
  • Tools: PyTorch, Python, Artificial Intelligence, Speech Analysis, Audio Signal Processing, Transformer Models

Student Researcher (Feb 2021 – Jun 2023)
Graphics and Media Lab, CMC MSU, Moscow
Computer vision research on video quality metrics and adversarial machine learning

  • Engineered an adversarial ML video processing method (AICCC 2021) that inflates the score of Netflix's VMAF perceptual video quality metric by up to 218%
  • Developed convolutional neural network (CNN)-based frameworks for analyzing image quality metrics (MMSP 2023), which identified vulnerabilities in 10 image quality metrics, including the widely used LPIPS
  • Published 5 papers, including works presented at DCC 2023, MMSP 2023, and AICCC 2021

Teaching Assistant (Aug 2023 – Dec 2023)
Thomas Lord Department of Computer Science, USC, Los Angeles
Assisted and guided 70 graduate students in DSCI 549: Introduction to Computational Thinking and Data Science

Awards

First Prize, Excellent Youth Science & Technology Innovation Project, 39th Beijing Young Science Creation Competition

2nd place, "Scientists of the Future" competition of scientific and technical works by school students, 11th grade, MSU, 2018

3rd place, All-Russian "Junior" competition of scientific works by school students, 9th grade, MEPhI, 2017

Olympiad "Kurchatov", prize-winner

Olympiad "Phystech", winner

Engineering Olympiad for schoolchildren, winner

Olympiad of St. Petersburg State University, prize-winner

Additional education

Stanford University Machine Learning Course
Stanford Online
[Certificate]

Deep Learning Specialization
DeepLearning.AI
[Certificate]

Course Convolutional Neural Networks
Coursera
[Certificate]

Course Neural Networks and Deep Learning
DeepLearning.AI
[Certificate]

Course Structuring Machine Learning Projects
Coursera
[Certificate]

Course Sequence Models
Coursera
[Certificate]

Course Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization
DeepLearning.AI
[Certificate]

Course on the fundamentals of competitive programming, 2018
National University of Science and Technology (MISiS)

English Language Certificate: General English, Certificate of Attendance, 2018
LTC

Course "Media data processing and compression methods"
CMC MSU

Course "Intelligent methods of video processing"
CMC MSU

Skills

Research:
Multimodal Learning, Generative AI, Affective Computing, Human Perception, Human Interaction Modeling, Speech Representation Learning, Artificial Intelligence (AI), Neural Networks, Computer Vision, Video Quality
Technical:
Python, PyTorch, TensorFlow 2, OpenCV, Transformers, scikit-learn, NumPy, diffusers, C++, C, Docker, High-Performance Computing (HPC), Stable Diffusion, Distributed Training (DeepSpeed)
Languages:
Russian (Native), English (C1)