Time series analysis (TSA) is a longstanding research topic in the data mining community with wide real-world significance. Compared with "richer" modalities such as language and vision, which have recently seen explosive development and are densely interconnected, the time-series modality remains relatively underexplored and isolated. We observe that many recent TSA works are forming a new research field, namely Multiple Modalities for TSA (MM4TSA). In general, these works share a common motivation: how can TSA benefit from multiple modalities? This survey is the first to offer a comprehensive review and a detailed outlook for this emerging field. Specifically, we systematically discuss three benefits: (1) reusing foundation models from other modalities for efficient TSA, (2) multimodal extension for enhanced TSA, and (3) cross-modality interaction for advanced TSA. Within each perspective, we further group the works by the modality introduced, including text, images, audio, tables, and others. Finally, we identify the remaining gaps and future opportunities corresponding to the three benefits: selecting which modalities to reuse, combining heterogeneous modalities, and generalizing to unseen tasks. This repository collects the key papers and resources and is kept up to date. For more details, please refer to our survey.
Contributing
🚀 We will continue to update this repo. If you find it helpful, please Star it or Cite Our Survey.
🤝 Contributions are welcome! Please feel free to submit a Pull Request.
Citation
🤗 If you find this survey useful, please consider citing our paper. 🤗
```bibtex
@misc{liu2025timeseriesanalysisbenefit,
  title={How Can Time Series Analysis Benefit From Multiple Modalities? A Survey and Outlook},
  author={Haoxin Liu and Harshavardhan Kamarthi and Zhiyuan Zhao and Shangqing Xu and Shiyu Wang and Qingsong Wen and Tom Hartvigsen and Fei Wang and B. Aditya Prakash},
  year={2025},
  eprint={2503.11835},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2503.11835},
}
```
| Title | Venue |
|-------|-------|
| Mobile traffic prediction in consumer applications: A multimodal deep learning approach | IEEE Transactions on Consumer Electronics |
| Urban informal settlements classification via a transformer-based spatial-temporal fusion network using multimodal remote sensing and time-series human activity data | International Journal of Applied Earth Observation and Geoinformation |
| Spatial-temporal attention-based convolutional network with text and numerical information for stock price prediction | Neural Computing and Applications |
| Traffic congestion prediction using toll and route search log data | IEEE International Conference on Big Data (Big Data) 2022 |
| Understanding city traffic dynamics utilizing sensor and textual observations | AAAI 2016 |
| CityGPT: Empowering urban spatial cognition of large language models | arXiv 24.06 |
| Where Would I Go Next? Large Language Models as Human Mobility Predictors | arXiv 23.08 |
| Leveraging Language Foundation Models for Human Mobility Forecasting | arXiv 22.09 |
2.3.2 Medical Time Series
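Most of the multimodal EHR works collected below share the same high-level recipe: encode each modality with its own backbone, then fuse the representations for a downstream clinical prediction. The following is a minimal late-fusion sketch of that recipe in PyTorch; the encoder choices, feature dimensions, and the binary (e.g., mortality) head are illustrative assumptions rather than the architecture of any specific paper in the table.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Toy late-fusion model: a GRU over the clinical time series plus an MLP
    over a precomputed note embedding, concatenated for a binary outcome."""

    def __init__(self, ts_dim=16, note_dim=384, hidden=64):
        super().__init__()
        self.ts_encoder = nn.GRU(ts_dim, hidden, batch_first=True)
        self.note_encoder = nn.Sequential(nn.Linear(note_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, ts, note_emb):
        _, h = self.ts_encoder(ts)                 # h: (num_layers, batch, hidden)
        fused = torch.cat([h[-1], self.note_encoder(note_emb)], dim=-1)
        return self.head(fused).squeeze(-1)        # logits, shape (batch,)

# Random tensors stand in for real EHR data in this sketch.
model = LateFusionClassifier()
vitals = torch.randn(8, 48, 16)    # 48-hour windows of 16 vital-sign channels
notes = torch.randn(8, 384)        # e.g. a sentence embedding of the admission note
print(model(vitals, notes).shape)  # torch.Size([8])
```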
| Title | Venue |
|-------|-------|
| Addressing asynchronicity in clinical multimodal fusion via individualized chest x-ray generation | NeurIPS 2024 |
| EMERGE: Enhancing Multimodal Electronic Health Records Predictive Modeling with Retrieval-Augmented Generation | CIKM 2024 |
| Improving medical predictions by irregular multimodal electronic health records modeling | ICML 2023 |
| Multimodal pretraining of medical time series and notes | Machine Learning for Health (ML4H) 2023 |
| Learning missing modal electronic health records with unified multi-modal data embedding and modality-aware attention | Machine Learning for Health (ML4H) 2023 |
| MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images | Machine Learning for Health (ML4H) 2022 |
| MIRACLE: Causally-aware imputation via learning missing data mechanisms | NeurIPS 2021 |
| How to leverage the multimodal EHR data for better medical prediction? | EMNLP 2021 |
| Deep multi-modal intermediate fusion of clinical record and time series data in mortality prediction | Frontiers in Molecular Biosciences |
| Integrated multimodal artificial intelligence framework for healthcare applications | npj Digital Medicine |
| PTB-XL, a large publicly available electrocardiography dataset | Scientific Data |
| Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines | npj Digital Medicine |
| Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers | arXiv 25.01 |
| Towards Predicting Temporal Changes in a Patient's Chest X-ray Images based on Electronic Health Records | arXiv 24.09 |
| Multimodal risk prediction with physiological signals, medical images and clinical notes | medRxiv 23.05 |
2.3.3 Financial Time Series
| Title | Venue |
|-------|-------|
| FNSPID: A comprehensive financial news dataset in time series | KDD 2024 |
| Multi-modal deep learning for credit rating prediction using text and numerical data streams | Applied Soft Computing |
| Multimodal multiscale dynamic graph convolution networks for stock price prediction | Pattern Recognition |
| Multi-Modal Financial Time-Series Retrieval Through Latent Space Projections | Proceedings of the Fourth ACM International Conference on AI in Finance |
| Natural language based financial forecasting: a survey | Artificial Intelligence Review |
| Financial analysis, planning & forecasting: Theory and application | Unknown |
| Text2TimeSeries: Enhancing financial forecasting through time series prediction updates with event-driven insights from large language models | arXiv 24.07 |
| Natural language processing and multimodal stock price prediction | arXiv 24.01 |
| Modality-aware Transformer for Financial Time Series Forecasting | arXiv 23.10 |
| Predicting financial market trends using time series analysis and natural language processing | arXiv 23.09 |
| Stock price prediction using sentiment analysis and deep learning for Indian markets | arXiv 22.04 |
| Volatility prediction using financial disclosures sentiments with word embedding-based IR models | arXiv 17.02 |
2.4 Gaps and Outlooks
2.4.1 Heterogeneous Modality Combinations
| Title | Venue |
|-------|-------|
| ImageBind: One embedding space to bind them all | CVPR 2023 |
| LANISTR: Multimodal learning from structured and unstructured data | arXiv 23.05 |
3. TimeAsX
3.1 Time Series as Text
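Many prompting-based entries in the table below (e.g., PromptCast and "Large language models are zero-shot time series forecasters") share one recipe: serialize the numeric history into text, ask an LLM to continue the sequence, and parse the reply back into numbers. Here is a minimal sketch of that recipe; the prompt wording and the parsing step are illustrative assumptions, not any listed paper's exact implementation.

```python
def series_to_prompt(history, horizon, precision=2):
    """Serialize a univariate history into a forecasting prompt."""
    rendered = ", ".join(f"{x:.{precision}f}" for x in history)
    return (
        f"The last {len(history)} observations of a time series are: {rendered}. "
        f"Continue the series with the next {horizon} values, "
        f"comma-separated, with no other text."
    )

def parse_forecast(reply, horizon):
    """Parse the comma-separated continuation back into floats."""
    tokens = reply.replace("\n", " ").split(",")
    return [float(t) for t in tokens if t.strip()][:horizon]

history = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0]
prompt = series_to_prompt(history, horizon=4)
print(prompt)
# reply = call_your_llm(prompt)      # any chat/completions API (hypothetical call)
# forecast = parse_forecast(reply, 4)
```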
| Title | Venue |
|-------|-------|
| Context-Alignment: Activating and Enhancing LLM Capabilities in Time Series | ICLR 2025 |
| ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data | AAAI 2025 |
| Exploiting Language Power for Time Series Forecasting with Exogenous Variables | The Web Conference 2025 |
| LSTPrompt: Large language models as zero-shot time series forecasters by long-short-term prompting | ACL 2024 Findings |
| TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting | ICLR 2024 |
| TEST: Text Prototype Aligned Embedding to Activate LLM's Ability for Time Series | ICLR 2024 |
| Time-LLM: Time Series Forecasting by Reprogramming Large Language Models | ICLR 2024 |
| Are language models actually useful for time series forecasting? | NeurIPS 2024 |
| AutoTimes: Autoregressive time series forecasters via large language models | NeurIPS 2024 |
| S²IP-LLM: Semantic Space Informed Prompt Learning with LLM for Time Series Forecasting | ICML 2024 |
| Large language models are zero-shot time series forecasters | NeurIPS 2023 |
| One Fits All: Power general time series analysis by pretrained LM | NeurIPS 2023 |
| PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting | IEEE Transactions on Knowledge and Data Engineering |
| Chronos: Learning the language of time series | TMLR |
| LLM4TS: Aligning Pre-Trained LLMs as Data-Efficient Time-Series Forecasters | ACM Transactions on Intelligent Systems and Technology |
| Large Language Models are Few-shot Multivariate Time Series Classifiers | arXiv 25.02 |
| TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents | arXiv 25.02 |
| ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning | arXiv 24.12 |
| Large language models can deliver accurate and interpretable time series anomaly detection | arXiv 24.05 |
| Multi-Patch Prediction: Adapting LLMs for Time Series Representation Learning | arXiv 24.02 |
| Lag-Llama: Towards foundation models for time series forecasting | arXiv 23.10 |
3.2 Time Series as Image
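A classic imaging encoding used by entries in the table below (e.g., "Imaging Time-Series to Improve Classification and Imputation") is the Gramian Angular Field, which turns a 1-D series into a 2-D matrix that standard vision backbones can consume; line plots, spectrograms, and 2-D reshaping (as in TimesNet) serve the same purpose in other works. A minimal NumPy sketch of the summation variant (GASF), under the usual min-max rescaling, follows; it illustrates the transform only, not any paper's exact pipeline.

```python
import numpy as np

def gramian_angular_field(x, eps=1e-8):
    """Gramian Angular Summation Field: GASF[i, j] = cos(phi_i + phi_j),
    where phi = arccos of the series rescaled to [-1, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    scaled = 2.0 * (x - lo) / max(hi - lo, eps) - 1.0      # rescale to [-1, 1]
    scaled = np.clip(scaled, -1.0, 1.0)
    comp = np.sqrt(np.clip(1.0 - scaled**2, 0.0, 1.0))     # sin(arccos(scaled))
    # cos(a + b) = cos(a)cos(b) - sin(a)sin(b), applied pairwise via outer products
    return np.outer(scaled, scaled) - np.outer(comp, comp)

series = np.sin(np.linspace(0, 4 * np.pi, 64)) + 0.1 * np.random.randn(64)
image = gramian_angular_field(series)   # shape (64, 64), values in [-1, 1]
print(image.shape)
```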
| Title | Venue |
|-------|-------|
| CAFO: Feature-Centric Explanation on Time Series Classification | KDD 2024 |
| TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis | ICLR 2023 |
| Towards total recall in industrial anomaly detection | CVPR 2022 |
| Deep video prediction for time series forecasting | Proceedings of the Second ACM International Conference on AI in Finance 2021 |
| Forecasting with time series imaging | Expert Systems with Applications |
| Can Multimodal LLMs Perform Time Series Anomaly Detection? | arXiv 25.02 |
| Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting | arXiv 25.02 |
| See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers | arXiv 24.11 |
| Plots Unlock Time-Series Understanding in Multimodal Models | arXiv 24.10 |
| VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters | arXiv 24.08 |
| Training-Free Time-Series Anomaly Detection: Leveraging Image Foundation Models | arXiv 24.08 |
| ViTime: A Visual Intelligence-Based Foundation Model for Time Series Forecasting | arXiv 24.07 |
| Time Series as Images: Vision Transformer for Irregularly Sampled Time Series | arXiv 23.03 |
| An image is worth 16x16 words: Transformers for image recognition at scale | arXiv 20.10 |
Imaging Time-Series to Improve Classification and Imputation