Explores pretraining sequence-to-sequence models with BART and T5, discussing transfer learning, fine-tuning, model architectures, downstream tasks, and performance comparisons, and closing with summarization results and references.
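A minimal sketch of the fine-tuning step described here, assuming the Hugging Face Transformers library and the public "t5-small" checkpoint; the example strings, learning rate, and single-step loop are illustrative, not the configuration used in the source.

```python
# Sketch: one fine-tuning step for summarization with a pretrained T5 checkpoint.
# Assumes the Hugging Face Transformers library; "t5-small" and the example
# text/summary pair are illustrative only.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

document = "summarize: The quick brown fox jumped over the lazy dog near the river bank."
summary = "A fox jumped over a dog."

inputs = tokenizer(document, return_tensors="pt", truncation=True)
labels = tokenizer(summary, return_tensors="pt", truncation=True).input_ids

# Seq2seq fine-tuning: the encoder reads the document, and the decoder is
# trained with teacher forcing to produce the reference summary.
loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```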
Explores the mathematics of language models, covering architecture design, pre-training, and fine-tuning, and emphasizes how the pre-train-then-fine-tune paradigm underpins performance across a range of downstream tasks.
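A standard formulation of the two stages mentioned above: pre-training minimizes a next-token likelihood objective over a large corpus, and fine-tuning continues optimizing the likelihood of task-specific targets starting from the pre-trained parameters. The notation is generic rather than taken from the source.

```latex
% Pre-training: maximize the likelihood of raw text under an autoregressive model.
\mathcal{L}_{\text{pretrain}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t}\right)

% Fine-tuning: continue from the pre-trained \theta on task pairs (x, y),
% e.g. an input sequence x and its target output y.
\mathcal{L}_{\text{finetune}}(\theta) = -\sum_{(x,y)} \log p_\theta\!\left(y \mid x\right)
```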
Explores chemical reaction prediction using generative models and molecular transformers, emphasizing the treatment of molecules as a language and the importance of correctly handling stereochemistry.
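A small sketch of the "molecules as language" idea: reactant SMILES strings are split into tokens and fed to a sequence-to-sequence model that generates the product SMILES. The regex below is a commonly used SMILES tokenization pattern and the esterification example is illustrative, not drawn from the source.

```python
# Sketch: tokenizing reactant SMILES for sequence-to-sequence reaction prediction.
# The regex is a commonly used SMILES tokenization pattern; the example reaction
# (acetic acid + ethanol) is illustrative only.
import re

SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|@@|%\d{2}|[BCNOSPFIbcnosp]|[0-9]|\(|\)|\.|=|#|-|\+|/|\\|@|:|~|\*|>)"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into chemically meaningful tokens.

    Tokens such as @, @@, / and \\ encode stereochemistry, which the model
    must reproduce correctly in the predicted product.
    """
    return SMILES_TOKEN.findall(smiles)

# Reactants on the source side; a transformer decoder would be trained to emit
# the product SMILES (ethyl acetate, CCOC(C)=O) token by token.
reactants = "CC(=O)O.OCC"
print(tokenize_smiles(reactants))
# ['C', 'C', '(', '=', 'O', ')', 'O', '.', 'O', 'C', 'C']
```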
Explains the full Transformer architecture and the self-attention mechanism, highlighting the paradigm shift towards building on fully pretrained models.
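A compact sketch of the core operation referenced here, single-head scaled dot-product self-attention; the toy dimensions and random input are illustrative.

```python
# Sketch: single-head scaled dot-product self-attention.
# Dimensions and the random input are illustrative only.
import torch
import torch.nn.functional as F

def self_attention(x: torch.Tensor, w_q, w_k, w_v) -> torch.Tensor:
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_head) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v      # project into query/key/value spaces
    scores = q @ k.T / k.shape[-1] ** 0.5    # similarity of every position to every other
    weights = F.softmax(scores, dim=-1)      # each row is an attention distribution
    return weights @ v                       # weighted sum of values per position

d_model, d_head, seq_len = 16, 8, 5
x = torch.randn(seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)       # shape: (seq_len, d_head)
```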
Delves into the training and applications of Vision-Language-Action models, emphasizing the role of large language models in robotic control and the transfer of web-scale knowledge to robots, and highlights experimental results and directions for future research.
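A toy sketch of one idea behind Vision-Language-Action models: continuous robot actions are discretized into a small vocabulary of bins so a language model can emit them as ordinary tokens. The 256-bin scheme, value range, and token naming are assumptions for illustration, not the encoding used in the source.

```python
# Sketch: representing a continuous robot action as discrete tokens so a language
# model can generate it. The 256-bin scheme, [-1, 1] range, and token naming are
# illustrative assumptions.
import numpy as np

NUM_BINS = 256  # assumed size of the action-token vocabulary

def action_to_tokens(action: np.ndarray, low: float = -1.0, high: float = 1.0) -> list[str]:
    """Map each action dimension (e.g. end-effector dx, dy, dz, gripper) to a bin token."""
    clipped = np.clip(action, low, high)
    bins = np.round((clipped - low) / (high - low) * (NUM_BINS - 1)).astype(int)
    return [f"<act_{b}>" for b in bins]

def tokens_to_action(tokens: list[str], low: float = -1.0, high: float = 1.0) -> np.ndarray:
    """Invert the mapping: decode bin tokens back to approximate continuous values."""
    bins = np.array([int(t.removeprefix("<act_").removesuffix(">")) for t in tokens])
    return low + bins / (NUM_BINS - 1) * (high - low)

tokens = action_to_tokens(np.array([0.12, -0.40, 0.85, 1.0]))
print(tokens)                    # four bin tokens, one per action dimension
print(tokens_to_action(tokens))  # approximately recovers the original action
```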