This lecture examines how integrative benchmarking can advance system models of human intelligence, with a focus on brain-inspired computing and artificial neural networks. It explores the challenges of evaluating model ideas and the importance of benchmarks for reproducible, fair comparisons. The talk discusses the development of Brain-Score, a platform for operationalizing experimental brain data and codifying metrics of similarity between models and the brain. It also covers the significance of neural alignment tests and the correlation between task performance and Brain-Score. The lecture showcases the impact of models such as CORnet and VOneNet in bridging the neural, behavioral, and computational aspects of vision. It concludes with future directions in building novel system models and creating new benchmarks for the vision and language domains.