Remote sensing visual question answering (RSVQA) opens new opportunities for the use of Earth observation imagery by the general public by enabling human-machine interaction in natural language. Building on recent advances in natural language processing and computer vision, the goal of RSVQA is to answer questions formulated in natural language about a remote sensing image. Language understanding is essential to the success of the task but has not yet been thoroughly examined in RSVQA. In particular, the problem of language biases is often overlooked in the remote sensing community. These biases impact model robustness and lead to wrong conclusions about model performance. The present work aims to highlight the problem of language biases in RSVQA with a threefold analysis strategy: question-only models, adversarial testing, and dataset analysis. Through this analytical study, we observe that language biases in remote sensing are severe, likely due to the specifics of existing remote sensing datasets, e.g., geographical similarities and sparsity, as well as the question generation strategies. With this work, we introduce and advocate for the use of more informative and relative evaluation metrics, sensitive to this issue, to assess both methods and datasets. The development of less-biased datasets, along with visually grounded methods, is a necessity for the future of the promising field of RSVQA: to develop natural-language-based digital assistants for Earth observation, it is of the utmost importance that new RSVQA studies communicate transparently and openly about the serious pitfall of language biases.