Coffee production is an important source of income for millions of farmers across many tropical countries. Given the scale of production, accurate and up-to-date maps of coffee plantations are needed to detect, monitor, and mitigate potential negative environmental impacts. Such maps can be produced from satellite imagery in a cost-effective and replicable way. However, this comes with challenges, including the dynamic spectral signature of coffee, the complex topography in which it is often cultivated, and the high cloud coverage common in tropical countries. In this work, we train a deep learning model to detect coffee plantations in Brazil directly from time series of Sentinel-2 images, alleviating the need for manual feature extraction. We show that the model significantly outperforms models trained on single images, is more robust to clouds and seasonal patterns, and generalizes better to new regions.