Study objectives
Sleep staging is usually performed by manual scoring of polysomnography (PSG), which is expensive, laborious, and poorly scalable. We propose an alternative to PSG for ambulatory sleep staging that combines wearable photoplethysmography (PPG) recorded by a smartwatch with automated scoring.

Methods
We previously trained a deep learning model on public datasets, with the specific aim of generalizing to unseen datasets. In the present work, the model was assessed on two datasets of reflective PPG collected from wrist-worn devices: (a) 68 overnight recordings and (b), for the first time, 493 long-term recordings lasting 24 hours each (170 subjects). Findings were compared with (a) expert-scored sleep stages from PSG for the overnight recordings or (b) actigraphy for the long-term recordings.

Results
For the overnight recordings, the PPG-based model achieved an accuracy of 78.7% and a Cohen’s κ of 0.68 against PSG in a 4-class setup (wake; N1 and N2 combined; N3; REM). It also achieved a sleep/wake accuracy of 94.1% with a Cohen’s κ of 0.71. For the long-term recordings, the model reached a sleep/wake accuracy of 92.5% with a Cohen’s κ of 0.80 when compared with a state-of-the-art actigraphy-based deep learning model.

Conclusions
This state-of-the-art accuracy on wrist-worn devices represents a significant advance for home sleep monitoring and a valuable alternative to PSG-based sleep staging. Additionally, our model showed promising results on long-term ambulatory recordings, paving the way towards continuous ambulatory monitoring of sleep stages and sleep–wake cycles.
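To make the reported metrics concrete, the sketch below shows how epoch-level accuracy and Cohen's κ could be computed once five-stage labels are grouped into the 4-class scheme (wake; N1 and N2 combined; N3; REM) and into sleep/wake. It is a minimal illustration under stated assumptions, not the authors' evaluation code. The label strings ("W", "N1", "N2", "N3", "REM"), the helper names, and the toy epoch sequences are hypothetical. κ follows the standard definition κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies.

```python
from collections import Counter

# Hypothetical label mapping into the 4-class scheme: wake; N1+N2 ("light"); N3; REM.
FOUR_CLASS = {"W": "wake", "N1": "light", "N2": "light", "N3": "N3", "REM": "REM"}


def to_four_class(stages):
    """Collapse five-stage epoch labels into the 4-class scheme."""
    return [FOUR_CLASS[s] for s in stages]


def to_sleep_wake(stages):
    """Treat every non-wake epoch as sleep."""
    return ["wake" if s == "W" else "sleep" for s in stages]


def accuracy(ref, pred):
    """Fraction of epochs on which reference and prediction agree."""
    if len(ref) != len(pred) or not ref:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(r == p for r, p in zip(ref, pred)) / len(ref)


def cohens_kappa(ref, pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    p_o = accuracy(ref, pred)
    n = len(ref)
    ref_counts, pred_counts = Counter(ref), Counter(pred)
    # Chance agreement: product of each rater's marginal class frequencies.
    p_e = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy 30-second epochs (hypothetical, not study data).
    psg = ["W", "N1", "N2", "N2", "N3", "REM", "N2", "W"]
    ppg = ["W", "N2", "N2", "N1", "N3", "N2", "N2", "W"]

    ref4, pred4 = to_four_class(psg), to_four_class(ppg)
    print(accuracy(ref4, pred4))                # 0.875
    print(round(cohens_kappa(ref4, pred4), 3))  # 0.795

    ref2, pred2 = to_sleep_wake(psg), to_sleep_wake(ppg)
    print(accuracy(ref2, pred2), cohens_kappa(ref2, pred2))  # 1.0 1.0
```

Because κ discounts agreement expected by chance, it falls below raw accuracy whenever one class dominates. This helps explain why, in the overnight results, a 94.1% sleep/wake accuracy corresponds to a κ of only 0.71: sleep epochs far outnumber wake epochs in a night. In practice, `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic.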