Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
Characters do not convey meaning, but sequences of characters do. We propose an unsupervised distributional method to learn the abstract meaning-bearing units in a sequence of characters. Rather than segmenting the sequence, this model discovers continuous representations of the "objects" in the sequence, using a recently proposed architecture for object discovery in images called Slot Attention. We train our model on different languages and evaluate the quality of the obtained representations with probing classifiers. Our experiments show promising results in the ability of our units to capture meaning at a higher level of abstraction.