
# Non-negative matrix factorization

## Summary

Non-negative matrix factorization (NMF or NNMF), also called non-negative matrix approximation, is a group of algorithms in multivariate analysis and linear algebra in which a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Moreover, in applications such as the processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. Since the problem is not exactly solvable in general, it is commonly approximated numerically.
NMF finds applications in such fields as astronomy, computer vision, document clustering, missing data imputation, chemometrics, audio signal processing, recommender systems, and bioinformatics.
In chemometrics non-negative matrix factorization has a long history under the name "self modeling curve resolution".
In this framework the vectors in the right matrix are continuous curves rather than discrete vectors.
Early work on non-negative matrix factorizations was also performed by a Finnish group of researchers in the 1990s under the name positive matrix factorization. The method became more widely known as non-negative matrix factorization after Lee and Seung investigated the properties of the algorithm and published some simple and useful algorithms for two types of factorizations.
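The best-known of the Lee-Seung algorithms are the multiplicative update rules for the Frobenius-norm objective. The following is a minimal numpy sketch of those updates; the data matrix, rank, and iteration count are arbitrary toy choices, not values from any particular application:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-negative data matrix V (20 x 12) and a target rank r = 3.
m, n, r = 20, 12, 3
V = rng.random((m, n))

# Random non-negative initialisation of the factors W and H.
W = rng.random((m, r))
H = rng.random((r, n))

err_start = np.linalg.norm(V - W @ H)

eps = 1e-10  # guards against division by zero
for _ in range(200):
    # Lee-Seung multiplicative updates for the Frobenius-norm objective;
    # both updates preserve non-negativity by construction, since they
    # multiply non-negative entries by non-negative ratios.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

err_end = np.linalg.norm(V - W @ H)
```

Because the data here is random rather than genuinely low-rank, the approximation error does not reach zero; it only decreases monotonically, which is the property Lee and Seung proved for these updates.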
Let the matrix V be the product of the matrices W and H:

$$\mathbf{V} = \mathbf{W}\mathbf{H}.$$

Matrix multiplication can be implemented as computing the column vectors of V as linear combinations of the column vectors in W, using coefficients supplied by the columns of H. That is, each column of V can be computed as

$$\mathbf{v}_i = \mathbf{W}\mathbf{h}_i,$$

where $\mathbf{v}_i$ is the i-th column vector of the product matrix V and $\mathbf{h}_i$ is the i-th column vector of the matrix H.
When multiplying matrices, the dimensions of the factor matrices can be significantly lower than those of the product matrix, and it is this property that forms the basis of NMF: NMF generates factors with significantly reduced dimensions compared to the original matrix.
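Both the column-combination view and the storage saving can be checked numerically. In this sketch the shapes are arbitrary toy values chosen so that the factored form is smaller than the full matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# W is m x r and H is r x n, so V = W @ H is m x n.
m, r, n = 6, 2, 5
W = rng.random((m, r))
H = rng.random((r, n))
V = W @ H

# Each column of V is a linear combination of the columns of W,
# with coefficients taken from the corresponding column of H: v_i = W h_i.
col_ok = all(np.allclose(V[:, i], W @ H[:, i]) for i in range(n))

# Storage: V has m * n = 30 entries, the two factors only (m + n) * r = 22.
full_size = m * n
factored_size = (m + n) * r
```

The gap widens quickly in realistic settings, where m and n are large and r is small.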


## Related concepts

Latent Dirichlet allocation

In natural language processing, Latent Dirichlet Allocation (LDA) is a Bayesian network (and, therefore, a generative statistical model) that explains a set of observations through unobserved groups, each of which explains why some parts of the data are similar. LDA is an example of a Bayesian topic model: observations (e.g., words) are collected into documents, and each word's presence is attributable to one of the document's topics. Each document contains a small number of topics.
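The generative process behind LDA can be sketched directly. The vocabulary, topics, and word distributions below are purely illustrative choices, not taken from any corpus:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical vocabulary and per-topic word distributions (illustrative only).
vocab = ["matrix", "factor", "gene", "cell", "ball", "goal"]
topics = np.array([
    [0.40, 0.40, 0.05, 0.05, 0.05, 0.05],  # a "linear algebra" topic
    [0.05, 0.05, 0.40, 0.40, 0.05, 0.05],  # a "biology" topic
    [0.05, 0.05, 0.05, 0.05, 0.40, 0.40],  # a "sports" topic
])

def generate_document(n_words, alpha=0.1):
    # 1. Draw the document's topic mixture from a Dirichlet prior;
    #    a small alpha concentrates the mass on few topics per document.
    theta = rng.dirichlet(alpha * np.ones(len(topics)))
    words = []
    for _ in range(n_words):
        # 2. Draw a topic for this word position, then a word from that topic.
        z = rng.choice(len(topics), p=theta)
        words.append(vocab[rng.choice(len(vocab), p=topics[z])])
    return words

doc = generate_document(10)
```

Inference in LDA runs this process in reverse: given only the documents, it recovers the topic-word distributions and per-document mixtures.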


Multilinear subspace learning

Multilinear subspace learning is an approach for disentangling the causal factors of data formation and performing dimensionality reduction. The dimensionality reduction can be performed on a data tensor that contains a collection of observations that have been vectorized, or observations that are treated as matrices and concatenated into a data tensor. Examples of such data tensors include images (2D/3D), video sequences (3D/4D), and hyperspectral cubes (3D/4D).
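The two ways of organising observations can be made concrete with numpy; the shapes below are illustrative only:

```python
import numpy as np

# A toy "video": 8 frames of 4 x 5 grayscale images (shapes are illustrative).
video = np.zeros((8, 4, 5))

# Vectorised observations: each frame is flattened into a 20-dimensional
# vector and the frames are stacked into a 2D data matrix.
matrix_form = video.reshape(len(video), -1)

# Tensor observations: each frame is kept as a matrix and the frames are
# concatenated into a 3D data tensor, preserving the spatial structure
# that multilinear subspace learning exploits.
tensor_form = video
```

Vectorising discards the row/column structure of each frame; the tensor form keeps it, which is the point of working with multilinear methods.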

## Related courses

This course teaches an overview of modern optimization methods for applications in machine learning and data science. In particular, the scalability of algorithms to large datasets is discussed.

This course introduces the foundations of information retrieval, data mining and knowledge bases, which underpin today's Web-based distributed information systems.

## Related lectures

Matrix Factorization: Information Extraction (CS-423: Distributed information systems)

Explores matrix factorization for information extraction, Bayesian ranking, and relation embeddings.

Entity & Information Extraction (CS-423: Distributed information systems)

Explores knowledge extraction from text, covering key concepts like keyphrase extraction and named entity recognition.

Latent Semantic Indexing (CS-423: Distributed information systems)

Covers Latent Semantic Indexing, a method to improve information retrieval by mapping documents and queries into a lower-dimensional concept space.
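The mapping into a lower-dimensional concept space described above amounts to a truncated SVD of the term-document matrix. A minimal numpy sketch, with a made-up four-term, three-document corpus:

```python
import numpy as np

# Toy term-document matrix: rows are the terms "car", "auto", "ship", "boat";
# columns are three documents (the counts are illustrative only).
A = np.array([
    [2.0, 1.0, 0.0],   # car
    [1.0, 2.0, 0.0],   # auto
    [0.0, 1.0, 2.0],   # ship
    [0.0, 0.0, 2.0],   # boat
])

# Latent Semantic Indexing: truncate the SVD to k latent concepts and
# represent each document in the k-dimensional concept space.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
doc_concepts = (np.diag(s[:k]) @ Vt[:k]).T  # one row per document

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The two "vehicle" documents (0 and 1) should be closer to each other in
# concept space than either is to the "ship/boat" document (2).
sim_01 = cosine(doc_concepts[0], doc_concepts[1])
sim_02 = cosine(doc_concepts[0], doc_concepts[2])
```

Queries are handled the same way: a query vector is projected into the concept space and compared to documents by cosine similarity there.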

## Related publications

Inspired by the human ability to localize sounds, even with only one ear, as well as to recognize objects using active echolocation, we investigate the role of sound scattering and prior knowledge in regularizing ill-posed inverse problems in acoustics. In particular, we study direction of arrival estimation with one microphone, acoustic imaging with a small number of microphones, and microphone array localization. Not only are these problems ill-posed but also non-convex in the variables of interest when formulated as optimization problems. To restore well-posedness, we thus use sound scattering, which we construe as a physical form of regularization. We additionally use standard regularization in the form of appropriate priors on the variables. The non-convexity is then handled with tools such as linearization or semidefinite relaxation.
We begin with direction of arrival estimation. While conventional approaches require at least two microphones, we show how to estimate the direction of one or more sound sources using only one. This is made possible thanks to regularization by sound scattering which we achieve by compact structures made from LEGO that scatter the sound in a direction-dependent manner. We also impose a prior on the source spectra where we assume they can be sparsely represented in a learned dictionary. Using algorithms based on non-negative matrix factorization, we show how to use the LEGO devices and a speaker-independent dictionary to successfully localize one or two simultaneous speakers.
Next, we study acoustic imaging of 2D shapes using a small number of microphones. Unlike in echolocation where the source is known, we show how to image an unknown object using an unknown source. In this case, we enforce a prior on the object using a total variation norm penalty but no priors on the source. We also show how to use microphones embedded in the ears of a dummy head to benefit from the diversity encoded in the head-related transfer function. We then propose an algorithm to jointly reconstruct the shape of the object and the sound source spectrum. We demonstrate the effectiveness of our approach using numerical and real experiments with speech and noise sources.
Finally, the need to know the microphone positions in acoustic imaging and a number of other applications led us to study microphone localization. We assume the positions of the loudspeakers are also unknown and that all devices are not synchronized. In this case, the times of arrival from the loudspeakers to the microphones are shifted by unknown source emission times and unknown sensor capture times. We thus propose an objective that is timing-invariant allowing us to localize the setup without first having to estimate the unknown timing information. We also propose an approach to handle missing data as well as show how to include side information such as knowledge of some of the distances between the devices. We derive a semidefinite relaxation of the objective which provides a good initialization to a subsequent refinement using the Levenberg-Marquardt algorithm. Using numerical and real experiments, we show we can localize unsynchronized devices even in near-minimal configurations.

In this paper we study stationary graphs for functionals of geometric nature defined on currents or varifolds. The point of view we adopt is the one of differential inclusions, introduced in this context in the recent papers (De Lellis et al. in Geometric measure theory and differential inclusions, 2019. arXiv:1910.00335; Tione in Minimal graphs and differential inclusions. Commun Part Differ Equ 7:1–33, 2021). In particular, given a polyconvex integrand f, we define a set of matrices Cf that allows us to rewrite the stationarity condition for a graph with multiplicity as a differential inclusion. Then we prove that if f is assumed to be non-negative, then in Cf there is no T′N configuration, thus recovering the main result of De Lellis et al. (Geometric measure theory and differential inclusions, 2019. arXiv:1910.00335) as a corollary. Finally, we show that if the hypothesis of non-negativity is dropped, one can not only find T′N configurations in Cf, but it is also possible to construct via convex integration a very degenerate stationary point with multiplicity.

2021

Athanasios Nenes, Maria Apostolaki

Polycyclic aromatic hydrocarbons (PAHs) are organic pollutants with proven mutagenic and carcinogenic potential that originate from incomplete combustion and partition to fine particulate matter. Nitro-PAHs and oxy-PAHs are oxidation products of PAHs with increased toxicity compared to their parent members, and may reveal useful information about the aging and oxidation processes of PAHs. In this study, we investigate the seasonal profiles of 31 PAHs and selected oxidized forms such as nitro-PAHs and quinones in Athens, Greece, to understand their sources, levels, toxicity and impacts. PAH levels were found to be significantly higher during winter, particularly during intense pollution episodes, compared to the other seasons. Chemical markers linked to biomass burning (BB) emissions are found to correlate well with the total amount of PAHs (ΣPAHs) during wintertime, strongly indicating that BB emissions are a significant source of PAHs. Positive Matrix Factorization (PMF) analysis showed that more than 50% of ΣPAHs originate from BB emissions and that a "factor" (composed of a specific mixture of PAHs) characterizes biomass burning emissions and can potentially be used as a tracer. Analysis of the PMF series suggests that BB aerosol is much more carcinogenic than the effects of gasoline and diesel combustion combined. Finally, the exposure impact during winter is 9 times higher compared with the other seasons.

2021