**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Person# Meiyue Shao

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related units

Loading

Courses taught by this person

Loading

Related research domains

Loading

Related publications

Loading

People doing similar research

Loading

Related research domains

No results

Courses taught by this person

People doing similar research

No results

No results

Related units (2)

Related publications (7)

Loading

Loading

Loading

The locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm is a popular approach for computing a few smallest eigenvalues and the corresponding eigenvectors of a large Hermitian positive definite matrix A. In this work, we propose a mixed precision variant of LOBPCG that uses a (sparse) Cholesky factorization of A computed in lower precision as the preconditioner. To further enhance performance, a mixed precision orthogonalization strategy is proposed. To analyze the impact of reducing precision in the preconditioner on performance, we carry out a rounding error and convergence analysis of PINVIT, a simplified variant of LOBPCG. Our theoretical results predict and our numerical experiments confirm that the impact on convergence remains marginal. In practice, our mixed precision LOBPCG algorithm typically reduces the computation time by a factor of 1.4-2.0 on both CPUs and GPUs.

Small relative perturbations to the entries of an essentially nonnegative matrix introduce small relative errors to entries of its exponential. It is thus desirable to compute the exponential with high componentwise relative accuracy. Taylor series approximation coupled with scaling and squaring is used to compute the exponential of an essentially nonnegative matrix. An a priori componentwise relative error bound of truncation is established, from which one can choose the degree of Taylor series expansion and the scale factor so that the exponential is computed with desired componentwise relative accuracy. To reduce the computational cost, the degree of the Taylor series expansion is chosen small, while the scale factor is chosen sufficiently large to achieve the desired accuracy. The rounding errors in the squaring stage are not serious as squaring is forward stable for nonnegative matrices. We also establish a posteriori componentwise error bounds and derive a novel interval algorithm for the matrix exponential of an essentially nonnegative matrix. Rounding error analysis and numerical experiments demonstrate the efficiency and accuracy of the proposed methods.

Bo Kågström, Daniel Kressner, Meiyue Shao

Library software implementing a parallel small-bulge multishift QR algorithm with Aggressive Early Deflation (AED) targeting distributed memory high-performance computing systems is presented. Starting from recent developments of the parallel multishift QR algorithm [Granat et al., SIAM J. Sci. Comput. 32(4), 2010], we describe a number of algorithmic and implementation improvements. These include communication avoiding algorithms via data redistribution and a refined strategy for balancing between multishift QR sweeps and AED. Guidelines concerning several important tunable algorithmic parameters are provided. As a result of these improvements, a computational bottleneck within AED has been removed in the parallel multishift QR algorithm. A performance model is established to explain the scalability behavior of the new parallel multishift QR algorithm. Numerous computational experiments confirm that our new implementation significantly outperforms previous parallel implementations of the QR algorithm.