Adi Akavia, Hayim Shaul, Mor Weiss, and Zohar Yakhini, 2019
[full version, proceedings version]
Abstract
Developing machine learning models from federated training data, containing many independent samples, is an important task that can significantly enhance the potential applicability and prediction power of learned models. Since single users, like hospitals or individual labs, typically collect data-sets that do not support accurate learning with high confidence, it is desirable to combine data from several users without compromising data privacy. In this paper, we develop a privacy-preserving solution for learning a linear regression model from data collectively contributed by several parties (``data owners''). Our protocol is based on the protocol of Giacomelli et al. (ACNS 2018) that utilized two non colluding servers and Linearly Homomorphic Encryption (LHE) to learn regularized linear regression models. Our methods use a different LHE scheme that allows us to significantly reduce both the number and runtime of homomorphic operations, as well as the total runtime complexity. Another advantage of our protocol is that the underlying LHE scheme is based on a different (and post-quantum secure) security assumption than Giacomelli et al. Our approach leverages the Chinese Remainder Theorem, and Single Instruction Multiple Data representations, to obtain our improved performance. For a 1000 x 40 linear regression task we can learn a model in a total of 3 seconds for the homomorphic operations, compared to more than 100 seconds reported in the literature. Our approach also scales up to larger feature spaces: we implemented a system that can handle a 1000 x 100 linear regression task, investing minutes of server computing time after a more significant offline pre-processing by the data owners. We intend to incorporate our protocol and implementations into a comprehensive system that can handle secure federated learning at larger scales.