Secure Cloud Data Analytics with Trusted Processors

Fahad Shaon

Abstract

Over the last few years, storing data in cloud-based services has become very popular because of the easy management and monetary advantages of cloud computing. Recent developments have shown that such data can be leaked through various attacks. To address some of these attacks, encrypting sensitive data before sending it to the cloud has emerged as an important protection mechanism. Still, indexing, querying, and running complex data analytics tasks on encrypted data remain important challenges. In this dissertation, we address some of these encrypted data processing challenges using two different but complementary approaches. First, we explore what kind of data querying functionality we can provide for encrypted data even when we have no support from the server. Then, we provide solutions for use cases where the cloud server provides a trusted processor for processing some of the encrypted data.

For cloud deployments where there is only limited support from the cloud server, we provide a new searchable encryption scheme, i.e., a type of encryption technique that allows querying on encrypted data. Cloud services such as Dropbox, Box, and Google Drive fall into this category: they allow simple data retrieval but do not provide computational support (i.e., running arbitrary code on the encrypted data). Furthermore, we provide an extensible framework for supporting complex search queries over encrypted multimedia data. Before any data is uploaded to the cloud, important features are extracted to support different query types (e.g., extracting facial features to support face recognition queries), and complex queries are converted to a series of object retrieval tasks for the cloud service. Next, we explore the setting where the cloud servers provide support for processing encrypted data using trusted processors. In this setting, we can execute code in a trusted processor in a secure manner, i.e., an adversary cannot tamper with the code without detection, and data is always encrypted outside the trusted processor.
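To make the storage-only setting concrete, the following is a minimal sketch of a keyword-token index, assuming a service that can only store and return objects by key. It is not the scheme proposed in the dissertation: the helper names (`keyword_token`, `build_index`, `search`) are hypothetical, the tokens are derived with HMAC-SHA256 as an illustrative choice, and the document IDs in the posting lists are left unencrypted for brevity, which a real scheme would not do.

```python
import hashlib
import hmac
import os


def keyword_token(key: bytes, keyword: str) -> str:
    """Derive a deterministic, opaque search token for a keyword."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key: bytes, documents: dict) -> dict:
    """Client side: map each keyword token to the IDs of documents containing it."""
    index = {}
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            index.setdefault(keyword_token(key, word), set()).add(doc_id)
    # Sorted lists make the stored object deterministic and easy to serialize.
    return {token: sorted(ids) for token, ids in index.items()}


def search(key: bytes, stored_index: dict, keyword: str) -> list:
    """Client side: a query becomes a single lookup by token (an object retrieval)."""
    return stored_index.get(keyword_token(key, keyword), [])


# Hypothetical example data; the key never leaves the client.
secret_key = os.urandom(32)
docs = {"d1": "encrypted cloud storage", "d2": "trusted processor in the cloud"}
index = build_index(secret_key, docs)

print(search(secret_key, index, "cloud"))   # ['d1', 'd2']
print(search(secret_key, index, "matrix"))  # []
```

The storage service only ever sees tokens, never the keywords themselves, and answering a query requires nothing beyond fetching an object by key. This simple construction still reveals which tokens are requested and how often, which is the kind of leakage that more careful searchable encryption schemes aim to limit.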

Over the past few years, efficient and secure data analytics tools (e.g., map-reduce frameworks, machine learning models, and SQL querying) that can be executed over encrypted data using trusted processors have been developed. However, these prior efforts do not provide a simple, secure, high-level language-based framework suitable for enabling generic data analytics by non-security experts who are not familiar with important security concepts such as "oblivious execution". We therefore provide such a framework, which allows data scientists to perform data analytics tasks with secure processors using a Python/Matlab-like high-level language. We also perform block size optimization and provide security guarantees for data obliviousness.
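As a rough illustration of what "oblivious execution" means, the sketch below reads one element of an array at a secret position while touching every element in the same order, so the sequence of memory accesses does not depend on the secret. The function names are hypothetical and this is not the dissertation's framework. Python itself gives no constant-time guarantees, so the code only illustrates the access pattern, not a hardened implementation.

```python
def oblivious_select(condition: int, if_true: int, if_false: int) -> int:
    """Return if_true when condition is 1, else if_false, without an if/else branch."""
    mask = -condition  # 1 becomes -1 (all bits set), 0 stays 0 (no bits set)
    return (if_true & mask) | (if_false & ~mask)


def oblivious_read(values: list, secret_index: int) -> int:
    """Read values[secret_index] while visiting every element exactly once."""
    result = 0
    for position, value in enumerate(values):
        is_target = int(position == secret_index)
        # Every iteration performs the same operations regardless of the secret.
        result = oblivious_select(is_target, value, result)
    return result


print(oblivious_read([10, 20, 30, 40], 2))  # 30
```

A direct `values[secret_index]` lookup would reveal the index to anyone observing memory accesses outside the trusted processor. The linear scan costs more work but produces the same access trace for every index.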

Similarly, systems that access an encrypted inverted index using trusted processors have been developed before. However, none of these works propose a mechanism to build the index securely in the cloud; all of them assume that some form of unencrypted inverted index is already available. Building an inverted index can be a very memory-intensive task for big data on memory-constrained platforms. We therefore propose a system that builds the encrypted inverted index in the cloud using trusted processors, for text as well as multimedia data, in an oblivious and secure manner. We design our index to support TF-IDF-based ranked document retrieval. Our system also supports indexing for answering complex queries such as face recognition.
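For readers unfamiliar with TF-IDF ranked retrieval over an inverted index, here is a minimal plaintext sketch. It is neither encrypted nor oblivious, and it is not the system described above. The function names are hypothetical, and the weighting (raw term frequency times log(N / document frequency)) is one common variant chosen for illustration, since the abstract does not specify the exact formula.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(documents):
    """Map each term to a posting dict of {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index


def rank(index, total_docs, query):
    """Score documents by the sum of tf * idf over the query terms, highest first."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(total_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Sort by descending score, breaking ties by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical example corpus.
docs = {
    "d1": "secure index secure search",
    "d2": "cloud index",
    "d3": "cloud storage",
}
index = build_inverted_index(docs)
for doc_id, score in rank(index, len(docs), "secure index"):
    print(doc_id, round(score, 3))
```

The index holds one posting list per distinct term, which is why construction becomes memory-intensive on large corpora and motivates building it in blocks inside a memory-constrained trusted processor.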

Cite

@book{shaon2019secure,
  title={Secure Cloud Data Analytics with Trusted Processors},
  author={Shaon, Fahad},
  year={2019},
  publisher={The University of Texas at Dallas}
}

Tags

Secure Analytics, Matrix, BigMatrix, SGX-BigMatrix, LargeMatrix, Encrypted Analytics, Trusted Processor, SGX, Encrypted Storage, Encrypted Search, Searchable Encryption, SSE, Information Retrieval, Encrypted Index