International Journal of Research In Science & Engineering
Volume: 1 Special Issue: 2
e-ISSN: 2394-8299
p-ISSN: 2394-8280
RETRIEVAL OF TOP-K FILES BY MULTI-KEYWORD USING TRSE
SCHEME OVER ENCRYPTED CLOUD DATA
Manoj Kumar R1, Maria Navin J R2
1 PG Student, Dept. of C.S.E., S.V.C.E., Bengaluru. [email protected]
2 Asst. Professor, Dept. of CS&E, S.V.C.E., Bengaluru. [email protected]
ABSTRACT
Cloud computing describes a means of delivering information to the end user as a service. Storing
sensitive information in the cloud can cause privacy problems. Data encryption protects data security to some
extent. The data owner has a collection of n files to outsource onto the cloud server in encrypted form. To achieve
this, the data owner builds a searchable index from a collection of keywords and then outsources both the
encrypted index and the encrypted files onto the cloud server. The authorized data user first generates a query
request, and the cloud server sends the relevant files to the data user. To eliminate information leakage, a
two-round searchable encryption (TRSE) scheme has been proposed that supports top-k multi-keyword retrieval.
Homomorphic encryption and the vector space model are employed for ranking. Since ranking is done on the user
side based on order-preserving encryption, the efficiency of file retrieval is improved. The files are ranked in
order of relevance to the user's interest, and only the files with the highest relevance are sent back to the user.
Keywords: Cloud, multi-keyword, vector space model, homomorphic encryption.
1. INTRODUCTION
Data outsourcing is an advanced data service that lets users store sensitive data in a storage space. The
sensitive data is managed on remote servers maintained by a trusted third party. The distributed nature of data
management services gives assurances that faulty behaviour can be detected and corrected. This is relevant for
outsourced data frameworks in which data owners place their sensitive data into specialized storage spaces. Data
owners outsource their data without assurances of confidentiality or security. Confidentiality can be achieved by
encrypting the data, but the challenge is how to enable search and retrieval over such encrypted data. Several
searchable symmetric encryption (SSE) schemes [1] enable both search and retrieval over encrypted data, but each
has its own drawbacks. Traditional SSE schemes allow users to retrieve the cipher text in a secure way, but most of
them are based on Boolean keyword search [2], [3]. Boolean keyword search returns results based only on whether a
keyword exists in a file, without considering how relevant each returned file is to the queried keyword. Some SSE
schemes based on "order-preserving encryption" breach the privacy of sensitive information and allow only a single
keyword in the search query [4], [5]. Some multi-keyword search schemes enable secure indexing and ranked searching
[6], [7], but they struggle to strike a balance between security and efficiency. A new searchable encryption scheme
called TRSE has therefore been proposed, which uses recent techniques from cryptography and from the information
retrieval (IR) community. It returns as the retrieval result the files most relevant to the user's requirement: files
are ranked in order of relevance, and only the files with high relevance are sent back to the user. In TRSE, the
concepts of homomorphic encryption and the vector space model are introduced. Since the search operation is performed
over encrypted data, information leakage can be eliminated and data can be searched and retrieved efficiently.
2. RELATED WORK
Searchable symmetric encryption (SSE) allows a client to encrypt data in such a way that the client can later
search and retrieve it from the storage server. Given a query, the server can search over the encrypted data
and return the matching encrypted files. An SSE scheme is considered effective if: (1) the cipher text reveals no
information about the data; (2) the cipher text together with a search query reveals at most the result of the
search; (3) search queries can be generated only by using the secret key.
Existing searchable encryption schemes [1], [6] allow a user to search securely over encrypted data by using
keywords without decrypting it. When applied directly to large-scale information outsourcing services, these
techniques have the following disadvantage: most of them support only Boolean keyword search or single-keyword
search without ranking, and therefore do not return the most relevant data.
Ranked search greatly enhances system usability by returning the matching files in ranked order. One
simple ranked keyword search is implemented using order-preserving symmetric encryption (OPSE).
Order-preserving encryption (OPE) [8] is an encryption scheme whose encryption function preserves the numerical
ordering of the plaintexts. OPE not only permits efficient range queries, but also allows indexing and query
processing to be done as efficiently as for unencrypted data. The main drawback of an OPE scheme is that it
inevitably leaks private information: even though the data is in encrypted form, the server or an attacker can
still obtain information through statistical analysis. This leakage of information is termed statistical leakage.
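The leakage described above can be illustrated with a small sketch. The toy cipher below is not the OPE construction of [8]; it is a hypothetical stand-in (a secret, strictly increasing lookup table built with Python's non-cryptographic `random` module) chosen only to show what any order-preserving mapping reveals. All function names are assumptions made for this illustration.

```python
import random

def ope_keygen(domain_size, seed):
    """Build a secret, strictly increasing table: plaintext m -> table[m]."""
    rng = random.Random(seed)          # NOT cryptographically secure; illustration only
    table, current = [], 0
    for _ in range(domain_size):
        current += rng.randint(1, 1000)  # positive gaps keep the mapping strictly increasing
        table.append(current)
    return table

def ope_encrypt(table, m):
    """Encrypt a plaintext in range(len(table)) by table lookup."""
    return table[m]

# Relevance scores of four files, encrypted before outsourcing.
scores = [5, 1, 9, 5]
table = ope_keygen(16, seed=2024)
cts = [ope_encrypt(table, s) for s in scores]

# Without the key, the server can still rank the files by ciphertext ...
server_order = sorted(range(len(cts)), key=cts.__getitem__)
true_order = sorted(range(len(scores)), key=scores.__getitem__)
# ... and can see which files share the same score (frequency information).
same_score = cts[0] == cts[3]
```

The server recovers the full ranking and the equality pattern of the scores, which is the kind of information a statistical attack builds on.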
Homomorphic encryption is proposed to improve security without sacrificing efficiency. Ranked search
improves system usability by returning the matching files in ranked order. This paper proposes an encryption
scheme with ranked results for the queried data, so that only relevant data is returned.
3. PROPOSED SYSTEM
3.1 System Model
Figure 1 illustrates the architecture of the cloud storage system. It consists of the following entities.
Fig-1: Architecture of Proposed Work
Data owner: The data owner encrypts the keywords and the files and stores them in the cloud. The data owner may
store the files in private or public clouds.
Data users: Data users are the users to whom the data owner has granted rights to access the files stored in the
cloud. The users' authentication details are verified by the cloud server.
Cloud server: The cloud server is where the encrypted files and keywords are stored. It verifies user
authentication and allows the cloud service provider to perform operations on the encrypted data.
3.2 TRSE Design
Existing systems use server-side ranking, whereas the proposed system uses user-side ranking of files. In our
scheme, a user can decrypt the data only if the user's identity is verified. The data owner is responsible for
assigning access policies to different kinds of users. Our mechanism works as follows. The data owner encrypts the
searchable index and the data and outsources them to the cloud. The Trusted Centre is responsible for issuing tokens
to users, certifying their identity and access policy. When the cloud receives a query consisting of keywords, it calculates the
scores from the encrypted index and then returns the encrypted scores to the user. The data user then decrypts the
scores and picks out the highest-ranked files [6]. These files are requested from the cloud, and once received they
are decrypted and used by the user. In this scheme, ranking is done on the data user side while the score
calculations are performed on the server side, which reduces the computational cost for the data user. Because the
server must do this work without seeing the plaintext, we need an encryption scheme that supports operations directly
on the encrypted text. Such a scheme is called homomorphic encryption. The vector space model uses two attributes,
term frequency and document frequency [9]. Term frequency is the number of occurrences of a term in a file.
Document frequency is the number of files that contain the term. The vector space model is used to score a file
against a multi-keyword query. Files are ranked in order of these scores, so that the most relevant files can be
obtained.
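The scoring idea can be sketched on plaintext as follows. The paper names term frequency and document frequency but does not state a weighting formula, so the logarithmic TF x IDF weighting used here is an assumption (one common choice in information retrieval), and `tf_idf_scores` is a hypothetical helper name.

```python
import math
from collections import Counter

def tf_idf_scores(files, query):
    """Score each file against a multi-keyword query (assumed TF x IDF weighting)."""
    n = len(files)
    tokenized = {name: text.lower().split() for name, text in files.items()}
    # Document frequency: number of files that contain each query term.
    df = {t: sum(1 for words in tokenized.values() if t in words) for t in query}
    scores = {}
    for name, words in tokenized.items():
        tf = Counter(words)  # term frequency: occurrences of each term in this file
        score = 0.0
        for t in query:
            if df[t] and tf[t]:
                score += (1 + math.log(tf[t])) * math.log(1 + n / df[t])
        scores[name] = score
    return scores

files = {
    "f1": "cloud storage keeps encrypted cloud data",
    "f2": "ranking of search results",
    "f3": "encrypted search over cloud data",
}
scores = tf_idf_scores(files, ["cloud", "encrypted"])
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here `f1` ranks above `f3` because it mentions "cloud" twice, and `f2` scores zero because it contains neither query keyword.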
3.2.1 Homomorphic Encryption scheme
Homomorphic encryption allows specific types of computations to be carried out on the cipher text. To reduce
the computational cost on the user side, the computing work is performed on the server side, so we need an encryption
system that guarantees both performance and security on the server side. Homomorphic encryption [7]
allows computation on cipher text, without any knowledge of the plaintext, that still yields the correct result.
The key property of homomorphic encryption is that decrypting the result of a computation on cipher texts gives the
same output as performing that computation on the plaintexts. On the basis of this homomorphic property, the
encryption scheme can be defined in three stages: KeyGen, Encrypt, Decrypt.
• KeyGen: The public key (PK) and secret key (SK) are generated randomly.
• Encrypt: A message is encrypted using the secret key.
• Decrypt: A cipher text is decrypted to recover the plain text.
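The three stages can be illustrated with a toy symmetric scheme over the integers. This is a sketch under stated assumptions, not the scheme used by the authors: the parameter sizes, the plaintext modulus `X`, and the function names are all hypothetical, there is no separate public key, and the construction is far too simple to be secure. It only shows that sums and products of cipher texts decrypt to sums and products of plaintexts while the accumulated noise stays below the secret key.

```python
import secrets

X = 1 << 16  # assumed plaintext space: integers in [0, X)

def keygen(bits=256):
    """KeyGen: the secret key is a random odd integer with `bits` bits."""
    return secrets.randbits(bits) | (1 << (bits - 1)) | 1

def encrypt(p, m, noise_bits=16, q_bits=64):
    """Encrypt: hide m under small noise X*r and a random multiple of the key p."""
    r = secrets.randbits(noise_bits)
    q = secrets.randbits(q_bits)
    return m + X * r + p * q

def decrypt(p, c):
    """Decrypt: reduce modulo the key, then modulo the plaintext space."""
    return (c % p) % X

sk = keygen()
a, b = encrypt(sk, 7), encrypt(sk, 5)
sum_plain = decrypt(sk, a + b)    # addition on cipher texts
prod_plain = decrypt(sk, a * b)   # multiplication on cipher texts
```

Decryption stays correct only while the noise term is smaller than the key, which is why the key is much longer than the noise in this sketch.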
3.2.2 Framework of TRSE
The framework of TRSE consists of the following algorithms: Setup, Indexbuild, Trapdoorreq, Scoring and
Ranking.
• Setup: The data owner generates the private and public keys for homomorphic encryption.
• Indexbuild: The data owner encrypts the searchable index and the data and stores them in the cloud.
• Trapdoorreq: The data user generates a multi-keyword query, which is encrypted into a trapdoor and sent to
the cloud.
• Scoring: When the cloud receives the trapdoor, it computes the score of each file and returns the encrypted
scores to the data user.
• Ranking: The data user decrypts the scores and then requests the files with the highest scores from the cloud.
The setup phase is run only once by the data owner, to initialize one particular application.
For security purposes, the majority of the work in the setup phase should be performed by the data owner, and the
other operations in the cloud. The framework of TRSE consists of two phases: the Initialization phase and the
Retrieval phase.
Initialization Phase:
The Initialization phase consists of Setup and Indexbuild, in which the cloud server and the data owner are
involved. The details of the Initialization phase are as follows:
1. The data owner calls KeyGen to generate the public key and secret key for the homomorphic encryption scheme.
2. The data owner extracts the collection of keywords from the files and builds a searchable index for
each file.
3. The data owner encrypts the searchable index and outsources it to the cloud server.
4. The data owner encrypts the files with a cryptographic scheme and then outsources them to the cloud server.
5. The access policy and identity information are given to the trusted centre to control data user access.
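Steps 2 and 3 above can be sketched as follows. The paper does not specify the index layout, so this assumes one term-frequency vector per file over a shared keyword dictionary, with each entry encrypted separately. The helper names are hypothetical, and `encrypt` is passed in as a parameter so that any Encrypt stage can be plugged in; the usage lines use the identity function purely to make the vectors visible.

```python
from collections import Counter

def extract_keywords(files):
    """Step 2a: collect the keyword dictionary (here simply every distinct word)."""
    return sorted({word for text in files.values() for word in text.lower().split()})

def build_index(files, dictionary, encrypt):
    """Steps 2b and 3: one term-frequency vector per file, encrypted entry by entry."""
    index = {}
    for file_id, text in files.items():
        tf = Counter(text.lower().split())
        index[file_id] = [encrypt(tf[word]) for word in dictionary]
    return index

files = {"f1": "cloud data cloud", "f2": "secure search"}
dictionary = extract_keywords(files)

# Identity "encryption" is a placeholder for inspection only; a real deployment
# would pass the homomorphic Encrypt stage instead.
plain_index = build_index(files, dictionary, encrypt=lambda value: value)
```

With this layout the vector for `f1` has a 2 in the position of "cloud", and the server later needs only component-wise products and sums to score a query.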
Retrieval Phase:
The Retrieval phase consists of Tokengen, Trapdoorreq, Scoring and Ranking, in which the cloud server
and the data user are involved. Because computing power on the user side is limited, the computing work is given
to the server side. At the same time, the confidentiality and privacy of sensitive information must not be
violated. As discussed previously, the ranking should be left to the user side while the cloud server still does
most of the work without learning any sensitive information. The details of the Retrieval phase are as follows:
1. The data user requests authorization from the trusted centre by providing identity information for access to the
cloud server.
2. The trusted centre checks the identity information and generates a token that lets that particular user access
the cloud.
3. The data user chooses a set of keywords to search for and generates the query. The query is encrypted
into a trapdoor, which the user sends to the cloud server.
4. For each file vector, the cloud server computes the score and then returns the result vector to the data user.
5. The data user decrypts the scores, obtains the identifiers of the top-k highest-scoring files and sends them to
the cloud server.
6. The cloud server returns the encrypted files to the data user.
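Steps 3 to 5 can be sketched end to end as below. This is a minimal illustration under several assumptions: the encrypted index holds one term-frequency vector per file, the trapdoor is an encrypted 0/1 query vector over the same dictionary, the score is their inner product, the data user holds the same secret key as the data owner, and the toy integer cipher stands in for the homomorphic scheme (it is not secure). All names are hypothetical.

```python
import secrets

X = 1 << 16  # assumed plaintext space of the toy cipher

def keygen(bits=256):
    return secrets.randbits(bits) | (1 << (bits - 1)) | 1

def encrypt(p, m):
    return m + X * secrets.randbits(16) + p * secrets.randbits(64)

def decrypt(p, c):
    return (c % p) % X

def trapdoor(p, dictionary, keywords):
    """Step 3 (user): encrypt a 0/1 vector marking the queried keywords."""
    return [encrypt(p, 1 if word in keywords else 0) for word in dictionary]

def score_on_server(enc_index, enc_query):
    """Step 4 (server): inner product computed on cipher texts only."""
    return {fid: sum(a * b for a, b in zip(vec, enc_query))
            for fid, vec in enc_index.items()}

def rank_on_user(p, enc_scores, k):
    """Step 5 (user): decrypt the scores and keep the k best file identifiers."""
    plain = {fid: decrypt(p, c) for fid, c in enc_scores.items()}
    return sorted(plain, key=plain.get, reverse=True)[:k], plain

sk = keygen()
dictionary = ["cloud", "search", "secure"]
tf_vectors = {"f1": [2, 0, 0], "f2": [0, 1, 1], "f3": [3, 1, 0]}
enc_index = {fid: [encrypt(sk, v) for v in vec] for fid, vec in tf_vectors.items()}

enc_query = trapdoor(sk, dictionary, {"cloud", "search"})
enc_scores = score_on_server(enc_index, enc_query)
top_ids, plain_scores = rank_on_user(sk, enc_scores, k=2)
```

The server never calls `decrypt` and never sees `sk`; it only multiplies and adds cipher texts, and the identifiers in `top_ids` are what the user would send back in the second round.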
The trusted centre plays an important role in identifying users. During the file retrieval process, the data user
requests cloud access from the trusted centre. If the user's identity information is verified, the trusted centre
provides a token to the data user for access to the cloud, and the data user's access information is also shared
with the cloud. If the identity information provided by the data user cannot be verified, the data user is not
allowed to access the cloud. The communication overhead will be very large if the encrypted trapdoor is too large.
To solve this problem and improve efficiency, some of the security of the search scheme may have to be traded away
unless a new encryption scheme that provides a more reasonable cipher text size becomes available.
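The paper does not describe the token format, so the following is only one possible sketch of the token exchange: the trusted centre and the cloud are assumed to share a secret key, and the token is the user's identity, access policy and expiry time authenticated with an HMAC. The function names, the `|`-separated layout and the expiry field are assumptions made for illustration; identifiers containing `|` are not handled.

```python
import hashlib
import hmac
import time

def issue_token(shared_key, user_id, policy, ttl=3600, now=None):
    """Trusted centre: bind identity, policy and expiry with an HMAC tag."""
    issued = time.time() if now is None else now
    payload = f"{user_id}|{policy}|{int(issued + ttl)}"
    tag = hmac.new(shared_key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{tag}"

def verify_token(shared_key, token, now=None):
    """Cloud server: accept only untampered, unexpired tokens."""
    try:
        user_id, policy, expires, tag = token.split("|")
    except ValueError:
        return False
    payload = f"{user_id}|{policy}|{expires}"
    expected = hmac.new(shared_key, payload.encode(), hashlib.sha256).hexdigest()
    current = time.time() if now is None else now
    # The tag is checked first, so `expires` is parsed only for genuine tokens.
    return hmac.compare_digest(tag, expected) and current < int(expires)

key = b"key shared by trusted centre and cloud (hypothetical)"
token = issue_token(key, "alice", "read", ttl=3600, now=1000)
accepted = verify_token(key, token, now=1500)
expired = verify_token(key, token, now=5000)
forged = verify_token(key, token.replace("read", "write"), now=1500)
```

A user whose identity is not verified simply never receives a token, and a modified or expired token is rejected by the cloud.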
The ranking is performed using the top-k select algorithm [9]. Here k denotes the number of
files most relevant to the user's interest. Our proposed system reduces information leakage. We
employ homomorphic encryption to preserve data privacy.
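A minimal sketch of top-k selection on the user side, after the scores have been decrypted, is shown below. It uses Python's standard `heapq.nlargest` rather than the specific selection procedure of [9], and the score values are made up for illustration.

```python
import heapq

def top_k(scores, k):
    """Return the k (file_id, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

decrypted_scores = {"f1": 0.42, "f2": 1.75, "f3": 0.0, "f4": 0.98}
best = top_k(decrypted_scores, k=2)
```

Only the identifiers in `best` need to be sent to the cloud, so the number of files transferred is bounded by k rather than by the size of the collection.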
4. SCREEN SHOTS
4.1 Home Page
4.2 Registration page
4.3 Owner Login
4.4 File upload
4.5 Search keyword
5. CONCLUSION
In this paper, we address the problem of multi-keyword ranked search over encrypted cloud data
and propose a TRSE scheme employing homomorphic encryption, which fulfils the security requirement of
multi-keyword top-k retrieval over encrypted cloud data. The homomorphic encryption algorithm preserves the privacy
of the data. The similarity-based relevance ranking scheme we define improves the retrieval of files. Finally, the
security and confidentiality of data are maintained in the cloud.
References
[1] R. Curtmola, J.A. Garay, S. Kamara, and R. Ostrovsky, “Searchable Symmetric Encryption: Improved
Definitions and Efficient Constructions”, Proc. ACM 13th Conf. Computer and Comm. Security (CCS), 2006.
[2] D. Song, D. Wagner, and A. Perrig, “Practical Techniques for Searches on Encrypted Data”, Proc. IEEE Symp.
Security and Privacy, 2000
[3] D. Boneh, G. Crescenzo, R. Ostrovsky, and G. Persiano, “Public Key Encryption with Keyword Search”, Proc.
Intl Conf. Theory and Applications of Cryptographic Techniques (Eurocrypt), 2004.
[4] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, “Secure Ranked Keyword Search over Encrypted Cloud Data”,
Proc. IEEE 30th Intl Conf. Distributed Computing System (ICDCS), 2010.
[5] A. Swaminathan, Y. Mao, G.-M. Su, H. Gou, A.L. Varna, S. He, M. Wu, “Confidentiality-Preserving
Rank-Ordered Search”, Proc. Workshop Storage Security and Survivability, 2007.
[6] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, “Privacy-Preserving Multikeyword Ranked Search over
Encrypted Cloud Data,” Proc. IEEE INFOCOM, 2011.
[7] H. Hu, J. Xu, C. Ren, and B. Choi, “Processing Private Queries over Untrusted Data Cloud through Privacy
Homomorphism”, Proc. IEEE 27th Intl Conf. Data Eng. (ICDE), 2011.
[8] Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, Yirong Xu, “Order Preserving Encryption for Numeric
Data”, IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120.
[9] Jiadi Yu, Peng Lu, Yanmin Zhu, Guangtao Xue and Minglu Li, “Toward Secure Multikeyword Top-k
Retrieval over Encrypted Cloud Data”, IEEE Transactions on Dependable and Secure Computing, Vol. 10,
No. 4, July/August 2013.
IJRISE| www.ijrise.org|[email protected][152-156]