My research statement
Shikha Singh
Research Statement

I am interested in designing algorithms to solve practical problems posed by strategic behavior and the emergence of big data. Strategic behavior is ubiquitous and comes into play whenever the input of an algorithm is collected from self-interested agents. For example, voters in an election or bidders in an auction can lie to manipulate the outcome for their own benefit. Mechanisms are algorithms designed to incentivize truthful behavior from participating agents; they are studied in the discipline of algorithmic game theory.

When dealing with massive datasets, the performance of an algorithm is governed by factors outside traditional algorithmic paradigms. Conventionally, the efficiency of an algorithm is measured by the number of computations it performs. However, when the input is too large to fit in the memory of a device, computing on it requires transferring smaller chunks of data between an external disk and memory. These input/output operations then become the bottleneck, and external memory algorithms are designed to minimize this transfer cost [4]. Furthermore, standard algorithm design assumes that the entire input is known from the start. In many situations, however, the data is not available all at once but arrives over time. Online algorithms [14] are designed to perform well on such uncertain data.

Online and External Memory Algorithms

Sorting big data is one of the most fundamental computational tasks, used heavily by websites such as Google and Amazon to organize content. Sorting is also a core operation in all database management systems [22, 25, 27]. External merge sort is a well-known and widely used external memory sorting algorithm [4, 22, 26, 27]. In joint work with Michael Bender, Samuel McCauley, Andrew McGregor, and Hoa Vu [9], we improve the first phase of external merge sort.
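The first phase of external merge sort, run generation, reads the input once through a memory buffer and writes out sorted runs; the second phase merges those runs. As a rough illustration only (not the algorithms or analysis of [9]), the Python sketch below simulates classic replacement selection, where an incoming element smaller than the last output cannot join the current run and waits for the next one, together with an up-down variant whose runs alternate between increasing and decreasing order. The names `generate_runs`, `merge_runs`, `m` (buffer capacity in elements) and `alternate` are hypothetical, and a sorted Python list stands in for the priority queue a practical implementation would typically use.

```python
import heapq
from bisect import bisect_left, bisect_right, insort
from typing import Iterable, List


def generate_runs(data: Iterable[int], m: int, alternate: bool = False) -> List[List[int]]:
    """Simulate run generation with a buffer that holds at most m elements.

    alternate=False: classic replacement selection (every run is increasing).
    alternate=True:  up-down variant (runs alternate increasing/decreasing).
    """
    if m < 1:
        raise ValueError("buffer size m must be at least 1")
    it = iter(data)
    buf: List[int] = []  # buffer contents, kept sorted
    for x in it:  # fill the buffer with the first m elements
        insort(buf, x)
        if len(buf) == m:
            break
    runs: List[List[int]] = []
    run: List[int] = []
    up = True  # direction of the run currently being written
    while buf:
        if not run:
            i = 0 if up else len(buf) - 1  # a new run starts at the matching end
        elif up:
            i = bisect_left(buf, run[-1])  # smallest element >= last output
        else:
            i = bisect_right(buf, run[-1]) - 1  # largest element <= last output
        if 0 <= i < len(buf):
            run.append(buf.pop(i))  # write one element to the current run...
            nxt = next(it, None)  # ...and read the next input element into its slot
            if nxt is not None:
                insort(buf, nxt)
        else:
            # Every buffered element would break the run's order: close the run.
            runs.append(run)
            run = []
            if alternate:
                up = not up
    if run:
        runs.append(run)
    return runs


def merge_runs(runs: List[List[int]]) -> List[int]:
    """Second phase of external merge sort, done in memory here: a k-way merge.
    Decreasing runs are reversed first so every input to the merge is increasing."""
    increasing = [r[::-1] if r and r[0] > r[-1] else r for r in runs]
    return list(heapq.merge(*increasing))
```

On the reverse-sorted input `range(10, 0, -1)` with `m = 3`, the classic policy returns four increasing runs (`[8, 9, 10]`, `[5, 6, 7]`, `[2, 3, 4]`, `[1]`), while the alternating policy returns two: `[8, 9, 10]` followed by the decreasing run `[7, 6, 5, 4, 3, 2, 1]`.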
We revisit the classic problem of run generation, which has been studied for over 50 years [17, 24, 30]. Run generation scans the input once, reorders elements using a memory buffer, and outputs runs (contiguous sorted chunks) that should be as long as possible, since fewer runs make the subsequent merge phase cheaper. We provide the first theoretical analysis of the oldest and most common technique for run generation, called replacement selection [18, 25, 26, 30]. While replacement selection was known to perform well on random data [17], we show that a simple modification, called up-down replacement selection, performs asymptotically better. This result extends the analysis of the same technique by Knuth in 1963 [24]. Our optimal online algorithm for run generation had previously been proposed as a practically implementable heuristic and shown to work well in practice [30]. Thus our work establishes theoretical foundations for analyzing heuristics proposed in previous literature [10, 18, 24, 26, 30], which will guide future research on the subject.

Algorithmic Game Theory

With the advent of cloud computing [1, 3] and crowd-sourced internet marketplaces [2], it is essential to have verification schemes through which a client can ensure the correctness of computation performed by third parties. Such delegation of computation is especially important for computationally weak devices such as cellphones and tablets. For businesses to exchange money for computational resources in this way, we need verifiable protocols for delegating computation [19, 23, 32]. Interactive proofs [7, 20] are a classical way to perform such verification: a weak client, or verifier, can verify the claims made by a powerful server, or prover, through an interactive proof. However, interactive proofs are impractical due to the high computation and communication cost incurred by the verifier. Recently, rational proofs [6] were introduced as a simple and efficient alternative to interactive proofs. They incorporate a reward for the prover, which the verifier computes using estimation tools such as scoring rules [35].
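A proper scoring rule pays a forecaster according to a reported probability distribution and the outcome that actually occurs, in such a way that reporting the true distribution maximizes the expected payment; this is the incentive property that a reward-based proof relies on. As a minimal Python sketch, assuming the quadratic (Brier) rule as the example scoring rule (it is not claimed to be the exact reward used in [6] or [11]), the hypothetical functions `brier_reward` and `expected_reward` below compute the reward and its expectation.

```python
from typing import Sequence


def brier_reward(report: Sequence[float], outcome: int) -> float:
    """Quadratic (Brier) scoring rule in reward form (higher is better):
    2 * report[outcome] - sum_i report[i] ** 2."""
    return 2 * report[outcome] - sum(p * p for p in report)


def expected_reward(report: Sequence[float], truth: Sequence[float]) -> float:
    """Expected reward for `report` when the outcome is drawn from `truth`."""
    return sum(q * brier_reward(report, w) for w, q in enumerate(truth))
```

With true distribution `[0.7, 0.2, 0.1]`, a truthful report earns an expected reward of about 0.54, whereas the overconfident report `[1.0, 0.0, 0.0]` earns about 0.40. In general the expected reward equals the squared norm of the true distribution minus the squared distance between the report and the truth, so it peaks only at the truthful report.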
The prover is rational in the game-theoretic sense; that is, he only wants to maximize his reward. Rational proofs ensure that the prover's reward is maximized only if he carries out the desired computation correctly. However, the model of rational proofs allows only a single prover, while in practice multiple third-party servers might act as provers. In collaborative work with Jing Chen and Samuel McCauley [11], we extend rational proofs to allow any number of provers. With multiple untrusted rational provers, there is an added risk of collusion: the provers could cooperate to obtain a better reward. Our proof system is robust against such collusion and is more powerful than all existing interactive proof systems. Thus, this work resolves an open problem posed in [6]. We characterize the power of the proof system for two classes of provers: those that are sensitive to very small losses in reward and those that are not. In the future, we hope to construct super-efficient multi-prover rational proofs for useful complexity classes.

Future Directions

Online Facility Location. In algorithmic game theory, the data available to an algorithm is collected from strategic agents who might lie to shift the outcome in their favor. For example, if the government wants to build a public facility such as a park and asks citizens to report their preferred locations, citizens could misreport to move the facility closer to themselves. Similarly, in public elections, voters might, individually or in coalitions, misreport their preferences to make their favorite candidate win. Mechanisms involving monetary payments are often unsuitable in such situations for ethical or legal reasons. Thus, we need mechanisms without money that are strategyproof, that is, no agent can benefit by providing dishonest information [31].
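A classic example of a strategyproof mechanism without money, for the simplest offline case of a single facility on a line, places the facility at the median of the reported locations: an agent on one side of the median can only leave it in place or push it further away by misreporting. The Python sketch below, with hypothetical names (`median_mechanism`, `mean_mechanism`, `best_deviation_gain`), checks this by brute force over a finite set of candidate lies and contrasts it with placing the facility at the mean, which an agent can manipulate. It does not model the online, multi-facility setting discussed in this section.

```python
from typing import Callable, Sequence

# A mechanism maps reported locations on a line to a facility location.
Mechanism = Callable[[Sequence[float]], float]


def median_mechanism(reports: Sequence[float]) -> float:
    """Place the facility at the (left) median of the reports."""
    ordered = sorted(reports)
    return ordered[(len(ordered) - 1) // 2]


def mean_mechanism(reports: Sequence[float]) -> float:
    """Place the facility at the average report (not strategyproof)."""
    return sum(reports) / len(reports)


def best_deviation_gain(mechanism: Mechanism, true_locs: Sequence[float],
                        candidate_lies: Sequence[float]) -> float:
    """Largest reduction in distance to the facility that any single agent
    obtains by reporting one of `candidate_lies` while all other agents report
    truthfully. A result of 0.0 means no tested lie strictly helps anyone."""
    honest = mechanism(true_locs)
    gain = 0.0
    for i, x in enumerate(true_locs):
        for lie in candidate_lies:
            reports = list(true_locs)
            reports[i] = lie  # agent i misreports; everyone else is truthful
            gain = max(gain, abs(x - honest) - abs(x - mechanism(reports)))
    return gain
```

For true locations `[0.0, 0.0, 10.0]`, the agent at 10 can drag the mean from about 3.33 all the way to 10 by reporting 30, but no report moves the median (here 0) toward it.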
I am interested in the problem of facility location and its variants [15, 29, 31], which is a model problem for such mechanisms without money. In particular, I want to construct strategyproof mechanisms without money for online facility location. In this problem, agents arrive over time, and decisions about the number of facilities and their locations must be made online so as to optimize a social objective. I wish to apply new techniques from mathematical programming, such as Lagrangian duality [34], to solve this problem.

Online Algorithms for Estimating Sortedness of Big Data. My research on the run generation problem [9] has fueled my interest in developing algorithms that combine the essential ingredients of external memory [4] and streaming [8] algorithms. Streaming algorithms process the input one element at a time and take few (ideally one) passes over the data to estimate useful information [16, 28, 33]. I want to design algorithms that have some foreknowledge of the incoming stream (equal to the size of the memory) and can store essential information on an external disk (at the cost of expensive transfers). Such algorithms would thus respect both constraints usually associated with big data: availability over time and disk access cost. I want to focus on constructing such an algorithm to estimate the sortedness of a data stream [13, 21], a well-known problem with important applications in bioinformatics [5, 12].

References

[1] Amazon Elastic Compute Cloud. Online at http://aws.amazon.com/ec2/.
[2] Amazon Mechanical Turk. Online at https://www.mturk.com/mturk.
[3] Sun utility computing. Online at http://www.oracle.com/us/sun/index.htm.
[4] A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116–1127, Sept. 1988.
[5] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403–410, 1990.
[6] P. D. Azar and S. Micali. Rational proofs. In Proc. 44th Annual ACM Symposium on Theory of Computing (STOC), pages 1017–1028, 2012.
[7] L. Babai. Trading group theory for randomness. In Proc. 17th Annual ACM Symposium on Theory of Computing (STOC), pages 421–429, 1985.
[8] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Proc. 21st ACM Symposium on Principles of Database Systems (PODS), pages 1–16, 2002.
[9] M. A. Bender, S. McCauley, A. McGregor, S. Singh, and H. Vu. Run Generation Revisited: What Goes Up May or May Not Come Down. Submitted. Available online at http://www.cs.stonybrook.edu/~shiksingh/BenderMcMc15.pdf.
[10] B. Chandramouli and J. Goldstein. Patience is a virtue: Revisiting merge and sort on modern processors. In Proc. 2014 ACM SIGMOD Int'l Conference on Management of Data, pages 731–742, 2014.
[11] J. Chen, S. McCauley, and S. Singh. Rational interactive proofs with multiple provers. Submitted. Available online at http://www.cs.stonybrook.edu/~shiksingh/ChenMcCauleySingh.pdf.
[12] A. L. Delcher, S. Kasif, R. D. Fleischmann, J. Peterson, O. White, and S. L. Salzberg. Alignment of whole genomes. Nucleic Acids Research, 27(11):2369–2376, 1999.
[13] F. Ergun and H. Jowhari. On distance to monotonicity and longest increasing subsequence of a data stream. In Proc. 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 730–736, 2008.
[14] A. Fiat and G. J. Woeginger, editors. Online Algorithms: The State of the Art, volume 1442 of Lecture Notes in Computer Science. Springer, 1998.
[15] D. Fotakis and C. Tzamos. Winner-imposing strategyproof mechanisms for multiple facility location games. In Internet and Network Economics (WINE), pages 234–245, 2010.
[16] A. Gál and P. Gopalan. Lower bounds on streaming algorithms for approximating the length of the longest increasing subsequence. SIAM Journal on Computing, 39(8):3463–3479, 2010.
[17] B. J. Gassner. Sorting by replacement selecting. Communications of the ACM, 10(2):89–93, 1967.
[18] M. A. Goetz. Internal and tape sorting using the replacement-selection technique. Communications of the ACM, 6(5):201–206, 1963.
[19] S. Goldwasser, Y. T. Kalai, and G. N. Rothblum. Delegating computation: Interactive proofs for muggles. In Proc. 40th Annual ACM Symposium on Theory of Computing (STOC), pages 113–122, 2008.
[20] S. Goldwasser, S. Micali, and C. Rackoff. The knowledge complexity of interactive proof systems. SIAM Journal on Computing, 18(1):186–208, 1989.
[21] P. Gopalan, T. S. Jayram, R. Krauthgamer, and R. Kumar. Estimating the sortedness of a data stream. In Proc. 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 318–327, 2007.
[22] G. Graefe. Implementing sorting in database systems. ACM Computing Surveys (CSUR), 38(3):10, 2006.
[23] J. Kilian. A note on efficient zero-knowledge proofs and arguments. In Proc. 24th Annual ACM Symposium on Theory of Computing (STOC), pages 723–732, 1992.
[24] D. E. Knuth. Length of strings for a merge sort. Communications of the ACM, 6(11):685–688, 1963.
[25] D. E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching. 1998.
[26] P.-Å. Larson. External sorting: Run formation revisited. IEEE Transactions on Knowledge and Data Engineering, 15(4):961–972, 2003.
[27] P.-Å. Larson and G. Graefe. Memory management during run generation in external sorting. In Proc. 1998 ACM SIGMOD Int'l Conference on Management of Data, volume 27, pages 472–483, 1998.
[28] D. Liben-Nowell, E. Vee, and A. Zhu. Finding longest increasing and common subsequences in streaming data. Journal of Combinatorial Optimization, 11(2):155–175, 2006.
[29] P. Lu, X. Sun, Y. Wang, and Z. A. Zhu. Asymptotically optimal strategy-proof mechanisms for two-facility games. In Proc. 11th ACM Conference on Electronic Commerce (EC), pages 315–324, 2010.
[30] X. Martinez-Palau, D. Dominguez-Sal, and J. L. Larriba-Pey. Two-way replacement selection. In Proc. of the VLDB Endowment, volume 3, pages 871–881, 2010.
[31] A. D. Procaccia and M. Tennenholtz. Approximate mechanism design without money. In Proc. 10th ACM Conference on Electronic Commerce (EC), pages 177–186, 2009.
[32] G. N. Rothblum. Delegating computation reliably: Paradigms and constructions. PhD thesis, Massachusetts Institute of Technology, 2009.
[33] X. Sun and D. P. Woodruff. The communication and streaming complexity of computing the longest common and increasing subsequences. In Proc. 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 336–345, 2007.
[34] N. K. Thang. Lagrangian duality based algorithms in online scheduling. arXiv:1408.0965, 2014.
[35] R. L. Winkler. Scoring rules and the evaluation of probability assessors. Journal of the American Statistical Association, 64(327):1073–1078, 1969.