How to do Research in Computer Science 5 bits of advice
Peter Eades

Five bits of advice
1. Find a good topic
2. Use a good research method
3. Give lots of good talks
4. Write lots of good papers
5. Maintain good research ethics

Find a good topic

Two extreme topics
• Independent Ira. Topic: Keywords in Armenian. “I have always thought that programming languages which use keywords in Armenian lead to more productive software engineering. I want to prove it.”
• Team member Terri. Topic: Phylogenia of π-systems: the case k=4. “My professor wrote the first π-system, and for the past 17 years has been studying the phylogenia of such systems. Three other people in my laboratory are studying k=1, k=2, and k=3; I will study k=4.”

Two extreme topics
• Independent Ira: has an idea, and wants to pursue it, even alone.
  Dangerous topic
  • may lead nowhere
  • may be uncompetitive
  Can be satisfying for some people
  Funding unlikely
• Team member Terri: adds a bit to a long-term team project.
  Safe topic
  Can be satisfying for some people
  More chance of funding

Independent topic
Advantages
• More exciting for some people
Disadvantages
• Funding unlikely
• Hard to publish

Part of a team
Advantages
• Better support from colleagues
• Good chance of funding
Disadvantages
• Can be boring for some people

Independent versus part of a team
My advice: Most people do better in a team. A few personalities are suited to independent topics.
Most IT research is somewhere in the middle; other sciences tend to be more team oriented.

Two extreme topics
• Irene the introvert. Topic: 2231-1 is a prime number. “This problem has been bothering me for decades. I can’t rest until I know the answer.”
• Eddie the extravert. Topic: 2231-1 is a prime number. “A guy in a software security company has been phoning me to ask about this ‘possibly prime’ number, 2231-1. I’ll try to solve the problem.”

Two extreme topics
• Irene the introvert: self-motivated, wants to find out for her own sake. There is no customer.
• Eddie the extravert: has a customer who wants to know, and he will try to find out. Customer oriented.

What is a customer?
A customer may be
• An industrial partner
• A social community
• A separate community of academic researchers
A customer wants to know the answer to your research problem
• Because he/she is curious, or
• Because he/she will make money from it, or
• Because it will help his/her research, or
• . . .

What is a customer?
A customer is someone outside your academic group
• Not your supervisor
• Not people that you meet at the annual conference for your research area
Customers can provide
• Feedback, plus
• Inspiration and/or specification
• Maybe some coffee, plus
• Maybe some funding

Introverted research
Advantages
• More exciting for some people
Disadvantages
• Funding unlikely
• May be worthless to everyone except yourself
• May be hard to get good feedback

Customer-oriented research
Advantages
• Good chance of good feedback
• Good chance of funding
• Better scientific criticism
• Better grounded in reality
• New problems arise
Disadvantages
• None

Introspective versus customer-oriented
My advice: Always ensure that your research has a customer.

Two more extreme topics
• Narrow Nancy. Topic: The effect of the use of critical path planning in managing software projects. Narrow and deep: an investigation of a few variable parameters, with many parameters held fixed.
• Broad Betty. Topic: How to manage software projects. Wide and shallow: considers many parameters at once.

Narrow and broad topics
Narrow Nancy
Assume
• an OO design method
• Java
• small teams
• 10K–100K SLOC
Investigate the effect of
• use of critical path planning
Broad Betty
Investigate the effects of
• 15 different design methodologies
• 7 different programming languages
• Small to huge SLOC
• 17 different planning methods

Narrow and deep topic
Advantages
• More chance of pushing the boundary of knowledge
• More exciting
Disadvantages
• Your “model” may be too abstract and unrealistic
• It’s hard to choose the variable parameters and the fixed parameters

Wide and shallow topic
Advantages
• Realistic
• Good training for industrial research
Disadvantages
• Can be boring, like a collection of undergraduate projects
• Unlikely to contribute a lot to the state of knowledge

Narrow versus wide
My advice
• Choose a narrow and deep topic
• Choose your variable and fixed parameters very carefully.

Another two extreme topics
• Fred the fundamentalist. Topic: Robustness theorems for non-pre-emptive scheduling methods. Fundamental topic: abstraction of specific hardware and software.
• Andy the applicationist. Topic: Disk cache scheduling for Gnu C++ memory management on a Pentium 4 processor running Solaris. Applied topic: specific hardware, specific software.

Fundamental topic
Advantages
• Your papers will have a longer life
• Your work can have more applications
Disadvantages
• It’s hard to push the boundaries very far
• Your “model” may be too abstract and unrealistic

Applied topic
Advantages
• Easier problems
• May help with getting a job in industry
• Can contribute a lot to a relevant area
Disadvantages
• Your papers can die young
• Restricted applications

Another two extreme topics
• Classical Kirsty. Topic: P=NP. “I want to solve a problem that has defeated many others.”
• Popstar Paul. Topic: Wireless data mining for pervasive computing in social network censorship. “I want a lot of newspaper coverage.”

Classical topic
Advantages
• You may solve a hard problem
• Your papers will have a longer life
• Better referees
• Higher scientific quality
Disadvantages
• Can be frustrating
• Immediate rewards can be small

Hot topic
Advantages
• Better immediate feedback
• With good timing, you can get rich
• Easier to publish
• Easier problems
• Vibrant community
Disadvantages
• Your papers can die young
• Scientific quality can be low

Another two extreme styles: classical hard problems versus new hot topics
My advice: Investigate a fundamental and classical topic, with some applications to a couple of hot and applied topics.
There are papers all over this range, but there is a tendency in Computer Science to be near the hot end.

Advice on topics: summary
I recommend:
• classical topics
• customer-oriented topics
• fundamental, deep, and narrow topics, and
• (perhaps shallow) applications to a few hot topics.

Five bits of advice
1. Find a good topic
2. Use a good research method
3.
Give lots of good talks
4. Write lots of good papers
5. Maintain good research ethics

On topics: I recommend that you obtain breadth by being a member of a team.

Use a good research method

Researchers need to
1. Create models of these problems
   • abstract away the non-essential details
   • use scientific theories and formalisms
2. Solve the model problems
   • use skills in CS/Math/commonsense/…
   • form solutions
3. Evaluate the solution to the model problem
   • use skills in Math/Experiments/UCST
4. Present the solution to other researchers and to the customer
5. Adjust the model according to the customer’s evaluation, and repeat.

The research procedure
1. The customer has a problem.
2. The researcher produces an initial model of the problem.
3. Repeat
   a) The researcher solves the problem, according to the model.
   b) The researcher evaluates the solution of the model problem.
   c) The customer evaluates the solution to the real problem.
   d) The researcher adjusts the model.
   Until the customer is satisfied.

We need to know
1. How to create/adjust a model?
2. How to find a solution?
3. How to evaluate a solution?

1. Creating/adjusting a model
A model is formed by forgetting some of the parameters of the real problem; models are simplifications of real problems.

Models in IT research
• Industrial research
  You cannot forget many parameters of the real problem
  Models are complex and fuzzy, maybe not so useful
• Academic research (including PhD theses)
  You can forget most of the parameters of the real problem
  Models are usually formal, mathematical, and crisp.

2. Solutions
• Solutions are artifacts that help the customer: programs, protocols, metaphors, architectures, algorithms, theorems, ....
To create a solution, you need
a) Knowledge, and
b) Courage

a) Knowledge
• Most IT researchers draw on a number of fundamental skills to create a solution consisting of a number of artifacts.
[Diagram: fundamental skills (formal logic, compilers, OO models, concurrency, algorithms, mathematics) are applied to a problem to produce a solution made of artifacts (program, protocol, metaphor, architecture, algorithm, theorem).]

a) Knowledge
Your current knowledge is probably not enough to create a solution. You need to increase your skill set.
• Remember your undergraduate work
  • In Physics, Mathematics
  • Also Electronics, Psychology
  • Also in Anthropology, Poetry
• Read books and research papers
• Attend seminars and conferences
• Ask experts to teach you

b) Courage is very important
In IT research, this is critical (even in industrial IT research).
• You need to consider wild, weird and wonderful possibilities
• You need to disregard commonly accepted wisdom and break commonly accepted rules

3. Evaluating a solution
To evaluate a solution, you need
a) An evaluation measure that tells you whether the solution is good or bad
b) An evaluation method to compute the measure

a) Evaluation measures
• There are three basic measures for the quality of a solution.
The three measures:
1. Effectiveness: is the solution logically correct?
2. Efficiency: does the solution use resources efficiently?
3.
Elegance: is the solution beautiful, simple, and elegant?
[Diagram: Measures, with corners Effectiveness, Efficiency, Elegance]
All solutions can be measured in terms of these three parameters.

b) Evaluation methods
There are three basic evaluation methods.
The three methods:
1. Mathematics: theorems, proofs
2. Experiments
   • Run programs on test data
   • Collect data about software projects
   • Test systems with human subjects
   • Needs skills in statistics
3. UCST: Try to sell your solution
And many combinations of these approaches.
[Diagram: Evaluation methods, with corners Mathematics, Experiments, UCST]
These are the only evaluation methods in information technology.

Example: the plotter problem

The plotter problem
A pen plotter is a calligraphic device: it has a pen which moves over the paper to draw the picture.

The plotter problem
A digital plotter has a pen which can be up or down. It accepts a sequence of penUp/Down/moveTo instructions.

    penUp; moveTo (20,80)
    penDown; moveTo (80,80)
    penUp; moveTo (20,20)
    penDown; moveTo (80,20)
    penUp; moveTo (20,80)
    penDown; moveTo (20,20)
    penUp; moveTo (80,80)
    penDown; moveTo (80,20)
    penUp; zero

The order of the moveTo instructions has an effect on the pen-up time. The plotter problem is to order the instructions to minimize pen-up time.

    penUp; moveTo (20,20)
    penDown; moveTo (20,80)
    moveTo (80,80)
    moveTo (80,20)
    moveTo (20,20)
    penUp; zero

[Figure: the square with corners (20,80), (20,20), (80,80), (80,20).]

For the first instruction sequence, the pen-up distance is

$$P_{up} = \sum_{\text{pen-up line segments}} (\text{length of line segment}) = \sum_{\text{pen-up line segments}} \sqrt{(x_{start} - x_{finish})^2 + (y_{start} - y_{finish})^2}$$

$$= \sqrt{20^2 + 80^2} + \sqrt{60^2 + 60^2} + \sqrt{60^2 + 60^2} + \sqrt{60^2 + 60^2} + \sqrt{80^2 + 20^2} = 2\sqrt{6800} + 3\sqrt{7200} \approx 419.5$$

Total plotter distance with pen up = Pup ≈ 419.5 cm
If speed = 5 cm/sec, then total plotter time with pen up ≈ 84 seconds.
But . . .

Difficulty #1: The model is wrong
The plotter time is not proportional to distance: the plotter accelerates to top speed, runs at top speed, then decelerates to stop.
A specific plotter has
• A top speed $s$.
• A time $t_0$ and distance $d_0$ to reach top speed from stationary.
• A time $t_1$ and distance $d_1$ to slow down to stop from top speed.
Thus if the pen travels distance $d$, then the time is $t_0 + (d - d_0 - d_1)/s + t_1$, as long as $d > d_0 + d_1$.
BUT . . .

BUT: The model is good enough
The times $t_0$ and $t_1$, and distances $d_0$ and $d_1$, are all quite small.
In practice $t_0 + (d - d_0 - d_1)/s + t_1 \approx d/s$.
We have a trade-off between effectiveness and elegance: by ignoring acceleration and deceleration we
• Lose a very small amount of effectiveness, in terms of the accuracy of the model
• Gain a lot in elegance, in terms of the simplicity of the model

The model problem
We have: a set of “primitives”, where each primitive has a start point and a finish point.
We want: an ordering for the primitives to minimize pen-up time.

One easy solution is the greedy solution:
1. Choose the first primitive so that its start point is the closest start point to PEN_ZERO.
2. Repeat for k=1 to NUM_PRIMS-1
   Choose the kth primitive so that its start point is the closest unused start point to the previous finish point.

UCST Evaluation
The greedy solution can be “proven” effective by UCST: “Since it chooses the best alternative at each stage, it gives minimum pen up time”.
This may be convincing for some customers, but not for good scientists.
The greedy solution can be proven elegant by UCST: it is easy to understand, easy to implement.

Evaluation by mathematics
The effectiveness of the greedy solution can be investigated using mathematics.
First, it does not always give optimal results.
[Figure: an instance with distances labelled 0.99 and 1.0, on which the greedy ordering has total penup distance ≈ 12.5.]
The optimal path is shorter.
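The pen-up distance and the greedy ordering described above can be sketched in a few lines of Python. This is a minimal sketch, not the implementation used in the talk: it assumes PEN_ZERO is the origin, that the pen returns to PEN_ZERO at the end (as in the `penUp; zero` instruction), and that a primitive is a `(start, finish)` pair of points. The function names, the brute-force `optimal_order`, and the three-dot instance are hypothetical and only serve to show that greedy can be beaten.

```python
import math
from itertools import permutations

PEN_ZERO = (0.0, 0.0)  # assumption: the pen rests at the origin


def dist(p, q):
    """Euclidean (L2) distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def pen_up_distance(order, zero=PEN_ZERO):
    """Total pen-up travel when primitives are drawn in the given order.

    The pen starts at `zero`, moves pen-up to each start point in turn,
    and returns pen-up to `zero` after the last finish point.
    """
    total, pos = 0.0, zero
    for start, finish in order:
        total += dist(pos, start)
        pos = finish
    return total + dist(pos, zero)


def greedy_order(prims, zero=PEN_ZERO):
    """Repeatedly take the unused primitive whose start is closest to the pen."""
    remaining, order, pos = list(prims), [], zero
    while remaining:
        nxt = min(remaining, key=lambda prim: dist(pos, prim[0]))
        remaining.remove(nxt)
        order.append(nxt)
        pos = nxt[1]
    return order


def optimal_order(prims, zero=PEN_ZERO):
    """Brute force over all orderings; feasible only for a handful of primitives."""
    return min(permutations(prims), key=lambda o: pen_up_distance(o, zero))


# The four sides of the square, in the order of the first instruction list.
square = [((20, 80), (80, 80)), ((20, 20), (80, 20)),
          ((20, 80), (20, 20)), ((80, 80), (80, 20))]
square_pup = pen_up_distance(square)

# Hypothetical instance: three zero-length primitives (dots) on the x-axis.
dots = [((1.0, 0.0), (1.0, 0.0)),
        ((-1.5, 0.0), (-1.5, 0.0)),
        ((4.0, 0.0), (4.0, 0.0))]
greedy_pup = pen_up_distance(greedy_order(dots))
optimal_pup = pen_up_distance(optimal_order(dots))
print(square_pup, greedy_pup, optimal_pup)
```

On the three-dot instance greedy visits 1, then -1.5, then 4, for a pen-up distance of 13, while the best ordering needs only 11. The brute-force search grows factorially, so it is useful only as a check on tiny inputs.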
Greedy: total penup distance ≈ 12.5
Optimal: total penup distance ≈ 7

$$\frac{\mathrm{GREEDY}}{\mathrm{OPT}} \approx \frac{12.5}{7} \approx 1.8$$

BUT: The greedy method is close to optimal:
Theorem: If GREEDY is the penup time with the greedy solution and OPT is the penup time with the optimum solution, then $\mathrm{GREEDY}/\mathrm{OPT} = O(\log n)$.
Proof: Lots of proof, lots of mathematical symbols, more and more pages of equations and stuff like that; it goes on for 30 pages with lemmas and corollaries and ...

Experimental evaluation
[Diagram: Plotter instructions (random, customer-supplied, benchmark) → Greedy Algorithm → Effectiveness test (measure Pup, compute LBOPT, calculate Pup/LBOPT)]

Effectiveness test
• We want to compute Pup/OPT as a measure of effectiveness.
• However, it is difficult to compute OPT.
• Instead we compute LBOPT, a lower bound for OPT (i.e., LBOPT ≤ OPT).
• Then we have an upper bound on the effectiveness measure Pup/OPT, since Pup/LBOPT ≥ Pup/OPT.

Effectiveness test
Experiments showed that greedy is very close to optimal: for larger plots it is within 10% of optimal.
[Plot: %Pup/LBOPT, with vertical axis ticks at 100, 110 and 120.]
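The effectiveness test above can be sketched as follows. The talk does not say which lower bound was used, so `lower_bound_opt` below is an assumption chosen for simplicity: it sums, for the move out of the pen's rest position and for the move after every finish point, the shortest pen-up move that could possibly follow. The function names, the rest position at the origin, and the three-dot instance are likewise hypothetical.

```python
import math


def dist(p, q):
    """Euclidean (L2) distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def lower_bound_opt(prims, zero=(0.0, 0.0)):
    """A simple lower bound LBOPT on the optimal pen-up distance.

    In any ordering the pen makes one pen-up move from `zero` to some start
    point, and one pen-up move after each finish point, either to another
    primitive's start point or back to `zero`. Each of those moves is at
    least as long as the shortest candidate of its kind, so the sum of the
    minima cannot exceed OPT.
    """
    if not prims:
        return 0.0
    bound = min(dist(zero, start) for start, _ in prims)
    for i, (_, finish) in enumerate(prims):
        candidates = [dist(finish, zero)]
        candidates += [dist(finish, start)
                       for j, (start, _) in enumerate(prims) if j != i]
        bound += min(candidates)
    return bound


def effectiveness_bound(p_up, prims, zero=(0.0, 0.0)):
    """Pup / LBOPT, which is an upper bound on Pup / OPT because LBOPT <= OPT."""
    lb = lower_bound_opt(prims, zero)
    if lb == 0:
        raise ValueError("lower bound is zero; the ratio is undefined")
    return p_up / lb


# Hypothetical instance: three zero-length primitives (dots) on the x-axis.
dots = [((1.0, 0.0), (1.0, 0.0)),
        ((-1.5, 0.0), (-1.5, 0.0)),
        ((4.0, 0.0), (4.0, 0.0))]
lb = lower_bound_opt(dots)
ratio = effectiveness_bound(13.0, dots)  # 13.0: pen-up distance of the greedy order
print(lb, ratio)
```

A bound this crude can be far below OPT, which makes the reported ratio pessimistic; a tighter lower bound gives a more informative test at the cost of more computation.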
Difficulty #2
We replaced the quality evaluation with a real plotter
[Diagram: Plotter instructions → Greedy Algorithm → Real plotter]
and timed the real plotter using the wall clock. It revealed two problems:
• The model was wrong,
• The greedy algorithm was not efficient.

The research procedure
1. The customer has a problem.
2. The researcher produces an initial model of the problem.
3. Repeat
   a) The researcher solves the problem, according to the model.
   b) The researcher evaluates the solution of the model problem.
   c) The customer evaluates the solution to the real problem.
   d) The researcher adjusts the model.
   Until the customer is satisfied.

Solution #2: Our model was wrong
At a micro-level, the plotter pen moved in three ways:
• Horizontally
• Vertically
• (some plotters) At 45° to horizontal
Each micro-movement takes one unit of time. This implies that the distance function is L1 or L∞ rather than L2.

Mathematical evaluation with the new model
• It was easy to check that the mathematical results remain true for any distance function, and this change in model did not change the theorems significantly.
  Theorem: If GREEDY is the penup time with the greedy solution and OPT is the penup time with the optimum solution, then $\mathrm{GREEDY}/\mathrm{OPT} = O(\log n)$.
• The higher level of abstraction implies that mathematical methods are robust.

Difficulty #3: Our solution was not efficient in the customer context
[Diagram: Plotter instructions → Greedy Algorithm → Real plotter]
The greedy algorithm runs in time $O(n^2)$. This was slower than the drawing procedure.

Classical solution:
• Use computational geometry and clever data structures, reduce the time complexity of the algorithm to $O(n \log n)$
Better solution: be more creative, break the rules

Solution #3
Optimize one buffer-sized section at a time.
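Solutions #2 and #3 can be combined in one short sketch: a greedy ordering that works on one buffer-sized chunk of the input at a time, with the distance function passed in as a parameter. This is an illustration under stated assumptions, not the code from the project: the names `l1`, `linf`, `buffered_greedy` and `dot`, the generator interface, and the sample input are all hypothetical. With a buffer of b primitives each chunk costs O(b²), so n primitives cost O(n·b) rather than O(n²).

```python
def l1(p, q):
    """L1 distance: horizontal and vertical micro-movements only."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def linf(p, q):
    """L-infinity distance: 45-degree micro-movements are also available."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def greedy_order(prims, pos, dist):
    """Greedy ordering of `prims` starting from pen position `pos`.

    Returns the ordered list and the pen position after the last primitive.
    """
    remaining, order = list(prims), []
    while remaining:
        nxt = min(remaining, key=lambda prim: dist(pos, prim[0]))
        remaining.remove(nxt)
        order.append(nxt)
        pos = nxt[1]
    return order, pos


def buffered_greedy(prims, buffer_size, zero=(0, 0), dist=linf):
    """Yield greedily ordered buffer-fulls, one chunk of the input at a time.

    Each chunk is ordered starting from where the previous chunk left the pen.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    pos = zero
    for i in range(0, len(prims), buffer_size):
        chunk, pos = greedy_order(prims[i:i + buffer_size], pos, dist)
        yield chunk


def dot(x):
    """Hypothetical helper: a zero-length primitive at (x, 0)."""
    return ((x, 0), (x, 0))


prims = [dot(5), dot(1), dot(6), dot(2)]
small_buffers = list(buffered_greedy(prims, buffer_size=2))
one_buffer = list(buffered_greedy(prims, buffer_size=4))
print(small_buffers, one_buffer)
```

Because the function is a generator, the next chunk is only ordered when the consumer asks for it. With `buffer_size` equal to the number of primitives the result is the plain greedy ordering.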
The bufferised greedy algorithm was almost as effective as the straight greedy algorithm, and much faster.
[Diagram: Greedy Algorithm → Buffer → Plotter mechanics, with the buffer and the mechanics inside the plotter]
An “optimized” bufferfull is sent from the greedy algorithm to the buffer whenever the plotter has exhausted the current buffer.

Lessons to learn
1. The research procedure is loopy
2. You need to change your model
3. Laboratory experiments are different to experiments in context
4. Working with a customer motivates good science
5. Breaking the rules of your research area can give good results
6. Different evaluation techniques (maths, empirical, UCST) have different strengths
7. A full evaluation is a combination of the three techniques

Lessons to learn
Mathematics
• Robust to model changes
• Does not evaluate the model
• Good evaluation of pathological behavior
Experiments
• Evaluates the model
• Good evaluation of normal behavior
• Poor evaluator for pathological behavior
UCST
• OK to evaluate elegance
• Poor evaluator of efficiency / effectiveness.

Use a good research method
My advice
To evaluate your solution
• Concentrate on mathematical and experimental methods, avoid UCST
• Relate your results to effectiveness, efficiency and elegance
• Try to evaluate with your customer

Give good talks

Giving a talk is beneficial to the speaker
It helps you
• define your problem
• understand your own work
• organize your ideas
• become famous
• write papers
It brings feedback from others

You can present your research
• At IK-CCs (international killer-competitive conferences)
• At NLCs (nice local conferences)
• To research visitors to your lab
• As a poster / web page
• To your sister . . .
Frequency, from the top of the list to the bottom: once per year; 2–3 times per year; often; continuously.

How to give a talk at a conference
Some comments about research conference presentations
Giving a talk consists of three elements:
a) Organization
b) Talking and walking
c) Visuals

a) Organization
0–5: Motivation
5–15: Overview of the research (everyone understands)
15–20: Something difficult (some understand)
20–23: Overview
23–25: Conclusion

Example
Title: Fast spatial data mining in low dimensions
0–5: Data mining helps people
5–15: Your data mining algorithms (everyone understands)
  • description at a high level
  • no proofs, no details
15–20: Math for the 2D case (some understand)
20–23: Chart of experimental results
23–25: Repeat main results

b) Talking and walking
• Look at the audience as much as possible
  • Choose specific people to focus on
• Speak slowly and clearly, and avoid idiomatic English
  • English is a second language to most people in IT
• Use your hands for expression
  • avoid holding a microphone
• Don’t waste time
  • Check your data-projector connection beforehand

c) Visuals
• Use a medium that is suitable
  • Use a computer for graphics
  • Use a blackboard for mathematics
• Ensure that your visuals are perfect
  • No speeling errors
  • No spacing errors
  • Attractive layout (e.g., avoid linebreaks as much as possible)
• Don’t use visuals as notes to yourself
• Avoid words; use pictures wherever possible
• Avoid ducks

[Picture slides, with captions:]
• Look at the audience; avoid ducks
• Look at the audience; use your hands
• Look at the audience; avoid holding a microphone; ensure that your slides are perfect
• Avoid words; use pictures
• Look at the audience
• Use the slides for the audience, not as reminders for you

More advice
• Give a practice talk to your team
• Ask people to look out for
  • errors and ducks in the visuals
  • idiomatic and ambiguous English
  • not looking at the audience
  and write it all down, and
tell you
• Video the talk, look at the video

[Example slide: Formal specification of Security Protocols
• The need for security
• The need for formal specification
• Porter and Quirk’s language
• Inadequacies]

Write good papers

You can write
• One or two theses
• Papers
  • in NLCs
  • in IK-CCs
  • in journals
• Chapters in books
• Books

The paper writing process
[Flowchart: Draft a journal paper → Adjust for a conference (reduce the size, re-write the introduction) → Accepted? (Y / N) → Write the journal paper properly]

Conference papers
[Flowchart: Draft a journal paper → Extract a paper for conference A; extract a paper for conference B; extract a paper for conference C → Write the journal paper properly]
Note: avoid recirculation

There are three basic kinds of conferences
• NLC
• IK-CC
• Scams
And many in between

How the conference paper process works
a) You write the paper
b) You submit the paper to the program committee chair
c) The program committee chair sends it to members of the program committee (takes about a week)
d) They read it (in about 4 weeks) and write a brief report. They decide whether to accept your paper
e) If your paper is accepted, you revise the paper according to the referees’ comments (2–4 weeks)
f) You give a talk at the conference

How do the program committee decide which papers to accept?
• In most cases, the papers are scored by members of the PC, then sorted on score.
• Very few papers get a very high score or very low score.
• Accept/reject decisions for middle-score papers can be fairly arbitrary
[Chart: 10–20% obviously rejected; 60–80% random and ad-hoc decisions; 10–20% obviously accepted]

Four steps for conference papers
1. Write a good paper
2. Choose a good conference, and adjust your paper to that conference
3. Send the paper, before the deadline.
4. Sit around and hope that it is accepted

1. Write a good conference paper
Assuming that the page limit is 10 pages:
Pages 0–3: Motivation and background (everyone understands)
Pages 3–8: Main results (experts understand)
Pages 8–8.5: Conclusion (everyone understands)
Pages 8.5–10: References

1. Write a good conference paper
Some advice
• Keep it simple: only one main idea, two if you push it.
  • Most PC members are busy, some are lazy
  • They have at most 30 minutes to read your paper
  • Abstract should be short and extremely well written
• Make sure that the presentation is perfect
  • Grammar perfect
  • Figures beautiful
  • Exactly the right length
  • Font size 11 or 12
  • Nice layout

2. Choose a good conference, and adjust your paper to that conference
Choose a conference
• The best possible (see the CORE ranking)
• A good program committee
• Realistic deadline
• Avoid “scams”
Adjust your paper
• Motivation aimed toward the conference community
• Research methods that are familiar to the conference community
• Look at web pages of the program committee, and write your paper for them to read

3. Send the paper before the deadline*
• Maybe 5 minutes before the deadline
• Maybe 1 minute before the deadline
* Some deadlines are soft, but most high-rejection-rate conferences have hard deadlines with no excuses

4. Sit around and hope that it is accepted
• Many many many good papers get rejected for reasons beyond your control
• Don’t worry if it is rejected

The journal paper process
1. Revision
2. Submission
3. Refereeing
4. Published

1. Revise (from conference paper(s))
Advice
• Give yourself a deadline
  • special issues are good
• Describe everything fully, in layers of detail
  • You can delete stuff later
  • Prove every theorem
  • Give full literature background
  • Give full details of experiments

2. Submission
You submit it to an editor of a journal
a) Choose a person who is an editor and who knows the field well
b) Choose the best journal for which your chosen person is on the editorial board
c) Send the paper to her/him (even if the journal’s web page instructions say to send it to the managing editor or someone else)
d) If she/he does not reply within 7 days, then send a reminder

3. Refereeing
a) The editor sends it to referees, with a three month deadline
b) The referees ignore it until the deadline
c) The editor sends a reminder, and suggests a new deadline
d) The referee reads it (takes many hours, perhaps a few days) and writes:
   • A report
   • A recommendation (accept | revise | reject)
e) The editor sends you the reports
Note: you should send a reminder every 6 months

1. Revise
• If “accept”, then you make minor revisions and proceed to publication
• If “revise” or “reject”, then you revise and re-submit it
  • Don’t get annoyed
  • But don’t take “no” for an answer
  • You can choose a different journal, but you should assume that you will get the same referees
  • Address every point made by the referees; record how you addressed it

How to get your paper rejected
If you really want your paper rejected, here are the top methods
1. Write in bad English
2. Be unaware of current trends in the specific conference community
3. Organize your thoughts badly
4. Omit motivation

Ethics
1.
Philosophy and religion In western countries, the dominant ethical philosophy is John Stuart Mill’s utilitarianism: an action X is better than an action Y if X leads to the greater good for humans. There are three main sources of ethics 1. Philosophy and religion 2. The law 3. Professional guidelines Aside: • Discussion of research ethics without considering these sources is pointless. • Amateur ethicists are dangerous 105 106 Ethics Ethics 3. Professional guidelines 2. The law The main laws relevant to computer scientists are Intellectual Property laws. These cover • Patents • Copyright • Trade secrets • Trademarks These laws are very important for industrial research. Written guidelines on ethics are available from Every University Every Government research laboratory ACM code of practice ACS code of practice These cover a wide range of ethical problems that might occur in industry and research. 107 108 Ethics The main issues 8. 9. What is scientific misconduct? Processes for dealing with misconduct Mainly form official policies AVCC Guidelines on Research Practice University of Sydney Guidelines NICTA guidelines CSIRO guidelines All three are discussed in University policies, procedures, and guidlines. General principles Data storage Authorship Plagiarism/Recirculation Omission Supervision Conflict of interest For academic research in IT there are three top ethical issues: a) Authorship: who should be the author of a paper? b) Recirculation, or self-plagiarism: writing the same paper twice. c) Omission: failing to say something relevant. 1. 2. 3. 4. 5. 6. 7. 10. What happens in practice? 11. Some scenarios Summary of guidelines . .. … …. Æ 109 110 General Data storage 2. 
Data storage Data used in experiments must be stored Must be preserve privacy as defined by an Australian Standard Minimum 5 years Should be stored in the institution as well as with the researcher If you publish a paper based on some data, then you should make the data available to other researchers on demand 1. General principles Researchers must maintain: High Standards Discipline-specific ethics Workplace safety Confidentiality (e.g., in questionnaires from humans) Research results should be open to scrutiny by peer review. (Secrecy is possible, but only for a limited time) 111 112 Ethics Ethics 3. Authorship Authorship Authorship is substantial participation, including: • Conception and design, or analysis of data; and • Drafting/revising the paper; and • Final approval of the version to be published. An author's role must be sufficient for that person to take public responsibility for the paper. Authorship is not • Helping to get funding for the project • The collection of data • General supervision of the research group ¾ “Honorary authorship” is not acceptable. 113 114 Authorship Authorship Authorship Authorship Co-authors • Early in the project, you should discuss who will be an author • One of the co-authors should be assigned to keep records (of experiments etc), and formally accept responsibility for the entire paper. • Authors should sign an authorship statement • This statement should be kept on file in the institute • All authors should agree to being an author All people qualified to be authors should be authors • No person who is allowed to be an author can be excluded as a author. 
Non-authors who have contributed (e.g., funding) should be acknowledged 115 116 Ethics Publications More guidelines about publications • Private publication (non-reviewed) is OK, but you should explicitly say that it has not been reviewed • You must acknowledge the sources of financial support (as a declaration of possible conflict of interest) • Publishing lies is not allowed 4. Plagiarism/Recirculation Recirculation, or self-plagiarism Publication of multiple papers based on the same data is not acceptable except where there is full cross-referencing within the papers Before you submit two similar papers, you should tell both editors/publishers Always cite previous papers that you have written with a similar theme/content As a rule of thumb, don’t copy-and-paste anything except some parts of the introduction 117 118 Ethics Supervision 6. Supervision 5. Omission You should not omit to say something significant ¾ Obvious example: suppose that your algorithm is derived from an algorithm by person X. Then you should say so in your paper. ¾ There are many more subtle examples. Your supervisors should • Be well qualified • Have a reasonable staff/student ratio • Give you ethics guidance • Ensure (as far as possible) the validity of the data 119 120 Conflict of interest Conflict of interest Conflict of interest Refereeing You cannot referee a paper if you have a conflict of interest with one of the authors. A conflict of interest defined for ACM/IEEE conferences as any situation where you don't feel that you can make an objective assessment, including: you are a co-author one of your current or former students is a coauthor your supervisor / former supervisor is a co-author a colleague from your current institution is a coauthor a colleague who you have worked with on a research project in the past 5 years is a co-author 7. Conflict of interest Money If you will gain financially from some research, then you should say so in the publications etc. 
8. What is scientific misconduct?
• According to the guidelines

Misconduct
Scientific misconduct = “fabrication, falsification, plagiarism, or other unacceptable practices”. For example:
• misleading ascription of authorship
• listing of authors without their permission
• attributing work to others who have not in fact contributed to the research
• lack of appropriate acknowledgment of work primarily produced by someone else
It does not include honest errors or honest differences in interpretation.

Sin #1: Misappropriation
You should not:
• Plagiarize (present the words or ideas of another as your own, without reference)
• Use information in breach of confidentiality associated with the review of a manuscript or grant application
• Omit reference to the relevant published work of others for the purpose of implying personal discovery

Sin #2: Interference
You should not:
• Damage any research-related property of another

Sin #3: Misrepresentation
You should not:
• Tell lies
• Omit to say something significant

9. What happens if someone breaks the guidelines / code of ethical behaviour?
Processes for dealing with research misconduct
University process
1. Someone makes an allegation.
2. The Deputy Vice Chancellor for Research is notified.
3. It is investigated locally, to see whether it is serious.
4. If it is serious, then the matter is referred up the chain, to a series of committees.

10. What happens in practice?
How does it work in practice?
… … …

In practice
You want:
• To advertise your research as widely as possible
• To write as many papers as possible
• To become rich and/or famous
These things may tempt you to ignore research ethics, BUT you should resist the temptation.

In practice: Authorship and acknowledgement
• Always ensure that every author on the paper agrees to be an author
• Always be generous in inviting authorship
• Don’t be insulted if someone declines to be an author
• Always acknowledge generously
  • Funding sources
  • People with whom you have had significant discussions
  • Generously reference as many relevant papers as possible

In practice: Multiple submission / recirculation / self-plagiarism
• It is a difficult issue
  • Multiple submission is bad
  • But two papers can be close to each other
• Rule of thumb: the “Introduction” may be created by copy-paste, but no other part of the paper should be created by copy-paste
• Ask your boss/supervisor if in doubt
• Multiple submission makes people mistrust you; trust between researchers is very important

In practice: Plagiarism
• Always acknowledge and reference generously
• Never quote without quote marks
• Never cut-and-paste from someone else’s work
• If you use someone’s figures/pictures, ask their permission first.
  • If they do not give permission, then try to use alternative pictures
  • If they do give permission, then acknowledge them fully in the figure caption

In practice: Omission
• Omission is unfortunately common, even among senior scientists
• Very difficult to prevent
• Relies mostly on people’s sense of scientific honesty
• If you stay rigorously honest, for example:
  • do not use “straw man” comparisons;
  • report experiments fully, even when the results did not go as you expected,
  then people will trust you more

In practice: University processes in practice
• The sinner is warned and the warning is stored in a file.
• Bad offences, and repeat offences, result in:
  • Staff being fired
  • Students being thrown out
  • Supervisors being fired / forced to resign (sometimes because their students were involved in research misconduct)
  • Legal action

3. Some scenarios
Plus discussion

Scenario: Millicent and Mutter, with Dingle
1. An honours student, Millicent, writes a brilliant thesis on simplifying agent-oriented concept design (AOCD).
2. Millicent’s supervisor, Professor Mutter, sees that it is brilliant and turns it into a joint paper, which is accepted to the rank A conference AOS2003.
3. Millicent gets first class honours and goes to work in Sweden.
4. Professor Mutter presents the joint paper at AOS2003 in Tokyo.
5. Professor Dingle from Ohio State University sends an email to Professor Mutter pointing out that the brilliant simplification of AOCD was all in a paper that Dingle published in 1998. She accuses Mutter of plagiarism.
6. Mutter checks Millicent’s thesis against Dingle’s paper and finds that large sections have been copied, word for word; Mutter apologizes to Dingle.
7. Mutter writes to his dean and asks that Millicent’s honours degree be rescinded.
8. The dean accuses Mutter of plagiarism.
9. The case works its way up the University disciplinary system.
10. The university offers Mutter a choice: accept a demotion to Associate Professor, or resign.
11. Mutter resigns.

Scenario: Robbie and the Rapid Router
1. Robbie has a new routing algorithm RR that he thinks is faster than previous algorithms.
2. There are some published benchmark data sets for this kind of routing.
3. RR is a randomised algorithm (e.g., a genetic algorithm) that gives a different result every time you run it.
4. Robbie runs RR on the benchmark 1000 times, and finds that the average runtime is 231.1 ms, while the maximum runtime is 1451.7 ms and the minimum is 62.1 ms.
5. The best previous result on this benchmark used 81.3 ms.
6. Robbie submits a paper reporting that his new algorithm is better than previous algorithms because it ran on the benchmark in 62.1 ms.
7. … ?

Scenario: Bertie, Bogie and his wife
1. Bertie, the departmental director of research, does not like Associate Professor Bogie.
2. Bertie notes that Bogie has written a joint paper with his wife, who is a student at a different University.
3. Bertie begins to look through Bogie’s many papers and finds three papers which are almost the same. They are published at three different conferences.
4. In the meantime, Bogie accepts a job as Professor at a different University.
5. Bertie writes to the director of research at Bogie’s new University, pointing out that at least three papers of Bogie are virtually the same.
6. … ?

Scenario: Ellen and the middle-aged Miles
1. A PhD student, Ellen, goes to a conference, and gives a talk.
2. After the talk, a middle-aged respectable professor (called Miles) asks Ellen lots of questions, and asks her about her future directions.
3. Ellen tells him everything; she is very happy that Professor Miles is interested in her work.
4. Miles is ambitious, but he has only published two papers in the last three years.
5. Six months later, Miles publishes a paper which has all the stuff that he and Ellen discussed.
6. The paper has no acknowledgement to Ellen.
7. … ?

Scenario: Kathleen and Mabel, with Malmsbury error detection
1. A PhD student, Kathleen, is writing a paper for a conference (in Colorado) and discusses it a lot.
2. Mabel has a great idea that would fit right in Kathleen’s paper.
3. Kathleen and Mabel chat and agree to include Mabel’s idea; also to include Mabel as an author.
4. The paper is accepted, and presented at the conference by Mabel (Kathleen is in Norway at another conference).
5. At the conference, Professor Malmsbury sees a critical error in the paper.
6. Mabel says “It’s really not my paper, Kathleen wrote it, it’s her error”.
7. … ?

Scenario: Banbury, Brightwhistle and the X-rays
1. A PhD student, Banbury, invents a wobbly algorithm and applies it to 1996 chest X-ray data from Wentworth.
2. He publishes the paper in WOBBLY2004, claiming that it is better than the 2003 wobbly algorithm of Brightwhistle and Scott.
3. Brightwhistle gets annoyed, because she thinks her algorithm is the best.
4. Brightwhistle wants to test her 2003 algorithm on the 1996 chest X-ray data from Wentworth, and asks Banbury for the data.
5. Banbury replies that he spent half his grant extracting the data from the database, and if Brightwhistle wants the data, then she can get it herself.
6. … ?

Scenario: Formby and the X-rays
1. A PhD student, Formby, invents a googly algorithm and applies it to 1996 chest X-ray data from Wentworth.
2. The algorithm runs well on this data.
3. He publishes the paper in GOOGLY2004.
4. Section 5 of the paper is “Evaluation”, based on the 1996 Wentworth chest X-ray data.
5. A year later, Formby is depressed because he hasn’t discovered any new googly algorithms for a while.
6. He applies his original googly algorithm to 1997 chest X-ray data from Billingworth.
7. He submits a paper to GOOGLY2005, the same as the 2004 paper, except that Section 5 uses the 1997 Billingworth chest X-ray data.
8. … ?