Mass Edit: Simultaneous Editing for the Web

Robert C. Miller, David F. Huynh, Nicole Bieber, Greg Little
MIT CSAIL
Cambridge, MA 02139 USA
{rcm,dfhuynh,nbieber,glittle}@mit.edu
ABSTRACT
Simultaneous editing is a technique for repetitive text editing in which a user controls multiple cursors to edit many
lines at the same time. We present a new technique for
simultaneous editing that uses clustering to automatically
classify text lines by their structure, so that different editing
can be done on different kinds of lines. Mass Edit is an
implementation of this technique for the web browser, written in HTML and Javascript. Mass Edit has a user interface for editing text in the web browser, and also offers an
API allowing web application developers to incorporate it
as a widget. We demonstrate the API by connecting it to
Google Spreadsheets, so that a range of cells can be edited
simultaneously.
ACM Classification: H5.2 [Information interfaces and
presentation]: User Interfaces. - Graphical user interfaces.
General terms: Design, Algorithms, Human Factors.
Keywords: text editing, pattern matching, simultaneous
editing.
INTRODUCTION
Text editing is full of small repetitive tasks, such as reformatting phone numbers, abbreviating or rearranging names
of people, and reformatting lists. Users have a rich basket
of tools for automating tasks like these, including find-and-replace, keyboard macros, and scripting.
A powerful technique in this basket is simultaneous editing
[6], which allows the user to edit with multiple cursors at
the same time. Simultaneous editing uses a set of regions
in the text file, called the records. Records might be lines,
paragraphs, or a more specific pattern. In simultaneous
editing mode, whenever the user makes a selection in one
record using the mouse or keyboard, the system responds
by making an equivalent selection in all other records.
Editing operations – such as typing, deleting, and cut-and-paste – affect all records simultaneously, as if the user had
applied the operations to each record individually.
Simultaneous editing is very effective for homogeneous
editing tasks, where exactly the same editing operations
must be applied to each record. But it performs poorly
when the records vary, as in a list of names where some are
written “John Smith” and others as “Smith, John” or “J.
Smith”. Often, lines that the user wants to edit, such as
names, are interspersed with lines that shouldn’t be
touched, such as phone numbers, delimiters, or just blank
lines. Previous approaches to simultaneous editing left it to
the user to separate the wheat from the chaff, so to speak,
so that different records would be handled differently.
This paper introduces a new technique for simultaneous
editing that uses clustering to automatically classify
records by their structure. Simultaneous editing is then
applied one cluster at a time, so that different editing can
be done on different clusters, and outliers stand out as singleton clusters.
We have implemented this technique in Mass Edit, a web
application written in HTML/Javascript that runs in modern, standards-compliant web browsers, including Firefox,
Safari, and Opera. Figure 1 shows Mass Edit in action.
Figure 1: Mass Edit editing a list of names. The
original list (leftmost column) is split into clusters
(other columns) which can be edited all at once.
The web application has a standalone interface that lets users paste lines of text from any application and edit them simultaneously. The application also offers a Javascript API that other web developers
can use to embed Mass Edit in their web applications. As
demonstrations, we have embedded it in Google Spreadsheets and in Potluck [3], a data-mashup tool we developed.
The contributions of this paper are a new technique that
combines multiple-cursor simultaneous editing with clustering, and a simple, fast Javascript implementation of the
technique that runs in current web browsers and can be
integrated into web applications.
RELATED WORK
Another approach to the problem of repetitive text editing
is programming by demonstration (PBD). In PBD, the user
demonstrates one or more examples of the transformation
in a text editor, and the system generalizes this demonstration into a program that can be applied to the rest of the
examples. PBD systems for text editing have included
TELS [8], Eager [1], DEED [2], and SmartEDIT [4]. The
system that most closely resembles Mass Edit is probably
Visual Awk [5]. Visual Awk allows a user to create awk-like file transformers interactively. Like awk, Visual Awk’s
basic structure consists of lines and words. When the user
selects one or more words in a line, the system highlights
the words at the same position in all other lines. For other
kinds of selections, however, the user must select the appropriate tool: e.g., Cutter selects by character offset, and
Matcher selects matches to a regular expression. In contrast, Mass Edit allows selections anywhere, and uses standard editing commands like cut and paste.
We have previously used clustering in the context of find
and replace [7]. Pattern matches are organized into clusters
based on similarity of structure and similarity of context,
allowing the user to examine and replace whole clusters at
a time while considering outliers individually. Mass Edit’s
clustering was inspired by this technique, but uses a different, much simpler algorithm, and applies it to simultaneous
editing instead of find and replace.
USER INTERFACE
The starting interface is a text box where the user pastes the
lines to be edited. Clicking the Mass Edit button enters
simultaneous editing mode.
In simultaneous editing mode, the lines of text are clustered
by structural similarity. Each cluster of lines is displayed
in a separate column (Figure 1). Clicking on a line places
the text cursor in it, and also places a cursor at the corresponding position of every other line in the same column.
(More details about how the cursor position is generalized
are provided in the IMPLEMENTATION section.) Moving the text cursor
with the keyboard arrows, or selecting a range of text,
likewise generalizes to corresponding selections in the rest
of the column.
When the user types new text, the typed characters are inserted at the cursor position of every line in the column.
Similarly, deletions are repeated for all lines.
Clipboard operations (cut, copy, and paste) are handled
differently depending on whether the clipboard text comes
from a selection made in the same line or from an external
source. Text from an external source is treated as if it were
typed, so that the same string is inserted into each line.
Text copied and pasted within the same line, however, is
generalized to do the corresponding copy and paste in the
other lines in the column. For example (Figure 2), to
switch the order of last names and first names, the user can
cut “Collins” and paste it in front of “Michael”, and the
system will generalize appropriately to the rest of the column.
Figure 2: Edits to any line of a cluster are automatically generalized to other lines.
A toolbar provides other common editing operations that
need to be generalized differently for different lines. To
Uppercase and To Lowercase perform alphabetic case conversion. The Fill command is used to align the cursors for
all lines in the column. It duplicates the character just before the cursor until every line’s cursor has the same offset.
Figure 3 shows the effect of applying the Fill command to
insert a row of dots. Once Fill has aligned the cursors, dots
may be added or removed by typing or deleting. The Fill
command can be used with any characters, including spaces and tabs.
Figure 3: The Fill command inserts enough characters into each line so that all cursors are aligned.
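The behavior of the Fill command can be illustrated with a minimal sketch. This is not Mass Edit's code: the function name fill, the parallel lines/cursors arrays, and the fallback to a space when the cursor sits at the start of a line are all assumptions made for illustration.

```javascript
// Sketch of a Fill-like operation: repeat the character just before each
// cursor until every cursor sits at the same offset.
// `lines` and `cursors` are parallel arrays (an assumed representation).
function fill(lines, cursors) {
  // Align everything to the rightmost cursor.
  const target = Math.max(...cursors);
  const filled = lines.map((line, i) => {
    const at = cursors[i];
    // Character to duplicate; assumed to be a space if nothing precedes
    // the cursor.
    const ch = at > 0 ? line[at - 1] : " ";
    return line.slice(0, at) + ch.repeat(target - at) + line.slice(at);
  });
  return { lines: filled, cursors: lines.map(() => target) };
}

// Each cursor is placed just after the dot that follows the title.
const result = fill(["Intro.1", "Related work.3", "API.12"], [6, 13, 4]);
console.log(result.lines.join("\n"));
```

After the call, every cursor is at offset 13, so further typing or deleting at the cursors keeps the columns aligned.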
Sometimes the user needs to temporarily suspend simultaneous editing to fix an error in a single line. Mass Edit
supports this with a checkbox that floats at the end of the
current line. Checking the box removes all the other cursors, so that only the current line is edited. Unchecking it,
or clicking on any other line, resumes editing with multiple
cursors.
When the user is done editing simultaneously, clicking the
Done button returns to the text box, which now contains
the updated text (with the edited lines put back in the proper order). The user can select this text and paste it back
into the application or file from which it came.
We have experimented with integrating simultaneous editing into other web applications, starting with Potluck [3]
and Google Spreadsheets. Potluck is a data mashup tool,
which displays data from one or more sources in tabular
form, and allows the user to apply Mass Edit to the values
in any column. Similarly, in Google Spreadsheets, the user
can select a range of cells (which might be a row, a column, or an arbitrary selection) and select the Mass Edit tab
to pop up a simultaneous editing window for all the selected cells (Figure 4). When the user presses Done to stop
mass editing, the edited values are written back to the appropriate cells of the spreadsheet.
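As a rough illustration of this write-back flow, a host application might wrap the massEdit(div, data, callback) entry point documented under DEVELOPER API as sketched below. The helper name editCellsWithMassEdit, the shape of the cell objects, and the assumption that the callback receives the edited records as an array in the original order are hypothetical; the paper only states that the callback is passed the edited data.

```javascript
// Hypothetical host-side helper. `massEdit` is the entry point loaded from
// api.js; it is passed in as a parameter so that this sketch does not
// depend on a global being present.
function editCellsWithMassEdit(massEdit, div, cells, onUpdate) {
  // An array of strings is used as the records directly.
  const values = cells.map((cell) => cell.value);
  massEdit(div, values, (edited) => {
    // Assumption: `edited` is an array with one string per original record.
    // Anything else (for example, after a cancel) is ignored.
    if (!Array.isArray(edited) || edited.length !== cells.length) return;
    edited.forEach((text, i) => {
      if (cells[i].value !== text) {
        cells[i].value = text; // write back only the cells that changed
        onUpdate(cells[i]);
      }
    });
  });
}
```

In a page that has loaded api.js, the first two arguments would be the global massEdit function and the element that should host the editor.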
Figure 4: Google Spreadsheets augmented with a
Mass Edit tab (behind), which starts Mass Edit on
the selected cells (front).
DEVELOPER API
Mass Edit provides a Javascript API that allows a web application like Google Spreadsheets to embed simultaneous
editing functionality. The API is included in an HTML
page with the following tag:
<script src="http://uid.csail.mit.edu/mass-edit/api.js"></script>
The API has a single entry point: massEdit(div, data, callback). The div parameter specifies an HTML
element in which Mass Edit should display its user interface. This element may be integrated into the web application’s layout or overlaid on top. The data parameter specifies either a string (which is split into records on line boundaries) or an array of strings (which are used as records
directly). The callback parameter is a function that is
called when the user leaves simultaneous editing, passing
the edited data so that it can be incorporated back into the
application’s own data structures.
Calling the massEdit() function creates the Mass Edit user
interface in the specified div element, initialized with the
provided data. Pressing Done or Cancel removes the Mass
Edit interface and calls the callback function.
IMPLEMENTATION
The implementation of simultaneous editing used in Mass
Edit is substantially simpler than previous implementations
[6], in order to make
it run efficiently in Javascript. The algorithm has three
parts: detecting the structure of each text line, clustering
lines by structural similarity, and generalizing cursor positions and selections to all the lines in a cluster. Each part is
described below.
When Mass Edit starts, it scans each text line to extract its
structure. The structure of a line consists of a sequence of
digit runs (such as 728), alphanumeric runs (such as Tomaso), whitespace runs (including tabs and spaces), and individual punctuation characters. For example, the line
UBD_NAME--55 is converted into the structural sequence
alphanumerics, punctuation, alphanumerics, punctuation,
punctuation, digits.
The lines are then clustered using the edit distance between
their structural sequences as the distance metric. Clusters
are generated greedily by choosing an arbitrary line as a
seed for the cluster, ranking the remaining lines by structural edit distance, and then scanning the ranked list for a
sufficiently large gap between two lines. A gap is considered large if it exceeds the average gap in a nearby window by more than a threshold. Lines closer than this gap
are clustered with the seed line, and the remaining lines are
recursively clustered in the same way, until the entire set of
lines has been clustered.
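The first two parts of the algorithm, structure detection and greedy clustering by structural edit distance, can be sketched as follows. This is an illustrative reading of the description, not the Mass Edit source. The function names, the choice of the first remaining line as the seed, the window of five preceding gaps, and the threshold of 0.5 are all assumptions, as is the rule that a letter/digit run counts as a digit run only when it contains digits alone.

```javascript
// Structure of a line: digit runs, alphanumeric runs, whitespace runs, and
// individual punctuation characters.
function structure(line) {
  const tokens = line.match(/[\p{L}\p{N}]+|\s+|[^\p{L}\p{N}\s]/gu) || [];
  return tokens.map((t) => {
    if (/^\p{N}+$/u.test(t)) return "digits";
    if (/^[\p{L}\p{N}]+$/u.test(t)) return "alnum";
    if (/^\s+$/u.test(t)) return "space";
    return "punct";
  });
}

// Levenshtein distance between two structural sequences.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const row = [i];
    for (let j = 1; j <= b.length; j++) {
      const subst = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      row.push(Math.min(subst, prev[j] + 1, row[j - 1] + 1));
    }
    prev = row;
  }
  return prev[b.length];
}

// Greedy clustering: rank lines by distance to a seed, then cut at the first
// gap that exceeds the average of the preceding gaps by more than `threshold`.
function clusterLines(lines, { window = 5, threshold = 0.5 } = {}) {
  let remaining = lines.map((text) => ({ text, types: structure(text) }));
  const clusters = [];
  while (remaining.length > 0) {
    const seed = remaining[0]; // assumed seed choice: first remaining line
    const ranked = remaining
      .slice(1)
      .map((item) => ({ item, d: editDistance(seed.types, item.types) }))
      .sort((x, y) => x.d - y.d);
    const dists = [0, ...ranked.map((r) => r.d)]; // the seed is at distance 0
    let cut = ranked.length; // by default every line joins the seed
    for (let i = 0; i < ranked.length; i++) {
      const gap = dists[i + 1] - dists[i];
      const start = Math.max(0, i - window);
      let sum = 0;
      for (let j = start; j < i; j++) sum += dists[j + 1] - dists[j];
      const average = i > start ? sum / (i - start) : 0;
      if (gap > average + threshold) {
        cut = i;
        break;
      }
    }
    clusters.push([seed.text, ...ranked.slice(0, cut).map((r) => r.item.text)]);
    remaining = ranked.slice(cut).map((r) => r.item);
  }
  return clusters;
}

console.log(structure("UBD_NAME--55").join(", "));
console.log(
  clusterLines(["John Smith", "Smith, John", "Mary Jones", "555-1234", "Jones, Mary"])
);
```

With these assumed parameters, lines with identical structure end up together and the phone number forms a singleton cluster. Different window and threshold values would merge or split clusters differently.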
When the user places the text cursor in a line, Mass Edit
must generalize that cursor position to the other lines in its
cluster. This is done by generalizing the cursor position
into a pair, (segment, character), which identifies a structural segment and a character position within that segment.
Positive segment and character positions are offsets from
the start, while negative positions are counted from the end
(as in Python sequence indexing); the generalization algorithm uses whichever direction produces a smaller number,
seeking a minimum description. The resulting (segment,
character) pair is the generalized cursor position, which is
applied to the other lines in the cluster. For example, if the
cursor is placed as shown in Toma|so Poggio, then it is
generalized to (0,-2), indicating the 0th segment (Tomaso)
and the 2nd character from the end of that segment. Applying this generalized position to Patrick Winston produces
Patri|ck Winston. The system may be unable to apply the
generalized position to all the lines, if the user clicks in a
place where the lines differ too much in structure. In this
case, the system highlights the offending lines in red and
disables editing with the selection.
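The (segment, character) generalization can be sketched as below, under stated assumptions. The function names, the tie-breaking rule (prefer counting from the start), and the handling of a cursor at the very end of a line (kept as a positive offset into the last segment) are assumptions for illustration. A cursor is treated as sitting before the character it precedes, so that negative positions follow Python indexing as in the example above.

```javascript
// Split a line into structural segments, recording where each one starts.
function segments(line) {
  const out = [];
  for (const m of line.matchAll(/[\p{L}\p{N}]+|\s+|[^\p{L}\p{N}\s]/gu)) {
    out.push({ start: m.index, length: m[0].length });
  }
  return out;
}

// Pick the from-start or from-end index, whichever has the smaller magnitude.
function shorter(fromStart, size) {
  if (fromStart >= size) return fromStart; // no from-end form exists
  const fromEnd = fromStart - size; // negative, as in Python indexing
  return -fromEnd < fromStart ? fromEnd : fromStart;
}

// Turn a character offset in `line` into a [segment, character] pair.
function generalize(line, offset) {
  const segs = segments(line);
  if (segs.length === 0) return null;
  let s = segs.findIndex((g) => offset < g.start + g.length);
  if (s === -1) s = segs.length - 1; // cursor at the end of the line
  const c = offset - segs[s].start;
  return [shorter(s, segs.length), shorter(c, segs[s].length)];
}

// Resolve a generalized position in another line; null means the line's
// structure does not admit that position.
function applyPosition(line, [seg, ch]) {
  const segs = segments(line);
  const s = seg < 0 ? seg + segs.length : seg;
  if (s < 0 || s >= segs.length) return null;
  const c = ch < 0 ? ch + segs[s].length : ch;
  if (c < 0 || c > segs[s].length) return null;
  return segs[s].start + c;
}

const position = generalize("Tomaso Poggio", 4); // cursor in Toma|so
console.log(position, applyPosition("Patrick Winston", position));
```

A null result from applyPosition corresponds to the case where a line differs too much in structure and would be highlighted in red.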
Text lines are normally displayed as uneditable HTML
elements. When the user clicks on a line to give it keyboard focus, Mass Edit converts the line into an editable
text field, and attaches listeners to the text field for selection and change events. Whenever the user moves the cursor or changes the selection, Mass Edit generalizes the selection across the other lines in the cluster and highlights
them appropriately. Whenever a change event occurs,
Mass Edit compares the old value of the text field to its
new value to determine whether text was inserted or deleted, and then generalizes the end points of the change to
the other lines in the cluster and inserts or deletes text appropriately.
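One simple way to recover an insertion or deletion from the old and new field values is to strip their common prefix and suffix. The sketch below shows that approach; it is an assumption about how such a comparison could work, not a description of Mass Edit's code, and the name diffChange is hypothetical.

```javascript
// Compare the old and new values of a text field and report the single
// contiguous change between them.
function diffChange(oldValue, newValue) {
  const limit = Math.min(oldValue.length, newValue.length);
  let prefix = 0;
  while (prefix < limit && oldValue[prefix] === newValue[prefix]) prefix++;
  let suffix = 0;
  while (
    suffix < limit - prefix &&
    oldValue[oldValue.length - 1 - suffix] === newValue[newValue.length - 1 - suffix]
  ) {
    suffix++;
  }
  return {
    start: prefix, // offset in the old value where the change begins
    deleted: oldValue.slice(prefix, oldValue.length - suffix),
    inserted: newValue.slice(prefix, newValue.length - suffix),
  };
}

console.log(diffChange("John Smith", "John Q. Smith"));
console.log(diffChange("Collins, Michael", ", Michael"));
```

When the changed text repeats neighboring characters, the reported offsets can be ambiguous, so the end points of the change may need to be cross-checked against the last known selection before they are generalized to the other lines.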
One challenge with implementing simultaneous editing in a
web browser is that we cannot directly observe clipboard
operations. Clipboard contents are off-limits to browser
Javascript for security reasons, and Cut, Copy, and Paste
commands do not produce unique events that can be captured by an event listener. Our current implementation
uses a trick to detect cut and paste operations. Every string
deleted from a line is recorded as long as the line has the
focus. If a deleted string reappears in a subsequent insertion, then we assume that the user did a cut and paste operation within the line, and generalize the operation to the
other lines in the cluster by using their corresponding deleted strings. This technique can be misled, however, and
it also cannot detect Copy operations, since the Copy
command has no visible effect on the text field. We are
exploring alternative solutions, including recording all selections ever made (not just deletions), and watching for
keyboard shortcuts (such as Control-C or Command-C).
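The deletion-recording trick might be sketched as follows. The class name, its methods, and the choice to match the most recent identical deletion are assumptions for illustration. The sketch only classifies an insertion as a probable paste of earlier cut text, and leaves the generalization to other lines to the caller.

```javascript
// Remembers every string deleted from the focused line and reports when a
// later insertion matches one of them (a probable cut followed by paste).
class CutPasteDetector {
  constructor() {
    this.deletedStrings = [];
  }

  // Call once per change with the deleted and inserted text. Returns the
  // index of the matching earlier deletion, or null if the insertion looks
  // like ordinary typing or an external paste.
  observe({ deleted, inserted }) {
    let match = null;
    if (inserted !== "") {
      const i = this.deletedStrings.lastIndexOf(inserted);
      if (i !== -1) match = i;
    }
    if (deleted !== "") this.deletedStrings.push(deleted);
    return match;
  }

  // Call when the line loses focus, since deletions are only tracked while
  // the line has the focus.
  reset() {
    this.deletedStrings = [];
  }
}

const detector = new CutPasteDetector();
detector.observe({ deleted: "Collins", inserted: "" }); // user cuts a name
console.log(detector.observe({ deleted: "", inserted: "Collins" })); // pasted back
```

If every other line in the cluster keeps a parallel log of its own deleted strings, the returned index identifies which of that line's strings to reinsert. The heuristic can be misled when the user retypes a string that happens to equal an earlier deletion.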
Despite being implemented in HTML and Javascript, performance of Mass Edit is sufficient for interactive editing.
On a 2GHz laptop running Windows XP and Firefox 2, the
time to start up Mass Edit and do the clustering is less than
500 ms for 100 lines of text, and 5-15 seconds for 1000
lines. Making a selection takes roughly 100 ms for a 100-line cluster, and about 1 s for a 1000-line cluster.
DISCUSSION
This section discusses some of the new directions we are
looking at for Mass Edit: (1) new editing tools, (2) improvements to clustering and selection generalization, (3)
richer spreadsheet integration, and (4) more API capabilities for web application developers.
Implementing Mass Edit in the web browser opens up new
possibilities for integrating web services as editing tools.
Possible examples include converting a US postal code into
city and state; a street address into latitude/longitude coordinates; a telephone code into a time zone; and a person’s
name into an email address or phone number. By hooking up
web services like these as tools in the Mass Edit toolbar,
like To Uppercase and To Lowercase, we can apply them
during editing. Because different applications may need
different tools, we plan to open up the API further to allow
application developers to define their own tools.
Automatic clustering generally helps separate different
kinds of records that need to be edited differently, but it
isn’t perfect. Sometimes it creates more clusters than
needed, because the differences in structure aren’t important to the user’s particular editing task. For example, if
the user only needs to edit near the end of each line, then
differences at the start of the line are largely irrelevant, and
it isn’t necessary to split based on those differences. Conversely, sometimes the clustering isn’t fine enough, leaving
heterogeneous clusters that must be edited one line at a
time. One solution to this problem would be to let the user
rearrange the clustering manually, perhaps using drag-and-drop to merge and split clusters. Clustering and selection
generalization would also be improved by recognizing
common text structure like URLs, filenames, email addresses, dates, times, etc.
The integration between Mass Edit and Google Spreadsheets suggests new ways to do spreadsheet editing. One
possibility is a command for splitting cells at the current
cursor position, so that, e.g., names written like “John Doe”
could be split into two spreadsheet columns, “John” and
“Doe.” This approach would use Mass Edit to discover the
structure inside spreadsheet cells and expose it to other
spreadsheet operations (sorting, filtering, rearranging columns, etc.). Mass Edit clustering could also be useful for
selecting cells with similar structure and using the selection
for other operations, such as changing the background color of cells that contain names with middle initials.
Finally, we are looking at extending the API to allow web
application developers to customize the Mass Edit interface. For example, Google Spreadsheet cells are not just
plain text, but can include formatting and formula information. The API should allow the spreadsheet to provide its
own custom editor for each record, in place of Mass Edit’s
default plain text field, and hooks for displaying records in
a custom way, locating selections in a record, and specifying new editing commands.
CONCLUSION
Mass Edit is a web interface and API for simultaneous editing, using automatic clustering to allow different lines to be
edited differently, and realized by a simple, fast Javascript
implementation. Mass Edit is available for public use at
http://uid.csail.mit.edu/mass-edit.
ACKNOWLEDGMENTS
We thank members of the UID group who provided valuable feedback on the ideas in this paper. This work was supported in part by the National Science Foundation under
award number IIS-0447800, and by Quanta Computer under the T-Party project. Any opinions, findings, conclusions or recommendations expressed in this publication are
those of the authors and do not necessarily reflect the views
of the sponsors.
REFERENCES
1. Cypher, A. Eager: Programming repetitive tasks by demonstration. In A. Cypher, Ed., Watch What I Do: Programming
by Demonstration, MIT Press, 1993, 205-218.
2. Fujishima, Y. Demonstrational automation of text editing
tasks involving multiple focus points and conversions. In
Proc. IUI ’98, 101-108.
3. Huynh, D.F., Miller, R.C., and Karger, D. Potluck: data
mash-up tool for casual users. In Proc. ISWC 2007, 239-252.
4. Lau, T., Wolfman, S., Domingos, P., and Weld, D.S. Learning
repetitive text-editing procedures with SMARTedit. In H.
Lieberman, Ed., Your Wish Is My Command: Giving Users
the Power to Instruct Their Software, Morgan Kaufmann,
2001, 209-226.
5. Landauer, J. and Hirakawa, M. Visual AWK: a model for text
processing by demonstration. In Proc. VL 1995, 267-274.
6. Miller, R.C. and Myers, B.A. Interactive simultaneous editing
of multiple text regions. In Proc. USENIX 2001, 161-174.
7. Miller, R.C. and Marshall, A.A. Cluster-based find & replace. In Proc. CHI 2004, 57-64.
8. Witten, I.H. and Mo, D. TELS: Learning text editing tasks
from examples. In A. Cypher, Ed., Watch What I Do: Programming by Demonstration, MIT Press, 1993, 205-218.