A study of three models for image-text relations
D.S. Kornalijnslijper
[email protected]

ABSTRACT
Images and texts placed together in media are likely to be related. Various applications that use image-text combinations can benefit from a model that describes the features of this image-text relation. This research examines three models that could benefit the IMIX question answering system. IMIX answers medical questions and uses image-text relations to retrieve suitable images for its generated answers. Twenty image-text samples from a medical information corpus were annotated to create a better understanding of the three models. The results show the advantages and disadvantages of the models and their individual differences. The models differ not only in their structure and design but also in the information content that they can describe.

Keywords
image-function, semantic-relation, logico-semantic, image, picture, text, relation

1. INTRODUCTION
In many media one can find images and text existing and working together. Both images and text express information, and when these two modes are placed together their information content is likely to be related. Recent research has worked towards models that represent these semantic relations. One could use such a model to classify a relation or to identify different types of relations. For example, one could distinguish between image-text combinations in which the image has a purely decorational purpose and combinations in which the image has a high information value (for example a diagram). One could also think of image-text combinations in which the text only plays a subsidiary role, for example captions under newspaper images or captions accompanying paintings. A detailed model could also describe specific properties of the relation, allowing a more detailed identification of the image-text pair.
Some of the existing models offer only a one-sided view of the relation, considering only the function of the image in the text. This is because text is commonly considered the main carrier of information, with the image playing only a supplemental role. The use of images (by which we mean static images) in media has increased strongly. One finds many images especially in new media like the Internet, but also in comic books and children's books. To a lesser extent, but of no lesser importance, one also finds images in technical books and manuals, where images explain, among other things, the workings of complex processes. There is thus an increasing demand for models that can handle a mutual relation between the two modes: models that support relations where the text complements the image or where the two modes depend on each other.

Before explaining the relational models, we briefly consider the applications for these models and our goal in this paper. End users and developers of text and image media could benefit from an image-text relation model. One example is authors and editors who want the most effective relation between image and text. Another example is researchers who study the effectiveness of media, and researchers in the information sciences who seek to use and understand the media.

The research in this paper will help to improve a system in development at our department. This system, called IMIX (Interactive Multimodal Information eXtraction), is a multimodal question answering system. Essentially, the system answers questions that a user asks in natural language. Such systems mostly have limited but expert knowledge of a specific domain. For the IMIX system this is the medical information domain. The system can answer questions on medical problems, describing symptoms, causes and cures.
The interaction between the user and the system is multimodal, so communication happens in several ways. IMIX understands and answers written and spoken questions in Dutch. The user can type a question, speak it, or combine either with highlighting specific pieces of text that resulted from earlier interactions with the system. The system analyzes the question and matches it against the contents of the knowledge base. This can result in multiple possible matches that could answer the question. These matches are in turn analyzed to find the most suitable answer. Next, the system creates an answer in textual form, and in this step it also tries to include a fitting image to improve the effectiveness of the answer. Finally, the system presents the answer to the user on the display and speaks it out loud over the system speakers.

With relational models for text and images we want to improve the process of finding a suitable image. The knowledge base of IMIX contains both texts and images, and a model could be used to annotate the existing image-text relations. One annotates by first understanding the properties of the image-text pair (based on the model used) and then classifying the pair with a relational function or type that describes those properties. The annotated pair then contains information that shows the role in which each mode has been applied.

In this paper we analyze three different models that could potentially be used with the IMIX system. Our approach is the following:
• explain how the three models function,
• use the three models to annotate twenty image-text pair samples from a medical information corpus, with the goal of creating a better understanding of these models,
• discuss the results of the annotation process and analyze how the models function in the medical information domain,
• draw a conclusion and offer recommendations.
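
The annotation step described above (classify each image-text pair with one or more functions or types from a given model, and note any problems) can be pictured as a small record per pair. The sketch below is only an illustration under assumptions: the class name `Annotation`, its fields, the helper `labels_by_model`, and all example values are hypothetical and are not taken from the IMIX system or from the annotated corpus.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    """One annotated image-text pair (hypothetical structure, not IMIX code)."""
    sample_id: int                                    # index of the sample in the study
    model: str                                        # relational model that was applied
    labels: List[str] = field(default_factory=list)   # assigned functions or types
    comment: str = ""                                 # note on annotation problems, if any


def labels_by_model(annotations: List[Annotation], sample_id: int) -> Dict[str, List[str]]:
    """Collect the labels that each model assigned to a single sample."""
    return {a.model: a.labels for a in annotations if a.sample_id == sample_id}


# Placeholder values only; these are not results from the paper.
annotations = [
    Annotation(1, "model A", ["label x"]),
    Annotation(1, "model B", ["label y", "label z"], comment="doubt between y and z"),
    Annotation(2, "model A", ["label x"]),
]

print(labels_by_model(annotations, 1))
```

Keeping `labels` as a list rather than a single value leaves room for models that allow more than one function or type per relation, and the `comment` field gives the annotator a place to record doubts.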
1.1 Functions and types
As mentioned in the introduction, prior studies generally discuss the 'function of images' in text. In this kind of relationship the image supplements the text. These studies exclude any other arrangement of the modes. However, with the increasing use of images, the other arrangements, where the text serves the image and where both modes are dependent on or independent of each other, should no longer be ignored. In relations where both modes depend on each other one can no longer speak of 'functions', since both serve each other. Here we prefer a more general term. For this paper we use a term from Martinec and Salway (2005), namely 'types of relations'. Each 'type of relation' identifies a different relation between image and text. For clarity we will still use the term 'function of images' with the models that are limited to the functions of images in text.

1.2 Setup of this paper
Chapter 1 introduced image-text relations and explained why models are needed to describe these relations. The chapter also discussed the goal of this paper and clarified the terms 'functions' and 'types'. Next, chapter 2 summarizes three different models and explains how they function. Chapter 3 discusses the annotation process and its results. For this research a selection of image-text pairs has been annotated using the three models to create a better understanding of the models. All image-text pairs and the resulting annotations are in appendix A. Chapter 3 also gives some guidelines for the annotation process and briefly discusses the multiplicity of annotation. Chapter 3.1 discusses the results of the annotation process. Chapter 4 draws conclusions from the analysis in chapter 3 and makes recommendations for further research into this subject.

2. PRIOR RESEARCH
The following chapters discuss three models from earlier studies. The first is that of Carney and Levin (2002). They studied the function of images in text as a means to better understand the educational value of images in educational text. The second model is that of Marsh and White (2003). They created a taxonomy of prior models in an effort to create a general model for image-text relations. The third model is that of Martinec and Salway (2005), who have taken a theoretical approach towards a general model for image-text relations. Each model uses different types of relations, and the following chapters explain each of these types. We use the information provided in the papers. An exception is the model of Martinec and Salway, which they partly based on theories from Halliday (1994). To better understand this model, it was necessary to study some parts of Halliday (1994).

2.1 Carney and Levin
Carney and Levin (2002) discuss the function of images in text and do not look at cases where text serves a function to images or cases where both modes are of equal importance. They distinguish five different functions of images in text: decorational, representational, organizational, interpretational and transformational. One can assign a single function type to each image-text relationship. A description and explanation of each function follows:
• Decorational pictures only serve to decorate the text; they add little or no information to the text in the document. For example, a picture of the sun in a travel brochure for Egypt would have a decorational function.
• Representational pictures depict what the text describes, partly or completely. Some representational pictures go beyond the text and depict more than the text describes. An example is a picture of a painting and a text describing the contents of the painting.
• Organizational pictures show structural information that the text contains. Usually the image depicts the information in steps. Examples are an illustration showing the necessary steps to take in case of an emergency, or the example described in Carney and Levin (2002): an illustrated map of a hiking trail.
• Interpretational pictures help to depict information that would be more difficult to explain and communicate with text alone. Examples are pictures showing the workings of machinery or complex models.
• Transformational pictures, as described in Carney and Levin (2002), “include systematic mnemonic (memory enhancing) components”. These components depict information from the text in a literal sense, though the picture as a whole does not necessarily depict the intended information literally. The example of Carney and Levin depicts information on the town Belleview. First, a bell in the picture represents the part of the town name “Bell”. The example contains more components that are literal translations of the text. The reader should associate these components with the contents of the text and store the mnemonic picture in his or her memory. It should then be easier for the reader to retrieve the more detailed textual information from memory.

2.2 Marsh and White
The second model is that of Marsh and White (2003). Marsh and White also discuss the function of images in text, and they also do not consider cases of text functions for images or cases where both modes are of equal importance. They created a taxonomy of image functions in text and based their function types on earlier studies. The collected functions included many that were similar but had different names. They grouped these under a common name, and then tested and adjusted the final taxonomy. The resulting taxonomy consists of 49 image functions (see Table 1). There are 3 levels of precision, each level being more specific than the previous one. The first precision level contains 3 general image functions.
The second level contains 11 functions, which expand the first level. The third level contains 30 image functions, which in turn expand the second level. The 3 general image functions of the first level represent 3 degrees of strength of the relation between image and text. The first group (A) contains functions of images that express little relation to the text. The second group (B) contains functions of images expressing a close relation to the text. The last group (C) contains functions of images where the image expresses more information than the text (Marsh and White describe this as “functions that go beyond the text”). One can describe a relation between image and text with more than one function, combining different functions to create a more detailed description of the relationship. The coming section briefly explains the types of the second level. For more details and the complete descriptions of the various functions we refer to the appendix of Marsh and White (2003), pages 666-672.

Table 1 Taxonomy of functions of images to the text. Marsh and White (2003), page 653.

Group A contains the following function types of the second level: decorate, elicit emotion, and control.
• Decorate is a function for images that make the text more attractive without having any substantial effect on understanding the information.
• Elicit emotion is for images whose content provokes a certain emotion.
• Control is for images that exercise a restraining or directing influence on the reader. The content holds the attention or encourages a response from the reader. Excluded are responses that are primarily emotional; those belong to type elicit emotion, also in group A.

Group B contains the following function types of the second level: reiterate, organize, relate, condense, and explain.
• Reiterate describes images that repeat the information in the text with minimal change or interpretation.
• Organize shows the information in a structured form and is often applied when the information is better explained graphically than textually. Examples are diagrams, charts, maps and other forms that present the information in a more organized way (not necessarily in diagram or map style).
• Relate is for images that refer to processes or concepts contained within the text. The types of the third level that Marsh and White use here are compare, contrast and parallel.
• Condense is a function type for images that make the information more compact or reduce it to its essential elements.
• Explain makes the information plain or understandable. It can only be applied if the contents of the image follow the text closely; otherwise a similar function from group C has to be used. Explain is for images that define the text by identifying the essential qualities or meaning of the information, or that complement the text by helping to transfer the intended information.

Group C contains the following function types of the second level: interpret, develop and transform.
• Interpret clarifies complex textual concepts by putting them into more concrete forms. The image can emphasize the text or provide factual or substantial support.
• Develop expands the information in the text by providing more details, by illustration, or by closer analysis of the information.
• Transform puts the information in the text into another form. The information is recoded, related to each other, or organized to improve recall (for example mnemonic images). This type also includes images that continue from where the text stopped or take turns with the text to provide information. The images in this type also model ideas that cannot be represented or understood by text. Examples provided are cognitive and mechanical processes.

Though Marsh and White do not mention this specifically, one could consider group C a step towards a more complex model: a model that also supports relations where the text has a function to the image and where text and image have an equal relationship. Group C identifies functions for images that expand on the text or supply more information than is contained within the text. The function types in group C describe relations where image and text are independent or interdependent of each other.

2.3 Martinec and Salway
Martinec and Salway (2005) describe a mutual relationship between image and text; their model does not discriminate between the two modes. The descriptions of their relational types do not refer to images or text but to modes interacting in a multimodal relation. The model is, as they call it, “a generalized system of image–text relations”. One can use it to describe relations where the image serves the text, where the text serves the image, and where image and text are equally dependent on or independent of each other. Martinec and Salway based the model on earlier work of Barthes (1977a, 1977b) and Halliday (1994). The parts that they used from Halliday (1994) concentrate solely on text; by combining them with the work of Barthes (see footnote 1), however, they created a model that works on text and images. According to Martinec and Salway (2005, p. 340) others used this idea earlier, but only for specific examples of multimodal relations. Martinec and Salway state that one can use their model for all image-text relations, in old and new media. The model has two kinds of relations: a status relation and a logico-semantic relation. Each image-text relation has a status relation and a logico-semantic relation (see Figure 3). A relation has one status and one logico-semantic type. Martinec and Salway do show in their examples, however, that more than one relation can exist between an image and a text. Different components in the image and text can have different relations with one another.
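
The structure just outlined, one status and one logico-semantic type per relation and possibly several relations per image-text pair, can be sketched as follows. The type names follow this paper's summary of Martinec and Salway (2005) as given in sections 2.3.1 and 2.3.2; the Python class names, the string encoding of the type hierarchy, the helper `main_type`, and the example components are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    # Equal status: the modes are independent or complementary.
    INDEPENDENT = "equal:independent"
    COMPLEMENTARY = "equal:complementary"
    # Unequal status: one mode is subordinate to the other.
    IMAGE_SUBORDINATE = "unequal:image subordinate"
    TEXT_SUBORDINATE = "unequal:text subordinate"


class LogicoSemantic(Enum):
    # Expansion and its subtypes.
    EXPOSITION = "expansion:elaboration:exposition"
    TEXT_MORE_GENERAL = "expansion:elaboration:exemplification:text more general"
    IMAGE_MORE_GENERAL = "expansion:elaboration:exemplification:image more general"
    EXTENSION = "expansion:extension"
    ENHANCEMENT = "expansion:enhancement"
    # Projection and its subtypes.
    LOCUTION = "projection:locution"
    IDEA = "projection:idea"


@dataclass(frozen=True)
class Relation:
    """One relation between a component of the image and a component of the text."""
    image_part: str
    text_part: str
    status: Status                   # exactly one status per relation
    logico_semantic: LogicoSemantic  # exactly one logico-semantic type per relation

    def is_equal(self) -> bool:
        """True when neither mode is subordinate to the other."""
        return self.status in (Status.INDEPENDENT, Status.COMPLEMENTARY)


@dataclass
class ImageTextPair:
    """An image-text pair may carry more than one relation."""
    relations: List[Relation]


def main_type(relation: Relation) -> str:
    """Return the top-level logico-semantic type: 'expansion' or 'projection'."""
    return relation.logico_semantic.value.split(":")[0]


# Hypothetical pair with a main relation and one component-level relation.
pair = ImageTextPair(relations=[
    Relation("whole image", "whole text", Status.COMPLEMENTARY, LogicoSemantic.EXTENSION),
    Relation("image detail", "one sentence", Status.IMAGE_SUBORDINATE, LogicoSemantic.EXPOSITION),
])

for r in pair.relations:
    print(r.image_part, "/", r.text_part, "->", r.status.name, main_type(r))
```

Modelling a pair as a list of relations mirrors the observation that components within the modes can relate differently; an annotation restricted to the main relation would simply keep a single entry in that list.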
2.3.1 Status relations
The status relation indicates the relative status of text and image. The relationship can be equal or unequal. In an unequal relationship one mode is subordinate to the other (one mode serves the other); in an equal relationship the modes are either independent of or complementary to each other.

A mode is subordinate to the other mode when it relies on that mode. Image subordination is realized when the image relates to only a part of the text. Text subordination comes in two forms: either by direct reference to the image or by “the combination of material or behavioral processes with simple present or present progressive tense” (Martinec and Salway (2005), p. 347). One can often see this in news photograph captions, see Figure 1. Material and behavioral processes in past tense have the opposite effect and cause the image to be subordinate to the text, see Figure 2.

Figure 1 News photograph with caption in present tense. "(unreadable name) walks up the courthouse steps with his legal team in a recent photo." The text is subordinate to the image. Martinec and Salway (2005), p. 348.

Figure 2 News photograph with caption in past tense. "Marian Bates died protecting her daughter" The image is subordinate to the text. Martinec and Salway (2005), p. 348.

When the whole image relates to the whole text, both modes are in an equal status relationship. When both modes depend on each other or modify each other equally, their status relation is complementary. When both modes can exist in parallel as two separate processes without relying on each other, the modes are independent.

Figure 3 Network of combined status and logico-semantics. Martinec and Salway (2005), p. 358.

1 Barthes, R. (1977a). The Photographic Message, trans. Heath S., in Image–Music–Text, Fontana, London, 15–31. Barthes, R. (1977b). Rhetoric of the Image, trans. Heath S., in Image–Music–Text, Fontana, London, 32–51.

2.3.2 Logico-semantic relations
Logico-semantic relations come in two main types: expansion and projection. In expansion, as the word says, one mode expands the other mode. Projection repeats in one mode what the other mode is showing. Projection is divided into locution and idea. Expansion is divided into three types: elaboration, extension and enhancement. First we explain the subtypes of expansion.

One uses the elaboration type when one mode provides a more detailed description of the other mode. The elaborating mode does not necessarily provide new information but elaborates on the information in the elaborated mode. Elaboration has two subtypes: exposition and exemplification. One uses exposition when two modes present the same information but in a different form or presentation method. Both modes are equally general and restate each other.

Exemplification further expands the information that was available in the expanded mode by providing a more specific example or a more specific instance of the information. Again, the elaborating mode does not provide new information but elaborates on the information in the elaborated mode. An example of exemplification is given in Figure 4. The skull and crossbones in the picture are a generally recognized symbol of death. The process 'kills' will eventually lead to death and is thus associated with death. The image is more general than the text; the text mentions a specific method of killing prey. As the expanding mode is an example or an instance of the expanded mode, the latter must be more general than the former. Exemplification is subdivided into “text more general” and “image more general”.

Figure 4 Example of elaboration, exemplification, image more general. Martinec and Salway (2005), p. 350.

One uses the extension type when one mode extends the information of another mode. The extending mode adds a new element, gives an exception or offers an alternative to the extended mode.
The extending information cannot be extracted (it cannot be seen or read) from the extended mode and is thus new information. However, the information in the two modes needs to be related. An example is given in Figure 5: the image shows a crossed fork and knife, which in western cultures commonly symbolizes the process of eating. Together both modes suggest that “fish and small prey” can be eaten (from Martinec and Salway (2005), p. 363). One could also replace the fork and knife with a hunting rifle and fishing rod, suggesting the possibility of hunting fish and small prey. The image extends the text with a behavioral process.

Figure 5 Example of extension. Martinec and Salway (2005), p. 363.

The extension type used in Martinec and Salway (2005) is a reinterpreted version of the one used in Halliday (1994) and includes the participants in material and behavioral processes (Martinec and Salway (2005), p. 363).

The third subtype of expansion is the enhancement type. A mode enhances another mode by qualifying it with circumstantial information: a place, a time, a purpose, or a reason. To explain this better we give some examples of enhancement. An artwork image is enhanced by a text when the text explains where or when the artwork was crafted. The text then enhances the image by place and by time. A mode that enhances by place can also show a spatial location where the enhanced mode takes place. For example, a text reads “Mary missed the train” and an image shows the train station where Mary missed the train. In this example the image enhances the text by location. Enhancement by reason expands a mode by explaining a cause or giving a reason for the event or processes in the enhanced mode. For example, a text reading “Mary arrived late” is enhanced by an image of Mary actually missing the train. Finally, an example of enhancement by purpose: an image depicting a clock is enhanced by a text reading “a clock shows the current time”.
The enhancement type is similar to the extension type; both identify relations where the expanding mode supplies new information. In enhancement, however, the expanding mode provides a circumstantial setting for the other mode.

The second main type in the model is the projection type, which has two subtypes: locution and idea. As stated in Martinec and Salway (2005): “Locution is a projection of wording, usually by a verbal process, and idea a projection of meaning, most often by a mental process.”

Figure 6 Example of projection in comic books, left a thinking bubble, right a talking bubble. Martinec and Salway (2005), p. 352.

Martinec and Salway explain projection in two different contexts: comic books, and combinations of diagrams and text. The difference between locution and idea is clear for comic books: locution represents image-text pairs where the text is placed in a “talking bubble”, and idea represents pairs where the text is placed in a “thinking bubble” (see Figure 6). This is a straightforward and literal interpretation of the projection type and closely follows the textual interpretation of projection by Halliday (1994). Martinec and Salway also use projection from a slightly different point of view. This is most obvious with projection of meaning (idea). A mode can project the information from another mode; the projecting mode projects or restates the meaning of the projected mode. This is similar to the elaboration type. However, the elaboration type expands the information by example, while idea projects the meaning of the information by restating it. The examples Martinec and Salway use for projection of meaning in this form are diagrams; however, there are some exceptions.

Figure 7 Example of images where the labels are not part of the image. Image has abstract content and is of type exposition. Martinec and Salway (2005), p. 353.

A diagram-text combination can be of type projection, exposition or 'text more general' (exemplification, elaboration).
Which type applies depends on whether the text is part of the image or a separate mode that is in a relation with the image. They explain this in the following manner: text and image are separate modes if the image provides the ideational content and the text only serves as labels for the content of the image. The image provides the ideational content when the image alone contains the general concept of the displayed information. The text then only supplies information on parts of the image. The images in Figures 7 and 8 are not of type projection; they are examples in which the text is a separate mode in relation with the image. Both images provide the general idea for the information and the text provides further details on the image contents. The difference between exposition and 'text more general' is based on generality and abstractness. If the textual labels are generic and the image is abstract (for example a technical drawing), the relation is of type exposition (see Figure 7). If the textual labels are generic and the image has natural content (for example a photograph), the relational type is 'text more general' (see Figure 8).

Figure 8 Example of images where the labels are not part of the image. Image has natural content and is of type text more general. Martinec and Salway (2005), p. 353.

Figure 9 Example of ideational content. Martinec and Salway (2005), p. 354.

When it is the text that provides the ideational content and the drawings only serve to enclose or separate the text, the drawings and text together form the image. In this case the picture is of type projection. The image itself can have a relation with another text outside the image, which gives more detailed or other related information on the contents of the image (see Figure 9). The relation between that text and the text inside the image is based on lexical and componential cohesion.
If one regards the drawings in the image only as separators of the labels, the relation can be considered to be between text and text. Martinec and Salway go further into this componential cohesion and mention that more than one logico-semantic relation can exist between image and text. When smaller components within a mode refer to components in the other mode, these components also share a logico-semantic relation. Two modes have different levels of logico-semantic relations, ranging from a small relation between the smallest components to the main relation between the complete modes. This paper is limited to the main relation between image and text. However, analysis of the different components within image and text could perhaps provide a more detailed annotation, possibly resulting in more specific and better matching results.

Figure 9 is accompanied by the following text: “In looking at the commonalities among the three disciplines, design and marketing tend to both focus on desirability of a product – the brand and lifestyle images, ease of use, and costs to take into account the aesthetics. Marketing and engineering both focus on usefulness of a product – the functional features, platform upon which the product is built, safety and reliability issues, and production costs. And design and engineering both focus on usability of a product – the ergonomics, interface with the product, the integration of the different features and associated costs, the selection of material, and manufacturing. Each overlap is secondarily also concerned with the other two value attributes, but the primary driver of interaction is as indicated. The point is that the usefulness, usability, and desirability of the product stem directly from the interaction between the disciplines.
Thus, it is the overlaps between disciplines that define the value of the product to the consumer, the value that leads to success in the market and profit for the company (as shown in Figure 6.2).”

3. ANNOTATION
To study the 3 models further, they were used to annotate 20 image-text pair samples. These samples were selected from the corpus of Hooijdonk et al. (2007). The corpus consists of medical questions and answers. The researchers in Hooijdonk et al. (2007) gave students the assignment to answer these medical questions as they saw fit. The answer could also include media other than text. The result is a corpus of medical questions with diverse answers and a diverse make-up. The questions and answers in the corpus are similar to those used in the IMIX QA system. For this paper we filtered out all corpus samples that did not contain any images. From the remaining subset of image-text pairs we selected the 20 samples at random.

Only one annotator annotated the samples. This places some limitations on our final analysis of the models, which we address now. The assignment of functions or types to relations does not always go smoothly during annotation. This can be due to various reasons. Sometimes the annotator is in doubt about which type to assign to a relation, or different annotators disagree with each other's individual choices. Mann and Thompson (1987) (pages 26-30) name five causes of multiplicity of annotation: boundary judgments, text structure ambiguity, simultaneous analysis, differences between analysts, and analytical error. A boundary judgment occurs when a relation falls between two possible types and the annotator has to choose to which type the relation belongs. Text structure ambiguity concerns ambiguity in text. However, one can also see this as normal ambiguity: a single analysis of the relations leads to the assignment of more than one type, and neither type can be discarded.
Simultaneous analysis also occurs when more than one type is valid, but it differs from ambiguity: here more than one analysis of the relation is valid, which leads to the assignment of multiple types. Differences between analysts can lead to differences in assignments. Analysts can have different experiences with the presented information and could therefore assign relation types differently. According to Mann and Thompson (1987) this happens infrequently and often leads to agreement that a particular ambiguity exists. Analytical error occurs when the analysis performed by the annotator is erroneous. This can be caused by a wrong analysis of the relation or a wrong application of the model. Wrong application of the model is not always due to a mistake of the annotator. Not all models are as well documented as one would hope, which leaves the annotator poorly informed. Another cause is models that were created for specific domains and cannot be applied to other domains without difficulty.

Given the above problems, annotation by a single annotator is a handicap. There is neither a reference to compare the results with nor a second annotation with which to filter out annotation differences or errors. Unfortunately, at this point there is no time to perform a broader analysis with multiple annotators. We will use the results of the single annotation only to provide more insight into the models. This will help to explore the advantages and disadvantages of the models, their differences, and their suitability in the medical information domain. Our research is thus an effort to create an initial understanding of these models. With regard to the first 3 problems of Mann and Thompson (1987), the annotator has noted any annotation problem encountered in the sample comments.

Annotation with the first two models, Carney and Levin (2002) and Marsh and White (2003), is done exclusively by considering the function of the image in the text.
The Martinec and Salway (2005) model needs a deeper analysis: first the status relation between image and text must be determined, and then the semantic relation. The status relation depends on what one considers to be the “whole text”. An image and a text are of equal status when the whole image is related to the whole text and the whole text is related to the whole image. Thus we need to set a size for the whole text. Martinec and Salway (2005) take a paragraph as the maximum size. The image can, however, also be related to a larger piece of text. The samples used in this paper do not contain large texts and the accompanying images complement the answer. Therefore we consider the whole text to be all text in an answer. There is another issue with images and labels. In some cases Martinec and Salway consider the labels of an image not to be part of the image but a separate text in relation with the image. For this paper we adopted this approach; these labels are considered part of the whole text in the sample. As mentioned earlier, each sample consists of a question and an answer. Where possible, the annotator tried to ignore the relation between a question and its answer. The images in the samples are part of an answer. To analyze the questions one would therefore need to analyze not only the relation between image and text in the answer but also the relation between the question and the answer. However, the questions were not excluded from the samples because in almost all cases they help to clarify the subject of the answer. Most of the samples have the image positioned after the text. However, to avoid complications the relative position of the image to the text was ignored. The samples can be found in appendix A; all samples are written in Dutch. Every sample is followed by a summary and a description of its contents in English. The results of the annotation follow.
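To make the annotation scheme concrete, the sketch below shows one possible way to record a single sample's annotation under all three models. The class and field names are hypothetical and are not part of the models or of IMIX. The enumerated values are the types named in this paper, and the example values are those given for sample 1 in appendix A.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class CarneyLevin(Enum):
    """The five image functions of Carney and Levin (2002)."""
    DECORATIONAL = "decorational"
    REPRESENTATIONAL = "representational"
    ORGANIZATIONAL = "organizational"
    INTERPRETATIONAL = "interpretational"
    TRANSFORMATIONAL = "transformational"


class Status(Enum):
    """The four status relations of Martinec and Salway (2005)."""
    EQUAL_INDEPENDENT = "equal; independent"
    EQUAL_COMPLEMENTARY = "equal; complementary"
    IMAGE_SUBORDINATE_TO_TEXT = "unequal; image subordinate to text"
    TEXT_SUBORDINATE_TO_IMAGE = "unequal; text subordinate to image"


class LogicoSemantic(Enum):
    """Top-level logico-semantic relations of Martinec and Salway (2005)."""
    ELABORATION = "elaboration"
    EXTENSION = "extension"
    ENHANCEMENT = "enhancement"
    PROJECTION = "projection"


@dataclass
class SampleAnnotation:
    """One row of the annotation table (hypothetical layout)."""
    sample_id: int
    carney_levin: CarneyLevin
    marsh_white: List[str]      # Marsh and White (2003) codes, e.g. "A1 Decorate"
    status: Status
    logico_semantic: LogicoSemantic
    subtype: str = ""           # e.g. "exemplification; image more general"
    comment: str = ""           # annotation problems noted by the annotator

    def image_serves_text(self) -> bool:
        """True when the image is subordinate to the text."""
        return self.status is Status.IMAGE_SUBORDINATE_TO_TEXT


# Sample 1 (syringe picture), with the values from appendix A.
sample_1 = SampleAnnotation(
    sample_id=1,
    carney_levin=CarneyLevin.DECORATIONAL,
    marsh_white=["A1 Decorate"],
    status=Status.IMAGE_SUBORDINATE_TO_TEXT,
    logico_semantic=LogicoSemantic.ELABORATION,
    subtype="exemplification; image more general",
)
print(sample_1.status.value)
```

Storing the Marsh and White codes as a list reflects that several functions from different groups can be assigned to one sample, whereas the other two models receive a single value per field in this sketch.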
The results are stored in a table and are accompanied by comments on the annotation process. The annotator used all three models to annotate the samples.

3.1 ANNOTATION RESULTS

In this section we discuss the results of the analysis. For each of the three models we discuss the advantages and disadvantages that were noticed during annotation. Carney and Levin (2002) focus on the educational value of different image functions in text. Because of this the paper pays little attention to explaining the constructs of the model. The paper gives a short explanation of the functions and some examples, both of which are specific to the domain of educational media. When applying the model to another domain, one is left with some room for interpretation. The model is fairly small and contains only five image functions, which reduces its complexity. The decorational function is fairly straightforward. This type is also found in the Marsh and White (2003) model. One should apply it when the image has only an aesthetic purpose in the text. However, some annotators' interpretations of “aesthetic” could be less rigid than others'. Take, for example, sample 1 in appendix A: the image of the syringe was annotated as having a decorational function because the text makes no direct reference to the administration method used for the “DKTP-vaccine” (DKTP vaccination). One could also argue that the image of a syringe is an example of a possible administration method and thus has a representational function. Generally one uses the organizational and interpretational functions for images which provide a better understanding of the informational content, while the representational function just mirrors the text content. The organizational and interpretational functions are often assigned to images which present information in a more comprehensive form than any textual form. Consequently, these images can also contain extra information beyond what the text provides. Carney and Levin designed these two functions for the educational domain. The organizational images provide a structural framework and the interpretational images explain difficult systems. However, one cannot always separate these two functions. During annotation the annotator had doubts about some samples because, for example, the image in the sample explained a difficult system but also provided a structural framework. In Hooijdonk et al. (2007) these two functions have been grouped into a single function called “additional function”. This function not only absorbs the organizational and interpretational functions but also includes any other image that has more than just a representational function. The last function, the transformational function, was not applied in the annotations. This image function is intended to encode information from the text in the image in such a way that it eases recall of the textual information. Most pictures used in the medical information domain aim to provide information and not necessarily to imprint this information on the mind of the viewer. It is doubtful that this image function will be applied in the medical information domain. A general problem with the Carney and Levin (2002) model is that it was designed for the functions of images in text. This design assumes that most of the information content is found in the text. This is a problem when it is instead the image that provides most of the information content. An example is given in sample 4. In this sample there is almost no text to represent, organize or interpret. Here it is the information content of the images that has an organizational function, but the images do not organize any text as Carney and Levin intended for this function. The scope of the informational content forms a second problem. Some images might provide more or different information than the text provides.
Should one then consider the overall information content of the image for annotation, or only the content which relates specifically to the text? This problem is clear in sample 20. This image helps to interpret the text (and has been annotated interpretational) but on its own also provides an organizational function. Following the instructions of Carney and Levin, it was decided in these cases to look only at how the image functions in relation to the text. In samples 4 and 9 there is not much text on which to base a functional relationship. In both these samples the annotator based his choice not on the function of the image in the text but on the function of the image in its own right. Depending on the application of the model, a choice has to be made on how to deal with situations where the image does not serve the text. The Marsh and White (2003) model is much larger and more complex than the Carney and Levin (2002) model. At first the model looks confusing because of its many functions. This is likely due in part to the process by which Marsh and White created the model. Because of the large number of functions, one must consider and read the description of every single function for every annotated sample. The three main groups, which separate three different types of functions, help in the decision process. However, it is still possible for the annotator to use functions from all three groups to annotate a sample. The descriptions of the functions are much clearer than the model structure. Most of the functions have a good description and are also applicable to the medical information domain. The names of the functions describe their meaning and a short summary clarifies this further. In addition, the functions are accompanied by lead-in terms, which give more names and terms with which one can describe the function. Overall the descriptions simplify the use of the model. Some functions contain exceptions.
These exceptions should prevent the annotator from using the functions wrongly. However, they also make the structure of the model more complex. An exception refers to a different function when that function might be better suited. Functions which refer to each other mostly have similar descriptions. They differ from each other in degree of severity, or they describe the content in a slightly different way. For example, A3.1 (engage) and A3.2 (motivate) hold the attention of the reader or motivate a response. Both these functions refer to A2 (elicit emotion) in case any emotions are involved in the process. Another example is B1.1 (concretize), which one can use for images that make a thing or a concept more explicit. This function refers to B1.4 (describe) for content that gives more details and to C1 (interpret) and C3.2 (model) for content that is more complex. Most likely Marsh and White included these exceptions to reduce annotation errors and disagreements. These functions, similar but different in severity, create another layer in the structure of the model. One might not need this kind of detail in certain domains, and some functions might be grouped together to create a less complex model. Another issue which makes the model more complex is that not all functions in this model are of the same type. By this we mean the kind of content a function describes. The functions in the model of Carney and Levin (2002) are all of the same kind; they describe how the image organizes its information content and how this information is related to the text. The most frequent type of function in the Marsh and White (2003) model is that of images serving the text in one way or another. Three other types of functions were found. One describes only the contents of the image (examples: A2 (elicit emotion), A2.2 (express poetically), A3 (control) and B1.5 (graph)).
The other two do not specifically describe the content of the image: they either describe a relation between the image and the text (examples: A1.1 (change pace), C3.1 (alternate progress) and C3.3 (inspire)) or they describe the content of both the image and the text (examples: A1.2 (match style), B1.3 (common referent), B5.2 (complement), C2.1 (compare) and C2.2 (contrast)). Not all functions in the Marsh and White (2003) model seemed applicable to the medical information domain. Among these are functions whose meaning was not clear (A1.2 (match style), B1.3 (common referent)). This could be because of a poor description of the function or because it was intended for a different domain. The following functions were not immediately applicable to the medical information domain. Most of these are useful for a storytelling domain; they describe certain styles which are not common in the medical information domain: A1.1 (change pace), A2.1 (alienate), A2.2 (express poetically), C3.1 (alternate progress) and C3.3 (inspire). Another function, C3 (transform), is similar to Carney and Levin's transformational images, a method which one will not often see in our domain. Some other functions will, for similar reasons, not be used often in the medical information domain but occasionally proved to be useful: A1.2 (match style), A2 (elicit emotion), A3.1 (engage), A3.2 (motivate) and B2.3 (locate). Functions A2, A3.1 and A3.2 are interesting because one can use them to annotate images which encourage a response from the user. This response could, for example, be emotional because of a disturbing image. One can see that the way the model was constructed also influenced its structure and usability. By merging different models from different domains, the model supports many functions for different kinds of content, different intensities of content and different domains.
However, this diversity and wide range of application makes the model lack focus and structure and causes it to become too complex and confusing. One can expect that in an experiment with multiple annotators disagreements would arise and different functions would be used on the same samples. On the other hand, this diversity also makes the model easier to use, and we think the function descriptions are quite clear. With a larger experiment it might be possible to identify superfluous functions and to recognize a better structure. One could also use the model as a second layer on top of a less complex model with a clear structure, combining structure with diversity. The Martinec and Salway (2005) model provides a clear and intuitive structure. The structure helps to keep an overview of the model and eases choosing the right type during annotation. The relational types and their descriptions are less simple. It is easy to get confused about the meaning and use of the different types, especially the expansion types. Judging by their names alone they appear similar: both elaboration and extension types expand information. The differences between the types are small and without explanation they can be confusing. The extension type is used for all modes which add (related) information to the overall presentation, information which one cannot find in the other mode. The elaboration type is used for modes which make presented information more specific or reaffirm it, by exemplification or by repeating information in a different form. A similar problem arises between extension and enhancement. While enhancement also expands by adding new information, one uses it exclusively for information that provides a circumstantial setting such as a time, place, reason or purpose. The place of the projection type in the model is confusing; it seems as if the designers forced the type into the model to stay compliant with Halliday (1994).
The projection type seems to be applicable only to comic book media and media with diagrams, because comic images and diagrams project information. This limit of two media (comic images and diagrams) is restrictive and makes the projection type media dependent. Conversely, the expansion type is a general-purpose type which one can apply to any image-text media with expanding relations. A broader interpretation of the projection type could be possible with the idea type. As Martinec and Salway mention, idea is a "projection of meaning". One could use this concept to define relations in which one mode projects the meaning of another. This could include, for example, modes that summarize or compact information in a projective manner; the diagram example in Martinec and Salway (2005) (page 354) is an example of this. In its current form the use of the projection type is, in contrast to the rest of the model, too limited. In the medical information domain the idea type, although not used often, remains useful for relations similar to the diagram example of Martinec and Salway (2005). The locution type was not used during the annotation process. Overall the clear structure does not help much in applying the model. It is the types of the model that make its application considerably more complicated. Only after practice can one understand the subtle differences between the types. In the medical information domain the expansion types are used frequently but the projection types are not. The number of remaining types for the medical information domain is thus rather limited, much like in Carney and Levin (2002). The two models, however, annotate information content differently: Martinec and Salway (2005) is more abstract and general. The second type of relation in this model is the status relation.
It captures the kinds of relation between text and image discussed in the introduction: an image can serve a text, a text can serve an image, and both modes can be complementary or independent. The status relation allows for more flexibility in the application of the model. As seen in our small set of samples, and in the medical information domain in general, there are cases where the image does more than just serve the text. Possibly the status relation could also be extended to other models. To give an example with Carney and Levin (2002): a text could also be representational with respect to the content of an image. Not everything can be extended that easily, though; a text is unlikely to be decorational because text in general (as natural language) is inherently informative. Still, we cannot exclude the possibility that someone might use text for decorational purposes. Marsh and White (2003) already made a step towards status relations by including functions for images that go beyond the text. The functions in this group are not only valid for images that are subordinate to the text but could also cover the other three status relation types. For some functions this would require a reinterpretation of their description.

4. CONCLUSIONS AND RECOMMENDATIONS

The Carney and Levin (2002) model does not contain many functions and thus has a simple structure. Its functions, however, were designed for a single purpose, and because of this some of them cause simultaneous annotation in the medical information domain. The functions in the model describe the structure of the content of an image serving a text. They describe decorational, representational, organizational, interpretational and transformational functions of images. Theune et al. (2007) (section 5.1) suggest using the properties of the functions to match images on relevance and image content.
Interpretational and organizational images tend to be highly relevant to their related text, and their information content is also very specific. Thus combining these images with any text (as in IMIX, combining with an answer) requires high relevance and a strong match. Decorational images contain much less specific information and can be relevant to many texts. The chance of making a bad match with decorational images is much smaller. Marsh and White constructed their model from many different relational models. Its strengths and weaknesses are the opposite of those of the other two models. The model can describe different kinds of information content, ranging from content-related features to relation-related features. Most of its functions describe how images serve the text. To give an example, an image reiterates, compares or concentrates the text. Other functions describe the content of the image and some others describe the overall impact on both the image and the text. An advantage is that one can use this model to annotate many sorts of image-text relations, and it gives much flexibility during annotation. Also, its functions represent simple concepts, which makes them easy to understand. A disadvantage is the lack of focus and structure of the model. A better organization of Marsh and White (2003) is therefore likely possible. Depending on what is necessary, one could decrease the number of functions to a more manageable size. It would make the model less detailed but also more organized. Another possibility is to use the structure of one of the other two models as a base for a new structure. Overall this model allows a detailed annotation of an image-text pair; however, caution is advised in its application. The great number of diverse functions is likely to cause disagreements in annotation. Martinec and Salway (2005) is the most abstract of all three models.
This model annotates how information is exchanged between two modes and how strongly they are related. The logico-semantic relation annotates how modes exchange information: by elaboration, extension, enhancement or projection. The status relation annotates whether the two modes form independent, complementary or dependent processes. It shows how strongly the modes relate and in which direction they exchange information. It also shows which modes are the main carriers of information and which provide additional content. The number of types in the model is not very large (compared with Marsh and White (2003)). Its structure is clear and gives a good overview. However, the logico-semantic types represent rather complicated concepts and often leave one in doubt about which types to assign during annotation. The projection types will not be used often in the medical information domain. An especially interesting feature of this model is the status relation. It would be interesting to see whether models designed for “image functions in text” could be adapted, using the status relation, to fit other situations such as “text functions for images” and equal image-text relations. The samples in this paper show that the domain of medical information will also need this flexibility. The three models describe different features of the image-text relation: they differ in the way modes exchange information and in the detail in which they describe information. We conclude that these models do not exclude each other but could complement each other. Follow-up research will need to determine which features of the image-text relations need to be annotated to improve the retrieval of images in the IMIX system. The remaining relevant models will need to be tested by a large group of annotators to confirm which aspects of the models need to be adapted for use in the domain of medical information.

REFERENCES

Halliday, M. A. K. (1994). An Introduction to Functional Grammar.
2nd edition, Edward Arnold, London. Hooijdonk, C. M. J. van, Krahmer, E., Maes, A., Theune, M. and Bosma, W. (2007). Towards automatic generation of multimodal answers to medical questions: a cognitive engineering approach. In the Proceedings of the Workshop on Multimodal Output Generation (MOG 2007), CTIT Workshop Proceedings WP 07-01, 25-26 January 2007, Aberdeen, Scotland, 93-104. Mann, W. C. and Thompson, S. A. (1987). Rhetorical structure theory: A theory of text organization. USC/Information Sciences Institute Technical Report Number ISI/RS-87-190, Marina del Rey, CA, Text, 8(3):243--281. Marsh, E. E. and White, M. D. A (2003). Taxonomy of Relationships between Images and Text. J. of Documentation 59, 6 (2003), 647-672. Martinec, R., Salway, A. (2002), Some Ideas for Modelling Image-Text Combinations. CS-05-02, Department of Computing, School of Electronics and Physical Sciences, University of Surrey. Martinec, R., Salway, A. (2005), A system for image-text relations in new (and old) media. Visual Communication, 4(3), 337-3. Theune, M., Schooten, B. W. van, Akker, H. J. A. op den, Bosma, W. E., Hofs, D. H. W., Nijholt, A., Krahmer, E., Hooijdonk, C. van and Marsi, E. (2007) Questions, pictures, answers: introducing pictures in question-answering systems. ACTAS-1 of X Symposio Internacional de Comunicacion Social, 22-26 Jan 2007, Santiago de Cuba, Cuba. pp. 450-463. Centro de Linguistica Aplicada. ISBN 959-7174-08-1 Carney, R. N., and Levin, J. R. (2002). Pictorial illustrations still improve students’ learning from text. Educational Psychology Review, 14(1), 5-26. A. Appendix image-text pair samples Sample 1 Wat zijn de bijwerkingen van een DKTP-prik? Bijwerkingen van een DKTP-vaccinatie: Summary • Plaatselijke reacties • Hangerigheid, onrustig slapen,koorts • Langdurig, ontroostbaar huilen • Flauwvallen The text consists out of a question on the side effects of a diphtheria, whooping cough, tetanus, and polio vaccination. 
The answer gives a sum up of the side effects. There is further reference to another vaccination which has milder side effects on children. The picture shows a syringe. • Een verkleurd arm of been • Annotation Koortsstuipingen Bijwerkingen van een DTP-vaccinatie zijn milder dan van het DKTP-vaccin, aangezien kinderen ouder zijn als ze het DTPvaccin krijgen. Bovendien heeft dit vaccin een andere samenstelling. Model Type(s) Carney&Levin(2002) Decorational Marsh&White(2003) A1. Decorate Model Martinec&Salway(2005) Carney&Levin(2002) and Marsh&White(2003) Type(s) Status unequal; subordinate to text image Elaboration, exemplification; image more general Carney&Levin(2002) and Marsh&White(2003) Although it is commonly known that vaccinations can be admitted with a syringe there is no reference to a syringe in text. The picture does not add any useful information to the side effect information in the text. Therefore the picture is of decorational content. Martinec&Salway(2005) The picture is related to only parts of the text. It refers to the word “prik” (injection) and vaccinatie/vaccin (vaccination/vaccine). One gives an injection with an injection needle. There is the process and the means to perform it. Between the word “prik”and the picture of the injection needle the relation would be of type exemplification. If the text would be about injections in general the relation would be of type exposition however the text is discussing a particular type of injection. The image is thus more general than the text. The image relates only to parts of the text and supplies new information, therefore the image is subordinate to the text. Sample 2 Hoeveel X-chromosomen bevat een lichaamscel van een vrouw? The answer only mentions the amount of female body cell xchromosomes. The picture depicts a x-chromosome but doesn't add information which would make the contents of the text any more clear. 
Martinec&Salway(2005) The image relates only to parts of the text, being the Xchromosomes, therefore the image is subordinate to the text. The text is about multiple female X-chromosomes and the image shows one general X-chromosome, thus the image is more general than the text. Sample 3 Vormen van Colitis Ulcerosa Colitis Ulcerosa is een chronische ontsteking van de dikke darm. De ziekte kan mild, matig of ernstig verlopen. De ziekte heeft vier vormen: • recticitis of proctitis (hierbij is de ziekte alleen in de endeldarm) (zie figuur, plaatje a) • rectosigmoidis (hierbij is de endeldarm en het sigmoid (laatste 20 cm van de dikke darm) aangetast) (zie figuur, plaatje b) • linkszijdige colitis (hierbij gaat de colitis tot aan de milthoek; de gehele linkerzijde van de dikke darm ziek) (zie figuur, plaatj c) • pancolitis of totale colitis: (hierbij is de gehele dikke darm aangetast door de ziekte) (zie figuur, plaatje d) Een lichaamscel van een vrouw heeft 2 X-chromosomen. Summary The question is “How many x-chromosomes does a female body cell contain?”. The answer consists out of one line stating that a female body cell has 2 x-chromosomes. The picture shows an abstract depiction of a x-chromosome and a textual label. Annotation Model Type(s) Carney&Levin(2002) Decorational Marsh&White(2003) A1. Decorate Martinec&Salway(2005) Status unequal; subordinate to text image Elaboration; exemplification; image more general Summary The text discusses what Colitis Ulcerosa is and in which forms it comes. The text mentions that it is a disease with different degrees of seriousness. The text further names 4 forms of the disease and the location they have in the gastrointestinal tract of the body. Each form refers to a picture. The picture consists out of 4 separate depictions of the gastrointestinal tract, 4 labels and some text. Each depiction has a textual label a, b, c or d. Each label refers to the text close to the depictions. 
The text names the disease forms and to some extent also mentions the location in the gastrointestinal tract. Also the text also mentions that these are forms of Colitis Ulcerosa disease. The picture is an abstract drawing of the gastrointestinal tract with parts of it infected by the disease. Because of the extra text near/in the picture there are actually 2 samples visible here. However the text in the picture and the separate text are almost the same, therefore only the picture and the text outside the picture is considered. Annotation Model Type(s) Carney&Levin(2002) Representational Marsh&White(2003) B1.4 Describe, Translate, B2.4 perspective, Complement B1.7 Induce B5.2 Martinec&Salway(2005) Status unequal; subordinate to text image Elaboration; exposition Carney&Levin(2002) The picture mirrors a part of the text, it mirrors the part about the 4 forms of the disease and the description of their location in the gastrointestinal tract. The picture makes the location of the disease more concrete for the reader. Thus the picture is representational. Marsh&White(2003) Considering Marsh and White (2003), both modes share the same symbolic meaning (common referent), the picture describes and translates part of the text, it induces perspective and complement the text. Stap 3 Summary The text in this sample consists out of some labels which guide the pictures in the answer and the main question. In the question is asked how to apply a sling on the left arm. The labels read: step 1, step 2, step 3 and step 4. The pictures depict in 4 steps how to apply the sling. Annotation Model Sample 4 Hoe leg je een mitella aan bij de linkerarm? Type(s) Carney&Levin(2002) Organizational Marsh&White(2003) C3.2.2 process Model physical Considering labels to pictures: B5.2 Complement Martinec&Salway(2005) In this sample the whole text ranges from “Colitis Ulcerosa is..” up to “..plaatje d). The picture relates to the 4 forms of the disease and the sum up of these 4 forms. 
The picture does not specify that the Colitis Ulcerosa is a chronically infection. Thus is the status unequal and is the image subordinate to the text. The logico-semantic relation is elaboration, exposition. The textual sum up of the 4 forms of the disease is equal to the contents of the picture; from both can be understood the name and position of the disease form. The picture is here not considered to be an enhancement of place of the disease because this information could also be understood from the contents of the text. Stap 4 Martinec&Salway(2005) Status unequal: subordinate to image text Elaboration; exemplification; text more general Carney&Levin(2002) The textual labels, to some extent, help to answer the question; they clarify there are four steps to follow, however this could also be deduced from the pictures. A more direct purpose of the labels is to help clarify in what sequence the pictures should be followed. For the Carney and Levin (2002) model the pictures are here considered to be of organizational content; they provide a structural framework explaining a series of instructions to follow. The pictures however do not mirror a process in the text, which is what the organizational type demands of the relation. Marsh&White(2003) Considering Marsh and White (2003); the pictures models a physical process and the labels complement the pictures. Martinec&Salway(2005) The labels are subordinate to the pictures; they show information that could already be understood from the contents of the pictures. The text is an exemplification of the picture and the text is more general. Stap 1 Stap 2 Sample 5 Hoe kan ik mijn buikspieren versterken? Buikspieren kunnen worden versterkt door het doen van buikspieroefeningen. Niet alle buikspieroefeningen zorgen voor een optimaal resultaat. Een oefenprogramma voor de buikspieren met opbouwend en goed uitgebalanceerd zijn, en alle buikspieren moeten getraind worden. 
De buikspieren moeten op alle mogelijke manieren gestimuleerd worden om te werken, alleen zo bekom je het perfecte resultaat. Hieronder staan een aantal voorbeelden van goede buikspieroefeningen: information than is contained within the text. The pictures demonstrate and model a physical process. Martinec&Salway(2005) Considering Martinec and Salway (2005) the pictures provide new information to the text. Without the text however one cannot understand what is to be seen in the pictures. The pictures are thus dependent of the text, the status relation is unequal. The pictures provide new information to the text and extent it. Even though in the text exercises in general are mentioned there is no information on how they should be performed therefore this is not exemplification but extension. Sample 6 Hoeveel mensen lijden aan een hoge bloeddruk? (uitgebreid) Summary The question states: “How can I reinforce my abdominals?”. The textual answer says that for an optimal result one must train all abdominal muscles and that a balanced training program should be followed. The answer then refers to the pictures stating that they are examples of good abdominal exercises. There are 8 pictures with 8 labels. The labels numbered 1 to 4, clarify there are 4 exercises depicted. The numbers are followed by the letters a or b, which suggest the order of the intermediate body positions to follow. Annotation Model Type(s) Carney&Levin(2002) Organizational Marsh&White(2003) B1.1.1 Sample, C2 Develop, C3.2.2 Model physical process Martinec&Salway(2005) Status unequal: subordinate to text image Extension Onder 20-70 jarigen komt een verhoogde bloeddruk bij 27% van de mannen en 22% van de vrouwen voor (bron: Regenboog, 1998-2001). Dit is inclusief personen die medicatie voor een verhoogde bloeddruk gebruiken (en daardoor geen verhoogde bloeddrukwaarden meer hebben). Uit de Doetngchemstudie komen wat lagere cijfers naar voren. 
Circa 25% van de 30- tot 70-jarigen heeft een verhoogde bloeddruk en/of gebruikt bloeddrukverlagende medicatie (Doetinchem, 1998-2002). Dit percentage bedraagt volgens de gegevens van Regenboog ongeveer 30% in deze leeftijdsgroep. Volgens de Herziene CBO-richtlijn Hoge Bloeddruk (CBO, 2000a) is de bloeddruk verhoogd wanneer de bovendruk (systolische bloeddruk) hoger of gelijk is aan 140 mmHg en/of de onderdruk (diastolische bloeddruk) hoger of gelijk is aan 90 mmHg. Voor personen van 60 jaar en ouder, zonder diabetes mellitus, familiaire hypercholesterolemie of hart- en vaatziekten, geldt 160/90 mmHg als grens voor verhoogde bloeddruk. Bij oudere personen (65 tot 85 jaar) heeft 38% van de mannen en 42% van de vrouwen een bloeddruk boven de 160/90 mmHg en/of gebruikt medicatie voor een te hoge bloeddruk (bron: ERGO, 1997-1999).

Figuur 1: Percentage mensen met verhoogde bloeddruk (>= 140/90 mmHg en/of medicatie voor 20-59-jarigen en >= 160/90 en/of medicatie voor 60 jaar en ouder), naar leeftijd en geslacht (Bronnen: Regenboog, 1998-2001; ERGO, 1997-1999).

Carney&Levin(2002) Considering the Carney and Levin (2002) model, there is more than one option. The text mentions that a balanced program should be followed, and the last sentence refers to the pictures as good exercises. The content of the pictures could be considered part of a balanced program and to show good exercises, and thus to be representational of the content of the text. The pictures also depict how the exercises should be performed, and are thus also of organizational content. However, the text mentions nothing on how exercises should be performed. For both types, the pictures do not exactly mirror the content of the text. Because the content of the pictures is more than representational, it is concluded here that they are of the organizational type.
Marsh&White(2003) For the Marsh and White (2003) model the pictures are samples of the contents of the text; they help interpret the text and further develop the provided information by supplying more

Summary The question is: how many people suffer from high blood pressure? The first two paragraphs discuss two different statistics, while the third paragraph explains a third statistic which is also visible in the picture. The fourth paragraph explains how to read the picture. The picture contains a graph with 4 lines showing the percentage of people with high blood pressure by age and sex. The two columns show the pressure threshold (the point at which blood pressure counts as high) by age group.

Annotation
Model Type(s)
Carney&Levin(2002) Organizational
Marsh&White(2003) B1.5 Graph, B1.7 Translate, B2 Organize, B2.4 Induce perspective, B3 Relate, B3.1 Compare, C2 Develop
Martinec&Salway(2005) Status equal; independent; Elaboration; exposition

Carney&Levin(2002) In this case it is not clear which type should be chosen. Carney and Levin point out that the organizational type is used for pictures that provide a structural framework for the text (maps or procedures). The picture does not really represent the text; there is similar information in the text, but no paragraph really describes the contents of the picture (except for the caption, which starts with “Figuur 1:”). This picture provides additional content to the answer, a new source. The picture contents are of organizational orientation and thus of organizational type.

Marsh&White(2003) Considering Marsh and White (2003), the picture shows information in a different way than the text and thus in some way translates it. The picture organizes the information and induces perspective. The picture is a graph. The picture relates because it explains the main concepts in the text. The picture allows the reader to make comparisons between age, sex, pressures and different statistics.
The picture also goes beyond the text by developing the information further; it provides an overview of the proportion of people with high blood pressure over the complete age spectrum.

Martinec&Salway(2005) Because there is no direct reference to the picture in the text, neither mode mirrors the other. Both modes contain similar information and form separate processes with their own information content, so they are equal and independent in status. Because of their similar information both modes are also equally general, and the relation is thus of type exposition.

Sample 7 Wat gebeurt er bij een arthroscopie? Bij arthroscopie wordt er een diagnose en behandeling van gewrichtsproblemen gedaan d.m.v. een dun kijkertje (arthoscoop).

Summary The question asked is: what happens during an arthroscopy? The answer states that diagnosis and treatment of joint problems are done with a thin viewer (arthroscope). The picture shows the cross section of a joint; two long devices stick through the skin. A hand is operating the thinnest device. Most likely this is a procedure in progress on the joint.

Annotation
Model Type(s)
Carney&Levin(2002) Representational
Marsh&White(2003) B1.1 Concretize, B2.3 Locate, B2.4 Induce perspective, B5.2 Complement
Martinec&Salway(2005) Status unequal; image subordinate to text; Extension; Enhancement (purpose)

Carney&Levin(2002) The picture is representational for the text.

Marsh&White(2003) The picture makes the use of the device more concrete. The picture helps to locate the use of the device. Overall the picture induces perspective and complements the text.

Martinec&Salway(2005) The image is subordinate to the text. The picture provides a new perspective on how the arthroscopy is performed inside the body, a perspective that is not given by the text. The picture is also an example of the textual content.
Thus the relation could be considered elaborative; however, the new perspective provides enough new information to consider this relation to be of type extension. The relation is also of type enhancement, because one can see a purpose of the thin viewer which could not be understood from the text; the text does not mention that the thin viewer is used inside the body.

Sample 8 Welke factoren kunnen leiden tot een holvoet? De oorzaak van een holvoet is vaak een gestoorde spierfunctie in de voetspier ten gevolge van een hersen- of ruggemergaandoening. Ook kunnen de tussenbotspiertjes verlamd zijn. Zoals bij de meeste voetafwijkingen kan dit resulteren in pijnklachten en standproblemen in de knie, heup rug en nek. Deze klachten nemen toe in rustpositie zoals zitten of stilstaan. Ook kunnen er door de verkramping van de tenen problemen ontstaan met de nagels en met likdoorns en eelt. Soms is er ook sprake van Mortons Neuralgie.

Summary The question is: which factors lead to a hollow foot? The answer explains that a hollow foot is often caused by disrupted functioning of certain foot muscles, which in turn is caused by a cerebral or spinal disease. The text further explains that this can result in pain or standing problems in the knee, hip, back and neck. The problems increase when the body is in a resting position. Cramping of the toes can cause further problems with the nails, corns and calluses. The picture shows 3 feet: a normal foot, a flatfoot and a hollow foot.

Annotation
Model Type(s)
Carney&Levin(2002) Representational
Marsh&White(2003) B1.1 Concretize, B1.1.1 Sample
Martinec&Salway(2005) Status equal; independent; Extension

Sample 9 Hoe moet ik mijn werkplek inrichten om RSI te voorkomen?

Summary The text here consists only of a question and some labels in the answer. The answer is mostly formed by the picture. The picture is accompanied by some textual labels. The question is: how do I organize my working space to prevent RSI?
The answer shows a person sitting on a desk chair at a desk. On the desk there is a monitor, keyboard and document holder. The person's posture shows the correct body position to hold, while the labels, together with arrows and lines, indicate various advisable angles, lengths, heights and distances between the various objects in the picture.

Carney&Levin(2002) The picture is representational because it shows what a normal foot and a hollow foot look like.

Marsh&White(2003) The picture makes the idea of a hollow foot more concrete. It contrasts a normal foot with a hollow foot and thus creates a sample (however, it does not contrast elements within the text, so 'B3.2 Contrast' is invalid here).

Martinec&Salway(2005) Image and text each form a separate process. The image shows what a hollow foot looks like in comparison with other feet, and the text provides the causes and effects of a hollow foot. The picture is intended here to provide extra information on the hollow foot; thus it extends the information about the hollow foot in the text.

Annotation
Model Type(s)
Carney&Levin(2002) Organizational
Marsh&White(2003) B1.2 Humanize, B1.4 Describe, C1 Interpret, C1.2 Document
Martinec&Salway(2005) Status unequal; text subordinate to image; Extension

Carney&Levin(2002) In this example there is little text. If one ignores the question, there is little text to apply the models to; therefore the question is included in the analysis of this sample. The question sets the subject of the information, and the answer is largely formed by an illustration. This picture provides an organized answer to the question. The picture is not representational because there is no text available which it can mirror.

Marsh&White(2003) Many of the types within the Marsh and White (2003) model fit the profile of this picture, but most of these types are intended for pictures which mirror or represent a text and cannot be applied in this sample.
The picture humanizes; it shows a man or woman, which helps the reader to relate to the information. The picture describes; the information is presented in detail. The picture helps interpret because it puts the various positions and distances in their proper location. The picture documents; it instructs the reader on how to set up the working space.

Martinec&Salway(2005) According to Martinec and Salway (2005) the picture and its labels in this sample form two separate modes. Martinec and Salway also provide two similar examples, one with an abstract image with generic labels and one with a naturalistic image with generic labels (see chapter 2.3.2). Unfortunately they do not give an example of abstract or naturalistic images with abstract labels. In this sample the labels are mostly abstract, except for the label “documenthouder”, which is generic. The abstract labels with the different measurements are the most dominant in this picture, and therefore the “documenthouder” label is ignored here. These abstract labels do not provide elaborative content but rather extend the information content of the picture. This relation is thus of type extension. Furthermore the text is subordinate to the image; the text only provides measurement information for some components in the image, but not for all. The labels would be meaningless without the image, while the image without the labels still provides ideational content and some information on the ideal positions of some depicted objects.

Sample 10 Hoe moet ik mijn werkplek inrichten om RSI te voorkomen? Om RSI te voorkomen kun je bij de inrichting van je werkplek rekening houden met het volgende:

Houding
1. Het lichaam is goed ondersteund en ontspannen. Hoek tussen bovenarm en onderarm 90 graden.

Voetensteun
10. Een voetensteun kan worden gebruikt als door de werkbladhoogte of om andere redenen de voeten ondersteund moeten worden.
Bureaustoel
2. Rugleuning op hoogte zodat er rugsteun in het holle deel van de rug is.
3. Zet armsteunen op ellebooghoogte (met ontspannen schouders).
4. Stel stoelhoogte zo in dat de voeten plat op de grond rusten.
5. Zittingdiepte: vuistbreedte ruimte tussen knieholte en zittingrand.

Bureau
6. Werkvlakdiepte minimaal 80 cm met eronder voldoende (strek)ruimte.
7. Tafelhoogte zo dat armsteunen en toetsenbord op dezelfde hoogte zijn.

Beeldscherm
8. De afstand oog - beeldscherm is afhankelijk van scherm- en lettergrootte (50-70cm).

Documenthouder
9. Plaats de documenthouder tussen het beeldscherm en het toetsenbord.

Summary This sample has the same question as sample 9, only this time there is also a textual answer. The question is: how do I organize my working space to prevent RSI? The answer explains how to position the body and the different objects within the working space. There are 6 subjects/objects to position and 10 directives on how these subjects/objects should be positioned. Included is a picture of a person sitting on a desk chair at a desk. On the desk there is a keyboard and a computer with monitor. Various numbered labels, which refer to the 10 directives, and lines indicate the various distances, sizes, angles and objects to position.

Annotation
Model Type(s)
Carney&Levin(2002) Organizational
Marsh&White(2003) B1.1 Concretize, B1.2 Humanize, B1.6 Exemplify, B1.7 Translate, B2 Organize, B2.4 Induce perspective, B3 Relate, B5.2 Complement
Martinec&Salway(2005) Status unequal; image subordinate to text; Elaboration; exemplification; image more general

Carney&Levin(2002) The picture is of organizational type; it provides a framework to better interpret the position of the body and the objects. The picture could also be considered interpretational for the same reasons; however, the picture does not show the workings of a complex system or process.

Marsh&White(2003) The picture makes the contents of the text more concrete.
The picture humanizes by displaying a human; it exemplifies by showing the essential meaning of the concepts in the text; and it translates the content of the text into another form. The picture shows a spatial representation of the text content and thus organizes the text content, induces perspective for the reader and helps to relate the text content. The picture complements the text content.

Martinec&Salway(2005) The picture is subordinate to the text. The picture is related to almost all of the text, but not all. The text provides information on how to prevent RSI; this could not be understood from the image. The text also supplies many details on the required positions of the body. In fact, the picture only provides a visual representation, and the overall comprehensiveness of the information would not have changed much if the picture had been left out. The image is an exemplification of the textual content, and here the image is considered to be more general.

Sample 11 Wat kun je doen als je een bloedneus hebt?

Carney&Levin(2002) As a whole the 3 pictures form an organizational unit. They show the steps to follow. Taken separately, each picture is merely representational of the instruction it depicts.

Marsh&White(2003) The pictures reiterate the contents of the text and make them concrete. They humanize the text by depicting a human to whom one can relate. They allow the reader to relate to the contents of the text. The pictures concentrate the instructions because they do not show all parts of the instructions. The pictures further explain the text content. The modes complement each other.

Martinec&Salway(2005) The whole picture and the whole text are related, thus their status is equal. Both modes clarify the message in a different way, but each is capable of conveying the message without the other mode. They are thus independent.
The text contains some extra information that is not visible in the picture: blow the nose once, let go of the nose slowly. This is extension. The text also says “let go of the nose after ten minutes”, which is an enhancement of time. The overall procedures shown in the text and the image are equally general and are an instance of each other, thus elaboration; exposition.

Summary This sample contains a textual answer within the image. For this sample the text within the image is considered separate from the depictions within the same image. Thus the answer consists of text and a number of pictures. The question is: what can one do when one has a nosebleed? The answer shows a title, “Bleeding nose”, and 3 instructions: blow the nose once, close the nose and keep the head in a writing position, let go of the nose after 10 minutes. The 3 pictures show the actions to take in the following order: blow the nose, close it in the writing position, let go of the nose.

Sample 12 Hoeveel kiezen heeft een mens?

De mens heeft 20 kiezen als je ervan uitgaat dat een mens een compleet gebit heeft. Er zijn bijvoorbeeld veel mensen die geen verstandskiezen hebben en dus maar 16 kiezen hebben (zie afbeelding). De premolaren zijn de twee kiezen die zich normaal direct achter de hoektand bevinden. Ze worden ook wel eens valse kiezen genoemd, omdat ze kleiner zijn dan de molaren. Een molaar of echte kies is een tand in de achterste delen van de mond, achter de premolaren en voor de eventuele verstandskies. Een kies is een vrij grote tand die achterin de mond staat. Kiezen vermalen het voedsel met een roterende beweging. Om deze functie te vervullen hebben ze een dubbele knobbelstructuur.
Het menselijk gebit van een volwassen persoon:
Groen: Verstandskiezen
Rood: Molaren
Blauw: Premolaren

Annotation
Model Type(s)
Carney&Levin(2002) Organizational
Marsh&White(2003) B1 Reiterate, B1.1 Concretize, B1.2 Humanize, B3 Relate, B4.1 Concentrate, B5 Explain, B5.2 Complement
Martinec&Salway(2005) Status equal; independent; Elaboration; exposition; Extension; Enhancement (time)

Summary The question is: where are red blood cells created? The text explains that red blood cells are created in the bone marrow from a stem cell. The stem cell divides and creates an immature red blood cell. This immature cell divides, grows further and eventually becomes a red blood cell. The second paragraph explains that bone marrow is a spongy red substance inside the bones. The text also mentions which bones. The accompanying picture shows a cross section of a real bone. Three layers are visible: 2 white layers, which are the bone, and a thick red layer of tissue, which is the bone marrow.

Annotation
Model Type(s)
Carney&Levin(2002) Representational
Marsh&White(2003) A2 Elicit emotion, A3.1 Engage, B1.1.1 Sample
Martinec&Salway(2005) Status unequal; image subordinate to text; Elaboration; exemplification; text more general

Summary The question is: how many molars does a human have? The answer explains that a human has 20 molars, or 16 for people without wisdom teeth. The text goes on to explain the function of the molars, the different types of molars and their various positions and sizes. The last line refers to the picture as showing the set of teeth of an adult human.

Annotation
Model Type(s)
Carney&Levin(2002) Representational
Marsh&White(2003) B1.1.1 Sample, B1.4 Describe, B2.4 Induce perspective, B3.1 Compare, B5.2 Complement
Martinec&Salway(2005) Status unequal; image subordinate to text; Elaboration; exemplification; image more general

Carney&Levin(2002) The picture is representational for the contents of the text.

Martinec&Salway(2005) The image relates only to a part of the text.
There are a number of processes in the text that do not relate directly to the image, for example the last paragraph, which deals with the function of the molars. The image is an elaboration of the first two paragraphs of the text. The image is more general than the text.

Marsh&White(2003) The picture provides a sample of what human teeth look like. It describes parts of the text. It induces perspective by showing the positions of the teeth. The reader can compare the contents of the text with the objects depicted in the picture. The two modes complement each other.

Sample 13 Waar worden rode bloedcellen aangemaakt? In het beenmerg ontstaan alle bloedcellen, waaronder ook de rode bloedcellen, uit één celtype, de stamcel. Wanneer een stamcel zich deelt, ontstaat er eerst een onrijpe rode bloedcel. Daarna deelt de onrijpe cel zich, groeit verder uit en wordt uiteindelijk een rode bloedcel. Beenmerg is de sponsachtige, rode substantie die zich bevindt in het binnenste van beenderen. Je vindt het vooral in het bekken, het borstbeen, de ribben en de ruggenwervels.

Carney&Levin(2002) The picture is representational for the second paragraph of the text; it confirms that bone marrow is inside the bone and that it is a spongy red material.

Marsh&White(2003) Whether the picture elicits emotion or engages the reader depends on what the reader is used to seeing. Someone who does not like to see blood or a cross section of a body part will find this image rather arresting. For someone whose medical or other experience has accustomed him or her to bloody or similar pictures, this picture will be rather normal. It might not even engage such a person. The picture is furthermore a sample of what bone marrow looks like.

Martinec&Salway(2005) The image is only related to a part of the text.
In the image one cannot see the individual body cells or the process of creating a red blood cell, which is discussed in the first paragraph. The relation is of type elaboration; exemplification: the image shows what “beenmerg” (bone marrow) looks like and corresponds with the description in the last paragraph. The text is considered more general because there are likely other pictures with bones and bone marrow in different shapes and sizes. The text, however, specifies where in general bone marrow can be found.

Sample 14 Wat gebeurt er bij een tympanometrie? Tympanometrie is een onderdeel van de audiometrie: bepaling van de gevoeligheid van het gehoor met behulp van elektronische apparatuur, waarbij de compilantie van het trommelvlies wordt gemeten: de mate waarin het trommelvlies meegeeft met drukverandering.

Dit gaat als volgt: de gehoorgang wordt luchtdicht afgesloten met een soort ‘dop’, waardoor drie buisjes lopen:
• één die de luchtdruk in de gehoorgang kan regelen
• één waardoor het geluid het oor in gaat
• één die de hoogte van het geluidsniveau meet

Via een toongenerator worden geluiden het oor in geleid terwijl een microfoon het geluidsniveau ín de gehoorgang meet. De hoeveelheid teruggekaatst geluid door het trommelvlies en middenoor wordt als maat gebruikt: als er weinig geluid terugkaatst is het orgaan soepel, maar als het geluidsniveau echter hoog is, is dit een indicatie voor een stijf trommelvlies en middenoor, wat kan duiden op aandoeningen aan het oor.

Annotation
Model Type(s)
Carney&Levin(2002) Interpretational
Marsh&White(2003) B1.4 Describe, B2.4 Induce perspective, B3.3 Parallel, B5 Explain, B5.2 Complement
Martinec&Salway(2005) Status unequal; image subordinate to text; Projection; idea

Carney&Levin(2002) The picture is of interpretational type. It helps to understand how a system functions. It partly helps to explain the function of each tube and it demonstrates the position of the device in the ear.

Martinec&Salway(2005) In this sample the status relation is close to an equal status.
The whole picture is relevant to almost the whole text. A large part of the text discusses the process of tympanometry, which is also visible in the image. Only the last line of the last paragraph discusses how the measured results from the process are interpreted. It is this part that is not directly related to the contents of the image. The way the information is related in the text and the picture strongly resembles the diagram example in Martinec and Salway (2005) (p. 353). The image presents the same process of tympanometry as explained in the text, but in a different, structured way. Here this is considered a projection of meaning; thus the relation is of type idea.

Summary The question is: what happens during a tympanometry? The answer states that tympanometry is a part of audiometry. It determines the sensitivity of hearing with electronic equipment by measuring how far the eardrum yields to changes in pressure. The text further explains the procedure of the test. The ear is closed off with a cap with 3 tubes. Each tube has a function. By measuring the amount of sound that the eardrum returns one can measure the stiffness of the eardrum. The accompanying picture shows an abstract cross section of the hearing system. There is a yellow cap inside the ear and 3 tubes lead into the cap. At the end of the tubes are a number of boxes. Inside the boxes are the names of devices which are connected to the tubes. The picture is a mixture of an abstract medical depiction and a flowchart. The text does not exactly explain what these devices should do. The text only explains which activities happen on each of the tubes. With some technical knowledge the reader can understand which devices create which activity.

Sample 15
Wat is een allergie? In sommige gevallen kan het immuunsysteem (verkeerd) reageren op onschuldige stoffen, zoals huismijt, melkproducten en stuifmeel. Een allergie is een abnormale reactie van het immuunsysteem na contact met die stof. Het immuunsysteem reageert overdreven als het met deze vreemde stoffen of organismen te maken krijgt, en behandelt ze alsof ze schadelijk zijn, zoals bij bacteriën. Het gevolg is een allergische reactie of een histaminereactie. Daardoor kun je gaan niezen, een loopneus krijgen, piepende ademhaling krijgen en slijm gaan ophoesten. Je kunt ook netelroos krijgen en bij zeer ernstige allergieën kun je in shock raken, wat zelfs dodelijk kan zijn (bijvoorbeeld bij sommige ernstige voedselallergieën). Zo'n shock kan gepaard gaan met ademhalingsproblemen, vocht vasthouden en vernauwing van de luchtwegen. Veel voorkomende allergenen (stoffen waarop je allergisch kunt reageren) zijn bepaalde voedingsmiddelen, graspollen, sporen, geneesmiddelen, schoonmaakmiddelen. Ook stress kan allergische reacties geven. Sommige mensen reageren allergisch op koude, warmte, temperatuursschommelingen en druk op de huid. De meest voorkomende allergische reacties zijn netelroos, dermatitis (huidontsteking), astma en hooikoorts.

Twee veel voorkomende vormen van allergie: netelroos en dermatitis

Marsh&White(2003) The picture describes part of the text; it induces perspective on how the system works; it explains in another, parallel way the workings of the tubes; and it explains the workings of the system with the boxes. Overall the picture complements the text.

Het begint bij de hypothalamus. Vanaf de puberteit geeft deze (naast zijn andere functies) steeds meer speciale signaaltjes af aan de hypofyse.
Deze signaaltjes (GnRH, Gonadotrofinestimulerend hormoon) zorgen ervoor dat de hypofyse op zijn beurt weer speciale signaaltjes doorgeeft aan de testis (door middel van FSH - follikel-stimulerend hormoon en LH - luteïniserend hormoon). Als de concentraties FSH en LH hoog genoeg zijn, maken de testis meer testosteron. De hypothalamus 'meet' de concentratie testosteron in het bloed. Als deze boven een bepaalde waarde komt, geeft de hypothalamus weer een ander signaaltje aan de hypofyse, zodat de productie van testosteron geremd wordt. Testosteron wordt in de lever afgebroken. Testosteron wordt gevormd uit progesteron, het uitgangsproduct van alle hormonale steroïden, met als tussenproduct androstendion.

Summary The question is: what is an allergy? The text explains that an allergy is an abnormal reaction of the immune system to certain substances. It further explains that as a result one can start sneezing, get breathing problems, cough up mucus, get a skin inflammation, asthma or hay fever, or even go into shock. Next, some more specific causes are mentioned. The two pictures show two skin conditions, each showing inflamed skin with red or pink spots. The pictures could be disturbing for a reader who is not accustomed to such content.

Annotation
Model Type(s)
Carney&Levin(2002) Representational
Marsh&White(2003) A3.1 Engage, B1.1 Concretize, B1.1.1 Sample
Martinec&Salway(2005) Status unequal; image subordinate to text; Elaboration; exemplification; text more general

Marsh&White(2003) The picture engages the user by showing a possibly shocking image. It makes the effects of an allergy, and what a skin inflammation looks like, more concrete. The picture serves as a sample of allergy effects.

Summary The question is: where is testosterone produced? The answer states that testosterone is produced in the adrenal glands and, in men, also in the testes. According to the text the regulation of testosterone is complicated, and it gives a simplified summary. The summary says that the process starts at the hypothalamus.
From puberty onwards this gives special signals to the pituitary gland. These signals (the hormone GnRH) make sure that the pituitary gland in turn gives signals to the testes (by means of the hormones FSH and LH). The testes produce more testosterone when the concentrations of FSH and LH are high enough. When the hypothalamus detects enough testosterone in the blood, it gives a different signal to inhibit the production of testosterone. The liver breaks down testosterone. The accompanying picture is a flowchart of the testosterone production process.

Carney&Levin(2002) The picture serves as an example of one of the mentioned effects of an allergy, thus being representational.

Martinec&Salway(2005) The text mentions many things about allergies, while the pictures only show two possible allergic effects. The relation is thus unequal and the image is subordinate to the text. One can see that the images serve as examples of the mentioned allergic effects, and the relation is thus of the exemplification type.

Sample 16 Waar vindt de productie van testosteron plaats? Testosteron wordt geproduceerd in de bijnieren en (bij mannen) in de testis, ongeveer 7 mg/dag bij mannen en 1-2 mg/dag bij vrouwen. Onderdeel van de testis zijn de Leydig-cellen, waar cholesterol een conversie ondergaat naar testosteron. De regulering van testosteron is ingewikkeld; hier volgt een vereenvoudigde samenvatting:

Annotation
Model Type(s)
Carney&Levin(2002) Organizational
Marsh&White(2003) B1 Reiterate, B1.4 Describe, B1.7 Translate, B2 Organize, B2.2 Contain, B2.4 Induce perspective, B5 Explain, B5.2 Complement
Martinec&Salway(2005) Status equal; independent; Projection; idea

Carney&Levin(2002) The picture is of organizational type because it provides a framework for understanding the process explained in the summary.

Marsh&White(2003) The picture reiterates, describes, translates and organizes the process described in the summary.
The picture is of type contain because it is a flowchart. It induces perspective on the process and explains it. The text and picture complement each other to convey the information.

Martinec&Salway(2005) Both picture and text can convey their message independently of each other and are thus equal in status. The picture projects the meaning of a part of the text. This sample is similar to the example given in figure 16 of Martinec and Salway (2005) on page 353.

Sample 17 Welke complicaties kunnen optreden bij mazelen? Mazelen wordt veroorzaakt door een virus. Besmetting vindt plaats via druppeltjes die met hoesten en niezen worden verspreid. Mazelen is zeer besmettelijk.

Complicaties:
Middenoorontsteking (10-15% van de gevallen)
Longontsteking
Hersenvliesonsteking (ongeveer 1 per 1000 gevallen)

Er is geen behandeling voor mazelen, behalve bestrijding van koorts en pijn.

Annotation
Model Type(s)
Carney&Levin(2002) Decorational
Marsh&White(2003) A1 Decorate, A3.1 Engage, A3.2 Motivate, B1.1.1 Sample, B1.2 Humanize
Martinec&Salway(2005) Status equal; independent; Extension

Carney&Levin(2002) If one looks at this sample without considering any prior knowledge of measles, the picture does not have a clear relation with the text. There is no reference to the boy in the picture nor to the spots on his skin. The reader, however, could conclude that the spots on the boy's body are an effect of the measles without any further information. Because the picture does not mirror the text, one cannot say that this picture is representational. Thus one is left with the decorational type.

Marsh&White(2003) As mentioned above for Carney and Levin (2002), in the Marsh and White (2003) model this picture also decorates the text. The picture engages the attention of the viewer. Together with the descriptions in the text it also motivates a response from the viewer. The picture could be considered a sample; however, this is not confirmed in the text.
The reader could conclude that the spots on the boy's skin are an effect of the disease, and thus this picture also forms a sample. The picture humanizes, for the reader can relate to the state of the young boy.

Martinec&Salway(2005) Both picture and text form separate processes and are independent. The picture provides new information about the skin effects of the measles on young children. The picture thus extends the information in the text.

Summary
The question is: which complications can occur with measles? The answer states that measles is caused by a virus. Infection is caused by droplets that are spread by coughing or sneezing. Measles is very contagious. A list of the complications follows: ear infection, pneumonia and meningitis. There is no treatment for measles except for combating fever and pain. The accompanying picture shows a young boy with a skin rash over his complete upper body and face. Although not mentioned, this is probably caused by the measles.

Sample 18
What is an allergy? An allergy is a hypersensitivity reaction of the body's immune system to harmless substances (these substances always contain some kind of protein), such as airborne allergens (e.g. house dust mite), foods, medicines, moulds, etc. An allergic reaction causes complaints of nasal congestion, sneezing, a runny nose, watery eyes and itching of the eyes, nose, throat and/or skin. The most important causes of allergies:
House dust mite
Flowering grass

Summary
The question is: what is an allergy? The answer states that an allergy is an overreaction of the immune system to certain substances. It further describes some effects of an allergy. Two pictures are included, each with a label. The text mentions that these are the most important causes of allergies. The first picture is of a dust mite, the second picture is of a grass field in bloom.
Annotation
Model: Type(s)
Carney&Levin(2002): Representational
Marsh&White(2003): B1.1 Concretize, B1.1.1 Sample
Martinec&Salway(2005): Status unequal; image subordinate to text / Elaboration; exemplification; text more general

Carney&Levin(2002) The picture is representational because it shows two examples of causes of allergy which are mentioned in the text.

Marsh&White(2003) The pictures make the general idea that the reader has about the causes of an allergy more concrete. Furthermore, they are samples of allergy causes.

Martinec&Salway(2005) The image is subordinate to the text because the subject “allergy” is set by the text; the purpose of the pictures in this answer is also given by the text. The pictures are an exemplification of the causes of an allergy. There are, however, more causes than the two shown, and thus the text is more general.

Sample 19

Summary
The question is: how many people have too high blood pressure? The answer consists of a textual answer and a picture. The textual answer states that the medical term for high blood pressure is hypertension. It further explains that hyper means strongly and tension means pressure. It explains that 16 million people live in the Netherlands and that the number of people with hypertension is about one million. The accompanying picture is a pie and bar chart. The pie shows the percentage and number of patients with: lung disease, cancer, cardiovascular disease, and other diseases. The bar shows a more detailed division of the cardiovascular disease part of the pie: myocardial infarction, stroke and other cardiovascular diseases.

Annotation
Model: Type(s)
Carney&Levin(2002): Representational
Marsh&White(2003): B1.4 Describe, B1.5 Graph, B2.2 Contain, B2.4 Induce perspective, B4.1 Concentrate, B5.2 Complement, C1.2 Document
Martinec&Salway(2005): Status equal; independent / Extension

How many people suffer from high blood pressure? The medical term for high blood pressure is 'hypertension'.
Hyper means 'to a very high degree' and tension means 'pressure'. 16 million people live in the Netherlands, and the number of people known to have hypertension is approaching one million.

Carney&Levin(2002) The picture is of representational type. Although it does not mirror the contents of the textual answer, it is connected to it to some extent. High blood pressure is part of the group of cardiovascular diseases, and thus the picture does provide extra information to the answer. So the picture is not decorational. Neither does the picture provide real organizational information to the textual answer. A problem with this sample is that neither the textual answer nor the picture really answers the question; however, the picture comes closest to doing so. If the relation between the question and the picture were considered, then the relation would be organizational.

Marsh&White(2003) The picture describes the answer, it is a graph, and it is of type contain (diagrams and enclosing graphics). The picture induces perspective on the division of different diseases, and it concentrates the answer by giving a brief answer to the question. The text and picture complement each other. The picture documents because it supplies factual support. In some sense this picture also compares (B3.1) and contrasts (B3.2) the different disease groups; however, these function types are only intended for comparing and contrasting elements found in a text.

Martinec&Salway(2005) Both text and picture form separate processes and thus are independent. They extend each other with information on cardiovascular diseases.

Sample 20
How can I strengthen my abdominal muscles? (extended) Four kinds of muscle groups are visible from the outside. In addition, two muscles that are not visible lie beneath these groups.
Specific exercises exist for all these groups: crunches for the 'middle muscles' (contracting the abdominal muscles straight) and the 'side muscles' (in which the body is brought towards one side). For the deeper-lying muscles, holding the abdomen pulled in is a good exercise. This must be sustained until it 'burns'. It should be noted that abdominal exercises do not burn fat. If the strengthened muscles must also become visible, a cardio workout is necessary. The five kinds of muscle groups that are visible from the outside:
• Rectus Abdominis, the straight abdominal muscle, also known as the 'sixpack'.
• Obliques Externus, the outer oblique abdominal muscle.
• Obliques Internus, the inner oblique abdominal muscle.
• Transversus Abdominis, the transverse abdominal muscle.

The accompanying picture is a medical cross-section of the abdominal area, showing the 5 mentioned muscle groups.

Annotation
Model: Type(s)
Carney&Levin(2002): Interpretational
Marsh&White(2003): B2.4 Induce perspective, B3.3 Parallel, B5.2 Complement, C1 Interpret
Martinec&Salway(2005): Status unequal; image subordinate to text / Elaboration; exposition / Enhancement (place)

Carney&Levin(2002) The picture shows something that would be more difficult to explain by text only. It shows the different positions of the 5 muscle groups and how they are layered. Therefore the picture is of interpretational type. The information content in the image could also be considered to have an organizational function: the structure of the abdominal muscles is visible in an organized manner. However, the description of the organizational function shows that it has to be applied to pictures which provide a structural framework for the text.

Summary
The question is: how can I strengthen my abdominal muscles? The answer states that there are 4 muscle groups visible from the surface; under these groups there are 2 more groups which are not visible. The text mentions that for all these groups there are specific exercises.
The text mentions these exercises. Further, the text mentions that these exercises do not burn any fat and that a cardio workout is also necessary if the muscles need to be visible. The text then mentions that there are 5 groups of muscles which are visible from the outside; these groups are summed up.

Marsh&White(2003) The picture induces perspective. It offers a parallel presentation of the 5 muscle groups. The picture complements the text and helps to interpret the positions of the 5 muscle groups.

Martinec&Salway(2005) The picture only describes a part of the text and thus is subordinate to the text. Although the picture does not cover all of the text, it is not an example of the text either. It is about as general as the information given in the text. Furthermore, the image enhances the text by showing the place of certain muscle groups.
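
The annotation tables in this appendix could also be kept in machine-readable form, which may be convenient when the three models are compared across samples. The following is a minimal sketch in Python and is not part of the original study: the class `Annotation`, its field names and the helper `shared_marsh_white` are hypothetical names chosen only for illustration. The values are copied from the annotation tables of samples 19 and 20.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Annotation:
    """One sample annotated with all three models (hypothetical structure)."""
    sample: int
    carney_levin: str                     # picture function, Carney&Levin(2002)
    marsh_white: Dict[str, str]           # function code -> name, Marsh&White(2003)
    status: str                           # status relation, Martinec&Salway(2005)
    logico_semantic: Tuple[str, ...]      # logico-semantic relation(s)


# Values taken from the annotation table of sample 19.
SAMPLE_19 = Annotation(
    sample=19,
    carney_levin="Representational",
    marsh_white={
        "B1.4": "Describe", "B1.5": "Graph", "B2.2": "Contain",
        "B2.4": "Induce perspective", "B4.1": "Concentrate",
        "B5.2": "Complement", "C1.2": "Document",
    },
    status="equal; independent",
    logico_semantic=("Extension",),
)

# Values taken from the annotation table of sample 20.
SAMPLE_20 = Annotation(
    sample=20,
    carney_levin="Interpretational",
    marsh_white={
        "B2.4": "Induce perspective", "B3.3": "Parallel",
        "B5.2": "Complement", "C1": "Interpret",
    },
    status="unequal; image subordinate to text",
    logico_semantic=("Elaboration; exposition", "Enhancement (place)"),
)


def shared_marsh_white(a: Annotation, b: Annotation) -> List[str]:
    """Return the Marsh&White function codes that both samples were given."""
    return sorted(a.marsh_white.keys() & b.marsh_white.keys())


if __name__ == "__main__":
    for code in shared_marsh_white(SAMPLE_19, SAMPLE_20):
        print(code, SAMPLE_19.marsh_white[code])
```

Keeping the Marsh&White functions as a mapping from code to name makes it easy to see that a single sample can carry many functions at once, whereas the Carney&Levin field holds exactly one type per sample.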