A study of three models for image-text relations

D.S. Kornalijnslijper
ABSTRACT
Images and texts placed together in media are likely to be
related; the two modes share a relation. Various applications
which use image-text combinations can benefit from a model
which describes the features of this image-text relation. This
research has examined three models which can benefit the
IMIX question answering system. IMIX answers medical
questions and uses image-text relations to retrieve suitable
images for its generated answers. Twenty image-text samples
from a medical information corpus have been annotated to
create a better understanding of the three models. The results
show the advantages and disadvantages of the models and their
individual differences. The models are not only different in
their structure and design but also in the information content
that they can describe.
Keywords
image-function, semantic-relation, logico-semantic, image,
picture, text, relation
1. INTRODUCTION
In many media one can find images and text existing and
working together. Both images and text express information
and when these two modes are placed together their information
content is likely to be related. Recent research has
worked towards models that represent these semantic
relations. One could use such a model to classify a relation or to
identify different types of relations. For example, one could
distinguish between image-text combinations in which the
image has purely a decorational purpose or where the image has
a high information value (for example a diagram). One could
also think of image-text combinations in which the text only
plays a subsidiary role, for example captions under newspaper
images or captions guiding painted art. A detailed model could
also describe certain specific properties of the relation, allowing
a more detailed identification of the image-text pair. Some of
the existing models offer only a one-sided view of the
relation, considering only the function of the image in the text.
This is because text commonly is considered to be the main
carrier of information and the image only to play a
supplemental role. The use of images (by which we mean static
images) in media has strongly increased; one finds especially
many images in new media such as the Internet, but also in
comic books and children's books. To a lesser extent, but of
no lesser importance, one also finds images in technical books
and manuals, where images among other things explain the workings
of complex processes. There is thus an increasing demand for
models that can handle a mutual relation between the two modes:
models that support relations where the text complements the
image or where the two modes depend on each other.
Before explaining the relational models, we
will briefly consider the applications for these models and our goal in
this paper. End users and developers of text and image media
could benefit from an image-text relation model. One
example is authors and editors who want the most effective
relation between image and text. Another example is
researchers who study the effectiveness of media and
researchers in the information sciences who seek to use and
understand the media. The research in this paper will help
improve a system under development at our department.
This system, called IMIX (Interactive Multimodal Information
eXtraction), is a multimodal question answering system.
Essentially, the system answers a question that a user
asks in natural language. Such systems usually
have limited but expert knowledge of a specific
domain. For the IMIX system this is the medical information
domain. The system can answer questions on medical problems
describing symptoms, causes and cures. The interaction
between the user and the system is multimodal and thus the
communication happens in diverse ways. IMIX understands and
answers written and spoken questions in
Dutch. The user can enter a question in writing or
in speech, optionally combined with highlighting specific
pieces of text that resulted from earlier interactions with
the system. The system analyzes the question and
matches it with the contents of the knowledge base; this can result
in multiple matches that could answer the question.
These matches are in turn analyzed to find the most suitable
answer. Next, the system creates an answer in textual form and
in this step also tries to include a fitting image to improve
the effectiveness of the answer. Finally, the system presents
the created answer to the user on the display and spoken out loud
over the system speakers.
With relational models for text and images we want to improve
the process of finding a suitable image. The knowledge base of
IMIX contains both texts and images and a model could
annotate the existing image-text relations. One annotates by
first understanding the properties of the image-text pair (based
on the model used) and then by classifying the pair with a
relational function or type that describes the properties of the
image-text pair. The annotated pair will contain information
that shows the role in which a mode has been applied.
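As a minimal sketch of how such annotations might be used when searching for a fitting image, the Python fragment below stores one relation label per image-text pair and skips images whose label is excluded. The record layout, the function name images_for_text, the identifiers and the choice to exclude decorational images are all hypothetical; none of them is taken from IMIX.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedPair:
    """One image-text pair from a knowledge base with its annotated relation."""
    image_id: str
    text_id: str
    relation: str  # label assigned during annotation, e.g. "decorational"


def images_for_text(pairs, text_id, excluded_relations=frozenset({"decorational"})):
    """Return the image ids annotated for text_id, skipping excluded relation labels."""
    return [
        pair.image_id
        for pair in pairs
        if pair.text_id == text_id and pair.relation not in excluded_relations
    ]


# Hypothetical example data, not taken from the IMIX knowledge base.
pairs = [
    AnnotatedPair("img-1", "txt-1", "decorational"),
    AnnotatedPair("img-2", "txt-1", "interpretational"),
    AnnotatedPair("img-3", "txt-2", "representational"),
]
candidates = images_for_text(pairs, "txt-1")
```

Which relation labels should be preferred or excluded for a given answer is exactly the kind of question the comparison of the three models is meant to inform.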
In this paper we analyze three different models which
potentially could be used with the IMIX system. Our approach
will be the following:
• explain how the three models function,
• use the three models to annotate twenty image-text pair
samples from a medical information corpus, with the goal of
creating a better understanding of these models,
• discuss the results of the annotation process and analyze
how the models function in the medical information
domain,
• draw a conclusion and offer recommendations.
1.1 Functions and types
As mentioned in the introduction, prior studies generally
discuss the 'function of images' in text. In this kind of
relationship the image supplements the text. These studies
exclude any other arrangements of the modes. However, with
the increasing use of images the other arrangements, where the
text serves the image and the cases where both modes are
dependent or independent of each other, should no longer be
ignored. In relations where both modes are dependent on each
other one can no longer speak of 'functions' since both serve
each other. Here we prefer a more general term.
For this paper we will use a term from Martinec and Salway
(2005), namely 'types of relations'. Each 'type of
relation' identifies a different relation between image and text.
For clarity we will still use the term 'function of images' with
the models that are limited to relations for image functions in
text.
1.2 Setup of this paper
Chapter 1 introduced image-text relations and explained why
models are needed to describe these relations. The
chapter also discussed the goal of this paper and clarified the
terms 'functions' and 'types'. Next, chapter 2 summarizes
three different models and explains how they function.
Chapter 3 discusses the annotation process and its results. For
this research a selection of image-text pairs has been annotated
using the three models to create a better understanding of the
models. All image-text pairs and the resulting annotations are in
appendix A. Chapter 3 also gives some guidelines for the
annotation process and briefly discusses the multiplicity of
annotation. Chapter 3.1 discusses the results of the annotation
process. Chapter 4 draws conclusions from the analysis
in chapter 3 and makes recommendations for further
research into this subject.
2. PRIOR RESEARCH
The following chapters discuss 3 models from earlier
studies. The first is that of Carney and Levin (2002), who
studied the function of images in text as a means to better understand
the educational value of images in educational text. The
second model is that of Marsh and White (2003), who created a
taxonomy of prior models in an effort to create a general model
for image-text relations. The third model is that of Martinec and
Salway (2005), who have taken a theoretical approach towards a
general model for image-text relations. Each model uses
different types of relations and the following chapters
explain each of these types. We use the information
provided in the papers. An exception is the model of Martinec
and Salway, which they partly based on theories from Halliday
(1994). To better understand this model, a study of some parts of
Halliday (1994) was necessary.
2.1 Carney and Levin
Carney and Levin (2002) discuss the function of images in text
and do not look at cases where text serves a function to
images or cases where both modes are of equal importance.
They distinguish five different functions: decorational,
representational, organizational, interpretational and
transformational functions of images in text. One can assign
a single function type to each image-text relationship.
Following here is a description and explanation of each
function:
• Decorational pictures only serve to decorate the text; they
contain little or no information additional to the text in the
document. For example, a picture of the sun in a travel
brochure for Egypt would have a decorational function.
• Representational pictures depict what the text
describes, partly or completely. Some representational
pictures go beyond the text and depict more than the text
describes. An example is a picture of a painting and a text
describing the contents of the painting.
• Organizational pictures show structural information that the
text contains. Usually the image depicts the information in
steps. Examples are an illustration showing the necessary
steps to take in case of an emergency, or the example
described in Carney and Levin (2002): an illustrated map
of a hiking trail.
• Interpretational pictures help to depict information that
would be more difficult to explain and communicate with
text alone. Examples are pictures showing the workings
of machinery or complex models.
• Transformational pictures, as described in Carney and
Levin (2002), “include systematic mnemonic (memory
enhancing) components”. These components depict
information from the text in a literal sense, though the
picture as a whole does not necessarily depict the intended
information literally. The example of Carney and Levin
depicts information on the town Belleview. A bell in
the picture represents the part “Bell” of the town name.
The example contains more components which are
literal translations of the text. The reader should associate
these components with the contents of the text and
store the mnemonic picture in his or her memory. It should
then be easier for the reader to retrieve the more detailed
textual information from memory.
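The five functions named in this section, together with the rule that each image-text relationship receives a single function, can be sketched in Python as follows. This is only an illustration: the class name CarneyLevinFunction, the helper annotate and the sample identifier are hypothetical and do not come from Carney and Levin (2002).

```python
from enum import Enum


class CarneyLevinFunction(Enum):
    """The five functions of images in text named by Carney and Levin (2002)."""
    DECORATIONAL = "decorational"
    REPRESENTATIONAL = "representational"
    ORGANIZATIONAL = "organizational"
    INTERPRETATIONAL = "interpretational"
    TRANSFORMATIONAL = "transformational"


def annotate(pair_id, function):
    """Attach exactly one function to an image-text pair (hypothetical helper)."""
    if not isinstance(function, CarneyLevinFunction):
        raise TypeError("function must be a CarneyLevinFunction")
    return {"pair": pair_id, "function": function}


# Hypothetical sample identifier; the sun picture in a travel brochure
# from the text would be annotated as decorational.
example = annotate("sample-01", CarneyLevinFunction.DECORATIONAL)
```

Because the annotation holds one enum member rather than a list, the single-function restriction of this model is visible in the data structure itself.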
2.2 Marsh and White
The second model is that of Marsh and White (2003). Marsh
and White also discuss the function of images in text, and they
likewise do not consider cases where text has a function for images or cases
where both modes are of equal importance. They created a
taxonomy of image functions in text and based their function
types on earlier studies. The collected functions included
many that were similar but had different names; they
grouped these under a common name. They then tested
and adjusted the final taxonomy. The resulting taxonomy
consists of 49 image functions (see Table 1). There are 3 levels
of precision, each level being more specific than the previous one. The
first precision level contains 3 general image functions. The
second level contains 11 functions which expand the first level. The
third level contains 30 image functions which are again an
expansion of the 2nd level.
The 3 general image functions from the first level represent 3
types of strength of relation between image and text. The first
group (A) contains functions of images that express little
relation to the text. The second group (B) contains functions of
images expressing a close relation to the text. The last group
(C) contains functions of images where the image expresses
more information than the text expresses (Marsh and White
describe this as “functions that go beyond the text”). One can
describe a relation between image and text by more than one
function. One can combine different functions to create a more
detailed description of the relationship. The following section
briefly explains the types of the 2nd level. For more details
and the complete descriptions of the various functions we refer
to the appendix of Marsh and White (2003), pages 666-672.
Group A contains the following function types of the 2nd
level: decorate, elicit emotion, and control.
• Decorate is a function for images that make the text
more attractive without having any substantial effect on
the understanding of the information.
• Elicit emotion is for images that display content
that provokes a certain emotion.
• Control is for images that exercise a restraining or
directing influence on the reader. The content holds the
attention or encourages a response from the reader.
Excluded are responses which are primarily emotional;
those are part of type elicit emotion, group A.
Table 1 Taxonomy of functions of images to the text. Marsh
and White (2003), page 653.
Group B contains the following function types of the 2nd level:
reiterate, organize, relate, condense, explain.
• Reiterate describes images that repeat the information in
the text with minimal change or interpretation.
• Organize shows the information in a structured form and is
often applied to display information which is
better explained graphically than textually. Examples are
diagrams, charts, maps and other forms which present the
information in a more organized way (not necessarily in
diagram or map style).
• Relate is for images that refer to processes or concepts
contained within the text. The types of the 3rd level used by
Marsh and White here are: compare, contrast and parallel.
• Condense is a function type for images that make the
information more compact or reduce it to its
essential elements.
• Explain makes the information plain or understandable. It
can only be applied if the contents of the image follow the
text closely; otherwise a similar function from group C has
to be used. The explain type is for images that define the text
by identifying the essential qualities or meaning of the
information, or complement the text by helping to transfer
the intended information.
Group C contains the following function types of the 2nd
level: interpret, develop and transform.
• Interpret clarifies complex textual concepts by putting them into more
concrete forms. The image can emphasize the text or
provide factual or substantial support.
• Develop expands the information in the text by providing
more details, by illustration, or by closer analysis of the
information.
• Transform puts the information in the text into another
form. The information is recoded, related to each other, or
organized to improve recall (for example mnemonic
images). This type also includes images that continue from
where the text stopped or take turns with the text to
provide information. The images in this type also model
ideas which cannot be represented or understood by text.
Examples provided are cognitive and mechanical
processes.
Though Marsh and White do not mention this specifically, one
could consider group C to be a step towards a more complex
model: a model which also supports relations where the text
has a function to the image and where both text and image have
an equal relationship. Group C identifies functions for images
that expand or supply more or extra information than contained
within the text. The function types in group C describe relations
where image and text are independent or interdependent of each
other.
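The first two levels of the taxonomy, as summarized in this section, can be written down as a small lookup table. The Python sketch below is illustrative only: the names MARSH_WHITE_LEVEL2, group_of and annotate are hypothetical, the third level is omitted, and the group descriptions paraphrase this paper rather than quote Marsh and White (2003).

```python
# Level 1 groups mapped to their level 2 function types, as listed in this section.
MARSH_WHITE_LEVEL2 = {
    "A": ("decorate", "elicit emotion", "control"),
    "B": ("reiterate", "organize", "relate", "condense", "explain"),
    "C": ("interpret", "develop", "transform"),
}

# Paraphrased meaning of each level 1 group.
GROUP_MEANING = {
    "A": "little relation to the text",
    "B": "close relation to the text",
    "C": "goes beyond the text",
}


def group_of(function):
    """Return the level 1 group (A, B or C) of a level 2 function type."""
    for group, functions in MARSH_WHITE_LEVEL2.items():
        if function in functions:
            return group
    raise ValueError(f"unknown level 2 function: {function!r}")


def annotate(functions):
    """Combine several functions into one annotation, sorted by group then name."""
    return sorted({(group_of(f), f) for f in functions})


# Unlike the Carney and Levin model, one relation may carry several functions.
example = annotate(["explain", "develop"])
```

Keeping the group letter next to each function makes it easy to see at a glance whether an annotated image stays close to the text (B) or goes beyond it (C).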
2.3 Martinec and Salway
Martinec and Salway (2005) describe a mutual relationship
between image and text; their model does not discriminate
between the two modes. The descriptions of their relational
types do not refer to images or text but refer to modes
interacting together in a multimodal relation. The model is, as
they call it, “a generalized system of image–text relations”. One
can use it to describe relations where the image serves the text,
where the text serves the image, and where image and text are
equally dependent or independent of each other. Martinec and
Salway based the model on earlier work of Barthes (1977a,
1977b) and Halliday (1994). The parts that they used from
Halliday (1994) concentrate solely on text; however, by
combining them with the work of Barthes¹ they created a model
that works on text and images. According to Martinec and
Salway (2005, pp. 340), others used this idea earlier,
but only for
specific examples of multimodal relations. Martinec and
Salway state that one can use their model for all image-text
relations, for old and new media.
The model has two kinds of relations: the status relation and the
logico-semantic relation. Each image-text relation has a status
relation and a logico-semantic relation (see Figure 3). A relation
has one status and one logico-semantic type. Martinec and
Salway do show in their examples, however, that more than one
relation can exist between an image and a text: different components
in the image and text can have different relations with one
another.
¹ Barthes, R. (1977a). The Photographic Message, trans. Heath
S., in Image–Music–Text, Fontana, London, 15–31. Barthes, R.
(1977b). Rhetoric of the Image, trans. Heath S., in
Image–Music–Text, Fontana, London, 32–51.
2.3.1 Status relations
The status relation indicates the relative status of text and
image. There can be an equal or an unequal relationship. In an
unequal relationship one mode is subordinate to the other (one
mode serves the other); in an equal relationship the modes are
either independent or complementary to each other.
When the whole image relates to the whole text, both modes
are in an equal status relationship. When both modes depend on
each other or modify each other equally, their status
relation is complementary. When both modes can exist in
parallel as two separate processes without relying on each other,
both modes are independent.
A mode is subordinate to the other when it relies on the
other. Image subordination is realized when the image relates
to only a part of the text. Text subordination comes in two
forms: either by direct reference to the image or by “the combination
of material or behavioral processes with simple present or
present progressive tense” (Martinec and Salway (2005), pp.
347). One can often see this in news photograph captions, see
Figure 1. Material and behavioral processes in past tense have
the opposite effect and cause the image to be subordinate to the
text, see Figure 2.
Figure 1 News photograph with caption in present tense.
"(unreadable name) walks up the courthouse steps with his
legal team in a recent photo." The text is subordinate to the
image. Martinec and Salway (2005), pp. 348.
Figure 2 News photograph with caption in past tense.
"Marian Bates died protecting her daughter" The image is
subordinate to the text. Martinec and Salway (2005), pp.
348.
2.3.2 Logico-semantic relations
Logico-semantic relations come in two main types, expansion and
projection. In expansion, as the word says, one mode expands the
other mode. Projection repeats in one mode what the other
mode is showing. Expansion is divided into three types: elaboration,
extension and enhancement. Projection is divided into locution and idea.
First we will explain the subtypes of the
expansion type.
Figure 3 Network of combined status and logico-semantics.
Martinec and Salway (2005), pp. 358.
Figure 4 Example of elaboration, exemplification, image
more general. Martinec and Salway (2005), pp. 350.
One uses the elaboration type when one mode provides a more
detailed description of the other mode. The elaborating mode
does not necessarily provide new information but elaborates on
the information in the elaborated mode. Elaboration has two
subtypes: exposition and exemplification. One uses exposition
when two modes present the same information but in a different
form or presentation method.
Both modes are equally general and restate each other.
Exemplification further expands the information that was
available in the expanded mode by providing a more specific
example or by providing a more specific instance of the
information. Again, the elaborating mode does not provide new
information but elaborates on the information in the elaborated
mode. An example of exemplification is given in Figure 4. The
skull and crossbones in the picture are a generally recognized
symbol of death. The process 'kills' will eventually lead to death
and is thus associated with death. The image is more general
than the text; the text mentions a specific method of killing
prey. As the expanding mode is an example or an instance of
the expanded mode, the latter must be more general than
the former. Exemplification is subdivided into “text more
general” and “image more general”.
One uses the extension type when one mode extends the
information of another mode. The extending mode adds a new
element, gives an exception or offers an alternative on the
extended mode. The extending information cannot be extracted
(it cannot be seen or read) from the extended mode and is thus
new information. However, the information in the two modes
needs to be related. An example is given in Figure 5: the image
shows a crossed fork and knife, which in western
cultures commonly symbolizes the process of eating. Together both modes
suggest that “fish and small prey” can be eaten (from Martinec
and Salway (2005) pp. 363). One could also replace the fork
and knife with a hunting rifle and fishing rod, suggesting the
possibility of hunting for fish and small prey. The image
extends the text with a behavioral process.
Figure 5 Example of extension. Martinec and Salway (2005),
pp. 363.
The extension type used in Martinec and Salway (2005) is a
reinterpreted version of the version used in Halliday (1994)
and includes the participants in material and behavioral
processes (Martinec and Salway (2005), pp. 363).
The third subtype of expansion is the enhancement type. A
mode enhances another mode by referencing it with
circumstantial information; a place, a time, a purpose, or a
reason. To explain this better we give some examples of
enhancement. An artwork image is enhanced by a
text when the text explains where or when the artwork was
crafted. The text then enhances the image by place and by time.
A mode that enhances by place can also show the spatial location
where the enhanced mode takes place. For example, a
text reads “Mary missed the train” and an image shows the train
station where Mary missed the train. In this example the image
enhances the text by place. Enhancement by reason expands
a mode by explaining a cause or giving a reason for the event or
process in the enhanced mode. For example, a text reading
“Mary arrived late” is enhanced by an image of Mary actually
missing the train. Finally, an example of enhancement by purpose:
an image depicting a clock is enhanced by a text reading “a
clock shows the current time”. The enhancement type is similar
to the extension type; both identify relations where the
expanding mode supplies new information. However, in
enhancement the expanding mode provides a circumstantial
setting for the other mode.
The second main type in the model is the projection type and it
has two subtypes: locution and idea. As stated in Martinec and
Salway (2005): “Locution is a projection of wording, usually by
a verbal process, and idea a projection of meaning, most often
by a mental process.”
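The status and logico-semantic types described so far can be collected in one small data structure, in which every relation carries exactly one status and one logico-semantic type. The Python sketch below is an illustration under assumptions: the class and member names are hypothetical, enhancement is kept as a single leaf although it can be by place, time, purpose or reason, and the example pairing of status and type is invented for demonstration rather than taken from Martinec and Salway (2005).

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Relative status of image and text."""
    EQUAL_INDEPENDENT = "equal, independent"
    EQUAL_COMPLEMENTARY = "equal, complementary"
    IMAGE_SUBORDINATE = "unequal, image subordinate to text"
    TEXT_SUBORDINATE = "unequal, text subordinate to image"


class LogicoSemantic(Enum):
    """Leaf types; each value is the path from the main type down to the leaf."""
    EXPOSITION = ("expansion", "elaboration", "exposition")
    TEXT_MORE_GENERAL = ("expansion", "elaboration", "exemplification", "text more general")
    IMAGE_MORE_GENERAL = ("expansion", "elaboration", "exemplification", "image more general")
    EXTENSION = ("expansion", "extension")
    ENHANCEMENT = ("expansion", "enhancement")
    LOCUTION = ("projection", "locution")
    IDEA = ("projection", "idea")

    @property
    def main_type(self):
        """Return 'expansion' or 'projection'."""
        return self.value[0]


@dataclass(frozen=True)
class Relation:
    """One image-text relation: one status plus one logico-semantic type."""
    status: Status
    logico_semantic: LogicoSemantic


# Invented pairing for demonstration only.
example = Relation(Status.IMAGE_SUBORDINATE, LogicoSemantic.EXTENSION)
```

Since more than one relation can exist between components of an image and a text, a full annotation of a pair would be a list of such Relation objects rather than a single one.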
Figure 6 Example of projection in comic books, left a
thinking bubble, right a talking bubble. Martinec and
Salway (2005), pp. 352.
Martinec and Salway explain projection in two different
contexts; comic books and a combination of diagrams and text.
The difference between locution and idea is clear for comic
books: locution represents image-text pairs where the text is
placed in a “talking bubble” and idea those where the text is placed
in a “thinking bubble” (see Figure 6). This is a
straightforward and literal interpretation of the projection type
and closely follows the textual interpretation of projection by
Halliday (1994). Martinec and Salway also use projection from
a slightly different point of view. This is most obvious with
projection of meaning (idea). A mode can project the
information from another mode; the projecting mode projects or
restates the meaning of the projected mode. This is similar to
the elaboration type. However, the elaboration type expands the
information by example, while idea projects the meaning of the
information by restating it. The examples Martinec and Salway
use for projection of meaning in this form are diagrams,
although there are some exceptions.
Figure 7 Example of images where the labels are not part of
the image. Image has abstract content and is of type
exposition. Martinec and Salway (2005), pp. 353.
A diagram-text combination can be of type projection,
exposition or 'text more general' (exemplification, elaboration).
Which type one has to apply depends on whether the text is part
of the image, or whether the text is a separate mode that is in a
relation with the image. They explain this in the following
manner: text and image are separate modes if the image
provides the ideational content and the text only serves as labels
for the content of the image. The image provides the ideational
content when the image alone contains the general concept of
the displayed information. The text then only supplies
information on parts of the image. The images in Figures 7 and
8 are not of type projection; they are examples where the text
is a separate mode in relation with the image. Both images
provide the general idea for the information and the text
provides further details on the image contents. The difference
between exposition and 'text more general' is based on
generality and abstractness. If the textual labels are generic and
the image abstract (for example technical drawings) then the
relation is of type exposition (see Figure 7). If the textual labels
are generic and the image is of natural content (for example a
photograph) then the relational type is text more general (see
Figure 8).
Figure 8 Example of images where the labels are not part of
the image. Image has natural content and is of type text
more general. Martinec and Salway (2005), pp. 353.
When it is the text that provides the ideational content and the
drawings only serve to enclose or separate the text then the
drawings and text together form the image. In this case the
picture is of type projection. The image itself can have a
relation with another text outside the image which gives more
detailed or other related information on the contents of the
image (see Figure 9).
Figure 9 Example of ideational content. Martinec and
Salway (2005), pp. 354.
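The choice between projection, exposition and 'text more general' for a diagram-text combination, as described above, amounts to two yes/no questions. The Python sketch below encodes that reading; the function name diagram_relation and its parameters are hypothetical, and it assumes generic textual labels, as in both cases discussed in the text.

```python
def diagram_relation(text_provides_ideational_content, image_is_abstract):
    """Pick the relation type for a diagram-text combination.

    Assumes the textual labels are generic, as in the cases discussed above.
    """
    if text_provides_ideational_content:
        # The drawings only enclose or separate the text, so drawings and
        # text together form the image.
        return "projection"
    # The image provides the ideational content; the labels are a separate mode.
    if image_is_abstract:
        # For example a technical drawing (Figure 7).
        return "exposition"
    # Natural content, for example a photograph (Figure 8).
    return "text more general"


technical_drawing = diagram_relation(False, True)
labelled_photograph = diagram_relation(False, False)
```

Deciding whether the text or the image provides the ideational content remains a judgment of the annotator; the sketch only fixes what follows from that judgment.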
The relation between the text and the text inside the image is
based on lexical and componential cohesion. If one regards the
drawings in the image only as separators of the labels then the
relation can be considered to be between text and text.
Martinec and Salway go further into this componential
cohesion and mention that more than one logico-semantic
relation can exist between image and text. When smaller
components within one mode refer to components in the other
mode, these components also share a logico-semantic
relation. Two modes thus have logico-semantic relations at
different levels, ranging from relations between the
smallest components to the main relation between the complete
modes. This paper is limited to the main relation between
image and text. However, analysis of the different components
within image and text could perhaps provide a more detailed
annotation, resulting perhaps in more specific and better
matching results.
Figure 9 is accompanied by the following text: “In looking at
the commonalities among the three disciplines, design and
marketing tend to both focus on desirability of a product – the
brand and lifestyle images, ease of use, and costs to take into
account the aesthetics. Marketing and engineering both focus
on usefulness of a product – the functional features, platform
upon which the product is built, safety and reliability issues,
and production costs. And design and engineering both focus
on usability of a product – the ergonomics, interface with the
product, the integration of the different features and associated
costs, the selection of material, and manufacturing. Each
overlap is secondarily also concerned with the other two value
attributes, but the primary driver of interaction is as indicated.
The point is that the usefulness, usability, and desirability of the
product stem directly from the interaction between the
disciplines. Thus, it is the overlaps between disciplines that
define the value of the product to the consumer, the value that
leads to success in the market and profit for the company (as
shown in Figure 6.2).”
3. ANNOTATION
To study the 3 models further, we used them to annotate 20
image-text pair samples. These samples were selected from the
corpus of Hooijdonk et al. (2007). The corpus consists of
medical questions and answers. The researchers in Hooijdonk et
al. (2007) gave students the assignment to answer these medical
questions as they saw fit. The answers could also
include media other than text. The result is a corpus of medical
questions with diverse answers and diverse make-up. The
questions and answers in the corpus are similar to those used in
the IMIX QA system. For this paper we filtered out all corpus
samples which did not contain any images. From the remaining
subset of image-text pairs we selected the 20
samples at random. Only one annotator annotated the samples. This places
some limitations on our final analysis of the models, which
we address below.
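The sample selection described above (drop answers without images, then draw 20 at random) can be sketched in Python as follows. The corpus layout, the field name "images", the function name select_samples and the seed are hypothetical; the actual format of the Hooijdonk et al. (2007) corpus is not described here.

```python
import random


def select_samples(corpus, k=20, seed=None):
    """Keep only answers that contain at least one image, then draw k at random."""
    with_images = [sample for sample in corpus if sample.get("images")]
    rng = random.Random(seed)
    # random.Random.sample raises ValueError if fewer than k samples remain.
    return rng.sample(with_images, k)


# Hypothetical corpus of 50 answers; every second one contains an image.
corpus = [
    {"id": i, "images": ["image.png"] if i % 2 == 0 else []}
    for i in range(50)
]
samples = select_samples(corpus, k=20, seed=1)
```

Fixing the seed is a design choice for reproducibility; with seed=None a different random subset is drawn on each run.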
The assignment of functions or types to relations does not
always go smoothly during annotation, for various reasons.
Sometimes the annotator is in doubt about which type to assign
to a relation, or different annotators disagree with each
other's choices. Mann and Thompson (1987, pp. 26-30) name five
causes of multiplicity of annotation: boundary judgments, text
structure ambiguity, simultaneous analysis, differences
between analysts and analytical error. A boundary judgment
occurs when a relation falls between two possible types and the
annotator has to choose to which type the relation belongs.
Text structure ambiguity concerns ambiguity in text, but one
can also see it as ordinary ambiguity: a single analysis of the
relation leads to the assignment of more than one type and
neither type can be discarded. Simultaneous analysis also
results in more than one valid type but differs from ambiguity:
more than one analysis of the relation is valid, and each leads
to a different type. Differences between analysts can lead to
differences in assignments. Analysts can have different
experiences with the presented information and may therefore
assign relation types differently. According to Mann and
Thompson (1987) this happens infrequently and often ends in
agreement that a particular ambiguity exists. Analytical error
occurs when the analysis performed by the annotator is
erroneous, either through a wrong analysis of the relation or a
wrong application of the model. Wrong application of the model
is not always due to a mistake of the annotator. Not all models
are as well documented as one would hope, which leaves the
annotator poorly informed. Another cause is that some models
were created for specific domains and cannot be applied to
other domains without difficulty. Given the above problems,
annotation by a single annotator is a handicap: there is
neither a reference to compare the results with nor a second
annotation to filter out annotation differences or errors.
Unfortunately there is at this point no time to perform a
broader analysis with multiple annotators. We will use the
results of the single annotation only to provide more insight
into the models. This will help to explore the advantages and
disadvantages of the models, their differences and their
suitability in the medical information domain. Our research is
thus an effort to create an initial understanding of these
models. With the first 3 causes of Mann and Thompson (1987) in
mind, the annotator has noted any annotation problem
encountered in the sample comments.
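The comparison that a second annotator would make possible can be sketched as follows. This is a minimal illustration only: it assumes two hypothetical annotators who each assign one Carney and Levin function per sample, and it uses Cohen's kappa as one common agreement measure, which is a choice of this sketch and not something prescribed above. The label lists are invented for the example and are not results of this study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who each assign one label per sample."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of samples with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations, invented for illustration only.
annotator_1 = ["decorational", "decorational", "representational", "organizational"]
annotator_2 = ["decorational", "representational", "representational", "organizational"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

A value close to 1 would indicate that disagreements are rare, while a low value would point to boundary judgments or ambiguity of the kind listed above.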
Annotation with the first two models, Carney and Levin (2002)
and Marsh and White (2003), considers exclusively the function
of the image in the text. The Martinec and Salway (2005) model
needs a deeper analysis: first the status relation between
image and text has to be determined and then the semantic
relation. The status relation depends on what one considers to
be the “whole text”. An image and a text are of equal status
when the whole image is related to the whole text and the whole
text is related to the whole image. Thus we need to set a size
for the whole text. Martinec and Salway (2005) take a paragraph
as the maximum size. The image can also be related to a larger
piece of text. The samples used in this paper do not contain
large texts and the accompanying images complement the answer.
Therefore we will consider the whole text to be all text in an
answer. There is another issue with images and labels. In some
cases Martinec and Salway consider the labels of an image not
to be part of the image but a separate text in relation with the
image. For this paper we adopted this approach; these labels
are considered part of the whole text in the sample.
As mentioned earlier, each sample consists of a question and
an answer. Where possible, the annotator ignored the relation
between a question and its answer. The images in the samples
are part of an answer. To include the questions in the analysis
one would therefore need to analyze not only the relation
between image and text in the answer but also the relation
between the question and the answer. However the questions in
the samples were not excluded because in almost all cases they
help to clarify the subject of the answer. Most of the samples
have the image positioned after the text. However, to avoid
complications the position of the image relative to the text
was ignored.
The samples can be found in appendix A; all samples are
written in Dutch. Every sample is followed by a summary and a
description of its contents in English. Each sample is then
followed by the results of the annotation, which are stored in
a table and accompanied by comments on the annotation process.
The annotator has used all three models to annotate the
samples.
3.1 ANNOTATION RESULTS
In this section we discuss the results of the analysis. For each
of the three models we discuss the advantages and
disadvantages that were noticed during annotation.
Carney and Levin (2002) focus on the educational value of
different image functions in text. Because of this the paper
pays little attention to explaining the constructs of the model.
It gives a short explanation of the functions and some
examples, both specific to the domain of educational media.
When applying the model to another domain one is left with
some room for interpretation. The model is fairly
small and contains only 5 image functions. This reduces the
complexity of the model. The decorational function is fairly
straightforward. This type is also found in the Marsh and White
(2003) model. One should apply it when the image has only an
aesthetic purpose in the text. However some interpretations
of “aesthetic” (by different annotators) could be less strict than
others. Take for example sample 1 in appendix A: the image of
the syringe was annotated as having a decorational function
because the text makes no direct reference to the
administration method used for the “DKTP-vaccine” (DKTP
vaccination). One could also argue that the image of a syringe
is an example of a possible administration method and thus has
a representational function. Generally one uses the
organizational and interpretational functions for images which
provide a better understanding of the informational content,
while the representational function just mirrors the text content.
The organizational and interpretational functions are often
assigned to images which present information in a more
comprehensive form than any textual form. Consequently, these
images can also contain extra information beyond what the
text provides. Carney and Levin designed these two
functions for the educational domain. The organizational
images provide a structural framework and the interpretational
images explain difficult systems. However one cannot always
separate these two functions. During annotation the annotator
had doubts about some samples because, for example, the image in
the sample explained a difficult system but also provided a
structural framework. In Hooijdonk et al. (2007) these two
functions have been grouped in a single function called
“additional function”. This function not only absorbs the
organizational and interpretational function but also includes
any other image that has more than just a representational
function. The last function, the transformational function, has
not been applied in the annotations. This image function is
intended to code information from the text into the image in
such a way that it eases recall of the textual information. Most
pictures used in the medical information domain aim to provide
information and do not necessarily aim to imprint this
information in the mind of the viewer. It is doubtful that this
image function will be applied in the medical information domain.
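The five functions and the coarser grouping described above can be written down as a small data structure. The following is a minimal sketch, not an implementation taken from Carney and Levin (2002) or Hooijdonk et al. (2007): the names `ImageFunction` and `coarse_label` are hypothetical, and placing the transformational function under the "additional" label is an assumption of this sketch, based on the description that the additional function covers any image with more than a representational function.

```python
from enum import Enum


class ImageFunction(Enum):
    """The five image functions of Carney and Levin (2002)."""
    DECORATIONAL = "decorational"
    REPRESENTATIONAL = "representational"
    ORGANIZATIONAL = "organizational"
    INTERPRETATIONAL = "interpretational"
    TRANSFORMATIONAL = "transformational"


# Functions merged into the single "additional" label.
# Including TRANSFORMATIONAL here is an assumption of this sketch.
_ADDITIONAL = {
    ImageFunction.ORGANIZATIONAL,
    ImageFunction.INTERPRETATIONAL,
    ImageFunction.TRANSFORMATIONAL,
}


def coarse_label(function: ImageFunction) -> str:
    """Collapse a five-way label into decorational, representational or additional."""
    if function in _ADDITIONAL:
        return "additional"
    return function.value
```

With such a mapping, a sample on which the annotator hesitated between organizational and interpretational receives the same coarse label either way, which removes that particular source of doubt at the cost of detail.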
A general problem in the Carney and Levin (2002) model is
that it was designed for the functions of images in text. This
design dictates that most of the information content will be
found in the text. This is a problem when it is instead the image
that provides most of the information content. An example is
given in sample 4. In this sample there is almost no text to
represent, organize or interpret. In this sample it is the
information content of the images that has an organizational
function. But the images do not organize any text as Carney and
Levin intended for this function. The scope of the informational
content forms a second problem. Some images might provide
more or different information than the text provides.
Should one then consider the overall information content of the
image for annotation or only the content which relates
specifically to the text? This problem is clear in sample 20. The
image helps to interpret the text (and has been annotated as
interpretational) but on its own also has an organizational
function. Following the instructions of Carney and Levin it has
been decided, in these cases, to look only at how the image
functions in relation to the text. In samples 4 and 9 there is
not much text to base a functional relationship on. In both
these samples the annotator did not base his choice on the
function of the image in the text but on the function of the
image in its own right. Depending on the application of the
model a choice has to be made on how to deal with situations
where the image does not serve the text.
The Marsh and White (2003) model is much larger and more
complex than the Carney and Levin (2002) model. The model
at first looks confusing because of its many functions. This is
probably due in part to the process by which Marsh and White
created the model. Because of the large number of functions,
one must read and consider the description of every single
function for every annotated sample. The 3 main groups, which
separate 3 different types of functions, help in the decision
process. However it is still possible for the annotator to use
functions from all 3 groups to annotate a sample. The
descriptions of the functions are much clearer than the model
structure. Most of the functions have a good description and
are also applicable to the medical information domain. The
names of the functions describe their meaning and a short
summary clarifies this further. In addition, the functions are
accompanied by lead-in terms, which give more names and terms
with which one can describe the function. Overall the
descriptions simplify the use of the model.
Some functions contain exceptions. These exceptions should
prevent the annotator from using the functions wrongly.
However they also make the structure of the model more
complex. An exception refers to a different function when
that function might be better suited. Functions which refer to
each other mostly have similar descriptions. They differ from
each other in degree of severity or describe the content in a
slightly different way. For example A3.1 (engage) and A3.2
(motivate) hold the attention of the reader or motivate a
response. Both these functions refer to A2 (elicit emotion)
in case any emotions are involved in the process.
Another example is B1.1 (concretize) which one can use for
images that make a thing or a concept more explicit. This
function refers to B1.4 (describe) for content that gives more
details and to C1 (interpret) and C3.2 (model) for content that is
more complex. Most likely Marsh and White included these
exceptions to reduce annotation errors and disagreements.
These functions, similar but different in severity, create
another layer in the structure of the model. One might
not need this kind of detail in certain domains and some
functions might be grouped together to create a less complex
model.
Another issue which makes the model more complex is that not
all functions in this model are of the same type. By this we
mean the kind of content a function describes. The functions in
the model of Carney and Levin (2002) are all the same kind;
they describe how the image organizes its information content
and how this information is related to the text. The most
frequent type of function in the Marsh and White (2003) model
is that of images that serve the text in one way or
another. Two other types of functions were found. One
describes only the contents of the image (examples: A2 (elicit
emotion), A2.2 (express poetically), A3 (control) and B1.5
(graph)). The other does not specifically describe the content
of the image but describes either a relation between the image
and the text (examples: A1.1 (change pace), C3.1 (alternate
progress) and C3.3 (inspire)) or the content of both the image
and the text (examples: A1.2 (match style), B1.3 (common
referent), B5.2 (complement), C2.1 (compare) and C2.2
(contrast)).
Not all functions in the Marsh and White (2003) model
seemed applicable to the medical information domain. Among
these are also functions whose meaning was not clear
(A1.2 (match style), B1.3 (common referent)). This could be
because of a poor description of the function or because it was
intended for a different domain. The following functions were
not immediately applicable to the medical information domain.
Most of these are useful for a storytelling domain. They
describe certain styles which are not common in the medical
information domain: A1.1 (change pace), A2.1 (alienate), A2.2
(express poetically), C3.1 (alternate progress) and C3.3
(inspire). Another function is similar to Carney and Levin's
transformational images: C3 (transform), a method which one
will not see often in our domain. Some other functions will,
for similar reasons, not be used often in the medical
information domain but occasionally prove useful: A1.2
(match style), A2 (elicit emotion), A3.1 (engage), A3.2
(motivate), and B2.3 (locate). Functions A2, A3.1 and A3.2 are
interesting because one can use them to annotate images which
encourage a response from the user. For example this response
could be emotional because of a disturbing image.
The way the model was constructed also influenced
its structure and usability. By merging different models from
different domains, the model supports many functions for
different kinds of content, different intensities of content and
different domains. However this diversity and wide range of
application make the model lack focus and structure
and cause it to become too complex and confusing. One
can expect that in an experiment with multiple annotators
disagreements would arise and different functions would be
used on the same samples. On the other hand this diversity also
makes the model easier to use, and we think the
function descriptions are quite clear. With a larger experiment it
might be possible to find superfluous functions and to recognize
a better structure. One could also use the model as a second
layer on a less complex model with a clear structure, combining
structure with diversity.
The Martinec and Salway (2005) model provides a clear and
intuitive structure. The structure helps in keeping an
overview of the model and eases choosing the right type during
annotation. The relational types and their descriptions are less
simple. It is easy to get confused about the meaning and use of
the different types, especially the expansion types.
When one simply looks at their names they appear similar: both
elaboration and extension types expand information. The
differences between the types are small and without explanation
they can be confusing. The extension type is used for all modes
which add (related) information to the overall presentation,
information which one cannot find in the other mode. The
elaboration type is used for modes which make more specific or
reaffirm presented information by exemplification or by
repeating information in a different form. A similar problem
arises between extension and enhancement. While enhancement
also expands by adding new information, one uses it
exclusively for information that provides a circumstantial
setting such as time, place, reason or purpose.
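The distinction between the three expansion types, as summarized above, can be expressed as a small decision rule. This is a simplified sketch of the reading given in this paper, not a procedure defined by Martinec and Salway (2005); the function name and its two boolean parameters are hypothetical and stand for judgments the annotator still has to make.

```python
def expansion_type(adds_new_information: bool, is_circumstantial: bool) -> str:
    """Choose an expansion type from two annotator judgments.

    adds_new_information: the mode contributes information that cannot
        be found in the other mode.
    is_circumstantial: that new information only sets a circumstance
        such as time, place, reason or purpose.
    """
    if not adds_new_information:
        # The mode restates or exemplifies what the other mode presents.
        return "elaboration"
    if is_circumstantial:
        return "enhancement"
    return "extension"
```

The rule makes explicit that enhancement and extension both add information and differ only in whether that information is circumstantial, which is where most of the confusion described above arises.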
The place of the projection type in the model is confusing; it
seems as if the designers forced the type into the model to stay
compliant with Halliday (1994). The projection type seems to
be applicable only to comic book media and media with
diagrams, because comic images and diagrams project
information. This limit of 2 media (comic images and diagrams)
is restrictive and makes the projection type media dependent.
In contrast, the expansion type is a general purpose
type which one can apply to any image-text media with
expanding relations. A broader interpretation of the projection
type could be possible with the idea type. As Martinec and
Salway mention, idea is a "projection of meaning". One could
use this concept to define relations in which one mode
projects the meaning of another. This could include for example
modes that summarize or compact information in a projective
manner; the diagram example in Martinec and Salway (2005)
(page 354) is an example of this. In its current form the use of
the projection type is, in contrast to the rest of the model, too
limited. Considering the medical information domain,
although not used often, the idea type remains useful for
relations similar to the diagram example of Martinec and
Salway (2005). The locution type has not been used during the
annotation process.
Overall the clear structure does not help much in applying the
model. It is the types of the model that make its application
considerably more complicated. Only after practice can one
understand the subtle differences between the types.
In the medical information domain the expansion types are used
frequently but the projection types much less so. The number
of remaining types for the medical information domain is thus
rather limited, much like Carney and Levin (2002). The two
models do, however, annotate information content differently:
Martinec and Salway (2005) is more abstract and general.
The second type of relation in this model is the status relation.
It captures the kinds of relation between text and image which
the introduction discussed: an image can serve a text, a text
can serve an image, and both modes can be complementary or
independent. The status relation allows for more flexibility in
the application of the model. As seen in our small set of
samples, the medical information domain also contains samples
where the image does more than just serve the text. Possibly
the status relation could also be extended to other models. To
give an example with Carney and Levin (2002): a text could
also be representational of the content of an image. Not
everything will be that easily extended, though; a text will
not likely be decorational because text in general (as natural
language) is inherently informative, although we cannot exclude
the possibility that someone might use text for decorational
purposes anyway. Marsh and White (2003) already made a step
towards status relations by including functions for images that
go beyond the text. The functions in this group are not only
valid for images that are subordinate to the text but could also
include cases of the other three status relation types. For some
functions this would require a reinterpretation of their
description.
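An annotation under this model thus carries two parts, a status and a logico-semantic type. A minimal sketch of such a record is given below. The class name, field names and string spellings are hypothetical choices for this illustration; only the four status values and the example annotation of sample 1 (appendix A) are taken from this paper.

```python
from dataclasses import dataclass

# The four status types discussed above; spellings are chosen for this sketch.
STATUS_TYPES = {
    "independent",                # equal status
    "complementary",              # equal status
    "image subordinate to text",  # unequal status
    "text subordinate to image",  # unequal status
}


@dataclass(frozen=True)
class MartinecSalwayAnnotation:
    sample_id: int
    status: str
    logico_semantic: str

    def __post_init__(self):
        # Reject status labels that are not one of the four known types.
        if self.status not in STATUS_TYPES:
            raise ValueError(f"unknown status: {self.status!r}")

    @property
    def is_equal(self) -> bool:
        """True when neither mode is subordinate to the other."""
        return self.status in {"independent", "complementary"}


# Sample 1 from appendix A, as annotated in this paper.
sample_1 = MartinecSalwayAnnotation(
    1, "image subordinate to text", "elaboration: exemplification"
)
```

Keeping the status as a separate field is what would allow the same record to be reused with the function labels of the other two models, as suggested above.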
4. CONCLUSIONS AND RECOMMENDATIONS
The Carney and Levin (2002) model does not contain many
functions and thus has a simple structure. Its functions however
were designed for a single purpose and because of this some of
its functions cause simultaneous annotation in the medical
information domain. The functions in the model describe the
structure of content of an image serving a text. They describe
decorational, representational, organizational, interpretational,
and transformational functions of images. Theune et al. (2007)
(section 5.1) suggest using the properties of the functions to
match images on relevance and image content. Interpretational
and organizational images tend to be highly relevant to their
related text, and their information content is also very
specific. Thus combining these images with any text (as in
IMIX, with an answer) requires high relevance and a strong
match. Decorational images contain much less specific
information and can be relevant to many texts. The chance of
making a bad match with decorational images is much smaller.
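The idea that the required strength of a match depends on the image function can be sketched as a per-function threshold. This is an illustration under stated assumptions only: it does not describe how IMIX or Theune et al. (2007) select images, the relevance score is assumed to come from some retrieval step and to lie between 0 and 1, and the threshold values are arbitrary placeholders, not measured values.

```python
# Hypothetical minimum relevance scores (0.0 to 1.0) per image function.
# The numbers are placeholders for illustration, not measured values.
MIN_RELEVANCE = {
    "decorational": 0.2,
    "representational": 0.5,
    "organizational": 0.8,
    "interpretational": 0.8,
}


def may_attach(image_function: str, relevance: float) -> bool:
    """Return True if an image with this function is relevant enough to attach."""
    return relevance >= MIN_RELEVANCE[image_function]
```

Under these placeholder values a weakly matching decorational image would still be attached to an answer, whereas an interpretational image with the same score would be rejected.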
Marsh and White constructed their model from many different
relational models. Its strengths and weaknesses are the
opposite of those of the other two models. The
model can describe different kinds of information content,
ranging from content related features to relation related
features. Most of its functions describe how images serve the
text. To give an example, an image reiterates, compares or
concentrates the text. Other functions describe the content of
the image and some others describe the overall impact on both
the image and the text. An advantage is that one can use this
model to annotate many sorts of image-text relations and it
gives much flexibility during annotation. Also its functions
represent simple concepts which make them easy to understand.
A disadvantage is the lack of focus and structure of the model.
Following this, a better organization of Marsh and White (2003)
is likely possible. Depending on what is necessary, one could
decrease the number of functions to a more manageable size. It
would make the model less detailed but also more organized.
Another possibility is to use the structure of one of the other
two models as a base for a new structure. Overall this model
allows a detailed annotation of an image-text pair; however,
caution is advised in its application. The great number of
diverse functions is likely to cause disagreements in
annotation.
Martinec and Salway (2005) is the most abstract of all three
models. This model annotates how information is exchanged
between two modes and how strongly they are related. The
logico-semantic relation annotates how modes exchange
information; by elaboration, extension, enhancement or
projection. The status relation annotates whether the two modes
form independent, complementary or dependent processes. It shows
how strongly the modes relate and in which direction they
exchange information. It also shows which mode is the main
carrier of information and which provides additional content. The
number of types in the model (compared with Marsh and White
(2003)) is not very large. Its structure is clear and gives a good
overview. However the logico-semantic types represent rather
complicated concepts and often leave one in doubt about which
types to assign during annotation. The projection types will not
be used often in the medical information domain. An especially
interesting feature of this model is the status relation. It would
be interesting to see if models designed for “image functions in
text” could be adapted, using the status relation, to fit other
situations like “text functions for images” and equal image-text
relations. The samples of this paper show that the
domain of medical information will also need this flexibility.
The three models describe different features of the image-text
relation: they differ in the way they describe how modes
exchange information and in the detail in which they describe
information. We conclude that these models do not
exclude but could complement each other. Follow-up research
will need to answer which features of the image-text relations
need to be annotated to improve the successful retrieval of
images in the IMIX system. The remaining relevant models will
need to be tested by a large group of annotators to confirm
which aspects of the models need to be adapted for use in the
domain of medical information.
REFERENCES
Halliday, M. A. K. (1994). An Introduction to Functional
Grammar. 2nd edition, Edward Arnold, London.
Hooijdonk, C. M. J. van, Krahmer, E., Maes, A., Theune, M.
and Bosma, W. (2007). Towards automatic generation of
multimodal answers to medical questions: a cognitive
engineering approach. In the Proceedings of the Workshop on
Multimodal Output Generation (MOG 2007), CTIT Workshop
Proceedings WP 07-01, 25-26 January 2007, Aberdeen,
Scotland, 93-104.
Mann, W. C. and Thompson, S. A. (1987). Rhetorical structure
theory: A theory of text organization. USC/Information
Sciences Institute Technical Report Number ISI/RS-87-190,
Marina del Rey, CA, Text, 8(3):243--281.
Marsh, E. E. and White, M. D. (2003). A taxonomy of
relationships between images and text. Journal of
Documentation, 59(6), 647-672.
Martinec, R. and Salway, A. (2002). Some Ideas for Modelling
Image-Text Combinations. CS-05-02, Department of
Computing, School of Electronics and Physical Sciences,
University of Surrey.
Martinec, R. and Salway, A. (2005). A system for image-text
relations in new (and old) media. Visual Communication,
4(3), 337-3.
Theune, M., Schooten, B. W. van, Akker, H. J. A. op den,
Bosma, W. E., Hofs, D. H. W., Nijholt, A., Krahmer, E.,
Hooijdonk, C. van and Marsi, E. (2007) Questions, pictures,
answers: introducing pictures in question-answering systems.
ACTAS-1 of X Symposio Internacional de Comunicacion
Social, 22-26 Jan 2007, Santiago de Cuba, Cuba. pp. 450-463.
Centro de Linguistica Aplicada. ISBN 959-7174-08-1
Carney, R. N., and Levin, J. R. (2002). Pictorial illustrations
still improve students’ learning from text. Educational
Psychology Review, 14(1), 5-26.
A. Appendix image-text pair samples
Sample 1
Wat zijn de bijwerkingen van een DKTP-prik?
Bijwerkingen van een DKTP-vaccinatie:
• Plaatselijke reacties
• Hangerigheid, onrustig slapen, koorts
• Langdurig, ontroostbaar huilen
• Flauwvallen
• Een verkleurd arm of been
• Koortsstuipingen
Bijwerkingen van een DTP-vaccinatie zijn milder dan van het
DKTP-vaccin, aangezien kinderen ouder zijn als ze het
DTP-vaccin krijgen. Bovendien heeft dit vaccin een andere
samenstelling.
Summary
The text consists of a question on the side effects of a
diphtheria, whooping cough, tetanus, and polio vaccination.
The answer sums up the side effects. There is a further
reference to another vaccination which has milder side effects
on children. The picture shows a syringe.
Annotation
Model                    Type(s)
Carney&Levin(2002)       Decorational
Marsh&White(2003)        A1. Decorate
Martinec&Salway(2005)    Status unequal; image subordinate to text
                         Elaboration, exemplification; image more general
Carney&Levin(2002) and Marsh&White(2003)
Although it is commonly known that vaccinations can be
administered with a syringe there is no reference to a syringe in
the text. The picture does not add any useful information to the
side effect information in the text. Therefore the picture is of
decorational content.
Martinec&Salway(2005)
The picture is related to only parts of the text. It refers to the
word “prik” (injection) and vaccinatie/vaccin
(vaccination/vaccine). One gives an injection with an injection
needle. There is the process and the means to perform it.
Between the word “prik” and the picture of the injection needle
the relation would be of type exemplification. If the text
were about injections in general the relation would be of type
exposition; however the text discusses a particular type of
injection. The image is thus more general than the text. The
image relates only to parts of the text and supplies new
information, therefore the image is subordinate to the text.
Sample 2
Hoeveel X-chromosomen bevat een lichaamscel van een
vrouw?
Een lichaamscel van een vrouw heeft 2 X-chromosomen.
Summary
The question is “How many X-chromosomes does a female
body cell contain?”. The answer consists of one line stating
that a female body cell has 2 X-chromosomes. The picture
shows an abstract depiction of an X-chromosome and a textual
label.
Annotation
Model                    Type(s)
Carney&Levin(2002)       Decorational
Marsh&White(2003)        A1. Decorate
Martinec&Salway(2005)    Status unequal; image subordinate to text
                         Elaboration; exemplification; image more general
Carney&Levin(2002) and Marsh&White(2003)
The answer only mentions the number of X-chromosomes in a
female body cell. The picture depicts an X-chromosome but
doesn't add information which would make the contents of the
text any clearer.
Martinec&Salway(2005)
The image relates only to parts of the text, being the
X-chromosomes, therefore the image is subordinate to the text.
The text is about multiple female X-chromosomes and the
image shows one general X-chromosome, thus the image is
more general than the text.
Sample 3
Vormen van Colitis Ulcerosa
Colitis Ulcerosa is een chronische ontsteking van de dikke
darm. De ziekte kan mild, matig of ernstig verlopen. De ziekte
heeft vier vormen:
• recticitis of proctitis (hierbij is de ziekte alleen in de
  endeldarm) (zie figuur, plaatje a)
• rectosigmoidis (hierbij is de endeldarm en het sigmoid
  (laatste 20 cm van de dikke darm) aangetast) (zie figuur,
  plaatje b)
• linkszijdige colitis (hierbij gaat de colitis tot aan de
  milthoek; de gehele linkerzijde van de dikke darm ziek)
  (zie figuur, plaatj c)
• pancolitis of totale colitis: (hierbij is de gehele dikke darm
  aangetast door de ziekte) (zie figuur, plaatje d)
Summary
The text discusses what Colitis Ulcerosa is and in which forms
it comes. The text mentions that it is a disease with different
degrees of seriousness. The text further names 4 forms of the
disease and the location they have in the gastrointestinal tract of
the body. Each form refers to a picture.
The picture consists of 4 separate depictions of the
gastrointestinal tract, 4 labels and some text. Each depiction has
a textual label a, b, c or d. Each label refers to the text close to
the depictions. The text names the disease forms and to some
extent also mentions the location in the gastrointestinal tract.
The text also mentions that these are forms of the Colitis
Ulcerosa disease. The picture is an abstract drawing of the
gastrointestinal tract with parts of it infected by the disease.
Because of the extra text near/in the picture there are actually 2
samples visible here. However the text in the picture and the
separate text are almost the same, therefore only the picture and
the text outside the picture are considered.
Annotation
Model                    Type(s)
Carney&Levin(2002)       Representational
Marsh&White(2003)        B1.4 Describe, B1.7 Translate,
                         B2.4 Induce perspective, B5.2 Complement
Martinec&Salway(2005)    Status unequal; image subordinate to text
                         Elaboration; exposition
Carney&Levin(2002)
The picture mirrors a part of the text: it mirrors the part about
the 4 forms of the disease and the description of their location
in the gastrointestinal tract. The picture makes the location of
the disease more concrete for the reader. Thus the picture is
representational.
Marsh&White(2003)
Considering Marsh and White (2003), both modes share the
same symbolic meaning (common referent), the picture
describes and translates part of the text, it induces perspective
and complements the text.
Martinec&Salway(2005)
In this sample the whole text ranges from “Colitis Ulcerosa is..”
up to “..plaatje d)”. The picture relates to the 4 forms of the
disease and the sum up of these 4 forms. The picture does not
specify that Colitis Ulcerosa is a chronic infection. Thus
the status is unequal and the image is subordinate to the text.
The logico-semantic relation is elaboration, exposition. The
textual sum up of the 4 forms of the disease is equal to the
contents of the picture; from both can be understood the name
and position of the disease form. The picture is here not
considered to be an enhancement of place of the disease
because this information could also be understood from the
contents of the text.
Sample 4
Hoe leg je een mitella aan bij de linkerarm?
Stap 1   Stap 2   Stap 3   Stap 4
Summary
The text in this sample consists of some labels which guide
the pictures in the answer and the main question. The question
asks how to apply a sling on the left arm. The labels read:
step 1, step 2, step 3 and step 4. The pictures depict in 4 steps
how to apply the sling.
Annotation
Model                    Type(s)
Carney&Levin(2002)       Organizational
Marsh&White(2003)        C3.2.2 Model physical process
                         Considering labels to pictures: B5.2 Complement
Martinec&Salway(2005)    Status unequal: text subordinate to image
                         text more general
Carney&Levin(2002)
The textual labels, to some extent, help to answer the question;
they clarify there are four steps to follow, however this could
also be deduced from the pictures. A more direct purpose of the
labels is to help clarify in what sequence the pictures should be
followed. For the Carney and Levin (2002) model the pictures
are here considered to be of organizational content; they
provide a structural framework explaining a series of
instructions to follow. The pictures however do not mirror a
process in the text, which is what the organizational type
demands of the relation.
Marsh&White(2003)
Considering Marsh and White (2003), the pictures model a
physical process and the labels complement the pictures.
Martinec&Salway(2005)
The labels are subordinate to the pictures; they show
information that could already be understood from the contents
of the pictures. The text is an exemplification of the picture and
the text is more general.
Sample 5
Hoe kan ik mijn buikspieren versterken?
Buikspieren kunnen worden versterkt door het doen van
buikspieroefeningen. Niet alle buikspieroefeningen zorgen voor
een optimaal resultaat. Een oefenprogramma voor de
buikspieren met opbouwend en goed uitgebalanceerd zijn, en
alle buikspieren moeten getraind worden. De buikspieren
moeten op alle mogelijke manieren gestimuleerd worden om te
werken, alleen zo bekom je het perfecte resultaat. Hieronder
staan een aantal voorbeelden van goede buikspieroefeningen:
information than is contained within the text. The pictures
demonstrate and model a physical process.
Considering Martinec and Salway (2005), the pictures provide
new information to the text. Without the text, however, one
cannot understand what is to be seen in the pictures. The
pictures are thus dependent on the text; the status relation is
unequal. The pictures provide new information to the text and
extend it. Even though exercises in general are mentioned in the
text, there is no information on how they should be performed;
therefore this is not exemplification but extension.
Sample 6
Hoeveel mensen lijden aan een hoge bloeddruk?
(uitgebreid)
Summary
The question is: “How can I strengthen my abdominal muscles?”.
The textual answer says that for an optimal result one must train
all abdominal muscles and that a balanced training program
should be followed. The answer then refers to the pictures,
stating that they are examples of good abdominal exercises.
There are 8 pictures with 8 labels. The labels, numbered 1 to 4,
clarify that 4 exercises are depicted. The numbers are followed
by the letters a or b, which suggest the order of the intermediate
body positions to follow.
Annotation
Model | Type(s)
Carney&Levin(2002) | Organizational
Marsh&White(2003) | B1.1.1 Sample, C2 Develop, C3.2.2 Model physical process
Martinec&Salway(2005) | Status unequal: image subordinate to text; Extension
Onder 20-70 jarigen komt een verhoogde bloeddruk bij 27%
van de mannen en 22% van de vrouwen voor (bron:
Regenboog, 1998-2001). Dit is inclusief personen die medicatie
voor een verhoogde bloeddruk gebruiken (en daardoor geen
verhoogde bloeddrukwaarden meer hebben).
Uit de Doetinchem-studie komen wat lagere cijfers naar voren.
Circa 25% van de 30- tot 70-jarigen heeft een verhoogde
bloeddruk en/of gebruikt bloeddrukverlagende medicatie
(Doetinchem, 1998-2002). Dit percentage bedraagt volgens de
gegevens van Regenboog ongeveer 30% in deze leeftijdsgroep.
Volgens de Herziene CBO-richtlijn Hoge Bloeddruk (CBO,
2000a) is de bloeddruk verhoogd wanneer de bovendruk
(systolische bloeddruk) hoger of gelijk is aan 140 mmHg en/of
de onderdruk (diastolische bloeddruk) hoger of gelijk is aan 90
mmHg. Voor personen van 60 jaar en ouder, zonder diabetes
mellitus, familiaire hypercholesterolemie of hart- en
vaatziekten, geldt 160/90 mmHg als grens voor verhoogde
bloeddruk. Bij oudere personen (65 tot 85 jaar) heeft 38% van
de mannen en 42% van de vrouwen een bloeddruk boven de
160/90 mmHg en/of gebruikt medicatie voor een te hoge
bloeddruk (bron: ERGO, 1997-1999).
Figuur 1: Percentage mensen met verhoogde bloeddruk (>=
140/90 mmHg en/of medicatie voor 20-59-jarigen en >= 160/90
en/of medicatie voor 60 jaar en ouder), naar leeftijd en
geslacht(Bronnen: Regenboog, 1998-2001; ERGO, 1997-1999).
Carney&Levin(2002)
Considering the Carney and Levin (2002) model there is more
than one option. The text mentions that a balanced program
should be followed, and the last sentence refers to the pictures
as examples of good exercises. The content of the pictures could
be considered part of a balanced program and to show good
exercises, and thus to be representational for the content of the
text. The pictures also depict how the exercises should be
performed and are thus also of organizational content. However,
nothing is mentioned in the text on how the exercises should be
performed, so for both types the pictures do not exactly mirror
the content of the text. Because the content of the pictures is
more than representational, it is concluded here that they are of
the organizational type.
Marsh&White(2003)
For the Marsh and White (2003) model the pictures are samples
of the contents in text, they help interpret the text and further
develop the provided information by supplying more
Summary
The question is: how many people suffer from high blood
pressure? The first two paragraphs discuss two different
statistics, while the third paragraph explains a third statistic
which is also visible in the picture. The fourth paragraph
explains how to read the picture. The picture contains a graph
with 4 lines showing the percentage of people with high blood
pressure by age and sex. The two columns show the pressure
threshold (when one has high blood pressure) by age group.
Annotation
Model | Type(s)
Carney&Levin(2002) | Organizational
Marsh&White(2003) | B1.5 Graph, B1.7 Translate, B2 Organize, B2.4 Induce perspective, B3 Relate, B3.1 Compare, C2 Develop
Martinec&Salway(2005) | Status equal; independent
Carney&Levin(2002)
In this case it is not clear which type should be chosen. Carney
and Levin point out that the organizational type is used for
pictures that provide a structural framework for the text (maps
or procedures). The picture does not really represent the text;
there is similar information in the text, but no single paragraph
really describes the contents of the picture (except for the
caption which starts with “Figuur 1:”). This picture provides
additional content to the answer, a new source. The picture
contents are of organizational orientation and thus of
organizational type.
Marsh&White(2003)
Considering Marsh and White (2003), the picture shows
information in a different way than the text and thus in
some way it translates. The picture organizes the information
and induces perspective. The picture is a graph. The picture
relates because it explains the main concepts in the text. The
picture allows the reader to make comparisons between age,
sex, pressures and different statistics. The picture also goes
beyond the text by developing the information further; it
provides an overview of the percentage of people with high
blood pressure over the complete age spectrum.
Because there is no direct reference to the picture in the text,
neither mirrors the other. Both modes contain similar
information, but each forms a separate process with its own
information content; they are therefore equal and independent
in status. Because of their similar information both modes are
also equally general, and the relation is thus of type exposition.
Sample 7
Wat gebeurt er bij een arthroscopie?
Bij arthroscopie wordt er een diagnose en behandeling van
gewrichtsproblemen gedaan d.m.v. een dun kijkertje
(arthoscoop).
Summary
The question is: what happens during an arthroscopy? The
answer states that joint problems are diagnosed and treated
with a thin viewer (arthroscope). The picture shows the
cross section of a joint; two long devices are sticking through
the skin. A hand is operating the thinnest device. Most likely
this is a procedure in progress on the joint.
Annotation
Model | Type(s)
Carney&Levin(2002) | Representational
Marsh&White(2003) | B1.1 Concretize, B2.3 Locate, B2.4 Induce perspective, B5.2 Complement
Martinec&Salway(2005) | Status unequal; image subordinate to text; Extension; Enhancement (purpose)
Carney&Levin(2002)
The picture is representational for the text.
Marsh&White(2003)
The picture makes the use of the device more concrete. The
picture helps to locate the use of the device. Overall the picture
induces perspective and complements the text.
The image is subordinate to the text. The picture provides us
with a new perspective on how the arthroscopy is performed
inside the body, a perspective that is not given by the text. The
picture is also an example of the textual content. The relation
could thus be considered elaborative; however, the new
perspective provides enough new information to consider this
relation of type extension. The relation is also of the type
enhancement, because one can see a purpose of the thin viewer
which could not be understood from the text; the text does not
mention that the thin viewer is used inside the body.
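The annotation of a sample under the three models can also be kept as a small structured record, which makes it easier to compare samples later. The following is only a minimal sketch in Python: the class name `Annotation`, its field names and the helper `marsh_white_groups` are hypothetical and not part of any of the three models or of IMIX. The values are those given above for sample 7.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """One annotated image-text sample (hypothetical record layout)."""
    sample: int
    carney_levin: str            # Carney & Levin (2002) picture function
    marsh_white: List[str]       # Marsh & White (2003) codes with their names
    status: str                  # Martinec & Salway (2005) status relation
    logico_semantic: List[str] = field(default_factory=list)


# Values taken from the annotation table of sample 7.
sample_7 = Annotation(
    sample=7,
    carney_levin="Representational",
    marsh_white=[
        "B1.1 Concretize",
        "B2.3 Locate",
        "B2.4 Induce perspective",
        "B5.2 Complement",
    ],
    status="unequal; image subordinate to text",
    logico_semantic=["Extension", "Enhancement (purpose)"],
)


def marsh_white_groups(annotation: Annotation) -> List[str]:
    """Return the top-level Marsh & White groups (A, B or C) that were used."""
    return sorted({code[0] for code in annotation.marsh_white})


print(marsh_white_groups(sample_7))
```

A record like this keeps the three viewpoints side by side, so that differences between the models remain visible per sample.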
Sample 8
Welke factoren kunnen leiden tot een holvoet?
De oorzaak van een holvoet is vaak een gestoorde spierfunctie
in de voetspier ten gevolge van een hersen- of
ruggemergaandoening. Ook kunnen de tussenbotspiertjes
verlamd zijn. Zoals bij de meeste voetafwijkingen kan dit
resulteren in pijnklachten en standproblemen in de knie, heup
rug en nek. Deze klachten nemen toe in rustpositie zoals zitten
of stilstaan. Ook kunnen er door de verkramping van de tenen
problemen ontstaan met de nagels en met likdoorns en eelt.
Soms is er ook sprake van Mortons Neuralgie.
Sample 9
Hoe moet ik mijn werkplek inrichten om RSI te
voorkomen?
Summary
The question is: which factors lead to a hollow foot? The
answer explains that a hollow foot is often caused by a
disrupted functioning of certain foot muscles, which in turn is
caused by a disorder of the brain or spinal cord. Further the text
explains that this can result in pain and posture problems in the
knee, hip, back and neck. The problems increase when the body
is in a resting position. Cramping of the toes can further cause
problems with the nails, corns and calluses. The picture
shows 3 feet: a normal foot, a flatfoot and a hollow foot.
Annotation
Model
Type(s)
Carney&Levin(2002)
Representational
Marsh&White(2003)
B1.1 Concretize,
Sample
Summary
The text here consists only of a question and some labels in
the answer. The answer is mostly formed by the picture. The
picture is accompanied by some textual labels. The question
is: how do I organize my working space to prevent RSI?
The answer shows a person sitting on a desk chair at a desk. On
the desk there is a monitor, keyboard and document holder. The
person's posture shows the correct body position to hold, while
the labels together with arrows and lines indicate various
advisable angles, lengths, heights and distances between the
various objects in the picture.
Annotation
B1.1.1
Extension
Carney&Levin(2002)
The picture is representational because it shows what a normal
foot and a hollow foot look like.
Model | Type(s)
Carney&Levin(2002) | Organizational
Marsh&White(2003) | B1.2 Humanize, B1.4 Describe, C1 Interpret, C1.2 Document
Martinec&Salway(2005) | Status unequal; text subordinate to image; Extension
Marsh&White(2003)
The picture makes the idea of a hollow foot more concrete. It
contrasts a normal foot with a hollow foot and thus
creates a sample (however, it does not contrast elements
within the text, so 'B3.2 Contrast' is invalid here).
Image and text each form a separate process. The image
shows what a hollow foot looks like in comparison with other
feet and the text provides the causes and effects of a hollow
foot. The picture is intended here to provide extra information
on the hollow foot; thus it extends the information around the
hollow foot in the text.
Carney&Levin(2002)
In this example there is little text. In this case if one ignores the
question there is little text to apply the models on. In this
sample the question is included into the analysis.
There is the question which sets the subject of the information
and the answer which is largely formed by an illustration. This
picture provides an organized answer to the question. The
picture is not representational because there is no text available
which it can mirror.
Marsh&White(2003)
Many of the types within the Marsh and White (2003) model fit
the profile of this picture, but most of these types are intended
for pictures which mirror or represent a text and cannot be
applied in this sample. The picture humanizes; it shows a
man or woman, which helps us to better relate to the
information. The picture describes; the information is presented
in detail. The picture helps interpret because it puts the various
positions and distances in their proper location. The picture
documents; it instructs the reader on how to set up his working
space.
Voetensteun
10. Een voetensteun kan worden gebruikt als door de
werkbladhoogte of om andere redenen de voeten ondersteund
moeten worden.
According to Martinec and Salway (2005) the picture and its
labels in this sample form two separate modes. Martinec and
Salway also provide two similar examples in which there is an
abstract image with generic labels and a naturalistic image with
generic labels (see chapter 2.3.2). Unfortunately they do not
give an example for abstract or naturalistic images with abstract
labels. In this sample the labels are mostly abstract, except for
the label “documenthouder” which is generic. The abstract
labels with the different measurements are the most dominant in
this picture and therefore the “documenthouder” label is
ignored here. These abstract labels do not provide elaborative
content but rather extend the information content of the picture.
This relation is thus of type extension. Furthermore the text is
subordinate to the image; the text only provides measurement
information for some components in the image, but not for all.
The labels would be meaningless without the image, while the
image without the labels still provides ideational content and
some information on the ideal positions of some depicted
objects.
Sample 10
Hoe moet ik mijn werkplek inrichten om RSI te
voorkomen?
Om RSI te voorkomen kun je bij de inrichting van je werkplek
rekening houden met het volgende:
Houding
1. Het lichaam is goed ondersteund en ontspannen. Hoek tussen
bovenarm en onderarm 90 graden.
Bureaustoel
2. Rugleuning op hoogte zodat er rugsteun in het holle deel van
de rug is.
3. Zet armsteunen op ellebooghoogte (met ontspannen schouders).
4. Stel stoelhoogte zo in dat de voeten plat op de grond rusten.
5. Zittingdiepte: vuistbreedte ruimte tussen knieholte en
zittingrand.
Bureau
6. Werkvlakdiepte minimaal 80 cm met eronder voldoende
(strek)ruimte.
7. Tafelhoogte zo dat armsteunen en toetsenbord op dezelfde
hoogte zijn.
Beeldscherm
8. De afstand oog - beeldscherm is afhankelijk van scherm- en
lettergrootte (50-70cm).
Documenthouder
9. Plaats de documenthouder tussen het beeldscherm en het
toetsenbord.
Summary
This sample has the same question as sample 9, only this
time there is also a textual answer. The question is: how
do I organize my working space to prevent RSI? The answer
explains how to position the body and the different objects
within the working space. There are 6 subjects/objects to
position and 10 directives describing how these subjects/objects
should be positioned. Included is a picture of a person sitting on
a desk chair at a desk. On the desk there is a keyboard and a
computer with monitor. Various numbered labels, which refer
to the 10 directives, and lines indicate the various distances,
sizes, angles and objects to position.
Annotation
Model | Type(s)
Carney&Levin(2002) | Organizational
Marsh&White(2003) | B1.1 Concretize, B1.2 Humanize, B1.6 Exemplify, B1.7 Translate, B2 Organize, B2.4 Induce perspective, B3 Relate, B5.2 Complement
Martinec&Salway(2005) | Status unequal; image subordinate to text; image more general
Carney&Levin(2002)
The picture is of organizational type, it provides a framework to
better interpret the position of the body and the objects. The
picture could also be considered interpretational for the same
reasons, however the picture does not show the workings of a
complex system or process.
Marsh&White(2003)
The picture makes the contents of the text more concrete. The
picture humanizes by displaying a human, it exemplifies by
showing the essential meaning of the concepts in the text, it
translates the content of the text into another form. The picture
shows a spatial representation of the text content and thus;
organizes the text content, induces perspective for the reader
and helps to relate the text content. The picture complements
the text content.
The picture is subordinate to the text. The picture is related to
almost all the text, but not all. The text provides information on
how to prevent RSI, this could not be understood from the
image. The text also supplies many details on the needed
positions of the body. In fact, the picture only provides a visual
representation but the overall comprehensiveness of the
information would not have changed much if the picture had
been left out. The image is an exemplification of the textual
content and here the image is considered to be more general.
Sample 11
Wat kun je doen als je een bloedneus hebt?
Model
Type(s)
Extension
Enhancement (time)
Carney&Levin(2002)
As a whole the 3 pictures form an organizational cohesion.
They show the steps to follow. Separately, each picture is merely
representational for the instruction it represents.
Marsh&White(2003)
The pictures reiterate the contents of the text and make it
concrete. They humanize the text by giving us a depiction of a
human to which one can relate. They allow us to relate to the
contents of the text. The pictures concentrate the instructions
because they do not explain all parts of the instructions. The
pictures further explain the text content. The modes
complement each other.
The whole picture and the whole text are related, thus their
status is equal. Both modes clarify the message in a different
way, but each is capable of conveying the message without the
other mode. They are thus independent. The text contains some
extra information that is not visible in the picture: blow the nose
once, let go of the nose slowly. This is extension. The text also
states “let go of the nose after ten minutes”, which is an
enhancement of time. The overall procedures shown in the text
and the image are equally general and are an instance of each
other, thus elaboration; exposition.
Sample 12
Hoeveel kiezen heeft een mens?
Summary
This sample contains a textual answer within the image. For
this sample the text within the image is considered separate
from the depictions within this same image. Thus the answer
consists of text and a number of pictures. The question
is: what can one do when one has a bleeding nose? The
answer shows a title, “Bleeding nose”, and 3 instructions:
blow the nose once, close the nose and keep the head in a
writing position, let go of the nose after 10 minutes. The 3
pictures show the actions to take in the following order: blow
the nose, close it in the writing position, let go of the nose.
Annotation
De mens heeft 20 kiezen als je ervan uitgaat dat een mens een
compleet gebit heeft. Er zijn bijvoorbeeld veel mensen die geen
verstandskiezen hebben en dus maar 16 kiezen hebben (zie
afbeelding).
De premolaren zijn de twee kiezen die zich normaal direct
achter de hoektand bevinden. Ze worden ook wel eens valse
kiezen genoemd, omdat ze kleiner zijn dan de molaren. Een
molaar of echte kies is een tand in de achterste delen van de
mond, achter de premolaren en voor de eventuele verstandskies.
Een kies is een vrij grote tand die achterin de mond staat.
Kiezen vermalen het voedsel met een roterende beweging. Om
deze functie te vervullen hebben ze een dubbele
knobbelstructuur.
Het menselijk gebit van een volwassen persoon:
Model | Type(s)
Carney&Levin(2002) | Organizational
Marsh&White(2003) | B1 Reiterate, B1.1 Concretize, B1.2 Humanize, B3 Relate, B4.1 Concentrate, B5 Explain, B5.2 Complement
Groen: Verstandskiezen
Rood: Molaren
Blauw: Premolaren
Summary
The question is: how many molars does a human have? The
answer explains that a human has 20 molars, or 16 for people
without wisdom teeth. The text goes on to explain the different
types of molars, their positions and sizes, and their function.
The last line refers to the picture as being the set of
teeth of an adult human.
Annotation
Model | Type(s)
Carney&Levin(2002) | Representational
Marsh&White(2003) | B1.1.1 Sample, B1.4 Describe, B2.4 Induce perspective, B3.1 Compare, B5.2 Complement
Martinec&Salway(2005) | Status unequal; image subordinate to text; image more general
Carney&Levin(2002)
Beenmerg
Summary
The question is: where are the red blood cells created? The text
explains that red blood cells are created in the bone marrow
from a stem cell. The stem cell divides and creates an immature
red blood cell. This immature cell divides, grows further and
eventually becomes a red blood cell. The second paragraph
explains that bone marrow is a spongy red substance inside the
bones. The text also mentions which bones. The accompanying
picture shows a cross section of a real bone. 3 layers are visible:
2 white layers which are the bone and a thick red layer of tissue
which is the bone marrow.
Annotation
Model
Carney&Levin(2002)
Representational
Marsh&White(2003)
A2 Elicit emotion, A3.1
Engage, B1.1.1 Sample,
Status
unequal;
subordinate to text
The picture is representational for the contents of the text.
The image relates only to a part of the text. There are a number
of processes in the text that do not relate directly to the image,
for example the last paragraph dealing with the
function of the teeth. The image is an elaboration of the first
two paragraphs of the text. The image is more general than the
text.
Sample 13
Waar worden rode bloedcellen aangemaakt?
In het beenmerg ontstaan alle bloedcellen, waaronder ook de
rode bloedcellen, uit één celtype, de stamcel. Wanneer een
stamcel zich deelt, ontstaat er eerst een onrijpe rode bloedcel.
Daarna deelt de onrijpe cel zich, groeit verder uit en wordt
uiteindelijk een rode bloedcel
Beenmerg is de sponsachtige, rode substantie die zich bevindt
in het binnenste van beenderen. Je vindt het vooral in het
bekken, het borstbeen, de ribben en de ruggenwervels
image
text more general
Marsh&White(2003)
The picture provides a sample of what human teeth look like. It
describes parts of the text. It induces perspective by showing
the positions of the teeth. The reader can compare the contents
of the text with the objects depicted in the picture. The two
modes complement each other.
Type(s)
Carney&Levin(2002)
The picture is representational for the second paragraph of the
text, it confirms that bone marrow is inside the bone and that it
is a spongy red material.
Marsh&White(2003)
Whether the picture elicits emotion or engages the reader
depends on what the reader is used to seeing. Someone who does
not like to see blood or a cross section of a body part will find
this image rather arresting. For someone whose medical or other
experience has made him or her used to bloody or similar
pictures, this picture will be rather normal. This picture might
not even engage these persons. The
picture further is a sample of what bone marrow looks like.
The image is only related to a part of the text. In the image one
cannot see the individual body cells or the process of creating a
red blood cell which is discussed in the first paragraph. The
relation is of type elaboration; exemplification; the image
shows what “beenmerg” (bone marrow) looks like and
corresponds with the description in the last paragraph. The text
is considered more general because there are likely other
pictures with bones and bone marrow in different shapes and
sizes. The text however specifies exactly where in general bone
marrow can be found.
Sample 14
Model
Explain, B5.2 Complement
Wat gebeurt er bij een tympanometrie?
Tympanometrie is een onderdeel van de audiometrie: bepaling
van de gevoeligheid van het gehoor met behulp van
elektronische apparatuur, waarbij de compilantie van het
trommelvlies wordt gemeten: de mate waarin het trommelvlies
meegeeft met drukverandering.
Type(s)
Status
unequal;
subordinate to text
image
Projection; idea
Carney&Levin(2002)
•
één die de luchtdruk in de gehoorgang kan regelen
The picture is of interpretational type. It helps to understand
how a system functions. It partly helps to explain the function
of each tube and it demonstrates the position of the device in
the ear.
•
één waardoor het geluid het oor in gaat
Marsh&White(2003)
•
één die de hoogte van het geluidsniveau meet
Dit gaat als volgt: de gehoorgang wordt luchtdicht afgesloten
met een soort ‘dop’, waardoor drie buisjes lopen:
Via een toongenerator worden geluiden het oor in geleid terwijl
een microfoon het geluidsniveau ín de gehoor-gang meet. De
hoeveelheid teruggekaatst geluid door het trommelvlies en
middenoor wordt als maat gebruikt: als er weinig geluid
terugkaatst is het orgaan soepel, maar als het geluidsniveau
echter hoog is, is dit een indicatie voor een stijf trommelvlies en
middenoor, wat kan duiden op aandoeningen aan het oor.
In this sample the status relation is close to an equal status. The
whole picture is relevant to almost the whole text. A large part
of the text discusses the process of tympanometry, which is also
visible in the image. Only the last line in the last paragraph
discusses how the measured results from the process are
interpreted. It is this bit that is not directly related to the
contents of the image. The way the information is related in the
text and the picture is strongly reminiscent of the diagram
example in Martinec and Salway (2005) (p. 353). The image
presents the same process of tympanometry as explained in the
text, but in a different, structured way. Here this is considered a
projection of meaning, thus the relation is of type idea.
Sample 15
Summary
The question is: what happens during tympanometry? The
answer states that tympanometry is a part of audiometry: the
sensitivity of the hearing is determined with electronic
equipment by measuring the compliance of the eardrum, i.e. the
degree to which the eardrum yields to changes in pressure. The
text further explains the procedure of the test. The ear canal is
sealed off with a cap through which 3 tubes run. Each tube has a
function. By measuring the amount of sound that the eardrum
reflects one can measure the stiffness of the eardrum.
The accompanying picture shows an abstract cross section of the
hearing system. There is a yellow cap inside the ear and 3 tubes
lead inside the cap. At the end of the tubes are a number of
boxes. Inside the boxes are the names of devices which are
connected to the tubes. The picture is a mixture of an abstract
medical depiction and a flowchart. The text does not exactly
explain what these devices should do. The text only explains
which activities happen on each of the tubes. With some
technical knowledge the reader can understand which devices
create which activity.
Model
The picture describes part of the text, it induces perspective into
how the system works, it explains in another way (parallel) the
workings of the tubes, it explains the workings of the system
with the boxes. Overall the picture complements the text.
Type(s)
Carney&Levin(2002)
Interpretational
Marsh&White(2003)
B1.4 Describe, B2.4 Induce
perspective, B3.3 Parallel, B5
Wat is een allergie?
In sommige gevallen kan het immuunsysteem (verkeerd)
reageren op onschuldige stoffen, zoals huismijt, melkproducten
en stuifmeel. Een allergie is een abnormale reactie van het
immuunsysteem na contact met die stof. Het immuunsysteem
reageert overdreven als het met deze vreemde stoffen of
organismen te maken krijgt, en behandelt ze alsof ze schadelijk
zijn, zoals bij bacteriën. Het gevolg is een allergische reactie of
een histaminereactie. Daardoor kun je gaan niezen, een
loopneus krijgen, piepende ademhaling krijgen en slijm gaan
ophoesten. Je kunt ook netelroos krijgen en bij zeer ernstige
allergieën kun je in shock raken, wat zelfs dodelijk kan zijn
(bijvoorbeeld bij sommige ernstige voedselallergieën). Zo'n
shock kan gepaard gaan met ademhalingsproblemen, vocht
vasthouden en vernauwing van de luchtwegen.
Veel voorkomende allergenen (stoffen waarop je allergisch
kunt reageren) zijn bepaalde voedingsmiddelen, graspollen,
sporen, geneesmiddelen, schoonmaakmiddelen. Ook stress kan
allergische reacties geven. Sommige mensen reageren
allergisch op koude, warmte, temperatuursschommelingen en
druk op de huid. De meest voorkomende allergische reacties
zijn netelroos, dermatitis (huidontsteking), astma en hooikoorts.
Twee veel voorkomende vormen van allergie: netelroos en
dermatitis
Het begint bij de hypothalamus. Vanaf de puberteit geeft deze
(naast zijn andere functies) steeds meer speciale signaaltjes af
aan de hypofyse. Deze signaaltjes (GnRH, Gonadotrofine-stimulerend hormoon) zorgen ervoor dat de hypofyse op zijn
beurt weer speciale signaaltjes doorgeeft aan de testis (door
middel van FSH - follikel-stimulerend hormoon en LH - luteïniserend hormoon). Als de concentraties FSH en LH hoog
genoeg zijn, maken de testis meer testosteron. De hypothalamus
'meet' de concentratie testosteron in het bloed. Als deze boven
een bepaalde waarde komt, geeft de hypothalamus weer een
ander signaaltje aan de hypofyse, zodat de productie van
testosteron geremd wordt.
Summary
Testosteron wordt in de lever afgebroken.
The question is: what is an allergy? The text explains that an
allergy is an abnormal reaction of the immune system to
certain substances. It further explains that as a result one can
start sneezing, get breathing problems, cough up mucus, get a
skin inflammation, get asthma, hay fever or even go into shock.
Next some common allergens are mentioned. The two
pictures show two skin conditions, each showing an inflamed
skin with red or pink spots. The pictures could be disturbing for
a reader who is not accustomed to their contents.
Testosteron wordt gevormd uit progesteron, het uitgangsproduct
van alle hormonale steroïden, met als tussenproduct
androstendion.
Annotation
Model | Type(s)
Carney&Levin(2002) | Representational
Marsh&White(2003) | A3.1 Engage, B1.1 Concretize, B1.1.1 Sample
Martinec&Salway(2005) | Status unequal; image subordinate to text
Summary
The picture engages the user by showing a possibly shocking
image. It makes the effects of an allergy, and what a skin
inflammation looks like, more concrete. The picture serves as a
sample of allergy effects.
The question is: where is testosterone produced? The answer
states that testosterone is produced in the adrenal glands and,
in men, also in the testes. According to the text the regulation
of testosterone is complicated, and it gives a simplified summary.
The summary says that the process starts at the hypothalamus.
From puberty onwards this gives special signals to the pituitary
gland. These signals (GnRH, a hormone) make sure that the
pituitary gland gives signals to the testes (by means of the
hormones FSH and LH). The testes create more testosterone
when the concentrations of FSH and LH are high enough. When
the hypothalamus detects enough testosterone in the blood it
gives a different signal to inhibit the production of testosterone.
Testosterone is broken down in the liver. The accompanying
picture is a flowchart of the testosterone production process.
Annotation
text more general
Carney&Levin(2002)
The picture serves as an example of one of the mentioned
effects of an allergy, thus being representational.
Marsh&White(2003)
The text mentions many things about allergies, while the
pictures only show two possible allergic effects. The relation is
thus unequal and the image is subordinate to the text. One can
see that the images serve as examples of the mentioned
allergic effects and thus are of the exemplification type.
Model
Carney&Levin(2002)
Organizational
Marsh&White(2003)
B1 Reiterate, B1.4 Describe,
B1.7 Translate, B2 Organize,
B2.2 Contain, B2.4 Induce
perspective, B5 Explain, B5.2
Complement
Sample 16
Waar vindt de productie van testosteron plaats?
Testosteron wordt geproduceerd in de bijnieren en (bij mannen)
in de testis, ongeveer 7 mg/dag bij mannen en 1-2 mg/dag bij
vrouwen. Onderdeel van de testis zijn de Leydig-cellen, waar
cholesterol een conversie ondergaat naar testosteron. De
regulering van testosteron is ingewikkeld; hier volgt een
vereenvoudigde samenvatting:
Type(s)
Projection; idea
Carney&Levin(2002)
The picture is of organizational type because it provides a
framework for understanding the process explained in the
summary.
Marsh&White(2003)
The picture reiterates, describes, translates and organizes the
process described in the summary. The picture is of type
contain because it is a flowchart. It induces perspective into,
and explains the process. The text and picture complement each
other to convey the information.
Annotation
Model
Carney&Levin(2002)
Decorational
Marsh&White(2003)
A1 Decorate, A3.1 Engage,
A3.2
Motivate,
B1.1.1
Sample, B1.2 Humanize
Extension
Both picture and text can convey their message independent of
each other and are thus equal of status. The picture projects the
meaning of a part of the text. This sample is similar to the
example given in figure 16 of Martinec and Salway (2005) on
page 353.
Sample 17
Which complications can occur with measles?
Measles is caused by a virus. Infection takes
place via droplets that are spread by coughing and
sneezing. Measles is highly contagious.
Complications:
Middle ear infection (10-15% of cases)
Pneumonia
Meningitis (about 1 per 1000 cases)
There is no treatment for measles, other than controlling
fever and pain.
Type(s)
Carney&Levin(2002)
If one looks at this sample without considering any prior
knowledge of measles, then the picture does not have a clear
relation with the text. There is no reference to the boy in the
picture nor to the spots on his skin. The reader, however, could
conclude that the spots on the boy's body are an effect of
measles without any further information. Because the picture
does not mirror the text, one cannot say that this picture is
representational. Thus one is left with the decorational type.
Marsh&White(2003)
As mentioned above for Carney and Levin (2002), in the Marsh
and White (2003) model this picture also decorates the text.
The picture engages the attention of the viewer. Together
with the descriptions in the text it also motivates a response
from the viewer. The picture could be considered a sample;
however, this is not confirmed in the text. The reader could
conclude that the spots on the boy's skin are an effect of the
disease and thus this picture also forms a sample. The picture
humanizes, for the reader can relate to the state of the young
boy.
Both picture and text form separate processes and are
independent. The picture provides new information concerning
the skin effects of measles on young children. The picture
thus extends the information in the text.
Sample 18
What is an allergy?
An allergy is a hypersensitivity reaction of the body's
immune system to harmless substances (these substances
always contain some kind of protein), such as airborne
allergens (e.g. house dust mite), foods, medicines,
moulds, etc.
Summary
The question is: which complications can occur with
measles? The answer states that measles is caused by a virus.
Infection takes place via droplets that are spread by coughing
or sneezing. Measles is very contagious. A list of the
complications follows: ear infection, pneumonia and
meningitis. There is no treatment for measles except
for combating fever and pain. The accompanying picture shows a
young boy with a skin rash over the complete upper body and
face. Although not mentioned, this is probably caused by
measles.
An allergic reaction causes complaints of nasal congestion,
sneezing, a runny nose, watery eyes and itching of the eyes,
nose, throat and/or skin.
The main causes of allergies:
House dust mite
Flowering grass
Summary
The question is: what is an allergy? The answer states that an
allergy is an overreaction of the immune system
to certain substances. Some effects of an allergy are then
described. Two pictures are included, both with a label. The text
mentions that these are the most important causes of allergies.
The first picture is of a dust mite, the second picture is of a
grass field in bloom.
Annotation
Model
Type(s)
Carney&Levin(2002): Representational
Marsh&White(2003): B1.1 Concretize, B1.1.1 Sample
Status: unequal; image subordinate to text
text more general
Summary
The question is: how many people have too high a blood
pressure? The answer consists of a textual answer and a picture.
The textual answer states that the medical term for high blood
pressure is hypertension. It further explains that hyper means
strongly and tension means pressure. It explains that
16 million people live in the Netherlands and that the number of
people with hypertension is about one million. The
accompanying picture is a pie and bar chart. The pie shows the
percentage and number of patients with lung disease,
cancer, cardiovascular disease, and other diseases. The bar
shows a more detailed division of the cardiovascular disease
part of the pie: myocardial infarction, stroke and other
cardiovascular diseases.
Annotation
Model
Carney&Levin(2002): Representational
Marsh&White(2003): B1.4 Describe, B1.5 Graph, B2.2 Contain, B2.4 Induce perspective, B4.1 Concentrate, B5.2 Complement, C1.2 Document
Carney&Levin(2002)
The picture is representational because it shows two examples
of causes of allergy which are mentioned in the text.
Extension
Marsh&White(2003)
The pictures make more concrete the general idea that the
reader has about the causes of an allergy. Furthermore, they are
samples of allergy causes.
The image is subordinate to the text because the subject
“allergy” is set by the text; the purpose of the pictures in this
answer is also given by the text. The pictures are an exemplification
of the causes of an allergy. There are, however, more causes than
the two shown and thus the text is more general.
Sample 19
How many people suffer from high blood pressure?
The medical term for high blood pressure is 'hypertension'. Hyper
means 'to a very high degree' and tension means 'pressure'.
16 million people live in the Netherlands, and the number of
people known to have hypertension is approaching one
million.
Type(s)
Carney&Levin(2002)
The picture is of representational type. Although it does not
mirror the contents of the textual answer it is connected to some
extent. High blood pressure is part of the group of
cardiovascular diseases and thus it does provide extra
information to the answer. So the picture is not decorational.
Neither does the picture provide real organizational information
to the textual answer. A problem with this sample is that neither
the textual answer nor the picture really answers the question;
however, the picture comes closest to doing so. If the relation
between the question and the picture were considered, then
the relation would be organizational.
Marsh&White(2003)
The picture describes the answer, it is a graph and is of type
contain (diagrams and enclosing graphics). The picture induces
perspective in the division of different diseases, it concentrates
the answer by giving a brief answer to the question. The text
and picture complement each other. The picture documents
because it supplies factual support. In some sense this picture
also compares (B3.1) and contrasts (B3.2) the
different disease groups; however, these function types are only
intended for comparing and contrasting elements found in a
text.
Both text and picture form separate processes and thus are
independent. They extend each other with information on
cardiovascular diseases.
Sample 20
How can I strengthen my abdominal muscles? (extended)
Four kinds of muscle groups are externally visible.
In addition, beneath these groups lie two muscles that are not
visible. Specific exercises exist for all of these
groups: crunches for the ‘middle muscles’ (contracting the
abdominal muscles straight) and the ‘side muscles’ (in which the
body is brought towards one side). For the deeper
muscles, holding the abdomen pulled in is a
good exercise. This must be sustained until it ‘burns’.
It should be noted that abdominal exercises do not burn
fat. If the strengthened muscles are also to become visible,
a cardio workout is necessary. The five kinds of
muscle groups that are externally visible.
• Rectus Abdominis, the straight abdominal muscle, also known as the ‘six-pack’.
• Obliques Externus, the outer oblique abdominal muscle.
• Obliques Internus, the inner oblique abdominal muscle.
• Transversus Abdominis, the transverse abdominal muscle.
Summary
The question is: how can I strengthen my abdominal muscles?
The answer states that there are 4 muscle groups visible from
the surface and that under these groups there are 2 more groups
which are not visible. The text mentions that there are specific
exercises for all these groups and names these exercises.
The text further mentions that these exercises do not burn any
fat and that, if the muscles need to be visible, a cardio
workout is also necessary. The text then mentions that there are
5 groups of muscles which are visible from the outside; these
groups are summed up. The accompanying picture is a medical
cross section of the abdominal area, showing the 5 mentioned
muscle groups.
Annotation
Model
Type(s)
Carney&Levin(2002): Interpretational
Marsh&White(2003): B2.4 Induce perspective, B3.3 Parallel, B5.2 Complement, C1 Interpret
Martinec&Salway(2005): Enhancement (place)
Status: unequal; image subordinate to text
Carney&Levin(2002)
The picture shows something that would be more difficult to
explain by text only. It shows the different positions of the 5
muscle groups and how they are layered. Therefore the picture
is of interpretational type. The information content in the image
could also be considered to have an organizational function,
since the structure of the abdominal muscles is visible in an
organized manner. However, the description of the organizational
function shows that it applies to pictures which provide a
structural framework for the text.
Marsh&White(2003)
The picture induces perspective. It gives a parallel option for
the 5 muscle groups. The picture complements the text and
helps to interpret the positions of the 5 muscle groups.
The picture only describes a part of the text and thus is
subordinate to the text. Although the picture does not cover all
of the text, it is not an example of the text either. Rather, it
is as general as the information given in the text. Furthermore,
the image enhances the text by showing the place of certain
muscle groups.
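For readers who want to keep such annotations in machine-readable form, the Sample 20 annotation could be recorded as in the sketch below. This is only an illustration under stated assumptions: the class name `Annotation`, its field names and the helper `functions_in_scope` are hypothetical and are not part of any of the three models or of IMIX. Only the annotation values are taken from the sample.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """One image-text sample annotated with the three models (hypothetical layout)."""
    sample: int
    carney_levin: str                                      # Carney & Levin function type
    marsh_white: List[str] = field(default_factory=list)   # Marsh & White codes with labels
    status: Optional[str] = None                           # Martinec & Salway status relation
    logico_semantic: Optional[str] = None                  # Martinec & Salway logico-semantic type

    def functions_in_scope(self, scope: str) -> List[str]:
        """Return the Marsh & White functions whose code starts with the given letter."""
        return [f for f in self.marsh_white if f.startswith(scope)]


# Values copied from the Sample 20 annotation; everything else is an assumption.
sample_20 = Annotation(
    sample=20,
    carney_levin="Interpretational",
    marsh_white=[
        "B2.4 Induce perspective",
        "B3.3 Parallel",
        "B5.2 Complement",
        "C1 Interpret",
    ],
    status="unequal; image subordinate to text",
    logico_semantic="Enhancement (place)",
)

# List the functions annotated under the B codes of Marsh & White.
print(sample_20.functions_in_scope("B"))
```

A flat record like this keeps the single-valued Carney and Levin type apart from the multi-valued Marsh and White functions, which reflects how the annotation tables are filled in.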
