a survey of computer graphics contents
UNIT - I

LESSON – 1: A SURVEY OF COMPUTER GRAPHICS

CONTENTS
1.1 Aims and Objectives
1.2 Introduction
1.3 History of Computer Graphics
1.4 Applications of Computer Graphics
1.4.1 Computer Aided Design
1.4.2 Computer Aided Manufacturing
1.4.3 Entertainment
1.4.4 Medical Content Creation
1.4.5 Advertisement
1.4.6 Visualization
1.4.7 Visualizing Complex Systems
1.5 Graphical User Interface
1.5.1 Three-dimensional Graphical User Interfaces
1.6 Let us Sum Up
1.7 Lesson-end Activities
1.8 Points for Discussion
1.9 Model answers to “Check your Progress”
1.10 References

1.1 Aims and Objectives

The aim of this lesson is to introduce computer graphics, its history and its various applications. The objectives of this lesson are to make the student aware of the following concepts:
a. History of Computer Graphics
b. Applications of Computer Graphics
c. Graphical User Interface

1.2 Introduction

Computer graphics is the discipline of producing pictures or images using a computer. It includes modeling (the creation, manipulation and storage of geometric objects) and rendering (converting a scene to an image, through the processes of transformation, rasterization, shading, illumination and animation of the image). Computer graphics has been widely used in graphics presentation, paint systems, computer-aided design (CAD), image processing, simulation, etc. From the earliest text-character images of non-graphic mainframe computers to the latest photographic-quality images of high-resolution personal computers, from vector displays to raster displays, from 2D input to 3D input and beyond, computer graphics has gone through a short, rapidly changing history. From games to virtual reality, from 3D active desktops to unobtrusive immersive home environments, and on to scientific and business applications, computer graphics technology has touched almost every aspect of our lives.
Before we get into the details, we take a short tour through the history of computer graphics.

1.3 History of Computer Graphics

In the 1950s, output was via teletypes, line printers and the Cathode Ray Tube (CRT). Using dark and light characters, a picture could be reproduced. The 1960s saw the beginnings of modern interactive graphics; output was vector graphics. One of the worst problems was the cost and inaccessibility of machines. In the early 1970s, output started to use raster displays, but graphics capability was still fairly chunky. In the 1980s, output moved to built-in raster graphics, bitmap images and pixels. The cost of personal computers decreased drastically, and the trackball and mouse became the standard interactive devices. In the 1990s, following the introduction of VGA and SVGA, personal computers could easily display photo-realistic images and movies. 3D image rendering became the main advance, and it stimulated cinematic graphics applications. Table 1 gives a general history of computer graphics.

Table 1: General History of Computer Graphics

Year  Inventions, discoveries and findings
1950  Ben Laposky created the first graphic images, called Oscillons, generated by an electronic (analog) machine. The images were produced by manipulating electronic beams and recording them onto high-speed film.
1951  1) UNIVAC-I: the first general-purpose commercial computer, crude hardcopy devices, and line printer pictures. 2) MIT – Whirlwind computer, the first to display real-time video, and capable of displaying real-time text and graphics on a large oscilloscope screen.
1960  William Fetter coins the term "computer graphics" to describe new design methods.
1961  Steve Russell developed Spacewar, the first video/computer game.
1963  1) Douglas Engelbart developed the first mouse. 2) Ivan Sutherland developed Sketchpad, an interactive CG system: a man-machine graphical communication system with pop-up menus, constraint-based drawing and hierarchical modeling, which used a light pen for interaction. He formulated the ideas of using primitives (lines, polygons, arcs, etc.) and constraints on them; he developed the dragging, rubberbanding and transforming algorithms; and he introduced data structures for storing them. He is considered the founder of computer graphics.
1964  William Fetter developed the first computer model of a human figure.
1965  Jack Bresenham designed the line-drawing algorithm.
1968  1) Tektronix – a special CRT, the direct-view storage tube, with keyboard and mouse, a simple computer interface for $15,000, which made graphics affordable. 2) Ivan Sutherland developed the first head-mounted display.
1969  John Warnock – area subdivision algorithm, hidden-surface algorithms; Bell Labs – first framebuffer, containing 3 bits per pixel.
1972  Nolan Kay Bushnell – Pong, video arcade game.
1973  John Whitney Jr. and Gary Demos – “Westworld”, first film with computer graphics.
1974  Edwin Catmull – texture mapping and Z-buffer hidden-surface algorithm; James Blinn – curved surfaces, refinement of texture mapping; Bui Tuong Phong – specular highlighting.
1975  Martin Newell – famous CG teapot, using Bezier patches; Benoit Mandelbrot – fractal/fractional dimension.
1976  James Blinn – environment mapping and bump mapping.
1977  Steve Wozniak – Apple II, color graphics personal computer.
1979  Roy Trubshaw and Richard Bartle – MUD, a multi-user dungeon/Zork.
1982  Steven Lisberger – “Tron”, first Disney movie to make extensive use of 3-D graphics; Tom Brigham – “Morphing”, first film sequence in which a female character deforms and transforms herself into the shape of a lynx; John Walker and Dan Drake – AutoCAD.
1983  Jaron Lanier – “DataGlove”, a virtual reality input glove.
1984  Wavefront Technologies – Polhemus, first 3D graphics software.
1985  Pixar Animation Studios – “Luxo Jr.”, and in 1989 “Tin Toy”; NES – Nintendo home game system.
1987  IBM – VGA, Video Graphics Array, introduced.
1989  Video Electronics Standards Association (VESA) formed – SVGA, Super VGA.
1990  Hanrahan and Lawson – Renderman.
1991  Disney and Pixar – “Beauty and the Beast”; CGI was widely used, and Renderman systems provided fast, accurate and high-quality digital computer effects.
1992  Silicon Graphics – OpenGL specification.
1993  University of Illinois – Mosaic, first graphical Web browser; Steven Spielberg – “Jurassic Park”, a successful CG fiction film.
1995  Buena Vista Pictures – “Toy Story”, first full-length, computer-generated feature film; NVIDIA Corporation – GeForce 256, GeForce3 (2001).
2003  id Software – Doom 3 graphics engine.

1.4 Applications of Computer Graphics

We now take a short tour through the applications of computer graphics.

1.4.1 Computer Aided Design

Computer-aided design (CAD) is the use of a wide range of computer-based tools that assist engineers, architects and other design professionals in their design activities. It is the main geometry authoring tool within the Product Lifecycle Management process and involves both software and sometimes special-purpose hardware. Current packages range from 2D vector-based drafting systems to 3D solid and surface modellers.

The CAD Process

CAD is used to design, develop and optimize products, which can be goods used by end consumers or intermediate goods used in other products. CAD is also extensively used in the design of tools and machinery used in the manufacture of components, and in the drafting and design of all types of buildings, from small residential types (houses) to the largest commercial and industrial structures (hospitals and factories).
CAD is mainly used for detailed engineering of 3D models and/or 2D drawings of physical components, but it is also used throughout the engineering process, from conceptual design and layout of products, through strength and dynamic analysis of assemblies, to the definition of manufacturing methods for components. CAD has become an especially important technology within the scope of computer-aided technologies, with benefits such as lower product development costs and a greatly shortened design cycle. CAD enables designers to lay out and develop work on screen, print it out and save it for future editing, saving time on their drawings.

The capabilities of modern CAD systems include:
(a) Wireframe geometry creation
(b) 3D parametric feature-based modelling, solid modeling
(c) Freeform surface modeling
(d) Automated design of assemblies, which are collections of parts and/or other assemblies
(e) Creation of engineering drawings from the solid models
(f) Reuse of design components
(g) Ease of modification of the design of a model and the production of multiple versions
(h) Automatic generation of standard components of the design
(i) Validation/verification of designs against specifications and design rules
(j) Simulation of designs without building a physical prototype
(k) Output of engineering documentation, such as manufacturing drawings and Bills of Materials (BOM) listing what is required to build the product
(l) Import/export routines to exchange data with other software packages
(m) Output of design data directly to manufacturing facilities
(n) Output directly to a Rapid Prototyping or Rapid Manufacture machine for industrial prototypes
(o) Maintenance of libraries of parts and assemblies
(p) Calculation of mass properties of parts and assemblies
(q) Aids to visualization such as shading, rotating, hidden line removal, etc.
(r) Bi-directional parametric association (modification of any feature is reflected in all information relying on that feature, such as drawings, mass properties and assemblies, and vice versa)
(s) Kinematics, interference and clearance checking of assemblies
(t) Sheet metal
(u) Hose/cable routing
(v) Electrical component packaging
(x) Inclusion of programming code in a model to control and relate desired attributes of the model
(y) Programmable design studies and optimization
(z) Sophisticated visual analysis routines, for draft, curvature, curvature continuity, etc.

Originally, software for CAD systems was developed in computer languages such as Fortran, but with the advancement of object-oriented programming methods this has radically changed. Typical modern parametric feature-based modelers and freeform surface systems are built around a number of key C programming language modules with their own APIs. Today most CAD workstations are Windows-based PCs; some CAD systems also run on hardware running one of the Unix operating systems, and a few on Linux. Some CAD systems, such as NX, provide multiplatform support including Windows, Linux, Unix and Mac OS X.

[Figures: CAD of Jet Engine; CAD and Rapid Prototyping; Parachute Modeling and Simulation; virtual 3-D interiors (Virtual Environment); CAD design; CAM (jewelry industry); CAM; CAM; CAD robot]

Generally no special hardware is required, with the exception of a high-end OpenGL-based graphics card; however, for complex product design, machines with high-speed (and possibly multiple) CPUs and large amounts of RAM are recommended. The human-machine interface is generally via a computer mouse but can also be via a pen and digitizing graphics tablet. Manipulation of the view of the model on the screen is also sometimes done with a spacemouse/SpaceBall. Some systems also support stereoscopic glasses for viewing the 3D model.

1.4.2 Computer Aided Manufacturing

Since the age of the Industrial Revolution, the manufacturing process has undergone many dramatic changes.
One of the most dramatic of these changes is the introduction of Computer Aided Manufacturing (CAM), a system of using computer technology to assist the manufacturing process. Through the use of CAM, a factory can become highly automated, through systems such as real-time control and robotics. A CAM system usually seeks to control the production process through varying degrees of automation. Because each of the many manufacturing processes in a CAM system is computer controlled, a high degree of precision can be achieved that is not possible with a human interface. The CAM system, for example, sets the toolpath and executes precision machine operations based on the imported design. Some CAM systems bring in additional automation by also keeping track of materials and automating the ordering process, as well as tasks such as tool replacement.

Computer Aided Manufacturing is commonly linked to Computer Aided Design (CAD) systems. The resulting integrated CAD/CAM system takes the computer-generated design and feeds it directly into the manufacturing system; the design is then converted into multiple computer-controlled processes, such as drilling or turning.

Another advantage of Computer Aided Manufacturing is that it can be used to facilitate mass customization: the process of creating small batches of products that are custom designed to suit each particular client. Without CAM, and the CAD process that precedes it, customization would be a time-consuming, manual and costly process. CAD software, however, allows for easy customization and rapid design changes, and the automatic controls of the CAM system make it possible to adjust the machinery automatically for each different order.

Robotic arms and machines are commonly used in factories, but these still require human workers. The nature of those workers' jobs changes, however.
The repetitive tasks are delegated to machines; the human workers' job descriptions then move more towards set-up, quality control, using CAD systems to create the initial designs, and machine maintenance.

1.4.3 Entertainment

One of the main goals of today's special effects producers and animators is to create images with the highest levels of photorealism. Volume graphics is a key technology for providing full immersion in upcoming virtual worlds, e.g. movies or computer games. Real-world phenomena can be realized best with true physics-based models, and volume graphics is a tool to generate, visualize and even feel these models. Movies like Star Wars Episode I, Titanic and The Fifth Element have already started employing true physics-based effects.

[Figures: Entertainment; Games]

1.4.4 Medical Content Creation

Medical content creation has become more and more important in entertainment and education in recent years. For instance, virtual anatomical atlases on CD-ROM and DVD have been built on the basis of the NIH Visible Human Project data set, and different kinds of simulation and training software have been built using volume rendering techniques. Volume Graphics' products, such as the VGStudio software, are dedicated to use in the field of medical content creation. VGStudio provides powerful tools to manipulate and edit volume data. An easy-to-use keyframer tool allows users to generate animations, e.g. flights through any kind of volume data. In addition, VGStudio provides high image quality and fast performance even on a PC.

[Figure: Images of a fetus rendered by a VGStudio MAX user.]

1.4.5 Advertisement

Voxel data can be used to visualize the most fascinating and complex facts in the world. The visualization of the human body and medical content creation is an example. Voxel data sets such as CT or MRI scans, or the Visible Human data, show everything from the finest details up to the gross structures of the human anatomy.
Images rendered by Volume Graphics 3D graphics software are already used for US TV productions as well as for advertising. Volume Graphics cooperates with companies specializing in video and TV productions as well as with advertising agencies.

[Figure: Neutron radiography of a car engine]

1.4.6 Visualization

Visualization is any technique for creating images, diagrams, or animations to communicate a message. Visualization through visual imagery has been an effective way to communicate both abstract and concrete ideas since the dawn of man. Visualization today has ever-expanding applications in science, engineering, product visualization, all forms of education, interactive multimedia, medicine, etc. Computer graphics is a typical field of application for visualization. The invention of computer graphics may be the most important development in visualization. The development of animation also helped advance visualization.

[Figures: Visualization of how a car deforms in an asymmetrical crash using finite element analysis; Computer-aided learning]

Visualization is the process of representing data as descriptive images and, subsequently, interacting with these images in order to gain additional insight into the data. Traditionally, computer graphics has provided a powerful mechanism for creating and manipulating these representations. Graphics and visualization research addresses the problem of converting data into compelling and revealing images that suit users’ needs. Research includes developing new representations of 3D geometry, choosing appropriate graphical realizations of data, devising strategies for collaborative visualization in a networked environment using three-dimensional data, and designing software systems that support a full range of display formats, from PDAs to immersive multi-display visualization environments.

1.4.7 Visualizing Complex Systems

Graphic images and models are proving not only useful, but crucial in many contemporary fields dealing with complex data.
Only by graphically combining millions of discrete data items, for example, can meteorologists track weather systems, including hurricanes that may threaten thousands of lives. Theoretical physicists depend on images to think about events like collisions of cosmic strings at 75 percent of the speed of light, and chaos theorists require pictures to find order within apparent disorder. Computer-aided design systems are critical to the design and manufacture of an extensive range of contemporary products, from silicon chips to automobiles, in fields ranging from space technology to clothing design.

Computer systems, on which we all increasingly depend, are also becoming more and more visually oriented. Graphical user interfaces are the emerging standard, and graphic tools are the heart of contemporary systems analysis, identifying and preventing critical errors and omissions that might otherwise not be evident until the system is in daily use. Graphic computer-aided systems engineering (CASE) tools are now used to build other computer systems. Recent research indicates that visual computer programming produces better comprehension and accuracy than do traditional programming languages based on words, and commercial visual programming packages are now on the market.

Medical research and practice offer many examples of the use of graphic tools and images. Conceptualizing the deoxyribonucleic acid (DNA) double helix permitted dramatic advances in genetic research years before the structure could actually be seen. Computerized imaging systems like computerized tomography (CT) and magnetic resonance imaging (MRI) have produced dramatic improvements in the diagnosis and treatment of serious illness, and a project compiling a three-dimensional cross-section of the human body provides a new approach to the study of anatomy.
X-rays, venerable medical imaging tools, are now being combined with expert systems to help physicians identify other cases similar to those they are handling, suggesting additional diagnostic and treatment information relevant to patients.

Sociologists and social psychologists use graphic tools extensively in their research programs. They often turn to sociograms and other visual tools to present and explain concepts extracted from complex statistical analyses and to identify meaningful patterns in the data. Graphic depiction of exchange networks permits the study of changes among groups over time. Another useful approach is Bales's Systematic Multiple Level Observation of Groups (SYMLOG), which provides a three-dimensional graphic representation of friendliness, instrumental-versus-expressive orientation, and dominance in small groups.

Graphic visualization has demonstrated utility for organizing information effectively and coherently in a broad range of fields dealing with complex data. Social work deals with similarly (and sometimes more) complex patterns and contextual situations, and, in fact, social work and related disciplines have discovered the utility of images for conceptualizing and communicating about clinical practice.

1.5 Graphical User Interface

A graphical user interface (GUI) is a type of user interface that allows people to interact with a computer and computer-controlled devices. It employs graphical icons, visual indicators or special graphical elements called "widgets", along with text, labels or text navigation, to represent the information and actions available to a user. The actions are usually performed through direct manipulation of the graphical elements.

The precursor to graphical user interfaces was invented by researchers at the Stanford Research Institute, led by Douglas Engelbart. They developed the use of text-based hyperlinks manipulated with a mouse for the On-Line System.
The concept of hyperlinks was further refined and extended to graphics by researchers at Xerox PARC, who went beyond text-based hyperlinks and used a GUI as the primary interface for the Xerox Alto computer. Most modern general-purpose GUIs are derived from this system. As a result, some people call this class of interface a PARC User Interface (PUI) (note that PUI is also an acronym for perceptual user interface).

Following PARC, the first commercially successful GUI-centric computer operating models were those of the Apple Lisa and, more successfully, the Macintosh System graphical environment. The graphical user interfaces familiar to most people today are Microsoft Windows, Mac OS X, and the X Window System interfaces.

IBM and Microsoft used many of Apple's ideas to develop the Common User Access specifications that formed the basis of the user interface found in Microsoft Windows, IBM OS/2 Presentation Manager, and the Unix Motif toolkit and window manager. These ideas evolved to create the interface found in current versions of the Windows operating system, as well as in Mac OS X and various desktop environments for Unix-like systems. Thus most current graphical user interfaces have largely common idioms.

Graphical user interface design is an important adjunct to application programming. Its goal is to enhance the usability of the underlying logical design of a stored program. The visible graphical interface features of an application are sometimes referred to as "chrome". They include graphical elements (widgets) that may be used to interact with the program. Common widgets are windows, buttons, menus, and scroll bars. Larger widgets, such as windows, usually provide a frame or container for the main presentation content such as a web page, email message or drawing. Smaller ones usually act as a user-input tool.
The widgets of a well-designed system are functionally independent from and indirectly linked to program functionality, so the graphical user interface can be easily customized, allowing the user to select or design a different skin at will.

Some graphical user interfaces are designed for the rigorous requirements of vertical markets. These are known as "application specific graphical user interfaces." Examples of application specific graphical user interfaces:
Touch screen point of sale software used by wait staff in busy restaurants
Self-service checkouts used in some retail stores
ATMs
Airline self-ticketing and check-in
Information kiosks in public spaces like train stations and museums
Monitor/control screens in embedded industrial applications which employ a real time operating system (RTOS)

The latest cell phones and handheld game systems also employ application specific touch screen graphical user interfaces. Cars also have graphical user interfaces, for example in GPS navigation, touch screen multimedia centers, and even the dashboards of newer cars.

[Figures: Metisse 3D window manager; Residents training in Videoendoscopic Surgery Laboratory; XGL 3D desktop; Visualization]

1.5.1 Three-dimensional Graphical User Interfaces

For typical computer displays, "three-dimensional" is a misnomer: their displays are two-dimensional. Three-dimensional images are projected on them in two dimensions. Since this technique has been in use for many years, the recent use of the term three-dimensional must be considered a declaration by equipment marketers that the speed of three-dimension to two-dimension projection is adequate for use in standard graphical user interfaces.

[Figure: Screenshot showing the 'cube' plugin of Compiz on Ubuntu]

Three-dimensional graphical user interfaces are common in science fiction literature and movies, such as Jurassic Park, which features Silicon Graphics' three-dimensional file manager.
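
The statement above that three-dimensional images are projected onto a two-dimensional display can be illustrated with a small sketch. It assumes a simple pinhole (perspective) model that the lesson does not itself specify: the viewer sits at the origin looking along the positive z axis, and the screen is a plane at a hypothetical distance `d` in front of the viewer. The function name `project_point` is likewise only illustrative.

```python
from typing import Tuple


def project_point(point: Tuple[float, float, float], d: float) -> Tuple[float, float]:
    """Perspective-project a 3D point onto the plane z = d.

    Assumed model: viewer at the origin, looking along +z.
    By similar triangles, a point (x, y, z) lands at (d*x/z, d*y/z).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the viewer (z > 0)")
    return (d * x / z, d * y / z)


# A point twice as far away as the screen plane appears at half its x and y.
near = project_point((2.0, 4.0, 4.0), d=2.0)
far = project_point((2.0, 4.0, 8.0), d=2.0)
print(near, far)
```

The farther point lands closer to the centre of the screen, which is the effect that gives a flat image its impression of depth.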
In science fiction, three-dimensional user interfaces are often immersive environments like William Gibson's Cyberspace or Neal Stephenson's Metaverse. Three-dimensional graphics are currently mostly used in computer games, art and computer-aided design (CAD). A three-dimensional computing environment could possibly be used for collaborative work. For example, scientists could study three-dimensional models of molecules in a virtual reality environment, or engineers could work on assembling a three-dimensional model of an airplane.

1.6 Let us Sum Up

In this lesson we have learnt about the following:
a) Introduction to computer graphics
b) History of computer graphics
c) Applications of computer graphics

1.7 Lesson-end Activities

After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) The need for Computer Graphics in the modern world
b) The use of Computer Graphics in the modern world

1.8 Points for Discussion

Try to discuss the following:
a) Computer aided design
b) Computer aided manufacturing

1.9 Model answers to “Check your Progress”

In order to check your progress, try to answer the following questions:
a) Discuss the application of computer graphics in entertainment
b) Discuss the application of computer graphics in visualization

1.10 References

1. Chapter 1 of William M. Newman, Robert F. Sproull, “Principles of Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 1 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C Version”, Pearson Education, 2007
3. Chapters 1, 2, 3 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
4. Chapter 1 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer Graphics – Principles and Practice”, Addison-Wesley, 1997

LESSON – 2: OVERVIEW OF COMPUTER GRAPHICS

CONTENTS
2.1 Aims and Objectives
2.2 Introduction
2.3 Computer Display
2.4 Random Scan
2.5 Raster Scan
2.5.1 Rasters
2.5.2 Pixel Values
2.5.3 Raster Memory
2.5.4 Key attributes of Raster Displays
2.6 Display Processor
2.7 Let us Sum Up
2.8 Lesson-end Activities
2.9 Points for Discussion
2.10 Model answers to “Check your Progress”
2.11 References

2.1 Aims and Objectives

The aim of this lesson is to learn the concepts of computer display, random scan and raster scan systems. The objectives of this lesson are to make the student aware of the following concepts:
a) Display systems
b) Cathode ray tube
c) Random Scan
d) Raster Scan
e) Display processor

2.2 Introduction

Graphics Terminal: Interactive computer graphics terminals comprise distinct output and input devices. Aside from power supplies and enclosures, these are usually connected to each other only via the computer to which both are attached.

output: A display system presenting rapidly variable (not just hard-copy) graphical output.
input: Some input device(s), e.g. keyboard + mouse. These may provide graphical input:
  o A mouse provides graphical input that the computer echoes as a graphical cursor on the display.
  o A keyboard typically provides graphical input located at a separate text cursor position.
There may be other I/O devices, e.g. a scanner and/or printer, microphone(s) and/or speakers.

A Display System typically comprises:
A display device such as a CRT (cathode ray tube), liquid crystal display, etc.
  o Most have a screen which presents a 2D image;
  o Stereoscopic displays show distinct 2D images to each eye (head-mounted / special glasses);
  o Displays with true 3D images are available.
A display processor controlling the display according to digital instructions about what to display.
Memory for these instructions or image data, possibly part of a computer's ordinary RAM.
2.3 Computer Display

A computer display monitor, usually called simply a monitor, is a piece of electrical equipment which displays viewable images generated by a computer without producing a permanent record. The word "monitor" is used in other contexts, in particular in television broadcasting, where a television picture is displayed to a high standard. A computer display device is usually either a cathode ray tube or some form of flat panel such as a TFT LCD. The monitor comprises the display device, circuitry to generate a picture from electronic signals sent by the computer, and an enclosure or case. Within the computer, either as an integral part or a plugged-in interface, there is circuitry to convert internal data to a format compatible with a monitor.

The CRT, or cathode ray tube, is the picture tube of a monitor. The back of the tube has a negatively charged cathode. The electron gun shoots electrons down the tube and onto a charged screen. The screen is coated with a pattern of dots that glow when struck by the electron stream. Each cluster of three dots, one of each color, is one pixel.

The image on the monitor screen is usually made up of at least tens of thousands of such tiny dots glowing on command from the computer. The closer together the pixels are, the sharper the image on screen. The distance between pixels on a computer monitor screen is called its dot pitch and is measured in millimeters. Most monitors have a dot pitch of 0.28 mm or less.

There are two electromagnets around the collar of the tube which deflect the electron beam. The beam scans across the top of the monitor from left to right, is then blanked and moved back to the left-hand side slightly below the previous trace (on the next scan line), scans across the second line, and so on until the bottom right of the screen is reached. The beam is again blanked and moved back to the top left to start again. This process draws a complete picture, typically 50 to 100 times a second.
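
The scanning pattern just described can be sketched in a few lines. This is only an illustrative model, not how any particular monitor is driven: the function names `raster_scan_order` and `frame_time_ms` are hypothetical, beam positions are treated as integer (x, y) pixel coordinates, and the blanked retrace steps appear only as comments.

```python
from typing import Iterator, Tuple


def raster_scan_order(width: int, height: int) -> Iterator[Tuple[int, int]]:
    """Yield (x, y) beam positions in the order a progressive raster scan visits them."""
    for y in range(height):        # one scan line at a time, top to bottom
        for x in range(width):     # left to right along the current scan line
            yield (x, y)
        # horizontal retrace: beam is blanked and returns to the left edge
    # vertical retrace: beam is blanked and returns to the top left


def frame_time_ms(pictures_per_second: float) -> float:
    """Time available to draw one complete picture, in milliseconds."""
    return 1000.0 / pictures_per_second


# A tiny 3 x 2 "screen": the top line is traced first, then the line below it.
print(list(raster_scan_order(3, 2)))
# At 50 complete pictures per second, each picture has 20 ms to be drawn.
print(frame_time_ms(50))
```

At the upper figure of 100 pictures per second the same calculation gives 10 ms per picture, which is why faster redrawing demands faster deflection circuitry.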
The number of times in one second that the electron gun redraws the entire image is called the refresh rate and is measured in hertz (cycles per second). It is common, particularly in lower-priced equipment, for all the odd-numbered lines of an image to be traced, and then all the even-numbered lines; the circuitry of such an interlaced display need be capable of only half the speed of a non-interlaced display. An interlaced display, particularly at a relatively low refresh rate, can appear to some observers to flicker, and may cause eyestrain and nausea.

[Figure: CRT computer monitor]

As with television, several different hardware technologies exist for displaying computer-generated output:
Liquid crystal display (LCD). LCDs are the most popular display device for new computers in the Western world.
Cathode ray tube (CRT)
  o Vector displays, as used on the Vectrex, in many scientific and radar applications, and in several early arcade machines (notably Asteroids). These are always implemented using CRT displays because they require a deflection system, though they can be emulated on any raster-based display.
  o Television receivers were used by most early personal and home computers, connecting composite video to the television set using a modulator. Image quality was reduced by the additional steps of composite video → modulator → TV tuner → composite video.
Plasma display
Surface-conduction electron-emitter display (SED)
Video projector, implemented using LCD, CRT, or other technologies. Recent consumer-level video projectors are almost exclusively LCD based.
Organic light-emitting diode (OLED) display

The performance parameters of a monitor are:
Luminance, measured in candelas per square metre (cd/m²).
Size, measured diagonally. For a CRT, the viewable size is one inch (25 mm) smaller than the tube itself.
Dot pitch. Describes the distance between pixels of the same color in millimetres. In general, the lower the dot pitch (e.g. 0.24 mm, which is also 240 micrometres), the sharper the picture will appear.
Response time. The amount of time a pixel in an LCD monitor takes to go from active (black) to inactive (white) and back to active (black) again. It is measured in milliseconds (ms). Lower numbers mean faster transitions and therefore fewer visible image artifacts.
Refresh rate. The number of times in a second that a display is illuminated.
Power consumption, measured in watts (W).
Aspect ratio, which is the horizontal size compared to the vertical size. For example, 4:3 is the standard aspect ratio, so a screen with a width of 1024 pixels will have a height of 768 pixels. A widescreen display can have an aspect ratio of 16:9, which means a display that is 1024 pixels wide will have a height of 576 pixels.
Display resolution. The number of distinct pixels in each dimension that can be displayed.

A fraction of all LCD monitors are produced with "dead pixels"; because of companies' desire to increase profit margins, most manufacturers sell monitors with dead pixels. Almost all manufacturers have clauses in their warranties stating that a monitor with fewer than some number of dead pixels is not considered broken and will not be replaced. Dead pixels usually have their red, green and/or blue subpixels individually stuck always on or always off. Like image persistence, this can sometimes be partially or fully reversed by using the same method listed below; however, the chance of success is far lower than with a "stuck" pixel.

Screen burn-in, where a static image left on the screen for a long time embeds the image into the phosphor that coats the screen, is an issue with CRT and plasma computer monitors and televisions. The result of phosphor burn-in is a "ghostly" image of the static object, visible even when the screen has changed or is off. This effect usually fades after a period of time.
LCD monitors, while lacking phosphor screens and thus immune to phosphor burn-in, have a similar condition known as image persistence, where the pixels of the LCD monitor "remember" a particular color and become "stuck" and unable to change. Unlike phosphor burn-in, however, image persistence can sometimes be reversed partially or completely. This is accomplished by rapidly displaying varying colors to "wake up" the stuck pixels. Screensavers using moving images prevent both of these conditions from happening by constantly changing the display. Newer monitors are more resistant to burn-in, but it can still occur if static images are left displayed for long periods of time.

Most modern computer displays can show thousands or millions of different colors in the RGB color space by varying red, green, and blue signals in continuously variable intensities.

Many monitors accept analog input signals, but some more recent models (mostly LCD screens) support digital input signals. It is a common misconception that all computer monitors are digital. For several years, televisions, composite monitors, and computer displays have been significantly different. However, as TVs have become more versatile, the distinction has blurred.

Some users use more than one monitor. The displays can operate in multiple modes. One of the most common spreads the entire desktop over all of the monitors, which thus act as one big desktop. The X Window System refers to this as Xinerama.

[Figure: Two Apple flat-screen monitors used as dual display]

Display systems use either random or raster scan:

Random scan displays, often termed vector displays, came first and are still used in some applications. Here the electron gun of a CRT illuminates points and/or straight lines in any order. The display processor repeatedly reads a variable 'display file' defining a sequence of X,Y coordinate pairs and brightness or colour values, and converts these to voltages controlling the electron gun.
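One refresh pass over such a display file might look like the following C sketch. Everything specific here is an assumption made for illustration: the type and function names are hypothetical, and the linear mapping of a coordinate in 0..max_coord onto a voltage in -v_max..+v_max is just one plausible convention, not something the lesson specifies.

```c
#include <assert.h>
#include <stddef.h>

/* One display-file entry: a position and a brightness (hypothetical layout). */
typedef struct {
    int x;
    int y;
    int brightness;
} DisplayEntry;

/* What would be sent to the electron gun for one entry (hypothetical). */
typedef struct {
    double vx;          /* horizontal deflection voltage */
    double vy;          /* vertical deflection voltage */
    int brightness;
} BeamCommand;

/* Assumed linear mapping: 0 -> -v_max, max_coord -> +v_max. */
double deflection_voltage(int coord, int max_coord, double v_max)
{
    assert(max_coord > 0 && coord >= 0 && coord <= max_coord);
    return v_max * (2.0 * coord / max_coord - 1.0);
}

/*
 * One refresh pass: read the display file in order and convert every
 * entry into a beam command. `out` must hold at least `count` entries.
 * Returns the number of commands produced.
 */
size_t refresh_once(const DisplayEntry *file, size_t count,
                    int max_coord, double v_max, BeamCommand *out)
{
    assert(out != NULL && (file != NULL || count == 0));

    for (size_t i = 0; i < count; ++i) {
        out[i].vx = deflection_voltage(file[i].x, max_coord, v_max);
        out[i].vy = deflection_voltage(file[i].y, max_coord, v_max);
        out[i].brightness = file[i].brightness;
    }
    return count;
}
```

Because the entries are processed in file order, the beam visits points in whatever order the display file lists them, which is the sense in which the scan is "random". A real display processor would call something like refresh_once repeatedly to keep the picture visible.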
[Figure: A Random Scan Display (outline)]

Raster scan displays, also known as bit-mapped or raster displays, are more regimented. Their whole display area is updated many times a second from image data held in raster memory. The rest of this handout concerns hardware and software aspects of raster displays.

2.4 Random Scan Systems

One example of a random scan system is a two-dimensional video data acquisition system comprising:

• video detector apparatus for scanning a visual scene;
• controller apparatus for generating scan pattern instructions;
• system interface apparatus for selecting at least one scan pattern for acquisition of video data from the visual scene, the scan pattern being selected from a plurality of such patterns in accordance with the scan pattern instructions; and
• scan-video interface apparatus comprising random scan driver apparatus for generating scan control signals in accordance with the selected scan pattern.

The video detector apparatus scans the visual scene in accordance with the scan control signals and provides an output to the system interface, so that an intensity data map is stored there. The controller apparatus then performs data processing of the intensity data map in accordance with a predetermined set of video data characteristics.

2.5 Raster Scan

A raster scan, or raster scanning, is the pattern of image detection and reconstruction in television, and is the pattern of image storage and transmission used in most computer bitmap image systems. The word raster comes from the Latin word for a rake, as the pattern left by a rake resembles the parallel lines of a scanning raster.

In a raster scan, an image is cut up into successive samples called pixels, or picture elements, along scan lines. Each scan line can be transmitted as it is read from the detector, as in television systems, or can be stored as a row of pixel values in an array in a computer system.
On a television receiver or computer monitor, the scan line is turned back into a line across an image, in the same order. After each scan line, the position of the scan line is advanced, typically downward across the image in a process known as vertical scanning, and the next scan line is detected, transmitted, stored, retrieved, or displayed. This ordering of pixels by rows is known as raster order, or raster scan order.

2.5.1 Rasters

Lexically, a raster is a series of adjacent parallel 'lines' which together form an image on a display screen. In early analogue television sets each such line is scanned continuously, not broken up into distinct units. In computer or digital displays these lines are composed of independently coloured pixels (picture elements). Mathematically we consider a raster to be a rectangular grid or array of pixel positions:

[Figure: A Raster]

Pixel positions have X,Y coordinates. Usually Y points down. This may reflect early use to display text to western readers. Also, when considering 3D, right-handed coordinates imply Z represents depth.

2.5.2 Pixel Values

The colour of each pixel of a display is controlled by a distinct digital memory element. Each such element holds a pixel value encoding a monochrome brightness or colour to be displayed.

Monochrome displays are of two types. Bi-level displays have 1-bit pixels and have been green or orange as well as black-and-white. Greyscale displays usually have 8 to 16 bit pixel values encoding brightness.

Non-monochrome displays also have different types. True-colour displays have pixel values divided into three component intensities, usually red, green and blue, often of 8 bits each. This used to be very costly. Alternatively the pixel values may index into a fixed or variable colour map defining a limited colour palette. Pseudo-colour displays with 8-bit pixels indexing a variable colour map of 256 colours have been common.

2.5.3 Raster Memory

Pixmap: A pixmap is storage for a whole raster of pixel values. It is usually a contiguous area of memory, comprising one row (or column) of pixels after another.

Bitmap: Technically a bitmap is a pixmap with 1 bit per pixel, i.e. boolean colour values, e.g. for use in a black-and-white display. But 'bitmap' is often misused to mean any pixmap - please try to avoid this!

Pixrect: A pixrect is any 'rectangular area' within a pixmap. A pixrect thus typically refers to a series of equal-sized fragments of the memory within a pixmap, one for each row (or column) of pixels.

Frame Buffer: In a bit-mapped display, the display processor refreshes the screen 25 or more times per second, a line at a time, from a pixmap termed its frame buffer. In each refresh cycle, each pixel's colour value is 'copied' from the frame buffer to the screen. Frame buffers are often special two-ported memory devices ('video memory') with one port for writing and another for concurrent reading. Alternatively they can be part of the ordinary fast RAM of a computer, which allows them to be extensively reconfigured by software.

Additional raster memory may exist 'alongside' that for colour values. For example there may be an 'alpha channel' (transparency values), a z-buffer (depth values for hidden object removal), or an a-buffer (combining both ideas). The final section of these notes will return to this area, especially use of a z-buffer.

2.5.4 Key Attributes of Raster Displays

Major attributes that vary between different raster displays include the following:

• 'Colour': bi-level, greyscale, pseudo-colour, true colour: see 'pixel values' above;
• Size: usually measured on the diagonal: inches or degrees;
• Aspect ratio: now usually 5:4 or 4:3 (625-line TV: 4:3; HDTV: 16:9);
• Resolution: e.g. 1024×1280 (pixels). Multiplying these numbers together (1,310,720 pixels, i.e. 1.25 × 2^20) we can say e.g. 'a 1.25 Mega-pixel display'. Avoid terms such as low/medium/high resolution which may change over time.
• Pixel shape: now usually square; other rectangular shapes have been used.
• Brightness, sharpness, contrast: possibly varying significantly with respect to view angle.
• Speed, interlacing: now usually 50 Hz or more and flicker-free to most humans;
• Computational features, as discussed below.

Since the 1970s, raster display systems have evolved to offer increasingly powerful facilities, often packaged in optional graphics accelerator boards or chips. These facilities have typically consisted of hardware implementation or acceleration of computations which would otherwise be coded in software, such as:

• Raster-ops: fast 2D raster-combining operations;
• 2D scan conversion, i.e. creating raster images required by 2D drawing primitives such as:
  o 2D lines, e.g. straight/circular/elliptical lines, maybe spline curves (based on several points);
  o 2D coloured areas, e.g. polygons or just triangles, possibly with colour interpolation;
  o Text (often copied from rasterised fonts using raster-ops);
• 3D graphics acceleration, now often including 3D scan conversion, touched on below.

It is useful for graphics software developers to be aware of such features and how they can be accessed, and to have insight into their cost in terms of time taken as a function of length or area.

2.6 Display Processor

A display processor displays data in one or more windows on a display screen. The display processor described here divides the display screen into a plurality of horizontal strips, with each strip further subdivided into a plurality of tiles. Each tile represents a portion of a window to be displayed on the screen. Each tile is defined by tile descriptors, which include the memory address locations of the data to be displayed in that tile. The descriptors need only be changed when the arrangement of the windows on the screen is changed or when the mapping of any of the windows into the bit-map is changed. A display processor of this kind does not require a bit-map frame buffer to be utilized before displaying windowed data on a screen.
Each horizontal strip may be as thin as 1 pixel, which allows for the formation of windows of irregular shapes, such as circles.

2.7 Let us Sum Up

In this lesson we have learnt about random scan, raster scan, and the display processor.

2.8 Lesson-end Activities

After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Discuss raster memory
b) Discuss the key attributes of raster displays

2.9 Points for Discussion

Discuss the following:
a) Display processor
b) CRT

2.10 Model answers to “Check your Progress”

In order to check your progress, try to answer the following questions:
a) LCD
b) Performance parameters of a monitor

2.11 References

1. Chapter 1, 26 of William M. Newman, Robert F. Sproull, “Principles of Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 2 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C Version”, Pearson Education, 2007
3. Chapter 2 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
4. Chapter 4 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer Graphics – principles and practice”, Addison-Wesley, 1997

LESSON – 3: GRAPHICS SOFTWARE STANDARDS

CONTENTS
3.1 Aims and Objectives
3.2 Introduction
3.3 Graphical Kernel System
3.4 PHIGS
3.5 OpenGL
3.6 Let us Sum Up
3.7 Lesson-end Activities
3.8 Points for Discussion
3.9 Model answers to “Check your Progress”
3.10 References

3.1 Aims and Objectives

The aim of this lesson is to learn the concept of graphics software standards. The objectives of this lesson are to make the student aware of the following concepts:
a) Graphical Kernel System
b) PHIGS
c) OpenGL

3.2 Introduction

A list of graphics standards is given below:

• CGI - the computer graphics interface - which is the low-level interface between GKS and the hardware.
• CGM - the computer graphics metafile - which is defined as the means of communicating between different software packages.
• 3D-GKS - the three-dimensional extension of GKS.
• PHIGS - the Programmer's Hierarchical Interactive Graphics System - another three-dimensional standard (based on the old SIGGRAPH core).

3.3 Graphical Kernel System

The Graphical Kernel System (GKS) is accepted as an international standard for two-dimensional graphics (although largely ignored in the USA). Two-dimensional computer graphics is closely related to the six output functions of GKS. These are:

1. Polyline. Draws one or more straight lines through the coordinates supplied.
2. Polymarker. Draws a symbol at each of the coordinates supplied. The software allows the choice of one of the five symmetric symbols, namely the dot, plus, asterisk, circle and diagonal cross ( . + * o x ).
3. Text. This allows a text string to be output in a number of ways, starting at the coordinate given.
4. Fill-area. This allows a polygon to be drawn and filled, using the coordinates given. Possible types of fill include hollow, solid and a variety of hatching and patterns.
5. Cell-array. This allows a pattern to be defined and output in the rectangle defined by the coordinates given. This is discussed in the section "Patterns & Pictures".
6. Generalised Drawing Primitive (GDP). This allows the provision of a variety of other facilities. Most systems include software for arcs of circles or ellipses and the drawing of a smooth curve through a set of points (I have called this "polysmooth" elsewhere in this text).

Following the acceptance of GKS as an international standard, work commenced on two related standards, namely CGI and CGM. The "Computer Graphics Interface" provides a low-level standard between the actual hardware and GKS and specifies how device-drivers should be written. The "Computer Graphics Metafile" is used to transfer graphics segments from one computer system to another.

3.4 PHIGS

The Programmer's Hierarchical Interactive Graphics System (PHIGS) is a 3D graphics standard which was developed within ISO in parallel to GKS-3D.
The PHIGS standard defines a set of functions and data structures to be used by a programmer to manipulate and display 3-D graphical objects. It was accepted as a full International Standard in 1988. A great deal of PHIGS is identical to GKS-3D, including the primitives, the attributes, the workstation concept, and the viewing and input models. However, PHIGS has a single Central Structure Store (CSS), unlike the separate Workstation Dependent and Workstation Independent Segment Storage (WDSS and WISS) of GKS. The CSS contains Structures which can be configured into a hierarchical directed-graph database, and within the structures themselves are stored the graphics primitives, attributes, and so forth. PHIGS is aimed particularly at highly interactive applications with complicated hierarchical data structures, for example: Mechanical CAD, Molecular Modelling, Simulation and Process Control.

At the end of 1991, CERN acquired an implementation of PHIGS in a portable machine-independent version (i.e. it did not consider hardware-dependent packages supplied by the hardware manufacturers). The package is from the French company G5G (Graphisme 5eme Generation). This specific implementation of PHIGS, the only one officially supported at CERN, is called GPHIGS. The package is available on the following platforms: VAX VMS, HP (HP/UX), Silicon Graphics, IBM RISC 6000, SUN (SunOS and Solaris), DEC Station (Ultrix), DEC ALPHA (OpenVMS and OSF/1). Both the FORTRAN and C bindings are available. The following driver interfaces are available: X-Window, DEC-Windows, GL, Starbase, XGL, HP GL, CGM, and PostScript. A new version (3.1) is now available, as announced in CNL 216.

3.5 OpenGL

OpenGL is a standard interface developed by Silicon Graphics and subsequently endorsed by Microsoft. OpenGL is a widely accepted standard API for high-end graphics applications. For example, code written in OpenGL would typically include subroutine calls to do things like "draw a triangle."
The details of exactly how the triangle is drawn are inside OpenGL and are hidden from the applications programmer. This leaves open the possibility of having different implementations of OpenGL, all of which work with the application because they all have the same subprogram calls.

Different implementations of OpenGL are written for different graphics accelerators. If a computer running Microsoft software does not have a graphics accelerator, Microsoft provides a software implementation that runs on the CPU. If the computer is upgraded with a hardware accelerator, the maker of the accelerator board may supply a version of OpenGL that routes the OpenGL commands to the board, converting the control sequences to commands appropriate to that particular hardware.

3.6 Let us Sum Up

In this lesson we have learnt about
a) GKS
b) PHIGS
c) OpenGL

3.7 Lesson-end Activities

After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Various graphics standards

3.8 Points for Discussion

Discuss the following:
a) Graphics metafile

3.9 Model answers to “Check your Progress”

In order to check your progress, try to answer the following questions:
a) Discuss PHIGS
b) Discuss GKS

3.10 References

1. Chapter 1 of William M. Newman, Robert F. Sproull, “Principles of Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 2 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C Version”, Pearson Education, 2007
3. Chapter 1, 2, 17 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
4. Chapter 1, 2, 4, 7 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer Graphics – principles and practice”, Addison-Wesley, 1997

LESSON – 4: GRAPHICS INPUT DEVICES

CONTENTS
4.1 Aims and Objectives
4.2 Introduction
4.3 Keyboard
4.4 Mouse
4.5 Data gloves
4.6 Graphics Tablets
4.7 Scanner
4.8 Joystick
4.9 Light Pen
4.10 Let us Sum Up
4.11 Lesson-end Activities
4.12 Points for Discussion
4.13 Model answers to “Check your Progress”
4.14 References

4.1 Aims and Objectives

The aim of this lesson is to learn about some of the important input devices needed for computer graphics. The objective of this lesson is to make the student aware of some of the important input devices.

4.2 Introduction

In the following subsections we will learn about the following input devices:
a) Keyboard
b) Mouse
c) Data gloves
d) Graphics Tablet
e) Scanner
f) Joystick
g) Light Pen

4.3 Keyboard

A keyboard is a peripheral partially modeled after the typewriter keyboard. Keyboards are designed to input text and characters, as well as to operate a computer. Physically, keyboards are an arrangement of rectangular buttons, or "keys". Keyboards typically have characters engraved or printed on the keys; in most cases, each press of a key corresponds to a single written symbol. However, producing some symbols requires pressing and holding several keys simultaneously or in sequence; other keys do not produce any symbol, but instead affect the operation of the computer or the keyboard itself.

Keyboard keys

Roughly 50% of all keyboard keys produce letters, numbers or signs (characters). Other keys can produce actions when pressed, and other actions are available by the simultaneous pressing of more than one action key.

[Figure: A multimedia keyboard]
[Figure: A foldable keyboard]

4.4 Mouse

A mouse (plural mice or mouses) functions as a pointing device by detecting two-dimensional motion relative to its supporting surface.
Physically, a mouse consists of a small case, held under one of the user's hands, with one or more buttons. It sometimes features other elements, such as "wheels", which allow the user to perform various system-dependent operations, or extra buttons or features that add more control or dimensional input. The mouse's motion typically translates into the motion of a pointer on a display.

The name mouse, coined at the Stanford Research Institute, derives from the resemblance of early models (which had a cord attached to the rear part of the device, suggesting the idea of a tail) to the common eponymous rodent.

The first marketed integrated mouse (shipped as a part of a computer and intended for personal computer navigation) came with the Xerox 8010 Star Information System in 1981.

[Figure: A contemporary computer mouse]
[Figure: The first computer mouse, held by inventor Douglas Engelbart]

An optical mouse uses a light-emitting diode and photodiodes to detect movement relative to the underlying surface, rather than relying on moving parts as a mechanical mouse does.

4.5 Data gloves

A data glove is a glove equipped with sensors that sense the movements of the hand and interface those movements with a computer. Data gloves are commonly used in virtual reality environments, where the user sees an image of the data glove and can manipulate the movements of the virtual environment using the glove.

4.6 Graphics tablet

A graphics tablet is a computer input device that allows one to hand-draw images and graphics, similar to the way one draws images with a pencil and paper. A graphics tablet consists of a flat surface upon which the user may "draw" an image using an attached stylus, a pen-like drawing apparatus. The image generally does not appear on the tablet itself but, rather, is displayed on the computer monitor.
[Figure: A Wacom Graphire4 graphics tablet]
[Figure: A Gerber graphics tablet]

4.7 Scanner

A scanner is a device that analyzes images, printed text, or handwriting, or an object (such as an ornament) and converts it to a digital image. Most scanners today are variations of the desktop (or flatbed) scanner. The flatbed scanner is the most common in offices. Hand-held scanners, where the device is moved by hand, were briefly popular but are now not used due to the difficulty of obtaining a high-quality image. Both these types of scanners use a charge-coupled device (CCD) or Contact Image Sensor (CIS) as the image sensor, whereas older drum scanners use a photomultiplier tube as the image sensor.

Another category of scanner is the rotary scanner, used for high-speed document scanning. This is another kind of drum scanner, but it uses a CCD array instead of a photomultiplier.

Other types of scanners are planetary scanners, which take photographs of books and documents, and 3D scanners, which produce three-dimensional models of objects; 3D scanners are considerably more expensive than other types of scanners.

Another category of scanner is the digital camera scanner, which is based on the concept of reprographic cameras. Due to increasing resolution and new features such as anti-shake, digital cameras have become an attractive alternative to regular scanners. While they still have disadvantages compared to traditional scanners, digital cameras offer unmatched advantages in speed and portability.

[Figure: Desktop scanner, with the lid raised]
[Figure: Scan of the jade rhinoceros]

4.8 Joystick

A joystick is a personal computer peripheral or general control device consisting of a handheld stick that pivots about one end and transmits its angle in two or three dimensions to a computer. Joysticks are often used to control video games, and usually have one or more push-buttons whose state can also be read by the computer.
The term joystick has become a synonym for game controllers that can be connected to the computer, since the computer defines the input as a "joystick input".

Apart from controlling games, joysticks are also used for controlling machines such as aircraft, cranes, trucks, powered wheelchairs and some zero turning radius lawn mowers. More recently miniature joysticks have been adopted as navigational devices for smaller electronic equipment such as mobile phones.

There has been a recent and very significant drop in joystick popularity in the gaming industry. This is primarily due to the shrinkage of the flight simulator genre, and the almost complete disappearance of space-based simulators.

Joysticks can be used within first-person shooter games, but are significantly less accurate than a mouse and keyboard. This is one of the fundamental reasons why multiplayer console games are not compatible with PC versions of the same game. A handful of recent games, including Halo 2 and Shadowrun, have allowed console-PC matchings, but have significantly handicapped PC users by requiring them to use the auto-aim feature.

[Figure: Joystick elements: 1. Stick 2. Base 3. Trigger 4. Extra buttons 5. Autofire switch 6. Throttle 7. Hat Switch (POV Hat) 8. Suction Cup]

4.9 Light Pen

A light pen is a computer input device in the form of a light-sensitive wand used in conjunction with the computer's CRT monitor. It allows the user to point to displayed objects, or draw on the screen, in a similar way to a touch screen but with greater positional accuracy. A light pen can work with any CRT-based monitor, but not with LCD screens, projectors and other display devices.

A light pen is fairly simple to implement. The light pen works by sensing the sudden small change in brightness of a point on the screen when the electron gun refreshes that spot.
By noting exactly where the scanning has reached at that moment, the X,Y position of the pen can be resolved. This is usually achieved by the light pen causing an interrupt, at which point the scan position can be read from a special register, or computed from a counter or timer. The pen position is updated on every refresh of the screen.

The light pen became moderately popular during the early 1980s. It was notable for its use in the Fairlight CMI and the BBC Micro. Even some consumer products were given light pens. For example, the Toshiba DX-900 VHS HiFi/PCM Digital VCR came with one. However, because the user had to hold an arm in front of the screen for long periods of time, the light pen fell out of use as a general purpose input device.

The first light pen was used around 1957 on the Lincoln TX-0 computer at the MIT Lincoln Laboratory. Contestants on the game show Jeopardy! use a light pen to write down their answers and wagers for the Final Jeopardy! round. Light pens are used country-wide in Belgium for voting.

4.10 Let us Sum Up

In this lesson we have learnt about various input devices needed for computer graphics.

4.11 Lesson-end Activities

After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Explain the joystick
b) Explain the data glove

4.12 Points for Discussion

Discuss the following:
a) Keyboard
b) Scanners

4.13 Model answers to “Check your Progress”

In order to check your progress, try to answer the following questions:
a) Discuss the mouse
b) Discuss graphics tablets

4.14 References

1. Chapter 11 of William M. Newman, Robert F. Sproull, “Principles of Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 2 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C Version”, Pearson Education, 2007
3. Chapter 2 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
4. Chapter 8 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer Graphics – principles and practice”, Addison-Wesley, 1997

LESSON – 5: OUTPUT PRIMITIVES

CONTENTS
5.1 Aims and Objectives
5.2 Introduction
5.3 Points and Lines
5.4 Rasterization
5.5 Digital Differential Analyzer (DDA) Algorithm
5.6 Bresenham’s Algorithm
5.7 Properties of Circles
5.8 Properties of ellipse
5.9 Pixel Addressing
5.10 Let us Sum Up
5.11 Lesson-end Activities
5.12 Points for Discussion
5.13 Model answers to “Check your Progress”
5.14 References

5.1 Aims and Objectives

The aim of this lesson is to learn the concept of output primitives. The objectives of this lesson are to make the student aware of the following concepts:
a) Points and lines
b) Rasterization
c) DDA and Bresenham’s algorithm
d) Properties of circle and ellipse
e) Pixel addressing

5.2 Introduction

The basic elements constituting a graphic are called output primitives. Each output primitive has an associated set of attributes, such as line width and line color for lines. The programming technique is to set values for the attributes and then call a basic function that will draw the desired primitive using the current settings for the attributes.

Different graphics systems have different graphics primitives. For example, the GKS output primitives include polyline (for drawing contiguous line segments), polymarker (for marking coordinate positions with various symmetric text symbols), text (for plotting text at various angles and sizes), fill area (for plotting polygonal areas with solid or hatch fill) and cell array (for plotting portable raster images). GRPH1, by contrast, has the output primitives Polyline, Polymarker, Text and Tone, and also has the secondary primitives Line and Arrow.

5.3 Points and Lines

In a CRT monitor, the electron beam is turned on to illuminate the screen phosphor at the selected location. How the electron beam is positioned depends on the display technology.
In a random-scan (vector) system, point plotting instructions are stored in a display list and the coordinate values in these instructions are converted to deflection voltages that position the electron beam at the screen locations. A low-level procedure for plotting a point on the screen at (x, y) with intensity I can be given as

    setPixel(x, y, I)

A line is drawn by calculating the intermediate positions between the two end points and displaying the pixels at those positions.

5.4 Rasterization

Rasterization is the process of converting a vertex representation to a pixel representation; rasterization is also called scan conversion. Included in this definition are geometric objects such as circles, where you are given a center and radius. This lesson covers:

• The digital differential analyzer (DDA), which introduces the basic concepts for rasterization.
• Bresenham's algorithm, which improves on the DDA.

Scan conversion algorithms use incremental methods that exploit coherence. An incremental method computes a new value quickly from an old value, rather than computing the new value from scratch, which can often be slow. Coherence in space or time is the term used to denote that nearby objects (e.g., pixels) have qualities similar to the current object.

5.5 Digital Differential Analyzer (DDA) Algorithm

In this algorithm, the line is sampled at unit intervals in one coordinate, and the corresponding value nearest to the line path is found for the other coordinate. In what follows, $\Delta x = x_2 - x_1$, $\Delta y = y_2 - y_1$ and the slope is $m = \Delta y / \Delta x$.

For a line with positive slope less than one, $\Delta x > \Delta y$. Hence we sample at unit x intervals and compute each successive y value as

$$y_{k+1} = y_k + m$$

A sampled point $(x_i, y_i)$ on the line is plotted at the pixel $(x_i, \mathrm{Round}(y_i))$.

For lines with positive slope greater than one, $\Delta y > \Delta x$. Hence we sample at unit y intervals and compute each successive x value as

$$x_{k+1} = x_k + \frac{1}{m}$$

Since the slope, m, can be any real number, the calculated value must be rounded to the nearest integer.

For a line with negative slope, if the absolute value of the slope is less than one, we make unit increments in the x direction and calculate y values as

$$y_{k+1} = y_k + m$$

For a line with negative slope, if the absolute value of the slope is greater than one, we make unit decrements in the y direction and calculate x values as

$$x_{k+1} = x_k - \frac{1}{m}$$

[Note: for all the above four cases it is assumed that the first point is on the left and the second point is on the right.]

DDA Line Algorithm

    #include <stdlib.h>   /* abs */
    #include <math.h>     /* floor */

    void myPixel(int x, int y);   /* supplied elsewhere: plots one pixel */

    void myLine(int x1, int y1, int x2, int y2)
    {
        int length, i;
        double x, y;
        double xincrement, yincrement;

        /* number of unit steps = larger of |dx| and |dy| */
        length = abs(x2 - x1);
        if (abs(y2 - y1) > length)
            length = abs(y2 - y1);

        if (length == 0) {            /* end points coincide: avoid dividing by zero */
            myPixel(x1, y1);
            return;
        }

        xincrement = (double)(x2 - x1) / (double)length;
        yincrement = (double)(y2 - y1) / (double)length;

        x = x1 + 0.5;                 /* adding 0.5 makes floor() round to nearest */
        y = y1 + 0.5;

        for (i = 0; i <= length; ++i) {   /* length + 1 pixels, both end points included */
            myPixel((int)floor(x), (int)floor(y));
            x = x + xincrement;
            y = y + yincrement;
        }
    }

5.6 Bresenham’s Algorithm

In this method, developed by Jack Bresenham, we look at just the centers of the pixels. We determine $d_1$ and $d_2$, the "errors", i.e. the distances of the two candidate pixels from the "true line".

Steps in the Bresenham algorithm:

1. Determine the error terms.
2. Define a relative error term such that the sign of this term tells us which pixel to choose.
3. Derive an equation to compute successive error terms from the first.
4. Compute the first error term.

The y coordinate on the mathematical line at pixel position $x_i + 1$ is calculated as

$$y = m(x_i + 1) + b$$

and the distances are calculated as

$$d_1 = y - y_i = m(x_i + 1) + b - y_i$$
$$d_2 = (y_i + 1) - y = y_i + 1 - m(x_i + 1) - b$$

Then

$$d_1 - d_2 = 2m(x_i + 1) - 2y_i + 2b - 1$$

Now define $p_i = dx\,(d_1 - d_2)$, the relative error of the two pixels.

Note: $p_i < 0$ if the pixel at $y_i$ is closer, and $p_i \ge 0$ if the pixel at $y_i + 1$ is closer. Therefore we only need to know the sign of $p_i$.

With $m = dy/dx$ and substituting for $(d_1 - d_2)$ we get

$$p_i = 2\,dy\,x_i - 2\,dx\,y_i + 2\,dy + dx\,(2b - 1) \qquad (1)$$

Let $C = 2\,dy + dx\,(2b - 1)$.

Now look at the relation of p's for successive x terms.
\[ p_{i+1} = 2\,dy\,x_{i+1} - 2\,dx\,y_{i+1} + C \]
\[ p_{i+1} - p_i = 2\,dy\,(x_{i+1} - x_i) - 2\,dx\,(y_{i+1} - y_i) \]
With $x_{i+1} = x_i + 1$ and $y_{i+1} = y_i + 1$ or $y_i$:
\[ p_{i+1} = p_i + 2\,dy - 2\,dx\,(y_{i+1} - y_i) \]
Now compute $p_1$ at $(x_1, y_1)$ from (1), where $b = y_1 - \frac{dy}{dx}x_1$:
\[ p_1 = 2\,dy\,x_1 - 2\,dx\,y_1 + 2\,dy + dx\left(2y_1 - 2\frac{dy}{dx}x_1 - 1\right) \]
\[ \phantom{p_1} = 2\,dy\,x_1 - 2\,dx\,y_1 + 2\,dy + 2\,dx\,y_1 - 2\,dy\,x_1 - dx \]
\[ \phantom{p_1} = 2\,dy - dx \]
If $p_i < 0$, plot the pixel $(x_i + 1, y_i)$; the next decision parameter is $p_{i+1} = p_i + 2\,dy$.
Otherwise, plot the pixel $(x_i + 1, y_i + 1)$; the next decision parameter is $p_{i+1} = p_i + 2\,dy - 2\,dx$.

Bresenham Algorithm for 1st octant:
1. Enter endpoints (x1, y1) and (x2, y2).
2. Display (x1, y1).
3. Compute dx = x2 - x1; dy = y2 - y1; p1 = 2dy - dx.
4. If p1 < 0, display (x1 + 1, y1), else display (x1 + 1, y1 + 1).
5. If p1 < 0, p2 = p1 + 2dy, else p2 = p1 + 2dy - 2dx.
6. Repeat steps 4 and 5 (with the current point and decision parameter) until (x2, y2) is reached.
Note: Only integer addition and multiplication by 2 are needed. Notice that we always increment x by 1. For a generalized Bresenham algorithm we must look at the behavior in the different octants.

5.7 Properties of Circles
The points on the circumference of a circle are all at an equal distance r from the centre (xc, yc). This relation is given by the Pythagorean theorem as
\[ (x - x_c)^2 + (y - y_c)^2 = r^2 \]
The points on the circumference can be calculated by unit increments in the x direction from $x_c - r$ to $x_c + r$, and the corresponding y values can be obtained as
\[ y = y_c \pm \sqrt{r^2 - (x_c - x)^2} \]
The major problem here is that the spacing between the points will not be the same. It can be adjusted by interchanging x and y whenever the absolute value of the slope of the circle is greater than 1.
The unequal spacing can be eliminated by using polar coordinates:
\[ x = x_c + r\cos\theta \]
\[ y = y_c + r\sin\theta \]
The major problem in the above two methods is the computational time. The computational time can be reduced by considering the symmetry of circles. The shape of the circle is similar in each quadrant. Thinking one step further shows that there is symmetry between octants too.
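The octant symmetry noted above means that a circle routine only has to compute the points of one octant; the other seven follow by swapping and negating the offsets from the centre. The following is a minimal sketch of that idea in C, not a definitive implementation. The function name `circle_symmetric_points` and the output array layout are assumptions introduced for illustration.

```c
/* Hypothetical helper: given an offset (x, y) from the centre that lies
   on the circle, fill out[8][2] with the eight symmetric pixel positions
   about the centre (xc, yc). */
void circle_symmetric_points(int xc, int yc, int x, int y, int out[8][2])
{
    /* The eight offsets obtained by swapping and negating x and y. */
    const int off[8][2] = {
        { x,  y}, {-x,  y}, { x, -y}, {-x, -y},
        { y,  x}, {-y,  x}, { y, -x}, {-y, -x}
    };
    int i;

    for (i = 0; i < 8; ++i) {
        out[i][0] = xc + off[i][0];   /* x coordinate of the i-th point */
        out[i][1] = yc + off[i][1];   /* y coordinate of the i-th point */
    }
}
```

A circle-drawing loop would call this once per computed point and plot all eight results. When x equals y, or when either is zero, some of the eight positions coincide, so a few pixels are plotted twice; that is harmless for simple drawing.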
Midpoint circle algorithm
To simplify the function evaluation that takes place on each iteration of our circle-drawing algorithm, we can use the midpoint circle algorithm.
The equation of the circle can be expressed as a function:
\[ f(x, y) = x^2 + y^2 - r^2 \]
If a point is inside the circle then f(x,y) < 0, if it is outside then f(x,y) > 0, and if it is on the circumference then f(x,y) = 0. Thus the circle function is the decision parameter in the midpoint algorithm.
Assume that we have just plotted $(x_k, y_k)$. We have to decide whether the pixel $(x_k + 1, y_k)$ or $(x_k + 1, y_k - 1)$ is nearer to the circle. We consider the midpoint between these two pixels and define the decision parameter as
\[ p_k = f\left(x_k + 1,\; y_k - \tfrac{1}{2}\right) = (x_k + 1)^2 + \left(y_k - \tfrac{1}{2}\right)^2 - r^2 \]
Similarly
\[ p_{k+1} = f\left(x_{k+1} + 1,\; y_{k+1} - \tfrac{1}{2}\right) = \left[(x_k + 1) + 1\right]^2 + \left(y_{k+1} - \tfrac{1}{2}\right)^2 - r^2 \]
Subtracting the above two equations we get
\[ p_{k+1} = p_k + 2(x_k + 1) + \left(y_{k+1}^2 - y_k^2\right) - (y_{k+1} - y_k) + 1 \]
where $y_{k+1}$ is either $y_k$ or $y_k - 1$ depending on the sign of $p_k$.
The initial decision parameter is obtained by evaluating the circle function at the starting position (0, r):
\[ p_0 = f\left(1,\; r - \tfrac{1}{2}\right) = 1 + \left(r - \tfrac{1}{2}\right)^2 - r^2 = \tfrac{5}{4} - r \]
Hence the algorithm for the first octant is as given below:
1. Calculate $p_0$
2. k = 0
3. While x ≤ y
a) if $p_k < 0$ then plot pixel $(x_k + 1, y_k)$ and find the next decision parameter as
\[ p_{k+1} = p_k + 2x_{k+1} + 1 \]
b) else plot pixel $(x_k + 1, y_k - 1)$ and find the next decision parameter as
\[ p_{k+1} = p_k + 2x_{k+1} + 1 - 2y_{k+1} \]
c) k = k + 1
where $2x_{k+1} = 2x_k + 2$ and $2y_{k+1} = 2y_k - 2$.

5.8 Properties of ellipse
An ellipse is defined as the set of points such that the sum of the distances from two fixed positions (foci) is the same for all points. If the distances to the two foci from any point P = (x,y) on the ellipse are labeled d1 and d2, then the general equation of an ellipse can be stated as
\[ d_1 + d_2 = \text{constant} \]
Let the focal coordinates be F1 = (x1,y1) and F2 = (x2,y2).
Then by substituting the values of d1 and d2 we get
\[ \sqrt{(x - x_1)^2 + (y - y_1)^2} + \sqrt{(x - x_2)^2 + (y - y_2)^2} = \text{constant} \]
The general equation of the ellipse can be written as
\[ Ax^2 + By^2 + Cxy + Dx + Ey + F = 0 \]
where the coefficients A, B, C, D, E, and F are evaluated in terms of the focal coordinates and the dimensions of the major and minor axes of the ellipse.
If the major and minor axes are aligned with the x-axis and y-axis, then the equation of the ellipse can be given by
\[ \left(\frac{x - x_c}{r_x}\right)^2 + \left(\frac{y - y_c}{r_y}\right)^2 = 1 \]
where $r_x$ and $r_y$ are the semi-major and semi-minor axes respectively.
The polar equations of the ellipse are
\[ x = x_c + r_x\cos\theta \]
\[ y = y_c + r_y\sin\theta \]

5.9 Pixel Addressing
The Pixel Addressing feature controls the number of pixels that are read from the region of interest (ROI). Pixel Addressing is controlled by two parameters: a Pixel Addressing mode and a value. The mode of Pixel Addressing can be decimate (0), averaging (1), binning (2) or resampling (3).
With a Pixel Addressing value of 1, the Pixel Addressing mode has no effect and all pixels in the ROI will be returned. For Pixel Addressing values greater than 1, the number of pixels will be reduced by the square of the value. For example, a Pixel Addressing value of 2 will result in ¼ of the pixels.
The Pixel Addressing mode determines how the number of pixels is reduced. The Pixel Addressing value can be considered as the size of a block of pixels made up of 2x2 groups. For example, a Pixel Addressing value of 3 will reduce a 6 x 6 block of pixels to a 2 x 2 block, a reduction to 4/36 or 1/9.
The decimate mode will drop all the pixels in the block except for the top-left group of four. At the highest Pixel Addressing value of 6, a 12 x 12 block of pixels is reduced to 2 x 2. At this level of reduction, detail in the scene can be lost and color artifacts introduced.
The averaging mode will average pixels of similar color within the block, resulting in a 2x2 Bayer pattern.
This allows details in the blocks to be detected and reduces the effects of the color artifacts.
The binning mode will sum pixels of similar color within the block, reducing the block to a 2x2 Bayer pattern. Unlike binning with CCD sensors, this summation occurs after the image is digitized, so no increase in sensitivity will be noticed, but a dark image will appear brighter.
The resampling mode uses a different approach involving the conversion of the Bayer pattern in the blocks to RGB pixels. With a Pixel Addressing value of 1, resampling has no effect. With a Pixel Addressing value of 2 or more, resampling will convert the block of 10-bit pixels to one 30-bit RGB pixel by averaging the red, green and blue channels. Setting the video format to YUV422 mode will result in the best image quality while resampling. Resampling will create images with the highest quality and the least artifacts.
Pixel Addressing will reduce the amount of data coming from the camera. However, only the decimate mode will permit an increase in the frame rate. Averaging, binning and resampling modes will have the same frame rate as if the Pixel Addressing value was 1 (no decimation). Pixel Addressing works in the same fashion with color or monochrome sensors.
For example, the pixel addressing controls of a camera and its parameters are shown in the following tables.
Controls

    Camera        Auto   Manual   One-time Auto   Off   CiD
    All cameras   No     Yes      No              Yes   Yes

Parameters

    Camera   Parameter   Unit   Type       Min   Max   Default   Step Size   Comments
    PLA741   Mode        None   Absolute   0     0     0         1           0: Decimate
    PLA741   Value       None   Absolute   1     2     1         1
    PLA742   Mode        None   Absolute   0     3     0         1           0: Decimate, 1: Average, 2: Bin, 3: Resample
    PLA742   Value       None   Absolute   1     2     1         1
    PLA770   Mode        None   Absolute   0     3     0         1           0: Decimate, 1: Average, 2: Bin, 3: Resample
    PLA770   Value       None   Absolute   1     4     1         1           Pixel Addressing Value of 3 is not supported
    PLA780   Mode        None   Absolute   0     3     0         1           0: Decimate, 1: Average, 2: Bin, 3: Resample
    PLA780   Value       None   Absolute   1     6     1         1           Pixel Addressing Value of 5 is not supported

5.10 Let us Sum Up
In this lesson we have learnt about
a) Points and lines
b) DDA and Bresenham's algorithm
c) Properties of circle and ellipse
d) Pixel addressing

5.11 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Discuss the midpoint circle algorithm
b) Discuss the advantages of Bresenham's algorithm over DDA

5.12 Points for Discussion
Discuss the following
a) Polar equation of circle
b) Pixel addressing

5.13 Model answers to “Check your Progress”
In order to check your progress, try to answer the following
a) Algorithm for DDA
b) Algorithm for Bresenham's line drawing

5.14 References
1. Chapter 2 of William M. Newman, Robert F. Sproull, “Principles of Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 3 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C Version”, Pearson Education, 2007
3. Chapter 4 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
4. Chapter 3 of J.D. Foley, A. van Dam, S.K. Feiner, J.F.
Hughes, “Computer Graphics – principles and practice”, Addison-Wesley, 1997

UNIT – II

LESSON – 6: TWO DIMENSIONAL TRANSFORMATION

CONTENTS
6.1 Aims and Objectives
6.2 Introduction
6.3 Representation of Points/Objects
6.4 Translation
6.5 Scaling
6.6 Rotation
6.7 Shear
6.8 Combining Transformations
6.9 Homogeneous Coordinates
6.10 Let us Sum Up
6.11 Lesson-end Activities
6.12 Points for Discussion
6.13 Model answers to “Check your Progress”
6.14 References

6.1 Aims and Objectives
The aim of this lesson is to learn the concept of two dimensional transformations.
The objectives of this lesson are to make the student aware of the following concepts:
a) Translation
b) Rotation
c) Scaling
d) Shear
e) Homogeneous coordinate systems

6.2 Introduction
Transformations are a fundamental part of computer graphics. Transformations are used to position objects, to shape objects, to change viewing positions, and even to change how something is viewed.
There are 4 main types of transformations that one can perform in 2 dimensions:
- translation
- scaling
- rotation
- shearing

6.3 Representation of Points/Objects
A point p in 2D is represented as a pair of numbers: p = (x, y), where x is the x-coordinate of the point p and y is the y-coordinate of p. 2D objects are often represented as a set of points (vertices), {p1,p2,...,pn}, and an associated set of edges {e1,e2,...,em}. An edge is defined as a pair of points e = {pi,pj}.
For example, the three points and three edges of a triangle are p1=(1,0), p2=(1.5,2), p3=(2,0), e1={p1,p2}, e2={p2,p3}, and e3={p3,p1}.
We can also write points in vector/matrix notation as
\[ p = \begin{pmatrix} x \\ y \end{pmatrix} \]

6.4 Translation
Assume you are given a point at (x,y) = (2,1). Where will the point be if you move it 3 units to the right and 1 unit up? The answer is (x',y') = (5,2). How was this obtained? It is obtained by (x',y') = (x+3, y+1). That is, to move a point by some amount dx to the right and dy up, you must add dx to the x-coordinate and add dy to the y-coordinate.
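The rule of adding dx and dy to each coordinate can be sketched in C as follows. This is a minimal illustration rather than a definitive implementation; the `Point2` type and the function names `translate` and `translate_all` are assumptions introduced here.

```c
/* A 2D point; the type name Point2 is an assumption for this sketch. */
typedef struct { double x, y; } Point2;

/* Return p moved dx units to the right and dy units up. */
Point2 translate(Point2 p, double dx, double dy)
{
    Point2 q = { p.x + dx, p.y + dy };
    return q;
}

/* Translate every vertex of an object in place. */
void translate_all(Point2 *pts, int n, double dx, double dy)
{
    int i;
    for (i = 0; i < n; ++i)
        pts[i] = translate(pts[i], dx, dy);
}
```

Calling `translate` on the point (2, 1) with dx = 3 and dy = 1 gives (5, 2), matching the worked answer above. Because an object is just a set of vertices, translating it amounts to translating each vertex by the same amounts.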
For example, to move the green triangle, represented by the 3 points given below, to the red triangle we need dx = 3 and dy = -5.
greentriangle = { p1=(1,0), p2=(2,0), p3=(1.5,2) }

Matrix/Vector Representation of Translations
A translation can also be represented by a pair of numbers, t = (tx, ty), where tx is the change in the x-coordinate and ty is the change in the y-coordinate. To translate the point p by t, we simply add to obtain the new (translated) point q = p + t:
\[ q = p + t = \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix} = \begin{pmatrix} x + t_x \\ y + t_y \end{pmatrix} \]

6.5 Scaling
Suppose we want to double the size of a 2-D object. What do we mean by double? Double in size, width only, height only, along some line only? When we talk about scaling we usually mean some amount of scaling along each dimension. That is, we must specify how much to change the size along each dimension. Below we see a triangle and a house that have been doubled in both width and height (note, the area is more than doubled).
The scaling for the x dimension does not have to be the same as for the y dimension. If these are different, then the object is distorted. What is the scaling in each dimension of the pictures below?
And if we double the size, where is the resulting object? In the pictures above, the scaled object is always shifted to the right. This is because it is scaled with respect to the origin. That is, the point at the origin is left fixed. Thus scaling by more than 1 moves the object away from the origin and scaling by less than 1 moves the object toward the origin.
This is because of how basic scaling is done. The above objects have been scaled simply by multiplying each of their points by the appropriate scaling factor. For example, the point p = (1.5, 2) has been scaled by 2 along x and 0.5 along y. Thus, the new point is q = (2*1.5, 0.5*2) = (3, 1).

Matrix/Vector Representation of Scaling
Scaling transformations are represented by matrices.
For example, the above scaling of 2 and 0.5 is represented as a matrix:
\[ \text{scale matrix: } S = \begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 0.5 \end{pmatrix} \]
\[ \text{new point: } q = S\,p = \begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x\,s_x \\ y\,s_y \end{pmatrix} \]

Scaling about a Particular Point
What do we do if we want to scale the objects about their center as shown below?
Let the fixed point $(x_f, y_f)$ be the center of the object. Then the equations for scaling with respect to $(x_f, y_f)$ are
\[ x' = (x - x_f)\,s_x + x_f \]
\[ y' = (y - y_f)\,s_y + y_f \]

6.6 Rotation
Consider rotation of a point (x,y) about the origin in the anticlockwise direction. Let (x', y') be the new point after rotation and let the angular displacement (i.e. the angle of rotation) be $\theta$, as shown in the figure. Let r be the distance of the points from the origin, and let $\phi$ be the angle between the x-axis and the line joining the point (x,y) to the origin.
Applying trigonometric identities, we get the following equations for (x', y'):
\[ x' = r\cos(\phi + \theta) = r\cos\phi\cos\theta - r\sin\phi\sin\theta \]
\[ y' = r\sin(\phi + \theta) = r\cos\phi\sin\theta + r\sin\phi\cos\theta \qquad \text{(a)} \]
Similarly for (x,y), we get the following equations:
\[ x = r\cos\phi, \qquad y = r\sin\phi \qquad \text{(b)} \]
Substituting (b) in (a), we get the equations for rotating a point about the origin:
\[ x' = x\cos\theta - y\sin\theta \]
\[ y' = x\sin\theta + y\cos\theta \]

Matrix/Vector Representation of Rotations
\[ \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \]
Now suppose we want to rotate an object about some fixed point $(x_f, y_f)$, as shown in the following figure. The equations for rotation of a point about the fixed point $(x_f, y_f)$ are
\[ x' = x_f + (x - x_f)\cos\theta - (y - y_f)\sin\theta \]
\[ y' = y_f + (x - x_f)\sin\theta + (y - y_f)\cos\theta \]

6.7 Shear
A transformation that distorts the shape of an object such that the transformed shape appears as if the object were composed of internal layers that had been caused to slide over each other is called a shear.
An x-direction shear relative to the x-axis can be given as
\[ x' = x + sh_x\,y, \qquad y' = y \]
Similarly, a y-direction shear relative to the y-axis can be given as
\[ x' = x, \qquad y' = y + sh_y\,x \]

Matrix/Vector Representation of Shearing
In matrix representation, the x-direction shear equation can be given as
\[ \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 1 & sh_x \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \]
Similarly, the y-direction shear can be given as
\[ \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ sh_y & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \]

6.8 Combining Transformations
We saw that the basic scaling and rotating transformations are always with respect to the origin. To scale or rotate about a particular point (the fixed point) we must first translate the object so that the fixed point is at the origin. We then perform the scaling or rotation, and then the inverse of the original translation to move the fixed point back to its original position.
For example, if we want to scale the triangle by 2 in each direction about the point fp = (1.5,1), we first translate all the points of the triangle by T = (-1.5,-1), scale by 2 (S), and then translate back by -T = (1.5,1). Mathematically this looks like
\[ q = \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \left( \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} + \begin{pmatrix} -1.5 \\ -1 \end{pmatrix} \right) + \begin{pmatrix} 1.5 \\ 1 \end{pmatrix} \]

Order Matters!
Notice the order in which these transformations are performed. The first (rightmost) transformation is T and the last (leftmost) is -T. If you apply these transformations in a different order then you will get very different results. For example, what happens when you first apply T, followed by -T, followed by S? Here T and -T cancel each other out and you are simply left with S.
Sometimes (but be careful) order does not matter. For example, if you apply multiple 2D rotations, order makes no difference:
R1 R2 = R2 R1
But this will not necessarily be true in 3D!

6.9 Homogeneous Coordinates
In general, when you want to perform a complex transformation, you usually make it by combining a number of basic transformations. The above equation for q, however, is awkward to read because scaling is done by matrix multiplication and translation is done by vector addition.
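The three steps for scaling about a fixed point (translate the fixed point to the origin, scale, translate back) can be sketched in C as follows. This is a minimal sketch, not a definitive implementation; the `Point2` type and the function name `scale_about` are assumptions introduced for illustration.

```c
/* A 2D point; the type name Point2 is an assumption for this sketch. */
typedef struct { double x, y; } Point2;

/* Scale p by (sx, sy) about the fixed point f. */
Point2 scale_about(Point2 p, Point2 f, double sx, double sy)
{
    Point2 q;
    double tx = p.x - f.x;   /* step 1: translate so that f is at the origin */
    double ty = p.y - f.y;

    tx *= sx;                /* step 2: scale about the origin */
    ty *= sy;

    q.x = tx + f.x;          /* step 3: translate back */
    q.y = ty + f.y;
    return q;
}
```

With the fixed point (1.5, 1) and a scale of 2 in each direction, the vertex (1, 0) maps to (0.5, -1), while the fixed point itself is left where it is. Swapping the order of the steps would give a different result, which is the point made under "Order Matters!".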
In order to represent all transformations in the same form, computer scientists have devised what are called homogeneous coordinates. Do not try to apply any exotic interpretation to them. They are simply a mathematical trick to make the representation more consistent and easier to use.
Homogeneous coordinates (HC) add an extra virtual dimension. Thus 2D HC are actually 3D and 3D HC are 4D. Consider a 2D point p = (x,y). In HC, we represent p as p = (x,y,1). An extra coordinate is added whose value is always 1. This may seem odd, but it allows us to represent translations as matrix multiplication instead of as vector addition. A translation (dx, dy), which would normally be performed as
\[ q = \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} dx \\ dy \end{pmatrix} \]
is now written as
\[ q = \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = T\,p = \begin{pmatrix} 1 & 0 & dx \\ 0 & 1 & dy \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \]
Now we can write the scaling about a fixed point as simply a matrix multiplication:
q = (-T) S T p = A p, where A = (-T) S T
The matrix A can be calculated once and then applied to all the points in the object. This is much more efficient than our previous representation. It is also easier to identify the transformations and their order when everything is in the form of matrix multiplication.
The matrix for scaling in HC is
\[ S = \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
and the matrix for rotation is
\[ R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

6.10 Let us Sum Up
In this lesson we have learned about two dimensional geometric transformations.

6.11 Lesson-end Activities
Do it yourself:
- What are the points and edges in this picture of a house?
- What are the transformations required to move this house so that the peak of the roof is at the origin?
- What is required to move the house as shown in the animation?

6.12 Points for Discussion
After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Define two dimensional translation
b) Discuss rotation with respect to a fixed point

6.13 Model answers to “Check your Progress”
To check your progress, try to answer the following questions
a) Define scaling with respect to a fixed point
b) What is the need for homogeneous coordinate systems?

6.14 References
1. Chapter 4 of William M. Newman, Robert F. Sproull, “Principles of Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 4 of Steven Harrington, “Computer Graphics – A programming approach”, McGraw Hill, 1987
3. Chapter 5 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C Version”, Pearson Education, 2007
4. Chapter 6 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
5. Chapter 5 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer Graphics – principles and practice”, Addison-Wesley, 1997

LESSON – 7: TWO DIMENSIONAL VIEWING AND LINE CLIPPING

CONTENTS
7.1 Aims and Objectives
7.2 Introduction
7.3 Line Clipping
7.3.1 Clipping Individual Points
7.3.2 Simultaneous Equations
7.3.3 Cohen-Sutherland Line Clipping
7.3.4 Liang-Barsky Line Clipping
7.4 Viewing
7.4.1 Window
7.4.2 Viewport
7.4.3 Window To Viewport Transformation
7.4.4 Viewport to Physical Device Transformation
7.5 Let us Sum Up
7.6 Lesson-end Activities
7.7 Points for Discussion
7.8 Model answers to “Check your Progress”
7.9 References

7.1 Aims and Objectives
The aim of this lesson is to learn the concept of two dimensional viewing and line clipping.
The objectives of this lesson are to make the student aware of the following concepts:
a) Window
b) Viewport
c) Window to Viewport Transformation
d) Line clipping

7.2 Introduction
Clipping refers to the removal of part of a scene. Internal clipping removes parts of a picture outside a given region; external clipping removes parts inside a region. We'll explore internal clipping, but external clipping can almost always be accomplished as a by-product.
There is also the question of what primitive types can we clip?
We will consider line clipping and polygon clipping. A line clipping algorithm takes as input the two endpoints of a line segment and returns one (or more) line segments. A polygon clipper takes as input the vertices of a polygon and returns one (or more) polygons.
There are other issues in clipping, and some of these are:
- Text character clipping
- Scissoring: clips the primitive during scan conversion to pixels
- Bit (pixel) block transfers (bitblts/pixblts)
  o Copy a 2D array of pixels from a large canvas to a destination window
  o Useful for text characters, pulldown menus, etc.

7.3 Line Clipping
Line clipping is the process of removing lines or portions of lines outside of an area of interest. Typically, any line or part thereof which is outside of the viewing area is removed.
This section treats clipping of lines against rectangles. Although there are specialized algorithms for rectangle and polygon clipping, it is important to note that other graphic primitives can be clipped by repeated application of the line clipper.

7.3.1 Clipping Individual Points
Before we discuss clipping lines, let's look at the simpler problem of clipping individual points.
If the x coordinate boundaries of the clipping rectangle are Xmin and Xmax, and the y coordinate boundaries are Ymin and Ymax, then the following inequalities must be satisfied for a point at (X,Y) to be inside the clipping rectangle (points on the boundary count as inside):
Xmin ≤ X ≤ Xmax and Ymin ≤ Y ≤ Ymax
If any of the four inequalities does not hold, the point is outside the clipping rectangle.

7.3.2 Simultaneous Equations
To clip a line, we need to consider only its endpoints, not its infinitely many interior points. If both endpoints of a line lie inside the clip rectangle, the entire line lies inside the clip rectangle and can be trivially accepted. If one endpoint lies inside and one outside, the line intersects the clip rectangle and we must compute the intersection point.
If both endpoints are outside the clip rectangle, the line may or may not intersect with the clip rectangle, and we need to perform further calculations to determine whether there are any intersections.
The brute-force approach to clipping a line that cannot be trivially accepted is to intersect that line with each of the four clip-rectangle edges to see whether any intersection points lie on those edges; if so, the line cuts the clip rectangle and is partially inside. For each line and clip-rectangle edge, we therefore take the two mathematically infinite lines that contain them and intersect them. Next, we test whether this intersection point is "interior", that is, whether it lies within both the clip rectangle edge and the line; if so, there is an intersection with the clip rectangle.

7.3.3 Cohen-Sutherland Line Clipping
The Cohen-Sutherland algorithm clips a line to an upright rectangular window. The algorithm extends the window boundaries to define 9 regions: top-left, top-center, top-right, center-left, center, center-right, bottom-left, bottom-center, and bottom-right. See figure 1 below.
These 9 regions can be uniquely identified using a 4-bit code, often called an outcode. We'll use the order left, right, bottom, top (LRBT) for these four bits. In particular, for each point p:
- Left (first) bit is set to 1 when p lies to the left of the window
- Right (second) bit is set to 1 when p lies to the right of the window
- Bottom (third) bit is set to 1 when p lies below the window
- Top (fourth) bit is set to 1 when p lies above the window
The LRBT (Left, Right, Bottom, Top) order is somewhat arbitrary, but once an order is chosen we must stick with it. Note that points on the clipping window edge are considered inside (the bits are left at 0).

Figure 1: The nine regions defined by an upright window and their outcodes.

Given a line segment with end points p0 and p1, here is the basic flow of the Cohen-Sutherland algorithm:
1. Compute the 4-bit outcodes LRBT0 and LRBT1 for each end-point.
2. If both outcodes are 0000, the trivially visible case, pass the end-points to the draw routine. This occurs when the bitwise OR of the outcodes yields 0000.
3. If both outcodes have 1's in the same bit position, the trivially invisible case, clip the entire line (pass nothing to the draw routine). This occurs when the bitwise AND of the outcodes is not 0000.
4. Otherwise, the indeterminate case, the line may be partially visible or not visible. Analytically compute the intersection of the line with the appropriate window edges.
Let's explore the indeterminate case more closely. First, one of the two end-points must be outside the window; pretend it is p1.
1. Read p1's 4-bit code in order, say left-to-right.
2. When a set bit (1) is found, compute the intersection point I of the corresponding window edge with the line from p0 to p1.
As an example, pretend the right bit is set, so we want to compute the intersection with the right clipping window edge. Also pretend we've already done the homogeneous divide, so the right edge is x = 1 and we need to find y. The y value of the intersection is found by substituting x = 1 into the equation of the line from $p_0 = (x_0, y_0)$ to $p_1 = (x_1, y_1)$ and solving for y:
\[ y = y_0 + (1 - x_0)\,\frac{y_1 - y_0}{x_1 - x_0} \]
Other cases are handled similarly.

7.3.4 Liang-Barsky Line Clipping
Liang and Barsky have created an algorithm that uses floating-point arithmetic but finds the appropriate end points with at most four computations. This algorithm uses the parametric equations for a line and solves four inequalities to find the range of the parameter for which the line is in the viewport.
Let P(x1,y1), Q(x2,y2) be the line which we want to study. The parametric equation of the line segment from P to Q gives x-values and y-values for every point in terms of a parameter t that ranges from 0 to 1. With dx = x2 - x1 and dy = y2 - y1, the equations are
\[ x = x_1 + dx \cdot t \]
and
\[ y = y_1 + dy \cdot t \]
We can see that when t = 0, the point computed is P(x1,y1); and when t = 1, the point computed is Q(x2,y2).

Algorithm
1. Set tmin = 0 and tmax = 1.
2. Calculate the values of tL, tR, tT, and tB (tvalues).
   o if t < 0 or t > 1, ignore it and go to the next edge
   o otherwise classify the tvalue as an entering or exiting value (using the inner product to classify)
   o if t is an entering value set tmin = t; if t is an exiting value set tmax = t
3. If tmin < tmax then draw a line from (x1 + dx*tmin, y1 + dy*tmin) to (x1 + dx*tmax, y1 + dy*tmax).
4. If the line crosses over the window, (x1 + dx*tmin, y1 + dy*tmin) and (x1 + dx*tmax, y1 + dy*tmax) are the intersections between the line and the window edges.

Example 1 - Line Passing Through Window
Here P = (-5, 3) and Q = (15, 9). The next step is to decide whether each tvalue is entering or exiting by using the inner product.
(Q-P) = (15+5, 9-3) = (20,6)
At the left edge (Q-P)·nL = (20,6)·(-10,0) = -200 < 0, entering, so we set tmin = 1/4
At the right edge (Q-P)·nR = (20,6)·(10,0) = 200 > 0, exiting, so we set tmax = 3/4
Because tmin < tmax, we draw a line from (-5+(20)*(1/4), 3+(6)*(1/4)) to (-5+(20)*(3/4), 3+(6)*(3/4)).

Example 2 - Line Not Passing Through Window
Here P = (-8, 2) and Q = (2, 14). The next step is to decide whether each tvalue is entering or exiting by using the inner product.
(Q-P) = (2+8, 14-2) = (10,12)
At the top edge (Q-P)·nT = (10,12)·(0,10) = 120 > 0, exiting, so we set tmax = 8/12
At the left edge (Q-P)·nL = (10,12)·(-10,0) = -100 < 0, entering, so we set tmin = 8/10
Because tmin > tmax, we don't draw a line.

7.4 Viewing
When we define an image in some world coordinate system, to display that image we must map the image to the physical output device. This is a two stage process. For 3 dimensional images we must first determine the 3D camera viewpoint, called the View Reference Point (VRP), and its orientation. Then we project from 3D to 2D, since our display device is 2 dimensional. Next, we must map the 2D representation to the physical device. We will first discuss the concept of a Window on the world (WDC), then a Viewport (in NDC), and finally the mapping from WDC to NDC to PDC.

7.4.1 Window
When we model an image in World Device Coordinates (WDC) we are not interested in the entire world but only a portion of it.
Therefore we define the portion of interest, which is a polygonal area specified in world coordinates, called the "window".
Example: We want to plot x vs. cos(x) for x between 0.0 and 2Pi. Now cos x will be between -1.0 and +1.0, so we want the window as shown here.
The command to set a window is Set_window2( Xwmin, Xwmax, Ywmin, Ywmax ). So for the plot above use the following command:

    Set_window2( 0, 6.28, -1.0, +1.0 )

We can use the window to change the apparent size and/or location of objects in the image. Changing the window affects all of the objects in the image. These effects are called "Zooming" and "Panning".

a) Zooming
Assume you are drawing a house.
Now increase the window size and the house appears smaller, i.e., you have zoomed out:

    Set_window( -60, +60, -30, +30 )

If you decrease the window size the house appears larger, i.e., you have zoomed in:

    Set_window( -21, +21, -11, +11 )

So we can change the apparent size of an image, in this case a house, by changing the window size.

b) Panning
What about the position of the image?

    A. Set_window( -40, +20, -15, +15 )
    B. Set_window( -20, +40, -15, +15 )

Moving all objects in the scene by changing the window is called "panning".

7.4.2 Viewport
The user may want to create images on different parts of the screen, so we define a viewport in Normalized Device Coordinates (NDC). Using NDC also allows for output device independence. Later we will map from NDC to Physical Device Coordinates (PDC).
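The zooming and panning effects described above can be checked numerically. The sketch below assumes the usual linear, proportional mapping from the window onto the viewport; the function names `apparent_width` and `ndc_x` are hypothetical and are not part of any graphics package, and the 20-unit house width used in the note that follows is likewise an illustrative assumption.

```c
/* Apparent width, in NDC units, of an object that is obj_width world
   units wide when the window [xwmin, xwmax] is mapped linearly onto
   the viewport [xvmin, xvmax]. */
double apparent_width(double obj_width,
                      double xwmin, double xwmax,
                      double xvmin, double xvmax)
{
    double sx = (xvmax - xvmin) / (xwmax - xwmin);  /* horizontal scale factor */
    return obj_width * sx;
}

/* NDC x position of the world coordinate xw under the same mapping. */
double ndc_x(double xw,
             double xwmin, double xwmax,
             double xvmin, double xvmax)
{
    double sx = (xvmax - xvmin) / (xwmax - xwmin);
    return xvmin + (xw - xwmin) * sx;
}
```

With a viewport covering the whole range 0.0 to 1.0, a house assumed to be 20 world units wide spans about 0.17 of the display width under the window (-60, +60) and about 0.48 under the window (-21, +21), which is the zoom-out and zoom-in behaviour. Likewise, the world position x = 0 lands at about 0.67 for window A (-40, +20) and about 0.33 for window B (-20, +40), which is panning.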
69 Normalized Device Coordinates: Let the entire display surface have coordinate values 0.0 <= x,y <= 1.0 Command: Set_viewport2(Xvmin,Xvmax,Yvmin,Yvmax) Examples To draw in bottom 1/2 of screen Set_viewport2( 0.0, 1.0, 0.0, 0.5) To draw in upper right hand corner: Set_viewport2( 0.5, 1.0, 0.5, 1.0 ) We can also display multiple images in different viewports: Set_window( -30, +30, -15, +15); Set_viewport(0.0, 0.5, 0.0, 0.5); -- lower left Draw_house; Set_viewport(0.5, 1.0, 0.0, 0.5); -- lower right Draw_house; Set_viewport(0.0, 0.5, 0.5, 1.0); -- upper left Draw_house; Set_viewport( 0.5, 1.0, 0.5, 1.0); -- upper right Draw_house; This is gives the image as shown here. 70 7.4.3 2D Window To Viewport Transformation The 2D viewing transformation performs the mapping from the window (WDC) to the viewport (NDC) and to the physical output device (PDC). Usually all objects are clipped to the window before the viewing transformation is performed. We want to map a point from WDC to NDC, as shown below: We can see from above that to maintain relative position we must have the following relationship: X W X WMIN X V X VMIN X WMAX X WMIN X VMAX X VMIN YW YWMIN Y YVMIN V YWMAX YWMIN YVMAX YVMIN We can rewrite above as X V X VMIN X VMAX X VMIN X W X WMIN X WMAX X WMIN S X X W X WMIN X VMIN YV YVMIN YVMAX YVMIN YW YWMIN YWMAX YWMIN S Y YW YWMIN YVMIN where S X X VMAX X WMAX X VMIN Y YVMIN and S Y VMAX X WMIN YWMAX YWMIN 71 Note that Sx, "scaling" factors. Sy are If Sx = Sy the objects will retain same shape, else will be distorted, as shown in the example. 7.4.4 Viewport to Physical Device Transformation Now we need to transform to Physical Device Coordinates (PDC), which we can do by just multiplying the Normalized Device Coordinates (NDC) by the resolution in pixels. Xp = Xv * Xnum Yp = Yv * Ynum Note: Remember the aspect ratio problem, e.g., for CGA mode 6 (640 x 200) => 2.4 horizontal pixels = 1 vertical pixel. Therefore: 200 vertical pixels = 480 horizontal. 
pixels. So use Ynum = 199 (pixel rows 0 to 199) and Xnum = 479 (pixel columns 0 to 479).

There is also the problem that (0, 0) is at the upper left of the screen rather than the lower left, so the equations actually used are:

Xp = Xv * Xnum
Yp = Ynum - Yv * Ynum

As a check:

Xv = 0.0 => Xp = 0 (left)
Xv = 1.0 => Xp = 479 (right)
Yv = 0.0 => Yp = 199 - 0 = 199 (bottom)
Yv = 1.0 => Yp = 199 - 199 = 0 (top)

7.5 Let us Sum Up

In this lesson we have learnt about two-dimensional viewing and line clipping.

7.6 Lesson-end Activities

After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Need for line clipping
b) How the window to viewport transformation is done

7.7 Points for Discussion

Discuss the following
a) Liang-Barsky Line Clipping
b) Cohen-Sutherland Line Clipping

7.8 Model answers to “Check your Progress”

In order to check your progress, try to answer the following
a) Define Viewport
b) Define Window
c) Discuss the window to viewport transformation

7.9 References

1. Chapter 5 of William M. Newman, Robert F. Sproull, “Principles of Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 6 of Steven Harrington, “Computer Graphics – A programming approach”, McGraw Hill, 1987
3. Chapter 6 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C Version”, Pearson Education, 2007
4. Chapter 5 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
5. Chapter 5 of J.D. Foley, A. van Dam, S.K. Feiner, J.F.
Hughes, “Computer Graphics – principles and practice”, Addison-Wesley, 1997

LESSON – 8: GUI AND INTERACTIVE INPUT METHODS

CONTENTS
8.1 Aims and Objectives
8.2 Introduction
8.3 Modes of Input
8.3.1 Request Mode
8.3.2 Sample Mode
8.3.3 Event Mode
8.4 Classes of Logical Input
8.4.1 Locator
8.4.2 Pick
8.4.3 Choice
8.4.4 Valuator
8.4.5 String
8.4.6 Stroke
8.5 Software Techniques
8.5.1 Locating
8.5.2 Modular Constraints
8.5.3 Directional Constraints
8.5.4 Gravity Field Effect
8.5.5 Scales and Guidelines
8.5.6 Rubber-banding
8.5.7 Menus
8.5.8 Dragging
8.5.9 Inking-in
8.6 Let us Sum Up
8.7 Lesson-end Activities
8.8 Points for Discussion
8.9 Model answers to “Check your Progress”
8.10 References

8.1 Aims and Objectives

The aim of this lesson is to learn the concept of GUI and interactive methods of graphics.

The objectives of this lesson are to make the student aware of the following concepts
a) Modes of input
b) Classes of logical input
c) Software techniques

8.2 Introduction

Most of the programs written today are interactive to some extent. The days when programs were punched on cards and left in a tray to be collected and run by the computer operators, who then returned cards and printout to the users' pigeonholes several hours later, are now past. `Batch-processing', as this rather slow and tedious process was called, may be a very efficient use of machine time but it is very wasteful of programmers' time, and as the cost of hardware falls and that of personnel rises, installations move from batch to interactive use. Interactive use generally results in a less efficient use of the mainframe computer, but gives the programmer a much faster response time, and so speeds up the development of software. If you are not sharing a mainframe, but using your own microcomputer, then for most of the time the speed is limited by the human response time, not that of the computer.
When you come to graphics programs, there are some modes of graphics input in addition to the normal interactive input you have used before. For example, GKS has three modes of interactive input and six classes of logical input. These are described here, since they are typical of the type of reasoning required to write such programs.

8.3 Modes of Input

8.3.1 Request Mode

This is the mode you will find most familiar. The program issues a request for data from a device and then waits until it has been transferred. It might do this by using a `Read' statement to transfer characters from the keyboard, in which case the program will pause in its execution and wait until the data has been typed on the keyboard and the return key pressed to indicate the end of the input. Graphical input devices such as a mouse, cursor or digitizing tablet can also be programmed in this way.

8.3.2 Sample Mode

In this case the input device sends a constant stream of data to the computer and the program samples these values as and when it is ready. The excess data is overwritten and lost. A digitising tablet may be used in this way: it will continually send the latest coordinates of its puck position to the buffer of the serial port and the program may copy values from this port as often as needed.

8.3.3 Event Mode

This is similar to sample mode, but no data is lost. Each time the device transmits a value, the program must respond. It may do so by placing the value in an event queue for later processing, so that the logic of the program is very similar to sample mode, but there may also be some data values which cause a different response. This type of interrupt can be used to provide a very powerful facility.

8.4 Classes of Logical Input

8.4.1 Locator

This inputs the (x,y) coordinates of a position. It usually comes from a cursor, controlled either by keys or by a mouse, and has to be transformed from Device Coordinates to Normalised Device Coordinates and then to World Coordinates.
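The chain of transformations for a locator can be sketched by inverting the viewing equations of Lesson 7, namely Xv = Sx * (Xw - Xwmin) + Xvmin (and likewise for y) followed by Xp = Xv * Xnum, Yp = Ynum - Yv * Ynum. The following is only a minimal sketch in Python under stated assumptions: the names `Rect`, `device_to_ndc`, `ndc_to_world` and `locate` are hypothetical, a single viewport is assumed, and the default resolution values are the ones used in the CGA example of Lesson 7.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned rectangle (hypothetical helper): window or viewport limits."""
    xmin: float
    xmax: float
    ymin: float
    ymax: float


def device_to_ndc(xp, yp, xnum, ynum):
    """Invert Xp = Xv * Xnum and Yp = Ynum - Yv * Ynum."""
    return xp / xnum, (ynum - yp) / ynum


def ndc_to_world(xv, yv, window, viewport):
    """Invert Xv = Sx * (Xw - Xwmin) + Xvmin, and likewise for y."""
    # 1/Sx and 1/Sy: world units per unit of normalised device coordinate.
    inv_sx = (window.xmax - window.xmin) / (viewport.xmax - viewport.xmin)
    inv_sy = (window.ymax - window.ymin) / (viewport.ymax - viewport.ymin)
    xw = (xv - viewport.xmin) * inv_sx + window.xmin
    yw = (yv - viewport.ymin) * inv_sy + window.ymin
    return xw, yw


def locate(xp, yp, window, viewport, xnum=479, ynum=199):
    """Map a cursor position in pixels to world coordinates."""
    xv, yv = device_to_ndc(xp, yp, xnum, ynum)
    return ndc_to_world(xv, yv, window, viewport)


if __name__ == "__main__":
    window = Rect(-30, 30, -15, 15)
    full_screen = Rect(0.0, 1.0, 0.0, 1.0)
    print(locate(0, 199, window, full_screen))   # bottom-left pixel
    print(locate(479, 0, window, full_screen))   # top-right pixel
```

With the window of the house example and a viewport covering the whole screen, the bottom-left pixel maps back to the lower-left corner of the window and the top-right pixel to its upper-right corner.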
If you have several overlapping viewports, they must be ordered so that the one with the highest priority can be used to calculate these transformations. Each pixel position on the screen must correspond to a unique value in world coordinates. This correspondence need not remain the same throughout the running of the program, since the priorities of the viewports may be changed, but at every moment there must be a single unambiguous path from cursor position to world coordinates.

8.4.2 Pick

This allows the user to identify a particular object or segment from all those displayed on the screen. It is usually indicated by moving the cursor until it coincides with the required object, and then performing some other action such as pressing a mouse button or a key on the keyboard to indicate that the required object is now identified. The value transferred to the program is usually a segment identifier.

8.4.3 Choice

This works in a very similar manner to the pick input. You now have a limited set of choices, as might be displayed in a menu, and some means of indicating your choice. Only one of the limited list of choices is acceptable as input; any attempt to choose some other segment displayed on the screen will be ignored.

8.4.4 Valuator

This inputs a single real number by some means, the simplest method being to type it in from the keyboard.

8.4.5 String

This inputs a string of characters; again the simplest method is to type them in from the keyboard.

8.4.6 Stroke

This inputs a series of pairs of (x,y) coordinates. The combination of Stroke input and Sample mode from a digitising tablet is a very fast method of input.

Most of the terminals or microcomputers you will meet will have some form of cursor control for graphic input. You can write your programs using whichever combination of logical input class and mode is most convenient. Alternatively, you could ignore all forms of graphic input and merely rely on `Read' statements and data typed from the keyboard. The choice is yours.
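The difference between the three input modes of section 8.3 can be illustrated with a small simulation. This is only a sketch in Python, not GKS code: `SimulatedTablet`, `transmit`, `sample`, `next_event` and `request` are hypothetical names, and the "device" is simply fed values by the program itself.

```python
from collections import deque


class SimulatedTablet:
    """Hypothetical input device used to contrast sample and event mode."""

    def __init__(self):
        self.latest = None       # the single value visible in sample mode
        self.events = deque()    # the event queue used in event mode

    def transmit(self, x, y):
        """Called each time the device sends a new position."""
        self.latest = (x, y)          # the previous value is overwritten
        self.events.append((x, y))    # no value is lost

    def sample(self):
        """Sample mode: return only the most recent value."""
        return self.latest

    def next_event(self):
        """Event mode: return the oldest unprocessed value, or None."""
        return self.events.popleft() if self.events else None


def request(source):
    """Request mode: wait for one value; here the 'user' is an iterator."""
    return next(source)


if __name__ == "__main__":
    tablet = SimulatedTablet()
    for point in [(1, 1), (2, 2), (3, 3)]:
        tablet.transmit(*point)

    print("sample mode sees:", tablet.sample())
    print("event mode sees:", [tablet.next_event() for _ in range(3)])
    print("request mode sees:", request(iter([(9, 9)])))
```

In sample mode only the last of the three transmitted positions is available, while the event queue returns all three in the order in which they arrived.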
8.5 Software Techniques

8.5.1 Locating

Probably you have all used software in which the cursor is moved around the screen by means of keys or a mouse. The program may well give the impression that the cursor and mouse are linked together so that any movement of the mouse is automatically indicated by movement of the cursor on the screen. In fact, this effect is achieved by means of a graphics program which has to read in the new coordinates indicated by the mouse, delete the previous drawing of the cursor and then redraw it at the new position. This small program runs very quickly and gives the impression of a continuous process.

Usually this software also contains a test for input from the keyboard and when a suitable key is pressed, the current position of the cursor is recorded. This allows fast input of a number of points to form a picture or diagram on the screen. Some means of storing the data and terminating the program is also required.

Such points are recorded to the full accuracy of the screen, which has both advantages and disadvantages. If you are using a digitising tablet instead of a mouse, then the accuracy is even greater and the resulting problems even more extreme. You very seldom want to record information to the nearest 0.1mm; usually the nearest millimetre is quite sufficient. Problems arise when you want to select the same point a second time. Whatever accuracy you have chosen, you must be able to indicate the point to this accuracy in order to reselect it, as you might need to do if you had several lines meeting at a point. To achieve this more easily, software involving the use of various types of constraint may be used to speed up the input process.

8.5.2 Modular Constraints

In this case, you should imagine a grid, which may be visible or invisible, placed across the screen. Now, whenever you indicate a position with the cursor, the actual coordinates are replaced by the coordinates of the nearest point on the grid.
So to indicate the same point a second time, you merely have to get sufficiently close to the same grid point. Provided the grid allows enough flexibility to choose the shapes required in the diagram, this gives much faster input.

8.5.3 Directional Constraints

These can be useful when you want some lines to be in a particular direction, such as horizontal or vertical. You can write software to recalculate the coordinates so that a line close to vertical becomes exactly vertical. You can choose whether this is done automatically for every line within a few degrees of vertical or only applied when requested by the user. If the constraint is applied automatically, then you can choose how close the line must be to the required direction before it is moved and how the recalculation is computed. You may wish to move both vertices by a small amount, or one vertex by a larger amount, and if you are only moving one, you must specify some rule or rules to decide which one.

8.5.4 Gravity Field Effect

The name implies that the line should be visualised as lying at the bottom of a gravity well, and points close to the line slide down on to it. As each line is added to the diagram, a small area is defined which surrounds it. When a new point is defined which lies inside the area, its actual coordinates are replaced by the coordinates of the nearest point on the line.

There are two commonly used shapes for this area. In each case, along most of the line, two parallel lines are drawn, one on each side of the line and a small distance t from it. In the first shape, each vertex at the end of the line is surrounded by a semi-circle of radius t. In the second shape, each vertex is surrounded by a circle of radius greater than t, giving a dumb-bell shape to the entire area. This second case expresses the philosophy that users are much more likely to want to connect other lines to the vertices than to points along the line.
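The modular constraint of section 8.5.2 and the gravity field of section 8.5.4 both replace the indicated point by a nearby preferred point. The sketch below, in Python, is one possible way to code them; the function names are hypothetical, and the gravity field implemented is the first shape described above (parallel lines at distance t with semi-circular ends of radius t), which is the same as testing whether the point lies within distance t of the line segment.

```python
import math


def snap_to_grid(x, y, spacing):
    """Modular constraint: replace (x, y) by the nearest grid point.

    Note: Python's round() sends exact .5 ties to the even neighbour.
    """
    return round(x / spacing) * spacing, round(y / spacing) * spacing


def nearest_point_on_segment(p, a, b):
    """Return the point on the segment a-b that is closest to p."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:                # degenerate segment: a single point
        return a
    # Parameter of the perpendicular foot, clamped to the two end vertices.
    u = ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    u = max(0.0, min(1.0, u))
    return ax + u * dx, ay + u * dy


def gravity_snap(p, a, b, t):
    """Gravity field with semi-circular ends of radius t.

    If p is within distance t of the segment a-b, return the nearest
    point on the segment; otherwise return p unchanged.
    """
    q = nearest_point_on_segment(p, a, b)
    if math.hypot(p[0] - q[0], p[1] - q[1]) <= t:
        return q
    return p


if __name__ == "__main__":
    print(snap_to_grid(12.4, 7.6, 5))
    print(gravity_snap((5, 0.3), (0, 0), (10, 0), 0.5))   # slides onto the line
    print(gravity_snap((5, 2.0), (0, 0), (10, 0), 0.5))   # too far away: unchanged
```

The dumb-bell shape could be obtained from the same code by testing the distance to each end vertex against a larger radius before testing the distance to the segment.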
8.5.5 Scales and Guidelines

Just as you may use a ruler when measuring distances on a piece of paper, so you may wish to include software to calculate and display a ruler on the screen. The choice of scales and the way in which the ruler is positioned on the screen must be decided when the software is written.

8.5.6 Rubber-banding

This is the name given to the technique where a line connects the previous point to the present cursor position. This line expands or contracts like a rubber band as the cursor is moved. To produce this effect, the line must be deleted and re-drawn whenever the cursor is moved.

8.5.7 Menus

Many programs display a menu of choices somewhere on the screen and allow the user to indicate a choice of option by placing the cursor over the desired symbol. Alternatively, the options could be numbered and the choice could be indicated by typing a number on the keyboard. In either case, the resulting action will depend on the program.

8.5.8 Dragging

Many software packages provide a selection of commonly used shapes, and allow the user to select a shape and use the cursor to drag a copy of the shape to any required position in the drawing. Some packages continually delete and redraw the shape as it is dragged; others only redraw it when the cursor halts or pauses.

8.5.9 Inking-in

Another type of software imitates the use of a pen or paintbrush in leaving a track as it is drawn across the paper. These routines allow the user to set the width and colour of the pen and some also allow patterned `inks' in two or more colours. Then as the cursor is moved across the screen, a large number of coordinates are recorded and the lines joining these points are drawn as required.

All these techniques may be coded using a combination of graphical input and output. The success of such software depends very much on the user interface. If it is difficult or inconvenient to use, then as soon as something better comes along, the previous software will be ignored.
When designing your own graphical packages, you need to have a clear idea of the purpose for which your package is designed and also the habits and experience of the users for whom it is intended.

8.6 Let us Sum Up

In this lesson we have learnt about GUI and interactive input methods.

8.7 Lesson-end Activities

After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Explain about pointing
b) Explain about inking

8.8 Points for Discussion

Discuss the following
a) Locator
b) Stroke

8.9 Model answers to “Check your Progress”

In order to check your progress, try to answer the following
a) Modes of input
b) Classes of logical input

8.10 References

1. Chapters 11, 12, 13, 14 of William M. Newman, Robert F. Sproull, “Principles of Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 7 of Steven Harrington, “Computer Graphics – A programming approach”, McGraw Hill, 1987
3. Chapter 8 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C Version”, Pearson Education, 2007
4. Chapter 3 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
5. Chapters 8, 9, 10 of J.D. Foley, A. van Dam, S.K. Feiner, J.F.
Hughes, “Computer Graphics – principles and practice”, Addison-Wesley, 1997

UNIT – III

LESSON – 9: THREE DIMENSIONAL CONCEPTS

CONTENTS
9.1 Aims and Objectives
9.2 Introduction
9.3 Descriptions of 3D Objects
9.4 Three-dimensional Drawings
9.4.1 Intensity Cues
9.4.2 Hidden-line and hidden-surface removal
9.4.3 Kinetic Depth Effect
9.4.4 Perspective Projections
9.4.5 Stereographic Projection
9.5 Projections into Two-dimensional Space
9.5.1 Parallel Projections
9.5.2 Isometric Projection
9.5.3 Perspective Projections
9.6 Let us Sum Up
9.7 Lesson-end Activities
9.8 Points for Discussion
9.9 Model answers to “Check your Progress”
9.10 References

9.1 Aims and Objectives

The aim of this lesson is to learn the concept of three-dimensional graphics.

The objectives of this lesson are to make the student aware of the following concepts
a) Description of 3D objects
b) Issues in 3D drawings
c) Projections

9.2 Introduction

In the following sections, we shall discuss the topics as though we were dealing with idealised mathematical objects: points with position but no size, and lines and planes of zero thickness. Obviously this does not correspond to the real world, where even the thinnest plane is hundreds of atoms in thickness. However, the ideas can be developed without bothering about the effects of thickness or the need to specify whether we are discussing the centre of the line or one of its outer edges, and these extra complications can be considered later.

When we come to draw the resulting diagrams on paper or a computer screen, we shall have to move from the mathematical ideal to a pattern of lines of known thickness, or of pixels on a screen, which may be interpreted by those looking at them as a representation of the mathematical ideal. In addition, we are attempting to represent the idea of three-dimensional objects by a pattern of lines or dots in two dimensions.
There are certain well-known techniques (you can call them tricks if you are feeling unkind) which fool the human eye into imagining a solid three-dimensional object. These will also be discussed in this section.

When dealing with three-dimensional graphics, three coordinates are needed to specify a point uniquely. These are usually the coordinates (x,y,z) relating to a set of Cartesian axes, but this is not essential. For example, polar coordinates may be used and values quoted for latitude, longitude and radius. These will give a unique position for each point and once again three values are needed to specify it.

Right and Left Handed Axes

The three-dimensional Cartesian coordinates may be either right-handed or left-handed. To visualize this, you should hold up your right or left hand, with the thumb and first two fingers at right angles to each other, and this will demonstrate the direction of these axes.

When we come to use these coordinates on a computer terminal, most systems still use the two-dimensional version with the origin in the bottom left-hand corner of the screen, the x-axis running from left to right along the bottom of the screen and the y-axis from bottom to top up the left-hand side of the screen. A right-handed set of axes then has the z-axis coming out of the screen towards the user and a left-handed set has the z-axis going into the screen away from the user.

Most software then has some means of reducing the three-dimensional object to a two-dimensional drawing in order to view the object. Possible means are to ignore the z-value, thus giving a parallel (orthographic) projection onto the screen, to calculate a perspective projection onto the screen, or to take a slice through the object in the plane of the screen. The projections may or may not have any provision for hidden-line or hidden-surface removal.

9.3 Descriptions of 3D Objects
Consider a simple example of an object in three dimensions, namely the cube shown below:

Figure: Object in 3 Dimensions

To store sufficient information to produce this wire-frame drawing of the object, you need to store the coordinates of the vertices and some means of indicating which vertices are connected by straight lines. So one essential requirement is an array containing the coordinates of the vertices, and in the example given, the coordinates describe a cube centred on the origin. To obtain any other cube, you need to apply shift, scale and rotation transformations to the vertices before re-drawing the cube in its new position. These transformations will be discussed in the next section.

Array of coordinates [x,y,z] for each vertex.

Name   Index    x    y    z
A      1       -1   +1   -1
B      2       +1   +1   -1
C      3       +1   -1   -1
D      4       -1   -1   -1
E      5       -1   +1   +1
F      6       +1   +1   +1
G      7       +1   -1   +1
H      8       -1   -1   +1

The remaining information can be coded in many ways, one of which is as an array of edges. For each vertex, you store the index numbers of the other vertices to which it is joined, and in this case, when all edges are straight lines, there is no need to store information on the type of curve used to join the vertices.

Array of edges.

From   To
1       2    4    5
2      *1    3    6
3      *2    4    7
4      *1   *3    8
5      *1    6    8
6      *2   *5    7
7      *3   *6    8
8      *4   *5   *7

The first row of this array indicates that vertex 1(A) is joined to vertex 2(B), to vertex 4(D) and to vertex 5(E). The second row indicates that vertex 2(B) is joined to vertex 1(A), which is certainly correct, but wasteful, as it implies that this line is drawn twice. Examination of the array shows that the same is true of all other lines in the diagram. To save time by drawing each line only once, you need to draw only those cases where the order of the vertices is increasing, that is, you omit all the connections marked with a star when drawing the object.

These two arrays are sufficient to produce a wire-frame drawing such as that shown in the above figure.
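The two arrays and the rule of drawing only connections with increasing vertex index can be written down directly. The following Python sketch uses the vertex coordinates and edge lists given above; the names `VERTICES`, `NEIGHBOURS`, `edges_drawn_once` and `wireframe_segments` are hypothetical, and the actual drawing of each segment is left to whatever graphics package is in use.

```python
# Vertex array: index -> (x, y, z), using the 1-based indices of the text.
VERTICES = {
    1: (-1, +1, -1),  # A
    2: (+1, +1, -1),  # B
    3: (+1, -1, -1),  # C
    4: (-1, -1, -1),  # D
    5: (-1, +1, +1),  # E
    6: (+1, +1, +1),  # F
    7: (+1, -1, +1),  # G
    8: (-1, -1, +1),  # H
}

# Edge array: for each vertex, the indices of the vertices it is joined to.
NEIGHBOURS = {
    1: (2, 4, 5),
    2: (1, 3, 6),
    3: (2, 4, 7),
    4: (1, 3, 8),
    5: (1, 6, 8),
    6: (2, 5, 7),
    7: (3, 6, 8),
    8: (4, 5, 7),
}


def edges_drawn_once(neighbours):
    """Keep only connections whose vertex index increases (the unstarred ones)."""
    return [(i, j)
            for i, joined in sorted(neighbours.items())
            for j in joined
            if i < j]


def wireframe_segments(vertices, neighbours):
    """Return every edge exactly once as a pair of 3D end points."""
    return [(vertices[i], vertices[j]) for i, j in edges_drawn_once(neighbours)]


if __name__ == "__main__":
    for start, end in wireframe_segments(VERTICES, NEIGHBOURS):
        print(start, "->", end)
```

Storing each connection in both directions costs some space, but it makes it easy to find all the edges meeting at a given vertex; the `i < j` test then removes the duplicates at drawing time.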
However, if you wish to discuss solid faces, or use any form of shading or texturing, you will need to move to a more complex representation such as the boundary representation models used in geometric modelling. In this case, the following data structure is appropriate.

1) Vertex. Set of [x,y,z] coordinates.
2) Edge. Start and end vertices. Equation of curve joining them.
3) Loop. List of edges making up the loop. Direction in which they are drawn.
4) Face. List of loops forming boundary. Equation of surface. Direction of outward normal(s).
5) Shell. List of faces making up the shell.
6) Object. List of shells making up surface of object. One is identified as the `outer shell'.

When you come to consider the cube, this is a very simple object. All the edges are straight lines and all the faces are planes. If you choose to define the loops as the square outline made up of 4 edges, then each face has one loop as its boundary. Alternatively, you could have two edges to a loop and then each face would require two loops to specify its boundary. When you come to study geometric modelling, you will find that there are often several equally correct solutions to any given problem.

Vertices. The array of coordinates for the 8 vertices has already been described.

Edges. There are 12 edges, all straight lines joining pairs of vertices. They may be traversed in either direction.

Loops. You may choose to specify 6 loops, each consisting of 4 edges. As an example, the top face may be bounded by the loop consisting of edges [ AE, EF, FB, BA ]. You will again find that each edge is traversed twice, once in each direction, in the complete description of the object.

Faces. All solutions will have 6 faces and in one choice of loop, each face will be bounded by one loop. It is usual to adopt a standard convention connecting the direction of circulation of the loops and the directions of the outward-facing normals to the face.
In this example, each face will be a plane and will have a single direction for its normal.

Shell. This consists of the 6 faces.

Object. This is made up of the single shell and the volume contained within it.

9.4 Three-dimensional Drawings

When a three-dimensional object is represented by a two-dimensional drawing, various techniques may be used to indicate depth. We shall think initially of wire-frame drawings, but many of the same ideas can apply to shaded drawings of solid objects.

9.4.1 Intensity Cues

The points or lines closer to the viewpoint are drawn brighter on the screen, and with thicker lines when output to the plotter. A shaded drawing on the screen can adjust the intensity, pixel by pixel, giving a result similar to a grey-scale photograph.

9.4.2 Hidden-line and hidden-surface removal

Hidden lines may be removed or indicated with dotted lines, thus leading to an easier understanding of the shape.

9.4.3 Kinetic Depth Effect

Rotation of the object, combined with hidden-line removal, gives a very realistic effect. It is probably the best representation, but can only be produced at a special-purpose graphics workstation since it requires considerable computing power to carry out the hidden-line calculations in real time.

9.4.4 Perspective Projections

If we have some means of knowing the relative size of the objects, then the fact that the perspective transformation makes the closer objects appear larger will give a good effect of depth. If the objects are easily recognized, then knowledge about their relative sizes (e.g. a hill is usually larger than a house or a tree) will be interpreted as information about their distance from the viewer. It is only when we have a number of objects, such as cubes or spheres, which are completely separate in space and we have no information on their relative size, that the perspective transformation cannot be interpreted by the viewer in this way.
9.4.5 Stereographic Projection

In this case, we have two perspective projections, one for each eye. We need some method of ensuring that each eye sees only its own view, and then we can rely on the human brain to merge the views together and let us see a three-dimensional object.

One method is to produce separate views at the correct distance and scale for use with a stereographic viewer. This allows black-and-white or colour drawings to be seen in their true colour.

The other method is necessarily polychrome. It requires two perspective projections, one from each eye position. Let us assume the view for the left eye is drawn in one colour (e.g. blue) and the view for the right eye is drawn in another colour (e.g. red). Each eye must see only its own view. So if the view for the left eye is drawn in blue, and the right eye views the drawings through a blue filter, then the blue lines will be invisible to the right eye, since they will blend into the white background when viewed through a blue filter. Similarly, if the drawing for the right eye is in red, and the left eye has a filter of the same colour, then the drawing for the right eye will be invisible to the left eye.

In the figure, the eyes are assumed to be a distance 2e apart (usually about 3 inches) and the plane onto which the pictures are projected is a distance d from the eyes (frequently 12 to 15 inches). So for the left eye, we need to move the axes a distance e in the x-direction, then project onto the plane, and finally shift the drawing back again. Thus the projection for the left eye means that the point (x,y,z) becomes the point

((x+e)*d/z - e, y*d/z, 0).

For the right eye, the axes must be moved a distance -e and then the point (x,y,z) is projected onto the plane and becomes

((x-e)*d/z + e, y*d/z, 0).

When this has been done for all the vertices, they are joined up and the object is drawn in the appropriate colours.
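The two projection formulas above translate directly into code. The sketch below is in Python and assumes, as the formulas imply, that the origin lies midway between the eyes, that the viewer looks along the z-axis and that the picture plane is at distance d; the function name `stereo_pair` is hypothetical and the third (zero) coordinate is omitted from the result.

```python
def stereo_pair(point, e, d):
    """Return the left-eye and right-eye projections of a 3D point.

    point : (x, y, z) with z measured from the eyes (z must be non-zero)
    e     : half the distance between the eyes
    d     : distance from the eyes to the picture plane
    """
    x, y, z = point
    if z == 0:
        raise ValueError("point lies in the plane of the eyes (z = 0)")
    scale = d / z
    left = ((x + e) * scale - e, y * scale)    # shift by +e, project, shift back
    right = ((x - e) * scale + e, y * scale)   # shift by -e, project, shift back
    return left, right


if __name__ == "__main__":
    # Eyes 3 units apart (e = 1.5), picture plane 12 units away.
    print(stereo_pair((0.0, 0.0, 24.0), e=1.5, d=12.0))
    # A point lying in the picture plane has the same image for both eyes.
    print(stereo_pair((2.0, 1.0, 12.0), e=1.5, d=12.0))
```

A point in the picture plane (z = d) gives identical left and right images, while a point behind the plane gives images that are separated horizontally; it is this separation that the brain interprets as depth.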
9.5 Projections into Two-dimensional Space

We are dealing with objects defined in three-dimensional space, but all the graphics devices to which we have access are two-dimensional. This means that we require some way of representing (i.e. drawing) the three-dimensional objects in two dimensions in order to see the results. It is possible to calculate the intersection of any plane with the object and draw a succession of slices, but it is usually easier to understand what is going on if we calculate the projection of the three-dimensional object onto a given plane and then draw its projection.

There are two types of projection, parallel (usually orthographic) and perspective. We shall discuss both of these in the remainder of this chapter and also consider some other ways of giving the impression of a three-dimensional object in two dimensions. One of the most important is the removal of hidden lines or surfaces, and this is discussed in another section.

9.5.1 Parallel Projections

The simplest example of an orthographic projection occurs when you project onto the plane z=0. You achieve this by ignoring the values of the z-coordinates and drawing the object in terms of its x and y coordinates only.

In general, an orthographic projection is carried out by drawing lines normal to the specified plane from each of the vertices of the object, and the projected points are the intersections of these lines with the plane. Then the projected vertices are joined up to give the two-dimensional drawing. (It is also possible to project the drawing onto a plane which is not orthogonal (at right angles) to the direction of projection.)

Since calculating the intersections of general lines and planes is somewhat tedious, you may instead apply transformations to the objects so that the plane onto which you wish to project the drawing becomes the plane z=0. Then your final transformation into two dimensions is obtained by discarding the z-coordinates.
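The two descriptions of orthographic projection given above can be compared in a short sketch. It is written in Python with hypothetical function names; the general case uses the standard vector result that the foot of the normal from p to a plane through p0 with unit normal n is p - ((p - p0) . n) n, which is not spelled out in the text and is stated here as an assumption of the example.

```python
import math


def project_onto_z0(p):
    """Special case: project onto the plane z = 0 by discarding z."""
    return p[0], p[1]


def project_onto_plane(p, p0, n):
    """Orthographic projection of p onto the plane through p0 with normal n.

    The result is the 3D point where the normal dropped from p meets the plane.
    """
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("normal vector must be non-zero")
    unit = tuple(c / length for c in n)
    # Signed distance from p to the plane, measured along the unit normal.
    dist = sum((pc - qc) * uc for pc, qc, uc in zip(p, p0, unit))
    return tuple(pc - dist * uc for pc, uc in zip(p, unit))


if __name__ == "__main__":
    p = (1.0, 2.0, 3.0)
    print(project_onto_z0(p))
    # The same projection expressed as a general plane: through the origin,
    # with normal along the z-axis.
    print(project_onto_plane(p, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```

For the plane z=0 the general routine reproduces the x and y values obtained by simply discarding z, which is why transforming the object first and then dropping the z-coordinate is usually the more convenient route.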
A simple example of an orthographic projection is shown in the figure below.

Figure: Orthographic Projection

There are several types of `axonometric' or parallel projections commonly in use. Let us look at some of them:

Trimetric: Here the coordinate axes remain orthogonal when projected.
Dimetric: Two of the three axes are equally foreshortened when projected.
Isometric: All three axes are equally foreshortened when projected.

The diagram below shows an example of a surface drawn using an isometric projection. It uses a right-handed set of axes with the Ox axis to the right and inclined at 30 degrees to the horizontal, the Oy axis to the left and inclined at the same angle, while the Oz axis is vertically upwards. The Oz axis has been drawn at the edges of the picture to avoid over-writing the graph.

Figure: Example of an Isometric Drawing of a Surface

9.5.2 Isometric Projection

There are three stages in this projection.
1) Rotate through angle A about the Oy axis.
2) Rotate through angle B about the Ox axis.
3) Project onto the plane z=0.

After this transformation, the unit vectors along the three axes must still be equal in length. Writing points as column vectors, the combined transformation is

\[
T =
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos B & -\sin B & 0 \\ 0 & \sin B & \cos B & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos A & 0 & \sin A & 0 \\ 0 & 1 & 0 & 0 \\ -\sin A & 0 & \cos A & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} \cos A & 0 & \sin A & 0 \\ \sin B \sin A & \cos B & -\sin B \cos A & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\]

Apply this transformation to the three unit vectors, namely x^T = (1,0,0,1), y^T = (0,1,0,1) and z^T = (0,0,1,1), and you get the vectors (cosA, sinBsinA, 0, 1), (0, cosB, 0, 1) and (sinA, -sinBcosA, 0, 1). The magnitudes of the three vectors must be equal after the transformation, which gives us the following equations for the length L (the +1 comes from the homogeneous coordinate and is the same in each expression, so it does not affect the condition):

\[
L = \sqrt{\cos^2 A + \sin^2 A \, \sin^2 B + 1} = \sqrt{\sin^2 A + \sin^2 B \, \cos^2 A + 1} = \sqrt{\cos^2 B + 1}
\]

These equations can be re-arranged and solved for A and B.
Eventually they give:

B = 35.26439 degrees, since sin^2 B = 1/3 (that is, sinB = 1/SQRT(3)), and
A = 45 degrees, since cosA = 1/SQRT(2).

From these values of A and B we can calculate the transformation matrix

\[
T =
\begin{pmatrix} 0.7071 & 0.0 & 0.7071 & 0.0 \\ 0.4082 & 0.8165 & -0.4082 & 0.0 \\ 0.0 & 0.0 & 0.0 & 0.0 \\ 0.0 & 0.0 & 0.0 & 1.0 \end{pmatrix}
\]

which is the transformation matrix for an isometric projection.

9.5.3 Perspective Projections

Perspective projections make the more distant objects appear smaller than those closer to the viewpoint. They involve more calculation than parallel projections, but are often preferred for their greater realism. Note that a parallel projection may be considered as a special case of the perspective projection where the viewpoint is at infinity.

A perspective transformation produces a projection from a viewpoint E onto a given plane. Because you can always move the axes to ensure that the plane coincides with z=0 and the normal from the plane through the point E lies along the z-axis, you may restrict the discussion to this simple case.

Figure: Perspective Projection

The above figure shows an example of the perspective projection from the point E at (0,0,-d) to the z=0 plane. The projection is obtained by joining E to each vertex in turn and finding the intersection of this line with the plane z=0. The vertices are then joined by straight lines to give the wire-frame drawing of the object in the plane.

This method of drawing the object makes use of some of the well-known properties of perspective projections, namely that straight lines are projected into straight lines and facets (a facet is a closed sequence of co-planar line segments, a polygon in other words) are projected into facets. Parallel sets of lines may be projected into a set of parallel lines or into a set of lines meeting at the `vanishing point'.

We may consider the equation of the projection either as the result of the transformation matrix or derive it from the following diagram. Consider the diagram first.
This shows the y=0 plane with the Ox and Oz axes. The point of projection is E at the point (0,0,-d) on the z-axis, so the distance OE is of length d. The point P (with values x and -z) projects into the point P' while the point Q (with values X and Z) projects into the point Q'.

From the first set of similar triangles, we can see that d/x' = (d-z)/x and so x' = d*x/(d-z).

From the second set of similar triangles, we can see that d/X' = (d+Z)/X and so X' = d*X/(d+Z).

Thus if we are careful to take the correct sign for z in each case, we can quote the general rule:

x' = d*x/(d+z)

and we have a similar result for the y-coordinate when looking at the x=0 plane.

Now let us turn to the transformation matrix. In this case it becomes

\[
\begin{pmatrix} X \\ Y \\ Z \\ H \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1/d & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
\]

which gives the four equations

X = x, Y = y, Z = 0, H = (z+d)/d.

To get back to ordinary coordinates from the homogeneous ones, we need to make H=1 and so we have to divide throughout by (z+d)/d. This gives:

X = d*x/(z+d), Y = d*y/(z+d), Z = 0 and H = 1.

Hence we get the same expression from this derivation.

The closer the point of projection, E, is to the object, the more widely divergent are the lines from E to the vertices and the greater the change in size of the projected object. Conversely, the further away we move E, the closer the lines get to a parallel set and the smaller the change in size of the object. Thus we may think of the parallel projection as being an extreme case of perspective when the point of projection E is an infinite distance from both the object and the plane.

This perspective projection is an example of a `single-point perspective' and the consequence of this is shown in the next figure. One set of parallel lines forming edges of the cube meets at the vanishing point, while the other sets meet at infinity (i.e. they remain parallel). The transformation matrix for this projection may be written in the form given below, where r = 1/d.
\[
T_1 = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & r \\
0 & 0 & 0 & 1
\end{pmatrix}
\]

(With r in the last column, this matrix is written for a row vector [x y z 1] multiplied on the left of the matrix; it is the transpose of the column-vector form used in the derivation above.)

When we come to deal with two or three point perspectives, then we have two or three sets of parallel lines meeting at their respective vanishing points. The matrices for these are given below:

\[
T_2 = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & q \\
0 & 0 & 0 & r \\
0 & 0 & 0 & 1
\end{pmatrix}
\qquad
T_3 = \begin{pmatrix}
1 & 0 & 0 & p \\
0 & 1 & 0 & q \\
0 & 0 & 0 & r \\
0 & 0 & 0 & 1
\end{pmatrix}
\]

The following figure shows an example of a three-point perspective.

We now have enough information to specify the form of a general transformation matrix.

\[
T = \begin{pmatrix}
t_{11} & t_{21} & t_{31} & t_{41} \\
t_{12} & t_{22} & t_{32} & t_{42} \\
t_{13} & t_{23} & t_{33} & t_{43} \\
t_{14} & t_{24} & t_{34} & t_{44}
\end{pmatrix}
\]

This divides into four areas, each of which relates to a different form of transformation:

T1 (upper-left 3x3 block): shear, scale and rotate
T2 (right-hand column): usually zero
T3 (bottom row): shift
T4 (bottom-right element): equal to 1

T2 is zero for all affine transformations. When we are dealing with perspective projections, the number of non-zero elements in T2 tells us whether it is a one-, two- or three-point perspective.

9.6 Let us Sum Up
In this lesson we have learnt about three dimensional concepts, object representation and projections.

9.7 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Define three types of projections
b) How to draw 3D objects in 2D screen

9.8 Points for Discussion
Discuss the following
a) How to represent 3D objects
b) Perspective Projections

9.9 Model answers to “Check your Progress”
In order to check your progress try to answer the following
a) Issues in three-dimensional Drawings
b) Parallel Projections

9.10 References
1. Chapters 20, 21, 22 of William M. Newman, Robert F. Sproull, “Principles of Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 8 of Steven Harrington, “Computer Graphics – A programming approach”, McGraw Hill, 1987
3. Chapters 9, 10, 11, 12 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C Version”, Pearson Education, 2007
4. Chapter 7 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
5. Chapter 6 of J.D.
Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer Graphics – principles and practice”, Addison-Wesley, 1997
6. Computer Graphics by Susan Laflin. August 1999.

LESSON – 10: POLYGONS, CURVED LINES AND SURFACES
CONTENTS
10.1 Aims and Objectives
10.2 Introduction
10.2.1 Intersection Test
10.2.2 Angle Test
10.3 Linear Algorithm for Polygon Shading
10.4 Floodfill Algorithm for Polygon Shading
10.5 Polygon in detail
10.6 Plane Equations
10.7 Polygon meshes
10.8 Curved lines and surfaces
10.9 Let us Sum Up
10.10 Lesson-end Activities
10.11 Points for Discussion
10.12 Model answers to “Check your Progress”
10.13 References

LESSON 10 POLYGONS, CURVED LINES AND SURFACES

10.1 Aims and Objectives
The aim of this lesson is to learn the concept of polygons, curved lines and surfaces. The objectives of this lesson are to make the student aware of the following concepts

10.2 Introduction
A polygon is an area enclosed by a sequence of linear segments. There is no restriction on the complexity of the shape produced by these segments, but the last point must always be connected to the first one, giving a closed boundary. This differs from Polyline, which may produce an open curve with the first and last points being any distance apart. Since a polygon defines an area, it is possible to decide whether any other point is inside or outside the polygon, and there are two very simple tests to determine this.

The word polygon is a combination of two Greek words: "poly" means many and "gon" means angle. Along with its angles, a polygon also has sides and vertices. "Tri" means "three," so the simplest polygon is called the triangle, because it has three angles. It also has three sides and three vertices. A triangle is always coplanar, which is not true of many of the other polygons.

A regular polygon is a polygon with all angles and all sides congruent, or equal. Here are some regular polygons.

We can use a formula to find the sum of the interior angles of any polygon.
In this formula, the letter n stands for the number of sides, or angles, that the polygon has.

sum of angles = (n – 2) × 180°

Let's use the formula to find the sum of the interior angles of a triangle. Substitute 3 for n. We find that the sum is 180 degrees. This is an important fact to remember.

sum of angles = (n – 2) × 180° = (3 – 2) × 180° = 1 × 180° = 180°

To find the sum of the interior angles of a quadrilateral, we can use the formula again. This time, substitute 4 for n. We find that the sum of the interior angles of a quadrilateral is 360 degrees.

sum of angles = (n – 2) × 180° = (4 – 2) × 180° = 2 × 180° = 360°

Polygons can be separated into triangles by drawing all the diagonals that can be drawn from one single vertex. Let's try it with the quadrilateral shown here. From vertex A, we can draw only one diagonal, to vertex D. A quadrilateral can therefore be separated into two triangles.

If you look back at the formula, you'll see where the "n – 2" comes from: it gives the number of triangles in the polygon, and that number is multiplied by 180, the sum of the measures of all the interior angles in a triangle. How many triangles do you think a 5-sided polygon will have?

Here's a pentagon, a 5-sided polygon. From vertex A we can draw two diagonals, which separate the pentagon into three triangles. We multiply 3 times 180 degrees to find the sum of all the interior angles of a pentagon, which is 540 degrees.

sum of angles = (n – 2) × 180° = (5 – 2) × 180° = 3 × 180° = 540°

The GKS Fillarea function has the same parameters as polyline, but will always produce a closed polygon. The filling of this polygon depends on the setting of the appropriate GKS parameter. The FillAreaStyles are hollow, solid, pattern and hatch. The hollow style produces a closed polygon with no filling. Solid fills with a solid colour. Pattern uses whatever patterns are offered by the particular system.
Hatch will fill it with lines in one or two directions. Algorithms for hatching and cross-hatching are described in this section.

10.2.1 Intersection Test
Consider the situation illustrated in the figure below. To determine which of the points Pi (i = 1, 2 or 3) lie inside the polygon, it is necessary to draw a line from Pi in some direction and project it beyond the area covered by the polygon. If the number of intersections of the line from Pi with the sides of the polygon is an even number, then the point lies outside the polygon; if it is an odd number, the point lies inside. (Note that the line must start at Pi and may not extend back behind Pi: it is a half-infinite line from Pi.) This means that the triangle to the right of the line Q4 Q5 in the figure counts as outside the polygon. If you wished to make it doubly inside, you would have to introduce a parameter equal to the minimum number of intersections of all half-infinite lines through Pi.

[Figure: Intersection Test]

It is then easy to see that the figure gives values of 0, 2 or 4 for lines through P1, with a minimum of 0. For P2, there are values of 1 or 3, with a minimum of 1. All lines from P3 have a value of 2.

One possible problem arises when lines pass through a vertex. The line P2 Q5 must count the vertex Q5 as two intersections, while P2 Q2 must only count Q2 once. The easy way to avoid this problem is to omit all lines which pass through vertices. This still leaves plenty of lines to test the position.

10.2.2 Angle Test
Here the point Pi is joined to all the vertices and the sum of the angles Qk Pi Q(k+1) is calculated. If counter-clockwise angles are positive and clockwise ones are negative, then for a point Pi outside the polygon there will be some positive angles and some negative and the resulting sum will be zero. For a point Pi inside the polygon, the angles will be either all positive or all negative and the sum will have a magnitude of 360 degrees.
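As a rough illustration of the two tests, the following Python sketch counts intersections along a half-infinite line and, separately, sums the signed angles around the point. The function names are hypothetical, the ray is assumed to run horizontally towards increasing x, and the "half-open" comparison used to deal with vertices is a common substitute for the advice to omit lines through vertices, not something taken from this lesson.

```python
import math

def inside_by_intersections(point, polygon):
    """Intersection test: cast a half-infinite ray from point towards +x
    and report True when the number of edge crossings is odd."""
    px, py = point
    crossings = 0
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Half-open rule (an assumption): an edge is considered only if its
        # end points lie on opposite sides of the horizontal line y = py.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:          # crossing lies on the ray, not behind it
                crossings += 1
    return crossings % 2 == 1

def winding_turns(point, polygon):
    """Angle test: sum the signed angles Qk-P-Q(k+1) and return the number
    of full turns (the sum divided by 360 degrees, rounded)."""
    px, py = point
    total = 0.0
    n = len(polygon)
    for k in range(n):
        ax, ay = polygon[k][0] - px, polygon[k][1] - py
        bx, by = polygon[(k + 1) % n][0] - px, polygon[(k + 1) % n][1] - py
        # atan2(cross, dot) is the signed angle from vector a to vector b.
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    return round(total / (2 * math.pi))

def inside_by_angles(point, polygon):
    """Odd number of turns means inside, even means outside."""
    return winding_turns(point, polygon) % 2 == 1
```

For a simple square such as `[(0, 0), (4, 0), (4, 4), (0, 4)]`, both functions should agree that `(2, 2)` is inside and `(5, 2)` is outside.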
The next figure illustrates this for the same polygon as the previous figure and a point P inside the triangle (but outside the polygon).

[Figure: Angle Test]

Here the sum of angles is 2 * 360 degrees, thus implying that it is doubly inside the polygon. To give consistency with the Intersection Test, this test must be carefully worded. Having evaluated the sum of angles, it will be n * 360 degrees. If n is an even number, then the point P lies outside the polygon, while if n is an odd number, then P lies inside the polygon.

Once a unique method of deciding whether a point is inside or outside a polygon has been agreed, it then becomes possible to derive algorithms to shade the inside of a polygon. The two main methods here are linear, which is similar to shading the polygon by drawing parallel lines with a crayon, and floodfill, which is similar to starting with a blob of wet paint at some interior point and spreading it out to fill the polygon.

10.3 Linear Algorithm for Polygon Shading

[Figure: Hatching a Triangle]

This involves shading the polygon by drawing a series of parallel lines throughout the interior. The lines may be close enough to touch, giving a solid fill, or they may be a noticeable distance apart, giving a hatched fill. If you have two sets of hatched lines at right angles to each other, this gives a "cross-hatched" result.

The figure shows a triangle in the process of being hatch-filled with horizontal lines. For each horizontal scan-line, the following process must be applied.
1. Assume (or check) that the edge of the screen is outside the polygon.
2. Calculate the intersections Pi of this horizontal line with each edge of the polygon and store the coordinates of these intersections in an array.
3. Sort them into increasing order of one coordinate.
4. Draw the segments of the hatch-line from P1 to P2, P3 to P4 and so on. Do not draw the intervening segments.
5. Repeat this process for each scan-line.
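The steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the names `hatch_segments` and `hatch` are hypothetical, the polygon is assumed to be a list of (x, y) vertices, edges parallel to the scan-line are simply skipped, and a half-open comparison on y is assumed so that a scan-line passing exactly through a vertex still yields an even number of intersections.

```python
def hatch_segments(polygon, y):
    """Return the (x_start, x_end) spans of the horizontal scan-line at
    height y that lie inside the polygon."""
    xs = []
    n = len(polygon)
    for k in range(n):
        (x1, y1), (x2, y2) = polygon[k], polygon[(k + 1) % n]
        if y1 == y2:
            continue  # edge parallel to the scan-line: skipped in this sketch
        # Step 2: intersection of the scan-line with this edge, if any.
        # The half-open range is an assumption made to handle vertices.
        if min(y1, y2) <= y < max(y1, y2):
            xs.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
    xs.sort()                                  # Step 3: sort by x
    return list(zip(xs[0::2], xs[1::2]))       # Step 4: P1-P2, P3-P4, ...

def hatch(polygon, spacing):
    """Step 5: repeat for every scan-line, spacing units apart.
    Returns a list of line segments ((x0, y), (x1, y)) to be drawn."""
    ys = [p[1] for p in polygon]
    y = min(ys) + spacing / 2
    lines = []
    while y < max(ys):
        lines.extend(((x0, y), (x1, y)) for x0, x1 in hatch_segments(polygon, y))
        y += spacing
    return lines
```

Choosing a small `spacing` (one pixel) gives a solid fill, while a larger value gives a hatched fill; calling the same routine on the polygon with x and y swapped would give the second direction needed for cross-hatching.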
(Note that this does not rely on the lines being horizontal, although scan-lines parallel to one of the axes make the calculation of the intersection points very much easier.)

[Figure: Problems at Vertices]

The figure shows one problem with this approach. The scan-line s1 will work correctly since it has four distinct intersections, but the scan-line s2 has two coincident intersection points at the vertex Q6. This is detectable since the number of intersection points will be an odd number.

Looking at the vertices, you can see that moving from Q5 to Q6, y decreases as x decreases, and from Q6 to Q1, y increases as x decreases. In this case, the hatch lines have the equation y=constant and so this reversal in the direction of y indicates a vertex which must be included twice; it is consequently known as a Type 2 vertex. Q1, on the other hand, is a Type 1 vertex since y continues to increase when going from Q6 through Q1 to Q2. If the shading uses vertical lines (x = constant) then it is necessary to study the behaviour of x to determine the types of vertex.

If you have an odd number of intersections and only one of them coincides with a vertex, then it is usually safe to assume that this value needs to be included twice. This may save some time in your algorithm, and will shade most polygons successfully. The shortcut fails in the few cases where two Type 2 vertices appear in the same intersection list: the count of points is then even, so nothing looks wrong, yet at least one segment is drawn incorrectly. The full method, which tests the type of vertex whenever a vertex is included in the intersection list, shades these cases correctly as well.

The other problem case occurs when one of the sides of the polygon is parallel to the direction of shading. Mathematically this has an infinite number of intersection points, but computationally only the two end points should be entered in the array so that the whole line is shaded as part of the interior.
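The vertex classification described in this section can be expressed as a small helper. The sketch below assumes horizontal hatch lines (y = constant) and a hypothetical function name `vertex_type`; it takes the y-values of the previous vertex, the vertex itself and the next vertex. Sides parallel to the hatch direction are treated separately in the text, so the helper rejects them rather than guessing.

```python
def vertex_type(prev_y, y, next_y):
    """Classify a vertex met by a horizontal hatch line.

    Returns 1 when y keeps moving in the same direction through the vertex
    (enter the intersection once) and 2 when the direction of y reverses
    (enter the intersection twice)."""
    before = y - prev_y      # change in y along the incoming side
    after = next_y - y       # change in y along the outgoing side
    if before == 0 or after == 0:
        # A side parallel to the hatch lines needs the separate rule
        # given in the text (enter only its two end points).
        raise ValueError("adjacent side is parallel to the hatch direction")
    return 1 if before * after > 0 else 2
```

For example, a vertex reached while y is decreasing and left while y is increasing, as at Q6 in the figure, gives `vertex_type(3, 1, 2) == 2`, whereas a vertex through which y keeps increasing gives `vertex_type(1, 2, 5) == 1`. For vertical hatch lines the same function can be applied to the x-values instead.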
10.4 Floodfill Algorithm for Polygon Shading
This works in terms of pixels and is applied after the lines forming the boundary have been converted into pixels by a DDA algorithm. The background of the screen has one pixel-value, called "old-value", and the points forming the boundary have another, called "edge-value". The aim of the algorithm is to change all interior pixels from "old-value" to "new-value" (e.g. from black to red).

Assume the following software is available:
a) A function Read-Pixel(x,y) which takes device coordinates (x,y) and returns the value of the pixel at this position.
b) A routine Write-Pixel(x,y,p) which sets the new value p to the pixel at the position (x,y) in device coordinates.

Then, starting at the designated seed point, the algorithm moves out from it in all directions, stopping when an "edge-value" is found. Each pixel with value "old-value" is changed to "new-value". The recursive method stops when all directions have come to an "edge-value".

Because this method is applied to pixels on the screen or in the display buffer, it may run into problems arising from the quantization into pixels of a mathematical line which is infinitely thin and recorded to the full accuracy of a floating-point number within the computer.

[Figure: Intersecting Lines]

One such problem concerns the method of identifying an intersection of two lines. If you calculate it mathematically, then the equation will give a correct result unless the lines are parallel or nearly parallel. On the other hand, on some hardware it may be quicker to check whether the two lines have any pixels in common and this can be dangerously misleading in some cases. The previous figure shows two lines, one at an angle of 45° and the other at an angle of 135°, which cross near the centre of the diagram without having any pixels in common.
This type of problem is unlikely to affect the Floodfill routine given above, since the scan-lines move parallel to the x and y axes and the DDA algorithm described earlier ensures that every line has at least one pixel illuminated on each scan-line.

However the next figure illustrates another possible problem. Note that in a complex polygon with sides crossing each other, you will need one seed point in each section of the interior to floodfill the whole area. This also occurs in a polygon such as the one shown below, even though none of its sides, indicated by the lines in the figure, cross each other.

[Figure: Quantisation of Pixels]

Mathematically it is all one contiguous area and any tests on the equations of the sides for intersections will confirm this. However two of the lines are nearly parallel and very close together and consequently, although both lines are quite separate and distinct in their mathematical equations, they lead to the same row of pixels after quantisation. The scale of this figure has been enlarged so that the quantisation into pixels appears very coarse in order to emphasize this problem. This polygon will require two seed points in order to shade it completely using the Floodfill algorithm. A little thought will allow you to produce many other similar examples and these can readily be studied by drawing the polygons on squared paper and then marking in the pixel patterns which result.

This approach remains of interest in spite of its problems because some terminals provide a very fast hardware polygon fill from a given seed point. Similarly, some microcomputers provide a function to fill the interior of a triangle. To use this facility, you must first split the polygon into triangles. While this is easy for a convex polygon (one whose internal angles are all less than 180°), it is very much more difficult for the general case where you may have sides crossing each other and holes inside the polygon.
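The floodfill procedure described in this section can be sketched as below. The sketch makes several assumptions that are not in the text: the display buffer is modelled as a list of rows (`grid[y][x]`), only the four axis-parallel neighbours of each pixel are visited, and an explicit stack replaces the recursive formulation so that large areas do not exceed Python's recursion limit. `read_pixel` and `write_pixel` stand in for the Read-Pixel and Write-Pixel routines.

```python
def flood_fill(grid, seed, old_value, new_value):
    """Change every pixel reachable from seed that holds old_value to
    new_value. Returns the number of pixels changed."""
    if old_value == new_value:
        return 0                      # nothing to do, and avoids looping forever
    height, width = len(grid), len(grid[0])

    def read_pixel(x, y):
        return grid[y][x]

    def write_pixel(x, y, p):
        grid[y][x] = p

    stack = [seed]
    changed = 0
    while stack:
        x, y = stack.pop()
        if not (0 <= x < width and 0 <= y < height):
            continue                  # off the edge of the buffer
        if read_pixel(x, y) != old_value:
            continue                  # edge-value reached, or already filled
        write_pixel(x, y, new_value)
        changed += 1
        # Move out in the four axis-parallel directions.
        stack.extend([(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)])
    return changed
```

If two nearly parallel sides quantise to the same row of edge pixels, a call from one seed point stops at that row, and a second call with a seed on the other side is needed, which matches the two-seed situation discussed above.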
10.5 Polygon in detail

What is a Polygon?
A closed plane figure made up of several line segments that are joined together. The sides do not cross each other. Exactly two sides meet at every vertex.

Types of Polygons
Regular - all angles are equal and all sides are the same length. Regular polygons are both equiangular and equilateral.
Equiangular - all angles are equal.
Equilateral - all sides are the same length.
Convex - a straight line drawn through a convex polygon crosses at most two sides. Every interior angle is less than 180°.
Concave - you can draw at least one straight line through a concave polygon that crosses more than two sides. At least one interior angle is more than 180°.

Polygon Formulas
(N = number of sides and S = length from center to a corner)
Area of a regular polygon = (1/2) N sin(360°/N) S²
Sum of the interior angles of a polygon = (N - 2) x 180°
The number of diagonals in a polygon = (1/2) N (N - 3)
The number of triangles (when you draw all the diagonals from one vertex) in a polygon = (N - 2)

Polygon Parts
Side - one of the line segments that make up the polygon.
Vertex - point where two sides meet. Two or more of these points are called vertices.
Diagonal - a line connecting two vertices that isn't a side.
Interior Angle - angle formed by two adjacent sides inside the polygon.
Exterior Angle - angle formed by two adjacent sides outside the polygon.

Special Polygons
Special Quadrilaterals - square, rhombus, parallelogram, rectangle, and the trapezoid.
Special Triangles - right, equilateral, isosceles, scalene, acute, obtuse.

http://www.math.com/school/subject3/lessons/S3U2L2GL.html

Polygon Names
Generally accepted names

Sides   Name
n       N-gon
3       Triangle
4       Quadrilateral
5       Pentagon
6       Hexagon
7       Heptagon
8       Octagon
10      Decagon
12      Dodecagon

Names for other polygons have been proposed.
Sides    Name
9        Nonagon, Enneagon
11       Undecagon, Hendecagon
13       Tridecagon, Triskaidecagon
14       Tetradecagon, Tetrakaidecagon
15       Pentadecagon, Pentakaidecagon
16       Hexadecagon, Hexakaidecagon
17       Heptadecagon, Heptakaidecagon
18       Octadecagon, Octakaidecagon
19       Enneadecagon, Enneakaidecagon
20       Icosagon
30       Triacontagon
40       Tetracontagon
50       Pentacontagon
60       Hexacontagon
70       Heptacontagon
80       Octacontagon
90       Enneacontagon
100      Hectogon, Hecatontagon
1,000    Chiliagon
10,000   Myriagon

To construct a name, combine the prefix + suffix:

Sides   Prefix
20      Icosikai...
30      Triacontakai...
40      Tetracontakai...
50      Pentacontakai...
60      Hexacontakai...
70      Heptacontakai...
80      Octacontakai...
90      Enneacontakai...

Sides   Suffix
+1      ...henagon
+2      ...digon
+3      ...trigon
+4      ...tetragon
+5      ...pentagon
+6      ...hexagon
+7      ...heptagon
+8      ...octagon
+9      ...enneagon

Examples:
46 sided polygon - Tetracontakaihexagon
28 sided polygon - Icosikaioctagon

However, many people use the form n-gon, as in 46-gon or 28-gon, instead of these names.

10.6 Plane Equations
This is another useful way to describe planes. It is known as the cartesian form of the equation of a plane because it is in terms of the cartesian coordinates x, y and z. The working below follows on from the pages in this section on finding vector equations of planes and equations of planes using normal vectors.

The form Ax + By + Cz = D is particularly useful because we can arrange things so that D gives the perpendicular distance from the origin to the plane.

To get this nice result, we need to work with the unit normal vector. This is the vector of unit length which is normal to the surface of the plane. (There are two choices here, depending on which direction you choose, but one is just minus the other.) I'll call this unit normal vector n.

Next we see how using n will give us D, the perpendicular distance from the origin to the plane. In the picture below, P is any point in the plane. It has position vector r from the origin O.
Now we work out the dot product of r and n. This gives us r.n = |r||n|cos A. But |n| = 1 so we have r.n = |r|cos A = D. This will be true wherever P lies in the plane.

Next, we split both r and n into their components. We write r = xi + yj + zk and n = n1i + n2j + n3k. Therefore

r.n = (xi + yj + zk) . (n1i + n2j + n3k) = D

so

r.n = xn1 + yn2 + zn3 = D.

We see that n1, n2 and n3 (the components of the unit surface normal vector) give us the A, B and C in the equation Ax + By + Cz = D.

A numerical example
I've put this in here so that you can see everything actually happening and see how it ties back to the earlier pages in this section. We start with the plane I show below. We'll let s = i - 6j + 2k and t = 2i - 2j - k be two vectors lying in the plane. We'll take m, the position vector of the known point M in the plane, to be m = 2i + 3j + 5k. P is any point in the plane, with OP = r = xi + yj + zk.

First, we find N, a normal vector to the plane, by working out the cross product of s and t. This gives s x t = 10i + 5j + 10k = N. The length of this vector is given by the square root of (10² + 5² + 10²), which is 15. So the unit normal vector, n, is given by

n = 1/15 (10i + 5j + 10k) = 2/3 i + 1/3 j + 2/3 k.

Now we use n.r = n.m = D to get the equation of the plane. This gives us

(2/3 i + 1/3 j + 2/3 k).(xi + yj + zk) = (2/3 i + 1/3 j + 2/3 k).(2i + 3j + 5k)

or

2/3 x + 1/3 y + 2/3 z = 4/3 + 3/3 + 10/3 = 17/3.

The perpendicular distance of this plane from the origin is 17/3 units.

So what would have happened if we had found the equation of the plane using the first normal vector we found? Using N.r = N.m gives

(10i + 5j + 10k).(xi + yj + zk) = (10i + 5j + 10k).(2i + 3j + 5k)

or

10x + 5y + 10z = 20 + 15 + 50 = 85.

It is exactly the same equation as the one we found above except that it is multiplied through by a factor of 15, and 85 gives us 15 times the perpendicular distance of the origin from the plane.
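The arithmetic in the numerical example can be checked with a few lines of Python. This is only a sketch: the helper names `cross`, `dot` and `plane_from_vectors` are hypothetical, vectors are assumed to be plain 3-tuples, and the sign of D depends on which of the two unit normals the cross product happens to produce.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(p * q for p, q in zip(a, b))

def plane_from_vectors(s, t, m):
    """Return (A, B, C, D) for the plane Ax + By + Cz = D that contains the
    directions s and t and passes through the point with position vector m.
    (A, B, C) is a unit normal, so |D| is the distance from the origin."""
    N = cross(s, t)                      # a normal vector, not yet unit length
    length = math.sqrt(dot(N, N))
    n = tuple(c / length for c in N)     # unit normal
    return (*n, dot(n, m))

# The worked example: s = i - 6j + 2k, t = 2i - 2j - k, m = 2i + 3j + 5k
A, B, C, D = plane_from_vectors((1, -6, 2), (2, -2, -1), (2, 3, 5))
```

With these inputs the function should reproduce, up to floating-point rounding, the unit normal 2/3 i + 1/3 j + 2/3 k and the distance 17/3 found by hand.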
Also, are you confident that you will get the same equation for the plane if you start out with the position vector of a different known point in it?

The point L also lies in this plane. Its position vector l is given by l = 7i - 7j + 5k. Check that working with l instead of m does give you the same equation for the plane. Geometrically, you can see that this will be so. L and M are both just possible positions of P, so that both n.l and n.m give the distance D.

Try one for yourself!
The two vectors s = 4i + 3k and t = 8i - j + 3k lie in plane Q. The point M also lies in Q and its position vector from the origin is given by m = 2i + 4j + 7k. Show that the perpendicular distance of the origin to this plane is 2 units and find its equation.

The general case
This is how the working goes with letters taking the place of the numbers we have used in the numerical example. m is the position vector of the known point in the plane. n is the unit surface normal to the plane. We'll let m = x0i + y0j + z0k and n = Ai + Bj + Ck. The position vector of the general point P in the plane is given by r = xi + yj + zk, where the values of x, y and z vary according to the particular P chosen.

Now we use n.r = n.m = D to write down the equation of the plane. This gives us

(Ai + Bj + Ck) . (xi + yj + zk) = (Ai + Bj + Ck) . (x0i + y0j + z0k) = D

so

Ax + By + Cz = Ax0 + By0 + Cz0 = D

or, if you prefer, you can write

A(x-x0) + B(y-y0) + C(z-z0) = 0.

If you have found a normal vector which is not of unit length, you will first need to scale it down. Suppose you have found N = N1i + N2j + N3k. Then the length of N is given by

\[ |N| = \sqrt{N_1^2 + N_2^2 + N_3^2} \]

and n, the unit normal vector, is given by

\[ n = \frac{N}{|N|} = \frac{N_1 i + N_2 j + N_3 k}{\sqrt{N_1^2 + N_2^2 + N_3^2}} \]

Now, putting n = Ai + Bj + Ck, we have

\[ A = \frac{N_1}{|N|}, \qquad B = \frac{N_2}{|N|}, \qquad C = \frac{N_3}{|N|} \]

10.7 Polygon meshes
A polygon mesh or unstructured grid is a collection of vertices and polygons that defines the shape of a polyhedral object in 3D computer graphics.
Meshes usually consist of triangles, quadrilaterals or other simple convex polygons, since this simplifies rendering, but they can also contain objects made of general polygons with optional holes.

[Figure: Example of a triangle mesh representing a dolphin]

Examples of internal representations of an unstructured grid:
a) Simple list of vertices with a list of indices describing which vertices are linked to form polygons; additional information can describe a list of holes
b) List of vertices + list of edges (pairs of indices) + list of polygons that link edges
c) Winged edge data structure

The choice of the data structure is governed by the application: it's easier to deal with triangles than general polygons, especially in computational geometry. For optimized algorithms it is necessary to have fast access to topological information such as edges or neighboring faces; this requires more complex structures such as the winged-edge representation.

10.8 Curved lines and surfaces
Curved surfaces are one of the most popular ways of implementing scalable geometry. Games applying curved surfaces look fantastic. UNREAL's characters look smooth whether they are a hundred yards away or coming down on top of you. QUAKE 3: ARENA screen shots show organic levels with stunningly smooth, curved walls and tubes.

There are a number of benefits to using curved surfaces. Implementations can be very fast, and the space required to store the curved surfaces is generally much smaller than the space required to store either a number of LOD models or a very high detail model. The industry demands tools that can make creation and manipulation of curves more intuitive.

A Bezier curve is a good starting point, because it can be represented and understood with a fair degree of ease. To be more specific, we choose cubic Bezier curves and bicubic Bezier patches for the reason of simplicity.

Bezier Curves
A cubic Bezier curve is simply described by four ordered control points, p0, p1, p2, and p3.
It is easy enough to say that the curve should "bend towards" the points. It has three general properties:
1. The curve interpolates the endpoints: we want the curve to start at p0 and end at p3.
2. The control points have local control: we'd like the curve near a control point to move when we move the control point, but have the rest of the curve not move as much.
3. The curve stays within the convex hull of the control points. It can be culled against quickly for visibility culling or hit testing.

A set of functions, called the Bernstein basis functions, satisfy the three general properties of cubic Bezier curves:

\[ B_i^n(u) = \binom{n}{i} u^i (1-u)^{n-i}, \qquad p(u) = \sum_{i=0}^{n} B_i^n(u)\, p_i \]

If we were considering general Bezier curves, we'd have to calculate n choose i. Since we are only considering cubic curves, though, n = 3, and i is in the range [0,3]. Then, we further note that n choose i is the ith element of the nth row of Pascal's triangle, {1,3,3,1}. This value is hardcoded rather than computed in the demo program.

Bezier Patches
Since a Bezier curve was a function of one variable, f(u), it's logical that a surface would be a function of two variables, f(u,v). Following this logic, since a Bezier curve had a one-dimensional array of control points, it makes sense that a patch would have a two-dimensional array of control points. The phrase "bicubic" means that the surface is a cubic function in two variables: it is cubic along u and also along v. Since a cubic Bezier curve has a 1x4 array of control points, a bicubic Bezier patch has a 4x4 array of control points.

To extend the original Bernstein basis function into two dimensions, we evaluate the influence of all 16 control points:

\[ p(u,v) = \sum_{i=0}^{3} \sum_{j=0}^{3} B_i^3(u)\, B_j^3(v)\, p_{ij} \]

The extension from Bezier curves to patches still satisfies the three properties:
1. The patch interpolates p00, p03, p30, and p33 as endpoints.
2. Control points have local control: moving a point over the center of the patch will most strongly affect the surface near that point.
3. The patch remains within the convex hull of its control points.
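To make the evaluation of cubic curves and bicubic patches concrete, here is a minimal Python sketch. It is not the demo program mentioned above: the names `bernstein3`, `bezier_point` and `patch_point` are hypothetical, control points are assumed to be tuples of coordinates, and the patch is assumed to be stored as a 4x4 nested list `p[i][j]` with i running along u and j along v. The binomial coefficients are hardcoded as {1, 3, 3, 1}, as the text suggests.

```python
BERNSTEIN_COEFFS = (1, 3, 3, 1)   # row 3 of Pascal's triangle

def bernstein3(i, u):
    """Cubic Bernstein basis function B_i(u) for i in 0..3."""
    return BERNSTEIN_COEFFS[i] * u ** i * (1 - u) ** (3 - i)

def bezier_point(p, u):
    """Point on the cubic Bezier curve with control points p[0..3]
    at parameter u in [0, 1]."""
    dim = len(p[0])
    return tuple(sum(bernstein3(i, u) * p[i][d] for i in range(4))
                 for d in range(dim))

def patch_point(p, u, v):
    """Point on the bicubic Bezier patch with 4x4 control points p[i][j]
    at parameters (u, v); all 16 control points contribute."""
    dim = len(p[0][0])
    return tuple(
        sum(bernstein3(i, u) * bernstein3(j, v) * p[i][j][d]
            for i in range(4) for j in range(4))
        for d in range(dim))

# Example: a curve whose ends are (0, 0) and (1, 0)
ctrl = [(0, 0), (0, 1), (1, 1), (1, 0)]
start, end = bezier_point(ctrl, 0), bezier_point(ctrl, 1)
```

Evaluating at u = 0 and u = 1 returns the first and last control points, which is the endpoint-interpolation property; the same holds for the four corner points of a patch at (u, v) = (0, 0), (0, 1), (1, 0) and (1, 1).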
10.9 Let us Sum Up
In this lesson we have learnt about
a) Polygon surfaces
b) Curved lines and surfaces

10.10 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Intersection Test
b) Angle Test

10.11 Points for Discussion
Discuss the following
a) Importance of Polygon surface in Computer Graphics
b) Discuss about drawing curved lines

10.12 Model answers to “Check your Progress”
In order to check your progress try to answer the following questions
a) Plane Equations
b) Polygon meshes

10.13 References
1. Chapter 21 of William M. Newman, Robert F. Sproull, “Principles of Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 11 of Steven Harrington, “Computer Graphics – A programming approach”, McGraw Hill, 1987
3. Chapter 10 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C Version”, Pearson Education, 2007
4. Chapter 12 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
5. Chapter 11 of J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, “Computer Graphics – principles and practice”, Addison-Wesley, 1997

LESSON – 11: SURFACE DETECTION METHODS
CONTENTS
11.1 Aims and Objectives
11.2 Introduction
11.3 Definition of an H. S. R. Algorithm
11.4 Taxonomy of Hidden Surface Removal Algorithms
11.5 Back face detection
11.6 Depth-Buffer Method
11.7 Let us Sum Up
11.8 Lesson-end Activities
11.9 Points for Discussion
11.10 Model answers to “Check your Progress”
11.11 References

LESSON 11 SURFACE DETECTION METHODS

11.1 Aims and Objectives
The aim of this lesson is to learn the concept of surface detection methods. The objectives of this lesson are to make the student aware of the following concepts
a) classification of surface detection algorithms
b) Back face detection and
c) Depth buffer algorithms

11.2 Introduction
Computer Graphics attempts to represent objects in the general three-dimensional universe.
Most objects are not transparent and so we are interested in their outer surfaces, which have properties such as shape, colour and texture which affect the graphical representation. A wire-frame drawing of a solid object is less realistic because it includes parts of the object which are hidden in reality, and this generates a need for some form of hidden-line or hidden-surface removal. It is important to realise that there is no single algorithm which works equally well in all cases. Most algorithms achieve greater speed and efficiency by taking the format of the data into account and this automatically restricts their use. Since the amount of data needed to store the position of every point on the surface of even quite a small object is impossibly large, we have to make some simplifying assumptions. The choice of these simplifications will decide the form of data structure used to store the objects and will also restrict the choice of hidden-surface algorithm available. A typical set of simplifying assumptions might be those given below. a) Divide the surface of the object into a number of faces surrounded by "boundary curves" or "contours". The contours may be any closed curves and the faces may be curved, so some means of specifying the equations of the surfaces is needed. b) Restrict the description to allow only flat or planar faces. The contours must now be closed polygons in the plane. (Since two planes must intersect in a straight line, an object without any holes must have its edge curves made up of straight lines.) c) Subdivide the polygons until they are all convex. d) Subdivide the polygons until the object is described in terms of triangular facets. At each simplification, the amount of data needed to describe one face is reduced. This should also reduce the time taken for the related calculations. However some objects require many more faces to give an acceptable approximation to the object. 
A simple 116 example of an object which requires very many triangular facets to give an acceptable approximation is a sphere. 11.3 Definition of an H. S. R. Algorithm One of the earliest attempts to produce a rigorous definition of these algorithms occurs in the text by Giloi. Since it is still relevant, let us consider it: An hidden-surface algorithm is said to consist of five components, namely: O S I T M The set of objects in 3D space (input data). The set of objects in 2D space (results). The set of intermediate representations (workspace). Some algorithms require little or no intermediate storage. These representations, if used, may be in either 2D or 3D. The set of transition functions, usually implemented as subroutines or procedures. The following five transition functions will be required in some form. PM = Projective Mapping. IS = Inter-Section Function. CT = Containment Test. DT = Depth Test. VT = Visibility Test. Strategy Function or Overall Method. This specifies the order in which the transition functions are applied to the input data to produce the results and it may also include instructions to display the results. In fact this may be an over simplification, since we frequently find that the transition functions are used in combination with each other. For example, to decide whether one point in 3D space is hidden by another, it will usually be necessary to apply a combination of both Projective Mapping and Depth Test. Projective Mapping may be either Perspective or Orthogonal. Consider the situation where we have a view point V on one side of the objects to be drawn and are projecting them on to a plane on the far side of the objects. Now assume we have rotated the entire figure so that the plane is z=0, the viewpoint is on the z-axis (coordinates 0,0,Z), and all other z values lie between 0 and Z. 
Now let us consider two overlapping graphical elements, to determine which of the two is closer to the viewpoint and hence which covers the other when the diagram is drawn. In the following discussion, it is assumed that the graphical elements are plane polygonal facets and each vertex of one facet is tested against all vertices of the other.

a) Perspective projection.

Consider any two points P1 and P2. P2 is hidden by P1 if and only if
(i) V, P1 and P2 are collinear, and
(ii) P1 is closer to V, i.e. VP1 < VP2.

Consider the test-point P1 (usually a vertex of the first facet F1) and facet F2. Connect the viewpoint V and test-point P1 and calculate P2, the intersection of the line VP1 (continued if necessary) and the facet F2. Calculate the lengths of VP1 and VP2. If VP2 is greater than VP1, then P1 is not hidden by F2. If VP2 is less than VP1, then P1 is hidden by F2. If VP1 = VP2, then the two points coincide and we may choose which of them to consider visible.

b) Orthogonal projection.

Again the viewer is at the viewpoint V, looking down on the plane z=0, but now all the projecting lines are parallel. Indeed for an orthogonal projection, they are all perpendicular to the plane z=0 and so parallel to the z-axis. So the point P2 is hidden by P1 if and only if
(i) the x and y coordinates of P1 and P2 are equal, and
(ii) the z-coordinate of P1 is greater than that of P2, i.e. P1(z) > P2(z).

This is equivalent to moving the point V a very large distance from the plane z=0 ("V tends to infinity").

Consider the projection onto the plane z=0 and use the values of z to assign priorities to the faces. In this case, we wish to compare facets F1 and F2. After projection onto the plane z=0, F1 is projected onto S1 and F2 is projected onto S2. The intersection of S1 and S2 is called S. If S is empty, then the projections do not overlap and the priority is irrelevant. Any point (x,y,0), lying in S, corresponds to the point (x,y,z1) in F1 and the point (x,y,z2) in F2.
If z1 is greater than z2 for all these points, then "F1 has priority over F2". However, if this is true for some points in S and false for others, then the two facets intersect each other and we cannot assign priorities. It will be necessary to calculate the line of intersection of the two facets and split one of them along this line. If F1 is split into F1a and F1b, then we can number them so that F1a has priority over F2 and F2 has priority over F1b. Using these priorities, it is possible to get a unique ordering, showing which facets lie in front of which others, and use this to provide the correct output to draw the visible facets.

Intersection Function

Note that in each of these cases, it was necessary to discuss the intersection of the projection of a vertex of one facet and the projection of the other facet. This may be dealt with by use of the Intersection Function, which defines how to calculate the intersection of two graphic elements. Other examples are the intersection of two lines, the intersection of two segments (lines of fixed length) or the intersection of a line and a plane. In this actual case, it is probably more relevant to use a Containment Test.

Containment Test

The Containment Test considers the question "Does the point P lie inside the polygon F?" and returns the result "true" or "false". It is usually applied after projection into two dimensions and so the methods discussed in that section are immediately applicable. Either the angle test or the intersection test may be used.

Visibility Tests

So far, we have discussed the special case of one or more objects defined as plane facets and considered whether or not one of the facets obscures another. This is a very slow process, especially when all of the very large number of facets have to be compared with all the others. There is one very simple consideration which will about halve the number of facets to be considered.
If we assume that the facets form the outer surfaces of one or more solid objects, then those facets on the back of the object (relative to the viewing position) cannot be seen, and so a test to identify these will remove them from the testing early in the process. This uses a "Visibility Test" which is applied to solid objects, to distinguish between the potentially visible "front faces" and the invisible "back faces". If our picture consists of a single convex object, then all the potentially visible faces are visible and the object may be drawn very quickly and easily.

If a perspective projection is being used, then the "line of sight" is the line from the viewpoint V to the point on the surface. However, if a parallel projection is being used, then the relevant line is one parallel to the viewing direction which passes through the point on the surface. In either case, let this direction be denoted by the vector d. The surface normal, denoted by n, is the outward-pointing vector at this point, perpendicular to the surface. To decide whether the plane is potentially visible or always invisible, it is necessary to consider the angle between d and n, and we may use the dot product d.n to decide this. Let A be the angle between these vectors. If A is greater than 90 degrees, then the surface is potentially visible, otherwise the surface is invisible. If both vectors are scaled to have unit length, so that we are dealing with direction cosines, then the dot product gives the value of cos A. Thus the face is potentially visible if the dot product is negative.

In the above figure, the parallel projection has d = [1,0,0] and face A has outward normal n1 = [-1,2,0] while face B has outward normal n2 = [1,0,0]. The dot product for the visible face A has the value -1 and the dot product for the invisible face B has the value 1. When the dot product is zero, the face is "edge-on" to the viewing direction and may be omitted.

Strategy Function or Overall Method
In discussing the method used by any Hidden-surface Removal Algorithm, we shall also need to discuss the input data (O) and the results (S), and indeed it may be useful to classify these algorithms according to the form of input data they can handle. The set (I) of intermediate representations may be important in deciding practical aspects such as the total amount of storage needed by the algorithm, and will be closely connected with the precise form of some of the transition functions used.

Let us consider a number of algorithms, classified according to the form of their input data. Because the same general method may have considerable variation in the details, we shall tend to get groups of algorithms which differ only in their fine detail.

11.4 Taxonomy of Hidden Surface Removal Algorithms

The text by Giloi includes a classification based on the form of the input data and provides examples of algorithms for some of these. This classification has been simplified slightly (four classes reduced to three) and the algorithms identified. It is not complete: other algorithms do not fall into these categories, and other methods of classifying these algorithms are also possible.

a) Class One
These include `solids' made up of plane polygonal facets. The resulting object may be represented as a set of `contour lines' and `lines of intersection' or it may be output as a shaded object, e.g. Appel's method, Watkins' method, or Encarnacao's Priority Method (requires input data as triangles).

b) Class Two
These are `surfaces' made up of curved faces. The resulting object is represented as a net of grid lines, e.g. Encarnacao's Scan-Grid Method.

c) Class Three
These are general objects defined analytically. No example of an algorithm for this class is given in Giloi.

Alternatively, the methods may be grouped according to the type of method. This gives the following:

a) Scan-line Methods
These include Watkins' method and a number of others.
These work in terms of scan lines, with the pixel colour at each point along the line calculated and output. If there is enough storage to hold a copy of the entire screen, instead of just one line across it, we may use a `Z-buffer' algorithm, in which the z-value corresponding to each pixel is used to decide on the colour of that pixel. The polygons may be added in any order, but the z-value is used to decide whether a pixel should be changed or not as the next polygon is added. Again coherence may be used to reduce the number of tests needed.

When we come to consider the special case of drawing an isometric projection drawing of a surface (fishnet output), one of the methods of deciding which parts of the drawing should be visible is the `Template method'. Here the order of output is chosen to work from the front of the surface and calculate each section of the drawing in turn. In parallel with this, for each x-value on the screen, a y-value is stored indicating the largest value currently output. This builds up a `template' of the area of screen currently covered by the drawing. New lines are only drawn if they lie outside this area (i.e. if the new y-values are greater than those previously stored). This allows fast, accurate output of the drawing of the surface.

b) List-Priority Methods
Depth-sort or Painters' Algorithm. This relies on the polygons for output being sorted into order, so that the polygon furthest from the viewer is output first. It also assumes that output of a second polygon on top of the first will overwrite it and none of the earlier output will remain. This is true on most screens, but not on most printers or plotters. It is similar to the method used by painters in situations where the latest coat of paint conceals the ones below. This may also be used for the case where the output is an image of the isometric projection of a surface.
In this case, it is easy to output the patches of the surface with those furthest from the viewpoint being output first and the later ones drawn on top. Another method of this type is Encarnacao's Priority Method.

c) Ray-tracing Methods
These use the idea of dropping a line, or ray, from the viewpoint (or eye of the viewer) onto parts of the objects and on to the viewing plane. Appel's method of hidden surface removal introduces the concept of `quantitative invisibility' (counting the number of faces between the surface being tested and the viewer) and uses coherence to reduce the number of tests to give the correct output.

d) Methods for Curved Surfaces
One example of such a method is Encarnacao's Scan-Grid Method.

11.5 Back face detection

A fast and simple object-space method for identifying the back faces of a polyhedron is based on the “inside-outside” tests. A point (x,y,z) is ‘inside’ a polygon surface with plane parameters A, B, C and D if

Ax + By + Cz + D < 0

When an inside point is along the line of sight to the surface, the polygon must be a back face. If V is a vector in the viewing direction from the eye position and N is the normal vector to a polygon surface, then the polygon is a back face if

V∙N > 0

11.6 Depth-Buffer Method

A commonly used image-space approach to detecting visible surfaces is the depth-buffer method, which compares surface depths at each pixel position on the projection plane. A depth buffer is used to store depth values for each (x,y) position as surfaces are processed, and the refresh buffer stores the intensity values for each position. Initially, all positions in the depth buffer are set to 0 (minimum depth) and the refresh buffer is initialized to the background intensity. Each surface listed in the polygon tables is then processed, one scan line at a time, calculating the depth (z value) at each (x,y) pixel position. The calculated depth is compared to the value previously stored in the depth buffer at that position.
If the calculated depth is greater than the value stored in the depth buffer, the new depth value is stored, and the surface intensity at that position is determined and placed in the same xy location in the refresh buffer.

Algorithm

1. Initialize the depth buffer and refresh buffer so that for all buffer positions (x,y),
   Depth(x,y) = 0, Refresh(x,y) = Ibackgnd
2. For each position on each polygon surface, compare depth values to previously stored values in the depth buffer to determine visibility.
   Calculate the depth z for each (x,y) position on the polygon.
   If z > Depth(x,y), then set
   Depth(x,y) = z, Refresh(x,y) = Isurf(x,y)

where Ibackgnd is the value for the background intensity, and Isurf(x,y) is the projected intensity value for the surface at pixel position (x,y). After all surfaces have been processed, the depth buffer contains depth values for the visible surfaces and the refresh buffer contains the corresponding intensity values for those surfaces.

11.7 Let us Sum Up

In this lesson we have learnt about various surface detection methods.

11.8 Lesson-end Activities

After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Classification of surface detection algorithms
b) Depth buffer method

11.9 Points for Discussion

Discuss the following
a) Containment Test
b) Visibility Tests

11.10 Model answers to “Check your Progress”

In order to check your progress, try to answer the following
a) Painters' Algorithm
b) Ray-tracing Methods

11.11 References

1. Chapter 24 of William M. Newman, Robert F. Sproull, “Principles of Interactive Computer Graphics”, Tata-McGraw Hill, 2000
2. Chapter 9 of Steven Harrington, “Computer Graphics – A programming approach”, McGraw Hill, 1987
3. Chapter 13 of Donald Hearn, M. Pauline Baker, “Computer Graphics – C Version”, Pearson Education, 2007
4. Chapter 9 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
5. Chapter 15 of J.D. Foley, A. van Dam, S.K. Feiner, J.F.
Hughes, “Computer Graphics – principles and practice”, Addison-Wesley, 1997
6. Computer Graphics, Susan Laflin. August 1999.

UNIT – IV
LESSON – 12: INTRODUCTION TO MULTIMEDIA

CONTENTS
12.1 Aims and Objectives
12.2 Introduction
12.3 History of Multimedia Systems
12.4 Trends in Multimedia
12.5 Applications
12.6 Let us Sum Up
12.7 Lesson-end Activities
12.8 Points for Discussion
12.9 Model answers to “Check your Progress”
12.10 References

12.1 Aims and Objectives

The aim of this lesson is to learn the concept of multimedia. The objectives of this lesson are to make the student aware of the following concepts
a) Introduction to multimedia
b) History and
c) Applications

12.2 Introduction

Multimedia is the field concerned with the computer-controlled integration of text, graphics, drawings, still and moving images (video), animation, audio, and any other media where every type of information can be represented, stored, transmitted and processed digitally. A multimedia application is an application which uses a collection of multiple media sources, e.g. text, graphics, images, sound/audio, animation and/or video. Hypermedia can be considered as one of the multimedia applications.

Multimedia is the combination of text, animated graphics, video, and sound. It presents information in a way that is more interesting and easier to grasp than text alone. It has been used for education at all levels, job training, and games, and by the entertainment industry. It is becoming more readily available as the price of personal computers and their accessories declines. Multimedia as a human-computer interface was made possible some half-dozen years ago by the rise of affordable digital technology. Previously, multimedia effects were produced by computer-controlled analog devices, like videocassette recorders, projectors, and tape recorders. Digital technology's exponential decline in price and increase in capacity has enabled it to overtake analog technology.
The Internet is the breeding ground for multimedia ideas and the delivery vehicle of multimedia objects to a huge audience. This unit reviews the uses of multimedia, the technologies that support it, and the larger architectural and design issues.

Nowadays, multimedia generally indicates a rich sensory interface between humans and computers or computer-like devices -- an interface that in most cases gives the user control over the pace and sequence of the information. We all know multimedia when we see and hear it, yet its precise boundaries elude us. For example, movies on demand, in which a viewer can select from a large library of videos and then play, stop, or reposition the tape or change the speed, is generally considered multimedia. However, watching the movie on a TV set attached to a videocassette recorder (VCR) with the same abilities to manipulate the play is not considered multimedia. Unfortunately, we have yet to find a definition that satisfies all experts.

Recent multimedia conferences, such as the IEEE International Conference on Multimedia Computing and Systems, ACM Multimedia, and Multimedia Computing and Networking, provide a good start for identifying the components of multimedia. The range of multimedia activity is demonstrated in papers on multimedia authoring (i.e., specification of multimedia sequences), user interfaces, navigation (user choices), effectiveness of multimedia in education, distance learning, video conferencing, interactive television, video on demand, virtual reality, digital libraries, indexing and retrieval, and support of collaborative work.
The wide range of technologies is evident in papers on disk scheduling, capacity planning, resource management, optimization, networking, switched Ethernet LANs, Asynchronous Transfer Mode (ATM) networking, quality of service in networks, Moving Picture Expert Group (MPEG) encoding, compression, caching, buffering, storage hierarchies, video servers, video file systems, machine classification of video scenes, and Internet audio and video.

Multimedia systems need a delivery system to get the multimedia objects to the user. Magnetic and optical disks were the first media for distribution. The Internet, as well as the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite or NetBIOS on isolated or campus LANs, became the next vehicles for distribution. The rich text and graphics capabilities of the World Wide Web browsers are being augmented with animations, video, and sound. Internet distribution will be augmented by distribution via satellite, wireless, and cable systems.

12.3 History of Multimedia Systems

Newspapers were perhaps the first mass communication medium to employ multimedia -- they used mostly text, graphics, and images. In 1895, Guglielmo Marconi sent his first wireless radio transmission at Pontecchio, Italy. A few years later (in 1901) he detected radio waves beamed across the Atlantic. Initially invented for telegraph, radio is now a major medium for audio broadcasting. Television was the new medium of the 20th century. It brought video and has since changed the world of mass communications.
Some of the important events in relation to Multimedia in Computing include:

1945 - Bush wrote about Memex
1967 - Negroponte formed the Architecture Machine Group at MIT
1969 - Nelson & Van Dam: hypertext editor at Brown; birth of the Internet
1971 - Email
1976 - Architecture Machine Group proposal to DARPA: Multiple Media
1980 - Lippman & Mohl: Aspen Movie Map
1983 - Backer: Electronic Book
1985 - Negroponte, Wiesner: opened MIT Media Lab
1989 - Tim Berners-Lee proposed the World Wide Web to CERN (European Council for Nuclear Research)
1990 - K. Hooper Woolsey, Apple Multimedia Lab, 100 people, educ.
1991 - Apple Multimedia Lab: Visual Almanac, Classroom MM Kiosk
1992 - The first M-bone audio multicast on the Net
1993 - U. Illinois National Center for Supercomputing Applications: NCSA Mosaic
1994 - Jim Clark and Marc Andreessen: Netscape
1995 - JAVA for platform-independent application development. Duke is the first applet.
1996 - Microsoft, Internet Explorer.

12.4 Trends in Multimedia

Current big application areas in Multimedia include:

World Wide Web -- Hypermedia systems -- embrace nearly all multimedia technologies and application areas. Ever increasing popularity.
MBone -- Multicast Backbone: equivalent of conventional TV and Radio on the Internet.
Enabling Technologies -- developing at a rapid rate to support the ever increasing need for Multimedia. Carrier, Switching, Protocol, Application, Coding/Compression, Database, Processing, and System Integration Technologies are at the forefront of this.

12.5 Applications

Examples of Multimedia Applications include:

World Wide Web
Hypermedia courseware
Video conferencing
Video-on-demand
Interactive TV
Groupware
Home shopping
Games
Virtual reality
Digital video editing and production systems
Multimedia Database systems

Multimedia applications are primarily existing applications that can be made less expensive or more effective through the use of multimedia technology.
In addition, new, speculative applications, like movies on demand, can be created with the technology. We present here a few of these applications.

Video on demand (VOD), also called movies on demand, is a service that provides movies on an individual basis to television sets in people's homes. The movies are stored in a central server and transmitted through a communication network. A set-top box (STB) connected to the communication network converts the digital information to analog and inputs it to the TV set. The viewer uses a remote control device to select a movie and manipulate play through start, stop, rewind, and visual fast forward buttons. The capabilities are very similar to renting a video at a store and playing it on a VCR. The service can provide indices to the movies by title, genre, actors, and director. VOD differs from pay per view by providing any of the service's movies at any time, instead of requiring that all purchasers of a movie watch its broadcast at the same time. Enhanced pay per view, also a broadcast system, shows the same movie at a number of staggered starting times.

Home shopping and information systems - Services to the home that provide video on demand will also provide other, more interactive, home services. Many kinds of goods and services can be sold this way. The services will help the user navigate through the available material to plan vacations, renew driver's licenses, purchase goods, etc.

Networked games - The same infrastructure that supports home shopping could be used to temporarily download video games with graphic-intensive functionality to the STB, and the games could then be played for a given period of time. Groups of people could play a game together, competing as individuals or working together in teams. Action games would require a very fast, or low-latency, network.

Video conferencing - Currently, most video conferencing is done between two specially set-up rooms.
In each room, one or more cameras are used, and the images are displayed on one or more monitors. Text, images, and motion video are compressed and sent through telephone lines. Recently, the technology has been expanded to allow more than two sites to participate. Video conferences can also be connected through LANs or the Internet. In time, video conferences will be possible from the home.

Education - A wide range of individual educational software employing multimedia is available on CD-ROM. One of the chief advantages of such multimedia applications is that the sequence of material presented is dependent upon the student's responses and requests. Multimedia is also used in the classroom to enhance the educational experience and augment the teacher's work. Multimedia for education has begun to employ servers and networks to provide for larger quantities of information and the ability to change it frequently.

Distance learning - Distance learning is a variation on education in which not all of the students are in the same place during a class. Education takes place through a combination of stored multimedia presentations, live teaching, and participation by the students. Distance learning involves aspects of both teaching with multimedia and video conferencing.

Just-in-time training - Another variation on education, called just-in-time training, is much more effective because it is done right when it is needed. In an industry context, this means that workers can receive training on PCs at their own workplaces at the time of need or of their choosing. This generally implies storing the material on a server and playing it through a wide-area network or LAN.

Digital libraries - Digital libraries are a logical extension of conventional libraries, which house books, pictures, tapes, etc. Material in digital form can be less expensive to store, easier to distribute, and quicker to find. Thus digital technology can save money and provide better capabilities.
The Vatican Library has an extraordinary collection of 150 000 manuscripts, including early copies of works by Aristotle, Dante, Euclid, Homer, and Virgil. However, only about 2000 scholars a year are able to physically visit the library in Rome. Thus, the IBM Vatican Library Project, which makes digitized copies of some of the collection available to scholars around the world, is a very valuable service, especially if the copies distributed are of high quality.

Virtual reality - Virtual reality provides a very realistic effect through sight and sound, while allowing the user to interact with the virtual world. Because of the ability of the user to interact with the process, realistic visual effects must be created "on the fly."

Telemedicine - Multimedia and telemedicine can improve the delivery of health care in a number of ways. Digital information can be centrally stored, yet simultaneously available at many locations. Physicians can consult with one another using video conference capabilities, where all can see the data and images, thus bringing together experts from a number of places in order to provide better care. Multimedia can also provide targeted education and support for the patient and family.

12.6 Let us Sum Up

In this lesson we have learnt about
a) Introduction
b) History and
c) Applications of multimedia

12.7 Lesson-end Activities

After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Define multimedia
b) Discuss the history of multimedia

12.8 Points for Discussion

Discuss the following
a) Application of multimedia in medicine
b) Application of multimedia in education

12.9 Model answers to “Check your Progress”

To check your progress, try to answer the following
a) Video on demand
b) Digital libraries

12.10 References

1. Chapter 15, 16 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
2. Z.S. Bojkovic, D.A. Milovanovic, Multimedia Communication Systems, PHI, 2002
3. S.J.
Gibbs, D.C. Tsichritzis, Multimedia Programming, Addison-Wesley, 1995
4. J.F. Koegel, Multimedia Systems, Pearson Education, 2001

LESSON – 13: MULTIMEDIA BUILDING BLOCKS

CONTENTS
13.1 Aims and Objectives
13.2 Introduction
13.3 Building blocks of Multimedia
13.4 What is HyperText and HyperMedia?
13.5 Characteristics of a Multimedia System
13.6 Challenges for Multimedia Systems
13.7 Desirable Features for a Multimedia System
13.8 Components of a Multimedia System
13.9 Multimedia technology
13.10 Multimedia architecture
13.11 Let us Sum Up
13.12 Lesson-end Activities
13.13 Points for Discussion
13.14 Model answers to “Check your Progress”
13.15 References

13.1 Aims and Objectives

The aim of this lesson is to learn the concept of multimedia building blocks. The objectives of this lesson are to make the student aware of the following concepts
a) building blocks
b) architecture
c) characteristics
d) challenges

13.2 Introduction

Multimedia is obviously a fertile ground for both research and the development of new products, because of the breadth of possible usage, the dependency on a wide range of technologies, and the value of reducing cost by improving the technology. Now that the technology has been developed, however, the marketplace will determine future direction. The technology will be used when clear value is found. For example, multimedia is widely used on PCs using CDs to store the content. The CDs are inexpensive to reproduce, and the players are standard equipment on most PCs purchased today. The acceptance caused a greater demand for players, which, in turn, caused greater production and further reduced prices.

The computer industry is providing demand, and an expanding market, for the key hardware technologies that underlie multimedia. These include solid-state memory, logic, microprocessors, modems, switches, and disk storage.
The price declines of 30-60% per year that we have seen for several decades will continue into the foreseeable future. As a result, the application of multimedia, which appears expensive now, will become less expensive and more attractive. An exception to this fast rate of improvement is the cost of data communications. Communications depend both on technology with rapidly decreasing cost and on mundane and basically unchanging tasks such as laying cable with the help of a backhoe or stringing cables from poles. The cost of communication is not likely to decline significantly for quite a while.

We feel that multimedia will spread from low-bit-rate to high-bit-rate, and will begin on established intranets first, move to the Internet, and finally be transmitted on broadband connections (ADSL or cable modems) to the home. The initial uses will be information dissemination, education, and training on campus LANs. Multimedia will be used in education, government, and business over campus LANs, with low-bit-rate video that will not place excessive stress on the infrastructure. The availability of switched LAN technology and faster LANs will allow increases in both the bit rate per user and the number of users. As the cost of communications decreases, the cost for Internet attachment for servers will decline, and higher-quality video will be used on the Internet. Multimedia will be a compelling interface for commerce and advertising on the Internet. Eventually, cable modems and/or ADSL will provide bandwidth for movies to the home, and the declining computer and switching costs will allow a cost-effective service. The winner between ADSL and cable modems will have as much to do with the ability of cable companies and RBOCs to raise capital as with the inherent cost and value of the two technologies.
IBM researchers continue to play an active role in developing technology, including MPEG encoding and decoding, video servers, delivery systems, digital libraries, applications for indexing and searching for content, and collaboration. Researchers are also engaged in many uses of multimedia technology and in building advanced systems with IBM customers.

13.3 Building blocks of Multimedia

The building blocks of multimedia include
a) Text
b) Video
c) Sound
d) Images
e) Animation
f) HyperText and HyperMedia

13.4 What is HyperText and HyperMedia?

Hypertext is text which contains links to other texts. The term was invented by Ted Nelson around 1965. Hypertext is therefore usually non-linear (as indicated below).

HyperMedia is not constrained to be text-based. It can include other media, e.g., graphics, images, and especially the continuous media - sound and video. Apparently, Ted Nelson was also the first to use this term. The World Wide Web (WWW) is the best example of hypermedia applications.

13.5 Characteristics of a Multimedia System

A Multimedia system has four basic characteristics:

Multimedia systems must be computer controlled.
Multimedia systems are integrated.
The information they handle must be represented digitally.
The interface to the final presentation of media is usually interactive.

13.6 Challenges for Multimedia Systems

Multimedia systems may have to render a variety of media at the same instant -- a distinction from normal applications. There is a temporal relationship between many forms of media (e.g. video and audio). There are two forms of problem here:

Sequencing within the media -- playing frames in the correct order/time frame in video.
Synchronisation -- inter-media scheduling (e.g. video and audio). Lip synchronisation is clearly important for humans watching playback of video and audio, and even animation and audio. Ever tried watching an out of (lip) sync film for a long time?
The key issues multimedia systems need to deal with here are:
• How to represent and store temporal information.
• How to strictly maintain the temporal relationships on playback/retrieval.
• What processes are involved in the above.

Data has to be represented digitally, so many initial sources of data need to be digitised -- translated from an analog source to a digital representation. This will involve scanning (graphics, still images) and sampling (audio/video), although digital cameras now exist for direct scene-to-digital capture of images and video.

13.7 Desirable Features for a Multimedia System

Given the above challenges, the following features are desirable (if not a prerequisite) for a multimedia system:
• Very High Processing Power -- needed to deal with large data processing and real-time delivery of media.
• Multimedia Capable File System -- needed to deliver real-time media, e.g. video/audio streaming. Special hardware/software is needed, e.g. RAID technology.
• Data Representations/File Formats that support multimedia -- data representations/file formats should be easy to handle yet allow for compression/decompression in real time.
• Efficient and High I/O -- input and output to the file subsystem needs to be efficient and fast. It needs to allow for real-time recording as well as playback of data, e.g. direct-to-disk recording systems.
• Special Operating System -- to allow access to the file system and process data efficiently and quickly. It needs to support direct transfers to disk, real-time scheduling, fast interrupt processing, I/O streaming, etc.
• Storage and Memory -- large storage units (of the order of 50-100 GB or more) and large memory (50-100 MB or more). Large caches are also required, frequently in a Level 2 and Level 3 hierarchy for efficient management.
• Network Support -- client-server systems and distributed systems may be supported.
• Software Tools -- user-friendly tools are needed to handle media, design and develop applications, and deliver media.
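The sampling step mentioned in Section 13.6 and the storage figures listed above are linked by simple arithmetic: an uncompressed sampled signal needs (samples per second) x (bits per sample) x (channels) bits every second. The Python sketch below is only an illustration of that arithmetic. The function names are hypothetical, and the CD-audio and telephone parameters are well-known values supplied here as assumptions rather than figures taken from this lesson.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate, in bits per second, of uncompressed sampled (PCM) audio."""
    return sample_rate_hz * bits_per_sample * channels


def pcm_storage_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Bytes needed to store `seconds` of uncompressed sampled audio."""
    return pcm_bit_rate(sample_rate_hz, bits_per_sample, channels) * seconds // 8


# Assumed parameters: CD audio is 44,100 samples/s, 16 bits, 2 channels.
cd_rate = pcm_bit_rate(44_100, 16, 2)
cd_minute = pcm_storage_bytes(44_100, 16, 2, 60)

# Assumed parameters: telephone voice is 8,000 samples/s, 8 bits, 1 channel.
phone_rate = pcm_bit_rate(8_000, 8, 1)

print(cd_rate, "bits/s for CD audio")
print(cd_minute, "bytes for one minute of CD audio")
print(phone_rate, "bits/s for telephone voice")
```

Under these assumptions one minute of CD-quality sound already occupies about 10 MB, which is one reason the list above asks for large storage, fast I/O and real-time compression.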
13.8 Components of a Multimedia System

Now let us consider the components (hardware and software) required for a multimedia system:
• Capture devices -- video camera, video recorder, audio microphone, keyboards, mice, graphics tablets, 3D input devices, tactile sensors, VR devices.
• Digitising/sampling hardware.
• Storage devices -- hard disks, CD-ROMs, Jaz/Zip drives, DVD, etc.
• Communication networks -- Ethernet, Token Ring, FDDI, ATM, intranets, internets.
• Computer systems -- multimedia desktop machines, workstations, MPEG/video/DSP hardware.
• Display devices -- CD-quality speakers, HDTV, SVGA, hi-res monitors, colour printers, etc.

13.9 Multimedia technology

A wide variety of technologies contribute to multimedia. Some of the technologies are going through rapid improvement and deployment because of demand for PCs and workstations. As a result, multimedia benefits from lower-cost, better-performance microprocessors, memory chips, and disk storage. Other technologies are being developed specifically for multimedia systems.

13.9.1 Networks

Telephone networks dedicate a set of resources that forms a complete path from end to end for the duration of the telephone connection. The dedicated path guarantees that the voice data can be delivered from one end to the other end in a smooth and timely way, but the resources remain dedicated even when there is no talking. In contrast, digital packet networks, for communication between computers, use time-shared resources (links, switches, and routers) to send packets through the network. The use of shared resources allows computer networks to be used at high utilization, because even small periods of inactivity can be filled with data from a different user. The high utilization and shared resources create a problem with respect to the timely delivery of video and audio over data networks.
Current research centers around reserving resources for time-sensitive data, which will make digital data networks more like telephone voice networks.

13.9.2 Internet

The Internet and intranets, which use the TCP/IP protocol suite, are the most important delivery vehicles for multimedia objects. TCP provides communication sessions between applications on hosts, sending streams of bytes for which delivery is guaranteed by means of acknowledgments and retransmission. User Datagram Protocol (UDP) is a "best-effort" delivery protocol (some messages may be lost) that sends individual messages between hosts. Internet technology is used on single LANs and on connected LANs within an organization, which are sometimes called intranets, and on "backbones" that link different organizations into one single global network. Internet technology allows LANs and backbones of totally different technologies to be joined together into a single, seamless network. Part of this is achieved through communications processors called routers. Routers can be accessed from two or more networks, passing data back and forth as needed. The routers communicate information on the current network topology among themselves in order to build routing tables within each router. These tables are consulted each time a message arrives, in order to send it to the next appropriate router, eventually resulting in delivery.

Token ring is a hardware architecture for passing packets between stations on a LAN. Since a single circular communication path is used for all messages, there must be a way to decide which station is allowed to send at any time. In token ring, a "token," which gives a station the right to transmit data, is passed from station to station. The data rate of a token ring network is 16 Mb/s.

Ethernet LANs use a common wire to transmit data from station to station.
Mediation between transmitting stations is done by having stations listen before sending, so that they will not interfere with each other. However, two stations could begin to send at the same time and collide, or one station could start to send significantly later than another but not know it because of propagation delay. In order to detect these situations, stations continue to listen while they transmit and determine whether their message was possibly garbled by a collision. If there is a collision, a retransmission takes place (by both stations) a short but random time later. Ethernet LANs can transmit data at 10 Mb/s. However, when multiple stations are competing for the LAN, the throughput may be much lower because of collisions and retransmissions.

Switched Ethernet - Switches may be used at a hub to create many small LANs where one large one existed before. This reduces contention and permits higher throughput. In addition, Ethernet is being extended to 100 Mb/s throughput. The combination, switched Ethernet, is much more appropriate to multimedia than regular Ethernet, because existing Ethernet LANs can support only about six MPEG video streams, even when nothing else is being sent over the LAN.

Asynchronous Transfer Mode (ATM) is a new packet-network protocol designed for mixing voice, video, and data within the same network. Voice is digitized in telephone networks at 64 Kb/s (kilobits per second), which must be delivered with minimal delay, so very small packet sizes are used. On the other hand, video data and other business data usually benefit from quite large block sizes. An ATM packet consists of 48 octets (the term used in communications for eight bits, called a byte in data processing) of data preceded by five octets of control information. An ATM network consists of a set of communication links interconnected by switches. Communication is preceded by a setup stage in which a path through the network is determined to establish a circuit.
Once a circuit is established, 53-octet packets may be streamed from point to point. ATM networks can be used to implement parts of the Internet by simulating links between routers in separate intranets. This means that the "direct" intranet connections are actually implemented by means of shared ATM links and switches. ATM, both between LANs and between servers and workstations on a LAN, will support data rates that will allow many users to make use of motion video on a LAN.

13.9.3 Data-transmission techniques

a) Modems - Modulator/demodulators, or modems, are used to send digital data over analog channels by means of a carrier signal (sine wave) modulated by changing the frequency, phase, amplitude, or some combination of them in order to represent digital data. (The result is still an analog signal.) Modulation is performed at the transmitting end and demodulation at the receiving end. The most common use for modems in a computer environment is to connect two computers over an analog telephone line. Because of the quality of telephone lines, the data rate is commonly limited to 28.8 Kb/s. For transmission of customer analog signals between telephone company central offices, the signals are sampled and converted to "digital form" (actually, still an analog signal) for transmission between offices. Since the customer voice signal is represented by a stream of digital samples at a fixed rate (64 Kb/s), the data rate that can be achieved over analog telephone lines is limited.

b) ISDN - Integrated Services Digital Network (ISDN) extends the telephone company digital network by sending the digital form of the signal all the way to the customer. ISDN is organized around 64 Kb/s transmission speeds, the speed used for digitized voice. An ISDN line was originally intended to simultaneously transmit a digitized voice signal and a 64 Kb/s data stream on a single wire.
In practice, two channels are used to produce a 128 Kb/s line, which is faster than the 28.8 Kb/s speed of typical computer modems but not adequate to handle MPEG video.

c) ADSL - Asymmetric Digital Subscriber Lines (ADSL) extend telephone company twisted-pair wiring to yet greater speeds. The lines are asymmetric, with an outbound data rate (toward the home) of 1.5 Mb/s and an inbound rate (from the home) of 64 Kb/s. This is suitable for video on demand, home shopping, games, and interactive information systems (collectively known as interactive television), because 1.5 Mb/s is fast enough for compressed digital video, while only a much slower "back channel" is needed for control. ADSL uses very high-speed modems at each end to achieve these speeds over twisted-pair wire. ADSL is a critical technology for the Regional Bell Operating Companies (RBOCs), because it allows them to use the existing twisted-pair infrastructure to deliver high data rates to the home.

d) Cable systems - Cable television systems provide analog broadcast signals on a coaxial cable, instead of through the air, with the attendant freedom to use additional frequencies and thus provide a greater number of channels than over-the-air broadcast. The systems are arranged like a branching tree, with "splitters" at the branch points. They also require amplifiers for the outbound signals, to make up for signal loss in the cable. Most modern cable systems use fiber optic cables for the trunk and major branches and use coaxial cable for only the final loop, which services one or two thousand homes. The root of the tree, where the signals originate, is called the head end.

e) Cable modems are used to modulate digital data, at high data rates, into an analog 6 MHz-bandwidth TV-like signal. These modems can transfer 20 to 40 Mb/s in a frequency bandwidth that would have been occupied by a single analog TV signal, allowing multiple compressed digital TV channels to be multiplexed over a single analog channel.
The high data rate may also be used to download programs or World Wide Web content or to play compressed video. Cable modems are critical to cable operators, because they enable them to compete with the RBOCs using ADSL.

f) Set-top box - The STB is an appliance that connects a TV set to a cable system, terrestrial broadcast antenna, or satellite broadcast antenna. The STB in most homes has two functions. First, in response to a viewer's request with the remote-control unit, it shifts the frequency of the selected channel to either channel 3 or 4, for input to the TV set. Second, it is used to restrict access and block channels that are not paid for. Addressable STBs respond to orders that come from the head end to block and unblock channels.

g) Admission control - Digital multimedia systems that are shared by multiple clients can deliver multimedia data to a limited number of clients. Admission control is the function which ensures that once delivery starts, it will be able to continue with the required quality of service (ability to transfer isochronous data on time) until completion. The maximum number of clients depends upon the particular content being used and other characteristics of the system.

h) Digital watermarks - Because it is so easy to transmit perfect copies of digital objects, many owners of digital content wish to control unauthorized copying. This is often to ensure that proper royalties have been paid. Digital watermarking consists of making small changes in the digital data that can later be used to determine the origin of an unauthorized copy. Such small changes in the digital data are intended to be invisible when the content is viewed. This is very similar to the "errors" that mapmakers introduce in order to prove that suspect maps are copies of their maps. In other circumstances, a visible watermark is applied in order to make commercial use of the image impractical.
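The idea of an invisible watermark in item h) can be illustrated with a very small Python sketch. It hides a bit pattern in the least significant bit of 8-bit pixel values, so each value changes by at most 1. This least-significant-bit scheme is an assumption chosen for its simplicity, not the method the text has in mind: real watermarking schemes are designed to survive compression and editing, which this one does not. The function names and sample pixel values are hypothetical.

```python
def embed_watermark(pixels, bits):
    """Return a copy of `pixels` with the low bit of the first values set to `bits`."""
    if len(bits) > len(pixels):
        raise ValueError("watermark is longer than the image data")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the least significant bit, then store the watermark bit there.
        marked[i] = (marked[i] & 0xFE) | bit
    return marked


def extract_watermark(pixels, n_bits):
    """Read back the low bit of the first `n_bits` pixel values."""
    return [p & 1 for p in pixels[:n_bits]]


# Hypothetical 8-bit grey-level values and an owner's bit pattern.
pixels = [200, 13, 77, 154, 91, 240, 18, 66]
mark = [1, 0, 1, 1, 0, 0, 1, 0]

marked = embed_watermark(pixels, mark)
recovered = extract_watermark(marked, len(mark))
largest_change = max(abs(a - b) for a, b in zip(pixels, marked))

print(marked)
print(recovered == mark, largest_change)
```

A change of one grey level out of 256 is normally not visible, which matches the requirement above that the changes be invisible when the content is viewed, while the owner can still read the pattern back from a suspect copy.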
i) Authoring systems - Multimedia authoring systems are used to edit and arrange multimedia objects and to describe their presentation. The authoring package allows the author to specify which objects may be played next. The viewer dynamically chooses among the alternatives. Metadata created during the authoring process is normally saved as a file. At play time, an "execution package" reads the metadata and uses it as a script for the playout. Authoring systems, as well as systems for gathering information for multimedia presentations (scanning, classifying, indexing and processing images, audio, and video), are very active research areas. Particularly challenging, and also very useful, are techniques that can be applied to compressed data. Entirely new techniques are required, and the human factors involved in the processing of this new data must be understood.

13.10 Multimedia architecture

In this section we show how the multimedia technologies are organized in order to create multimedia systems, which in general consist of suitable organizations of clients, application servers, and storage servers that communicate through a network. Some multimedia systems are confined to a stand-alone computer system with content stored on hard disks or CD-ROMs. Distributed multimedia systems communicate through a network and use many shared resources, making quality of service very difficult to achieve and resource management very complex.

• Single-user stand-alone systems - Stand-alone multimedia systems use CD-ROM disks and/or hard disks to hold multimedia objects and the scripting metadata to orchestrate the playout. CD-ROM disks are inexpensive to produce and hold a large amount of digital data; however, the content is static: new content requires creation and physical distribution of new disks for all systems. Decompression is now done by either a special decompression card or a software application that runs on the processor.
The technology trend is toward software decompression.

• Multi-user systems

Video over LANs - Stand-alone multimedia systems can be converted to networked multimedia systems by using client-server remote-file-system technology to enable the multimedia application to access data stored on a server as if the data were on a local storage medium. This is very convenient, because the stand-alone multimedia application does not have to be changed. LAN throughput is the major challenge in these systems. Ethernet LANs can support less than 10 Mb/s, and token rings 16 Mb/s. This translates into six to ten 1.5 Mb/s MPEG video streams. Admission control is a critical problem. The OS/2 LAN server is one of the few products that support admission control. It uses priorities with token-ring messaging to differentiate between multimedia traffic and lower-priority data traffic. It also limits the multimedia streams to be sure that they do not sum to more than the capacity of the LAN. Without some type of resource reservation and admission control, the only way to give some assurance of continuous video is to operate with small LANs and make sure that the server is on the same LAN as the client. In the future, ATM and fast Ethernet will provide capacity more appropriate to multimedia.

Direct Broadcast Satellite - Direct Broadcast Satellite (DBS), which broadcasts up to 80 channels from a satellite at high power, arrived in 1995 as a major force in the delivery of broadcast video. The high power allows small (18-inch) dishes with line-of-sight to the satellite to capture the signal. MPEG compression is used to get the maximum number of channels out of the bandwidth. The RCA/Hughes service employs two satellites and a backup to provide 160 channels. This large number of channels allows many premium and special-purpose channels as well as the usual free channels. Many more pay-per-view channels can be provided than in conventional cable systems.
This allows enhanced pay-per-view, in which the same movie is shown with staggered starting times of half an hour or an hour. DBS requires a set-top box with much more function than a normal cable STB. The STB contains a demodulator to reconstruct the digital data from the analog satellite broadcast. The MPEG compressed form is decompressed, and a standard TV signal is produced for input to the TV set. The STB uses a telephone modem to periodically verify that the premium channels are still authorized and report on use of the pay-per-view channels so that billing can be done. Interactive TV and video to the home - Interactive TV and video to the home allow viewers to select, interact with, and control video play on a TV set in real time. The user might be viewing a conventional movie, doing home shopping, or engaging in a network game. The compressed video flowing to the home requires high bandwidth, from 1.5 to 6 Mb/s, while the return path, used for selection and control, requires far lower bandwidth. The STB used for interactive TV is similar to that used for DBS. The demodulation function depends upon the network used to deliver the digital data. A microprocessor with memory for limited buffering as well as an MPEG decompression chip is needed. The video is converted to a standard TV signal for input to the TV set. The STB has a remote-control unit, which allows the viewer to make choices from a distance. Some means are needed to allow the STB to relay viewer commands back to the server, depending upon the network being used. Cable systems appear to be broadcast systems, but they can actually be used to deliver different content to each home. Cable systems often use fiber optic cables to send the video to converters that place it on local loops of coaxial cable. If a fiber cable is dedicated to each final loop, which services 500 to 1500 homes, there will be enough bandwidth to deliver an individual signal to many of those houses. 
The cable can also provide the reverse path to the cable head end. Ethernet-like protocols can be used to share the same channel with the other STBs in the local loop. This topology is attractive to cable companies because it uses the existing cable plant. If the appropriate amplifiers are not present in the cable system for the back channel, a telephone modem can be used to provide the back channel.

As mentioned above, the asymmetric data rates of ADSL are tailored for interactive TV. The use of standard twisted-pair wire, which has been brought to virtually every house, is attractive to the telephone industry. However, the twisted pair is a noisier medium than coaxial cable, so more expensive modems are needed, and distances are limited. ADSL can be used at higher data rates if the distance is further reduced.

Interactive TV architectures are typically three-tier, in which the client and server tiers interact through an application server. (In three-tier systems, the tier-1 systems are clients, the tier-2 systems are used for application programs, and the tier-3 systems are data servers.) The application tier is used to separate the logic of looking up material in indexes, maintaining the shopping state of a viewer, interacting with credit card servers, and other similar functions from the simple function of delivering multimedia objects.

The key research questions about interactive TV and video-on-demand are not computer science questions at all. Rather, they are the human-factors issues concerning ease of the on-screen interface and, more significantly, the marketing questions regarding what home viewers will find valuable and compelling.

Internet over cable systems - World Wide Web browsing allows users to see a rich text, video, sound, and graphics interface and allows them to access other information by clicking on text or graphics. Web pages are written in HyperText Markup Language (HTML) and use an application communications protocol called HTTP.
The user responses, which select the next page or provide a small amount of text information, are normally quite short. On the other hand, the graphics and pictures require many times the number of bytes to be transmitted to the client. This means that distribution systems that offer asymmetric data rates are appropriate.

Cable TV systems can be used to provide asymmetric Internet access for home computers in ways that are very similar to interactive TV over cable. The data being sent to the client is digitized and broadcast over a prearranged channel over all or part of the cable system. A cable modem at the client end tunes to the right channel and demodulates the information being broadcast. It must also filter the information destined for the particular station from the information being sent to other clients. The low-bandwidth reverse channel is the same low-frequency band that is used in interactive TV. As with interactive TV, a telephone modem might be used for the reverse channel. The cable head end is then attached to the Internet using a router. The head end is also likely to offer other services that Internet Service Providers sell, such as permanent mailboxes. This asymmetric connection would not be appropriate for a Web server or some other type of commerce server on the Internet, because servers transmit too much data for the low-speed return path. The cable modem provides the physical link for the TCP/IP stack in the client computer. The client software treats this environment just like a LAN connected to the Internet.

Video servers on a LAN - LAN-based multimedia systems go beyond the simple, client-server, remote file system type of video server, to advanced systems that offer a three-tier architecture with clients, application servers, and multimedia servers. The application servers provide applications that interact with the client and select the video to be shown.
On a company intranet, LAN-based multimedia could be used for just-in-time education, on-line documentation of procedures, or video messaging. On the Internet, it could be used for a video product manual, interactive video product support, or Internet commerce. The application server chooses the video to be shown and causes it to be sent to the client. There are three different ways that the application server can cause playout of the video: by giving the address of the video server and the name of the content to the client, which would then fetch it from the video server; by communicating with the video server and having it send the data to the client; and by communicating with both to set up the relationship.

The transmission of data to the client may be in push mode or pull mode. In push mode, the server sends data to the client at the appropriate rate. The network must have quality-of-service guarantees to ensure that the data gets to the client on time. In pull mode, the client requests data from the server, and thus paces the transmission. The current protocols for Internet use are TCP and UDP. TCP sets up sessions, and the server can push the data to the client. However, the sliding-window algorithm of TCP, which prevents client buffer overrun, creates acknowledgments that pace the sending of data, thus making it in effect a pull protocol.

Another issue in Internet architecture is the role of firewalls, which are used at the gateway between an intranet and the Internet to keep potentially dangerous or malicious Internet traffic from getting onto the intranet. UDP packets are normally never allowed in. TCP sessions are allowed if they are created from the inside to the outside. A disadvantage of TCP for isochronous data is that error detection and retransmission are automatic and required, whereas it is preferable to discard garbled video data and just continue.

Resource reservation is just beginning to be incorporated on the Internet and intranets.
Video will be considered to have higher priority, and the network will have to ensure that there is a limit to the amount of high-priority traffic that can be admitted. All of the routers on the path from the server to the client will have to cooperate in the reservation and the use of priorities.

Video conferencing - Video conferencing, which will be used on both intranets and the Internet, uses multiple data types, and serves multiple clients in the same conference. Video cameras can be mounted near a PC display to capture the user's picture. In addition to the live video, these systems include shared white boards and show previously prepared visuals. Some form of mediation is needed to determine which participant is in control. Since the type of multimedia data needed for conferencing requires much lower data rates than most other types of video, low-bit-rate video, using approximately eight frames per second and requiring tens of kilobits per second, will be used with small window sizes for the "talking heads" and most of the other visuals. Scalability of a video conferencing system is important, because if all participants send to all other participants, the traffic goes up as the square of the number of participants. This can be made linear by having all transmissions go through a common server. If the network has a multicast facility, the server can use that to distribute to the participants.

13.11 Let us Sum Up

In this lesson we have learnt about multimedia building blocks.

13.12 Lesson-end Activities

After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Multimedia Architecture
b) Multimedia building blocks

13.13 Points for Discussion

Discuss the following
a) Characteristics of a Multimedia System
b) Challenges of Multimedia System

13.14 Model answers to "Check your Progress"

In order to check your progress, try to answer the following questions
a) Data-transmission techniques
b) Desirable Features for a Multimedia System

13.15 References

1. Chapter 15, 16 of ISRD Group, "Computer Graphics", McGraw Hill, 2006
2. Z.S. Bojkovic, D.A. Milovanovic, Multimedia Communication Systems, PHI, 2002
3. S.J. Gibbs, D.C. Tsichritzis, Multimedia Programming, Addison-Wesley, 1995
4. J.F. Koegel, Multimedia Systems, Pearson Education, 2001
5. R. J. Flynn and W. H. Tetzlaff, "Multimedia--An introduction", IBM Journal of Research and Development, Volume 42, No 2, 1998

LESSON – 14: TEXT AND SOUND

CONTENTS
14.1 Aims and Objectives
14.2 Introduction to Text
14.3 Multimedia Sound
14.4 The MIDI Format
14.5 The RealAudio Format
14.6 The AU Format
14.7 The AIFF Format
14.8 The SND Format
14.9 The WAVE Format
14.10 The MP3 Format (MPEG)
14.11 Let us Sum Up
14.12 Lesson-end Activities
14.13 Points for Discussion
14.14 Model answers to "Check your Progress"
14.15 References

14.1 Aims and Objectives

The aim of this lesson is to learn the concept of text and sound in multimedia.
The objectives of this lesson are to make the student aware of the following concepts
a) text
b) sound
c) sound formats

14.2 Introduction

Text is the most widely used and flexible means of presenting information on screen and conveying ideas. The designer should not necessarily try to replace textual elements with pictures or sound, but should consider how to present text in an acceptable way and supplement it with other media. For a public system, where the eyesight of its users will vary considerably, a clear, reasonably large font should be used. Users will also be put off by the display of large amounts of text and will find it hard to scan.
To present tourist information about a hotel, for example, information should be presented concisely under clear separate headings such as location, services available, prices, contact details, etc.

Guidelines
• Conventional upper and lower case text should be used for the presentation, since reading is faster compared to all upper case text.
• All upper case can be used if a text item has to attract attention, as in warnings and alarm messages.
• The length of text lines should be no longer than around 60 characters to achieve optimal reading speed.
• Only one third of a display should be filled with text.
• Proportional spacing and ragged lines also minimize unpleasant visual effects.
• 12 point text is the practical minimum to adopt for PC based screens, with the use of 14 point or higher for screens of poorer resolution than a normal desktop PC.
• If the users do not have their vision corrected for VDU use (e.g. the public), text of 16 point is recommended if it is to be usable by people with visual impairments.
• Sentences should be short and concise and not be split over pages.
• Technical expressions should be used only where the user is familiar with them from their daily routine, and should be made as understandable as possible, e.g. "You are now connecting with Paul Andrews" rather than "Connection to Multipoint Control Unit".
• The number of abbreviations used in an application should be kept to a minimum. They should be used only when the abbreviation is routinely used and where the shorter words lead to a reduction of information density.
• Abbreviations should be used in a consistent way throughout an entire multimedia application.
• An explanation of the abbreviations used in the system should be readily available to the user through on-line help facilities or at least through written documentation.

Strictly speaking, text is created on a computer, so it doesn't really extend a computer system the way audio and video do.
But understanding how text is stored will set the scene for understanding how multimedia is stored. Interestingly, when computers were first developed, it was thought that their major use would be processing numbers (called number-crunching). This is not the major use of computers today. Processing words (not called word-crunching!) is the major use.

Question: So, how are words stored?
Answer: Character by character.

Characters can be more than letters: they can be digits or punctuation. Even the carriage return when you hit the return key is stored as a character. Computers deal with all data by turning switches off and on in a sequence. We look at this by calling an off switch "0" and an on switch "1". These 0's and 1's are called bits. Everything in a computer is ultimately represented by sequences of 0's and 1's, that is, bits. If the sequence were of length 2, we could have 00, 01, 10, or 11: four items. Similarly, we find that a sequence of length 3 can represent 8 items (000, 001, 010, ...). A sequence of length 4 can represent 16 things (0000, 0001, 0010, ...). There are about 128 characters that a computer has to store. This takes a sequence of length 7, since 2^7 = 128. In reality, 8 bits are used instead of 7 (the 8th bit is used to check on the data). The point to remember here is that:

n bits can represent 2^n items

14.3 Multimedia Sound

Multimedia Sound is a CD-ROM resource for Physics education. It is a collection of real sounds generated by musical instruments, laboratory sound sources and everyday objects such as glass beakers and plastic straws. Tools provided on the disc allow students to compare and contrast the waveforms and frequency spectra generated by the sound sources, and to measure amplitudes and frequencies as functions of time.
The linear dimensions of the sound sources can be determined from calibrated photographs, enabling investigation of, for example, the relationship between the length of a string or a pipe and the fundamental frequency of the sound it generates. An audio narration describes the key concepts illustrated by each example and a supporting text file provides essential data, poses challenging questions and suggests possible investigations. Additional features of the disc include: a six component sound synthesiser which students can use to generate their own sound samples; the facility to import sounds recorded with a microphone plugged into the PC's sound card or taken from an audio CD; extensive help files outlining the fundamental Physics of sound.

14.4 The MIDI Format

The MIDI (Musical Instrument Digital Interface) is a format for sending music information between electronic music devices like synthesizers and PC sound cards. The MIDI format was developed in 1982 by the music industry. The MIDI format is very flexible and can be used for everything from very simple to real professional music making.

MIDI files do not contain sampled sound, but a set of digital musical instructions (musical notes) that can be interpreted by your PC's sound card. The downside of MIDI is that it cannot record sounds (only notes). Or, to put it another way: it cannot store songs, only tunes.

The upside of the MIDI format is that since it contains only instructions (notes), MIDI files can be extremely small. The example above is only 23K in size but it plays for nearly 5 minutes.

The MIDI format is supported by many different software systems over a large range of platforms. MIDI files are supported by all the most popular Internet browsers. Sounds stored in the MIDI format have the extension .mid or .midi.

14.5 The RealAudio Format

The RealAudio format was developed for the Internet by Real Media. The format also supports video.
The format allows streaming of audio (on-line music, Internet radio) with low bandwidths. Because of the low bandwidth priority, quality is often reduced. Sounds stored in the RealAudio format have the extension .rm or .ram.

14.6 The AU Format

The AU format is supported by many different software systems over a large range of platforms. Sounds stored in the AU format have the extension .au.

14.7 The AIFF Format

The AIFF (Audio Interchange File Format) was developed by Apple. AIFF files are not cross-platform and the format is not supported by all web browsers. Sounds stored in the AIFF format have the extension .aif or .aiff.

14.8 The SND Format

The SND (Sound) format was developed by Apple. SND files are not cross-platform and the format is not supported by all web browsers. Sounds stored in the SND format have the extension .snd.

14.9 The WAVE Format

The WAVE (waveform) format was developed by IBM and Microsoft. It is supported by all computers running Windows, and by all the most popular web browsers. Sounds stored in the WAVE format have the extension .wav.

14.10 The MP3 Format (MPEG)

MP3 files are actually MPEG files. But the MPEG format was originally developed for video by the Moving Pictures Experts Group. We can say that MP3 files are the sound part of the MPEG video format. MP3 is one of the most popular sound formats for music recording. The MP3 encoding system combines good compression (small files) with high quality. Expect all your future software systems to support it. Sounds stored in the MP3 format have the extension .mp3, or .mpga (for MPG Audio).

What Format To Use?

The WAVE format is one of the most popular sound formats on the Internet, and it is supported by all popular browsers. If you want recorded sound (music or speech) to be available to all your visitors, you should use the WAVE format.
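The difference between sampled sound (as in WAVE) and note instructions (as in MIDI) can be made concrete with a small sketch. The Python below uses the standard library `wave` module to build a one-second tone as a WAVE file in memory, and then estimates how many bytes five minutes of sampled sound would need. The helper names, the 8000 Hz sample rate of the tone and the CD-style parameters (44,100 samples per second, stereo, 16 bits) are assumptions chosen for illustration; they do not come from the lesson.

```python
import io
import math
import struct
import wave


def sine_wave_bytes(freq_hz=440.0, seconds=1.0, rate=8000):
    """Return a mono, 16-bit WAVE file (as bytes) holding a sine tone."""
    n_frames = int(seconds * rate)
    samples = bytearray()
    for i in range(n_frames):
        # Half of full scale, stored as a little-endian signed 16-bit sample.
        value = int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / rate))
        samples += struct.pack("<h", value)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav.setframerate(rate)   # samples per second
        wav.writeframes(bytes(samples))
    return buffer.getvalue()


def sampled_size_bytes(seconds, rate, channels, bytes_per_sample):
    """Size of raw sampled sound, ignoring the small file header."""
    return seconds * rate * channels * bytes_per_sample


wav_data = sine_wave_bytes()
five_minutes = sampled_size_bytes(5 * 60, 44100, 2, 2)  # assumed CD-style audio
print(len(wav_data), "bytes for one second of 8000 Hz mono sound")
print(five_minutes, "bytes for five minutes of assumed CD-style sound")
```

Under these assumptions, five minutes of sampled sound runs to tens of megabytes, while the MIDI example quoted in section 14.4 plays for nearly five minutes in about 23K. That gap is the reason MIDI can only describe notes, not recorded sound.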
14.11 Let us Sum Up

In this lesson we have learnt about multimedia text and sound.

14.12 Lesson-end Activities

After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Sound
b) Text

14.13 Points for Discussion

Discuss about the following
a) Various text formats
b) Various sound formats

14.14 Model answers to “Check your Progress”

In order to check your progress, try to answer the following
a) MIDI format
b) MP3 format

14.15 References

1. Chapter 15, 16 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
2. Z.S. Bojkovic, D.A Milovanovic, Multimedia Communication Systems, PHI, 2002
3. S.J. Gibbs, D.C. Tsichritzis, Multimedia Programming, Addison-Wesley, 1995
4. J.F. Koegel, Multimedia Systems, Pearson Education, 2001

UNIT – V
LESSON – 15: IMAGES AND ANIMATION

CONTENTS
15.1 Aims and Objectives
15.2 Introduction
15.3 Different Graphic Formats?
15.4 Pixels and the Web
15.5 Meta/Vector Image Formats
15.6 What's A Bitmap?
15.7 Compression
15.8 The GIF Image Formats
15.9 Animation
15.10 Transparency
15.11 Interlaced vs. Non-Interlaced GIF
15.12 JPEG Image Formats
15.13 Progressive JPEGs
15.14 Which image do I use where?
15.15 How do I save in these formats?
15.16 Do you edit and create images in GIF or JPEG?
15.17 Animation
15.18 Multimedia Animation
15.19 Let us Sum Up
15.20 Lesson-end Activities
15.21 Points for Discussion
15.22 Model answers to “Check your Progress”
15.23 References

15.1 Aims and Objectives

The aim of this lesson is to learn the concept of images and animations in multimedia. The objectives of this lesson are to make the student aware of the following concepts
a) various imaging formats
b) multimedia

15.2 Introduction

If you really want to be strict, computer pictures are files, the same way word documents or solitaire games are files. They're all a bunch of ones and zeros all in a row. But we do have to communicate with one another so let's decide.
Image. We'll use "image". That seems to cover a wide enough topic range. I went to my reference books and there I found that "graphic" is more of an adjective, as in "graphic format." You see, we denote images on the Internet by their graphic format. GIF is not the name of the image. GIF names the compression scheme used to create the raster format set up by CompuServe. (More on that in a moment.) So, they're all images unless you're talking about something specific.

15.3 Different Graphic Formats?

It does seem like a big number, doesn't it? In reality, there are not 44 different graphic format names. Many of the 44 are different versions under the same compression umbrella, interlaced and non-interlaced GIF, for example. There actually are only two basic methods for a computer to render, or store and display, an image. When you save an image in a specific format you are creating either a raster or meta/vector graphic format. Here's the lowdown:

Raster

Raster image formats (RIFs) should be the most familiar to Internet users. A Raster format breaks the image into a series of colored dots called pixels. The number of ones and zeros (bits) used to create each pixel denotes the depth of color you can put into your images.

If your pixel is denoted with only one bit-per-pixel then that pixel must be black or white. Why? Because that pixel can only be a one or a zero, on or off, black or white. Bump that up to 4 bits-per-pixel and you're able to set that colored dot to one of 16 colors. If you go even higher to 8 bits-per-pixel, you can save that colored dot at up to 256 different colors.

Does that number, 256, sound familiar to anyone? That's the upper color level of a GIF image. Sure, you can go with fewer than 256 colors, but you cannot have over 256. That's why a GIF image doesn't work overly well for photographs and larger images. There are a whole lot more than 256 colors in the world. Images can carry millions.
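The color counts quoted above all follow from the rule that n bits can represent 2^n items. A minimal Python sketch of that arithmetic follows; the function names are made up for illustration and are not part of any image library.

```python
def colors_for_depth(bits_per_pixel):
    """Number of distinct values one pixel can take at a given bit depth."""
    return 2 ** bits_per_pixel


def bits_needed(n_colors):
    """Smallest bit depth able to tell n_colors different colors apart."""
    return max(n_colors - 1, 0).bit_length()


for depth in (1, 4, 8):
    print(depth, "bits-per-pixel gives", colors_for_depth(depth), "colors")

print("a 256-color palette needs", bits_needed(256), "bits-per-pixel")
```

Going the other way, `bits_needed` shows why a GIF palette stops at 256 entries: one more color would need a ninth bit for every pixel.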
But if you want smaller icon images, GIFs are the way to go.

Raster image formats can also save at 16, 24, and 32 bits-per-pixel. At the two highest levels, the pixels themselves can carry up to 16,777,216 different colors. The image looks great! Bitmaps saved at 24 bits-per-pixel are great quality images, but of course they also run about a megabyte per picture. There's always a trade-off, isn't there?

The three main Internet formats, GIF, JPEG, and Bitmap, are all Raster formats. Some other Raster formats include the following:

CLP    Windows Clipart
DCX    ZSoft Paintbrush
DIB    OS/2 Warp format
FPX    Kodak's FlashPic
IMG    GEM Paint format
JIF    JPEG Related Image format
MAC    MacPaint
MSP    MacPaint New Version
PCT    Macintosh PICT format
PCX    ZSoft Paintbrush
PPM    Portable Pixel Map (UNIX)
PSP    Paint Shop Pro format
RAW    Unencoded image format
RLE    Run-Length Encoding (used to lower image bit rates)
TIFF   Aldus Corporation format
WPG    WordPerfect image format

15.4 Pixels and the Web

Since I brought up pixels, I thought now might be a pretty good time to talk about pixels and the Web. How much is too much? How many is too few?

There is a delicate balance between the crispness of a picture and the number of pixels needed to display it. Let's say you have two images, each is 5 inches across and 3 inches down. One uses 300 pixels to span that five inches, the other uses 1500. Obviously, the one with 1500 uses smaller pixels. It is also the one that offers a more crisp, detailed look. The more pixels, the more detailed the image will be. Of course, the more pixels, the more bytes the image will take up.

So, how much is enough? That depends on whom you are speaking to, and right now you're speaking to me. I always go with 100 pixels per inch. That creates a ten-thousand pixel square inch. I've found that allows for a pretty crisp image without going overboard on the bytes. It also allows some leeway to increase or decrease the size of the image and not mess it up too much.
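The trade-off between crispness and bytes is simple arithmetic, sketched below in Python. The function names are hypothetical, and the 640 by 480 picture size in the last line is an assumption (the lesson does not say what size of bitmap runs "about a megabyte"); file headers and row padding are ignored.

```python
def pixels_per_inch(pixels_across, inches_across):
    """Linear pixel density along one edge of the image."""
    return pixels_across / inches_across


def pixels_per_square_inch(ppi):
    """Pixels in one square inch at a given linear density."""
    return ppi * ppi


def raw_image_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed pixel data size, ignoring headers and row padding."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


print(pixels_per_inch(300, 5))         # the coarser 5-inch-wide image
print(pixels_per_inch(1500, 5))        # the finer 5-inch-wide image
print(pixels_per_square_inch(100))     # 100 pixels per inch
print(raw_image_bytes(500, 300, 24))   # 5 x 3 inches at 100 ppi, 24 bits-per-pixel
print(raw_image_bytes(640, 480, 24))   # assumed 640 x 480 picture, 24 bits-per-pixel
```

At 24 bits-per-pixel the assumed 640 by 480 picture comes to a little under a million bytes of raw pixel data, which is in line with the "about a megabyte per picture" remark for Bitmaps.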
The lowest I'd go is 72 pixels per inch, the agreed upon low end of the image scale. In terms of pixels per square inch, it's a whale of a drop to 5184. Try that. See if you like it, but I think you'll find that lower definition monitors really play havoc with the image.

15.5 Meta/Vector Image Formats

You may not have heard of this type of image formatting, not that you had heard of Raster, either. This formatting falls into a lot of proprietary formats, formats made for specific programs. CorelDraw (CDR), Hewlett-Packard Graphics Language (HGL), and Windows Metafiles (EMF) are a few examples.

Where the Meta/Vector formats have it over Raster is that they are more than a simple grid of colored dots. They're actual vectors of data stored in mathematical formats rather than bits of colored dots. This allows for a strange shaping of colors and images that can be perfectly cropped on an arc. A squared-off map of dots cannot produce that arc as well. In addition, since the information is encoded in vectors, Meta/Vector image formats can be blown up or down (a property known as "scalability") without looking jagged or crowded (a property known as "pixelating").

So that I do not receive e-mail from those in the computer image know, there is a difference in Meta and Vector formats. Vector formats can contain only vector data whereas Meta files, as is implied by the name, can contain multiple formats. This means there can be a lovely Bitmap plopped right in the middle of your Windows Meta file. You'll never know or see the difference but, there it is. I'm just trying to keep everybody happy.

15.6 What's A Bitmap?

I get that question a lot. Usually it's followed with "How come it only works on Microsoft Internet Explorer?" The second question's the easiest. Microsoft invented the Bitmap format. It would only make sense they would include it in their browser. Every time you boot up your PC, the majority of the images used in the process and on the desktop are Bitmaps.
If you're using an MSIE browser, you can view this first example. The image is St. Sophia in Istanbul. The picture is taken from the city's hippodrome.

Against what I said above, Bitmaps will display on all browsers, just not in the familiar <IMG SRC="--"> format we're all used to. I see Bitmaps used mostly as return images from PERL Common Gateway Interfaces (CGIs). A counter is a perfect example. Page counters that have that "odometer" effect are Bitmap images created by the server, rather than as an inline image. Bitmaps are perfect for this process because they're a simple series of colored dots. There's nothing fancy to building them.

It's actually a fairly simple process. In the script that runs the counter, you "build" each number for the counter to display. Note the counter is black and white. That's only a one bit-per-pixel level image. To create the number zero in the counter above, you would build a grid 7 pixels wide by 10 pixels high. The pixels you want to remain black, you would denote as zero. Those you wanted white, you'd denote as one.

Bitmaps are good images, but they're not great. If you've played with Bitmaps versus any other image formats, you might have noticed that the Bitmap format creates images that are a little heavy on the bytes. The reason is that the Bitmap format is not very efficient at storing data. What you see is pretty much what you get, one series of bits stacked on top of another.

15.7 Compression

I said above that a Bitmap was a simple series of pixels all stacked up. But the same image saved in GIF or JPEG format uses fewer bytes to make up the file. How? Compression.

"Compression" is a computer term that represents a variety of mathematical formats used to compress an image's byte size. Let's say you have an image where the upper right-hand corner has four pixels all the same color. Why not find a way to make those four pixels into one? That would cut down the number of bytes by three-fourths, at least in the one corner.
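One way to make a run of identical pixels cost less is to store each run once, as a (count, color) pair, which is the idea behind run-length encoding. The Python below is a minimal sketch of that idea on a single row of pixels. It is a generic illustration with made-up function names, not the actual byte layout used inside .bmp or .rle files.

```python
def rle_encode(pixels):
    """Collapse consecutive equal pixel values into (count, value) runs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][1] == value:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, value])   # start a new run
    return [(count, value) for count, value in runs]


def rle_decode(runs):
    """Expand (count, value) runs back into the original pixel row."""
    pixels = []
    for count, value in runs:
        pixels.extend([value] * count)
    return pixels


row = [0, 0, 0, 0, 1, 1, 0]           # four same-colored pixels, then a change
encoded = rle_encode(row)
print(encoded)                         # three runs instead of seven pixels
print(rle_decode(encoded) == row)      # decoding restores the row exactly
```

A row with long runs shrinks a lot, while a row where every pixel differs from its neighbor ends up with one run per pixel and gains nothing. Decoding always gives back exactly the original row, so nothing is lost.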
That's a compression factor.

Bitmaps can be compressed to a point. The process is called "run-length encoding." Runs of pixels that are all the same color are all combined into one pixel. The longer the run of pixels, the more compression. Bitmaps with little detail or color variance will really compress. Those with a great deal of detail don't offer much in the way of compression. Bitmaps that use the run-length encoding can carry either the common ".bmp" extension or ".rle". Another difference between the two files is that the common Bitmap can accept 16 million different colors per pixel. Saving the same image in run-length encoding knocks the bits-per-pixel down to 8. That locks the level of color in at no more than 256. That's even more compression of bytes to boot.

15.8 The GIF Image Formats

So, why wasn't the Bitmap chosen as the King of all Internet Images? Because Bill Gates hadn't yet gotten into the fold when the earliest browsers started running inline images. I don't mean to be flippant either; I truly believe that.

GIF, which stands for "Graphics Interchange Format," was first standardized in 1987 by CompuServe, although the patent for the algorithm (mathematical formula) used to create GIF compression actually belongs to Unisys. The first format of GIF used on the Web was called GIF87a, representing its year and version. It saved images at 8 bits-per-pixel, capping the color level at 256. That 8-bit level allowed the image to work across multiple server styles, including CompuServe, TCP/IP, and AOL. It was a graphic for all seasons, so to speak.

CompuServe updated the GIF format in 1989 to include animation, transparency, and interlacing. They called the new format, you guessed it: GIF89a. There's no discernible difference between a basic (known as non-interlaced) GIF in 87 and 89 formats.

15.9 Animation

I remember when animation really came into the mainstream of Web page development. I was deluged with e-mail asking how to do it.
There's been a tutorial up for a while now at http://www.htmlgoodies.com/tutors/animate.html. Stop by and see it for instruction on how to create the animations yourself.

What you are seeing in that example are 12 different images, each set one "hour" farther ahead than the one before it. Animate them all in a row and you get that stopwatch effect. The concept of GIF89a animation is much the same as a picture book with small animation cells in each corner. Flip the pages and the images appear to move. Here, you have the ability to set the cell's (technically called an "animation frame") movement speed in 1/100ths of a second. An internal clock embedded right into the GIF keeps count and flips the image when the time comes.

The animation process has been bettered along the way by companies who have found their own method of compressing the GIFs further. As you watch an animation you might notice that very little changes from frame to frame. So, why put up a whole new GIF image if only a small section of the frame needs to be changed? That's the key to some of the newer compression factors in GIF animation. Less changing means fewer bytes.

15.10 Transparency

Again, if you'd like a how-to, I have one for you at http://www.htmlgoodies.com/tutors/transpar.html. A transparent GIF is fun but limited in that only one color of the 256-shade palette can be made transparent. As you can see, the bytes came out the same after the image was put through the transparency filter.

The process is best described as similar to the weather forecaster on your local news. Each night they stand in front of a big green (sometimes blue) screen and deliver the weather while that blue or green behind them is "keyed" out and replaced by another source. In the case of the weather forecaster, it's usually a large map with lots of Ls and Hs. The process in television is called a "chroma key." A computer is told to hone in on a specific color, let's say it's green.
Chroma key screens are usually green because it's the color least likely to be found in human skin tones. You don't want to use a blue screen and then chroma out someone's pretty blue eyes. That chroma (color) is then "erased" and replaced by another image.

Think of that in terms of a transparent GIF. There are only 256 colors available in the GIF. The computer is told to hone in on one of them. It's done by choosing a particular red/green/blue shade already found in the image and blanking it out. The color is basically dropped from the palette that makes up the image. Thus whatever is behind it shows through. The shape is still there though. Try this: Get an image with a transparent background and alter its height and width in your HTML code. You'll see what should be the transparent color seeping through.

Any color that's found in the GIF can be made transparent, not just the color in the background. If the background of the image is speckled then the transparency is going to be speckled. If you cut out the color blue in the background, and that color also appears in the middle of the image, it too will be made transparent.

When I put together a transparent image, I make the image first, then copy and paste it onto a slightly larger square. That square is the most hideous green I can mix up. I'm sure it doesn't appear in the image. That way only the background around the image will become clear.

15.11 Interlaced vs. Non-Interlaced GIF

The GIF images of me playing the Turkish Sitar were non-interlaced format images. This is what is meant when someone refers to a "normal" GIF or just "GIF". When you do NOT interlace an image, you fill it in from the top to the bottom, one line after another.

The following image is of two men coming onto a boat we used to cross from the European to the Asian side of Turkey. The flowers they are carrying were sold in the manner of roses we might buy our wife here in the U.S. I bought one. (What a guy.)
Hopefully, you're on a slower connection computer so you got the full effect of waiting for the image to come in. It can be torture sometimes. That's where the brilliant Interlaced GIF89a idea came from.

Interlacing is the concept of filling in every other line of data, then going back to the top and doing it all again, filling in the lines you skipped. Your television works that way. (Strictly speaking, the GIF format spreads the rows over four passes rather than two, but the effect is the same.) The effect on a computer monitor is that the graphic appears blurry at first and then sharpens up as the other lines fill in. That allows your viewer to at least get an idea of what's coming up rather than waiting for the entire image, line by line.

Both interlaced and non-interlaced GIFs get you to the same destination. They just do it differently. It's up to you which you feel is better.

15.12 JPEG Image Formats

JPEG is a compression algorithm developed by the people the format is named after, the Joint Photographic Experts Group. JPEG's big selling point is that its compression factor stores the image on the hard drive in fewer bytes than the image takes up when it actually displays. The Web took to the format straightaway because not only did the image store in fewer bytes, it transferred in fewer bytes. As the Internet adage goes, the pipeline isn't getting any bigger so we need to make what is traveling through it smaller.

For a long while, GIF ruled the Internet roost. I was one of the people who didn't really like this new JPEG format when it came out. It was less grainy than GIF, but it also caused computers without a decent amount of memory to crash the browser. (JPEGs have to be "blown up" to their full size. That takes some memory.) There was a time when people only had 8 or 4 megs of memory in their boxes. Really. It was way back in the Dark Ages.

JPEGs are "lossy." That's a term that means you trade off detail in the displayed picture for a smaller storage file. I always save my JPEGs at 50% or medium compression.
Here's a look at the same image saved in normal, or what's called "sequential" encoding. That's a top-to-bottom, single-line, equal to the GIF89a non-interlaced format. The image is of an open air market in Basra. The smell was amazing. If you like olives, go to Turkey. Cucumbers, too, believe it or not.

The difference between the 1% and 50% compression is not too bad, but the drop in bytes is impressive. The numbers I am showing are storage numbers, the amount of hard drive space the image takes up.

You've probably already surmised that 50% compression means that 50% of the image is included in the algorithm. If you don't put a 50% compressed image next to an exact duplicate image at 1% compression, it looks pretty good. But what about that 99% compression image? It looks horrible, but it's great for teaching. Look at it again. See how it appears to be made of blocks? That's what's meant by lossy. Bytes are lost at the expense of detail. You can see where the compression algorithm found groups of pixels that all appeared to be close in color and just grouped them all together as one. You might be hard pressed to figure out what the image was actually showing if I didn't tell you.

15.13 Progressive JPEGs

You can almost guess what this is all about. A progressive JPEG works a lot like the interlaced GIF89a by filling in every other line, then returning to the top of the image to fill in the remainder.

Obviously, here's where bumping up the compression does not pay off. Rule of thumb: If you're going to use progressive JPEG, keep the compression up high, 75% or better.

15.14 Which image do I use where?

There's just not a good answer to this question. No matter what I say, someone else can give you just as compelling a reason why you should do the opposite.
I'll tell you the rules I follow:

Small images, like icons and buttons: GIF (usually non-interlaced)
Line art, grayscale (black and white), cartoons: GIF (usually non-interlaced)
Scanned images and photographs: JPEG (I prefer sequential. I'm not a fan of progressive.)
Large images or images with a lot of detail: JPEG (I prefer sequential)

That said, I also follow the thinking, "Do people really need to see this image?" Can I get away with text rather than an image link? Can I make links to images allowing the viewer to choose whether to look or not? The fewer images I have on a page, the faster it comes in.

I also attempt to have the same images across multiple pages, if possible. That way the viewer only has to wait once. After that, the images are in the cache and they pop right up.

15.15 How do I save in these formats?

You have to have an image editor. I own three. Most of my graphic work for the Web is done in PaintShop Pro. I do that because PaintShop Pro is shareware and you can get your hands on the same copy I have. That way I know if I can do it, you can do it.

To get these formats, you need to make a point of saving in these formats. When your image editor is open and you have an image you wish to save, always choose SAVE AS from the FILE menu. You'll get a dialogue box that asks where you'd like to save the image. Better yet, somewhere on that dialogue box is the opportunity for you to choose a different image format. Let's say you choose GIF. Keep looking. Somewhere on the same dialogue box will be an OPTIONS button (or something close). That's where you'll choose 87a or 89a, interlaced or non-interlaced, formats.

If you choose JPEG, you'll get the option of choosing the compression rate. You may not get to play with the sliding scale I get. You may only get a series of compression choices, high, medium, low, etc. Go high.

15.16 Do you edit and create images in GIF or JPEG?

Neither. I always edit in the PaintShop Pro or Bitmap format.
Others have told me that image creation and editing should only be done in a Vector format. Either way, make a point of editing with large images. The larger the image, the better chance you have of making that perfect crop. Edit at the highest color level the image program will allow. You can always resize and save to a low-byte format after you've finished creating the file.

15.17 Animation

Most Web animation requires special plug-ins for viewing. The exception is the animated GIF format, which is by far the most prevalent animation format on the Web, followed closely by Macromedia's Flash format. The animation option of the GIF format combines individual GIF images into a single file to create animation. You can set the animation to loop on the page or to play once, and you can designate the duration for each frame in the animation.

Animated GIFs have several drawbacks. One concerns the user interface. GIF animations do not provide interface controls, so users have no easy way to stop a looping animation short of closing the browser window. They also lack the means to replay nonlooping animation. Second, the animated GIF format does not perform interframe compression, which means that if you create a ten-frame animation and each frame is a 20 KB GIF, you'll be putting a 200 KB file on your page.

And the final drawback is a concern that pertains to animations in general. Most animation is nothing more than a distraction. If you place animation alongside primary content you will simply disrupt your readers' concentration and keep them from the objective of your site. If you require users to sit through your spiffy Flash intro every time they visit your site, you are effectively turning them away at the door.

There is a place for animation on the Web, however. Simple animation on a Web site's main home page can provide just the right amount of visual interest to invite users to explore your materials.
There, the essential content is typically a menu of links, so the threat of distraction is less than it would be on an internal content page. Also, subtle animation such as a rollover can help guide the user to interface elements that they might otherwise overlook. Animation can also be useful in illustrating concepts or procedures, such as change over time. When you have animation that relates to the content of your site, one way to minimize the potential distraction is to present the animation in a secondary window. This technique offers a measure of viewer control: readers can open the window to view the animation and then close the window when they're through. 15.18 Multimedia Animation From the early days of the web, when the only thing that moved on your screen was the mouse cursor - there's now a bewildering array of methods for animating pages. Here's a selection: Java. Shockwave, Flash (formerly FutureSplash). Macromedia's Shockwave plug-ins and Flash are leaders in plug-in animation. QuickTime is the multi-platform industry-standard multimedia architecture used by software tool vendors and content creators to create and deliver synchronized graphics, sound, video, text and music. FLiCs, AVI all require pre-existing software to be on your computer before you can view them. mBED. mbedlets are interactive multimedia interfaces within web pages. They include graphics, animation, sound. They stream data directly off the web as needed and attempt to use bandwidth as efficiently as possible. They can communicate back to the server using standard HTTP methods. And they respond to user actions such as mouse clicks and key events. Enliven, and Sizzler. Javascript animations require preloading and users can disable Javascript in their browser. Framation (TM) is a technique using a combination of meta-refresh and frames. GIF animation. Self-contained GIF files are downloaded once and played from the computer's disk cache. 
You can download several per page, and even place a single animated GIF dozens of times on the same page, creating effects that would not be easy with other solutions. Unlike other movie formats, GIF still supports transparency, even in animations. They are as simple to use and implement as any still GIF image. The only thing GIF lacks is sound (and BTW sound has been added to GIFs in the past) and real-time speed variation (like AVI's ability to skip frames when on a slow machine).

15.19 Let us Sum Up

In this lesson we have learnt about Images and animation in multimedia.

15.20 Lesson-end Activities

After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) What are the various image formats
b) Define animation

15.21 Points for Discussion

Discuss the following
a) Define Compression
b) Discuss about JPEG image

15.22 Model answers to “Check your Progress”

In order to check your progress, try to answer the following questions
a) Meta/Vector Image Formats
b) Different Graphic Formats?

15.23 References

1. Chapter 15, 16 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
2. Z.S. Bojkovic, D.A Milovanovic, Multimedia Communication Systems, PHI, 2002
3. S.J. Gibbs, D.C. Tsichritzis, Multimedia Programming, Addison-Wesley, 1995
4. J.F.
Koegel, Multimedia Systems, Pearson Education, 2001

LESSON – 16: VIDEO

CONTENTS
16.1 Aims and Objectives
16.2 Introduction
16.3 Advantages of Digital Video
16.4 File Size Considerations
16.5 Video Compression
16.5.1 Lossless Compression
16.5.2 Lossy Compression
16.6 Compression Standards
16.6.1 JPEG
16.6.2 MPEG
16.6.3 Microsoft’s Video for Windows
16.6.4 Apple’s QuickTime
16.7 Hardware Requirements
16.8 Guidelines for short video sequences
16.9 Multimedia Video Formats
16.9.1 The AVI Format
16.9.2 The Windows Media Format
16.9.3 The MPEG Format
16.9.4 The QuickTime Format
16.9.5 The RealVideo Format
16.9.6 The Shockwave (Flash) Format
16.10 Playing Videos On The Web
16.10.1 Inline Videos
16.10.2 Using A Helper (Plug-In)
16.10.3 Using The <img> Element
16.10.4 Using The <embed> Element
16.10.5 Using The <object> Element
16.10.6 Using A Hyperlink
16.11 Let us Sum Up
16.12 Lesson-end Activities
16.13 Points for Discussion
16.14 Model answers to “Check your Progress”
16.15 References

16.1 Aims and Objectives

The aim of this lesson is to learn the concept of video in multimedia. The objectives of this lesson are to make the student aware of the following concepts
a) Video
b) Various video formats

16.2 Introduction

Video is the most challenging multimedia content to deliver via the Web. One second of uncompressed NTSC (National Television System Committee) video, a widely used standard for television and video, requires approximately 27 megabytes of disk storage space. The amount of scaling and compression required to turn this quantity of data into something that can be used on a network is significant, sometimes so much so as to render the material useless. If at all possible, tailor your video content for the Web.

Shoot original video; that way you can take steps to create video that will compress efficiently and still look good at low resolution and frame rates.

Shoot close-ups. Wide shots have too much detail to make sense at low resolution.
Shoot against a simple monochromatic background whenever possible. This will make small video images easier to understand and will increase the efficiency of compression.

Use a tripod to minimize camera movement. A camera locked in one position will minimize the differences between frames and greatly improve video compression.

Avoid zooming and panning. These can make low frame-rate movies confusing to view and interpret and can cause them to compress poorly.

When editing your video, use hard cuts between shots. Don't use the transitional effects offered by video editing software, such as dissolves or elaborate wipes, because they will not compress efficiently and will not play smoothly on the Web.

If you are digitizing material that was originally recorded for video or film, choose your material carefully. Look for clips that contain minimal motion and do not depend on small but essential details. Motion and detail are the most obvious shortcomings of low-resolution video.

In the past, video has been defined as multimedia. Video makes use of all of the elements of multimedia, bringing your products and services alive, but at a high cost. Scripting, hiring actors, set creation, filming, post-production editing and mastering can add up very quickly. Five minutes of live action video can cost many times more than a multimedia production.

The embedding of video in multimedia applications is a powerful way to convey information, and it can incorporate a personal element which other media lack. Video enhances, dramatizes, and gives impact to your multimedia application. Your audience will better understand the message of your application with the adequate and carefully planned integration of video. Video is an important way of conveying a message to the MTV generation. But be careful; good-quality digital video clips require very sophisticated hardware and software configuration and support.
The advantage of integrating video into a multimedia presentation is the capacity to effectively convey a great deal of information in the least amount of time. Remember that motion integrated with sound is a key for your audience's understanding. It also increases the retention of the presented information (knowledge).

The ability to incorporate digitized video into a multimedia title marked an important achievement in the evolution of the multimedia industry. Video brings a sense of realism to multimedia titles and is useful in engaging the user and evoking emotion.

There are two basic approaches to delivering video on a computer screen: analogue and digital video. Analogue video is essentially a product of the television industry and therefore conforms to television standards. Digital video is a product of the computing industry and therefore conforms to digital data standards.

Video, like audio, is usually recorded and played as an analog signal. It must therefore be digitized in order to be incorporated into a multimedia title. Figure below shows the process for digitizing an analog video signal. A video source, such as a video camera, VCR, TV, or videodisc, is connected to a video capture card in a computer. As the video source is played, the analog signal is sent to the video capture card and converted into a digital file that is stored on the hard drive. At the same time, the sound from the video source is also digitized.

PAL (Phase Alternating Line) and NTSC (National Television System Committee) are the two video standards of most importance for analogue video. PAL is the standard for most of Europe and the Commonwealth, NTSC for North and South America. The standards are inter-convertible, but conversion normally has to be performed by a facilities house and some quality loss may occur.
Analogue video can be delivered into the computing interface from any compatible video source (video recorder, videodisc player, live television), provided the computer is equipped with a special overlay board, which synchronizes video and computer signals and displays computer-generated text and graphics over the video.

16.3 Advantages of Digital Video
One of the advantages of digitized video is that it can be easily edited. Analog video, such as a videotape, is linear; there is a beginning, middle, and end. If you want to edit it, you need to continually rewind, pause, and fast forward the tape to display the desired frames. Digitized video, on the other hand, allows random access to any part of the video, and editing can be as easy as the cut and paste process in a word processing program. In addition, adding special effects such as fly-in titles and transitions is relatively simple.

Other advantages:
a) The video is stored as a standard computer file. Thus it can be copied with no loss in quality, and can also be transmitted over standard computer networks.
b) Software motion video does not require specialized hardware for playback.
c) Unlike analog video, digital video requires neither a video board in the computer nor an external device (which adds extra costs and complexity) such as a videodisc player.

16.4 File Size Considerations
The embedding of video in multimedia applications is a powerful way to convey information, and it can incorporate a personal element which other media lack. Current technology limits digital video's speed of playback and the size of the window which can be displayed. When played back from the computer's hard disk, videos are much less smooth than conventional television images because of the limited hard disk data transfer rate. Compression techniques are often used with digital video, and as a result resolution is often compromised. Also, the storage of video files requires a comparatively large amount of hard disk space.
Digitized video files can be extremely large. A single second of high-quality color video that takes up only one-quarter of a computer screen can be as large as 1 MB. Several elements determine the file size; in addition to the length of the video, these include:
a) Frame Rate
b) Image Size
c) Color Depth

In most cases, a quarter-screen image size (320 x 240), an 8-bit color depth (256 colors), and a frame rate of 15 fps are acceptable for a multimedia title. Even this minimum results in a very large file size.

16.5 Video Compression
Because of the large sizes associated with video files, video compression/decompression programs, known as codecs, have been developed. These programs can substantially reduce the size of video files, which means that more video can fit on a single CD and that the speed of transferring video from a CD to the computer can be increased.

There are two types of compression:
a) Lossless compression
b) Lossy compression

16.5.1 Lossless Compression
Lossless compression preserves the exact image throughout the compression and decompression process. An example of when this is important is in the use of text images. Text needs to appear exactly the same before and after file compression. One technique for text compression is to identify repeating words and assign them a code. For example, if the word multimedia appears several times in a text file, it would be assigned a code that takes up less space than the actual word. During decompression, the code would be changed back to the word multimedia.

16.5.2 Lossy Compression
Lossy compression actually eliminates some of the data in the image and therefore provides greater compression ratios than lossless compression. The greater the compression ratio, however, the poorer the decompressed image. Thus, the trade-off is file size versus image quality. Lossy compression is applied to video because some drop in quality is not noticeable in moving images.
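To tie the file-size factors of section 16.4 to the trade-off just described, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration: the function names are hypothetical, the calculation ignores audio and file-format overhead, and the 20:1 ratio is an illustrative value rather than a property of any particular codec.

```python
def raw_bytes_per_second(width, height, bits_per_pixel, fps):
    """Uncompressed data rate: bytes per frame multiplied by frames per second."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps


def compressed_bytes(raw_bytes, ratio):
    """Size after compression at the given ratio (20 means 20:1)."""
    return raw_bytes / ratio


# Quarter-screen settings from section 16.4: 320 x 240, 8-bit colour, 15 fps.
rate = raw_bytes_per_second(320, 240, 8, 15)
print(rate)                        # 1152000 bytes, a little over 1 MB per second
print(compressed_bytes(rate, 20))  # 57600.0 bytes per second at an assumed 20:1
```

If we assume a 640 x 480 frame, 24-bit colour and 30 fps, the same function gives 27,648,000 bytes per second, which is in line with the figure of roughly 27 megabytes per second quoted in the introduction; those three settings are an assumption made for this example, not values stated in the lesson.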
16.6 Compression Standards
Certain standards have been established for compression programs, including
a) JPEG
b) MPEG
c) Microsoft’s Video for Windows
d) Apple’s QuickTime

16.6.1 JPEG
JPEG is a standard developed by the Joint Photographic Experts Group for the compression of still images. Motion JPEG treats each video frame as a still image. This results in large file sizes or quality degradation at high compression ratios. Although JPEG is strictly a still image compression standard, stills can become a movie if delivered at 25 (or 30) frames per second. JPEG compression requires hardware, but decompression can now be achieved in software only (e.g. under QuickTime and Video for Windows).

Figure below shows how the JPEG process works. Often areas of an image (especially backgrounds) contain similar information. JPEG compression identifies these areas and stores them as blocks of pixels instead of pixel by pixel, thus reducing the amount of information needed to store the image. The blocks are then reassembled when the file is decompressed.

Rather than separately storing data for each of the 256 blue pixels in this block of background color, JPEG eliminates the redundant information and records just the color, size, and location of the block in the graphic. A higher number of blocks results in a larger file but better quality. Fewer blocks make a smaller file but result in more-lossy compression; data is irrevocably lost.

Compression ratios of 20:1 can be achieved without substantially affecting image quality. A 20:1 compression ratio would reduce a 1 MB file to only 50 KB.

16.6.2 MPEG
MPEG is a standard developed by the Moving Picture Experts Group. It uses an asymmetrical algorithm, in which compression takes considerably longer than decompression. MPEG is based on frame differences, which results in high compression ratios and small file sizes. MPEG also adds another process to the still image compression when working with video: it looks for the changes in the image from frame to frame.
Keyframes are identified every few frames, and the changes that occur from keyframe to keyframe are recorded. MPEG can provide greater compression ratios than JPEG, but it requires hardware (a card inserted in the computer) that is not needed for JPEG compression. This limits the use of MPEG compression for multimedia titles, because MPEG cards are not standard on the typical multimedia playback system.

16.6.3 Microsoft’s Video for Windows
Microsoft’s Video for Windows software is based on the .AVI (Audio Video Interleave) file format, in which the audio and video are interleaved. This permits the sound to appear synchronized with the motion of a video file.

16.6.4 Apple’s QuickTime
Apple developed software compression for the Macintosh called QuickTime, which uses the movie format. The movie format has a data structure format used for production, and a compact format for playback. It uses lossy compression coding and can achieve ratios of 5:1 to 25:1. The QuickTime player does include a volume control.

QuickTime for Windows integrates video, animation, high-quality still images, and high-quality sound with Windows applications, boosting the impact of all types of communications.

16.7 Hardware Requirements
When a developer is interested in including digitally recorded video in an application, there are elements of hardware which are essential. To begin with, as with images, there must be some way of gathering the raw video which will be translated into digital video. Video camcorders or video tape recorders can be used for gathering this original information. Depending on how frequently these pieces of equipment will be used, it may be necessary to purchase them as part of the multimedia setup, or it may be better to borrow equipment from Audio/Visual Services near you.

Once this media has been collected it is necessary to translate it into a digital format, using a video digitizing card, so it can be used on the computer.
PCs do not generally come with video digitizing cards, but the Macintosh AV series of computers have built-in cards.

Video capture boards are designed to either grab still frames or capture motion video as input into a computer. In some cases, the video plays through into a window on the monitor.

16.8 Guidelines for short video sequences
Short video sequences, often accompanied by either spoken commentary or atmospheric music, are an attractive way of presenting information.

Guidelines
1. Care should be taken not to present a video just for the sake of it. For example, voice output only can be as effective as, and requires less storage space than, a video of someone speaking (i.e. a "talking-head").
2. Using video as part of a multimedia application usually requires a quality as high as that of television sets to fulfil users' expectations.
3. Use of techniques such as cut, fade, dissolve, wipe, overlap, multiple exposure, should be limited to avoid distracting the user from the content.
4. To make proper use of video sequences in multimedia applications, short sequences are needed as a part of a greater whole. This is different from watching a film, which usually involves watching it from beginning to end in a single sequence. Video sequences should be limited to about 45 seconds; longer video sequences can reduce the user's concentration.
5. Video should be accompanied by a soundtrack in order to give extra information or to add specific detail to the information.
6. Videos need time and careful direction if they are to present information attractively.
7. If the lighting conditions under which the video is to be viewed may be poor, controls may be provided for the user to alter display characteristics such as brightness, contrast, and colour strength.
8. Provide low quality video within a small window, since full screen video raises the expectation of the user. Often some kind of stage or other 'decoration', e.g. a cinema metaphor (i.e. background), may be used to show low resolution video in a part of a screen.
9. The actual position within the video or animation sequence, and the total length of the sequence, should be shown on a time scale.
10. The user should be able to interrupt the video (or animation) sequence at any time and to repeat parts of it. The most important controls to provide are: play, pause, replay from start. However, a minimum requirement is that users should be able to cancel the video or animation sequence at any time, and move on to the next part of the interface.
11. Video controls should be based on the controls on a video recorder (VCR) or hi-fi, which are familiar to many people.
12. It is also desirable to provide controls to set video characteristics such as brightness, contrast, colour and hue.

16.9 Multimedia Video Formats
Video can be stored in many different formats. The common digital video formats are:
a) Moving Picture Experts Group (.MPG)
b) QuickTime (.MOV)
c) Video for Windows (.AVI)

16.9.1 The AVI Format
The AVI (Audio Video Interleave) format was developed by Microsoft. The AVI format is supported by all computers running Windows, and by all the most popular web browsers. It is a very common format on the Internet, but not always possible to play on non-Windows computers. Videos stored in the AVI format have the extension .avi.

16.9.2 The Windows Media Format
The Windows Media format was developed by Microsoft. Windows Media is a common format on the Internet, but Windows Media movies cannot be played on non-Windows computers without an extra (free) component installed. Some later Windows Media movies cannot play at all on non-Windows computers because no player is available. Videos stored in the Windows Media format have the extension .wmv.

16.9.3 The MPEG Format
The MPEG (Moving Picture Experts Group) format is the most popular format on the Internet. It is cross-platform, and supported by all the most popular web browsers.
Videos stored in the MPEG format have the extension .mpg or .mpeg.

16.9.4 The QuickTime Format
The QuickTime format was developed by Apple. QuickTime is a common format on the Internet, but QuickTime movies cannot be played on a Windows computer without an extra (free) component installed. Videos stored in the QuickTime format have the extension .mov.

16.9.5 The RealVideo Format
The RealVideo format was developed for the Internet by RealNetworks. The format allows streaming of video (on-line video, Internet TV) with low bandwidths. Because of the low bandwidth priority, quality is often reduced. Videos stored in the RealVideo format have the extension .rm or .ram.

16.9.6 The Shockwave (Flash) Format
The Shockwave format was developed by Macromedia. The Shockwave format requires an extra component to play. This component comes preinstalled with the latest versions of Netscape and Internet Explorer. Videos stored in the Shockwave format have the extension .swf.

16.10 Playing Videos On The Web
Videos can be played "inline" or by a "helper", depending on the HTML element you use.

16.10.1 Inline Videos
When a video is included in a web page it is called inline video. Inline video can be added to a web page by using the <img> element.

If you plan to use inline videos in your web applications, be aware that many people find inline videos annoying. Also note that some users might have turned off the inline video option in their browser. Our best advice is to include inline videos only in web pages where the user expects to see a video. An example of this is a page which opens after the user has clicked on a link to see the video.

16.10.2 Using A Helper (Plug-In)
A helper application is a program that can be launched by the browser to "help" play a video. Helper applications are also called Plug-Ins. Helper applications can be launched using the <embed> element, the <applet> element, or the <object> element.
One great advantage of using a helper application is that you can let some (or all) of the player settings be controlled by the user. Most helper applications allow manual (or programmed) control over the volume settings and play functions like rewind, pause, stop and play.

16.10.3 Using The <img> Element
Internet Explorer supports the dynsrc attribute in the <img> element. The purpose of this attribute is to embed multimedia elements in a web page:

    <img dynsrc="video.avi" />

The code fragment above displays an AVI file embedded in a web page.
Note: The dynsrc attribute is not a standard HTML or XHTML attribute. It is supported by Internet Explorer only.

16.10.4 Using The <embed> Element
Internet Explorer and Netscape both support an element called <embed>. The purpose of this element is to embed multimedia elements in a web page:

    <embed src="video.avi" />

The code fragment above displays an AVI file embedded in a web page. A list of attributes for the <embed> element can be found in a later chapter of this tutorial.
Note: The <embed> element is supported by both Internet Explorer and Netscape, but it is not a standard HTML or XHTML element. The World Wide Web Consortium (W3C) recommends using the <object> element instead.

16.10.5 Using The <object> Element
Internet Explorer and Netscape both support an HTML element called <object>. The purpose of this element is to embed multimedia elements in a web page:

    <object data="video.avi" type="video/avi" />

The code fragment above displays an AVI file embedded in a web page. A list of attributes for the <object> element can be found in a later chapter of this tutorial.

16.10.6 Using A Hyperlink
If a web page includes a hyperlink to a media file, most browsers will use a "helper application" to play the file:

    <a href="video.avi"> Click here to play a video file </a>

The code fragment above displays a link to an AVI file.
If the user clicks on the link, the browser will launch a helper application like Windows Media Player to play the AVI file.

16.11 Let us Sum Up
In this lesson we have learnt about
a) Video formats
b) Compression techniques

16.12 Lesson-end Activities
After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Discuss the MPEG standard
b) Discuss video file size considerations

16.13 Points for Discussion
Discuss the following
a) Lossless compression
b) Lossy compression

16.14 Model answers to “Check your Progress”
To check your progress, try to answer the following
a) Discuss the AVI format
b) Discuss the MOV format

16.15 References
1. Chapter 15, 16 of ISRD Group, “Computer Graphics”, McGraw Hill, 2006
2. Z.S. Bojkovic, D.A Milovanovic, Multimedia Communication Systems, PHI, 2002
3. S.J. Gibbs, D.C. Tsichritzis, Multimedia Programming, Addison-Wesley, 1995
4. J.F. Koegel, Multimedia Systems, Pearson Education, 2001