Data Systems - Relational Algebra

Data Systems
XML
Class outline
1. What XML is
2. What it is used for
3. XML in databases
4. Implementation in current DBMSs
What XML is
• EXtensible Markup Language.
• It closely resembles HTML, but its main purpose is to describe and transmit data, not to display it as HTML does.
• XML tags are not predefined; the user defines them. Tags act as metadata within documents.
• It is an unlicensed, platform-independent standard supported across the software industry.
What XML is
Example of an XML file (the slide's callouts are shown as XML comments)
<menu_almuerzo>                       <!-- opening tag; root of the XML -->
  <comida>                            <!-- parent element -->
    <nombre>Waffles</nombre>          <!-- "Waffles" is the value -->
    <precio>$2.00</precio>            <!-- labeled "attribute" on the slide -->
    <descripcion>Waffles baratos de McDonalds</descripcion>
    <calorias>650</calorias>
    <ingrediente>                     <!-- child element -->
      <descripcion>Harina</descripcion>
    </ingrediente>
  </comida>
  <comida>
    <nombre>Hamburguesa</nombre>
    <precio>$5.00</precio>
    <descripcion>La hamburguesa mas comun de McDonalds</descripcion>
    <calorias>1500</calorias>
  </comida>
</menu_almuerzo>                      <!-- closing tag -->
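To make the root/parent/child vocabulary concrete, here is a minimal sketch that reads a trimmed copy of the menu with Python's standard xml.etree.ElementTree module. The function name list_meals and the choice to convert calories to an integer are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the lunch menu from the slide.
MENU_XML = """
<menu_almuerzo>
  <comida>
    <nombre>Waffles</nombre>
    <precio>$2.00</precio>
    <calorias>650</calorias>
    <ingrediente>
      <descripcion>Harina</descripcion>
    </ingrediente>
  </comida>
  <comida>
    <nombre>Hamburguesa</nombre>
    <precio>$5.00</precio>
    <calorias>1500</calorias>
  </comida>
</menu_almuerzo>
"""

def list_meals(xml_text):
    """Return (name, price, calories) for every <comida> under the root."""
    root = ET.fromstring(xml_text)          # root element: <menu_almuerzo>
    meals = []
    for comida in root.findall("comida"):   # direct children of the root
        name = comida.findtext("nombre")
        price = comida.findtext("precio")
        calories = int(comida.findtext("calorias"))  # values arrive as text
        meals.append((name, price, calories))
    return meals

meals = list_meals(MENU_XML)
print(meals)
```

Every value comes back as a string; any typing (such as the integer calories) is applied by the consuming program.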
What XML is used for
Data vs. documents
• Communication between applications (XML is the standard for exchange between heterogeneous systems): data-centric XML. Typical uses are data exchange with actors outside the organization (for example, Company #1 with Company #2) and Web Services.
• Storing and retrieving documents (semi-structured information); content and metadata management: information-centric XML.
What XML is used for
Data-centric XML
• XML as a transport medium for data.
• Regular structure, sets of attribute-value pairs, little or no additional content.
• The order in which content appears is not relevant.
• Databases can be the source and/or destination of this kind of document.
• Examples: sales orders, stock data, flight itineraries and schedules.
• Query results are not ranked; it only matters that they satisfy the query conditions.
• Example query over data-centric XML: find the employees whose salary this month is the same as it was 12 months ago.
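The example query can be sketched in Python over a small payroll document. The document layout (employee elements with salary children carrying a month attribute), the sample values, and the function name are all assumptions made for illustration. The result is a set, reflecting that data-centric results are unranked.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Hypothetical payroll document; the structure is assumed for this sketch.
PAYROLL_XML = """
<payroll>
  <employee id="e1">
    <salary month="2023-05">3000</salary>
    <salary month="2024-05">3000.00</salary>
  </employee>
  <employee id="e2">
    <salary month="2023-05">2800</salary>
    <salary month="2024-05">3100</salary>
  </employee>
</payroll>
"""

def unchanged_salary_ids(xml_text, this_month, year_ago):
    """Return the ids of employees whose salary is equal in both months."""
    root = ET.fromstring(xml_text)
    matches = set()  # unordered: only membership matters
    for employee in root.findall("employee"):
        # Compare as decimals so "3000" and "3000.00" count as equal.
        by_month = {s.get("month"): Decimal(s.text)
                    for s in employee.findall("salary")}
        if this_month in by_month and by_month[this_month] == by_month.get(year_ago):
            matches.add(employee.get("id"))
    return matches

same = unchanged_salary_ids(PAYROLL_XML, "2024-05", "2023-05")
print(same)
```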
Comparison with Relational Data
Disadvantages:
• Inefficient: tags, which represent schema information, are repeated.
Benefits:
• Unlike relational tuples, XML data is self-documenting through its tags.
• Non-rigid format: new tags can be added.
• Allows nested structures.
• Widely accepted, not only by database systems but also by browsers, programming languages, and applications.
Motivation for Nesting
• Nesting of data is useful in data transfer
  - Example: elements representing item nested within an itemlist element
• Nesting is not supported, or discouraged, in relational databases
  - With multiple orders, customer name and address are stored redundantly
  - normalization replaces nested structures in each order by foreign key into table storing customer name and address information
  - Nesting is supported in object-relational databases
• But nesting is appropriate when transferring data
  - External application does not have direct access to data referenced by a foreign key
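The trade-off can be sketched as follows: data kept normalized (orders pointing to a customer by foreign key) is expanded into a nested, self-contained XML document before transfer. The dictionaries, element names, and sample values are hypothetical stand-ins for real tables.

```python
import xml.etree.ElementTree as ET

# Normalized storage: each order references its customer by key.
customers = {1: {"name": "Ana", "address": "Main St 1"}}
orders = [
    {"order_id": 10, "customer_id": 1, "items": ["pen", "ink"]},
    {"order_id": 11, "customer_id": 1, "items": ["paper"]},
]

def order_to_xml(order, customers):
    """Build a self-contained <order> element with the customer nested inside."""
    customer = customers[order["customer_id"]]  # resolve the foreign key
    order_el = ET.Element("order", id=str(order["order_id"]))
    customer_el = ET.SubElement(order_el, "customer")
    ET.SubElement(customer_el, "name").text = customer["name"]
    ET.SubElement(customer_el, "address").text = customer["address"]
    itemlist = ET.SubElement(order_el, "itemlist")
    for item in order["items"]:
        ET.SubElement(itemlist, "item").text = item
    return order_el

docs = [ET.tostring(order_to_xml(o, customers), encoding="unicode") for o in orders]
for doc in docs:
    print(doc)
```

The customer data is repeated in both documents, which is exactly the redundancy that normalization removes and that transfer tolerates.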
What XML is used for
Data-centric XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ficha>
<nombre> Angel </nombre>
<apellido> Barbero </apellido>
<direccion> Portela 36 1° A</direccion>
</ficha>
Example XML file: CD catalog
What XML is used for
Information-centric XML
• XML as a means to structure, store, and retrieve documents/information.
• De facto standard for storing documents, because it can store and exploit their structure (paragraphs, sections, footnotes, etc.) and metadata (author, publication year, etc.).
• XML designed to be consumed by people.
• Irregular structure, coarse granularity (the smallest unit of information has mixed content or is the whole document), a lot of mixed content.
• The order in which content appears is relevant.
• Usually written by hand in XML, or in another format that is later converted to XML.
• Examples: books, laws, email, and any other hand-written document.
What XML is used for
Information-centric XML
Example XML file: RetrieveProductSearchResultContent.XML
Using XML in databases
Storage: option #1
Map the XML into columns of common data types (character, numeric, date, etc.) in one or more tables.
• Only for data-centric XML.
• Requires more processing.
• Can be done in several ways (SQL/XML, etc.)
Example in T-SQL
-- Shred each Rows node of the @xml variable into one relational row.
INSERT INTO some_table (column1, column2, column3)
SELECT
Rows.n.value('(@column1)[1]', 'varchar(20)'),
Rows.n.value('(@column2)[1]', 'nvarchar(100)'),
Rows.n.value('(@column3)[1]', 'int')
FROM @xml.nodes('//Rows') Rows(n)
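The same mapping idea can be sketched outside SQL Server with Python's standard library, using an in-memory SQLite database as a stand-in for the target table. The sample document and the function name shred are assumptions; only the table and column names come from the T-SQL example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input: one <Rows> element per row, values held in attributes.
ROWS_XML = """
<Root>
  <Rows column1="a" column2="first row" column3="1"/>
  <Rows column1="b" column2="second row" column3="2"/>
</Root>
"""

def shred(xml_text, conn):
    """Map each <Rows> element to one row of some_table; return rows inserted."""
    conn.execute(
        "CREATE TABLE some_table (column1 TEXT, column2 TEXT, column3 INTEGER)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        (r.get("column1"), r.get("column2"), int(r.get("column3")))
        for r in root.iter("Rows")  # like '//Rows': any depth
    ]
    conn.executemany(
        "INSERT INTO some_table (column1, column2, column3) VALUES (?, ?, ?)", rows
    )
    return len(rows)

conn = sqlite3.connect(":memory:")
inserted = shred(ROWS_XML, conn)
stored = conn.execute(
    "SELECT column1, column2, column3 FROM some_table ORDER BY column3"
).fetchall()
print(inserted, stored)
```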
Using XML in databases
Storage: option #2
Store the XML in Binary Large Object (BLOB) or Character Large Object (CLOB) fields.
• Simple solution.
• Works with every database engine.
• Hard to query the content of the data in a simple way.
• Full-text search queries can be used, but the tags are no longer usable (data cannot be told apart from metadata).
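A small sketch of the last point, using SQLite and a plain LIKE match as a simplified stand-in for full-text search (an assumption; real engines offer dedicated full-text indexes). The table and sample documents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (id, body) VALUES (?, ?)",
    [
        (1, "<comida><nombre>Waffles</nombre><precio>2.00</precio></comida>"),
        (2, "<comida><nombre>precio especial</nombre><precio>5.00</precio></comida>"),
    ],
)

def text_search(conn, word):
    """Return ids of documents whose stored text contains the word anywhere."""
    cursor = conn.execute(
        "SELECT id FROM docs WHERE body LIKE ? ORDER BY id", (f"%{word}%",)
    )
    return [row[0] for row in cursor]

# "precio" is a tag name in every document and a data value only in document 2,
# yet the text search cannot tell the two cases apart.
print(text_search(conn, "precio"))
print(text_search(conn, "Waffles"))
```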
Using XML in databases
Storage: option #3
Store the XML in a field specialized for storing and/or indexing XML.
• Not all databases have a native way to store XML.
• The techniques used to store and/or index XML can vary from one engine to another. This creates a dependency on the database engine in use.
Using XML in databases
Querying: option #1
Use JDBC or ODBC together with SAX or DOM (and perhaps XSLT) to transform the results of SQL queries into XML.
• For example, the program could query the clients and then issue additional queries to fetch the projects associated with each of those clients.
• This procedure can be inefficient because of the number of queries required.
[Figure: Database -> SQL queries -> Application -> XML]
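The client/project scenario can be sketched with Python's sqlite3 and ElementTree modules standing in for JDBC/ODBC and DOM (an assumption made to keep the example self-contained). Table names, columns, and sample rows are hypothetical. The counter shows why the approach scales poorly: one query for the clients plus one per client.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE projects (id INTEGER PRIMARY KEY, client_id INTEGER, title TEXT);
INSERT INTO clients VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO projects VALUES (1, 1, 'Website'), (2, 1, 'Billing'), (3, 2, 'Audit');
""")

def clients_to_xml(conn):
    """Build <clients> XML from SQL results; also return the number of queries."""
    root = ET.Element("clients")
    queries = 1  # the query that lists the clients
    clients = conn.execute("SELECT id, name FROM clients ORDER BY id").fetchall()
    for client_id, name in clients:
        client_el = ET.SubElement(root, "client", name=name)
        queries += 1  # one additional query per client
        projects = conn.execute(
            "SELECT title FROM projects WHERE client_id = ? ORDER BY id",
            (client_id,),
        )
        for (title,) in projects:
            ET.SubElement(client_el, "project").text = title
    return root, queries

root, queries = clients_to_xml(conn)
xml_text = ET.tostring(root, encoding="unicode")
print(queries, xml_text)
```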
Using XML in databases
Querying: option #2
Use the XML extensions provided by the database engine in use.
• These extensions may be more or less simple and maintainable depending on the chosen engine, but all of them make the task simpler.
• This creates a dependency on the database engine in use.
Using XML in databases
Querying: option #3
Use SQL/XML (ANSI SQL 2003).
• A small set of functions has been added to the SQL standard to publish XML.
• It requires little learning for the SQL programmer.
• SQL/XML is supported by Oracle and IBM, but not by Microsoft (SQL/XML is different from SQLXML, a proprietary Microsoft technology, and the similar names have caused considerable confusion in the industry).
• SQL/XML can be used with traditional database APIs such as JDBC.
• It includes the definition of a native XML data type, implicit and explicit ways to generate XML from relational data, and an implicit way to map relational data to XML.
Using XML in databases
Querying: option #4
Use XQuery.
• XQuery is a native XML query language.
• Because it is a new language, it has a steeper learning curve for SQL programmers, but it feels more natural to XML programmers.
• Unlike SQL/XML, XQuery is optimized for processing XML, and it is particularly good for applications that must process XML alongside relational data.
• The major database engines support XQuery.
Querying and Transforming XML Data
• Translation of information from one XML schema to another
• Querying on XML data
• Above two are closely related, and handled by the same tools
• Standard XML querying/translation languages
  - XPath: Simple language consisting of path expressions
  - XSLT: Simple language designed for translation from XML to XML and XML to HTML
  - XQuery: An XML query language with a rich set of features
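As a taste of path expressions, the sketch below uses the limited XPath subset supported by Python's xml.etree.ElementTree (not a full XPath engine). The sample document is an assumption modeled on the university example used elsewhere in these slides.

```python
import xml.etree.ElementTree as ET

UNIVERSITY_XML = """
<university>
  <department>
    <dept_name>Comp. Sci.</dept_name><building>Taylor</building><budget>100000</budget>
  </department>
  <department>
    <dept_name>Biology</dept_name><building>Watson</building><budget>90000</budget>
  </department>
  <instructor>
    <IID>10101</IID><name>Srinivasan</name><dept_name>Comp. Sci.</dept_name><salary>65000</salary>
  </instructor>
</university>
"""

root = ET.fromstring(UNIVERSITY_XML)

# Child steps: names of all instructors directly under the root.
instructor_names = [e.text for e in root.findall("./instructor/name")]

# Predicate: building of the department whose dept_name child is 'Biology'.
biology_building = root.findtext("./department[dept_name='Biology']/building")

# Descendant step: every dept_name at any depth, in document order.
all_dept_names = [e.text for e in root.findall(".//dept_name")]

print(instructor_names, biology_building, all_dept_names)
```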
XML validation
Validating a document is like a contract: the creator verifies that the document was created properly, and the consumer verifies that it has the expected format.
Options for validating a document:
• Use a DTD (Document Type Definition)
• Use an XSD (XML Schema Definition)
DTD
• DTD - Document Type Definition.
• A set of markup declarations.
• Defines a document type for the SGML family of markup languages.
• Widely used
Document Type Definition (DTD)
• The type of an XML document can be specified using a DTD
• DTD constrains structure of XML data
  - What elements can occur
  - What attributes can/must an element have
  - What subelements can/must occur inside each element, and how many times.
• DTD does not constrain data types
  - All values represented as strings in XML
• DTD syntax
  - <!ELEMENT element (subelements-specification) >
  - <!ATTLIST element (attributes) >
Element Specification in DTD
• Subelements can be specified as
  - names of elements, or
  - #PCDATA (parsed character data), i.e., character strings
  - EMPTY (no subelements) or ANY (anything can be a subelement)
• Example
  <!ELEMENT department (dept_name, building, budget)>
  <!ELEMENT dept_name (#PCDATA)>
  <!ELEMENT budget (#PCDATA)>
• Subelement specification may have regular expressions
  <!ELEMENT university ( ( department | course | instructor | teaches )+)>
• Notation:
  “|” - alternatives
  “+” - 1 or more occurrences
  “*” - 0 or more occurrences
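Because content models are regular expressions over child element names, their effect can be imitated with Python's re module. This is a hand-rolled illustration, not DTD validation (the Python standard library does not validate against DTDs); the CONTENT_MODELS table and the function name are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models: each child name is followed by one space.
CONTENT_MODELS = {
    "university": r"(department |course |instructor |teaches )+",
    "department": r"dept_name building budget ",
}

def children_match(element):
    """Check the sequence of child tags against the element's content model."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return True  # no model recorded for this element in the sketch
    child_names = "".join(child.tag + " " for child in element)
    return re.fullmatch(model, child_names) is not None

ok = ET.fromstring("<department><dept_name/><building/><budget/></department>")
wrong_order = ET.fromstring("<department><building/><dept_name/><budget/></department>")
empty_university = ET.fromstring("<university/>")

print(children_match(ok), children_match(wrong_order), children_match(empty_university))
```

The comma in a DTD sequence fixes the order of subelements, and "+" requires at least one child, which is why the last two documents are rejected.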
University DTD
<!DOCTYPE university [
<!ELEMENT university (
(department|course|instructor|teaches)+)>
<!ELEMENT department ( dept_name, building, budget)>
<!ELEMENT course ( course_id, title, dept_name, credits)>
<!ELEMENT instructor (IID, name, dept_name, salary)>
<!ELEMENT teaches (IID, course_id)>
<!ELEMENT dept_name ( #PCDATA )>
<!ELEMENT building ( #PCDATA )>
<!ELEMENT budget ( #PCDATA )>
<!ELEMENT course_id ( #PCDATA )>
<!ELEMENT title ( #PCDATA )>
<!ELEMENT credits ( #PCDATA )>
<!ELEMENT IID ( #PCDATA )>
<!ELEMENT name ( #PCDATA )>
<!ELEMENT salary ( #PCDATA )>
]>
Attribute Specification in DTD
• Attribute specification: for each attribute
  - Name
  - Type of attribute
      CDATA
      ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs); more on this later
  - Whether
      mandatory (#REQUIRED)
      has a default value (value),
      or neither (#IMPLIED)
• Examples
  <!ATTLIST course course_id CDATA #REQUIRED>, or
  <!ATTLIST course
      course_id ID #REQUIRED
      dept_name IDREF #REQUIRED
      instructors IDREFS #IMPLIED >
IDs and IDREFs
• An element can have at most one attribute of type ID
• The ID attribute value of each element in an XML document must be distinct
  - Thus the ID attribute value is an object identifier
• An attribute of type IDREF must contain the ID value of an element in the same document
• An attribute of type IDREFS contains a set of (0 or more) ID values. Each ID value must contain the ID value of an element in the same document
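These rules can be checked by hand in a few lines. The sketch below is not a DTD validator; the attribute tables mirror the ATTLIST declarations of the university example, while the sample document, the short ID values, and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

# Which attribute plays which role for each element (taken from the ATTLISTs).
ID_ATTRS = {"department": "dept_name", "course": "course_id", "instructor": "IID"}
IDREF_ATTRS = {"course": ["dept_name"], "instructor": ["dept_name"]}
IDREFS_ATTRS = {"course": ["instructors"]}

DOC = ET.fromstring("""
<university>
  <department dept_name="CS"/>
  <course course_id="CS-101" dept_name="CS" instructors="10101 83821"/>
  <instructor IID="10101" dept_name="CS"/>
</university>
""")

def check_ids(root):
    """Return a list of problems: duplicate IDs and references to missing IDs."""
    errors = []
    ids = set()
    for el in root.iter():  # first pass: collect IDs, which must be distinct
        attr = ID_ATTRS.get(el.tag)
        if attr and el.get(attr) is not None:
            value = el.get(attr)
            if value in ids:
                errors.append(f"duplicate ID {value!r}")
            ids.add(value)
    for el in root.iter():  # second pass: every reference must hit an ID
        refs = [el.get(a) for a in IDREF_ATTRS.get(el.tag, []) if el.get(a) is not None]
        for a in IDREFS_ATTRS.get(el.tag, []):
            refs.extend((el.get(a) or "").split())  # IDREFS: space-separated
        for ref in refs:
            if ref not in ids:
                errors.append(f"dangling reference {ref!r} on <{el.tag}>")
    return errors

errors = check_ids(DOC)
print(errors)
```

In the sample, instructor 83821 is referenced but never defined, so the check reports one dangling reference.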
University DTD with Attributes

University DTD with ID and IDREF attribute types.
<!DOCTYPE university-3 [
<!ELEMENT university ( (department|course|instructor)+)>
<!ELEMENT department ( building, budget )>
<!ATTLIST department
dept_name ID #REQUIRED >
<!ELEMENT course (title, credits )>
<!ATTLIST course
course_id ID #REQUIRED
dept_name IDREF #REQUIRED
instructors IDREFS #IMPLIED >
<!ELEMENT instructor ( name, salary )>
<!ATTLIST instructor
IID ID #REQUIRED
dept_name IDREF #REQUIRED >
· · · declarations for title, credits, building,
budget, name and salary · · ·
]>
XML data with ID and IDREF attributes
<university-3>
<department dept_name="Comp. Sci.">
<building> Taylor </building>
<budget> 100000 </budget>
</department>
<department dept_name="Biology">
<building> Watson </building>
<budget> 90000 </budget>
</department>
<course course_id="CS-101" dept_name="Comp. Sci."
instructors="10101 83821">
<title> Intro. to Computer Science </title>
<credits> 4 </credits>
</course>
….
<instructor IID="10101" dept_name="Comp. Sci.">
<name> Srinivasan </name>
<salary> 65000 </salary>
</instructor>
….
</university-3>
Limitations of DTDs
• No typing of text elements and attributes
  - All values are strings, no integers, reals, etc.
• Difficult to specify unordered sets of subelements
  - Order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved)
  - (A | B)* allows specification of an unordered set, but cannot ensure that each of A and B occurs only once
• IDs and IDREFs are untyped
  - The instructors attribute of a course may contain a reference to another course, which is meaningless
  - instructors attribute should ideally be constrained to refer to instructor elements
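Two of these limitations can be illustrated with a short sketch. The regular expression stands in for the content model (A | B)*, and the reference check contrasts what a DTD can verify (the ID exists) with what one would like to verify (the ID belongs to an instructor). The sample document and helper names are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# (A | B)* written as a regex over space-terminated child names.
UNORDERED = re.compile(r"(A |B )*")

def accepted(child_tags):
    """True if the child sequence satisfies the content model (A | B)*."""
    return UNORDERED.fullmatch("".join(t + " " for t in child_tags)) is not None

DOC = ET.fromstring("""
<university>
  <course course_id="CS-101" instructors="CS-102"/>
  <course course_id="CS-102"/>
  <instructor IID="10101"/>
</university>
""")

def owners_of_ids(root):
    """Map each ID value to the tag of the element that owns it."""
    owners = {}
    for el in root.iter():
        for attr in ("course_id", "IID"):
            if el.get(attr) is not None:
                owners[el.get(attr)] = el.tag
    return owners

owners = owners_of_ids(DOC)
refs = DOC.find("course").get("instructors").split()
dtd_style_ok = all(r in owners for r in refs)                    # ID exists somewhere
typed_ok = all(owners.get(r) == "instructor" for r in refs)      # ID is an instructor

print(accepted(["A", "B"]), accepted(["B", "A"]), accepted(["A", "A", "B"]))
print(dtd_style_ok, typed_ok)
```

The model accepts both orders but also accepts A twice, and the untyped check passes even though the course refers to another course.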
XML Schema
• Newer, increasing use.
XML Schema
• XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. Supports
  - Typing of values
      E.g. integer, string, etc
      Also, constraints on min/max values
  - User-defined, complex types
  - Many more features, including uniqueness and foreign key constraints, inheritance
• XML Schema is itself specified in XML syntax, unlike DTDs
  - More-standard representation, but verbose
• XML Schema is integrated with namespaces
• BUT: XML Schema is significantly more complicated than DTDs.
XML Schema Version of Univ. DTD
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="university" type="UniversityType" />
<xs:element name="department">
<xs:complexType>
<xs:sequence>
<xs:element name="dept_name" type="xs:string"/>
<xs:element name="building" type="xs:string"/>
<xs:element name="budget" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:element>
….
<xs:element name="instructor">
<xs:complexType>
<xs:sequence>
<xs:element name="IID" type="xs:string"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="dept_name" type="xs:string"/>
<xs:element name="salary" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:element>
… Contd.
XML Schema Version of Univ. DTD (Cont.)
….
<xs:complexType name="UniversityType">
<xs:sequence>
<xs:element ref="department" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="course" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="instructor" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="teaches" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
• Choice of "xs:" was ours; any other namespace prefix could be chosen
• Element "university" has type "UniversityType", which is defined separately
  - xs:complexType is used later to create the named complex type "UniversityType"
More features of XML Schema
• Attributes specified by xs:attribute tag:
  <xs:attribute name = "dept_name"/>
  - adding the attribute use = "required" means value must be specified
• Key constraint: "department names form a key for department elements under the root university element":
  <xs:key name = "deptKey">
  <xs:selector xpath = "/university/department"/>
  <xs:field xpath = "dept_name"/>
  </xs:key>
• Foreign key constraint from course to department:
  <xs:keyref name = "courseDeptFKey" refer="deptKey">
  <xs:selector xpath = "/university/course"/>
  <xs:field xpath = "dept_name"/>
  </xs:keyref>
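What the key and keyref constraints demand can be sketched by hand. This is not XML Schema validation (the Python standard library has no XSD validator); the selector paths are rewritten relative to the root element, and the sample document and function names are assumptions.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""
<university>
  <department><dept_name>Comp. Sci.</dept_name></department>
  <department><dept_name>Biology</dept_name></department>
  <course><course_id>CS-101</course_id><dept_name>Comp. Sci.</dept_name></course>
  <course><course_id>PHY-101</course_id><dept_name>Physics</dept_name></course>
</university>
""")

def key_values(root, selector, field):
    """Field value of every element matched by the selector (None if absent)."""
    return [el.findtext(field) for el in root.findall(selector)]

def check_key(root, selector, field):
    """Key semantics: the field is present and unique among selected elements."""
    values = key_values(root, selector, field)
    return None not in values and len(values) == len(set(values))

def check_keyref(root, selector, field, referred_values):
    """Keyref semantics: return the field values that match no referred key."""
    return [v for v in key_values(root, selector, field) if v not in referred_values]

dept_key_ok = check_key(DOC, "./department", "dept_name")
dept_names = set(key_values(DOC, "./department", "dept_name"))
dangling = check_keyref(DOC, "./course", "dept_name", dept_names)

print(dept_key_ok, dangling)
```

The second course names a department that does not exist, so it would violate the foreign key constraint courseDeptFKey.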
Vendor Solutions
In addition to middleware offerings, the most popular databases are XML-enabled; that is, they have native support for converting relational data to XML and vice versa. In fact, all the major relational database vendors have proprietary extensions for using XML with their product, but each takes a completely different approach, and there is little interoperability among them. The "Big Three" vendors (IBM, Oracle, and Microsoft) offer full XML support, storage of the whole XML document, and some form of XQuery support.
IBM DB2
IBM provides a truly unified XML/relational database, supporting the XML data model from the client through the
database, "down to the disk and back again" through a first-class XML data type. By deeply implementing XML
into a database engine that previously was purely relational, IBM offers superior flexibility and performance
relative to other offerings.
IBM DB2 XML support
DB2 manages both conventional relational and XML data. As depicted in the Storage component of the figure,
relational and XML data are stored in different formats that match their respective models: relational as traditional
row-column structures; and XML as hierarchical node structures. Both types of storage are accessed via the DB2
engine which processes plain SQL, SQL/XML and XQuery in an integrated fashion.
SQL and XQuery are handled in a single modelling framework, avoiding the need to translate queries between
them, via so-called bilingual queries that give developers the flexibility to use the language that matches not just
application needs but also their skills. Applications can continue to use SQL to manipulate relational data or the
XML store. SQL/XML extensions enable publishing relational data in XML format based on data retrieved by
embedding XPath or XQuery into SQL statements. XML applications typically use XQuery to access the XML
store; yet XQuery queries can optionally contain SQL to combine and correlate XML with relational data.
Oracle XML DB
Oracle has been steadily evolving its support for XML since 1998, moving toward flexible, high-performance,
scalable XML storage and processing. With new version releases every few years, they have progressed from
loosely-coupled XML APIs, to XML storage and repository support, later adding XQuery then binary XML storage
and indexing.
Oracle XML DB features
XML DB implements the major W3C standards (e.g., XML, Namespace, XPath, XML Schema, XSLT). They claim
the first major implementation of XQuery as well as support for SQL/XML. This hybrid database provides SQL-centric access to XML content, and XML-centric access to relational content. Multiple XML storage options allow
tuning for optimal application performance. An XML DB repository is a nice addition for serving document-centric
needs.
Microsoft SQL Server
Microsoft's SQL Server architecture. This product features XML storage, indexing and query processing. The XML data type provides
a simple mechanism for storing XML data by inserting it into an untyped XML column. The XML data type preserves document order
and is useful for applications such as document management applications. Alternatively, XML Schemas may be used to define typed
XML; this helps the database engine to optimize storage and query processing in addition to providing data validation. The SQL
Server can also handle recursive XML Schemas as well as server-side XQuery.
Microsoft SQL server architecture
Microsoft still marches to its own drummer in some respects. Their SQLXML mapping technology is used to layer
an XML-centric programming model over relational data stored in tables at the server. (Note SQLXML is
completely different from SQL/XML; the similarity in names can cause quite a bit of confusion.) The mapping is
based on defining an XML schema as an XML view. This provides a bi-directional mapping of an XML Schema to
relational tables. This approach can be used for bulk loading XML data into tables and for querying the tables.
Document order is not preserved, however, so the mapping technology is useful for XML data processing as
opposed to XML document processing. Microsoft still advocates sticking with a relational model for structured
data with a known schema.
Microsoft SQL Server
Microsoft SQL Server, currently version 2005, is a popular and powerful database server. XML
support, including XQuery support and the addition of an XML column type, is one of the primary
areas of improvement in this version.
Retrieving XML
SQL Server's T-SQL dialect includes the FOR XML clause for SELECT queries. This clause, which must
be the last clause in the SELECT statement, causes the data returned from the query to be formatted
as XML. This feature was first added with SQL Server 2000, but it has been improved in SQL Server
2005. The actual format of the XML is configurable using one of the optional keywords listed in the
following table.
FOR XML formatting keywords and notes:
RAW: Each row in the query is returned as an XML element. Individual columns are returned as attributes of that element. There is no root node by default, although this can be added. By default, the element name is row. This can be changed by including the name as a parameter to RAW (FOR XML RAW('myrowname')).
AUTO: Each row is returned as an XML element named for the table providing the data. Individual columns returned are attributes of that element. There is no root node by default. If related columns are included, the resulting XML is nested.
EXPLICIT: The structure of the resulting XML must be defined. This provides the most flexibility in creating XML, but also requires the most work by the developer.
PATH: The structure of the resulting XML can be defined. This method, added with SQL Server 2005, is much easier to use than the EXPLICIT model. By default, it creates a structure similar to the AUTO output, but columns are output as elements, not attributes.
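The difference between attribute-style and element-style output can be pictured with a small Python emulation. This does not call SQL Server; it only imitates the shapes described above for RAW (columns as attributes) and PATH (columns as elements). The sample rows and function names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical query result: one dictionary per row.
ROWS = [{"Id": 1, "Title": "Welcome"}, {"Id": 2, "Title": "News"}]

def raw_shape(rows, row_name="row"):
    """RAW-like shape: one element per row, columns as attributes, no root."""
    parts = []
    for row in rows:
        el = ET.Element(row_name, {k: str(v) for k, v in row.items()})
        parts.append(ET.tostring(el, encoding="unicode"))
    return "".join(parts)

def path_shape(rows, row_name="row"):
    """PATH-like shape: one element per row, columns as child elements."""
    parts = []
    for row in rows:
        el = ET.Element(row_name)
        for column, value in row.items():
            ET.SubElement(el, column).text = str(value)
        parts.append(ET.tostring(el, encoding="unicode"))
    return "".join(parts)

raw = raw_shape(ROWS)
path = path_shape(ROWS)
print(raw)
print(path)
```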
Microsoft SQL Server
Storing XML
SQL Server 2005 adds support for the XML column type. You can create a table containing one
of these columns just as you can for any other data type (see Listing 11-7).
After the table is created, you can populate and query it just as you do any other table:
INSERT INTO dbo.Articles(Title, Body) VALUES('Welcome', '<div class="wrapper">Welcome to the system</div>')
SELECT Body FROM dbo.Articles
Simply dumping XML into an XML column, although it is useful, has few benefits over using a text
column. To improve the process, you can add an XML Schema to the column. Then, adding data to the
table triggers validation, ensuring the column contains data of the appropriate type. To do this
with SQL Server, you create a schema collection in the database. The CREATE XML SCHEMA COLLECTION
command creates the schema collection (see Listing 11-8). In addition to adding an entry in the
database for the schema, adding a schema collection to a database creates a number of new system
tables and views to track the schemas, as well as support validation.
Native XML Databases
Xindice
Apache Xindice is a database designed from the ground up to store XML data or what is
more commonly referred to as a native XML database. The name is pronounced zeen-dee-chay
in your best faux Italian accent. Don't worry if you get it wrong though, we won't
mind. We just care that you spell it correctly.
You might be wondering what a native XML database is good for? Well it pretty much has
one purpose, storing XML data. If you don't have any XML data, don't want any XML data
or think XML is the most over-hyped technology of the new millennium, then Xindice is not
for you. We're not out to change the way data in general is stored, only to provide a good
solution for storing XML data. If you survey your projects and see XML popping out of
every corner, then Xindice might be a real help for storing that XML.
The benefit of a native solution is that you don't have to worry about mapping your XML
to some other data structure. You just insert the data as XML and retrieve it as XML. You
also gain a lot of flexibility through the semi-structured nature of XML and the schema
independent model used by Xindice. This is especially valuable when you have very
complex XML structures that would be difficult or impossible to map to a more structured
database.
At the present time Xindice uses XPath for its query language and XML:DB XUpdate for its
update language. We provide an implementation of the XML:DB API for Java development
and it is possible to access Xindice from other languages using built in XML-RPC API. As
standards in the XML database area mature Xindice will include support for those that are
most important.
Xindice is the continuation of the project that used to be called the dbXML Core. The
dbXML source code was donated to the Apache Software Foundation in December of 2001.