The W3C standard of composing websites. SEO optimization
Informatyka – Bazy danych
Przemysław Godlewski
Nr albumu s7785

The W3C standard of composing websites. SEO optimization and positioning websites within the most common search engines. Examination of the PJWSTK website for positioning in the search engines.

Praca magisterska napisana pod kierunkiem prof. Lecha Banachowskiego

Warszawa, lipiec 2011

WPROWADZENIE

Celem niniejszej pracy magisterskiej jest przedstawienie najnowszych standardów projektowania witryn internetowych (W3C). Dogłębnie omówione zostaną zagadnienia optymalizacji SEO, jak również techniki i metody skutecznego pozycjonowania witryn www w najbardziej dostępnych wyszukiwarkach internetowych (Google, Yahoo!, MSN).

W pierwszym rozdziale pracy zestawione zostaną sposoby budowania witryn internetowych w oparciu o obowiązujące standardy (W3C, model DOM, XHTML Transitional, AJAX, CSS, typy dokumentów DOCTYPE). Swoją uwagę skoncentruję na różnicach pomiędzy najbardziej dostępnymi wyszukiwarkami, przestarzałą technologią HTML (i niekorzyściami wynikającymi z jej zastosowania) oraz obecnie używaną technologią XHTML. Mając na uwadze przyszłość powstających obecnie witryn, przedstawię korzyści płynące z projektowania stron www w oparciu o model W3C.

Drugi rozdział pracy poświęcony zostanie zagadnieniu optymalizacji SEO witryn internetowych, której głównym celem jest wysoka pozycja w rankingach wyszukiwania. Omówione zostaną SEO organiczne, etyczne oraz nieetyczne techniki i strategie SEO, spamowanie SEO, a także szacowanie PageRank stron internetowych przez roboty najpopularniejszych wyszukiwarek (Google, Yahoo!, MSN). Swoją uwagę zwrócę również na umiejętny dobór i gęstość słów kluczowych oraz prawidłowe wbudowanie ich w tekstową zawartość witryny oraz strukturę XHTML. Ważnym zagadnieniem będą również linki wewnętrzne i zewnętrzne strony internetowej. Opisane zostaną również zagadnienia pozycjonowania witryn internetowych w najpopularniejszych wyszukiwarkach (Google, Yahoo!, MSN), to jest: reguły dostępności stron www, pojęcie wyszukiwalności, dobór oraz znaczenie metatagów oraz słów kluczowych, struktura oraz sposób tworzenia kodu XHTML wspierającego pozycjonowanie organiczne, strategie pozycjonowania po stronie serwera, a także praktyczne wskazówki pozycjonowania.

W trzecim rozdziale zostanie wykonany audyt witryny internetowej http://www.pjwstk.edu.pl. Celem badania będzie zweryfikowanie kodu strony oraz jego struktury pod kątem technologii W3C oraz organicznego pozycjonowania SEO w najpopularniejszych wyszukiwarkach internetowych, zwłaszcza w Google.

INTRODUCTION

The purpose of my dissertation is to present the newest W3C standards of composing XHTML websites. I am going to focus deeply on website SEO optimization (Search Engine Optimization), as well as on SEO techniques and methods to position websites effectively within the most common search engines (Google, Yahoo!, and MSN).

The first chapter is devoted to the way of composing websites based on the most up-to-date W3C standards (including the DOM model, XHTML Transitional, AJAX, CSS, and DOCTYPE declarations). I will present web search engines and pay attention to the differences between the most popular ones. I will compare the outdated HTML technology (and its disadvantages) with the currently used XHTML technology. Regarding the future usability of current websites, I am going to emphasize the advantages of the W3C model.

In the second chapter I am going to cover the SEO optimization of websites, whose aim is to achieve high positions in search engine results lists.
I will focus on organic SEO, ethical and unethical SEO strategies and techniques, SEO spamming, and the estimation of website PageRank values by the crawlers of the most common search engines (Google, Yahoo!, and MSN). I will pay particular attention to the role of keywords and their density, as well as to incorporating them into the textual content of websites. I will also discuss the role of incoming and outgoing links in terms of SEO optimization. Another issue will be positioning a website within the most common search engines (Google, Yahoo!, MSN), that is: the rules and conditions of website availability, the composition of meta tags and their importance, the structure and the way of composing XHTML code that supports positioning, strategies of positioning on the server side, as well as some practical and general hints for positioning a website organically.

The last chapter concerns an examination of the current website of the Polish-Japanese Institute of Information Technology in terms of positioning the website organically within the most common search engines. I will verify the structure of the HTML website code against the W3C standards and SEO strategies. I am also going to propose some suggestions for improving the semantic correctness of the website code (if required).

TABLE OF CONTENTS

WPROWADZENIE
INTRODUCTION
SEARCHING ENGINES AND WEB BROWSERS. XHTML AND W3C STANDARDS OF DESIGNING WEBSITES
1. INTRODUCTION
2. HISTORY AND FOUNDATIONS OF THE HTML DOCUMENTS
3. WEB SEARCH ENGINES
3.1 Web crawlers
3.2 Search engines' algorithms
3.3 Loading data and creating rankings
3.4 Classification of searching engines
4. WEB BROWSERS
5. LIMITATIONS AND DISADVANTAGES OF NON-STANDARDIZED HTML
5.1 Backwards agreement
5.2 Ahead agreement
6. ADVANTAGES OF WEB STANDARDS
7. XHTML, CSS AND W3C DOM MODEL AS ELEMENTS OF WEB STANDARDS
7.1 Definition of XHTML
7.2 Semantics of the XHTML code
7.3 Advantages of XHTML
7.4 W3C. Model DOM
7.5 CSS – Cascading Style Sheets
8. COMPOSING XHTML WEBSITES
8.1 DOCTYPE and names declaration. Character encoding
8.2 Formatting tags and attributes in XHTML 1.0 Transitional. Conversion to XHTML
9. STANDARDS MODE AND QUIRKS MODE IN WEB BROWSERS
10. CONCLUSION
WEBSITES SEO OPTIMIZATION. ETHICAL AND UNETHICAL SEO STRATEGIES AND TECHNIQUES. PAGERANK VERSUS SEO SPAMMING.
1. INTRODUCTION
2. DEFINING SEO
2.1 SEO purposes
2.2 SEO optimization plan
2.3 What is organic SEO?
2.3.1 Organic elements of websites
2.3.2 Benefits of organic optimization
3. SEO STRATEGIES AND TECHNIQUES
3.1 Hosting, domains, navigation and other chosen elements of websites friendly to SEO
3.1.1 Hosting
3.1.2 Domain name
3.1.3 Navigation
3.1.4 Sitemap
3.1.5 TITLE tag
3.1.6 HTML headings
3.1.7 Javascript and Flash
3.1.8 URL links in SEO strategies. URL canonicalization. Dynamically generated URL addresses
3.2 Keywords and keywords prominence. Distinguishing keywords
3.2.1 Heuristic searching
3.2.2 Keywords used in website links
3.2.3 Appropriate keywords and keyword phrases
3.2.4 Keyword density versus overloading with keywords
3.3 Website incoming and outgoing links. Linkages
3.3.1 Incoming links
3.3.2 Outgoing links
4. SPAMMING SEO. SPAMMING TECHNIQUES
5. SEO OPTIMIZATION FOR THE MOST COMMON SEARCH ENGINES: GOOGLE, MSN, YAHOO!
5.1 Google PageRank and SEO optimization for Google
5.2 Websites optimization for MSN
5.3 Websites optimization for Yahoo!
5.4 Page rank fluctuations
6. CONCLUSION
EXAMINATION OF THE PJWSTK WEBSITE HTML CODE. SHIFTING TO XHTML 1.0 TRANSITIONAL STANDARDS.
1. INTRODUCTION
2. EXAMINATION OF THE PJWSTK HTML WEBSITE CODE
3. CONCLUSION
REFERENCES

Searching engines and web browsers. XHTML and W3C standards of designing websites

1. Introduction

Everyone seems to have used the Internet by now to browse for specific issues, as the Internet has become the most accessible source of information nowadays. Can we imagine living without the Internet now? Probably not. We are familiar with the most popular search engines (for example the Google search engine), as almost everyone uses one every day in a web browser. Our common knowledge about search engines very often amounts to knowing how to open Google and make it search for some keyword query. After searching for an issue, we are provided with paged lists of websites related to the topic.
There are so many websites on the Internet concerning the same or similar issues nowadays that a battle has arisen among website owners (known as website positioning) to get the highest positions in the results list returned to a user in a web browser. The higher the position a website gets (the best rank being within the first page), the more visits it consequently receives. The problem of reaching higher positions is becoming more and more significant. It has been estimated that "it's important to be in the top 3 pages of a search result because most people using search engines don't go past the 3rd page" (http://www.marketleap.com/verify/).

Nowadays there are many companies that deal exclusively with positioning websites within the most common search engines. There are also two major ways of positioning a website within these engines. First of all, a website positions itself by its XHTML structure (this is known as organic positioning and is explained in detail later on) and by the keywords spread around the site content. The technology, the structure, as well as the textual and graphical content of the website significantly affect search engine rankings and return a higher or lower position of the website in the results list. In case the searched keywords are very common, or too many websites include them, it is not enough (but still important and relevant) to create semantically correct XHTML code. The website has to be positioned artificially, and this is the other way to push it up in the results list. Artificial positioning is not strictly dependent on the website content, structure and technology it is based on. What we do know is that search engines are more and more sensitive about artificial positioning, and once it has been recognized by a search engine, such a website falls drastically within the results list (if it does not disappear from the list altogether for a longer time). I am not going to focus on artificial positioning in my dissertation.

In order to get the "know-how" of making a website appear higher and higher in the search engine results lists, we first have to understand how search engines work and how they process the searched information to create their rankings. This knowledge is fundamental to understanding how the XHTML website code should be composed and how it affects these engines. This is what the first chapter deals with. Initially, I am going to present search engines (and the phenomenon of crawlers), then I will go on to focus on the current standards and technology of composing websites – XHTML and the W3C model. Further on, I am also going to point out the advantages of shifting from the old HTML technology to the currently used XHTML one.

2. History and foundations of the HTML documents

The HTML abbreviation stands for the Hypertext Mark-up Language, a mark-up language with the help of which we can create hypertext documents (http://www.w3.org/MarkUp/html-spec/). These documents are not dependent on the platforms they are displayed on and have been in use by the World Wide Web (WWW) since 1990. The HTML mark-up derives from SGML documents and was originally initiated by a physicist, Tim Berners-Lee (Zeldman 2007), who devised ENQUIRE – a prototype of a hypertext information system. In 1980 this system was used to make some research documents available to others (http://pl.wikipedia.org/wiki/HTML).
The revolutionary idea hidden behind the language was that a user could use references to browse information that was physically stored in remote places. The first specification of HTML, called "HTML Tags", was published by Tim Berners-Lee in 1991 and included 22 basic tags that were the basis for building HTML documents. Thirteen of these tags are still in use by website programmers.

In 1993 the IETF (Internet Engineering Task Force) published the first proposed specification of an HTML document (written by Berners-Lee and Dan Connolly), the "Hypertext Markup Language" Internet-Draft. It included a DTD (Document Type Definition) – a grammar description for HTML documents. Once the attributes of existing HTML tags could be compared against the DTD, the process of standardizing the HTML language commenced. In 1994 the IETF founded an HTML Working Group, and one year later HTML 2.0 was published officially (as RFC 1866). That was regarded as the first complete and standardized specification of an HTML document, and it became the basis for future HTML implementations. An HTML 1.0 specification never existed; the 2.0 label was chosen to differentiate the new specification from the earlier draft attempts. Since 1996 the HTML specification has been developed and influenced by the World Wide Web Consortium (W3C), and in 2000 HTML became an international standard for creating websites (ISO/IEC 15445:2000). The last specification of HTML dates from 1999 and is known as HTML 4.01. After that, HTML slowly shifted towards XHTML (Extensible Hypertext Markup Language) – the most up-to-date standard of designing websites now.

3. Web search engines

A search engine (also abbreviated to "engine" from now on) is a program that uses applications to collect information about web sites (Ledford 2009). This information consists of keywords, phrases, URL inner links, the website code and other important pieces of information that specify more closely the contents of the website being browsed. After gathering all the required information, the engine indexes the data and stores it in its databases. At the front end of a web search engine one can see a search tool into which it is possible to input keywords or phrases to be searched for. A search engine uses special algorithms to explore the data stored in the database (each engine maintains its own large database) and tries to return results that are the most relevant to the given keyword query. Thus, a user is provided with a list of paged sites and links retrieved from the database of the engine.

3.1 Web crawlers

To define a web crawler we can say that it "is a computer program that browses the World Wide Web in a methodical, automated manner or in an orderly fashion. Other terms for Web crawlers are ants, automatic indexers, bots, Web spiders, Web robots" (http://en.wikipedia.org/wiki/Web_crawler). Web crawlers are responsible for the process of gathering and retrieving information from websites. They are like visitors that visit each website and get detailed information about it. Ledford (2009) indicates that there are over 100 million websites nowadays and that around 1.5 million new websites appear each month. Given a phrase to be searched for, a search engine goes through all the references gathered by its web spiders.
As a matter of fact, such an engine consists of a few parts working together. More detailed specification about the search engine structure is not available. It is associated with its interface, which is just one part of it, as the rest is hidden behind the interface. The structure and methods to store the collected data inside the database are confidential and strictly determine each engine and its owner. 3.2 Search engines’ algorithms According to Ledford (2009) searching algorithms are the most important parts out of all parts that a search engine consists of. This is the basement of each engine on which the rest parts are built. These algorithms construct ways of presenting materials to users. Ledford also specifies that a searching algorithm is a procedure to solve problems that classifies the obstacle, estimates a number of possible answers and returns the solution to the problem. Initially, the algorithm accepts the keyword, browses its database with catalogued keywords and their URL addresses. After that it gathers websites that contain the searched phrase and these results are returned to a user. Retrieved results strictly depend on the algorithm that has been used and there are many kinds of algorithms. Each engine seems to cooperate with a different algorithm, which results in the fact that the same keyword may return different list of websites in different engines. The most common kinds of algorithms that Ledford (2009) proposes break down as follows: 9 1. Linear searching that is based on separate keywords. Data is searched linearly, as if it were organized in a list. Not only does it retrieve a separate element from an engine database, but also going through milliards of websites may take too much time, though. 2. Tree searching that tries finding sets of information from the narrowest set to the widest one. These sets of information remind of trees; separate information may ramify or spread out into other data, similarly to websites that include inner links to other pieces of information. It is also based on a hierarchical structure of storing data. This hierarchy means that the process of searching goes from one point to another depending on some data rankings stored in the database. This algorithm seems to be very useful in the Internet. 3. SQL (Structured Query Searching) searching allows retrieving information independently on subclass it belongs to, as the information is not organized hierarchically. 4. Searching based on solid information algorithm finds sets of information that is organized in a tree structure. Contrary to its name, not always does it return the best choice from database as the algorithm believes in general nature of answers. It is helpful to find data in some specific sets of information. 5. Opposing searching retrieves all possible solutions to the problem and seems to be useless as the number of returned solutions may be infinite. 6. Searching based on restricted satisfaction is probably the best algorithm to be used when looking for a specific keyword or phrase in the Internet. The solution has to fit some criteria and sets of database information may be analyzed in different nonlinear ways. These are only the most popular kinds of algorithms and they may be used simultaneously. The clue to maximize searching results is to understand how search engines work and what their requirements are. 
3.3 Loading data and creating rankings It is clear that he process of searching data by an engine consists of the activity of the web ants, its database and the built-in algorithm (Ledford 2009.) independently on search 10 engines, their main goal is to find a website in the Internet (Danowski 2007.) The clue is to specify the rankings of the searched results, as these rankings allow displaying a website in the back-end browser. First of all, SEO optimization (will be elaborated on further) means to guess how a specific searching engine creates its data rankings. Rankings play significant role in SEO optimization, but in this chapter I would like just to emphasize different types of criteria that browsers use to collect information. Ledford (2009) enlists the following types of criteria: 1. Localization in terms of how keywords and phrases are spread out in the HTML document. Some browsers check if the searched keyword appears at the beginning of the HTML code or below it, which influences the ranking of the site. The higher the phrase is placed in the code, the higher ranking the site receives. The best option is to insert the keyword in the TITLE meta tag of the site, but this is going to be described further on. 2. Frequency in terms of how often the keyword is repeated in the HTML code. The more often the keyword appears in the HTML text, the better ranking it gets, but nowadays engines can sufficiently recognize the phenomenon of spamming keywords. This occurrence means that hidden keywords are artificially repeated too many times in the HTML code, which results in lowering the ranking of the site. Moreover, some search engines do ignore or do repel such sites that may never appear in the results list in the future. 3. Links. The ranking of a website strictly depends on the type and number of inner links within a website. Both links that redirected a user to the site and inner site links are significant. On the other hand, it does not work in the way that the more links a site includes, the higher ranking it receives. We just know the fact that incoming links, inner website links and links to leave the site influence the ranking, but the algorithm to estimate the ranking number may be different and is hidden behind each engine. 4. Number of clicks in terms of what the number of clicking a proper web site looks like in opposition to the number of clicking other sites within rankings. Ledford does emphasize that some searching engines cannot monitor the number of 11 visiting each site, and that is why some of them simply store the number of clicking the site within the displayed results, though. 3.4 Classification of searching engines Ledford (2009) groups search engines into basic (first-rate) ones, second-rate ones and brand (topical) ones. The basic ones (around which I am going to speculate the most) are Google, Yahoo! and MSN. This first group generates the most visits to websites, so they should be taken into consideration first when planning SEO optimization. Different results retrieved for a user come from different algorithms behind each engine. The Google search engine seems to be the king of all engines, partly because it provides a user with precise searching results. It has been the accuracy of the results that made the engine so popular. To make it come true, Google founders combined searching keywords with popularity of links. Joining these two criteria together has resulted in producing more precise results. Yahoo! 
is also regarded as a web engine, which is true, but it is also an Internet catalogue. It indexes different websites and organizes them within categories and subcategories. Originally, Yahoo! was simply a combined list of the most popular websites gathered by its two founders. The MSN search engine does not offer searching capabilities as sophisticated as the two engines above. Ledford (2009) implies that MSN is not able to analyze the number of website clicks, but on the other hand it does take into consideration the contents of a website. The way to appear higher in the MSN results list is to prepare adequate meta tag data and to spread convenient keywords within the HTML text.

The second-rate engines are not as popular as the ones above, and they are addressed to a less numerous audience. They do not generate as much traffic to a given website, but they are useful in regional and more restricted searching. These engines also differ in terms of how they compose rankings and analyze information. Some of them rely on keywords, others rely on mutual links, yet others rely on meta tags and some hidden criteria known only to the engine founders. The topical search engines are the most specialized ones and are very often devoted to one field (e.g. medicine, sport) around which the search results revolve.

4. Web browsers

We need a web browser to submit a specific keyword query and to display the paged results list returned to the user. "A web browser or Internet browser is a software application for retrieving, presenting, and traversing information resources on the World Wide Web. An information resource is identified by a Uniform Resource Identifier (URI) and may be a web page, image, video, or other piece of content. Hyperlinks present in resources enable users to easily navigate their browsers to related resources" (http://en.wikipedia.org/wiki/Web_browser). Different types of Internet browsers are used to display a web page (http://en.wikipedia.org/wiki/Web_browser). Research conducted in January 2011 indicates that Internet Explorer (43.55%) was the leading web browser at the time, followed by Mozilla Firefox (29.0%) and Google Chrome (13.89%).

5. Limitations and disadvantages of non-standardized HTML

Website owners want their sites to cost less, work better and be available to a larger number of users, not only in today's web browsers but also in future ones (Ledford 2009). Websites are liable to age (in contrast to search engines, which are constantly developed and become better and better), which results in the fact that websites created in the past are very often displayed incorrectly nowadays. We build websites only to re-build them in the future. Instead of adding new features to an existing website, a programmer again seems to concentrate on adjusting it to current web requirements. Different users use different web browsers (Firefox, Internet Explorer, Opera, and Safari) to download website content. The difference is significant when the website code is not standardized (is acceptable only to some web browsers), and very often such an old website is simply illegible to visitors. If the site and its content are not fully available to a user, it means that the site owner loses another client.
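The contrast can be sketched with a small, hypothetical fragment (the tag choices and the class name are invented for the example): the first piece relies on legacy, browser-specific presentational markup, while the second expresses the same content with standardized structural markup and leaves the presentation to CSS.

Sample
<!-- Legacy, non-standardized markup: presentational tags and unclosed paragraphs -->
<font face="Arial" color="red"><marquee>Latest news</marquee></font>
<p>First paragraph
<p>Second paragraph

<!-- Standardized equivalent: structure in XHTML, presentation handled by a style sheet -->
<h2 class="news-heading">Latest news</h2>
<p>First paragraph</p>
<p>Second paragraph</p>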
5.1 Backwards agreement According to Ledford (2009) the backwards agreement means using non-standardized or even restricted and non-practical HTML tags or sets of code, so that each user (independently on what browser and its version they use) could “experience the same” when browsing a website. Actually, it is just a partial solution, as the site gathers more and more pieces of programming code to adjust the site to different web circumstances, which consequently makes the site code more and more illegible, as well. This effect is known as “code division.” This is the shortest way to re-write the website code again in the future, for adding new functionalities again has to take into consideration all available web browsers and their numerous versions. Following this way, it is never possible to fulfill all browsers versions, and there is a need to find a cut-off point - behind which the way a website behaves is ignored. A chosen older version of a web browser (e.g. IE6 or Netscape 1.1) can be treated as such a point. Companies try to find different solutions to the hindrance. Some of them decide to limit their budget income by either considering older and older websites versions (which requires to pay for additional hours spent over adjusting the website code) or by sticking to a chosen browser (thus turning down all other visitors who do not use the chosen browser.) For instance, because of some fake establishments, there can be more and more sites working correctly only in the Internet Explorer (sometimes even only for Windows.) Ledford speculates that it is around 15-25% potential users or clients that are lost by a company. There is also no point in sticking to the idea of designing a website only for one browser requirements. It can never be assured that the chosen web browser will be still a leading one among all other ones. 5.2 Ahead agreement The aim is to break the process of websites aging, so that they behave consequently in various web circumstances. The ahead agreement means that any existing website (that was correctly designed and built) should cooperate with each browser, platforms and other Internet devices (no need to re-design the site again or write additional code to 14 adjust the site.) He stresses the fact that such a site should still work, even if new browsers appear or if existing ones will be provided with new functionalities. It is only possible by using some web standards, which allow designing websites for all browsers and their versions with such easiness and comfort, as if the site were designed only for a chosen one. 6. Advantages of web standards “A Web standard is a general term for the formal standards and other technical specifications that define and describe aspects of the World Wide Web. In recent years, the term has been more frequently associated with the trend of endorsing a set of standardized best practices for building web sites, and a philosophy of web design and development that includes those methods” (http://en.wikipedia.org/wiki/Web_standards.) Web standards (abbreviated also to standards) play important role to each medium nowadays (Ledford 2009) and have been accepted in the web international market. As website programming has already got standardized, there is no way but to learn and use it in practice. To a company a standardized website code means saving money and time, as well as making the site contents available to all visitors independently on tools they use to download the website. 
When using standards it is possible to make web materials more portable, which is always advantageous, especially for website visitors. It also reduces the cost of labor. Standards are not just sets of rules; they remain consistent with the previous ways of creating websites. Ledford says they are like the "prolongation and continuation of previous techniques." Web standards have already been implemented in the newest web browsers and Internet devices, and they will continue to work in the future along with the development of browsers and further standards. Building websites supported by standards lowers the costs of production and maintenance and makes the site available to a larger audience. Ledford emphasizes that standards "take in everybody and we are able to serve these users that still use older browsers" (backwards agreement).

7. XHTML, CSS and W3C DOM model as elements of web standards

Ledford (2009) accepts the general division of web standards into three major elements. The first element is structure (XHTML, XML), the second one is presentation (CSS1, CSS2) and the last one is behavior (ECMAScript, the DOM model). The XHTML language (explained in detail further on) includes text data formatted in accordance with its structural and semantic meaning: the code includes titles, subtitles, tables, lists etc. XHTML code that includes only allowed tags is fully portable. The CSS presentation languages (Cascading Style Sheets) format the web page and control the way a website is shown on the screen. They deal with typography, text organization and size, colors etc. Because presentation is separated from the HTML structure, by changing a CSS file a programmer can modify the way a website is presented to a visitor without touching the HTML code. The standardized object model (the W3C DOM, focused on below) remains consistent with CSS, XHTML and ECMAScript (ECMA-262, the standardized version of the JavaScript language). It allows creating advanced functions and special effects working consistently in all Internet browsers and platforms.

7.1 Definition of XHTML

"XHTML (eXtensible HyperText Markup Language) is a family of XML markup languages that mirror or extend versions of the widely-used Hypertext Markup Language (HTML), the language in which web pages are written. While HTML (prior to HTML5) was defined as an application of Standard Generalized Markup Language (SGML), a very flexible markup language framework, XHTML is an application of XML, a more restrictive subset of SGML. Because XHTML documents need to be well-formed, they can be parsed using standard XML parsers" (http://en.wikipedia.org/wiki/XHTML). In 2000 the W3C published XHTML 1.0 as a Recommendation, and XHTML 1.1 followed a year later. Actually, XHTML is not regarded as a successor of HTML; it can be seen as HTML in the XML format (http://pl.wikipedia.org/wiki/XHTML). This means that XHTML remains consistent with XML requirements. It is worth mentioning that Mozilla Firefox and Opera have already fully adjusted to the newest standards, while Internet Explorer has not yet.

7.2 Semantics of the XHTML code

The structure of the XHTML code is correct if it does not include any errors (all tags are closed and only allowed tags and attributes are used, e.g. the attribute "height" is not allowed for a table in XHTML). Such correctness can be simply checked with the help of free on-line tools, for instance http://validator.w3.org.
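To make the difference concrete, here is a small, hypothetical fragment (the class name is invented for the example): the first table would fail validation because of an unclosed cell and the disallowed "height" attribute, while the second is well-formed and leaves the sizing to CSS.

Sample
<!-- Not well-formed XHTML: an unclosed cell and a disallowed "height" attribute -->
<table height="200">
  <tr><td>First item
  <td>Second item</td></tr>
</table>

<!-- Well-formed XHTML: every tag is closed, sizing is left to a style sheet -->
<table class="items">
  <tr><td>First item</td><td>Second item</td></tr>
</table>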
The XHTML code is semantically correct if all tags are used in co-ordination with their meaning. For example, using the H1 tag to indicate an important title is semantically correct, but using this tag just to display some text with a larger font is semantically incorrect. This all means that a webpage can be correct structurally, but incorrect semantically. Semantic correctness of a webpage is very important for SEO optimization, as web crawlers rely on the semantics of the XHTML code to search for keywords and for indexation purposes. Each site is required to be correct both structurally and semantically nowadays. 7.3 Advantages of XHTML Zeldman (2007) enumerates 10 advantages of using XHTML code that break down as follows: 1. “XHTML is the current tags standard that substitutes HTML4” (trans. P.G.) 2. It is designed to cooperate with other script languages and applications that are based on XML, which is not available in case of HTML. 3. It is more coherent than HTML. 4. XHTML 1.0 is the basement for other XHTML versions in the future. It will always be easier to shift from XHTML to newer and newer versions of XHTML in the future rather than from older and older HTML code. 5. Older versions of browsers deal equally with HTML and XHTML code. 6. Newer versions of browsers prefer XHTML to HTML, as XHTML can be easily envisaged comparing to HTML. Zeldman emphasizes the fact that IE and Firefox display CSS formatting more precisely if XHTML DOCTYPE is declared. 7. XHTML functions well in wireless devices, which means that it is possible to get through to more users without additional software. 17 8. XHTML is the part of the W3C standard. 9. XHTML is the language of structure, which means that it is displayed always in the same way independently on web browsers (provided that CSS documents include text formatting.) 10. XHTML can be validated with the help of free online tools. It saves time to check XHTML code errors and cohesion on one’s own. It is easy to forget about some tag attributes, e.g. the “title” attributes in links or the “alt” attributes within image definitions, which can be simply found by the validation tools. 7.4 W3C. Model DOM “The World Wide Web Consortium (W3C) is an international community where Member organizations, a full-time staff, and the public work together to develop Web standards”, whose mission is to “lead the Web to its full potential” (http://www.w3.org/Consortium.) It was established in 1994 (Ledford 2009), it evaluates web specification and names some rules to make different web technologies work together. It is comprised of around 500 organizations and has introduced the following specifications: XHTML, CSS and the standardized document object model – DOM.) Firstly, the specifications were regarded as recommendations, but in 1998 a project of web standards commenced and the term “recommendations” got substituted for “standards.” Also ECMA (European Computer Manufacturers Association) is an organization that deals with web standardization. It is in charge of the ECMAScript, commonly known among programmers as JavaScript. What is DOM? The W3C website explains that “the Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page” (http://www.w3.org/DOM/#what.) 
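As a minimal, hypothetical sketch of that definition (the element id and the texts are invented for the example), the following fragment uses the standard DOM interface to update one element of a page without reloading the rest of it:

Sample
<div id="status">Waiting...</div>
<script type="text/javascript">
// Access the document through the W3C DOM and change only this element's text;
// the rest of the page is not reloaded.
var statusBox = document.getElementById("status");
statusBox.firstChild.nodeValue = "Done";
</script>

More elaborate behaviors, such as the table sorting described below, are built on the same interface.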
In other words, the DOM model imitates traditional programs running in a web browser (Zeldman 2007) and allows websites to behave as if they were traditional computer applications. For example, a user can sort a table list by clicking on the list header (imitating a traditional Excel document) and the rest of the site remains untouched. Initially, model DOM was designed 18 to imitate traditional applications only at the client’s side, as performed actions were to be conducted successfully even without Internet connection. That was the beginnings of the AJAX technology that allows now reloading chosen website areas without refreshing the whole content. The DOM model proves that it is possible to built interactive standardized websites. Zeldman (2007) enlists the following web browsers that operate the W3C DOM model: Netscape6+, Mozilla 0.9+, IE5+, and Opera 7+. Pocket devices, mobile phones still do not co-operate with the DOM model. 7.5 CSS – Cascading Style Sheets The leading purpose of my dissertation is to focus on website organic SEO optimization, that is why I do not want to explain here how to build CSS declarations for XHTML documents. I just want to present some advantages of using CSS styles, evoking the fact that CSS formatting has already become a part of W3C standards. What does CSS stand for? “Cascading Style Sheets (CSS) is a simple mechanism for adding style (e.g., fonts, colors, spacing) to Web documents” (http://www.w3.org/Style/CSS/.) Using CSS files allows to separate website formatting and presentation from its structure and content. This separation brings about many advantages: 1. A website can be downloaded faster to users’ browsers. It means that servers are not overloaded; they are less encumbered, for there is no need to format each subpage separately (as CSS styles are usually prepared globally.) 2. It saves time for designing, programming, updating and maintaining a website, as changes in one CSS document can influence all the subpages of the website. Global changes to the website can be incorporated within several minutes. 3. People responsible for textual content do not have to worry about the layout and there is almost no probability that they will spoil the layout, as the layout formatting is separated from the website content. 4. Programmers and designers do not have to worry that changes made by the website owner will spoil the way it is displayed (Zeldman 2007.) 5. A website becomes more and more portable when sticking to W3C standards. This brings about some growth in website availability to users. 19 Attaching a CSS style to an XHTML document should be carried out according to a method called “the most possible scenario” (Zeldman 2007.) It means that, first of all, a programmer should prepare first CSS styles document for trustworthy browsers that can operate Standards Mode. If the website is displayed correctly in these browsers, a programmer can prepare a separate CSS file (using “@import” directive) to adjust the website in older browsers, too. It can be conducted as follows: Sample <link rel="stylesheet" type="text/css" href="css/styles.css" /> <!--[if IE]> <style type="text/css">@import url(css/ie.css);</style> <![endif]--> Following the method above allows avoiding the phenomenon known as “searching for the least common solution” that works correctly in all browsers (both older and newer ones.) 8. Composing XHTML websites There are different specifications of XHTML (e.g. XHTML 1.1 Strict), but the most common is XHTML 1.0 Transitional. 
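A small, hypothetical fragment (the class name is invented) illustrates the practical difference between the document types: the presentational attributes in the first table are still accepted by the Transitional DTD but are rejected by the Strict one, where the same effect has to be moved to a style sheet.

Sample
<!-- Accepted by the XHTML 1.0 Transitional DTD, rejected by the Strict one -->
<table bgcolor="#eeeeee" align="center">
  <tr><td>Timetable</td></tr>
</table>

<!-- Strict-friendly equivalent: presentation moved to CSS -->
<table class="timetable">
  <tr><td>Timetable</td></tr>
</table>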
It is more difficult to convert old HTML code into XHTML 1.1 Strict than into XHTML 1.0 Transitional. The XHTML 2.0 specification is still under development. I am going to focus deeply on using the XHTML 1.0 Transitional specification as the basis for building the most up-to-date websites.

8.1 DOCTYPE and names declaration. Character encoding

All XHTML documents start with the DOCTYPE declaration (presented below together with the namespace declaration), which informs web browsers how to interpret and verify the whole code. The declaration is inserted at the beginning of XHTML documents and includes information about the XHTML version. Each DOCTYPE specifies a different set of rules that the whole website code follows. Zeldman (2007) mentions that the tag code and CSS styles will not be verified correctly if there is no DOCTYPE declaration. In addition, the DOCTYPE defines the way the website is presented to a user. XHTML 1.0 proposes three types of documents declared by DOCTYPE:

1. Transitional – the most common and advised type when converting HTML into XHTML. This type is the closest to the old HTML structure and allows using outdated tag attributes (e.g. it still allows the "bgcolor" attribute for table rows, which should rather be defined in the CSS files). So it can be treated as a kind of transition towards the new web standards.

2. Strict – more restrictive than the Transitional one. It allows defining structure and does not accept any visual formatting.

3. Frameset – if a document uses "<frameset>" elements, it should be based on the Frameset DOCTYPE.

The XHTML namespace declaration follows the DOCTYPE declaration and extends the "<html>" element. It is the collection of element types and attribute names that are strictly associated with the document type (Zeldman 2007). A correct DOCTYPE and namespace declaration can be composed as follows:

Sample
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

The last two attributes of the namespace declaration indicate the language of the document content – for the XML parser (xml:lang) and for HTML (lang).

In order to make a website be interpreted correctly by a web browser, as well as to prepare a website for verification tests, it is required to set its character encoding. It can be ISO-8859-1 (also known as Latin-1), ISO-8859-2 (for Polish characters) or Unicode (UTF-8). The default character encoding for XML and XHTML documents is Unicode, which assigns a unique number to each character independently of the language used within the document. Of course, programmers can use whatever encoding they prefer (e.g. American and Western European websites often use the Latin-1 encoding). ISO 8859 is a standardized, multi-language set of graphic characters encoded with 8 bits. A sample character encoding declaration can be composed as follows:

Sample
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />

8.2 Formatting tags and attributes in XHTML 1.0 Transitional. Conversion to XHTML

There are some rules a programmer should abide by when composing XHTML websites. First of all, all tags should be lowercase, as XHTML (which derives from XML) distinguishes between capitalized and non-capitalized characters. Zeldman underlines the fact that "all elements and names of attributes have to be lower case, otherwise the document will not be verified well" (trans. P.G.).
For example, the "<TITLE></TITLE>" tag should be replaced with its "<title></title>" equivalent. Consequently, the attribute "onMouseOver" has to be turned into "onmouseover". This restriction concerns only element names and attribute names; the remaining content of the website can include capital letters.

Secondly, all values of tag attributes have to be placed between quotation marks (e.g. width="100", not width=100), which was not required in the old HTML structure. All attributes have to have values. The old HTML structure allowed minimized, value-less attributes, which in XHTML is regarded as an error. For such attributes the value is the same as the attribute name. The chart below presents the difference:

HTML:  <th nowrap>
XHTML: <th nowrap="nowrap">

HTML:  <hr noshade>
XHTML: <hr noshade="noshade">

HTML:  <input type="radio" name="gender" value="f" checked>
XHTML: <input type="radio" name="gender" value="f" checked="checked">

Another rule is that all tags have to be closed, even empty ones. When programming an HTML document, it was possible to leave the "<td>" tag without closing it. The XHTML 1.0 specification requires all tags to be closed: the "<td>" has to be ended with the "</td>" tag. Empty tags, such as "<br>", "<img>" and "<input>", have to be closed with a slash at the end, like "<br />", "<img />" and "<input />".

Zeldman (2007) also mentions that double dashes should be used only at the beginning and the end of comments. The chart below presents examples of both incorrect and correct XHTML comments:

Incorrect XHTML comments:
<!--incorrect -- remark here and below -->
<!------------========------------>

Correct XHTML comments:
<!-- correct remark here and below -->
<!--==========================-->

9. Standards Mode and Quirks Mode in web browsers

When displaying a web site, a web browser has to be informed whether it should work in Standards Mode (the website's XHTML and CSS declarations are compatible with the W3C standards, so the whole XHTML is treated and displayed more restrictively) or in Quirks Mode (the website includes old HTML code with many mutually exclusive adjustments for different browsers). To make the issue more complicated, Zeldman mentions that the Standards Mode of Gecko web browsers (Mozilla Firefox, Netscape) is quite different from that of Internet Explorer. In order to bridge the difference between browser engines, Gecko and Netscape engineers in time proposed a third working mode (called "Almost Standards Mode") – a mode working similarly to the Standards Mode of IE browsers. It is the DOCTYPE declaration that tells a web browser which mode it should switch to when displaying a web page. Moreover, the presence or lack of some information within this declaration results in turning on different modes for Gecko and IE browsers. Depending on the Standards or Quirks mode, the same web page can be displayed in completely different ways for a user. Zeldman (2007) describes the mechanism as follows:

1. If the XHTML declaration includes a full URL address, it turns on Standards Mode for Gecko and IE browsers, as well as for Safari and Opera 7+. Moreover, some HTML 4 DOCTYPE declarations also switch on Standards Mode.

2. If the DOCTYPE declaration is old, or if it does not include a full URL address, or if there is no DOCTYPE declaration at all, it turns on Quirks Mode for Gecko and Mozilla browsers, which means that the XHTML or HTML code is probably not standardized and should be treated and displayed less restrictively.
This non-standardized mode switches on the so-called backwards agreement and tries to display CSS styles in the same way they would have been displayed in the old IE4/5.

The examples below show a full and a partial (relative) DOCTYPE declaration for Gecko browsers. The first one switches on the Standards Mode, whereas the other switches on the Quirks Mode:

Sample
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"/DTD/xhtml1-transitional.dtd">

As a matter of fact, both declarations above switch on the Standards Mode for IE browsers, as the IE engine looks for any DOCTYPE declaration regardless of whether the URL address is full or relative. Zeldman (2007) lists a set of XHTML DOCTYPE declarations that switch on either Standards Mode or "Almost Standards Mode". They are as follows:

1. XHTML 1.0 Strict – runs full Standards Mode for all browsers that use the DOCTYPE declaration to invoke the mode. It does not work in Opera browsers older than 7.0 and in IE for Windows older than 6.0.

Sample
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

2. XHTML 1.0 Transitional – runs full Standards Mode for Gecko browsers and IE browsers (IE6+ for Windows and IE5+ for Macintosh). It has no effect in Opera browsers older than 7.0 and in IE for Windows older than 6.0.

Sample
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

3. XHTML 1.0 Frameset – runs full Standards Mode for Gecko browsers and for IE browsers (IE6+ for Windows and IE5+ for Macintosh). It runs "Almost Standards Mode" for Netscape 7+. It does not affect Opera browsers older than 7.0 and IE browsers older than 6.0.

Sample
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

4. XHTML 1.1 – runs full Standards Mode for all browsers that use the DOCTYPE declaration to switch on the mode. It does not affect Opera browsers older than 7.0 and IE browsers for Windows older than 6.0.

Sample
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1.dtd">

10. Conclusion

There are three major search engines: Google, Yahoo!, and MSN. Each search engine sends its crawlers to browse the Internet and collect information. The searched information gets stored in search engine databases, from which it is later retrieved for users. Each engine operates different algorithms to create rankings of the stored information, which allows filtering the data once a keyword query is entered. On the one hand, it is not possible to get the "know-how" of how engines create rankings and place sites within their paged results lists (as this information is restricted). On the other hand, some general understanding of how they work helps a lot. Moreover, some experience in creating website code can be gained by observing how changes in the code affect the result rankings. This knowledge is the basis for understanding how to build XHTML website code that is semantically correct. XHTML is the current standard for creating websites, as it is more coherent.
There are different specifications of XHTML (e.g. XHTML 1.1 Strict and XHTML 2.0, which is still in development), but the most common is XHTML 1.0 Transitional, which is the basis for future XHTML versions. Programmers face shifting from outdated HTML code to XHTML, which requires some adaptation time. Zeldman (2007) presents a selected list of myths that may still exist in programmers' consciousness when adjusting a website to different web browsers. He argues that each of them can be dispelled by sticking to the W3C standards:

1. Availability forces programmers to create two versions of a website. It is not true: using the W3C standards allows creating one version that is equally available in all browsers and mobile devices.

2. Availability costs too much. It is not true: adding elements that improve website availability usually takes little time, and the cost is recouped by acquiring new clients.

3. Availability forces programmers to compose primitive and low-quality websites. It is not true: by using CSS styles, XHTML, JavaScript and the AJAX technology it is possible to control the layout and behavior of websites that become more and more attractive and available to all users.

Because web browsers are adjusted to the current W3C standards more and more, while there are still many websites based on the old HTML technology, the XHTML 1.0 Transitional document type allows finding a compromise between HTML and XHTML nowadays.

Websites SEO Optimization. Ethical and unethical SEO strategies and techniques. PageRank versus SEO spamming.

1. Introduction

It is not enough to build a website based on a standardized XHTML structure. What also counts for website indexation is well-organized textual content. A well-built XHTML structure helps robots examine such content, retrieve the most important information about the website, categorize it among other on-line websites and eventually estimate its page rank value (explained later on). The higher the page rank value, the higher the website appears in search engine results lists. In this chapter I am going to focus on the most important techniques and strategies for achieving a high page rank value.

Processing, organizing and working with the website contents is known as SEO optimization or a SEO campaign (which means all the actions performed to optimize a website for search engines). It is not possible to discuss all strategies, so I did my best to choose the most relevant ones in the context of the most common search engines: Google, Yahoo!, and MSN. There are ethical, unethical and "almost unethical" SEO optimization techniques. Unethical and cheating campaigns are regarded as SEO spamming. After defining SEO and focusing on the advantages of organic SEO, I am going to discuss the major elements that organic SEO consists of. Chosen SEO strategies and techniques will be elaborated on (both ethical and unethical ones). The most important techniques will concern keywords (their density and their organization within the XHTML structure and tags) as well as website links (incoming, outgoing and inner ones). Anyone who wants to achieve the best ranking positions needs to understand SEO optimization for the most common search engines (Google, MSN, and Yahoo!). Each of these three search engines puts emphasis on slightly different aspects of SEO strategies. Google has introduced its own term, the PageRank value. I will elaborate on these differences at the end of this chapter.

2. Defining SEO
2. Defining SEO "Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines via the "natural" or un-paid ("organic" or "algorithmic") search results. In general, the earlier (or higher on the page), and more frequently a site appears in the search results list, the more visitors it will receive from the search engine's users" (http://en.wikipedia.org/wiki/Search_engine_optimization.) According to Grappone (2010) SEO (known also as search engine marketing or Internet marketing) includes actions that lead to generating more visits on a website. On the one hand, SEO optimization means improving the site's XHTML code and structure; on the other hand, it also stands for generating additional traffic, communicating with search engines, investigating campaign effects, and observing and analyzing market competition. "SEO is not an advertisement, but advertisement may be one of the elements of the campaign" (trans. P.G.), as it has its targets and purposes. Ledford (2009) adds that it also means understanding how search engines work and what the differences among them are. "Search engine marketing (SEM) is a form of Internet marketing that seeks to promote websites by increasing their visibility in search engine result pages (SERPs) through the use of paid placement, contextual advertising, and paid inclusion. SEM is not SEO, as SEM constitutes Adwords" (http://en.wikipedia.org/wiki/Search_engine_marketing.) Do we have to bother about SEO optimization on our websites? Grappone (2010) generally replies "yes", but there will not always be a positive answer to the question. Sometimes it is not recommended to expose confidential company data to all Internet users, or the website already has a satisfactory position in the results list. Additionally, SEO optimization brings effects in the course of time; if we do not have enough time, we may not see significant improvements. Also, if our website is going to be re-built soon, there may be no point in optimizing its outdated version. As SEO optimization is a complex, long-lasting process, we may call it an SEO campaign. Such a campaign should not take place more often than every three months and not less often than twice a year. Generally SEO is both knowledge and art (Grappone 2010.) 2.1 SEO purposes SEO is the knowledge of how to configure website elements in order to achieve the best results in search engine rankings (Ledford 2009.) A website is like a person in a crowd, and to make it more distinguishable in this crowd, we need some optimization criteria to be fulfilled. Some of them are as follows: the textual content and context of links, website popularity, meta tags, keywords, website language, the textual content of the whole website and website maturity. Depending on different search engines and their algorithms, all these elements become more or less important when creating website rankings. In order to conduct a successful SEO campaign, we need a clear purpose we aim at. This results from the company's business needs and it may be either increasing product sales or generating more orders from a website. Ledford (2009) says it is not enough to state that we need to attract more visits to our websites; we have to know what we want to achieve further on when a user opens the website (e.g. we want to generate 50 orders a month.) On average, SEO purposes are developed and re-estimated every 6 months, and they should be harmonized with business purposes. 
Whatever we change on our website, it is connected with our business purposes and is supported with SEO purposes (Ledford 2009.) 2.2 SEO optimization plan After we have pointed out our SEO purposes, we need to compose our SEO optimization schedule. First, we have to assign some priority to all the subpages our website consists of. This will allow us to treat the optimization issues separately, in parts, which will not overwhelm us and which will maximize effects in shorter periods of time (Ledford 2009.) It is advised to give the highest priority to those pages that naturally attract users the most (for instance the home page), that get the most visits and that generate the highest profit. Giving priority to pages automatically defines some strategies of marketing efforts to be brought in. Actually, there may be a number of subpages with the highest priority. Having established the priorities, we have to evaluate the website, which will allow us to verify what the progress of the SEO optimization is and where we are currently heading. When optimizing a website, subpages are said to be as important as the whole website in general, or even more. In order to evaluate a website, we consider the following elements (Ledford 2009): 1. Meta tags. Relevant meta tags help search engines classify the website correctly. The most important are the title, the keywords and the description meta tags. 2. Textual content of a website. We need to think over how often the website is updated and how often the textual content changes (is dynamic and refreshed.) Search engines still take it into consideration (it is a built-in feature of search engine algorithms.) If some site content remains unchanged for a longer period of time, engines may start ignoring the website. 3. Inner and external website links. They play an important role in SEO optimization. Search engine crawlers (robots) browse the website to search for these links and collect information. Nowadays engine algorithms are constructed in such a way that they check whether external links relate to websites that are similar and connected with the website in terms of textual content (they may check if there are similar keywords.) Evaluation means verifying whether each website link leads to a proper external website; otherwise the link should be removed. 4. Sitemap. It is important to compose an XML sitemap file and place it in the root directory of the website. Such an XML file includes URL links along with priority information. It improves website indexation. It is possible to indicate to the Google search engine where it should search for such an XML sitemap. A part of a sample XML sitemap (adjusted to the Google requirements) may look as follows: Sample <?xml version="1.0" encoding="UTF-8"?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"> <url> <loc>http://www.netivia.pl/</loc> <changefreq>daily</changefreq> <priority>1.00</priority> </url> </urlset> 2.3 What is organic SEO? Hallan tries to explain in his article (http://amrithallan.com/what-is-organic-seo/) that "it's getting found on the search engines without paying the search engines for the placement, and keep getting found for a long, long time", but this definition does not seem to be satisfactory. 
In reality, website owners combine both organic and paid SEO strategies to conduct optimization. Some experts tend to regard organic SEO as only optimizing website content; others are convinced it is about the number of links a website consists of. Ledford (2009) points out that it is rather a combination of these and other elements, as the ranking position depends on the quality of the website. Organic SEO is not just a simple way to appear high in the results list. It consists in using all natural methods that are available to us to get higher ranking positions without paying for them. Ledford also draws our attention to the fact that organic SEO takes from 3 to 6 months to show the results of the campaign. On the one hand it requires some patience and time. On the other hand, to improve the effectiveness of such a campaign, it is worth considering PPC (pay-per-click) paid promotions. 2.3.1 Organic elements of websites The textual content of a website is regarded to be the most important element of organic SEO. Search engine algorithms are very sensitive to spamming techniques, so we should strike a balance between incorporating relevant textual content and the number and position of different keywords within the text. One of the crawlers' strategies is to explore how the textual content co-exists and is consistent with other elements of the site, such as meta tags and surrounding links. Since blogs are used on websites (as another organic element), crawlers check the frequency of changes to their content. Generally, it also means finding a balance between the static and dynamic content of a website. Blogs and questionnaires give some dynamics to the website and may fulfill robots' requirements. Links constitute another important element of organic SEO. Ledford (2009) distinguishes incoming, outgoing and external links that should be spread out. Information about outgoing websites (where these links lead to) is as important as the context they are placed in. Nowadays, both sites (the one the links come from and the one the links lead to) have to be consistent in terms of textual content, which means that their keywords and textual content have to be cohesive and relevant. All this is indeed explored by engine crawlers and affects the ranking position. Today crawlers' algorithms can spy on and follow which search results are chosen by users. 2.3.2 Benefits of organic optimization Hallan in his article (http://amrithallan.com/what-is-organic-seo/) lists some benefits of organic SEO optimization in opposition to paid campaigns (to some extent they still should be combined together). They are as follows: 1. If a website is organically optimized, people tend to click it more often. "Search engines are meant to show results according to relevancy of the found pages." Sometimes people do not trust websites that appear in sponsored lists, as such a website may be regarded to include mediocre content. It is more trustworthy to click on a website that appears naturally on the search results pages. "If they appear on their own, it means there is a greater chance of finding the relevant content there" 2. Long-lasting search results come from organically optimized websites, as the organic optimization improves the relevancy. "You can only be relevant by being relevant, and that means, constantly generating content that people want to find and consume" 3. 
"Organic SEO builds greater trust." Real, natural content of a website simply indicates that its owner cares for the website (and his/her business) and has "deep-rooted knowledge of what they are involved in" 4. Organic SEO optimization is cheaper than sponsored campaigns. Even if we spend some money on paid campaigns, it still pays off to spend the same amount of money on conducting organic optimization. This will result in being seen on results pages for an incomparably longer time 3. SEO strategies and techniques SEO optimization is a set of strategies and techniques with the help of which we want to gain some increase in our website page ranking. This is usually done for chosen keywords or key phrases (Ledford 2009.) It is advised to consider SEO optimization as an important step before our site is created, as optimizing on-line websites is usually more time consuming. There is a list of ethical and unethical SEO strategies. We are going to focus on both variations. 3.1 Hosting, domains, navigation and other chosen elements of websites friendly to SEO SEO optimization is the most efficient when we conduct one substantial change in a given period of time (Ledford 2009.) What attracts crawlers is the website design. Generally speaking, these are: meta tags (focused on later on), links (I have devoted a separate section to linkages), navigation structure and textual content. As a matter of fact, there are many aspects we should consider in the optimization process, but I am going to focus on chosen aspects that seem the most important to me. 3.1.1 Hosting It matters to what server we copy the files of our website. For example, if we wanted our website to be displayed in Poland and we bought a server outside Poland, this would decrease the page rank value (the PageRank term is explained later on.) "Crawlers will recognize that the website does not suit our demographical position" (Ledford 2009.) 3.1.2 Domain name We should think about SEO when considering how to name our domain. This name should be as short as possible, and if possible, it should contain the most important keyword(s) that users enter into search queries in search engines. For websites addressed to Polish users, the best suffix for domains is ".pl", as this is the default suffix for search engines and affects the website's position within the results list. 3.1.3 Navigation Ledford (2009) sums up that there are two ways of navigating to the website: inner (around the website) and external (other sites leading to the website.) SEO optimization strategies should combine both types of navigation. Navigation should consist of textual links that are filled in with relevant keywords; it helps crawlers navigate among subpages and index the website. Navigation consisting of graphical buttons is an obstacle to crawlers. We should compose navigation that is friendly both to users and to search engine crawlers. 3.1.4 Sitemap If we feel that users' preferences clash with crawlers' preferences, we should build a sitemap XML file. Such a file includes lists of links and indicates to crawlers the structure and density of a website. 3.1.5 TITLE tag This is the most important HTML tag in terms of SEO optimization. When indexing a website, crawlers seem to start with searching for the titles, as these titles are a kind of source for classifying the whole website. It is not advised to fill the title with the name of our company; instead, we should consider relevant keywords.
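A minimal sketch of such a title (the wording is hypothetical and reuses the netivia.pl example site referred to elsewhere in this chapter) might look as follows: Sample <title>Strony internetowe Warszawa - projektowanie stron WWW | Netivia</title> The most important key phrase is placed first, and the company name is moved to the end.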
Moreover, some search engines index only the first 50 characters of titles, so we have to consider their length. "The W3C has proclaimed that the website title length should not exceed 64 characters" (Ledford 2009.) It is not allowed to duplicate keywords within titles. Such a strategy may be classified as spamming SEO (discussed later on.) 3.1.6 HTML headings "<h1> to <h6> tags are used to define HTML headings. <h1> defines the most important heading. <h6> defines the least important heading" (http://www.w3schools.com/TAGS/tag_hn.asp.) As we see, they configure different levels of website headings. All major browsers support all six heading tags, whereas in practice HTML programmers incorporate up to the first four of them (Ledford 2009.) These tags seem to inform both users and crawlers what the main topic and other sections of the website are. These elements do mark important information within the textual content, so they should also be filled in with relevant keywords. Crawlers examine the headings and check whether they suit the following textual content. It is claimed that the H1 tag is a top-level heading, so it ought to embrace the most important keywords, while the lower-level tags ought to consequently include less and less important keywords. HTML heading tags should also be loaded with keywords dynamically, not statically. It means that these keywords should vary a bit within the same headings. Filling in HTML headings with considered keywords is an SEO strategy to increase the website page rank value. 3.1.7 Javascript and Flash Javascript allows programmers to add some dynamics to websites, but it may block crawlers. To avoid that, it is recommended to place Javascript code in external files (with the ".js" suffix) and attach references to these files within the XHTML website code. Thus, it does not stop crawlers from indexing the website, because the Javascript code is not run by crawlers. It is almost impossible to index a flash site to estimate its page rank, and very often such an approach results in crawlers leaving the site. It is advised to combine elements of flash (e.g. flash banners) with the website XHTML content. It is not recommended to use flash navigation, as it blocks crawlers from retrieving the most important keywords. Thurow (2008) points out the fact that for us as human beings a flash site seems to include many subpages, but from the crawlers' point of view it is regarded as only one page with only one main flash animation. In order to increase flash websites' accessibility in search engines, one main flash animation should be divided into a few smaller ones, available under different URL addresses. This tells robots that such a website contains more unique content. 3.1.8 URL links in SEO strategies. URL canonicalization. Dynamically generated URL addresses The URL abbreviation stands for "Uniform Resource Locator". On the one hand, it should be short so that users can remember it by heart and recall it when opening a web browser. On the other hand, an SEO strategy for URL links means inserting the most important keywords within the URL addresses. It is not recommended to overuse dynamic content within the URL links (e.g. product IDs from a database.) "Canonicalization is the process of picking the best URL when there are several choices, and it usually refers to home pages" (http://www.mattcutts.com/blog/seo-advice-url-canonicalization/.) 
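For illustration (these address variants are hypothetical and reuse the netivia.pl domain from other examples in this chapter), all of the following URLs may serve the same homepage, and canonicalization means picking one of them as the single base form: Sample http://netivia.pl/ http://www.netivia.pl/ http://www.netivia.pl/index.html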
It is worth emphasizing that crawlers regard the following two URL addresses as different ones: http://netivia.pl and http://www.netivia.pl. We ought to use the "301 redirect" rule (a permanent redirect) to redirect users and crawlers to only one base URL address. If there is only one base URL address, the power of the estimated website page rank value does not have to be split for each URL variation separately. By putting the following code inside the website's .htaccess file we always force using the prefix "www" at the beginning of the URL address, even if we forget to enter it: Sample RewriteEngine On RewriteCond %{HTTP_HOST} ^netivia.pl(.*) [NC] RewriteRule ^(.*)$ http://www.netivia.pl/$1 [R=301,L] In most cases, URL addresses are generated dynamically (as the website is governed by its CMS), which can be recognized by crawlers. Dynamic addresses consist of special signs, like question marks, pluses, or other signs like "%", "&". Robots, on the one hand, do not reject such links, but on the other hand seem not to like indexing them. Thurow (2008) lists the following reasons why this happens: 1. Search engine crawlers do their best to avoid indexing and storing the same websites several times in their databases. If a website consists of many subpages with different textual content, but their URL addresses look almost the same with only one parameter that gets changed, these pages will probably not be indexed by crawlers. It is better to create addresses that contain unique major keywords connected with the page contents. 2. Search engines try to make their search results more and more exact and adequate to the given query. For crawlers this requires much filtration before indexing takes place. 3. "Some dynamic URL addresses may make robots get over-looped and never leave the website" (Thurow 2008, trans. P.G.) If only one URL address parameter changes, crawlers are not able to recognize whether this parameter sorts the site content or points to a different subpage. As a matter of fact, when Google crawlers canonicalize the URL links, they try to guess the best representation for all the link variations. 3.2 Keywords and keyword prominence. Distinguishing keywords "Keywords" is a term that is strictly connected with SEO (Ledford 2009.) Keywords are used to index and find a website in the Internet. Effective and popular keywords affect the website page rank. Understanding some rules of using keywords and knowing how to select them will help promote a website. Correct usage of keywords guarantees that a website will be found within the first 20 result pages (which is optimal according to Ledford), although users usually concentrate on the first two result pages. A high position in the results list is more advantageous than setting paid adverts, as natural searching is conducted by the right target group. People type different search queries, so we should also consider the ones that include some typical errors. Thurow (2008) claims that all search engines give more significance to the keywords that appear at the beginning of a website's content rather than to the ones that are placed at the end. The "positioning of a keyword within a website in relation to the beginning is known as prominence" (Thurow 2008, trans. P.G.) It is advised to place important keywords at the beginning of a website by composing a textual static header. Of course, such headers ought to be different for each subpage, as each of them includes different textual content. 
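A minimal sketch of such a static header, assuming a hypothetical subpage of the netivia.pl example site, could be placed right after the opening <body> tag: Sample <body> <div id="naglowek"> <h1>Strony internetowe Warszawa</h1> <p>Projektujemy strony internetowe zgodne ze standardami W3C i zoptymalizowane pod SEO.</p> </div> Because the heading and the introductory sentence appear close to the top of the document, the keywords they contain gain high prominence, and both can easily be varied from subpage to subpage.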
Distinguishing keywords within website textual content plays an important role in SEO strategies (Thurow 2008.) Users who desire to find appropriate information type in detailed keywords or key phrases to be searched for. Such keywords are distinguished (highlighted) in results lists, which assures users that this is the website they are looking for. 3.2.1 Heuristic searching We should employ heuristics to achieve the best rank position. "The term heuristic is used for algorithms which find solutions among all possible ones, but they do not guarantee that the best will be found, therefore they may be considered as approximate and not accurate algorithms" (http://students.ceid.upatras.gr/~papagel/project/kef5_5.htm.) In terms of SEO optimization, heuristics mean combining keywords and key phrases. Search engine algorithms also browse websites heuristically, which means that crawlers will follow the links and check the textual similarity of the sites that the links lead to. If website links refer to pages that are not thematically connected, this may be regarded as a "link farm" and will drastically decrease the page rank. Keywords are connected with heuristics, because they constitute a kind of model – they are the clue to solving the problem of scanning the search engine database to retrieve specific data. Heuristics give variables to estimate the ranking position for a searched query. Ledford (2009) proposes some rules connected with search engine positioning. Some of them are as follows: 1. Agreement between the system and reality. Our website should use language that is acceptable to and used by users (e.g. do not incorporate obscure technical terms) 2. Full control on the users' side. Give users the possibility to control the website, as they are allowed to make mistakes. It means that they should be provided with "back" and "forward" links to navigate and browse the website. These are website inner links. 3. Sticking to standards. Every time a user sees an alert or a piece of information, they should not be surprised to see it. In this context, cohesion refers both to language and to actions performed on the website. As a matter of fact, the rules mentioned above refer more to general website usability than to keywords as such; nevertheless they are also connected with keyword SEO strategies. 3.2.2 Keywords used in website links The textual content of links allows us to use keywords a second time around the website (Ledford 2009.) A crawler that analyzes a given website distinguishes links and their textual content and uses them to categorize the website. Placing relevant and significant keywords within the website links is one of the SEO strategies. Of course, we cannot exaggerate and over-optimize our website. This may happen when too many keywords remain in the same grammatical form both inside and outside link content. 3.2.3 Appropriate keywords and keyword phrases We can divide keywords into two major categories: the ones connected with our brand and general ones. It is advised to use important keywords within the name and description of our company (Ledford 2009.) In order to choose the most adequate keyword list for our website we should do some brainstorming and think about all the keywords (with general meaning) connected with our brand. Keywords with similar or the same meaning ought to be grouped together. We can narrow them down to select more specific keywords that will become our organic keywords at the end. 
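As a simple illustration (the keywords below are hypothetical and reuse the web design example used elsewhere in this chapter), such a narrowing-down might look as follows: Sample brand keyword: Netivia; general group: strony internetowe, strony www, projektowanie stron; selected organic keywords: strony internetowe Warszawa, projektowanie stron www.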
Apart from separate keywords, we should also consider keyword phrases, as they are also entered by users. Such a phrase might consist of as many as three keywords. Phrases are generally more detailed than separate keywords, so they help increase the page rank. 3.2.4 Keyword density versus overloading with keywords Keyword density is another factor that is taken into consideration by crawlers when estimating the website page rank. "Keyword density is the percentage of times a keyword or phrase appears on a web page compared to the total number of words on the page. In the context of search engine optimization keyword density can be used as a factor in determining whether a web page is relevant to a specified keyword or keyword phrase" (http://en.wikipedia.org/wiki/Keyword_density.) It is estimated that the density of a separate keyword should be between 1 and 3 percent, while the general density of all keywords incorporated within the website text should be from 7 to 10 percent. It is commonly known that some search engines prefer a higher density of keywords than others do. Nevertheless, textual content does count for all engines. The influence of keyword density on the estimated page rank also differs among search engines. Thurow (2008) says that one page should not consist of more than 800 words. This limitation increases the possibility that the whole page will be read by users. "Because it is possible to easily and artificially spread keywords around the textual content, crawlers pay less and less attention to the keyword density" (Thurow 2008, trans. P.G.) Overusing a keyword is known as "keyword stuffing" (an unethical SEO strategy) and will result in penalizing a website. Today search engine algorithms are able to detect such artificial keyword overloading. "Keyword stuffing had been used in the past to obtain maximum search engine ranking and visibility for particular phrases. This method is completely outdated and adds no value to rankings today. In particular, Google no longer gives good rankings to pages employing this technique" (http://en.wikipedia.org/wiki/Keyword_stuffing.) One of the methods to avoid overloading with keywords is to place a few unique keywords on each subpage. In this case they should come from the same meaning group (as grouped during the brainstorming process.) 3.3 Website incoming and outgoing links. Linkages Focusing on linkage strategies plays an important role nowadays and it is not enough to place some links in your XHTML code, as there are many types of linkages we should be aware of (Ledford 2009.) When considering SEO optimization plans, links are the most important element just after website keywords, as they are the basis on which crawlers create connections among websites. The first goal of linkage strategies is to connect your website with other thematically similar websites, which increases traffic to the website. Links leading to your website are like "votes" that promote the website's importance and relevance, and they do influence the Google PageRank. There are some techniques to build incoming and outgoing links. 3.3.1 Incoming links Links that redirect users to our website are more important than the outgoing ones. There are some ways to get more incoming links that would lead to our websites. Some companies regularly analyze the sources of incoming links to their websites in order to improve their linkage strategies. First of all, we can ask for such links. 
It requires investigating the market to create a list of thematically similar websites and simply asking for our website to be added to those websites' code. It is not the most effective method, though. Offering articles to other websites (and asking to place our links below the article) seems to be the most effective method of getting incoming links (Ledford 2009.) Its efficiency results from the fact that other website owners constantly search for good textual content. Blogs constitute another method of getting incoming links. A passage of text that includes links to our website may increase the estimated page rank value. Messages are the basis of each marketing program. It is possible to hire a company that will constantly place short pieces of information in the Internet. It is advisable to add some news or pieces of gossip about our company (of course along with links to our website.) Affiliate and PPC programs may also help a lot, but here we have to pay for adding our ads on such websites (Amazon.com is an example of it.) Each click on our website link brings us some profit. Search engine algorithms accept affiliate and PPC programs and do not impose any punishment for that. Ledford (2009) does not advise building our own websites just in order to create link connections to our other website. Such an "illusion of popularity" may eventually be regarded as "link spam". Sometimes such deeds might be regarded as unethical. 3.3.2 Outgoing links It is often questionable to incorporate outgoing links because they allow visitors to leave our website (and they may not return.) The more outgoing links there are, the lower the value of such votes is (Ledford 2009.) On the other hand, if a website does not include any outgoing links, it is not regarded well by crawlers. Outgoing links help place our website in a specific connection area, and our website also ought to point to other websites. We already know that crawlers explore links to estimate the website page rank. When creating outgoing links we should abide by some requirements imposed by search engine crawlers. Some of them are: 1. Target of links. We should be conscious of to what extent other websites are thematically similar to ours 2. Link over-usage. It is frustrating when every second or third word in a sentence is linked. Ledford (2009) advises not to use more than three links in an article or passage of text 3. Using keywords within the textual content of links. It pays a lot to use keywords in links rather than to put the "click here" phrase inside the link. Crawlers do search for keywords within links. It is even better to use a keyword phrase as a link, provided that the link leads to thematically connected sites 4. Links to suspicious websites. It is not recommended to link to low quality sites, such as "link farms" or spamming websites. If we point our links to a website that has already received high ranking scores from search engines, our website will also get additional ranking points 5. Websites that include only links have been classified as spamming websites nowadays. It is strictly forbidden to use websites that do not include any textual content other than links 6. Link monitoring and repairing broken links. Ledford (2009) declares that it is better not to have a link at all than to have a broken one. One of the linkage strategies is to regularly check if the websites that our links lead to still exist in the Internet. 
We cannot allow links leading to nowhere because in the course of time it signals to crawlers that a website is not maintained well. Generally speaking, we should use only such links that are useful to users. In terms of SEO optimization their usefulness means leading to websites where a user can still find information useful to them. What counts is link quality, not quantity. 4. Spamming SEO. Spamming techniques Spamming SEO is such an important issue in SEO optimization that it requires our separate attention. It has already been mentioned several times in this dissertation and now it is time we focused deeper on that topic. All kinds of spamming SEO are not ethical, or at least "almost unethical", and can be recognized by search engine crawlers. In such cases the estimated page rank value of a website gets decreased dramatically and sometimes such a website completely disappears from the results list. Search engine algorithms are constantly adjusted and modified to discover newer and newer kinds of spamming SEO. Ledford (2009) says that we should be very careful and sensitive about this issue, as what is ethical and acceptable today can be treated as an aspect of spamming the next day. The following pieces of advice should be considered: 1. Follow your conscience and common sense. If you feel that what you are doing now is a kind of spamming, it probably is. If you sense some trickery, avoid it 2. Do not try to make your website be regarded as a different one when it is not in reality. Creating fake structures will sooner or later result in website exclusion from results lists. It is obvious that crawlers will unravel sets of artificial linking 3. Do not trust anyone who claims that some practice is acceptable in case you feel otherwise. Many SEO specialists will keep proving that some unethical SEO strategies are still acceptable provided that they are conducted well. It does not pay off, as "spam is always spam" and will be found by crawlers There are many techniques of SEO spamming (also known as spamdexing) and all of them should be avoided. Once they are recognized by crawlers, the website page rank gets decreased. Most of these unethical techniques and strategies try to artificially incorporate more links or keywords into the website. Ledford (2009) groups some of them as follows: 1. Transparent links. On the face of it, they are not visible because they are of the same color as the surrounding background 2. Hidden links. They are not seen as they are placed behind other graphical elements. Such links are not clickable by users but can still be accessed by crawlers 3. Misleading links. The way they are addressed is different from the way they are presented to users. Such links simply do not open the websites that are named in these links 4. Links that are not recognizable by users. These links are written with a 1px font size, which is illegible to human beings 5. Keyword overloading and overloaded meta tags. Chosen keywords are repeated too many times either in the textual content of a website or within its meta tags. Sometimes a website is artificially overloaded with hidden links just to increase the keyword density (Danowski 2007.) It is commonly known now that repeating the same keywords around the same subpage does not generate a higher page rank value. 6. Automatically generated websites. They are created by stealing the textual contents of other websites and thus they are of no usability value 7. Links entitled with a dot. It means putting two identical links close to each other. 
The trick is to use a dot as the text of the second link. It may not be distinguishable by people, but will undoubtedly be encountered by crawlers. An example is as follows: <a href="http://www.netivia.pl" title="strony internetowe Warszawa">Netivia</a><a href="http://www.netivia.pl">.</a> 8. Masking (cloaking). It means preparing two separate versions of a website. One version is over-optimized for crawlers, which are redirected to this website version 9. Hidden textual content. Such text is printed out with the same color as the background, thus it is invisible to people, but still visible to crawlers. Danowski (2007) claims that this is the most popular unethical spamming method and derives from the times when the AltaVista search engine was used (to estimate the page rank value it took into consideration the textual content, not meta tags.) 10. Websites including only links. Such a site is perceived as a "link farm". The only exception is the sitemap list that gathers all website links together 11. Redirecting websites. They are usually incorporated because of SEO strategies, but are still useless for website visitors. Once such a redirecting website gets opened, we are notified about being taken to another website 12. Stealing websites. The only purpose of using other popular websites is to redirect their visitors to our website 13. Spamming with the help of Internet encyclopedias. For instance, it can be achieved by editing Wikipedia articles and filling them with links to our websites. To prevent such spamming, the Wikipedia founders had to automatically add the rel="nofollow" attribute to each external link 14. Filling HTML comments with keywords. Although HTML comments are not seen by users in the browser, they still exist within the textual content of the website (Danowski 2007.) It is known that crawlers omit website HTML comments when estimating the page rank value Apart from a dramatically decreased website page rank value and sanctions imposed by search engines, there are also other reasons why it does not pay off to use unethical SEO strategies. No one likes spamming. Once we get redirected, face some spam or feel tricked into reaching a given website, we will probably not return to it in the future. Danowski (2007) presents the following dangers of using unethical spamming: First of all, such a website can be banned by search engines. It means that it can be excluded from the results list and will not appear even on the last pages. This means practically no traffic to such banned websites. Danowski proposes to type the following phrase into a search engine: "site:netivia.pl". This will retrieve all domain pages that have been indexed so far. An empty results list for the query "site:netivia.pl" means that such a website has been banned and does not exist in the search engine index. If we want to check what version of the website is currently saved in the search engine database (the online version can be different, though), we should type in: "cache:netivia.pl". Moreover, if we type in "info:netivia.pl", we will be provided with all other information about indexing the website. Once a website has been banned, it is almost impossible to get rid of the ban. Many owners do not deal with such a domain anymore and simply change the website name. Danowski (2007) also adds that filtration imposed on websites by search engines constitutes a less restrictive punishment for spamming. It means that a website will not be found by users when typing in a chosen key phrase to be searched for. 
Filtration is imposed automatically and also disappears automatically after some time. 5. SEO optimization for the most common search engines: Google, MSN, Yahoo! When considering search engines from the SEO optimization point of view, we can divide them into three major types. This division regards the way search engines index data and store it in their databases. Search engines that are based on crawlers constitute the first category, and the Google search engine is undoubtedly categorized here. All the gathered information goes to the central repository where it is explored in the indexation process (Ledford 2009.) Thus, information retrieved for keyword queries is taken from the database index. Every once in a while, crawlers return to websites to re-index them. The second group is search engines whose databases are loaded by people. They are regarded as website catalogues, and the Yahoo! search engine belongs to this group. Hybrid search engines constitute the third and last group and are a combination of the two groups above. Not only are people allowed to register their websites within these search engines, but they also spread their robots around the Internet to collect information about websites. Ledford (2009) points out the importance of understanding this classification, because it determines the way and time in which our website would be indexed by search engines (robots will probably find the website faster than people, as it is an automated process.) While Google concentrates the most on the connection between website textual content and its links, MSN observes the dynamics of the textual content and meta tags. Yahoo! pays the most attention to the density of keywords, especially in the title tag. 5.1 Google PageRank and SEO optimization for Google The Google Internet search engine has been the leading search engine for a long time and introduces new trends in the searching world. In 2007 it got 58.4% of market traffic out of all search engines (Grappone 2010). It is Google that has made link popularity and website age so important. It is true that today the SEO world functions in the way formed by the leading Google. In addition, it gives website owners a free SEO analyzing tool known as Google Analytics. "PageRank is a link analysis algorithm, named after Larry Page and used by the Google Internet search engine, that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set" (http://en.wikipedia.org/wiki/PageRank/.) In other words, it investigates the number and the quality of both incoming and outgoing website links. It is a voting system that compares our website with the other websites our page links to. The page rank of our website is estimated in a recursive loop and there are many factors that determine the final page rank value. In the end each website gets a value from 0 up to the highest value of 10. This scale is not linear, and the difference between 4 and 5 is different from the one between 3 and 4 (Grappone 2010.) A sample PageRank dispersion is presented in Figure 2 (http://en.wikipedia.org/wiki/PageRank.) Usually, the higher the page rank number is, the higher the position a website gets in the Google search results. It is worth getting incoming links that will lead to our website, but on the other hand we should be aware of the fact that a high page rank in Google means not only links. 
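For reference, the simplified formula from the original PageRank publication by Brin and Page conveys this recursive character (d is a damping factor, usually set to around 0.85, T1…Tn are the pages linking to page A, and C(T) is the number of outgoing links on page T): PR(A) = (1 - d) + d * (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)). Because the value of each page depends on the values of the pages that link to it, the computation is iterated until the values converge.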
Moreover, Google tends to present the page rank value that was actually counted a few months ago (the current one is confidential.) To optimize a website for Google, we should get acquainted with and use all the webmaster tools prepared by Google (http://www.google.pl/webmasters/.) 5.2 Website optimization for MSN Ledford (2009) indicates that the MSN.com site currently uses the Microsoft Live search engine, although it is still possible to use the MSN search engine (Microsoft Live is the brand name of this technology.) In order to get a high position in the MSN results list, we should abide by the fundamental rules of organic SEO optimization. MSN does not allow a sponsored increase in page ranks. One feature requires our attention: this search engine pays more attention to the freshness and dynamics of the website textual content in comparison with other engines. It means we should consider some SEO strategies dealing with how to effectively process the website texts. Similarly to Google, MSN has introduced its own algorithms to index the Internet and it also has its own rules that we should abide by when optimizing a website. All these principles can be found on the MSN.com site by searching for "Site Owner Help" (Ledford 2009.) The MSN algorithms search for clue keywords in meta tags, titles of subpages and the textual content of the top HTML code. So we should combine relevant information with keywords at the beginning of each subpage. 5.3 Website optimization for Yahoo! The Yahoo! search engine also differs from Google and MSN. It concentrates on keyword density and keyword occurrence in URL links, as well as on title tags. We can achieve successful SEO optimization in the Yahoo! results list by matching good keywords with our website. The Yahoo! crawler is called Slurp and checks keyword density to estimate the page ranking (Ledford 2009.) According to Yahoo! the optimal dispersion of keywords equals: 1. 15-20% in the title tag. The title content is displayed in the Yahoo! results list 2. 3% in the textual content around the body tag 3. 3% in the keywords and description meta tags Yahoo! also analyzes incoming links. 5.4 Page rank fluctuations We should also be aware of the fact that successful SEO optimization does not only mean a high position in the results list. What also counts is simply the quality of the textual content as well as the usability of the whole website. The website position within the results list may fluctuate a bit, and this happens independently of the SEO optimization that is being carried out. Grappone (2010) lists some of the reasons for that: 1. Activity of competitive companies. Sometimes successful SEO results from the laziness of our competition 2. Functioning of the server that our website is placed on. If web crawlers re-visit the website when it is switched off (because our server does not work at that time) it will temporarily lower the rank position, at least till they index the page again 3. In order to be able to store all the indexed information, search engines use different databases. Each of them may return slightly different results for the same search query. The current position of our website may depend on the database currently chosen by the search engine 4. Search engine algorithms are constantly modified, improved and changed. Algorithms are patterns according to which search engines organize data. We may never be sure what actions should be conducted. 
Grappone (2010) notes that generally "good HTML titles, good homepage textual content and removing all the obstacles that hinder crawlers' indexation" (trans. P.G.) are the key to successful SEO optimization 6. Conclusion SEO optimization is a long-lasting process that includes many SEO techniques and strategies. Such campaigns require our patience, as we will see the effects of our optimization efforts only after a few weeks or months. We never really know exactly what actions should be performed to achieve a high page rank value for our websites. Actually, a high page rank value is the main purpose that SEO optimization aims at. Each on-line website already has its own page rank estimation. It is not possible to guess the details of crawlers' algorithms, especially as search engines seem to verify, develop and adjust their criteria very often. Unfortunately, the current estimation of a website's page rank value is hidden and we are always provided with the value that was estimated by search engines some time ago. Although search engines differ in estimating a page rank, what undoubtedly counts is building websites in an organic way and using only ethical SEO techniques. It never pays off to conduct or bother about unethical strategies, as this usually results in penalizing the website or even in exclusion from the results list. SEO optimization focuses mainly on the considered organization of website keywords and sets of incoming, outgoing and inner links. Search engine crawlers do explore and check connections among Internet websites and try to categorize them. Inserting relevant keywords within proper tags and links and embracing them with proper textual content helps crawlers index and categorize the website and thus increases the page rank value. Conducting SEO optimization is a kind of knowledge and requires some experience that can be gained only in the course of time. Examination of the PJWSTK website HTML code. Shifting to XHTML 1.0 Transitional Standards. 1. Introduction This part of my diploma work includes an examination of the website of the Polish-Japanese Institute of Information Technology (http://www.pjwstk.edu.pl.) The aim of the research is to verify whether the whole HTML code of the website is prepared in a way that sticks to the current W3C standards and whether it includes all major SEO optimization methods and techniques. Additionally, the goal of the inquiry is to answer the following crucial questions: are separate subpages built structurally and semantically in a way that is friendly to search engine crawlers? Do they take into consideration search engines' needs? Do they feed crawlers with the key information that they search for when indexing the website? Do they generate high page rank values for the website in the most common search engines' results lists? I have decided to verify some chosen website pages: the homepage and three separate subpages: "opłaty", "studia I-go stopnia - informatyka" and "zasady rekrutacji". Because other subpages structurally and semantically resemble the ones that were chosen, all the given adjustments and proposals can be implemented around the entire website in other places. The whole examination is presented in three-column tables below. The headers of the tables inform us what section the examination concerns and where the source of the code comes from. The first column includes chosen parts of the current HTML code that were copied from a browser. 
The second column includes the same code that got reorganized and improved and fully fulfils the current W3C standards. The last third column enlists all the explanation of why the current HTML code is not correct semantically or structurally, why particular changes have been implemented and how all these improvements become perceived by search engines spiders. 51 2. Examination of the PJWSTK html website code HOMEPAGE HTML METATAGS HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011 Current website HTML code Shifting to XHTML 1.0 Transitional Code <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html><head> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-2"> <title>PJWSTK. Polsko-Japooska Wyższa Szkoła Technik Komputerowych.</title> <meta name="Language" content="pl"> <meta http-equiv="pragma" content="nocache"> <meta name="Classification" content="Education"> <meta name="revisit-after" content="7 day"> <meta name="Description" content= "Najlepsza niepubliczna wyższa szkoła informatyczna w Polsce"> <meta name="Keywords" content= "informatyka, japonistyka, kultura japonii, programowanie, grafika, sztuka, magisterskie, inżynierskie, zaoczne, uczelnia, Informatyka po ekonomii, Informatyka po socjologii, Informatyka po marketingu, Informatyka po zarządzaniu, Informatyka po psychologii, Informatyka po kulturoznawstwie, Informatyka po dziennikarstwie, Informatyka po kierunkach <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <base href="http://www.pjwstk.edu.pl/" /> <title> PJWSTK. Polsko-Japooska Wyższa Szkoła Technik Komputerowych.</title> <meta name="description" content=" Najlepsza niepubliczna wyższa szkoła informatyczna w Polsce" /> <meta name="keywords" content="informatyka, japonistyka, kultura japonii, programowanie, grafika, sztuka, magisterskie, inżynierskie, zaoczne, uczelnia, Informatyka po ekonomii, Informatyka po socjologii, Informatyka po marketingu, Informatyka po zarządzaniu, Informatyka po psychologii, Informatyka po kulturoznawstwie, Informatyka po dziennikarstwie, Informatyka po kierunkach humanistycznych" /> <meta http-equiv="Content-type" content="text/html; charset=utf-8" /> <meta http-equiv="Content-Language" content="pl" /> <meta name="robots" content="index,follow" /> <meta name="revisit-after" content="7 days" /> <meta http-equiv="expires" content="0" /> Change Explanation - The HTML 4.01 website code is outdated and is never used in current W3C standards - When re-building the website it is advised to shift from HTML into XHTML structure - Main DOCTYPE declaration should indicate that the whole website HML code is based on current W3C standards, e.g. 
XHTML 1.0 Transitional - It is easier to use UTF-8 encoding rather than ISO-8859-2 in terms of textual content - Each <head> metatag should be closed with “/>” - Important metatag ROBOTS is missing; it indicates that the website should be indexed (or not) by crawlers - The <base> tag is missing - Some other important metatags are missing, like EXPIRES, DISTRIBUTION, LANGUAGE - Overloading with keywords - there are too many keywords enlisted within the KEYWORDS metatag - There is no link added for the favico.ico icon (an image that appears to the left within the URL browser address) - As there may be many CSS files separately prepared for each website page, it is advised to gather all these CSS files in one CSS folder The same suggestions concern other pages and subpages. 52 humanistycznych"> <meta http-equiv="Generator" content="TigerII MiniPad (C)2001"> <link rel="stylesheet" type="text/css" href="main.css"> <!--<script type="text/javascript" src="http://tomproj.yum.pl/clicksCounter/js/f ull.js"></script>--> <script type="text/javascript"> siteCode = "94011da7317156a6b02433e9c61d9e2a"; </script> </head> <meta name="distribution" content="Global" /> <meta name="Language" content="pl" /> <meta name="Author" content="[author name]" /> <meta name="copyright" content="Copyright (c) PJWSTK" /> <link rel="stylesheet" type="text/css" href="css/main.css"> <link REL="shortcut icon" HREF="i/favicon.ico" /> <script type="text/javascript" src="http://tomproj.yum.pl/clicksCounter/js/full.js">< /script> <script type="text/javascript">siteCode = "94011da7317156a6b02433e9c61d9e2a"; </script> </head> HOMEPAGE LOGO HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011 Current website HTML code Shifting to XHTML 1.0 Transitional Code <a href="?"><img border=0 ALT="PJWSTK" src="i/PJWSTK_logo.gif"></a> <a href=http://www.pjwstk.edu.pl title=”strona domowa PJWSTK”><img src="i/PJWSTK_logo.gif" alt="PJWSTK"></a> Change Explanation - Website logo should be implemented into a link that leads to homepage (starting page for visitors that get lost), which is correct - The <a> link misses its TITLE attribute, which includes important information for search engine browsers - Tags attributes names should not be capitalized - Image border should be switched off within CSS style sheet files; the example below gets rid of all borders of images included within links: A IMG {border:0} 53 Anyway, if it the border attribute remains, it should be surrounded with quotation marks, e.g: <img src=”i/pjwstk_logo.gif” border=”0” /> The same suggestions concern other pages and subpages. HOMEPAGE GENERAL HTML STRUCTURE HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011 Current website HTML code Shifting to XHTML 1.0 Transitional Code <div id="wrapper"> <div id="naglowek">…</div> <div id="pierwsza">…</div> <div id="druga">…</div> <div id="trzecia">…</div> <div id="czwarta">…</div> </div> this is correct, no suggestion needed Change Explanation - The whole textual content of the homepage is divided into top and middle sections. 
All the sections are positioned with the help of <DIV> elements (not table rows), which is correct - Formatting of each DIV box is defined in the CSS file, which is also correct HOMEPAGE TOP LINKS - Warszawa, Gdaosk, Bytom Current website HTML code Shifting to XHTML 1.0 Transitional Code <table width=750 cellpadding=0 cellspacing=0><tr><td><img ALT="PJWSTK" src="i/PJWSTK.gif"> <a href="http://www.pjwstk.edu.pl"> <img ALT="Warszawa" style="margin:0px;margin-bottom:10px;" <ul id=”topmenu”> <li> <a href=”http://www.pjwstk.edu.pl” title=”PJWSTK Warszawa”><img src=”j/PJWSTK_Warszawa_1.gif” alt=”Warszawa” /></a> HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011 Change Explanation - Avoid using table cells, use <UL>/<LI> list, instead - The <li> elements can be positioned, formatted and accessed within a CSS style sheet file, for example: #topmenu {list-style:none} #topmenu LI {float:left} - Do not use style formatting directly in the website code, as all the 54 src="i/PJWSTK_Warszawa_1.gif" onmouseover="src='i/PJWSTK_Warszawa_2.gif '" onmouseout= "src='i/PJWSTK_Warszawa_1.gif'" border="0"></a> <a target="_blank" href="http://gdansk.pjwstk.edu.pl"><img ALT="Gdaosk" style="margin:0px;" src="i/PJWSTK_Gdansk_1.gif" onmouseover="src='i/PJWSTK_Gdansk_2.gif'" onmouseout="src='i/PJWSTK_Gdansk_1.gif'" border="0"></a> <a target="_blank" href="http://bytom.pjwstk.edu.pl"> <img ALT="Bytom" style="margin:0px;" src="i/PJWSTK_Bytom_1.gif" onmouseover="src='i/PJWSTK_Bytom_2.gif'" onmouseout="src='i/PJWSTK_Bytom_1.gif'" border="0"> </a> <a href=”http://www.pjwstk.edu.pl” title=”PJWSTK Warszawa”>Warszawa</a> </li> <li> <a href=”http://www.gdansk.pjwstk.edu.pl” title=”PJWSTK Gdaosk”><img src=”i/PJWSTK_Gdansk_1.gif” alt=”Gdaosk” /></a> <a href=”http://www.gdansk.pjwstk.edu.pl” title=”PJWSTK GdaoskGdaosk</a> formatting ought to be defined in the CSS file - Each <a> link should include TITLE tag, which is missing in the current HTML code - The <IMG> tags include ALT attributes, which is correct, but the ALT should be lower case - The <IMG /> tags ought to be closed with “/>” - If there are images used next to links, these images should also be converted into links; proposed <li> elements include such links - Important keywords (like Warszawa, Gdaosk, Bytom) should be text links (not images), so that links textual contents (and keywords within them) can be accessed by engines crawlers </li> <li> <a href=”http://www.bytom.pjwstk.edu.pl” title=”PJWSTK Bytom”><img src=”i/PJWSTK_Bytom_1.gif” alt=”Bytom” /></a> <a href=”http://www.bytom.pjwstk.edu.pl” title=”PJWSTK Bytom”>Bytom</a> </li> </ul> HOMEPAGE „UCZELNIA” SECTION LINKS HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011 Current website HTML code Shifting to XHTML 1.0 Transitional Code <div class="r3"> <ul> <li><a href='?strona=1594'>Władze</a></li> <li><a href='?strona=1593'>Historia</a></li> <div class="r3"> <ul> <li><a href=”/wladze” title=”Władze PJWSTK”>Władze</a></li> Change Explanation - <UL>/<LI> list is used (instead of table cells), which is correct - Each <a> link misses its TITLE attribute, which in this section is crucial for search engines, as each link includes a very important keyword and leads to important subpage 55 <li><a href='?kat=204'>Biblioteka</a></li> <li><a href='?kat=205'>Wydawnictwo</a></li> <li><a href='?kat=243'>Jednostki</a></li> </ul> </div> <li><a href=”/historia” title=”Historia PJWSTK”>Historia</a></li> <li><a href=”/biblioteka” title=”Biblioteka PJWSTK”>Biblioteka</a></li> <li><a href=”/wydawnictwo” 
HOMEPAGE „UCZELNIA” SECTION LINKS
HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011

Current website HTML code:
<div class="r3">
<ul>
<li><a href='?strona=1594'>Władze</a></li>
<li><a href='?strona=1593'>Historia</a></li>
<li><a href='?kat=204'>Biblioteka</a></li>
<li><a href='?kat=205'>Wydawnictwo</a></li>
<li><a href='?kat=243'>Jednostki</a></li>
</ul>
</div>

Shifting to XHTML 1.0 Transitional code:
<div class="r3">
<ul>
<li><a href="/wladze" title="Władze PJWSTK">Władze</a></li>
<li><a href="/historia" title="Historia PJWSTK">Historia</a></li>
<li><a href="/biblioteka" title="Biblioteka PJWSTK">Biblioteka</a></li>
<li><a href="/wydawnictwo" title="Wydawnictwo PJWSTK">Wydawnictwo</a></li>
<li><a href="/jednostki" title="Jednostki PJWSTK">Jednostki</a></li>
</ul>
</div>

Change explanation:
- A <ul>/<li> list is used (instead of table cells), which is correct.
- Each <a> link misses its TITLE attribute, which in this section is crucial for search engines, as each link contains a very important keyword and leads to an important subpage.
- The word „PJWSTK” has been added to each TITLE attribute so that the link can be indexed well by crawlers; thus a key phrase such as „wydawnictwo pjwstk” entered in a search engine may lead directly to the target PJWSTK subpage.
- The current link structure (e.g. „?strona=1594”) indicates that the content is read dynamically from a database (which is correct), but the URL of the opened subpage does not contain any keywords searched for by crawlers. A link such as http://www.pjwstk.edu.pl/?strona=1594 should be turned into http://www.pjwstk.edu.pl/wladze, which will push the „wladze” subpage up in the search engine results list. The same concerns all other links used in the whole website.
The same concerns all the links within the following sections in the second column: „REKRUTACJA”, „STUDIA I-go stopnia”, „STUDIA II-go stopnia”, „STUDIA III-go stopnia”, „STUDIA PODYPLOMOWE”.

HOMEPAGE „PORTALE” SECTION LINKS
HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011

Current website HTML code:
<div class="r2">
<img alt="" src="i/linia_216px.gif">
<p><a target="_blank" href="http://samorzad.pjwstk.edu.pl/"><img border=0 src="i/samorzad.gif" ALT="samorzad"></a>
<a target="_blank" href="http://www.biurokarier.pjwstk.edu.pl/"><img src="i/biuro.gif" ALT="biuro karier"></a>
<BR>
</div>

Shifting to XHTML 1.0 Transitional code:
<div class="r2">
<a href="http://samorzad.pjwstk.edu.pl/" class="graylink" title="samorząd" target="_blank">Samorząd</a>
<a href="http://biurokarier.pjwstk.edu.pl/" class="graylink" title="biuro karier" target="_blank">Biuro Karier</a>
…
<br />
</div>

Change explanation:
- There is no point in repeating the „linia_216px.gif” <img> tag each time, as such an arrow can be used as the background image of every <a> link. This lowers the weight of the website HTML and gives the same final result. A sample CSS definition of such a link can look like this: .r2 a {padding-left:15px;display:block;background:url(i/linia_216px.gif) no-repeat 0 0}
- It is not advised to use image links that contain important keywords, as image content cannot be accessed by crawlers and indexed; it is better to change them into text links.
- The structure of the URL links is correct, as these are subdomains.
- The <br> element should be closed with "/>".
The same concerns the „LOGOWANIE” section.
HOMEPAGE OTHER SECTION LINKS – e.g. „REKRUTACJA”
HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011

Current website HTML code:
<div class="r1">
<img alt="" src="i/strzalka.gif">
<a href="#"><img style="margin:0px;" src="i/rekrutacja_1.gif" onmouseover="src='i/rekrutacja_2.gif'" onmouseout="src='i/rekrutacja_1.gif'" border="0" alt=""></a>
</div>
<div class="r2">
<img alt="" src="i/linia_216px.gif">
<p>
</div>

Shifting to XHTML 1.0 Transitional code:
<h1><a href="http://www.rekrutacja.pjwstk.edu.pl/" title="rekrutacja">Rekrutacja</a></h1>

Change explanation:
- The whole <div> content can be replaced with an H1 element with adequate CSS formatting. The left red arrow, as well as the dots below the inscription, can be designed as one image used as the background of the H1 element, for instance: <h1>Rekrutacja</h1> styled with H1 A {font-size:14px;display:block;font-weight:bold;background:url(i/h1.gif) no-repeat 0 0}
- It would be better to make the section-name link lead to a separate subpage devoted to recruitment; such a subpage should contain some textual content filled with adequate keywords connected with recruitment. In this case the link structure would be: <a href="http://www.rekrutacja.pjwstk.edu.pl/" title="rekrutacja">Rekrutacja</a>
- All section names are incorporated into the website as images, which is not correct, as they are very important keywords and should appear in textual form.
The same concerns all the other section names, especially „STUDIA I-go stopnia”, „STUDIA II-go stopnia”, „STUDIA III-go stopnia”, as well as „STUDIA PODYPLOMOWE”. Less important section names should be enclosed in an H2 element.
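To make the heading suggestion above concrete, a section heading of this kind could be written roughly as follows. This is an illustrative sketch: the h1.gif file name comes from the CSS sample above, while the pixel values and the padding reserved for the arrow graphic are assumptions.

<h1><a href="http://www.rekrutacja.pjwstk.edu.pl/" title="rekrutacja">Rekrutacja</a></h1>

h1 {margin:0; padding-left:20px; background:url(i/h1.gif) no-repeat 0 0} /* arrow and dots drawn as one background image */
h1 a {font-size:14px; font-weight:bold; display:block; text-decoration:none}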
HOMEPAGE NEWS SECTION – e.g. „Wydarzenia”
HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011

Current website HTML code:
<div class="r2">
<P><font color="#000000"><b>Wydarzenia</b></font></P>
<img alt="" src="i/linia_szara_216px.gif">
<P><a href="?strona=2113">Spotkanie z Japonią - recital fortepianowy</a></P>
<img alt="" src="i/linia_szara_216px.gif">

Shifting to XHTML 1.0 Transitional code:
<div class="news">
<h2><a href="/news" title="wydarzenia">Wydarzenia</a></h2>
<a class="title">Spotkanie z Japonią – recital fortepianowy</a>
<span class="text">Ambasada Japonii zaprasza na recital fortepianowy Tempei Nakamura 中村天平, który odbędzie się 8 lipca 2011 r. o godz. 19:00, w Auli Głównej PJWSTK</span>
</div>

Change explanation:
- The section name „AKTUALNOŚCI” (or „Wydarzenia”) should be turned into a link (composed in the way explained above) leading to a separate subpage containing the full news list.
- As crawlers browse the website textual content for keywords, it is advised to place some leading (introductory) text below each news title; such a lead should consist of keywords relevant to crawlers.
- Giving the website some dynamics (changing its textual content on average every three months) also pays off and is regarded as an advantage by search engines.
- News titles are incorporated within links, which is correct, but these links do not include any keywords, so they do not help crawlers' indexation.
- The SEO links proposed in the shifting column include important keywords, which helps indexation.
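The proposed news markup could be accompanied by a few CSS rules of this kind (a sketch only; the class names follow the proposal above, the values are assumed):

.news h2 {font-size:14px; margin:0 0 5px 0}
.news a.title {display:block; font-weight:bold} /* news headline */
.news span.text {display:block; margin-bottom:10px} /* introductory lead text carrying the keywords */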
SUBPAGE „OPŁATY” HTML METATAGS
HTML source: http://www.pjwstk.edu.pl/?strona=1604, retrieved July 2011

Current website HTML code:
<title>PJWSTK. Polsko-Japońska Wyższa Szkoła Technik Komputerowych.</title>
<meta name="Language" content="pl">
<meta name="Description" content="Najlepsza niepubliczna wyższa szkoła informatyczna w Polsce">
<meta name="Keywords" content="informatyka, japonistyka, kultura japonii, programowanie, grafika, sztuka, magisterskie, inżynierskie, zaoczne, uczelnia, Informatyka po ekonomii, Informatyka po socjologii, Informatyka po marketingu, Informatyka po zarządzaniu, Informatyka po psychologii, Informatyka po kulturoznawstwie, Informatyka po dziennikarstwie, Informatyka po kierunkach humanistycznych">

Shifting to XHTML 1.0 Transitional code:
<title>Opłaty - PJWSTK. Polsko-Japońska Wyższa Szkoła Technik Komputerowych.</title>
<meta name="Description" content="Opłaty PJWSTK. Najlepsza niepubliczna wyższa szkoła informatyczna w Polsce" />
<meta name="Keywords" content="opłaty, informatyka, japonistyka, kultura japonii, programowanie, grafika, sztuka, magisterskie, inżynierskie, zaoczne, uczelnia, Informatyka po ekonomii, Informatyka po socjologii, Informatyka po marketingu, Informatyka po zarządzaniu, Informatyka po psychologii, Informatyka po kulturoznawstwie, Informatyka po dziennikarstwie, Informatyka po kierunkach humanistycznych" />

Change explanation: apart from all the hints already mentioned for the homepage head section, the following suggestions should be incorporated:
- As this is the „opłaty” subpage, it is strongly advised to add the word „opłaty” to the subpage TITLE; this will help crawlers index it for payments.
- The best URL for this subpage would be http://www.pjwstk.edu.pl/oplaty-pjwstk, or even better http://www.pjwstk.edu.pl/oplaty.
- Since this subpage is devoted to payments, the keyword „opłaty” is the most crucial one and should be added either to the KEYWORDS metatag or to the DESCRIPTION metatag; the KEYWORDS metatag would be even more appropriate.
- There are too many keywords listed after commas within the KEYWORDS metatag; there should not be more than 10 keywords.
- Despite the fact that this subpage is devoted to payments, none of the listed keywords concerns payments at all.

SUBPAGE „OPŁATY” - GENERAL HTML STRUCTURE
HTML source: http://www.pjwstk.edu.pl/?strona=1604, retrieved July 2011

Current website HTML code:
<div id="wrapper">
<div id="naglowek">…</div>
<div id="pierwsza">…</div>
<div id="środkowa">…</div>
<div id="czwarta">…</div>
</div>

Shifting to XHTML 1.0 Transitional code: this is correct, no suggestion needed.

Change explanation:
- The whole textual content of the subpage is divided into top and middle sections, and all the sections are positioned with <div> elements (not table rows), which is correct.
- The formatting of each <div> box is defined in the CSS file, which is also correct.
The same box division is used in the other subpages.

SUBPAGE „OPŁATY” - NAVIGATION
HTML source: http://www.pjwstk.edu.pl/?strona=1604, retrieved July 2011

Current website HTML code:
<p><a href="http://www.pjwstk.edu.pl/?">Strona główna</a> >> <a href="http://www.pjwstk.edu.pl/?kat=189">Rekrutacja</a><h1>Opłaty</h1><p></p><p>
<p>Kolegium Rektorskie PJWSTK wprowadziło obniżkę czesnego dla osób wpłacających czesne w jednej racie rocznej (w wysokości 7%) lub w dwóch ratach semestralnych (w wysokości 3%).</p>
<p> </p>

Shifting to XHTML 1.0 Transitional code:
<div id="navigation">
<a href="http://www.pjwstk.edu.pl" title="strona główna">Strona główna</a> >>
<a href="http://www.pjwstk.edu.pl/rekrutacja" title="rekrutacja">Rekrutacja</a>
<h1>Opłaty</h1>
</div>
<p class="text">Kolegium Rektorskie PJWSTK wprowadziło obniżkę czesnego dla osób wpłacających czesne w jednej racie rocznej (w wysokości 7%) lub w dwóch ratach semestralnych (w wysokości 3%).</p>

Change explanation:
- The navigation links miss their TITLE attributes; in this case it is most appropriate to fill them in with the keyword „opłaty”, as the whole subpage is devoted to payments; this would help crawlers index the subpage.
- All new lines in the current HTML code are generated with the "<p>" tag, which is not correct, as it produces additional useless HTML code that has to be downloaded each time from the PJWSTK server. It is always advised to format textual content within CSS style sheet files; CSS formatting makes the HTML code more legible and helps avoid overloading the server. For example, the following "#navigation" declaration removes the "<p>"s from the header, while the ".text" class removes the empty "<p> </p>"s from the XHTML code: #navigation {margin:10px 0;width:100%} .text {margin-bottom:20px}
- The crucial keyword „opłaty” for this subpage has been enclosed in the H1 tag, which is correct; search engine crawlers look for the most important keywords within the H1 and H2 tags.
- The navigation link leading to the recruitment subpage should include the keyword „rekrutacja” in its textual content, which is currently missing; an example of such a link would be http://www.pjwstk.edu.pl/rekrutacja/
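Putting the navigation suggestions together, the breadcrumb block and its CSS could be combined more or less as follows; this is an illustrative sketch only, with the id, class names and URLs taken from the proposal above and the spacing values assumed:

<div id="navigation">
<a href="http://www.pjwstk.edu.pl/" title="strona główna opłaty">Strona główna</a> &gt;&gt;
<a href="http://www.pjwstk.edu.pl/rekrutacja" title="rekrutacja opłaty">Rekrutacja</a>
<h1>Opłaty</h1>
</div>
<p class="text">Kolegium Rektorskie PJWSTK wprowadziło obniżkę czesnego …</p>

#navigation {margin:10px 0; width:100%}
#navigation h1 {margin:5px 0 0 0} /* assumed spacing of the heading */
.text {margin-bottom:20px}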
SUBPAGE „OPŁATY” – LEFT SECTION „REKRUTACJA”
HTML source: http://www.pjwstk.edu.pl/?strona=1604, retrieved July 2011

Current website HTML code:
<div class="r3"><ul>
<li><a href="http://www.pjwstk.edu.pl/?strona=1604">Opłaty</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=1606">Kursy przygotowawcze i maturalne</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=2096">Transfer z innych uczelni</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=1605">Szczegółowe zasady rekrutacji na kierunki artystyczne</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=1603">Rejestracja on-line</a></li>
<li><a href="http://www.pjwstk.edu.pl/?kat=223">Zasady rekrutacji</a></li>
</ul></div>

Shifting to XHTML 1.0 Transitional code:
<div class="r3">
<ul>
<li><a href="http://www.pjwstk.edu.pl/oplaty" title="opłaty">Opłaty</a></li>
<li><a href="http://www.pjwstk.edu.pl/kursy-przygotowawcze-i-maturalne" title="kursy przygotowawcze i maturalne">Kursy przygotowawcze i maturalne</a></li>
<li><a href="http://www.pjwstk.edu.pl/transfer-z-innych-uczelni" title="transfer z innych uczelni">Transfer z innych uczelni</a></li>
<li><a href="http://www.pjwstk.edu.pl/szczegolowe-zasady-rekrutacji-na-kierunki-artystyczne" title="Szczegółowe zasady rekrutacji na kierunki artystyczne">Szczegółowe zasady rekrutacji na kierunki artystyczne</a></li>
<li><a href="http://www.pjwstk.edu.pl/rejestracja" title="rejestracja">Rejestracja on-line</a></li>
<li><a href="http://www.pjwstk.edu.pl/zasady-rekrutacji" title="zasady rekrutacji">Zasady rekrutacji</a></li>
</ul>
</div>

Change explanation:
- The section links are listed with <ul>/<li>, which is correct.
- All links miss keywords that would help crawlers index the subpages for these words; SEO friendly links have been proposed in the second column.
- All links miss TITLE attributes; these ought to be filled in with appropriate keywords, which has also been proposed.
- After clicking one of the links, the opened subpage should include textual content connected with the keywords incorporated in the links and TITLE attributes; since search engine crawlers verify subpage textual content against the keywords used, this would increase the subpage page rank significantly.
- When composing SEO friendly link names, it is worth considering first what search phrase users would type in to find the information the link leads to; knowing the most appropriate keywords or key phrase, it is worth inserting them into the SEO link.
SUBPAGE „OPŁATY” – LEFT COLUMN IMAGE LINKS
HTML source: http://www.pjwstk.edu.pl/?strona=1604, retrieved July 2011

Current website HTML code:
<div style="margin-top:0px;" class="r1">
<img alt="" src="oplaty_pliki/strzalka.gif" style="">
<a href="http://www.pjwstk.edu.pl/?kat=187"><img style="margin: 0px;" src="oplaty_pliki/uczelnia_1.gif" onmouseover="src='i/uczelnia_2.gif'" onmouseout="src='i/uczelnia_1.gif'" alt="" border="0"></a></div>
<div class="r2"><img alt="" src="oplaty_pliki/linia_216px.gif"><p></p></div>
…
<p>
<a href="http://samorzad.pjwstk.edu.pl/"><img alt="samorzad" src="oplaty_pliki/samorzad.gif" border="0"></a><br>
<a href="http://www.biurokarier.pjwstk.edu.pl/"><img alt="biurokarier" src="oplaty_pliki/biuro.gif"></a><br>
</p>

Shifting to XHTML 1.0 Transitional code:
<div class="r1">
<a href="http://www.pjwstk.edu.pl/oplaty" class="main" title="opłaty PJWSTK"></a>
…
<a href="http://www.samorzad.pjwstk.edu.pl/" title="Samorząd" class="link">Samorząd</a>
<a href="http://www.biurokarier.pjwstk.edu.pl/" title="Biuro Karier i Praktyk" class="link">Biuro Karier i Praktyk</a>
…
</div>

Change explanation:
- SEO friendly links that include appropriate keywords have been proposed, which will help search engine crawlers index the subpage.
- It is possible to significantly abbreviate the current HTML code by moving the text formatting to CSS style sheet files; changing the links' background images can also be done with CSS styles. Such an operation removes all the useless HTML code (seen in the current code above) that now has to be downloaded from the server each time. For the proposed XHTML code, the CSS formatting would look more or less like the following:
.r1 a.main {display:block;width:100%;height:30px;background:url(i/mainlink.gif) no-repeat left bottom;text-decoration:none}
.r1 a.link {color:gray;text-decoration:none;font-size:14px;font-weight:bold;width:100px;padding-left:30px;background:url(i/link.gif) no-repeat left bottom}
.r1 a.link:hover {color:red;text-decoration:none;font-size:14px;font-weight:bold;width:100px;padding-left:30px;background:url(i/link.gif) no-repeat left bottom}
SUBPAGE „STUDIA I STOPNIA - INFORMATYKA” HTML METATAGS AND CONTENT OVERVIEW
HTML source: http://www.pjwstk.edu.pl/?kat=209, retrieved July 2011

Current website HTML code:
<title>PJWSTK. Polsko-Japońska Wyższa Szkoła Technik Komputerowych.</title>
<meta name="Description" content="Najlepsza niepubliczna wyższa szkoła informatyczna w Polsce">
<meta name="Keywords" content="informatyka, japonistyka, kultura japonii, programowanie, grafika, sztuka, magisterskie, inżynierskie, zaoczne, uczelnia, Informatyka po ekonomii, Informatyka po socjologii, Informatyka po marketingu, Informatyka po zarządzaniu, Informatyka po psychologii, Informatyka po kulturoznawstwie, Informatyka po dziennikarstwie, Informatyka po kierunkach humanistycznych">
<meta http-equiv="Generator" content="TigerII MiniPad (C)2001">

Shifting to XHTML 1.0 Transitional code:
<title>Informatyka - PJWSTK. Polsko-Japońska Wyższa Szkoła Technik Komputerowych.</title>
<meta name="Description" content="Informatyka. Studia I-go stopnia na PJWSTK. Najlepsza niepubliczna wyższa szkoła informatyczna w Polsce">
<meta name="Keywords" content="informatyka studia I-ego stopnia, informatyka studia pierwszego stopnia, informatyka pierwszego stopnia, japonistyka, kultura japonii, programowanie, grafika, sztuka, inżynierskie, zaoczne, uczelnia, informatyka po ekonomii, Informatyka po socjologii, Informatyka po marketingu, Informatyka po zarządzaniu, Informatyka po psychologii, Informatyka po kulturoznawstwie, Informatyka po dziennikarstwie, Informatyka po kierunkach humanistycznych">
<meta http-equiv="Generator" content="TigerII MiniPad (C)2001">

Change explanation:
- The keywords „informatyka” and „I-go stopnia” are the most crucial for this section, but they are missing from the subpage metatags (TITLE, KEYWORDS, DESCRIPTION).
- It is advised to brainstorm and list appropriate keywords or key phrases for each PJWSTK subpage and include them in the section metatags. Crawlers try to associate the metatag information with the website textual content and store it all together in the search engines' databases; when indexing, they examine whether this information fits together in order to assign a page rank value to the association.
- The currently used keywords are not really adequate for the „informatyka - I-go stopnia” specialization; some of them should not appear here at all, e.g. „magisterskie” is more appropriate for the „informatyka - II-go stopnia” specialization and ought to be indexed there.
- The textual content of this subpage consists mostly of links (the middle column of the page provides only a set of links), which is not correct; apart from them there is no textual content suitable for this section, i.e. a textual description without links that would mix, in various forms, keywords appropriate for this specialization.

SUBPAGE „STUDIA I STOPNIA - INFORMATYKA” - NAVIGATION
HTML source: http://www.pjwstk.edu.pl/?kat=209, retrieved July 2011

Current website HTML code:
<a href="http://www.pjwstk.edu.pl/?">Strona główna</a> >> <a href="http://www.pjwstk.edu.pl/?kat=206">Studia I-go stopnia</a> >> <a href="http://www.pjwstk.edu.pl/?kat=209">Informatyka</a><h1>Informatyka</h1>

Shifting to XHTML 1.0 Transitional code:
<a href="http://www.pjwstk.edu.pl/" title="strona główna">Strona główna</a> >>
<a href="http://www.pjwstk.edu.pl/studia-i-stopnia" title="Studia I-go stopnia">Studia I-go stopnia</a> >>
<h1>Informatyka</h1>

Change explanation:
- The links miss their TITLE attributes, which ought to include keywords appropriate for this section.
- The section name „Informatyka” has been inserted into the H1 tag, which is correct.
- The links are not SEO friendly because they are not composed of any keywords appropriate for this section; the information „kat=206” gives crawlers no direction on how to index this part of the HTML.
SUBPAGE „STUDIA I STOPNIA - INFORMATYKA” – MIDDLE BOX „INFORMATYKA”
HTML source: http://www.pjwstk.edu.pl/?kat=209, retrieved July 2011

Current website HTML code:
<li><a href="http://www.pjwstk.edu.pl/?strona=1673">Specjalizacje</a></li>
…
<li><a href="http://www.pjwstk.edu.pl/?strona=1675">Program nauczania - studia dzienne</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=1677">Studia Otwarte (internetowe)</a></li>

Shifting to XHTML 1.0 Transitional code:
<li><a href="http://www.pjwstk.edu.pl/informatyka/i-go-stopnia/specializacje" title="informatyka I-go stopnia - specializacje">Specjalizacje</a></li>
<li><a href="http://www.pjwstk.edu.pl/informatyka/i-go-stopnia/studia-dzienne/program-nauczania" title="Informatyka I-go stopnia studia dzienne program nauczania">Program nauczania - studia dzienne</a></li>
<li><a href="http://www.pjwstk.edu.pl/informatyka/i-go-stopnia/studia-internetowe" title="Informatyka I stopnia - studia internetowe">Studia Otwarte (internetowe)</a></li>

Change explanation:
- As this box concerns computer science studies of the first degree, it is better to convey this information within the SEO links for the crawlers; that is why the proposed links have been reorganized structurally and logically. The statement „?strona=167” says nothing valuable to search engines in terms of positioning and SEO strategies, while these are exactly the places in the XHTML code where one can communicate with the crawlers, and they should not be left empty.
- The whole concept of how information is organized in pages and subpages should be considered before the SEO links are composed, as the words „informatyka” and „I-go stopnia” could change positions; this depends on the logic and structure that the whole PJWSTK website follows.
SUBPAGE „STUDIA I STOPNIA - INFORMATYKA” – LEFT BOX „INFORMATYKA”
HTML source: http://www.pjwstk.edu.pl/?kat=209, retrieved July 2011

Current website HTML code - the following code has been unnecessarily doubled in the middle column of the page:
<li><a href="http://www.pjwstk.edu.pl/?strona=1673">Specjalizacje</a></li>
…
<li><a href="http://www.pjwstk.edu.pl/?strona=1675">Program nauczania - studia dzienne</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=1677">Studia Otwarte (internetowe)</a></li>

Shifting to XHTML 1.0 Transitional code - the corresponding changes have already been proposed in the table above:
<li><a href="http://www.pjwstk.edu.pl/informatyka/i-go-stopnia/specializacje" title="informatyka I-go stopnia - specializacje">Specjalizacje</a></li>
<li><a href="http://www.pjwstk.edu.pl/informatyka/i-go-stopnia/studia-dzienne/program-nauczania" title="Informatyka I-go stopnia studia dzienne program nauczania">Program nauczania - studia dzienne</a></li>

Change explanation:
- This set of links should have been placed only once in the subpage, because such a repetition can be regarded by crawlers as a farm of links.
- Rather than doubling the set of links, it is better to leave them in the left column and remove them from the middle column; the middle column could then contain textual information (mixed with appropriate keywords) for the links opened from the left column. Thus the whole subpage would gather both links and textual content, which would be more appropriate.

SUBPAGE „STUDIA I STOPNIA - INFORMATYKA” – LEFT COLUMN IMAGES LINKS
HTML source: http://www.pjwstk.edu.pl/?kat=209, retrieved July 2011

Current website HTML code:
<div style="margin-top:9px;" class="r1">
<img alt="" src="studia_i_stopnia_informatyka_pliki/strzalka.gif" style="">
<a href="http://www.pjwstk.edu.pl/?kat=206"><img style="margin: 0px;" src="studia_i_stopnia_informatyka_pliki/studia_I_st_1.gif" onmouseover="src='i/studia_I_st_2.gif'" onmouseout="src='i/studia_I_st_1.gif'" alt="" border="0"></a>
</div>

Shifting to XHTML 1.0 Transitional code:
<div class="r1">
<ul>
<li><a href="http://www.pjwstk.edu.pl/studia-i-go-stopnia/informatyka" title="Studia I-go stopnia Informatyka">Studia I-go stopnia Informatyka</a></li>
</ul>
</div>

Change explanation:
- By placing all the text formatting within CSS files, it is possible to significantly abbreviate the HTML code.
- Image content cannot be accessed by crawlers, so there is no point in making image links; they should be replaced with textual links.
- When images are used, their ALT attribute has to be added and filled with adequate keywords, as crawlers verify link TITLEs and image ALTs to gather indexation information.
- A different link structure has been proposed, so that the links become SEO friendly.
- The links have been inserted into a <ul>/<li> list.
SUBPAGE „ZASADY REKRUTACJI” HTML METATAGS AND CONTENT OVERVIEW
HTML source: http://www.pjwstk.edu.pl/?kat=223, retrieved July 2011

Current website HTML code:
<title>PJWSTK. Polsko-Japońska Wyższa Szkoła Technik Komputerowych.</title>
<meta name="Description" content="Najlepsza niepubliczna wyższa szkoła informatyczna w Polsce">
<meta name="Keywords" content="informatyka, japonistyka, kultura japonii, programowanie, grafika, sztuka, magisterskie, inżynierskie, zaoczne, uczelnia, Informatyka po ekonomii, Informatyka po socjologii, Informatyka po marketingu, Informatyka po zarządzaniu, Informatyka po psychologii, Informatyka po kulturoznawstwie, Informatyka po dziennikarstwie, Informatyka po kierunkach humanistycznych">

Shifting to XHTML 1.0 Transitional code:
<title>Rekrutacja PJWSTK. Polsko-Japońska Wyższa Szkoła Technik Komputerowych</title>
<meta name="Description" content="Zasady rekrutacji PJWSTK. Informatyka na studiach I i II stopnia. Rekrutacja na kierunki Sztuka Nowych Mediów, kultura Japonii oraz studia podyplomowe" />
<meta name="Keywords" content="rekrutacja informatyka, rekrutacja japonistyka, rekrutacja kultura japonii, programowanie, grafika, sztuka, rekrutacja magisterskie, rekrutacja inżynierskie, rekrutacja PJWSTK zaoczne, rekrutacja studia podyplomowe" />

Change explanation:
- This subpage is devoted to recruitment, but no such information is mentioned in the metatags; that is why different textual content for the metatags has been proposed.
- There are too many keywords listed in the KEYWORDS metatag in the current HTML code, and they do not strictly identify the subpage with recruitment.
- Similarly to the „Studia I-go stopnia - informatyka” subpage, this subpage includes only sets of links and no passages of text; it would be advisable to combine links with text here.

SUBPAGE „ZASADY REKRUTACJI” - NAVIGATION
HTML source: http://www.pjwstk.edu.pl/?kat=223, retrieved July 2011

Current website HTML code:
<a href="http://www.pjwstk.edu.pl/?">Strona główna</a> >> <a href="http://www.pjwstk.edu.pl/?kat=189">Rekrutacja</a> >> <a href="http://www.pjwstk.edu.pl/?kat=223">Zasady rekrutacji</a><h1>Zasady rekrutacji</h1>

Shifting to XHTML 1.0 Transitional code:
<a href="http://www.pjwstk.edu.pl/" title="strona główna">Strona główna</a> >>
<a href="http://www.pjwstk.edu.pl/rekrutacja" title="Rekrutacja">Rekrutacja</a> >>
<h1>Zasady rekrutacji</h1>

Change explanation:
- The links miss their TITLE attributes, which ought to include keywords appropriate for the sections they lead to.
- The section name „Zasady rekrutacji” has been inserted into the H1 tag, which is correct.
- The current links are not SEO friendly because they do not carry any keywords appropriate for this section; the information „kat=189” provides nothing for search engine indexation.

SUBPAGE „ZASADY REKRUTACJI” – MIDDLE COLUMN „ZASADY REKRUTACJI”
HTML source: http://www.pjwstk.edu.pl/?kat=223, retrieved July 2011

Current website HTML code:
<div class="r3"><ul>
<li><a href="http://www.pjwstk.edu.pl/?strona=1604">Opłaty</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=1606">Kursy przygotowawcze i maturalne</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=2096">Transfer z innych uczelni</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=1603">Rejestracja on-line</a></li>
</ul></div>

Shifting to XHTML 1.0 Transitional code:
<div class="r3"><ul>
<li><a href="http://www.pjwstk.edu.pl/zasady-rekrutacji/oplaty" title="opłaty">Opłaty</a></li>
<li><a href="http://www.pjwstk.edu.pl/zasady-rekrutacji/kursy-przygotowawcze-i-maturalne" title="kursy przygotowawcze i maturalne">Kursy przygotowawcze i maturalne</a></li>
<li><a href="http://www.pjwstk.edu.pl/zasady-rekrutacji/transfer-z-innych-uczelni" title="transfer z innych uczelni">Transfer z innych uczelni</a></li>
…
<li><a href="http://www.pjwstk.edu.pl/zasady-rekrutacji/rejestracja-online">Rejestracja on-line</a></li>
</ul></div>

Change explanation:
- TITLE attributes have been added to the links.
- The links have become SEO friendly and contain keywords, which increases the page rank value of this subpage and indicates to search engines what the subpage refers to.
- The current link structure („?strona=160…”) does not include any clue data for crawlers, so they have to guess from the surrounding textual content which section the subpage concerns; the obstacle is that, apart from the sets of links, there is no other textual content on this subpage, which is not correct.
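To close the audit, the individual recommendations can be brought together in one schematic subpage skeleton. The fragment below is an illustration written for this summary only: the URL, file names, metatag values and the sample sentence are hypothetical and do not come from the PJWSTK website. It merely shows how a keyword-bearing URL, subpage-specific metatags, a breadcrumb with TITLE attributes, an H1 heading, and a mixture of plain text and a <ul>/<li> link list could work together.

<!-- assumed address of the subpage: http://www.pjwstk.edu.pl/oplaty -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="pl" lang="pl">
<head>
<title>Opłaty - PJWSTK. Polsko-Japońska Wyższa Szkoła Technik Komputerowych</title>
<meta name="description" content="Opłaty za studia w PJWSTK" /> <!-- placeholder wording -->
<meta name="keywords" content="opłaty, czesne, studia, PJWSTK" /> <!-- short, subpage-specific list -->
<meta name="robots" content="index, follow" />
<link rel="stylesheet" type="text/css" href="css/main.css" />
</head>
<body>
<div id="navigation">
<a href="http://www.pjwstk.edu.pl/" title="strona główna">Strona główna</a> &gt;&gt;
<a href="http://www.pjwstk.edu.pl/rekrutacja" title="rekrutacja">Rekrutacja</a>
<h1>Opłaty</h1>
</div>
<p class="text">Krótki akapit opisujący opłaty, zawierający słowa kluczowe.</p> <!-- plain text mixed with links -->
<ul>
<li><a href="http://www.pjwstk.edu.pl/zasady-rekrutacji" title="zasady rekrutacji">Zasady rekrutacji</a></li>
<li><a href="http://www.pjwstk.edu.pl/rejestracja" title="rejestracja on-line">Rejestracja on-line</a></li>
</ul>
</body>
</html>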
3. Conclusion

While conducting the examination of the PJWSTK website, I investigated selected parts of its HTML code against current SEO optimization standards and techniques. Having considered the crucial information that crawlers search for, I checked the key places in the HTML code where such information was expected to be found. Because current standards of designing websites are structurally based on XHTML, all the modifications suggested in my analysis follow these W3C directions.

The aim of the examination was to verify whether the current PJWSTK HTML code has been prepared in a way that generates a high page rank value, thus pushing the website up in the results lists of the most common search engines. After examining the code of the homepage and three chosen subpages, I have unfortunately come to the conclusion that it was built in a way that neither supports search engine indexation nor follows the international W3C standards. Many changes had to be proposed for the current version of the website; therefore, when working on a new version of the website, the following major SEO optimization strategies and suggestions should be implemented:

1. The outdated HTML code should be replaced with XHTML. The reasons for this shift have already been elaborated in the previous chapters of this diploma work.

2. When browsing the website, the content of the TITLE, KEYWORDS and DESCRIPTION metatags does not change at all, or changes only marginally, which neither supports indexation nor generates high page rank values. Each subpage contains different textual content and different sets of links, but this is not reflected in the metatags. There are many subpages whose textual content varies while the keyword list remains unchanged, and in many cases the chosen keywords do not suit the subpage textual content. From the crawlers' point of view this means a low page rank value.

3. The structure of the website links and images has to be repaired, as their key attributes are missing or left without a value. In almost all cases the links miss their TITLE attributes; these should be filled in with appropriate keywords, adequate to the subpage textual content, so that the whole content is cohesive. The same goes for the images' ALT attributes, which should be filled in with appropriate subpage keywords. It is commonly known that checking and analysing the contents of TITLE and ALT attributes is part of the search engine crawlers' algorithms.

4. The structure of the links is not SEO friendly and does not support indexation at all. In all cases (except for subdomains, which naturally contain keywords in their prefixes) a link includes database information such as "?strona=123", where the number 123 is merely the row ID in the website's internal database. This means nothing to crawlers, because it does not include any keywords. If a subpage is devoted to recruitment or payments, its whole URL should include such a clue keyword, but it does not.

5. Many links, especially in the left column of the subpages, are image links, but they should be textual ones. The keywords embedded in images are legible to us as human beings, but image content cannot be accessed by robots, so such keywords cannot be recognized by crawlers. The textual content of links is another place investigated by crawlers and, with that fact in mind, it should be prepared and implemented reasonably.
6. Almost all subpages consist only of sets of links and do not contain any other textual content. This can be regarded as a link farm, and the whole subpage can be classified as a kind of spamming. For SEO positioning and for high page rank values, it always pays off to implement links together with plain textual content and to mix them with each other.

7. Almost all text formatting is done with the help of additional HTML elements, which in the end produces much unnecessary HTML code. The whole website code weighs too much and is overloaded; it has to be downloaded each time from the PJWSTK server, which wastes the server's transfer and power. All text formatting should be done with CSS style sheet files, which is one of the W3C standards and SEO optimization methods. CSS declarations purify the HTML code of needless tags and allow robots to concentrate on the pure textual content and the keywords incorporated in it.