The W3C standard of composing websites. SEO optimization
Computer Science
Databases
Przemysław Godlewski
Student ID No. s7785
The W3C standard of composing websites. SEO
optimization and positioning websites within the most
common search engines. Examination of the PJWSTK
website for positioning in the search engines.
Master's thesis written
under the supervision of
Prof. Lech Banachowski
Warsaw, July 2011
WPROWADZENIE
The aim of this master's thesis is to present the latest standards of designing websites (W3C). The issue of SEO optimization will be discussed in depth, as well as the techniques and methods of effectively positioning websites in the most popular web search engines (Google, Yahoo!, MSN).
The first chapter surveys ways of building websites based on the current standards (W3C, the DOM model, XHTML Transitional, AJAX, CSS, DOCTYPE document types). I will concentrate on the differences between the most popular search engines, on the outdated HTML technology (and the drawbacks of using it) and on the XHTML technology used today. With the future of currently created websites in mind, I will present the benefits of designing websites according to the W3C model.
The second chapter is devoted to the SEO optimization of websites, whose main goal is a high position in search rankings. It discusses organic SEO, ethical and unethical SEO techniques and strategies, SEO spamming, and the estimation of websites' PageRank by the crawlers of the most popular search engines (Google, Yahoo!, MSN). I will also pay attention to the skilful selection and density of keywords and to embedding them correctly in the textual content of a website and in its XHTML structure. Internal and external links of a website will be another important topic. The chapter also describes positioning websites in the most popular search engines (Google, Yahoo!, MSN), that is: the rules of website accessibility, the notion of searchability, the selection and importance of meta tags and keywords, the structure and the way of writing XHTML code that supports organic positioning, server-side positioning strategies, and practical positioning hints.
In the third chapter an audit of the website http://www.pjwstk.edu.pl is carried out. The aim of the examination is to verify the page code and its structure with regard to the W3C technologies and organic SEO positioning in the most popular web search engines, especially Google.
INTRODUCTION
The purpose of my dissertation is to present the newest W3C standards of composing
XHTML websites. I am going to focus deeply on website SEO optimization (Search Engine
Optimization), as well as on SEO techniques and methods to effectively position websites
within the most available search engines (Google, Yahoo!, and MSN.)
The first chapter is devoted to the way of composing websites that are based on the
most up-to-date W3C standards (including DOM model, XHTML Transitional, AJAX, CSS,
and DOCTYPE.) I will present web search engines and pay attention to the differences between the most available ones. I will compare the outdated HTML technology (and its disadvantages) with the currently used XHTML technology. Regarding the future usability of current websites, I am going to emphasize the advantages of the W3C model.
In the second chapter I am going to deal with the SEO optimization of websites, whose aim is to achieve high positions in search engine results lists. I will focus on organic SEO, ethical and unethical SEO strategies and techniques, SEO spamming and the estimation of websites' PageRank values by the most common search engines' crawlers (Google, Yahoo!, and MSN.) I will look closely at the role of keywords and their density, as well as at incorporating them into the textual content of websites. I will also discuss the role of incoming and outgoing links in terms of SEO optimization. Another issue will be positioning a website within the most common search engines (Google, Yahoo!, MSN), that is: the rules and conditions of website availability, the composition of meta tags and their importance, the structure and the way of composing XHTML code that supports positioning, server-side positioning strategies, as well as some practical and general hints for positioning a website organically.
The last chapter concerns an examination of the current website of the Polish-Japanese Institute of Information Technology in terms of positioning the website organically within the most common search engines. I will verify the structure of the HTML website code against the W3C standards and SEO strategies. I am also going to propose some suggestions for improving the semantic correctness of the website code (if required.)
TABLE OF CONTENTS
WPROWADZENIE ....................................................................................................................... 2
INTRODUCTION ......................................................................................................................... 3
SEARCHING ENGINES AND WEB BROWSERS. XHTML AND W3C STANDARDS OF DESIGNING
WEBSITES ................................................................................................................................... 6
1. INTRODUCTION .................................................................................................................... 6
2. HISTORY AND FOUNDATIONS OF THE HTML DOCUMENTS............................................................. 7
3. WEB SEARCH ENGINES ........................................................................................................... 8
3.1 Web crawlers ............................................................................................................ 9
3.2 Search engines’ algorithms ....................................................................................... 9
3.3 Loading data and creating rankings........................................................................ 10
3.4 Classification of searching engines ......................................................................... 12
4. WEB BROWSERS. ................................................................................................................ 13
5. LIMITATIONS AND DISADVANTAGES OF NON-STANDARDIZED HTML.............................................. 13
5.1 Backward compatibility ........................................... 14
5.2 Forward compatibility ............................................ 14
6. Advantages of web standards ...................................................................................... 15
7. XHTML, CSS AND W3C DOM MODEL AS ELEMENTS OF WEB STANDARDS .................................. 16
7.1 Definition of XHTML ............................................................................................... 16
7.2 Semantics of the XHTML code ................................................................................ 17
7.3 Advantages of XHTML............................................................................................. 17
7.4 W3C. Model DOM ................................................................................................... 18
7.5 CSS – Cascading Style Sheets .................................................................................. 19
8. COMPOSING XHTML WEBSITES ............................................................................................ 20
8.1 DOCTYPE and namespace declaration. Character encoding ......................... 20
8.2 Formatting tags and attributes in XHTML 1.0 Transitional. Conversion to
XHTML............................................................................................................................... 22
9. STANDARDS MODE AND QUIRKS MODE IN WEB BROWSERS ........................................................ 23
10. CONCLUSION ..................................................................................................................... 25
WEBSITES SEO OPTIMIZATION. ETHICAL AND UNETHICAL SEO STRATEGIES AND
TECHNIQUES. PAGERANK VERSUS SEO SPAMMING. .............................................................. 27
1. INTRODUCTION .................................................................................................................. 27
2. DEFINING SEO................................................................................................................... 27
2.1 SEO purposes .......................................................................................................... 29
2.2 SEO optimization plan ............................................................................................ 29
2.3 What is organic SEO? .............................................................................................. 31
2.3.1 Organic elements of websites ............................................................................. 31
2.3.2 Benefits of organic optimization ......................................................................... 32
3. SEO STRATEGIES AND TECHNIQUES ........................................................................................ 33
3.1 Hosting, domains, navigation and other chosen elements of websites friendly
to SEO ............................................................................................................................... 33
3.1.1 Hosting................................................................................................................. 33
3.1.2 Domain name ...................................................................................................... 33
3.1.3 Navigation ........................................................................................................... 34
3.1.4 Sitemap................................................................................................................ 34
3.1.5 TITLE tag .............................................................................................................. 34
3.1.6 HTML headings .................................................................................................... 34
3.1.7 Javascript and Flash ............................................................................................. 35
3.1.8 URL links in SEO strategies. URL canonicalization. Dynamically generated
URL addresses ............................................................................................................... 36
3.2 Keywords and keywords prominence. Distinguishing keywords ........................... 37
3.2.1 Heuristic searching .............................................................................................. 38
3.2.2 Keywords used in website links........................................................................... 39
3.2.3 Appropriate keywords and keyword phrases ..................................................... 39
3.2.4 Keyword density versus overloading with keywords .......................................... 39
3.3 Website incoming and outgoing links. Linkages ..................................................... 40
3.3.1 Incoming links ...................................................................................................... 41
3.3.2 Outgoing links ...................................................................................................... 41
4. SPAMMING SEO. SPAMMING TECHNIQUES ............................................................................. 43
5. SEO OPTIMIZATION FOR THE MOST COMMON SEARCH ENGINES: GOOGLE, MSN, YAHOO! ............... 46
5.1 Google PageRank and SEO optimization for Google .............................................. 46
5.2 Websites optimization for MSN ............................................................................. 47
5.3 Websites optimization for Yahoo! .......................................................................... 48
5.4 Page rank fluctuations ............................................................................................ 48
6. CONCLUSION ..................................................................................................................... 49
EXAMINATION OF THE PJWSTK WEBSITE HTML CODE. SHIFTING TO XHTML 1.0
TRANSITIONAL STANDARDS. ................................................................................................... 51
1. INTRODUCTION .................................................................................................................. 51
2. EXAMINATION OF THE PJWSTK HTML WEBSITE CODE ................................................................ 52
3. CONCLUSION ..................................................................................................................... 69
REFERENCES ............................................................................................................................ 71
Searching engines and web browsers. XHTML
and W3C standards of designing websites
1. Introduction
Almost everyone has used the Internet to look up some specific topic, as the Internet has become the most accessible source of information nowadays. Can we imagine living without the Internet now? Probably not. We are familiar with the most popular and available search engines (for example with the Google search engine), as almost everyone has it at hand in their web browser. Our common knowledge about search engines very often comes down to knowing how to open Google and how to make it search for some keyword query. After searching for a topic, we are provided with paged lists of websites related to it.
There are so many websites on the Internet covering the same or similar topics nowadays that a battle has developed among website owners (known as website positioning) to get the highest positions in the results list returned to a user in a web browser. The higher the position a website gets (the best rank being within the first page), the more visits it consequently receives. The problem of reaching higher positions is therefore more and more important and significant. It has been estimated that "it's important to be in the top 3 pages of a search result because most people using search engines don't go past the 3rd page" (http://www.marketleap.com/verify/.)
Nowadays there are many companies that deal solely with positioning websites within the most common search engines. There are two major ways of positioning a website within these engines. First, a website positions itself by its XHTML structure (this is known as organic positioning and is explained in detail later on) and by the keywords spread around the site content. The technology, structure, as well as the textual and graphical content of the website significantly affect search engine rankings and result in a higher or lower position of the website in the results list. In case the searched keywords are very common, or too many websites include them, creating semantically correct XHTML code is not enough (though it remains important and relevant). The website then has to be positioned artificially, and this is the other way to push it up in the results list. Artificial positioning is not strictly dependent on the website content, structure and the technology it is based on. What we do know is that search engines are more and more sensitive to artificial positioning, and once it has been recognized by a search engine, such a website falls drastically within the results list (if it does not disappear from the list entirely for a longer time.) I am not going to focus on artificial positioning in my dissertation.
In order to get the "know-how" of making a website appear higher and higher in the search engine results lists, we first have to understand how search engines work and how they process the gathered information to create their rankings. This knowledge is fundamental to understanding how the XHTML website code should be composed and how it affects these engines. This is what the first chapter deals with. Initially, I am going to present search engines (and the phenomenon of crawlers), then I will go on to focus on the current standards and technology of composing websites – XHTML and the W3C model. Further on, I am also going to point out the advantages of shifting from the old HTML technology to the currently used XHTML.
2. History and foundations of the HTML documents
The abbreviation HTML stands for Hypertext Markup Language, a markup language with the help of which we can create hypertext documents (http://www.w3.org/MarkUp/html-spec/.) These documents do not depend on the platform they are displayed on and have been in use on the World Wide Web (WWW) since 1990.
The HTML markup derives from SGML documents and was originally initiated by a physicist, Tim Berners-Lee (Zeldman 2007), who devised ENQUIRE – a prototype of a hypertext information system. In 1980 this system was used to make some research documents available to others (http://pl.wikipedia.org/wiki/HTML.) The revolutionary idea behind the language was that a user could follow references to browse information that was physically stored in remote places.
The first HTML specification, called "HTML Tags," was published by Tim Berners-Lee in 1991 and included 22 basic tags that were the foundation for building HTML documents. Thirteen of these tags are still in use by website programmers today.
In 1993 the IETF (Internet Engineering Task Force) published the first specification of an HTML document (proposed by Berners-Lee and Dan Connolly), called "Hypertext Markup Language Internet-Draft." It included a DTD (Document Type Definition) - a grammar description for HTML documents. Once the attributes of existing HTML tags started being compared against the DTD, the process of standardization of the HTML language commenced.
In 1993 the IETF formed an HTML Working Group, and in 1995 HTML 2.0 was published officially as a Request for Comments (RFC 1866.) It was regarded as the first complete and standardized specification of an HTML document, and it became the basis for future HTML implementations. An HTML 1.0 specification never existed; the number 2.0 was chosen to differentiate previous specification attempts from the new, complete one. Since 1996 the HTML specification has been developed and influenced by the World Wide Web Consortium (W3C), and in 2000 HTML became an international standard (ISO/IEC 15445:2000, based on HTML 4.01.)
The last specification of HTML dates from 1999 and is known as HTML 4.01. After that,
HTML has slowly shifted towards XHTML (Extensible Hypertext Markup Language) - the
most up-to-date standards of designing websites now.
3. Web search engines
A searching engine (abbreviated also to “engine” from now on) is a program that uses
applications to collect information about web sites (Ledford 2009.) This information
consists of keywords, phrases, URL inner links, the website code and other important
pieces of information that can describe more closely the contents of the website being browsed. After gathering all the required information, the engine indexes the data and stores it in its databases.
At the front-end of a web searching engine one can see a searching tool into which it
is possible to input keywords or phrases to be searched for. A searching engine uses
special algorithms to explore data stored in the database (each engine includes its own
large database) and tries to return results that are the most adequate to the given
keyword query. Thus, a user is provided with a list of paged sites and links retrieved from
the database of the engine.
3.1 Web crawlers
To define a web crawler we can say that it “is a computer program that browses the
World Wide Web in a methodical, automated manner or in an orderly fashion. Other
terms for Web crawlers are ants, automatic indexers, bots, Web spiders, Web robots”
(http://en.wikipedia.org/wiki/Web_crawler.) Web crawlers are responsible for the process of gathering and retrieving information from websites. They are like visitors that visit each website and collect detailed information about it. Ledford (2009) indicates that there are over 100 million websites nowadays and that around 1.5 million new websites appear every month. Given a phrase to be searched for, a search engine goes through all the references gathered by its web spiders.
As a matter of fact, such an engine consists of several parts working together. A more detailed specification of a search engine's structure is not publicly available. Users associate an engine with its interface, which is just one part of it; the rest is hidden behind that interface. The structure and the methods of storing the collected data inside the database are confidential and specific to each engine and its owner.
3.2 Search engines' algorithms
According to Ledford (2009), searching algorithms are the most important of all the parts that a search engine consists of. They are the foundation of each engine, on which the remaining parts are built. These algorithms determine the way materials are presented to users. Ledford also specifies that a searching algorithm is a problem-solving procedure that classifies the problem, estimates a number of possible answers and returns the solution. Initially, the algorithm accepts the keyword and browses its database of catalogued keywords and their URL addresses. After that it gathers the websites that contain the searched phrase, and these results are returned to the user.
Retrieved results strictly depend on the algorithm that has been used and there are
many kinds of algorithms. Each engine seems to cooperate with a different algorithm,
which results in the fact that the same keyword may return different lists of websites in
different engines. The most common kinds of algorithms that Ledford (2009) proposes
break down as follows:
1. Linear searching, which is based on separate keywords. Data is searched linearly, as if it were organized in a list. It retrieves separate elements from an engine's database, but going through billions of websites this way may take too much time.
2. Tree searching, which tries to find sets of information from the narrowest set to the widest one. These sets of information resemble trees; separate pieces of information may ramify or spread out into other data, similarly to websites that include inner links to other pieces of information. It is also based on a hierarchical structure of storing data. This hierarchy means that the process of searching goes from one point to another depending on data rankings stored in the database. This algorithm is very useful on the Internet.
3. SQL (Structured Query Language) searching, which allows retrieving information regardless of the subclass it belongs to, as the information is not organized hierarchically.
4. Informed searching (searching based on solid information), which finds sets of information organized in a tree structure. Contrary to its name, it does not always return the best choice from the database, as the algorithm relies on the general nature of answers. It is helpful for finding data in specific sets of information.
5. Adversarial (opposing) searching, which retrieves all possible solutions to a problem; on the web it tends to be impractical, as the number of returned solutions may be practically infinite.
6. Constraint satisfaction searching, which is probably the best algorithm to use when looking for a specific keyword or phrase on the Internet. The solution has to fit certain criteria, and sets of database information may be analyzed in different, non-linear ways.
These are only the most popular kinds of algorithms, and they may be used simultaneously. The key to maximizing search results is to understand how search engines work and what their requirements are.
3.3 Loading data and creating rankings
It is clear that the process of searching data by an engine involves the activity of the web crawlers, the engine's database and its built-in algorithm (Ledford 2009.) Regardless of the search engine, the main goal is to find a website on the Internet (Danowski 2007.) The key is to understand how the rankings of the searched results are created, as these rankings determine where a website is displayed to the user. SEO optimization (which will be elaborated on further) is, first of all, about working out how a specific search engine creates its data rankings.
Rankings play a significant role in SEO optimization, but in this chapter I would just like to outline the different types of criteria that search engines use when collecting information. Ledford (2009) lists the following types of criteria:
1. Localization, in terms of how keywords and phrases are spread out in the HTML document. Some engines check whether the searched keyword appears at the beginning of the HTML code or lower down, which influences the ranking of the site. The higher the phrase is placed in the code, the higher the ranking the site receives. The best option is to insert the keyword in the TITLE tag of the site (see the sample after this list), but this is described further on.
2. Frequency, in terms of how often the keyword is repeated in the HTML code. The more often the keyword appears in the HTML text, the better the ranking it gets, but nowadays engines can reliably recognize the phenomenon of keyword spamming. This occurrence means that hidden keywords are artificially repeated too many times in the HTML code, which results in lowering the ranking of the site. Moreover, some search engines ignore or even ban such sites, which may then never appear in the results list again.
3. Links. The ranking of a website strictly depends on the type and number of links related to it. Both the links that redirect a user to the site and the inner site links are significant. On the other hand, it does not work in such a way that the more links a site includes, the higher the ranking it receives. We just know that incoming links, inner website links and outgoing links influence the ranking, but the algorithm used to estimate the ranking value may differ and is hidden behind each engine.
4. Number of clicks, in terms of how often a given website is clicked compared with other sites within the rankings. Ledford emphasizes that some search engines cannot monitor the number of visits to each site, and that is why some of them simply store the number of clicks the site receives within the displayed results.
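To illustrate points 1 and 2 with a hedged example (the keyword phrase "database courses" and the page fragment are invented for this illustration, not taken from any real site), the phrase appears once in the TITLE tag, once in the main heading and naturally in the body text, without artificial repetition:
Sample
<title>Database courses - evening and weekend classes in Warsaw</title>
...
<h1>Database courses</h1>
<p>Our database courses cover SQL, data modelling and administration.</p>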
3.4 Classification of searching engines
Ledford (2009) groups search engines into primary (first-tier) ones, secondary (second-tier) ones and topical (niche) ones. The primary ones (on which I am going to focus the most) are Google, Yahoo! and MSN. This first group generates the most visits to websites, so they should be taken into consideration first when planning SEO optimization. The different results retrieved for a user come from the different algorithms behind each engine.
The Google search engine seems to be the king of all engines, partly because it provides users with precise search results. It was the accuracy of the results that made the engine so popular. To achieve this, the Google founders combined keyword matching with the popularity of links. Joining these two criteria together has resulted in producing more precise results.
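For reference, a simplified form of the link-popularity measure originally published by the Google founders, the PageRank formula, can be quoted here (PageRank is discussed further in the next chapter; the exact algorithm used by Google today is not public). For a page A linked to by pages T1, ..., Tn:
PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )
where d is a damping factor (commonly set to 0.85) and C(Ti) is the number of outgoing links on page Ti.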
Yahoo! is regarded as a search engine, which is true, but it is also an Internet directory. It indexes different websites and organizes them within categories and subcategories. Originally, Yahoo! was simply a list of the most popular websites gathered by its two founders.
The MSN search engine does not offer search capabilities as sophisticated as the two engines above. Ledford (2009) implies that MSN is not able to analyze the number of website clicks, but on the other hand it takes into consideration the contents of a website. The way to appear higher in the MSN results list is to prepare adequate meta tag data and to spread suitable keywords throughout the HTML text.
The second-tier engines are not as popular as the ones above and are addressed to a smaller audience. They do not generate as much traffic to a given website, but they are useful for regional and more restricted searching. These engines also differ in terms of how they compose rankings and analyze information. Some of them rely on keywords, others rely on mutual links, yet others rely on meta tags and some hidden criteria known only to the engine founders.
The topical search engines are the most specialized ones and are very often devoted to one field (e.g. medicine, sport) around which the search results revolve.
4. Web browsers.
We need a web browser to enter a specific keyword query and to receive the paged results list back. "A web browser or Internet browser is a software application for retrieving, presenting, and traversing information resources on the World Wide Web. An information resource is identified by a Uniform Resource Identifier (URI) and may be a web page, image, video, or other piece of content. Hyperlinks present in resources enable users to easily navigate their browsers to related resources" (http://en.wikipedia.org/wiki/Web_browser.)
As shown in Figure 1, different types of Internet browsers are used to display a web page (http://en.wikipedia.org/wiki/Web_browser.) Research conducted in January 2011 indicates that Internet Explorer (43.55%) is the leading web browser, the second most common browser is Mozilla Firefox (29.0%) and the third one is Google Chrome (13.89%.)
5. Limitations and disadvantages of non-standardized HTML
Website owners want their sites to cost less, work better and be available to larger
number of receivers, not only in nowadays web browsers but also in the future ones
(Ledford 2009.) Web browsers are liable to age (in opposition to search engines that are
inevitably developed and become better and better), which results in the fact that created
websites in the past are very often displayed incorrectly nowadays. We build websites in
order to re-build them in the future. Instead of adding additional features to an existing
website, a programmer again seems to concentrate on adjusting them to current web
requirements. Different users use different web browsers (Firefox, Internet Explorer,
Opera, and Safari) to download website content. The difference is significant in case a
website code is not standardized (is acceptable only by some web browsers) and very
13
often such an old website is simply illegible to visitors. If the site and its content are not
fully available to a user, it means that the site owner looses another client.
5.1 Backward compatibility
According to Ledford (2009), backward compatibility means using non-standardized or even restricted and impractical HTML tags or chunks of code, so that each user (regardless of which browser and version they use) can "experience the same" when browsing a website. Actually, it is only a partial solution, as the site accumulates more and more pieces of programming code meant to adjust it to different web circumstances, which consequently makes the site code less and less legible. This effect is known as "code forking." It is the shortest way to having to re-write the website code again in the future, because adding new functionality again has to take into consideration all available web browsers and their numerous versions. Following this path, it is never possible to satisfy all browser versions, and there is a need to find a cut-off point behind which the way a website behaves is ignored. A chosen older version of a web browser (e.g. IE6 or Netscape 1.1) can be treated as such a point.
Companies try to find different solutions to this hindrance. Some of them decide to cut into their budget either by supporting older and older browser versions (which requires paying for additional hours spent adjusting the website code) or by sticking to one chosen browser (thus turning away all visitors who do not use that browser.) For instance, because of such false assumptions, there are more and more sites working correctly only in Internet Explorer (sometimes even only on Windows.) Ledford estimates that around 15-25% of potential users or clients are lost by a company this way. There is also no point in sticking to the idea of designing a website for one browser's requirements only. It can never be assured that the chosen web browser will remain the leading one.
5.2 Forward compatibility
The aim is to break the process of website aging, so that websites behave consistently in various web circumstances. Forward compatibility means that any existing website (that was correctly designed and built) should cooperate with every browser, platform and other Internet device, with no need to re-design the site or write additional code to adjust it. Ledford stresses that such a site should still work even if new browsers appear or if existing ones are given new functionality. This is only possible by using web standards, which allow designing websites for all browsers and their versions with the same ease and comfort as if the site were designed only for a chosen one.
6. Advantages of web standards
“A Web standard is a general term for the formal standards and other technical specifications that define and describe aspects of the World Wide Web. In recent years, the term has been more frequently associated with the trend of endorsing a set of standardized best practices for building web sites, and a philosophy of web design and development that includes those methods” (http://en.wikipedia.org/wiki/Web_standards.)
Web standards (abbreviated also to "standards") play an important role in every medium nowadays (Ledford 2009) and have been accepted on the international web market. As website programming has already become standardized, there is nothing to do but learn the standards and use them in practice. For a company, standardized website code means saving money and time, as well as making the site contents available to all visitors regardless of the tools they use to download the website. Using standards makes web materials more portable, which is always advantageous, especially for website visitors, and it reduces the cost of labour. Standards are not just sets of rules; they remain consistent with the previous ways of creating websites. Ledford says they are like the "prolongation and continuation of previous techniques." Web standards have already been implemented in the newest web browsers and Internet devices and will certainly continue to work in the future along with the development of browsers and further standards. Building websites supported by standards lowers the costs of production and maintenance and makes a site available to a larger audience. Ledford emphasizes that standards "take in everybody and we are able to serve these users that still use older browsers" (backward compatibility.)
7. XHTML, CSS and W3C DOM model as elements of web standards
Ledford (2009) accepts the general division of web standards into three major elements. The first element is structure (XHTML, XML), the second one is presentation (CSS1, CSS2) and the last one is behavior (ECMAScript, the DOM model.)
The XHTML language (explained in detail further on) contains text data marked up in accordance with its structural and semantic meaning: the code includes titles, subtitles, tables, lists etc. XHTML that includes only allowed tags is fully portable.
The CSS presentation languages (Cascading Style Sheets) format the webpage and control the way a website is shown on the screen. They deal with typography, text organization and size, colors etc. Because presentation is separated from the HTML structure, a programmer can modify the way a website is presented to a visitor by changing a CSS file, without touching the HTML code.
The standardized object model (the W3C DOM, described below) remains consistent with CSS, XHTML and ECMAScript 262 (the standardized version of the JavaScript language.) It allows creating advanced functions and special effects that work consistently in all Internet browsers and platforms.
7.1 Definition of XHTML
“XHTML (eXtensible HyperText Markup Language) is a family of XML markup languages
that mirror or extend versions of the widely-used Hypertext Markup Language (HTML), the
language in which web pages are written. While HTML (prior to HTML5) was defined as an
application of Standard Generalized Markup Language (SGML), a very flexible markup
language framework, XHTML is an application of XML, a more restrictive subset of SGML.
Because XHTML documents need to be well-formed, they can be parsed using standard
XML parsers” (http://en.wikipedia.org/wiki/XHTML.) In 2000 the W3C established XHTML 1.0 as a Recommendation, and one year later XHTML 1.1 was developed.
Actually XHTML is not regarded as a successor of HTML; it can be seen as HTML in the
XML format (http://pl.wikipedia.org/wiki/XHTML.) This means that XHTML remains
consistent with XML requirements. It is worth mentioning that Mozilla Firefox and Opera
have already fully adjusted to the newest standards, while the Internet Explorer has not
yet.
7.2 Semantics of the XHTML code
The structure of the XHTML code is correct if it does not include any errors (all tags are closed and only allowed tags and attributes are used, e.g. the "height" attribute is not allowed for a table in XHTML.) Such correctness can simply be checked with the help of free on-line tools, for instance the W3C validator at http://validator.w3.org/.
The XHTML code is semantically correct if all tags are used in accordance with their meaning. For example, using the H1 tag to indicate an important title is semantically correct, but using this tag just to display some text with a larger font is semantically incorrect.
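A short, invented illustration of the difference (the text inside the tags is made up for this example): the first fragment uses H1 because the text really is the page's main heading; the second uses H1 only to get a large font for an ordinary sentence, which is the semantically incorrect usage described above.
Sample
<!-- semantically correct: the main heading of the page -->
<h1>SEO optimization of XHTML websites</h1>
<!-- semantically incorrect: H1 used only for visual size -->
<h1>Welcome, and have a nice day!</h1>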
This all means that a webpage can be correct structurally, but incorrect semantically.
Semantic correctness of a webpage is very important for SEO optimization, as web
crawlers rely on the semantics of the XHTML code to search for keywords and for
indexation purposes. Each site is required to be correct both structurally and semantically
nowadays.
7.3 Advantages of XHTML
Zeldman (2007) enumerates 10 advantages of using XHTML code that break down as
follows:
1. “XHTML is the current tags standard that substitutes HTML4” (trans. P.G.)
2. It is designed to cooperate with other script languages and applications that are
based on XML, which is not available in case of HTML.
3. It is more coherent than HTML.
4. XHTML 1.0 is the basis for other XHTML versions in the future. It will always be easier to shift from XHTML to newer versions of XHTML than from older and older HTML code.
5. Older versions of browsers deal equally with HTML and XHTML code.
6. Newer versions of browsers prefer XHTML to HTML, as XHTML behaves more predictably than HTML. Zeldman emphasizes the fact that IE and Firefox display CSS formatting more precisely if an XHTML DOCTYPE is declared.
7. XHTML functions well in wireless devices, which means that it is possible to get
through to more users without additional software.
8. XHTML is the part of the W3C standard.
9. XHTML is the language of structure, which means that it is always displayed in the same way regardless of the web browser (provided that CSS documents include the text formatting.)
10. XHTML can be validated with the help of free online tools, which saves the time of checking XHTML code errors and consistency by hand. It is easy to forget about some tag attributes, e.g. the "title" attribute in links or the "alt" attribute within image definitions, which can simply be found by the validation tools (as in the sample below.)
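A small, hypothetical sample of the attributes mentioned in point 10 (file names and texts are invented): a validator will report a missing "alt" attribute on an image, and the "title" attribute on a link is easy to overlook.
Sample
<a href="schedule.html" title="Lecture schedule for the summer semester">Schedule</a>
<img src="images/library.jpg" alt="Main reading room of the university library" />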
7.4 W3C. Model DOM
“The World Wide Web Consortium (W3C) is an international community where Member organizations, a full-time staff, and the public work together to develop Web standards”, whose mission is to “lead the Web to its full potential” (http://www.w3.org/Consortium.) It was established in 1994 (Ledford 2009); it evaluates web specifications and sets rules that make different web technologies work together. It comprises around 500 member organizations and has introduced the following specifications: XHTML, CSS and the standardized document object model (DOM.) At first the specifications were regarded as recommendations, but in 1998 a web standards project commenced and the term “recommendations” was replaced by “standards.” ECMA (the European Computer Manufacturers Association) is another organization that deals with web standardization; it is in charge of ECMAScript, commonly known among programmers as JavaScript.
What is DOM? The W3C website explains that “the Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page” (http://www.w3.org/DOM/#what.) In other words, the DOM model lets websites imitate traditional programs running in a web browser (Zeldman 2007) and allows them to behave as if they were traditional computer applications. For example, a user can sort a table by clicking on its header (imitating a traditional Excel document) while the rest of the site remains untouched. Initially, the DOM model was designed to imitate traditional applications only on the client's side, so that performed actions could be carried out even without an Internet connection. That was the beginning of the AJAX technology, which now allows reloading chosen areas of a website without refreshing the whole content. The DOM model proves that it is possible to build interactive, standardized websites.
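A minimal sketch of this idea (the element id and function name are invented for this example, and the inline event handler is used only for brevity): a script reads one element of the page through the DOM and updates it without reloading anything else.
Sample
<p id="counter">0</p>
<input type="button" value="Add one" onclick="increment();" />
<script type="text/javascript">
// Reads the current value through the DOM and writes the updated value back.
function increment() {
  var node = document.getElementById("counter");
  node.firstChild.nodeValue = parseInt(node.firstChild.nodeValue, 10) + 1;
}
</script>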
Zeldman (2007) lists the following web browsers that support the W3C DOM model: Netscape 6+, Mozilla 0.9+, IE5+, and Opera 7+. Pocket devices and mobile phones still do not support the DOM model.
7.5 CSS – Cascading Style Sheets
The leading purpose of my dissertation is to focus on organic SEO optimization of websites, which is why I am not going to explain here how to build CSS declarations for XHTML documents. I just want to present some advantages of using CSS styles, recalling the fact that CSS formatting has already become part of the W3C standards. What does CSS stand for? “Cascading Style Sheets (CSS) is a simple mechanism for adding style (e.g., fonts, colors, spacing) to Web documents” (http://www.w3.org/Style/CSS/.)
Using CSS files allows separating a website's formatting and presentation from its structure and content. This separation brings many advantages (a short sketch follows the list):
1. A website can be downloaded faster to users’ browsers. It means that servers are
not overloaded; they are less encumbered, for there is no need to format each
subpage separately (as CSS styles are usually prepared globally.)
2. It saves time for designing, programming, updating and maintaining a website, as
changes in one CSS document can influence all the subpages of the website. Global
changes to the website can be incorporated within several minutes.
3. People responsible for textual content do not have to worry about the layout and
there is almost no probability that they will spoil the layout, as the layout
formatting is separated from the website content.
4. Programmers and designers do not have to worry that changes made by the
website owner will spoil the way it is displayed (Zeldman 2007.)
5. A website becomes more and more portable when sticking to W3C standards. This
brings about some growth in website availability to users.
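A minimal sketch of this separation (selectors and values are invented for the example): all presentation lives in one global stylesheet, so changing the heading colour in this single file restyles every subpage at once.
Sample
/* css/styles.css - one file shared by every subpage */
h1 { font-size: 160%; color: #003366; }
p  { line-height: 1.5; margin: 0 0 1em 0; }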
Attaching CSS styles to an XHTML document should be carried out according to a method Zeldman (2007) calls the “best-case scenario.” It means that, first of all, a programmer should prepare the main CSS stylesheet for trustworthy browsers that can operate in Standards Mode. If the website is displayed correctly in these browsers, the programmer can prepare a separate CSS file (imported through the “@import” directive inside a conditional comment) to adjust the website in older browsers, too. It can be done as follows:
Sample
<link rel="stylesheet" type="text/css" href="css/styles.css" />
<!--[if IE]> <style type="text/css">@import url(css/ie.css);</style> <![endif]-->
Following the method above allows avoiding the phenomenon known as "searching for the least common solution", that is, lowering the whole site to whatever works identically in all browsers (both older and newer ones.)
8. Composing XHTML websites
There are different specifications of XHTML (e.g. XHTML 1.0 Strict, XHTML 1.1), but the most common is XHTML 1.0 Transitional. It is more difficult to convert old HTML code into XHTML 1.0 Strict than into XHTML 1.0 Transitional. The XHTML 2.0 specification is still under development. I am going to focus on the XHTML 1.0 Transitional specification as the basis for building the most up-to-date websites.
8.1 DOCTYPE and namespace declaration. Character encoding
All XHTML documents start with the DOCTYPE declaration (presented below together with the namespace declaration), which informs web browsers how to interpret and verify the whole code. The declaration is inserted at the beginning of the XHTML document and includes information about the XHTML version. Each DOCTYPE specifies a different set of rules that the whole website code follows. Zeldman (2007) mentions that the tag code and CSS styles will not be verified correctly if there is no DOCTYPE declaration. In addition, the DOCTYPE determines the way the website is rendered for a user.
The XHTML 1.0 proposes three types of documents declared by DOCTYPE:
1. Transitional – the most common and recommended type when converting HTML into XHTML. This type is the closest to the old HTML structure and still allows using outdated tag attributes (e.g. the "bgcolor" attribute for table rows, which should really be defined in the CSS files; see the sample after this list.) It can thus be treated as a way of shifting towards the new web standards.
2. Strict – more restrictive than the transitional one. It allows defining structure and does
not accept any visual formatting.
3. Frameset – if a document uses “<frameset>” elements, it should be based on frameset
DOCTYPE.
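A hedged example of the difference mentioned in point 1 (the values are invented): the presentational attribute is still tolerated by the Transitional DTD, while the CSS form is the standards-oriented equivalent.
Sample
<!-- tolerated by XHTML 1.0 Transitional -->
<tr bgcolor="#eeeeee"> ... </tr>
<!-- preferred: presentation moved to a CSS rule such as tr.odd { background-color: #eeeeee; } -->
<tr class="odd"> ... </tr>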
The XHTML namespace declaration follows the DOCTYPE declaration and extends the "<html>" element. It is the collection of element types and attribute names that are strictly associated with the document type (Zeldman 2007.)
A correct DOCTYPE and namespace declaration can be composed as follows:
Sample
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
The last two attributes within the namespace declaration indicate the language of the
document as well as the language version of the incorporated XML.
In order for a website to be interpreted correctly by a web browser, as well as to prepare it for validation tests, it is required to set its character encoding. It can be ISO-8859-1 (also known as Latin-1), ISO-8859-2 (covering Polish characters) or Unicode (UTF-8.) The default character encoding for XML and XHTML documents is Unicode, which gives a unique number to each character regardless of the language used within the document. Of course, programmers can use whichever encoding they prefer (e.g. American and Western European websites use the Latin-1 encoding.) ISO 8859 is a standardized, multi-language set of graphic characters encoded with 8 bits.
A sample of the character declaration can be composed as follows:
Sample
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
8.2 Formatting tags and attributes in XHTML 1.0 Transitional. Conversion to XHTML
There are some rules a programmer should abide by when composing XHTML websites. First of all, all tags should be lowercase, as XHTML (which derives from XML) is case-sensitive. Zeldman underlines the fact that "all elements and names of attributes have to be lower case, otherwise the document will not be verified well" (trans. P.G.) For example, the "<TITLE></TITLE>" tag should be replaced with its "<title></title>" equivalent. Consequently, the attribute "onMouseOver" has to be turned into "onmouseover." This restriction concerns only element names and attribute names; the rest of the website content can include capital letters.
Secondly, all values of tag attributes have to be placed between quotation marks (e.g. width="100", not width=100), which was not required in the old HTML structure.
All attributes have to have values. The old HTML structure allowed empty (minimized) attributes, which in XHTML is regarded as an error. For such attributes the value is the same as the attribute name. The chart below presents the difference:
HTML: <th nowrap>
XHTML: <th nowrap="nowrap">
HTML: <hr noshade>
XHTML: <hr noshade="noshade">
HTML: <input type="radio" name="gender" value="f" checked>
XHTML: <input type="radio" name="gender" value="f" checked="checked">
Another rule is that all tags have to be closed, even empty ones. When writing an HTML document, it was possible to leave the "<td>" tag without closing it; the XHTML 1.0 specification requires all tags to be closed, so "<td>" has to be ended with the "</td>" tag. Empty tags, such as "<br>", "<img>" and "<input>", have to be closed with a slash at the end, like: "<br />", "<img />", and "<input />."
Zeldman (2007) also mentions that double dashes may be used only at the beginning and the end of a comment. The chart below presents examples of both incorrect and correct XHTML comments:
Incorrect XHTML comments:
<!--incorrect -- comment: double dashes inside -->
<!-------------------========--------------------->
Correct XHTML comments:
<!-- correct comment here and below -->
<!--==========================-->
9. Standards Mode and Quirks Mode in web browsers
When displaying a web site, a web browser has to determine whether it should work in Standards Mode (the website's XHTML and CSS declarations are compatible with the W3C standards, so the whole XHTML is treated and displayed more strictly) or in Quirks Mode (the website includes old HTML code with many mutually exclusive adjustments for different browsers.) To make the issue even stranger, Zeldman mentions that Standards Mode in Gecko-based web browsers (Mozilla Firefox, Netscape) is somewhat different from the one in Internet Explorer. In order to bridge the difference between browser engines, Gecko and Netscape engineers in time proposed a third working mode (called "Almost Standards Mode") - a mode working similarly to Standards Mode in IE browsers.
It is the DOCTYPE declaration that tells a web browser which mode it should switch to when displaying a web page. Moreover, the presence or absence of some information within this declaration results in turning on different modes for Gecko and IE browsers. Depending on whether Standards Mode or Quirks Mode is triggered, the same web page can be displayed in completely different ways for a user. Zeldman (2007) describes the mechanism as follows:
1. If the DOCTYPE declaration includes a full URL address, it turns on Standards Mode for Gecko and IE browsers, as well as for Safari and Opera 7+. Moreover, some HTML 4 DOCTYPE declarations also switch on Standards Mode.
2. If the DOCTYPE declaration is old, or does not include a full URL address, or if there is no DOCTYPE declaration at all, it turns on Quirks Mode for Gecko and Mozilla browsers, which means that the XHTML or HTML code is probably not standardized and should be treated and displayed less strictly. This non-standardized mode switches on so-called backward compatibility and tries to display CSS styles the same way they would have looked in the old IE4/5.
The examples below invoke a full and partial (relative) DOCTYPE declaration for Gecko
browsers. The first one switches on the Standards Mode, whereas the other switches on
the Quirks Mode:
Sample
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "/DTD/xhtml1-transitional.dtd">
As a matter of fact, both declarations above switch on Standards Mode for IE browsers, as the IE engine accepts any DOCTYPE declaration regardless of whether the URL address is full or relative.
Zeldman (2007) enlists a set of XHTML DOCTYPE declarations that switch on either
Standards Mode or “Almost Standards Mode”. They are as follows:
1. XHTML 1.0 Strict – runs full Standards Mode for all browsers that use DOCTYPE
declaration to invoke the mode. It does not work in Opera browsers version less
than 7.0 and in IE for Windows version less than 6.0.
Sample
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
2. XHTML 1.0 Transitional – runs full Standards Mode for Gecko browsers and IE
browsers (IE6+ for Windows and IE5+ for Macintosh.) It has no effect for Opera
browsers (version less than 7.0) and for IE (Windows version less than 6.0.)
Sample
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3. XHTML 1.0 Frameset – runs full Standards Mode for Gecko browsers and for IE
browsers (IE6+ for Windows and IE5+ for Macintosh). It runs “almost Standards
Mode” for Netscape 7+. It does not affect Opera browsers with version less than
7.0 and for IE browsers with version less than 6.0.
Sample
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
4. XHTML 1.1 - runs full Standards Mode for all browsers that use DOCTYPE
declaration to switch on the mode. It does not affect Opera browsers with version
less than 7.0 and IE browsers for Windows with version less than 6.0
Sample
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
10. Conclusion
There are three major search engines: Google, Yahoo!, and MSN. Each search engine sends its crawlers to browse the Internet and collect information. The collected information is stored in the search engine's databases, from which it is later retrieved for users. Each engine uses different algorithms to create rankings and weigh the importance of the stored information, which allows filtering the data once keyword queries are entered. On the one hand, it is not possible to get the full "know-how" of how engines create rankings and place sites within their paged results lists (as this information is confidential.) On the other hand, some general understanding of how they work helps a lot. Moreover, some experience in creating website code comes from observing how changes in the code affect the results rankings. This knowledge is the basis for understanding how to build XHTML website code that is semantically correct.
XHTML is the current standard for creating websites, as it is more coherent than HTML. There are different specifications of XHTML (e.g. XHTML 1.0 Strict, and XHTML 2.0, which is still in development), but the most common is XHTML 1.0 Transitional, which is the basis for other XHTML versions in the future. Programmers face shifting from outdated HTML code to XHTML, which requires some adaptation time. Zeldman (2007) presents a selected list of myths that may still persist in programmers' minds when adjusting a website to different web browsers. He shows that each of them can be dispelled by sticking to the W3C standards:
1. Availability forces programmers to create two versions of a website.
It is not true. Using the W3C standards allows creating one version that is equally available in all browsers and mobile devices.
2. Availability costs too much.
It is not true. Adding elements that improve website availability usually takes little time and a small cost that is repaid by acquiring new clients.
3. Availability forces programmers to compose primitive and low-quality websites.
It is not true. By using CSS styles, XHTML, JavaScript and the AJAX technology it is possible to control the layout and behavior of websites, which become more and more attractive and available to all users.
Because web browsers get adjusted to current W3C standards more and more, and still
there are many websites based on an old HTML technology, the XHTML 1.0 Transitional
mode allows finding some agreement between HTML and XHTML nowadays.
Website SEO Optimization. Ethical and
unethical SEO strategies and techniques.
PageRank versus SEO spamming.
1. Introduction
It is not enough to build a website based on a standardized XHTML structure. What also counts for website indexation is well-organized textual content. A well-built XHTML structure helps robots examine such content, retrieve the most important information about the website, categorize it among other on-line websites and eventually estimate its page rank value (explained later on.) The higher the page rank value, the higher the website appears in search engine results lists.
In this chapter I am going to focus on the most important techniques and strategies for achieving a high page rank value for websites. Processing, organizing and working with the website contents is known as SEO optimization or an SEO campaign (which means all actions performed to optimize websites for search engines.) It is not possible to discuss all strategies, so I did my best to choose the most relevant ones in the context of the most common search engines: Google, Yahoo!, and MSN. There are ethical, unethical and "almost unethical" SEO optimization techniques. Unethical and cheating campaigns are regarded as spamming SEO.
After defining SEO and focusing on the advantages of organic SEO, I am going to discuss the major elements that organic SEO consists of. Chosen SEO strategies and techniques will be elaborated on (both ethical and unethical ones.) The most important techniques will concern keywords (their density and their organization within the XHTML structure and tags) as well as website links (incoming, outgoing and inner ones.)
Whoever wants to achieve the best results in ranking positions needs to understand SEO optimization for the most common search engines (Google, MSN, and Yahoo!). Each of these three search engines puts emphasis on slightly different aspects of SEO strategies. Google has introduced its own measure, the PageRank value. I will elaborate on these differences at the end of this chapter.
2. Defining SEO
“Search engine optimization (SEO) is the process of improving the visibility of a website
or a web page in search engines via the "natural" or un-paid ("organic" or "algorithmic")
search results. In general, the earlier (or higher on the page), and more frequently a site
appears in the search results list, the more visitors it will receive from the search engine's
users" (http://en.wikipedia.org/wiki/Search_engine_optimization.) According to Grappone (2010), SEO (known also as search engine marketing or Internet marketing) includes actions that lead to generating more visits to a website. On the one hand, SEO optimization means improving the site's XHTML code and structure; on the other hand, it also stands for generating additional traffic, communicating with search engines, investigating campaign effects, and focusing on and analyzing the market competition. "SEO is not an advertisement, but advertisement may be one of the elements of the campaign" (trans. P.G.), as it has its targets and purposes. Ledford (2009) adds that it also means understanding how search engines work and what the differences among them are.
"Search engine marketing (SEM) is a form of Internet marketing that seeks to promote websites by increasing their visibility in search engine result pages (SERPs) through the use of paid placement, contextual advertising, and paid inclusion. SEM is not SEO, as SEM constitutes Adwords" (http://en.wikipedia.org/wiki/Search_engine_marketing.)
Do we have to bother about SEO optimization of our websites? Grappone (2010) generally replies "yes", but the answer to this question will not always be positive. Sometimes it is not recommended to expose confidential company data to all Internet users, or the website already has a satisfactory position in the results list. Additionally, SEO optimization brings effects over time; if we do not have enough time, we may not see significant improvements. Also, if our website is going to be re-built soon, there may be no point in optimizing its outdated version. As SEO optimization is a complex, lasting process, we may call it a SEO campaign. Such a campaign should not take place more often than every three months and not less often than twice a year. Generally, SEO is both knowledge and art (Grappone 2010.)
2.1 SEO purposes
SEO is the knowledge of how to configure website elements in order to achieve the best results in search engine rankings (Ledford 2009.) A website is like a person in a crowd, and to make it more distinguishable in this crowd, we need some optimization criteria to be fulfilled. Some of them are as follows: the textual content and context of links, website popularity, meta tags, keywords, website language, the textual content of the whole website and website maturity. Depending on the search engine and its algorithms, all these elements become more or less important when website rankings are created.
In order to conduct a successful SEO campaign, we need a clear purpose to aim at. This results from the company's business needs and it may be either increasing product sales or generating more orders from a website. Ledford (2009) says it is not enough to state that we need more visits to our website; we have to know what we want to achieve further on when a user opens the website (e.g. we want to generate 50 orders a month.)
On average, SEO purposes are developed and re-estimated every 6 months and they should be harmonized with business purposes. Whatever we change on our website is connected with our business purposes and is supported with SEO purposes (Ledford 2009.)
2.2 SEO optimization plan
After we have pointed out our SEO purposes, we need to compose our SEO optimization schedule. First, we have to give some priority to all the subpages our website consists of. This will allow us to treat the optimization issues separately in parts, which will not overwhelm us and which will maximize effects in shorter periods of time (Ledford 2009.) It is advised to give the highest priority to the pages that naturally attract users the most (for instance the home page), that get the most visits and that generate the highest profit. Giving priority to pages automatically defines the strategies of marketing efforts to be brought in. Actually, there may be a number of subpages with the highest priority.
Having established the estimated priorities, we have to evaluate the website, which will allow us to verify what the progress of the SEO optimization is and where we are
currently heading. When optimizing a website, subpages are said to be as important as the whole website in general, or even more important. In order to evaluate a website, we consider the following elements (Ledford 2009):
1. Meta tags. Relevant meta tags help search engines classify the website correctly. The most important are the title, the keywords and the description meta tags.
2. Textual content of a website. We need to think over how often the website is updated and how often the textual content changes (is dynamic and refreshed.) Search engines still take this into consideration (it is one of the search engine algorithms' built-in features.) If some site content remains unchanged for a longer period of time, engines may start ignoring the website.
3. Inner and external website links. They play an important role in SEO optimization. Search engine crawlers (robots) browse the website to search for these links and collect information. Nowadays engine algorithms are constructed in such a way that they check whether external links relate to websites that are similar to and connected with the website in terms of textual content (they may check if there are similar keywords.) The evaluation process means verifying whether each website link leads to a proper external website; otherwise it should be abandoned.
4. Sitemap. It is important to compose an XML sitemap file and place it in the root directory of the website. Such an XML file includes URL links together with priority information. It improves website indexation. It is possible to indicate to the Google search engine where it should search for such an XML sitemap. A part of a sample XML sitemap (adjusted to the Google requirements) may look as follows:
Sample
<?xml version="1.0" encoding="UTF-8"?>
<urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>http://www.netivia.pl/</loc>
<changefreq>daily</changefreq>
<priority>1.00</priority>
</url>
</urlset>
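Apart from submitting the sitemap directly, search engines can also be pointed at it from the robots.txt file placed in the root directory; a hedged sketch (the domain and file name are only examples):
Sample
Sitemap: http://www.netivia.pl/sitemap.xml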
2.3 What is organic SEO?
Hallan tries to explain in his article (http://amrithallan.com/what-is-organic-seo/) that "it's getting found on the search engines without paying the search engines for the placement, and keep getting found for a long, long time", but this definition does not seem to be satisfactory. In reality, website owners combine both organic and paid SEO strategies to conduct their optimization. Some experts tend to regard organic SEO as only optimizing website content; others are convinced it is about the number of links a website consists of. Ledford (2009) points out that it is rather a combination of these and other elements, as the ranking position depends on the quality of the website. Organic SEO is not just a simple way to appear high in the results list. It consists in using all natural methods that are available to us to get higher ranking positions without paying for it. Ledford also draws our attention to the fact that organic SEO takes from 3 to 6 months to show the results of the campaign. On the one hand, it requires some patience and time. On the other hand, to improve the effectiveness of such a campaign, it is worth considering paid PPC (pay-per-click) promotions.
2.3.1 Organic elements of websites
The textual content of a website is regarded as the most important element of organic SEO. Search engine algorithms are very sensitive about spamming techniques, so we should find a balance between incorporating relevant textual content and the number and position of different keywords within the text. One of the crawlers' strategies is to explore how the textual content co-exists and is consistent with other elements of the site, such as meta tags and surrounding links. Since blogs are used on websites (as another organic element), crawlers check how frequently their content changes. Generally, it also means finding a balance between the static and dynamic content of a website. Blogs and questionnaires give some dynamics to the website and may fulfill the robots' requirements.
Links constitute other important elements of organic SEO. Ledford (2009) distinguishes incoming, outgoing and inner links that should be spread out. Information about outgoing websites (where these links lead to) is as important as the context they are placed in. Nowadays, both sites (the one the links come from and the one they lead to) have to be consistent in terms of textual content, which means that their keywords and textual content have to be cohesive and relevant. All of this is indeed explored by engine crawlers and affects the ranking position. Today crawlers' algorithms can also track which search results are chosen by users.
2.3.2 Benefits of organic optimization
Hallan in his article (http://amrithallan.com/what-is-organic-seo/) lists some benefits of organic SEO optimization as opposed to paid campaigns (to some extent they should still be combined). They are as follows:
1. If a website is organically optimized, people tend to click it more often. "Search engines are meant to show results according to relevancy of the found pages." Sometimes people do not trust websites that appear in sponsored lists, as such a website may be regarded as having mediocre content. It feels more trustworthy to click on a website that appears naturally on the search results pages. "If they appear on their own, it means there is a greater chance of finding the relevant content there"
2. Long-lasting search results come from organically optimized websites, as organic optimization improves relevancy. "You can only be relevant by being relevant, and that means, constantly generating content that people want to find and consume"
3. "Organic SEO builds greater trust." Real, natural content of a website simply indicates that its owner cares for the website (and his/her business) and has "deep-rooted knowledge of what they are involved in"
4. Organic SEO optimization is cheaper than sponsored campaigns. Even if we spend some money on paid campaigns, it still pays off to spend the same amount of money on conducting organic optimization. This will result in being visible on results pages for an incomparably longer time
3. SEO strategies and techniques
SEO optimization is a set of strategies and techniques with whose help we want to gain some increase in our website's page ranking. This is usually done for chosen keywords or key phrases (Ledford 2009.) It is advised to consider SEO optimization as an important step before our site is created, as optimizing websites that are already on-line is usually more time consuming. There is a list of ethical and unethical SEO strategies. We are going to focus on both variations.
3.1 Hosting, domains, navigation and other chosen elements of websites friendly to SEO
SEO optimization is the most efficient when we conduct one substantial change in a proper period of time (Ledford 2009.) What attracts crawlers is the website design. Generally speaking, these are: meta tags (focused on later on), links (I have devoted a separate section to linkages), navigation structure and textual content. As a matter of fact, there are many aspects we should consider in the optimization process, but I am going to focus on the chosen aspects that seem the most important to me.
3.1.1 Hosting
It matters to which server we copy the files of our website. For example, if we wanted our website to be displayed in Poland and we bought a server outside Poland, this would decrease the page rank value (the PageRank term is explained later on.) "Crawlers will recognize that the website does not suit our demographical position" (Ledford 2009.)
3.1.2 Domain name
We should think about SEO when considering how to name our domain. This name should be as short as possible, and if possible, it should contain the most important keyword(s) that users enter into search queries in search engines. For a Polish website, the best suffix for the domain is ".pl", as this is the default suffix for search engines in this market and affects the website's position within the results list.
3.1.3 Navigation
Ledford (2009) sums up that there are two ways of navigating to a website: inner (around the website) and external (other sites leading to the website.) SEO optimization strategies should combine both types of navigation. Navigation should consist of textual links that are filled in with relevant keywords; this helps crawlers navigate among subpages and index the website. Navigation consisting of graphical buttons is an obstacle to crawlers. We should compose navigation that is friendly both to users and to search engine crawlers.
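A brief, hedged sketch of such text-based navigation (the page names and addresses are only examples) might look as follows:
Sample
<ul id="menu">
<li><a href="/oferta" title="Oferta usług">Oferta</a></li>
<li><a href="/kontakt" title="Kontakt">Kontakt</a></li>
</ul>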
3.1.4 Sitemap
If we feel that users’ preferences clash with crawlers’ preferences, we should build a
sitemap XML file. Such a file includes lists of links and indicates to crawlers the structure
and density of a website.
3.1.5 TITLE tag
This is the most important HTML tag in terms of SEO optimization. When indexing a website, crawlers seem to start by searching for the titles, as these titles are a kind of source for classifying the whole website. It is not advised to fill the title with the name of our company; instead, we should consider relevant keywords. Moreover, some search engines index only the first 50 characters of titles, so we have to consider their length. "The W3C has proclaimed that the website title length should not exceed 64 characters" (Ledford 2009.) It is not allowed to duplicate keywords within titles. Such a strategy may be classified as spamming SEO (discussed later on.)
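A hedged example of a keyword-oriented title (the keywords are only illustrative) that stays within the 64-character limit mentioned above:
Sample
<title>Strony internetowe Warszawa - projektowanie i optymalizacja SEO</title>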
3.1.6 HTML headings
"<h1> to <h6> tags are used to define HTML headings. <h1> defines the most important heading. <h6> defines the least important heading" (http://www.w3schools.com/TAGS/tag_hn.asp.) As we see, they configure different levels
of website headings. All major browsers support all six heading tags, whereas in practice HTML programmers incorporate up to the first four (Ledford 2009.) These tags seem to inform both users and crawlers what the main topic and other sections of a website are. These elements do mark important information within the textual content, so they should also be filled in with relevant keywords. Crawlers examine the headings and check whether they suit the following textual content. It is claimed that the H1 tag is a top-level heading, so it ought to embrace the most important keywords, while progressively lower-level tags ought to include progressively less important keywords. HTML heading tags should also be loaded with keywords dynamically, not statically. It means that these keywords should vary a bit within the same headings. Filling HTML headings with considered keywords is a SEO strategy to increase the website page rank value.
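A hedged sketch of such a heading hierarchy (the keywords are only illustrative):
Sample
<h1>Strony internetowe Warszawa</h1>
<h2>Projektowanie stron internetowych</h2>
<h3>Optymalizacja SEO witryn</h3>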
3.1.7 Javascript and Flash
Javascript allows programmers to add some dynamics to websites, but it may block crawlers. To avoid that, it is recommended to place Javascript code in external files (with the ".js" suffix) and attach references to these files within the XHTML website code. Thus, it does not stop crawlers from indexing the website, because the Javascript code is not run by crawlers.
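A minimal, hedged example of such an external reference (the file name is only an example):
Sample
<script type="text/javascript" src="js/menu.js"></script>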
It is almost impossible to index a Flash site in order to estimate its page rank, and very often such an approach results in crawlers leaving the site. It is advised to combine elements of Flash (e.g. Flash banners) with the website's XHTML content. It is not recommended to use Flash navigation, as it blocks crawlers from retrieving the most important keywords. Thurow (2008) points out that for us as human beings a Flash site seems to include many subpages, but from the crawlers' point of view it is regarded as only one page with only one main Flash animation. In order to increase the accessibility of Flash websites in search engines, one main Flash animation should be divided into a few smaller ones, available under different URL addresses. This tells robots that such a website contains more unique content.
3.1.8 URL links in SEO strategies. URL canonicalization. Dynamically generated URL
addresses
The URL abbreviation stands for "Uniform Resource Locator". On the one hand, it should be short so that users can remember it by heart and recall it when opening a web browser. On the other hand, a SEO strategy for URL links means inserting the most important keywords within the URL addresses. It is not recommended to overuse dynamic content within URL links (e.g. product IDs from a database.)
"Canonicalization is the process of picking the best URL when there are several choices, and it usually refers to home pages" (http://www.mattcutts.com/blog/seo-advice-url-canonicalization/.) It is worth emphasizing that crawlers regard the following two URL addresses as different ones: http://netivia.pl and http://www.netivia.pl. We ought to use the "301 rewriting rule" to redirect users and crawlers to only one base URL address. If there is only one base URL address, the power of the estimated website page rank value does not have to be split for each URL variation separately. By putting the following code inside the website's .htaccess file we always force using the "www" prefix at the beginning of the URL address, even if we forget to enter it:
Sample
RewriteEngine On
RewriteCond %{HTTP_HOST} ^netivia.pl(.*) [NC]
RewriteRule ^(.*)$ http://www.netivia.pl/$1 [R=301,L]
In most cases, URL addresses are generated dynamically (as the website is governed by its CMS), which can be recognized by crawlers. Dynamic addresses consist of special signs, like question marks, pluses, or other signs like "%" and "&". Robots, on the one hand, do not reject such links, but on the other hand seem not to like indexing them. Thurow (2008) lists the following reasons why this happens:
1. Search engine crawlers do their best to avoid indexing and storing the same websites several times in their databases. If a website consists of many subpages with different textual content, but their URL addresses look almost the same with only one parameter that changes, these websites probably will not be indexed by crawlers. It is better to create addresses that contain unique major keywords connected with the page contents.
2. Search engines try to make their search results more and more exact and adequate to the given query. For crawlers this requires much filtration before indexing takes place.
3. "Some dynamic URL addresses may make robots get over-looped and never leave the website" (Thurow 2008, trans. P.G.) If only one URL address parameter changes, crawlers are not able to recognize whether this parameter sorts the site content or points to a different subpage.
As a matter of fact, when Google crawlers canonicalize URL links, they try to guess the best representation for all the link variations.
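In line with the .htaccess sample above, a hedged sketch of exposing a dynamically generated subpage under a keyword-friendly address (the path and parameter are only hypothetical examples) could look as follows:
Sample
RewriteEngine On
RewriteRule ^studia-informatyka$ index.php?strona=1594 [L,QSA]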
3.2 Keywords and keywords prominence. Distinguishing keywords
"Keywords" is a term that is strictly connected with SEO (Ledford 2009.) Keywords are used to index and find a website on the Internet. Effective and popular keywords affect the website page rank. Understanding some rules of using keywords and knowing how to select them will help promote a website.
Correct usage of keywords guarantees that a website is found within the first 20 result pages (which is optimal according to Ledford), although users usually concentrate on the first two result pages. A high position in the results list is more advantageous than placing paid adverts, as natural searching is conducted by the right target group. People type different search queries, so we should also consider those that include some typical errors.
Thurow (2008) claims that all search engines give more significance to keywords that appear at the beginning of the website content than to those placed at the end. The "positioning of a keyword within a website in relation to the beginning is known as prominence" (Thurow 2008, trans. P.G.) It is advised to place important keywords at the beginning of a website by composing a textual static header. Of course, such headers ought to be different for each subpage, as each of them includes different textual content.
Distinguishing keywords within the website's textual content plays an important role in SEO strategies (Thurow 2008.) Users who desire to find appropriate information type in detailed keywords or key phrases to be searched for. Such keywords are highlighted in results lists, which assures users that this is the website they are looking for.
3.2.1 Heuristic searching
We should employ heuristics to achieve the best ranking position. "The term heuristic is used for algorithms which find solutions among all possible ones, but they do not guarantee that the best will be found, therefore they may be considered as approximate and not accurate algorithms" (http://students.ceid.upatras.gr/~papagel/project/kef5_5.htm.) In terms of SEO optimization, heuristics mean a combination of keywords and key phrases. Search engine algorithms also browse websites heuristically, which means that crawlers will follow the links and check the textual similarity of the sites that the links lead to. If website links refer to pages that are not thematically connected, this may be regarded as a "link farm" and will drastically decrease the page rank. Keywords are connected with heuristics because they constitute a kind of model: they are the key to the problem of scanning the search engine database to retrieve specific data.
Heuristics give variables to estimate the ranking position for a searched query. Ledford (2009) proposes some rules connected with search engine positioning. Some of them are as follows:
1. Agreement between system and reality. Our website should use language that is acceptable to and used by users (e.g. do not incorporate obscure technical terms)
2. Full control on the users' side. Give users the possibility to control the website, as they are allowed to make mistakes. It means that they should be provided with "back" and "forward" links to navigate and browse the website. These are website inner links.
3. Sticking to standards. Every time a user sees an alert or a piece of information, they should not be surprised to see it. In this context, cohesion refers both to language and to actions performed on the website.
As a matter of fact, the rules mentioned above refer more to general website usability than to keywords as such; nevertheless, they are also connected with keyword SEO strategies.
3.2.2 Keywords used in website links
The textual content of links allows us to use keywords twice around the website (Ledford 2009.) A crawler that analyzes a given website distinguishes links and their textual content and uses them to categorize the website. Placing relevant and significant keywords within the website links is one of the SEO strategies. Of course, we cannot exaggerate and over-optimize our website. This may happen when too many keywords remain in the same grammatical form both within and outside link content.
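A hedged illustration of the difference (the address and keyword are only examples):
Sample
<!-- weaker: the link text carries no keyword -->
<a href="http://www.netivia.pl/oferta">kliknij tutaj</a>
<!-- better: the keyword phrase itself becomes the link text -->
<a href="http://www.netivia.pl/oferta" title="strony internetowe Warszawa">strony internetowe Warszawa</a>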
3.2.3 Appropriate keywords and keyword phrases
We can divide keywords into two major categories: the ones connected with our brand and general ones. It is advised to use important keywords within the name and description of our company (Ledford 2009.) In order to choose the most adequate keyword list for our website, we should do some brainstorming and think about all the keywords (with general meaning) connected with our brand. Keywords with similar or the same meaning ought to be grouped together. We can narrow them down to select more specific keywords that will become our organic keywords in the end.
Apart from separate keywords, we should also consider keyword phrases, as they are also entered by users. Such a phrase might consist of even three keywords. Phrases are generally more detailed than separate keywords, so they help increase the page rank.
3.2.4 Keyword density versus overloading with keywords
Keyword density is another factor that is taken into consideration by crawlers when estimating the website page rank. "Keyword density is the percentage of times a keyword or phrase appears on a web page compared to the total number of words on the page. In the context of search engine optimization keyword density can be used as a factor in determining whether a web page is relevant to a specified keyword or keyword phrase" (http://en.wikipedia.org/wiki/Keyword_density.) It is estimated that the density of a separate keyword should be between 1 and 3 percent, while the general density of all keywords incorporated within the website text should be from 7 to 10 percent. It is commonly known that some search engines prefer a higher density of keywords than others do. Nevertheless, textual content does count for all engines. The influence of keyword density on the estimated page rank also differs among search engines.
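A simple worked example (the figures are hypothetical): if a chosen keyword appears 8 times within a subpage containing 400 words, its density equals 8 / 400 = 2%, which falls within the recommended range of 1 to 3 percent for a separate keyword.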
Thurow (2008) says that one page should not consist of more than 800 words. This limitation increases the possibility that the whole page will be read by users. "Because it is possible to easily and artificially spread keywords around the textual content, crawlers pay less and less attention to the keyword density" (Thurow 2008, trans. P.G.)
Overusing a keyword is known as "keyword stuffing" (an unethical SEO strategy) and will result in penalizing a website. Today search engine algorithms are able to recognize such artificial keyword overloading. "Keyword stuffing had been used in the past to obtain maximum search engine ranking and visibility for particular phrases. This method is completely outdated and adds no value to rankings today. In particular, Google no longer gives good rankings to pages employing this technique" (http://en.wikipedia.org/wiki/Keyword_stuffing.) One of the methods to avoid overloading with keywords is to place a few unique keywords on each subpage. In this case they should come from the same meaning group (as they were grouped during the brainstorming process.)
3.3 Website incoming and outgoing links. Linkages
Focusing on linkage strategies plays an important role nowadays, and it is not enough to place some links in your XHTML code, as there are many types of linkages we should be aware of (Ledford 2009.) When considering SEO optimization plans, links are the most important element just after website keywords, as they are the basis used by crawlers to create connections among websites. The first goal of linkage strategies is to connect your website with other thematically similar websites, which increases traffic to the website. Links leading to your website are like "votes" that promote the website's importance and relevance, and they do influence the Google PageRank. There are some techniques to build incoming and outgoing links.
3.3.1 Incoming links
Links that redirect users to our website are more important than the outgoing ones. There are some ways to get more incoming links that would lead to our websites. Some companies regularly analyze the sources of incoming links to their websites in order to improve their linkage strategies.
First of all, we can ask for such links. It requires investigating the market to create a list of thematically similar websites and simply asking for our links to be added to these websites' code. It is not the most effective method, though.
Offering articles to other websites (and asking to place our links below the article) seems to be the most effective method of getting incoming links (Ledford 2009.) Its efficiency results from the fact that other website owners constantly search for good textual content.
Blogs constitute another method of getting incoming links. A passage of text that includes links to our website may increase the estimated page rank value.
Messages are the basis of each marketing program. It is possible to hire a company that will constantly place short pieces of information on the Internet. It is advisable to add some news or pieces of gossip about our company (of course together with links to our website.)
Affiliate and PPC programs may also help a lot, but here we have to pay for adding our ads on such websites (Amazon.com is an example of it.) Clicking on our website link gives us some profit. Search engine algorithms accept affiliate and PPC programs and do not impose any punishment for that.
Ledford (2009) does not advise building own websites just in order to build link connections to our other website. Such an "illusion of popularity" may eventually be regarded as "link spam". Sometimes such deeds might be regarded as unethical.
3.3.2 Outgoing links
It is often questionable to incorporate outgoing links, because they allow visitors to leave our website (and they may not return.) The more outgoing links there are, the lower the value of such votes is (Ledford 2009.) On the other hand, if a website does not include any outgoing links, it is not regarded well by crawlers. Outgoing links help assign our website to a specific connection area, and our website also ought to point to other websites.
We already know that crawlers explore links to estimate the website page rank. When creating outgoing links we should abide by some requirements imposed by search engine crawlers. Some of them are:
1. Target of links. We should be conscious of the extent to which other websites are thematically similar to ours
2. Link over-usage. It is frustrating when every second or third word in a sentence is linked. Ledford (2009) suggests not using more than three links in an article or passage of text
3. Using keywords within the textual content of links. It pays a lot to use keywords in links rather than to put the "click here" phrase inside the link. Crawlers do search for keywords within links. It is even better to use a keyword phrase as a link, provided that the link leads to thematically connected sites
4. Links to suspicious websites. It is not recommended to link to low-quality sites, such as "link farms" or spamming websites. If we link our website to a website that has already got high ranking scores from search engines, our website will also get additional ranking points
5. Websites that include only links are classified as spamming websites nowadays. It is strictly forbidden to use websites that do not include any textual content other than links
6. Link monitoring and repairing broken links. Ledford (2009) declares that it is better not to have a link at all than to have a broken one. One of the linkage strategies is to regularly check whether the websites that our links lead to still exist on the Internet. We cannot allow links leading nowhere, because over time this indicates to crawlers that the website is not governed well
Generally speaking, we should use only such links that are useful to users. In terms of SEO optimization, their usefulness means leading to websites where a user can still find information useful to them. What counts is link quality, not quantity.
4. Spamming SEO. Spamming techniques
Spamming SEO is such an important issue in SEO optimization that it requires our separate attention. It has already been mentioned several times in this dissertation and now it is time we focused more deeply on the topic.
All kinds of spamming SEO are unethical or at least "almost unethical" and can be recognized by search engine crawlers. In such cases the estimated page rank value of a website gets decreased dramatically and sometimes such a website completely disappears from the results list. Search engine algorithms are constantly adjusted and modified to discover newer and newer kinds of spamming SEO. Ledford (2009) says that we should be very careful and sensitive about this issue, as what is ethical and acceptable today can be treated as an aspect of spamming the next day. The following pieces of advice should be considered:
1. Follow your conscience and common sense. If you feel that what you are doing now is a kind of spamming, it probably is. If you sense some trickery, avoid it
2. Do not try to make your website be regarded as something different from what it is in reality. Creating fake structures will sooner or later result in the website's exclusion from results lists. It is obvious that crawlers will unravel sets of artificial linking
3. Do not trust anyone who claims that some practice is acceptable if you feel otherwise. Many SEO specialists will keep proving that some unethical SEO strategies are still acceptable provided that they are conducted well. It does not pay off, as "spam is always spam" and will be found by crawlers
There are many techniques of SEO spamming (known also as spamdexing) and all of them should be avoided. Once they are recognized by crawlers, the website's page rank gets decreased. All these unethical techniques and strategies try to incorporate more links into the website. Ledford (2009) groups some of them as follows:
1. Transparent links. They are not visible because they are the same color as the surrounding background
2. Hidden links. They are not seen as they are placed behind other graphical elements. Such links are not clickable by users but are still accessible to crawlers
3. Misleading links. The way they are addressed is different from the way they are presented to users. Such links simply do not open the websites that are named in them
4. Links that are not recognizable by users. These links are written with a 1px font size, which is illegible to human beings
5. Keyword overloading and overloaded meta tags. Chosen keywords are repeated too many times either in the textual content of a website or within its meta tags. Sometimes a website is artificially overloaded with hidden links just to increase the keyword density (Danowski 2007.) It is commonly known now that repeating the same keywords around the same subpage does not generate a higher page rank value.
6. Automatically generated websites. They are created by stealing the textual content of other websites and thus they are of no usability value
7. Links entitled with a dot. It means putting two identical links close to each other. The clever idea is to use a dot as the text of the second link. It may not be distinguishable by people, but it will undoubtedly be encountered by crawlers. An example is as follows:
<a href="http://www.netivia.pl" title="strony internetowe Warszawa">Netivia</a><a href="http://www.netivia.pl">.</a>
8. Masking (cloaking). It means preparing two separate versions of a website. One version is over-optimized for crawlers, which are redirected to this version
9. Hidden textual content. Such text is printed in the same color as the background, thus it is invisible to people, but still visible to crawlers. Danowski (2007) claims that this is the most popular unethical spamming method and derives from the times when the AltaVista search engine was used (to estimate the page rank value it took into consideration the textual content, not the meta tags.)
10. Websites including only links. Such a site is perceived as a "link farm". The only exception is the sitemap list that gathers all website links together
11. Redirecting websites. They are usually incorporated because of SEO strategies, but are still useless for website visitors. Once such a redirecting website gets opened, we are notified about being taken to another website
12. Stealing websites. The only purpose of using other popular websites is to redirect their visitors to our website
13. Spamming with the help of Internet encyclopedias. For instance, it can be achieved by editing Wikipedia articles and filling them with links to our websites. To prevent such unethical spamming, the Wikipedia founders had to automatically add the "nofollow" attribute to each link (see the sample after this list)
14. Filling HTML comments with keywords. Although HTML comments are not seen by users in the browser, they still exist within the textual content of the website (Danowski 2007.) It is known that crawlers omit HTML comments when estimating the page rank value
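As mentioned in point 13 above, the rel="nofollow" attribute tells crawlers not to pass ranking value through a link; a minimal, hedged example (the address is only an illustration):
Sample
<a href="http://www.netivia.pl" rel="nofollow">Netivia</a>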
Apart from a dramatically decreased website page rank value and sanctions imposed by search engines, there are also other reasons why it does not pay off to use unethical SEO strategies. No one likes spamming. Once we get redirected, face some spam or sense tricky ways of getting us to a given website, we will probably not return to it in the future. Danowski (2007) presents the following dangers of using unethical spamming:
First of all, such a website can be banned by search engines. It means that it can be excluded from results lists and will not appear even on the last pages. This means practically no traffic to such banned websites. Danowski proposes typing the following phrase into a search engine: "site:netivia.pl". This will retrieve all the domain's sites that have been indexed so far. An empty results list for the query "site:netivia.pl" means that such a website has been banned and does not exist in the search engine index. If we want to check what version of the website is currently saved in the search engine database (the online version can be different, though), we should type in: "cache:netivia.pl". Moreover, if we type in "info:netivia.pl", we will be provided with other information about the indexing of the website.
Once a website has been banned, it is almost impossible to get rid of the ban. Many owners do not deal with such a domain anymore and simply change the website name. Danowski (2007) also adds that filtration imposed on websites by search engines constitutes a less restrictive punishment for spamming. It means that the website will not be found by users when they type in a chosen key phrase to be searched for. Filtration is imposed automatically and also disappears automatically after some time.
5. SEO optimization for the most common search engines: Google, MSN, Yahoo!
When considering search engines from the SEO optimization point of view, we can divide them into three major types. This division regards the way search engines index data and store it in their databases.
Search engines that are based on crawlers constitute the first category, and the Google search engine is undoubtedly categorized here. All the gathered information goes to a central repository, where it is explored in the indexation process (Ledford 2009.) Thus, information retrieved for keyword queries is taken from the database index. Every once in a while, crawlers return to websites to re-index them.
The second group consists of search engines whose databases are loaded by people. They are regarded as website catalogues, and the Yahoo! search engine belongs to this group.
Hybrid search engines constitute the third and last group and are a combination of the two groups above. Not only are people allowed to register their websites within these search engines, but they also spread their robots around the Internet to collect information about websites.
Ledford (2009) points out the importance of understanding what this classification looks like, because it determines the way and time in which our website will be indexed by search engines (robots will probably find the website faster than people, as it is an automated process.)
While Google concentrates the most on the connection between website textual content and its links, MSN observes the dynamics of the textual content and the meta tags. Yahoo! pays the most attention to the density of keywords, especially in the title tag.
5.1 Google PageRank and SEO optimization for Google
The Google Internet search engine has been the leading search engine for a long time and introduces new trends in the searching world. In 2007 it got 58.4% of market traffic out of all search engines (Grappone 2010). It is Google that has made link popularity and website age so important. It is true that today the SEO world functions in the way shaped by the leading Google. In addition, it gives website owners a free SEO analysis tool known as Google Analytics.
"PageRank is a link analysis algorithm, named after Larry Page and used by the Google Internet search engine, that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set"
(http://en.wikipedia.org/wiki/PageRank/.) In other words, it investigates the number and the quality of both incoming and outgoing website links. It is a voting system that compares our website with other websites our page links to. The page rank of our website is estimated in a recursive loop and there are many factors that determine the final page rank value. At the end, each website gets a value from 0 up to the highest value of 10. This scale is not linear, and the difference between 4 and 5 is different from the one between 3 and 4 (Grappone 2010.)
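A commonly cited simplified form of this recursive calculation (a hedged sketch, not taken verbatim from the sources above; d is the damping factor, usually assumed to be about 0.85, T1...Tn are the pages linking to page A, and C(T) is the number of outgoing links on page T) is:
Sample
PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )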
A sample PageRank dispersion is presented in Figure 2 (http://en.wikipedia.org/wiki/PageRank.) Usually, the higher the page rank value, the higher the position a website gets in the Google search results. It is worth getting incoming links that will lead to our website, but on the other hand we should be aware of the fact that a high page rank in Google does not mean only links. Moreover, Google tends to present a page rank value that was actually counted a few months ago (the current one is confidential.)
To optimize a website for Google, we should get acquainted with and use all the webmaster tools prepared by Google (http://www.google.pl/webmasters/.)
5.2 Websites optimization for MSN
Ledford (2009) indicates that the MSN.com site currently uses the Microsoft Live search engine, although it is still possible to use the MSN search engine (Microsoft Live is the brand name of this technology.) In order to get a high position in the MSN results list, we should abide by the fundamental rules of organic SEO optimization. MSN does not allow sponsored increases in page ranks. One feature requires our attention: this search engine pays more attention to the freshness and dynamics of the website's textual content than other engines do. It means we should consider some SEO strategies dealing with how to effectively process the website texts.
Similarly to Google, MSN has introduced its own algorithms to index the Internet and it also has its own rules that we should abide by when optimizing a website. All these principles can be found on the MSN.com site by searching for "Site Owner Help" (Ledford 2009.)
The MSN algorithms search for clue keywords in meta tags, subpage titles and the textual content at the top of the HTML code. So we should combine relevant information with keywords at the beginning of each subpage.
5.3 Websites optimization for Yahoo!
The Yahoo! search engine also differs from Google and MSN. It concentrates on keyword density and keyword occurrence in URL links, as well as in title tags. We can achieve successful SEO optimization in the Yahoo! results list by assorting good keywords with our website.
The Yahoo! crawler is called SLURP and checks keyword density to estimate the page ranking (Ledford 2009.) According to Yahoo!, the optimal dispersion of keywords equals:
1. 15-20% in the title tag. The title content is displayed in the Yahoo! results list
2. 3% in the textual content within the body tag
3. 3% in the keywords and description meta tags
Yahoo! also analyzes incoming links.
5.4 Page rank fluctuations
We should also be aware of the fact that successful SEO optimization does not only mean a high position in the results list. What also counts is simply the quality of the textual content, as well as the usability of the whole website.
The website position within the results list may fluctuate a bit, and this happens independently of the SEO optimization that is being carried out. Grappone (2010) lists some of the reasons for that:
1. Activity of competing companies. Sometimes successful SEO results from the laziness of our competition
2. Functioning of the server that our website is placed on. If web crawlers re-visit a website that is switched off (because our server does not work at that time), it will lower the ranking position at least until they index the page again
3. In order to be able to store all the indexed information, search engines use different databases. Each of them may return slightly different results for the same search query. The current position of our website may depend on the database currently chosen by the search engine
4. Search engine algorithms are constantly modified, improved and changed. Algorithms are patterns according to which search engines organize data. We may never be sure what actions should be conducted. Grappone (2010) notes that generally "good HTML titles, good homepage textual content and removing all the obstacles that hinder crawlers' indexation" (trans. P.G.) are the key to successful SEO optimization
6. Conclusion
SEO optimization is a lasting process that includes many SEO techniques and strategies. Such campaigns require patience, as we will see the effects of our optimization efforts only after a few weeks or months. We never really know exactly what actions should be performed to achieve a high page rank value for our websites. Actually, a high page rank value is the main purpose that SEO optimization aims at. Each on-line website already has its own page rank estimation. It is not possible to guess the details of crawlers' algorithms, especially since search engines seem to verify, develop and adjust their criteria very often. Unfortunately, the current estimation of a website's page rank value is hidden and we are always provided with a value that was estimated by search engines some time ago.
Although search engines differ in estimating the page rank, what undoubtedly counts is building websites in an organic way and using only ethical SEO techniques. It never pays off to bother with unethical strategies, as this usually results in penalizing the website or even in its exclusion from the results list.
SEO optimization focuses mainly on the considered organization of website keywords and on sets of incoming, outgoing and inner links. Search engine crawlers do explore and check connections among Internet websites and try to categorize them. Inserting relevant keywords within proper tags and links, and embracing them with proper textual content, helps crawlers index and categorize the website and thus increases the page rank value.
Conducting SEO optimization is a kind of knowledge and requires some experience that can be gained only in the course of time.
Examination of the PJWSTK website HTML
code. Shifting to XHTML 1.0 Transitional
Standards.
1. Introduction
This part of my diploma work includes an examination of the website of the Polish-Japanese Institute of Information Technology (http://www.pjwstk.edu.pl.) The aim of the research is to verify whether the whole HTML code of the website is prepared in a way that sticks to current W3C standards and whether it includes all major SEO optimization methods and techniques. Additionally, the goal of the inquiry is to answer the following crucial questions: are the separate subpages built structurally and semantically in a way that is friendly to search engine crawlers? Do they take into consideration search engines' needs? Do they feed crawlers with the key information that they search for when indexing the website? Do they generate high page rank values for the website in the most common search engines' results lists?
I have decided to verify some chosen website pages: the homepage and three separate subpages: "opłaty", "studia I-go stopnia - informatyka" and "zasady rekrutacji". Because other subpages structurally and semantically resemble the chosen ones, all the given adjustments and proposals can be implemented across the entire website in other places.
The whole examination is presented in three-column tables below. The headers of the tables inform us what section the examination concerns and where the source of the code comes from. The first column includes chosen parts of the current HTML code that were copied from a browser. The second column includes the same code reorganized and improved so that it fully fulfils the current W3C standards. The third and last column lists all the explanations of why the current HTML code is not correct semantically or structurally, why particular changes have been implemented and how all these improvements are perceived by search engine spiders.
2. Examination of the PJWSTK HTML website code
HOMEPAGE HTML METATAGS
HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011
Current website HTML code
Shifting to XHTML 1.0 Transitional Code
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML
4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html><head>
<meta http-equiv="Content-Type"
content="text/html; charset=ISO-8859-2">
<title>PJWSTK. Polsko-Japońska Wyższa Szkoła
Technik Komputerowych.</title>
<meta name="Language" content="pl">
<meta http-equiv="pragma" content="nocache">
<meta name="Classification"
content="Education">
<meta name="revisit-after" content="7 day">
<meta name="Description" content=
"Najlepsza niepubliczna wyższa szkoła
informatyczna w Polsce">
<meta name="Keywords" content=
"informatyka, japonistyka, kultura japonii,
programowanie, grafika, sztuka, magisterskie,
inżynierskie, zaoczne, uczelnia, Informatyka
po ekonomii, Informatyka po socjologii,
Informatyka po marketingu, Informatyka po
zarządzaniu, Informatyka po psychologii,
Informatyka po kulturoznawstwie, Informatyka
po dziennikarstwie, Informatyka po kierunkach
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<base href="http://www.pjwstk.edu.pl/" />
<title>PJWSTK. Polsko-Japońska Wyższa Szkoła
Technik Komputerowych.</title>
<meta name="description" content=" Najlepsza
niepubliczna wyższa szkoła informatyczna w Polsce" />
<meta name="keywords" content="informatyka,
japonistyka, kultura japonii, programowanie, grafika,
sztuka, magisterskie, inżynierskie, zaoczne, uczelnia,
Informatyka po ekonomii, Informatyka po socjologii,
Informatyka po marketingu, Informatyka po
zarządzaniu, Informatyka po psychologii, Informatyka
po kulturoznawstwie, Informatyka po dziennikarstwie,
Informatyka po kierunkach humanistycznych" />
<meta http-equiv="Content-type"
content="text/html; charset=utf-8" />
<meta http-equiv="Content-Language" content="pl"
/>
<meta name="robots" content="index,follow" />
<meta name="revisit-after" content="7 days" />
<meta http-equiv="expires" content="0" />
Change Explanation
- The HTML 4.01 website code is outdated and is not used in current W3C standards
- When re-building the website it is advised to shift from the HTML into the XHTML structure
- The main DOCTYPE declaration should indicate that the whole website HTML code is based on current W3C standards, e.g. XHTML 1.0 Transitional
- It is easier to use UTF-8 encoding rather than ISO-8859-2 in terms of textual content
- Each <head> metatag should be closed with "/>"
- The important ROBOTS metatag is missing; it indicates whether the website should be indexed (or not) by crawlers
- The <base> tag is missing
- Some other important metatags are missing, like EXPIRES, DISTRIBUTION, LANGUAGE
- Overloading with keywords - there are too many keywords listed within the KEYWORDS metatag
- There is no link added for the favicon.ico icon (an image that appears to the left within the URL browser address)
- As there may be many CSS files separately prepared for each website page, it is advised to gather all these CSS files in one CSS folder
The same suggestions concern other pages and subpages.
humanistycznych">
<meta http-equiv="Generator"
content="TigerII MiniPad (C)2001">
<link rel="stylesheet" type="text/css"
href="main.css">
<!--<script type="text/javascript"
src="http://tomproj.yum.pl/clicksCounter/js/f
ull.js"></script>-->
<script type="text/javascript">
siteCode =
"94011da7317156a6b02433e9c61d9e2a";
</script>
</head>
<meta name="distribution" content="Global" />
<meta name="Language" content="pl" />
<meta name="Author" content="[author name]" />
<meta name="copyright" content="Copyright (c)
PJWSTK" />
<link rel="stylesheet" type="text/css" href="css/main.css" />
<link rel="shortcut icon" href="i/favicon.ico" />
<script type="text/javascript"
src="http://tomproj.yum.pl/clicksCounter/js/full.js"><
/script>
<script type="text/javascript">siteCode =
"94011da7317156a6b02433e9c61d9e2a";
</script>
</head>
HOMEPAGE LOGO
HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011
Current website HTML code
Shifting to XHTML 1.0 Transitional Code
<a href="?"><img border=0 ALT="PJWSTK"
src="i/PJWSTK_logo.gif"></a>
<a href="http://www.pjwstk.edu.pl" title="strona domowa PJWSTK"><img src="i/PJWSTK_logo.gif" alt="PJWSTK" /></a>
Change Explanation
- The website logo should be implemented as a link that leads to the homepage (a starting point for visitors that get lost), which is correct
- The <a> link misses its TITLE attribute, which includes important information for search engine crawlers
- Tag attribute names should not be capitalized
- The image border should be switched off within the CSS style sheet files; the example below gets rid of all borders on images included within links:
A IMG {border:0}
Anyway, if the border attribute remains, it should be surrounded with quotation marks, e.g.: <img src="i/pjwstk_logo.gif" border="0" />
The same suggestions concern other pages and subpages.
HOMEPAGE GENERAL HTML STRUCTURE
HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011
Current website HTML code
Shifting to XHTML 1.0 Transitional Code
<div id="wrapper">
<div id="naglowek">…</div>
<div id="pierwsza">…</div>
<div id="druga">…</div>
<div id="trzecia">…</div>
<div id="czwarta">…</div>
</div>
this is correct, no suggestion needed
Change Explanation
- The whole textual content of the homepage is divided into top and middle sections. All the sections are positioned with the help of <DIV> elements (not table rows), which is correct
- The formatting of each DIV box is defined in the CSS file, which is also correct
HOMEPAGE TOP LINKS - Warszawa, Gdańsk, Bytom
HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011
Current website HTML code
<table width=750 cellpadding=0 cellspacing=0><tr><td><img ALT="PJWSTK" src="i/PJWSTK.gif">
<a href="http://www.pjwstk.edu.pl">
<img ALT="Warszawa" style="margin:0px;margin-bottom:10px;" src="i/PJWSTK_Warszawa_1.gif" onmouseover="src='i/PJWSTK_Warszawa_2.gif'" onmouseout="src='i/PJWSTK_Warszawa_1.gif'" border="0"></a>
<a target="_blank" href="http://gdansk.pjwstk.edu.pl"><img ALT="Gdańsk" style="margin:0px;" src="i/PJWSTK_Gdansk_1.gif" onmouseover="src='i/PJWSTK_Gdansk_2.gif'" onmouseout="src='i/PJWSTK_Gdansk_1.gif'" border="0"></a>
<a target="_blank" href="http://bytom.pjwstk.edu.pl">
<img ALT="Bytom" style="margin:0px;" src="i/PJWSTK_Bytom_1.gif" onmouseover="src='i/PJWSTK_Bytom_2.gif'" onmouseout="src='i/PJWSTK_Bytom_1.gif'" border="0"> </a>
Shifting to XHTML 1.0 Transitional Code
<ul id="topmenu">
<li>
<a href="http://www.pjwstk.edu.pl" title="PJWSTK Warszawa"><img src="i/PJWSTK_Warszawa_1.gif" alt="Warszawa" /></a>
<a href="http://www.pjwstk.edu.pl" title="PJWSTK Warszawa">Warszawa</a>
</li>
<li>
<a href="http://www.gdansk.pjwstk.edu.pl" title="PJWSTK Gdańsk"><img src="i/PJWSTK_Gdansk_1.gif" alt="Gdańsk" /></a>
<a href="http://www.gdansk.pjwstk.edu.pl" title="PJWSTK Gdańsk">Gdańsk</a>
</li>
<li>
<a href="http://www.bytom.pjwstk.edu.pl" title="PJWSTK Bytom"><img src="i/PJWSTK_Bytom_1.gif" alt="Bytom" /></a>
<a href="http://www.bytom.pjwstk.edu.pl" title="PJWSTK Bytom">Bytom</a>
</li>
</ul>
Change Explanation
- Avoid using table cells; use a <UL>/<LI> list instead
- The <li> elements can be positioned, formatted and accessed within a CSS style sheet file, for example:
#topmenu {list-style:none}
#topmenu LI {float:left}
- Do not use style formatting directly in the website code, as all the formatting ought to be defined in the CSS file
- Each <a> link should include a TITLE attribute, which is missing in the current HTML code
- The <IMG> tags include ALT attributes, which is correct, but ALT should be lower case
- The <IMG /> tags ought to be closed with "/>"
- If there are images used next to links, these images should also be converted into links; the proposed <li> elements include such links
- Important keywords (like Warszawa, Gdańsk, Bytom) should be text links (not images), so that the links' textual content (and the keywords within it) can be accessed by search engine crawlers
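The current code obtains its rollover effect by swapping images with onmouseover/onmouseout JavaScript. If that effect is to be kept after the shift to the <ul> menu, it can be reproduced purely in CSS; the sketch below is only an illustration that assumes the existing _1/_2 image pairs and the #topmenu id proposed above, while the class name and dimensions are invented for this example:
#topmenu {list-style:none; margin:0; padding:0}
#topmenu li {float:left}
/* the link itself carries the normal menu image as its background, so the inline <img> is no longer needed */
#topmenu li a.warszawa {display:block; width:120px; height:40px; background: url(i/PJWSTK_Warszawa_1.gif) no-repeat 0 0}
/* on hover the highlighted variant is shown, with no JavaScript involved */
#topmenu li a.warszawa:hover {background-image: url(i/PJWSTK_Warszawa_2.gif)}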
HOMEPAGE „UCZELNIA” SECTION LINKS
HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011
Current website HTML code
<div class="r3">
<ul>
<li><a href='?strona=1594'>Władze</a></li>
<li><a href='?strona=1593'>Historia</a></li>
<li><a href='?kat=204'>Biblioteka</a></li>
<li><a href='?kat=205'>Wydawnictwo</a></li>
<li><a href='?kat=243'>Jednostki</a></li>
</ul>
</div>
Shifting to XHTML 1.0 Transitional Code
<div class="r3">
<ul>
<li><a href="/wladze" title="Władze PJWSTK">Władze</a></li>
<li><a href="/historia" title="Historia PJWSTK">Historia</a></li>
<li><a href="/biblioteka" title="Biblioteka PJWSTK">Biblioteka</a></li>
<li><a href="/wydawnictwo" title="Wydawnictwo PJWSTK">Wydawnictwo</a></li>
<li><a href="/jednostki" title="Jednostki PJWSTK">Jednostki</a></li>
</ul>
</div>
Change Explanation
- A <UL>/<LI> list is used (instead of table cells), which is correct
- Each <a> link misses its TITLE attribute, which in this section is crucial for search engines, as each link includes a very important keyword and leads to an important subpage
- The word "PJWSTK" has been added to each title attribute, so that such a link can be indexed well by crawlers; thus, a key phrase like "wydawnictwo pjwstk" entered in a search engine may lead directly to the target PJWSTK subpage
- The current link structure (e.g. "?strona=1594") indicates that the content is dynamically read from a database (which is correct), but the opened subpage URL does not include any keywords that crawlers search for; a link like http://www.pjwstk.edu.pl/?strona=1594 should be turned into http://www.pjwstk.edu.pl/wladze, which will push the "wladze" subpage up within search engine results. The same concerns all other links used in the whole website
The same concerns all the links within the following sections in the second column: "REKRUTACJA", "STUDIA I-go stopnia", "STUDIA II-go stopnia", "STUDIA III-go stopnia", "STUDIA PODYPLOMOWE."
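Turning the "?strona=1594" style links into keyword URLs such as /wladze also has to be supported on the server side, so that the new addresses actually resolve to the existing dynamic pages. The snippet below is only a sketch under two assumptions that this audit does not verify: that the site runs on an Apache server with mod_rewrite available, and that the dynamic pages are produced by an index.php-style front controller (the real script name may differ):
# .htaccess sketch (assumed: Apache + mod_rewrite)
RewriteEngine On
# keyword URL -> existing dynamic parameter
RewriteRule ^wladze/?$ index.php?strona=1594 [L,QSA]
RewriteRule ^historia/?$ index.php?strona=1593 [L,QSA]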
HOMEPAGE „PORTALE” SECTION LINKS
HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011
Current website HTML code
<div class="r2">
<img alt="" src="i/linia_216px.gif">
<p><a target="_blank" href="http://samorzad.pjwstk.edu.pl/"><img border=0 src="i/samorzad.gif" ALT="samorzad"></a>
<a target="_blank" href="http://www.biurokarier.pjwstk.edu.pl/"><img src="i/biuro.gif" ALT="biuro karier"></a>
…
<BR>
</div>
Shifting to XHTML 1.0 Transitional Code
<div class="r2">
<a href="http://samorzad.pjwstk.edu.pl/" class="graylink" title="samorząd" target="_blank">Samorząd</a>
<a href="http://biurokarier.pjwstk.edu.pl/" class="graylink" title="biuro karier" target="_blank">Biuro Karier</a>
…
<br />
</div>
Change Explanation
- There is no point in repeating the "linia_216px.gif" IMG tag each time, as such an arrow can be used as the background image of each <A> link. This will lower the HTML weight of the website and bring the same final result. A sample CSS definition of such a link can look like this:
.r2 A {padding-left:15px;display:block;background: url(i/linia_216px.gif) no-repeat 0 0}
- It is not advised to use image links that include important keywords, as image content cannot be accessed and indexed by crawlers. It is better to change them into text links
- The structure of the URL links is correct, as these are subdomains
- The <BR> element should be closed with "/>"
The same concerns the section "LOGOWANIE".
HOMEPAGE OTHER SECTION LINKS – e.g. „REKRUTACJA”
HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011
Current website HTML code
<div class="r1">
<img alt="" src="i/strzalka.gif">
<a href="#">
<img style="margin:0px;" src="i/rekrutacja_1.gif" onmouseover="src='i/rekrutacja_2.gif'" onmouseout="src='i/rekrutacja_1.gif'" border="0" alt="">
</a>
</div>
<div class="r2">
<img alt="" src="i/linia_216px.gif">
<p>
</div>
Shifting to XHTML 1.0 Transitional Code
<h1><a href="http://www.rekrutacja.pjwstk.edu.pl/" title="rekrutacja">Rekrutacja</a></h1>
OR
<h1>Rekrutacja</h1>
Change Explanation
- The whole DIV content can be substituted with an H1 element and adequate CSS formatting. The left red arrow, as well as the dots below the inscription, can be designed as one image that is used as the background of the H1 element, for instance:
<h1>Rekrutacja</h1>
H1 A {font-size:14px;display:block;font-weight:bold;background: url(i/h1.gif) no-repeat 0 0}
- It would be better to make the section name link lead to a separate subpage devoted to recruitment; such a subpage should include some textual content filled with adequate keywords connected with recruitment; in this case, the link structure would be:
<a href="http://www.rekrutacja.pjwstk.edu.pl/" title="rekrutacja">Rekrutacja</a>
- All section names are incorporated into the website as images, which is not correct, as they are very important keywords and should be in a textual form
The same concerns all other section names, especially "STUDIA I-go stopnia", "Studia II-go stopnia", "Studia III-go stopnia", as well as "STUDIA PODYPLOMOWE." Less important section names should be enclosed within an H2 element.
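A possible CSS counterpart for the plain <h1>Rekrutacja</h1> variant mentioned above, assuming the arrow-and-dots graphic is saved as one image (the file name i/h1.gif comes from the example above; the padding and sizes are invented for illustration only):
/* the red arrow and the dots live in the background image, the keyword stays as text */
h1 {font-size:14px; font-weight:bold; margin:0; padding-left:20px; background: url(i/h1.gif) no-repeat 0 50%}
h1 a {display:block; text-decoration:none}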
HOMEPAGE NEWS SECTION – e.g. „Wydarzenia”
HTML source: http://www.pjwstk.edu.pl/, retrieved July 2011
Current website HTML code
<div class="r2">
<P><font color="#000000"><b>Wydarzenia</b></font></P>
<img alt="" src="i/linia_szara_216px.gif">
<P><a href="?strona=2113">Spotkanie z Japonią - recital fortepianowy</a> </P>
<img alt="" src="i/linia_szara_216px.gif">
Shifting to XHTML 1.0 Transitional Code
<div class="news">
<h2><a href="/news" title="wydarzenia">Wydarzenia</a></h2>
<a class="title">Spotkanie z Japonią – recital fortepianowy</a>
<span class="text">Ambasada Japonii zaprasza na recital fortepianowy Tempei Nakamura 中村天平, który odbędzie się 8 lipca 2011 r. o godz. 19:00, w Auli Głównej PJWSTK</span>
</div>
Change Explanation
- The section name „AKTUALNOŚCI” (or "Wydarzenia") should be linked (composed in the way explained above) and lead to a separate subpage including the full news list
- As crawlers browse website textual content to find keywords, it is advised to give some leading (introductory) text below each news title. Such a leading text should consist of relevant keywords important to crawlers
- Giving the website some dynamics (in terms of changing its textual content on average every 3 months) also pays off and is regarded as an advantage by search engines
- News titles are incorporated within links, which is correct, but these links do not include any keywords; thus, they do not help crawlers' indexation
- The proposed SEO links (in the shifting column) include important keywords, which help indexation
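A hypothetical CSS counterpart for the proposed .news block, reproducing in the style sheet the bold black heading and the grey separator line that the current code builds with <font>, <b> and the linia_szara_216px.gif image (the class names follow the proposal above; the exact values are illustrative only):
.news h2 {font-size:14px; color:#000000; font-weight:bold; border-bottom:1px solid #cccccc}
.news a.title {display:block; font-weight:bold}
.news span.text {display:block; margin-bottom:10px}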
SUBPAGE “OPŁATY” HTML METATAGS
HTML source: http://www.pjwstk.edu.pl/?strona=1604, retrieved July 2011
Current website HTML code
<title>PJWSTK. Polsko-Japońska Wyższa Szkoła Technik Komputerowych.</title>
<meta name="Language" content="pl">
<meta name="Description" content="Najlepsza niepubliczna wyższa szkoła informatyczna w Polsce">
<meta name="Keywords" content="informatyka, japonistyka, kultura japonii, programowanie, grafika, sztuka, magisterskie, inżynierskie, zaoczne, uczelnia, Informatyka po ekonomii, Informatyka po socjologii, Informatyka po marketingu, Informatyka po zarządzaniu, Informatyka po psychologii, Informatyka po kulturoznawstwie, Informatyka po dziennikarstwie, Informatyka po kierunkach humanistycznych">
Shifting to XHTML 1.0 Transitional Code
<title>Opłaty - PJWSTK. Polsko-Japońska Wyższa Szkoła Technik Komputerowych.</title>
<meta name="Description" content="Opłaty PJWSTK. Najlepsza niepubliczna wyższa szkoła informatyczna w Polsce" />
<meta name="Keywords" content="opłaty, informatyka, japonistyka, kultura japonii, programowanie, grafika, sztuka, magisterskie, inżynierskie, zaoczne, uczelnia, Informatyka po ekonomii, Informatyka po socjologii, Informatyka po marketingu, Informatyka po zarządzaniu, Informatyka po psychologii, Informatyka po kulturoznawstwie, Informatyka po dziennikarstwie, Informatyka po kierunkach humanistycznych" />
Change Explanation
Except for all the hints that have already been mentioned for the homepage head section above, it would be advised to incorporate the following suggestions:
- As this is the "opłaty" subpage, it is strongly advised to add the word "opłaty" to the textual content of this subpage's TITLE. It will help crawlers index it for payments
- The best URL for this subpage would be http://www.pjwstk.edu.pl/oplaty-pjwstk, or even better http://www.pjwstk.edu.pl/oplaty
- As this subpage is devoted to payments, the keyword "opłaty" is the most crucial one and should be added either to the KEYWORDS metatag or to the DESCRIPTION metatag; the KEYWORDS metatag would be even more appropriate
- There are too many keywords listed after commas within the KEYWORDS metatag; there should not be more than 10 keywords listed
- Despite the fact that this subpage is devoted to payments, none of the listed keywords concerns payments at all
SUBPAGE „OPŁATY” - GENERAL HTML STRUCTURE
HTML source: http://www.pjwstk.edu.pl/?strona=1604, retrieved July 2011
Current website HTML code
<div id="wrapper">
<div id="naglowek">…</div>
<div id="pierwsza">…</div>
<div id="środkowa">…</div>
<div id="czwarta">…</div>
</div>
Shifting to XHTML 1.0 Transitional Code
this is correct, no suggestion needed
Change Explanation
- The whole textual content of the subpage is divided into top and middle sections. All the sections are positioned with the help of <DIV> elements (not table rows), which is correct
- Formatting of each DIV box is defined in the CSS file, which is also correct
The same boxes division is used in other subpages.
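For completeness, a hypothetical sketch of how such a DIV-based layout is typically positioned in the style sheet; the widths and the float-based arrangement below are illustrative assumptions only, as the actual main.css of the website is not quoted in this audit:
#wrapper {width:980px; margin:0 auto} /* centred page container */
#naglowek {width:100%} /* header box */
#pierwsza {float:left; width:216px} /* left column */
#środkowa {float:left; width:520px} /* middle column with the textual content */
#czwarta {float:right; width:216px} /* right column */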
SUBPAGE “OPŁATY” - NAVIGATION
HTML source: http://www.pjwstk.edu.pl/?strona=1604, retrieved July 2011
Current website HTML code
<p><a href="http://www.pjwstk.edu.pl/?">Strona główna</a> &gt;&gt; <a href="http://www.pjwstk.edu.pl/?kat=189">Rekrutacja</a><h1>Opłaty</h1><p></p><p>
<p>Kolegium Rektorskie PJWSTK wprowadziło obniżkę czesnego dla osób wpłacających czesne w jednej racie rocznej (w wysokości 7%) lub w dwóch ratach semestralnych (w wysokości 3%).</p>
<p>&nbsp;</p>
Shifting to XHTML 1.0 Transitional Code
<div id="navigation">
<a href="http://www.pjwstk.edu.pl" title="strona główna">Strona główna</a> &gt;&gt; <a href="http://www.pjwstk.edu.pl/rekrutacja" title="rekrutacja">Rekrutacja</a>
<h1>Opłaty</h1>
</div>
<p class="text">Kolegium Rektorskie PJWSTK wprowadziło obniżkę czesnego dla osób wpłacających czesne w jednej racie rocznej (w wysokości 7%) lub w dwóch ratach semestralnych (w wysokości 3%).</p>
Change Explanation
- The navigation links miss their TITLE attributes; in this case, it is most appropriate to fill the attributes with the keyword "opłaty", as the whole subpage is devoted to payments. This would help crawlers index this subpage
- All new lines within the current HTML code are generated with the "<p>" tag, which is not correct, as this generates additional useless HTML code that has to be downloaded each time from the PJWSTK server. It is always advised to format text content within CSS style sheet files. CSS formatting makes the HTML code more legible and helps avoid overloading servers. For example, the following CSS "#navigation" declaration gets rid of the "<p>"s from the header, while the ".text" class removes the "<p>&nbsp;</p>"s from the XHTML code:
#navigation {margin:10px 0px;width:100%}
.text {margin-bottom:20px}
- The crucial keyword "opłaty" for this subpage has been enclosed within the H1 tag, which is correct; search engine crawlers look for the most important keywords within the H1 and H2 tags
- The navigation link leading to the recruitment subpage should include the keyword "rekrutacja" within its textual content, which is missing; an example of such a link would be: http://www.pjwstk.edu.pl/rekrutacja/
SUBPAGE “OPŁATY” – LEFT SECTION “REKRUTACJA”
HTML source: http://www.pjwstk.edu.pl/?strona=1604, retrieved July 2011
Current website HTML code
<div class="r3"><ul>
<li><a href="http://www.pjwstk.edu.pl/?strona=1604">Opłaty</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=1606">Kursy przygotowawcze i maturalne</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=2096">Transfer z innych uczelni</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=1605">Szczegółowe zasady rekrutacji na kierunki artystyczne</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=1603">Rejestracja on-line</a></li>
<li><a href="http://www.pjwstk.edu.pl/?kat=223">Zasady rekrutacji</a></li>
</ul></div>
Shifting to XHTML 1.0 Transitional Code
<div class="r3">
<ul>
<li><a href="http://www.pjwstk.edu.pl/oplaty" title="opłaty">Opłaty</a></li>
<li><a href="http://www.pjwstk.edu.pl/kursy-przygotowawcze-i-maturalne" title="kursy przygotowawcze i maturalne">Kursy przygotowawcze i maturalne</a></li>
<li><a href="http://www.pjwstk.edu.pl/transfer-z-innych-uczelni" title="transfer z innych uczelni">Transfer z innych uczelni</a></li>
<li><a href="http://www.pjwstk.edu.pl/szczegolowe-zasady-rekrutacji-na-kierunki-artystyczne" title="Szczegółowe zasady rekrutacji na kierunki artystyczne">Szczegółowe zasady rekrutacji na kierunki artystyczne</a></li>
<li><a href="http://www.pjwstk.edu.pl/rejestracja" title="rejestracja">Rejestracja on-line</a></li>
<li><a href="http://www.pjwstk.edu.pl/zasady-rekrutacji" title="zasady rekrutacji">Zasady rekrutacji</a></li>
</ul>
</div>
Change Explanation
- Section links are listed with <UL>/<LI>, which is correct
- All links miss keywords, which would help crawlers index the subpages for these words. SEO friendly links have been proposed in the second column, though
- All links miss TITLE attributes; these titles ought to be filled with appropriate keywords, which has also been proposed in the second column
- After one of the links is clicked, the opened subpage should include textual content connected with the keywords incorporated within the links and TITLE attributes; since search engine crawlers verify subpage textual content against the used keywords, that would increase the subpage's page rank significantly
- When composing SEO friendly link names, it is worth considering first what search phrase users would type in to find the information the link leads to; having found the most appropriate keywords or key phrase, it is worth inserting these words within the SEO link
SUBPAGE “OPŁATY” – LEFT COLUMN IMAGE LINKS
HTML source: http://www.pjwstk.edu.pl/?strona=1604, retrieved July 2011
Current website HTML code
<div style="margin-top:0px;" class="r1">
<img alt="" src="oplaty_pliki/strzalka.gif" style="">
<a href="http://www.pjwstk.edu.pl/?kat=187"><img style="margin: 0px;" src="oplaty_pliki/uczelnia_1.gif" onmouseover="src='i/uczelnia_2.gif'" onmouseout="src='i/uczelnia_1.gif'" alt="" border="0"></a></div>
<div class="r2"><img alt="" src="oplaty_pliki/linia_216px.gif"><p></p></div>
…
<p>
<a href="http://samorzad.pjwstk.edu.pl/"><img alt="samorzad" src="oplaty_pliki/samorzad.gif" border="0"></a><br>
<a href="http://www.biurokarier.pjwstk.edu.pl/"><img alt="biurokarier" src="oplaty_pliki/biuro.gif"></a><br>
</p>
Shifting to XHTML 1.0 Transitional Code
<div class="r1">
<a href="http://www.pjwstk.edu.pl/oplaty" class="main" title="opłaty PJWSTK"></a>
…
<a href="http://www.samorzad.pjwstk.edu.pl/" title="Samorząd" class="link">Samorząd</a>
<a href="http://www.biurokarier.pjwstk.edu.pl/" title="Biuro Karier i Praktyk" class="link">Biuro Karier i Praktyk</a>
…
</div>
Change Explanation
- SEO friendly links that include appropriate keywords have been proposed, which will help search engine crawlers index the subpage
- It is possible to significantly abbreviate the current HTML code by formatting the text within CSS style sheet files; changing link background images can also be done via CSS styles. Such an operation gets rid of all the useless HTML code (seen in the current code) which now has to be downloaded from the server each time; for the proposed XHTML code, such CSS formatting would look more or less like the following:
.r1 A.main {display:block;width:100%;height:30px;background: url(i/mainlink.gif) no-repeat left bottom;text-decoration:none}
.r1 A.link {color:gray;text-decoration:none;font-size:14px;font-weight:bold;width:100px;padding-left:30px;background: url(i/link.gif) no-repeat left bottom}
.r1 A.link:hover {color:red;text-decoration:none;font-size:14px;font-weight:bold;width:100px;padding-left:30px;background: url(i/link.gif) no-repeat left bottom}
SUBPAGE “STUDIA I STOPNIA - INFORMATYKA” HTML METATAGS AND CONTENT OVERVIEW
HTML source: http://www.pjwstk.edu.pl/?kat=209, retrieved July 2011
Current website HTML code
<title>PJWSTK. Polsko-Japońska Wyższa Szkoła Technik Komputerowych.</title>
<meta name="Description" content="Najlepsza niepubliczna wyższa szkoła informatyczna w Polsce">
<meta name="Keywords" content="informatyka, japonistyka, kultura japonii, programowanie, grafika, sztuka, magisterskie, inżynierskie, zaoczne, uczelnia, Informatyka po ekonomii, Informatyka po socjologii, Informatyka po marketingu, Informatyka po zarządzaniu, Informatyka po psychologii, Informatyka po kulturoznawstwie, Informatyka po dziennikarstwie, Informatyka po kierunkach humanistycznych">
<meta http-equiv="Generator" content="TigerII MiniPad (C)2001">
Shifting to XHTML 1.0 Transitional Code
<title>Informatyka - PJWSTK. Polsko-Japońska Wyższa Szkoła Technik Komputerowych.</title>
<meta name="Description" content="Informatyka. Studia I-go stopnia na PJWSTK. Najlepsza niepubliczna wyższa szkoła informatyczna w Polsce">
<meta name="Keywords" content="informatyka studia I-ego stopnia, informatyka studia pierwszego stopnia, informatyka pierwszego stopnia, japonistyka, kultura japonii, programowanie, grafika, sztuka, inżynierskie, zaoczne, uczelnia, informatyka po ekonomii, Informatyka po socjologii, Informatyka po marketingu, Informatyka po zarządzaniu, Informatyka po psychologii, Informatyka po kulturoznawstwie, Informatyka po dziennikarstwie, Informatyka po kierunkach humanistycznych">
<meta http-equiv="Generator" content="TigerII MiniPad (C)2001">
Change Explanation
- The keywords "Informatyka" and "I-ego stopnia" are the most crucial for this section, but they are missing in the subpage metatags (TITLE, KEYWORDS or DESCRIPTION)
- It is advised to brainstorm and list appropriate keywords or key phrases for each PJWSTK subpage and include them in the section metatags. Crawlers try to associate metatag information with the website textual content and store it all together in search engine databases. When indexing, they examine whether all this information fits together in order to give this association some page rank value
- The currently used keywords are not really adequate for the "informatyka – I-ego stopnia" specialization; there are keywords that should not appear here, e.g. "magisterskie" – this keyword is more appropriate for the "informatyka – II-go stopnia" specialization and ought to be indexed there
- The whole textual content of this subpage consists mostly of links, which is not quite correct, e.g. the middle column provides only a set of links. Apart from them, there is no textual content suitable for this section (meaning a textual description, without links, that would mix keywords in various forms appropriate for this specialization)
SUBPAGE “STUDIA I STOPNIA - INFORMATYKA” - NAVIGATION
HTML source: http://www.pjwstk.edu.pl/?kat=209, retrieved July 2011
Current website HTML code
<a href="http://www.pjwstk.edu.pl/?">Strona główna</a> &gt;&gt; <a href="http://www.pjwstk.edu.pl/?kat=206">Studia I-go stopnia</a> &gt;&gt; <a href="http://www.pjwstk.edu.pl/?kat=209">Informatyka</a><h1>Informatyka</h1>
Shifting to XHTML 1.0 Transitional Code
<a href="http://www.pjwstk.edu.pl/" title="strona główna">Strona główna</a> &gt;&gt; <a href="http://www.pjwstk.edu.pl/studia-i-stopnia" title="Studia I-go stopnia">Studia I-go stopnia</a> &gt;&gt; <h1>Informatyka</h1>
Change Explanation
- Links miss their TITLE attributes, which ought to include keywords appropriate for this section
- The section name "informatyka" has been inserted into the H1 tag, which is correct
- Links are not SEO friendly because they are not composed of any keywords appropriate for this section; the information "kat=206" gives crawlers no direction on how to index this HTML part
SUBPAGE “STUDIA I STOPNIA - INFORMATYKA” – MIDDLE BOX „INFORMATYKA”
HTML source: http://www.pjwstk.edu.pl/?kat=209, retrieved July 2011
Current website HTML code
<li><a href="http://www.pjwstk.edu.pl/?strona=1673">Specjalizacje</a></li>
…
<li><a href="http://www.pjwstk.edu.pl/?strona=1675">Program nauczania - studia dzienne</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=1677">Studia Otwarte (internetowe)</a></li>
Shifting to XHTML 1.0 Transitional Code
<li><a href="http://www.pjwstk.edu.pl/informatyka/i-go-stopnia/specializacje" title="informatyka I-go stopnia - specializacje">Specjalizacje</a></li>
<li><a href="http://www.pjwstk.edu.pl/informatyka/i-go-stopnia/studia-dzienne/program-nauczania" title="Informatyka I-go stopnia studia dzienne program nauczania">Program nauczania - studia dzienne</a></li>
<li><a href="http://www.pjwstk.edu.pl/informatyka/i-go-stopnia/studia-internetowe" title="Informatyka I stopnia – studia internetowe">Studia Otwarte (internetowe)</a></li>
Change Explanation
- As computer science here concerns the first-level specialization, it would be better to convey this important information within the SEO links for crawlers; that is why the links in the second column have been reorganized structurally and logically. The statement "?strona=1675" says nothing valuable to search engines in terms of positioning and SEO strategies. These are appropriate places in the XHTML code to communicate with crawlers, and they should not be left without keywords
- The whole concept of how to organize information in pages and subpages should be considered before the SEO links are composed, as the words "informatyka" and "I-go stopnia" can change positions. This depends on the logic and structure that the whole PJWSTK website sticks to
SUBPAGE “STUDIA I STOPNIA - INFORMATYKA” – LEFT BOX „INFORMATYKA”
HTML source: http://www.pjwstk.edu.pl/?kat=209, retrieved July 2011
Current website HTML code
The following code has been unnecessarily doubled in the middle column:
<li><a href="http://www.pjwstk.edu.pl/?strona=1673">Specjalizacje</a></li>
…
<li><a href="http://www.pjwstk.edu.pl/?strona=1675">Program nauczania - studia dzienne</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=1677">Studia Otwarte (internetowe)</a></li>
Shifting to XHTML 1.0 Transitional Code
The following changes to the XHTML code have already been proposed in the table above:
<li><a href="http://www.pjwstk.edu.pl/informatyka/i-go-stopnia/specializacje" title="informatyka I-go stopnia - specializacje">Specjalizacje</a></li>
<li><a href="http://www.pjwstk.edu.pl/informatyka/i-go-stopnia/studia-dzienne/program-nauczania" title="Informatyka I-go stopnia studia dzienne program nauczania">Program nauczania - studia dzienne</a></li>
Change Explanation
- This code should have been placed only once in the subpage, because such repetition can be regarded as a "farm of links" by crawlers
- Apart from removing the doubled set of links, it is better to leave the links in the left column and remove them from the middle column; see the sketch below. After that, the middle column could contain textual information (mixed with appropriate keywords) for the links opened from the left column. Thus, the whole subpage would gather both links and textual content, which would be more appropriate
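Studia Otwarte">
A short XHTML sketch of the arrangement suggested above - the link list kept once in the left column and the middle column filled with keyword-rich text instead of a doubled set of links; the class name "content" and the sample sentence are invented purely for illustration:
<div class="r3"> <!-- left column: the list of links, placed only once -->
<ul>
<li><a href="/informatyka/i-go-stopnia/specializacje" title="informatyka I-go stopnia - specializacje">Specjalizacje</a></li>
</ul>
</div>
<div class="content"> <!-- middle column: textual content mixed with appropriate keywords -->
<h1>Informatyka - studia I-go stopnia</h1>
<p>Studia I-go stopnia na kierunku informatyka w PJWSTK ...</p>
</div>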
SUBPAGE “STUDIA I STOPNIA - INFORMATYKA” – LEFT COLUMN IMAGE LINKS
HTML source: http://www.pjwstk.edu.pl/?kat=209, retrieved July 2011
Current website HTML code
<div style="margin-top:9px;" class="r1">
<img alt="" src="studia_i_stopnia_informatyka_pliki/strzalka.gif" style="">
<a href="http://www.pjwstk.edu.pl/?kat=206"><img style="margin: 0px;" src="studia_i_stopnia_informatyka_pliki/studia_I_st_1.gif" onmouseover="src='i/studia_I_st_2.gif'" onmouseout="src='i/studia_I_st_1.gif'" alt="" border="0"></a>
</div>
Shifting to XHTML 1.0 Transitional Code
<div class="r1">
<ul>
<li><a href="http://www.pjwstk.edu.pl/studia-i-go-stopnia/informatyka" title="Studia I-go stopnia Informatyka">Studia I-go stopnia Informatyka</a></li>
</ul>
</div>
Change Explanation
- By placing all the text formatting within CSS files, it is possible to significantly abbreviate the HTML code
- Image content cannot be accessed by crawlers, so there is no point in making image links; these should be substituted with textual links
- When images are used, their ALT attribute has to be added and filled with adequate keywords, as crawlers verify link TITLEs and image ALTs to gather indexation information
- A different link structure has been proposed, so that the links become SEO friendly
- Links have been inserted into a <ul>/<li> list
SUBPAGE “ZASADY REKRUTACJI” HTML METATAGS AND CONTENT OVERVIEW
HTML source: http://www.pjwstk.edu.pl/?kat=223, retrieved July 2011
Current website HTML code
<title>PJWSTK. Polsko-Japońska Wyższa Szkoła Technik Komputerowych.</title>
<meta name="Description" content="Najlepsza niepubliczna wyższa szkoła informatyczna w Polsce">
<meta name="Keywords" content="informatyka, japonistyka, kultura japonii, programowanie, grafika, sztuka, magisterskie, inżynierskie, zaoczne, uczelnia, Informatyka po ekonomii, Informatyka po socjologii, Informatyka po marketingu, Informatyka po zarządzaniu, Informatyka po psychologii, Informatyka po kulturoznawstwie, Informatyka po dziennikarstwie, Informatyka po kierunkach humanistycznych">
Shifting to XHTML 1.0 Transitional Code
<title>Rekrutacja PJWSTK. Polsko-Japońska Wyższa Szkoła Technik Komputerowych</title>
<meta name="Description" content="Zasady rekrutacji PJWSTK. Informatyka na studiach I i II stopnia. Rekrutacja na kierunki Sztuka Nowych Mediów, kultura Japonii oraz studia podyplomowe" />
<meta name="Keywords" content="rekrutacja informatyka, rekrutacja japonistyka, rekrutacja kultura japonii, programowanie, grafika, sztuka, rekrutacja magisterskie, rekrutacja inżynierskie, rekrutacja PJWSTK zaoczne, rekrutacja studia podyplomowe" />
Change Explanation
- This subpage is devoted to recruitment, but no such information is mentioned within the metatags; that is why different textual content for the metatags has been proposed
- There are too many keywords listed in the KEYWORDS metatag in the current HTML code. These words do not strictly identify the subpage with recruitment
- Similarly to the "Studia I-go stopnia - informatyka" subpage, this subpage includes only sets of links and does not include any passages of text; it would be advised to combine links with text here
SUBPAGE “ZASADY REKRUTACJI” - NAVIGATION
HTML source: http://www.pjwstk.edu.pl/?kat=223, retrieved July 2011
Current website HTML code
<a href="http://www.pjwstk.edu.pl/?">Strona główna</a> &gt;&gt; <a href="http://www.pjwstk.edu.pl/?kat=189">Rekrutacja</a> &gt;&gt; <a href="http://www.pjwstk.edu.pl/?kat=223">Zasady rekrutacji</a><h1>Zasady rekrutacji</h1>
Shifting to XHTML 1.0 Transitional Code
<a href="http://www.pjwstk.edu.pl/" title="strona główna">Strona główna</a> &gt;&gt; <a href="http://www.pjwstk.edu.pl/rekrutacja" title="Rekrutacja">Rekrutacja</a> &gt;&gt; <h1>Zasady rekrutacji</h1>
Change Explanation
- Links miss their TITLE attributes, which ought to include keywords appropriate for the sections they lead to
- The section name "Zasady rekrutacji" has been inserted into the H1 tag, which is correct
- The current links are not SEO friendly because they do not carry any keywords appropriate for this section; the information "kat=189" provides nothing for search engine indexation
SUBPAGE “ZASADY REKRUTACJI” – MIDDLE COLUMN „ZASADY REKRUTACJI”
HTML source: http://www.pjwstk.edu.pl/?kat=223, retrieved July 2011
Current website HTML code
<div class="r3"><ul>
<li><a href="http://www.pjwstk.edu.pl/?strona=1604">Opłaty</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=1606">Kursy przygotowawcze i maturalne</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=2096">Transfer z innych uczelni</a></li>
<li><a href="http://www.pjwstk.edu.pl/?strona=1603">Rejestracja on-line</a></li>
</ul></div>
Shifting to XHTML 1.0 Transitional Code
<div class="r3"><ul>
<li><a href="http://www.pjwstk.edu.pl/zasady-rekrutacji/oplaty" title="opłaty">Opłaty</a></li>
<li><a href="http://www.pjwstk.edu.pl/zasady-rekrutacji/kursy-przygotowawcze-i-maturalne" title="kursy przygotowawcze i maturalne">Kursy przygotowawcze i maturalne</a></li>
<li><a href="http://www.pjwstk.edu.pl/zasady-rekrutacji/transfer-z-innych-uczelni" title="transfer z innych uczelni">Transfer z innych uczelni</a></li>
…
<li><a href="http://www.pjwstk.edu.pl/zasady-rekrutacji/rejestracja-online">Rejestracja on-line</a></li>
</ul></div>
Change Explanation
- All TITLE attributes have been added to the links
- The links have become SEO friendly and contain keywords, which increases the page rank value of this subpage and indicates to search engines what the subpage refers to
- The current link structure "?strona=1604" does not include any clue data for crawlers, so they have to guess from the surrounding textual content what section the subpage concerns
- The obstacle is that, apart from the sets of links, there is no other textual content on this subpage, which is not correct
3. Conclusion
While conducting the examination of the PJWSTK website, I have investigated and checked some chosen parts of its HTML code against current SEO optimization standards and techniques. Having considered all the crucial information that is searched for by crawlers, I have checked all the key places in the HTML code where such information was expected to be found. Because current standards of designing websites are based structurally on XHTML, all the modifications suggested in my analysis stick to these W3C directions. The aim of the examination was to verify whether the current PJWSTK HTML code has been prepared in a way that generates a high page rank value, thus pushing the website up in the results lists of the most common search engines.
After having examined the code of the home page and three other chosen subpages, I have unfortunately come to the conclusion that it has been built in a way that does not support search engine indexation and does not stick to the international W3C standards. In my analysis many changes had to be proposed for the current version of the website, and that is why, when working on a new version of the website, the following major SEO optimization strategies and suggestions should unavoidably be implemented:
1. The outdated HTML code should be replaced with XHTML. The explanation for this shift has already been elaborated on in the previous chapters of this diploma work.
2. When browsing the website, the content of the TITLE, KEYWORDS and DESCRIPTION metatags does not change at all, or changes only marginally, which does not support indexation and does not generate high page rank values. Each subpage contains different textual content and different sets of links, but this is not reflected in the metatags. There are many subpages whose textual content varies while their keyword list still remains unchanged. In many cases the chosen keywords do not suit the subpage textual content. From the crawlers' point of view this means a low page rank value.
3. The structure of the website links and images has to be repaired, as their key attributes are missing or remain without value. In almost all cases the links miss their TITLE attributes. They should be filled in with appropriate keywords, adequate to the subpage textual content, so that the whole content is cohesive. The same goes for the image ALT attributes: they should be filled in with appropriate subpage keywords. It is commonly known that checking and analyzing the contents of TITLEs and ALTs is one of the things search engine crawlers do.
4. The structure of the links is not SEO friendly and does not support indexation at all. In all cases (except for sub-domains, which naturally carry keywords in their prefixes) the link includes some database information like "?strona=123", where the number 123 is just the inner database row ID. This means nothing to crawlers, because it does not include any keywords. If a subpage is devoted to recruitment or payments, the whole URL should include such a clue keyword, but it does not.
5. There are many links, especially in the left column of the subpages, that are image links, but they should be textual ones. All the keywords implemented within images are legible to us as human beings, but image content cannot be accessed by robots. Thus, no keywords can be distinguished by crawlers. The textual content of links is another place investigated by crawlers, and with that fact in mind, the links should be prepared and implemented reasonably.
6. Almost all subpages consist only of sets of links; they do not contain any other textual content. This can be regarded as a farm of links and the whole subpage can be classified as a kind of spamming. For SEO positioning and for high page rank values, it always pays off to implement links together with plain textual content and mix them with each other.
7. Almost all text formatting is done with the help of other HTML elements, which in the end produces much unnecessary HTML code. The whole website code weighs too much and is overloaded. It has to be downloaded each time from the PJWSTK server, so it wastes the server's transfer and power. All text formatting should be done with the help of CSS style sheet files, which is one of the W3C standards and SEO optimization methods. CSS declarations purify the HTML code of needless tags and allow robots to concentrate on the pure textual content and the keywords incorporated in it.
REFERENCES
Danowski, B., Makaruk, M. (2007). Pozycjonowanie i optymalizacja stron WWW. Jak to się robi. Gliwice: Helion
Grappone, J., Couzin, G. (2010). Godzina dziennie z SEO: Wejdź na szczyty wyszukiwarek. Gliwice: Helion
Ledford, J. (2009). SEO Biblia. Gliwice: Helion
Lieb, R. (2010). Pozycjonowanie w wyszukiwarkach internetowych: Poznaj najlepsze praktyki pozycjonowania i bądź zawsze pierwszy. Gliwice: Helion
Thurow, S. (2008). Pozycjonowanie w wyszukiwarkach internetowych. Autorytety informatyki. Gliwice: Helion
Zeldman, J. (2007). Projektowanie serwisów www: Standardy sieciowe. Gliwice: Helion
ON-LINE REFERENCES
http://en.wikipedia.org/wiki/Web_crawler, retrieved March 15, 2011
http://www.w3.org/MarkUp/html-spec/, retrieved April 6, 2011
http://en.wikipedia.org/wiki/Web_standards, retrieved April 6, 2011
http://pl.wikipedia.org/wiki/HTML, retrieved April 7, 2011
http://en.wikipedia.org/wiki/Web_browser, retrieved April 10, 2011
http://www.w3.org/Consortium, retrieved April 10, 2011
http://en.wikipedia.org/wiki/XHTML, retrieved April 11, 2011
http://www.w3.org/Style/CSS/, retrieved April 22, 2011
http://www.w3.org/DOM/#what, retrieved April 22, 2011
http://www.marketleap.com/verify, retrieved May 8, 2011
http://en.wikipedia.org/wiki/Search_engine_optimization, retrieved May 10, 2011
http://en.wikipedia.org/wiki/Search_engine_marketing/, retrieved May 10, 2011
http://amrithallan.com/what-is-organic-seo/, retrieved May 10, 2011
http://en.wikipedia.org/wiki/PageRank, retrieved May 11, 2011
http://students.ceid.upatras.gr/~papagel/project/kef5_5.htm, retrieved May 13, 2011
http://en.wikipedia.org/wiki/Keyword_density, retrieved May 13, 2011
http://en.wikipedia.org/wiki/Keyword_stuffing, retrieved May 13, 2011
http://www.google.pl/webmasters, retrieved May 13, 2011
http://www.w3schools.com/TAGS/tag_hn.asp, retrieved May 18, 2011
http://www.mattcutts.com/blog/seo-advice-url-canonicalization/, retrieved May 19, 2011