Webometrics and SEO

Information Retrieval: Other Issues & SEO
-1-
IR Issues
• So is that all there is to (text) IR … word
match, weighting, network topology and black
magic? Is that search? No way!
– Word senses
– Relevance feedback and query expansion
– Fusion of retrieval
– Distributed search
– Sub-document retrieval
– Question answering
-2-
Sheep Attacks Rocket
?
-3-
Nut screws washer and bolts
-5-
Word Sense Indexing
• Formats of dictionaries vary but each has a short text
description of each sense of a word.
• My concise OED has the following for BAR:
(n) long piece of metal, chocolate, soap;
    strip of silver below clasp of medal as an additional distinction;
    rod/pole to fasten or confine on a window;
    immaterial restriction (entry to a pub);
    place for prisoner to stand at court;
    rail dividing off space;
    pub counter;
    place for refreshments;
(v) to fasten with a bar;
(n) large Mediterranean fish, edible;
(n) unit of pressure ~ 10^5 N/m^2
(prep) except, as in racing odds
• "bar" is also a legal exam!
• Index docs by word senses
– Parse sentence and identify the word sense
– Wouldn’t that be better ? Yes, but can’t be done.
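To get a feel for why sense identification is hard, it helps to try the simplest possible approach: pick the sense whose dictionary description shares the most words with the sentence. This is a simplified form of the Lesk algorithm, which the slides do not name. It is only a sketch in Python: the glosses are abbreviated from the BAR entry above, and the sense labels and stopword list are invented for illustration.

```python
import re

# Glosses abbreviated from the BAR entry above; the sense labels are hypothetical.
BAR_SENSES = {
    "rod": "long piece of metal rod or pole to fasten or confine on a window",
    "pub": "pub counter place for refreshments",
    "court": "place for prisoner to stand at court",
    "pressure": "unit of pressure",
    "fish": "large mediterranean fish edible",
}

# Tiny illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "the", "of", "to", "or", "at", "for", "in", "on", "and"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def pick_sense(sentence, senses):
    """Return (best_label, scores), scoring each sense by gloss/sentence word overlap."""
    context = tokenize(sentence)
    scores = {label: len(context & tokenize(gloss)) for label, gloss in senses.items()}
    best = max(scores, key=scores.get)  # ties go to the first sense listed
    return best, scores


if __name__ == "__main__":
    print(pick_sense("The prisoner stood at the bar while the court rose", BAR_SENSES))
    print(pick_sense("He ordered refreshments at the bar counter", BAR_SENSES))
```

A sentence that shares no words with any gloss scores zero for every sense, so the choice falls back to the first sense listed. That hints at how fragile this kind of matching is.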
-6-
Relevance Feedback
…feeding back to a system, some relevance judgments that the
system can use to reformulate the search in an attempt to
improve the quality of the search…
… Initiated by the user, i.e. a manual process.
1, process query as normal
2, select relevant docs from Results…
3, generate Document vectors and process
4, a new query is made
5, exit
[Diagram: the Query enters at step 1; the New Query made at step 4 feeds back into step 1]
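Steps 3 and 4 can be sketched with the Rocchio formula, a classic way of building the new query from document vectors. The slides do not say which method is intended, so treat this as one possible instance. The weights alpha and beta, the raw term-count vectors and the example texts are all assumptions, and the usual negative term for non-relevant documents is left out.

```python
from collections import Counter


def to_vector(text):
    """Very crude document vector: raw term counts."""
    return Counter(text.lower().split())


def rocchio(query_vec, relevant_vecs, alpha=1.0, beta=0.75):
    """New query = alpha * old query + beta * centroid of the relevant documents."""
    new_query = Counter({term: alpha * w for term, w in query_vec.items()})
    if not relevant_vecs:
        return new_query
    for vec in relevant_vecs:
        for term, weight in vec.items():
            new_query[term] += beta * weight / len(relevant_vecs)
    return new_query


if __name__ == "__main__":
    query = to_vector("jaguar speed")
    # Documents the user marked as relevant in step 2 (made-up examples).
    relevant = [to_vector("jaguar cat speed"), to_vector("jaguar cat habitat")]
    for term, weight in sorted(rocchio(query, relevant).items()):
        print(term, weight)
```

Terms that recur in the selected documents ("cat" here) enter the new query with a non-zero weight even though the user never typed them.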
-7-
Query Expansion
…the process of expanding a query to incorporate more terms,
either manually or automatically…
As “relevant” documents are “found”, these can be used as a
source for the system to improve the quality of the unseen
document ranking.
Once these terms have been identified, they can be:
1. presented to the user for manual addition to the query, and
re-processing
2. automatically added to a query, to do the same thing, in a
process called pseudo-relevance feedback. Here we run a
query, select terms from top-N docs, expand query, re-run,
and THEN present to user.
Why is it not used in web search engines ?
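The pseudo-relevance feedback loop (run a query, select terms from the top-N docs, expand, re-run) can be sketched in a few lines of Python. The toy collection, the term-count scoring and the stopword list are assumptions made for illustration. A real system would use a proper ranking function.

```python
from collections import Counter

# Toy collection, invented for illustration.
DOCS = {
    "d1": "car crash on the motorway causes delays",
    "d2": "motorway accident report car collision",
    "d3": "stock market crash wipes out savings",
    "d4": "recipe for chocolate cake",
}
STOPWORDS = frozenset({"on", "the", "for", "out"})


def score(query_terms, text):
    """Score a document by counting occurrences of the query terms."""
    words = text.split()
    return sum(words.count(t) for t in query_terms)


def search(query_terms, docs):
    """Return ids of matching docs, best first (ties broken by id)."""
    ranked = sorted(docs, key=lambda d: (-score(query_terms, docs[d]), d))
    return [d for d in ranked if score(query_terms, docs[d]) > 0]


def expand(query_terms, docs, top_n=2, extra_terms=1):
    """Add the most frequent new terms from the top-N results to the query."""
    counts = Counter()
    for d in search(query_terms, docs)[:top_n]:
        for w in docs[d].split():
            if w not in query_terms and w not in STOPWORDS:
                counts[w] += 1
    best = sorted(counts, key=lambda w: (-counts[w], w))[:extra_terms]
    return list(query_terms) + best


if __name__ == "__main__":
    query = ["car", "crash"]
    expanded = expand(query, DOCS)
    print(expanded)                 # the query actually run the second time
    print(search(expanded, DOCS))   # the ranking that is shown to the user
```

The user only ever sees the second ranking. The first run and the expansion happen behind the scenes.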
-8-
Thesaurus-based QE
• Use a thesaurus to suggest words to
add to the query
• Domain independent or dependent
• CRASH -> failure, collapse, breakdown, collide, fail, 'pack up',
'stop working', 'break down', accident, smash, bump
-9-
Thesaurus-based QE
• Where to get such a thesaurus ?
1. Use a controlled vocabulary that is
maintained by human editors - expensive so
only for well-resourced domains, like MeSH in
medicine.
2. A manual thesaurus - also expensive, and
static, but domain-independent.
3. Automatically derived thesaurus using word
co-occurrence statistics over a collection of
documents
4. Use query log mining from other users to
make suggestions, requires huge query
volume, appropriate to web search
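Option 3 can be illustrated with a very small sketch: count how often two words appear in the same document and suggest the most frequent companions of a query word. The three-document collection is invented, and raw co-occurrence counts are an assumption. Real systems normally use a statistical association measure over a much larger collection.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_thesaurus(docs):
    """Map each term to a Counter of the terms it shares a document with."""
    cooc = defaultdict(Counter)
    for text in docs:
        terms = sorted(set(text.lower().split()))
        for a, b in combinations(terms, 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def related(cooc, term, k=2):
    """Return the k terms that co-occur most often with `term`."""
    neighbours = cooc.get(term, Counter())
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


if __name__ == "__main__":
    docs = [
        "crash accident motorway",
        "crash accident insurance",
        "crash market shares",
    ]
    thesaurus = build_thesaurus(docs)
    print(related(thesaurus, "crash", k=1))
```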
- 10 -
Other Issues for IR
• More Like This Search … you will have seen it in large
search engines as a form of Relevance Feedback
- 11 -
Other Issues for IR
• Retrieval as a combination of several
retrieval strategies… data fusion…
[Diagram: Query -> TF-IDF, BM25, BM25 + LA, LSI (run side by side) -> voting -> Ranked List]
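The "voting" box could be implemented in several ways. One simple option is a Borda count, where each strategy gives a document more points the higher it ranks it. The slides do not say which voting scheme is meant, so this Python sketch is just one possible instance, with made-up rankings.

```python
def borda_fuse(rankings):
    """Fuse ranked lists: a doc at position p in a list of n items earns n - p points."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0) + (n - position)
    # Highest total first; ties broken by document id for repeatability.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    tfidf = ["d1", "d2", "d3"]   # hypothetical output of each strategy
    bm25 = ["d2", "d1", "d3"]
    lsi = ["d2", "d3", "d1"]
    print(borda_fuse([tfidf, bm25, lsi]))
```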
- 12 -
Other Issues for IR
• Distributed text retrieval is big because large
collections are inherently distributed and the
internet keeps growing ... people want to
be able to search more than one text database with one
search ... RIAN.IE is an example
Metasearch:
1, Query
2, Search
3, Merge
4, Result
- 13 -
Other Issues for IR
• … what if documents are large? ... hence the emergence of
passage retrieval…
[Figure: passage retrieval diagram, items labelled A, B, C and 1 to 4]
• problem of applying standard IR techniques to
documents of heterogeneous length ...
– text-tiling or
– chopping documents up into “pages” of approx same length
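The second option, chopping documents into "pages", is easy to sketch in Python. The page size of 50 words is an arbitrary assumption. Text-tiling, which looks for topic shifts instead of using fixed lengths, is not shown here.

```python
def split_into_pages(text, page_size=50):
    """Chop a document into consecutive 'pages' of at most page_size words."""
    words = text.split()
    return [" ".join(words[i:i + page_size]) for i in range(0, len(words), page_size)]


if __name__ == "__main__":
    long_document = " ".join(f"word{i}" for i in range(120))
    pages = split_into_pages(long_document, page_size=50)
    print([len(p.split()) for p in pages])
```

Each page can then be indexed and ranked as if it were a document in its own right.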
- 14 -
Question-Answering Systems
• Q&A systems - the holy grail of information access, but
how do they work? Google Answers and Yahoo!
Answers are manual
“Who invented Penicillin?”
• Simple (shallow) way is to segment (text) documents
into “answers” by recognising names, locations, dates,
times, monetary amounts, names of people, company
names, etc … and index each by the surrounding
context (words), then search the contexts for the right
class of answer, rank, and present
• This works - kinda - sometimes - but poorly.
AskJeeves.com/ask.com were the early adopters.
www.wolframalpha.com is the most famous current one.
• More sophisticated semantic based QA has been a goal
of artificial intelligence .. for decades. We’re still
waiting.
• Meanwhile it’s trickling into search engines … try it on
Google, Bing and Yahoo!
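The shallow approach can be sketched for one answer class, dates. The code finds every four-digit year, indexes it by the words around it, and for a "when" question returns the year whose context best matches the question. The regular expression, the window size and the two toy sentences are assumptions, and names or money amounts would need their own recognisers.

```python
import re


def index_answers(docs, window=4):
    """Index every 4-digit year (1000-2099) by the words surrounding it."""
    entries = []
    for text in docs:
        tokens = re.findall(r"\w+", text.lower())
        for i, tok in enumerate(tokens):
            if re.fullmatch(r"(1[0-9]|20)\d\d", tok):
                context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
                entries.append((tok, context))
    return entries


def answer(question, entries):
    """Return the indexed year whose context overlaps most with the question."""
    q = set(re.findall(r"\w+", question.lower()))
    ranked = sorted(entries, key=lambda e: (-len(q & e[1]), e[0]))
    return ranked[0][0] if ranked else None


if __name__ == "__main__":
    docs = [
        "Alexander Fleming discovered penicillin in 1928 at St Mary's Hospital",
        "The hospital opened a new wing in 1987",   # toy distractor sentence
    ]
    print(answer("When was penicillin discovered?", index_answers(docs)))
```

Rephrase the question so that it shares no words with the right context and the ranking has nothing to go on, which fits the "kinda, sometimes" verdict above.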
- 15 -
Ionaut Q&A
- 16 -
Search Engine Optimisation
• What’s the most-asked question of people
who say they know about search? … “Can you
optimise my web page?”
• SEO is a business, a meta-business:
– process of improving the volume or quality of
traffic to a web site from search engines via search
results
– preparing a website to enhance its chances of
being ranked in the top results of a search engine
– process whereby web pages are designed, built
and modified with search engine results in mind
• Because of the huge importance of SEs, it
follows that manipulating their output is a
business opportunity.
- 21 -
Search Engine Optimisation
• If you know how a process works, you can
reverse it.
• But we don’t know how SEs work, we know
their ingredients, but not the recipe, and the
recipe changes, constantly.
• So it’s a cold war/game in a technology
landscape - SEs vs. SEO - with no referees,
and no scoreboard.
• http://searchenginewatch.com/ great
resource.
• Much of the following comes from Shumin Cao
of www.xpress-shop.com
- 22 -
What Is SEO?
• Search Engine Optimisation
• Improve website/page ranking
on search engines
• Increase site visibility
• Attract more site visitors
• Improve website traffic
• Increase site conversion rate
• An important Online Marketing Tool to
increase business revenue
- 23 -
What SEO Is NOT
• Build pages only for Search Engines
• An online marketing tool only for
attracting site visits - that’s SPAM
• Attracting site visits is only the first part
of customer engagement - there has to
be more follow-on
- 24 -
SEO Example
- 25 -
Organic SEO vs. Paid SEO (SEM)
                 Organic SEO                    Paid SEO (PPC)
Cost             FREE                           By clicks
Results          On all search engines          By search engine
Response Time    Weeks/Months                   Instant result
Suitable for     Medium to long term strategy   Short term promotion
- 30 -
Ireland Search Marketing Statistics
• Why SEO ?
• 3,042,600 Internet users in Ireland as of
Jun/10 (i.e. 65.8% of the population)
• 81.3% of Irish Internet users are using
search engines every day
• 78% of people click only the results on the
first page
- 31 -
- More than 60% of people searching on Google will click on the first 3 listed
links
- About 30% of the traffic is distributed for the remaining positions on page 1
- And about 10% for page 2
- 32 -
SEO in 3 Steps
Step 1: Keywords Research & Analysis
Step 2: Onsite & Offsite Optimisation
Step 3: Ongoing Review & Maintenance
- 33 -
Step 1: Keywords
• Avoid broad keywords
• Use well targeted keywords (location,
age, gender, specified type of services
or products etc.)
• Do Keyword Effectiveness Index (KEI)
research
- 34 -
Step 1: Keywords
Google Keywords Tool
- 35 -
Step 1: Keywords
Optimise keywords by choosing words for
your HTML which:
• Are relevant to your website
• Have high search volumes
• Have as few competitors as possible
- 39 -
Step 1: Keywords
KEI: Keyword Effectiveness Index, which
measures how effective a keyword is for
your web site.
KEI = (Number of Monthly Keyword
Searches) / (Competing Pages)
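The KEI formula as given on this slide is a single division, so comparing candidate keywords takes only a few lines of Python. The search volumes and competing-page counts below are invented numbers, there only to show the broad-versus-targeted effect.

```python
def kei(monthly_searches, competing_pages):
    """KEI as defined on the slide: monthly searches / competing pages."""
    if competing_pages <= 0:
        raise ValueError("competing_pages must be positive")
    return monthly_searches / competing_pages


if __name__ == "__main__":
    # (monthly searches, competing pages): hypothetical figures.
    candidates = {
        "magazines": (50_000, 90_000_000),
        "magazine subscription ireland": (1_000, 200_000),
    }
    for keyword in sorted(candidates, key=lambda k: kei(*candidates[k]), reverse=True):
        print(f"{keyword}: KEI = {kei(*candidates[keyword]):.6f}")
```

With these made-up figures the well-targeted phrase comes out ahead of the broad keyword, despite far fewer searches, because it has far fewer competing pages.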
- 40 -
Step 1: Keywords
KEI Analysis Report Example
- 41 -
Step 2: Optimisation - Onsite
What do you do with these good keywords for
your site ?
1. HTML Title Tag
2. Meta Tags
3. Headings (H1, H2 Tags)
4. Alt Tags
5. Page name/URL Optimisation
6. Sitemap (HTML & XML)
7. 301/404 Redirect
8. Good Quality Content
… and more probably, but these 8 will do …
- 43 -
Step 2: Optimisation - Onsite
1. HTML Title Tag Optimisation
<TITLE>Online Magazines, Magazine Subscription Ireland, Magazines Ireland, Ireland's Online Magazine Shop</TITLE>
<meta name="description" content="Ireland's Online Magazine Shop">
<meta name="keywords" content="Online Magazines, Magazine Subscription Ireland, Magazines Ireland, Ireland's Online Magazine Shop">
- 44 -
Step 2: Optimisation - Onsite
1. HTML Title Tag Optimisation
• Include keywords
• Work the keywords into phrases
• Put keywords First!
• Unique on each page
• Up to 70 characters
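These rules are mechanical enough to check automatically. The Python sketch below tests three of them (includes a keyword, keyword first, up to 70 characters). The function name and the problem messages are invented for illustration, and uniqueness across pages is not checked.

```python
def check_title(title, keywords, max_len=70):
    """Return a list of rule violations for a page title (empty list = OK)."""
    problems = []
    text = " ".join(title.split())          # collapse line breaks and extra spaces
    lowered = text.lower()
    if len(text) > max_len:
        problems.append(f"title longer than {max_len} characters")
    if not any(k.lower() in lowered for k in keywords):
        problems.append("no target keyword in title")
    elif not any(lowered.startswith(k.lower()) for k in keywords):
        problems.append("keyword is not first")
    return problems


if __name__ == "__main__":
    example = ("Online Magazines, Magazine Subscription Ireland, "
               "Magazines Ireland, Ireland's Online Magazine Shop")
    print(check_title(example, ["online magazines"]))
    print(check_title("Online Magazines Ireland", ["online magazines"]))
```

Run on the example title from the previous slide, the check flags it as longer than 70 characters, so that title would need trimming to meet the last rule.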
- 45 -
Step 2: Optimisation - Onsite
2. Meta Tag
<title>Online Magazines, Magazine Subscription Ireland, Magazines Ireland, Ireland's Online Magazine Shop</title>
<META name="DESCRIPTION" content="Ireland's Online Magazine Shop">
<META name="KEYWORDS" content="Online Magazines, Magazine Subscription Ireland, Magazines Ireland, Ireland's Online Magazine Shop">
- 46 -
Step 2: Optimisation - Onsite
2. Meta Tag
• Use plural or singular versions of keywords - why? We think
it might matter in the SE recipe but of course we don’t know.
SEW (Search Engine Watch) tells us it might.
• Meta description should not exceed 200 characters
• Meta keywords: between 10 and 12 words
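The two length limits can be checked with Python's standard html.parser module. This is a sketch under assumptions: the function and class names are invented, and the keyword limit is applied to comma-separated entries, which is only one reading of "10-12 words".

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs, with names lowercased."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            name = (a.get("name") or "").lower()
            if name:
                self.meta[name] = a.get("content") or ""


def check_meta(html, max_description=200, max_keywords=12):
    """Return a list of problems with the description and keywords meta tags."""
    parser = MetaCollector()
    parser.feed(html)
    problems = []
    description = parser.meta.get("description", "")
    keywords = [k.strip() for k in parser.meta.get("keywords", "").split(",") if k.strip()]
    if not description:
        problems.append("missing meta description")
    elif len(description) > max_description:
        problems.append(f"meta description longer than {max_description} characters")
    if len(keywords) > max_keywords:
        problems.append(f"more than {max_keywords} meta keywords")
    return problems


if __name__ == "__main__":
    page = """<META name="DESCRIPTION" content="Ireland's Online Magazine Shop">
<META name="KEYWORDS" content="Online Magazines, Magazine Subscription Ireland,
Magazines Ireland, Ireland's Online Magazine Shop">"""
    print(check_meta(page))
```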
- 47 -
Step 2: Optimisation - Onsite
3. Headings (H1, H2 Tags)
<h1 class="componentheading">The Course</h1>
<h2 class="contentheading"> Positive Psychology and Meditation </h2>
- 48 -
Step 2: Optimisation - Onsite
4. Alt Tag
<img src="/images/images/side.jpg" border="0" alt="Positive Psychology"
title="Positive Psychology" width="220" align="middle" />
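A quick way to audit a page for missing alt text is to walk its img tags with Python's standard html.parser. The function and class names are invented for this sketch, and the second image in the usage example is a made-up path.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    """Record the src of every <img> that has no (or empty) alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            if not (a.get("alt") or "").strip():
                self.missing.append(a.get("src"))


def images_missing_alt(html):
    checker = AltChecker()
    checker.feed(html)
    return checker.missing


if __name__ == "__main__":
    page = ('<img src="/images/images/side.jpg" alt="Positive Psychology" />'
            '<img src="/images/logo.png">')   # second image is hypothetical
    print(images_missing_alt(page))
```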
- 49 -
Step 2: Optimisation - Onsite
5. Page name/URL Optimisation
http://www.xyz.com/product.php?id=123
http://www.xyz.com/product-name-keywords.php
e.g.
http://www.soonerthanlater.com/calendars-supplier.php
http://www.soonerthanlater.com/raffle-ticket.php
- 50 -
Step 2: Optimisation - Onsite
5. Page name/URL Optimisation
• Use URL rewrite
• Include targeted keywords where possible
• Separate words with hyphens
• Bit.ly hasn’t helped in using URL names but
the SEs may be resolving URL shortening,
especially if they purchase URL shortening
sites.
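Turning a product name into a hyphen-separated, keyword-bearing page name is usually done with a small "slugify" helper. The one below is a minimal Python sketch. The function names and the .php suffix simply mirror the example URLs above and are not a prescribed scheme, and a URL rewrite rule on the web server would still be needed to map the new name to the real page.

```python
import re
import unicodedata


def slugify(name):
    """Lowercase, strip accents, and join the remaining words with hyphens."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def product_url(base, name):
    """Build a keyword-based page URL in the style of the examples above."""
    return f"{base.rstrip('/')}/{slugify(name)}.php"


if __name__ == "__main__":
    print(slugify("Raffle Ticket"))
    print(product_url("http://www.xyz.com", "Calendars Supplier"))
```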
- 51 -
Step 2: Optimisation - Onsite
6. Sitemap (HTML & XML)
• Include keywords in HTML sitemap
• Have HTML sitemap linkable from the main
site
• Generate XML site map at
http://www.xml-sitemaps.com/
• Update frequently
• Why? It’s dressed up as a human navigation
aid, but it’s there to help the spiders.
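An XML sitemap is a small file, so it can also be generated directly instead of through an online tool. The Python sketch below uses the standard xml.etree.ElementTree module and the sitemaps.org namespace. The page list reuses URLs from the earlier example, and the lastmod dates are made up.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    pages = [
        # Dates are hypothetical.
        ("http://www.soonerthanlater.com/calendars-supplier.php", "2011-01-15"),
        ("http://www.soonerthanlater.com/raffle-ticket.php", "2011-02-01"),
    ]
    print(build_sitemap(pages))
```

Regenerating the file whenever pages change is one way to keep the "update frequently" advice.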
- 53 -
Step 2: Optimisation - Onsite
7. 404/301 Redirect
A 404 status means “not found”
e.g. page name spelling mistake
A 301 status means “moved permanently”
e.g. domain name changes, content movement
Ugh! How offputting is a 404 to a (human) visitor … and to a
spider it downgrades the weighting on your whole website,
points to sloppy maintenance, uncleanliness, etc. A 301 redirect
from the old address to the new one avoids the dead end.
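The logic behind the two codes can be sketched as a lookup: serve the page if it exists, send a 301 with the new location if the old address is in a redirect table, and fall back to 404 only when nothing is known. The page set and redirect table below are hypothetical. On a real site this would live in the web server's configuration, not in application code.

```python
from http import HTTPStatus

# Hypothetical site contents and redirect table.
PAGES = {"/raffle-ticket.php", "/calendars-supplier.php"}
REDIRECTS = {"/raffle-tickets.php": "/raffle-ticket.php"}   # old name -> new name


def resolve(path):
    """Return (status, location) for a requested path."""
    if path in PAGES:
        return HTTPStatus.OK, path
    if path in REDIRECTS:
        return HTTPStatus.MOVED_PERMANENTLY, REDIRECTS[path]
    return HTTPStatus.NOT_FOUND, None


if __name__ == "__main__":
    for requested in ["/raffle-ticket.php", "/raffle-tickets.php", "/rafle-ticket.php"]:
        status, location = resolve(requested)
        print(requested, int(status), status.phrase, location)
```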
- 54 -
Step 2: Optimisation - Onsite
8. Good Quality Content
• Write as much useful content for visitors (not SEs) as
possible.
• Sprinkle any text naturally with your target keywords
• Text instead of image
• Must be *original* content – i.e., not ‘lifted’ from
another site, why ? A “turnitin”-like process is run by
the SEs, they pre-calculate “find more like this”, and if
the similarity is too great, that’s bad.
• Update frequently
• It’s ironic - “as an aside, make sure your content is
good” - so SEO is all about window dressing and
presentation.
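How the SEs actually detect lifted content is not public. A common textbook way to measure this kind of similarity is to compare overlapping word sequences ("shingles") with the Jaccard coefficient, and the Python sketch below shows that technique as an assumption, not as the SEs' method. The shingle size of 3 and the sample sentences are arbitrary.

```python
def shingles(text, k=3):
    """Return the set of all k-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)


if __name__ == "__main__":
    original = "ireland's online magazine shop sells magazine subscriptions"
    lifted = "ireland's online magazine shop sells cheap magazine subscriptions"
    unrelated = "positive psychology and meditation course in dublin"
    print(similarity(original, lifted))
    print(similarity(original, unrelated))
```

A score close to 1 means the two pages are near copies. Where "too great" begins is part of the recipe we don't know.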
- 55 -
Step 2: Optimisation - Offsite
• Submit URLs to Search Engines
e.g. http://www.google.com/addurl/?continue=/addurl
http://www.bing.com/webmaster/SubmitSitePage.aspx
• Backlinks on Free Directory Listings
e.g. DMOZ http://www.dmoz.org
• Backlinks on Paid Directories
e.g. Yahoo’s Directory, Business.com
• This is a throwback to the first days of
the web when there were no spiders,
but the facility is still there.
- 56 -
Step 2: Optimisation - Offsite
• Articles backlinks
e.g. www.ezinearticles.com
• Press Release backlinks
e.g. www.prweb.com
• Social Network Profile
e.g. Facebook, Twitter, LinkedIn
• Backlinks from Sites with high Google
Page Rank - play their game !
- 57 -
Step 2: Optimisation - Offsite
• 2nd tier backlinks are more important than 1st tier
- 58 -
SEO Mistakes
• Targeting the wrong keywords
• Keyword stuffing
• Not fully optimising your homepage
• Hidden text or hidden links
• Pages loaded with irrelevant words
• Duplicated content on multiple pages
- 59 -
Step 3: Ongoing Maintenance
• Measuring
– Use Google Analytics or statcounter.com for a different view
• Monitoring
– Check Rankings Periodically
• Reporting
– Record Rankings
• Tweaking
– Make Necessary Modifications or Further Optimisation as Ranking
Changes
- 60 -
10 SEO tips Businessinsider.com
Every business with a Web site should make Search
Engine Optimization -- trying to get your site as high
up as possible on Google and Bing search-results
pages -- a part of their growth strategy. At its most
basic, "SEO" means finding ways to increase your
site's appearance in web visitors' search results. This
generally means more traffic to your site. While
intense SEO can involve complex site restructuring
with a firm (or consultant) that specializes in this
area, there are a few simple steps you can take
yourself to increase your search engine ranking. All it
requires is a little effort, and some re-thinking of how
you approach content on your site.
- 61 -
1. Monitor where you stand. You won't know if your
SEO efforts are working unless you monitor your
search standings. MarketingVox suggests that you
keep an eye on your page rank with tools like Alexa
($199) and the Google toolbar.
2. Keywords. You should be conscious of placing
appropriate keywords throughout every aspect of
your site: your titles, content, URLs, and image
names. Think about your keywords as search terms
-- how would someone looking for information on
this topic search for it?
3. Link back to yourself. There is probably no more
basic strategy for SEO than the integration of
internal links into your site -- it is an easy way to
boost traffic to individual pages, SEO Consult says.
- 62 -
4. Create a sitemap. Adding a site map -- a page listing
and linking to all the other major pages on your site
-- makes it easier for spiders to search your site.
"The fewer clicks necessary to get to a page on
your website, the better,”
5. Search-friendly URLs. Make your URLs more
search-engine-friendly by naming them with clear
keywords.
6. Avoid Flash. Flash might look pretty, but it does
nothing for your SEO. According to the Search
Engine Journal, "Frames, Flash and AJAX all share
a common problem - you can’t link to a single
page... Don’t use Frames at all and use Flash and
AJAX sparingly for best SEO results.”
- 63 -
7. Add image descriptions. Spiders can only search
text, not text in your images -- which is why you need
to make the words associated with your images as
descriptive as possible. Start with your image
names: adding an "ALT" tag allows you to include a
keyword-rich description for every image on your site.
8. Fresh content. Your content needs to be fresh --
updating regularly and often is crucial for increasing
traffic. "The best sites for users, and consequently
for search engines, are full of oft-updated, useful
information about a given service, product, topic or
discipline," MarketingVox explains.
- 64 -
9. Use social media. A CEO blog is just one element of
social media distribution, an important SEO strategy
according to SEO Consult. You should be distributing
links to fresh content on your site across appropriate
social networking platforms. Whether displayed on
your company's account, or recommended, retweeted,
and re-distributed by someone else, this
strategy exponentially multiplies the number of places
where visitors will view your links.
10. Link to others. An easy way to direct more traffic to
your site is by developing relationships with other
sites. PC World suggests that you personally ask the
webmasters of well-respected sites if they'll include a
link to your site on theirs. Be sure to return the favor
-- then everyone wins!
- 65 -