Webometrics and SEO
Information Retrieval: Other Issues & SEO

IR Issues
• So is that all there is to (text) IR: word matching, weighting, network topology and black magic? Is that search? No way!
  – Word senses
  – Relevance feedback and query expansion
  – Fusion of retrieval strategies
  – Distributed search
  – Sub-document retrieval
  – Question answering

Sheep Attacks Rocket?

Nut screws washer and bolts

Word Sense Indexing
• Formats of dictionaries vary, but each has a short text description of each sense of a word. My Concise OED has the following for BAR:
  – (n) long piece of metal, chocolate, soap; strip of silver below the clasp of a medal, as an additional distinction; rod or pole to fasten or confine, e.g. on a window; immaterial restriction (entry to a pub); place for the prisoner to stand in court; rail dividing off a space; pub counter; place for refreshments
  – (v) to fasten with a bar
  – (n) large Mediterranean fish, edible
  – (n) unit of pressure, 10⁵ N/m²
  – (prep) except, as in racing odds
• "Bar" is also a legal exam!
• Index documents by word senses:
  – Parse each sentence and identify the word sense
  – Wouldn't that be better? Yes, but it can't be done.

Relevance Feedback
• "…feeding back to a system some relevance judgments that the system can use to reformulate the search, in an attempt to improve the quality of the search…"
• Initiated by the user, i.e. a manual process:
  1. Process the query as normal
  2. Select relevant documents from the results
  3. Generate document vectors for them and process them
  4. A new query is made
  5. Exit

Query Expansion
• "…the process of expanding a query to incorporate more terms, either manually or automatically…"
• As "relevant" documents are "found", they can be used as a source of terms for the system to improve the ranking of the documents the user has not yet seen. Once these terms have been identified, they can be:
  1. presented to the user for manual addition to the query, and the query re-processed; or
  2. automatically added to the query to do the same thing, in a process called pseudo-relevance feedback. Here we run a query, select terms from the top-N documents, expand the query, re-run it, and THEN present the results to the user.
• Why is it not used in web search engines?

Thesaurus-based QE
• Use a thesaurus to suggest words to add to the query
• Domain-independent or domain-dependent
• CRASH -> failure, collapse, breakdown, collide, fail, 'pack up', 'stop working', 'break down', accident, smash, bump

Thesaurus-based QE
• Where to get such a thesaurus?
  1. A controlled vocabulary maintained by human editors: expensive, so only for well-resourced domains, like MeSH in medicine.
  2. A manual thesaurus: also expensive, and static, but domain-independent.
  3. An automatically derived thesaurus, using word co-occurrence statistics over a collection of documents.
  4. Query-log mining of other users' queries to make suggestions: requires a huge query volume, so it is appropriate to web search.

Other Issues for IR
• "More Like This" search: you will have seen it in large search engines as a form of relevance feedback.

Other Issues for IR
• Retrieval as a combination of several retrieval strategies: data fusion.
  [Diagram: Query → TF-IDF, BM25, BM25 + LA, LSI → voting → Ranked List]

Other Issues for IR
• Distributed text retrieval is big, because large collections are inherently distributed and the internet keeps growing; people want to be able to search more than one text database with one search. RIAN.IE is an example.
  [Diagram: 1. Query → 2. Search (metasearch) → 3. Merge → 4. Result]

Other Issues for IR
• What if documents are large? Hence the emergence of passage retrieval.
• The problem is applying standard IR techniques to documents of heterogeneous lengths. Solutions:
  – text-tiling, or
  – chopping documents up into "pages" of approximately the same length

Question-Answering Systems
• Q&A systems are the holy grail of information access, but how do they work? Google Answers and Yahoo! Answers are manual.
• Example question: "Who invented penicillin?"
• The simple (shallow) way is to segment (text) documents into "answers" by recognising names, locations, dates, times, monetary amounts, names of people, company names, etc., index each one by its surrounding context (words), then search the contexts for the right class of answer, rank, and present.
• This works, kind of, sometimes, but poorly. AskJeeves.com/ask.com were the early adopters; www.wolframalpha.com is the most famous current one.
• More sophisticated, semantics-based QA has been a goal of artificial intelligence for decades. We're still waiting.
• Meanwhile, it's trickling into search engines: try it on Google, Bing and Yahoo!

Ionaut Q&A

Search Engine Optimisation
• What's the most-asked question of people who say they know about search? "Can you optimise my web page?"
• SEO is a business, a meta-business:
  – the process of improving the volume or quality of traffic to a web site from search engines via search results
  – preparing a website to enhance its chances of being ranked in the top results of a search engine
  – the process whereby web pages are designed, built and modified with search engine results in mind
• Because of the huge importance of SEs, it follows that manipulating their output is a business opportunity.

Search Engine Optimisation
• If you know how a process works, you can reverse-engineer it.
• But we don't know how SEs work: we know their ingredients, but not the recipe, and the recipe changes constantly.
• So it's a cold war/game in a technology landscape (SEs vs. SEO) with no referees and no scoreboard.
• http://searchenginewatch.com/ is a great resource.
• Much of the following comes from Shumin Cao of www.xpress-shop.com

What SEO Is
• Search Engine Optimisation
• Improve website/page ranking on search engines
• Increase site visibility
• Attract more site visitors
• Improve website traffic
• Increase site conversion rate
• An important online marketing tool to increase business revenue

What SEO Is NOT
• Building pages only for search engines
• An online marketing tool only for attracting site visits: that's SPAM
• Attracting site visits is only the first part of customer engagement; there has to be more follow-on

SEO Example

Organic SEO vs. Paid SEO (SEM)
                 Organic SEO                      Paid SEO (PPC)
Cost             FREE                             By clicks
Results          On all search engines            By search engine
Response time    Weeks/months                     Instant result
Suitable for     Medium- to long-term strategy    Short-term promotion

Ireland Search Marketing Statistics
• Why SEO?
• 3,042,600 Internet users in Ireland as of June 2010 (i.e. 65.8% of the population)
• 81.3% of Irish Internet users use search engines every day
• 78% of people click only on results on the first page
  – More than 60% of people searching on Google will click on the first 3 listed links
  – About 30% of the traffic is distributed across the remaining positions on page 1
  – And about 10% goes to page 2

SEO in 3 Steps
Step 1: Keyword Research & Analysis
Step 2: Onsite & Offsite Optimisation
Step 3: Ongoing Review & Maintenance

Step 1: Keywords
• Avoid broad keywords
• Use well-targeted keywords (location, age, gender, specific type of service or product, etc.)
• Do Keyword Effectiveness Index (KEI) research

Step 1: Keywords
Google Keywords Tool

Step 1: Keywords
Optimise keywords by choosing words for your HTML which are:
• Relevant to your website
• High in search volume
• Faced with as few competitors as possible

Step 1: Keywords
KEI (Keyword Effectiveness Index) measures how effective a keyword is for your web site:
KEI = (Number of monthly keyword searches) / (Number of competing pages)

Step 1: Keywords
KEI Analysis Report Example

Step 2: Optimisation - Onsite
What do you do with these good keywords for your site?
1. HTML Title Tag
2. Meta Tags
3. Headings (H1, H2 Tags)
4. Alt Tags
5. Page name/URL Optimisation
6. Sitemap (HTML & XML)
7. 301/404 Redirect
8. Good Quality Content
… and probably more, but these 8 will do.

Step 2: Optimisation - Onsite
1. HTML Title Tag Optimisation
    <TITLE>Online Magazines, Magazine Subscription Ireland, Magazines Ireland, Ireland's Online Magazine Shop</TITLE>
    <meta name="description" content="Ireland's Online Magazine Shop">
    <meta name="keywords" content="Online Magazines, Magazine Subscription Ireland, Magazines Ireland, Ireland's Online Magazine Shop">

Step 2: Optimisation - Onsite
1. HTML Title Tag Optimisation
• Include keywords
• Work the keywords into phrases
• Put keywords first!
• Make the title unique on each page
• Keep it to 70 characters or fewer

Step 2: Optimisation - Onsite
2. Meta Tags
    <title>Online Magazines, Magazine Subscription Ireland, Magazines Ireland, Ireland's Online Magazine Shop</title>
    <META name="DESCRIPTION" content="Ireland's Online Magazine Shop">
    <META name="KEYWORDS" content="Online Magazines, Magazine Subscription Ireland, Magazines Ireland, Ireland's Online Magazine Shop">

Step 2: Optimisation - Onsite
2. Meta Tags
• Use the plural or singular version of keywords. Why? We think it might matter in the SE recipe, but of course we don't know. Search Engine Watch (SEW) tells us it might.
• Keep the meta description to 200 characters or fewer
• Use between 10 and 12 words in the meta keywords

Step 2: Optimisation - Onsite
3. Headings (H1, H2 Tags)
    <h1 class="componentheading">The Course</h1>
    <h2 class="contentheading">Positive Psychology and Meditation</h2>

Step 2: Optimisation - Onsite
4. Alt Tags
    <img src="/images/images/side.jpg" border="0" alt="Positive Psychology" title="Positive Psychology" width="220" align="middle" />

Step 2: Optimisation - Onsite
5. Page name/URL Optimisation
    Instead of: http://www.xyz.com/product.php?id=123
    Use:        http://www.xyz.com/product-name-keywords.php
    e.g. http://www.soonerthanlater.com/calendars-supplier.php
         http://www.soonerthanlater.com/raffle-ticket.php

Step 2: Optimisation - Onsite
5. Page name/URL Optimisation
• Use URL rewriting
• Include targeted keywords where possible
• Separate words with hyphens
• URL shorteners such as Bit.ly hide descriptive URL names, but the SEs may be resolving shortened URLs, especially if they buy URL-shortening sites.

Step 2: Optimisation - Onsite
6. Sitemap (HTML & XML)
• Include keywords in the HTML sitemap
• Make the HTML sitemap reachable by a link from the main site
• Generate an XML sitemap at http://www.xml-sitemaps.com/
• Update it frequently
• Why? It's dressed up as a human navigation aid, but it's there to help the spiders.

Step 2: Optimisation - Onsite
7. 404/301 Redirect
• A 404 error means "not found", e.g. a misspelt page name.
• A 301 is a "moved permanently" redirect, e.g. after a domain name change or when content moves.
• Ugh! How off-putting is a 404 to a (human) visitor … and to a spider it downgrades the weighting of your whole website, pointing to sloppy maintenance, uncleanliness, etc.

Step 2: Optimisation - Onsite
8. Good Quality Content
• Write as much useful content for visitors (not SEs) as possible.
• Sprinkle the text naturally with your target keywords.
• Use text instead of images.
• It must be *original* content, i.e. not "lifted" from another site. Why? The SEs run a "Turnitin"-like process: they pre-calculate "find more like this", and if the similarity is too great, that's bad.
• Update frequently.
• It's ironic: "as an aside, make sure your content is good", so SEO is all about window dressing and presentation.

Step 2: Optimisation - Offsite
• Submit URLs to search engines, e.g.
    http://www.google.com/addurl/?continue=/addurl
    http://www.bing.com/webmaster/SubmitSitePage.aspx
• Backlinks on free directory listings, e.g. DMOZ http://www.dmoz.org
• Backlinks on paid directories, e.g. Yahoo's Directory, Business.com
• This is a throwback to the first days of the web, when there were no spiders, but the facility is still there.

Step 2: Optimisation - Offsite
• Article backlinks, e.g. www.ezinearticles.com
• Press release backlinks, e.g. www.prweb.com
• Social network profiles, e.g. Facebook, Twitter, LinkedIn
• Backlinks from sites with a high Google PageRank: play their game!

Step 2: Optimisation - Offsite
• 2nd-tier backlinks are more important
  [Diagram: 1st-tier and 2nd-tier backlinks]

SEO Mistakes
• Targeting the wrong keywords
• Keyword stuffing
• Not fully optimising your homepage
• Hidden text or hidden links
• Pages loaded with irrelevant words
• Duplicated content on multiple pages

Step 3: Ongoing Maintenance
• Measuring: use Google Analytics, or statcounter.com for a different view
• Monitoring: check rankings periodically
• Reporting: record rankings
• Tweaking: make necessary modifications or further optimisation as rankings change

10 SEO Tips (Businessinsider.com)
Every business with a Web site should make Search Engine Optimization (trying to get your site as high up as possible on Google and Bing search-results pages) a part of its growth strategy.
At its most basic, "SEO" means finding ways to increase your site's appearance in web visitors' search results. This generally means more traffic to your site. While intense SEO can involve complex site restructuring with a firm (or consultant) that specializes in this area, there are a few simple steps you can take yourself to increase your search engine ranking. All it requires is a little effort and some rethinking of how you approach content on your site.

1. Monitor where you stand. You won't know if your SEO efforts are working unless you monitor your search standings. MarketingVox suggests that you keep an eye on your page rank with tools like Alexa ($199) and the Google toolbar.

2. Keywords. You should be conscious of placing appropriate keywords throughout every aspect of your site: your titles, content, URLs, and image names. Think about your keywords as search terms: how would someone looking for information on this topic search for it?

3. Link back to yourself. There is probably no more basic strategy for SEO than the integration of internal links into your site; it is an easy way to boost traffic to individual pages, SEO Consult says.

4. Create a sitemap. Adding a site map (a page listing and linking to all the other major pages on your site) makes it easier for spiders to search your site. "The fewer clicks necessary to get to a page on your website, the better."

5. Search-friendly URLs. Make your URLs more search-engine-friendly by naming them with clear keywords.

6. Avoid Flash. Flash might look pretty, but it does nothing for your SEO. According to the Search Engine Journal, "Frames, Flash and AJAX all share a common problem: you can't link to a single page... Don't use Frames at all and use Flash and AJAX sparingly for best SEO results."

7. Add image descriptions.
Spiders can only search text, not text in your images, which is why you need to make the words associated with your images as descriptive as possible. Start with your image names, then add an "ALT" tag, which lets you include a keyword-rich description for every image on your site.

8. Fresh content. Your content needs to be fresh: updating regularly and often is crucial for increasing traffic. "The best sites for users, and consequently for search engines, are full of oft-updated, useful information about a given service, product, topic or discipline," MarketingVox explains.

9. Use social media. A CEO blog is just one element of social media distribution, an important SEO strategy according to SEO Consult. You should be distributing links to fresh content on your site across appropriate social networking platforms. Whether the links are displayed on your company's account or recommended, retweeted, and re-distributed by someone else, this strategy exponentially multiplies the number of places where visitors will see your links.

10. Link to others. An easy way to direct more traffic to your site is by developing relationships with other sites. PC World suggests that you personally ask the webmasters of well-respected sites if they'll include a link to your site on theirs. Be sure to return the favor; then everyone wins!
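Tips 3 and 4 both depend on how many clicks separate a page from the home page. As a rough self-audit, you could model the site as a graph of internal links and compute each page's click depth with a breadth-first search. The following is a minimal Python sketch under stated assumptions. The site graph is a hypothetical, hand-written dictionary; a real audit would first have to crawl the site. The helpers `click_depths` and `orphan_pages` are illustrative names, not part of any SEO tool.

```python
from __future__ import annotations

from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Minimum number of clicks from `home` to each reachable page (BFS)."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # BFS visits pages in order of distance, so the first time a page
            # is seen, it has been reached by a shortest click path.
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def orphan_pages(links: dict[str, list[str]], home: str) -> set[str]:
    """Pages in the link graph that no click path from `home` reaches."""
    known = set(links) | {t for targets in links.values() for t in targets}
    return known - set(click_depths(links, home))


if __name__ == "__main__":
    # Hypothetical site: each key links to the pages in its list.
    site = {
        "/": ["/magazines", "/about"],
        "/magazines": ["/magazines/monthly", "/subscribe"],
        "/magazines/monthly": ["/subscribe"],
        "/about": [],
        "/old-offer": ["/subscribe"],  # no page links here
    }
    for page, depth in sorted(click_depths(site, "/").items(), key=lambda kv: kv[1]):
        print(depth, page)
    print("orphans:", orphan_pages(site, "/"))
```

A sitemap (tip 4) or extra internal links (tip 3) would help most for pages with a large depth and for orphans such as `/old-offer` in this toy graph. BFS is used here because it finds the fewest-clicks path for every page in a single pass over the links.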