Wednesday, March 25, 2015

Microsoft 70-486: Plan for search engine optimization and accessibility

Exam Objectives


Use analytical tools to parse HTML, view and evaluate conceptual structure by using plug-ins for browsers, write semantic markup (HTML5 and ARIA) for accessibility (for example, screen readers)

Quick Overview of Training Materials



Before we can examine analysis tools for SEO, I think it's important to ask the question: What is Search Engine Optimization (SEO)?  SEO is the practice of writing our webpages so that search engine crawlers can easily digest and analyze them, which ultimately improves the page rank, i.e. the natural search position of our page in search results.

SEO Concepts


When web crawlers analyze a webpage, they look at several elements for data regarding the content of your site (source):
  • <title>
  • <meta name="description">
  • <meta name="keywords">
  • headings <h1> thru <h6>
  • links <a>
  • to a lesser extent, content text
Writing well-formed markup enables crawlers to determine the structure of your page and analyze the content for relevant search terms.  The title tag is the single most important tag on the page: it tells the crawler what the page is about overall, and it is what potential visitors see in the search engine results.  Headings (<h[1-6]> tags) describe the structure of the document and summarize the contents.  The <meta> tags give search engines additional information about the site ("description" is displayed in the search results and "keywords" lists relevant keywords).
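To make that list concrete, here is a rough sketch of how a crawler might pull those elements out of a page, using only Python's standard html.parser module.  The class name, the collect() helper, and the sample markup are all made up for illustration; real crawlers are far more involved than this.

```python
from html.parser import HTMLParser


class SeoElementCollector(HTMLParser):
    """Collects the title, named <meta> tags, headings, and link targets."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}       # e.g. {"description": "...", "keywords": "..."}
        self.headings = []   # (tag, text) pairs in document order
        self.links = []      # href values of <a> tags
        self._open = None    # tag whose text is currently being captured
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "title" or tag in self.HEADINGS:
            self._open = tag
            self._text = []

    def handle_data(self, data):
        if self._open:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._open:
            text = "".join(self._text).strip()
            if tag == "title":
                self.title = text
            else:
                self.headings.append((tag, text))
            self._open = None


def collect(html):
    """Hypothetical helper: parse a page and return the collector."""
    parser = SeoElementCollector()
    parser.feed(html)
    parser.close()
    return parser


# Made-up sample page, not taken from any real site.
SAMPLE = """<html><head>
<title>Example page</title>
<meta name="description" content="A short summary of the page">
<meta name="keywords" content="example, sample">
</head><body>
<h1>Welcome</h1><h2>Cast</h2>
<a href="/cast.html">Meet the cast</a>
</body></html>"""

page = collect(SAMPLE)
print(page.title)
print(page.meta)
print(page.headings)
print(page.links)
```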

Paths disallowed in the robots.txt file are not crawled, pages marked "noindex" (via the robots meta tag) are not indexed, and links marked rel="nofollow" are not followed.  Search results are weighted based on where search terms appear (in the title vs in a paragraph) and the frequency with which terms appear.  "Authority" also determines where your site will appear and is determined by the number of incoming links and the frequency with which your site is selected by search engine users.  Using title, meta, and heading tags effectively can improve search visibility for a site.
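The robots.txt part is easy to try out, since Python ships a parser for it in urllib.robotparser.  This is only a small sketch: the robots.txt content, the example.com URLs, and the is_crawlable() helper are assumptions of mine, and it says nothing about how any particular search engine actually treats these rules.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, agent="*"):
    """Hypothetical helper: may this user agent fetch the URL?"""
    return parser.can_fetch(agent, url)


print(is_crawlable("http://example.com/index.html"))
print(is_crawlable("http://example.com/private/report.html"))
```

A polite crawler would run a check like this before requesting each URL and skip anything that comes back False.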

Guidelines for SEO


  • Pages should:
    • use well-formed HTML or XHTML (e.g. no missing closing tags)
    • contain <html> and <body> tags
    • contain a <noscript> tag for each <script> tag.  Search engines will not index <script> tags, so putting search relevant content in a corresponding <noscript> tag ensures it is considered.
    • avoid <meta http-equiv="refresh"> for redirects.  Use an HTTP redirect, or at least add a "content" attribute with a timeout, like so: <meta http-equiv="refresh" content="5; url=href">
    • contain less than 100KB of content in the <body> tag.  Longer content may be truncated by the indexer.
    • use <meta name="robots"> tag correctly.  Guide.
    • <iframe> should have relevant text in the inner content area.
  • Title tag should:
    • not contain placeholder information (i.e. it should not be generated)
    • be unique from other pages
    • not be empty, and be shorter than 65 characters
    • be the only one on the page
    • be located in the <head> tag
    • be different from the <meta name="description"> tag
  • <meta name="description"> should:
    • exist, once and only once, on every page in the <head> tag
    • be unique
    • not be empty but be no longer than 150 characters (longer and the page may be flagged for deceptive practices)
  • <meta name="keywords"> should:
    • exist, once and only once, on every page in the <head> tag
    • not be empty; less than 874 characters (longer and you may be flagged for "keyword stuffing")
    • not contain more than one instance of the same keyword
  • headings <h1> thru <h6>:
    • each page should have one and only one <h1>
    • <h1> tag should be unique for each page
  • <img> tags should:
    • contain a descriptive "alt" attribute.  Search engines use the alt attribute to interpret the content of images
    • the alt attribute should not start with copyright or ©
    • alt should contain fewer than 150 words (to avoid being flagged as deceptive).
  • hyperlinks <a> should:
    • contain a valid top level domain
    • not contain generic text such as "link" or "click here"
    • not contain a carriage return in the href
    • not contain a nofollow attribute if it is relevant to the site
    • not include a session id (links pointing to the same page with different session ids will be analyzed as separate links)
    • not end in ampersand for a page without a query string
    • contain no more than three query string parameters (any more and the crawler may ignore the link)
    • page should only have one canonical link i.e. <link rel="canonical" href="url">
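A handful of these guidelines are simple enough to check mechanically.  Below is a minimal sketch of such a checker, again built on Python's html.parser; it covers only the title, description, <h1>, alt text, generic link text, and query string rules from the list above.  The PageAudit class, the audit() function, the wording of the messages, and the sample page are all hypothetical, and this is not how the SEO Toolkit itself works.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qsl

GENERIC_LINK_TEXT = {"link", "click here"}


class PageAudit(HTMLParser):
    """Gathers just enough of a page to check a few of the guidelines."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.descriptions = []
        self.h1_count = 0
        self.images_missing_alt = 0
        self.links = []        # (href, link text) pairs
        self._capture = None   # "title" or "a" while collecting text
        self._buffer = []
        self._href = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.descriptions.append(attrs.get("content") or "")
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.images_missing_alt += 1
        elif tag in ("title", "a"):
            self._capture, self._buffer = tag, []
            self._href = attrs.get("href") or ""

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._capture:
            return
        text = "".join(self._buffer).strip()
        if tag == "title":
            self.titles.append(text)
        else:
            self.links.append((self._href, text))
        self._capture = None


def audit(html):
    """Return a list of guideline violations found in the page."""
    page = PageAudit()
    page.feed(html)
    page.close()
    problems = []
    if len(page.titles) != 1:
        problems.append("expected exactly one <title>")
    elif not 0 < len(page.titles[0]) < 65:
        problems.append("title must be non-empty and under 65 characters")
    if len(page.descriptions) != 1:
        problems.append("expected exactly one meta description")
    elif not 0 < len(page.descriptions[0]) <= 150:
        problems.append("description must be 1 to 150 characters")
    if page.h1_count != 1:
        problems.append("expected exactly one <h1>")
    if page.images_missing_alt:
        problems.append(f"{page.images_missing_alt} <img> without alt text")
    for href, text in page.links:
        if text.lower() in GENERIC_LINK_TEXT:
            problems.append(f"generic link text: {text!r}")
        if len(parse_qsl(urlsplit(href).query)) > 3:
            problems.append(f"more than three query parameters: {href}")
    return problems


# Made-up page that breaks several guidelines on purpose.
SAMPLE = """<html><head><title>Widgets</title></head><body>
<h1>Widgets</h1><h1>More widgets</h1>
<img src="w.png">
<a href="/list?w=1&x=2&y=3&z=4">click here</a>
</body></html>"""

for problem in audit(SAMPLE):
    print(problem)
```

The sample page has a valid title, so the reported problems are the missing description, the second <h1>, the image without alt text, and the two issues with the link.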

SEO Analysis Tools

The Microsoft tool for checking search engine optimization is the SEO Toolkit, which is freely downloadable from Microsoft.  This installation guide was useful for getting up and running.  I ran it against the ancient Space Jam promo website and was not surprised by the awful results.

Warner Bros. Space Jam website had 1946 violations
But if that looks bad, I couldn't believe what happened when I ran the analysis on the ESPN website... The first run went on for maybe 12 minutes before I killed it (after it had downloaded 500 MB of website data from nearly 7,000 URLs)... yikes!  I ran it again with the "store copies of analyzed Web pages locally" option turned off, hoping that would help.  It didn't, but the results (after hitting the 20,000 URL maximum, 35 minutes later...) were astounding... well over 350,000 violations, and that was only a limited sample of the site:

Just before it called it quits...


Results of the ESPN analysis... wow... not that ESPN needs to care about SEO anyway...

One way to get SEO information at your fingertips is with a browser plugin.  Plenty are available; I played with META SEO Inspector for Google Chrome.  This plugin displays information from <meta> tags, and will highlight "nofollow" links.  It also links to a number of online SEO tools for page rank, link analysis, keyword density, etc.  Here were the results for ESPN:

The META SEO Inspector plug-in for Chrome
META SEO Inspector links to many other SEO resources



Writing Accessible Markup


The MSDN article and the Primer from W3C are both excellent resources for getting the basics of ARIA.  I'm not going to completely rehash all the content from those places, so I'll just try to summarize it quickly.

ARIA (Accessible Rich Internet Applications) aims to fill a gap in accessibility that currently exists in the world of online content.  Whereas modern operating systems have accessibility APIs that enable individuals with sight, hearing, and other impairments to use native applications, no such interface has existed to help these users easily navigate and use web pages that make extensive use of dynamic HTML and AJAX.  This is where ARIA comes in, by exposing additional attributes within the markup that enable easier navigation, notification of updates, and information about element state.

ARIA uses three types of roles to identify navigational landmarks, document structure, and widgets.  These roles are assigned by adding the role="" attribute.  ARIA properties and states are attributes that represent additional information about a given element; these attributes follow the pattern aria-xxxx="" where xxxx is the state or property.
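To show what those attributes look like in practice, here is a small sketch that embeds a made-up snippet of ARIA markup in a Python string and lists the role values and aria-xxxx attributes it finds.  The markup and the AriaCollector class are purely illustrative; I picked "navigation" and "main" as landmark roles, "article" as a document structure role, and "checkbox" as a widget role.

```python
from html.parser import HTMLParser

# Made-up markup: two landmarks, one document structure role, one widget.
MARKUP = """
<div role="navigation" aria-label="Site sections">
  <a href="/">Home</a>
</div>
<div role="main">
  <div role="article">
    <span role="checkbox" aria-checked="false" tabindex="0">Subscribe</span>
  </div>
</div>
"""


class AriaCollector(HTMLParser):
    """Records role values and aria-* attributes in document order."""

    def __init__(self):
        super().__init__()
        self.roles = []   # values of role="" attributes
        self.aria = []    # (tag, attribute, value) for aria-* attributes

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "role":
                self.roles.append(value)
            elif name.startswith("aria-"):
                self.aria.append((tag, name, value))


collector = AriaCollector()
collector.feed(MARKUP)
collector.close()
print(collector.roles)
print(collector.aria)
```

Here aria-label is a property (it describes the element) while aria-checked is a state (it changes as the user interacts with the widget), which is the distinction the paragraph above is drawing.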

One common theme in the ARIA literature is that well-formed, semantic HTML is more accessible, and many of the principles for making sites more accessible overlap with concepts from SEO.

1 comment:

  1. To the spammers:
    My comments section does not display links, and I delete spam immediately. So posting your "Ermagerd your blog so good best content ever! <link> <link> <link>" comments is pointless.

    So do us both a favor and knock it off already, geez...
