Document Structure in HTML and its Relation to SEO

Jordan Sandford - June 9, 2008

HTML is one of the core technologies of our search engine optimization and search engine marketing worlds. Search engines live on a diet of HTML documents. Computer programs, including search engines, also create HTML. For the vast majority of searches run on search engines, the engines analyze data pulled directly, and only, from your HTML documents. Let’s look at HTML documents through the eyes of a search engine to get a general understanding of how HTML documents work and how search engines use them. Before we do, I’ll provide a simple explanation of what HTML is.

Generally, all web pages are HTML documents. An HTML document is a plain-text file written in a language called HTML (HyperText Markup Language). Your browser interprets the HTML tags in the text file to render color, text formatting (such as underlining and font size), embedded images and interactivity, but the HTML code itself contains nothing except letters, numbers, punctuation and similar characters. It is made to be readable and understandable by both humans and computers. It has a standardized structure so that any human or computer program should be able to read two different HTML documents and extract semantics (or meaning) from them the same way. Standardization also allows two computer programs to read the same document, extract the same semantics from it and render it the same way visually, given the same set of rendering rules. HTML documents usually contain just the content and the structure; the rendering rules are stored in another document, such as a stylesheet.
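To make the idea of "plain text with tags" concrete, here is a minimal sketch of how a program might separate the structure (the tags) from the content (the text). It uses Python's standard html.parser module; the sample markup and the names MarkupSplitter and split_markup are my own inventions for illustration, and a real search engine is certainly far more sophisticated than this.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Separates the tags (structure) from the text (content)."""

    def __init__(self):
        super().__init__()
        self.tags = []  # opening tags, in document order
        self.text = []  # non-empty runs of text, in document order

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def split_markup(html):
    """Return (tags, text) for a string of HTML."""
    splitter = MarkupSplitter()
    splitter.feed(html)
    splitter.close()
    return splitter.tags, splitter.text


# Made-up sample markup: a headline and one paragraph with emphasis.
SAMPLE = "<h1>Spring Sale</h1><p>All tents are <em>half price</em> this week.</p>"

tags, text = split_markup(SAMPLE)
print(tags)  # ['h1', 'p', 'em']
print(text)  # ['Spring Sale', 'All tents are', 'half price', 'this week.']
```

Any program that follows the same rules will pull the same tags and the same text out of this sample, which is the point of standardization.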

Think about a traditional letter. It has specific parts, like the addressee’s name and address, the date, the salutation, and the opening, body and closing paragraphs. If you were to tell a search engine to index or read your letter exactly as you see it, the engine would not be able to differentiate, say, the date from the salutation. For example, if you were to ask the search engine for the name of the person to whom the letter is written, it might unknowingly run the salutation line together with the first body paragraph and not be able to give you an answer. You need to be able to tell the search engine where the salutation line starts and ends and where the individual body paragraphs start and end.

Imagine you were writing your letter with a word processor and there was a key titled “Salutation Line.” You would press the key before typing “Dear Mr. Jacobs,” and once more after the salutation. Essentially, you have tagged a part of the text using hidden marks to denote where the salutation line starts and stops. You would do a similar thing for headlines, underlined text, body paragraphs, addresses, dates and maybe even a picture placeholder nested inside a body paragraph, all with appropriate tags. Now the document has structure. If a human, a browser, or a search engine were to read the underlying tags, they would be able to respond correctly when asked “When was the letter written?” or “Show me the picture associated with the second body paragraph,” and so on. The search engines’ ability to derive meaning from parts of the content, and to associate those parts with one another, helps them tell whether a given keyword or key phrase is relevant to the web page.
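The tagged letter can be sketched in code. In the example below, the letter text, the class names (date, salutation, body) and the file name tent.jpg are all made up for illustration; HTML has no built-in "salutation" tag, so I am assuming paragraphs labeled with a class attribute. The reader is again built on Python's standard html.parser module.

```python
from html.parser import HTMLParser

# A made-up letter. The class names are hypothetical labels, not standard HTML.
LETTER = """
<div class="letter">
  <p class="date">June 9, 2008</p>
  <p class="salutation">Dear Mr. Jacobs,</p>
  <p class="body">Thank you for your order.</p>
  <p class="body">Your tent ships on Monday. <img src="tent.jpg" alt="Your tent"></p>
</div>
"""


class LetterReader(HTMLParser):
    """Collects each tagged paragraph with its label, text and pictures."""

    def __init__(self):
        super().__init__()
        self.parts = []        # one dict per <p> element, in document order
        self._current = None   # the paragraph currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "p":
            self._current = {"kind": attrs.get("class", ""), "text": "", "images": []}
            self.parts.append(self._current)
        elif tag == "img" and self._current is not None:
            # A picture nested inside a paragraph belongs to that paragraph.
            self._current["images"].append(attrs.get("src", ""))

    def handle_endtag(self, tag):
        if tag == "p" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data


def read_letter(html):
    """Return the list of tagged parts found in the letter."""
    reader = LetterReader()
    reader.feed(html)
    reader.close()
    return reader.parts


def find_parts(parts, kind):
    """Return every part carrying the given label, in document order."""
    return [part for part in parts if part["kind"] == kind]


parts = read_letter(LETTER)
print(find_parts(parts, "date")[0]["text"])    # June 9, 2008
print(find_parts(parts, "body")[1]["images"])  # ['tent.jpg']
```

Because each part is delimited, "When was the letter written?" becomes a simple lookup, and the picture is found by asking the second body paragraph what it contains.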

The structure of an HTML document starts with information about the version and style of the mark-up language. This helps the non-human reader know how to interpret the tag types it encounters, which may include ignoring a tag completely. After the HTML version information, the general structure continues with the html tag, which contains the two main sections of the document, the head and the body, each delimited by its own tags. The head section contains information about the body of the document as well as further instructions for non-human readers. Our example letter above does not have a head section, but if it did, we could include a synopsis of the letter using a meta tag. A human reader, if given the choice, would probably read the synopsis first in order to decide whether to read the entire letter. Following the same idea, the search engines use the meta tags and place a respectable level of importance on them.
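Here is a rough sketch of that overall structure and of how a non-human reader might pick out the head information. The page content, title and synopsis are invented for illustration, and I have used the short HTML5-style version line for brevity; older documents carry a longer declaration. The HeadReader and read_head names are my own, built on Python's standard html.parser module.

```python
from html.parser import HTMLParser

# A made-up page: version information, then html containing head and body.
PAGE = """<!DOCTYPE html>
<html>
<head>
  <title>A Letter to Mr. Jacobs</title>
  <meta name="description" content="A short note confirming an order and its shipping date.">
</head>
<body>
  <p>Dear Mr. Jacobs,</p>
</body>
</html>"""


class HeadReader(HTMLParser):
    """Records the version line, the main sections, the title and the synopsis."""

    def __init__(self):
        super().__init__()
        self.doctype = None      # text of the version declaration
        self.sections = []       # "head" and "body", in the order found
        self.title = ""
        self.description = None  # content of the description meta tag
        self._in_title = False

    def handle_decl(self, decl):
        self.doctype = decl

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("head", "body"):
            self.sections.append(tag)
        elif tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def read_head(html):
    """Parse a page and return the reader holding what was found."""
    reader = HeadReader()
    reader.feed(html)
    reader.close()
    return reader


page = read_head(PAGE)
print(page.doctype)      # DOCTYPE html
print(page.sections)     # ['head', 'body']
print(page.description)  # A short note confirming an order and its shipping date.
```

The description meta tag plays the role of the letter's synopsis: it is read from the head without touching the body at all.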

If unclosed or incorrectly nested tags occur anywhere in the document, non-human readers could become confused about the meaning and create invalid associations. A process known as “validating HTML” helps ensure these errors don’t exist in your document and should be a part of all SEO efforts. You can validate your HTML using the W3C Markup Validation Service. You should also make sure to use HTML tags properly in your documents and to optimize the content inside these tags. Doing so provides structure to your document and helps the search engines do their job: read HTML documents much as a human would, so that they can answer questions about them asked by a human.
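To illustrate what "unclosed" and "incorrectly nested" mean in practice, here is a simplified nesting check. It is only a sketch under my own assumptions and is not how the W3C Markup Validation Service works: it tracks open tags on a stack and reports mismatches, and it is stricter than real HTML, which allows some closing tags to be omitted. The names NestingChecker and check_nesting are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not tracked.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class NestingChecker(HTMLParser):
    """Tracks open tags on a stack and records nesting problems."""

    def __init__(self):
        super().__init__()
        self.stack = []     # tags opened but not yet closed
        self.problems = []  # human-readable descriptions of errors

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if tag not in self.stack:
            self.problems.append(f"</{tag}> has no matching opening tag")
            return
        # Anything opened after the matching tag was left unclosed.
        while self.stack[-1] != tag:
            self.problems.append(f"<{self.stack.pop()}> not closed before </{tag}>")
        self.stack.pop()


def check_nesting(html):
    """Return a list of nesting problems; an empty list means none were found."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    problems = list(checker.problems)
    problems.extend(f"<{tag}> never closed" for tag in reversed(checker.stack))
    return problems


print(check_nesting("<div>Hello <b>world</b></div>"))  # []
print(check_nesting("<div><b>bold</div></b>"))
# ['<b> not closed before </div>', '</b> has no matching opening tag']
```

An empty result here only means this simple check found nothing; a full validator looks at far more, such as which tags are allowed inside which.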

© 2023 MoreVisibility. All rights reserved.