Today most people are overwhelmed with information. Not only is there an enormous amount of information to read across a variety of devices, but much of it links to other content that takes time to fetch, and users end up wasting time when that content turns out not to be relevant to them.
Wasting time is something we never want to do, and guess what? Our bosses don't want us to do it either. They want us to share and learn, but they definitely don't want us to waste time.
With that goal in mind, the Socialcast team decided to embark on a mission of providing more context for links shared in our application: a short description, a picture, a title, and the topics it covers. This practice is not new; other consumer portals already parse HTML to try to extract this information.
This is a difficult job because, before HTML5, markup was not inherently semantic; most HTML tags described layout rather than meaning.
Being the agile team that we are, we didn't want to spend a lot of time handling HTML with improperly closed tags, or figuring out which image is the most appropriate one to render as the representative image based on its size. Too complex.
So we asked ourselves: what is the best technique for capturing the relevant data in a web page? Our research brought us to specifications such as oEmbed, the Open Graph Protocol, and HTML5 Microdata. All of these detail how to add semantic data to your web pages, covering syntax and concepts and, in some cases, a detailed vocabulary.
oEmbed, for example, is advertised through a discovery link in the page head, whose href points at a provider endpoint that returns the metadata:

<link rel="alternate" type="text/xml+oembed">
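To give a feel for what that endpoint returns, here is a rough JSON example (the field names come from the oEmbed spec; the values and URLs are made up):

{
  "version": "1.0",
  "type": "photo",
  "title": "A sample photo",
  "author_name": "Jane Doe",
  "provider_name": "Example Photos",
  "url": "http://example.com/photos/123.jpg",
  "thumbnail_url": "http://example.com/photos/123_thumb.jpg",
  "width": 640,
  "height": 480
}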
Initially we just wanted to extract a single object, so we decided to use the Open Graph Protocol. It provides a short set of rules, which on one hand is great because the code to parse it is very short, but on the other hand it is not flexible enough, as described in my earlier post about the issues with existing closed-source business systems.
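To show just how short that parsing code can be, here is a minimal sketch in Python (for illustration only; the og: property names are the standard ones from the protocol, and the sample page is made up):

from html.parser import HTMLParser

SAMPLE_PAGE = """
<html><head>
  <meta property="og:title" content="Example Article">
  <meta property="og:image" content="http://example.com/pic.jpg">
  <meta property="og:description" content="A short summary.">
</head><body>...</body></html>
"""

class OpenGraphParser(HTMLParser):
    # Collects every <meta property="og:..." content="..."> pair into a dict.
    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if prop.startswith("og:") and attrs.get("content"):
            self.properties[prop] = attrs["content"]

parser = OpenGraphParser()
parser.feed(SAMPLE_PAGE)
print(parser.properties)
# {'og:title': 'Example Article', 'og:image': 'http://example.com/pic.jpg',
#  'og:description': 'A short summary.'}

A real implementation would of course need a more forgiving parser, since many pages in the wild are malformed.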
This is why we turned to Microdata. We have posted a lot of detail on our wiki about how the parsing works and how to extend it, and we have built in the ability to use other vocabularies, such as Activity Streams or even the oEmbed vocabulary.
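For reference, Microdata attaches a vocabulary to ordinary HTML with the itemscope, itemtype, and itemprop attributes. A fragment might look roughly like this (the itemtype URL and property names here are illustrative placeholders, not our actual vocabulary):

<div itemscope itemtype="http://example.com/vocab/article">
  <h1 itemprop="title">Example Article</h1>
  <img itemprop="thumbnail_url" src="http://example.com/pic.jpg">
  <p itemprop="description">A short summary.</p>
</div>

Because the vocabulary is just the set of itemprop names under an itemtype, terms from Activity Streams or oEmbed can be plugged in without changing the parsing itself.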
So I decided to try more consumer-facing products like wikis, blogs, etc., hoping to encounter less resistance.
Today I was testing adding some OGP markup to this blog and found that, unlike other systems I have been working with lately, it allowed me to easily modify the HEAD.
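For anyone trying the same thing, the OGP markup that goes into the HEAD is just a handful of meta tags, something along these lines (og:title, og:type, og:url, and og:image are the four required properties; the values are placeholders for your own page):

<meta property="og:title" content="My post title">
<meta property="og:type" content="article">
<meta property="og:url" content="http://example.com/my-post">
<meta property="og:image" content="http://example.com/my-post/cover.jpg">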