Monday, October 25, 2010

Microdata guided by usability testing results

About a year ago Google decided to do actual research on how usable the Microdata spec was. This blog post has very interesting details.

Ian Hickson was kind enough to point me to this post after I asked about consolidating the itemscope and itemtype attributes into item.
Here are my comments on their conclusions which you may find helpful:

Tuesday, October 19, 2010

It's official: Socialcast REACH is live

Socialcast is out of private beta for REACH,
for which we parse the Open Graph Protocol as well as HTML 5 Microdata with the Open Graph vocabulary.
All of the components are extensible and we will be adding more vocabularies in subsequent releases.
We have been working in private beta with some customers and found that microdata helped us with closed-source business systems because it can be placed anywhere on the HTML page.

Side by side comparison:

Pretty easy!

Saturday, October 16, 2010

Context in the Enterprise

Today most people are overwhelmed with information. Not only is there an enormous amount to read on a variety of devices, but much of it links to other content that takes time to fetch, and users end up wasting time if that content is not relevant to them.
Wasting time is something we never want to do, and guess what? Our bosses don't want us to do it either. They want us to share and learn, but they definitely don't want us to waste time.

With that goal in mind, the Socialcast team embarked on a mission to provide more context for links shared in our application: a short description, a picture, a title and the topics the link covers. This practice is not new. Other consumer portals already parse HTML to try to extract this information.
This is a difficult job because, before HTML 5, markup was not inherently semantic.

Most of the tags in HTML were only for layout.
Being the agile team that we are, we didn't want to spend a lot of time handling HTML with tags that are not properly closed, or figuring out from image sizes which image is the most appropriate one to render (the representative image). Too complex.

So we asked: what is the best technique for capturing the relevant data in a web page? Our research brought us to the following specifications:

  • RDFa
  • Microformats
  • Open Graph Protocol
  • oEmbed
  • Microdata

What do they have in common?

All of these are specifications detailing how to add semantic data to your web pages. The specifications cover the syntax and concepts and, in some cases, a detailed vocabulary.

How many objects can you read out of an HTML page?

  • RDFa: As many as you want. You can create new vocabularies and describe not only objects but also entire statements with subject, predicate and object. It also allows cross-referencing objects.
  • Microformats: All the ones that map to a specific microformat.
  • Open Graph Protocol: One main object with some predefined relations. The object can have a variety of types specified by Facebook.
  • oEmbed: One, specified via
    link rel="alternate" type="text/xml+oembed"
  • Microdata: As many as you want. It does not enforce global uniqueness or the use of namespaces for types. Objects can be defined ad hoc.

What is needed to use each one?

  1. RDFa: An XHTML doctype
  2. Microformats: Nothing
  3. Open Graph Protocol: Nothing, but it should be the same as RDFa
  4. oEmbed: Another document, which means performing a separate HTTP request
  5. Microdata: HTML 5 doctype, but it doesn't break anything in practice

Initially we just wanted to extract a single object and decided to use the Open Graph Protocol. It provides a short set of rules, which on one hand is great because the code to parse it is very short, but on the other hand it's not flexible enough: as described in my earlier post, there are issues working with existing closed-source business systems.
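To give a feel for how short an Open Graph parser can be, here is a minimal Python sketch. It is not the Socialcast implementation: the names OpenGraphParser and parse_open_graph and the sample page are made up for illustration, and the sketch only reads meta tags whose property starts with og:, keeping the first value it sees for each one.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        content = attributes.get("content")
        if name.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(name, content)


def parse_open_graph(html):
    """Return a dict of og: properties found in the given HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


# Hypothetical page, used only for illustration.
SAMPLE_PAGE = """
<html><head>
<meta property="og:title" content="Context in the Enterprise" />
<meta property="og:type" content="article" />
<meta property="og:image" content="http://example.com/cover.png" />
</head><body></body></html>
"""

og = parse_open_graph(SAMPLE_PAGE)
print(og["og:title"])
```

The whole thing is one loop over meta tags, which is exactly why the protocol is attractive, and also why it only works when you are able to write to the head of the page.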

This is why we turned to Microdata. We have posted a lot of details on our wiki about how the parsing works and how to extend it, and we have built in the ability to use other vocabularies, such as Activity Streams or even the oEmbed vocabulary.
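For readers who have not seen microdata parsing, here is a rough Python sketch of the idea. It is a simplification and not what is described on our wiki: it assumes well-formed markup, ignores nested items and itemref, and the class name, the helper tables and the itemtype URL in the sample are all hypothetical. Each element with itemscope starts an object, and each itemprop inside it becomes a property whose value comes from an attribute (src, href, content) or from the element text.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose property value lives in an attribute instead of the text.
VALUE_ATTRIBUTE = {"a": "href", "link": "href", "img": "src", "meta": "content"}


class MicrodataParser(HTMLParser):
    """Collect top-level microdata items (nested items are not handled)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._stack = []        # one (opened_item, collecting_text) per open element
        self._current = None    # item currently being filled
        self._text_prop = None  # property whose text is being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        opened_item = False
        if "itemscope" in attributes and self._current is None:
            self._current = {"type": attributes.get("itemtype"), "properties": {}}
            self.items.append(self._current)
            opened_item = True
        prop = attributes.get("itemprop")
        collecting = False
        if prop and self._current is not None and not opened_item:
            if tag in VALUE_ATTRIBUTE:
                value = attributes.get(VALUE_ATTRIBUTE[tag])
                self._current["properties"][prop] = value
            elif self._text_prop is None:
                self._text_prop, self._text, collecting = prop, [], True
        if tag not in VOID_TAGS:
            self._stack.append((opened_item, collecting))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        opened_item, collecting = self._stack.pop()
        if collecting:
            text = "".join(self._text).strip()
            self._current["properties"][self._text_prop] = text
            self._text_prop = None
        if opened_item:
            self._current = None

    def handle_data(self, data):
        if self._text_prop is not None:
            self._text.append(data)


def parse_microdata(html):
    parser = MicrodataParser()
    parser.feed(html)
    return parser.items


# Hypothetical fragment; it can sit anywhere in the body of a page.
SAMPLE_FRAGMENT = """
<div itemscope itemtype="http://example.com/vocab/og">
  <h1 itemprop="title">Context in the Enterprise</h1>
  <img itemprop="image" src="http://example.com/cover.png">
  <a itemprop="url" href="http://example.com/post">Read more</a>
</div>
"""

items = parse_microdata(SAMPLE_FRAGMENT)
print(items[0]["properties"]["title"])
```

Because nothing here depends on the head of the document, the same fragment can be dropped into whatever part of the page a business system lets you edit.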

So what do the users get in return?

  • Distributed discussions throughout their ecosystem
  • Good sources of material curated by people they trust, their colleagues.
  • Rapid deployment
  • Easy to add to business systems

Sunday, October 10, 2010

HTML5: Implementors experience with OGP and Microdata

You cannot argue with the fact that the Open Graph Protocol (OGP) looks very clean and easy to understand.
It's a subset of RDFa which focuses on a single schema and a single location where this schema can be applied: the HEAD of the page.
This is why we started working with it at our company: simplicity and, of course, being a standard proposal signed under the OWFa agreement, open for anyone to use.
So we have been testing interoperability with enterprise software systems. How hard is it to add these OGP meta tags? For some systems, like SugarCRM, we have the source code, and for others, like SharePoint, we do not.
In addition, our target audience is system administrators, not just developers, so we are interested in finding the simplest solution, the one that feels most natural in their environment.
From my experimentation I have definitely found some challenges dealing with this more closed enterprise software: no access to edit the HTML in the head, and no access to dynamic data from the head.

So I decided to try more consumer-facing products like wikis and blogs, hoping to encounter less resistance.
Today I was testing adding some OGP markup to this blog and found that, unlike other systems I have been working with lately, it allowed me to easily modify the HEAD.

I added the basics: title, type, etc. However, when it came to og:image I was a bit stuck on what to do, even on such a flexible external system.
The images I want featured when I share blog entries with the world will come from each blog entry and will be carefully chosen to capture the essence of the piece.

Naturally the first solution I thought of was Microformats, as these can be added in context. The problem with microformats is that they don't have a lot of common properties between the different types of objects.
So one proposal would be to add the concept of a representative image to microformats, but in the interest of time I searched some more and remembered something about microdata in HTML 5.
When I first saw Microdata presented I wondered why there was another specification talking about semantics. There is already RDFa... but on taking a closer look I quickly saw how useful microdata will be, thanks to its simplicity.

The beauty of microdata is that it does not concern itself with imposing a schema. It does one job and it does it well.

Schema is optional. No namespaces. Hurray! Much more suited for doing inline representation of objects.
Looking at the Open Graph Protocol, I realize it's more of a schema that can easily be translated into a microdata vocabulary.
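As a rough illustration of that translation, the Python sketch below takes a set of Open Graph properties and writes them out as an inline microdata item. The function name, the itemtype URL and the choice of span, img and a elements are my own assumptions for the example, not part of either specification or of our product.

```python
from html import escape

# Hypothetical vocabulary identifier, used only for this example.
OG_ITEMTYPE = "http://example.com/vocab/opengraph"


def open_graph_to_microdata(properties, itemtype=OG_ITEMTYPE):
    """Render og: properties as a microdata item that can live in the body."""
    lines = ['<div itemscope itemtype="%s">' % escape(itemtype, quote=True)]
    for name, value in properties.items():
        # Drop the "og:" prefix; microdata needs no namespace.
        short = name[3:] if name.startswith("og:") else name
        value = escape(value, quote=True)
        if short == "image":
            lines.append('  <img itemprop="image" src="%s">' % value)
        elif short == "url":
            lines.append('  <a itemprop="url" href="%s">%s</a>' % (value, value))
        else:
            lines.append('  <span itemprop="%s">%s</span>'
                         % (escape(short, quote=True), value))
    lines.append("</div>")
    return "\n".join(lines)


snippet = open_graph_to_microdata({
    "og:title": "Hello",
    "og:image": "http://example.com/a.png",
})
print(snippet)
```

The interesting part is the image: instead of one og:image for the whole blog in the head, each entry can carry its own representative image right next to its content.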

Therefore we have now shifted gears to add support in our product for parsing microdata with the Open Graph Protocol as a vocabulary. No more need to hack in semantics. The solution is in the HTML 5 specification. In addition, we will be rolling out support for more vocabularies based on the needs of our users.


social coder

San Francisco, California, United States
Open Web Standards Advocate