Monica: 2010

Tuesday, November 2, 2010

Activity Streams 101 Session at IIW

For those new to the standard, here is a brief explanation:

Activity Streams is a way of modeling social actions which improves the performance of human beings interpreting shared information and making decisions.

The Activity Streams data structure focuses on:

Providing an up to date digest of important information narrated via the reader's social graph.
Optimizing for human consumption by using multiple media in a clean fashion.
Producing a cycle of engagement. As reactions to activities occur, those also become part of the stream.

There is also a standard specification which is being developed by a variety of companies and individuals. Here are a few details:

activitystrea.ms is a standard used to convey what people are doing around the web
Defines a set of concepts and vocabulary
Can be used with Atom, RSS or JSON

Here are a couple of examples:

Activity Streams on Atom

Activity Streams on JSON
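
To make the JSON flavor concrete, here is a minimal sketch in Python of a single activity: who (actor) did what (verb) to which thing (object). The property names follow my reading of the JSON draft and may differ between revisions of the spec; the helper name `make_activity`, the URL and the timestamp are made up for illustration.

```python
import json

def make_activity(actor_name, verb, object_type, object_url, published):
    """Build a minimal activity: an actor performing a verb on an object."""
    return {
        "published": published,
        "actor": {"objectType": "person", "displayName": actor_name},
        "verb": verb,
        "object": {"objectType": object_type, "url": object_url},
    }

# Illustrative values only; none of them come from a real feed.
activity = make_activity(
    "Monica", "post", "article",
    "http://example.org/blog/activity-streams-101",
    "2010-11-02T12:00:00Z",
)
print(json.dumps(activity, indent=2))
```

Reactions such as comments or likes would be modeled as further activities whose object is the original one, which is how the stream feeds itself.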

Latest News

Notes from Kevin Marks
Paul Tarjan from FB is considering using it for Facebook to auto-refresh news about the objects in the graph using og:feed
OWFa Agreement is getting signed for v1

Monday, October 25, 2010

Microdata guided by usability testing results

About a year ago Google decided to do actual research on how usable the Microdata spec was. This blog post has very interesting details: http://blog.whatwg.org/usability-testing-html5

Ian Hickson was kind enough to point me to this post after I asked about consolidating the itemscope and itemtype attributes into item.

See:

http://pastie.org/1246677
Here are my comments on their conclusions which you may find helpful:

It's official: Socialcast REACH is live

Socialcast (http://www.socialcast.com/) is out of private beta for REACH (http://techcrunch.com/2010/10/19/socialcast-reach-extends-activity-streams-to-outside-business-applications/), for which we parse Open Graph Protocol as well as HTML 5 Microdata with the Open Graph vocabulary.
All of the components are extensible and we will be adding more vocabularies in subsequent releases.

We have been working in private beta with some customers and found that microdata helped us with closed source business systems because it can be placed anywhere on the HTML page.

Side by side comparison:

Pretty easy !

Saturday, October 16, 2010

Context in the Enterprise

Today most people are overwhelmed with information. Not only is there an enormous amount of information to read on a variety of devices, but a lot of this information also links to other content which takes time to fetch, and users end up wasting time if that content is not relevant to them.
Wasting time is something we don't want to be doing ever and guess what ? Our bosses don't want us to do that either. They want us to share and learn but they definitely don't want us to waste time.

With that goal in mind the Socialcast team decided to embark on a mission of providing more context for links which are shared in our application: a small description, a picture, a title and the topics it covers. This practice is not new. Other consumer portals already parse HTML to try and extract this information.
This is a difficult job because before HTML 5, markup was not inherently semantic.

Most of the tags in HTML were only for layout.
Being the agile team that we are, we didn't want to spend a lot of time handling HTML with tags that are not properly closed, or figuring out which image is the most appropriate one to render (the representative image) based on size. Too complex.

So we asked: what is the best technique for capturing the relevant data in a web page? Our research brought us to the following specifications:

RDFa : http://www.w3.org/TR/rdfa-syntax
Open Graph Protocol: http://opengraphprotocol.org/
Microdata: See previous post http://montrics.blogspot.com/2010/10/html5-implementors-experience-with-ogp.html
Microformats: http://microformats.org/wiki/Main_Page
oEmbed: http://www.oembed.com/
And we should not leave out HTML5

What do they have in common ?

All of these are specifications detailing how to add semantic data to your web pages. The specifications cover the syntax and concepts and in some cases detailed vocabulary.

How many objects can you read out of an html page ?

RDFa : As many as you want. You can create new vocabularies and not only describe objects but also entire sentences with subject predicate and object. It also allows cross referencing objects
Microformats: All the ones which map to a specific microformat.
Open Graph Protocol : One main object with some predefined relations. The object can have a variety of types specified by Facebook.

oEmbed: One, specified via link rel="alternate" type="text/xml+oembed"

Microdata: As many as you want and does not enforce global uniqueness or the use of namespaces for types. Objects can be defined adhoc.

What is needed to use ?

RDFa: http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd in XHtml doctype
Microformats: Nothing
Open Graph Protocol: Nothing, but it should be the same as RDFa
oEmbed: Another document, which means performing a separate HTTP request.
Microdata: HTML 5 doctype, but it doesn't break anything in practice

Initially we just wanted to extract a single object and decided to use the Open Graph Protocol. It provides a short set of rules, which on one hand is great because the code to parse it is very short, but on the other hand it's not flexible enough, as described in my earlier post about the issues of working with existing closed source business systems.

This is why we turned to Microdata. We have posted a lot of details on our wiki about how the parsing works and how to extend it, and we have built in the ability to use other vocabularies, such as Activity Streams or even the oEmbed vocabulary.
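
To give a feel for the difference between the two approaches, here is a rough Python sketch that reads Open Graph `meta` tags from the head and microdata `itemprop` attributes from anywhere in the page. This is not our production parser: the class name `LinkContextParser` and the sample page are made up, and the sketch ignores `itemscope` nesting and `itemtype`, so it treats the whole page as one flat object.

```python
from html.parser import HTMLParser

class LinkContextParser(HTMLParser):
    """Collect og: properties from <meta> tags and values of itemprop attributes."""

    def __init__(self):
        super().__init__()
        self.ogp = {}         # from <meta property="og:..." content="...">
        self.microdata = {}   # from itemprop="..." on any element
        self._pending = None  # itemprop whose value is the element's text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            self.ogp[prop[3:]] = attrs.get("content") or ""
        name = attrs.get("itemprop")
        if name is None:
            return
        # The value comes from a URL-bearing attribute or from the text.
        if tag == "img":
            self.microdata[name] = attrs.get("src") or ""
        elif tag == "a":
            self.microdata[name] = attrs.get("href") or ""
        elif tag == "meta":
            self.microdata[name] = attrs.get("content") or ""
        else:
            self._pending = name

    def handle_data(self, data):
        if self._pending and data.strip():
            self.microdata[self._pending] = data.strip()
            self._pending = None

PAGE = """
<html><head>
<meta property="og:title" content="Quarterly Report">
<meta property="og:type" content="article">
</head><body>
<div itemscope>
  <h1 itemprop="title">Quarterly Report</h1>
  <img itemprop="image" src="http://example.com/chart.png">
</div>
</body></html>
"""

parser = LinkContextParser()
parser.feed(PAGE)
print(parser.ogp)
print(parser.microdata)
```

The point of the comparison is that the `og:` values must live in the head, while the `itemprop` values sit next to the content they describe, which is what makes microdata workable when the head cannot be edited.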

So what do the users get in return ?

Distributed discussions throughout their ecosystem
Good sources of material curated by people they trust, their colleagues.
Rapid deployment
Easy to add to business systems

Sunday, October 10, 2010

HTML5: Implementors experience with OGP and Microdata

You cannot argue the fact that the Open Graph Protocol(ogp) looks very clean and easy to understand.
It's a subset of RDFa which focuses on a single schema and a single location where this schema can be applied:
This is why we started working with it at our company. Simplicity and of course being a standard proposal signed under the OWFa agreement: Open for anyone to use.
So we have been testing interoperability on enterprise software systems. How hard is it to add these ogp meta tags ? For some systems like SugarCRM we have the source code and for others like SharePoint we do not.
In addition, our target audience is system administrators, not just developers, so we are interested in finding the simplest solution that feels most natural in their environment.
From my experimentation I have definitely found some challenges dealing with this more closed enterprise software. No access to edit the HTML in the head. No access to dynamic data from the head.

So I decided to try more consumer facing products like wikis, blogs, etc hoping to encounter less resistance.
Today I was testing adding some ogp markup to this blog and found that, unlike other systems I have been working with lately, it allowed me to easily modify the HEAD.

I added the basics: title, type, etc. However, when it came to og:image I was a bit stuck on what to do, even on such a flexible external system.
The images I want featured when I share blog entries with the world will come from each blog entry and will be carefully chosen to capture the essence of the piece.

Naturally the first solution I thought of was Microformats, as these can be added in context. The problem with microformats is that they don't have a lot of common properties between the different types of objects.
So one proposal would be to add the concept of a representative image to microformats, but in the interest of time I searched some more and remembered something about microdata in HTML 5.
When I first saw Microdata presented I wondered why there was another specification talking about semantics. There is already RDFa... but in taking a closer look I am quickly seeing how useful microdata will be based on its simplicity.

The beauty of microdata is that it does not concern itself with imposing a schema. It does one job and it does it well.

Schema is optional. No namespaces. Hurray ! Much more suited for doing inline representation of objects.
Looking at the Open Graph Protocol I realize it's more of a schema that can easily be translated into a microdata vocabulary.

Therefore we have now shifted gears to add support in our product for parsing microdata with the open graph protocol as a vocabulary. No more need to hack in semantics. The solution is in the HTML 5 specification. In addition, we will be rolling out support for more vocabularies based on the needs of our users.

Research:

Monday, May 17, 2010

Extending PubSubHubbub

Yesterday we met at IIW to discuss Facebook's Graph Realtime Api's use cases and why the team decided not to use the current PubSubHubbub specification(0.3). Wei Zhu from Facebook presented some additional arguments to those presented in my earlier post for why PubSubHubbub was not used:

Lack of topic URLs. Some notifications can only be pushed and there is no way to GET a list of them at a later time. MySpace had the same issue with the firehose. There was no url for it.
One other issue that needs a little bit more work in PubSubHubbub is batching. The current recommendation relies on HTTP Keep-Alives or Atom Feeds.

Here are the ideas we presented that can help solve the issue:

Give every resource a (topic) URL
Use OAuth 2.0 for subscription authorization
Move hub discovery to the HTTP Response Headers
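
As an illustration of the third idea, here is a small Python sketch of how a subscriber could find the hub from a response header instead of parsing the feed body. It assumes the hub is advertised in an HTTP `Link` header with `rel="hub"`; the function name `find_hubs` and the example URLs are hypothetical, and the regular expressions only cover the simple header shapes shown here.

```python
import re

def find_hubs(link_header):
    """Return the URLs of all links with rel="hub" in a Link header value."""
    hubs = []
    # Each link looks like: <url>; param="value"; ...
    for match in re.finditer(r'<([^>]*)>([^<]*)', link_header):
        url, params = match.group(1), match.group(2)
        rel = re.search(r'\brel\s*=\s*"?([^";,]+)"?', params)
        # A rel value may hold several space-separated relation types.
        if rel and "hub" in rel.group(1).split():
            hubs.append(url)
    return hubs

header = ('<https://hub.example.com/>; rel="hub", '
          '<https://graph.example.com/ciberch/feed>; rel="self"')
print(find_hubs(header))
```

Because the header travels with any response, this works for JSON resources just as well as for Atom or RSS, which is the reason for moving discovery out of the document body.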

Attendees seemed to agree that this is beneficial so we are moving forward with presenting this to the PubSubHubbub mailing list.

Here are the initial changes to the specification:
http://github.com/ciberch/pshb_oauth

I didn't want to put them in the same repo until we got feedback from the mailing list

Facebook's Realtime Updates -- Use Cases

At f8 Facebook launched a first version of real time updates for the Graph API. These updates allow consumers to subscribe to users of their application and get notified via an HTTP Post when the data has changed so they can go and fetch it by invoking the Graph API endpoint.
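
The flow can be sketched in a few lines of Python: the notification only says what changed, and the consumer turns that into Graph API URLs to fetch. The payload shape below (`entries`, `user`, `changed`) is a made-up stand-in and not the actual notification format, and the function name `urls_to_fetch` is hypothetical; the URL pattern mirrors the example URL given later in this post.

```python
import json
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"

def urls_to_fetch(notification_body, token):
    """Turn a light ping into the list of Graph API URLs to re-fetch."""
    payload = json.loads(notification_body)
    urls = []
    # Hypothetical payload: one entry per user, listing what changed.
    for entry in payload.get("entries", []):
        for connection in entry.get("changed", []):
            query = urlencode({"token": token})
            urls.append(f"{GRAPH_BASE}/{entry['user']}/{connection}?{query}")
    return urls

ping = json.dumps({"entries": [{"user": "ciberch", "changed": ["feed"]}]})
print(urls_to_fetch(ping, "XXXX"))
```

Since the data itself is always fetched with the consumer's own token, the privacy checks happen at fetch time rather than at notification time.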

Some in the community were wondering why the PubSubHubbub protocol was not used and were concerned about issues like the Thundering Herd.

It comes down to these three reasons:

The need for simple data modeling of any resource:
- PubSubHubbub currently only supports Atom and RSS and we wanted to use JSON to match the rest of the Graph API so developers don't have to write additional wrappers.
- We need to syndicate changes to any type of resource, not just feeds or lists. The changes may include updating properties of a given resource or deleting the resource altogether. PubSubHubbub only supports appending to the list.
- The ability to do light pings where only the notification is sent. Not all the use cases require fetching the data right away, but they can still benefit from the notifications model as opposed to continuous polling.
The need for user authorization
- We needed to let users remain in control over what data is shared and with whom based on their privacy settings.
- Facebook is committed to authenticity and quality and there are rules to encourage this.
  So one of the main reasons why we could not use traditional PubSubHubbub is that it does not address authenticating the publishers or the subscribers to determine the quality of data being pushed in or the trust that the user has in the consumer.
- As you may have seen, the Graph API is extremely powerful in its simplicity, flexibility and efficiency.
  To query the Graph API you just need to figure out the url of the resource you are interested in fetching and use OAuth 2.0.
  Ex: https://graph.facebook.com/ciberch/feed?token=XXXX
  We wanted to use a similarly elegant approach for subscribing to notifications.

The need for a more efficient content propagation architecture.
- For those of you who are not familiar with the term, PubSubHubbub is an open protocol which allows exchange of news feeds in real time by POSTing changes to subscribers as they occur (this methodology is called web hooks). One of the main goals of PubSubHubbub is to allow the syndication of public feeds to any party. It works best when the same content is requested by a multitude of subscribers. For example CNN's feed would benefit from publishing to a hub that can help them service all their consumers. The publisher and the hub do not know or worry about how the information will be resyndicated. PubSubHubbub is built with small publishers and large hubs in mind, which allows publishers to fan out. It is definitely far superior to consumers polling publishers directly.

In contrast to the CNN news example above, Facebook has a more personal relationship with their users.

Here is an example to illustrate the challenge we would face trying to use PubSubHubbub. We have a user, *Tim*, who wants to share content with a subset of people and applications, and two applications: Sports Club and Restaurant Rating Site. The same content can't be sent to both applications because it would violate Tim's privacy.

So what we needed was to provide a simple way for external developers to keep users' data in sync, taking into consideration the authorization given by users, and thus we chose to use OAuth 2.0 for subscription creation and for data retrieval.

This use case, as well as existing Facebook interaction requirements, materialized in the following three needs:

Need to have a decoupled notification system which allows syndicating changes to arbitrary data.
Need to only syndicate data to authenticated consumers.
The need for a more efficient data propagation architecture.

The fact that Facebook only sends notifications of the content that changed means the consumer can always fetch the up to date version from the server. No need to check timestamps in case the updates were received in the incorrect order.
The need for Facebook to act as a consumer aware delivery hub distributing the content of a relatively small subset of its publishers to another relatively small subset of interested consumers within a variety of multiple contexts

Facebook delivers personalized content to each user on each app and has a lot of users and apps. This means that it would not benefit from fanning out the same content to multiple consumers. Every user and every application gets different updates. In FB's world you see content through your social graph so everyone sees something different.

The PSHB example below does not work for Facebook because it's neither a small publisher needing to fan out nor a traditional consumer agnostic hub. Facebook is only interested in syndicating the updates from Facebook users to trusted parties.

We think that there are other platforms which may be facing similar challenges syndicating changes to arbitrarily modeled data to authenticated consumers.
Before the release of OAuth WRAP and OAuth 2.0 we had some discussions on the PubSubHubbub mailing list about using OAuth 1.0a topic url signing. Here is the proposal. This is a good and simple enhancement and now with the release of OAuth 2.0 we can use a very similar approach.

Sunday, May 16, 2010

Web Linking in JSON

This morning, I have been reading about Web Linking. This is in short a specification standardizing a common practice of making links "fat" with semantic goodness by adding attributes. There is a set of defined attributes and depending on those attributes, more attributes can be added. It's based on xml and comes in very handy for extending Atom and HTML.

Here is an example of how I can reference a related blog post:

<link rel="related"
type='text/html'
href="http://gapingvoid.com/2007/10/24/more-thoughts-on-social-objects/">

The rel attribute is very important as it defines the relation to the current element. There is in fact a registry which takes care of keeping track of all the types of relations. Examples are: me, related, alternate, via, etc. The specification also talks about how to serialize those links on the header but we are going to focus on serializing the links in JSON.

So what happens when we consider rendering feeds in JSON and have to deal with web linking ? While Xml has attributes and child elements, JSON objects only have properties. Therefore attributes and child elements in xml both map to properties in JSON. Structures with properties in JSON are objects. I bet you I am not the first one thinking how should we model an object that has a url to a human readable page in JSON? You may be tempted to simply copy xml verbatim and have an array of links. We actually did this in our Activity Streams JSON specification. It did not look very readable and clashed with our modeling of social objects. It was not clear when to model something as a fat link in a links array or as a native object.

{
"title" : "Web Linking in JSON",
"author" : {
"id": "tag:facebook:2010:0293203920",
"displayName" : "Monica Keller",
"permalinkUrl" : "http://www.facebook.com/ciberch"
},
"permalinkUrl" : "http://montrics.com/blog",
"links" : [
{
  "rel" : "alternate",
  "href": "http://montrics.com/blog",
  "type" : "text/html"
},
{
  "rel" : "author",
  "href": "http://graph.facebook.com/ciberch",
  "type" : "application/json"
}
]
}

It's a mismatch. This "links" is just a bag which can have a large variety of items, so why not just use the links' parent element ? Links are just objects.

Furthermore social objects are fat links with type 'text/html' because they are publicly accessible and human interactive. This is an example of why we should not model social objects one way and links independently. They are the same thing. We should represent links in JSON directly as properties where the property name maps to the relationship and type.

For example to list all link rel="related" type="text/html"

{
"title" : "Web Linking in JSON",
"link" : "http://montrics.blogspot.com/2010/05/web-linking-in-json.html",
"related_html_page" : [
{"link" : "http://tools.ietf.org/html/draft-hammer-discovery-05"},
{"link" : "http://openidconnect.com/", "title" : "OpenID Connect"}
]
...
}
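
The mapping from a `links` array to relation-named properties can be sketched in Python as follows. The naming rule used here (relation plus media subtype, for example `related_html`) is only one possible convention chosen for illustration and is not part of any specification; the function name `links_to_properties` is hypothetical.

```python
def links_to_properties(obj):
    """Replace a "links" array with properties keyed by relation and type."""
    result = {k: v for k, v in obj.items() if k != "links"}
    for link in obj.get("links", []):
        subtype = link.get("type", "").split("/")[-1]
        key = f"{link['rel']}_{subtype}" if subtype else link["rel"]
        # Keep any extra attributes (such as title) next to the link itself.
        entry = {k: v for k, v in link.items()
                 if k not in ("rel", "type", "href")}
        entry["link"] = link["href"]
        result.setdefault(key, []).append(entry)
    return result

post = {
    "title": "Web Linking in JSON",
    "links": [
        {"rel": "related", "type": "text/html",
         "href": "http://tools.ietf.org/html/draft-hammer-discovery-05"},
        {"rel": "related", "type": "text/html",
         "href": "http://openidconnect.com/", "title": "OpenID Connect"},
    ],
}
converted = links_to_properties(post)
print(converted)
```

A consumer can then read `converted["related_html"]` directly instead of scanning a bag of links and filtering on `rel` and `type`.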

Keeping it simple.

Since joining Facebook I have been enlightened by the team's simplicity-first approach to pretty much everything: user interface, APIs and specifications. This vision has had substantial influence shaping external efforts as well: OAuth 2.0 and now we are seeing beginning efforts for OpenID Connect.

The OpenID Connect idea is good and simple: once you know who the user is you will have access to get details about the user in JSON. It's not surprising that JSON is the format of choice for transmitting data. It's compact, takes no effort to deserialize and reads logically.

Another key ingredient for OpenID Connect is discovery. How do you go from a user's email address or OpenID url to knowing what endpoints to query to get the information ? Eran Hammer-Lahav has been working for several years on Discovery. His work is amazing and inspiring and is becoming the foundation of many of the specifications we use today. In short it's a set of protocols describing how machines can discover and use apis. The one gotcha was that this used to be specified entirely in xml based on the Web Linking specification, so Eran has started drafting a proposal for doing discovery and resource description in JSON called JRD.

This is what caught my attention.

Here is what this proposal would look like for JRD using simple web linking in JSON

{
"openid" : {"link" : "https://www.server.com/openid"},
"license" : {"link": "http://example.com/license"},
"lrdd" : {"template":"http://meta.example.com?uri={uri}"}
...
}
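
To show how a client might consume such a document, here is a short Python sketch that fills in the `lrdd` template for a given identifier. It assumes the only template variable is `{uri}` and that its value should be percent-encoded; the function name `expand_lrdd` and the `acct:` identifier are made up for the example.

```python
from urllib.parse import quote

def expand_lrdd(descriptor, uri):
    """Fill the {uri} variable of the lrdd template with an encoded identifier."""
    template = descriptor["lrdd"]["template"]
    return template.replace("{uri}", quote(uri, safe=""))

descriptor = {
    "openid": {"link": "https://www.server.com/openid"},
    "license": {"link": "http://example.com/license"},
    "lrdd": {"template": "http://meta.example.com?uri={uri}"},
}
print(expand_lrdd(descriptor, "acct:monica@example.com"))
```

Note that every relation is reached with a plain property lookup such as `descriptor["openid"]["link"]`, with no filtering over a links array.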

I hope this recommendation for modeling links in JSON as objects using the relation is simple and useful and thanks to John Panzer and Martin Atkins for helping shed light on this issue.

Friday, February 12, 2010

This is the story of a girl....

Who came all the way from Ohio to work at MySpace. Starry eyed at the thoughts of millions of concurrent users leveraging software she built, she packed her bags and moved across the country in a jiffy. It was not disappointing.

MySpace was filled with color and character. There were many, many faces all over the amazing complex in Beverly Hills. There were crowded corridors filled with pictures of MySpace Secret Shows and meeting rooms with people excited doing collaborative design. This girl and her friends built the Activity Stream at MySpace and soon realized that it was essential for the stream to flow outside the walls for it to stay alive and thus she embarked on a quest to find a sensible way to exchange this valuable information about users at MySpace. In this quest she found new friends and realized that she truly identified with the values that they were fighting for: letting the user be in control, opening up the walled garden and allowing anyone big or small to have the same opportunities by using open standards.

As you may have guessed, that girl, woman actually is me :) And today is my last day at MySpace. I am filled with nostalgia but excited about the future and pursuing my dreams.

You may be surprised since I have been doing a series of conferences. As Group Architect I was able to work not only with the Activity Stream team but also on the Developer Platform, and with the backing of the COO I was able to have my ideas heard and executed. It was these projects, which provided massive openness of the user’s MySpace data via Open Standards like OpenID, OAuth, ActivityStrea.ms and PubSubHubbub, that filled me with joy because of all the possibilities we provided for other people to be creative.

But I have chosen to leave. While I was able to have some temporary creative freedom this is not the norm or part of what other engineers enjoy and I do not feel there is one cohesive push to deliver the best we can deliver anymore.

To my friends and colleagues at MySpace, some parting advice:
It is imperative that MySpace puts in place strong technical leadership who can attract good technical talent and make well-informed decisions. It is important that they stay connected to the rest of the world and work on interoperable standards and solid products which benefit the end user. Many of my fellow engineers have fantastic ideas and a plan for phased delivery.

I wish them the best of luck and I am sure we will cross paths and work together.

If everything goes as planned, I will also be working with more of you in the community and helping showcase and build upon one of the most incredible social products I have ever seen.

Yes that is right ! I am happy to announce that I have decided to join Facebook as an Open Source and Web Standards Program Manager.

I will be working closely with David Recordon, Luke Shepard and another fantastic group of people.

This is going to be a great year. Get ready !