Monday, May 17, 2010

Extending PubSubHubbub

Yesterday we met at IIW to discuss the use cases behind Facebook's real-time Graph API and why the team decided not to use the current PubSubHubbub specification (0.3). Wei Zhu from Facebook presented some arguments, in addition to those in my earlier post, for why PubSubHubbub was not used:
  • Lack of topic URLs. Some notifications can only be pushed and there is no way to GET a list of them at a later time. MySpace had the same issue with the firehose: there was no URL for it.
  • Another area that needs a bit more work in PubSubHubbub is batching. The current recommendation relies on HTTP keep-alives or Atom feeds.
Here are the ideas we presented that can help solve these issues:
  1. Give every resource a (topic) URL
  2. Use OAuth 2.0 for subscription authorization
  3. Move hub discovery to the HTTP response headers, as sketched below
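
For example, hub discovery via response headers could look like the following. This is a hypothetical sketch; the exact header layout would be settled on the mailing list:

HTTP/1.1 200 OK
Content-Type: application/json
Link: <https://hub.example.com/>; rel="hub"
Link: <https://api.example.com/users/1234/feed>; rel="self"

{ ...resource representation... }

The rel="hub" link tells the subscriber where to subscribe, and rel="self" gives the canonical topic URL, so any resource that can be fetched over HTTP can advertise a hub without needing an Atom or RSS envelope.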

Attendees seemed to agree that this is beneficial, so we are moving forward with presenting these changes to the PubSubHubbub mailing list.

Here are the initial changes to the specification:
http://github.com/ciberch/pshb_oauth

I didn't want to put them in the same repo until we got feedback from the mailing list.


Facebook's Realtime Updates -- Use Cases

At f8, Facebook launched the first version of real-time updates for the Graph API. These updates allow consumers to subscribe to users of their application and get notified via an HTTP POST when the data has changed, so they can fetch it by invoking the Graph API endpoint.
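
To make the flow concrete, a notification might look roughly like the sketch below. The field names are illustrative, not the exact payload; consult the Graph API documentation for the real format:

POST /callback HTTP/1.1
Content-Type: application/json

{
  "object" : "user",
  "entry" : [
    { "uid" : 1234, "changed_fields" : ["name", "picture"], "time" : 1273774800 }
  ]
}

The body names what changed but omits the data itself, so the consumer follows up with a Graph API GET for the current version of the resource.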

Some in the community were wondering why the PubSubHubbub protocol was not used and were concerned about issues like the thundering herd problem.

It comes down to these three reasons:
  1. The need for simple data modeling of any resource:
    • PubSubHubbub currently only supports Atom and RSS, and we wanted to use JSON to match the rest of the Graph API so developers don't have to write additional wrappers.
    • We need to syndicate changes to any type of resource, not just feeds or lists. The changes may include updating properties of a given resource or deleting the resource altogether. PubSubHubbub only supports appending to the list.
    • The ability to do light pings where only the notification is sent. Not all use cases require fetching the data right away, but they can still benefit from the notification model as opposed to continuous polling.

  2. The need for user authorization
    • We needed to let users remain in control over what data is shared and with whom based on their privacy settings.
    • Facebook is committed to authenticity and quality, and there are rules to encourage this.
      So one of the main reasons we could not use traditional PubSubHubbub is that it does not address authenticating the publishers or the subscribers, which is needed to gauge the quality of the data being pushed in and the trust the user has placed in the consumer.
    • As you may have seen, the Graph API is extremely powerful in its simplicity, flexibility and efficiency.
      To query the Graph API you just need to figure out the URL of the resource you are interested in fetching and use OAuth 2.0.
      Ex: https://graph.facebook.com/ciberch/feed?token=XXXX
      We wanted to use a similarly elegant approach for subscribing to notifications; a sketch appears further below.

  3. The need for a more efficient content propagation architecture.
    • For those of you who are not familiar with the term, PubSubHubbub is an open protocol which allows the exchange of news feeds in real time by POSTing changes to subscribers as they occur (this methodology is called web hooks). One of the main goals of PubSubHubbub is to allow the syndication of public feeds to any party; a sample subscription request appears after this list. It works best when the same content is requested by a multitude of subscribers. For example, CNN's feed would benefit from publishing to a hub that can help them service all their consumers. The publisher and the hub do not know or worry about how the information will be resyndicated. PubSubHubbub is built with small publishers and large hubs in mind, allowing publishers to fan out. It is definitely far superior to consumers polling publishers directly.
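
For reference, here is what a typical PubSubHubbub 0.3 subscription request looks like (the host and URLs are made up, and the parameter values are shown unencoded for readability). Note that it names only a topic and a callback; there is no notion of an authorized user anywhere in the exchange:

POST /subscribe HTTP/1.1
Host: hub.example.com
Content-Type: application/x-www-form-urlencoded

hub.mode=subscribe&hub.verify=sync&hub.topic=http://publisher.example.com/feed.xml&hub.callback=http://subscriber.example.com/callback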

In contrast to the CNN news example above, Facebook has a more personal relationship with its users.

Here is an example to illustrate the challenge we would face trying to use PubSubHubbub. We have a user, Tim, who wants to share content with a subset of people and applications, and two applications: a Sports Club and a Restaurant Rating Site. The same content can't be sent to both applications because that would violate Tim's privacy.



So what we needed was a simple way for external developers to keep users' data in sync while respecting the authorization granted by users, and thus we selected OAuth 2.0 for subscription creation and for data retrieval.
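
To make this concrete, subscription creation follows the same URL-plus-token pattern as the rest of the Graph API. The sketch below is approximate, and the parameter names may differ from the published docs:

POST https://graph.facebook.com/<app-id>/subscriptions?access_token=XXXX
Content-Type: application/x-www-form-urlencoded

object=user&fields=name,picture&callback_url=https://consumer.example.com/callback&verify_token=abc123

The access token proves which application is subscribing, so every update can be filtered against the permissions users have granted to that application.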


This use case, as well as existing Facebook interaction requirements, materialized in the following three needs:
  1. The need for a decoupled notification system which allows syndicating changes to arbitrary data.
  2. The need to only syndicate data to authenticated consumers.
  3. The need for a more efficient data propagation architecture.
    • The fact that Facebook only sends notifications of the content that changed means the consumer can always fetch the up-to-date version from the server. There is no need to check timestamps in case updates are received out of order.
    • The need for Facebook to act as a consumer-aware delivery hub, distributing the content of a relatively small subset of its publishers to another relatively small subset of interested consumers within a variety of contexts.
    • Facebook delivers personalized content to each user on each app, and has a lot of users and apps. This means that it would not benefit from fanning out the same content to multiple consumers. Every user and every application gets different updates. In Facebook's world you see content through your social graph, so everyone sees something different.

The traditional PSHB fan-out model does not work for Facebook because Facebook is neither a small publisher needing to fan out nor a traditional consumer-agnostic hub. Facebook is only interested in syndicating the updates from Facebook users to trusted parties.



We think that there are other platforms which may be facing similar challenges syndicating changes to arbitrarily modeled data to authenticated consumers.
Before the release of OAuth WRAP and OAuth 2.0, we had some discussions on the PubSubHubbub mailing list about using OAuth 1.0a topic URL signing. Here is the proposal. It is a good and simple enhancement, and now with the release of OAuth 2.0 we can use a very similar approach.

Sunday, May 16, 2010

Web Linking in JSON

This morning I have been reading about Web Linking. In short, this is a specification that standardizes the common practice of making links "fat" with semantic goodness by adding attributes. There is a set of defined attributes, and depending on those attributes, more attributes can be added. It's based on XML and comes in very handy for extending Atom and HTML.

Here is an example of how I can reference a related blog post:

<link rel="related"
type='text/html'
href="http://gapingvoid.com/2007/10/24/more-thoughts-on-social-objects/">

The rel attribute is very important as it defines the relation to the current element. There is in fact a registry which keeps track of all the types of relations; examples are: me, related, alternate, via, etc. The specification also describes how to serialize those links in HTTP headers, but here we are going to focus on serializing the links in JSON.
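
For example, the related link from above serialized as an HTTP header would be:

Link: <http://gapingvoid.com/2007/10/24/more-thoughts-on-social-objects/>; rel="related"; type="text/html"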

So what happens when we consider rendering feeds in JSON and have to deal with web linking? While XML has attributes and child elements, JSON objects only have properties, so attributes and child elements in XML both map to properties in JSON. Structures with properties in JSON are objects. I bet I am not the first one wondering how we should model an object that has a URL to a human-readable page in JSON. You may be tempted to simply copy the XML verbatim and have an array of links. We actually did this in our Activity Streams JSON specification. It did not look very readable and clashed with our modeling of social objects: it was not clear when to model something as a fat link in a links array and when to model it as a native object.

{
  "title" : "Web Linking in JSON",
  "author" : {
    "id" : "tag:facebook:2010:0293203920",
    "displayName" : "Monica Keller",
    "permalinkUrl" : "http://www.facebook.com/ciberch"
  },
  "permalinkUrl" : "http://montrics.com/blog",
  "links" : [
    {
      "rel" : "alternate",
      "href" : "http://montrics.com/blog",
      "type" : "text/html"
    },
    {
      "rel" : "author",
      "href" : "http://graph.facebook.com/ciberch",
      "type" : "application/json"
    }
  ]
}

It's a mismatch. This "links" array is just a bag which can hold a large variety of items, so why not just use the links' parent element? Links are just objects.

Furthermore, social objects are fat links of type 'text/html' because they are publicly accessible and human-interactive. This is an example of why we should not model social objects one way and links another; they are the same thing. We should represent links in JSON directly as properties, where the property name maps to the relationship and type.

For example, to list all links with rel="related" and type="text/html":

{
  "title" : "Web Linking in JSON",
  "link" : "http://montrics.blogspot.com/2010/05/web-linking-in-json.html",
  "related_html_page" : [
    { "link" : "http://tools.ietf.org/html/draft-hammer-discovery-05" },
    { "link" : "http://openidconnect.com/", "title" : "OpenID Connect" }
  ]
  ...
}

Keeping it simple.

Since joining Facebook I have been enlightened by the team's simplicity-first approach to pretty much everything: user interface, APIs and specifications. This vision has had substantial influence in shaping external efforts as well: OAuth 2.0, and now the beginning efforts for OpenID Connect.

The OpenID Connect idea is good and simple: once you know who the user is, you can fetch details about the user in JSON. It's not surprising that JSON is the format of choice for transmitting data: it's compact, takes no effort to deserialize and reads logically.

Another key ingredient for OpenID Connect is discovery. How do you go from a user's email address or OpenID URL to knowing which endpoints to query to get the information? Eran Hammer-Lahav has been working on discovery for several years. His work is amazing and inspiring and is becoming the foundation of many of the specifications we use today. In short, it's a set of protocols describing how machines can discover and use APIs. The one gotcha was that this used to be specified entirely in XML based on the Web Linking specification, so Eran has started drafting a proposal for doing discovery and resource description in JSON, called JRD.

This is what caught my attention.

Here is what this proposal would look like for JRD using simple web linking in JSON:

{
  "openid" : { "link" : "https://www.server.com/openid" },
  "license" : { "link" : "http://example.com/license" },
  "lrdd" : { "template" : "http://meta.example.com?uri={uri}" }
  ...
}

I hope this recommendation for modeling links in JSON as objects keyed by relation is simple and useful. Thanks to John Panzer and Martin Atkins for helping shed light on this issue.
