Tuesday, October 4, 2011

User Avatars on Cloud Foundry

Most web applications today need to upload and serve user-generated images such as profile pictures, photos or video thumbnails. If you are using Ruby on Rails, there are a couple of great libraries you can use, namely Paperclip and CarrierWave.

After reviewing both libraries I went with CarrierWave for this project, as it seemed the least obtrusive and most flexible. CarrierWave can store files on the file system, on Amazon S3, or in a database, including Mongo GridFS. Since my project is hosted on Cloud Foundry, which lets me create a Mongo service for free and bind it to my application, I decided to try that option.
The first place I wanted to add images was when users register on my app using their Facebook account. Here are the steps you can take to support uploading and serving images. For more details on the Facebook integration, review the user.rb model in the source code.

Steps on your terminal

This assumes you already have a Ruby on Rails 3.0 application on Cloud Foundry with a User model.
# Log in to Cloud Foundry if you are not logged in
vmc login youremail@website.com

# Create a new Mongo service
vmc create-service mongodb

# See what the newly created mongo service is called
vmc services

# Bind the service to your existing Application
vmc bind-service mongodb-???? appname

Steps on your Code base

1 - Add gems to your Gemfile

gem 'carrierwave'
gem 'carrierwave-mongoid', :require => "carrierwave/mongoid"

2 - Generate a CarrierWave uploader for your model

rails generate uploader Avatar

3 - Edit the generated file

Edit app/uploaders/avatar_uploader.rb to contain:
class AvatarUploader < CarrierWave::Uploader::Base

  # Choose what kind of storage to use for this uploader:
  storage :grid_fs

  # Override the directory where uploaded files will be stored.
  # This is a sensible default for uploaders that are meant to be mounted:
  def store_dir
    # Default path produced by the CarrierWave generator
    "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
  end

  # Provide a default URL as a default if there hasn't been a file uploaded:
  def default_url
    "/images/fallback/" + [version_name, "default.png"].compact.join('_')
  end
end

4 - Update your ActiveRecord model to store the avatar

Make sure you are loading CarrierWave after loading your ORM; otherwise you'll need to require the relevant extension manually, e.g.:
require 'carrierwave/orm/activerecord'
Add a string column to the model you want to mount the uploader on:
add_column :users, :avatar, :string
Open your model file and mount the uploader:
class User < ActiveRecord::Base
  mount_uploader :avatar, AvatarUploader

  # Make sure that the avatar is accessible
  attr_accessible :avatar, :remote_avatar_url, :email, :password, :password_confirmation, :remember_me, :first_name, :last_name, :display_name, :username ...
end


5 - Create an initializer for Mongoid to use your MongoDB instance on Cloud Foundry

  • Name it 01_mongoid.rb so it runs before everything else
require 'json'

Mongoid.configure do |config|
  conn_info = nil

  if ENV['VCAP_SERVICES']
    # Running on Cloud Foundry: look up the bound Mongo service's credentials
    services = JSON.parse(ENV['VCAP_SERVICES'])
    services.each do |service_version, bindings|
      bindings.each do |binding|
        if binding['label'] =~ /mongo/i
          conn_info = binding['credentials']
        end
      end
    end
    raise "could not find connection info for mongo" unless conn_info
  else
    # Running locally: fall back to a local mongod on the default port
    conn_info = {'hostname' => 'localhost', 'port' => 27017}
  end

  cnx = Mongo::Connection.new(conn_info['hostname'], conn_info['port'], :pool_size => 5, :timeout => 5)
  db = cnx['db']
  if conn_info['username'] and conn_info['password']
    db.authenticate(conn_info['username'], conn_info['password'])
  end

  config.master = db
end
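
The credential lookup in that initializer can be tried outside Rails. The following is a minimal sketch, assuming VCAP_SERVICES is a JSON object that maps a service version to a list of bindings, which is what the initializer iterates over. The sample JSON, its values and the helper name mongo_credentials are made up for illustration.

```ruby
require 'json'

# Hypothetical sample of a VCAP_SERVICES value; every value here is invented.
SAMPLE_VCAP_SERVICES = <<-EOS
{
  "mongodb-1.8": [
    {
      "name": "mongodb-example",
      "label": "mongodb-1.8",
      "credentials": {
        "hostname": "10.0.0.5",
        "port": 25001,
        "username": "example-user",
        "password": "example-pass",
        "db": "db"
      }
    }
  ]
}
EOS

# Returns the credentials hash of the first binding whose label mentions
# mongo, or nil when no such service is bound. (Hypothetical helper.)
def mongo_credentials(vcap_json)
  JSON.parse(vcap_json).each_value do |bindings|
    bindings.each do |binding|
      return binding['credentials'] if binding['label'] =~ /mongo/i
    end
  end
  nil
end

creds = mongo_credentials(SAMPLE_VCAP_SERVICES)
puts "#{creds['hostname']}:#{creds['port']}"
```

Keeping the lookup in a small method like this makes it easy to check against a captured VCAP_SERVICES value before deploying.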

6 - Update your CarrierWave initializer to use the Cloud Foundry MongoDB

require 'serve_gridfs_image'

CarrierWave.configure do |config|
  config.storage = :grid_fs
  config.grid_fs_connection = Mongoid.database

  # Storage access url
  config.grid_fs_access_url = "/grid"
end

7 - Handle requests for the images in lib/serve_gridfs_image.rb

class ServeGridfsImage
  def initialize(app)
    @app = app
  end

  def call(env)
    if env["PATH_INFO"] =~ /^\/grid\/(.+)$/
      process_request(env, $1)
    else
      # Not an image request: pass it down the middleware stack
      @app.call(env)
    end
  end

  def process_request(env, key)
    begin
      Mongo::GridFileSystem.new(Mongoid.database).open(key, 'r') do |file|
        [200, { 'Content-Type' => file.content_type }, [file.read]]
      end
    rescue
      [404, { 'Content-Type' => 'text/plain' }, ['File not found.']]
    end
  end
end
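
To see how this kind of middleware routes requests without a running MongoDB, here is a minimal sketch that swaps GridFS for an in-memory hash. The class name ServeStoredImage, the FAKE_STORE contents and the downstream lambda are all hypothetical stand-ins, not part of the application.

```ruby
# In-memory stand-in for GridFS: key => [content type, bytes]. Invented data.
FAKE_STORE = {
  'uploads/user/avatar/1/me.png' => ['image/png', 'PNG-BYTES']
}

class ServeStoredImage
  def initialize(app, store)
    @app = app
    @store = store
  end

  # Rack calls this for every request; only /grid/... paths are handled here.
  def call(env)
    if env['PATH_INFO'] =~ /^\/grid\/(.+)$/
      serve($1)
    else
      @app.call(env)
    end
  end

  private

  # Returns a Rack response triple: 200 with the stored bytes, or 404.
  def serve(key)
    content_type, body = @store.fetch(key)
    [200, { 'Content-Type' => content_type }, [body]]
  rescue KeyError
    [404, { 'Content-Type' => 'text/plain' }, ['File not found.']]
  end
end

# A trivial downstream app standing in for the Rails application.
downstream = lambda { |env| [200, { 'Content-Type' => 'text/plain' }, ['rails app']] }
stack = ServeStoredImage.new(downstream, FAKE_STORE)

puts stack.call({ 'PATH_INFO' => '/grid/uploads/user/avatar/1/me.png' }).first
puts stack.call({ 'PATH_INFO' => '/grid/missing.png' }).first
puts stack.call({ 'PATH_INFO' => '/users/1' }).last.first
```

In a Rails application the real middleware also has to be added to the stack, typically with something like config.middleware.use in config/application.rb; where exactly that happens in this project is an assumption.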

8 - Deploy!

bundle install
bundle package
vmc update app_name


This gives you the ability to upload and serve images. Note that it does not provide image resizing. If you are using Devise, for example, you can import the avatar (profile picture) of the user when they sign up:
class << self
  def new_with_session(params, session)
    super.tap do |user|
      if session['devise.omniauth_info']
        if data = session['devise.omniauth_info']['user_info']
          user.display_name = data['name'] if data.has_key? 'name'
          user.email = data['email']
          user.username = data['nickname'] if data.has_key? 'nickname'
          user.first_name = data['first_name'] if data.has_key? 'first_name'
          user.last_name = data['last_name'] if data.has_key? 'last_name'
          user.remote_avatar_url = data['image'] if data.has_key? 'image'
        end
      end
    end
  end
end
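
The field-by-field copy above can also be written as a lookup table. This is only a sketch of that idea: SketchUser is a plain Struct standing in for the real model, and OMNIAUTH_FIELDS and apply_omniauth are hypothetical names. Unlike the listing above, email is only copied when the key is present.

```ruby
# Hypothetical stand-in for the User model; only the attributes used here.
SketchUser = Struct.new(:display_name, :email, :username,
                        :first_name, :last_name, :remote_avatar_url)

# OmniAuth key => user attribute, mirroring the assignments in the post.
OMNIAUTH_FIELDS = {
  'name'       => :display_name,
  'email'      => :email,
  'nickname'   => :username,
  'first_name' => :first_name,
  'last_name'  => :last_name,
  'image'      => :remote_avatar_url
}

# Copies every field that the provider actually sent onto the user.
def apply_omniauth(user, data)
  OMNIAUTH_FIELDS.each do |key, attribute|
    user[attribute] = data[key] if data.has_key?(key)
  end
  user
end

user = apply_omniauth(SketchUser.new, {
  'name'  => 'Jane Doe',
  'email' => 'jane@example.com',
  'image' => 'http://example.com/jane.png'
})
puts user.remote_avatar_url
```

On the real model, assigning remote_avatar_url is what asks CarrierWave to download the image from that URL and store it through the mounted uploader.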


Tuesday, November 2, 2010

Activity Streams 101 Session at IIW

For those new to the standard, here is a brief explanation:
Activity Streams is a way of modeling social actions so that people can interpret shared information and make decisions more effectively.

The Activity Streams data structure focuses on:

  1. Providing an up-to-date digest of important information, narrated via the reader's social graph.
  2. Optimizing for human consumption by using multiple media in a clean fashion.
  3. Producing a cycle of engagement. As reactions to activities occur, those also become part of the stream.
There is also a standard specification which is being developed by a variety of companies and individuals. Here are a few details:

  1. activitystrea.ms is a standard used to convey what people are doing around the web
  2. Defines a set of concepts and vocabulary
  3. Can be used with Atom, RSS or JSON
Here are a couple of examples

Activity Streams on Atom

Activity Streams on JSON
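
As a rough illustration of the JSON form, here is a sketch that builds a single activity in Ruby and serializes it. The property names (published, actor, verb, object, objectType, displayName) follow the JSON Activity Streams 1.0 serialization as I understand it; the ids, names and URL are invented.

```ruby
require 'json'
require 'time'

# One activity: "Jane Doe posted an article". All identifiers are made up.
activity = {
  'published' => Time.utc(2010, 11, 2, 9, 30, 0).iso8601,
  'actor' => {
    'objectType'  => 'person',
    'id'          => 'tag:example.org,2010:jane',
    'displayName' => 'Jane Doe'
  },
  'verb' => 'post',
  'object' => {
    'objectType'  => 'article',
    'id'          => 'tag:example.org,2010:article-1',
    'url'         => 'http://example.org/blog/activity-streams-101',
    'displayName' => 'Activity Streams 101'
  }
}

puts JSON.pretty_generate(activity)
```

The actor, verb and object triple is what lets a reader render the entry as a sentence in a stream.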

Latest News

  • Notes from Kevin Marks
  • Paul Tarjan from Facebook is considering using it to auto-refresh news about the objects in the graph via og:feed
  • OWFa Agreement is getting signed for v1

Monday, October 25, 2010

Microdata guided by usability testing results

About a year ago Google decided to do actual research on how usable the Microdata spec was. This blog post has very interesting details.

Ian Hickson was kind enough to point me to this post after I asked about consolidating the itemscope and itemtype attributes into item.
Here are my comments on their conclusions which you may find helpful:

Tuesday, October 19, 2010

It's official: Socialcast REACH is live

Socialcast (http://www.socialcast.com/) REACH is out of private beta. REACH parses the Open Graph Protocol as well as HTML 5 Microdata with the Open Graph vocabulary.
All of the components are extensible, and we will be adding more vocabularies in subsequent releases.
During the private beta with some customers, we found that Microdata helped us with closed source business systems because it can be placed anywhere on the HTML page.

Side by side comparison:
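
As a rough sketch of such a comparison, the Ruby below holds the same three properties twice: once as Open Graph meta tags, which belong in the page head, and once as Microdata attributes placed inline in the body. The itemtype URL, the sample values and the regex-based helpers are assumptions for illustration only; a real parser should use a proper HTML library.

```ruby
# Open Graph Protocol: meta tags in the <head> of the page.
OGP_HTML = <<-HTML
<head>
  <meta property="og:title" content="User Avatars on Cloud Foundry" />
  <meta property="og:type" content="article" />
  <meta property="og:image" content="http://example.com/avatar.png" />
</head>
HTML

# Microdata: the same properties inline, anywhere in the <body>.
# The itemtype value is a placeholder, not a documented vocabulary URL.
MICRODATA_HTML = <<-HTML
<div itemscope itemtype="http://ogp.me/ns#">
  <h1 itemprop="title">User Avatars on Cloud Foundry</h1>
  <meta itemprop="type" content="article" />
  <img itemprop="image" src="http://example.com/avatar.png" />
</div>
HTML

# Collects og:* meta tags into a hash keyed by the name after "og:".
def parse_ogp(html)
  Hash[html.scan(/<meta\s+property="og:([^"]+)"\s+content="([^"]*)"/)]
end

# Collects itemprop values: content/src/href attribute if present, else text.
def parse_microdata(html)
  props = {}
  html.scan(/<(\w+)([^>]*\sitemprop="([^"]+)"[^>]*)>([^<]*)/) do |_tag, attrs, name, text|
    props[name] = attrs[/\s(?:content|src|href)="([^"]*)"/, 1] || text.strip
  end
  props
end

puts parse_ogp(OGP_HTML).inspect
puts parse_microdata(MICRODATA_HTML).inspect
```

Both calls produce the same set of properties; the difference is only where the markup is allowed to live on the page.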

Pretty easy !

Saturday, October 16, 2010

Context in the Enterprise

Today most people are overwhelmed with information. Not only is there an enormous amount of information to read on a variety of devices, a lot of this information links to other content which takes time to fetch and users end up wasting time if the content is not relevant to them.
Wasting time is something we don't want to be doing ever and guess what ? Our bosses don't want us to do that either. They want us to share and learn but they definitely don't want us to waste time.

With that goal in mind the Socialcast team decided to embark on a mission of providing more context for links which are shared in our application: a small description, a picture, a title and the topics the link covers. This practice is not new. Other consumer portals already parse HTML to try to extract this information.
This is a difficult job because, before HTML 5, markup was not inherently semantic.

Most of the tags in HTML were only for layout.
Being the agile team that we are, we didn't want to spend a lot of time handling HTML with tags that are not properly closed, or figuring out from its size which image is the representative one to render. Too complex.

So we asked: what is the best technique for capturing the relevant data in a web page? Our research brought us to the following specifications: RDFa, Microformats, the Open Graph Protocol, oEmbed and Microdata.

What do they have in common ?

All of these are specifications detailing how to add semantic data to your web pages.  The specifications cover the syntax and concepts and in some cases detailed vocabulary.

How many objects can you read out of an html page ?

  • RDFa : As many as you want. You can create new vocabularies and not only describe objects but also entire sentences with subject predicate and object. It also allows cross referencing objects
  • Microformats: All the ones which map to a specific microformat. 
  • Open Graph Protocol : One main object with some predefined relations. The object can have a variety of types specified by Facebook.
  • oEmbed: One, specified via
    link rel="alternate" type="text/xml+oembed"
  • Microdata: As many as you want. It does not enforce global uniqueness or the use of namespaces for types. Objects can be defined ad hoc.

What is needed to use ?

  1. RDFa: The http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd DTD in an XHTML doctype
  2. Microformats: Nothing
  3. Open Graph Protocol: Nothing, but it should be the same as RDFa
  4. oEmbed: Another document, which means performing a separate HTTP request
  5. Microdata: The HTML 5 doctype, but omitting it doesn't break anything in practice

Initially we just wanted to extract a single object and decided to use the Open Graph Protocol. It provides a short set of rules, which on one hand is great because the code to parse it is very short, but on the other hand it's not flexible enough: as described in my earlier post, there are issues working with existing closed source business systems.

This is why we turned to Microdata. We have posted a lot of details on our wiki about how the parsing works and how to extend it. We have also built in the ability to use other vocabularies, such as Activity Streams or even the oEmbed vocabulary.

So what do the users get in return ?

  • Distributed discussions throughout their ecosystem
  • Good sources of material curated by people they trust, their colleagues.
  • Rapid deployment
  • Easy to add to business systems

Sunday, October 10, 2010

HTML5: Implementors experience with OGP and Microdata

You cannot argue with the fact that the Open Graph Protocol (ogp) looks very clean and easy to understand.
It's a subset of RDFa which focuses on a single schema and a single location where this schema can be applied: the HEAD of the page.
This is why we started working with it at our company: simplicity and, of course, being a standard proposal signed under the OWFa agreement, open for anyone to use.
So we have been testing interoperability on enterprise software systems. How hard is it to add these ogp meta tags? For some systems like SugarCRM we have the source code, and for others like SharePoint we do not.
In addition, our target audience is system administrators, not just developers, so we are interested in finding the simplest solution that feels most natural in their environment.
From my experimentation I have definitely found some challenges dealing with this more closed enterprise software: no access to edit the HTML in the head, and no access to dynamic data from the head.

So I decided to try more consumer facing products like wikis, blogs, etc hoping to encounter less resistance.
Today I was testing adding some ogp markup to this blog and found that, unlike other systems I have been working with lately, it allowed me to easily modify the HEAD.

I added the basics: title, type, etc. However, when it came to og:image I was a bit stuck on what to do, even on such a flexible external system.
The images I want featured when I share blog entries with the world will come from each blog entry and will be carefully chosen to capture the essence of the piece.

Naturally the first solution I thought of was Microformats as these can be added in context. The problem with microformats is that they don't have a lot of common properties between the different types of objects.
So one proposal would be to add the concept of a representative image to microformats, but in the interest of time I searched some more and remembered something about microdata in HTML 5.
When I first saw Microdata presented I wondered why there was another specification talking about semantics. There is already RDFa... but in taking a closer look I am quickly seeing how useful microdata will be based on its simplicity.

The beauty of microdata is that it does not concern itself with imposing a schema. It does one job and it does it well.

Schema is optional. No namespaces. Hurray ! Much more suited for doing inline representation of objects.
Looking at the Open Graph Protocol, I realize it's more of a schema that can easily be translated into a microdata vocabulary.

Therefore we have now shifted gears to add support in our product for parsing microdata with the Open Graph Protocol as a vocabulary. No more need to hack in semantics. The solution is in the HTML 5 specification. In addition, we will be rolling out support for more vocabularies based on the needs of our users.


Monday, May 17, 2010

Extending PubSubHubbub

Yesterday we met at IIW to discuss the use cases for Facebook's Graph Realtime API and why the team decided not to use the current PubSubHubbub specification (0.3). Wei Zhu from Facebook presented some arguments, in addition to those in my earlier post, for why PubSubHubbub was not used:
  • Lack of topic URLs. Some notifications can only be pushed and there is no way to GET a list of them at a later time. MySpace had the same issue with the firehose: there was no URL for it.
  • Batching, which needs a little more work in PubSubHubbub. The current recommendation relies on HTTP Keep-Alives or Atom feeds.
Here are the ideas we presented that can help solve these issues:
  1. Give every resource a (topic) URL
  2. Use OAuth 2.0 for subscription authorization
  3. Move hub discovery to the HTTP Response Headers
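
To make the third idea concrete, here is a minimal sketch of what header-based hub discovery could look like on the subscriber side. The post does not say which header would be used; this assumes a Link response header with rel="hub" and rel="self", in the style of Web Linking. The URLs and the helper name links_for_rel are hypothetical.

```ruby
# Returns the target URLs in a Link header value that carry the given rel.
# Assumes entries look like: <url>; rel="name", separated by commas.
def links_for_rel(link_header, rel)
  links = link_header.to_s.split(/,\s*(?=<)/)
  matching = links.select do |link|
    link[/;\s*rel="?([^";]*)"?/i, 1].to_s.split(/\s+/).include?(rel)
  end
  matching.map { |link| link[/<([^>]*)>/, 1] }
end

# Hypothetical header value returned when fetching a resource.
header = '<http://hub.example.com/>; rel="hub", ' \
         '<http://example.com/users/1/feed>; rel="self"'

puts links_for_rel(header, 'hub').inspect
puts links_for_rel(header, 'self').inspect
```

With discovery in the headers, a subscriber can find the hub for any resource regardless of its format, instead of parsing an Atom feed for a link element.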

Attendees seemed to agree that this is beneficial, so we are moving forward with presenting this to the PubSubHubbub mailing list.

Here are the initial changes to the specification:

I didn't want to put them in the same repo until we got feedback from the mailing list

social coder

San Francisco, California, United States
Open Web Standards Advocate