Reviews42 Platform Architecture
by Subhranath Chunder
A few months back, I was given the responsibility to architect and deliver a community-based reviews platform, the first of its kind in India.
After initial discussions, the priorities finally came down to:
- Scalability
- Easy to integrate with external platforms
- Ready to port on different application platforms
Being a Django fanatic, it was my obvious tool of choice to start with, and to build a platform around. The platform itself is now based on open source software/tools, and its technology stack comprises:
All these things were put together to create an SOA-based software platform which is not only easy to scale and integrate with other external platforms and services, but also easy to expand onto other platforms/devices.
The current platform architecture can be logically represented as something like the following:
Eventually, the Reviews42 platform made its debut on 30th March 2012 with the launch of its Django-powered Web App, and the rest of the apps will possibly follow in the future.
Share your views about the architectural design, the nice and also the nasty ones.

Comments
How is the "API Layer" written? And how is the web app connecting to/using it? Does this mean that the web app doesn't have any ORM models? Can you elaborate on this?
Thanks
@slafs: As mentioned on the diagram, the API layer itself is also written in Django. Its primary purpose is to provide platform functionality as services. This is where the SOA model comes into play.
The web app, as well as the other apps, makes use of the ever so readily available HTTP protocol. That's the communication interface.
The web app does not have any ORM models. It talks to the API layer and exchanges messages through services.
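To make the idea concrete, here is a minimal sketch of how a web app without ORM models might fetch its page data from an API layer over HTTP. It is an illustration only, not the actual Reviews42 code: the base URL, the `/products/<id>/` path, the `ReviewServiceClient` class and the JSON message format are all assumptions.

```python
import json
from urllib.request import urlopen

API_BASE = "http://api.example.invalid"  # hypothetical address of the API layer


def http_get_json(url):
    """Fetch a URL and decode its JSON body (a real network call)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


class ReviewServiceClient:
    """Hypothetical thin client the web app uses instead of ORM models."""

    def __init__(self, base_url=API_BASE, fetch=http_get_json):
        self.base_url = base_url.rstrip("/")
        self.fetch = fetch  # injectable so the client can be exercised offline

    def get_product_page(self, product_id):
        # One service call returns everything the page template needs.
        return self.fetch(f"{self.base_url}/products/{product_id}/")


# Offline usage with a stand-in transport instead of a live API layer.
def fake_fetch(url):
    return {"url": url, "name": "Sample phone", "reviews": []}


client = ReviewServiceClient(fetch=fake_fetch)
page = client.get_product_page(7)
```

A Django view in the web app would then pass `page` straight to a template, and the same service could be reused by any other app that speaks HTTP.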
As a tech project the site reviews42 is good work.
However, not to discourage you but, "The first of it's kind in India." - really?
Have you seen http://www.mouthshut.com/ ?
@last comment: That's more of a generic forum to me, rather than a reviews platform, technically.
1) What research/analysis was conducted that validated the use of MongoDB for cache? What other KV systems were taken into consideration? Why was MongoDB settled upon?
2) How do you deal with schema updates across the three different DBs?
3) Why was there a need for two-level caching with Mongo and mem? Or do they handle different types of caching?
Overall, the decoupling achieved by the SOA architecture makes this setup excellent!
@UdayKal:
1 - Rephrasing your question, I'm taking it as "Why MongoDB?".
To start with, looking at the amount of data we were going to serve and handle over time, it was bound to explode. Were we planning to scale our app to handle that much data? Yes. That's why we needed NoSQL. Now, which one to choose? Significant reads, frequent inserts, considerable updates, and obviously ever-changing data. My analysis finally led us to choose MongoDB.
2 - The schema updates are mostly relevant to the primary DB, MySQL, only. That is where most of the write operations go, in normalized form. (The final part will probably clear up why schema updates are not very significant for the other two DBs.)
3 - This is probably one of the most interesting parts of the architecture. As you see, MySQL only records normalized data, but in practice a single user request involves a lot of denormalized data. MongoDB here works pretty much like a cache, which already holds all the involved data in denormalized form as a single document. So, you can also think of it as the "Prepared DB" which already has the denormalized data, or even as a "View-based DB". As a user, when you see a page on the site, the entire data will probably come from a single MongoDB query, that is, if the request gets past the memcache layer in the first place. So, yes, memcache and the MongoDB cache (as I call it) serve different purposes. MongoDB provides the cached/prepared/denormalized data, and memcache provides the optional in-memory cache.
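The read path described above can be sketched roughly as follows. This is only an illustration under stated assumptions: plain dictionaries stand in for memcache and for a MongoDB collection, and the `ReadPath` class and the `product:7` key format are hypothetical names, not part of the actual platform.

```python
class ReadPath:
    """Two-level read: optional in-memory cache first, then the prepared document store."""

    def __init__(self, memory_cache, document_store):
        self.memory_cache = memory_cache      # stand-in for memcache
        self.document_store = document_store  # stand-in for a MongoDB collection

    def get_page(self, page_key):
        cached = self.memory_cache.get(page_key)
        if cached is not None:
            return cached, "memcache"
        # Cache miss: a single lookup returns the whole denormalized document.
        document = self.document_store.get(page_key)
        if document is not None:
            self.memory_cache[page_key] = document
        return document, "mongodb"


store = {
    "product:7": {"name": "Sample phone", "rating": 4.2, "reviews": ["Good battery"]},
}
reads = ReadPath(memory_cache={}, document_store=store)
first = reads.get_page("product:7")   # served from the document store
second = reads.get_page("product:7")  # served from the in-memory cache
```

The point of the sketch is that neither level needs a join: the document store already holds the page in the shape the frontend wants, and the in-memory layer only saves the round trip.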
Just to add a little more info.
MongoDB and Solr DB are updated as and when required, with the help of RabbitMQ.
- Solr is for the full-text search data, so frequent schema changes hardly ever affect it.
- Changing the MySQL database schema affects the MongoDB schema only if there is a change in how the user sees the data coming to the frontend. Even in that case, adding fields to MongoDB is pretty simple and incurs less of a performance and storage penalty than something like CouchDB.
- Being on MongoDB and serving the data upfront, the ability to use dynamic queries also provides a good opportunity to remove the MySQL DB entirely, if that is ever required. My SOA architecture gives me that much flexibility pretty easily.
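The update flow mentioned above (normalized writes to MySQL, with MongoDB and Solr refreshed through RabbitMQ) could look roughly like the sketch below. Everything here is a stand-in chosen for illustration: dictionaries replace the three data stores, `queue.Queue` replaces a RabbitMQ queue, and the function names and message fields are hypothetical.

```python
import queue

normalized_rows = {          # stand-in for normalized MySQL tables
    "products": {7: {"name": "Sample phone"}},
    "reviews": [],
}
prepared_docs = {}           # stand-in for the MongoDB "prepared" collection
search_index = {}            # stand-in for the Solr full-text index
updates = queue.Queue()      # stand-in for a RabbitMQ queue


def add_review(product_id, text):
    """Write the normalized row, then publish an update message."""
    normalized_rows["reviews"].append({"product_id": product_id, "text": text})
    updates.put({"event": "review_added", "product_id": product_id})


def process_pending_updates():
    """Consumer: rebuild the denormalized document and the search entry."""
    while not updates.empty():
        message = updates.get()
        product_id = message["product_id"]
        texts = [row["text"] for row in normalized_rows["reviews"]
                 if row["product_id"] == product_id]
        name = normalized_rows["products"][product_id]["name"]
        prepared_docs[product_id] = {"name": name, "reviews": texts}
        search_index[product_id] = " ".join([name] + texts)


add_review(7, "Good battery")
process_pending_updates()
```

Because the consumer rebuilds the prepared document from the normalized rows, a change to the document's shape only requires changing the consumer and replaying the updates.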
Does your API layer return multiple formats (html, json) based on the request type, or do you just return a single type (like json) in every case?
Obviously, if the API returns json (or xml) in every case then there would be quite a bit of unnecessary overhead for browser requests -- once for the API layer to create a json response, and once for the Django view to deserialize the json response before returning html.