Friday, August 26, 2016

Efficiency by grouping requests and events

In the general programming world, every request gets a response, which is how it is supposed to be. But what if similar requests are received at the same time? The usual options are:

  1. The program processes every request individually and provides a response to each.
  2. We cache the response to the first request and return the same response for any similar request that follows.

The problem with the first option is the duplicated effort, which could have been avoided. The problem with the second is that it assumes a caching layer is available and that both requests are exactly the same (no difference in context).

What if the program, or the engine that runs it (such as the JVM or the V8 engine), were able to recognize such similar requests, group them together, process them once and respond to all of them at once?
This logic could be event based in event-driven platforms like Node.js, or could even work at the method/function level.

Node.js:
Node.js is an event-driven platform, where every request goes into the event queue. If the engine were able to sub-group the events and process similar events in one go, it would use fewer resources to process the same number of events.

Need some thought on how this can be achieved.
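
One way to approximate the idea in application code, rather than inside the engine, is to let concurrent callers share a single in-flight promise. The sketch below is only an illustration under that assumption: coalesce, loadUser and the "user:42" key are hypothetical names, and it only groups requests that arrive while the first one is still being processed.

```javascript
// Hypothetical helper: callers that use the same key share one in-flight computation.
const inFlight = new Map();

function coalesce(key, work) {
  // A similar request is already being processed: join it instead of starting over.
  if (inFlight.has(key)) {
    return inFlight.get(key);
  }
  // Start the work once; forget the entry when it settles so later requests run fresh.
  const pending = (async () => work())().finally(() => inFlight.delete(key));
  inFlight.set(key, pending);
  return pending;
}

// Usage: three "requests" for the same user arrive before the first one finishes.
let lookups = 0;
const loadUser = () => {
  lookups += 1; // stands in for an expensive database or network call
  return Promise.resolve({ id: 42, name: "Sam" });
};

const responses = [
  coalesce("user:42", loadUser),
  coalesce("user:42", loadUser),
  coalesce("user:42", loadUser),
];

Promise.all(responses).then((users) => {
  // prints: 3 responses served by 1 lookup(s)
  console.log(users.length, "responses served by", lookups, "lookup(s)");
});
```

Unlike a cache, nothing is kept once the work settles, so the result is never stale. The requests still have to be identical enough to share a key.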

Tuesday, June 14, 2016

MongoDB Design Patterns


{MongoDB} is a NoSQL document database. It is ideal for most use cases. It is not ideal for a few, but you can still overcome some of its limitations using the following design patterns.

This article provides a solution for some of the limitations mentioned in my other article MongoDB : The Good, The Bad and the Ugly.

1. Query Command Segregation Pattern

Segregate responsibility to different nodes in the replica set.
The primary node may have priority 1 and keep only the indexes required for inserts and updates. Queries can be executed on the secondaries.
This pattern increases write throughput on the “priority 1” servers, since fewer indexes need to be updated on each write to a collection. The secondaries benefit from having to update fewer indexes and from having a working set in memory that is optimized for their workload.


2. Application level transactions Pattern

As of the 3.x releases, MongoDB does not support multi-document transactions or explicit locking of documents. However, with application logic a queue of work items carrying their own lock flags may be maintained.

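var TIMECONSTANT = 60 * 1000; // lock timeout in milliseconds (assumed value)

// Enqueue a message in the unlocked state.
db.queue.insert( { _id : 123,
    message : { },
    locked : false,
    tlocked : ISODate(),
    try : 0 } );

// Claim a message that is unlocked or whose lock has expired.
var timerange = new Date( Date.now() - TIMECONSTANT );
var doc = db.queue.findAndModify( {
    query : { $or : [ { locked : false },
                      { locked : true, tlocked : { $lt : timerange } } ] },
    update : { $set : { locked : true, tlocked : new Date() }, $inc : { try : 1 } },
    new : true } ); // return the document as it looks after the update

// do some processing

// Matching on the try counter ensures no other worker re-claimed the message meanwhile.
// The update document is left empty in the original; fill in the final state change here.
db.queue.update( { _id : doc._id, try : doc.try }, { } );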
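
To see why the try counter matters, here is a small in-memory sketch of the same claim-and-complete logic, with a plain array standing in for the queue collection. The names claim, complete and LOCK_TIMEOUT_MS are hypothetical, and treating "complete" as removing the message is my assumption, not something the pattern prescribes.

```javascript
const LOCK_TIMEOUT_MS = 60 * 1000; // assumed lock timeout

// Claim the first message that is unlocked or whose lock has expired.
function claim(queue, now) {
  const doc = queue.find(
    (d) => !d.locked || now - d.tlocked > LOCK_TIMEOUT_MS
  );
  if (!doc) return null;
  doc.locked = true;
  doc.tlocked = now;
  doc.try += 1;
  return { _id: doc._id, try: doc.try }; // the worker's "ticket"
}

// Finish a message only if nobody re-claimed it since this ticket was issued.
function complete(queue, ticket) {
  const index = queue.findIndex(
    (d) => d._id === ticket._id && d.try === ticket.try
  );
  if (index === -1) return false;
  queue.splice(index, 1); // assumption: finishing means removing the message
  return true;
}

const queue = [{ _id: 123, message: {}, locked: false, tlocked: 0, try: 0 }];
const slowWorker = claim(queue, 0);             // claims at t = 0
const fastWorker = claim(queue, 61 * 1000);     // lock has expired, so it is claimed again
const slowResult = complete(queue, slowWorker); // false: the ticket is stale
const fastResult = complete(queue, fastWorker); // true
```

The slow worker's late completion is rejected because its ticket carries an old try value, which is the role the try field plays in the final update of the pattern.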

3. Bucketing Pattern

When a document has an array that grows over time, use the bucketing pattern.
Example: orders. The order lines can keep growing, or may already be larger than the desired size of the document.
The pattern is handled programmatically and is triggered using a tolerance count.

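var TOLERANCE = 100;
for ( var recipient in msg.to ) {
    db.inbox.update(
        // Match the recipient's current bucket, as long as it is not full yet.
        { owner : msg.to[recipient], count : { $lt : TOLERANCE }, time : { $lt : Date.now() } },
        // All update operators belong in a single update document.
        { $setOnInsert : { owner : msg.to[recipient], time : Date.now() },
          $push : { "messages" : msg },
          $inc : { count : 1 } },
        // No bucket with room left: create a new one.
        { upsert : true } );
}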
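
The rollover behaviour of the upsert can be sketched in memory as follows. This is only an illustration: deliver is a hypothetical function, a plain array stands in for the inbox collection, and the tolerance is lowered to 3 so that the effect is easy to see.

```javascript
const TOLERANCE = 3; // small value so the rollover is easy to see

// Append msg to the owner's current bucket, or start a new bucket when it is full.
function deliver(inbox, owner, msg) {
  let bucket = inbox.find((b) => b.owner === owner && b.count < TOLERANCE);
  if (!bucket) {
    bucket = { owner, count: 0, messages: [] }; // the "upsert" branch
    inbox.push(bucket);
  }
  bucket.messages.push(msg);
  bucket.count += 1;
}

const inbox = [];
for (let i = 1; i <= 7; i += 1) {
  deliver(inbox, "sam", { text: "message " + i });
}
const counts = inbox.map((b) => b.count); // [3, 3, 1]
```

Seven messages end up in three documents, none of which holds more than TOLERANCE entries.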

4. Relationship Pattern 

Sometimes it is not feasible to embed an entire document, for example when we are modeling people. Use this pattern to build relationships.

1. Determine if data “belongs to” a document - is there a relation?
2. Embed when possible, especially if the data is useful and exclusive (“belongs in”).
3. Always reference using _id values at minimum.
4. Denormalize the useful parts of the relationship. Good candidates do not change value often or ever and are useful.
5. Be mindful of updates to denormalized data and repair relationships.

{
    _id : 1,
    name : 'Sam Smith',
    bio : 'Sam Smith is a nice guy',
    best_friend : { id : 2, name : 'Mary Reynolds' },
    hobbies : [ { id : 100, n : 'Computers' }, { id : 101, n : 'Music' } ]
}
{
    _id : 2,
    name : 'Mary Reynolds',
    bio : 'Mary has composed documents in MongoDB',
    best_friend : { id : 1, name : 'Sam Smith' },
    hobbies : [ { id : 101, n : 'Music' } ]
}
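
Point 5 is the part that is easy to forget. As a rough sketch, assuming the two documents above are held in a plain array, a rename has to touch both the person's own document and every denormalized copy of the name. renamePerson is a hypothetical helper; against a real collection this would be one update per affected field.

```javascript
const people = [
  { _id: 1, name: "Sam Smith", best_friend: { id: 2, name: "Mary Reynolds" } },
  { _id: 2, name: "Mary Reynolds", best_friend: { id: 1, name: "Sam Smith" } },
];

// Rename a person and repair every denormalized copy of the old name.
// Returns the number of fields that were changed.
function renamePerson(docs, id, newName) {
  let touched = 0;
  for (const person of docs) {
    if (person._id === id) {
      person.name = newName; // the authoritative copy
      touched += 1;
    }
    if (person.best_friend && person.best_friend.id === id) {
      person.best_friend.name = newName; // the denormalized copy
      touched += 1;
    }
  }
  return touched;
}

const changed = renamePerson(people, 2, "Mary Smith"); // 2
```

Because the reference always carries the _id (point 3), the copies can be found reliably even after the name itself has changed.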

5. Materialized Path Pattern


If you have a tree-shaped data model where an object has children of the same object type, you can use the materialized path pattern for more efficient searches and queries. A sample is given below.

{ _id: "Books", path: null }
{ _id: "Programming", path: ",Books," }
{ _id: "Databases", path: ",Books,Programming," }
{ _id: "Languages", path: ",Books,Programming," }
{ _id: "MongoDB", path: ",Books,Programming,Databases," }
{ _id: "dbm", path: ",Books,Programming,Databases," }

Query to retrieve the whole tree, sorted by the path field:
db.collection.find().sort( { path: 1 } )
Use a regular expression on the path field to find the descendants of Programming:
db.collection.find( { path: /,Programming,/ } )
Retrieve the descendants of Books, where Books is the top-level parent:
db.collection.find( { path: /^,Books,/ } )
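
The same path logic can be tried out without a database. The sketch below keeps the sample documents in an array and uses two hypothetical helpers: childPath builds the path for a new child, and descendantsOf applies the same regular expression as the query above.

```javascript
const categories = [
  { _id: "Books", path: null },
  { _id: "Programming", path: ",Books," },
  { _id: "Databases", path: ",Books,Programming," },
  { _id: "Languages", path: ",Books,Programming," },
  { _id: "MongoDB", path: ",Books,Programming,Databases," },
  { _id: "dbm", path: ",Books,Programming,Databases," },
];

// Path to store on a new child of the given parent document.
function childPath(parent) {
  return (parent.path || ",") + parent._id + ",";
}

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Ids of all documents that have the given id somewhere in their path.
function descendantsOf(docs, id) {
  const pattern = new RegExp("," + escapeRegExp(id) + ",");
  return docs
    .filter((d) => d.path !== null && pattern.test(d.path))
    .map((d) => d._id);
}

const underProgramming = descendantsOf(categories, "Programming");
// ["Databases", "Languages", "MongoDB", "dbm"]
const underDatabases = descendantsOf(categories, "Databases");
// ["MongoDB", "dbm"]
const newPath = childPath(categories[2]);
// ",Books,Programming,Databases,"
```

The commas on both sides of every name are what keep a search for "Programming" from also matching a category such as "ProgrammingBasics".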


MongoDB : The Good, The Bad and The Ugly





For those who are new to {MongoDB}, it is a NoSQL document database. Documents comprise sets of key-value pairs and are the basic unit of data in MongoDB.
It is definitely one of the most popular NoSQL databases as of now. It is widely accepted and fits a wide variety of use cases (though not all).

In this article on the good, the bad and the ugly, I would like to give a brief overview based on my experience with MongoDB over the past few years.

The Good

Since MongoDB is as popular as it is today, there should be more good than bad and ugly. If not, developers would not have accepted it. Below are a few good things about MongoDB.

Flexible Data Model

With today's dynamic use cases and ever-changing applications, having a flexible data model is a boon. A flexible data model means that there is no predefined schema and a document can hold any set of values under any keys.

Expressive Query Syntax

The query language of MongoDB is very expressive and easy to understand. Many would say that it is not like SQL. But why should we stick to an SQL-like query language when we can move forward and be more expressive and simpler?

Easy to Learn

MongoDB is easy to learn and quick to start with. The basic installation, setup and first run should not take more than a few hours. A more robust setup might be complex, but I will talk about that later.
You should be able to use MongoDB in your project with ease.

Performance

Query performance is one of the strong points of MongoDB. It keeps most of the working data in RAM. All data is persisted on disk, but when the working set fits in memory, queries are served from RAM rather than from disk and are therefore much faster. For this, it is important to have the right indexes and enough RAM to benefit from MongoDB's performance.

Scalable and Reliable

MongoDB is highly scalable using shards. Horizontal scalability is a big plus in most NoSQL databases, and MongoDB is no exception.
It is also highly reliable thanks to replica sets, in which data is replicated asynchronously to multiple nodes.

Async Drivers

Non-blocking I/O using async drivers is essential in modern applications that are built for speed. MongoDB has async driver support for most of the popular languages.

Documentation

Good documentation, which MongoDB has, can make a developer's life a lot easier, especially when the developer is new to the technology.

Text Search

If you are building a website that needs to search within all of your data, text search is essential. For example, an eCommerce website backed by a database with text search enabled can be a lot more attractive to users.

Server-side Script

If you need some operations to be performed on the server side and not in your application, you can do that in MongoDB. Put your list of mongo statements in a .js file and execute it with: mongo yourFile.js

Documents = Objects

The good thing about a document database is that your object can be stored directly as a single document in MongoDB. There is no need for an ORM here.

The Bad

We looked at the various good things about MongoDB. Below are a few bad things. I am sure the critics are more interested in this part. {MongoDB} can be evil if we use it for the wrong use case.

Transactions

Nowadays, very few applications actually require transactions, but some still do. Unfortunately, MongoDB (as of 3.2) does not support multi-document transactions. So if you need to update more than one document or collection atomically per user request, don't use MongoDB. It may lead to inconsistent data, as there is no ACID guarantee across documents. Rollbacks have to be handled by your application.

No Triggers

In RDBMS, we have the luxury of triggers, which have saved us in many cases. This luxury is missing in MongoDB.

Storage

MongoDB needs more storage than other popular databases. The introduction of the WiredTiger storage engine in MongoDB 3.0 has largely addressed the storage issue through compression, but WiredTiger may not be ideal for every application.

Disk cleanup

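MongoDB does not automatically clean up disk space. If documents are rewritten or deleted, the disk space is not released to the operating system. The space is reused for new documents, but reclaiming it has to be done manually, for example by running compact or repairDatabase, depending on the storage engine.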

The Ugly

Sometimes the ugly can be worse than the bad. It is important to know the ugly parts before using a technology. They do not stop you from using the product, but they can make your life very tough.

Hierarchy of self

If you have a data model where an object can have recursive children (i.e., the same object type is a child of an object, and this keeps going for n levels), the MongoDB document can become very ugly. Indexing, searching and sorting these recursive embedded documents can be very hard.

Joins

Joining two documents is also not simple in MongoDB. Though MongoDB 3.2 supports left outer joins ($lookup), the feature is not yet mature. If your application needs to pull data from multiple collections in a single query, it might not be possible. You then have to make multiple queries, which might make your code look a bit messy.
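
To give an idea of what "multiple queries" means in practice, here is a minimal sketch of a join done in the application. The orders and customers arrays are hypothetical and stand in for the results of two separate find() calls. joinOrdersWithCustomers is likewise an illustrative name.

```javascript
// Stand-ins for the results of two separate queries (hypothetical collections).
const orders = [
  { _id: 1, customerId: 10, total: 25 },
  { _id: 2, customerId: 11, total: 40 },
  { _id: 3, customerId: 10, total: 15 },
];
const customers = [
  { _id: 10, name: "Sam Smith" },
  { _id: 11, name: "Mary Reynolds" },
];

// Attach the matching customer to every order (left outer join semantics).
function joinOrdersWithCustomers(orderDocs, customerDocs) {
  const byId = new Map(customerDocs.map((c) => [c._id, c]));
  return orderDocs.map((o) => ({
    ...o,
    customer: byId.get(o.customerId) || null, // null when no customer matches
  }));
}

const joined = joinOrdersWithCustomers(orders, customers);
// joined[2] is { _id: 3, customerId: 10, total: 15, customer: { _id: 10, name: "Sam Smith" } }
```

Building the Map first keeps the merge to one pass over each result set instead of a nested loop.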

Indexing

Though speed is advertised as a big plus point of MongoDB, it is achievable only if you have the right indexes. If you end up with the wrong indexes, or with compound indexes whose fields are in the wrong order, MongoDB can be one of the slowest databases.
If you have a lot of filter-by and sort-by fields, you may end up with a lot of indexes on a collection, which of course is not good.

Duplicate Data

You may end up with a lot of duplicate data, as MongoDB does not support well-defined relationships. Updating this duplicate data can be hard and, due to the lack of ACID guarantees across documents, you may end up with inconsistent data.

Conclusion

Overall, MongoDB is a good database, provided it suits your use case. If it does not, it can get very ugly. Try using it in the wrong place and you will get burnt.
Analyze it well and do consult an expert. You will definitely enjoy using it when it is the right fit.
As for the bad and the ugly parts, you can work around a few of them using the design patterns that I have explained in the article MongoDB Design Patterns.


MongoDB Best Practices

A few MongoDB best practices that can help you implement it the right way are listed below:
Hardware
  • Ensure your working set fits in RAM.
  • Use compression.
  • Run a single MongoDB instance per server.
  • Use SSDs for write-heavy applications.
Data Model
  • Store all data for a record in a single document.
  • Avoid large documents.
  • Avoid unnecessarily long field names.
  • Eliminate unnecessary indexes.
  • Remove indexes that are prefixes of other indexes.
Application
  • Update only modified fields.
  • Avoid negation in queries.
  • Run explain() for every complex query.
  • Use covered queries when possible.
  • Use bulk inserts when needed.
Setup and Configuration
  • Have at least one secondary and one arbiter.
  • Set write concern to 2 when the data is critical.
  • Have a daily dump of data for backup.
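
The "prefix" item in the Data Model list can be checked mechanically. The sketch below assumes the index key specifications have been collected into an array of plain objects, in the same shape that createIndex accepts. isPrefix and redundantIndexes are hypothetical helpers, and they ignore index options such as unique or sparse, which can be a reason to keep a shorter index.

```javascript
// True when every key of `shorter` appears, in the same order and direction,
// at the start of `longer`.
function isPrefix(shorter, longer) {
  const a = Object.entries(shorter);
  const b = Object.entries(longer);
  return (
    a.length < b.length &&
    a.every(([field, direction], i) => b[i][0] === field && b[i][1] === direction)
  );
}

// Index key specs that are covered by a longer index in the same list.
function redundantIndexes(indexes) {
  return indexes.filter((ix) => indexes.some((other) => isPrefix(ix, other)));
}

const indexes = [
  { owner: 1 },
  { owner: 1, time: -1 },
  { time: -1 },
];
const removable = redundantIndexes(indexes); // [ { owner: 1 } ]
```

Here { owner: 1 } is reported because queries on owner can use { owner: 1, time: -1 }, while { time: -1 } is kept since time is not the leading field of any other index.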