Friday, August 26, 2016

Efficiency by grouping requests and events

In the general programming world, every request gets its own response, which is how it is supposed to be. But what if several similar requests are received at the same time? The usual solutions are:

  1. Process every request individually and return a response for each.
  2. Cache the response to the first request and return the same response for similar requests that follow.

The problem with the first solution is duplicated effort that could have been avoided. The problem with the second is that it assumes a caching layer is available and that the requests are exactly the same (no difference in context).

What if the program, or the engine that runs it (such as the JVM or the V8 engine), could recognize such similar requests, group them together, process them once and respond to all of them at the same time?
This logic could be event based in event-driven platforms like Node.js, or it could even be applied at the method/function level.

Node.js:
Node.js is event-driven: every request goes into the event queue. If the engine could sub-group the events and process similar events in one go, it would use fewer resources to handle the same number of events.

Need some thought on how this can be achieved.
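One possible direction is application-level request coalescing: while a request for a given key is in flight, later identical requests share its promise instead of triggering new work. Below is a minimal sketch of the idea, assuming Node.js; the function names and the expensive fetchData() call are hypothetical, not an existing API.

// Minimal request-coalescing sketch (hypothetical names, not a built-in Node.js feature).
const inFlight = new Map();

function coalesce(key, work) {
    // If an identical request is already being processed, reuse its promise.
    if (inFlight.has(key)) {
        return inFlight.get(key);
    }
    const cleanup = () => inFlight.delete(key);
    const promise = work();
    promise.then(cleanup, cleanup);   // allow the next burst of requests to recompute
    inFlight.set(key, promise);
    return promise;
}

// Usage: many simultaneous calls for the same id trigger only one fetchData() execution.
function handleRequest(id) {
    return coalesce(id, () => fetchData(id));   // fetchData is an assumed expensive async call
}

This only groups requests that hit the application while an identical one is still running, but it shows how "group, process once, respond to all" can be approximated without engine support.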

Tuesday, June 14, 2016

MongoDB Design Patterns


{MongoDB} is a NoSQL document database. It is ideal for most use cases. It is not ideal for a few, but you can still overcome some of those limitations in MongoDB using the following design patterns.

This article provides solutions for some of the limitations mentioned in my other article, MongoDB: The Good, The Bad and the Ugly.

1. Query Command Segregation Pattern

Segregate responsibilities to different nodes in the replica set.
The primary may carry priority 1 and keep only the indexes required for inserts and updates, while queries are executed on the secondaries.
This pattern increases write throughput on the "priority 1" server, since fewer indexes need to be updated when writing to a collection, and the secondaries benefit from maintaining only the indexes their own queries need and from a working set of memory that is optimized for their workload. A minimal sketch of this setup is given below.
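Assuming a three-member replica set, the member priorities and read preference below are illustrative, not a prescribed configuration:

// Give one member the highest priority and lower the others (illustrative values).
cfg = rs.conf()
cfg.members[0].priority = 1
cfg.members[1].priority = 0.5
cfg.members[2].priority = 0.5
rs.reconfig(cfg)

// In the client or shell, route queries to the secondaries.
db.getMongo().setReadPref("secondary")
db.orders.find( { status : "open" } )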


2. Application level transactions Pattern

MongoDB does not support multi-document transactions or document locking internally. However, with application logic, a queue may be maintained.

db.queue.insert( { _id : 123,
                   message : { },
                   locked : false,
                   tlocked : ISODate(),
                   try : 0 } );

// Pick up a message that is either unlocked, or whose lock is older than TIMECONSTANT.
var timerange = new Date( Date.now() - TIMECONSTANT );
var doc = db.queue.findAndModify( {
    query  : { $or : [ { locked : false },
                       { locked : true, tlocked : { $lt : timerange } } ] },
    update : { $set : { locked : true, tlocked : new Date() }, $inc : { try : 1 } }
} );
// do some processing
// then release the lock (or remove/replace the processed message, depending on your needs),
// but only if no one else has retried it in the meantime
db.queue.update( { _id : 123, try : doc.try }, { $set : { locked : false } } );

3. Bucketing Pattern

When a document has an array that grows over a period of time, use the bucketing pattern.
Example: orders. The order lines can keep growing and may push the document beyond its desired size.
The pattern is handled programmatically and is triggered using a tolerance count.

var TOLERANCE = 100;
for ( var recipient in msg.to ) {
    db.inbox.update(
        { owner : msg.to[recipient], count : { $lt : TOLERANCE }, time : { $lt : Date.now() } },
        { $setOnInsert : { owner : msg.to[recipient], time : Date.now() },
          $push : { "messages" : msg },
          $inc : { count : 1 } },
        { upsert : true }
    );
}

4. Relationship Pattern 

Sometimes it is not feasible to embed an entire document, for example when we are modeling people. Use this pattern to build relationships.

1. Determine if data “belongs to” a document - is there a relation?
2. Embed when possible, especially if the data is useful and exclusive (“belongs in”).
3. Always reference using _id values at minimum.
4. Denormalize the useful parts of the relationship. Good candidates do not change value often or ever and are useful.
5. Be mindful of updates to denormalized data and repair relationships (a repair sketch follows the example documents below).
{
  _id : 1,
  name : 'Sam Smith',
  bio : 'Sam Smith is a nice guy',
  best_friend : { id : 2, name : 'Mary Reynolds' },
  hobbies : [ { id : 100, n : 'Computers' }, { id : 101, n : 'Music' } ]
}
{
  _id : 2,
  name : 'Mary Reynolds',
  bio : 'Mary has composed documents in MongoDB',
  best_friend : { id : 1, name : 'Sam Smith' },
  hobbies : [ { id : 101, n : 'Music' } ]
}
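To illustrate point 5 above, here is a minimal repair sketch, assuming the two documents live in a hypothetical people collection:

// Mary changes her name: update her document, then repair every denormalized copy.
db.people.update( { _id : 2 }, { $set : { name : 'Mary Jones' } } );
db.people.update( { "best_friend.id" : 2 },
                  { $set : { "best_friend.name" : 'Mary Jones' } },
                  { multi : true } );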

5. Materialized Path Pattern


If your data model forms a tree, where the same object type appears as a child of an object, you can use the materialized path pattern for more efficient searches/queries. A sample is given below.
{ _id: "Books", path: null }
{ _id: "Programming", path: ",Books," }
{ _id: "Databases", path: ",Books,Programming," }
{ _id: "Languages", path: ",Books,Programming," }
{ _id: "MongoDB", path: ",Books,Programming,Databases," }
{ _id: "dbm", path: ",Books,Programming,Databases," } 
Query to retrieve the whole tree, sorting by the field path:
db.collection.find().sort( { path: 1 } )
Use regular expressions on the path field to find the descendants of Programming:
db.collection.find( { path: /,Programming,/ } )
Retrieve the descendants of Books where Books is the top parent:
db.collection.find( { path: /^,Books,/ } )
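Since the descendants query is anchored at the start of the string, an index on the path field keeps it efficient (adjust the collection name to your own):
db.collection.createIndex( { path : 1 } )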


MongoDB : The Good, The Bad and The Ugly





For those who are new to {MongoDB}, it is a NoSQL document database. Documents comprise sets of key-value pairs and are the basic unit of data in MongoDB.
It is definitely one of the most popular NoSQL databases as of now. It is widely accepted and fits a wide variety of use cases (though not all).

In this article of the good, the bad and the ugly, I would like to give a brief overview based on my experience with MongoDB over the past few years.

The Good

Since MongoDB is as popular as it is today, there should be more good than bad and ugly. If not, developers would not accept it. Below are a few good things about MongoDB.

Flexible Data Model

With today's dynamic use cases and ever-changing applications, having a flexible data model is a boon. A flexible data model means that there is no predefined schema and a document can hold any set of key-value pairs.
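For instance, both of the documents below can live in the same collection with completely different fields (the users collection and field names are made up for illustration):

db.users.insert( { name : "Sam", email : "sam@example.com" } );
db.users.insert( { name : "Mary", phones : [ "123-456" ], address : { city : "Bangalore" } } );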

Expressive Query Syntax

The query language of MongoDB is very expressive and easy to understand. Many would say that it is not like SQL. But why should we stick to an SQL-like query language when we can move forward and be more expressive and simpler?

Easy to Learn

MongoDB is easy to learn and quick to start with. The basic installation, setup and execution should not take more than a few hours. A more robust setup can be complex, but I will talk about that later.
You should be able to use the MongoDB database with ease in your project.

Performance

Query performance is one of the strong points of MongoDB. It keeps most of the working set in RAM. All data is persisted on disk, but during a query it does not normally fetch the data from disk; it serves it from RAM and hence responds much faster. Here, it is important to have the right indexes and enough RAM to benefit from MongoDB's performance.

Scalable and Reliable

MongoDB is highly scalable using shards. Horizontal scalability is a big plus in most NoSQL databases, and MongoDB is no exception.
It is also highly reliable due to its replica sets, with the data replicated asynchronously across multiple nodes.

Async Drivers

Non-blocking IO using async drivers is essential in modern applications that are built for speed. MongoDB has async driver support for most of the popular languages.

Documentation

Good documentation can make a developer's life a lot easier, especially when the developer is new to the technology. MongoDB's documentation is detailed and well maintained.

Text Search

If you are building a website that needs to search within all of your data, text search is essential. For example, an eCommerce website backed by a text-search-enabled database can be a lot more attractive to users.
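A quick sketch of how that looks in the mongo shell (the products collection and its fields are hypothetical):

// Create a text index on the searchable fields, then query with $text.
db.products.createIndex( { name : "text", description : "text" } );
db.products.find( { $text : { $search : "red running shoes" } } );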

Server-side Script

If you need some operations to be performed on the server side and not in your application, you can do that in MongoDB. Put your list of mongo statements in a .js file and execute mongo yourFile.js.
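For example, a yourFile.js along these lines (the sessions collection and field names are made up) can be run with mongo yourFile.js:

// yourFile.js - executed by the mongo shell against the server
var cutoff = new Date( Date.now() - 30 * 24 * 60 * 60 * 1000 );   // 30 days ago
var result = db.sessions.remove( { lastSeen : { $lt : cutoff } } );
print( "expired sessions removed: " + result.nRemoved );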

Documents = Objects

The good thing about having a document database is that your object can be stored directly as a single document in MongoDB. There is no need for an ORM here.

The Bad

We looked at the various good things about MongoDB. Below are a few bad things. I am sure the critics are more interested in this part. {MongoDB} can be evil if we use it for the wrong use case.

Transactions

Nowadays, there are very few applications that actually require transactions, but some still need them. MongoDB unfortunately does not support transactions. So if you need to update more than one document or collection atomically per user request, don't use MongoDB. It may lead to corrupted data, as there is no ACID guarantee across documents. Rollbacks have to be handled by your application.

No Triggers

In RDBMS, we have the luxury of triggers, which have saved us in many cases. This luxury is missing in MongoDB.

Storage

MongoDB needs more storage than other popular databases. The introduction of WiredTiger in MongoDB 3.0 has largely solved the storage issue, but using WiredTiger may not be ideal for most applications.

Disk cleanup

MongoDB does not automatically clean up disk space. So if documents are rewritten or deleted, the disk space is not released. It is reclaimed during a restart, or has to be done manually.
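For the manual route, something like the following can reclaim space; run these with care, as they block while running and repairDatabase needs free disk space to work in (the orders collection name is just an example):

db.runCommand( { compact : "orders" } )   // compact a single collection in place
db.repairDatabase()                       // rebuild the whole database's files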

The Ugly

Sometimes the ugly can be worse than the bad. It is important to know the ugly part before adopting the technology. It does not stop you from using the product, but it can make your life very tough.

Hierarchy of self

If you have a data model where an object can have recursive children (i.e., the same object type is a child of an object, and this continues for n levels), the MongoDB document can become very ugly. Indexing, searching and sorting these recursive embedded documents can be very hard.

Joins

Joining two collections is also not simple in MongoDB. Though MongoDB 3.2 supports left outer joins ($lookup), it is not yet mature. If your application needs to pull data from multiple collections in a single query, it might not be possible. You then have to make multiple queries, which can make your code look a bit messy.
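For reference, this is roughly what the 3.2 $lookup stage looks like; the collections and fields here are invented for illustration:

// Attach each order's matching customer documents as an array field named "customer".
db.orders.aggregate( [
    { $lookup : { from : "customers",
                  localField : "customerId",
                  foreignField : "_id",
                  as : "customer" } }
] );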

Indexing

Though speed is advertised as a big plus point of MongoDB, it is achievable only if you have the right indexes. If you end up with the wrong indexes, or compound indexes with the fields in the wrong order, MongoDB can be one of the slowest databases.
If you filter and sort by many different fields, you may end up with a lot of indexes on a collection, which of course is not good.
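As an illustration (hypothetical collection and fields), a query that filters on status and sorts on created is served well by a compound index in that same order, and explain() shows whether the index is actually used:

db.orders.createIndex( { status : 1, created : -1 } );
db.orders.find( { status : "open" } ).sort( { created : -1 } ).explain( "executionStats" );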

Duplicate Data

You may end up with a lot of duplicate data, as MongoDB does not support well-defined relationships. Updating this duplicate data can be hard, and due to the lack of ACID guarantees, you may also end up with corrupted data.

Conclusion

Overall, MongoDB is a good database, provided it suits your use case. If it does not, it can get very ugly. Try using it in the wrong place and you will get burnt.
Analyze it well and do consult an expert. You will definitely enjoy using it when it is the right fit.
As for the bad and the ugly parts, you can work around a few of them using the design patterns I have explained in the article MongoDB Design Patterns.


MongoDB Best Practices

A few MongoDB best practices that can help you implement things the right way are listed below:
Hardware
  • Ensure your working set fits in RAM
  • Use compression
  • Run a single MongoDB instance per server.
  • Use SSDs for write-heavy applications
Data Model
  • Store all data for a record in a single document.
  • Avoid large documents
  • Avoid unnecessarily long field names.
  • Eliminate unnecessary indexes.
  • Remove indexes that are prefixes of other indexes.
Application
  • Update only the modified fields (see the sketch after this list).
  • Avoid negation in queries
  • Run explain() for every complex query.
  • Use covered queries when possible.
  • Use bulk inserts when needed.
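A few of the application-level points above, sketched in the mongo shell; the users collection and its fields are illustrative:

// Update only the fields that changed, instead of rewriting the whole document.
db.users.update( { _id : 1 }, { $set : { lastLogin : new Date() } } );

// Covered query: the index on email can answer this projection by itself.
db.users.createIndex( { email : 1 } );
db.users.find( { email : "sam@example.com" }, { email : 1, _id : 0 } );

// Check the plan of a complex query.
db.users.find( { email : /@example\.com$/ } ).explain();

// Bulk insert when loading many documents at once.
db.users.insert( [ { email : "a@example.com" }, { email : "b@example.com" } ] );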
Setup and Configuration
  • Have at least one secondary and one arbiter.
  • Set the write concern to 2 when the data is critical (see the example after this list).
  • Have a daily dump of the data for backup.
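For the write concern point, a minimal example from the shell (the payments collection is made up); w: 2 waits for the primary plus one secondary to acknowledge the write:

db.payments.insert( { amount : 100, user : 1 },
                    { writeConcern : { w : 2, wtimeout : 5000 } } );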






Wednesday, August 20, 2014

Linux: Setup SSH Key for password-less login

To log in from one Linux box to another without having to type the password, follow the steps below. This is useful when you have to install or set up software on the second Linux box using an automated tool. Just run the commands shown at the prompts; the other lines are output generated by the commands.

[devu@vmhostname01 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (default file name /home/devu/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/devu/.ssh/id_rsa.
Your public key has been saved in /home/devu/.ssh/id_rsa.pub.
The key fingerprint is:
2f:bb:6c:e5:9b:90:44:f5:37:34:f8:7c:78:6f:39:96 devu@vmhostname01
The key's randomart image is:
+--[ RSA 2048]----+
|      .  .o      |
|     .  ...  .   |
|      . .oo.     |
|       . .+.o    |
|        S    o+  |
|       . o.   Eo |
|        +o. ...  |
|        ..+..    |
|         .+.o.   |
+-----------------+

[devu@vmhostname01 ~]$ ls -ltr .ssh/
total 20
-rw-r-----. 1 devu devu  405 Dec 18  2013 authorized_keys
-rw-r--r--. 1 devu devu 4189 Aug 19 04:15 known_hosts
-rw-------. 1 devu devu 1675 Aug 20 21:19 id_rsa
-rw-r--r--. 1 devu devu  401 Aug 20 21:19 id_rsa.pub

[devu@vmhostname01 ~]$ cd .ssh/
[devu@vmhostname01 .ssh]$ ssh-copy-id -i id_rsa.pub devu@vmhostname02
***WARNING***
devu@vmhostname02's password:
Now try logging into the machine, with "ssh 'devu@vmhostname02'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.

[devu@vmhostname01 .ssh]$ ssh devu@vmhostname02
[devu@vmhostname02 ~]$

Thursday, October 17, 2013

RESTful API Standards

Below are common RESTful API standards that we should follow when designing our APIs.

Keep your base URL simple and intuitive
 /dogs     /dogs/1234

Keep verbs out of your base URLs
Never use /getDogs or /createDogs

Use HTTP verbs to operate on the collections and elements
POST, GET, PUT and DELETE map to CRUD (Create, Read, Update, Delete)

Use Plural nouns
/dogs   /deals    /quotes

Concrete names are better than abstract
Instead of /items, be more specific, like /blogs or /videos
Number of resources - preferably between 12 and 24

Simplify the Relationship and Association between Resources
GET /owners/5678/dogs
POST /owners/5678/dogs

Keep complexity behind the ‘?’
GET /dogs?color=red&state=running&location=park

Have a good error design
                Align errors with HTTP status codes
                200 - OK
                400 - Bad Request from client
                500 - Internal Server Error
                304 - Not Modified
                404 - Not Found
                401 - Unauthorized
                403 - Forbidden
                Provide a more granular error message
                {"status" : "401", "message":"Authentication Required","code": 20003}

Versioning is mandatory
Always use 'v' and never include a minor version like v1.0. Ideal is v1, v2
Have version all the way to the left (highest scope): /v1/dogs

Maintain at least one version back
Follow proper cycle for deprecating and retiring the API

Response content type, OAuth tokens, etc. must go into the header
Information that doesn't change the logic of each response goes into the header

Extra optional fields must be requested in a comma-delimited list
/dogs?fields=name,color,location 

Pagination is a must for resource APIs
Use limit and offset. 
/dogs?limit=25&offset=50
default pagination is limit=10 with offset=0

For Non-Resource APIs: Use verbs, not nouns
/convert?from=EUR&to=CNY&amount=100
/translate?from=EN&to=FR&text=Hello

Request format - support multiple if possible
Default should be JSON, but support XML if possible
Use the pure RESTful way with the header, Accept: application/json

Use standard Java/JavaScript naming conventions for attributes
example: createdAt, firstName

Search within a resource: use ?q=
/owners/5678/dogs?q=fluffy+fur

Have all APIs in a single store and under one domain
api.mycompany.com

Option to suppress error codes
When requested, always send HTTP 200, even in case of errors.
&suppress_response_codes=true

Authentication : Use OAuth 2.0

In addition to the atomic APIs, provide composite APIs as well if there is a need.
This will avoid applications making multiple calls per screen.


Complement the API with code libraries and a software development kit (SDK)

Note: These are not the only standards and there may be variations. These are the standards that are followed by many and work for them.

Friday, October 11, 2013

Build and Deploy the PLAY application

Once you have developed an application in Play, you might want to create a binary out of it that you can deploy on another server.
Follow the steps below to create a binary of your Play application and deploy it on another server (Linux).

1. In your play application directory, run the below command.
[My application folder] $ play dist
This command will create a zip file with the following name [applicationName]-[version].zip. Example: testapp-1.0.zip.
2. Copy this zip file to the server on which you want to deploy the application.
Example: scp testapp-1.0.zip user@other-server-host:/home/user/.
3. Unzip the file to the folder where you want the application running.
Example: unzip testapp-1.0.zip -d /opt
4. You will find a start file in the unzipped folder, but most of the time it will not have executable permissions.
Change the permissions by executing the command below.
chmod 777 start
5. You can then execute the command like below.
[Unzipped dist folder] $ ./start

You can also make your application run on a different port and provide other options along with ./start.
Refer to the URL below for the options.
http://www.playframework.com/documentation/2.1.x/ProductionConfiguration

Hoping that this information helps some newbies of Play.

Monday, July 22, 2013

Sonar - Code Quality Management

This Sonar tool is saving me a lot of time these days. Gone are the days when I needed to worry about developers following the coding standards for Java, HTML and JS.
I only need to worry about reviewing the business logic these days.

This open source tool is surely a must-have for every team. Check it out.
http://www.sonarsource.com/

Developers also benefit, as they are able to improve their coding standards by using this tool.

A few plugins that are impressive are:
http://docs.codehaus.org/display/SONAR/Quality+Index+Plugin
http://docs.codehaus.org/display/SONAR/Toxicity+Chart+Plugin
http://docs.codehaus.org/display/SONAR/Motion+Chart+Plugin

There are a lot more, though, which can be easily added.