Monday, July 11, 2016

Internship Experience at Cityflo

I am really excited to share my internship experience at Cityflo. It has been an awesome two months working here. The team at Cityflo is building a great product that solves the daily commute problem for thousands of people in India. Over these two months, I worked with new technologies and learned a lot.
I joined Cityflo as a Backend Developer Intern on 21st December 2015. I devoted the first week to getting up to speed with the codebase. I also took a Cityflo ride in that first week, and the experience was really good: the ride was punctual and very comfortable.
From the second week, I was assigned the task of setting up Test Driven Development (TDD) for Cityflo. A TDD setup was really necessary at that time, since on two or three occasions the system broke when code changes were deployed to the production server. For instance, a small syntax error introduced by one of the team members once caused an internal server error, and ride bookings on the mobile app stopped for about 15-20 minutes until we reverted to the previous stable commit. Small errors like these can cause a large number of problems for a company growing at a fast pace, and they needed to be taken care of. This is where TDD comes in. For the first time in my life, I wrote a test suite for such a big codebase, about 5,000 lines of code.
While writing the tests, I realized that TDD is really important: it lets us deploy code quickly without worrying about failures on the production server. Writing good test cases is not that easy. It needs an understanding of each and every corner case in the codebase, so that if anything in the code changes, the right tests fail. That is what constitutes a good test suite.
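To give a flavour of what these tests look like, here is a minimal sketch in the style of a Django test case. The endpoint, payload and expected responses below are illustrative placeholders, not Cityflo's actual code:

# tests.py -- a minimal Django test sketch (hypothetical endpoint and fields)
from django.test import TestCase

class BookingTest(TestCase):
    def test_booking_endpoint_succeeds(self):
        # If a deploy breaks the booking flow, this test fails before the
        # code ever reaches production.
        payload = {"source": "Andheri", "destination": "BKC"}
        response = self.client.post("/api/bookings/", payload)
        self.assertEqual(response.status_code, 200)

    def test_booking_requires_source(self):
        # Corner case: a booking without a source should be rejected.
        response = self.client.post("/api/bookings/", {"destination": "BKC"})
        self.assertEqual(response.status_code, 400)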
After writing a complete set of test cases for the whole business logic, code coverage rose from 0 to 85%. I also integrated Travis CI and Coveralls. Since the TDD setup was completed, nothing has broken on the production server. It has really helped Cityflo keep the code clean and simple, and it has increased productivity. Here at Cityflo, we always strive for better coverage with a test suite that checks everything. Along with this work, I fixed hundreds of day-to-day tech and maintenance issues.
Over the whole span of two months, I ended up making 280 commits, comprising 14,223 lines of additions and 8,433 lines of deletions.

I would like to thank the Cityflo team for giving me such an amazing experience. :-)

Friday, September 11, 2015

Google Summer of Code 2015 Experience


Hi everyone, I was selected for Google Summer of Code 2015 under the CloudCV organization. In this blog post, I am going to describe the work I did over the summer. I hope you find it useful :P

I was selected as a Google Summer of Code student in the CloudCV organization for the project 'Integrating Dropbox, Google Drive and S3 and building REST APIs for CloudCV'.



It has been an awesome experience this summer. I learned a lot of new technologies and worked with them in a very short span of time.

TL;DR

In short, the project aimed at developing the following functionalities:
  • Integrate third-party authentication through Dropbox and Google
  • Create a new database schema for CloudCV
  • Create REST APIs for the new database model
  • Modify and integrate NVIDIA's DIGITS framework (a deep learning framework) and add the concept of users to DIGITS
  • Create Cloud Storage upload APIs for uploading model jobs of several gigabytes to Dropbox, Google Drive and Amazon S3
  • Create Cloud Storage download APIs for downloading training and validation datasets from Dropbox, Google Drive and Amazon S3
Now, explaining the project in detail:
  • Integrate third-party authentication through Dropbox and Google:

    For integrating third-party authentication, I used the Django-Allauth package for Django. I had already done a lot of research on the authentication packages available for Django, and finally chose Django-Allauth. It handled almost all cases well, with only a few that I had to modify to satisfy our needs. Integrating it was not that tough; creating a beautiful UI was the bigger problem for me. I used Materialize CSS (based on Material Design) for creating the layouts.
  • Create a new Database Schema

    Creating a database schema and normalizing it properly has always been a challenge for me. I brainstormed a lot over the schema and, after long discussions with my mentors, we came to a conclusion; the resulting database schema is shown at http://www.deshraj.in/cloudcv_db.
  • Create REST APIs for new Database Model

    This was my first time building REST APIs. Before going into depth, I would like to say that REST is just AWESOME ;) I loved the concept. For creating the APIs, I chose DRF (Django REST Framework). DRF provides a lot of out-of-the-box functionality that helps a developer build a RESTful architecture quickly; a minimal sketch appears after this list.
  • Integrating Workspaces in DIGITS

    This task involved working on the DIGITS framework and modifying it to add the concept of workspaces and users. About DIGITS: the Deep Learning GPU Training System is a web app for training deep neural networks; its official source code repository is https://github.com/NVIDIA/DIGITS. The basic idea behind creating workspaces in DIGITS is to give several researchers and data scientists a platform to work together collaboratively. After integrating workspaces, I needed to connect Django (the CloudCV server) with Flask (DIGITS) to support a single authentication system. Sessions are managed on the CloudCV server, and the DIGITS server gets read-only access to them so it can check which user is logged in. For managing sessions, I used the Redis_Session fork. Sharing a single session between Django and Flask is a crucial part of the project; a rough sketch of the Flask side appears after this list.
  • Building Cloud Storage API 

    The Cloud Storage API is one of the prominent features of CloudCV's fork of DIGITS. Using this API, models can be easily uploaded to and downloaded from cloud storage providers like Dropbox, Google Drive and Amazon S3. On the technology side, Boto is used to fetch the data from S3 buckets; a short sketch appears after this list.
  • Plans after GSoC 2015

    I have been contributing to open source projects for around 11 months now, and I love doing it, so I am continuing my work on CloudCV. I am also contributing to the main NVIDIA DIGITS repository to add more functionality and help the researchers and data scientists around the world who use it.
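As promised in the REST APIs item above, here is a minimal sketch of a DRF endpoint. The Project model, app and field names are hypothetical placeholders, not the actual CloudCV models:

# A minimal DRF sketch (hypothetical model and field names)
from rest_framework import routers, serializers, viewsets
from myapp.models import Project  # hypothetical app and model

class ProjectSerializer(serializers.ModelSerializer):
    class Meta:
        model = Project
        fields = ("id", "name", "owner")

class ProjectViewSet(viewsets.ModelViewSet):
    # ModelViewSet provides list/retrieve/create/update/delete out of the box.
    queryset = Project.objects.all()
    serializer_class = ProjectSerializer

# urls.py: the router generates all the URL patterns for the viewset.
router = routers.DefaultRouter()
router.register(r"projects", ProjectViewSet)
# ...then include router.urls in your urlpatterns.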
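For the workspaces item, here is a rough sketch of what the read-only session check could look like on the Flask (DIGITS) side. The key prefix and the JSON decoding are assumptions that depend on the session backend and serializer configured on the Django side:

# Read-only session lookup on the Flask side (sketch; names are assumptions)
import json
import redis

# Both servers talk to the same Redis instance.
conn = redis.StrictRedis(host="localhost", port=6379, db=0)

SESSION_PREFIX = "session:"  # assumption: depends on the session backend

def logged_in_user_id(session_key):
    """Return the Django user id for a session key, or None if not logged in."""
    raw = conn.get(SESSION_PREFIX + session_key)
    if raw is None:
        return None  # no such session in Redis
    data = json.loads(raw)  # assumes Django's JSON session serializer
    return data.get("_auth_user_id")  # the key Django uses for the user id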
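And for the Cloud Storage item, a minimal Boto sketch of the S3 upload and download paths. The bucket and key names are placeholders, and AWS credentials are assumed to be available in the environment:

# Upload/download against S3 with Boto (sketch; bucket name is a placeholder)
import boto
from boto.s3.key import Key

conn = boto.connect_s3()  # picks up AWS credentials from the environment
bucket = conn.get_bucket("cloudcv-models")  # placeholder bucket name

def upload_model(key_name, local_path):
    # Push a trained model job up to S3 under the given key.
    key = Key(bucket, key_name)
    key.set_contents_from_filename(local_path)

def download_dataset(key_name, local_path):
    # Fetch a training/validation dataset from S3 to a local file.
    key = bucket.get_key(key_name)
    key.get_contents_to_filename(local_path)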

This has been the most challenging summer for me, and it improved my coding skills a lot. I would recommend starting to contribute to open source projects as early as possible, because it teaches you how big organizations like Mozilla, Google and Microsoft work on their projects. It also adds to your CV, which is a big plus.

If you have any comments, do leave them below the post. Sorry for my bad composition :) :)


Saturday, June 6, 2015

Install Caffe on Ubuntu 14.04 the right way

What is Caffe? Why is it used?

Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors.

Before beginning the installation, I would like to warn you that installing Caffe is a great pain.

Installation: 

First, you need to install the general dependencies using the following command:

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler

After installing these, you need to install three more dependencies:
  • CUDA is required for GPU mode. Installing CUDA is optional, but it is recommended for better performance.
    To install CUDA, visit https://developer.nvidia.com/cuda-downloads and download CUDA for Ubuntu 14.04. Start the download and go get some coffee, since it is quite large and will take time. After a successful download, install it.
  • BLAS: install ATLAS with sudo apt-get install libatlas-base-dev, or install OpenBLAS or MKL for better CPU performance.
  • BOOST: the Boost C++ libraries can be downloaded and installed through SourceForge.
  • OpenCV 2.4 or above: for installing OpenCV, follow this link.
  • Python: if you use the default Python, you will need to install the python-dev package with sudo apt-get install python-dev.

Compilation Part:

Now that you have installed the prerequisites on your system, you need to edit the config file (Makefile.config). Create it by copying the contents of the Makefile.config.example file to Makefile.config using the following command:

cp Makefile.config.example  Makefile.config

After this, you need to change the configuration in the Makefile.config file according to the following conditions:
  • For cuDNN acceleration, you should uncomment the USE_CUDNN := 1 switch in Makefile.config.
  • For CPU-only Caffe, uncomment CPU_ONLY := 1 in Makefile.config.
For the lazy ones, here is the complete configuration of my Makefile.config.


## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
#CPU_ONLY := 1

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
# CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
CUDA_DIR := /usr

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
  -gencode arch=compute_20,code=sm_21 \
  -gencode arch=compute_30,code=sm_30 \
  -gencode arch=compute_35,code=sm_35 \
  -gencode arch=compute_50,code=sm_50 \
  -gencode arch=compute_50,code=compute_50

# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
BLAS := atlas
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
# BLAS_INCLUDE := /path/to/your/blas
# BLAS_LIB := /path/to/your/blas

# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
# MATLAB_DIR := /usr/local
# MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
PYTHON_INCLUDE := /usr/include/python2.7 \
  /usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
# ANACONDA_HOME := $(HOME)/anaconda
# PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
  # $(ANACONDA_HOME)/include/python2.7 \
  # $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \

# We need to be able to find libpythonX.X.so or .dylib.
PYTHON_LIB := /usr/lib
# PYTHON_LIB := $(ANACONDA_HOME)/lib

# Homebrew installs numpy in a non standard path (keg only)
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib

# Uncomment to support layers written in Python (will link against Python libs)
# WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib

# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib

# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# USE_PKG_CONFIG := 1

BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @

After making these changes in the configuration file, you need to run the following commands:

make all
make test
make runtest

(Before running the next two compilation commands, make sure that you have set the PYTHON and MATLAB paths in Makefile.config.)
To compile the PYTHON wrappers, run the command:

make pycaffe
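To quickly check that the Python wrapper built correctly, try importing it (assuming the repository is cloned at ~/caffe; adjust the path for your setup):

export PYTHONPATH=~/caffe/python:$PYTHONPATH
python -c "import caffe; print 'pycaffe works'"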

To compile the MATLAB wrappers, run the command:

make matcaffe

Finally, if you have reached here, you have successfully installed Caffe on your system. I am sure that after a successful installation of Caffe, you will feel like you are in heaven ;)

Errors that may occur during Installation

  • nvcc is not found during the make all command. To solve this error, install the whole nvidia-cuda-toolkit using the following command:
    sudo apt-get install nvidia-cuda-toolkit
  • cudnn.hpp does not exist: to fix this error, copy the cudnn.hpp file from the tar archive you downloaded (e.g. cudnn-6.5-linux-x64-v2 in my case) into the ./caffe/include/ directory with appropriate permissions (644). After placing the file there, run the make commands again.
    This issue was also reported on the Caffe GitHub repository at
    https://github.com/BVLC/caffe/issues/1325
Now you are all done and can start playing with examples like training LeNet on MNIST.

End-Note:

  • Hope this page helps reduce some of the effort required to install Caffe on your machine. To view our bigger attempts at reducing this overload, check out the CloudCV organisation page.
  • I have taken references from a lot of sources I found online, and one can easily find them through the sources cited in the bullet above. This post is for educational purposes only.

Saturday, May 30, 2015

Django-Allauth vs. Django Social Auth vs. Python-Social-Auth

Process Flow of Authentication and Authorization using the OAuth 2.0 Protocol:
Fig: Process flow of Authentication through Google using OAuth 2.0
The process flow described above involves four steps. First, the OAuth 2.0 credentials, such as the client ID and client secret, are obtained from the provider (the Google Developers Console in this case) for the web application. To access any API, the application then needs to get an access token from the provider using the client ID and client secret. After the user logs in, the user is asked whether they are willing to grant the permissions that the application is requesting; this step is called user consent. If the user grants permission, the provider's authorization server sends the application an access token (or an authorization code that the application can use to obtain an access token). After the application obtains an access token, it sends the token to the provider's API in an HTTP Authorization header. Access tokens have limited lifetimes: if the application needs access to the provider's API beyond the lifetime of a single access token, it can obtain a refresh token, which allows the application to obtain new access tokens.
This is the whole workflow of authorization using the OAuth 2.0 protocol.
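To make the token-exchange step concrete, here is a bare-bones Python sketch using the requests library against Google's endpoints. The client credentials and redirect URI below are placeholders you would obtain from the Developers Console:

# Exchanging an authorization code for tokens (sketch; values are placeholders)
import requests

CLIENT_ID = "your-client-id"
CLIENT_SECRET = "your-client-secret"
REDIRECT_URI = "https://yourapp.example/oauth2callback"
auth_code = "code-returned-after-user-consent"

resp = requests.post("https://accounts.google.com/o/oauth2/token", data={
    "code": auth_code,
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
    "redirect_uri": REDIRECT_URI,
    "grant_type": "authorization_code",
})
tokens = resp.json()
access_token = tokens["access_token"]        # short-lived
refresh_token = tokens.get("refresh_token")  # used to mint new access tokens

# The access token goes in the HTTP Authorization header of every API call.
profile = requests.get(
    "https://www.googleapis.com/oauth2/v1/userinfo",
    headers={"Authorization": "Bearer " + access_token},
)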

Mainly, three Django packages are used for integrating social authentication. They are described briefly below:
Django-allauth:
It is an integrated set of Django applications addressing authentication, registration and account management, as well as third-party (social) account authentication. In simple terms, it is a combination of plain login and registration with a social authentication system.
Django-Allauth is the most popular Django package for authentication. The best thing about it is that it is open source, which speeds up development through collaboration with developers all around the world.
Advantages of Django-allauth:
  1. Support for Python 2.6, 2.7 and 3.3.
  2. Support for Django version 1.4.3 and above.
  3. Supports social authentication through Dropbox and Google using OAuth 2.0.
  4. Supports multiple authentication schemes (e.g. login by username or by e-mail).
  5. Multiple strategies for account verification (ranging from none to e-mail verification).
  6. Support for connecting multiple social accounts to one Django user account.
  7. Consumer keys and tokens make use of the Django sites framework. This is especially helpful for larger multi-domain projects, and it also allows easy switching between a development (localhost) and production setup without interfering with your settings and database.
  8. Rapidly gaining traction and improving day by day.
  9. The FAQ page solves most of the problems related to Django-allauth; most remaining problems are answered on StackOverflow under the tag #django-allauth, and a mailing list is available for further issues.
  10. It is better tested than the other Django authentication packages, thanks to the consistent development work going on.
  11. The installation and configuration are easy to get started with.
  12. Pre-defined templates are available; it is up to the developer whether to customize them.
  13. One can easily add new backends if required.


Disadvantages of django-allauth:
  1. While working with it and researching it further, I found that it has some documentation gaps that need to be fixed.
  2. It can be problematic for beginners, since it involves a series of small but time-consuming hurdles.
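For reference, a minimal django-allauth setup looks roughly like the following sketch; exact setting and provider names may vary slightly between allauth versions:

# settings.py -- a minimal django-allauth configuration sketch
INSTALLED_APPS = [
    # ... your existing apps ...
    "django.contrib.sites",       # allauth relies on the sites framework
    "allauth",
    "allauth.account",
    "allauth.socialaccount",
    "allauth.socialaccount.providers.google",
    "allauth.socialaccount.providers.dropbox",
]

AUTHENTICATION_BACKENDS = (
    "django.contrib.auth.backends.ModelBackend",            # plain login
    "allauth.account.auth_backends.AuthenticationBackend",  # allauth login
)

SITE_ID = 1

# urls.py: url(r'^accounts/', include('allauth.urls')),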
Django Social Auth:
The Django-Social-Auth library is a Django package that was made for social authentication. The library is now deprecated and has been migrated to Python-Social-Auth to provide better support for several frameworks and ORMs.
Advantages of Django-Social-Auth:
  1. Provides social authentication using the OAuth and OpenID mechanisms.
  2. Support for the Django, Flask, Pyramid and Webpy frameworks.
Disadvantages of Django-Social-auth:
  1. This package is deprecated.
  2. No support for Python 3.x
  3. Development on its codebase has stopped.
  4. Not well documented; it lacks an FAQ page for bug fixing.
Python-Social-Auth:
Python-Social-Auth is an easy-to-set-up authentication system that provides broad support for several frameworks and auth providers. It was migrated from Django-Social-Auth to generalize the package, implement a common interface for defining new third-party authentication providers, and bring support for more frameworks and ORMs.


Advantages of Python-Social-Auth:
  1. Provides support for frameworks such as Django, Flask, Webpy, Pyramid and Tornado.
  2. Supports Python 3.x.
  3. Supports social authentication through Dropbox, Google, etc. using OAuth 2.0.
  4. A frameworks API is provided that eases adding support for more frameworks.
  5. ORMs like SQLAlchemy, Django ORM and MongoEngine are supported.
  6. A storage API is provided to add support for more ORMs.
  7. Multiple social accounts can be associated with a single user.
  8. Basic user data population, allowing custom field values from the provider's response.
Disadvantages of Python-Social-Auth:
  1. Not well documented.
  2. A newer package, so we cannot predict future developments.
  3. A lot of issues (98 at the time of writing) are listed on GitHub.
Comparison between the three authentication packages:

Feature                            | Django-Allauth           | Django-Social-Auth                     | Python-Social-Auth
Commits on codebase                | Regular, fast-developing | No commits in the last 6 months        | Regular commits
Python 3.x support                 | YES                      | NO                                     | YES
Authentication                     | Simple + social          | Social only                            | Social only
Predefined, customizable templates | YES                      | NO                                     | NO
Framework support                  | Django                   | Django, Flask, Webpy, Pyramid, Tornado | Django, Flask, Webpy, Pyramid, Tornado
Total commits                      | 1072                     | 1611                                   | 1384
Development status                 | Beta                     | Beta                                   | Beta
OAuth 2.0 support                  | YES                      | YES                                    | YES
Current stable version             | 0.19.1                   | 0.7.28                                 | 0.2.2
Support for Dropbox and Google     | YES                      | YES                                    | YES
Repository forks                   | 545                      | 735                                    | 458
Total contributors                 | 150                      | 167                                    | 148
Repository watchers                | 1650                     | 2169                                   | 1502

Selecting the Appropriate Package:
The package that least constrains the scalability of the project should be given the most weight among all the packages.

From previous work experience and research on authentication libraries, Django-Allauth turns out to be the most appropriate choice of the three. So, in my view, django-allauth should be used for integration.
