Tuesday, June 06, 2017

Setup Remote Desktop for Raspberry Pi with no need for an external display


If you are wondering how to set up remote desktop access to a Raspberry Pi, this article is for you. I will walk through installing the required packages so that you can open a remote desktop session from your Windows machine or any other remote machine.

Steps to configure remote desktop on a Raspberry Pi:

1) Connect to your Raspberry Pi over SSH using PuTTY.
2) Open a terminal window.
3) We are going to install the xrdp package to enable RDP access to the Pi. Before installing xrdp, first install the tightvncserver package. Installing tightvncserver also removes the RealVNC server software that ships with newer versions of Raspbian, since xrdp with tightvncserver will not work while RealVNC is installed.

$ sudo apt install -y tightvncserver
$ sudo apt install -y xrdp

4) Next, install the Samba package, which lets Windows machines reach the Pi by its host name on the local network.

$ sudo apt install -y samba

5) Open the remote desktop tool in Windows (or your host OS), enter the host name or IP address of your Pi, and hit Connect.

With that, you can connect to a remote Pi or other Linux-based IoT device from your computer, with no need to attach the device to an external screen.


Tuesday, April 25, 2017

Linear Regression Algorithms in Scikit-Learn


While working with the different regression algorithms in the scikit-learn library, I would like to share some tips for telling apart the major linear regression algorithms in the machine learning space.

Below is a table comparing four linear regression algorithms:

The general idea of Gradient Descent (GD) is to tweak parameters iteratively in order to minimize a cost function.

Batch and Stochastic Gradient Descent: at each step, Batch GD computes the gradients based on the full training dataset, while Stochastic GD computes them based on just one instance.

Mini-Batch Gradient Descent: at each step, it computes the gradients based on small random sets of instances called mini-batches.
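To make the difference concrete, here is a minimal sketch in plain Python (not scikit-learn code). The function name gradient_descent, its parameters and the toy data are assumptions made up for this illustration. The only thing that changes between the three variants is batch_size: the full dataset for Batch GD, 1 for Stochastic GD, and a small number for Mini-Batch GD.

```python
import random


def gradient_descent(xs, ys, batch_size, lr=0.02, epochs=2000, seed=0):
    """Fit y = w*x + b by minimizing mean squared error.

    batch_size == len(xs) -> Batch GD
    batch_size == 1       -> Stochastic GD
    anything in between   -> Mini-Batch GD
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the instances in a new random order
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradients of the MSE cost, computed on the current batch only
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Tweak the parameters a small step against the gradient
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical, for illustration only)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

print("batch     :", gradient_descent(xs, ys, batch_size=len(xs)))
print("stochastic:", gradient_descent(xs, ys, batch_size=1))
print("mini-batch:", gradient_descent(xs, ys, batch_size=2))
```

On this noise-free toy data all three variants should end up close to w = 2 and b = 1; on real data the stochastic and mini-batch variants typically bounce around the minimum rather than settling exactly.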

scikit-learn has more linear regression algorithms that are not covered in this blog post; you can find them here: http://scikit-learn.org/stable/modules/sgd.html#regression

Hope this helps!

Sunday, April 23, 2017

What is the difference between estimators vs transformers vs predictors in sklearn?

Hi All,

While working on machine learning projects with the scikit-learn library, I would like to highlight some fundamental concepts that every ML ninja needs to be aware of. In this post I am highlighting a few concepts that differentiate estimators, transformers and predictors when building machine learning solutions with sklearn.

1) Estimators: Any object that can estimate some parameters based on a dataset is called an estimator. The estimation itself is performed by calling the fit() method.
This method takes one parameter (or two in the case of supervised learning algorithms, where the second holds the labels). Any other parameter needed to guide the estimation process is called a hyperparameter and must be set as an instance variable.

For example: I would like to estimate the mean, median or most frequent value of a column in my dataset.

This is a cheat sheet of sklearn estimators. You can find the up-to-date version here.

2) Transformers: A transformer transforms a dataset. You call its transform() method, which returns the transformed dataset. Some estimators can also transform a dataset.

For example: the Imputer class in sklearn is both an estimator and a transformer. You can call its fit_transform() method, which estimates and transforms a dataset in one step.

Python code: 

# Note: scikit-learn 0.20 and later provide this class as sklearn.impute.SimpleImputer
from sklearn.preprocessing import Imputer

imputer = Imputer(strategy="mean")  # estimate the mean value of each dataset column

imputer.fit(mydataset)  # Imputer as an estimator: learns the column means

imputer.fit_transform(mydataset)  # Imputer as estimator and transformer (two steps combined)

3) Predictors: A predictor makes predictions for a given dataset. A predictor class has a predict() method that takes new instances and returns the corresponding predictions. It also has a score() method that measures the quality of the predictions for a given test dataset.

For example: LinearRegression, SVM, Decision Tree, etc. are predictors.

You can combine estimators, transformers and predictors as building blocks of a pipeline in sklearn. A pipeline chains a sequence of transformers followed by a final estimator or predictor. This concept is called composition in machine learning.
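To see the three roles side by side without any library, here is a small sketch in plain Python. MeanImputer and MeanRegressor are hypothetical toy classes written for this post, not sklearn classes; they only mimic the fit() / transform() / predict() / score() method names described above.

```python
class MeanImputer:
    """Toy estimator and transformer: learns a column mean, then fills gaps."""

    def fit(self, column):
        # Estimation step: learn the mean of the known values
        known = [v for v in column if v is not None]
        self.mean_ = sum(known) / len(known)
        return self

    def transform(self, column):
        # Transformation step: replace missing values with the learned mean
        return [self.mean_ if v is None else v for v in column]

    def fit_transform(self, column):
        # Both steps combined
        return self.fit(column).transform(column)


class MeanRegressor:
    """Toy predictor: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

    def score(self, X, y):
        # Negative mean absolute error, so that higher means better
        predictions = self.predict(X)
        return -sum(abs(p - t) for p, t in zip(predictions, y)) / len(y)


# Composition: a transformer feeding a final predictor
ages = [20.0, None, 40.0]      # hypothetical feature with a missing value
incomes = [1.0, 2.0, 3.0]      # hypothetical targets

filled_ages = MeanImputer().fit_transform(ages)
model = MeanRegressor().fit(filled_ages, incomes)
print(filled_ages)
print(model.predict([25.0]))
```

In real projects you would use sklearn's own classes and its Pipeline instead; the point here is only the shape of the interface.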

Hope this helps

Friday, April 07, 2017

How to configure X2GO Client on Data Science Virtual Machine


While trying to connect to a newly provisioned Data Science Virtual Machine in Azure, I ran into a few challenges starting a session in the X2GO client app.

The Data Science Virtual Machine (DSVM) VM image makes it easy to get started doing data science in minutes, without having to install and configure each of the tools individually.
This virtual machine contains CentOS, Microsoft R Developer edition, the Anaconda Python distribution, standalone Spark, CNTK, Rattle and XGBoost, in addition to other tools. Check out this article for the full details of this VM.

The X2GO client lets Windows users open a remote desktop session to Linux VMs; you can install it from here.

After you install this tool and try to connect to the DSVM, you will get this error:

unable to start startkde

To solve this problem, follow these steps:

1) Connect to the VM using any SSH client, such as PuTTY.
2) After you log in to the VM, execute the following command:

sudo yum install @kde

This installs the KDE package group and upgrades existing packages on the VM. You will be prompted to accept the installation of all required packages and upgrades.

3) This command will take a few seconds to complete. The screenshot below shows the output once this step has finished.

4) Return to the X2GO client and log in using your username and password.

5) You will now be able to open a remote desktop session to the DSVM.

Hope this helps!

Wednesday, April 05, 2017

How to install Keras on 64-bit Windows 10


I was trying to install the Keras library on a 64-bit Windows 10 machine. Since I use Anaconda to manage Python packages on my machine, the first thing I tried was to install the package from the Anaconda command line by executing the following command:

conda install keras

I got the following error:

PackageNotFoundError: Package missing in current win-64 bit channels:
 - keras

To fix this issue, follow these steps:

1) Look up the latest Keras package on the Anaconda website by visiting this link:

2) Select the Keras library from the list, then copy the command displayed on the website:

conda install -c conda-forge keras=2.0.2

3) Run this command in the Anaconda command prompt window.

4) The Keras library is now installed and you can start deep learning with Keras!
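If you want to double-check the result from Python, the following small sketch may help. It uses only the standard library; the helper name is_installed is made up for this example and is not part of Keras or conda.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Run this inside the same Anaconda environment where you installed Keras
print("keras installed:", is_installed("keras"))
```

If this prints False, the package most likely went into a different conda environment than the one your Python interpreter is running in.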


Monday, March 27, 2017

How to install and run Jupyter from your local computer for python development


If you are planning to program in Python on your local computer, the best development environment for coding, teaching and visualizing data is the Jupyter notebook.

I really like working with Jupyter notebook (formerly IPython Notebook) for coding in Python and R.

Many of us download ipynb files and want to use them in our applications. Instead of copying and pasting code into a Python console window, Jupyter notebook provides a more interactive way to write code in Python and many other languages.

If you have a bunch of ipynb files and would like to install and start working with Jupyter, follow the steps below:

1) Open a command prompt window and run the command below:

pip install jupyter notebook

2) After the installation is complete, navigate to the folder that contains your ipynb files.

3) Run Jupyter notebook by executing the following command in that folder:

jupyter notebook

4) A new browser window will open listing the files in that folder, where you can open existing ipynb files or create new ones. Jupyter usually runs on port 8888, so the URL for Jupyter notebook looks like: http://localhost:8888/


Wednesday, March 01, 2017

How to set storage account connection string in Azure Functions


I was developing an Azure Function App that connects to Azure Blob storage. After setting up the binding for my blob storage account, I got the following error message when running my function app:

The error message in the screenshot above suggests three options to fix this. I will walk through the first option, which is to set the connection string name in the appsettings.json file so that it looks like this:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "",
    "AzureWebJobsDashboard": "",
    "AzureWebJobsmofunctions_STORAGE": "DefaultEndpointsProtocol=https;AccountName=mofunctions;AccountKey=KEY"
  }
}

function.json (just the section where I define my blob binding info):

    {
      "type": "blob",
      "name": "iBlob",
      "path": "mydata/file1.csv",
      "connection": "mofunctions_STORAGE",
      "direction": "in"
    }

You will notice that the connection value in function.json is the suffix of the matching key in appsettings.json: the key name is AzureWebJobs followed by the connection value.
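The naming rule can be sketched in a few lines of Python. This only illustrates the convention described in this post; it is not how the Azure Functions runtime is implemented, and the helper name connection_setting is made up for the example. The dictionaries repeat the values from the two files above.

```python
def connection_setting(settings, connection):
    """Look up the app setting that belongs to a binding's "connection" value.

    Convention from this post: key = "AzureWebJobs" + connection value.
    """
    key = "AzureWebJobs" + connection
    return settings["Values"][key]


# Same content as the appsettings.json shown above
app_settings = {
    "IsEncrypted": False,
    "Values": {
        "AzureWebJobsStorage": "",
        "AzureWebJobsDashboard": "",
        "AzureWebJobsmofunctions_STORAGE": (
            "DefaultEndpointsProtocol=https;AccountName=mofunctions;AccountKey=KEY"
        ),
    },
}

# Same content as the binding section of function.json shown above
binding = {
    "type": "blob",
    "name": "iBlob",
    "path": "mydata/file1.csv",
    "connection": "mofunctions_STORAGE",
    "direction": "in",
}

print(connection_setting(app_settings, binding["connection"]))
```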

Once you set this, press F5 and you will be able to connect and read blob contents from your Azure storage account.


Thursday, January 26, 2017

Mashing RDDs in Apache Spark from an RDBMS perspective


Happy new year! This is my first post in 2017! 2016 was an amazing year for me: lots of work, projects and achievements. Looking forward to 2017.

I am writing this blog post to cover the standard techniques to work with Resilient Distributed Datasets (RDDs) to join data in Apache Spark.

I would like to share some insights on working with multiple RDDs in Spark, much as we do with tables in relational database management systems.

Apache Spark supports joins on RDDs, so you can implement all the kinds of joins that we know from an RDBMS. Below I list how you would implement each of them on this platform.

Apache Spark join transformations:

1) join: This is equivalent to an inner join in an RDBMS. It returns a new pair RDD whose elements contain all possible pairs of values from the first and second RDDs that have the same key. For keys that exist in only one of the two RDDs, the resulting RDD will have no elements.

2) leftOuterJoin: This is equivalent to a left outer join in an RDBMS. The resulting RDD will also contain elements for those keys that don't exist in the second RDD.

3) rightOuterJoin: This is equivalent to a right outer join in an RDBMS. The resulting RDD will also contain elements for those keys that don't exist in the first RDD.

4) fullOuterJoin: This is equivalent to a full outer join in an RDBMS. The resulting RDD will contain elements for all keys that exist in either RDD.

If the RDDs contain duplicate keys, these keys will be joined multiple times.
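The four transformations can be mimicked on plain Python lists of (key, value) pairs, which may make the semantics easier to see. This is only a sketch: the function names below are made up for the illustration and do not call Spark, and None stands in for the missing side of an outer join.

```python
from collections import defaultdict


def _group(pairs):
    """Group the values of a list of (key, value) pairs by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def _join(left, right, keep_left_only, keep_right_only):
    l, r = _group(left), _group(right)
    # Keys in first-seen order, left side first
    keys = list(dict.fromkeys([k for k, _ in left] + [k for k, _ in right]))
    out = []
    for key in keys:
        lv, rv = l.get(key), r.get(key)
        if lv and rv:
            # Key on both sides: every pairing of the values
            out.extend((key, (a, b)) for a in lv for b in rv)
        elif lv and keep_left_only:
            out.extend((key, (a, None)) for a in lv)
        elif rv and keep_right_only:
            out.extend((key, (None, b)) for b in rv)
    return out


def inner_join(left, right):
    return _join(left, right, False, False)


def left_outer_join(left, right):
    return _join(left, right, True, False)


def right_outer_join(left, right):
    return _join(left, right, False, True)


def full_outer_join(left, right):
    return _join(left, right, True, True)


# Hypothetical sample data; key "a" is duplicated on the left side
left = [("a", 1), ("b", 2), ("a", 3)]
right = [("a", "x"), ("c", "y")]

print(inner_join(left, right))
print(left_outer_join(left, right))
print(right_outer_join(left, right))
print(full_outer_join(left, right))
```

Note how the duplicated key "a" on the left produces two joined elements, one per matching pair of values.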

Hope this helps!