Friday, October 14, 2016

Introducing Power BI Embedded talk at Cloud Summit

Hi All,

Earlier today, I presented "Introducing Power BI Embedded", a talk covering platform capabilities and tools, at the Cloud Summit event at the Microsoft Chevy Chase office.

The session covered Power BI platform capabilities, tools and Power BI Embedded as a PaaS option in the Microsoft Cloud platform.

I got a lot of questions about Power BI dataset scheduling and about working with data, including DirectQuery vs. import options when authoring reports in Power BI Desktop. I also covered the need for the Power BI Gateway in hybrid scenarios.


Thursday, October 13, 2016

Fixing powerbi.d.ts missing modules errors in Visual Studio 2015


While working in Visual Studio 2015, I got build errors caused by missing modules in the powerbi.d.ts file. I have an ASP.NET MVC project that uses Power BI Embedded, and I wanted to get the application up and running, but these errors appeared every time I built the app.

These errors are due to missing TypeScript tools for Visual Studio 2015. Once you install them, the errors disappear and you will be able to run your app.

To fix this problem, follow these steps:

  • Open Tools | Extensions and Updates.
  • Select Online in the tree on the left.
  • Search for TypeScript using the search box in the upper right.
  • Select the most current available TypeScript version.
  • Download and install the package.
  • Build your project!

Hope this helps.

Monday, September 19, 2016

Extending Product Outreach with Outlook Connectors

Hi All,

Last Saturday at SharePoint Detroit I presented a talk titled "Extending Product Outreach with Outlook Connectors". I covered, with demos, how to use Office 365 Groups to extend product outreach using Outlook group connectors.

Session Description:

Office 365 Connectors is a brand new experience that delivers relevant interactive content and updates from popular apps and services to Office 365 Groups. We are now bringing this experience to you, our Office 365 customers. Whether you are tracking a Twitter feed, managing a project with Trello or watching the latest news headlines with Bing, Office 365 Connectors surfaces all the information you care about in the Office 365 Groups shared inbox, so you can easily collaborate with others and interact with the updates as they happen. The session will cover how to build your own Office 365 connectors and how to work with Microsoft to build one for your company.

Thursday, September 08, 2016

Build Intelligent Microservices Solutions using Azure

Hi All,

Last night I had the pleasure of presenting at one of our local user groups on building intelligent microservices in Azure.

The session covers in detail how to build intelligent microservices solutions using Cloud Services (including web and worker roles), Azure App Service features, and Service Fabric. The session was demo-driven, and I demonstrated how to design and provision complete end-to-end solutions using web roles, worker roles and Service Bus in Azure.
I also covered Azure App Service capabilities that help developers scale and monitor production applications, in addition to setting up continuous deployment.

Session objectives and takeaways:

  1. Benefits of creating microservices in the cloud
  2. End-to-end use case for building a cloud service with web & worker roles and Service Bus integration
  3. Azure App Service intelligent features including troubleshooting, CI, backup, routing, scheduling & other features
  4. Azure Service Fabric microservices platform

The presentation is posted below.

Wednesday, August 31, 2016

Building Big Data Solutions in Azure Data Platform @ Data Science MD

Hi All,

Yesterday I was at Johns Hopkins University in Laurel, MD presenting how to build big data solutions in Azure. The presentation focused on the underlying technologies and tools that are needed to build end-to-end big data solutions in the cloud. I presented the capabilities that Azure offers out of the box, in addition to the cluster types and tiers that are available for ISVs and developers.

The session covers the following:

1) What HDInsight clusters offer in the Hadoop ecosystem technology stack.
2) HDInsight cluster tiers and types.
3) HDInsight developer tools in Visual Studio 2015.
4) Working with HBase databases and Hive View, deploying Hive apps from Visual Studio.
5) Building, Debugging and Deploying Storm Apps into Storm clusters.
6) Working with Spark clusters using Jupyter, PySpark.

Session Title: Building Big Data Solutions in Azure Data Platform

Session Details:
The session covers how to get started building big data solutions in Azure. Azure provides different cluster types for the Hadoop ecosystem. The session covers the basics of HDInsight clusters, including Apache Hadoop HDFS, HBase, Storm and Spark, and how to integrate with HDInsight in .NET using different Hadoop integration frameworks and libraries. The session is a jump start for engineers and DBAs with RDBMS experience who want to begin working with and developing Hadoop solutions. The session is demo-driven and will cover the basics of Hadoop open source products.

Friday, August 26, 2016

Study notes for exam 70-475: Designing and Implementing Big Data Analytics Solutions

Hi All,

Today I passed the "Designing and Implementing Big Data Analytics Solutions" Microsoft exam.

I have been preparing for this exam (70-475) for a couple of months and I have been using Hadoop ecosystem tools and platforms for a while.

I wanted to master building big data analytics solutions using HDInsight clusters and the Hadoop ecosystem, which includes Storm, Spark, HBase, Hive and HDFS. I worked to cover any gaps in my understanding of Azure Data Lake, Python & R programming and Azure Machine Learning.

This exam primarily covers the following four technologies (from most covered to least):

1) Hadoop ecosystem: Working with HDFS, HBase, Hive, Storm, Spark and understanding Lambda Architecture. If you want to know more about Lambda Architecture, read my blog post explaining it here.

2) Azure Machine Learning: building/training models, predictive models, classification vs regression vs clustering, recommender algorithms, building custom models, executing code in R and Python. Ingesting data from Azure Event Hubs & transformation in Stream Analytics.

3) Azure Data Lake: building pipelines, activities, linked services, moving, transforming and analyzing data, working with storage options in Azure (blob vs block) & tools to transform data.

4) SQL Server and Azure SQL: Security in transit and at rest, SQL Data Warehouse. Working with R in SQL Server 2016/Azure SQL.

My study notes while preparing to pass this test:

1) To protect data at rest as well as while querying in Azure SQL Database: Use "Always Encrypted" to make sure data in transit is encrypted. Use "Transparent Data Encryption" (TDE) to make sure that data at rest is encrypted. Read more about TDE here. Read more about the Always Encrypted feature here.

2) If you get an "Out of memory" error when running an Azure ML experiment, here is how to fix it:
   a) Increase the memory settings for the map and reduce operations in the import module.
   b) Use Hive query to limit the amount of data being processed in the import module.

3) The easiest way to manage Hadoop clusters in Azure is to assign every HDInsight cluster to a resource group and to apply tags to all related resources.

4) In Hadoop, when you need a row-based, self-describing format that carries its schema and provides compact binary serialization, Avro is recommended.

5) Which Hadoop cluster type to use for query and analysis batch jobs:
     a) Spark: A cluster for in-memory processing, interactive queries, and micro-batch stream processing.
     b) Storm: Real-time event processing.
     c) HBase: NoSQL data storage for big data systems.

6) Tips for importing data using Python in Azure ML:
    a) Missing values are converted into NA for processing. NA is converted back to missing values when the data is converted back to datasets.
    b) Azure ML datasets are converted to data frames in pandas. The pandas module is used to work with data in Python.
    c) Numeric column names are not ignored. The str() function is applied to them.
    d) Duplicate column names are not ignored. The duplicate column names are modified to make sure they have unique names.

7) Among the Hadoop file storage options, the only format that supports ACID transactions is Apache ORC.

8) You have three utilities you can use to move data from local storage to managed cluster blob storage. These tools are: Azure CLI, PowerShell & AzCopy.

9) How to improve Hive queries using static vs dynamic partitioning, read more here.

10) Understand when to use Filter based Feature Selection in Azure ML.

11) Azure ML requires Python visualizations to be stored as PNG files. Configure Matplotlib in Azure ML to use the Agg backend for rendering, and save charts as PNG files.

12) To detect potential SQL injection attempts on Azure SQL database in ADL cluster: Enable Threat Detection.

13) To create synthetic samples for classes that are underrepresented in a dataset: use the SMOTE module in Azure ML.

14) D14 v2 virtual machines in Azure support 100 GB of in-memory processing.

15) You can add multiple contributors to AzureML workspace as users.

16) Understand the minimum requirements for each cluster type in HDInsight:
       a) At least 1 data node for Hadoop cluster type.
       b) At least 1 region server for HBase cluster type.
       c) Two Nimbus nodes for Storm cluster type.
       d) At least 1 worker node for Spark cluster type.

17) If you want to store a file larger than 1 TB, you need to use Azure Data Lake Store.

18) In Azure Data Factory (ADF), you can train, score and publish experiments to AzureML using:
      a) AzureML Batch execution: to train and score.
      b) AzureML Update resource activity: to update AzureML web services.

19) In Azure Data Factory (ADF), a pipeline groups several activities and defines their sequence and timing, so the activities in a pipeline can be managed as a unit.

20) Working with R models in SQL Server 2016/AzureSQL: read more here.

21) Apache Spark in HDInsight can read files from Azure blob storage (WASB) but not SQL Server.

22) Always Encrypted keeps data encrypted both in transit and at rest. This feature also allows you to store the encryption keys on-premises.

23) Transparent Data Encryption (TDE): secures data at rest. It does not protect data in transit, and the keys are stored in the cloud.

24) DistCp: a Hadoop tool to copy data between HDInsight cluster blob storage and Azure Data Lake Store.

25) AdlCopy: a command-line utility to copy data from Azure Blob storage into an Azure Data Lake Store account.

26) AzCopy: A tool to copy data from and to Azure blob storage.

27) When working with large binary files, you can optimize the speed of an Azure ML experiment as follows:
      a) Developers should write data as block blobs.
      b) The blob format should be CSV or TSV.
      c) You should NOT turn off the cached results option.
      d) You can NOT filter the data using SQL; use the R language instead.

28) SQL DB contributor role allows monitoring and auditing of SQL databases without granting permissions to modify security or audit policies.

29) To process data in HDInsight clusters in Azure Data Factory (ADF):
      a) Add a new item to the pipeline in the solution explorer.
      b) Select Hive Transformation.
      c) Construct JSON to process the cluster data in an activity.

30) Understanding Tumbling vs Hopping vs Sliding Windows in Azure Stream Analytics. (link)
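
To make note 30 a bit more concrete, here is a minimal Python sketch of how the three window types group events. It is not Azure Stream Analytics code: the function names are made up for illustration, timestamps are plain integer seconds, and the boundary conventions used (half-open [start, end) for tumbling and hopping, a trailing window for sliding) are assumptions that may not match the exact rules ASA applies.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (start, end), treated here as half-open [start, end)


def tumbling_window(ts: int, size: int) -> Window:
    """Tumbling: fixed-size, non-overlapping windows. Each event falls in exactly one."""
    start = (ts // size) * size
    return (start, start + size)


def hopping_windows(ts: int, size: int, hop: int) -> List[Window]:
    """Hopping: windows of length `size` that start every `hop` seconds.

    When hop < size the windows overlap, so one event can belong to several.
    Window starts are assumed to be non-negative multiples of `hop`.
    """
    windows = []
    start = (ts // hop) * hop  # latest window start that is <= ts
    while start > ts - size and start >= 0:
        windows.append((start, start + size))
        start -= hop
    return sorted(windows)


def sliding_counts(timestamps: List[int], size: int) -> List[int]:
    """Sliding: for each event, count the events in the trailing `size` seconds.

    Simplification: the window is evaluated only when an event arrives,
    and it covers (ts - size, ts].
    """
    ordered = sorted(timestamps)
    counts = []
    lo = 0
    for hi, ts in enumerate(ordered):
        # Drop events that have fallen out of the trailing window.
        while ordered[lo] <= ts - size:
            lo += 1
        counts.append(hi - lo + 1)
    return counts


print(tumbling_window(7, 10))                 # (0, 10)
print(hopping_windows(7, 10, 5))              # [(0, 10), (5, 15)]
print(sliding_counts([1, 4, 7, 12, 13], 10))  # [1, 2, 3, 3, 4]
```

The thing to take away for the exam is the shape of the result: a tumbling window assigns each event to one bucket, a hopping window can assign it to several overlapping buckets, and a sliding window is re-evaluated as events come in.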

Hope this helps you get ready to pass the test, and good luck everyone!
Let's get all certified, y'all data wranglers :-)

-- ME

1) Microsoft Exam 70-475 details, skills measured and more:

Thursday, August 25, 2016

GIT 101 in Visual Studio Team Services (VSTS)

Hi All,

I have been working with multiple developers on sharing project code using git. I found out that git is new to a lot of developers who have been using Visual Studio Team Foundation Server (TFS), Visual Studio Online (VSO aka VSTS now), or any other centralized source control system.

What is the difference between TFS/VSO/VSTS versus Git?

If you have been using TFS, VSTS or VSO, those all fall under Team Foundation Version Control (TFVC), which is a centralized source control system.

Git, on the other hand, is a distributed version control system (DVCS). This means you have both local and remote code repositories. You can commit your code to your local repo without touching the remote repo (unless you want to), and you can push your code to the remote repo so other team members can get your changes.
This is a fundamental concept to understand when working with Git: it is distributed, has local and remote repos, works offline, and is a great way to enable collaboration among developers.

Popular Git platforms: GitHub, VSTS, Bitbucket, GitLab, RhodeCode and others.

This article focuses on managing the code of multiple developers working in a team and the Git best practices around that. It also applies to any other Git platform. For the sake of simplicity, this article focuses on using Git in VSTS.

Basic terminology and keywords to know when working with Git:

1) A branch: In Git, every developer should have their own branch. You write code and commit your changes into your local branch. To sync with other developers, get the latest from the master branch and merge it into yours, so you can make sure everything compiles and works before creating a new pull request to the master branch (which merges your code back into master).

2) Fetch: Downloads changes from the remote branch into your local repo. Fetch downloads the new commits and adds them to the local repo without updating your local branch. To bring your local branch up to date with its remote, follow the fetch with a merge, or use pull.

3) Pull: Gets updates from your remote branch into your local branch. It basically keeps your branch up to date with its remote one. Pull does a fetch and then a merge into your local branch.
So just use Pull to update your local branch from its remote one.

4) Pull vs Fetch: git pull does a git fetch, so if you ran git pull you have already executed git fetch. You can execute fetch on its own if you want to get the updates but do not want to merge them into your local branch yet.

5) Push: Sends committed changes to the remote branch so they are shared with others.
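
If the difference between fetch and pull is still fuzzy, the toy model below may help. It is only a sketch in Python, not how Git actually stores anything: the ToyRepo class and its fields are hypothetical, commits are just strings, and merge is reduced to "append whatever is missing", which ignores conflicts entirely.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyRepo:
    """Hypothetical, heavily simplified model of a single branch."""
    remote: List[str] = field(default_factory=list)           # the shared branch on the server
    remote_tracking: List[str] = field(default_factory=list)  # local copy of the remote branch
    local: List[str] = field(default_factory=list)            # your local branch

    def commit(self, name: str) -> None:
        # A commit only touches your local branch.
        self.local.append(name)

    def push(self) -> None:
        # Share local commits with the remote (assumes local already contains the remote's commits).
        self.remote = list(self.local)
        self.remote_tracking = list(self.local)

    def fetch(self) -> None:
        # Download new commits, but leave the local branch alone.
        self.remote_tracking = list(self.remote)

    def merge(self) -> None:
        # Bring fetched commits into the local branch (no conflict handling in this toy).
        for c in self.remote_tracking:
            if c not in self.local:
                self.local.append(c)

    def pull(self) -> None:
        # Pull is fetch followed by merge.
        self.fetch()
        self.merge()


repo = ToyRepo(remote=["c1"], remote_tracking=["c1"], local=["c1"])
repo.remote.append("c2")   # a teammate pushes c2 to the remote

repo.fetch()
print(repo.local)          # ['c1']        fetch did not change the local branch
print(repo.remote_tracking)  # ['c1', 'c2']

repo.pull()
print(repo.local)          # ['c1', 'c2']  pull merged the fetched commit
```

The point of the model is that there are three places a commit can be: on the remote, in your local copy of the remote branch, and in your local branch. Fetch updates only the second one, pull updates the second and the third, and push goes the other way.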

Basic rules for working with Git in Visual Studio that everyone should be aware of before starting to code:
This section covers the actions you need to work with Git in the Visual Studio Team Explorer window.

1) You need to click on Sync in Team Explorer to refresh the current branch from the remote branch, followed by Pull to get those changes merged into the current local branch. Sync just shows the status; to actually merge those changes you need to click on the Pull link.

2) You need to click on Changes in Team Explorer every time you want to check in or get the latest updates of the current branch.

3) You need to click on Branches in Team Explorer every time you want to manage branches in Visual Studio.

4) You need to click on Pull Requests in Team Explorer every time you want to manage pull requests in Visual Studio.

A) Setup a project for your team using Git in VSTS:

1) Visit
2) Login to your account.
3) Click on the New button to create a new Git project.

4) Once you hit the Create project button, the project will be created in a few seconds, and then we will use Visual Studio to do some necessary steps.

5) To start using Visual Studio with the created project, click on the Code tab for the newly created project "MicrosoftRocks".

6) Click on the Clone in Visual Studio button.
7) This will open up VS and then open up the Team Explorer window.
8) Click on "Clone this repository". This will allow you to create a local repo of the remote repo we have just created in VSTS.

9) Select a local repo folder and click on Clone.

10) Now, VS shows a message that we can create a new project or solution.

11) You can go ahead and create any project in VS. The only thing to remember is to uncheck the "create a new Git repository" checkbox when creating a new project, since we have already created our local repo.

12) First things first, you need to exclude the bin and debug folders from getting checked into Git. So, click on Settings in Team Explorer --> click on the Repository Settings link under Git --> click on the Add link to add a .gitignore file.

13) To edit .gitignore file, click on edit link. Then, add the following at the bottom of the file:

# exclude bin and debug folders

14) Build the project and then we will do our first check in to the master branch.

15) Click on Home icon in team explorer to go back to the home page to manage source control options.

16) Click on Changes, type a check-in message and then click on Commit Staged.

17) The Commit Staged action has checked in all our changes to our local repo. These changes have not been shared to the remote yet, so we need to click on Sync to share them with others.
You will notice that VS shows you a Sync link afterwards so you can sync changes immediately, or you can click on Sync from Team Explorer and then click on Push.

18) Now the project, including the .gitignore file, is ready in the master branch, so everyone can create their own branch and start developing.

B) Create your own branch in Visual Studio:

1) Every developer in a team should create their own branch and get the latest from master to start developing in our project.

2) From Visual Studio, click on the master branch in the bottom bar and click on New Branch.

3) Enter your branch name "dev1" and the branch you want to create yours from, "master", and then click on the Create Branch button. This step will create your own branch, get the latest from master and switch to your branch so you can start coding in it.

4) You will notice that the name of the current branch has changed from master to dev1 in Visual Studio. Now you can start working in your branch.

5) Once you are done coding a feature or are at a good point to check in some code, follow these steps to check in your changes:
  • Click on Changes in Team Explorer, write a message and then click on the Commit All button.
  • You can also click on Sync to push these changes to the remote branch in VSTS online.
  • Remember, these changes are still in your branch. No one else sees them until you submit them to the master branch.

6) Publish your branch: It is important to publish your branch to VSTS. Follow these steps:
  • From team explorer, click on branches.
  • Right click on your branch.
  • Click on Publish Branch.

C) How to submit your code to the master branch:

1) First, you need to make sure that your local master branch is up to date. To do that, switch to the master branch, click on Sync and then click on Pull in the Team Explorer window.

2) Second, switch back to your branch "dev1" and then click on Branches in Team Explorer.

3) Click on Merge link.

4) Select to merge from "master" into "dev1" and then click on Merge. This step merges all master changes into your branch, so your branch gets other people's work and you can fix conflicts (if any) before submitting all changes to master using a Pull Request (PR).

5) Now that we have made sure there are no conflicts, we need to submit all these changes to the master branch. Click on Pull Requests in Team Explorer.

6) Click on New Pull Request link.

7) This will open the Visual Studio Online webpage to submit a new pull request.
8) Click on the New Pull Request button.

9) Submitted Pull Requests (PRs) will be either approved or rejected by the repository admins. If you are an admin, you will be able to approve/reject and complete submitted PRs in any project, and the changes are then committed/merged to the master branch.

10) Click on the Complete button to complete the pull request. Visual Studio will show a popup window where you can add any notes; then click on the Complete merge button. This is the last step to merge your changes into master after your PR has been approved.

11) Repeat the same steps every time you want to merge your changes into master using PRs.
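
For anyone who prefers the command line, the Team Explorer clicks in sections B and C map roughly onto the git commands in the sketch below. Treat it as an illustration rather than part of the walkthrough: the helper names are hypothetical, the remote is assumed to be called origin (the default name after a clone), and the pull request itself is still created on the VSTS web page, so it has no command here. As written, the script only prints the commands; nothing is executed unless you pass dry_run=False inside a real repository.

```python
import subprocess
from typing import List


def team_workflow_commands(branch: str = "dev1",
                           main: str = "master",
                           message: str = "Describe your change") -> List[List[str]]:
    """Return the git commands that roughly mirror sections B and C."""
    return [
        # Section B: create your own branch from master and switch to it.
        ["git", "checkout", "-b", branch, main],
        # Section B: stage and commit your changes (Commit All).
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        # Section B: publish the branch to the remote (assumed to be "origin").
        ["git", "push", "--set-upstream", "origin", branch],
        # Section C: bring the local master up to date.
        ["git", "checkout", main],
        ["git", "pull"],
        # Section C: merge master into your branch and resolve conflicts, if any.
        ["git", "checkout", branch],
        ["git", "merge", main],
        # Push the merged branch so the pull request can be opened in VSTS.
        ["git", "push"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(team_workflow_commands())
```

Keeping the commands as a plain list makes the order easy to see: master is pulled before it is merged into dev1, and dev1 is pushed last so the PR contains the merged result.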

Hope this article has shown, in a detailed walkthrough, how to work in a team using Git in Visual Studio Team Services and manage your code check-ins/check-outs/merges/branching and PRs in Git.


-- ME