Microsoft Certified ‘Azure Data Scientist Associate’ Badge

Tips to clear DP-100: Microsoft Data Scientist Associate Certification

Anirudh @ krysins.com
Mar 6, 2022

In early Jan-2022, inspired by a friend, I decided to take a cloud certification course. The idea of training ML models on GPUs and being able to deploy ML solutions in production was intriguing. I took and cleared the exam at the end of Feb-2022.

To help me decide which certification (Microsoft, Amazon or Google) to go for, I relied on a LinkedIn post by Sanjana S. She does a great job of analysing the 3 certifications on multiple criteria such as ease of preparation, affordability and exam experience. Based on this writeup, I decided to go for the Azure Data Scientist Associate exam.

I took the online version of the exam on 27-Feb-2022 and thankfully cleared the DP-100 in one go. For my efforts, I have a certificate signed by none other than Satya Nadella ;).

My ‘Microsoft Certified Azure Data Scientist Associate’ certificate

In this post I will cover the following:

  • Exam preparation duration
  • Exam preparation technique
  • The type of questions
  • The exam experience
  • The study notes

Note of caution: This course and certification get you comfortable with the Azure platform for Machine Learning. Knowing ML is more or less a prerequisite (though there is a quick module on it as part of the learning objectives). While I do not use ML as part of my job, over the last few years I have taken a few courses on Coursera/Udacity/Udemy. I have also written a few ML ‘hello worlds’ and created small prototypes. The concepts from these trainings were sufficient for this certification.

Exam preparation duration

I registered for the course on 11-Jan-2022, registered for the exam on 31-Jan and wrote the exam on 27-Feb-2022. That is a turnaround time of less than 2 months, with studies in the evenings and on weekends. In Jan and early Feb the study was light (I was not yet sure I would try for the certification) but regular (approx. 1 hour per day). In Feb the first three weeks were intense (2–3 hours per day) as I was creating notes (shared below), and the last week was easy again, i.e. < 1 hour per day.

Exam preparation technique

Since this certification has no relationship to my daily work, low-cost preparation was definitely an important criterion. I decided not to buy any course and to use the free self-paced course provided by Azure as my main study material.

First, I went through all the modules within the learning paths one by one.

The learning paths were:

  • Create machine learning models: Regression, classification, clustering and deep learning examples to be run locally on your computer. Do not expect to learn any ML technique in depth here; the modules are only a refresher. Concepts like Accuracy, Recall, AUC etc. from this section are important, but very few questions came from this section in my exam. If you are comfortable with basic ML programs, you can skip it.
  • Explore visual tools for machine learning: Azure provides two visual ways to write ML programs: a. AutoML (no code) b. ML Designer (low code, if any). In this section you actually create an Azure account and use the free 200$ credit to create, deploy and test ML programs. This was a fun activity to follow. As an analogy, think of a statistician who has done all calculations using only paper and a calculator and is suddenly introduced to Excel. The productivity will skyrocket. Personally, if someone asked me to quickly create and deploy an ML program, I would use the Azure ML techniques described here to get up and running in a short time. Quite a few exam questions came from this section.
  • Build and operate machine learning solutions with Azure Machine Learning: This covers the same ground as the section above but using the Python SDK, i.e. you not only create your core ML model (e.g. to classify penguins) but also use Python to create resources like datasets and compute (CPU / virtual machine / GPU) and to deploy the model. This is particularly useful in a production-like scenario where you can perform continuous deployment of models. Please note that while one does not have to write any code in the exam, there will be questions on APIs and the order in which they need to be called. During the practice exams I noticed that a large number of questions come from this section, so I spent an inordinate amount of time creating notes, which I have shared in the section below.
  • Build and operate machine learning solutions with Azure Databricks: Azure Databricks runs on top of a proprietary data processing engine called Databricks Runtime, an optimised version of Apache Spark. I clicked through the Jupyter notebook cells of this module to understand how it works and did not spend too much time here. I do not remember whether I had any questions from this section.
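As a quick refresher on the metrics named in the first learning path, here is a minimal sketch of Accuracy, Recall and AUC in plain Python. The labels, scores and the 0.5 threshold are made-up illustration values, not exam material, and in practice one would normally use a library such as scikit-learn rather than hand-written helpers like these.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Of all truly positive samples, the fraction predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(y_true, scores, positive=1):
    """Probability that a random positive gets a higher score than a
    random negative (ties count as half). Equivalent to the area under
    the ROC curve, computed pairwise for clarity rather than speed."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example data: true labels and model scores for 6 samples.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hard labels at threshold 0.5

print("accuracy:", round(accuracy(y_true, y_pred), 3))
print("recall:  ", round(recall(y_true, y_pred), 3))
print("auc:     ", round(auc(y_true, scores), 3))
```

Note that Accuracy and Recall depend on the chosen threshold, while AUC is computed from the raw scores and is threshold independent.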

Next, after completing the above learning paths, I registered for the exam. During the registration I also signed up for the so-called ‘official question bank’ consisting of 130 questions. I would take a custom test of 30 questions, review the answers and prepare notes.

Side Note: While this question bank was very helpful in getting me ready for the exam, I am still not sure whether spending 90–100$ was worth it or whether other (much cheaper) question banks available on the internet would have sufficed. However, with the notes I am sharing, I think one can skip this purchase (assuming the other question banks are of somewhat decent quality). Other negative points I noticed about this question bank:

  1. Some questions were obsolete, e.g. questions about Basic Azure / Enterprise Azure, which is no longer a valid concept.
  2. I must have easily taken 10 exams of 10–30 questions each and the full exam of 54 questions 3 times. After a few exams, most of the questions were repeats. It never felt like they had 130 unique questions.

Having said that, these questions were instrumental in giving me an idea of the exam questions and getting me ready. I attempted my first practice exam on 13-Feb-2022, and that is when I realised I was not well prepared and went into full study mode for the next 2 weeks.

Types of questions

There were 5 types of exam questions:

  1. A small text / code snippet followed by 3 questions based on it. For example, you want to deploy a low-cost solution for batch inferencing (predictions), followed by yes/no questions on whether the code shown works or not.
  2. Select a single option
  3. Select multiple options
  4. Drop-down selection of the parameters to be passed to Python function calls (knowing the APIs is important)
  5. Arrange steps in the correct sequence. These questions were tricky because you select, for example, 3 out of the 5 options provided and also place them in order. See the example screenshot of such a question.
Example of ‘select and order’ question

Extra clarification: You do not have to code, but you need to be able not only to read Python but also to remember the parameters of some important functions, for example the parameters to pass to create a workspace object.
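To make the workspace example concrete, here is a minimal sketch of the parameters I mean, assuming the Azure ML Python SDK v1 (`azureml-core`) that was current when I took the exam. The helper names `workspace_create_kwargs` and `create_workspace` and all argument values are hypothetical placeholders of mine; only `Workspace.create` and its keyword arguments come from the SDK. Please verify the signature against the current documentation before relying on it.

```python
def workspace_create_kwargs(name, subscription_id, resource_group, location):
    """Collect the keyword arguments typically passed to Workspace.create.

    Hypothetical helper for illustration; the keys mirror parameters of
    azureml.core.Workspace.create in SDK v1.
    """
    return {
        "name": name,                        # workspace name
        "subscription_id": subscription_id,  # Azure subscription that pays for it
        "resource_group": resource_group,    # resource group that contains it
        "create_resource_group": True,       # create the group if it does not exist
        "location": location,                # Azure region, e.g. "westeurope"
    }


def create_workspace(**kwargs):
    """Create the workspace. Needs `pip install azureml-core` and Azure login."""
    # Imported here so the sketch can be read and loaded without the SDK installed.
    from azureml.core import Workspace
    return Workspace.create(**kwargs)


# Placeholder values only; replace them with your own before calling create_workspace.
params = workspace_create_kwargs(
    name="my-workspace",
    subscription_id="<your-subscription-id>",
    resource_group="my-resource-group",
    location="westeurope",
)
print(sorted(params))
```

For an existing workspace, the SDK v1 calls to remember are `Workspace.from_config()`, which reads a downloaded `config.json`, and `Workspace.get(name, subscription_id, resource_group)`.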

The exam experience

I took the exam from the comfort of my home on a Sunday morning. The experience itself was smooth.

After logging in, you have to use your phone to take photos of the desk with the laptop on it and of all 4 walls from a distance (showing where your laptop sits in relation to them).

Next, the online invigilator asks you to pick up your laptop and slowly move the camera around while he/she checks the room. He/she will ask you to remove anything within hand’s reach on the desk.

The entire session is recorded and you are being actively watched. When I was looking out of the window (which was on the left of my desk) thinking over a couple of questions, the invigilator asked me to stop looking away from the screen.

Also, toilet breaks are not allowed for the duration of the exam. One has to be visible (with eyes on the screen) at all times.

The study notes

Finally, the section which is perhaps most useful for people who are in the midst of their preparation. I will provide links to my notes (Google Docs), which readers can leverage in their journey.

Note: these notes are for revising for the exam and not for primary learning. For primary learning, refer to the ‘Exam preparation technique’ section above.

There are two types of study notes (two different links):

  • Theory Notes: These are snippets of descriptions copied from various Azure pages (i.e. these are not interpretation notes but simply copy-paste), with the key points to remember for the exam highlighted. Each section points to the original Azure documentation from which the text was copied, so one can read the original text in its entirety. I highly recommend reading the linked Azure documentation in full at least once.
  • Code Notes: There were many questions in the exam related to using the Python SDK to train and deploy models. The Azure documentation is a rabbit hole, with different pages explaining different options in detail, and if one follows the links to this multitude of sub-options one forgets where one started and what the objective was. Here I have listed the steps of an ‘end-to-end’ deployment using the Azure Python SDK, each with a back link to the main step. This way one can go back and forth. Each section also has a link to the original Azure documentation to show where the code snippet was picked up from. Please remember that Azure will release new versions of the API over time and the code snippets might become obsolete, so always reconfirm with the original Azure gitlab documentation (link provided within the note documentation). Here is a snapshot of the first page of the document to give you a flavour of how the notes are organised.
Snippet of the first page of “Azure Data Scientist End2End CodeFlow”
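For readers who have not seen the SDK yet, here is a rough sketch of the kind of end-to-end order the exam likes to ask about. It is not copied from my notes. It assumes the Azure ML Python SDK v1 (`azureml-core`); the newer SDK v2 (`azure-ai-ml`) uses different classes. Every file name, resource name and VM size below (`src`, `train.py`, `score.py`, `environment.yml`, `cpu-cluster`, `demo-model`, `demo-service`, `STANDARD_DS11_V2`) is a hypothetical placeholder, as are the `STEPS` tuple and the `train_and_deploy` function themselves.

```python
# The order of the main steps, as a plain summary for revision.
STEPS = (
    "connect to workspace",
    "create compute target",
    "define environment",
    "submit training run",
    "register model",
    "deploy model as web service",
)


def train_and_deploy(compute_name="cpu-cluster", experiment_name="dp100-demo"):
    """Sketch of a train-and-deploy flow with SDK v1. Needs azureml-core and Azure access."""
    # Imported here so the sketch can be loaded without the SDK installed.
    from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.model import InferenceConfig, Model
    from azureml.core.webservice import AciWebservice

    # 1. Connect to an existing workspace (reads config.json).
    ws = Workspace.from_config()

    # 2. Create a training cluster.
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_DS11_V2", max_nodes=2
    )
    compute = ComputeTarget.create(ws, compute_name, compute_config)
    compute.wait_for_completion(show_output=True)

    # 3. Define the Python environment from a conda file.
    env = Environment.from_conda_specification(
        name="train-env", file_path="environment.yml"
    )

    # 4. Configure the training script and submit it as an experiment run.
    config = ScriptRunConfig(
        source_directory="src",
        script="train.py",
        compute_target=compute,
        environment=env,
    )
    run = Experiment(workspace=ws, name=experiment_name).submit(config)
    run.wait_for_completion(show_output=True)

    # 5. Register the model file that the training script wrote to outputs/.
    model = run.register_model(model_name="demo-model", model_path="outputs/model.pkl")

    # 6. Deploy the model to Azure Container Instances as a web service.
    inference_config = InferenceConfig(
        entry_script="score.py", source_directory="src", environment=env
    )
    deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
    service = Model.deploy(ws, "demo-service", [model], inference_config, deployment_config)
    service.wait_for_deployment(show_output=True)
    return service.scoring_uri


for number, step in enumerate(STEPS, start=1):
    print(number, step)
```

Calling `train_and_deploy()` creates billable Azure resources, so treat it as reading material unless you have a subscription and the placeholder files in place.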

I hope this writeup was useful.

All the best to those preparing for the exam.

Anirudh @ krysins.com

To use my passion for learning and problem-solving to create innovative solutions that improve productivity and share my learnings to help others.