Practice Free Microsoft DP-100 Exam Questions Online, DP-100 Free Demo

Question 91 Selectable Answer

You have a dataset that includes confidential data. You use the dataset to train a model.
You must use a differential privacy parameter to keep the data of individuals safe and private.
You need to reduce the effect of user data on aggregated results.
What should you do?

A. Decrease the value of the epsilon parameter to reduce the amount of noise added to the data
B. Increase the value of the epsilon parameter to decrease privacy and increase accuracy
C. Decrease the value of the epsilon parameter to increase privacy and reduce accuracy
D. Set the value of the epsilon parameter to 1 to ensure maximum privacy

Answer: C
Explanation:
Differential privacy guards against the possibility that a user can run an indefinite number of reports and eventually reveal sensitive data. A value known as epsilon measures how noisy, or private, a report is. Epsilon has an inverse relationship to noise and privacy: the lower the epsilon, the noisier (and more private) the data. Decreasing epsilon therefore reduces the influence of any one individual's data on aggregated results, at the cost of accuracy.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/concept-differential-privacy
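The inverse relationship between epsilon and noise can be illustrated with the Laplace mechanism, a common way of adding differentially private noise. The following is a minimal sketch using only the Python standard library, not the Azure Machine Learning API; the function names and the sample values are hypothetical.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale of the Laplace noise: a smaller epsilon gives a larger scale."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Mean of the values clipped to [lower, upper], plus Laplace noise."""
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one record moves the clipped mean by at most this much.
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(laplace_scale(sensitivity, epsilon), rng)


rng = random.Random(0)
ages = [23, 35, 41, 52, 29, 60, 47, 38]  # hypothetical confidential values
for eps in (0.1, 1.0, 10.0):
    # Lower epsilon -> larger noise scale -> more privacy, less accuracy.
    print(eps, laplace_scale(1.0, eps), private_mean(ages, 0, 100, eps, rng))
```

The noise scale is the only place epsilon enters, which is why decreasing it trades accuracy for privacy.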

Question 92 Written Answer

HOTSPOT
You need to build a feature extraction strategy for the local models.
How should you complete the code segment? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Answer:

Question 93 Selectable Answer

You use the Azure Machine Learning service to create a tabular dataset named training_data. You plan to use this dataset in a training script.
You create a variable that references the dataset using the following code:
training_ds = workspace.datasets.get("training_data")
You define an estimator to run the script.
You need to set the correct property of the estimator to ensure that your script can access the training_data dataset.
Which property should you set?

A. inputs = [training_ds.as_named_input('training_ds')]
B. script_params = {"--training_ds":training_ds}
C. environment_definition = {"training_data":training_ds}
D. source_directory = training_ds

Answer: A
Explanation:
Example:
# Assumes ws, experiment_folder, and cpu_cluster are already defined
from azureml.train.sklearn import SKLearn

# Get the training dataset
diabetes_ds = ws.datasets.get("Diabetes Dataset")

# Create an estimator that uses the remote compute
hyper_estimator = SKLearn(source_directory=experiment_folder,
                          inputs=[diabetes_ds.as_named_input('diabetes')],  # Pass the dataset as an input
                          compute_target=cpu_cluster,
                          conda_packages=['pandas', 'ipykernel', 'matplotlib'],
                          pip_packages=['azureml-sdk', 'argparse', 'pyarrow'],
                          entry_script='diabetes_training.py')
Reference: https://notebooks.azure.com/GraemeMalcolm/projects/azureml-primers/html/04%20-%20Optimizing%20Model%20Training.ipynb

Question 94 Written Answer

HOTSPOT
You create a script for training a machine learning model in Azure Machine Learning service.
You create an estimator by running the following code:

For each of the following statements, select Yes if the statement is true. Otherwise, select No. NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Box 1: Yes
Parameter source_directory is a local directory containing experiment configuration and code files needed for a training job.
Box 2: Yes
script_params is a dictionary of command-line arguments to pass to the training script specified in entry_script.
Box 3: No
Box 4: Yes
The conda_packages parameter is a list of strings representing conda packages to be added to the Python environment for the experiment.

Question 95 Written Answer

HOTSPOT
You create an experiment in Azure Machine Learning Studio. You add a training dataset that contains 10,000 rows. The first 9,000 rows represent class 0 (90 percent). The remaining 1,000 rows represent class 1 (10 percent).
The training set is unbalanced between two classes. You must increase the number of training examples for class 1 to 4,000 by using data rows. You add the Synthetic Minority Oversampling Technique (SMOTE) module to the experiment.
You need to configure the module.
Which values should you use? To answer, select the appropriate options in the dialog box in the answer area. NOTE: Each correct selection is worth one point.

Answer:
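The SMOTE percentage setting for this scenario follows from simple arithmetic. The sketch below assumes the behaviour documented for the Studio SMOTE module, where a percentage of 100 adds as many synthetic rows as there are original minority rows; the function name is hypothetical.

```python
def smote_percentage(current_minority: int, target_minority: int) -> int:
    """SMOTE percentage needed to grow the minority class to the target size.

    Assumes 100 percent doubles the minority class, 200 percent triples it,
    and so on (0 percent leaves the dataset unchanged).
    """
    if current_minority <= 0 or target_minority < current_minority:
        raise ValueError("target must be at least the current minority count")
    synthetic_rows = target_minority - current_minority
    return round(100 * synthetic_rows / current_minority)


# Class 1 has 1,000 rows and must grow to 4,000, so 3,000 synthetic rows are needed.
print(smote_percentage(1_000, 4_000))
```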

Question 96 Selectable Answer

You run a script as an experiment in Azure Machine Learning.
You have a Run object named run that references the experiment run. You must review the log files that were generated during the experiment run.
You need to download the log files to a local folder for review.
Which two code segments can you run to achieve this goal? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.

A. run.get_details()
B. run.get_file_names()
C. run.get_metrics()
D. run.download_files(output_directory='./runfiles')
E. run.get_all_logs(destination='./runlogs')

Answer:
Explanation:
The Run class get_all_logs method downloads all logs for the run to a directory.
The Run class get_details method gets the definition, status information, current log files, and other details of the run.
Reference: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run(class)

Question 97 Selectable Answer

1. Topic 1, Case Study 1

Overview
You are a data scientist in a company that provides data science for professional sporting events.
Models will use global and local market data to meet the following business goals:
• Understand sentiment of mobile device users at sporting events based on audio from crowd reactions.
• Assess a user's tendency to respond to an advertisement.
• Customize styles of ads served on mobile devices.
• Use video to detect penalty events.

Current environment
Requirements
• Media used for penalty event detection will be provided by consumer devices. Media may include images and videos captured during the sporting event and shared using social media. The images and videos will have varying sizes and formats.
• The data available for model building comprises seven years of sporting event media. The sporting event media includes recorded videos, transcripts of radio commentary, and logs from related social media feeds captured during the sporting events.
• Crowd sentiment will include audio recordings submitted by event attendees in both mono and stereo formats.

Advertisements
• Ad response models must be trained at the beginning of each event and applied during the sporting event.
• Market segmentation models must optimize for similar ad response history.
• Sampling must guarantee mutual and collective exclusivity between local and global segmentation models that share the same features.
• Local market segmentation models will be applied before determining a user’s propensity to respond to an advertisement.
• Data scientists must be able to detect model degradation and decay.
• Ad response models must support non-linear boundaries of features.
• The ad propensity model uses a cut threshold of 0.45, and retrains occur if weighted Kappa deviates from 0.1 +/- 5%.
• The ad propensity model uses cost factors shown in the following diagram:

• The ad propensity model uses proposed cost factors shown in the following diagram:

Performance curves of current and proposed cost factor scenarios are shown in the following diagram:

Penalty detection and sentiment
Findings
• Data scientists must build an intelligent solution by using multiple machine learning models for penalty event detection.
• Data scientists must build notebooks in a local environment using automatic feature engineering and model building in machine learning pipelines.
• Notebooks must be deployed to retrain by using Spark instances with dynamic worker allocation.
• Notebooks must execute with the same code on new Spark instances to recode only the source of the data.
• Global penalty detection models must be trained by using dynamic runtime graph computation during training.
• Local penalty detection models must be written by using BrainScript.
• Experiments for local crowd sentiment models must combine local penalty detection data.
• Crowd sentiment models must identify known sounds such as cheers and known catch phrases. Individual crowd sentiment models will detect similar sounds.
• All shared features for local models are continuous variables.
• Shared features must use double precision. Subsequent layers must have aggregate running mean and standard deviation metrics available.

segments
During the initial weeks in production, the following was observed:
• Ad response rates declined.
• Drops were not consistent across ad styles.
• The distribution of features across training and production data is not consistent.
Analysis shows that of the 100 numeric features on user location and behavior, the 47 features that come from location sources are being used as raw features. A suggested experiment to remedy the bias and variance issue is to engineer 10 linearly uncorrelated features.

Penalty detection and sentiment
• Initial data discovery shows a wide range of densities of target states in training data used for crowd sentiment models.
• All penalty detection models show that inference phases using Stochastic Gradient Descent (SGD) are running too slow.
• Audio samples show that the length of a catch phrase varies between 25%-47%, depending on region.
• The performance of the global penalty detection models shows lower variance but higher bias when comparing training and validation sets. Before implementing any feature changes, you must confirm the bias and variance using all training and validation cases.

You need to implement a model development strategy to determine a user’s tendency to respond to an ad.
Which technique should you use?

A. Use a Relative Expression Split module to partition the data based on centroid distance.
B. Use a Relative Expression Split module to partition the data based on distance travelled to the event.
C. Use a Split Rows module to partition the data based on distance travelled to the event.
D. Use a Split Rows module to partition the data based on centroid distance.

Answer:
Explanation:
Split Data partitions the rows of a dataset into two distinct sets.
The Relative Expression Split option in the Split Data module of Azure Machine Learning Studio is helpful when you need to divide a dataset into training and testing datasets using a numerical expression.
Relative Expression Split: Use this option whenever you want to apply a condition to a number column. The number could be a date/time field, a column containing age or dollar amounts, or even a percentage. For example, you might want to divide your data set depending on the cost of the items, group people by age ranges, or separate data by a calendar date.
Scenario:
Local market segmentation models will be applied before determining a user’s propensity to respond to an advertisement.
The distribution of features across training and production data are not consistent
References: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/split-data
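The idea behind a relative expression split, partitioning rows by a condition on a numeric column, can be sketched in plain Python. This is an illustration only, not the Studio module itself; the function name, the column name `centroid_distance`, and the sample rows are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


def split_by_expression(
    rows: List[Row], column: str, condition: Callable[[float], bool]
) -> Tuple[List[Row], List[Row]]:
    """Return (rows where the condition holds, all remaining rows)."""
    matched: List[Row] = []
    remainder: List[Row] = []
    for row in rows:
        (matched if condition(row[column]) else remainder).append(row)
    return matched, remainder


# Hypothetical users with a numeric distance to their segment centroid.
users = [
    {"user_id": 1, "centroid_distance": 0.4},
    {"user_id": 2, "centroid_distance": 1.7},
    {"user_id": 3, "centroid_distance": 0.9},
    {"user_id": 4, "centroid_distance": 2.3},
]
near, far = split_by_expression(users, "centroid_distance", lambda d: d < 1.0)
print([u["user_id"] for u in near], [u["user_id"] for u in far])
```

Every row lands in exactly one of the two outputs, so the partitions are mutually exclusive and together cover the whole dataset.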

Question 98 Written Answer

HOTSPOT
You are creating a machine learning model in Python. The provided dataset contains several numerical columns and one text column. The text column represents a product's category.
The product category will always be one of the following:
✑ Bikes
✑ Cars
✑ Vans
✑ Boats
You are building a regression model using the scikit-learn Python package.
You need to transform the text data to be compatible with the scikit-learn Python package.
How should you complete the code segment? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Box 1: pandas as df
Pandas takes data (such as a CSV or TSV file, or a SQL database) and creates a Python object with rows and columns, called a data frame, that looks very similar to a table in statistical software (think Excel or SPSS, for example).
Box 2: transpose[ProductCategoryMapping]
Reshape the data from the pandas Series to columns.
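scikit-learn estimators expect numeric inputs, so the text category has to be converted to numbers before training. The sketch below shows two common encodings in plain Python rather than pandas; the dictionary name echoes ProductCategoryMapping from the answer area, but the code itself is an assumption and not the exam's code segment.

```python
# The four categories listed in the question, in a fixed order.
CATEGORIES = ["Bikes", "Cars", "Vans", "Boats"]

# Integer code for each category (hypothetical numbering).
product_category_mapping = {name: code for code, name in enumerate(CATEGORIES)}


def encode_category(value: str) -> int:
    """Map a category name to its integer code; raises KeyError if unknown."""
    return product_category_mapping[value]


def one_hot(value: str) -> list:
    """One-hot vector with a 1 in the position of the given category."""
    code = encode_category(value)
    return [1 if i == code else 0 for i in range(len(CATEGORIES))]


for category in CATEGORIES:
    print(category, encode_category(category), one_hot(category))
```

Integer codes imply an ordering between categories, so for a regression model a one-hot representation is often the safer choice for nominal values such as these.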

Question 99 Written Answer

DRAG DROP
You have an Azure Machine Learning workspace that contains a CPU-based compute cluster and an Azure Kubernetes Service (AKS) inference cluster. You create a tabular dataset containing data that you plan to use to create a classification model.
You need to use the Azure Machine Learning designer to create a web service through which client applications can consume the classification model by submitting new data and getting an immediate prediction as a response.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Answer:

Explanation:
Step 1: Create and start a Compute Instance
To train and deploy models using Azure Machine Learning designer, you need compute on which to run the training process, test the model, and host the model in a deployed service.
There are four kinds of compute resource you can create:
Compute Instances: Development workstations that data scientists can use to work with data and models.
Compute Clusters: Scalable clusters of virtual machines for on-demand processing of experiment code.
Inference Clusters: Deployment targets for predictive services that use your trained models.
Attached Compute: Links to existing Azure compute resources, such as Virtual Machines or Azure Databricks clusters.
Step 2: Create and run a training pipeline
After you've used data transformations to prepare the data, you can use it to train a machine learning model.
Step 3: Create and run a real-time inference pipeline
After creating and running a pipeline to train the model, you need a second pipeline that performs the same data transformations for new data, and then uses the trained model to inference (in other words, predict) label values based on its features. This pipeline will form the basis for a predictive service that you can publish for applications to use.

Question 100 Selectable Answer

You create a machine learning model by using the Azure Machine Learning designer. You publish the model as a real-time service on an Azure Kubernetes Service (AKS) inference compute cluster. You make no changes to the deployed endpoint configuration.
You need to provide application developers with the information they need to consume the endpoint.
Which two values should you provide to application developers? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

A. The name of the AKS cluster where the endpoint is hosted.
B. The name of the inference pipeline for the endpoint.
C. The URL of the endpoint.
D. The run ID of the inference pipeline experiment for the endpoint.
E. The key for the endpoint.

Answer: C, E
Explanation:
Deploying an Azure Machine Learning model as a web service creates a REST API endpoint. You can send data to this endpoint and receive the prediction returned by the model.
You create a web service when you deploy a model to your local environment, Azure Container Instances, Azure Kubernetes Service, or field-programmable gate arrays (FPGA). You retrieve the URI used to access the web service by using the Azure Machine Learning SDK. If authentication is enabled, you can also use the SDK to get the authentication keys or tokens.
Example:
# URL for the web service
scoring_uri = '<your web service URI>'
# If the service is authenticated, set the key or token
key = '<your key or token>'
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-consume-web-service
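To show why developers need both the endpoint URL and the key, the sketch below builds (but does not send) an authenticated scoring request with the Python standard library. The URL, the key, and the payload shape are placeholders; the real input schema depends on the deployed pipeline.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri: str, key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request that carries the key as a bearer token."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


request = build_scoring_request(
    "https://example.invalid/score",   # placeholder for the endpoint URL
    "<your key or token>",             # placeholder for the endpoint key
    {"data": [[0.1, 2.3, 4.5]]},       # hypothetical payload shape
)
print(request.get_method(), request.full_url)
# Sending the request would be urllib.request.urlopen(request); it is not run here.
```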

Question 101 Selectable Answer

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.
Does the solution meet the goal?

A. Yes
B. No

Answer: A
Explanation:
SMOTE is used to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.
References: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
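The difference from plain duplication is that SMOTE creates new points between existing minority cases. The following is a simplified sketch of that interpolation step, using a single nearest neighbor and hypothetical two-feature data; it is not the Studio module's implementation.

```python
import math
import random


def nearest_neighbor(point, others):
    """Closest point to `point` among `others` by Euclidean distance."""
    return min(others, key=lambda other: math.dist(point, other))


def smote_like(minority, n_new, rng):
    """Create n_new synthetic points on segments between minority neighbors."""
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        neighbor = nearest_neighbor(base, minority[:idx] + minority[idx + 1:])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class rows with two numeric features.
minority_rows = [(1.0, 2.0), (1.5, 2.5), (3.0, 1.0), (2.5, 1.5)]
new_rows = smote_like(minority_rows, n_new=6, rng=random.Random(42))
for row in new_rows:
    print(row)
```

Each synthetic row lies between two real minority rows, so the class grows without exact copies of existing cases.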

Question 102 Selectable Answer

You create a pipeline in designer to train a model that predicts automobile prices.
Because of non-linear relationships in the data, the pipeline calculates the natural log (Ln) of the prices in the training data, trains a model to predict this natural log of price value, and then calculates the exponential of the scored label to get the predicted price.
The training pipeline is shown in the exhibit. (Click the Training pipeline tab.)

Training pipeline

You create a real-time inference pipeline from the training pipeline, as shown in the exhibit. (Click the Real-time pipeline tab.)

Real-time pipeline

You need to modify the inference pipeline to ensure that the web service returns the exponential of the scored label as the predicted automobile price and that client applications are not required to include a price value in the input values.
Which three modifications must you make to the inference pipeline? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

A. Connect the output of the Apply SQL Transformation to the Web Service Output module.
B. Replace the Web Service Input module with a data input that does not include the price column.
C. Add a Select Columns module before the Score Model module to select all columns other than price.
D. Replace the training dataset module with a data input that does not include the price column.
E. Remove the Apply Math Operation module that replaces price with its natural log from the data flow.
F. Remove the Apply SQL Transformation module from the data flow.

Answer:
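The transformation described in the question, training on the natural log of price and exponentiating the scored label, can be sketched as a round trip. The helper names and the sample price are hypothetical, and the model itself is omitted.

```python
import math


def to_log_price(price: float) -> float:
    """Training-time transform: replace price with its natural log."""
    return math.log(price)


def to_price(scored_label: float) -> float:
    """Inference-time transform: exponentiate the scored label."""
    return math.exp(scored_label)


actual_price = 13_495.0                    # hypothetical training label
log_label = to_log_price(actual_price)     # what the model learns to predict
predicted_price = to_price(log_label)      # what the web service should return
print(log_label, predicted_price)
```

Only the exponential step is needed at inference time, because the client supplies no price to take the log of.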

Question 103 Selectable Answer

You plan to use a Data Science Virtual Machine (DSVM) with the open source deep learning frameworks Caffe2 and Theano. You need to select a preconfigured DSVM to support the frameworks.
What should you create?

A. Data Science Virtual Machine for Linux (CentOS)
B. Data Science Virtual Machine for Windows 2012
C. Data Science Virtual Machine for Windows 2016
D. Geo AI Data Science Virtual Machine with ArcGIS
E. Data Science Virtual Machine for Linux (Ubuntu)

Answer:

Question 104 Written Answer

HOTSPOT
You create a binary classification model using Azure Machine Learning Studio.
You must use a Receiver Operating Characteristic (ROC) curve and an F1 score to evaluate the model.
You need to create the required business metrics.
How should you complete the experiment? To answer, select the appropriate options in the dialog box in the answer area. NOTE: Each correct selection is worth one point.

Answer:

Question 105 Selectable Answer

You are building a regression model for estimating the number of calls during an event.
You need to determine whether the feature values achieve the conditions to build a Poisson regression model.
Which two conditions must the feature set contain? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

A. The label data must be a negative value.
B. The label data can be positive or negative.
C. The label data must be a positive value.
D. The label data must be non-discrete.
E. The data must be whole numbers.

Answer: C, E
Explanation:
Poisson regression is intended for use in regression models that are used to predict numeric values, typically counts.
Therefore, you should use this module to create your regression model only if the values you are trying to predict fit the following conditions:
✑ The response variable has a Poisson distribution.
✑ Counts cannot be negative. The method will fail outright if you attempt to use it with negative labels.
✑ A Poisson distribution is a discrete distribution; therefore, it is not meaningful to use this method with non-whole numbers.
References: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/poisson-regression
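The conditions listed above can be checked before training. The sketch below is a hypothetical helper, not part of any Azure module; it follows the explanation's wording and accepts zero as a valid count.

```python
def valid_poisson_labels(labels) -> bool:
    """True if every label is a non-negative whole number."""
    return all(
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and value >= 0
        and float(value).is_integer()
        for value in labels
    )


print(valid_poisson_labels([0, 3, 12, 7]))   # call counts per event
print(valid_poisson_labels([-1, 2, 5]))      # contains a negative label
print(valid_poisson_labels([1.5, 2, 3]))     # contains a non-whole number
```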
