Google Professional Machine Learning Engineer Practice Test

Last exam update: Dec 15, 2024
Questions 1-15 of 60

Question 1

You recently joined an enterprise-scale company that has thousands of datasets. You know that there are accurate
descriptions for each table in BigQuery, and you are searching for the proper BigQuery table to use for a model you are
building on AI Platform. How should you find the data that you need?

  • A. Use Data Catalog to search the BigQuery datasets by using keywords in the table description.
  • B. Tag each of your model and version resources on AI Platform with the name of the BigQuery table that was used for training.
  • C. Maintain a lookup table in BigQuery that maps the table descriptions to the table ID. Query the lookup table to find the correct table ID for the data that you need.
  • D. Execute a query in BigQuery to retrieve all the existing table names in your project using the INFORMATION_SCHEMA metadata tables that are native to BigQuery. Use the result to find the table that you need.
Answer: A

Explanation:
Data Catalog indexes BigQuery table metadata, including table descriptions, so a keyword search across those descriptions is the direct way to locate the right table among thousands of datasets. Tagging model resources, maintaining a lookup table, or dumping INFORMATION_SCHEMA all require manual bookkeeping that Data Catalog already handles.

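For illustration, here is a minimal sketch (not part of the original question) of such a keyword search with the google-cloud-datacatalog Python client. The project ID and the search keyword are hypothetical placeholders.

# Minimal sketch: keyword search over BigQuery table descriptions via Data Catalog.
# "my-project" and the "customer_churn" keyword are hypothetical placeholders.
from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()

scope = datacatalog_v1.SearchCatalogRequest.Scope()
scope.include_project_ids.append("my-project")

# Restrict results to BigQuery tables whose description mentions the keyword.
results = client.search_catalog(
    scope=scope,
    query="system=bigquery type=table description:customer_churn",
)
for result in results:
    print(result.relative_resource_name)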

Question 2

Your company manages a video sharing website where users can watch and upload videos. You need to create an ML
model to predict which newly uploaded videos will be the most popular so that those videos can be prioritized on your
company's website. Which result should you use to determine whether the model is successful?

  • A. The model predicts videos as popular if the user who uploads them has over 10,000 likes.
  • B. The model predicts 97.5% of the most popular clickbait videos measured by number of clicks.
  • C. The model predicts 95% of the most popular videos measured by watch time within 30 days of being uploaded.
  • D. The Pearson correlation coefficient between the log-transformed number of views after 7 days and 30 days after publication is equal to 0.
Answer: C


Question 3

You are building a real-time prediction engine that streams files which may contain Personally Identifiable Information (PII) to
Google Cloud. You want to use the Cloud Data Loss Prevention (DLP) API to scan the files. How should you ensure that the
PII is not accessible by unauthorized individuals?

  • A. Stream all files to Google Cloud, and then write the data to BigQuery. Periodically conduct a bulk scan of the table using the DLP API.
  • B. Stream all files to Google Cloud, and write batches of the data to BigQuery. While the data is being written to BigQuery, conduct a bulk scan of the data using the DLP API.
  • C. Create two buckets of data: Sensitive and Non-sensitive. Write all data to the Non-sensitive bucket. Periodically conduct a bulk scan of that bucket using the DLP API, and move the sensitive data to the Sensitive bucket.
  • D. Create three buckets of data: Quarantine, Sensitive, and Non-sensitive. Write all data to the Quarantine bucket. Periodically conduct a bulk scan of that bucket using the DLP API, and move the data to either the Sensitive or Non-sensitive bucket.
Answer: D

Explanation:
This is Google's documented quarantine pattern: land every incoming file in a Quarantine bucket, scan it with the DLP API, and only then route it to the Sensitive or Non-sensitive bucket. Options A-C write unscanned data to locations that other users can already access, leaving PII exposed before the scan runs.

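As a rough sketch of the quarantine step, assuming the google-cloud-dlp and google-cloud-storage client libraries; the bucket names and infoType list below are hypothetical placeholders:

# Rough sketch: inspect a quarantined object with the DLP API, then route it
# to the Sensitive or Non-sensitive bucket. Bucket names and infoTypes are
# hypothetical placeholders.
from google.cloud import dlp_v2, storage

dlp = dlp_v2.DlpServiceClient()
gcs = storage.Client()

def classify_and_move(project_id: str, blob_name: str) -> None:
    quarantine = gcs.bucket("quarantine-bucket")
    blob = quarantine.blob(blob_name)

    response = dlp.inspect_content(
        request={
            "parent": f"projects/{project_id}",
            "inspect_config": {
                "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}]
            },
            "item": {"value": blob.download_as_text()},
        }
    )

    # Any finding means the file contains PII and belongs in the Sensitive bucket.
    dest = "sensitive-bucket" if response.result.findings else "nonsensitive-bucket"
    quarantine.copy_blob(blob, gcs.bucket(dest), blob_name)
    blob.delete()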


Question 4

You are an ML engineer at a large grocery retailer with stores in multiple regions. You have been asked to create an
inventory prediction model. Your model's features include region, location, historical demand, and seasonal popularity. You
want the algorithm to learn from new inventory data on a daily basis. Which algorithm should you use to build the model?

  • A. Classification
  • B. Reinforcement Learning
  • C. Recurrent Neural Networks (RNN)
  • D. Convolutional Neural Networks (CNN)
Answer: B


Explanation:
Reference: https://www.kdnuggets.com/2018/03/5-things-reinforcement-learning.html


Question 5

You trained a text classification model. You have the following SignatureDefs:

[SignatureDef listing not reproduced; the serving_default signature's input is a string tensor of shape (-1, 2), i.e., any number of rows with exactly two columns.]

You started a TensorFlow-serving component server and tried to send an HTTP request to get a prediction using:
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/text_model:predict', data=data, headers=headers)
What is the correct way to write the predict request?

  • A. data = json.dumps({"signature_name": "seving_default", "instances": [['ab', 'bc', 'cd']]})
  • B. data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b', 'c', 'd', 'e', 'f']]})
  • C. data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b', 'c'], ['d', 'e', 'f']]})
  • D. data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b'], ['c', 'd'], ['e', 'f']]})
Answer: D

Explanation:
The serving signature accepts any number of rows but exactly two columns per row, so each instance must be a pair of strings: [['a', 'b'], ['c', 'd'], ['e', 'f']].

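Put together, a runnable version of the request (assuming a local TensorFlow Serving instance on port 8501) looks like this:

# Runnable version of the predict call, using the option D request body:
# batches of two-string instances for the serving_default signature.
import json

import requests

data = json.dumps({
    "signature_name": "serving_default",
    "instances": [["a", "b"], ["c", "d"], ["e", "f"]],
})
headers = {"content-type": "application/json"}
json_response = requests.post(
    "http://localhost:8501/v1/models/text_model:predict",
    data=data,
    headers=headers,
)
print(json_response.json())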


Question 6

You work for an online retail company that is creating a visual search engine. You have set up an end-to-end ML pipeline on
Google Cloud to classify whether an image contains your company's product. Expecting the release of new products in the
near future, you configured a retraining functionality in the pipeline so that new data can be fed into your ML models. You
also want to use AI Platform's continuous evaluation service to ensure that the models have high accuracy on your test
dataset. What should you do?

  • A. Keep the original test dataset unchanged even if newer products are incorporated into retraining.
  • B. Extend your test dataset with images of the newer products when they are introduced to retraining.
  • C. Replace your test dataset with images of the newer products when they are introduced to retraining.
  • D. Update your test dataset with images of the newer products when your evaluation metrics drop below a pre-decided threshold.
Answer: B

Explanation:
Extending the test dataset preserves evaluation coverage of the existing products while adding the new ones; replacing it would stop measuring accuracy on products you still sell, and waiting for metrics to drop means evaluating against stale data in the meantime.


Question 7

You have a functioning end-to-end ML pipeline that involves tuning the hyperparameters of your ML model using AI Platform,
and then using the best-tuned parameters for training. Hypertuning is taking longer than expected and is delaying the
downstream processes. You want to speed up the tuning job without significantly compromising its effectiveness. Which
actions should you take? (Choose two.)

  • A. Decrease the number of parallel trials.
  • B. Decrease the range of floating-point values.
  • C. Set the early stopping parameter to TRUE.
  • D. Change the search algorithm from Bayesian search to random search.
  • E. Decrease the maximum number of trials during subsequent training phases.
Answer: B, D


Explanation:
Reference: https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview


Question 8

You work with a data engineering team that has developed a pipeline to clean your dataset and save it in a Cloud Storage
bucket. You have created an ML model and want to use the data to refresh your model as soon as new data is available. As
part of your CI/CD workflow, you want to automatically run a Kubeflow Pipelines training job on Google Kubernetes Engine
(GKE). How should you architect this workflow?

  • A. Configure your pipeline with Dataflow, which saves the files in Cloud Storage. After the file is saved, start the training job on a GKE cluster.
  • B. Use App Engine to create a lightweight Python client that continuously polls Cloud Storage for new files. As soon as a file arrives, initiate the training job.
  • C. Configure a Cloud Storage trigger to send a message to a Pub/Sub topic when a new file is available in a storage bucket. Use a Pub/Sub-triggered Cloud Function to start the training job on a GKE cluster.
  • D. Use Cloud Scheduler to schedule jobs at a regular interval. For the first step of the job, check the timestamp of objects in your Cloud Storage bucket. If there are no new files since the last run, abort the job.
Answer: C

Explanation:
A Cloud Storage notification to Pub/Sub plus a Pub/Sub-triggered Cloud Function is fully event-driven: training starts as soon as new data lands, with no polling clients or scheduled checks to maintain.

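A minimal sketch of the Cloud Function in option C, assuming a first-generation Pub/Sub-triggered function and the Kubeflow Pipelines SDK; the KFP endpoint, pipeline package, and experiment name are hypothetical:

# Minimal sketch: Pub/Sub-triggered Cloud Function that starts a Kubeflow
# Pipelines run when a Cloud Storage notification arrives. The KFP host,
# pipeline file, and experiment name are hypothetical placeholders.
import base64
import json

import kfp

def trigger_training(event, context):
    """Entry point for a Pub/Sub-triggered (1st gen) Cloud Function."""
    # Cloud Storage notifications carry the object metadata as JSON.
    message = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    new_file = f"gs://{message['bucket']}/{message['name']}"

    client = kfp.Client(host="https://my-kfp-endpoint")  # hypothetical endpoint
    client.create_run_from_pipeline_package(
        pipeline_file="training_pipeline.yaml",  # hypothetical compiled pipeline
        arguments={"input_path": new_file},
        experiment_name="retraining",
    )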

Question 9

You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per
second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across
multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency
without changing the underlying infrastructure. What should you do?

  • A. Significantly increase the max_batch_size TensorFlow Serving parameter.
  • B. Switch to the tensorflow-model-server-universal version of TensorFlow Serving.
  • C. Significantly increase the max_enqueued_batches TensorFlow Serving parameter.
  • D. Recompile TensorFlow Serving using the source to support CPU-specific optimizations. Instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes.
Answer: D

Explanation:
Building TensorFlow Serving from source with CPU-specific instruction-set optimizations (such as AVX2 and FMA) and pinning a minimum CPU platform for the GKE nodes improves per-request latency without changing the serving architecture. Larger batch sizes and queues (options A and C) trade latency for throughput, and the universal binary (option B) is the least optimized build.


Question 10

You are an ML engineer at a regulated insurance company. You are asked to develop an insurance approval model that
accepts or rejects insurance applications from potential customers. What factors should you consider before building the
model?

  • A. Redaction, reproducibility, and explainability
  • B. Traceability, reproducibility, and explainability
  • C. Federated learning, reproducibility, and explainability
  • D. Differential privacy, federated learning, and explainability
Answer: B

Explanation:
A regulated insurer must be able to trace which data and code produced a decision, reproduce that decision on demand, and explain it to regulators and applicants, so traceability, reproducibility, and explainability are the factors to consider before building the model.


Question 11

Your team is building an application for a global bank that will be used by millions of customers. You built a forecasting
model that predicts customers' account balances 3 days in the future. Your team will use the results in a new feature that will
notify users when their account balance is likely to drop below $25. How should you serve your predictions?

  • A. 1. Create a Pub/Sub topic for each user. 2. Deploy a Cloud Function that sends a notification when your model predicts that a user's account balance will drop below the $25 threshold.
  • B. 1. Create a Pub/Sub topic for each user. 2. Deploy an application on the App Engine standard environment that sends a notification when your model predicts that a user's account balance will drop below the $25 threshold.
  • C. 1. Build a notification system on Firebase. 2. Register each user with a user ID on the Firebase Cloud Messaging server, which sends a notification when the average of all account balance predictions drops below the $25 threshold.
  • D. 1. Build a notification system on Firebase. 2. Register each user with a user ID on the Firebase Cloud Messaging server, which sends a notification when your model predicts that a user's account balance will drop below the $25 threshold.
Answer: D

Explanation:
Firebase Cloud Messaging is designed for per-user push notifications at this scale. Creating one Pub/Sub topic per user (options A and B) does not scale to millions of customers, and option C alerts on the average of all predictions rather than each user's own balance.

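For illustration, a sketch of the per-user notification step using the firebase-admin SDK; the device token and the wiring that supplies the prediction are assumed to exist already:

# Sketch: send a per-user push notification through Firebase Cloud Messaging
# when the predicted balance falls below the threshold. The device token is
# supplied by the caller; credentials come from the default service account.
import firebase_admin
from firebase_admin import messaging

firebase_admin.initialize_app()

def notify_low_balance(device_token: str, predicted_balance: float) -> None:
    if predicted_balance < 25:
        message = messaging.Message(
            token=device_token,
            notification=messaging.Notification(
                title="Low balance alert",
                body=f"Your balance may drop to ${predicted_balance:.2f} within 3 days.",
            ),
        )
        messaging.send(message)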


Question 12

You started working on a classification problem with time series data and achieved an area under the receiver operating
characteristic curve (AUC ROC) value of 99% for training data after just a few experiments. You haven't explored using any
sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the
problem?

  • A. Address the model overfitting by using a less complex algorithm.
  • B. Address data leakage by applying nested cross-validation during model training.
  • C. Address data leakage by removing features highly correlated with the target value.
  • D. Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.
Answer: B

Explanation:
A near-perfect AUC ROC after only a few simple experiments typically signals data leakage rather than a genuinely strong model; with time series, leakage usually means future information bleeding into training. Nested, time-aware cross-validation surfaces the problem and gives an honest performance estimate.

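As a small illustration of time-aware validation, here is a sketch using scikit-learn's TimeSeriesSplit on synthetic data (the data and split count are our assumptions, not part of the question):

# Illustration: time-aware splits keep every training fold strictly earlier
# than its test fold, so no future information leaks into training.
# The data below is synthetic.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)          # time-ordered feature values
y = np.random.randint(0, 2, size=100)      # synthetic binary labels

tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X):
    # Every training index precedes every test index in time.
    assert train_idx.max() < test_idx.min()
    print(f"train up to t={train_idx.max()}, test from t={test_idx.min()}")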

Question 13

You are training an LSTM-based model on AI Platform to summarize text using the following job submission script:
gcloud ai-platform jobs submit training $JOB_NAME \
--package-path $TRAINER_PACKAGE_PATH \
--module-name $MAIN_TRAINER_MODULE \
--job-dir $JOB_DIR \
--region $REGION \
--scale-tier basic \
-- \
--epochs 20 \
--batch_size=32 \
--learning_rate=0.001
You want to ensure that training time is minimized without significantly compromising the accuracy of your model. What
should you do?

  • A. Modify the ‘epochs’ parameter.
  • B. Modify the ‘scale-tier’ parameter.
  • C. Modify the ‘batch size’ parameter.
  • D. Modify the ‘learning rate’ parameter.
Answer: B

Explanation:
The job is submitted with --scale-tier basic, a single-worker configuration. Moving to a larger scale tier (for example, STANDARD_1 or a CUSTOM tier with accelerators) adds compute capacity and shortens training time without touching model-quality parameters such as epochs, batch size, or learning rate.


Question 14

You are developing models to classify customer support emails. You created models with TensorFlow Estimators using
small datasets on your on-premises system, but you now need to train the models using large datasets to ensure high
performance. You will port your models to Google Cloud and want to minimize code refactoring and infrastructure overhead
for easier migration from on-prem to cloud. What should you do?

  • A. Use AI Platform for distributed training.
  • B. Create a cluster on Dataproc for training.
  • C. Create a Managed Instance Group with autoscaling.
  • D. Use Kubeflow Pipelines to train on a Google Kubernetes Engine cluster.
Answer: A

Explanation:
TensorFlow Estimator code runs on AI Platform training with minimal changes, and the service provisions and manages the distributed infrastructure for you, minimizing both code refactoring and operational overhead. Dataproc, Managed Instance Groups, and a self-managed GKE cluster all add infrastructure you would have to set up and maintain yourself.


Question 15

Your data science team needs to rapidly experiment with various features, model architectures, and hyperparameters. They
need to track the accuracy metrics for various experiments and use an API to query the metrics over time. What should they
use to track and report their experiments while minimizing manual effort?

  • A. Use Kubeflow Pipelines to execute the experiments. Export the metrics file, and query the results using the Kubeflow Pipelines API.
  • B. Use AI Platform Training to execute the experiments. Write the accuracy metrics to BigQuery, and query the results using the BigQuery API.
  • C. Use AI Platform Training to execute the experiments. Write the accuracy metrics to Cloud Monitoring, and query the results using the Monitoring API.
  • D. Use AI Platform Notebooks to execute the experiments. Collect the results in a shared Google Sheets file, and query the results using the Google Sheets API.
Answer: A

Explanation:
Kubeflow Pipelines records the metrics each run emits and exposes them through its API, so experiments are tracked with essentially no extra plumbing. The other options all require building and maintaining a custom logging path to BigQuery, Cloud Monitoring, or Google Sheets.

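To make the tracking step concrete, here is a minimal sketch of how a pipeline step can export metrics in the format Kubeflow Pipelines (v1) collects and serves through its API; the metric name and value are illustrative:

# Minimal sketch: a Kubeflow Pipelines (v1) step exports metrics by writing a
# well-known JSON file, which the Pipelines backend collects and exposes via
# its API. The metric name and value here are illustrative.
import json

def write_kfp_metrics(accuracy: float) -> None:
    metrics = {
        "metrics": [
            {
                "name": "accuracy-score",  # lowercase letters, digits, and dashes only
                "numberValue": accuracy,
                "format": "PERCENTAGE",
            }
        ]
    }
    with open("/mlpipeline-metrics.json", "w") as f:
        json.dump(metrics, f)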