Google Professional Machine Learning Engineer Practice Test
Professional Machine Learning Engineer
Last exam update: Dec 15, 2024
Page 1 out of 4. Viewing questions 1-15 out of 60
Question 1
You recently joined an enterprise-scale company that has thousands of datasets. You know that there are accurate descriptions for each table in BigQuery, and you are searching for the proper BigQuery table to use for a model you are building on AI Platform. How should you find the data that you need?
A.
Use Data Catalog to search the BigQuery datasets by using keywords in the table description.
B.
Tag each of your model and version resources on AI Platform with the name of the BigQuery table that was used for training.
C.
Maintain a lookup table in BigQuery that maps the table descriptions to the table ID. Query the lookup table to find the correct table ID for the data that you need.
D.
Execute a query in BigQuery to retrieve all the existing table names in your project using the INFORMATION_SCHEMA metadata tables that are native to BigQuery. Use the result to find the table that you need.
Answer:
B
User Votes:
A: 1 vote
Question 2
Your company manages a video sharing website where users can watch and upload videos. You need to create an ML model to predict which newly uploaded videos will be the most popular so that those videos can be prioritized on your company's website. Which result should you use to determine whether the model is successful?
A.
The model predicts videos as popular if the user who uploads them has over 10,000 likes.
B.
The model predicts 97.5% of the most popular clickbait videos measured by number of clicks.
C.
The model predicts 95% of the most popular videos measured by watch time within 30 days of being uploaded.
D.
The Pearson correlation coefficient between the log-transformed number of views after 7 days and 30 days after publication is equal to 0.
Answer:
C
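The success criterion in option C can be read as a recall over the most-watched videos. The sketch below is only an illustration of that reading: the function name, the `top_k` cutoff, and the sample data are all hypothetical and do not come from the question.

```python
def recall_of_top_videos(predicted_popular, watch_time_30d, top_k):
    """Fraction of the top_k videos (by 30-day watch time) that the model flagged as popular."""
    # Rank video IDs by observed watch time, highest first.
    ranked = sorted(watch_time_30d, key=watch_time_30d.get, reverse=True)
    actual_top = set(ranked[:top_k])
    if not actual_top:
        return 0.0
    # Count how many of the truly popular videos the model also predicted.
    hits = actual_top & set(predicted_popular)
    return len(hits) / len(actual_top)


# Hypothetical data: watch time in minutes within 30 days of upload.
watch_time = {"v1": 900.0, "v2": 120.0, "v3": 640.0, "v4": 15.0}
predicted = {"v1", "v4"}
score = recall_of_top_videos(predicted, watch_time, top_k=2)
```

Here the two most-watched videos are `v1` and `v3`, and the model flagged only `v1`, so the score is one half.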
User Votes:
A
50%
B
50%
C 1 votes
50%
D
50%
Discussions
0/ 1000
Question 3
You are building a real-time prediction engine that streams files which may contain Personally Identifiable Information (PII) to Google Cloud. You want to use the Cloud Data Loss Prevention (DLP) API to scan the files. How should you ensure that the PII is not accessible by unauthorized individuals?
A.
Stream all files to Google Cloud, and then write the data to BigQuery. Periodically conduct a bulk scan of the table using the DLP API.
B.
Stream all files to Google Cloud, and write batches of the data to BigQuery. While the data is being written to BigQuery, conduct a bulk scan of the data using the DLP API.
C.
Create two buckets of data: Sensitive and Non-sensitive. Write all data to the Non-sensitive bucket. Periodically conduct a bulk scan of that bucket using the DLP API, and move the sensitive data to the Sensitive bucket.
D.
Create three buckets of data: Quarantine, Sensitive, and Non-sensitive. Write all data to the Quarantine bucket. Periodically conduct a bulk scan of that bucket using the DLP API, and move the data to either the Sensitive or Non-sensitive bucket.
Answer:
A
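The quarantine-first flow described in option D can be sketched as follows. This is a rough illustration only: plain dictionaries stand in for Cloud Storage buckets, and a crude e-mail regex stands in for the DLP scan. A real pipeline would call the Cloud DLP API instead; every name below is hypothetical.

```python
import re

# ASSUMPTION: a simple e-mail regex replaces the real Cloud DLP inspection call.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def contains_pii(text):
    """Return True if the stand-in scanner finds something that looks like PII."""
    return EMAIL_PATTERN.search(text) is not None


def route_quarantined(quarantine):
    """Split quarantined files into sensitive and non-sensitive groups after scanning."""
    sensitive, non_sensitive = {}, {}
    for name, content in quarantine.items():
        target = sensitive if contains_pii(content) else non_sensitive
        target[name] = content
    return sensitive, non_sensitive


# Hypothetical files that landed in the quarantine bucket.
quarantine_bucket = {
    "a.txt": "contact: jane@example.com",
    "b.txt": "order 1234 shipped",
}
sensitive_bucket, non_sensitive_bucket = route_quarantined(quarantine_bucket)
```

Nothing becomes readable outside the quarantine area until it has been scanned, which is the point of writing everything there first.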
User Votes:
D: 1 vote
Question 4
You are an ML engineer at a large grocery retailer with stores in multiple regions. You have been asked to create an inventory prediction model. Your model's features include region, location, historical demand, and seasonal popularity. You want the algorithm to learn from new inventory data on a daily basis. Which algorithms should you use to build the model?
Question 5
You trained a text classification model. You have the following SignatureDefs:
You started a TensorFlow Serving component server and tried to send an HTTP request to get a prediction using:
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/text_model:predict', data=data, headers=headers)
What is the correct way to write the predict request?
A.
data = json.dumps({"signature_name": "serving_default", "instances": [['ab', 'bc', 'cd']]})
B.
data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b', 'c', 'd', 'e', 'f']]})
C.
data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b', 'c'], ['d', 'e', 'f']]})
D.
data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b'], ['c', 'd'], ['e', 'f']]})
Answer:
C
User Votes:
D: 1 vote
Jane
3 months, 4 weeks ago
represents a vector with any number of rows but only 2 Columns
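A minimal sketch of building the request body for the TensorFlow Serving REST predict endpoint follows. The SignatureDef itself is not reproduced in the text, so the row width is an assumption taken from the comment above (any number of rows, two columns). The helper name and the `expected_width` parameter are hypothetical.

```python
import json


def build_predict_request(instances, expected_width, signature_name="serving_default"):
    """Serialize a row-format predict request, checking each row against the expected width."""
    for row in instances:
        if len(row) != expected_width:
            raise ValueError(
                f"row {row!r} has {len(row)} values, expected {expected_width}"
            )
    return json.dumps({"signature_name": signature_name, "instances": instances})


# ASSUMPTION: the model input has shape (-1, 2), i.e. two strings per instance.
data = build_predict_request([["a", "b"], ["c", "d"], ["e", "f"]], expected_width=2)
```

The resulting `data` string is what would be passed as the `data=` argument of the `requests.post` call quoted in the question. A row of the wrong width raises `ValueError` before any request is sent.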
Question 6
You work for an online retail company that is creating a visual search engine. You have set up an end-to-end ML pipeline on Google Cloud to classify whether an image contains your company's product. Expecting the release of new products in the near future, you configured a retraining functionality in the pipeline so that new data can be fed into your ML models. You also want to use AI Platform's continuous evaluation service to ensure that the models have high accuracy on your test dataset. What should you do?
A.
Keep the original test dataset unchanged even if newer products are incorporated into retraining.
B.
Extend your test dataset with images of the newer products when they are introduced to retraining.
C.
Replace your test dataset with images of the newer products when they are introduced to retraining.
D.
Update your test dataset with images of the newer products when your evaluation metrics drop below a pre-decided threshold.
Answer:
C
Question 7
You have a functioning end-to-end ML pipeline that involves tuning the hyperparameters of your ML model using AI Platform, and then using the best-tuned parameters for training. Hypertuning is taking longer than expected and is delaying the downstream processes. You want to speed up the tuning job without significantly compromising its effectiveness. Which actions should you take? (Choose two.)
A.
Decrease the number of parallel trials.
B.
Decrease the range of floating-point values.
C.
Set the early stopping parameter to TRUE.
D.
Change the search algorithm from Bayesian search to random search.
E.
Decrease the maximum number of trials during subsequent training phases.
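The options above refer to settings that live in the hyperparameter tuning section of an AI Platform training job. As a rough sketch, the snippet below builds such a configuration as a Python dictionary. The field names follow the AI Platform Training `HyperparameterSpec` as best understood here and should be checked against the official reference; the metric tag, trial counts, and parameter range are hypothetical values.

```python
import json

# ASSUMPTION: all numeric values and the metric name are illustrative only.
hyperparameter_spec = {
    "goal": "MAXIMIZE",
    "hyperparameterMetricTag": "accuracy",  # hypothetical metric reported by the trainer
    "maxTrials": 20,                        # total trial budget
    "maxParallelTrials": 5,                 # trials that run at the same time
    "enableTrialEarlyStopping": True,       # stop trials that are clearly unpromising
    "params": [
        {
            "parameterName": "learning_rate",
            "type": "DOUBLE",
            "minValue": 0.0001,
            "maxValue": 0.1,
            "scaleType": "UNIT_LOG_SCALE",
        }
    ],
}

# Wrap the spec the way a job configuration file nests it.
config_json = json.dumps(
    {"trainingInput": {"hyperparameters": hyperparameter_spec}}, indent=2
)
```

Early stopping and the trial budget are both plain fields of this spec, so they can be adjusted without touching the training code.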
Question 8
You work with a data engineering team that has developed a pipeline to clean your dataset and save it in a Cloud Storage bucket. You have created an ML model and want to use the data to refresh your model as soon as new data is available. As part of your CI/CD workflow, you want to automatically run a Kubeflow Pipelines training job on Google Kubernetes Engine (GKE). How should you architect this workflow?
A.
Configure your pipeline with Dataflow, which saves the files in Cloud Storage. After the file is saved, start the training job on a GKE cluster.
B.
Use App Engine to create a lightweight Python client that continuously polls Cloud Storage for new files. As soon as a file arrives, initiate the training job.
C.
Configure a Cloud Storage trigger to send a message to a Pub/Sub topic when a new file is available in a storage bucket. Use a Pub/Sub-triggered Cloud Function to start the training job on a GKE cluster.
D.
Use Cloud Scheduler to schedule jobs at a regular interval. For the first step of the job, check the timestamp of objects in your Cloud Storage bucket. If there are no new files since the last run, abort the job.
Answer:
C
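The event-driven flow in option C can be sketched as a Pub/Sub-triggered handler. This is an illustration under assumptions: the message layout follows Cloud Storage Pub/Sub notifications (an `eventType` attribute and a base64-encoded JSON payload with `bucket` and `name`), and `start_training_job` is a hypothetical placeholder for whatever call launches the Kubeflow Pipelines run.

```python
import base64
import json


def start_training_job(gcs_uri):
    """Hypothetical placeholder: a real version would launch the Kubeflow Pipelines run."""
    return {"status": "submitted", "input": gcs_uri}


def on_new_file(event, context=None):
    """Handle one Pub/Sub message and start training only for newly written objects."""
    attributes = event.get("attributes", {})
    # Ignore deletions, metadata updates, and anything else that is not a new object.
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    gcs_uri = f"gs://{payload['bucket']}/{payload['name']}"
    return start_training_job(gcs_uri)


# Hypothetical message, shaped like a Cloud Storage notification.
message = {
    "attributes": {"eventType": "OBJECT_FINALIZE"},
    "data": base64.b64encode(
        json.dumps({"bucket": "clean-data", "name": "2024/part-0.csv"}).encode("utf-8")
    ).decode("ascii"),
}
result = on_new_file(message)
```

Because the handler runs only when a notification arrives, no polling loop or fixed schedule is needed.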
Question 9
You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?
A.
Significantly increase the max_batch_size TensorFlow Serving parameter.
B.
Switch to the tensorflow-model-server-universal version of TensorFlow Serving.
C.
Significantly increase the max_enqueued_batches TensorFlow Serving parameter.
D.
Recompile TensorFlow Serving using the source to support CPU-specific optimizations. Instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes.
Answer:
D
Question 10
You are an ML engineer at a regulated insurance company. You are asked to develop an insurance approval model that accepts or rejects insurance applications from potential customers. What factors should you consider before building the model?
A.
Redaction, reproducibility, and explainability
B.
Traceability, reproducibility, and explainability
C.
Federated learning, reproducibility, and explainability
D.
Differential privacy, federated learning, and explainability
Answer:
A
Question 11
Your team is building an application for a global bank that will be used by millions of customers. You built a forecasting model that predicts customers' account balances 3 days in the future. Your team will use the results in a new feature that will notify users when their account balance is likely to drop below $25. How should you serve your predictions?
A.
1. Create a Pub/Sub topic for each user. 2. Deploy a Cloud Function that sends a notification when your model predicts that a user's account balance will drop below the $25 threshold.
B.
1. Create a Pub/Sub topic for each user. 2. Deploy an application on the App Engine standard environment that sends a notification when your model predicts that a user's account balance will drop below the $25 threshold.
C.
1. Build a notification system on Firebase. 2. Register each user with a user ID on the Firebase Cloud Messaging server, which sends a notification when the average of all account balance predictions drops below the $25 threshold.
D.
1. Build a notification system on Firebase. 2. Register each user with a user ID on the Firebase Cloud Messaging server, which sends a notification when your model predicts that a user's account balance will drop below the $25 threshold.
Answer:
A
User Votes:
D: 1 vote
Question 12
You started working on a classification problem with time series data and achieved an area under the receiver operating characteristic curve (AUC ROC) value of 99% for training data after just a few experiments. You haven't explored using any sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?
A.
Address the model overfitting by using a less complex algorithm.
B.
Address data leakage by applying nested cross-validation during model training.
C.
Address data leakage by removing features highly correlated with the target value.
D.
Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.
Answer:
B
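One way to picture nested cross-validation on time series data is with expanding-window splits, where training indices always precede the indices they are evaluated on. The sketch below is a simplified illustration, not a prescribed method: both function names, the fold counts, and the equal fold sizes are assumptions.

```python
def expanding_window_splits(n_samples, n_folds):
    """Yield (train, test) index lists in which every train index precedes every test index."""
    fold_size = n_samples // (n_folds + 1)
    for fold in range(1, n_folds + 1):
        train_end = fold * fold_size
        test_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


def nested_splits(n_samples, outer_folds, inner_folds):
    """For each outer fold, also build inner folds drawn only from the outer training window."""
    for outer_train, outer_test in expanding_window_splits(n_samples, outer_folds):
        inner = [
            ([outer_train[i] for i in tr], [outer_train[i] for i in va])
            for tr, va in expanding_window_splits(len(outer_train), inner_folds)
        ]
        yield outer_train, outer_test, inner


# Hypothetical sizes: 12 time-ordered samples, 2 outer folds, 1 inner fold each.
folds = list(nested_splits(n_samples=12, outer_folds=2, inner_folds=1))
```

Model selection would use only the inner folds, and the outer test window would be touched once per fold, so no future observation leaks into tuning or training.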
Question 13
You are training an LSTM-based model on AI Platform to summarize text using the following job submission script:
gcloud ai-platform jobs submit training $JOB_NAME \
  --package-path $TRAINER_PACKAGE_PATH \
  --module-name $MAIN_TRAINER_MODULE \
  --job-dir $JOB_DIR \
  --region $REGION \
  --scale-tier basic \
  -- \
  --epochs 20 \
  --batch_size=32 \
  --learning_rate=0.001
You want to ensure that training time is minimized without significantly compromising the accuracy of your model. What should you do?
A.
Modify the ‘epochs’ parameter.
B.
Modify the ‘scale-tier’ parameter.
C.
Modify the ‘batch size’ parameter.
D.
Modify the ‘learning rate’ parameter.
Answer:
C
Question 14
You are developing models to classify customer support emails. You created models with TensorFlow Estimators using small datasets on your on-premises system, but you now need to train the models using large datasets to ensure high performance. You will port your models to Google Cloud and want to minimize code refactoring and infrastructure overhead for easier migration from on-prem to cloud. What should you do?
A.
Use AI Platform for distributed training.
B.
Create a cluster on Dataproc for training.
C.
Create a Managed Instance Group with autoscaling.
D.
Use Kubeflow Pipelines to train on a Google Kubernetes Engine cluster.
Answer:
C
Question 15
Your data science team needs to rapidly experiment with various features, model architectures, and hyperparameters. They need to track the accuracy metrics for various experiments and use an API to query the metrics over time. What should they use to track and report their experiments while minimizing manual effort?
A.
Use Kubeflow Pipelines to execute the experiments. Export the metrics file, and query the results using the Kubeflow Pipelines API.
B.
Use AI Platform Training to execute the experiments. Write the accuracy metrics to BigQuery, and query the results using the BigQuery API.
C.
Use AI Platform Training to execute the experiments. Write the accuracy metrics to Cloud Monitoring, and query the results using the Monitoring API.
D.
Use AI Platform Notebooks to execute the experiments. Collect the results in a shared Google Sheets file, and query the results using the Google Sheets API.