Amazon AWS Certified Machine Learning - Specialty (MLS-C01) Practice Test
Last exam update: Nov 16, 2024
Page 1 out of 33. Viewing questions 1-10 out of 327
Question 1
A company stores its documents in Amazon S3 with no predefined product categories. A data scientist needs to build a machine learning model to categorize the documents for all the company's products.
Which solution will meet these requirements with the MOST operational efficiency?
A.
Build a custom clustering model. Create a Dockerfile and build a Docker image. Register the Docker image in Amazon Elastic Container Registry (Amazon ECR). Use the custom image in Amazon SageMaker to generate a trained model.
B.
Tokenize the data and transform the data into tabular data. Train an Amazon SageMaker k-means model to generate the product categories.
C.
Train an Amazon SageMaker Neural Topic Model (NTM) model to generate the product categories.
D.
Train an Amazon SageMaker BlazingText model to generate the product categories.
Answer:
C
Question 2
An online store is predicting future book sales by using a linear regression model that is based on past sales data. The data includes duration, a numerical feature that represents the number of days that a book has been listed in the online store. A data scientist performs an exploratory data analysis and discovers that the relationship between book sales and duration is skewed and non-linear.
Which data transformation step should the data scientist take to improve the predictions of the model?
A.
One-hot encoding
B.
Cartesian product transformation
C.
Quantile binning
D.
Normalization
Answer:
C
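Quantile binning (option C) replaces a skewed numeric feature with equal-frequency bin indexes. The sketch below uses only the Python standard library (`statistics.quantiles` for the cut points); `quantile_bin` is a hypothetical helper and the durations are invented for illustration.

```python
import bisect
import statistics


def quantile_bin(values, n_bins=4):
    """Assign each value an equal-frequency bin index in 0..n_bins-1."""
    # n_bins - 1 cut points that split the data into roughly equal-count groups
    cuts = statistics.quantiles(values, n=n_bins)
    # bisect_right counts how many cut points are <= the value, i.e. its bin index
    bins = [bisect.bisect_right(cuts, v) for v in values]
    return bins, cuts


# Invented, right-skewed listing durations in days
durations = [1, 2, 2, 3, 5, 8, 13, 30, 90, 365]
bins, cuts = quantile_bin(durations)
print(cuts)  # [2.0, 6.5, 45.0]
print(bins)  # [0, 1, 1, 1, 1, 2, 2, 2, 3, 3]
```

Each bin index can then be treated as a category, so a linear model can learn a separate weight per duration range and approximate the non-linear relationship piecewise.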
Question 3
A financial services company wants to adopt Amazon SageMaker as its default data science environment. The company's data scientists run machine learning (ML) models on confidential financial data. The company is worried about data egress and wants an ML engineer to secure the environment. Which mechanisms can the ML engineer use to control data egress from SageMaker? (Choose three.)
A.
Connect to SageMaker by using a VPC interface endpoint powered by AWS PrivateLink.
B.
Use SCPs to restrict access to SageMaker.
C.
Disable root access on the SageMaker notebook instances.
D.
Enable network isolation for training jobs and models.
E.
Restrict notebook presigned URLs to specific IPs used by the company.
F.
Protect data with encryption at rest and in transit. Use AWS Key Management Service (AWS KMS) to manage encryption keys.
Answer:
A, D, E
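Option E is usually enforced with an IAM policy that denies presigned notebook URL requests coming from outside approved address ranges. A minimal sketch, assuming the documentation-only CIDR ranges below stand in for the company's addresses and that `presigned_url_ip_policy` is a hypothetical helper; it only builds the policy document and does not attach it to any user or role.

```python
import json


def presigned_url_ip_policy(allowed_cidrs):
    """Build an IAM policy denying notebook presigned URLs from outside allowed_cidrs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "sagemaker:CreatePresignedNotebookInstanceUrl",
                "Resource": "*",
                # Deny applies whenever the caller's source IP is NOT in the list
                "Condition": {"NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)}},
            }
        ],
    }


# Hypothetical company ranges (RFC 5737 documentation addresses)
policy = presigned_url_ip_policy(["192.0.2.0/24", "198.51.100.0/24"])
print(json.dumps(policy, indent=2))
```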
Question 4
A finance company needs to forecast the price of a commodity. The company has compiled a dataset of historical daily prices. A data scientist must train various forecasting models on 80% of the dataset and must validate the efficacy of those models on the remaining 20% of the dataset.
How should the data scientist split the dataset into a training dataset and a validation dataset to compare model performance?
A.
Pick a date so that 80% of the data points precede the date. Assign that group of data points as the training dataset. Assign all the remaining data points to the validation dataset.
B.
Pick a date so that 80% of the data points occur after the date. Assign that group of data points as the training dataset. Assign all the remaining data points to the validation dataset.
C.
Starting from the earliest date in the dataset, pick eight data points for the training dataset and two data points for the validation dataset. Repeat this stratified sampling until no data points remain.
D.
Sample data points randomly without replacement so that 80% of the data points are in the training dataset. Assign all the remaining data points to the validation dataset.
Answer:
A
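For time-series forecasting, every training point should precede every validation point so that no future prices leak into training. A minimal standard-library sketch of that chronological 80/20 split; `chronological_split` is a hypothetical helper and the daily prices are synthetic.

```python
from datetime import date, timedelta


def chronological_split(records, train_fraction=0.8):
    """Split (date, price) records so all training dates precede all validation dates."""
    ordered = sorted(records, key=lambda r: r[0])  # oldest first
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Synthetic daily prices for 10 consecutive days
start = date(2024, 1, 1)
prices = [(start + timedelta(days=i), 100.0 + i) for i in range(10)]
train, valid = chronological_split(prices)
print(len(train), len(valid))      # 8 2
print(train[-1][0] < valid[0][0])  # True
```

A random split (option D) or interleaved sampling (option C) would mix later dates into the training data, which overstates accuracy on truly unseen future periods.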
Question 5
A company will use Amazon SageMaker to train and host a machine learning model for a marketing campaign. The data must be encrypted at rest. Most of the data is sensitive customer data. The company wants AWS to maintain the root of trust for the encryption keys and wants key usage to be logged.
Which solution will meet these requirements with the LEAST operational overhead?
A.
Use AWS Security Token Service (AWS STS) to create temporary tokens to encrypt the storage volumes for all SageMaker instances and to encrypt the model artifacts and data in Amazon S3.
B.
Use customer managed keys in AWS Key Management Service (AWS KMS) to encrypt the storage volumes for all SageMaker instances and to encrypt the model artifacts and data in Amazon S3.
C.
Use encryption keys stored in AWS CloudHSM to encrypt the storage volumes for all SageMaker instances and to encrypt the model artifacts and data in Amazon S3.
D.
Use SageMaker built-in transient keys to encrypt the storage volumes for all SageMaker instances. Enable default encryption for new Amazon Elastic Block Store (Amazon EBS) volumes.
Answer:
B
Question 6
A data scientist is training a large PyTorch model by using Amazon SageMaker. It takes 10 hours on average to train the model on GPU instances. The data scientist suspects that training is not converging and that resource utilization is not optimal.
What should the data scientist do to identify and address training issues with the LEAST development effort?
A.
Use CPU utilization metrics that are captured in Amazon CloudWatch. Configure a CloudWatch alarm to stop the training job early if low CPU utilization occurs.
B.
Use high-resolution custom metrics that are captured in Amazon CloudWatch. Configure an AWS Lambda function to analyze the metrics and to stop the training job early if issues are detected.
C.
Use the SageMaker Debugger vanishing_gradient and LowGPUUtilization built-in rules to detect issues and to launch the StopTrainingJob action if issues are detected.
D.
Use the SageMaker Debugger confusion and feature_importance_overweight built-in rules to detect issues and to launch the StopTrainingJob action if issues are detected.
Answer:
C
Question 7
A company is training machine learning (ML) models on Amazon SageMaker by using 200 TB of data that is stored in Amazon S3 buckets. The training data consists of individual files that are each larger than 200 MB in size. The company needs a data access solution that offers the shortest processing time and the least amount of setup.
Which solution will meet these requirements?
A.
Use File mode in SageMaker to copy the dataset from the S3 buckets to the ML instance storage.
B.
Create an Amazon FSx for Lustre file system. Link the file system to the S3 buckets.
C.
Create an Amazon Elastic File System (Amazon EFS) file system. Mount the file system to the training instances.
D.
Use FastFile mode in SageMaker to stream the files on demand from the S3 buckets.
Answer:
D
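A minimal sketch of how a training input channel requests FastFile mode, assuming the request shape of boto3's `create_training_job`, where a channel's `InputMode` accepts `File`, `Pipe`, or `FastFile`. The bucket URI is hypothetical and `fastfile_channel` is an illustrative helper; the resulting dict would be one entry of `InputDataConfig`.

```python
def fastfile_channel(channel_name, s3_uri):
    """Return one InputDataConfig entry that reads S3 objects in FastFile mode."""
    return {
        "ChannelName": channel_name,
        # FastFile exposes S3 objects through a file interface and streams them on demand,
        # so training can start without first copying the whole dataset to instance storage.
        "InputMode": "FastFile",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


channel = fastfile_channel("train", "s3://example-bucket/training-data/")  # hypothetical bucket
print(channel["InputMode"])  # FastFile
```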
Question 8
When submitting Amazon SageMaker training jobs using one of the built-in algorithms, which common parameters MUST be specified? (Choose three.)
A.
The training channel identifying the location of training data on an Amazon S3 bucket.
B.
The validation channel identifying the location of validation data on an Amazon S3 bucket.
C.
The IAM role that Amazon SageMaker can assume to perform tasks on behalf of the users.
D.
Hyperparameters in a JSON array as documented for the algorithm used.
E.
The Amazon EC2 instance class specifying whether training will be run using CPU or GPU.
F.
The output path specifying where on an Amazon S3 bucket the trained model will persist.
Answer:
C, E, F
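To see where each option's parameter lives, the sketch below builds a dict shaped like the keyword arguments of boto3's SageMaker `create_training_job`. In that API, `RoleArn`, `OutputDataConfig`, and `ResourceConfig` are among the required fields. The job name, image URI, role ARN, and S3 paths are hypothetical placeholders, `build_training_job_request` is an illustrative helper, and the request is only built, not submitted.

```python
def build_training_job_request(job_name, image_uri, role_arn, train_s3_uri,
                               output_s3_uri, instance_type="ml.m5.xlarge",
                               hyperparameters=None):
    """Return a dict shaped like create_training_job keyword arguments."""
    request = {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        # IAM role that SageMaker assumes to read data and write model artifacts (option C)
        "RoleArn": role_arn,
        # Training channel pointing at data in S3 (option A)
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        # Where the trained model artifacts are written (option F)
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Instance type decides CPU vs GPU training (option E)
        "ResourceConfig": {"InstanceType": instance_type, "InstanceCount": 1, "VolumeSizeInGB": 30},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
    if hyperparameters:
        # HyperParameters is a string-to-string map, not a JSON array (option D)
        request["HyperParameters"] = {k: str(v) for k, v in hyperparameters.items()}
    return request


request = build_training_job_request(
    job_name="example-training-job",  # hypothetical
    image_uri="<built-in algorithm image URI>",  # placeholder
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",  # hypothetical
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
    hyperparameters={"k": 10, "feature_dim": 50},
)
# Submitting would look like: boto3.client("sagemaker").create_training_job(**request)
print(sorted(request))
```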
Question 9
A machine learning specialist stores IoT soil sensor data in an Amazon DynamoDB table and stores weather event data as JSON files in Amazon S3. The dataset in DynamoDB is 10 GB in size and the dataset in Amazon S3 is 5 GB in size. The specialist wants to train a model on this data to help predict soil moisture levels as a function of weather events using Amazon SageMaker. Which solution will accomplish the necessary transformation to train the Amazon SageMaker model with the LEAST amount of administrative overhead?
A.
Launch an Amazon EMR cluster. Create an Apache Hive external table for the DynamoDB table and S3 data. Join the Hive tables and write the results out to Amazon S3.
B.
Crawl the data using AWS Glue crawlers. Write an AWS Glue ETL job that merges the two tables and writes the output to an Amazon Redshift cluster.
C.
Enable Amazon DynamoDB Streams on the sensor table. Write an AWS Lambda function that consumes the stream and appends the results to the existing weather files in Amazon S3.
D.
Crawl the data using AWS Glue crawlers. Write an AWS Glue ETL job that merges the two tables and writes the output in CSV format to Amazon S3.
Answer:
D
Question 10
A power company wants to forecast future energy consumption for its customers in residential properties and commercial business properties. Historical power consumption data for the last 10 years is available. A team of data scientists who performed the initial data analysis and feature selection will include the historical power consumption data and data such as weather, number of individuals on the property, and public holidays. The data scientists are using Amazon Forecast to generate the forecasts. Which algorithm in Forecast should the data scientists use to meet these requirements?
A.
Autoregressive Integrated Moving Average (ARIMA)
B.
Exponential Smoothing (ETS)
C.
Convolutional Neural Network - Quantile Regression (CNN-QR)