Amazon AWS Certified Data Analytics - Specialty practice test

Last exam update: Jan 14, 2025
Page 1 out of 11. Viewing questions 1-15 out of 164

Question 1

A company wants to collect and process events data from different departments in near-real time. Before storing the data in
Amazon S3, the company needs to clean the data by standardizing the format of the address and timestamp columns. The
data varies in size based on the overall load at each particular point in time. A single data record can be 100 KB-10 MB.
How should a data analytics specialist design the solution for data ingestion?

  • A. Use Amazon Kinesis Data Streams. Configure a stream for the raw data. Use a Kinesis Agent to write data to the stream. Create an Amazon Kinesis Data Analytics application that reads data from the raw stream, cleanses it, and stores the output to Amazon S3.
  • B. Use Amazon Kinesis Data Firehose. Configure a Firehose delivery stream with a preprocessing AWS Lambda function for data cleansing. Use a Kinesis Agent to write data to the delivery stream. Configure Kinesis Data Firehose to deliver the data to Amazon S3.
  • C. Use Amazon Managed Streaming for Apache Kafka. Configure a topic for the raw data. Use a Kafka producer to write data to the topic. Create an application on Amazon EC2 that reads data from the topic by using the Apache Kafka consumer API, cleanses the data, and writes to Amazon S3.
  • D. Use Amazon Simple Queue Service (Amazon SQS). Configure an AWS Lambda function to read events from the SQS queue and upload the events to Amazon S3.
Answer:

C
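The record sizes decide this question. A minimal sketch, assuming decimal units for the 100 KB-10 MB range, that flags records larger than the documented Kinesis Data Firehose limit of 1,000 KiB per record (before base64 encoding). The helper name `oversized_records` is hypothetical.

```python
# Documented Kinesis Data Firehose limit: 1,000 KiB per record, before base64 encoding.
FIREHOSE_MAX_RECORD_BYTES = 1_000 * 1024

def oversized_records(record_sizes_bytes, limit=FIREHOSE_MAX_RECORD_BYTES):
    """Return the record sizes that would not fit into a single record under `limit`."""
    return [size for size in record_sizes_bytes if size > limit]

# Sample sizes from the question's range (decimal KB/MB assumed).
sizes = [100_000, 900_000, 2_000_000, 10_000_000]
print(oversized_records(sizes))  # [2000000, 10000000]
```

Records that fail this check cannot be sent to Firehose as single records. This is why a service with a configurable maximum message size, such as Amazon MSK (the Kafka `message.max.bytes` setting), is the usual fit for multi-megabyte events.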


Question 2

A retail company is using an Amazon S3 bucket to host an ecommerce data lake. The company is using AWS Lake
Formation to manage the data lake.
A data analytics specialist must provide access to a new business analyst team. The team will use Amazon Athena from the
AWS Management Console to query data from existing web_sales and customer tables in the ecommerce database. The
team needs read-only access and the ability to uniquely identify customers by using first and last names. However, the team
must not be able to see any other personally identifiable data. The table structure is as follows:

Which combination of steps should the data analytics specialist take to provide the required permission by using the principle
of least privilege? (Choose three.)

  • A. In AWS Lake Formation, grant the business_analyst group SELECT and ALTER permissions for the web_sales table.
  • B. In AWS Lake Formation, grant the business_analyst group the SELECT permission for the web_sales table.
  • C. In AWS Lake Formation, grant the business_analyst group the SELECT permission for the customer table. Under columns, choose filter type Include columns with columns first_name, last_name, and customer_id.
  • D. In AWS Lake Formation, grant the business_analyst group SELECT and ALTER permissions for the customer table. Under columns, choose filter type Include columns with columns first_name and last_name.
  • E. Create users under a business_analyst IAM group. Create a policy that allows the lakeformation:GetDataAccess action, the athena:* action, and the glue:Get* action.
  • F. Create users under a business_analyst IAM group. Create a policy that allows the lakeformation:GetDataAccess action, the athena:* action, and the glue:Get* action. In addition, allow the s3:GetObject action, the s3:PutObject action, and the s3:GetBucketLocation action for the Athena query results S3 bucket.
Answer:

B D F
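A sketch of the read-only grants and the analyst IAM policy as plain Python dictionaries. The request bodies follow the shape of the boto3 `lakeformation.grant_permissions` call and a standard IAM policy document. The principal ARN, the results bucket, and the customer column names are assumptions, since the table structure is not reproduced here.

```python
import json

# Hypothetical identifiers; replace with real values.
ANALYST_PRINCIPAL_ARN = "arn:aws:iam::111122223333:role/business_analyst"
RESULTS_BUCKET = "example-athena-query-results"

def lake_formation_grants(principal_arn):
    """Request bodies for lakeformation.grant_permissions: SELECT only, no ALTER."""
    principal = {"DataLakePrincipalIdentifier": principal_arn}
    web_sales = {
        "Principal": principal,
        "Resource": {"Table": {"DatabaseName": "ecommerce", "Name": "web_sales"}},
        "Permissions": ["SELECT"],
    }
    customer = {
        "Principal": principal,
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": "ecommerce",
                "Name": "customer",
                # Column-level include filter: only these columns are visible.
                "ColumnNames": ["customer_id", "first_name", "last_name"],
            }
        },
        "Permissions": ["SELECT"],
    }
    return [web_sales, customer]

def analyst_iam_policy(results_bucket):
    """IAM policy: Lake Formation data access, Athena, Glue reads, and the results bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["lakeformation:GetDataAccess", "athena:*", "glue:Get*"],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:GetBucketLocation"],
                "Resource": [
                    f"arn:aws:s3:::{results_bucket}",
                    f"arn:aws:s3:::{results_bucket}/*",
                ],
            },
        ],
    }

if __name__ == "__main__":
    for request in lake_formation_grants(ANALYST_PRINCIPAL_ARN):
        # With boto3: boto3.client("lakeformation").grant_permissions(**request)
        print(json.dumps(request, indent=2))
    print(json.dumps(analyst_iam_policy(RESULTS_BUCKET), indent=2))
```

Athena writes query results to S3, so the results bucket needs write access even though the data lake access itself stays read-only.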

Discussions
kousik.cemk
1 year ago

Why is option D suggested as an answer? The question clearly says read-only access is required, but option D includes ALTER access.


Question 3

A media analytics company consumes a stream of social media posts. The posts are sent to an Amazon Kinesis data stream
partitioned on user_id. An AWS Lambda function retrieves the records and validates the content before loading the posts
into an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster. The validation process needs to receive the
posts for a given user in the order they were received by the Kinesis data stream.
During peak hours, the social media posts take more than an hour to appear in the Amazon OpenSearch Service (Amazon
ES) cluster. A data analytics specialist must implement a solution that reduces this latency with the least possible operational
overhead.
Which solution meets these requirements?

  • A. Migrate the validation process from Lambda to AWS Glue.
  • B. Migrate the Lambda consumers from standard data stream iterators to an HTTP/2 stream consumer.
  • C. Increase the number of shards in the Kinesis data stream.
  • D. Send the posts stream to Amazon Managed Streaming for Apache Kafka instead of the Kinesis data stream.
Answer:

C


Explanation:
For real-time processing of streaming data, Amazon Kinesis partitions data into multiple shards that can then be consumed by multiple Amazon EC2 instances.
Reference: https://d1.awsstatic.com/whitepapers/AWS_Cloud_Best_Practices.pdf
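Kinesis maps each partition key to a 128-bit hash key with MD5, and each shard owns a contiguous range of hash keys. The sketch below assumes evenly split shards and a hypothetical set of user IDs. It shows that a given user_id always lands on the same shard, so per-user ordering is kept, while more shards spread users more thinly. Because Lambda by default processes one batch per shard at a time, more shards means more concurrent validation.

```python
import hashlib

HASH_SPACE = 2 ** 128  # Kinesis hash keys are 128-bit integers

def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard that owns this key, assuming evenly split hash key ranges."""
    # Partition key -> MD5 digest -> 128-bit integer hash key.
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = HASH_SPACE // shard_count
    # Clamp so the last shard absorbs any remainder of the hash space.
    return min(hash_key // range_size, shard_count - 1)

users = [f"user-{i}" for i in range(1000)]
for shards in (2, 8):
    counts = [0] * shards
    for user in users:
        counts[shard_for(user, shards)] += 1
    print(shards, "shards ->", counts)
```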


Question 4

A company provides an incentive to users who are physically active. The company wants to determine how active the users
are by using an application on their mobile devices to track the number of steps they take each day. The company needs to
ingest and perform near-real-time analytics on live data. The processed data must be stored and must remain available for 1
year for analytics purposes.
Which solution will meet these requirements with the LEAST operational overhead?

  • A. Use Amazon Cognito to write the data from the application to Amazon DynamoDB. Use an AWS Step Functions workflow to create a transient Amazon EMR cluster every hour and process the new data from DynamoDB. Output the processed data to Amazon Redshift for analytics. Archive the data from Amazon Redshift after 1 year.
  • B. Ingest the data into Amazon DynamoDB by using an Amazon API Gateway API as a DynamoDB proxy. Use an AWS Step Functions workflow to create a transient Amazon EMR cluster every hour and process the new data from DynamoDB. Output the processed data to Amazon Redshift to run analytics calculations. Archive the data from Amazon Redshift after 1 year.
  • C. Ingest the data into Amazon Kinesis Data Streams by using an Amazon API Gateway API as a Kinesis proxy. Run Amazon Kinesis Data Analytics on the stream data. Output the processed data into Amazon S3 by using Amazon Kinesis Data Firehose. Use Amazon Athena to run analytics calculations. Use S3 Lifecycle rules to transition objects to S3 Glacier after 1 year.
  • D. Write the data from the application into Amazon S3 by using Amazon Kinesis Data Firehose. Use Amazon Athena to run the analytics on the data in Amazon S3. Use S3 Lifecycle rules to transition objects to S3 Glacier after 1 year.
Answer:

C
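The 1-year retention in options C and D maps to an S3 Lifecycle rule. A minimal sketch of the lifecycle configuration body accepted by boto3's `put_bucket_lifecycle_configuration`; the `processed/` prefix and the bucket name are hypothetical.

```python
import json

def glacier_after_one_year_rule(prefix: str = "processed/") -> dict:
    """Lifecycle configuration that transitions objects to S3 Glacier after 365 days."""
    return {
        "Rules": [
            {
                "ID": "archive-after-1-year",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": 365, "StorageClass": "GLACIER"}],
            }
        ]
    }

config = glacier_after_one_year_rule()
# With boto3 (hypothetical bucket name):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-steps-data", LifecycleConfiguration=config)
print(json.dumps(config, indent=2))
```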


Question 5

A retail company stores order invoices in an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster. Indices
on the cluster are created monthly. Once a new month begins, no new writes are made to any of the indices from the
previous months. The company has been expanding the storage on the Amazon OpenSearch Service (Amazon
Elasticsearch Service) cluster to avoid running out of space, but the company wants to reduce costs. Most searches on the
cluster are on the most recent 3 months of data, while the audit team requires infrequent access to older data to generate
periodic reports. The most recent 3 months of data must be quickly available for queries, but the audit team can tolerate
slower queries if the solution saves on cluster costs.
Which of the following is the MOST operationally efficient solution to meet these requirements?

  • A. Archive indices that are older than 3 months by using Index State Management (ISM) to create a policy to store the indices in Amazon S3 Glacier. When the audit team requires the archived data, restore the archived indices back to the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster.
  • B. Archive indices that are older than 3 months by taking manual snapshots and storing the snapshots in Amazon S3. When the audit team requires the archived data, restore the archived indices back to the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster.
  • C. Archive indices that are older than 3 months by using Index State Management (ISM) to create a policy to migrate the indices to Amazon OpenSearch Service (Amazon Elasticsearch Service) UltraWarm storage.
  • D. Archive indices that are older than 3 months by using Index State Management (ISM) to create a policy to migrate the indices to Amazon OpenSearch Service (Amazon Elasticsearch Service) UltraWarm storage. When the audit team requires the older data, migrate the indices in UltraWarm storage back to hot storage.
Answer:

C


Explanation:
Reference: https://docs.aws.amazon.com/da_pv/opensearch-service/latest/developerguide/opensearch-service-dg.pdf
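A sketch of an Index State Management policy for the UltraWarm approach, built as a Python dictionary. It assumes monthly indices matching a hypothetical `invoices-*` pattern and uses an index age of 90 days as a stand-in for "older than 3 months". The `warm_migration` action is the Amazon OpenSearch Service ISM action that moves an index to UltraWarm.

```python
import json

def ultrawarm_policy(hot_days: int = 90) -> dict:
    """ISM policy: keep indices in hot storage for `hot_days`, then migrate to UltraWarm."""
    return {
        "policy": {
            "description": "Move monthly invoice indices to UltraWarm after 3 months",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {"state_name": "warm",
                         "conditions": {"min_index_age": f"{hot_days}d"}}
                    ],
                },
                {
                    "name": "warm",
                    # Amazon OpenSearch Service action that moves the index to UltraWarm.
                    "actions": [{"warm_migration": {}}],
                    "transitions": [],
                },
            ],
        }
    }

print(json.dumps(ultrawarm_policy(), indent=2))
```

On OpenSearch domains the body can be created with `PUT _plugins/_ism/policies/<policy_id>` and attached to existing indices such as `invoices-*` with `POST _plugins/_ism/add/<index>`. UltraWarm indices remain searchable as read-only indices, so infrequent audit queries can run against them in place, only more slowly.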


Question 6

A large ride-sharing company has thousands of drivers globally serving millions of unique customers every day. The
company has decided to migrate an existing data mart to Amazon Redshift. The existing schema includes the following
tables.
A trips fact table for information on completed rides.

A drivers dimension table for driver profiles.

A customers fact table holding customer profile information.

The company analyzes trip details by date and destination to examine profitability by region. The drivers data rarely
changes. The customers data frequently changes.
What table design provides optimal query performance?

  • A. Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers and customers tables.
  • B. Use DISTSTYLE EVEN for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table.
  • C. Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table.
  • D. Use DISTSTYLE EVEN for the drivers table and sort by date. Use DISTSTYLE ALL for both fact tables.
Answer:

A
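A sketch that renders the table attributes described in option C as Amazon Redshift DDL strings. Column lists are hypothetical, since the question gives no schemas. DISTSTYLE ALL keeps a full copy of a table on every node. That suits the rarely changing drivers table but multiplies the work of every insert or update on the frequently changing customers table.

```python
# Hypothetical columns; the question does not give the schemas.
TABLES = {
    "trips": (
        "trip_id BIGINT, driver_id INT, customer_id BIGINT, "
        "destination VARCHAR(64), trip_date DATE, fare DECIMAL(10,2)",
        # Co-locate rows by destination and sort by date for regional, date-bounded queries.
        "DISTSTYLE KEY DISTKEY (destination) SORTKEY (trip_date)",
    ),
    # Small, rarely changing table: replicate a full copy to every node.
    "drivers": (
        "driver_id INT, driver_name VARCHAR(128), home_region VARCHAR(64)",
        "DISTSTYLE ALL",
    ),
    # Frequently changing table: spread rows evenly instead of replicating them.
    "customers": (
        "customer_id BIGINT, customer_name VARCHAR(128), email VARCHAR(256)",
        "DISTSTYLE EVEN",
    ),
}

def create_table_sql(name: str) -> str:
    """Render a CREATE TABLE statement with Redshift distribution and sort attributes."""
    columns, attributes = TABLES[name]
    return f"CREATE TABLE {name} ({columns}) {attributes};"

for table in TABLES:
    print(create_table_sql(table))
```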

Discussions
kousik.cemk
1 year ago

The customers data changes frequently, so DISTSTYLE ALL is not the right choice, as suggested in option A. The correct answer should be C.


Question 7

A technology company is creating a dashboard that will visualize and analyze time-sensitive data. The data will come in
through Amazon Kinesis Data Firehose with the buffer interval set to 60 seconds. The dashboard must support near-real-time data.
Which visualization solution will meet these requirements?

  • A. Select Amazon OpenSearch Service (Amazon Elasticsearch Service) as the endpoint for Kinesis Data Firehose. Set up OpenSearch Dashboards (Kibana) using the data in Amazon OpenSearch Service (Amazon ES) with the desired analyses and visualizations.
  • B. Select Amazon S3 as the endpoint for Kinesis Data Firehose. Read data into an Amazon SageMaker Jupyter notebook and carry out the desired analyses and visualizations.
  • C. Select Amazon Redshift as the endpoint for Kinesis Data Firehose. Connect Amazon QuickSight with SPICE to Amazon Redshift to create the desired analyses and visualizations.
  • D. Select Amazon S3 as the endpoint for Kinesis Data Firehose. Use AWS Glue to catalog the data and Amazon Athena to query it. Connect Amazon QuickSight with SPICE to Athena to create the desired analyses and visualizations.
Answer:

A


Question 8

A data analytics specialist is setting up workload management in manual mode for an Amazon Redshift environment. The
data analytics specialist is defining query monitoring rules to manage system performance and user experience of an
Amazon Redshift cluster.
Which elements must each query monitoring rule include?

  • A. A unique rule name, a query runtime condition, and an AWS Lambda function to resubmit any failed queries in off hours
  • B. A queue name, a unique rule name, and a predicate-based stop condition
  • C. A unique rule name, one to three predicates, and an action
  • D. A workload name, a unique rule name, and a query runtime-based condition
Answer:

C


Explanation:
Reference: https://docs.aws.amazon.com/redshift/latest/dg/cm-c-wlm-query-monitoring-rules.html
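The rule shape from the referenced documentation (a unique name, one to three predicates, and an action) can be sketched as a small builder. The JSON keys follow the rule format of the `wlm_json_configuration` parameter; the rule name and thresholds are hypothetical, and the action check covers only the manual-WLM actions log, hop, and abort.

```python
import json

# Actions for query monitoring rules under manual WLM.
MANUAL_WLM_ACTIONS = {"log", "hop", "abort"}

def make_rule(name, predicates, action):
    """Build one rule: a name, one to three (metric, operator, value) predicates, an action."""
    if not 1 <= len(predicates) <= 3:
        raise ValueError("a rule needs one to three predicates")
    if action not in MANUAL_WLM_ACTIONS:
        raise ValueError(f"unsupported action for manual WLM: {action}")
    return {
        "rule_name": name,
        "predicate": [
            {"metric_name": metric, "operator": op, "value": value}
            for metric, op, value in predicates
        ],
        "action": action,
    }

def check_unique_names(rules):
    """Rule names must be unique within the WLM configuration."""
    names = [rule["rule_name"] for rule in rules]
    if len(names) != len(set(names)):
        raise ValueError("duplicate rule name")
    return rules

rule = make_rule(
    "abort_long_scans",
    [("query_execution_time", ">", 600), ("scan_row_count", ">", 1_000_000_000)],
    "abort",
)
print(json.dumps(check_unique_names([rule]), indent=2))
```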


Question 9

An online retail company uses Amazon Redshift to store historical sales transactions. The company is required to encrypt
data at rest in the clusters to comply with the Payment Card Industry Data Security Standard (PCI DSS). A corporate
governance policy mandates management of encryption keys using an on-premises hardware security module (HSM).
Which solution meets these requirements?

  • A. Create and manage encryption keys using AWS CloudHSM Classic. Launch an Amazon Redshift cluster in a VPC with the option to use CloudHSM Classic for key management.
  • B. Create a VPC and establish a VPN connection between the VPC and the on-premises network. Create an HSM connection and client certificate for the on-premises HSM. Launch a cluster in the VPC with the option to use the on-premises HSM to store keys.
  • C. Create an HSM connection and client certificate for the on-premises HSM. Enable HSM encryption on the existing unencrypted cluster by modifying the cluster. Connect to the VPC where the Amazon Redshift cluster resides from the on-premises network using a VPN.
  • D. Create a replica of the on-premises HSM in AWS CloudHSM. Launch a cluster in a VPC with the option to use CloudHSM to store keys.
Answer:

B


Question 10

A company leverages Amazon Athena for ad-hoc queries against data stored in Amazon S3. The company wants to
implement additional controls to separate query execution and query history among users, teams, or applications running in
the same AWS account to comply with internal security policies.
Which solution meets these requirements?

  • A. Create an S3 bucket for each given use case, create an S3 bucket policy that grants permissions to appropriate individual IAM users, and apply the S3 bucket policy to the S3 bucket.
  • B. Create an Athena workgroup for each given use case, apply tags to the workgroup, and create an IAM policy using the tags to apply appropriate permissions to the workgroup.
  • C. Create an IAM role for each given use case, assign appropriate permissions to the role for the given use case, and add the role to associate the role with Athena.
  • D. Create an AWS Glue Data Catalog resource policy for each given use case that grants permissions to appropriate individual IAM users, and apply the resource policy to the specific tables used by Athena.
Answer:

B


Explanation:
Reference: https://aws.amazon.com/athena/faqs/
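A sketch of a tag-based IAM policy for Athena workgroups, as a Python dictionary. The tag key `team`, its value, and the exact action list are assumptions. The condition limits query actions to workgroups that carry the matching tag, and each workgroup keeps its own query history.

```python
import json

def workgroup_policy(team_tag_value: str) -> dict:
    """Allow Athena query actions only in workgroups tagged team=<team_tag_value>."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:StopQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:ListQueryExecutions",
                    "athena:GetWorkGroup",
                ],
                "Resource": "arn:aws:athena:*:*:workgroup/*",
                # Only workgroups whose `team` tag matches are in scope.
                "Condition": {"StringEquals": {"aws:ResourceTag/team": team_tag_value}},
            }
        ],
    }

print(json.dumps(workgroup_policy("marketing"), indent=2))
```

The workgroup itself could be created with boto3's `athena.create_work_group`, passing the same tag in its `Tags` parameter.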


Question 11

A central government organization is collecting events from various internal applications using Amazon Managed Streaming
for Apache Kafka (Amazon MSK). The organization has configured a separate Kafka topic for each application to separate
the data. For security reasons, the Kafka cluster has been configured to allow only TLS-encrypted data, and it encrypts the
data at rest.
A recent application update showed that one of the applications was configured incorrectly, resulting in writing data to a
Kafka topic that belongs to another application. This resulted in multiple errors in the analytics pipeline as data from different
applications appeared on the same topic. After this incident, the organization wants to prevent applications from writing to a
topic different from the one they should write to.
Which solution meets these requirements with the least amount of effort?

  • A. Create a different Amazon EC2 security group for each application. Configure each security group to have access to a specific topic in the Amazon MSK cluster. Attach the security group to each application based on the topic that the applications should read and write to.
  • B. Install Kafka Connect on each application instance and configure each Kafka Connect instance to write to a specific topic only.
  • C. Use Kafka ACLs and configure read and write permissions for each topic. Use the distinguished name of the clients' TLS certificates as the principal of the ACL.
  • D. Create a different Amazon EC2 security group for each application. Create an Amazon MSK cluster and Kafka topic for each application. Configure each security group to have access to the specific cluster.
Answer:

C
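A sketch that builds the `kafka-acls.sh` command for one application, using the TLS certificate's distinguished name as the principal. The bootstrap broker, DN, topic name, and `client.properties` file (holding the admin client's TLS settings) are hypothetical.

```python
import shlex

def acl_command(bootstrap: str, client_dn: str, topic: str, operation: str) -> list:
    """Build a kafka-acls.sh invocation that grants one operation on one topic."""
    return [
        "bin/kafka-acls.sh",
        "--bootstrap-server", bootstrap,
        "--command-config", "client.properties",  # TLS settings for the admin client
        "--add",
        "--allow-principal", f"User:{client_dn}",   # DN from the client TLS certificate
        "--operation", operation,
        "--topic", topic,
    ]

cmd = acl_command(
    "b-1.example.kafka.us-east-1.amazonaws.com:9094",
    "CN=app1.example.com",
    "app1-events",
    "Write",
)
print(shlex.join(cmd))
```

Once a topic has ACLs, principals without a matching ACL are denied, so a misconfigured client receives an authorization error instead of writing to another application's topic.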


Question 12

A data analytics specialist is building an automated ETL ingestion pipeline using AWS Glue to ingest compressed files that
have been uploaded to an Amazon S3 bucket. The ingestion pipeline should support incremental data processing.
Which AWS Glue feature should the data analytics specialist use to meet this requirement?

  • A. Workflows
  • B. Triggers
  • C. Job bookmarks
  • D. Classifiers
Answer:

C


Explanation:
Reference: https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/build-an-etl-service-pipeline-to-load-data-incrementally-from-amazon-s3-to-amazon-redshift-using-aws-glue.html
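Job bookmarks are switched on with the `--job-bookmark-option` job parameter. The class below is a toy model of the idea, not AWS Glue's implementation: it remembers which hypothetical S3 keys an earlier run committed and returns only new ones on the next run.

```python
# Real Glue job parameter that enables bookmarks (for example in create_job DefaultArguments).
JOB_ARGS = {"--job-bookmark-option": "job-bookmark-enable"}

class ToyBookmark:
    """Conceptual stand-in for bookmark state: the set of already processed objects."""

    def __init__(self):
        self.processed = set()

    def new_objects(self, listed_keys):
        # Only objects not committed by an earlier run are handed to the job.
        return [key for key in listed_keys if key not in self.processed]

    def commit(self, keys):
        # In a Glue script, bookmark state is persisted when job.commit() is called.
        self.processed.update(keys)

bookmark = ToyBookmark()
run1 = bookmark.new_objects(["in/a.gz", "in/b.gz"])
bookmark.commit(run1)
run2 = bookmark.new_objects(["in/a.gz", "in/b.gz", "in/c.gz"])
print(run1, run2)  # ['in/a.gz', 'in/b.gz'] ['in/c.gz']
```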


Question 13

A transportation company uses IoT sensors attached to trucks to collect vehicle data for its global delivery fleet. The
company currently sends the sensor data in small .csv files to Amazon S3. The files are then loaded into a 10-node Amazon
Redshift cluster with two slices per node and queried using both Amazon Athena and Amazon Redshift. The company wants
to optimize the files to reduce the cost of querying and also improve the speed of data loading into the Amazon Redshift
cluster.
Which solution meets these requirements?

  • A. Use AWS Glue to convert all the files from .csv to a single large Apache Parquet file. COPY the file into Amazon Redshift and query the file with Athena from Amazon S3.
  • B. Use Amazon EMR to convert each .csv file to Apache Avro. COPY the files into Amazon Redshift and query the file with Athena from Amazon S3.
  • C. Use AWS Glue to convert the files from .csv to a single large Apache ORC file. COPY the file into Amazon Redshift and query the file with Athena from Amazon S3.
  • D. Use AWS Glue to convert the files from .csv to Apache Parquet to create 20 Parquet files. COPY the files into Amazon Redshift and query the files with Athena from Amazon S3.
Answer:

D
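The figure of 20 files comes from the cluster shape: 10 nodes with 2 slices each give 20 slices, and COPY loads fastest when the number of input files is a multiple of the slice count, so every slice loads an equal share in parallel. A minimal sketch of that arithmetic plus a round-robin split; the helper names are hypothetical.

```python
def recommended_file_count(nodes: int, slices_per_node: int, multiple: int = 1) -> int:
    """A file count that is a multiple of the cluster's total slice count."""
    return nodes * slices_per_node * multiple

def split_evenly(rows, parts):
    """Distribute rows round-robin into `parts` roughly equal chunks."""
    chunks = [[] for _ in range(parts)]
    for i, row in enumerate(rows):
        chunks[i % parts].append(row)
    return chunks

n_files = recommended_file_count(10, 2)
print(n_files)  # 20
chunks = split_evenly(list(range(1000)), n_files)
print(len(chunks), {len(chunk) for chunk in chunks})  # 20 {50}
```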


Question 14

A mobile gaming company wants to capture data from its gaming app and make the data available for analysis immediately.
The data record size will be approximately 20 KB. The company is concerned about achieving optimal throughput from each
device. Additionally, the company wants to develop a data stream processing application with dedicated throughput for each
consumer.
Which solution would achieve this goal?

  • A. Have the app call the PutRecords API to send data to Amazon Kinesis Data Streams. Use the enhanced fan-out feature while consuming the data.
  • B. Have the app call the PutRecordBatch API to send data to Amazon Kinesis Data Firehose. Submit a support case to enable dedicated throughput on the account.
  • C. Have the app use Amazon Kinesis Producer Library (KPL) to send data to Kinesis Data Firehose. Use the enhanced fan-out feature while consuming the data.
  • D. Have the app call the PutRecords API to send data to Amazon Kinesis Data Streams. Host the stream-processing application on Amazon EC2 with Auto Scaling.
Answer:

A
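A sketch of the arithmetic behind this question, assuming the long-standing PutRecords limits (500 records and 5 MiB per request, partition keys included) and a read rate of 2 MB/s per shard; verify these against current service quotas. The batcher shows how many 20 KB records fit in one PutRecords call, and the second helper compares per-consumer read throughput for shared versus enhanced fan-out consumers. The device ID and record counts are hypothetical.

```python
# Assumed PutRecords limits: 500 records and 5 MiB per request (partition keys included).
MAX_RECORDS_PER_CALL = 500
MAX_BYTES_PER_CALL = 5 * 1024 * 1024
# Assumed read rate per shard: shared by standard consumers,
# dedicated to each enhanced fan-out consumer.
READ_MB_S_PER_SHARD = 2.0

def batch_for_put_records(records):
    """Group (partition_key, data_bytes) pairs into PutRecords-sized request batches."""
    batches, current, current_bytes = [], [], 0
    for key, data in records:
        size = len(key.encode("utf-8")) + len(data)
        full = len(current) == MAX_RECORDS_PER_CALL
        if current and (full or current_bytes + size > MAX_BYTES_PER_CALL):
            batches.append(current)
            current, current_bytes = [], 0
        current.append({"PartitionKey": key, "Data": data})
        current_bytes += size
    if current:
        batches.append(current)
    return batches

def per_consumer_read_mb_s(shards, consumers, enhanced_fan_out):
    """Approximate read throughput available to each consumer across the whole stream."""
    if enhanced_fan_out:
        per_shard = READ_MB_S_PER_SHARD
    else:
        per_shard = READ_MB_S_PER_SHARD / consumers
    return shards * per_shard

# 600 hypothetical 20 KB events from one device.
records = [("device-1", b"x" * 20_000) for _ in range(600)]
batches = batch_for_put_records(records)
print([len(batch) for batch in batches])  # [262, 262, 76]
print(per_consumer_read_mb_s(4, 4, enhanced_fan_out=False))  # 2.0
print(per_consumer_read_mb_s(4, 4, enhanced_fan_out=True))   # 8.0
```

Each batch is already in the `Records` shape that boto3's `kinesis.put_records` expects, so it could be sent as `put_records(StreamName=..., Records=batch)`.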

Discussions
krishna1234
6 months, 1 week ago

A company analyzes its data in an Amazon Redshift data warehouse, which currently has a cluster of three dense storage nodes. Due to a recent business acquisition, the company needs to load an additional 4 TB of user data into Amazon Redshift. The engineering team will combine all the user data and apply complex calculations that require I/O intensive resources. The company needs to adjust the cluster's capacity to support the change in analytical and storage requirements. Which solution meets these requirements?


Question 15

A company has 10-15 of uncompressed .csv files in Amazon S3. The company is evaluating Amazon Athena as a one-time
query engine. The company wants to transform the data to optimize query runtime and storage costs.
Which option for data format and compression meets these requirements?

  • A. CSV compressed with zip
  • B. JSON compressed with bzip2
  • C. Apache Parquet compressed with Snappy
  • D. Apache Avro compressed with LZO
Answer:

C


Explanation:
Reference: https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
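A one-time conversion like this can be done inside Athena with a CREATE TABLE AS SELECT (CTAS) query. The sketch builds such a statement; the table names and S3 location are hypothetical, and `write_compression` is the CTAS table property for the compression codec in recent Athena engine versions.

```python
def ctas_to_parquet(source_table: str, target_table: str, output_location: str) -> str:
    """Athena CTAS statement that rewrites a CSV-backed table as Snappy-compressed Parquet."""
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        "  write_compression = 'SNAPPY',\n"
        f"  external_location = '{output_location}'\n"
        ")\n"
        f"AS SELECT * FROM {source_table};"
    )

sql = ctas_to_parquet("sales_csv", "sales_parquet", "s3://example-bucket/sales-parquet/")
# With boto3: boto3.client("athena").start_query_execution(QueryString=sql, ...)
print(sql)
```

Parquet is columnar and splittable, so Athena scans only the columns a query references, which lowers both runtime and the per-query scan cost.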

Discussions
kousik.cemk
1 year ago

As per the link For Athena, we recommend using either Apache Parquet or Apache ORC, which compress data by default and are splittable.
