Databricks Certified Data Engineer Associate practice test
Last exam update: Nov 17, 2024
Page 1 out of 9. Viewing questions 1-10 out of 90
Question 1
Which of the following statements regarding the relationship between Silver tables and Bronze tables is always true?
A.
Silver tables contain a less refined, less clean view of data than Bronze data.
B.
Silver tables contain aggregates while Bronze data is unaggregated.
C.
Silver tables contain more data than Bronze tables.
D.
Silver tables contain a more refined and cleaner view of data than Bronze tables.
E.
Silver tables contain less data than Bronze tables.
Answer:
d
User Votes: B 1, C 2, D 10
Question 2
Which of the following commands will return the location of database customer360?
A.
DESCRIBE LOCATION customer360;
B.
DROP DATABASE customer360;
C.
DESCRIBE DATABASE customer360;
D.
ALTER DATABASE customer360 SET DBPROPERTIES ('location' = '/user');
E.
USE DATABASE customer360;
Answer:
c
User Votes: A 2, B 1, C 9
Question 3
A data analyst has developed a query that runs against a Delta table. They want help from the data engineering team to implement a series of tests to ensure the data returned by the query is clean. However, the data engineering team uses Python for its tests rather than SQL.
Which of the following operations could the data engineering team use to run the query and operate with the results in PySpark?
A.
SELECT * FROM sales
B.
spark.delta.table
C.
spark.sql
D.
There is no way to share data between PySpark and SQL.
E.
spark.table
Answer:
c
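As a rough illustration of why `spark.sql` fits here: it takes the analyst's SQL text unchanged and returns the result as a PySpark DataFrame that Python tests can inspect. The sketch below assumes a `spark` session already exists (as it does in a Databricks notebook); the helper name `count_nulls_in_query` and the `customer_id` column are hypothetical.

```python
def count_nulls_in_query(spark, query, column):
    """Run a SQL query via spark.sql and count NULLs in one result column.

    `spark` is an existing SparkSession (predefined in Databricks notebooks).
    The function name and its parameters are illustrative, not from the exam.
    """
    df = spark.sql(query)  # spark.sql returns the query result as a DataFrame
    return df.filter(df[column].isNull()).count()


# Hypothetical usage inside a notebook test:
# assert count_nulls_in_query(spark, "SELECT * FROM sales", "customer_id") == 0
```

By contrast, `spark.table` only loads a named table, so it could not run the analyst's query as written.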
User Votes: B 1, C 7
Question 4
A data engineer wants to schedule their Databricks SQL dashboard to refresh once per day, but they only want the associated SQL endpoint to be running when it is necessary. Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?
A.
They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.
B.
They can set up the dashboard's SQL endpoint to be serverless.
C.
They can turn on the Auto Stop feature for the SQL endpoint.
D.
They can reduce the cluster size of the SQL endpoint.
E.
They can ensure the dashboard's SQL endpoint is not one of the included queries' SQL endpoints.
Answer:
c
User Votes: B 2, C 5
Question 5
A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database. They run the following command:
Which of the following lines of code fills in the above blank to successfully complete the task?
A.
org.apache.spark.sql.jdbc
B.
autoloader
C.
DELTA
D.
sqlite
E.
org.apache.spark.sql.sqlite
Answer:
a
User Votes: A 8, D 1, E 1
Discussions
examgoprasadd
3 months, 2 weeks ago
A is the correct answer
Question 6
A data engineer who is new to using Python needs to create a Python function to add two integers together and return the sum.
Which of the following code blocks can the data engineer use to complete this task?
E.
None
Answer:
d
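Since the code options are not shown here, the following is only a minimal sketch of a function that meets the task as described. The name `add_integers` and the parameter names are assumptions, not taken from the exam.

```python
def add_integers(x, y):
    """Return the sum of two integers."""
    return x + y  # `return` hands the result back to the caller


total = add_integers(2, 3)
print(total)  # 5
```

The points such a question usually turns on are the `def` keyword and the use of `return` rather than `print` inside the function.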
User Votes: E 2
Discussions
jwneil17
4 months, 4 weeks ago
only one answer, not sure what happened here
Question 7
A data engineer needs access to a table new_table, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.
Which of the following approaches can be used to identify the owner of new_table?
A.
Review the Permissions tab in the table's page in Data Explorer
B.
All of these options can be used to identify the owner of the table
C.
Review the Owner field in the table's page in Data Explorer
D.
Review the Owner field in the table's page in the cloud storage solution
E.
There is no way to identify the owner of the table
Answer:
c
User Votes: B 2, C 7
Question 8
A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.
Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?
A.
They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to "Reliability Optimized."
B.
They can turn on the Auto Stop feature for the SQL endpoint.
C.
They can increase the cluster size of the SQL endpoint.
D.
They can turn on the Serverless feature for the SQL endpoint.
E.
They can increase the maximum bound of the SQL endpoint's scaling range.
Answer:
d
User Votes: A 2, D 5, E 1
Question 9
Which of the following commands will return the number of null values in the member_id column?
A.
SELECT count(member_id) FROM my_table;
B.
SELECT count(member_id) - count_null(member_id) FROM my_table;
C.
SELECT count_if(member_id IS NULL) FROM my_table;
D.
SELECT null(member_id) FROM my_table;
E.
SELECT count_null(member_id) FROM my_table;
Answer:
c
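To see why `count(member_id)` is not the answer: `count` over a column skips NULLs, so it returns the number of non-null values. The sketch below illustrates this with Python's built-in `sqlite3` module rather than Spark. SQLite has no `count_if`, so `sum(member_id IS NULL)` stands in for Spark SQL's `count_if(member_id IS NULL)`. The sample rows are made up.

```python
import sqlite3

# In-memory database with a few made-up rows, two of which are NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (member_id INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [(1,), (None,), (3,), (None,), (5,)],
)

# count(member_id) ignores NULLs; sum(member_id IS NULL) counts them
# (a SQLite stand-in for Spark SQL's count_if(member_id IS NULL)).
non_null, nulls = conn.execute(
    "SELECT count(member_id), sum(member_id IS NULL) FROM my_table"
).fetchone()
print(non_null, nulls)  # 3 2
conn.close()
```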
User Votes: C 4, E 1
Question 10
Which of the following describes a scenario in which a data engineer will want to use a single-node cluster?
A.
When they are working interactively with a small amount of data
B.
When they are running automated reports to be refreshed as quickly as possible
C.
When they are working with SQL within Databricks SQL
D.
When they are concerned about the ability to automatically scale with larger data
E.
When they are manually running reports with a large amount of data