You have an Azure data factory.
You need to examine the pipeline failures from the last 180 days.
What should you use?
D
Explanation:
Data Factory stores pipeline-run data for only 45 days. Use Azure Monitor if you want to keep that data for a longer time.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor
You manage an enterprise data warehouse in Azure Synapse Analytics.
Users report slow performance when they run commonly used queries. Users do not report performance changes for
infrequently used queries.
You need to monitor resource utilization to determine the source of the performance issues.
Which metric should you monitor?
B
Explanation:
Monitor and troubleshoot slow query performance by determining whether your workload is optimally leveraging the adaptive
cache for dedicated SQL pools.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-how-to-
monitor-cache
You have an Azure Synapse Analytics dedicated SQL pool named Pool1 and a database named DB1. DB1 contains a fact
table named Table1.
You need to identify the extent of the data skew in Table1.
What should you do in Synapse Studio?
D
Explanation:
Microsoft recommends use of sys.dm_pdw_nodes_db_partition_stats to analyze any skewness in the data.
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-partition-stats-
transact-sql https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/cheat-sheet
You have several Azure Data Factory pipelines that contain a mix of the following types of activities:
Wrangling data flow
Notebook
Copy Jar
Which two Azure services should you use to debug the activities? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point
A C
HOTSPOT
You have an Azure Data Factory pipeline that has the activities shown in the following exhibit.
Use the drop-down menus to select the answer choice that completes each statement based on the information presented in
the graphic.
NOTE: Each correct selection is worth one point.
Hot Area:
Explanation:
Box 1: succeed
Box 2: failed Example:
Now lets say we have a pipeline with 3 activities, where Activity1 has a success path to Activity2 and a failure path to
Activity3. If Activity1 fails and Activity3 succeeds, the pipeline will fail. The presence of the success path alongside the failure
path changes the outcome reported by the pipeline, even though the activity executions from the pipeline are the same as
the previous scenario.
Activity1 fails, Activity2 is skipped, and Activity3 succeeds. The pipeline reports failure.
Reference:
https://datasavvy.me/2021/02/18/azure-data-factory-activity-failures-and-pipeline-outcomes/
You have two fact tables named Flight and Weather. Queries targeting the tables will be based on the join between the
following columns.
You need to recommend a solution that maximizes query performance.
What should you include in the recommendation?
B
Explanation:
Hash-distribution improves query performance on large fact tables.
Incorrect Answers:
A: Do not use a date column for hash distribution. All data for the same date lands in the same distribution. If several users
are all filtering on the same date, then only 1 of the 60 distributions do all the processing work.
You have an Azure Synapse Analytics dedicated SQL pool.
You run PDW_SHOWSPACEUSED('dbo.FactInternetSales'); and get the results shown in the following table.
Which statement accurately describes the dbo.FactInternetSales table?
D
Explanation:
Data skew means the data is not distributed evenly across the distributions.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
You configure monitoring for an Azure Synapse Analytics implementation. The implementation uses PolyBase to load data
from comma-separated value (CSV) files stored in Azure Data Lake Storage Gen2 using an external table.
Files with an invalid schema cause errors to occur.
You need to monitor for an invalid schema error.
For which error should you monitor?
B
Explanation:
Error message: Cannot execute the query "Remote Query"
Possible Reason:
The reason this error happens is because each file has different schema. The PolyBase external table DDL when pointed to
a directory recursively reads all the files in that directory. When a column or data type mismatch happens, this error could be
seen in SSMS.
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-errors-and-possible-solutions
You are designing a highly available Azure Data Lake Storage solution that will include geo-zone-redundant storage (GZRS).
You need to monitor for replication delays that can affect the recovery point objective (RPO).
What should you include in the monitoring solution?
D
Explanation:
Because geo-replication is asynchronous, it is possible that data written to the primary region has not yet been written to the
secondary region at the time an outage occurs. The Last Sync Time property indicates the last time that data from the
primary region was written successfully to the secondary region. All writes made to the primary region before the last sync
time are available to be read from the secondary location. Writes made to the primary region after the last sync time property
may or may not be available for reads yet.
Reference:
https://docs.microsoft.com/en-us/azure/storage/common/last-sync-time-get
You have an Azure Databricks resource.
You need to log actions that relate to changes in compute for the Databricks resource.
Which Databricks services should you log?
B
Explanation:
Databricks provides access to audit logs of activities performed by Databricks users, allowing your enterprise to monitor
detailed Databricks usage patterns.
There are two types of logs:
Workspace-level audit logs with workspace-level events. Account-level audit logs with account-level events.
Reference:
https://docs.databricks.com/administration-guide/account-settings/audit-logs.html
You manage an enterprise data warehouse in Azure Synapse Analytics.
Users report slow performance when they run commonly used queries. Users do not report performance changes for
infrequently used queries.
You need to monitor resource utilization to determine the source of the performance issues.
Which metric should you monitor?
B
Explanation:
Monitor and troubleshoot slow query performance by determining whether your workload is optimally leveraging the adaptive
cache for dedicated SQL pools.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-how-to-monitor-cache
You have a SQL pool in Azure Synapse.
A user reports that queries against the pool take longer than expected to complete. You determine that the issue relates to
queried columnstore segments.
You need to add monitoring to the underlying storage to help diagnose the issue.
Which two metrics should you monitor? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
B D
Explanation:
D: Cache hit percentage: (cache hits / cache miss) * 100 where cache hits is the sum of all columnstore segments hits in the
local SSD cache and cache miss is the columnstore segments misses in the local SSD cache summed across all nodes
B: (cache used / cache capacity) * 100 where cache used is the sum of all bytes in the local SSD cache across all nodes and
cache capacity is the sum of the storage capacity of the local SSD cache across all nodes
Incorrect Asnwers:
C: DWU limit: Service level objective of the data warehouse.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-concept-
resource-utilization-query-activity
A company purchases IoT devices to monitor manufacturing machinery. The company uses an Azure IoT Hub to
communicate with the IoT devices.
The company must be able to monitor the devices in real-time.
You need to design the solution.
What should you recommend?
C
Explanation:
In a real-world scenario, you could have hundreds of these sensors generating events as a stream. Ideally, a gateway device
would run code to push these events to Azure Event Hubs or Azure IoT Hubs. Your Stream Analytics job would ingest these
events from Event Hubs and run real-time analytics queries against the streams.
Create a Stream Analytics job:
In the Azure portal, select + Create a resource from the left navigation menu. Then, select Stream Analytics job from
Analytics.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-get-started-with-azure-stream-analytics-to-
process-data-from-iot-devices
You are designing a star schema for a dataset that contains records of online orders. Each record includes an order date, an
order due date, and an order ship date.
You need to ensure that the design provides the fastest query times of the records when querying for arbitrary date ranges
and aggregating by fiscal calendar attributes.
Which two actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
B D
You are designing an inventory updates table in an Azure Synapse Analytics dedicated SQL pool. The table will have a
clustered columnstore index and will include the following columns:
You identify the following usage patterns:
Analysts will most commonly analyze transactions for a warehouse.
Queries will summarize by product category type, date, and/or inventory event type.
You need to recommend a partition strategy for the table to minimize query times.
On which column should you partition the table?
D
Explanation:
The number of records for each warehouse is big enough for a good partitioning.
Note: Table partitions enable you to divide your data into smaller groups of data. In most cases, table partitions are created
on a date column.
When creating partitions on clustered columnstore tables, it is important to consider how many rows belong to each partition.
For optimal compression and performance of clustered columnstore tables, a minimum of 1 million rows per distribution and
partition is needed. Before partitions are created, dedicated SQL pool already divides each table into 60 distributed
databases.
Azure Data Factory activity runs in Azure Monitor
Azure Data Factory activity runs in Azure Monitor