Notebooks in Synapse
At first glance, the most appealing feature of Azure Synapse Analytics is Synapse Studio: one unified UX across data stores, notebooks and pipelines. The notebook experience is most appreciated by people who load data that takes minutes or hours to read and then operate on it, whether for data engineering, feature engineering or ML training. Being able to split your code into smaller chunks and control which ones run, and when, is a powerful productivity tool.
An added benefit is that the notebook stores not only the code but also its results. That means the notebook now holds data, which worries organizations that are highly regulated or handle highly confidential information.
Save output property
Luckily, that’s controllable through a per-notebook property, SaveOutput. It’s enabled by default for any workspace that is not linked to Git source control.
The main security concern arises when the workspace is linked to source control, because the repo would then hold both the code and the data. That’s why this feature is disabled once you link your workspace. The screenshot below shows the configuration when linked to source control.
So if this is a concern, just link the workspace to source control and the problem is fixed.
However, when you can’t link to source control for whatever reason, you have to ask users to disable it in each notebook, as there’s no workspace-wide configuration. That’s what inspired me to write this script. If you just want the script, head to the bottom of this article; otherwise keep reading for an explanation of how it works.
The Dev endpoint
There are three endpoints for any workspace. The SQL on-demand endpoint and the SQL dedicated endpoint are self-explanatory: they are the SQL endpoints for the built-in serverless logical server and for the dedicated pool, if you created one, respectively. The third is the dev endpoint. There aren’t many details about it other than that it’s the endpoint for everything else: it’s the API endpoint for any APIs that are specific to your workspace, in other words, anything under the data plane category of the Synapse APIs. That endpoint is specific to your workspace and has the format workspace-name.dev.azuresynapse.net
It’s important to understand that this endpoint has the same network capabilities as the other two, so you can link it to a private endpoint to make sure traffic comes only from your own VNets. For more information about network security, refer to my YouTube video.
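To make the endpoint format concrete, here is a minimal sketch that builds the dev endpoint URL from a workspace name. The function name Get-SynapseDevEndpoint and the workspace name contoso-ws are made up for illustration; only the workspace-name.dev.azuresynapse.net pattern comes from the text above.

```powershell
# Hypothetical helper (not part of any module): builds the dev endpoint URL
# for a workspace following the workspace-name.dev.azuresynapse.net format.
function Get-SynapseDevEndpoint {
    param(
        [Parameter(Mandatory = $true)][string]$WorkspaceName,
        [string]$DevDomain = 'dev.azuresynapse.net'
    )
    # Braces keep the variable names unambiguous next to the dot
    return "https://${WorkspaceName}.${DevDomain}"
}

# 'contoso-ws' is a placeholder workspace name
Get-SynapseDevEndpoint -WorkspaceName 'contoso-ws'
```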
The Notebook APIs
One API set exposed by the dev endpoint is the notebook APIs. You can create/update, delete and get notebooks through these APIs. That’s what I leveraged to create the script.
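As a rough sketch of how the notebook routes used later in this article fit together, the helper below assembles the request URI for either the list call or a single notebook. Get-NotebookApiUri is a hypothetical name, and escaping the notebook name is my own precaution for names with spaces, not something the APIs are documented here to require.

```powershell
# Hypothetical helper: returns the URI for /notebooksSummary when no notebook
# name is given, or for /notebooks/{name} when one is.
function Get-NotebookApiUri {
    param(
        [Parameter(Mandatory = $true)][string]$WorkspaceName,
        [string]$NotebookName,
        [string]$ApiVersion = '2020-12-01'
    )
    $base = "https://${WorkspaceName}.dev.azuresynapse.net"
    if ([string]::IsNullOrEmpty($NotebookName)) {
        $path = '/notebooksSummary'
    } else {
        # Assumption: escape the name so spaces and special characters stay valid in the URL
        $path = '/notebooks/' + [uri]::EscapeDataString($NotebookName)
    }
    return "${base}${path}?api-version=${ApiVersion}"
}

Get-NotebookApiUri -WorkspaceName 'contoso-ws'
Get-NotebookApiUri -WorkspaceName 'contoso-ws' -NotebookName 'My Notebook'
```

The same URI serves GET, PUT and DELETE for a single notebook; only the HTTP method changes.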
The script: disable SaveOutput
The Token
Before sending our first API call, we need an access token for authentication. This token is different from the Azure Resource Manager token because the resource is different: it should be a token for the dev endpoint.
Write-Host "Getting token for workspace $workspaceName"
$token = (Get-AzAccessToken -ResourceUrl "https://$devDomain" -TenantId $tenantId).Token
return $token
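One thing to watch for: depending on the version of the Az.Accounts module you have installed, the Token property may come back as a SecureString rather than a plain string, in which case it can't be dropped straight into a Bearer header. The sketch below normalizes both cases. ConvertTo-PlainTextToken is a hypothetical helper name, and the sample token is a dummy value.

```powershell
# Hypothetical helper: accepts a token that is either a plain string or a
# SecureString and always returns a plain string for the Authorization header.
function ConvertTo-PlainTextToken {
    param([Parameter(Mandatory = $true)]$Token)
    if ($Token -is [System.Security.SecureString]) {
        # NetworkCredential exposes the decrypted value through .Password
        return [System.Net.NetworkCredential]::new('', $Token).Password
    }
    return [string]$Token
}

# Dummy value standing in for (Get-AzAccessToken ...).Token
$sample = ConvertTo-SecureString 'dummy-token' -AsPlainText -Force
$plain = ConvertTo-PlainTextToken -Token $sample
"Bearer $plain"
```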
Get all the notebooks in the workspace
Using the GET /notebooksSummary API, we get all the notebook names to loop through.
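The script in this article reads a single response from this API. If your workspace has many notebooks, the list may come back in pages. Here is a minimal sketch for collecting all pages, assuming the response follows the usual Azure list shape of a value array plus an optional nextLink. Get-AllPagedItems and the -Fetch script block are hypothetical, and the fake pages below stand in for real API responses.

```powershell
# Hypothetical helper: keeps calling $Fetch with the next URI until a page
# has no nextLink, collecting every item from each page's 'value' array.
function Get-AllPagedItems {
    param(
        [Parameter(Mandatory = $true)][string]$FirstUri,
        [Parameter(Mandatory = $true)][scriptblock]$Fetch
    )
    $items = @()
    $uri = $FirstUri
    while ($uri) {
        $page = & $Fetch $uri
        if ($page.value) { $items += $page.value }
        # Assumption: a missing or empty nextLink means this was the last page
        $uri = $page.nextLink
    }
    return $items
}

# Fake pages used instead of real API calls
$pages = @{
    'page1' = [pscustomobject]@{
        value    = @([pscustomobject]@{ name = 'nb1' }, [pscustomobject]@{ name = 'nb2' })
        nextLink = 'page2'
    }
    'page2' = [pscustomobject]@{ value = @([pscustomobject]@{ name = 'nb3' }) }
}
$all = @(Get-AllPagedItems -FirstUri 'page1' -Fetch { param($u) $pages[$u] })
$all | ForEach-Object { $_.name }
```

In a real run, the -Fetch script block would wrap the authenticated REST call and return the parsed response.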
Loop and update
Loop through the notebooks, get the full details of each notebook and change the saveOutput property to false.
Write-Host "Notebook: $($notebook.name)"
$response = invokeREST -method GET -relativeUrl "/notebooks/$($notebook.name)" -body $null
# convert the response to a notebookDetails object
$notebookDetails = $response | ConvertFrom-Json
# Set the saveOutput flag to false
$notebookDetails.properties.metadata.saveOutput=$false
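A direct assignment like the one above only works if the property already exists on the object that ConvertFrom-Json produced. In case a notebook's metadata comes back without a saveOutput entry, here is a hedged sketch that sets the value whether or not it is already there. Set-NotebookMetadataValue is a hypothetical helper, and the JSON is a trimmed, made-up notebook rather than a real API response.

```powershell
# Made-up notebook payload whose metadata has no saveOutput property
$json = '{"name":"nb1","properties":{"metadata":{"language_info":{"name":"python"}},"cells":[]}}'
$sampleNotebook = $json | ConvertFrom-Json

# Hypothetical helper: adds the metadata property if missing, overwrites it if present
function Set-NotebookMetadataValue {
    param($Notebook, [string]$Name, $Value)
    $Notebook.properties.metadata |
        Add-Member -NotePropertyName $Name -NotePropertyValue $Value -Force
}

Set-NotebookMetadataValue -Notebook $sampleNotebook -Name 'saveOutput' -Value $false
$sampleNotebook | ConvertTo-Json -Depth 99 -Compress
```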
That doesn’t remove any output already written to the notebook, so to remove what was already saved, I use these lines:
# Remove the state of the notebook
$notebookDetails.properties.metadata.synapse_widget=New-Object -TypeName object
# Remove the outputs of the notebook cells
foreach($cell in $notebookDetails.properties.cells) {
    $cell.outputs=@()
}
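To see the effect without touching a workspace, the sketch below clears outputs on a small, made-up notebook payload. As an extra precaution that is my assumption rather than part of the script, it only resets cells that actually carry an outputs property, since markdown cells in the Jupyter format typically have none.

```powershell
# Made-up notebook payload: one markdown cell, one code cell with saved output
$json = @'
{
  "properties": {
    "cells": [
      { "cell_type": "markdown", "source": ["# Title"] },
      { "cell_type": "code", "source": ["df.show()"],
        "outputs": [ { "output_type": "stream", "text": ["secret value"] } ] }
    ]
  }
}
'@
$sampleNotebook = $json | ConvertFrom-Json

foreach ($cell in $sampleNotebook.properties.cells) {
    # Only touch cells that have an 'outputs' property
    if ($cell.PSObject.Properties['outputs']) {
        $cell.outputs = @()
    }
}

# The serialized notebook no longer contains the saved output
$cleaned = $sampleNotebook | ConvertTo-Json -Depth 99
$cleaned
```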
Finally, before we send the update, we make sure we are updating the right notebook, and not just by name, because there might be a change in the names between the time we GET the notebook and the time we PUT it, so I use the eTag. We also remove some JSON sections that the PUT API does not expect.
$headers = @{
    "If-Match" = """$($notebookDetails.etag)"""
}
# Remove id, type and etag properties from the notebook details
$notebookDetails.PSObject.properties.remove('id')
$notebookDetails.PSObject.properties.remove('type')
$notebookDetails.PSObject.properties.remove('etag')
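Here is the same step as a small, self-contained sketch you can run against a fake payload. New-NotebookPutRequest is a hypothetical helper, and the id, type and etag values are made up. Unlike the lines above, it builds a copy without the three properties instead of removing them in place, which is just a design choice to leave the original object untouched.

```powershell
# Hypothetical helper: returns the If-Match header and the JSON body for the PUT call
function New-NotebookPutRequest {
    param([Parameter(Mandatory = $true)]$NotebookDetails)
    # Wrap the etag in double quotes, as the script does
    $headers = @{ 'If-Match' = '"{0}"' -f $NotebookDetails.etag }
    # Copy every property except the ones the PUT API does not expect
    $body = $NotebookDetails | Select-Object -Property * -ExcludeProperty id, type, etag
    return @{
        Headers = $headers
        Body    = ($body | ConvertTo-Json -Depth 99)
    }
}

# Made-up payload standing in for a GET /notebooks/{name} response
$sampleNotebook = '{"id":"fake-id","name":"nb1","type":"fake-type","etag":"abc123","properties":{}}' |
    ConvertFrom-Json
$request = New-NotebookPutRequest -NotebookDetails $sampleNotebook
$request.Headers['If-Match']
$request.Body
```

If the notebook changed after the GET, the eTag no longer matches and the update should be rejected instead of overwriting someone else's change.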
The complete script
# THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND,
# EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES
# OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Change the published notebooks in Synapse workspaces to not save the outputs
# And clear all the notebooks outputs
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Parameters
[CmdletBinding()]
param (
    [Parameter(Mandatory=$true)]
    $workspaceName,
    [Parameter(Mandatory=$true)]
    $tenantId
)
$devDomain="dev.azuresynapse.net"
$apiVersion="2020-12-01"
$WorkspaceSpecificUrl="https://$workspaceName.$devDomain"
# function to get the token for the workspace dev endpoint
function getToken() {
    Write-Host "Getting token for workspace $workspaceName"
    $token = (Get-AzAccessToken -ResourceUrl "https://$devDomain" -TenantId $tenantId).Token
    return $token
}
# function to call the Synapse REST API
function invokeREST($method, $relativeUrl,$headers, $body) {
    $uri=$WorkspaceSpecificUrl + $relativeUrl + "?api-version=" +$apiVersion
    # Authorization header
    $defaultHeaders = @{
        "Authorization" = "Bearer $token"
        "Content-Type" = "application/json"
    }
    # check if headers passed
    if ($headers.Count -gt 0) {
        $headers = $defaultHeaders + $headers
    } else {
        $headers = $defaultHeaders
    }
    # send the request
    try{
        $response = Invoke-RestMethod -Method $method -Uri $uri -Body $body -Headers $headers | ConvertTo-Json -Depth 99
    }
    catch{
        Write-Host "Error: $($_.Exception.Message)"
        Write-Host "Details:" $_.ErrorDetails.Message
        exit 1
    }
    return $response
}
# Get Access token to https://dev.azuresynapse.net
$token = getToken
# Get a list of notebooks in the workspace
$response=invokeREST -method GET -relativeUrl "/notebooksSummary" -body $null
#convert the response to a list of notebooks
$notebooks = $response | ConvertFrom-Json
# check that the workspace has at least one notebook
if($notebooks.value.Length -eq 0)
{
    Write-Host "No Notebooks in workspace $workspaceName"
    exit 1
}
# loop through the notebooks and get the notebook details
foreach($notebook in $notebooks.value) {
    Write-Host "Notebook: $($notebook.name)"
    $response = invokeREST -method GET -relativeUrl "/notebooks/$($notebook.name)" -body $null
    # convert the response to a notebookDetails object
    $notebookDetails = $response | ConvertFrom-Json
    # Set the saveOutput flag to false
    $notebookDetails.properties.metadata.saveOutput=$false
    # Remove the state of the notebook
    $notebookDetails.properties.metadata.synapse_widget=New-Object -TypeName object
    # Remove the outputs of the notebook cells
    foreach($cell in $notebookDetails.properties.cells) {
        $cell.outputs=@()
    }
    $headers = @{
        "If-Match" = """$($notebookDetails.etag)"""
    }
    # Remove id, type and etag properties from the notebook details
    $notebookDetails.PSObject.properties.remove('id')
    $notebookDetails.PSObject.properties.remove('type')
    $notebookDetails.PSObject.properties.remove('etag')
    # convert notebookDetails to a json string
    $notebookDetails = $notebookDetails | ConvertTo-Json -Depth 99
    # Update the notebook
    $updateNotebookResponse = invokeREST -method PUT -relativeUrl "/notebooks/$($notebook.name)" -headers $headers -body $notebookDetails
    $updateNotebookResponse
}