SoM Azure HPC Onboarding Guide
You will receive an email invitation or other direct guidance (if you're not sure, contact us). This will be the main channel for HPC communication and technical support.
Make sure you are on the Duke network
You must be on the Duke Medicine network to connect to the Azure HPC cluster. Use the VPN if you are off-site.
Connect to the login node
$ ssh [NetID]@somhpc-scheduler.azure.dhe.duke.edu
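As a convenience (a sketch, not a requirement), you can add a host alias to your `~/.ssh/config`; the alias name `somhpc` and the NetID `ab123` are illustrative:

```
Host somhpc
    HostName somhpc-scheduler.azure.dhe.duke.edu
    User ab123
```

With this in place, `$ ssh somhpc` connects directly.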
Migrate data to Azure HPC cluster
Data migration currently involves two steps:
Create a personal directory in the Azure storage container called sharedcontainer.
If you do not have permission to access the container, reach out to your lab's PI to be added to your lab's Grouper group. Once the personal directory is created, you can upload data to the temporary Azure blob store using AzCopy. Follow this link for instructions on downloading and using AzCopy.
You will need to download and install AzCopy on a device that has access to the Duke Medicine network/VPN or a DHTS-approved Duke public IP range.
Sign in with AzCopy in a browser
Using your CLI, enter
$ azcopy login
and follow the instructions displayed. Once you have signed in successfully, upload your data to your personal directory in the blob store:
azcopy copy [source-data-path] https://dhpsomhpchardacsa01.dfs.core.windows.net/sharedcontainer/[personal-directory] --recursive
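The destination URL above can be derived from your NetID with a small helper; a minimal sketch, where `ab123` is a hypothetical NetID and `/path/to/dataset` a hypothetical source path (substitute your own):

```shell
#!/bin/sh
# Sketch: build the blob-store destination URL for a personal directory.
# "ab123" is a hypothetical NetID; substitute your own.
blob_dest() {
  printf 'https://dhpsomhpchardacsa01.dfs.core.windows.net/sharedcontainer/%s\n' "$1"
}

blob_dest ab123
# Then upload with:
#   azcopy copy /path/to/dataset "$(blob_dest ab123)" --recursive
```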
If the data is transferred successfully, you will see a confirmation in the terminal.
Move data from the temporary blob store (dhpsomhpchardacsa01) to the Azure HPC cluster scratch space
The Azure HPC scheduler node has the AzCopy CLI tool preinstalled. Once you SSH into the scheduler node and sign in with azcopy login, you can run the following command to transfer files from temporary storage to your personal scratch space:
azcopy copy https://dhpsomhpchardacsa01.dfs.core.windows.net/sharedcontainer/[personal-directory] /data/[lab-directory]/[personal-directory] --recursive
dhpsomhpchardacsa01 is a temporary data staging area. It is recommended that you remove data from your personal directory there once you finish the second step of data migration. The Azure HPC cluster's /data directory should be the permanent final destination.
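Once the data is confirmed under /data, the temporary blob copy can be removed with azcopy remove. A minimal sketch, assuming a hypothetical personal directory named ab123; the script prints the command for review rather than running it, so you can verify the target before deleting anything:

```shell
#!/bin/sh
# Hypothetical personal directory name; substitute your own.
PERSONAL_DIR="ab123"
TMP_URL="https://dhpsomhpchardacsa01.dfs.core.windows.net/sharedcontainer/${PERSONAL_DIR}"
# Printed for review only; run the printed command yourself once you
# have confirmed the data is safely under /data on the cluster.
echo "azcopy remove ${TMP_URL} --recursive"
```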
Get code onto the cluster
Download GitLab Repos:
Refer to GitLab's documentation on how to generate a personal access token. When selecting scopes for the token, the write_repository scope is sufficient in most scenarios.
Once you obtain your personal access token, you can download the repository in your scheduler terminal:
git clone https://[token-name]:[token]@[repository-url]
For example, using hpctoken as the token name (the repository URL here is illustrative):
git clone https://hpctoken:[token]@gitlab.example.com/myrepository
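To avoid embedding the token in the remote URL (and in your shell history), you can instead let Git cache the credentials in memory. A sketch using Git's standard credential cache; the one-hour timeout is illustrative, and Git will prompt for the token name as the username and the personal access token as the password on first use:

```shell
# Cache HTTPS credentials in memory for one hour instead of putting
# the token in the clone URL.
git config --global credential.helper 'cache --timeout=3600'
# Then clone with a plain URL and enter the token when prompted:
# git clone https://[repository-url]
```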
Download GitHub Repos:
Refer to Github's documentation on how to generate a personal access token and download a repository.
Frequently Asked Questions
Q: Can I use scp to copy files to an Azure HPC node?
A: Due to the limited capacity of the ExpressRoute network (the network that connects the Duke Health network to Azure), scp is not recommended for transferring large datasets (hundreds of GB or TB). Use the azcopy CLI to transfer data to the temporary blob store (dhpsomhpchardacsa01) over the public internet, which provides higher throughput. azcopy also has a built-in parallel data transfer feature to expedite large transfers.
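azcopy's parallelism can be tuned through its AZCOPY_CONCURRENCY_VALUE environment variable; the value 32 below is purely illustrative, and the default azcopy chooses is usually reasonable:

```shell
# Tune how many parallel connections azcopy uses for a transfer.
# 32 is an illustrative value; higher values can help on fast links.
export AZCOPY_CONCURRENCY_VALUE=32
echo "$AZCOPY_CONCURRENCY_VALUE"
```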
Q: Why can't I ssh to machine x from the scheduler node?
A: Outbound port 22 is closed by design.
Updated 17 April 2022