Azure Data Factory and “on-premises” Azure VMs

A question came up recently on the MSDN Forum for Azure Data Factory around whether an Azure VM counts as a cloud or an on-premises resource when it comes to billing.
Checking the pricing for Azure Data Factory, you can see that the price for Data Movement differs depending on the source location of the data, so where the data lives has quite an impact on cost.
So is an Azure VM considered a cloud location or an on-premises location?
I thought I’d do a quick test to confirm.

Environment

In order to understand how the Data Movement Service sees an Azure VM, some setup is required.


  1. Create an Azure VM; I already had a Windows Server 2016 CTP4 one so I reused that
  2. Create an Azure Storage account that will act as the Sink for the data
  3. Create a simple Azure Data Factory that contains a Copy activity to move data from the VM to the Storage account
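For step 2 in the list above, one way to create the storage account is with a small ARM template deployed through the portal or PowerShell. The following is just a minimal sketch; the account name and location are placeholders rather than the ones used in this test:

{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [
        {
            "type": "Microsoft.Storage/storageAccounts",
            "apiVersion": "2015-06-15",
            "name": "adfiaastestsink",
            "location": "West Europe",
            "properties": {
                "accountType": "Standard_LRS"
            }
        }
    ]
}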

Setup

To allow data to be moved from an on-premises File System source in Azure Data Factory, you need to use the Data Management Gateway on your server.
When the server is a virtual machine in Azure the process is exactly the same, so the first part of the environment is pretty straightforward: download, install and run the gateway. Once it is up and running you should see something like the following.


ADF-OnPremTest-DMG
For the Azure Data Factory, a number of artefacts need to be created:


  1. Data Management Gateway that will provide access to the VM
  2. Linked Service to a File system
  3. Linked Service to an Azure Blob
  4. Dataset representing source data
  5. Dataset representing sink data
  6. Pipeline containing a Copy activity
You also need some very basic data to be moved, which can be a simple CSV file containing a couple of items.
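As an illustration, the people.csv file could be as simple as the following (made-up names; the two columns correspond to the firstname and lastname fields declared on the blob dataset later):

John,Smith
Jane,Doe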

Data Management Gateway

The Data Management Gateway is very straightforward and follows the usual pattern for an on-premises service, creating an endpoint on the server. The File System Linked Service below then references the gateway by name.

{
    "name": "OnPremisesFileServerLinkedService",
    "properties": {
        "description": "",
        "hubName": "adfiaastest_hub",
        "type": "OnPremisesFileServer",
        "typeProperties": {
            "host": "localhost",
            "gatewayName": "IAASTEST",
            "userId": "",
            "password": "",
            "encryptedCredential": "[REMOVED]"
        }
    }
}
When you set up an on-premises Linked Service you can either store the credentials for the server directly in the configuration (NOTE: the password is always replaced by asterisks when displayed), or use an encrypted credential.
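For illustration only, the inline-credential form of the same typeProperties would look something like this, with placeholder values rather than real ones:

"typeProperties": {
    "host": "localhost",
    "gatewayName": "IAASTEST",
    "userId": "[USERNAME]",
    "password": "[PASSWORD]"
}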

Azure Storage Linked Service

Once you’ve created an Azure storage account, you need to create a container. This can be done directly in the Azure portal or through a number of other tools such as Azure Management Studio, Cloud Portam or indeed Visual Studio.

{
    "name": "StorageLinkedService",
    "properties": {
        "description": "",
        "hubName": "adfiaastest_hub",
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=[STORAGEACCT];AccountKey=[STORAGEKEY]"
        }
    }
}

Datasets

As this is a test, the datasets used for the test data are extremely simple.

For the File System file:

{
    "name": "OnPremisesFile",
    "properties": {
        "published": false,
        "type": "FileShare",
        "linkedServiceName": "OnPremisesFileServerLinkedService",
        "typeProperties": {
            "fileName": "people.csv",
            "folderPath": "c:\\adfiaastest"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

And for Blob Storage:

{
    "name": "AzureBlobDatasetTemplate",
    "properties": {
        "structure": [
            {
                "name": "firstname",
                "type": "String"
            },
            {
                "name": "lastname",
                "type": "String"
            }
        ],
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "StorageLinkedService",
        "typeProperties": {
            "fileName": "people.csv",
            "folderPath": "output",
            "format": {
                "type": "TextFormat"
            }
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}

Copy Activity Pipeline

Since we are only moving data, the pipeline contains a single Copy activity.

{
    "name": "PipelineTemplate",
    "properties": {
        "description": "Testing IaaS VM",
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "FileSystemSource"
                    },
                    "sink": {
                        "type": "BlobSink",
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:00"
                    }
                },
                "inputs": [
                    {
                        "name": "OnpremisesFile"
                    }
                ],
                "outputs": [
                    {
                        "name": "AzureBlobDatasetTemplate"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "OnpremisesFileSystemtoBlob",
                "description": "copy activity"
            }
        ],
        "start": "2015-12-26T00:00:00Z",
        "end": "2015-12-28T00:00:00Z",
        "isPaused": false,
        "hubName": "adfiaastest_hub",
        "pipelineMode": "Scheduled"
    }
}

Once completed, taking a look at the Diagram blade for the data factory should show something similar to the following:

ADF-OnPremTest-Pipeline

Result

Once the Data Management Gateway and Azure Storage linked services are in place and the source file is on the VM, the factory should execute and move the data to the blob container as expected.
This is confirmed by quickly checking the storage container.

ADF-OnPremTest-Storage

After checking that the process had been successful, I examined my subscription to see what Data Factory charges had been incurred. NOTE: it takes a few hours for new charges to show.

ADF-OnPremTest-Bill

Conclusion

Looking at the charges incurred during execution of the data movement activity, it can be seen that whilst we are essentially running a cloud service in the form of an Azure Virtual Machine, the data movement is billed as an on-premises move.

It should be noted that the Azure Virtual Machine in this case was one created in the new portal.
