An R package for handling Azure authentication and basic tasks such as accessing blob and table storage and reading in data from files.
The package is in development. Please create an issue if you have ideas for its improvement.
You can install the development version of {azkit} with:
# install.packages("pak")
pak::pak("The-Strategy-Unit/azkit")
A primary function in {azkit} enables access to an Azure blob container:
data_container <- azkit::get_container("data-container")
Authentication is handled automatically by get_container(), but if you need
to, you can explicitly return an authentication token for inspection or re-use:
my_token <- azkit::get_auth_token()
data_container <- azkit::get_container("data-container", token = my_token)
Return a list of all available containers in your default Azure storage with:
list_container_names()
Once you have access to a container, you can use one of a set of data reading
functions to bring data into R from .parquet, .rds, .json or .csv files.
For example:
pqt_data <- azkit::read_azure_parquet(data_container, "important_data.parquet")
To read in any file from the container in raw format, to be passed to the handler of your choice, use:
raw_data <- azkit::read_azure_file(data_container, "misc_data.ext")
You can map over multiple files by first using azkit::list_files() and then
passing the file paths to the read* function:
azkit::list_files(data_container, "data/latest", "parquet") |>
  purrr::map(\(x) azkit::read_azure_parquet(data_container, x))
Currently these functions only read in a single file at a time.
You can also pass through arguments in ... that will be applied to the
appropriate handler function (see documentation).
For example, readr::read_delim() is used under the hood by
azkit::read_azure_csv(), so you can pass through a config argument such as
col_types:
csv_data <- data_container |>
  azkit::read_azure_csv("vital_data.csv", path = "data", col_types = "ccci")
To facilitate access to Azure Storage you may want to set some environment
variables.
The neatest way to do this is to include a .Renviron file in
your project folder.
Remember to include .Renviron in the .gitignore file for
your project.
Your .Renviron file can contain the variables below.
Ask a member of the Data Science team for the necessary values.
AZ_STORAGE_EP=
AZ_TABLE_EP=
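Before calling any {azkit} functions, it can help to confirm that these variables were actually picked up by your R session (restart R after editing .Renviron). The sketch below uses only base R; `missing_env_vars()` is a hypothetical helper for illustration and is not part of {azkit}.

```r
# Hypothetical helper (not part of azkit): return the names of any
# expected environment variables that are unset or empty in this session.
missing_env_vars <- function(vars = c("AZ_STORAGE_EP", "AZ_TABLE_EP")) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

missing <- missing_env_vars()
if (length(missing) > 0) {
  message("Not set: ", paste(missing, collapse = ", "))
}
```

If any names are reported, check that .Renviron sits in the project folder and that each line has the form NAME=value.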
Azure authentication is probably the main area where you might experience difficulty.
To debug, try running:
azkit::get_auth_token()
and see what is returned.
AzureRMR::get_azure_login() or AzureRMR::list_azure_tokens() may also be
helpful for troubleshooting - try them and see if they work / what they reveal.
To refresh a token, you can do:
# if previously you did:
# token <- azkit::get_auth_token()
azkit::refresh_token(token)
If you get errors when reading in files, first check that you are passing in the full and correct filepath relative to the root directory of the container.
If read_azure_json() and similar are not working as expected, try reading in
the raw data first with azkit::read_azure_file() and then passing that to a
handler function of your choice.
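As a rough sketch of that fallback, the example below converts raw bytes to text and passes them to a handler. It assumes that azkit::read_azure_file() returns a raw vector, which this README does not confirm. A stand-in raw vector is used so the snippet runs without an Azure connection, and base R's read.csv() is just one possible handler.

```r
# Stand-in for:
#   raw_data <- azkit::read_azure_file(data_container, "misc_data.csv")
# (assumption: read_azure_file() returns a raw vector)
raw_data <- charToRaw("id,name\n1,alpha\n2,beta\n")

# Convert the bytes to a character string, then hand it to a handler
# of your choice; utils::read.csv() is used here purely for illustration.
csv_data <- utils::read.csv(text = rawToChar(raw_data), stringsAsFactors = FALSE)
```

Inspecting the text from rawToChar() before parsing can also show whether the file contents are what you expected.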
Please use the Issues feature on GitHub to report any bugs, ideas or problems, including with the package documentation.
Alternatively, to ask any questions about the package you may contact Fran Barton.
