Detecting data_access
Globus Collections come in several varieties, but only some of them have a data_access scope.
data_access scopes control application access to collections, allowing users to revoke an application's access to a collection independently of its other permissions.
Revoking consent stops data transfers and other operations.
Because only some collection types have data_access scopes, application authors interacting with these collections may need to detect the type of collection and determine whether or not the scope will be needed.
Readers who prefer to start with a complete working example can jump ahead to the example script.
Accessing Collections in Globus Transfer
The Globus Transfer service acts as a central registration hub for collections.
Therefore, in order to get information about an unknown collection, we will need a TransferClient with credentials.
The following snippet creates a client and uses it to fetch a collection from the service:
import globus_sdk
# Tutorial Client ID - <replace this with your own client>
NATIVE_CLIENT_ID = "61338d24-54d5-408f-a10d-66c06b59f6d2"
USER_APP = globus_sdk.UserApp("detect-data-access-example", client_id=NATIVE_CLIENT_ID)
# Globus Tutorial Collection 1
# https://app.globus.org/file-manager/collections/6c54cade-bde5-45c1-bdea-f4bd71dba2cc
# replace with your own COLLECTION_ID
COLLECTION_ID = "6c54cade-bde5-45c1-bdea-f4bd71dba2cc"
transfer_client = globus_sdk.TransferClient(app=USER_APP)
collection_doc = transfer_client.get_endpoint(COLLECTION_ID)
Note
Careful readers may note that we use the TransferClient.get_endpoint() method to look up a collection.
The Transfer service contains both Endpoints and Collections, and both document types are available from the Get Endpoint API.
Reading Collection Type
There are two attributes we need from the collection document to determine whether or not a data_access scope is used.
First, whether or not the collection is a GCSv5 Mapped Collection:
entity_type = collection_doc["entity_type"]
is_v5_mapped_collection = entity_type == "GCSv5_mapped_collection"
Second, whether or not the collection is a High Assurance Collection:
is_high_assurance = collection_doc["high_assurance"]
Once we have this information, we can deduce whether or not data_access is needed with the following boolean assignment:
collection_uses_data_access = is_v5_mapped_collection and not is_high_assurance
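The two checks above can also be packaged as a function of the collection document alone, which makes the rule easy to exercise without contacting the Transfer service. The following is a minimal sketch rather than part of the SDK: the name doc_uses_data_access is hypothetical, the sample documents are hand-written stand-ins containing only the two fields discussed here, and treating a missing high_assurance field as False is an assumption.

```python
from typing import Any, Mapping


def doc_uses_data_access(collection_doc: Mapping[str, Any]) -> bool:
    """Apply the data_access rule to an already-fetched collection document."""
    is_v5_mapped_collection = (
        collection_doc.get("entity_type") == "GCSv5_mapped_collection"
    )
    # Assumption: a missing "high_assurance" field means "not high assurance".
    is_high_assurance = bool(collection_doc.get("high_assurance", False))
    return is_v5_mapped_collection and not is_high_assurance


# Hand-written sample documents (illustrative only, not real API responses).
sample_docs = {
    "mapped, standard": {
        "entity_type": "GCSv5_mapped_collection",
        "high_assurance": False,
    },
    "mapped, high assurance": {
        "entity_type": "GCSv5_mapped_collection",
        "high_assurance": True,
    },
    "guest": {
        "entity_type": "GCSv5_guest_collection",
        "high_assurance": False,
    },
}

for label, doc in sample_docs.items():
    print(f"{label}: {doc_uses_data_access(doc)}")
```

Only the first sample satisfies both conditions, so it is the only one for which the function returns True.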
Converting Logic to a Helper Function
To make the logic above reusable, we need to restructure it.
One of the simpler approaches is to define a helper function which accepts the TransferClient and collection ID as inputs.
Here’s a definition of such a helper which is broadly applicable:
def uses_data_access(
    transfer_client: globus_sdk.TransferClient, collection_id: str
) -> bool:
    """
    Use the provided `transfer_client` to look up a collection by ID.

    Return `True` if the collection uses a `data_access` scope
    and `False` otherwise.
    """
    doc = transfer_client.get_endpoint(collection_id)
    if doc["entity_type"] != "GCSv5_mapped_collection":
        return False
    if doc["high_assurance"]:
        return False
    return True
Guarding data_access Scope Handling
Now that we have a reusable helper for determining whether or not collections use a data_access scope, we can use it to drive scope handling logic.
For example, we can choose to add data_access requirements to a GlobusApp like so:
# Globus Tutorial Collection 1 & 2
# https://app.globus.org/file-manager/collections/6c54cade-bde5-45c1-bdea-f4bd71dba2cc
# https://app.globus.org/file-manager/collections/31ce9ba0-176d-45a5-add3-f37d233ba47d
# replace with your desired collections
SRC_COLLECTION = "6c54cade-bde5-45c1-bdea-f4bd71dba2cc"
DST_COLLECTION = "31ce9ba0-176d-45a5-add3-f37d233ba47d"
if uses_data_access(transfer_client, SRC_COLLECTION):
    transfer_client.add_app_data_access_scope(SRC_COLLECTION)
if uses_data_access(transfer_client, DST_COLLECTION):
    transfer_client.add_app_data_access_scope(DST_COLLECTION)
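When the same collection may appear more than once (for example, a transfer within a single collection), it can be convenient to check each unique ID only once. The sketch below is one possible way to do that, not an SDK feature: collections_needing_data_access and fake_check are hypothetical names, and the check is passed in as a callable so that the example runs without network access. In a real script, the callable might be something like lambda cid: uses_data_access(transfer_client, cid).

```python
from typing import Callable, Iterable, List


def collections_needing_data_access(
    collection_ids: Iterable[str],
    needs_data_access: Callable[[str], bool],
) -> List[str]:
    """Return the unique IDs for which the check is true, in first-seen order."""
    # dict.fromkeys drops duplicates while preserving insertion order
    unique_ids = list(dict.fromkeys(collection_ids))
    return [cid for cid in unique_ids if needs_data_access(cid)]


# Stand-in check for demonstration only; it records which IDs were looked up.
calls: List[str] = []


def fake_check(collection_id: str) -> bool:
    calls.append(collection_id)
    return collection_id.startswith("mapped-")


needed = collections_needing_data_access(
    ["mapped-a", "guest-b", "mapped-a"], fake_check
)
print(needed)
print(calls)
```

Each ID in the returned list can then be passed to transfer_client.add_app_data_access_scope(), as in the snippet above.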
Summary: Complete Example
With these modifications in place, we can compile the above tooling into a complete script.
This example is complete. It should run without errors “as is”.
import globus_sdk
def uses_data_access(
    transfer_client: globus_sdk.TransferClient, collection_id: str
) -> bool:
    """
    Use the provided `transfer_client` to look up a collection by ID.

    Based on the record, return `True` if it uses a `data_access` scope and `False`
    otherwise.
    """
    doc = transfer_client.get_endpoint(collection_id)
    if doc["entity_type"] != "GCSv5_mapped_collection":
        return False
    if doc["high_assurance"]:
        return False
    return True
# Tutorial Client ID - <replace this with your own client>
NATIVE_CLIENT_ID = "61338d24-54d5-408f-a10d-66c06b59f6d2"
USER_APP = globus_sdk.UserApp("detect-data-access-example", client_id=NATIVE_CLIENT_ID)
# Globus Tutorial Collection 1 & 2
# https://app.globus.org/file-manager/collections/6c54cade-bde5-45c1-bdea-f4bd71dba2cc
# https://app.globus.org/file-manager/collections/31ce9ba0-176d-45a5-add3-f37d233ba47d
# replace with your desired collections
SRC_COLLECTION = "6c54cade-bde5-45c1-bdea-f4bd71dba2cc"
DST_COLLECTION = "31ce9ba0-176d-45a5-add3-f37d233ba47d"
SRC_PATH = "/home/share/godata/file1.txt"
DST_PATH = "/~/example-transfer-script-destination.txt"
transfer_client = globus_sdk.TransferClient(app=USER_APP)
# check if either source or dest needs data_access, and if so add the relevant
# requirement
if uses_data_access(transfer_client, SRC_COLLECTION):
    transfer_client.add_app_data_access_scope(SRC_COLLECTION)
if uses_data_access(transfer_client, DST_COLLECTION):
    transfer_client.add_app_data_access_scope(DST_COLLECTION)
transfer_request = globus_sdk.TransferData(
    source_endpoint=SRC_COLLECTION,
    destination_endpoint=DST_COLLECTION,
)
transfer_request.add_item(SRC_PATH, DST_PATH)
task = transfer_client.submit_transfer(transfer_request)
print(f"Submitted transfer. Task ID: {task['task_id']}.")
Note
Because the data_access requirement can’t be detected until after you have logged in to the app, this can result in a “double login” scenario.
First, you log in and provide consent for Transfer, but then a data_access scope is found to be needed.
You then have to log in again to satisfy that requirement.