Globus Search

class globus_sdk.SearchClient(*, environment=None, base_url=None, app=None, app_scopes=None, authorizer=None, app_name=None, transport=None, retry_config=None)

Bases: BaseClient

Client for the Globus Search API

Parameters:

app (GlobusApp | None) – A GlobusApp which will be used for handling authorization and storing and validating tokens. Passing an app will automatically include a client’s default scopes in the app’s scope requirements unless specific app_scopes are given. If app_name is not given, the app’s app_name will be used. Mutually exclusive with authorizer.
app_scopes (list[Scope] | None) – Optional list of Scope objects to be added to app’s scope requirements instead of default_scope_requirements. Requires app.
authorizer (GlobusAuthorizer | None) – A GlobusAuthorizer which will generate Authorization headers. Mutually exclusive with app.
app_name (str | None) – Optional “nice name” for the application. Has no bearing on the semantics of client actions. It is just passed as part of the User-Agent string, and may be useful when debugging issues with the Globus Team. If both app and app_name are given, this value takes priority.
base_url (str | None) – The URL for the service. Most client types initialize this value intelligently by default. Set it when inheriting from BaseClient or communicating through a proxy. This value takes precedence over the class attribute of the same name.
transport (RequestsTransport | None) – A RequestsTransport object for sending and retrying requests. By default, one will be constructed by the client.
retry_config (RetryConfig | None) – A RetryConfig object with parameters to control request retry behavior. By default, one will be constructed by the client.

This class provides helper methods for most common resources in the API, and basic get, put, post, and delete methods from the base client that can be used to access any API resource.

Methods

batch_delete_by_subject()
create_index()
create_role()
delete_by_query()
delete_entry()
delete_index()
delete_role()
delete_subject()
get_entry()
get_index()
get_role_list()
get_subject()
get_task()
get_task_list()
index_list()
ingest()
post_search(), paginated.post_search()
reopen_index()
scroll(), paginated.scroll()
search(), paginated.search()
update_index()

scopes = <globus_sdk.scopes.data.search._SearchScopes object>

The scopes for this client may be present as a ScopeCollection.

create_index(display_name, description)

Create a new index.

Parameters:

display_name (str) – the name of the index
description (str) – a description of the index

Return type:

GlobusHTTPResponse

New indices default to trial status. Subscribers with a subscription ID can have an index converted to non-trial by sending a request to support@globus.org.

Example Usage

sc = globus_sdk.SearchClient(...)
r = sc.create_index(
    "History and Witchcraft",
    "Searchable information about history and witchcraft",
)
print(f"index ID: {r['id']}")

Example Response Data

{
  "@datatype": "GSearchIndex",
  "@version": "2017-09-01",
  "creation_date": "2021-04-05 15:05:18",
  "display_name": "Awesome Index of Awesomeness",
  "description": "An index so awesome that it simply cannot be described",
  "id": "58940197-fece-4297-8e6d-53994077ceb2",
  "is_trial": true,
  "subscription_id": null,
  "max_size_in_mb": 1,
  "num_entries": 0,
  "num_subjects": 0,
  "size_in_mb": 0,
  "status": "open"
}

API Info

POST /v1/index

See Index Create in the API documentation for details.

update_index(index_id, *, display_name=MISSING, description=MISSING)

Update index metadata.

Parameters:

index_id (UUID | str) – the ID of the index
display_name (str | MissingType) – the name of the index
description (str | MissingType) – a description of the index

Return type:

GlobusHTTPResponse

Example Usage

sc = globus_sdk.SearchClient(...)
MY_INDEX_ID = ...
r = sc.update_index(
    MY_INDEX_ID,
    display_name="My Awesome Index",
    description="Very awesome searchable data",
)
print(f"index ID: {r['id']}")

Example Response Data

{
  "@datatype": "GSearchIndex",
  "@version": "2017-09-01",
  "creation_date": "2021-04-05 15:05:18",
  "display_name": "Awesome Index of Awesomeness",
  "description": "An index so awesome that it simply cannot be described",
  "id": "58940197-fece-4297-8e6d-53994077ceb2",
  "is_trial": true,
  "subscription_id": null,
  "max_size_in_mb": 1,
  "num_entries": 0,
  "num_subjects": 0,
  "size_in_mb": 0,
  "status": "open"
}

API Info

PATCH /v1/index/<index_id>

See Index Update in the API documentation for details.

delete_index(index_id)

Mark an index for deletion.

Globus Search does not immediately delete indices. Instead, this API sets the index status to "delete-pending". Search will move pending tasks on the index to the CANCELLED state and will eventually delete the index.

If the index is a trial index, it will be deleted a few minutes after being marked for deletion. If the index is non-trial, it will be kept for 30 days and will be eligible for use with the reopen API (see reopen_index()) during that time.

Parameters:

index_id (UUID | str) – the ID of the index

Return type:

GlobusHTTPResponse

Example Usage

sc = globus_sdk.SearchClient(...)
sc.delete_index(index_id)

Example Response Data

{
  "index_id": "b37d8261-38fd-4497-996b-30cd869eb02a",
  "acknowledged": true
}

API Info

DELETE /v1/index/<index_id>

See Index Delete in the API documentation for details.

reopen_index(index_id)

Reopen an index that has been marked for deletion, cancelling the deletion.

Parameters:

index_id (UUID | str) – the ID of the index

Return type:

GlobusHTTPResponse

Example Usage

sc = globus_sdk.SearchClient(...)
sc.reopen_index(index_id)

Example Response Data

{
  "index_id": "ac6b0e09-c0fc-4125-bc8a-997f03a14549",
  "acknowledged": true
}

API Info

POST /v1/index/<index_id>/reopen

See Index Reopen in the API documentation for details.

get_index(index_id, *, query_params=None)

Get descriptive data about a Search index, including its title and description and how much data it contains.

Parameters:

index_id (UUID | str) – the ID of the index
query_params (dict[str, Any] | None) – additional parameters to pass as query params

Return type:

GlobusHTTPResponse

Example Usage

sc = globus_sdk.SearchClient(...)
index = sc.get_index(index_id)
assert index["id"] == index_id
print(index["display_name"], "(" + index_id + "):", index["description"])

API Info

GET /v1/index/<index_id>

See Index Show in the API documentation for details.

index_list(filter_roles=MISSING, *, query_params=None)

Get a list of indices on which the caller has permissions.

Parameters:

filter_roles (Literal['owner', 'admin', 'writer'] | Iterable[Literal['owner', 'admin', 'writer']] | MissingType) – A role, or an iterable of roles, to use to filter the listing. By default, all indices where the user has a role are returned. Valid values are owner, admin, and writer.
query_params (dict[str, Any] | None) – additional parameters to pass as query params

Return type:

IterableResponse

Example Usage

sc = globus_sdk.SearchClient(...)
for index_doc in sc.index_list():
    print(index_doc["display_name"], f"({index_doc['id']}):")
    print("  permissions:", ", ".join(index_doc["permissions"]))

Example Response Data

{
  "index_list": [
    {
      "@datatype": "GSearchIndex",
      "@version": "2017-09-01",
      "id": "e565fb7e-69a1-11f1-9b34-5690dda43b03",
      "is_trial": true,
      "status": "open",
      "subscription_id": null,
      "creation_date": "2038-07-17 16:48:24",
      "display_name": "Index of Indexed Awesomeness",
      "description": "Turbo Awesome",
      "max_size_in_mb": 1,
      "size_in_mb": 0,
      "num_subjects": 0,
      "num_entries": 0,
      "permissions": [
        "owner"
      ]
    },
    {
      "@datatype": "GSearchIndex",
      "@version": "2017-09-01",
      "id": "e565fc32-69a1-11f1-9b34-5690dda43b03",
      "is_trial": false,
      "status": "open",
      "subscription_id": "e565fc8c-69a1-11f1-9b34-5690dda43b03",
      "creation_date": "2470-10-11 20:09:40",
      "display_name": "Catalog of encyclopediae",
      "description": "Encyclopediae from Britannica to Wikipedia",
      "max_size_in_mb": 100,
      "size_in_mb": 23,
      "num_subjects": 1822,
      "num_entries": 3644,
      "permissions": [
        "writer"
      ]
    }
  ]
}

API Info

GET /v1/index_list

See Index List in the API documentation for details.
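The filter_roles parameter filters the listing on the server. The same selection can also be made on the client from the permissions list shown in the example response. The following is a minimal sketch that works on plain dicts shaped like that response; indices_with_permission is a hypothetical helper, not part of the SDK.

```python
from __future__ import annotations

from typing import Any, Iterable


def indices_with_permission(
    index_docs: Iterable[dict[str, Any]], permission: str
) -> list[dict[str, Any]]:
    """Return the index documents whose "permissions" list contains `permission`."""
    return [doc for doc in index_docs if permission in doc.get("permissions", [])]


# Sample data trimmed from the example response above.
sample = [
    {"display_name": "Index of Indexed Awesomeness", "permissions": ["owner"]},
    {"display_name": "Catalog of encyclopediae", "permissions": ["writer"]},
]
owned = indices_with_permission(sample, "owner")
print([doc["display_name"] for doc in owned])
```

Because the helper only iterates over documents, it should also accept the result of sc.index_list() directly.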

search(index_id, q, *, offset=MISSING, limit=MISSING, advanced=MISSING, query_params=None)

Execute a simple Search Query, described by the query string q.

Parameters:

index_id (UUID | str) – the ID of the index
q (str) – the query string
offset (int | MissingType) – an offset for pagination
limit (int | MissingType) – the size of a page of results
advanced (bool | MissingType) – enable ‘advanced’ query mode, which supports a more sophisticated syntax but may result in BadRequest errors if the query is invalid
query_params (dict[str, Any] | None) – additional parameters to pass as query params

Return type:

GlobusHTTPResponse

For details on query syntax, including the advanced query behavior, see the Search Query Syntax documentation.

Example Usage

sc = globus_sdk.SearchClient(...)
result = sc.search(index_id, "query string")
advanced_result = sc.search(index_id, 'author: "Ada Lovelace"', advanced=True)

Paginated Usage

This method supports paginated access. To use the paginated variant, give the same arguments as normal, but prefix the method name with paginated, as in

client.paginated.search(...)

For more information, see how to make paginated calls.

API Info

GET /v1/index/<index_id>/search

See GET Search Query in the API documentation for details.

Example Response Data

{
  "@datatype": "GSearchResult",
  "@version": "2017-09-01",
  "count": 1,
  "gmeta": [
    {
      "@datatype": "GMetaResult",
      "@version": "2019-08-27",
      "entries": [
        {
          "content": {
            "foo": "bar"
          },
          "entry_id": null,
          "matched_principal_sets": []
        }
      ],
      "subject": "foo-bar"
    }
  ],
  "has_next_page": true,
  "offset": 0,
  "total": 10
}
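The offset, limit, and has_next_page fields can be combined into a manual paging loop. This is only a sketch of the idea, assuming the response fields shown in the example above; iter_search_results is a hypothetical helper, and client.paginated.search(...) remains the supported way to page through results.

```python
from __future__ import annotations

from typing import Any, Iterator


def iter_search_results(
    client: Any, index_id: str, q: str, page_size: int = 10
) -> Iterator[dict[str, Any]]:
    """Yield each "gmeta" result, requesting pages until has_next_page is false.

    `client` is any object with a search(index_id, q, *, offset, limit) method,
    such as a SearchClient.
    """
    offset = 0
    while True:
        page = client.search(index_id, q, offset=offset, limit=page_size)
        results = page["gmeta"]
        yield from results
        # Stop on the last page, or if a page comes back empty (a defensive
        # guard against looping forever on an unexpected response).
        if not page["has_next_page"] or not results:
            break
        offset += len(results)
```

A caller would consume it as for result in iter_search_results(sc, index_id, "query string"): ..., reading result["subject"] and result["entries"] from each item.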

post_search(index_id, data, *, offset=MISSING, limit=MISSING)

Execute a complex Search Query, using a query document to express filters, facets, sorting, field boosting, and other behaviors.

Parameters:

index_id (UUID | str) – The index on which to search
data (dict[str, Any] | SearchQueryV1) – A Search Query document containing the query and any other fields
offset (int | MissingType) – offset used in paging (overwrites any offset in data)
limit (int | MissingType) – limit the number of results (overwrites any limit in data)

Return type:

GlobusHTTPResponse

For details on query syntax, including the advanced query behavior, see the Search Query Syntax documentation.

Example Usage

sc = globus_sdk.SearchClient(...)
query_data = {
    "q": "user query",
    "filters": [
        {
            "type": "range",
            "field_name": "path.to.date",
            "values": [{"from": "*", "to": "2014-11-07"}],
        }
    ],
    "facets": [
        {
            "name": "Publication Date",
            "field_name": "path.to.date",
            "type": "date_histogram",
            "date_interval": "year",
        }
    ],
    "sort": [{"field_name": "path.to.date", "order": "asc"}],
}
search_result = sc.post_search(index_id, query_data)

Paginated Usage

This method supports paginated access. To use the paginated variant, give the same arguments as normal, but prefix the method name with paginated, as in

client.paginated.post_search(...)

For more information, see how to make paginated calls.

API Info

POST /v1/index/<index_id>/search

See POST Search Query in the API documentation for details.
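Query documents are plain dicts, so repeated shapes can be produced by a small builder. The sketch below generates the range filter and sort from the example usage above; date_range_query is a hypothetical helper, and the field name path.to.date is the placeholder from that example.

```python
from __future__ import annotations

from typing import Any


def date_range_query(
    q: str, field_name: str, *, end: str, start: str = "*"
) -> dict[str, Any]:
    """Build a query document with one range filter, sorted ascending on that field."""
    return {
        "q": q,
        "filters": [
            {
                "type": "range",
                "field_name": field_name,
                # "*" leaves the lower bound open, as in the example above.
                "values": [{"from": start, "to": end}],
            }
        ],
        "sort": [{"field_name": field_name, "order": "asc"}],
    }


query_data = date_range_query("user query", "path.to.date", end="2014-11-07")
print(query_data["filters"][0]["values"])
```

The resulting dict can be passed as the data argument, as in sc.post_search(index_id, query_data).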

scroll(index_id, data, *, marker=MISSING)

Scroll all data in a Search index. The paginated version of this API should typically be preferred, as it is the intended mode of usage.

Note that if data is written or deleted during scrolling, it is possible for scrolling to not include results or show other unexpected behaviors.

Parameters:

index_id (UUID | str) – The index on which to search
data (dict[str, Any] | SearchScrollQuery) – A Search Scroll Query document
marker (str | MissingType) – marker used in paging (overwrites any marker in data)

Return type:

GlobusHTTPResponse

For details on query syntax, including the advanced query behavior, see the Search Query Syntax documentation.

Example Usage

sc = globus_sdk.SearchClient(...)
scroll_result = sc.scroll(index_id, {"q": "*"})

Paginated Usage

This method supports paginated access. To use the paginated variant, give the same arguments as normal, but prefix the method name with paginated, as in

client.paginated.scroll(...)

For more information, see how to make paginated calls.

API Info

POST /v1/index/<index_id>/scroll

See Scroll Query in the API documentation for details.
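To illustrate what the marker parameter does, here is a rough sketch of a manual scroll loop. It assumes that each scroll response carries gmeta, has_next_page, and marker fields; that response shape is an assumption, not something stated on this page, and iter_scroll_results is a hypothetical helper. In practice client.paginated.scroll(...) should be preferred.

```python
from __future__ import annotations

from typing import Any, Iterator


def iter_scroll_results(
    client: Any, index_id: str, query: dict[str, Any]
) -> Iterator[dict[str, Any]]:
    """Yield every result in the index by following scroll markers.

    Assumes each response has "gmeta", "has_next_page", and "marker" keys.
    """
    marker = None
    while True:
        if marker is None:
            # First request: no marker yet.
            page = client.scroll(index_id, query)
        else:
            # Later requests resume from the marker of the previous page.
            page = client.scroll(index_id, query, marker=marker)
        yield from page["gmeta"]
        if not page["has_next_page"]:
            break
        marker = page["marker"]
```

As noted above, writes or deletes that happen while the loop runs may cause results to be missed.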

ingest(index_id, data)

Write data to a Search index as an asynchronous task. The data can be provided as a single document or list of documents, but only one task_id value will be included in the response.

Parameters:

index_id (UUID | str) – The index into which to write data
data (dict[str, Any]) – an ingest document

Return type:

GlobusHTTPResponse

Example Usage

sc = globus_sdk.SearchClient(...)
ingest_data = {
    "ingest_type": "GMetaEntry",
    "ingest_data": {
        "subject": "https://example.com/foo/bar",
        "visible_to": ["public"],
        "content": {"foo/bar": "some val"},
    },
}
sc.ingest(index_id, ingest_data)

or with multiple entries at once via a GMetaList:

sc = globus_sdk.SearchClient(...)
ingest_data = {
    "ingest_type": "GMetaList",
    "ingest_data": {
        "gmeta": [
            {
                "subject": "https://example.com/foo/bar",
                "visible_to": ["public"],
                "content": {"foo/bar": "some val"},
            },
            {
                "subject": "https://example.com/foo/bar",
                "id": "otherentry",
                "visible_to": ["public"],
                "content": {"foo/bar": "some otherval"},
            },
        ]
    },
}
sc.ingest(index_id, ingest_data)

API Info

POST /v1/index/<index_id>/ingest

See Ingest in the API documentation for details.
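When many records share the same visibility, the GMetaList document can be generated rather than written by hand. The following sketch mirrors the structure of the examples above; gmeta_list_document is a hypothetical helper, and it only covers entries without an explicit id.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping


def gmeta_list_document(
    records: Mapping[str, dict[str, Any]],
    visible_to: Iterable[str] = ("public",),
) -> dict[str, Any]:
    """Build a GMetaList ingest document from a {subject: content} mapping."""
    visibility = list(visible_to)
    return {
        "ingest_type": "GMetaList",
        "ingest_data": {
            "gmeta": [
                {"subject": subject, "visible_to": visibility, "content": content}
                for subject, content in records.items()
            ]
        },
    }


ingest_data = gmeta_list_document(
    {
        "https://example.com/foo/bar": {"foo/bar": "some val"},
        "https://example.com/foo/baz": {"foo/baz": "some otherval"},
    }
)
print(len(ingest_data["ingest_data"]["gmeta"]))
```

The result is passed to the client unchanged, as in sc.ingest(index_id, ingest_data), and the response still contains a single task_id.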

delete_by_query(index_id, data)

Delete data in a Search index as an asynchronous task, deleting all documents which match a given query. The query uses a restricted subset of the syntax available for complex queries, as it is not meaningful to boost, sort, or otherwise rank data in this case.

A task_id value will be included in the response.

Parameters:

index_id (UUID | str) – The index in which to delete data
data (dict[str, Any]) – a query document for documents to delete

Return type:

GlobusHTTPResponse

Example Usage

sc = globus_sdk.SearchClient(...)
query_data = {
    "q": "user query",
    "filters": [
        {
            "type": "range",
            "field_name": "path.to.date",
            "values": [{"from": "*", "to": "2014-11-07"}],
        }
    ],
}
sc.delete_by_query(index_id, query_data)

API Info

POST /v1/index/<index_id>/delete_by_query

See Delete By Query in the API documentation for details.

batch_delete_by_subject(index_id, subjects, additional_params=None)

Delete data in a Search index as an asynchronous task, deleting multiple documents based on their subject values.

A task_id value will be included in the response.

Parameters:

index_id (UUID | str) – The index in which to delete data
subjects (Iterable[str]) – The subjects to delete, as an iterable of strings
additional_params (dict[str, Any] | None) – Additional parameters to include in the request body

Return type:

GlobusHTTPResponse

Example Usage

sc = globus_sdk.SearchClient(...)
sc.batch_delete_by_subject(
    index_id,
    subjects=[
        "very-cool-document",
        "less-cool-document",
        "document-wearing-sunglasses",
    ],
)

Example Response Data

{
  "task_id": "e56a26ea-69a1-11f1-9b34-5690dda43b03"
}

API Info

POST /v1/index/<index_id>/batch_delete_by_subject

See Delete By Subject in the API documentation for details.

get_subject(index_id, subject, *, query_params=None)

Fetch exactly one Subject document from Search, containing one or more Entries.

Parameters:

index_id (UUID | str) – the index containing this Subject
subject (str) – the subject string to fetch
query_params (dict[str, Any] | None) – additional parameters to pass as query params

Return type:

GlobusHTTPResponse

Example Usage

Fetch the data for subject http://example.com/abc from index index_id:

sc = globus_sdk.SearchClient(...)
subject_data = sc.get_subject(index_id, "http://example.com/abc")

API Info

GET /v1/index/<index_id>/subject

See Get By Subject in the API documentation for details.

delete_subject(index_id, subject, *, query_params=None)

Delete exactly one Subject document from Search, containing one or more Entries, as an asynchronous task.

A task_id value will be included in the response.

Parameters:

index_id (UUID | str) – the index in which data will be deleted
subject (str) – the subject string for the Subject document to delete
query_params (dict[str, Any] | None) – additional parameters to pass as query params

Return type:

GlobusHTTPResponse

Example Usage

Delete all data for subject http://example.com/abc from index index_id, even data which is not visible to the current user:

sc = globus_sdk.SearchClient(...)
response = sc.delete_subject(index_id, "http://example.com/abc")
task_id = response["task_id"]

API Info

DELETE /v1/index/<index_id>/subject

See Delete By Subject in the API documentation for details.

get_entry(index_id, subject, *, entry_id=MISSING, query_params=None)

Fetch exactly one Entry document from Search, identified by the combination of subject string and entry_id, which defaults to null.

Parameters:

index_id (UUID | str) – the index containing this Entry
subject (str) – the subject string for the Subject document containing this Entry
entry_id (str | MissingType) – the entry_id for this Entry, which defaults to null
query_params (dict[str, Any] | None) – additional parameters to pass as query params

Return type:

GlobusHTTPResponse

Example Usage

Look up the entry with a subject of https://example.com/foo/bar and a null entry_id:

sc = globus_sdk.SearchClient(...)
entry_data = sc.get_entry(index_id, "https://example.com/foo/bar")

Look up the entry with a subject of https://example.com/foo/bar and an entry_id of foo/bar:

sc = globus_sdk.SearchClient(...)
entry_data = sc.get_entry(index_id, "https://example.com/foo/bar", entry_id="foo/bar")

API Info

GET /v1/index/<index_id>/entry

See Get Entry in the API documentation for details.

delete_entry(index_id, subject, *, entry_id=MISSING, query_params=None)

Delete exactly one Entry document in Search as an asynchronous task.

A task_id value will be included in the response.

Parameters:

index_id (UUID | str) – the index in which data will be deleted
subject (str) – the subject string for the Subject of the document to delete
entry_id (str | MissingType) – the ID string for the Entry to delete
query_params (dict[str, Any] | None) – additional parameters to pass as query params

Return type:

GlobusHTTPResponse

Example Usage

Delete an entry with a subject of https://example.com/foo/bar and a null entry_id:

sc = globus_sdk.SearchClient(...)
sc.delete_entry(index_id, "https://example.com/foo/bar")

Delete an entry with a subject of https://example.com/foo/bar and an entry_id of foo/bar:

sc = globus_sdk.SearchClient(...)
sc.delete_entry(index_id, "https://example.com/foo/bar", entry_id="foo/bar")

API Info

DELETE /v1/index/<index_id>/entry

See Delete Entry in the API documentation for details.

get_task(task_id, *, query_params=None)

Fetch a Task document by ID, getting task details and status.

Parameters:

task_id (UUID | str) – the task ID from the original task submission
query_params (dict[str, Any] | None) – additional parameters to pass as query params

Return type:

GlobusHTTPResponse

Example Usage

sc = globus_sdk.SearchClient(...)
task = sc.get_task(task_id)
assert task["index_id"] == known_index_id
print(task["task_id"], "|", task["state"])

API Info

GET /v1/task/<task_id>

See Get Task in the API documentation for details.
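Because ingest and delete operations run as asynchronous tasks, callers often poll get_task until the task finishes. Below is a minimal polling sketch. The set of terminal state names is an assumption (only CANCELLED is mentioned on this page), and wait_for_task is a hypothetical helper rather than an SDK method.

```python
from __future__ import annotations

import time
from typing import Any, Callable

# Assumed terminal states; check the Task documentation for the actual values.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def wait_for_task(
    client: Any,
    task_id: str,
    *,
    interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll client.get_task(task_id) until the task reaches a terminal state.

    Returns the final task document, or raises TimeoutError after max_polls.
    """
    for attempt in range(max_polls):
        task = client.get_task(task_id)
        if task["state"] in TERMINAL_STATES:
            return task
        # Do not sleep after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

Typical use would be task = wait_for_task(sc, response["task_id"]) after an ingest or delete call, followed by a check of task["state"].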

get_task_list(index_id, *, query_params=None)

Fetch a list of recent Task documents for an index, getting task details and status.

Parameters:

index_id (UUID | str) – the index to query
query_params (dict[str, Any] | None) – additional parameters to pass as query params

Return type:

GlobusHTTPResponse

Example Usage

sc = globus_sdk.SearchClient(...)
task_list = sc.get_task_list(index_id)
for task in task_list["tasks"]:
    print(task["task_id"], "|", task["state"])

API Info

GET /v1/task_list/<index_id>

See Task List in the API documentation for details.

create_role(index_id, data, *, query_params=None)

Create a new role on an index. You must already have the owner or admin role on an index to create additional roles.

Roles are specified as a role name (one of "owner", "admin", or "writer") and a Principal URN.

Parameters:

index_id (UUID | str) – The index on which to create the role
data (dict[str, Any]) – The partial role document to use for creation
query_params (dict[str, Any] | None) – Any additional query params to pass

Return type:

GlobusHTTPResponse

Example Usage

identity_id = "46bd0f56-e24f-11e5-a510-131bef46955c"
sc = globus_sdk.SearchClient(...)
sc.create_role(
    index_id,
    {"role_name": "writer", "principal": f"urn:globus:auth:identity:{identity_id}"},
)

API Info

POST /v1/index/<index_id>/role

See Create Role in the API documentation for details.
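Since a role document is just a role name plus a Principal URN, it can be validated before the request is sent. The sketch below covers only identity principals, using the URN form from the example usage above; identity_role_document is a hypothetical helper.

```python
from __future__ import annotations

# Role names listed in the create_role description above.
ROLE_NAMES = ("owner", "admin", "writer")


def identity_role_document(role_name: str, identity_id: str) -> dict[str, str]:
    """Build a partial role document granting `role_name` to a Globus Auth identity."""
    if role_name not in ROLE_NAMES:
        raise ValueError(f"role_name must be one of {ROLE_NAMES}, got {role_name!r}")
    return {
        "role_name": role_name,
        "principal": f"urn:globus:auth:identity:{identity_id}",
    }


role_doc = identity_role_document("writer", "46bd0f56-e24f-11e5-a510-131bef46955c")
print(role_doc["principal"])
```

The document is then passed as data, as in sc.create_role(index_id, role_doc).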

get_role_list(index_id, *, query_params=None)

List all roles on an index. You must have the owner or admin role on an index to list roles.

Parameters:

index_id (UUID | str) – The index on which to list roles
query_params (dict[str, Any] | None) – Any additional query params to pass

Return type:

GlobusHTTPResponse

API Info

GET /v1/index/<index_id>/role_list

See Get Role List in the API documentation for details.

delete_role(index_id, role_id, *, query_params=None)

Delete a role from an index. You must have the owner or admin role on an index to delete roles. You cannot remove the last owner from an index.

Parameters:

index_id (UUID | str) – The index from which to delete a role
role_id (str) – The role to delete
query_params (dict[str, Any] | None) – Any additional query params to pass

Return type:

GlobusHTTPResponse

API Info

DELETE /v1/index/<index_id>/role/<role_id>

See Role Delete in the API documentation for details.

Helper Objects

class globus_sdk.SearchQueryV1(*, q=MISSING, limit=MISSING, offset=MISSING, advanced=MISSING, filters=MISSING, facets=MISSING, post_facet_filters=MISSING, boosts=MISSING, sort=MISSING, additional_fields=None)

Bases: GlobusPayload

A specialized dict which has helpers for creating and modifying a Search Query document.

Parameters:

q (str | MissingType) – The query string. Required unless filters are used.
limit (int | MissingType) – A limit on the number of results returned in a single page
offset (int | MissingType) – An offset into the set of all results for the query
advanced (bool | MissingType) – Whether to enable (True) or not to enable (False) advanced parsing of query strings. The default of False is robust and guarantees that the query will not error with “bad query string” errors
filters (list[dict[str, Any]] | MissingType) – a list of filters to apply to the query
facets (list[dict[str, Any]] | MissingType) – a list of facets to apply to the query
post_facet_filters (list[dict[str, Any]] | MissingType) – a list of filters to apply after facet results are returned
boosts (list[dict[str, Any]] | MissingType) – a list of boosts to apply to the query
sort (list[dict[str, Any]] | MissingType) – a list of fields to sort results
additional_fields (dict[str, Any] | None) – additional data to include in the query document

class globus_sdk.SearchScrollQuery(q=MISSING, *, limit=MISSING, advanced=MISSING, marker=MISSING, additional_fields=None)

Bases: GlobusPayload

A scrolling query type, for scrolling the full result set for an index.

Scroll queries have more limited capabilities than general searches. They cannot boost fields, sort, or apply facets. They can, however, still apply the same filtering mechanisms which are available to normal queries.

Scrolling also differs in that it supports the use of the marker field, which is used to paginate results.

Parameters:

q (str | MissingType) – The query string
limit (int | MissingType) – A limit on the number of results returned in a single page
advanced (bool | MissingType) – Whether to enable (True) or not to enable (False) advanced parsing of query strings. The default of False is robust and guarantees that the query will not error with “bad query string” errors
marker (str | MissingType) – the marker value
additional_fields (dict[str, Any] | None) – additional data to include in the query document

Client Errors

When an error occurs, a SearchClient will raise this specialized type of error, rather than a generic GlobusAPIError.

class globus_sdk.SearchAPIError(r)

Bases: GlobusAPIError

Error class for the Search API client. In addition to the inherited instance variables, provides error_data.

Variables:

error_data – Additional object returned in the error response. May be a dict, list, or None.
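Because error_data may be a dict, a list, or None, code that logs it needs to handle all three shapes. The following is a small sketch of one way to do that; summarize_error_data is a hypothetical helper that takes the value of the error_data attribute, and the sample payloads are invented for illustration.

```python
from __future__ import annotations

from typing import Any


def summarize_error_data(error_data: Any) -> str:
    """Render error_data (dict, list, or None) as a single log-friendly line."""
    if error_data is None:
        return "no additional error data"
    if isinstance(error_data, dict):
        # Sort keys so the output is stable across runs.
        return ", ".join(f"{key}={value!r}" for key, value in sorted(error_data.items()))
    if isinstance(error_data, list):
        return f"{len(error_data)} error item(s): " + "; ".join(str(item) for item in error_data)
    # Fall back to repr for any shape not described above.
    return repr(error_data)


# Invented sample payloads, one per documented shape.
print(summarize_error_data(None))
print(summarize_error_data({"field": "q", "cause": "missing"}))
print(summarize_error_data(["bad filter", "bad facet"]))
```

In an except globus_sdk.SearchAPIError as err: block, the helper would be called as summarize_error_data(err.error_data).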