Globus Search
- class globus_sdk.SearchClient(*, environment=None, base_url=None, app=None, app_scopes=None, authorizer=None, app_name=None, transport_params=None)
Bases: BaseClient
Client for the Globus Search API.
This class provides helper methods for the most common resources in the API, as well as the basic get, put, post, and delete methods from the base client, which can be used to access any API resource.
Methods
post_search(), paginated.post_search()
scroll(), paginated.scroll()
search(), paginated.search()
- scopes: ScopeBuilder | None = <globus_sdk.scopes.builder.ScopeBuilder object>
The scopes for this client may be present as a ScopeBuilder.
- property default_scope_requirements: list[Scope]
Scopes that will automatically be added to this client’s app’s scope_requirements during _finalize_app.
For clients with static scope requirements, this can simply be a static value. Clients with dynamic requirements should use @property and must return sane results while the base client is being initialized.
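To illustrate the difference between static and dynamic requirements, here is a minimal sketch that uses hypothetical classes (HypotheticalBaseClient, StaticScopesClient, DynamicScopesClient) rather than the real BaseClient. Scopes are modeled as plain strings instead of Scope objects. The base initializer stands in for _finalize_app and reads default_scope_requirements. For that reason, the dynamic subclass assigns the state its property depends on before calling it, and the property falls back to a safe value if that state is missing.

```python
class HypotheticalBaseClient:
    """Stand-in for a base client; not globus_sdk.BaseClient."""

    def __init__(self) -> None:
        # Plays the role of _finalize_app: the requirements are read while
        # the base class is still initializing.
        self.finalized_scopes = list(self.default_scope_requirements)


class StaticScopesClient(HypotheticalBaseClient):
    # Static requirements: a plain class attribute is enough.
    default_scope_requirements = ["urn:example:scope:static"]


class DynamicScopesClient(HypotheticalBaseClient):
    def __init__(self, resource_id: str) -> None:
        # Assign the state the property depends on *before* the base
        # initializer runs, because the base initializer reads the property.
        self.resource_id = resource_id
        super().__init__()

    @property
    def default_scope_requirements(self) -> list[str]:
        # Defensive fallback: return an empty list rather than raising
        # AttributeError if this is read before resource_id is assigned.
        resource_id = getattr(self, "resource_id", None)
        if resource_id is None:
            return []
        return [f"urn:example:scope:{resource_id}"]
```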
- create_index(display_name, description)
Create a new index.
- Parameters:
- Return type:
New indices default to trial status. For subscribers with a subscription ID, indices can be converted to non-trial by sending a request to support@globus.org.

    sc = globus_sdk.SearchClient(...)
    r = sc.create_index(
        "History and Witchcraft",
        "Searchable information about history and witchcraft",
    )
    print(f"index ID: {r['id']}")

Example response:

    {
        "@datatype": "GSearchIndex",
        "@version": "2017-09-01",
        "creation_date": "2021-04-05 15:05:18",
        "display_name": "Awesome Index of Awesomeness",
        "description": "An index so awesome that it simply cannot be described",
        "id": "9785a0be-408d-481c-9d07-409d668235b8",
        "is_trial": true,
        "subscription_id": null,
        "max_size_in_mb": 1,
        "num_entries": 0,
        "num_subjects": 0,
        "size_in_mb": 0,
        "status": "open"
    }
POST /v1/index
See Index Create in the API documentation for details.
- delete_index(index_id)
Mark an index for deletion.
Globus Search does not immediately delete indices. Instead, this API sets the index status to "delete-pending". Search will move pending tasks on the index to the CANCELLED state and will eventually delete the index.
If the index is a trial index, it will be deleted a few minutes after being marked for deletion. If the index is non-trial, it will be kept for 30 days and will be eligible for use with the reopen API (see reopen_index()) during that time.
- Parameters:
- Return type:

    sc = globus_sdk.SearchClient(...)
    sc.delete_index(index_id)

Example response:

    {
        "index_id": "7471ae92-2ff2-4bd4-8881-94f6b228d2a7",
        "acknowledged": true
    }
DELETE /v1/index/<index_id>
See Index Delete in the API documentation for details.
- reopen_index(index_id)
Reopen an index that has been marked for deletion, cancelling the deletion.
- Parameters:
- Return type:

    sc = globus_sdk.SearchClient(...)
    sc.reopen_index(index_id)

Example response:

    {
        "index_id": "8995d0b9-80ad-4b1f-9050-5b55864dd73b",
        "acknowledged": true
    }
POST /v1/index/<index_id>/reopen
See Index Reopen in the API documentation for details.
- get_index(index_id, *, query_params=None)
Get descriptive data about a Search index, including its title and description and how much data it contains.
- Parameters:
- Return type:

    sc = globus_sdk.SearchClient(...)
    index = sc.get_index(index_id)
    assert index["id"] == index_id
    print(index["display_name"], "(" + index_id + "):", index["description"])
GET /v1/index/<index_id>
See Index Show in the API documentation for details.
- search(index_id, q, *, offset=0, limit=10, advanced=False, query_params=None)
Execute a simple Search Query, described by the query string q.
- Parameters:
q (str) – the query string
offset (int) – an offset for pagination
limit (int) – the size of a page of results
advanced (bool) – enable ‘advanced’ query mode, which has sophisticated syntax but may result in BadRequest errors when used if the query is invalid
query_params (dict[str, Any] | None) – additional parameters to pass as query params
- Return type:
For details on query syntax, including the advanced query behavior, see the Search Query Syntax documentation.

    sc = globus_sdk.SearchClient(...)
    result = sc.search(index_id, "query string")
    advanced_result = sc.search(index_id, 'author: "Ada Lovelace"', advanced=True)

This method supports paginated access. To use the paginated variant, give the same arguments as normal, but prefix the method name with paginated, as in client.paginated.search(...).
For more information, see how to make paginated calls.
GET /v1/index/<index_id>/search
See GET Search Query in the API documentation for details.
Example response:

    {
        "@datatype": "GSearchResult",
        "@version": "2017-09-01",
        "count": 1,
        "gmeta": [
            {
                "@datatype": "GMetaResult",
                "@version": "2019-08-27",
                "entries": [
                    {
                        "content": {"foo": "bar"},
                        "entry_id": null,
                        "matched_principal_sets": []
                    }
                ],
                "subject": "foo-bar"
            }
        ],
        "has_next_page": true,
        "offset": 0,
        "total": 10
    }
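The offset and limit parameters, together with the has_next_page and count fields in the example response, are enough to walk every result by hand. The sketch below assumes a hypothetical fetch_page(offset, limit) callable that performs one search and returns the response document. In real code, client.paginated.search(...) already handles this and is the better choice.

```python
from typing import Any, Callable, Iterator


def iter_search_results(
    fetch_page: Callable[[int, int], dict[str, Any]],
    limit: int = 10,
) -> Iterator[dict[str, Any]]:
    """Yield every GMetaResult from a search, one page at a time.

    fetch_page(offset, limit) is a hypothetical callable that performs a
    single search request and returns the parsed response document.
    """
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        results = page.get("gmeta", [])
        yield from results
        # Stop when the service reports no further pages, or when an empty
        # page comes back (this guards against looping forever).
        if not page.get("has_next_page") or not results:
            break
        # Advance by the number of results actually returned on this page.
        offset += page.get("count", len(results))
```

For example, fetch_page could be lambda offset, limit: sc.search(index_id, "query string", offset=offset, limit=limit).data, assuming sc is a SearchClient. Advancing by count rather than by limit keeps the offset correct if a page comes back short.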
- post_search(index_id, data, *, offset=None, limit=None)
Execute a complex Search Query, using a query document to express filters, facets, sorting, field boosting, and other behaviors.
- Parameters:
- Return type:
For details on query syntax, including the advanced query behavior, see the Search Query Syntax documentation.

    sc = globus_sdk.SearchClient(...)
    query_data = {
        "q": "user query",
        "filters": [
            {
                "type": "range",
                "field_name": "path.to.date",
                "values": [{"from": "*", "to": "2014-11-07"}],
            }
        ],
        "facets": [
            {
                "name": "Publication Date",
                "field_name": "path.to.date",
                "type": "date_histogram",
                "date_interval": "year",
            }
        ],
        "sort": [{"field_name": "path.to.date", "order": "asc"}],
    }
    search_result = sc.post_search(index_id, query_data)

This method supports paginated access. To use the paginated variant, give the same arguments as normal, but prefix the method name with paginated, as in client.paginated.post_search(...).
For more information, see how to make paginated calls.
POST /v1/index/<index_id>/search
See POST Search Query in the API documentation for details.
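When the same filter, sort, and facet pattern from the post_search example is reused with different inputs, a small helper can build the query document. This is a sketch: build_date_query is a hypothetical name, and it only covers the keys shown in that example.

```python
from typing import Any, Optional


def build_date_query(
    q: str,
    date_field: str,
    before: str,
    facet_name: Optional[str] = None,
    interval: str = "year",
) -> dict[str, Any]:
    """Build a post_search query document shaped like the example.

    Matches documents whose date_field is at or before `before` ("*" is an
    open lower bound), sorts them oldest first, and optionally adds a
    date histogram facet over the same field.
    """
    query: dict[str, Any] = {
        "q": q,
        "filters": [
            {
                "type": "range",
                "field_name": date_field,
                "values": [{"from": "*", "to": before}],
            }
        ],
        "sort": [{"field_name": date_field, "order": "asc"}],
    }
    if facet_name is not None:
        query["facets"] = [
            {
                "name": facet_name,
                "field_name": date_field,
                "type": "date_histogram",
                "date_interval": interval,
            }
        ]
    return query
```

The result can be passed directly, as in sc.post_search(index_id, build_date_query("user query", "path.to.date", "2014-11-07", facet_name="Publication Date")).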
- scroll(index_id, data, *, marker=None)
Scroll all data in a Search index. The paginated version of this API should typically be preferred, as it is the intended mode of usage.
Note that if data is written or deleted during scrolling, the scroll may omit results or exhibit other unexpected behavior.
- Parameters:
- Return type:
For details on query syntax, including the advanced query behavior, see the Search Query Syntax documentation.

    sc = globus_sdk.SearchClient(...)
    scroll_result = sc.scroll(index_id, {"q": "*"})

This method supports paginated access. To use the paginated variant, give the same arguments as normal, but prefix the method name with paginated, as in client.paginated.scroll(...).
For more information, see how to make paginated calls.
POST /v1/index/<index_id>/scroll
See Scroll Query in the API documentation for details.
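Scrolling pages through results with a marker rather than an offset. The sketch below assumes a hypothetical fetch_page(marker) callable, for example lambda marker: sc.scroll(index_id, {"q": "*"}, marker=marker).data. It also assumes that each scroll response carries gmeta, has_next_page, and marker fields; check the Scroll Query documentation for the exact response shape. client.paginated.scroll(...) remains the supported way to do this.

```python
from typing import Any, Callable, Iterator, Optional


def iter_scroll(
    fetch_page: Callable[[Optional[str]], dict[str, Any]],
) -> Iterator[dict[str, Any]]:
    """Yield every GMetaResult from a scroll by following the marker.

    fetch_page(marker) is a hypothetical callable performing one scroll
    request; the "gmeta", "has_next_page", and "marker" keys are assumed.
    """
    marker: Optional[str] = None  # the first request carries no marker
    while True:
        page = fetch_page(marker)
        yield from page.get("gmeta", [])
        marker = page.get("marker")
        # Stop when there are no more pages or no marker to continue from.
        if not page.get("has_next_page") or marker is None:
            break
```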
- ingest(index_id, data)
Write data to a Search index as an asynchronous task. The data can be provided as a single document or a list of documents, but only one task_id value will be included in the response.
- Parameters:
- Return type:

    sc = globus_sdk.SearchClient(...)
    ingest_data = {
        "ingest_type": "GMetaEntry",
        "ingest_data": {
            "subject": "https://example.com/foo/bar",
            "visible_to": ["public"],
            "content": {"foo/bar": "some val"},
        },
    }
    sc.ingest(index_id, ingest_data)

Or, with multiple entries at once via a GMetaList:

    sc = globus_sdk.SearchClient(...)
    ingest_data = {
        "ingest_type": "GMetaList",
        "ingest_data": {
            "gmeta": [
                {
                    "subject": "https://example.com/foo/bar",
                    "visible_to": ["public"],
                    "content": {"foo/bar": "some val"},
                },
                {
                    "subject": "https://example.com/foo/bar",
                    "id": "otherentry",
                    "visible_to": ["public"],
                    "content": {"foo/bar": "some otherval"},
                },
            ]
        },
    }
    sc.ingest(index_id, ingest_data)
POST /v1/index/<index_id>/ingest
See Ingest in the API documentation for details.
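When entries come from another data source, the GMetaList document from the ingest example can be assembled programmatically. The helper build_gmeta_list is hypothetical; this sketch mirrors the example's field names (subject, visible_to, content, and the optional id). The whole list is sent in one ingest call, so a single task_id comes back.

```python
from typing import Any, Iterable, Optional, Sequence


def build_gmeta_list(
    entries: Iterable[tuple[str, dict[str, Any], Optional[str]]],
    visible_to: Sequence[str] = ("public",),
) -> dict[str, Any]:
    """Wrap (subject, content, entry_id) tuples in a GMetaList ingest document.

    An entry_id of None omits the "id" field, leaving the entry ID null.
    """
    gmeta = []
    for subject, content, entry_id in entries:
        entry: dict[str, Any] = {
            "subject": subject,
            "visible_to": list(visible_to),
            "content": content,
        }
        if entry_id is not None:
            entry["id"] = entry_id
        gmeta.append(entry)
    return {"ingest_type": "GMetaList", "ingest_data": {"gmeta": gmeta}}
```

The resulting document is passed to ingest unchanged, as in sc.ingest(index_id, build_gmeta_list(rows)).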
- delete_by_query(index_id, data)
Delete data in a Search index as an asynchronous task, deleting all documents which match a given query. The query uses a restricted subset of the syntax available for complex queries, as it is not meaningful to boost, sort, or otherwise rank data in this case.
A task_id value will be included in the response.
- Parameters:
- Return type:

    sc = globus_sdk.SearchClient(...)
    query_data = {
        "q": "user query",
        "filters": [
            {
                "type": "range",
                "field_name": "path.to.date",
                "values": [{"from": "*", "to": "2014-11-07"}],
            }
        ],
    }
    sc.delete_by_query(index_id, query_data)
POST /v1/index/<index_id>/delete_by_query
See Delete By Query in the API documentation for details.
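Because a delete-by-query document uses only a subset of the complex query syntax, one option is to start from a post_search query and keep just the compatible keys. This is a conservative sketch. to_delete_query is a hypothetical helper, and the assumption that only q and filters should be kept comes from the delete_by_query example, not from the API specification. A side benefit is that you can run the original query through post_search first to preview what would be deleted.

```python
from typing import Any

# Assumption: keep only the keys shown in the delete_by_query example.
# Ranking-related keys such as "boosts", "sort", and "facets" are dropped
# because boosting, sorting, or ranking is not meaningful for deletion.
DELETE_QUERY_KEYS = ("q", "filters")


def to_delete_query(search_query: dict[str, Any]) -> dict[str, Any]:
    """Derive a delete_by_query document from a post_search query document."""
    return {k: search_query[k] for k in DELETE_QUERY_KEYS if k in search_query}
```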
- batch_delete_by_subject(index_id, subjects, additional_params=None)
Delete data in a Search index as an asynchronous task, deleting multiple documents based on their subject values.
A task_id value will be included in the response.
- Parameters:
- Return type:

    sc = globus_sdk.SearchClient(...)
    sc.batch_delete_by_subject(
        index_id,
        subjects=[
            "very-cool-document",
            "less-cool-document",
            "document-wearing-sunglasses",
        ],
    )

Example response:

    {"task_id": "b296234e-6c74-11ef-ae2e-0242ac110002"}
POST /v1/index/<index_id>/batch_delete_by_subject
See Delete By Subject in the API documentation for details.
- get_subject(index_id, subject, *, query_params=None)
Fetch exactly one Subject document from Search, containing one or more Entries.
- Parameters:
- Return type:
Fetch the data for subject http://example.com/abc from index index_id:

    sc = globus_sdk.SearchClient(...)
    subject_data = sc.get_subject(index_id, "http://example.com/abc")
GET /v1/index/<index_id>/subject
See Get By Subject in the API documentation for details.
- delete_subject(index_id, subject, *, query_params=None)
Delete exactly one Subject document from Search, containing one or more Entries, as an asynchronous task.
A task_id value will be included in the response.
- Parameters:
- Return type:
Delete all data for subject http://example.com/abc from index index_id, even data which is not visible to the current user:

    sc = globus_sdk.SearchClient(...)
    response = sc.delete_subject(index_id, "http://example.com/abc")
    task_id = response["task_id"]
DELETE /v1/index/<index_id>/subject
See Delete By Subject in the API documentation for details.
- get_entry(index_id, subject, *, entry_id=None, query_params=None)
Fetch exactly one Entry document from Search, identified by the combination of subject string and entry_id, which defaults to null.
- Parameters:
- Return type:
Look up the entry with a subject of https://example.com/foo/bar and a null entry_id:

    sc = globus_sdk.SearchClient(...)
    entry_data = sc.get_entry(index_id, "https://example.com/foo/bar")

Look up the entry with a subject of https://example.com/foo/bar and an entry_id of foo/bar:

    sc = globus_sdk.SearchClient(...)
    entry_data = sc.get_entry(index_id, "https://example.com/foo/bar", entry_id="foo/bar")
GET /v1/index/<index_id>/entry
See Get Entry in the API documentation for details.
- create_entry(index_id, data)
This API method is in effect an alias of ingest and is deprecated. Users are recommended to use ingest() instead.
Create or update one Entry document in Search.
The API does not require the document to be new, and will overwrite any existing data.
- Parameters:
- Return type:
Create an entry with a subject of https://example.com/foo/bar and a null entry_id:

    sc = globus_sdk.SearchClient(...)
    sc.create_entry(
        index_id,
        {
            "subject": "https://example.com/foo/bar",
            "visible_to": ["public"],
            "content": {"foo/bar": "some val"},
        },
    )

Create an entry with a subject of https://example.com/foo/bar and an entry_id of foo/bar:

    sc = globus_sdk.SearchClient(...)
    sc.create_entry(
        index_id,
        {
            "subject": "https://example.com/foo/bar",
            "visible_to": ["public"],
            "id": "foo/bar",
            "content": {"foo/bar": "some val"},
        },
    )
POST /v1/index/<index_id>/entry
See Create Entry in the API documentation for details.
- update_entry(index_id, data)
This API method is in effect an alias of ingest and is deprecated. Users are recommended to use ingest() instead.
Create or update one Entry document in Search.
This does not perform a partial update; it replaces the existing document.
- Parameters:
- Return type:
Update an entry with a subject of https://example.com/foo/bar and a null entry_id:

    sc = globus_sdk.SearchClient(...)
    sc.update_entry(
        index_id,
        {
            "subject": "https://example.com/foo/bar",
            "visible_to": ["public"],
            "content": {"foo/bar": "some val"},
        },
    )
PUT /v1/index/<index_id>/entry
See Update Entry in the API documentation for details.
- delete_entry(index_id, subject, *, entry_id=None, query_params=None)
Delete exactly one Entry document in Search as an asynchronous task.
A task_id value will be included in the response.
- Parameters:
- Return type:
Delete an entry with a subject of https://example.com/foo/bar and a null entry_id:

    sc = globus_sdk.SearchClient(...)
    sc.delete_entry(index_id, "https://example.com/foo/bar")

Delete an entry with a subject of https://example.com/foo/bar and an entry_id of foo/bar:

    sc = globus_sdk.SearchClient(...)
    sc.delete_entry(index_id, "https://example.com/foo/bar", entry_id="foo/bar")
DELETE /v1/index/<index_id>/entry
See Delete Entry in the API documentation for details.
- get_task(task_id, *, query_params=None)
Fetch a Task document by ID, getting task details and status.
- Parameters:
- Return type:

    sc = globus_sdk.SearchClient(...)
    task = sc.get_task(task_id)
    assert task["index_id"] == known_index_id
    print(task["task_id"], "|", task["state"])
GET /v1/task/<task_id>
See Get Task in the API documentation for details.
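Since ingest and delete operations run as asynchronous tasks, a common pattern is to poll get_task until the task finishes. In this sketch, wait_for_task is hypothetical. Its get_task argument is any callable that returns the task document, for example lambda t: sc.get_task(t).data. The set of terminal states is an assumption: CANCELLED is mentioned in the delete_index description, while SUCCESS and FAILED are assumed. The sleep function is injectable so that the loop can be tested without waiting.

```python
import time
from typing import Any, Callable

# Assumption: these states are terminal. CANCELLED appears in the
# delete_index description; SUCCESS and FAILED are assumed.
TERMINAL_STATES = frozenset({"SUCCESS", "FAILED", "CANCELLED"})


def wait_for_task(
    get_task: Callable[[str], dict[str, Any]],
    task_id: str,
    *,
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll a task document until it reaches a terminal state.

    Raises TimeoutError if the task is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = get_task(task_id)
        if task["state"] in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task {task_id} still {task['state']!r} after {timeout}s"
            )
        # Wait before asking again, to avoid hammering the service.
        sleep(interval)
```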
- get_task_list(index_id, *, query_params=None)
Fetch a list of recent Task documents for an index, getting task details and status.
- Parameters:
- Return type:

    sc = globus_sdk.SearchClient(...)
    task_list = sc.get_task_list(index_id)
    for task in task_list["tasks"]:
        print(task["task_id"], "|", task["state"])
GET /v1/task_list/<index_id>
See Task List in the API documentation for details.
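The task list can be summarized by state, which is useful when checking whether an index still has pending work. summarize_task_states is a hypothetical helper that relies only on the tasks and state fields used in the get_task_list example.

```python
from collections import Counter
from typing import Any


def summarize_task_states(task_list: dict[str, Any]) -> Counter:
    """Count the tasks in a get_task_list response document by state."""
    return Counter(task["state"] for task in task_list.get("tasks", []))
```

For example, summarize_task_states(sc.get_task_list(index_id).data) returns a Counter mapping each state to the number of recent tasks in that state.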
- create_role(index_id, data, *, query_params=None)
Create a new role on an index. You must already have the owner or admin role on an index to create additional roles.
Roles are specified as a role name (one of "owner", "admin", or "writer") and a Principal URN.
- Parameters:
- Return type:

    identity_id = "46bd0f56-e24f-11e5-a510-131bef46955c"
    sc = globus_sdk.SearchClient(...)
    sc.create_role(
        index_id,
        {"role_name": "writer", "principal": f"urn:globus:auth:identity:{identity_id}"},
    )
POST /v1/index/<index_id>/role
See Create Role in the API documentation for details.
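Building the role document through a helper catches typos in the role name before a request is sent. identity_urn and role_document are hypothetical helpers; the URN format and the three role names are taken from the create_role description and example.

```python
from typing import Any

# The three role names listed in the create_role description.
VALID_ROLE_NAMES = frozenset({"owner", "admin", "writer"})


def identity_urn(identity_id: str) -> str:
    """Principal URN for a Globus Auth identity, in the example's format."""
    return f"urn:globus:auth:identity:{identity_id}"


def role_document(role_name: str, principal: str) -> dict[str, Any]:
    """Build a create_role payload, rejecting unknown role names early."""
    if role_name not in VALID_ROLE_NAMES:
        raise ValueError(
            f"unknown role {role_name!r}; expected one of {sorted(VALID_ROLE_NAMES)}"
        )
    return {"role_name": role_name, "principal": principal}
```

Usage would look like sc.create_role(index_id, role_document("writer", identity_urn(identity_id))).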
- get_role_list(index_id, *, query_params=None)
List all roles on an index. You must have the owner or admin role on an index to list roles.
- Parameters:
- Return type:
GET /v1/index/<index_id>/role_list
See Get Role List in the API documentation for details.
- delete_role(index_id, role_id, *, query_params=None)
Delete a role from an index. You must have the owner or admin role on an index to delete roles. You cannot remove the last owner from an index.
- Parameters:
- Return type:
DELETE /v1/index/<index_id>/role/<role_id>
See Role Delete in the API documentation for details.
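The service enforces the rule that the last owner cannot be removed, but checking first avoids sending a request that is expected to fail. This is a sketch: can_delete_role is hypothetical, and it assumes that each role document returned by get_role_list has id and role_name fields. Verify the field names against the Get Role List documentation.

```python
from typing import Any


def can_delete_role(roles: list[dict[str, Any]], role_id: str) -> bool:
    """Return False if deleting role_id would remove the last owner.

    `roles` is assumed to be a list of role documents, each with "id" and
    "role_name" fields. Raises KeyError if role_id is not in the list.
    """
    target = next((r for r in roles if r["id"] == role_id), None)
    if target is None:
        raise KeyError(role_id)
    # Non-owner roles can always be removed.
    if target["role_name"] != "owner":
        return True
    # An owner role can be removed only if another owner remains.
    owner_count = sum(1 for r in roles if r["role_name"] == "owner")
    return owner_count > 1
```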
Helper Objects
You should not use SearchQueryBase directly, and it is not importable from the top level of the SDK. It is included in the documentation only to describe the methods it provides to its subclasses.
- class globus_sdk.services.search.data.SearchQueryBase(dict=None, /, **kwargs)
Bases: PayloadWrapper
The base class for all Search query helpers.
Search has multiple types of query documents. Not all of their supported attributes are shared, and they therefore do not inherit from one another. This class implements common methods to all of them.
Query objects have a chainable API, in which methods return the query object after modification. This allows usage like:

    >>> query = ...
    >>> query = query.set_limit(10).set_advanced(False)
- add_filter(field_name, values, *, type='match_all', additional_fields=None)
Add a filter subdocument to the query.
- set_advanced(advanced)
Enable or disable advanced query string processing.
- Parameters:
advanced (bool) – whether to enable (True) or not (False)
- Return type:
SearchQueryT
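The chainable API of SearchQueryBase works because every mutating method returns the object itself. The sketch below shows the pattern on a hypothetical dict subclass. It is not the SDK's implementation; it only borrows the method names set_limit, set_advanced, and add_filter for familiarity.

```python
from typing import Any


class HypotheticalQuery(dict):
    """A dict whose mutating methods return self, so calls can be chained.

    Illustrative only; this is not the SDK's SearchQueryBase.
    """

    def set_limit(self, limit: int) -> "HypotheticalQuery":
        self["limit"] = limit
        return self

    def set_advanced(self, advanced: bool) -> "HypotheticalQuery":
        self["advanced"] = advanced
        return self

    def add_filter(
        self, field_name: str, values: list[Any], *, type: str = "match_all"
    ) -> "HypotheticalQuery":
        # `type` shadows the builtin to mirror the documented parameter name.
        # The "filters" list is created on first use, then appended to.
        self.setdefault("filters", []).append(
            {"type": type, "field_name": field_name, "values": values}
        )
        return self
```

Because the object is still a dict, it can be passed anywhere a plain query document is accepted once the chain is finished.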
- class globus_sdk.SearchQuery(q=None, *, limit=None, offset=None, advanced=None, additional_fields=None)
Bases: SearchQueryBase
A specialized dict which has helpers for creating and modifying a Search Query document.
- Parameters:
q (str | None) – The query string. Required unless filters are used.
limit (int | None) – A limit on the number of results returned in a single page
offset (int | None) – An offset into the set of all results for the query
advanced (bool | None) – Whether to enable (True) or not to enable (False) advanced parsing of query strings. The default of False is robust and guarantees that the query will not error with “bad query string” errors.
additional_fields (dict[str, t.Any] | None) – additional data to include in the query document
Example usage:

    >>> from globus_sdk import SearchClient, SearchQuery
    >>> sc = SearchClient(...)
    >>> index_id = ...
    >>> query = (SearchQuery(q='example query')
    ...          .set_limit(100).set_offset(10)
    ...          .add_filter('path.to.field1', ['foo', 'bar']))
    >>> result = sc.post_search(index_id, query)
- add_boost(field_name, factor, *, additional_fields=None)
Add a boost subdocument to the query.
- add_facet(name, field_name, *, type='terms', size=None, date_interval=None, histogram_range=None, additional_fields=None)
Add a facet subdocument to the query.
- Parameters:
name (str) – the name for the facet in the result
field_name (str) – the field on which to build the facet
type (str) – the type of facet to apply, defaults to “terms”
size (int | None) – the size parameter for the facet
date_interval (str | None) – the date interval for a date histogram facet
histogram_range (tuple[Any, Any] | None) – a low and high bound for a numeric histogram facet
additional_fields (dict[str, Any] | None) – additional data to include in the facet document
- Return type:
- add_sort(field_name, *, order=None, additional_fields=None)
Add a sort subdocument to the query.
- class globus_sdk.SearchScrollQuery(q=None, *, limit=None, advanced=None, marker=None, additional_fields=None)
Bases: SearchQueryBase
A scrolling query type, for scrolling the full result set for an index.
Scroll queries have more limited capabilities than general searches. They cannot boost fields, sort, or apply facets. They can, however, still apply the same filtering mechanisms which are available to normal queries.
Scrolling also differs in that it supports the use of the marker field, which is used to paginate results.
- Parameters:
q (str | None) – The query string
limit (int | None) – A limit on the number of results returned in a single page
advanced (bool | None) – Whether to enable (True) or not to enable (False) advanced parsing of query strings. The default of False is robust and guarantees that the query will not error with “bad query string” errors.
marker (str | None) – the marker value
additional_fields (dict[str, t.Any] | None) – additional data to include in the query document
Client Errors
When an error occurs, a SearchClient will raise this specialized type of error, rather than a generic GlobusAPIError.
- class globus_sdk.SearchAPIError(r)
Bases: GlobusAPIError
Error class for the Search API client. In addition to the inherited instance variables, provides error_data.
- Variables:
error_data – Additional object returned in the error response. May be a dict, list, or None.