Marqo InstantAPI
- class marqo_instantapi.instant_api_client.InstantAPIClient(api_key)
Bases: object
A client for the InstantAPI API.
- retrieve(webpage_url, api_method_name, api_response_structure, api_parameters=None, country_code=None, verbose=False, wait_for_xpath=None, enable_javascript=None, cache_ttl=None, serp_limit=None, serp_site=None, serp_page_num=None)
Implements an interface to the InstantAPI retrieve endpoint.
- Parameters:
webpage_url (str) – The URL of the webpage to retrieve data from.
api_method_name (str) – The name of the API method to use.
api_response_structure (Union[str, Dict[str, Any]]) – The structure of the API response.
api_parameters (Optional[Union[str, Dict[str, Any]]], optional) – The parameters to pass to the API method. Defaults to None.
country_code (Optional[str], optional) – The country code to use for the request. Defaults to None.
verbose (bool, optional) – Whether to return verbose output. Defaults to False.
wait_for_xpath (Optional[str], optional) – The XPath to wait for before returning the response. Defaults to None.
enable_javascript (Optional[bool], optional) – Whether to enable JavaScript in the browser. Defaults to None.
cache_ttl (Optional[int], optional) – The time-to-live for the cache. Defaults to None.
serp_limit (Optional[int], optional) – The number of results to return for SERP requests. Defaults to None.
serp_site (Optional[str], optional) – The site to use for SERP requests. Defaults to None.
serp_page_num (Optional[int], optional) – The page number to use for SERP requests. Defaults to None.
- Returns:
The response from the InstantAPI retrieve endpoint.
- Return type:
Dict[str, Any]
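A minimal sketch of a retrieve call, written against the signature above. The helper name fetch_page_data, the method name "getProductDetails", and the response structure are illustrative assumptions, not values taken from InstantAPI. The client argument is expected to be an InstantAPIClient, or any object with the same retrieve signature.

```python
from typing import Any, Dict


def fetch_page_data(client: Any, url: str) -> Dict[str, Any]:
    """Call retrieve() with a small, hypothetical response structure.

    `client` is assumed to be an InstantAPIClient instance.
    """
    # Hypothetical schema: field names and placeholder values are made up.
    response_structure = {"title": "<string>", "price": "<number>"}
    return client.retrieve(
        webpage_url=url,
        api_method_name="getProductDetails",  # hypothetical method name
        api_response_structure=response_structure,
        enable_javascript=True,  # optional; the documented default is None
    )
```

With the real client this would be called as fetch_page_data(InstantAPIClient(api_key), url), where api_key is your own key.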
- class marqo_instantapi.marqo_instantapi_adapter.InstantAPIMarqoAdapter(marqo_url='http://localhost:8882', marqo_api_key=None, instantapi_key=None)
Bases: object
A class for interfacing with Marqo and InstantAPI.
- add_documents(webpage_urls, index_name, api_response_structure, api_method_name, text_fields_to_index=[], image_fields_to_index=[], client_batch_size=8, total_image_weight=0.9, total_text_weight=0.1, enforce_schema=True, instantapi_threads=10)
Add documents to a Marqo index from a list of webpage URLs. Data is extracted using the InstantAPI Retrieve API.
- Parameters:
webpage_urls (list[str]) – A list of webpage URLs to index.
index_name (str) – The name of the index to add documents to.
api_response_structure (dict) – The expected structure of the API’s response.
api_method_name (str) – The name of the API method to use for data extraction.
text_fields_to_index (list[str], optional) – A list of text fields for indexing. Defaults to [].
image_fields_to_index (list[str], optional) – A list of image fields for indexing. Defaults to [].
client_batch_size (int, optional) – The client batch size for Marqo. Defaults to 8.
total_image_weight (float, optional) – The total weight for images. Defaults to 0.9.
total_text_weight (float, optional) – The total weight for text. Defaults to 0.1.
enforce_schema (bool, optional) – Toggle strict enforcement of InstantAPI responses against the schema. Defaults to True.
instantapi_threads (int, optional) – The number of threads to use for InstantAPI requests. Defaults to 10.
- Raises:
ValueError – If no fields are provided for indexing.
- Returns:
A list of responses for each document added.
- Return type:
list[dict]
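A hedged sketch of an add_documents call that indexes both text and image fields. The helper name index_product_pages, the method name "getProductDetails", and the field names in the response structure are assumptions chosen for illustration. The adapter argument is expected to be an InstantAPIMarqoAdapter.

```python
from typing import Any, Dict, List


def index_product_pages(
    adapter: Any, urls: List[str], index_name: str
) -> List[Dict[str, Any]]:
    """Extract each URL with InstantAPI and add the results to a Marqo index."""
    # Hypothetical schema; the keys are reused below as the fields to index.
    response_structure = {
        "title": "<string>",
        "description": "<string>",
        "image_url": "<string>",
    }
    return adapter.add_documents(
        webpage_urls=urls,
        index_name=index_name,
        api_response_structure=response_structure,
        api_method_name="getProductDetails",  # hypothetical method name
        text_fields_to_index=["title", "description"],
        image_fields_to_index=["image_url"],
        total_image_weight=0.9,  # documented default
        total_text_weight=0.1,  # documented default
    )
```

At least one of the two field lists must be non-empty. Per the Raises entry above, a ValueError is raised when no fields are provided for indexing.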
- crawl(initial_webpage_urls, allowed_domains, index_name, api_response_structure, text_fields_to_index=[], image_fields_to_index=[], client_batch_size=8, total_image_weight=0.9, total_text_weight=0.1, enforce_schema=True, max_pages=None)
Crawl a set of webpages and add them to a Marqo index.
- Parameters:
initial_webpage_urls (list[str]) – A list of initial webpage URLs to start the crawl from.
allowed_domains (set[str]) – A set of domains the crawl is allowed to visit.
index_name (str) – The name of the index to add documents to. If the index does not exist, it will be created based on the fields to index.
api_response_structure (dict) – The expected structure of the API’s response, which is passed to InstantAPI.
text_fields_to_index (list[str], optional) – A list of text fields for indexing. Defaults to [].
image_fields_to_index (list[str], optional) – A list of image fields for indexing. Defaults to [].
client_batch_size (int, optional) – The client batch size for Marqo, which controls how many documents are sent at a time. Defaults to 8.
total_image_weight (float, optional) – The total weight for images; applies when both image and text fields are provided. Defaults to 0.9.
total_text_weight (float, optional) – The total weight for text; applies when both image and text fields are provided. Defaults to 0.1.
enforce_schema (bool, optional) – Toggle strict enforcement of InstantAPI responses against the schema. Defaults to True.
max_pages (Optional[int], optional) – The maximum number of pages to crawl. Defaults to None.
- Raises:
ValueError – If no fields are provided for indexing.
- Returns:
A list of responses for each document added.
- Return type:
list[dict]
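A sketch of starting a bounded crawl from a few seed URLs. It assumes that allowed_domains names the domains the crawl may visit, as the parameter name suggests, and that entries are bare host names such as "example.com"; the reference does not state the expected format. The helper names seed_domains and crawl_site and the response structure are hypothetical.

```python
from typing import Any, Dict, List, Set
from urllib.parse import urlparse


def seed_domains(urls: List[str]) -> Set[str]:
    """Collect the host part of each seed URL, e.g. 'example.com'."""
    return {urlparse(u).netloc for u in urls}


def crawl_site(
    adapter: Any, seeds: List[str], index_name: str, max_pages: int = 50
) -> List[Dict[str, Any]]:
    """Crawl from `seeds`, staying on the seeds' own hosts (assumed semantics)."""
    return adapter.crawl(
        initial_webpage_urls=seeds,
        allowed_domains=seed_domains(seeds),
        index_name=index_name,
        # Hypothetical schema; the keys double as the fields to index.
        api_response_structure={"title": "<string>", "body": "<string>"},
        text_fields_to_index=["title", "body"],
        max_pages=max_pages,  # cap the crawl; the documented default is None
    )
```

Setting max_pages explicitly keeps an exploratory crawl from running longer than intended.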
- create_index(index_name, multimodal=False, model=None, skip_if_exists=False)
Simplified method for creating a Marqo index; recommended when fine-grained control is not needed.
- Parameters:
index_name (str) – The name of the index to create.
multimodal (bool, optional) – Toggles image downloading on or off. If model is not provided, it also influences model selection. Defaults to False.
model (Optional[str], optional) – Optionally specify a specific model. Defaults to None.
skip_if_exists (bool, optional) – Skip index creation if the index already exists. This does not check whether the existing index conforms to the provided parameters. Defaults to False.
- Returns:
The index creation response.
- Return type:
dict
Examples
>>> marqo_adapter = InstantAPIMarqoAdapter()
>>> marqo_adapter.create_index("my-index")
- delete_index(index_name, confirm=False, skip_if_not_exists=False)
Delete a Marqo index.
- Parameters:
index_name (str) – The name of the index to delete.
confirm (bool, optional) – Automatically confirms the deletion. Defaults to False.
skip_if_not_exists (bool, optional) – Skip deletion if the index does not exist. Defaults to False.
- Returns:
The deletion response.
- Return type:
dict
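One way to combine delete_index and create_index is a small reset helper, sketched below. The name reset_index is hypothetical, and the sketch assumes that confirm=True suppresses any confirmation step, as the parameter description suggests.

```python
from typing import Any, Dict


def reset_index(adapter: Any, index_name: str, multimodal: bool = False) -> Dict[str, Any]:
    """Drop the index if it exists, then create it again from scratch."""
    # skip_if_not_exists=True makes the delete a no-op on a fresh deployment.
    adapter.delete_index(index_name, confirm=True, skip_if_not_exists=True)
    return adapter.create_index(index_name, multimodal=multimodal)
```

Deleting first avoids relying on skip_if_exists, which does not verify that an existing index matches the requested settings.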
- search(q, index_name, limit=10, offset=0, searchable_attributes=None, search_method='hybrid')
Search a Marqo index via a simplified interface.
- Parameters:
q (str) – The query string to search for.
index_name (str) – The name of the index to search.
limit (int, optional) – The number of results to retrieve. Defaults to 10.
offset (int, optional) – The offset for the search results. Defaults to 0.
searchable_attributes (Optional[list], optional) – The attributes to search. Defaults to None.
search_method (Literal["tensor", "lexical", "hybrid"], optional) – The search method to use: tensor uses only vectors, lexical uses only text, and hybrid combines both with RRF (reciprocal rank fusion). Defaults to "hybrid".
- Raises:
ValueError – If an invalid search method is provided.
- Returns:
The search response from Marqo.
- Return type:
dict
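A sketch of paging through results with limit and offset. The helper name search_page and its page and page_size arguments are hypothetical; the early check on search_method mirrors the documented ValueError but is this sketch's own validation, not the library's.

```python
from typing import Any, Dict

VALID_METHODS = ("tensor", "lexical", "hybrid")


def search_page(
    adapter: Any,
    query: str,
    index_name: str,
    page: int = 0,
    page_size: int = 10,
    method: str = "hybrid",
) -> Dict[str, Any]:
    """Return one page of results; `page` is zero-based."""
    if method not in VALID_METHODS:
        raise ValueError(f"search method must be one of {VALID_METHODS}, got {method!r}")
    return adapter.search(
        q=query,
        index_name=index_name,
        limit=page_size,
        offset=page * page_size,  # skip the results of all earlier pages
        search_method=method,
    )
```

For example, search_page(adapter, "red shoes", "products", page=2, page_size=5) requests results 11 to 15 by passing limit=5 and offset=10.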