ufcscraper.utils module

ufcscraper.utils.clean_date_string(date_str: str) str[source]
Clean a date string to remove incorrect ordinal suffixes and make it suitable for parsing.

Parameters:

date_str (str) – The date string to be cleaned.

Returns:

The cleaned date string.

Return type:

str
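As an illustration of the kind of cleaning described above, the following is a minimal sketch, not the library's actual implementation. The name `clean_date_string_sketch` and the regular expression are assumptions; the real function may handle more cases or behave differently.

```python
import re

# Hypothetical pattern: an ordinal suffix that directly follows a digit.
# This is an assumption for illustration, not the pattern used by ufcscraper.
_ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)\b", flags=re.IGNORECASE)


def clean_date_string_sketch(date_str: str) -> str:
    """Strip ordinal suffixes after digits, e.g. '3rd' becomes '3'."""
    return _ORDINAL_SUFFIX.sub("", date_str).strip()


# A grammatically wrong suffix ("2th") is removed just like a correct one,
# while the "st" inside "August" is untouched because no digit precedes it.
print(clean_date_string_sketch("March 3rd, 2021"))
print(clean_date_string_sketch("August 2th 2020"))
```

Anchoring the match on a preceding digit keeps month names such as "August" intact.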

class ufcscraper.utils.element_present_in_list(*locators: Tuple[str, str])[source]

Bases: object

Callable to check if an element is present in a list of elements on a web page.

locators

Locators used to find elements on the page.

Type:

Tuple[str, str]

__call__(driver: webdriver.Chrome) bool | List[WebElement][source]

Check if any elements matching the locators are present on the page.

Parameters:

driver – The WebDriver instance used to interact with the web page.

Returns:

The list of matching WebElements if any are found, otherwise False.
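The callable pattern described here can be sketched as follows. This is a hedged approximation: `ElementPresentInListSketch` and `FakeDriver` are hypothetical names, and the choice to return the matches of the first locator that finds anything is an assumption. The only driver method assumed is `find_elements(by, value)`.

```python
from typing import Any, Dict, List, Tuple, Union


class ElementPresentInListSketch:
    """Hypothetical stand-in for element_present_in_list."""

    def __init__(self, *locators: Tuple[str, str]) -> None:
        self.locators = locators

    def __call__(self, driver: Any) -> Union[bool, List[Any]]:
        # Assumption: try each locator in order and return the first
        # non-empty result; return False when nothing matches.
        for locator in self.locators:
            elements = driver.find_elements(*locator)
            if elements:
                return elements
        return False


class FakeDriver:
    """Minimal fake driver so the sketch runs without a browser."""

    def __init__(self, page: Dict[Tuple[str, str], List[str]]) -> None:
        self.page = page

    def find_elements(self, by: str, value: str) -> List[str]:
        return self.page.get((by, value), [])


driver = FakeDriver({("id", "odds-table"): ["<row 1>", "<row 2>"]})
condition = ElementPresentInListSketch(("id", "missing"), ("id", "odds-table"))
print(condition(driver))
```

Because the result is either falsy or a non-empty list, a callable of this shape can be passed to Selenium's `WebDriverWait(...).until(...)`, which keeps calling it with the driver until it returns a truthy value.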

ufcscraper.utils.get_session() Session[source]

Create and configure a new requests.Session object with retry functionality.

Returns:

A configured session object with retry strategy.

Return type:

requests.Session
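A session with retry behaviour is commonly built by mounting an `HTTPAdapter` configured with a urllib3 `Retry` object. The sketch below shows that general pattern only; the function name `get_session_sketch` and the retry count, backoff factor, and status codes are assumptions, not the values used by ufcscraper.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def get_session_sketch() -> requests.Session:
    """Build a Session whose HTTP(S) requests are retried on failure."""
    # Assumed values for illustration only.
    retry = Retry(
        total=3,
        backoff_factor=0.5,
        status_forcelist=(500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)

    session = requests.Session()
    # Apply the retrying adapter to both URL schemes.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


session = get_session_sketch()
print(type(session).__name__)
```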

Parse the HTML content of a given URL into a BeautifulSoup object.

Parameters:
  • url – The URL to scrape.

  • session – A requests session object. If not provided, a new session will be created.

  • delay – Delay in seconds before making the request.

Returns:

Parsed BeautifulSoup object containing the HTML content of the page.
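The parameters listed above suggest a fetch-then-parse flow, which might look like the sketch below. The function name `fetch_soup_sketch`, the use of the `"html.parser"` backend, and the plain `requests.Session()` fallback are all assumptions made for illustration.

```python
import time
from typing import Any, Optional

import requests
from bs4 import BeautifulSoup


def fetch_soup_sketch(
    url: str, session: Optional[Any] = None, delay: float = 0
) -> BeautifulSoup:
    """Fetch a URL after an optional delay and parse the response body."""
    if session is None:
        # Assumption: fall back to a fresh, unconfigured session.
        session = requests.Session()
    time.sleep(delay)  # wait before making the request
    response = session.get(url)
    return BeautifulSoup(response.content, "html.parser")
```

Passing the same session object to repeated calls reuses its connection pool and any retry configuration mounted on it.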

Parse the HTML content from the given URLs into BeautifulSoup objects.

Create a generator that yields tuples of URLs and their corresponding BeautifulSoup objects.

This function uses multiple processes to fetch and parse web pages concurrently.

Parameters:
  • urls – List of URLs to be scraped.

  • n_sessions – Number of concurrent sessions to use for scraping. Defaults to 1.

  • delay – Delay in seconds to wait before making each request. Defaults to 0.

Returns:

Tuples containing the URL and the corresponding BeautifulSoup object.
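The generator shape described above, yielding each URL together with its result, can be illustrated with a simplified sketch. Note the substitution: this example uses a thread pool from `concurrent.futures` instead of multiple processes, and it takes the fetch function as a parameter so it can run without network access. The names `fetch_many_sketch`, `fetch`, and `n_workers` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def fetch_many_sketch(
    urls: List[str], fetch: Callable[[str], T], n_workers: int = 1
) -> Iterator[Tuple[str, T]]:
    """Yield (url, result) pairs, fetching with up to n_workers threads."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL
        # with its own result.
        for url, result in zip(urls, pool.map(fetch, urls)):
            yield url, result


# A stand-in fetch function; a real one would download and parse the page.
for url, result in fetch_many_sketch(["a", "b"], str.upper, n_workers=2):
    print(url, result)
```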

ufcscraper.utils.parse_date(date_str: str) datetime.date | None[source]

Parse a date string into a datetime.date object.

Parameters:

date_str (str) – The date string to be parsed.

Returns:

The parsed date object if successful, otherwise None.

Return type:

Optional[datetime.date]
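The "date or None" contract can be sketched with `datetime.strptime` and a list of candidate formats. The function name `parse_date_sketch` and the formats tried are assumptions; the formats accepted by the real function are not stated here.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical set of accepted formats, for illustration only.
_FORMATS = ("%B %d, %Y", "%b %d, %Y", "%Y-%m-%d")


def parse_date_sketch(date_str: str) -> Optional[date]:
    """Return the first successful parse, or None if no format matches."""
    text = date_str.strip()
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next format
    return None


print(parse_date_sketch("March 3, 2021"))
print(parse_date_sketch("not a date"))
```

Returning None instead of raising lets callers handle missing or malformed dates with a simple check.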

ufcscraper.utils.worker_constructor(method: Callable[..., T], max_exception_retries: int = 4) Callable[[multiprocessing.Queue, multiprocessing.Queue, requests.Session], None][source]

Create a worker target function for processing tasks with retry functionality.

Parameters:
  • method – The function to be executed by the worker.

  • max_exception_retries – Maximum number of retries for handling exceptions.

Returns:

A worker function that processes tasks from a queue and puts results in another queue.
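A worker factory of this kind might be structured as in the sketch below. Several details are assumptions made for illustration: the name `worker_constructor_sketch`, the `None` sentinel that stops the worker, tasks being tuples of positional arguments, the session being passed as the last argument to `method`, the `(task, result)` result shape, and counting one initial attempt plus `max_exception_retries` retries. In-process `queue.Queue` objects are used here so the example runs without spawning processes.

```python
import queue
from typing import Any, Callable


def worker_constructor_sketch(
    method: Callable[..., Any], max_exception_retries: int = 4
) -> Callable[[Any, Any, Any], None]:
    """Return a worker that applies `method` to queued tasks with retries."""

    def worker(task_queue: Any, result_queue: Any, session: Any) -> None:
        while True:
            task = task_queue.get()
            if task is None:  # hypothetical sentinel: no more work
                break
            result = None  # reported if every attempt fails
            # Assumption: one initial attempt plus the allowed retries.
            for _attempt in range(max_exception_retries + 1):
                try:
                    result = method(*task, session)
                    break
                except Exception:
                    continue
            result_queue.put((task, result))

    return worker


def double(x: int, session: Any) -> int:
    return x * 2


tasks: queue.Queue = queue.Queue()
results: queue.Queue = queue.Queue()
tasks.put((21,))
tasks.put(None)
worker_constructor_sketch(double)(tasks, results, None)
print(results.get_nowait())
```

The returned worker has the three-argument shape shown in the signature above, so with `multiprocessing.Queue` objects it could serve as the `target` of a `multiprocessing.Process`.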