ufcscraper.utils module
- ufcscraper.utils.clean_date_string(date_str: str) → str
- Clean a date string to remove incorrect ordinal suffixes and make it
suitable for parsing.
- Parameters:
date_str (str) – The date string to be cleaned.
- Returns:
The cleaned date string.
- Return type:
str
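As a rough illustration of the cleaning step described above, the sketch below strips English ordinal suffixes that follow a day number. The function name `strip_ordinal_suffixes` and the regular expression are assumptions made for this example; the library's actual implementation may differ.

```python
import re

# Hypothetical pattern (an assumption, not taken from ufcscraper):
# a run of digits immediately followed by an English ordinal suffix.
_ORDINAL = re.compile(r"(\d+)(st|nd|rd|th)\b", flags=re.IGNORECASE)


def strip_ordinal_suffixes(date_str: str) -> str:
    """Drop ordinal suffixes so a date such as 'March 3th, 2021' can be parsed."""
    # Keep only the digits (group 1) and discard the suffix (group 2).
    return _ORDINAL.sub(r"\1", date_str).strip()


cleaned = strip_ordinal_suffixes("March 3th, 2021")
```

Because the suffix is removed whether or not it is grammatically correct, malformed inputs such as "3th" are handled the same way as "3rd".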
- class ufcscraper.utils.element_present_in_list(*locators: Tuple[str, str])
Bases: object
Callable to check if an element is present in a list of elements on a web page.
- locators
Locators used to find elements on the page.
- Type:
Tuple[str, str]
- __call__(driver: webdriver.Chrome) → bool | List[WebElement]
Check if any elements matching the locators are present on the page.
- Parameters:
driver – The WebDriver instance used to interact with the web page.
- Returns:
- The list of matching WebElements if any are found for one of the
locators, otherwise False.
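The behaviour described for this callable can be sketched as follows. This is an illustrative re-creation under assumptions, not the library's code: `ElementPresentInListSketch` mirrors the documented shape, and `FakeDriver` is a hypothetical stand-in for `webdriver.Chrome` so the example runs without a browser.

```python
from typing import Dict, List, Tuple, Union

Locator = Tuple[str, str]


class ElementPresentInListSketch:
    """Hypothetical re-creation of the documented callable."""

    def __init__(self, *locators: Locator) -> None:
        self.locators = locators

    def __call__(self, driver) -> Union[bool, List[object]]:
        # Try each locator in order and return the first non-empty match list.
        for locator in self.locators:
            elements = driver.find_elements(*locator)
            if elements:
                return elements
        return False


class FakeDriver:
    """Stand-in for a WebDriver (assumption); only find_elements is provided."""

    def __init__(self, page: Dict[Locator, List[str]]) -> None:
        self._page = page

    def find_elements(self, by: str, value: str) -> List[str]:
        return self._page.get((by, value), [])


driver = FakeDriver({("css selector", "div.fight"): ["fight-1", "fight-2"]})
found = ElementPresentInListSketch(
    ("id", "missing"), ("css selector", "div.fight")
)(driver)
absent = ElementPresentInListSketch(("id", "missing"))(driver)
```

With Selenium, a callable of this shape would typically be passed to `WebDriverWait(driver, timeout).until(...)`, which polls it until the return value is truthy.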
- ufcscraper.utils.get_session() → Session
Create and configure a new requests.Session object with retry functionality.
- Returns:
A configured session object with retry strategy.
- Return type:
requests.Session
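One common way to build a session with a retry strategy is to mount an `HTTPAdapter` configured with a urllib3 `Retry` object. The sketch below shows that pattern; the function name `make_retrying_session`, the retry count, the backoff factor, and the status codes are all assumptions for illustration and are not taken from `get_session` itself.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total_retries: int = 3, backoff: float = 0.5) -> requests.Session:
    """Build a Session whose HTTP(S) requests are retried on transient errors."""
    # Assumed policy: retry a few times on common 5xx responses, with backoff.
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,
        status_forcelist=(500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Apply the same retry policy to both URL schemes.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


session = make_retrying_session()
```

Creating the session performs no network traffic; the retry policy only takes effect when a request is sent through it.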
- ufcscraper.utils.link_to_soup(url: str, session: requests.Session | None = None, delay: float = 0) → bs4.BeautifulSoup
Parse the HTML content of a given URL into a BeautifulSoup object.
- Parameters:
url – The URL to scrape.
session – A requests session object. If not provided, a new session will be created.
delay – Delay in seconds before making the request.
- Returns:
Parsed BeautifulSoup object containing the HTML content of the page.
- ufcscraper.utils.links_to_soups(urls: List[str], n_sessions: int = 1, delay: float = 0) → Generator[Tuple[str, bs4.BeautifulSoup]]
Parse the HTML content from the given URLs into BeautifulSoup objects.
Create a generator that yields tuples of URLs and their corresponding BeautifulSoup objects.
This function uses multiple processes to fetch and parse web pages concurrently.
- Parameters:
urls – List of URLs to be scraped.
n_sessions – Number of concurrent sessions to use for scraping. Defaults to 1.
delay – Delay in seconds to wait before making each request. Defaults to 0.
- Returns:
Tuples containing the URL and the corresponding BeautifulSoup object.
- ufcscraper.utils.parse_date(date_str: str) → datetime.date | None
Parse a date string into a datetime.date object.
- Parameters:
date_str (str) – The date string to be parsed.
- Returns:
- The parsed date object if successful,
otherwise None.
- Return type:
Optional[datetime.date]
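The "date or None" contract described above can be sketched by trying a few formats in turn and returning None when none of them match. The name `parse_date_sketch` and the list of accepted formats are assumptions for this example; the formats the library actually accepts are not stated here.

```python
import datetime
import re
from typing import Optional

# Hypothetical set of accepted formats (assumption, not from ufcscraper).
_FORMATS = ("%B %d, %Y", "%b %d, %Y", "%Y-%m-%d")


def parse_date_sketch(date_str: str) -> Optional[datetime.date]:
    """Return a date for the first matching format, or None if none match."""
    # Remove ordinal suffixes first, e.g. "3rd" becomes "3".
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", date_str.strip())
    for fmt in _FORMATS:
        try:
            return datetime.datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # Try the next candidate format.
    return None


parsed = parse_date_sketch("March 3rd, 2021")
unparsed = parse_date_sketch("not a date")
```

Returning None rather than raising lets callers skip rows with unparseable dates without wrapping every call in a try block.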
- ufcscraper.utils.worker_constructor(method: Callable[..., T], max_exception_retries: int = 4) → Callable[[multiprocessing.Queue, multiprocessing.Queue, requests.Session], None]
Create a worker target function for processing tasks with retry functionality.
- Parameters:
method – The function to be executed by the worker.
max_exception_retries – Maximum number of retries for handling exceptions.
- Returns:
- A worker function that processes tasks from a queue and puts results in
another queue.
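A worker factory of this kind might look like the sketch below. Several details are assumptions made for illustration and are not confirmed by the documentation: the `None` sentinel that stops the worker, the `method(task, session)` call signature, the `(task, result)` output tuple, and reporting `None` once retries are exhausted. The demo uses `queue.Queue`, which offers the same `get`/`put` interface as `multiprocessing.Queue`, so it runs in a single process.

```python
import queue
from typing import Any, Callable


def worker_constructor_sketch(
    method: Callable[..., Any], max_exception_retries: int = 4
) -> Callable[[Any, Any, Any], None]:
    """Hypothetical re-creation: wrap `method` in a queue-driven worker loop."""

    def worker(task_queue, result_queue, session) -> None:
        while True:
            task = task_queue.get()
            if task is None:  # Assumed sentinel meaning "no more work".
                break
            result = None
            # One initial attempt plus up to max_exception_retries retries.
            for _ in range(max_exception_retries + 1):
                try:
                    result = method(task, session)
                    break
                except Exception:
                    continue
            result_queue.put((task, result))

    return worker


calls = {"n": 0}


def flaky(task: str, session: Any) -> str:
    """Fail twice, then succeed, to exercise the retry loop."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return task.upper()


tasks, results = queue.Queue(), queue.Queue()
tasks.put("a")
tasks.put(None)
worker_constructor_sketch(flaky)(tasks, results, None)
outcome = results.get()
```

In a multi-process setup, the returned `worker` would be passed as the `target` of a `multiprocessing.Process`, with the two queues and a session as its arguments.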