ufcscraper.fighter_scraper module

This module defines a FighterScraper class for scraping and processing fighter data from UFCStats.

The FighterScraper class inherits from BaseScraper and retrieves detailed information about UFC fighters: personal details, physical attributes, and fight records. The scraped data is processed and saved to a CSV file for later analysis. The module also provides static methods for parsing and converting individual attributes (height, weight, reach, stance, nickname, and date of birth) from the scraped HTML content.

class ufcscraper.fighter_scraper.FighterScraper(data_folder: Path | str, n_sessions: int | None = None, delay: float | None = None)

Bases: BaseScraper

Scrapes and stores fighter data from UFCStats.

This class handles scraping fighter details from UFCStats, including personal information, physical attributes, and fight records. The data is saved to a CSV file for further analysis.
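
A typical session might look like the sketch below. It uses only the constructor signature and the scrape_fighters() method documented on this page; the helper name scrape_all_fighters, the folder name, and the n_sessions and delay values are illustrative assumptions, and creating the folder up front is a precaution rather than a documented requirement.

```python
from pathlib import Path


def scrape_all_fighters(data_folder: str = "data"):
    """Hypothetical helper: run a full fighter scrape into data_folder."""
    # Imported lazily so that defining this helper does not require the
    # package to be installed or any network access.
    from ufcscraper.fighter_scraper import FighterScraper

    folder = Path(data_folder)
    # Assumption: the scraper expects the folder to exist already.
    folder.mkdir(parents=True, exist_ok=True)

    # n_sessions and delay are arbitrary example values, not recommendations.
    scraper = FighterScraper(data_folder=folder, n_sessions=1, delay=0.5)
    scraper.scrape_fighters()
    return scraper
```

Calling scrape_all_fighters() performs real HTTP requests against UFCStats, so it is best run sparingly.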

add_name_column() → None

Adds a combined name column to the DataFrame.

The new column is created by concatenating the fighter’s first and last names.
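
The concatenation can be sketched in plain Python as follows. The real method operates on the scraper's pandas DataFrame; here rows are ordinary dictionaries so the example runs without extra dependencies. The helper name with_full_name and the output column name fighter_name are assumptions, since this page does not state what the new column is called.

```python
def with_full_name(rows):
    """Return copies of the rows with a combined name field added."""
    named = []
    for row in rows:
        parts = [row.get("fighter_f_name", ""), row.get("fighter_l_name", "")]
        # Skip empty parts so single-name fighters get no stray spaces.
        full_name = " ".join(part for part in parts if part)
        # "fighter_name" is a hypothetical column name.
        named.append({**row, "fighter_name": full_name})
    return named


rows = [
    {"fighter_f_name": "Jane", "fighter_l_name": "Doe"},
    {"fighter_f_name": "Mononym", "fighter_l_name": ""},
]
named_rows = with_full_name(rows)
```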

data = Empty DataFrame Columns: [fighter_id, fighter_f_name, fighter_l_name, fighter_nickname, fighter_height_cm, fighter_weight_lbs, fighter_reach_cm, fighter_stance, fighter_dob, fighter_w, fighter_l, fighter_d, fighter_nc_dq] Index: []

dtypes: Dict[str, type | pd.core.arrays.integer.Int64Dtype] = {'fighter_d': Int64Dtype(), 'fighter_dob': 'datetime64[ns]', 'fighter_f_name': <class 'str'>, 'fighter_height_cm': <class 'float'>, 'fighter_id': <class 'str'>, 'fighter_l': Int64Dtype(), 'fighter_l_name': <class 'str'>, 'fighter_nc_dq': Int64Dtype(), 'fighter_nickname': <class 'str'>, 'fighter_reach_cm': <class 'float'>, 'fighter_stance': <class 'str'>, 'fighter_w': Int64Dtype(), 'fighter_weight_lbs': <class 'float'>}

filename: str = 'fighter_data.csv'
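
To show what the column types above mean for a consumer of fighter_data.csv, the sketch below reads a file with the same columns using only the standard library. The sample rows are made up, the helper names are hypothetical, and mapping empty cells to None is an assumption that mirrors how nullable Int64 and datetime columns represent missing values.

```python
import csv
import io
from datetime import date

# Made-up sample with the documented column layout.
SAMPLE_CSV = """\
fighter_id,fighter_f_name,fighter_l_name,fighter_nickname,fighter_height_cm,fighter_weight_lbs,fighter_reach_cm,fighter_stance,fighter_dob,fighter_w,fighter_l,fighter_d,fighter_nc_dq
abc123,Jane,Doe,,170.18,135.0,172.72,Orthodox,1990-01-15,10,2,0,
def456,John,Roe,The Example,180.34,,,Southpaw,,3,1,,
"""

FLOAT_COLUMNS = {"fighter_height_cm", "fighter_weight_lbs", "fighter_reach_cm"}
INT_COLUMNS = {"fighter_w", "fighter_l", "fighter_d", "fighter_nc_dq"}


def convert(column, value):
    """Convert one CSV cell according to the documented column type."""
    if value == "":
        return None  # missing value
    if column in FLOAT_COLUMNS:
        return float(value)
    if column in INT_COLUMNS:
        return int(value)
    if column == "fighter_dob":
        return date.fromisoformat(value)
    return value  # remaining columns are strings


def load_fighters(handle):
    """Read fighter rows from an open CSV file handle."""
    return [
        {column: convert(column, value) for column, value in row.items()}
        for row in csv.DictReader(handle)
    ]


fighters = load_fighters(io.StringIO(SAMPLE_CSV))
```
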

get_fighter_urls() → List[str]

Retrieves the URLs for fighter profiles.

Returns:

A list of URLs to fighter profiles.

static parse_dob(dob: bs4.element.Tag) → str

Parses and formats the fighter’s date of birth.

Parameters:

dob – BeautifulSoup tag containing the date of birth.

Returns:

The date of birth in YYYY-MM-DD format, or “” if not available.
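
A minimal sketch of the date conversion, working on the tag's text rather than on a bs4 Tag so that it runs without BeautifulSoup. The input format ("Jul 19, 1987") and the "--" placeholder for a missing date are assumptions about the page, and parse_dob_text is a hypothetical name, not part of the module.

```python
from datetime import datetime


def parse_dob_text(raw: str) -> str:
    """Convert text like 'Jul 19, 1987' to '1987-07-19', or '' on failure."""
    text = raw.strip()
    try:
        # Assumed source format: abbreviated month, day, four-digit year.
        parsed = datetime.strptime(text, "%b %d, %Y")
    except ValueError:
        # Covers placeholders such as '--' and any unexpected layout.
        return ""
    return parsed.strftime("%Y-%m-%d")
```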

static parse_height(height: bs4.element.Tag) → str

Parses and converts fighter’s height from feet and inches to cm.

Parameters:

height – BeautifulSoup tag containing the height in feet and inches.

Returns:

The height in centimeters, or “” if not available.
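
The conversion itself is total inches (feet × 12 + inches) multiplied by 2.54. The sketch below applies it to plain text; the input layout (for example 5' 11"), the two-decimal output, and the name height_to_cm are assumptions rather than the module's documented behaviour.

```python
import re

CM_PER_INCH = 2.54


def height_to_cm(raw: str) -> str:
    """Convert text like 5' 11" to centimeters, or '' if it does not match."""
    match = re.fullmatch(r"\s*(\d+)'\s*(\d+)\"\s*", raw)
    if match is None:
        return ""
    feet, inches = int(match.group(1)), int(match.group(2))
    total_inches = feet * 12 + inches
    # Two decimals is an arbitrary formatting choice for this sketch.
    return f"{total_inches * CM_PER_INCH:.2f}"
```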

static parse_l_name(name: List[str]) → str

Parses the last name from a list of name parts.

Parameters:

name – List of name parts.

Returns:

The parsed last name, or “” if it cannot be determined.
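
This page does not say how the last name is chosen from the parts. One plausible rule, shown purely as an assumption, treats the first part as the first name and joins everything after it; last_name_from_parts is a hypothetical name.

```python
from typing import List


def last_name_from_parts(name: List[str]) -> str:
    """Join every part after the first; return '' for single-part names."""
    # Assumed rule: the first element is the first name.
    if len(name) < 2:
        return ""
    return " ".join(name[1:])
```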

static parse_nickname(nickname: bs4.element.Tag) → str

Parses the fighter’s nickname.

Parameters:

nickname – BeautifulSoup tag containing the nickname.

Returns:

The parsed nickname, or “” if not available.

static parse_reach(reach: bs4.element.Tag) → str

Parses and converts fighter’s reach from inches to cm.

Parameters:

reach – BeautifulSoup tag containing the reach in inches.

Returns:

The reach in centimeters, or “” if not available.
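
Reach uses the same 2.54 cm-per-inch factor with a single inch value. A sketch on plain text, assuming input such as 72" and using the hypothetical name reach_to_cm:

```python
def reach_to_cm(raw: str) -> str:
    """Convert text like 72" to centimeters, or '' if it is not a number."""
    # Drop surrounding whitespace and the trailing inch mark.
    text = raw.strip().rstrip('"')
    try:
        inches = float(text)
    except ValueError:
        return ""
    return f"{inches * 2.54:.2f}"
```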

static parse_stance(stance: bs4.element.Tag) → str

Parses the fighter’s stance.

Parameters:

stance – BeautifulSoup tag containing the stance.

Returns:

The stance, or “” if not available.

static parse_weight(weight_element: bs4.element.Tag) → str

Parses the fighter’s weight.

Parameters:

weight_element – BeautifulSoup tag containing the weight.

Returns:

The weight in pounds, or “” if not available.

scrape_fighters() → None

Scrapes fighter details from URLs and saves the data to a CSV file.

This method retrieves fighter URLs, scrapes details from each URL, and appends the data to the CSV file. It also handles errors and logs progress.

sort_fields: List[str] = ['fighter_l_name', 'fighter_f_name', 'fighter_id']
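
How sort_fields is consumed is not described here; presumably it sets the row order of the saved data. Under that assumption, the ordering (last name, then first name, then ID as a tie-breaker) can be sketched with made-up rows:

```python
from operator import itemgetter

SORT_FIELDS = ["fighter_l_name", "fighter_f_name", "fighter_id"]

# Made-up rows for illustration only.
rows = [
    {"fighter_id": "c3", "fighter_f_name": "Ann", "fighter_l_name": "Roe"},
    {"fighter_id": "b2", "fighter_f_name": "Zed", "fighter_l_name": "Doe"},
    {"fighter_id": "a1", "fighter_f_name": "Ann", "fighter_l_name": "Doe"},
]

# itemgetter with several keys returns a tuple, so rows are compared
# field by field in the order given by SORT_FIELDS.
sorted_rows = sorted(rows, key=itemgetter(*SORT_FIELDS))
sorted_ids = [row["fighter_id"] for row in sorted_rows]
```
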

classmethod url_from_id(id_: str) → str

Constructs the URL for a fighter’s details page based on their ID.

Parameters:

id_ – The fighter’s unique identifier.

Returns:

The URL for the fighter’s details page.
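
A sketch of the ID-to-URL mapping and its inverse. The base URL is an assumption about the UFCStats address scheme and is not taken from this page; the ID used below is a placeholder, and both function names are hypothetical.

```python
# Assumed base address for fighter detail pages.
BASE_URL = "http://ufcstats.com/fighter-details/"


def fighter_url(fighter_id: str) -> str:
    """Build the details-page URL for a fighter ID."""
    return f"{BASE_URL}{fighter_id}"


def id_from_url(url: str) -> str:
    """Recover the fighter ID from a details-page URL."""
    # The ID is the last path segment; ignore a trailing slash if present.
    return url.rstrip("/").rsplit("/", 1)[-1]


example_url = fighter_url("0123456789abcdef")  # placeholder ID
```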