ufcscraper.fight_scraper module
This module defines classes for scraping fight and round data from the UFCStats website.
- Classes:
FightScraper: Inherits from BaseFightScraper (and, through it, from BaseScraper) and is responsible for scraping detailed fight statistics, such as fighter information, results, and referees. The data is stored in a CSV file named fight_data.csv. It also uses the RoundsHandler to scrape and store round-specific statistics.
RoundsHandler: Inherits from BaseFileHandler and manages the collection and storage of round-specific fight data. The data is saved in a CSV file named round_data.csv. It handles statistics like strikes, takedowns, control time, and more.
- class ufcscraper.fight_scraper.BaseFightScraper(data_folder: Path | str, n_sessions: int | None = None, delay: float | None = None)[source]
Bases:
BaseScraper, ABC
Base class for fight scrapers.
This class provides the basic functionality to scrape fight data from the UFCStats website. It should be inherited by specific fight scraper classes.
- event_scraper
alias of
EventScraper
- get_fight_urls(get_all_events: bool = False) List[str][source]
Retrieves URLs of all fights from UFCStats.
- Parameters:
get_all_events – If False, only gets URLs for fights from events not already scraped.
- Returns:
A list of URLs for fights.
- static get_fighters(fight_details: bs4.element.ResultSet, fight_soup: bs4.BeautifulSoup) Tuple[str, str][source]
Extracts fighter IDs from the fight details.
- Parameters:
fight_details – A ResultSet containing fight detail information.
fight_soup – The BeautifulSoup object containing the fight page.
- Returns:
A tuple containing the IDs of the two fighters.
- static get_title_fight(fight_type: bs4.element.ResultSet) str[source]
Determines if the fight is a title fight.
- Parameters:
fight_type – A ResultSet containing fight type information.
- Returns:
‘T’ if it’s a title fight, ‘F’ otherwise.
- class ufcscraper.fight_scraper.FightScraper(*args: Any, **kwargs: Any)[source]
Bases:
BaseFightScraper
Scrapes fight data from the UFCStats website.
This class inherits from BaseFightScraper and handles scraping detailed fight statistics, including fighters, referees, and results. It saves the scraped data into two CSV files: one for fights and one for rounds (through the companion class RoundsHandler).
- data = Empty DataFrame Columns: [fight_id, event_id, referee, fighter_1, fighter_2, winner, num_rounds, title_fight, weight_class, gender, result, result_details, finish_round, finish_time, time_format, scores_1, scores_2] Index: []
- dtypes: Dict[str, type | pd.core.arrays.integer.Int64Dtype] = {'event_id': <class 'str'>, 'fight_id': <class 'str'>, 'fighter_1': <class 'str'>, 'fighter_2': <class 'str'>, 'finish_round': Int64Dtype(), 'finish_time': <class 'str'>, 'gender': <class 'str'>, 'num_rounds': Int64Dtype(), 'referee': <class 'str'>, 'result': <class 'str'>, 'result_details': <class 'str'>, 'scores_1': Int64Dtype(), 'scores_2': Int64Dtype(), 'time_format': <class 'str'>, 'title_fight': <class 'str'>, 'weight_class': <class 'str'>, 'winner': <class 'str'>}
- filename: str = 'fight_data.csv'
- static get_gender(fight_type: bs4.element.ResultSet) str[source]
Determines the gender of the fight.
- Parameters:
fight_type – A ResultSet containing fight type information.
- Returns:
‘F’ if it’s a women’s fight, ‘M’ otherwise.
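Both flag columns use single-letter codes, and ‘F’ means something different in each: "false" in title_fight, "female" in gender. The sketch below mirrors the documented return values on plain strings rather than ResultSet objects. The helper names and the keyword checks ("title", "women") are assumptions made for illustration, not the library's actual parsing logic.

```python
def title_fight_flag(fight_type_text: str) -> str:
    """Return 'T' for a title fight, 'F' otherwise (hypothetical keyword check)."""
    return "T" if "title" in fight_type_text.lower() else "F"


def gender_flag(fight_type_text: str) -> str:
    """Return 'F' for a women's fight, 'M' otherwise (hypothetical keyword check)."""
    return "F" if "women" in fight_type_text.lower() else "M"


# Invented example text, used only to exercise the two helpers.
bout = "UFC Women's Strawweight Title Bout"
flags = (title_fight_flag(bout), gender_flag(bout))
```

When filtering fight_data.csv, compare against the column-specific meaning of each code rather than treating ‘F’ as a generic false value.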
- static get_referee(overview: bs4.element.ResultSet) str[source]
Extracts the referee’s name from the fight overview.
- Parameters:
overview – A ResultSet containing fight overview information.
- Returns:
The referee’s name, or ‘’ if not found.
- static get_result(select_result: bs4.element.ResultSet, select_result_details: bs4.element.ResultSet) Tuple[str, str][source]
Extracts the result and details of the fight.
- Parameters:
select_result – A ResultSet containing the fight result.
select_result_details – A ResultSet containing additional result details.
- Returns:
A tuple with the result type and result details.
- static get_scores(overview: bs4.element.ResultSet, select_result: bs4.element.ResultSet, select_result_details: bs4.element.ResultSet) Tuple[str, str][source]
Extracts the scores of the fight if the fight went the distance.
- Parameters:
overview – A ResultSet containing the fight overview.
select_result – A ResultSet containing the fight result.
select_result_details – A ResultSet containing additional result details.
- Returns:
A tuple with the two scores of the fight, returned as strings so they can be written to the CSV file.
- static get_winner(fighter_1: str, fighter_2: str, win_lose: bs4.element.ResultSet) str[source]
Determines the winner of the fight based on the win/lose status.
- Parameters:
fighter_1 – The ID of the first fighter.
fighter_2 – The ID of the second fighter.
win_lose – A ResultSet containing win/lose status for the fighters.
- Returns:
The ID of the winner, ‘Draw’ if it’s a draw, ‘NC’ if it’s a no contest, or ‘’ if the winner cannot be determined.
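The four documented outcomes can be read as a small decision table. The following is a minimal sketch of that table, assuming each fighter carries one status marker and that the markers are "W", "L", "D", and "NC"; the marker values and the function name pick_winner are hypothetical, and the real method works on a ResultSet rather than a list of strings.

```python
from typing import Sequence


def pick_winner(fighter_1: str, fighter_2: str, statuses: Sequence[str]) -> str:
    """Map two per-fighter status markers to the documented return values."""
    if len(statuses) != 2:
        return ""  # malformed input: winner cannot be determined
    first, second = (s.strip().upper() for s in statuses)
    if first == "W":
        return fighter_1
    if second == "W":
        return fighter_2
    if first == second == "D":
        return "Draw"
    if first == second == "NC":
        return "NC"
    return ""  # any other combination: not determined


# Invented IDs, used only to exercise the helper.
outcome = pick_winner("id_a", "id_b", ["L", "W"])
```

Because the winner column mixes fighter IDs with the sentinel strings ‘Draw’, ‘NC’, and ‘’, downstream code should check for those sentinels before treating the value as an ID.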
- scrape_fights(get_all_events: bool = False) None[source]
Scrapes fight data and saves it to CSV files.
This method scrapes fight details and round statistics and saves them to separate CSV files (fight_data.csv and round_data.csv).
- Parameters:
get_all_events – If False, only scrapes fights from events not already scraped.
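A typical call might look like the sketch below. It assumes that FightScraper forwards its keyword arguments to BaseFightScraper, whose documented signature accepts data_folder, n_sessions, and delay; the wrapper name scrape_new_fights and the argument values are placeholders, and the import is deferred so the snippet can be loaded without the package installed.

```python
from __future__ import annotations

from pathlib import Path


def scrape_new_fights(data_folder: Path | str) -> None:
    """Scrape only fights from events that are not yet in the CSV files."""
    # Deferred import: requires the ufcscraper package and network access at call time.
    from ufcscraper.fight_scraper import FightScraper

    # n_sessions and delay values are placeholders, not recommended settings.
    scraper = FightScraper(data_folder=data_folder, n_sessions=1, delay=0.5)

    # get_all_events=False is the documented default: already-scraped events are skipped.
    scraper.scrape_fights(get_all_events=False)
```

Passing get_all_events=True instead would revisit every event, which is only needed when rebuilding the data from scratch.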
- sort_fields: List[str] = ['event_id', 'fight_id']
- class ufcscraper.fight_scraper.RoundsHandler(data_folder: Path | str)[source]
Bases:
BaseFileHandler
Handles the manipulation and storage of round statistics.
This class inherits from BaseFileHandler and manages round-specific statistics, including strikes, takedowns, and control time. It formats and saves the data to a CSV file.
- data = Empty DataFrame Columns: [fight_id, fighter_id, round, knockdowns, strikes_att, strikes_succ, head_strikes_att, head_strikes_succ, body_strikes_att, body_strikes_succ, leg_strikes_att, leg_strikes_succ, distance_strikes_att, distance_strikes_succ, ground_strikes_att, ground_strikes_succ, clinch_strikes_att, clinch_strikes_succ, total_strikes_att, total_strikes_succ, takedown_att, takedown_succ, submission_att, reversals, ctrl_time] Index: []
- dtypes: Dict[str, type | pd.core.arrays.integer.Int64Dtype] = {'body_strikes_att': Int64Dtype(), 'body_strikes_succ': Int64Dtype(), 'clinch_strikes_att': Int64Dtype(), 'clinch_strikes_succ': Int64Dtype(), 'ctrl_time': <class 'str'>, 'distance_strikes_att': Int64Dtype(), 'distance_strikes_succ': Int64Dtype(), 'fight_id': <class 'str'>, 'fighter_id': <class 'str'>, 'ground_strikes_att': Int64Dtype(), 'ground_strikes_succ': Int64Dtype(), 'head_strikes_att': Int64Dtype(), 'head_strikes_succ': Int64Dtype(), 'knockdowns': Int64Dtype(), 'leg_strikes_att': Int64Dtype(), 'leg_strikes_succ': Int64Dtype(), 'reversals': Int64Dtype(), 'round': Int64Dtype(), 'strikes_att': Int64Dtype(), 'strikes_succ': Int64Dtype(), 'submission_att': Int64Dtype(), 'takedown_att': Int64Dtype(), 'takedown_succ': Int64Dtype(), 'total_strikes_att': Int64Dtype(), 'total_strikes_succ': Int64Dtype()}
- filename: str = 'round_data.csv'
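Since each row of round_data.csv describes one fighter in one round, per-round ratios can be computed directly from the paired columns. The sketch below reads a few invented sample rows with the standard csv module and derives a strike accuracy. It assumes that the _att and _succ suffixes mean attempted and successful, and that missing values appear as empty cells; neither is stated in the documentation above.

```python
import csv
import io

# Invented sample rows using a subset of the documented columns.
SAMPLE = """fight_id,fighter_id,round,strikes_att,strikes_succ
f1,a,1,20,10
f1,a,2,,
f1,b,1,8,6
"""


def strike_accuracy(csv_file):
    """Return {(fight_id, fighter_id, round): accuracy}, skipping unusable rows."""
    accuracy = {}
    for row in csv.DictReader(csv_file):
        # Skip rows where either count is missing (assumed to be an empty cell).
        if not row["strikes_att"] or not row["strikes_succ"]:
            continue
        attempted = int(row["strikes_att"])
        if attempted == 0:
            continue  # avoid division by zero
        key = (row["fight_id"], row["fighter_id"], int(row["round"]))
        accuracy[key] = int(row["strikes_succ"]) / attempted
    return accuracy


result = strike_accuracy(io.StringIO(SAMPLE))
```

The same pattern applies to the other attempted/successful column pairs, such as takedown_att and takedown_succ.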
- static get_stats(fight_stats: bs4.element.ResultSet, fighter: int, round_: int, finish_round: int) Tuple[str, ...][source]
Extracts round statistics for a specific fighter in a given fight.
- Parameters:
fight_stats – A ResultSet containing fight statistics.
fighter – The index of the fighter (0 or 1).
round_ – The round number.
finish_round – The total number of rounds.
- Returns:
A tuple of statistics for the specified fighter in the given round. Returns “” for all fields if an error occurs.
- Raises:
ValueError – If fighter is not 0 or 1.
- sort_fields: List[str] = ['fight_id', 'fighter_id', 'round']
- class ufcscraper.fight_scraper.UpcomingFightScraper(data_folder: Path | str, n_sessions: int | None = None, delay: float | None = None)[source]
Bases:
BaseFightScraper
Scrapes fight data for upcoming events from the UFCStats website.
This class inherits from BaseFightScraper and is specifically designed to scrape fight data for upcoming events. It uses the UpcomingEventScraper to get event URLs and then scrapes fight details from those events.
- data = Empty DataFrame Columns: [fight_id, event_id, fighter_1, fighter_2, title_fight, weight_class] Index: []
- dtypes: Dict[str, type] = {'event_id': <class 'str'>, 'fight_id': <class 'str'>, 'fighter_1': <class 'str'>, 'fighter_2': <class 'str'>, 'title_fight': <class 'str'>, 'weight_class': <class 'str'>}
- event_scraper
alias of
UpcomingEventScraper
- filename: str = 'upcoming_fight_data.csv'
- remove_rows_from_table(fight_ids: List[str]) None[source]
Removes rows from the fight data table based on fight IDs.
- Parameters:
fight_ids – A list of fight IDs to be removed from the data.
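The effect of this method can be pictured as a filter on the fight_id column. The following is a minimal sketch on a list of dictionaries rather than the class's DataFrame; the function name drop_fights and the sample rows are hypothetical, and unlike the documented method (which returns None) it returns a new list instead of modifying the table.

```python
from typing import Dict, Iterable, List


def drop_fights(rows: List[Dict[str, str]], fight_ids: Iterable[str]) -> List[Dict[str, str]]:
    """Return the rows whose fight_id is not in fight_ids."""
    unwanted = set(fight_ids)  # set lookup keeps the filter linear in len(rows)
    return [row for row in rows if row["fight_id"] not in unwanted]


# Invented rows using two of the documented upcoming-fight columns.
table = [
    {"fight_id": "f1", "event_id": "e1"},
    {"fight_id": "f2", "event_id": "e1"},
    {"fight_id": "f3", "event_id": "e2"},
]
remaining = drop_fights(table, ["f2", "missing"])
```

IDs that do not occur in the table are simply ignored in this sketch; whether the real method behaves the same way is not documented here.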
- scrape_fights() None[source]
Scrapes fight details for upcoming events and saves them to a CSV file.
- sort_fields: List[str] = ['event_id', 'fight_id']