question:Given a text file called "reviews.txt" containing multiple lines of user reviews, write a function named `extract_positive_reviews` that:

1. Reads the file using Python's built-in file handling functions.
2. Filters the reviews containing the word "good" (case-insensitive).
3. Writes these positive reviews to a new file named "positive_reviews.txt".
4. Prints the number of positive reviews extracted.

This exercise is designed to help you practice file handling operations, string manipulation, and basic text processing techniques in Python.

**Function Signature:**
```python
def extract_positive_reviews(file_path: str) -> int:
    pass
```

**Detailed Steps:**
1. Open "reviews.txt" in read mode and read its content line by line.
2. Check each line to see if it contains the word "good" in any case (i.e., "Good", "GOOD", "good").
3. Collect all lines that contain the word "good".
4. Write the positive reviews to a new file called "positive_reviews.txt".
5. Print the count of positive reviews.
6. Return the count of positive reviews.

**Note:** Make sure the text file "reviews.txt" is present in the current working directory. This file contains multiple lines of text representing user reviews, and you need to filter and write the reviews with the word "good" to a new file.
answer:
def extract_positive_reviews(file_path: str) -> int:
    positive_reviews = []

    # Read every review, one per line
    with open(file_path, 'r') as file:
        reviews = file.readlines()

    # Keep the lines that contain "good" in any letter case
    for review in reviews:
        if 'good' in review.lower():
            positive_reviews.append(review)

    # Write the matching lines to the output file
    with open('positive_reviews.txt', 'w') as file:
        file.writelines(positive_reviews)

    positive_review_count = len(positive_reviews)
    print(f"Number of positive reviews: {positive_review_count}")
    return positive_review_count
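The answer above treats any line containing the substring "good" as positive, so a line such as "Goodbye forever" also qualifies. If whole-word matching is wanted, a minimal sketch could use a regular expression with word boundaries. The function name `extract_positive_reviews_strict` and the `output_path` parameter are assumptions made for illustration; they are not part of the original exercise.

```python
import os
import re
import tempfile

# Matches "good" only as a standalone word, in any letter case
GOOD_WORD = re.compile(r"\bgood\b", re.IGNORECASE)


def extract_positive_reviews_strict(
    file_path: str, output_path: str = "positive_reviews.txt"
) -> int:
    """Hypothetical variant: whole-word match and a configurable output path."""
    with open(file_path, "r", encoding="utf-8") as src:
        positive = [line for line in src if GOOD_WORD.search(line)]

    with open(output_path, "w", encoding="utf-8") as dst:
        dst.writelines(positive)

    print(f"Number of positive reviews: {len(positive)}")
    return len(positive)


# Small demonstration in a temporary directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp:
    reviews_path = os.path.join(tmp, "reviews.txt")
    with open(reviews_path, "w", encoding="utf-8") as f:
        f.write("Really GOOD phone\nGoodbye forever\nGood, but pricey\n")
    demo_count = extract_positive_reviews_strict(
        reviews_path, os.path.join(tmp, "positive_reviews.txt")
    )
```

In this demonstration the "Goodbye forever" line is skipped because "good" is followed directly by another letter, so no word boundary is found.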
question:You are required to write a function named `analyze_employee_performance` which takes a pandas DataFrame (`df`) as input. The DataFrame contains employee data including columns for `department`, `performance_score`, and `years_with_company`.

The objective of this function is to create a bar plot showing the average performance score per department, with a line plot overlaid to show the average years with the company per department. Each department should be labeled on the x-axis.

You will need to make use of the following libraries:
- `matplotlib.pyplot` for plotting the data.
- `pandas` for handling the DataFrame.

The main steps are as follows:
1. Calculate the average performance score for each department.
2. Calculate the average years with the company for each department.
3. Create a bar plot for the average performance scores.
4. Overlay a line plot for the average years with the company.
5. Ensure that both plots use the same x-axis for the departments.
6. Save the plot to a file named `employee_performance_analysis.png`.

**Function Signature:**
```python
def analyze_employee_performance(df: pd.DataFrame) -> None:
    pass
```

**Example Usage:**
```python
import pandas as pd

data = {
    'department': ['HR', 'Engineering', 'Marketing', 'HR', 'Engineering', 'Marketing'],
    'performance_score': [70, 80, 60, 75, 85, 65],
    'years_with_company': [3, 5, 2, 4, 6, 3]
}
df = pd.DataFrame(data)
analyze_employee_performance(df)
```

The function should generate a bar plot showing the average performance scores for the HR, Engineering, and Marketing departments, with a line plot overlaid showing the average years with the company for each department.
answer:
import pandas as pd
import matplotlib.pyplot as plt

def analyze_employee_performance(df: pd.DataFrame) -> None:
    """
    Analyzes employee performance by creating a bar plot for average performance score
    per department and a line plot for average years with the company per department.

    Args:
    df (pd.DataFrame): DataFrame containing employee data with 'department',
    'performance_score', and 'years_with_company' columns.
    """
    # Calculate the average performance score per department
    avg_performance = df.groupby('department')['performance_score'].mean()

    # Calculate the average years with company per department
    avg_years = df.groupby('department')['years_with_company'].mean()

    # Create a bar plot for average performance scores (bars centred on each tick)
    fig, ax1 = plt.subplots()
    avg_performance.plot(kind='bar', ax=ax1, color='skyblue')
    ax1.set_ylabel('Average Performance Score')

    # Create a secondary axis for the line plot; plot at the bars' integer
    # positions so each marker sits over its department
    ax2 = ax1.twinx()
    ax2.plot(range(len(avg_years)), avg_years.values, color='green', marker='o')
    ax2.set_ylabel('Average Years with Company')

    ax1.set_xlabel('Department')
    ax1.set_title('Average Performance Score and Years with Company by Department')

    # Save the plot to a file
    plt.savefig('employee_performance_analysis.png')
    plt.close()
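Because the function returns `None` and only writes an image, the numbers behind the chart are hard to check. The sketch below is one possible variant, not the reference answer: it computes both averages in a single `groupby`, draws them with matplotlib directly, and returns the table of means for inspection. The name `plot_department_overview`, the `out_path` parameter, and the return value are assumptions introduced for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_department_overview(df: pd.DataFrame, out_path: str) -> pd.DataFrame:
    """Hypothetical variant: save the chart and return the per-department means."""
    means = df.groupby("department")[["performance_score", "years_with_company"]].mean()
    positions = list(range(len(means)))

    fig, ax1 = plt.subplots()
    # Bars for the average performance score
    ax1.bar(positions, means["performance_score"], color="skyblue")
    ax1.set_xticks(positions)
    ax1.set_xticklabels(list(means.index))
    ax1.set_xlabel("Department")
    ax1.set_ylabel("Average Performance Score")

    # Line on a secondary y-axis, drawn at the same x positions as the bars
    ax2 = ax1.twinx()
    ax2.plot(positions, means["years_with_company"], color="green", marker="o")
    ax2.set_ylabel("Average Years with Company")

    fig.savefig(out_path)
    plt.close(fig)
    return means


data = {
    "department": ["HR", "Engineering", "Marketing", "HR", "Engineering", "Marketing"],
    "performance_score": [70, 80, 60, 75, 85, 65],
    "years_with_company": [3, 5, 2, 4, 6, 3],
}
with tempfile.TemporaryDirectory() as tmp:
    means = plot_department_overview(pd.DataFrame(data), os.path.join(tmp, "overview.png"))
print(means)
```

For the example data, `groupby` sorts the departments alphabetically, so the bars appear in the order Engineering, HR, Marketing.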
question:You are given an undirected graph represented as an adjacency matrix. Each cell in the matrix contains either a `0` (indicating no edge) or a positive integer (indicating the weight of the edge between the nodes). Your task is to write a Python function named `prims_algorithm` that takes this adjacency matrix as input and returns the minimum spanning tree (MST) of the graph using Prim's algorithm.

Specifically, you need to:
1. Initialize a starting node and an empty list to store the edges of the MST.
2. Use a priority queue to keep track of the minimum edge weights.
3. Expand the MST by adding the minimum weight edges until all nodes are included.

Function Signature:
```python
def prims_algorithm(adj_matrix: List[List[int]]) -> List[Tuple[int, int, int]]:
```

Input:
- `adj_matrix`: A list of lists representing the adjacency matrix of the graph, where `adj_matrix[i][j]` is the weight of the edge between node `i` and node `j`.

Output:
- A list of tuples representing the edges in the minimum spanning tree. Each tuple should be of the form (node1, node2, weight), where node1 and node2 are the connected nodes and weight is the weight of the connecting edge.

Constraints:
- The number of nodes in the graph does not exceed 100.
- You can use the `heapq` library for the priority queue.

Example:
```python
import heapq
from typing import List, Tuple

def prims_algorithm(adj_matrix: List[List[int]]) -> List[Tuple[int, int, int]]:
    n = len(adj_matrix)
    visited = [False] * n
    mst_edges = []
    # Heap entries are (weight, from_node, to_node); start at node 0
    min_heap = [(0, 0, 0)]

    # Stop early if the heap runs dry (disconnected graph)
    while min_heap and len(mst_edges) < n - 1:
        weight, node1, node2 = heapq.heappop(min_heap)
        if not visited[node2]:
            visited[node2] = True
            if node1 != node2:
                mst_edges.append((node1, node2, weight))
            for next_node in range(n):
                if not visited[next_node] and adj_matrix[node2][next_node] != 0:
                    heapq.heappush(min_heap, (adj_matrix[node2][next_node], node2, next_node))

    return mst_edges

adj_matrix = [
    [0, 2, 0, 6, 0],
    [2, 0, 3, 8, 5],
    [0, 3, 0, 0, 7],
    [6, 8, 0, 0, 9],
    [0, 5, 7, 9, 0]
]
print(prims_algorithm(adj_matrix))
```

Expected Output:
```
[(0, 1, 2), (1, 2, 3), (1, 4, 5), (0, 3, 6)]
```
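answer:
import heapq
from typing import List, Tuple

def prims_algorithm(adj_matrix: List[List[int]]) -> List[Tuple[int, int, int]]:
    n = len(adj_matrix)
    visited = [False] * n
    mst_edges = []
    # Heap entries are (weight, from_node, to_node); start at node 0
    min_heap = [(0, 0, 0)]

    # Stop early if the heap runs dry, which happens when the graph is
    # disconnected; without this guard heappop would raise IndexError
    while min_heap and len(mst_edges) < n - 1:
        weight, node1, node2 = heapq.heappop(min_heap)
        if not visited[node2]:
            visited[node2] = True
            # The initial (0, 0, 0) entry is a placeholder, not a real edge
            if node1 != node2:
                mst_edges.append((node1, node2, weight))
            # Offer every edge from the newly added node to an unvisited node
            for next_node in range(n):
                if not visited[next_node] and adj_matrix[node2][next_node] != 0:
                    heapq.heappush(min_heap, (adj_matrix[node2][next_node], node2, next_node))

    return mst_edges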
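A quick way to sanity-check a Prim's implementation is to compare the total weight of the tree it finds against a known value, since every minimum spanning tree of a graph has the same total weight. The following is a minimal sketch of that idea; the helper name `mst_total_weight` and its convention of returning `None` for a disconnected graph are assumptions made for illustration.

```python
import heapq
from typing import List, Optional, Tuple


def mst_total_weight(adj_matrix: List[List[int]]) -> Optional[int]:
    """Hypothetical helper: MST weight via Prim's algorithm, None if disconnected."""
    n = len(adj_matrix)
    if n == 0:
        return 0

    visited = [False] * n
    heap: List[Tuple[int, int]] = [(0, 0)]  # (edge weight, node), starting at node 0
    total = 0
    reached = 0

    while heap and reached < n:
        weight, node = heapq.heappop(heap)
        if visited[node]:
            continue  # a cheaper edge already brought this node into the tree
        visited[node] = True
        total += weight
        reached += 1
        for next_node, w in enumerate(adj_matrix[node]):
            if w != 0 and not visited[next_node]:
                heapq.heappush(heap, (w, next_node))

    # If some node was never reached, no spanning tree exists
    return total if reached == n else None


connected = [
    [0, 2, 0, 6, 0],
    [2, 0, 3, 8, 5],
    [0, 3, 0, 0, 7],
    [6, 8, 0, 0, 9],
    [0, 5, 7, 9, 0],
]
disconnected = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(mst_total_weight(connected))     # 16, i.e. 2 + 3 + 5 + 6
print(mst_total_weight(disconnected))  # None
```

The value 16 matches the sum of the edge weights in the expected output of the example above.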
question:You are tasked with creating a function `summarize_temperature_statistics` that takes a list of daily temperature readings and a time frame specified by a start date and an end date. This function will analyze the provided temperature data to determine various statistics for each month within the given time frame. The function should return a DataFrame with the summary statistics including average, minimum, maximum, and standard deviation of temperatures for each month.

Use the `pandas` library to implement the function. It should adhere to the following requirements:
1. Generate a full range of dates between the start and end dates to ensure all months in the range are included.
2. Calculate the average, minimum, maximum, and standard deviation of temperatures for each month within the given time frame.
3. Summarize these statistics in a DataFrame.

Here is the function signature:
```python
import pandas as pd
from datetime import datetime

def summarize_temperature_statistics(temperatures: list, dates: list, start_date: datetime, end_date: datetime) -> pd.DataFrame:
    """
    Summarize temperature statistics by calculating average, minimum, maximum,
    and standard deviation for each month within the specified date range.

    Parameters
    ----------
    temperatures : list
        List of daily temperature readings (as floats or integers).
    dates : list
        List of dates corresponding to the temperature readings in 'YYYY-MM-DD' format (as strings).
    start_date : datetime
        The start date of the period for which to calculate statistics.
    end_date : datetime
        The end date of the period for which to calculate statistics.

    Returns
    -------
    df_stats : pd.DataFrame
        DataFrame containing the average, minimum, maximum, and standard deviation
        of temperatures for each month.
    """
```
answer:
import pandas as pd
from datetime import datetime

def summarize_temperature_statistics(temperatures: list, dates: list, start_date: datetime, end_date: datetime) -> pd.DataFrame:
    """
    Summarize temperature statistics by calculating average, minimum, maximum,
    and standard deviation for each month within the specified date range.

    Parameters
    ----------
    temperatures : list
        List of daily temperature readings (as floats or integers).
    dates : list
        List of dates corresponding to the temperature readings in 'YYYY-MM-DD' format (as strings).
    start_date : datetime
        The start date of the period for which to calculate statistics.
    end_date : datetime
        The end date of the period for which to calculate statistics.

    Returns
    -------
    df_stats : pd.DataFrame
        DataFrame containing the average, minimum, maximum, and standard deviation
        of temperatures for each month.
    """
    # Creating DataFrame from the given data
    df = pd.DataFrame({
        'Date': pd.to_datetime(dates),
        'Temperature': temperatures
    })

    # Filtering the DataFrame to include only data within the specified date range
    mask = (df['Date'] >= start_date) & (df['Date'] <= end_date)
    df = df.loc[mask].copy()

    # Creating a 'Month' column for grouping by each month
    df['Month'] = df['Date'].dt.to_period('M')

    # Grouping by 'Month' and calculating the required statistics
    df_stats = df.groupby('Month').agg(
        Average_Temperature=('Temperature', 'mean'),
        Min_Temperature=('Temperature', 'min'),
        Max_Temperature=('Temperature', 'max'),
        StdDev_Temperature=('Temperature', 'std')
    )

    # Reindex over every month between start_date and end_date (requirement 1),
    # so months without readings appear as rows of NaN instead of being dropped
    all_months = pd.period_range(start=start_date, end=end_date, freq='M')
    df_stats = df_stats.reindex(all_months).rename_axis('Month').reset_index()

    # Convert 'Month' back to string for readability
    df_stats['Month'] = df_stats['Month'].astype(str)

    return df_stats
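To see in isolation how a full month range can be enforced, here is a small sketch that computes only the monthly mean and then reindexes it over every month between the two dates. The helper name `monthly_mean_with_gaps` is hypothetical, and the choice to represent months without readings as `NaN` is an assumption; the exercise does not say how such months should be reported.

```python
from datetime import datetime

import pandas as pd


def monthly_mean_with_gaps(temperatures: list, dates: list,
                           start_date: datetime, end_date: datetime) -> pd.Series:
    """Hypothetical helper: monthly mean with one entry per month in the range."""
    readings = pd.Series(temperatures, index=pd.to_datetime(dates), dtype="float64")

    # Keep only readings inside the requested window
    readings = readings[(readings.index >= start_date) & (readings.index <= end_date)]

    # Mean per calendar month that actually has data
    monthly = readings.groupby(readings.index.to_period("M")).mean()

    # Every month in the window, including those with no readings
    all_months = pd.period_range(start=start_date, end=end_date, freq="M")
    return monthly.reindex(all_months)


result = monthly_mean_with_gaps(
    temperatures=[10, 20, 30],
    dates=["2023-01-05", "2023-01-15", "2023-03-10"],
    start_date=datetime(2023, 1, 1),
    end_date=datetime(2023, 3, 31),
)
print(result)
```

In this example February has no readings, so it appears in the result with a `NaN` mean rather than being omitted.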