question:**Robot Cleaner Path**

You are tasked with programming a robot vacuum cleaner that navigates and cleans a rectangular grid. The grid is represented as a 2D array in which each cell is an open space, an obstacle, or the cleaner's starting position. The cleaner must visit every open space, avoid obstacles, and return to its starting position.

The robot follows these rules:

1. It starts from a given position and moves up, down, left, or right.
2. It cannot move diagonally.
3. It must visit each open space at least once.
4. It must avoid the obstacles.
5. After visiting all open spaces, it must return to its starting position.

# Requirements:

1. **Input and Output Formats**:
   * A 2D list representing the grid with:
     - `'S'` for the starting position.
     - `'.'` for open spaces.
     - `'#'` for obstacles.
   * The function should output a boolean indicating whether the robot can visit all open spaces and return to the start (`True` or `False`).
2. **Constraints**:
   * The grid dimension will not exceed 30x30.
   * There will be only one starting position.
   * The robot cannot move out of the grid bounds.

# Example

```python
grid = [
    ['S', '.', '.', '#'],
    ['.', '#', '.', '.'],
    ['.', '.', '.', '.']
]
print(robot_cleaner_path(grid))  # Output: True

grid = [
    ['S', '#', '.'],
    ['#', '.', '#'],
    ['#', '.', '.']
]
print(robot_cleaner_path(grid))  # Output: False

grid = [
    ['S', '#', '.'],
    ['#', '#', '#'],
    ['#', '.', '.']
]
print(robot_cleaner_path(grid))  # Output: False
```

# Function Signature

```python
def robot_cleaner_path(grid: List[List[str]]) -> bool:
    pass
```

In this problem, you must determine whether the robot can reach every open space at least once and still return to its starting point while avoiding obstacles.
answer:from collections import deque
from typing import List


def robot_cleaner_path(grid: List[List[str]]) -> bool:
    """Return True if every open cell can be reached from 'S'.

    Cells may be visited more than once ("at least once"), so the robot can
    always retrace its steps back to the start. The task therefore reduces to
    checking that every non-obstacle cell is reachable from the start.
    """
    rows, cols = len(grid), len(grid[0])
    start = next(
        (r, c) for r in range(rows) for c in range(cols) if grid[r][c] == 'S'
    )
    # Every non-obstacle cell must be cleaned, including the start itself.
    open_spaces = sum(cell != '#' for row in grid for cell in row)

    # Breadth-first search over the four orthogonal neighbours.
    visited = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nx, ny = x + dx, y + dy
            if (0 <= nx < rows and 0 <= ny < cols
                    and grid[nx][ny] != '#' and (nx, ny) not in visited):
                visited.add((nx, ny))
                queue.append((nx, ny))

    return len(visited) == open_spaces
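
Because rule 3 allows a cell to be visited more than once, a grid can be cleanable even when no path visits each cell exactly once. The sketch below illustrates this by building an explicit closed walk with a depth-first traversal that steps back out of dead ends. The function name `cleaning_route` and its return convention (a list of coordinates, or `None`) are assumptions made for illustration and are not part of the original task.

```python
from typing import List, Optional, Tuple


def cleaning_route(grid: List[List[str]]) -> Optional[List[Tuple[int, int]]]:
    """Return a walk that starts and ends at 'S' and covers every
    non-obstacle cell, or None if some cell cannot be reached.
    (Hypothetical helper, not part of the original task.)"""
    rows, cols = len(grid), len(grid[0])
    start = next(
        (r, c) for r in range(rows) for c in range(cols) if grid[r][c] == 'S'
    )
    open_total = sum(cell != '#' for row in grid for cell in row)

    visited = {start}
    route = [start]
    stack = [start]  # cells on the current path from the start to the robot
    while stack:
        r, c = stack[-1]
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in visited):
                visited.add((nr, nc))
                stack.append((nr, nc))
                route.append((nr, nc))
                break
        else:
            # Dead end: step back to the previous cell on the path.
            stack.pop()
            if stack:
                route.append(stack[-1])
    return route if len(visited) == open_total else None


# A T-shaped grid: the start is a junction, so it must be revisited.
t_grid = [
    ['.', 'S', '.'],
    ['#', '.', '#'],
]
print(cleaning_route(t_grid))
# [(0, 1), (1, 1), (0, 1), (0, 0), (0, 1), (0, 2), (0, 1)]
```

The walk ends where it began because every forward step is eventually undone by a backward step, which is why reachability alone decides the answer.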
question:# Coding Assessment Question

You are a backend engineer tasked with developing new functionalities for a popular customer review platform. Your goal is to demonstrate your ability to handle data processing and aggregation by implementing a function that efficiently calculates aggregated review statistics for various products.

Write a function `calculate_review_stats(reviews: list[dict]) -> dict` that performs the following operations:

1. **Average Rating**: Calculate the average rating for each product.
2. **Total Reviews**: Count the total number of reviews for each product.
3. **Max Rating**: Identify the highest rating received for each product.
4. **Min Rating**: Identify the lowest rating received for each product.
5. **Ratings Summary**: Provide a summary of rating counts (1 to 5 stars) for each product.

Your function should handle the following input:

* **reviews**: A list of dictionaries, where each dictionary represents a single review and contains the product ID (`product_id`), the rating given (`rating` between 1 and 5), and a review text (`review_text`).

Function Signature

```python
def calculate_review_stats(reviews: list[dict]) -> dict:
    pass
```

Input Conditions and Prechecks

* The `reviews` list should be non-empty.
* Each review dictionary must contain the keys `product_id`, `rating`, and `review_text`.
* The `rating` value should be an integer between 1 and 5.

Expected Output

The function should return a dictionary where each key is a product ID, and the associated value is another dictionary containing:

* `average_rating`: Float value representing the average rating.
* `total_reviews`: Integer representing the total number of reviews.
* `max_rating`: Integer representing the highest rating received.
* `min_rating`: Integer representing the lowest rating received.
* `ratings_summary`: Dictionary where keys are rating scores (1 to 5) and values are the counts of each rating.

Example Usage

```python
reviews = [
    {"product_id": "A1", "rating": 4, "review_text": "Great product!"},
    {"product_id": "A1", "rating": 5, "review_text": "Excellent!"},
    {"product_id": "A2", "rating": 3, "review_text": "Average."},
    {"product_id": "A1", "rating": 2, "review_text": "Not as expected."},
    {"product_id": "A2", "rating": 4, "review_text": "Good value for money."},
]
print(calculate_review_stats(reviews))
# Output:
# {
#     "A1": {
#         "average_rating": 3.67,
#         "total_reviews": 3,
#         "max_rating": 5,
#         "min_rating": 2,
#         "ratings_summary": {1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
#     },
#     "A2": {
#         "average_rating": 3.5,
#         "total_reviews": 2,
#         "max_rating": 4,
#         "min_rating": 3,
#         "ratings_summary": {1: 0, 2: 0, 3: 1, 4: 1, 5: 0}
#     }
# }
```

Ensure your implementation is optimized for large datasets and consider edge cases such as missing ratings or duplicate reviews.
answer:from collections import defaultdict


def calculate_review_stats(reviews):
    """
    Calculate review statistics for each product from a list of reviews.

    Args:
        reviews (list[dict]): A list of dictionaries where each dictionary has keys:
            'product_id', 'rating', and 'review_text'.

    Returns:
        dict: A dictionary where each key is a product ID and the value is another
            dictionary containing average rating, total reviews, max rating,
            min rating, and ratings summary.
    """
    # Per-product accumulators. max/min start outside the valid 1..5 range so
    # the first review always replaces them.
    products = defaultdict(lambda: {
        "total_rating": 0,
        "total_reviews": 0,
        "max_rating": 0,
        "min_rating": 6,
        "ratings_summary": {1: 0, 2: 0, 3: 0, 4: 0, 5: 0}
    })

    # Process each review in a single pass
    for review in reviews:
        stats = products[review['product_id']]
        rating = review['rating']

        # Update stats
        stats["total_rating"] += rating
        stats["total_reviews"] += 1
        stats["max_rating"] = max(stats["max_rating"], rating)
        stats["min_rating"] = min(stats["min_rating"], rating)
        stats["ratings_summary"][rating] += 1

    # Calculate average and final statistics
    result = {}
    for product_id, stats in products.items():
        result[product_id] = {
            "average_rating": round(stats["total_rating"] / stats["total_reviews"], 2),
            "total_reviews": stats["total_reviews"],
            "max_rating": stats["max_rating"],
            "min_rating": stats["min_rating"],
            "ratings_summary": stats["ratings_summary"]
        }

    return result
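
The question lists three prechecks (a non-empty list, the three required keys, and an integer rating from 1 to 5). One way they might be enforced before aggregation is sketched below. The helper name `validate_reviews`, the constant `REQUIRED_KEYS`, and the choice to raise `ValueError` are assumptions; the question does not say how invalid input should be reported.

```python
REQUIRED_KEYS = {"product_id", "rating", "review_text"}


def validate_reviews(reviews: list) -> None:
    """Raise ValueError if the input violates any of the stated prechecks.
    (Hypothetical helper; the error type is an assumption.)"""
    if not isinstance(reviews, list) or not reviews:
        raise ValueError("reviews must be a non-empty list")

    for index, review in enumerate(reviews):
        if not isinstance(review, dict):
            raise ValueError(f"review {index} is not a dictionary")

        missing = REQUIRED_KEYS - review.keys()
        if missing:
            raise ValueError(f"review {index} is missing keys: {sorted(missing)}")

        rating = review["rating"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(rating, bool) or not isinstance(rating, int) or not 1 <= rating <= 5:
            raise ValueError(f"review {index} has an invalid rating: {rating!r}")
```

Calling `validate_reviews(reviews)` before the aggregation loop keeps the single-pass statistics code free of defensive branches.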
question:# Task

Implement a function `sparse_string_matching` to optimize searching for a substring in a large sparse text using a binary search technique.

# Scenarios

Your function will be essential for processing and searching through large files of sparse text quickly, which is crucial for text analysis and data retrieval tasks.

# Input

- A list of strings `sparse_text` where empty strings represent gaps in the sparse text document.
- A string `search_term` representing the substring you need to find in the `sparse_text`.

# Output

- An integer representing the index of the first occurrence of `search_term` in `sparse_text`. Return `-1` if `search_term` is not found.

# Constraints

1. The `sparse_text` list contains up to `10^6` elements.
2. The length of `search_term` is between 1 and 100 characters.
3. The search should ignore the empty strings.

# Requirements

- Utilize an enhanced binary search to navigate through the sparse structure efficiently.
- Ensure the solution has an optimal time complexity for large inputs.

# Hints

- Skip over the empty strings while performing the binary search.
- Consider adapting the find-first-non-empty approach to locate valid text strings during the search.

# Example

```python
def sparse_string_matching(sparse_text: list[str], search_term: str) -> int:
    # Your implementation here
    pass

# Example Test Cases:
sparse_text = ["", "", "", "this", "", "is", "", "", "a", "simple", "", "example"]
print(sparse_string_matching(sparse_text, "simple"))  # Expected output: 9 (index in sparse_text)

sparse_text = ["", "", "", "linear", "", "", "search", ""]
print(sparse_string_matching(sparse_text, "search"))  # Expected output: 6 (index in sparse_text)

sparse_text = ["", "", "", "", "", "", "", ""]
print(sparse_string_matching(sparse_text, "text"))  # Expected output: -1 (search term not found)

sparse_text = ["", "", "find", "", "me"]
print(sparse_string_matching(sparse_text, "find"))  # Expected output: 2 (index in sparse_text)

sparse_text = ["search", "", "", "in", "", "", "sparse", "text", ""]
print(sparse_string_matching(sparse_text, "text"))  # Expected output: 7 (index in sparse_text)
```
answer:def sparse_string_matching(sparse_text, search_term):
    """
    Uses a modified binary search to find search_term in sparse_text.
    Skips over empty strings in sparse_text.

    Note: binary search is only valid when the non-empty strings appear in
    ascending lexicographic order. Entries are compared for exact equality.
    """
    low, high = 0, len(sparse_text) - 1

    while low <= high:
        mid = (low + high) // 2

        # Find the nearest non-empty string on each side of mid
        left, right = mid, mid
        while left >= low and sparse_text[left] == "":
            left -= 1
        while right <= high and sparse_text[right] == "":
            right += 1

        # Determine the closer non-empty midpoint
        if left < low and right > high:
            return -1  # no non-empty strings in range
        mid = right if left < low or (right <= high and mid - left > right - mid) else left

        # Perform the comparison to adjust binary search range
        if sparse_text[mid] == search_term:
            return mid
        elif sparse_text[mid] < search_term:
            low = mid + 1
        else:
            high = mid - 1

    return -1
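
Binary search assumes that the non-empty strings are sorted, yet the example lists in the question (for instance "this", "is", "a", "simple", "example") are not in lexicographic order. A minimal sketch of a sortedness check and a linear-scan fallback is given below. The names `is_sorted_ignoring_gaps` and `sparse_linear_search` are hypothetical, and the sketch matches whole entries exactly, as the answer above does.

```python
def is_sorted_ignoring_gaps(sparse_text: list[str]) -> bool:
    """Return True if the non-empty strings are in non-decreasing order."""
    previous = None
    for entry in sparse_text:
        if entry == "":
            continue  # gaps carry no ordering information
        if previous is not None and entry < previous:
            return False
        previous = entry
    return True


def sparse_linear_search(sparse_text: list[str], search_term: str) -> int:
    """Return the index of the first entry equal to search_term, or -1.
    Runs in O(n) and makes no assumption about ordering."""
    for index, entry in enumerate(sparse_text):
        if entry != "" and entry == search_term:
            return index
    return -1


sparse_text = ["", "", "", "this", "", "is", "", "", "a", "simple", "", "example"]
print(is_sorted_ignoring_gaps(sparse_text))         # False
print(sparse_linear_search(sparse_text, "simple"))  # 9
```

For a list of up to `10^6` entries, a single linear pass is still inexpensive, so it can serve as a reference result when testing the binary search on sorted inputs.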
question:# Task

Create a k-Means Clustering model to identify clusters in a given dataset. Implement functionality to initialize centroids, assign clusters, update centroids, and evaluate the model with the Within-Cluster Sum of Squared Errors (WCSS).

# Problem Statement

You need to implement the k-Means algorithm from scratch in a class `KMeansClustering`. The k-means algorithm can be summarized in the following steps:

1. Initialize k centroids randomly.
2. Assign each data point to the nearest centroid.
3. Update the centroids as the mean of the assigned points.
4. Repeat steps 2 and 3 until convergence.

# Requirements:

1. Implement the `fit` method to perform the iterative clustering process.
2. Implement the `predict` method to assign clusters to new data points based on the fitted model.
3. Implement a `score` method to compute the WCSS.
4. Provide a method to plot the clustered data points and centroids.

# Class Signature

```python
class KMeansClustering:
    def __init__(self, n_clusters: int, max_iter: int = 300, tol: float = 1e-4) -> None:
        # Your code here

    def fit(self, x_train: np.ndarray) -> None:
        # Your code here

    def predict(self, x_test: np.ndarray) -> np.ndarray:
        # Your code here

    def score(self, x_test: np.ndarray) -> float:
        # Your code here

    def plot(self, x_data: np.ndarray) -> None:
        # Your code here
```

# Input and Output

- `KMeansClustering(n_clusters: int, max_iter: int = 300, tol: float = 1e-4)`:
  - **n_clusters**: number of clusters (k).
  - **max_iter**: maximum number of iterations.
  - **tol**: tolerance for convergence.
  - Raises ValueError if n_clusters <= 0, max_iter <= 0, or tol <= 0.
- `fit(x_train: np.ndarray)`:
  - **x_train**: data points for clustering.
  - Initializes centroids, assigns points to clusters, and updates centroids until convergence.
  - Raises ValueError if x_train is not a two-dimensional numpy array.
- `predict(x_test: np.ndarray) -> np.ndarray`:
  - **x_test**: new unlabelled data points.
  - Returns an array of cluster labels for each data point in x_test.
  - Raises ValueError if x_test is not a two-dimensional numpy array.
- `score(x_test: np.ndarray) -> float`:
  - **x_test**: data points to evaluate the clustering.
  - Returns the WCSS value.
  - Raises ValueError if x_test is not a two-dimensional numpy array.
- `plot(x_data: np.ndarray) -> None`:
  - **x_data**: data points for plotting.
  - Plots the data points colored by their cluster along with the centroids.
  - Raises ValueError if x_data is not a two-dimensional numpy array.

# Constraints

- The number of clusters should be a positive integer.
- Max iterations should be a positive integer.
- Tolerance should be a positive float.
- Input data for x should be a two-dimensional numpy array.

# Performance Requirements

- The `fit` method should handle typical dataset sizes efficiently (< 10⁶ data points).
- The algorithm should provide reasonable results in terms of clustering quality and computational efficiency.

Your job is to implement the class and methods as described in the inputs and outputs for an effective assessment.
answer:import numpy as np
import matplotlib.pyplot as plt


class KMeansClustering:
    def __init__(self, n_clusters: int, max_iter: int = 300, tol: float = 1e-4) -> None:
        if n_clusters <= 0 or max_iter <= 0 or tol <= 0:
            raise ValueError("n_clusters, max_iter and tol should be positive values.")
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.tol = tol
        self.centroids = None

    def fit(self, x_train: np.ndarray) -> None:
        if not isinstance(x_train, np.ndarray) or len(x_train.shape) != 2:
            raise ValueError("x_train should be a two-dimensional numpy array.")

        # Initialise centroids from k distinct, randomly chosen data points.
        n_samples, _ = x_train.shape
        random_indices = np.random.choice(n_samples, self.n_clusters, replace=False)
        self.centroids = x_train[random_indices].astype(float)

        for _ in range(self.max_iter):
            # Assign each point to its nearest centroid
            distances = np.linalg.norm(x_train[:, np.newaxis] - self.centroids, axis=2)
            labels = np.argmin(distances, axis=1)

            # Calculate new centroids; an empty cluster keeps its previous
            # centroid so the mean of zero points never produces NaN.
            new_centroids = np.array([
                x_train[labels == j].mean(axis=0) if np.any(labels == j) else self.centroids[j]
                for j in range(self.n_clusters)
            ])

            # Store the update, then stop once no centroid moved more than tol
            shift = np.linalg.norm(self.centroids - new_centroids, axis=1)
            self.centroids = new_centroids
            if np.all(shift < self.tol):
                break

    def predict(self, x_test: np.ndarray) -> np.ndarray:
        if not isinstance(x_test, np.ndarray) or len(x_test.shape) != 2:
            raise ValueError("x_test should be a two-dimensional numpy array.")
        if self.centroids is None:
            raise ValueError("The model must be fitted before calling predict.")

        distances = np.linalg.norm(x_test[:, np.newaxis] - self.centroids, axis=2)
        return np.argmin(distances, axis=1)

    def score(self, x_test: np.ndarray) -> float:
        if not isinstance(x_test, np.ndarray) or len(x_test.shape) != 2:
            raise ValueError("x_test should be a two-dimensional numpy array.")

        # WCSS: sum of squared distances from each point to its assigned centroid
        labels = self.predict(x_test)
        distances = np.linalg.norm(x_test - self.centroids[labels], axis=1)
        return float(np.sum(distances ** 2))

    def plot(self, x_data: np.ndarray) -> None:
        if not isinstance(x_data, np.ndarray) or len(x_data.shape) != 2:
            raise ValueError("x_data should be a two-dimensional numpy array.")

        # Only the first two features are plotted.
        labels = self.predict(x_data)
        for i in range(self.n_clusters):
            cluster_points = x_data[labels == i]
            plt.scatter(cluster_points[:, 0], cluster_points[:, 1], label=f"Cluster {i+1}")
        plt.scatter(self.centroids[:, 0], self.centroids[:, 1], color='black', marker='x', s=100, label='Centroids')
        plt.legend()
        plt.show()
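
The assignment step and the WCSS score both rest on the same squared-distance table between points and centroids. The sketch below works through that arithmetic on four points with fixed centroids. The function name `assign_and_score` and the sample data are assumptions chosen so the result can be checked by hand.

```python
import numpy as np


def assign_and_score(points: np.ndarray, centroids: np.ndarray):
    """Return (labels, wcss) for fixed centroids.
    (Hypothetical helper used only to illustrate the arithmetic.)"""
    # Broadcasting builds an (n_points, n_centroids, n_features) array of differences.
    diffs = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)   # squared Euclidean distances
    labels = np.argmin(sq_dists, axis=1)    # index of the nearest centroid
    # WCSS adds up each point's squared distance to its own centroid.
    wcss = float(np.sum(sq_dists[np.arange(len(points)), labels]))
    return labels, wcss


points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])
labels, wcss = assign_and_score(points, centroids)
print(labels.tolist())  # [0, 0, 1, 1]
print(wcss)             # 4.0
```

Each point lies exactly one unit from its centroid, so the four squared distances sum to 4.0. Note that the intermediate array grows with points × centroids × features, so for datasets near 10⁶ points it may be worth processing the data in batches.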