Taxi Fare Prediction Model (ONNX)
This repository contains an ONNX (Open Neural Network Exchange) model for predicting taxi fares in New York City. The model is a RandomForestRegressor trained using scikit-learn on historical taxi trip data.
Model Description
This model predicts the fare_amount of a taxi trip from six features: vendor_id, rate_code, passenger_count, trip_distance, payment_type, and trip_time_in_secs (converted to minutes as trip_time_in_mins).
The rfr_model_pipeline.onnx file is the exported Random Forest Regressor pipeline, including preprocessing steps (KNNImputer, RobustScaler for numerical features, and OneHotEncoder for categorical features).
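As a rough sketch of how the exported pipeline might be queried with ONNX Runtime: the code below assumes the model exposes one input per feature column, each shaped [N, 1] (a common layout when a scikit-learn pipeline with per-column preprocessing is converted to ONNX), and that the seconds-to-minutes conversion happens before the pipeline. Neither assumption is confirmed by this repository, so check session.get_inputs() against your copy of the model first. The helper names to_model_record and predict_fare are illustrative, not part of the repository.

```python
def to_model_record(trip):
    """Return a copy of `trip` with the duration expressed in minutes.

    Assumption: the pipeline expects trip_time_in_mins, not trip_time_in_secs.
    """
    record = dict(trip)
    if "trip_time_in_secs" in record:
        record["trip_time_in_mins"] = record.pop("trip_time_in_secs") / 60.0
    return record


def predict_fare(model_path, trip):
    """Score one trip with the ONNX pipeline and return the predicted fare."""
    # Imported lazily so to_model_record stays usable without the runtime installed.
    import numpy as np
    import onnxruntime as ort

    # Map ONNX tensor type strings to NumPy dtypes.
    dtype_for = {
        "tensor(float)": np.float32,
        "tensor(double)": np.float64,
        "tensor(int64)": np.int64,
        "tensor(string)": object,
    }
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    record = to_model_record(trip)

    # Assumption: one model input per feature column, each of shape [N, 1].
    feed = {
        inp.name: np.array([[record[inp.name]]], dtype=dtype_for[inp.type])
        for inp in session.get_inputs()
    }
    outputs = session.run(None, feed)
    return float(np.asarray(outputs[0]).ravel()[0])


# Example call (hypothetical values; needs the model file, numpy and onnxruntime):
# predict_fare("rfr_model_pipeline.onnx", {
#     "vendor_id": "CMT", "rate_code": 1, "passenger_count": 1,
#     "trip_distance": 2.5, "payment_type": "CRD", "trip_time_in_secs": 600,
# })
```

If the model instead takes a single combined input tensor, the lookup by input name will raise a KeyError, which is a useful signal that the feed needs to be built differently.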
Training Data
The model was trained on the small-taxi-fare-train.csv dataset, which contains details of taxi trips. Key features include:
- vendor_id: The ID of the taxi vendor.
- rate_code: The rate type of the taxi trip.
- passenger_count: The number of passengers.
- trip_time_in_secs: The duration of the trip (transformed to trip_time_in_mins).
- trip_distance: The distance of the trip.
- payment_type: The payment method (e.g., CSH, CRD).
- fare_amount: The target label (the taxi fare to be predicted).
Outliers and unusual entries (e.g., trip_distance of 0, passenger_count of 0) were handled during preprocessing.
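The exact handling is not documented here. One simple option, shown purely as an assumption and not as the method actually used, is to drop rows whose trip_distance or passenger_count is zero before training. The function names below are hypothetical.

```python
import csv


def is_plausible_trip(row):
    """True when the trip has a positive distance and at least one passenger."""
    return float(row["trip_distance"]) > 0 and float(row["passenger_count"]) > 0


def load_clean_trips(csv_path):
    """Read a trips CSV and keep only the rows that pass is_plausible_trip."""
    with open(csv_path, newline="") as handle:
        return [row for row in csv.DictReader(handle) if is_plausible_trip(row)]


# Example call, assuming the file sits in the working directory:
# trips = load_clean_trips("small-taxi-fare-train.csv")
```

Dropping rows is only one choice; clipping or imputing the affected values are alternatives, and which one was applied to this model is not stated.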
Evaluation
The model's performance was evaluated using the following metrics on a test set:
- Mean Absolute Error (MAE): 0.623
- Mean Squared Error (MSE): 10.518
- R^2 Score: 0.887
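For reference, the three metrics above can be reproduced from a list of true fares and a list of predictions. The sketch below is dependency-free and follows the standard definitions (the same ones behind scikit-learn's mean_absolute_error, mean_squared_error, and r2_score); the function name regression_metrics is illustrative.

```python
def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE and R^2 for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error

    # R^2 = 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"mae": mae, "mse": mse, "r2": r2}
```

Because MSE squares each error, its square root (about 3.24 for the reported 10.518) is on the same scale as the MAE. The gap between that value and the MAE of 0.623 suggests that a small number of trips account for most of the squared error.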
The Random Forest Regressor outperformed the other models tested (SGDRegressor, Lasso, and Ridge).
Limitations and Bias
- The model is trained on a specific subset of New York taxi data and may not generalize well to other regions or significantly different time periods.
- It uses only the trip-level features listed above and does not account for external factors, such as real-time traffic or unforeseen delays, that could affect the actual fare.
- Potential biases from the original dataset (e.g., specific vendor biases, passenger count distribution) might be reflected in the predictions.
License
This model is provided under the MIT License (or specify your chosen license).