Taxi Fare Prediction Model (ONNX)

This repository contains an ONNX (Open Neural Network Exchange) model for predicting taxi fares in New York City. The model is a RandomForestRegressor trained using scikit-learn on historical taxi trip data.

Model Description

This model predicts the fare_amount of a taxi trip from six features: vendor_id, rate_code, passenger_count, trip_distance, payment_type, and trip_time_in_secs (converted to minutes as trip_time_in_mins during feature engineering).

The rfr_model_pipeline.onnx file is the exported Random Forest Regressor pipeline, including preprocessing steps (KNNImputer, RobustScaler for numerical features, and OneHotEncoder for categorical features).
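Because the preprocessing steps are part of the exported graph, inference should only need the raw feature values. The following is a minimal sketch using onnxruntime, not a confirmed usage recipe. It assumes the graph exposes one input per feature column, each of shape [batch, 1], which is a common result of converting a scikit-learn pipeline but has not been verified for this file. The helper names build_feed and predict_fare are hypothetical. The input names and types are read from the session rather than hard-coded, so check session.get_inputs() against your data first.

```python
import numpy as np

# Map ONNX tensor type strings (as reported by onnxruntime) to NumPy dtypes.
ONNX_TO_NUMPY = {
    "tensor(float)": np.float32,
    "tensor(double)": np.float64,
    "tensor(int64)": np.int64,
    "tensor(string)": object,
}


def build_feed(row, input_specs):
    """Build an onnxruntime feed dict for a single trip.

    row         -- dict mapping feature name to raw value
    input_specs -- list of (input_name, onnx_type) pairs, e.g. taken from
                   session.get_inputs(); assumed to be one input per column
    """
    feed = {}
    for name, onnx_type in input_specs:
        dtype = ONNX_TO_NUMPY[onnx_type]
        value = str(row[name]) if dtype is object else row[name]
        # Each input is assumed to be a [batch, 1] tensor; here batch = 1.
        feed[name] = np.array([[value]], dtype=dtype)
    return feed


def predict_fare(model_path, row):
    """Run the ONNX pipeline on one trip and return the predicted fare."""
    import onnxruntime as ort  # imported lazily so build_feed works without it

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    specs = [(inp.name, inp.type) for inp in session.get_inputs()]
    outputs = session.run(None, build_feed(row, specs))
    return float(np.asarray(outputs[0]).ravel()[0])


# Hypothetical call; the keys must match the input names the model reports:
# fare = predict_fare("rfr_model_pipeline.onnx", {...})
```

If the model instead reports a single combined input tensor, the feed has to be built as one array in the column order used at export time, which this sketch does not cover.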

Training Data

The model was trained on the small-taxi-fare-train.csv dataset, which contains records of individual taxi trips. Key columns include:

  • vendor_id: The ID of the taxi vendor.
  • rate_code: The rate type of the taxi trip.
  • passenger_count: The number of passengers.
  • trip_time_in_secs: The duration of the trip (transformed to trip_time_in_mins).
  • trip_distance: The distance of the trip.
  • payment_type: The payment method (e.g., CSH, CRD).
  • fare_amount: The target label (the taxi fare to be predicted).

Outliers and unusual entries (e.g., trip_distance of 0, passenger_count of 0) were handled during preprocessing.
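The exact handling is not documented here. As one possible illustration, the sketch below drops rows with a non-positive trip_distance or passenger_count and derives trip_time_in_mins from trip_time_in_secs. Dropping rows is an assumption; marking such values as missing and letting the imputer fill them would be another reasonable choice. The function names clean_rows and load_clean are hypothetical.

```python
import csv
import io


def clean_rows(rows):
    """Yield cleaned trip records from an iterable of dict-like CSV rows.

    Assumption: rows with zero (or negative) distance or passenger count
    are simply discarded. The original preprocessing may have differed.
    """
    for row in rows:
        distance = float(row["trip_distance"])
        passengers = int(row["passenger_count"])
        if distance <= 0 or passengers <= 0:
            continue
        cleaned = dict(row)
        # Replace the duration in seconds with the engineered minutes feature.
        seconds = float(cleaned.pop("trip_time_in_secs"))
        cleaned["trip_time_in_mins"] = seconds / 60.0
        yield cleaned


def load_clean(csv_text):
    """Parse CSV text with a header row and return the cleaned records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(clean_rows(reader))
```

Note that csv.DictReader yields every field as a string, so only the columns touched above are converted; the remaining columns would still need casting before training.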

Evaluation

The model's performance was evaluated using the following metrics on a test set:

  • Mean Absolute Error (MAE): 0.623
  • Mean Squared Error (MSE): 10.518
  • R^2 Score: 0.887
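For reference, the three metrics can be computed as in the sketch below, written in plain Python so the definitions are explicit. The function name regression_metrics is hypothetical, and this is not the evaluation script used for the numbers above. It also returns the root mean squared error, which is in the same units as the fare; the reported MSE of 10.518 corresponds to an RMSE of roughly 3.24.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error

    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    r2 = 1.0 - (mse * n) / ss_tot

    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": r2}
```

Because MSE squares each error, a small number of badly mispredicted trips can raise it far more than they raise MAE, so the two metrics are worth reading together.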

The Random Forest Regressor outperformed the other models tested (SGDRegressor, Lasso, and Ridge).
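A comparison of this kind can be sketched with scikit-learn as follows, reusing the preprocessing steps named in the model description. Several details are assumptions rather than the original training setup: the split of columns into numerical and categorical (in particular, treating rate_code as categorical), the default hyperparameters, and the use of MAE as the ranking metric. The names build_pipeline and compare_models are hypothetical.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import KNNImputer
from sklearn.linear_model import Lasso, Ridge, SGDRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Assumed column roles; the original split is not documented.
NUMERIC = ["passenger_count", "trip_distance", "trip_time_in_mins"]
CATEGORICAL = ["vendor_id", "rate_code", "payment_type"]

# Factories so that every comparison run starts from a fresh estimator.
CANDIDATES = {
    "RandomForestRegressor": lambda: RandomForestRegressor(random_state=0),
    "SGDRegressor": lambda: SGDRegressor(random_state=0),
    "Lasso": lambda: Lasso(),
    "Ridge": lambda: Ridge(),
}


def build_pipeline(regressor):
    """Wrap a regressor in the preprocessing described for the ONNX model."""
    numeric = Pipeline([("impute", KNNImputer()), ("scale", RobustScaler())])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([("preprocess", preprocess), ("model", regressor)])


def compare_models(X_train, y_train, X_test, y_test):
    """Fit each candidate on the training split and return its test MAE."""
    scores = {}
    for name, make_regressor in CANDIDATES.items():
        pipeline = build_pipeline(make_regressor()).fit(X_train, y_train)
        scores[name] = mean_absolute_error(y_test, pipeline.predict(X_test))
    return scores
```

X_train and X_test are expected to be pandas DataFrames containing the columns listed in NUMERIC and CATEGORICAL. Keeping the preprocessing inside the pipeline means each candidate is fitted and evaluated on identically transformed data.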

Limitations and Bias

  • The model is trained on a specific subset of New York taxi data and may not generalize well to other regions or significantly different time periods.
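  • It relies only on the recorded trip features listed above and excludes factors such as real-time traffic or unforeseen delays that could affect the actual fare. Because trip_distance and trip duration are only known once a trip has ended, the model is better suited to estimating fares for completed or fully specified trips than to quoting a fare at pickup.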
  • Potential biases from the original dataset (e.g., specific vendor biases, passenger count distribution) might be reflected in the predictions.

License

This model is provided under the MIT License.
