The Proposed System
In this section, the proposed system for detecting fruits and vegetables is described in detail. The system consists of three steps: deciding on and training a model, compressing the model, and deploying the model.
As the first step of the system, the POS device camera captures images containing fruits or vegetables at 900×1600 resolution. These images are then resized to 320×320 resolution before being given to the proposed object detection model. To detect the objects in an image, YOLOv5s, implemented on the PyTorch framework, is selected and used.
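A minimal sketch of this capture–resize–detect pipeline (before quantization) is given below. It assumes the Ultralytics YOLOv5 PyTorch Hub interface and OpenCV for image handling; the image path is illustrative, and the generic COCO-pretrained yolov5s weights stand in for the custom fruit/vegetable weights.

```python
import cv2
import torch

# Load a YOLOv5s model through the Ultralytics PyTorch Hub interface
# (the trained fruit/vegetable weights would be passed here instead).
model = torch.hub.load("ultralytics/yolov5", "yolov5s")

# Read a 900x1600 frame captured by the POS device camera (path is illustrative).
frame = cv2.imread("pos_frame.jpg")

# Resize the frame to the 320x320 input resolution used by the detector.
resized = cv2.resize(frame, (320, 320))

# The hub wrapper expects RGB images; OpenCV loads BGR.
rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)

# Run inference; results contain bounding boxes, class labels and confidence scores.
results = model(rgb, size=320)
print(results.pandas().xyxy[0])  # one row per detected object
```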
The model is trained with a dataset provided by K. Patel, containing images of 14 types of fruits and vegetables at a resolution of 3024×4032. To further decrease the model size, full integer quantization is applied to the model. Along with that, the PyTorch model is converted to a TensorFlow Lite model, since this framework is supported by Android.
The quantized YOLOv5s model is deployed to a 400TR Android POS device developed by Token Financial Technologies. Finally, the output of the system is the input images with bounding boxes drawn around the detected objects, together with confidence scores. All of the steps explained above can be seen in Fig. 1.
The final system is obtained by comparing different object detection models and quantization techniques, as explained in the sub-sections below. Details about the deployment of the model to the device are also given.
Deciding and Training the Model
The proposed system detects fruits and vegetables using a camera. According to prior works, fruit and vegetable detection can be achieved with either classification or object detection models [1-8]. Better results are obtained using object detection models, since multiple fruits/vegetables may be located apart from each other in real-life scenarios. Thus, object detection is chosen for the proposed system.
Object detection models can be separated into two groups: one-stage and two-stage detectors. Since the proposed system should work in real time on a device with limited resources, a one-stage detector is used. One of the most successful one-stage detectors is YOLO; therefore, YOLO is chosen as the object detection model for the proposed system.
Once it is decided to use YOLO for object detection, the version of YOLO must also be chosen. The last two versions of YOLO at the time of this work (YOLOv5s and YOLOv4) are compared, and YOLOv5s is chosen for the proposed system. Detailed comparisons are presented in the “Experiments and Results” section. In the training phase of YOLOv5s, transfer learning is applied by starting from the weights trained on the COCO dataset, and the model is fine-tuned on the dataset provided by K. Patel, as sketched below.
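A possible way to run this transfer-learning step, assuming the Ultralytics YOLOv5 repository is cloned and used from its root directory; the dataset YAML name and the hyperparameter values are illustrative rather than the exact settings used in this work, and argument names may differ slightly between repository versions.

```python
# Run from inside a clone of the Ultralytics YOLOv5 repository.
import train

# Fine-tune YOLOv5s starting from COCO-pretrained weights (transfer learning).
train.run(
    data="fruits_vegetables.yaml",  # hypothetical dataset definition (14 classes)
    weights="yolov5s.pt",           # COCO-pretrained starting point
    imgsz=320,                      # input resolution used by the system
    epochs=100,                     # illustrative value
    batch_size=16,                  # illustrative value
)
```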
Compressing the Model
The device used for the proposed system has limited resources. Even though the trained model has relatively few parameters for an object detector, better results in terms of speed and size can be obtained by reducing the number of parameters further. A model compression technique is therefore used to further decrease the parameters of the model.
Quantization is used as the model compression technique, since it reduces both model size and inference time. There are different quantization methods, but the ones tried in this work are post-training quantization techniques. As the name suggests, these methods are applied after the model is trained. Before deploying the trained model to the device, it has to be converted from a PyTorch model to a TensorFlow Lite model, and the quantization techniques are applied during this conversion.
Three different quantization techniques are applied to the final model, and the results are compared. The best result is obtained using full integer quantization; after it is applied, both the model size and the inference time are reduced by ¼. Detailed comparisons are presented in the “Experiments and Results” section.
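A minimal sketch of how full integer post-training quantization can be applied during the TensorFlow Lite conversion, assuming the trained model has already been exported from PyTorch to a TensorFlow SavedModel; the directory names and the random calibration data below are placeholders, and real calibration would use preprocessed 320×320 images from the fruit/vegetable dataset.

```python
import numpy as np
import tensorflow as tf

# The SavedModel directory name is illustrative.
converter = tf.lite.TFLiteConverter.from_saved_model("yolov5s_saved_model")

def representative_dataset():
    # Calibration samples for full integer quantization; random data is used
    # here only as a placeholder for preprocessed training images.
    for _ in range(100):
        yield [np.random.rand(1, 320, 320, 3).astype(np.float32)]

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force every operator to run with 8-bit integer arithmetic.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open("yolov5s_int8.tflite", "wb") as f:
    f.write(tflite_model)
```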
Model Deployment
400TR, an Android-based POS device produced by Token Financial Technologies, is chosen as the device used in the system, as it meets the requirements of a fruit and vegetable detection system.
400TR has the Android 9.0 operating system with a 1.5 GHz MT8167A CPU, a 5-megapixel auto-focus camera, 2 GB of LPDDR3 800 MHz RAM, 16 GB of eMMC / microSD memory, and 5.5″ user and 3″ client touch screens.
Therefore, the Android application can run on this device without the need for extra hardware. Since no new hardware is required to use the system, it can be adopted by any business that already uses a POS device. Also, since the system runs on a payment device, detected fruits and vegetables can be added directly to the sale.
Finally, the resulting model is transferred to the device. Since the device used for this system is Android-based, the PyTorch model is converted to a TensorFlow / TensorFlow Lite model, which is supported by Android. After this conversion is performed together with quantization, the model is embedded in the Android application.
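Before the quantized model is bundled into the Android application (where it runs through the TensorFlow Lite Android runtime), it can be sanity-checked on a desktop with the Python TFLite interpreter. The sketch below assumes the quantized file produced earlier; the random input stands in for a preprocessed 320×320 camera frame.

```python
import numpy as np
import tensorflow as tf

# Load the quantized model produced by the conversion step.
interpreter = tf.lite.Interpreter(model_path="yolov5s_int8.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Full integer quantization makes the input uint8; a real check would feed a
# preprocessed 320x320 camera frame instead of random data.
dummy_input = np.random.randint(0, 255, size=input_details[0]["shape"], dtype=np.uint8)
interpreter.set_tensor(input_details[0]["index"], dummy_input)
interpreter.invoke()

# The output tensor holds the detection candidates (boxes, scores, classes).
detections = interpreter.get_tensor(output_details[0]["index"])
print(detections.shape)
```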