
Integration of Object Recognition, Color Classification and QR Decoding for the Purposes of Intelligent Robots

Submitted: 02 October 2024
Posted: 02 October 2024

Abstract
This article presents a conceptual program model that integrates three main methods: object recognition, color classification, and QR decoding of information. These methods are well-established in the fields of Robotics and Artificial Intelligence, yet they remain subjects of ongoing research interest. The program model organizes the execution of the methods in a sequence directed toward recognized objects. In this way, color classification and QR decoding are performed only on already recognized objects. Colors and QR codes that do not belong to recognized objects are not analyzed, which can provide intelligent robots with reliable and accurate information about the objects they can recognize. For research purposes, image sensors (cameras), images saved to disk, clipboard, or shared memory between applications are used. The article shows that the program model can be augmented with various sensors and algorithms for the comprehensive construction of program control of autonomous robots. Future research that builds on current studies will focus on measuring the distance to recognized objects and structuring the collected information about these objects, which will create opportunities for perceptual anchoring and logical reasoning in the program control of intelligent robots.
Keywords: 
Subject: Computer Science and Mathematics - Robotics

1. Introduction

Intelligent robots (IRs) are agents physically integrated into specific external environments. They employ sensors to monitor their surroundings and utilize executive mechanisms to enact actions within them. The behavior of an IR cannot be comprehensively considered or evaluated without accounting for the environment it operates in and the tasks it performs. The robot, its environment, and the task at hand are interdependent and mutually influential. The interest behind research and development in IR is driven by the necessity and aspiration for robots to function effectively in everyday human environments such as offices, hospitals, museums, galleries, and warehouses. Mobile intelligent robots [1] must adapt their behavior in dynamic and changing conditions, which requires these robots to possess qualities different from those of industrial robots. This article presents a program model that integrates capabilities for object recognition, color classification of recognized objects, and QR code decoding. These three algorithms are combined into a unified program model with a specific execution sequence. The model can provide the intelligent robot with rich and accurate information about its surrounding environment.
The article presents the possibility of enhancing the program model with various sensors and algorithms, making it flexible and capable of adapting to different robotic systems. The model can be included as part of the overall intelligent software management of a robot, which will enable the robot to make informed decisions based on the collected data.
For research and experimentation purposes, the program includes a menu that allows the use of multiple image sources: cameras, images saved on disk, clipboard, or shared memory between applications. Several experimental results are presented to demonstrate the functionality of the program model.
The use of sensors (RGB-D cameras [2]) for measuring distance to recognized objects and the storage of structured information generated by the execution of the program model will be the subject of future research that builds upon the current studies. Structuring information appropriately within the computer system and generating instances that carry this information creates opportunities for implementing perceptual anchoring [3], i.e., establishing a connection between perceptions and signs. This connection can be utilized for the purposes of logical reasoning and decision-making by the IR.

2. Program Model

The program model is implemented as a console C++ application that integrates several analysis algorithms applied sequentially to the input images. The program serves as a conceptual model with an architecture that encompasses the following three main functionalities (Figure 1):
  • Object recognition: identification and localization of various types of objects within the input images.
  • Color classification of objects: color analysis of the recognized objects to determine their color within one of six possible color categories.
  • QR decoding: extracting information from QR codes located within the recognized objects.
This software architecture provides an integrated solution for tasks related to computer vision, which is essential for the development of intelligent robots. The program is built using system libraries for shared memory [4] and the OpenCV library [5], which provides a comprehensive solution for image processing and visualization.
Figure 2 shows a block diagram that presents the overall organization of the program model and its structured approach. There are three main components: Initialization, Selection Menu, and Image Processing.
The program model contains the following main components:
  • Initialization: Creation of variables and windows for displaying images, a trackbar window for setting RGB thresholds (filtering) applied to the input image, and camera initialization.
  • Selection menu: The program provides a menu from which users can select from a list of alternative sources for loading images or can exit the program.
  • Cyclic image processing: The loop provides continuous updates of the images, which are processed using various methods from OpenCV: setting a color threshold, finding contours, approximating the detected contours, and determining the objects detected in the image. A specialized function is called to perform color classification of the recognized objects. Each recognized object is labeled with an appropriate name and number. After recognition and color classification, the program provides the opportunity to decode QR codes that are located within the recognized objects. The program executes this cycle until the user presses the "e/E" key.
  • End of program: The program completes its execution if the user selects to exit the program by pressing the "e/E" key.
In summary, the functionality of the program model consists of the following (a minimal skeleton of this cycle is sketched after the list):
  • Capturing a frame (image) from a source selected from the menu;
  • Extracting information from the frame and recognizing objects;
  • Color classification of the recognized objects;
  • Assigning identifiers to the recognized objects;
  • Decoding QR codes within the objects, if any;
  • Displaying results in windows.
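The cyclic processing summarized above can be collected into a compact C++ skeleton. The sketch below is illustrative rather than a copy of the actual program: it shows only the camera option, creates a single trackbar as an example, and reduces Update to the segmentation call from Section 3; the window names and the stub body are assumptions made for this example.

#include <opencv2/opencv.hpp>
#include <iostream>
using namespace cv;
using namespace std;

Mat frame, thresholded;                      // shared by the main loop and Update
int lowB = 0, lowG = 0, lowR = 0;            // RGB thresholds set via trackbars (Figure 3)
int highB = 255, highG = 255, highR = 255;

void Update() {                              // simplified stand-in for the function in Section 3
 inRange(frame, Scalar(lowB, lowG, lowR), Scalar(highB, highG, highR), thresholded);
 // contour finding, object recognition, colorClassification() and QRDetector() go here
}

int main() {
 namedWindow("Control");                     // trackbar window for the RGB thresholds
 createTrackbar("LowR", "Control", &lowR, 255);
 // ... the remaining five trackbars are created in the same way

 VideoCapture cap(0);                        // menu option 1: load from camera
 char key = 0;
 while (key != 'e' && key != 'E') {          // cyclic image processing until e/E is pressed
  cap >> frame;                              // capture a frame from the selected source
  if (frame.empty()) break;
  Update();                                  // segmentation, recognition, classification
  imshow("Original", frame);                 // display results in windows
  imshow("Thresholded", thresholded);
  key = (char)waitKey(30);
 }
 return 0;
}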

3. Object Recognition

3.1. Presentation of the Algorithm

The goal of object recognition is to identify and localize all instances of objects from one or more known classes in images [6]. These classes are defined according to the application’s objectives. Typically, there are a small number of objects in the image, but their location and scale may vary.
The object recognition algorithm implemented in the program model allows for the recognition of several object classes in real-time. At this stage, the program does not store information about the objects, meaning that the algorithm only determines the class of a given object and its characteristics. For experimental purposes, simple 2D shapes (classes) have been chosen, which can have various applications in robotics. The classes that can be recognized by the algorithm include: circles, triangles, rectangles, pentagons, and hexagons. The recognition is robust to rotation and scale changes of the objects.
When executing the program, it goes through the following steps:
Step 1: Choosing a source for the images: USB camera, disk, or another option.
cout << "******************************" << "\n";
cout << "SELECT AN OPTION FROM THE MENU" << "\n";
cout << "******************************" << "\n";
cout << "1. Load from camera" << "\n";
cout << "2. Load from Shared Memory" << "\n";
cout << "3. Load from Picture" << "\n";
cout << "4. Load from Clipboard" << "\n";
cout << "5. Exit" << "\n";
cout << "Your choice: ";
cin >> choice;
Step 2: The Update function is called in the main program loop. The inRange function creates a binary image in which the pixels that fall within the specified color range are white, while the rest are black. It takes the current frame, applies the specified color boundaries to it, and stores the resulting image in thresholded.
void Update() {
inRange(frame, Scalar(lowB, lowG, lowR), Scalar(highB, highG, highR), thresholded);
//...
}
The inRange function performs color-based segmentation using the lower and upper color boundaries, which are set interactively via a trackbar (Figure 3).
The inRange function can work with both RGB (Red, Green, Blue) and HSV (Hue, Saturation, Value) images [7]. HSV is more effective for color segmentation and is closer to how humans perceive colors, as the model mimics the human ability to distinguish colors [8]. The HSV color model is less sensitive to changes in lighting, making it suitable for color segmentation. In the current implementation, inRange uses the RGB model because it is intuitive and convenient for visualizing the results, facilitating color interpretation in the context of technical goals and experiments.
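If HSV segmentation were preferred, the only change needed would be a color-space conversion before calling inRange. The following lines are a minimal sketch and not part of the presented program; the hue, saturation, and value bounds shown are placeholders rather than values used in the experiments.

Mat hsv;
cvtColor(frame, hsv, COLOR_BGR2HSV);   // OpenCV delivers camera frames in BGR order
// In OpenCV, hue is stored in [0, 179]; the bounds below are purely illustrative.
inRange(hsv, Scalar(100, 80, 50), Scalar(130, 255, 255), thresholded);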
Step 3: Contours of objects with well-defined boundaries in the image are detected. The contour of an object plays a crucial role in areas such as semantic segmentation and image classification. Extracting contours is a difficult task, especially when the contour is incomplete or open [9].
The OpenCV library provides the function findContours, which detects contours in the created binary image (thresholded).
void Update() {
// Segments the image based on a specified color range
inRange(frame, Scalar(lowB, lowG, lowR), Scalar(highB, highG, highR), thresholded);
vector<vector<Point>> contours; //Vector to store contours
// Finds the contours in the binarized image
findContours(thresholded, contours, RETR_EXTERNAL, CHAIN_APPROX_SIMPLE);
//...
}
Step 4: All found contours are traversed, and each contour is simplified using the approxPolyDP function. This function reduces the number of points on the contour to describe the shape with a minimal number of vertices while preserving the essential geometry. Based on the number of vertices of the contours, the class of the object is determined (for example, triangle, rectangle, pentagon, etc.). For each recognized object, the colorClassification function is called to determine the color of the object.
void Update() {
 // Segments the image based on a specified color range
 inRange(frame, Scalar(lowB, lowG, lowR), Scalar(highB, highG, highR), thresholded);
 vector<vector<Point>> contours; // Vector to store contours
 // Finds the contours in the binarized image
 findContours(thresholded, contours, RETR_EXTERNAL, CHAIN_APPROX_SIMPLE);
 // Loop over the contours
 for (const auto& contour : contours) {
  double epsilon = 0.02 * arcLength(contour, true);
  vector<Point> approx;
  approxPolyDP(contour, approx, epsilon, true); // Approximation of the contour
  // Skip small or non-convex objects
  if (fabs(contourArea(contour)) < 100 || !isContourConvex(approx))
   continue;
  if (approx.size() == 3) {
   // Triangle
   colorClassification(/* operating parameters are passed here */);
  }
  else if (approx.size() == 4) {
   // Rectangle
   colorClassification(/* operating parameters are passed here */);
  }
  else if (approx.size() == 5) {
   // Pentagon
   colorClassification(/* operating parameters are passed here */);
  }
  else if (approx.size() == 6) {
   // Hexagon
   colorClassification(/* operating parameters are passed here */);
  }
  else {
   // Detect and label circles
   colorClassification(/* operating parameters are passed here */);
  }
 }
 //...
}

3.2. Specifics of the Algorithm

The object recognition algorithm is integrated into the program model as a fundamental component in the execution of the program. In order for the subsequent stages of information processing, color classification and QR decoding, to be successfully executed, the objects must be recognized.
It should be noted that the result of the algorithm depends on the following:
  • The quality of the images: Images from the sensors often have noise, which is generated for various reasons, such as a low-quality sensor, poor WiFi signal, inadequate lighting, etc. This noise leads to difficulties in processing images in real time. The problem can largely be solved by using high-quality sensors (cameras), high resolution and frame rates, appropriate lenses, focal lengths, and good lighting. To improve the quality of images at the pixel level, various filters can be applied, such as the Gaussian filter [10], deep local parametric filters using neural networks [11], and others.
  • Visibility of objects. The visibility and size of the objects captured in the images depend on many factors: technical equipment, the distance of the objects from the camera, shading, light reflections, etc. This means that the lighting and camera must be positioned appropriately in order for the objects in the image to be qualitatively analyzed [12,13]. In the case of static robots, lighting and cameras can be selected and placed at suitable positions and distances relative to the working objects in the environment. For mobile robots, the situation is more complex, as the choice of lighting and cameras depends on many factors in the robot-environment relationship - the tasks the robot performs, the dynamics of the environment, and more. In partially observable environments, where certain properties of the surroundings can only be perceived under specific conditions and from particular viewpoints, various approaches can be applied to adapt sensor information in the context of dynamic and changing conditions [14]. In complex or incomplete image contexts, attribute recognition techniques that belong to objects can be particularly effective in object recognition under insufficient information [15].
  • Determining Color Threshold Values. The color threshold values are used to filter the input image, helping to separate the target object from unnecessary information in the image. Determining these threshold values through a trackbar or another manual method complicates the algorithm and makes it unreliable in response to changes in lighting, shading from objects, reflections, background changes, and other external influences.
  To minimize these issues, various approaches can be used. For example, when possible, a color background that is easy to remove can be utilized. Problems related to changes in lighting can be addressed by ensuring good illumination and using an additional sensor that measures the light level; based on the sensor data, appropriate color threshold values can be pre-defined and applied automatically. These solutions are not always applicable, so algorithms that automatically adapt the color thresholds can be employed. For instance, by locally computing regions in an iterative process, threshold values can be determined automatically without pre-defined thresholds or parameters [16]. Another method for multi-level segmentation, based on the Emperor Penguin Optimizer (EPO) algorithm, determines threshold values automatically using optimization for efficient and precise image segmentation [17].
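As an illustration of thresholds computed from the image itself rather than tuned by hand, OpenCV's Otsu method can select a global threshold on a grayscale version of the frame. This is not the adaptive local method of [16] or the EPO approach of [17], and it is not used in the presented program model; it is given only as a minimal sketch of the general idea.

Mat gray, autoThresholded;
cvtColor(frame, gray, COLOR_BGR2GRAY);
// THRESH_OTSU ignores the threshold value 0 passed below and computes one from the histogram.
double t = threshold(gray, autoThresholded, 0, 255, THRESH_BINARY | THRESH_OTSU);
// t now holds the threshold chosen automatically for this particular frame.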

4. Classification of Objects by Colors

4.1. A Color Classification Model

The color classification model is built in Matlab and is based on a generalized mathematical model for linear decision filters (linear classification) [18]. Based on this model, an algorithm has been developed in C++ that classifies recognized objects in real-time into one of the following 6 colors: Red, Green, Blue, Magenta, Cyan, Yellow.
Building the model in Matlab follows these steps:
  • Construction of a 3D color model in RGB space;
  • Transformation of the 3D model into a 2D color model;
  • Construction of lines for color separation in the 2D space;
  • Testing of the model.

4.1.1. Building a 3D Color Model in the RGB Space

To build the 3D model, the RGB space [19] along the three axes is divided into equal intervals within the range of [0, 1]. The number of intervals is chosen so that the program executes in an acceptable time. Increasing the number of intervals leads to delays in the execution of the program and does not significantly contribute to the performance of the model. For this reason, an optimal number of points along the three axes is selected to be 51, creating 50 intervals for each axis of the RGB space. Each interval has a size of 0.02 units. The first point has coordinates (0, 0, 0), and the last (1, 1, 1). The total number of points in the three-dimensional space for the RGB colors is calculated as: 51 x 51 x 51 = 132651 points. The creation of these points in Matlab is performed using a triple loop.
index = 0;
for r = linspace(0, 1, 51)   %51 evenly spaced red values, 50 intervals of 0.02 units in [0, 1]
 for g = linspace(0, 1, 51)  %Green points, 50 intervals
  for b = linspace(0, 1, 51) %Blue points, 50 intervals
   index = index + 1;
   RGB_3D_Color_Matrix(index, :) = [r g b]; %Adds the new [r g b] values
  end
 end
end
This loop creates a three-dimensional color space RGB_3D_Color_Matrix with all possible combinations of RGB colors for the selected number of points (Figure 4).

4.1.2. Convert the RGB 3D Color Space to 2D (rgb)

The next step in building the color classification model is to convert the three-dimensional RGB space into a two-dimensional space. For this purpose, the color values along the three coordinates are normalized using the following formulas [18]:
r = R/(R + G + B), g = G/(R + G + B), b = B/(R + G + B).
In the representation of the two-dimensional space, the coordinates r and g are important, as the b coordinate can be calculated using the formula b = 1 – (r + g). That is, the sum of the three coordinates in the two-dimensional space is equal to one: r + g + b = 1.
To convert from the three-dimensional space to the two-dimensional space, a loop has been created that iterates through the RGB_3D_Color_Matrix, and each RGB point (pixel) is passed to the function RGB_3D_To_rgb_2D.
for k = 1:index
 R = RGB_3D_Color_Matrix(k,1);
 G = RGB_3D_Color_Matrix(k,2);
 B = RGB_3D_Color_Matrix(k,3);
 %Each point is converted to a two-dimensional coordinate.
 [r, g, b] = RGB_3D_To_rgb_2D(R, G, B);
 %...
end
The conversion function has the following form:
function [r, g, b] = RGB_3D_To_rgb_2D(R, G, B)
 RGB = R + G + B;
 if (RGB == 0)
  r = 0; g = 0; b = 1;
 else
  r = R / RGB; g = G / RGB; b = B / RGB;
 end
end
If the input values of R, G, B are zero, the function RGB_3D_To_rgb_2D returns r = 0, g = 0, b = 1.
The resulting graph of the two-dimensional rgb space is shown in Figure 5. Only the r and g coordinates are used for this graph, while b can be calculated from them.
To generate the displayed graph, the values of r and g in the two-dimensional space, which lie in the range [0, 1], are converted into integers in the range [1, 100]. This is done because, in addition to being color values, r and g are also used for indexing an array associated with the r and g coordinates, which allows convenient and fast data processing. The white area at the center of the triangle corresponds to values close to r ≈ 0.33, g ≈ 0.33, and b ≈ 0.33. The black triangular area at the bottom of the graph is not utilized.
The transformation of the interval [0, 1] to the interval [1, 100] for the two coordinates rg is done using the formulas x_green = 1 + g * 99 and y_red = 1 + r * 99, where g is the value of the green color and r is the value of the red color in the two-dimensional space. The addition of one is for correct indexing of the array, which starts from 1 instead of 0. The first 6 indices obtained from the formulas are shown in Figure 6.
The black pixels visible in the two-dimensional color space result from indexing the array that represents the graph: the actual values of r and g in the interval [0, 1] are mapped to x_green and y_red and rounded to whole numbers in the interval [1, 100], which leaves some array cells without a sample. This does not affect the classification algorithm, since the graph in Figure 5 is for demonstration purposes only and the pixel values, including the black ones, are not used directly. Instead, the more general model based on the lines drawn in the two-dimensional space is used, so a color that falls on a pixel shown as black on the graph is still classified as the correct color.

4.1.3. Construction of Lines for Color Separation in the 2D Space

From Figure 5, it can be seen that several primary colors stand out in the two-dimensional space: Red, Green, and Blue. There are also three additional colors that are well represented in the graph: Magenta, Cyan, and Yellow. Since these colors are distinctly visible, the classification model is built for these 6 colors. To achieve color classification of the objects, lines have been drawn to separate these color spaces, as shown in Figure 7.
As seen from the figure, the values y_red=40 and y_red=60 separate the Magenta color, with the lines drawn through these points passing through the center of the triangle, where the color appears white. Similarly, the lines for the other colors are determined. The number of lines drawn depends on the desired number of colors for classification, with 6 colors (Red, Green, Blue, Magenta, Cyan, Yellow) defined here.
The lines on the graph are limited to short segments for clarity; in reality, they extend infinitely in both directions. The equations of the lines follow from the standard equation of a line in the plane, y = mx + b, where m is the slope and b is the y-intercept. The slope is calculated using the following formula:
m = (change in y) / (change in x).
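For example, the coefficients listed below are consistent with constructing each line from two points in the (x_green, y_red) plane. Taking the first magenta-separating line through the point (x_green = 0, y_red = 40) and the center of the triangle (x_green = y_red = 34, corresponding to r = g = 1/3), the slope is m = (34 - 40) / (34 - 0) = -6/34, so y_red = 40 - (6/34)*x_green. Multiplying by 34 and moving all terms to one side gives -6*x_green - 34*y_red + 1360 = 0, which matches Dbr1 = [-6 -34 1360]. Repeating the construction with y_red = 60 gives Dbr2 = [-26 -34 2040], and the remaining lines follow in the same way. (This reconstruction is offered as a consistency check; the exact points used by the authors may differ.)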
The determined coefficients for the six lines of the model have the following values:
Dbr1 = [-6 -34 1360] % first green line
Dbr2 = [-26 -34 2040] % second green line
Dbg1 = [34 6 -1360] % first red line
Dbg2 = [34 26 -2040] % second red line
Drg1 = [-26 6 680] % first blue line
Drg2 = [-6 26 -680] % second blue line
The signs (+) and (-) next to the line equations indicate the sign of Dbr1, Dbr2, Dbg1, Dbg2, Drg1, and Drg2. If a specific combination of r and g yields a negative result, the color falls into the area marked with a minus sign (-); if the result is positive, the color is on the side marked with a plus sign (+).
The procedure for color classification of a recognized object is as follows:
  • The average color value for the recognized object in the three-dimensional RGB space is determined, with values in the range [0, 255].
  • The resulting RGB color is converted to the rgb two-dimensional space in the range [0, 1]. The rg values are used to determine the following two values:
    x_green = 1 + g * 99; y_red = 1 + r * 99,
    where x_green and y_red are the color values of the recognized object in the two-dimensional space of the classification model.
  • The obtained values of x_green and y_red are substituted into the equations of Zbr1, Zbr2, Zbg1, Zbg2, Zrg1, Zrg2, shown below, and only the resulting sign is considered.
The Matlab code that classifies the color of the object is as follows:
W = [x_green; y_red; 1];
Zbr1 = Dbr1*W;   % Zbr1 = -6*x_green - 34*y_red + 1360
Zbr2 = Dbr2*W;   % Zbr2 = -26*x_green - 34*y_red + 2040
Zbg1 = Dbg1*W;   % Zbg1 = 34*x_green + 6*y_red - 1360
Zbg2 = Dbg2*W;   % Zbg2 = 34*x_green + 26*y_red - 2040
Zrg1 = Drg1*W;   % Zrg1 = -26*x_green + 6*y_red + 680
Zrg2 = Drg2*W;   % Zrg2 = -6*x_green + 26*y_red - 680
if (Zbr1 > 0 && Zbg1 < 0),   disp('blue');    end
if (Zbg1 >= 0 && Zbg2 <= 0), disp('cyan');    end
if (Zbg2 > 0 && Zrg2 < 0),   disp('green');   end
if (Zrg2 >= 0 && Zrg1 <= 0), disp('yellow');  end
if (Zrg1 > 0 && Zbr2 < 0),   disp('red');     end
if (Zbr2 >= 0 && Zbr1 <= 0), disp('magenta'); end
In the program model (the C++ implementation), the function colorClassification() is used for color classification of the recognized object.
void colorClassification(/* operating parameters are passed here */) {
 // The Matlab classification logic above is implemented here in C++
 //...
 if (qr_choice == 'Y' || qr_choice == 'y') QRDetector(subImg); // Call QRDetector
 //...
}
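For reference, the decision logic carried over from the Matlab model can be sketched in C++ roughly as follows. The function name classifyColor and its plain (R, G, B) interface are assumptions made for this sketch; only the normalization, the mapping onto [1, 100], the line coefficients, and the sign tests come from the model described above.

#include <string>

std::string classifyColor(double R, double G, double B)   // mean RGB of the object, in [0, 255]
{
 double sum = R + G + B;
 double r = (sum == 0.0) ? 0.0 : R / sum;                  // normalize to the 2D rgb space
 double g = (sum == 0.0) ? 0.0 : G / sum;

 double x_green = 1.0 + g * 99.0;                          // map [0, 1] onto the [1, 100] grid
 double y_red   = 1.0 + r * 99.0;

 double Zbr1 =  -6 * x_green - 34 * y_red + 1360;          // the six separating lines
 double Zbr2 = -26 * x_green - 34 * y_red + 2040;
 double Zbg1 =  34 * x_green +  6 * y_red - 1360;
 double Zbg2 =  34 * x_green + 26 * y_red - 2040;
 double Zrg1 = -26 * x_green +  6 * y_red +  680;
 double Zrg2 =  -6 * x_green + 26 * y_red -  680;

 if (Zbr1 > 0 && Zbg1 < 0)   return "blue";
 if (Zbg1 >= 0 && Zbg2 <= 0) return "cyan";
 if (Zbg2 > 0 && Zrg2 < 0)   return "green";
 if (Zrg2 >= 0 && Zrg1 <= 0) return "yellow";
 if (Zrg1 > 0 && Zbr2 < 0)   return "red";
 if (Zbr2 >= 0 && Zbr1 <= 0) return "magenta";
 return "unknown";                                         // fallback; should not occur for valid input
}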

4.2. Testing the Classification Model

Average RGB color values for an object are set in the Matlab program.
Example 1:
An object with red color is defined in the program: R = 255, G = 0, B = 0.
Running the program >> color_classification_objects
Returned result (Figure 8): red
The green star indicates where in the 2D model space the object's color is located.
Example 2:
An object with yellow color is defined in the program: R = 255, G = 255, B = 0.
Running the program >> color_classification_objects
Returned result (Figure 9): yellow
The gray star indicates where in the 2D model space the object's color is located.
Correct results are obtained with other RGB values as well, which indicates that the color classification model is properly constructed.
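Example 1 can also be checked by hand against the formulas in Section 4.1. For R = 255, G = 0, B = 0, normalization gives r = 1 and g = 0, so x_green = 1 + 0*99 = 1 and y_red = 1 + 1*99 = 100. Substituting these values yields Zrg1 = -26*1 + 6*100 + 680 = 1254 > 0 and Zbr2 = -26*1 - 34*100 + 2040 = -1386 < 0, which is exactly the condition (Zrg1 > 0 && Zbr2 < 0) for red, in agreement with Figure 8.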

5. QR Decoding

5.1. Description

A QR code (quick-response code) is a two-dimensional matrix barcode [20] that stores information in the form of text, numbers, web addresses, and more. The information in a QR code becomes accessible upon decoding.
QR codes are successfully used for object identification, as well as landmarks in space, where they can be key elements in the processes of localization ("where the robot is") and navigation ("how the robot gets to the goal") [21,22,23,24]. QR coding with fiducial markers, such as AprilTag, allows for camera calibration, object and camera localization (position and orientation), and even environment mapping [25,26,27].
QR decoding is a built-in functionality in the presented program model (AprilTag is not used at this stage), which includes:
  • Activation of QR decoding: In the program, the user can choose whether to enable the functionality for decoding QR codes.
cout << "QR code recognition (Y, N): ";
cin >> qr_choice;
...
if (qr_choice == 'Y' || qr_choice == 'y')
QRDetector(subImg);
  • Detection of the QR code: The program uses the QRDetector function to identify the area of the QR code. In the current implementation of the program, the QR code must be located within the recognized object detected by the program.
  • Decoding the information: The code recognition algorithm decodes the information using the detectAndDecode method. This information can include plain text, numbers, URLs, or other data.

5.2. Implementation

The analysis of an image containing a QR code is performed using functions declared in the objdetect.hpp header of the OpenCV library [5]. The functions for working with QR codes in OpenCV are provided through the QRCodeDetector class, which offers methods for decoding QR codes.
In the presented program model, QR decoding is carried out in the function:
void QRDetector(Mat inputImage) {
QRCodeDetector qrDecoder;
Mat bbox, rectifiedImage;
std::string data = qrDecoder.detectAndDecode(inputImage, bbox, rectifiedImage);
if (data.length() > 0)
{
  cout << "QR Data: " << data << endl;
  //display(inputImage, bbox);
  rectifiedImage.convertTo(rectifiedImage, CV_8UC3);
  //imshow("Rectified QRCode", rectifiedImage);
}
}
In the QRDetector function, an instance qrDecoder of the QRCodeDetector class is created, which is responsible for decoding. The main method of QRCodeDetector is detectAndDecode, which takes an image, attempts to find a code within it, and decodes it. It returns the decoded result, the coordinates of the corners of the QR code area in the image, and the restored QR code (a structured representation of the code). The parameters of the detectAndDecode method are as follows:
  • inputImage is the input image on which the decoding attempt is made.
  • bbox (bounding box) is an output parameter for the area where the code was detected.
  • rectifiedImage is an output parameter that provides a rectified (corrected) image of the code after the decoding process (an improved structured image of the QR code).
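Because the program model only attempts decoding inside recognized objects, the image passed to QRDetector is a sub-image of the current frame covering one recognized object. A minimal sketch of how such a sub-image can be formed is shown below; the clipping of the bounding box to the frame borders is an added precaution, and the exact place of this code inside colorClassification may differ in the actual program.

Rect box = boundingRect(approx);             // bounding box of the approximated contour
box &= Rect(0, 0, frame.cols, frame.rows);   // keep the box inside the frame
Mat subImg = frame(box).clone();             // sub-image of the recognized object
if (qr_choice == 'Y' || qr_choice == 'y')
 QRDetector(subImg);                         // attempt detection and decoding inside it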

6. Testing the Program Model

Figure 10 shows the program menu.
Options in the menu have the following meanings:
  • Load from camera - load images from a USB camera.
  • Load from Shared Memory - load images from shared memory. The program reads (extracts) images from shared memory, where images from various sources can be stored: video, cameras (USB, ESP32CAM, etc.), simulations from programs, and more. The storage of images in shared memory is performed by additional programs external to the program model.
  • Load from Picture - load an image saved on disk.
  • Load from Clipboard - load images from the Clipboard. This option is suitable for frequent changes to the tested images.
  • Exit - exit the program.
  • Decoding QR codes - this functionality is selected on the line "QR code recognition (Y, N):".
For the study and testing, the following sources of images were used:
  • Option 1 - a standard USB camera was used.
  • Option 2 - an ESP32CAM camera was used, as shown in Figure 11. The images from the camera are stored in a shared area; for this purpose, an additional program was created to copy the images obtained from the WiFi camera into the shared memory (a sketch of the reading side is given after this list).
  • Option 3 - various test images saved on disk were used.
  • Option 4 - the Blender program was used, where the created images are copied to the Clipboard, and the program extracts them from there.
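For Option 2, the program model reads frames that another process has written into shared memory via Windows file mapping [4]. The lines below are a rough sketch of the reading side only; the mapping name "SharedImageBuffer" and the fixed 640x480 BGR layout are assumptions made for this example, not details taken from the actual programs, which must agree on such a layout in advance.

#include <windows.h>
#include <opencv2/opencv.hpp>

cv::Mat readSharedFrame()
{
 const int width = 640, height = 480;                      // assumed frame layout
 const size_t bytes = (size_t)width * height * 3;          // 8-bit BGR pixels

 HANDLE hMap = OpenFileMappingA(FILE_MAP_READ, FALSE, "SharedImageBuffer");
 if (hMap == NULL) return cv::Mat();                       // the writer is not running yet

 void* view = MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, bytes);
 cv::Mat frame;
 if (view != NULL) {
  cv::Mat shared(height, width, CV_8UC3, view);            // wraps the mapped memory
  frame = shared.clone();                                  // copy out before unmapping
  UnmapViewOfFile(view);
 }
 CloseHandle(hMap);
 return frame;
}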
The results of the program execution are shown in Figure 12. In the left column, the result of analyzing an image taken with the ESP32CAM camera and stored in shared memory is displayed. The object is recognized as a pentagon (PEN), the color as blue, and the QR decoding as "Chocolate products." In the right column, the result for the same pentagon is shown, where the analysis was performed on an image created in Blender and saved to the clipboard. It can be seen that the result of the second analysis is better, as there is no noise in the image and the color is recognized correctly.
In Figure 13, a zoomed-in view of the pentagon recognition results is provided to better observe the labels placed by the program.
In the center of the figure, the label PEN 1 is visible, indicating the pentagon, with the number 1 serving as the identification number for the object, which is automatically assigned by the program. In the upper left corner, the identified color "cyan" is displayed. In the lower left corner, the color values for Red-Green are shown in a two-dimensional color scheme. Below the figure, information about the object and the results of the QR decoding are provided.
In mobile robotics, the use of real-time cameras is of particular interest; however, due to noise in the images, conducting research with them is challenging, necessitating the application of alternative approaches for investigation and testing. For this reason, the program allows loading images from various sources. Based on the results obtained from the research and tests, suitable cameras for the specific robotic system can be selected.
Figure 14 displays additional results for different objects (shapes) that have been successfully recognized.

7. Application of the Program Model

7.1. Context for the Use of the Program Model

The presented program model, which includes object recognition, color classification, and QR code decoding, can be utilized in mobile robots that need to perceive and analyze the environment in real time while interacting with real objects within it [28]. Through the presented program model, robots can gather information about objects in the environment and navigate within it. The inclusion of additional sensors, such as temperature, humidity, brightness, etc., can provide further information about the environment and enhance the robot's understanding of its surroundings.
In the context of mobile robotics, where robots transport goods from point to point in warehouses or other spaces, the ability to find the shortest path can be essential. The presented program model can therefore be integrated with route-optimization algorithms such as Dijkstra's algorithm [20,21,29,30,31]. Since the algorithm finds the shortest path in a graph, it can support the logistical management of a mobile robot, enabling navigation within a warehouse or other space while optimizing the path to target locations, as sketched below.
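As a reminder of what such an integration would build on, a compact version of Dijkstra's algorithm over an adjacency list is sketched here. The graph is an abstract set of numbered locations with non-negative edge weights; it is not tied to any particular warehouse layout or to the program model's data structures.

#include <vector>
#include <queue>
#include <limits>
using namespace std;

// Shortest distances from a start node; adj[u] holds pairs (neighbor, edge weight >= 0).
vector<double> dijkstra(const vector<vector<pair<int, double>>>& adj, int start)
{
 const double INF = numeric_limits<double>::infinity();
 vector<double> dist(adj.size(), INF);
 priority_queue<pair<double, int>, vector<pair<double, int>>, greater<>> pq;

 dist[start] = 0.0;
 pq.push({0.0, start});
 while (!pq.empty()) {
  auto [d, u] = pq.top(); pq.pop();
  if (d > dist[u]) continue;                // skip stale queue entries
  for (auto [v, w] : adj[u])                // relax every outgoing edge
   if (dist[u] + w < dist[v]) {
    dist[v] = dist[u] + w;
    pq.push({dist[v], v});
   }
 }
 return dist;                               // shortest distance to every reachable node
}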

7.2. Integration of the Program Model with Other Types of Sensors

The addition of various types of sensors [32,33,34,35] into the overall architecture of the mobile robot, such as navigation sensors, obstacle avoidance (collision) sensors, and others, can expand the capabilities of the presented program model through integration with it. This integration should be aligned with the specific functional requirements of the robot in the environment in which it will operate. Table 1 presents some integration possibilities.

8. Discussion

The article presents a program model that integrates several algorithms: object (shape) recognition, color classification, and QR decoding. This program model is suitable for both stationary and mobile robots and can be used as part of their intelligent control implementation.
Adding various sensors to the overall program model opens up numerous prospects for managing and interacting with the environment. Integration with Dijkstra's algorithm, which optimizes routes in a graph for the locations and movements of the robot, can provide an intelligent way to navigate and achieve tasks with minimal time and energy.
The material presented in this article reflects the current research of the authors in the development of a program model suitable for intelligent robots, with capabilities for analyzing and interacting with the environment. From the conducted experimental observations, it can be concluded that the presented program model, together with additional sensors and functional algorithms, can be successfully applied in the development of intelligent software control for robots.
Future improvements to the program model will include:
  • Capabilities for measuring distance to objects using a depth camera;
  • Structuring and storing information about recognized objects.

Author Contributions

Conceptualization, N.C. and R.V.; methodology, R.V. and V.I.; software, R.V. and N.C.; validation, N.C., R.V. and V.I.; formal analysis, N.C. and R.V.; investigation, R.V.; resources, N.C.; data curation, R.V. and V.I.; writing—original draft preparation, R.V.; writing—review and editing, R.V. and V.I.; visualization, R.V.; supervision, N.C.; project administration, N.C. and R.V.; funding acquisition, N.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Education and Science under the National Science Program INTELLIGENT ANIMAL HUSBANDRY, grant agreement N D01-62/18.03.2021.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the article.

Acknowledgments

The research leading to these results has received funding from the Ministry of Education and Science under the National Science Program INTELLIGENT ANIMAL HUSBANDRY, grant agreement N D01-62/18.03.2021.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Rubio, F.; Valero, F.; Llopis-Albert, C. A review of mobile robots: Concepts, methods, theoretical framework, and applications. Int. J. Adv. Robot. Syst. 2019, 16. [Google Scholar] [CrossRef]
  2. Holz, D.; Holzer, S.; Rusu, R.B.; Behnke, S. Real-time plane segmentation using RGB-D cameras. In Robot Soccer World Cup XV, 18 June 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 306–317. [Google Scholar] [CrossRef]
  3. Coradeschi, S.; Saffiotti, A. Perceptual Anchoring of Symbols for Action. In Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI'01), Seattle, WA, USA, 4 August 2001; Volume 1, pp. 407–412. [Google Scholar]
  4. Microsoft. Using File Mapping. Available online: https://learn.microsoft.com/en-us/windows/win32/memory/using-file-mapping (accessed on 15 January 2024).
  5. OpenCV: Open Source Computer Vision Library. Available online: https://github.com/opencv/opencv (accessed on 21 January 2023).
  6. Amit, Y.; Felzenszwalb, P.; Girshick, R. Object Detection. In Computer Vision: A Reference Guide; Ikeuchi, K., Ed.; Springer International Publishing: Cham, Switzerland, 2021; pp. 875–883. [Google Scholar] [CrossRef]
  7. Tkachuk, A.; Bezvesilna, O.; Dobrzhanskyi, O.; Pavlyuk, D. Object Identification Using the Color Range in the HSV Scheme. The Scientific Heritage 2021, 1, 50–56. [Google Scholar]
  8. Hema, D.; Kannan, S. Interactive Color Image Segmentation Using HSV Color Space. Sci. Technol. J. 2019, 7, 37–41. [Google Scholar] [CrossRef]
  9. Gong, X.Y.; Su, H.; Xu, D.; et al. An Overview of Contour Detection Approaches. Int. J. Autom. Comput. 2018, 15, 656–672. [Google Scholar] [CrossRef]
  10. Yu, J. Based on Gaussian Filter to Improve the Effect of the Images in Gaussian Noise and Pepper Noise. In Proceedings of the 3rd International Conference on Signal Processing and Machine Learning (CONF-SPML 2023), Oxford, UK, 25 February 2023; IOP Publishing: Oxford, UK, 2023. [Google Scholar] [CrossRef]
  11. Moran, S.; Marza, P.; McDonagh, S.; Parisot, S.; Slabaugh, G. DeepLPF: Deep Local Parametric Filters for Image Enhancement. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13-19 June 2020; IEEE: 2020; pp. 12826–12835. [CrossRef]
  12. Fernández, I.; Mazo, M.; Lázaro, J.L.; Pizarro, D.; Santiso, E.; Martín, P.; Losada, C. Guidance of a mobile robot using an array of static cameras located in the environment. Autonomous Robots 2007, 23, 305–324. [Google Scholar] [CrossRef]
  13. Hanel, M.L.; Kuhn, S.; Henrich, D.; Grüne, L.; Pannek, J. Optimal Camera Placement to Measure Distances Regarding Static and Dynamic Obstacles. International Journal of Sensor Networks 2012, 12, 25–36. [Google Scholar] [CrossRef]
  14. Lamanna, L.; Faridghasemnia, M.; Gerevini, A.; Saetti, A.; Saffiotti, A.; Serafini, L.; Traverso, P. Learning to Act for Perceiving in Partially Unknown Environments. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23), August 2023. [Google Scholar] [CrossRef]
  15. Farhadi, A.; Endres, I.; Hoiem, D.; Forsyth, D. Describing objects by their attributes. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20-25 June 2009; pp. 2007–2014. [Google Scholar] [CrossRef]
  16. Navon, E.; Miller, O.; Averbuch, A. Color image segmentation based on adaptive local thresholds. Image and Vision Computing, Volume 23, Issue 1, 1 January 2005, pp. 69–85. [CrossRef]
  17. Xing, Z. An Improved Emperor Penguin Optimization Based Multilevel Thresholding for Color Image Segmentation. Knowledge-Based Systems 2020, 194, 105570. [Google Scholar] [CrossRef]
  18. Gochev, G. Kompyutarno zrenie i nevronni mrezhi; Technical University – Sofia: Sofia, Bulgaria, 1998; pp. 83–84. [Google Scholar]
  19. Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 4th ed.; Pearson: New York, NY, USA, 2018; pp. 400–408. [Google Scholar]
  20. Wikipedia. QR code. Available online: https://en.wikipedia.org/wiki/QR_code (accessed on 25 June 2024).
  21. Zhang, H.; Zhang, C.; Yang, W.; Chen, C.-Y. Localization and navigation using QR code for mobile robot in indoor environment. In 2015 IEEE International Conference on Robotics and Biomimetics (ROBIO), Zhuhai, China, 6–9 December 2015; IEEE: 2015; pp. 741–9715. [CrossRef]
  22. Sneha, A.; Sai Lakshmi Teja, V.; Mishra, T.K.; Satya Chitra, K.N. QR Code Based Indoor Navigation System for Attender Robot. EAI Endorsed Trans. Internet Things 2020, 6, e3–e3. [Google Scholar] [CrossRef]
  23. Bach, Sy-Hung; Khoi, Phan-Bui; Yi, Soo-Yeong. Application of QR Code for Localization and Navigation of Indoor Mobile Robot. IEEE Access 2023, 11, 28384–28390. [Google Scholar] [CrossRef]
  24. Aman, A.; Singh, A.; Raj, A.; Raj, S. An Efficient Bar/QR Code Recognition System for Consumer Service Applications. In Proceedings of the 2020 Zooming Innovation in Consumer Technologies Conference (ZINC), Novi Sad, Serbia, 26-27 May 2020; IEEE, 2020; pp. 127–131. [Google Scholar] [CrossRef]
  25. Wang, J.; Olson, E. AprilTag 2: Efficient and robust fiducial detection. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea (South), 09-14 October 2016; IEEE: 2016. [CrossRef]
  26. Krogius, M.; Haggenmiller, A.; Olson, E. Flexible Layouts for Fiducial Tags. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 03-08 November 2019; IEEE: 2019. [CrossRef]
  27. Wang, X. High Availability Mapping and Localization. Ph.D. Dissertation, University of Michigan, 2019. Available online: https://hdl.handle.net/2027.42/151428.
  28. Bonci, A.; Cheng, P.D.C.; Indri, M.; Nabissi, G.; Sibona, F. Human-Robot Perception in Industrial Environments: A Survey. Sensors 2021, 21, 1571. [Google Scholar] [CrossRef]
  29. Wikipedia. Dijkstra's algorithm. Available online: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm (accessed on 8 August 2024).
  30. GeeksforGeeks. How to Find Shortest Paths from Source to All Vertices Using Dijkstra’s Algorithm. Available online: https://www.geeksforgeeks.org/dijkstras-shortest-path-algorithm-greedy-algo-7/ (accessed on 8 August 2024).
  31. Alshammrei, S.; Boubaker, S.; Kolsi, L. Improved Dijkstra Algorithm for Mobile Robot Path Planning and Obstacle Avoidance. Computers, Materials and Continua. 2022, 72, 5939–5954. [Google Scholar] [CrossRef]
  32. Kester, W. Section 6: Position and Motion Sensors. Available online: https://www.analog.com/media/en/training-seminars/design-handbooks/Practical-Design-Techniques-Sensor-Signal/Section6.PDF (accessed on 6 August 2024).
  33. Zhmud, V.A.; Kondratiev, N.O.; Kuznetsov, K.A.; Trubin, V.G.; Dimitrov, L.V. Application of ultrasonic sensor for measuring distances in robotics. In Journal of Physics: Conference Series 2018, 1015, 032189. [Google Scholar] [CrossRef]
  34. Suh, Y.S. Laser Sensors for Displacement, Distance and Position. Sensors 2019, 19, 1924. [Google Scholar] [CrossRef]
  35. Lee, C.; Song, H.; Choi, B.P.; Ho, Y.-S. 3D scene capturing using stereoscopic cameras and a time-of-flight camera. IEEE Transactions on Consumer Electronics 2011, 57, 1370–1376. [Google Scholar] [CrossRef]
Figure 1. Block scheme of the program model and connection to shared memory.
Figure 2. Block diagram of the program model.
Figure 3. Trackbar for setting color thresholds.
Figure 4. RGB_3D_Color_Matrix with the first few color values.
Figure 5. Two-dimensional rgb space.
Figure 6. Indexing of r and g coordinates.
Figure 7. Lines drawn in the two-dimensional rgb space.
Figure 8. Test with an object of red color and the obtained result.
Figure 9. Test with an object of yellow color and the obtained result.
Figure 10. Selection menu.
Figure 11. ESP32CAM WiFi camera used for research purposes.
Figure 12. Results: Option 2 (left column) and Option 4 (right column).
Figure 13. Enlarged view of the recognition result for the identified pentagon.
Figure 14. Results from the program (program model) for various objects.
Table 1. Integration possibilities of the program model with additional sensors.

Function: Determination of geographic position.
Application: Using GPS modules to accurately determine the location of the robot on a global scale.
Integration scenario: Using a GPS module to determine the current geographic position of the robot. When certain objects are recognized, their coordinates can be saved, which can be useful for creating maps and marking specific points on those maps.

Function: Orientation and navigation in space.
Application: Using inertial motion (accelerometer) and rotation (gyroscope) sensors to read the robot's orientation and motion.
Integration scenario: Using the output of these sensors to determine the orientation and navigation of the robot. Upon recognition of certain objects, the robot can automatically navigate the environment and take appropriate actions.

Function: Obstacle detection and distance measurement to objects.
Application: Using distance sensors, such as ultrasonic and infrared sensors, to measure the distance to nearby obstacles.
Integration scenario: Using ultrasonic or infrared sensors to measure the distance to nearby objects. During object recognition, the measured distances are used to optimize navigation or prevent collisions, enabling dynamic responses by the robot according to the recognized objects (obstacles).

Function: Measurement of physical parameters in the environment.
Application: Using sensors to measure temperature, humidity, illumination, and other physical parameters.
Integration scenario: Using sensors to measure temperature, humidity, atmospheric pressure, and other physical parameters in the environment. When certain objects are recognized, an analysis can be made of their possible effects on the robot or the environment.