Convolutional Neural Networks: A Guide to Understanding Them
The field of computer vision and AI has experienced incredible growth over the past few years.
The ability of neural networks to recognize complex patterns in data makes them an essential tool for AI. Below, we explain in depth how they work and cover their most important use cases.
What Is a Convolutional Neural Network?
Convolutional Neural Networks (CNNs) are a subset of machine learning and are at the heart of deep learning algorithms. They are composed of layers of nodes: an input layer, one or more hidden layers, and an output layer.
Each node connects to others and has an associated weight and threshold. If the output of an individual node exceeds the specified threshold value, that node becomes active and sends data to the next layer of the network; otherwise, no data is transmitted.
They are particularly effective for processing and analyzing visual data, and different types of neural networks are used for different types of data and use cases. For example, recurrent neural networks are typically used for natural language processing and speech recognition, while CNNs are used more for classification and computer vision tasks.
How Convolutional Neural Networks Work
Before starting, it is necessary to clarify some basic concepts about neural networks:
- Neurons: the most basic unit of a neural network. Each neuron computes a weighted (linear) sum of its inputs and applies a nonlinear function, known as the activation function, to the result (see the sketch after this list).
- Input layer: each neuron in the input layer corresponds to one of the input features.
- Hidden layers: the layers between the input layer and the output layer. Each neuron in a hidden layer takes a weighted sum of the outputs of the neurons in the previous layer and passes it through a nonlinear activation function.
- Output layer: the number of neurons in the output layer corresponds to the number of output classes. Once a prediction has been made, a loss is calculated and the network enters an iterative self-improvement process through which the weights are adjusted with backpropagation to reduce this error.
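To make this concrete, here is a minimal NumPy sketch of what a single neuron computes; the function name, weights, and inputs are illustrative, not part of any particular library:

```python
import numpy as np

def neuron(inputs, weights, bias):
    """A single neuron: a weighted (linear) sum of the inputs plus a bias,
    passed through a nonlinear activation function (here, ReLU)."""
    z = np.dot(weights, inputs) + bias  # linear combination
    return max(0.0, z)                  # ReLU activation: clamp negatives to zero

# Illustrative values: three input features, arbitrary weights and bias
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
print(neuron(x, w, bias=0.2))  # 0.4 - 0.12 - 1.2 + 0.2 = -0.72, so ReLU outputs 0.0
```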
Convolutional neural networks use a series of layers, each detecting different features of an input image. Depending on the complexity of its task, a CNN can contain up to thousands of layers, each of which builds on the output of the previous layers to recognize increasingly detailed patterns.
The process begins by sliding a filter designed to detect certain features over the input image, an operation known as a convolution. The result of this process is a feature map that highlights the presence of the detected features in the image.
Filters in the initial layers typically detect basic features, such as lines or simple textures. Filters in later layers are more complex, combining the basic features identified earlier to recognize more intricate patterns.
Between these layers, the network takes steps to reduce the spatial dimensions of the feature maps to improve efficiency and accuracy. In the final layers, the model makes a final decision based on the output of the previous layers.
Convolutional Neural Network architecture
As mentioned above, the operation of CNNs may seem simple at first glance: the user provides an input image in the form of a pixel array with up to three dimensions:
- Two spatial dimensions (height and width), which are all a grayscale image needs.
- A third dimension with a depth of 3 to represent the fundamental colors (red, green, blue) of a color image.
Unlike a standard multilayer perceptron (MLP) model, which consists only of a classification part, the convolutional network architecture has several types of layers:
Convolutional layer
This is the fundamental component of a CNN and where most of the calculations are performed. This layer uses a filter or kernel to move through the receptive field of an input image and detect the presence of specific features.
The process begins by sliding the kernel over the width and height of the image, and then traversing the entire image in several iterations. At each position, a scalar product is computed between the kernel weights and the image pixel values under the kernel. This transforms the input image into a set of feature maps, each of which represents the presence and intensity of a given feature at various points in the image.
Suppose the input is a color image, made up of a 3D array of pixels; the input thus has three dimensions (height, width, and depth). A feature detector then moves across the receptive fields of the image to check whether the feature is present, an operation known as convolution.
This detector or filter is a two-dimensional array of weights representing part of the image. Although it can vary in size, it is usually a 3×3 matrix, which also determines the size of the receptive field. The filter is applied to an area of the image, and a scalar product is calculated between the input pixels and the filter weights. This scalar product is fed into an output matrix. The final result of the series of scalar products between the input and the filter is known as a feature map, activation map, or convolved feature, as the sketch below illustrates.
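Here is a minimal NumPy sketch of that sliding scalar product, assuming a single-channel (grayscale) input, a 3×3 filter, a stride of 1, and no padding; the filter shown is a classic vertical-edge detector, chosen purely for illustration:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image and compute the scalar (dot)
    product at each position, producing a feature map.
    Stride 1, no padding ("valid" convolution)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # receptive field under the kernel
            out[i, j] = np.sum(patch * kernel)  # scalar product
    return out

# Toy 5x5 image with a vertical edge, and a 3x3 vertical-edge filter
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)
print(convolve2d(image, kernel))  # the feature map peaks along the edge
```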
The filter weights are adjusted during training through backpropagation and gradient descent. However, there are three hyperparameters that affect the size of the output volume and must be set before training begins:
- The number of filters affects the depth of the result.
- Stride is the distance, in pixels, that the kernel moves across the input matrix at each step. Although stride values of two or more are uncommon, a larger stride produces a smaller output.
- Zero padding is used when the filter does not fit the input image. It sets all elements that fall outside the input matrix to zero, producing an output that is larger than or equal in size to the input. The types are (see the sketch after this list):
- Valid padding: no padding is added; the last convolution is skipped if the dimensions do not line up.
- Same padding: ensures that the output layer is the same size as the input layer.
- Full padding: increases the output size by adding zeros to the input border.
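To see how the number of filters, stride, and padding mode interact with the output dimensions, the sketch below applies the standard output-size formula O = floor((W - F + 2P) / S) + 1; the helper function is ours, written for illustration:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution: O = floor((W - F + 2P) / S) + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

# A 3x3 kernel on a 28-pixel-wide input, stride 1:
print(conv_output_size(28, 3, padding=0))  # valid padding -> 26 (output shrinks)
print(conv_output_size(28, 3, padding=1))  # same padding  -> 28 (size preserved)
print(conv_output_size(28, 3, padding=2))  # full padding  -> 30 (output grows)

# A larger stride shrinks the output: stride 2 roughly halves it
print(conv_output_size(28, 3, stride=2, padding=1))  # 14
```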
After each convolution operation, a CNN applies a rectified linear unit (ReLU) transformation to the feature map, introducing nonlinearity into the model.
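In code, this is a simple element-wise operation; a one-line NumPy sketch:

```python
import numpy as np

def relu(feature_map):
    """Rectified linear unit: negatives become zero, positives pass through."""
    return np.maximum(0, feature_map)

print(relu(np.array([[-0.72, 1.5], [2.0, -3.1]])))  # [[0.  1.5] [2.  0. ]]
```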
Additional convolutional layer
When a convolutional layer follows the initial layer, the structure of the CNN becomes hierarchical: neurons in later layers see the pixels within the receptive fields of earlier layers, covering a progressively larger area of the original image.
Each individual part of the image forms a lower-level pattern in the neural network, and the combination of its parts represents a higher-level pattern, creating a hierarchy of features within the CNN. Finally, the convolutional layer converts the image into numerical values, allowing the neural network to interpret and extract relevant patterns.
Pooling layer
Also known as downsampling, pooling layers perform dimensionality reduction, reducing the number of parameters in the input.
Similar to the convolutional layer, the pooling operation sweeps a filter across the entire input, but this filter has no weights. Instead, the kernel applies an aggregation function to the values within its receptive field, filling the output matrix. There are two types:
- Max pooling: as the filter moves across the input, it selects the pixel with the maximum value to send to the output matrix.
- Average pooling: as the filter moves across the input, it calculates the average value within the receptive field to send to the output matrix.
The disadvantage of this layer is that a lot of information can be lost, but it also helps to reduce complexity, improve efficiency, and limit the risk of overfitting. Both aggregation functions are sketched below.
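Here is a small NumPy sketch of both pooling modes, assuming a non-overlapping 2×2 window; the input values are illustrative:

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Downsample a feature map with a non-overlapping window.
    Unlike a convolution filter, the window has no weights; it applies
    an aggregation function (max or mean) to each receptive field."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = feature_map[i * size:(i + 1) * size,
                                 j * size:(j + 1) * size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

fm = np.array([[1, 3, 2, 0],
               [4, 6, 1, 1],
               [0, 2, 5, 7],
               [1, 1, 8, 6]], dtype=float)
print(pool2d(fm, mode="max"))      # [[6. 2.] [2. 8.]]
print(pool2d(fm, mode="average"))  # [[3.5 1. ] [1.  6.5]]
```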
Fully-connected layer
In the partially connected layers described above, the pixel values of the input image are not directly connected to the output layer. That is where the fully connected layer comes in: each node in the output layer connects directly to a node in the previous layer.
This layer performs the classification task based on the features extracted through the previous layers and their different filters. A minimal end-to-end sketch of the whole architecture follows.
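Putting the pieces together, here is a minimal sketch of the full pipeline in PyTorch (assuming the torch package is installed); the layer sizes are illustrative choices for 28×28 grayscale inputs and 10 output classes, not a prescribed architecture:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Convolution -> ReLU -> pooling, twice, then a fully connected classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 28x28 -> 28x28 ("same")
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 14x14 -> 14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # flatten the feature maps for the dense layer
        return self.classifier(x)

model = SimpleCNN()
dummy = torch.randn(1, 1, 28, 28)  # one 28x28 grayscale image
print(model(dummy).shape)          # torch.Size([1, 10]): one score per class
```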
Convolutional Neural Network Applications
Before CNNs existed, objects were identified using time-consuming feature extraction methods that had to be performed manually. With these convolutional networks, a more scalable approach to image classification and object detection is achieved.
By employing linear algebra principles, CNNs can recognize patterns in an image. Therefore, their most widespread applications are:
- Image and voice recognition: these networks automatically recognize objects or people in images and videos, powering photo tagging on smartphones, facial recognition systems, and voice assistants.
- Medical diagnostics: image recognition technology with AI improves medical diagnostics thanks to its ability to analyze medical images such as X-rays, MRIs, etc.
- Autonomous vehicles: CNNs are used in autonomous cars to recognize obstacles and specific features of the road.
- Social networks: they are also commonly used for text mining, which allows social network platforms to automatically moderate content and create personalized advertising.
- Marketing and retail: they can be used to mine data, allowing visual searches for products and their location.
- Agriculture: they are used to monitor crops, detect diseases and predict yields. They can analyze images captured by drones or satellites to provide information to farmers.
Convolutional Neural Network vs Generative Adversarial Network
Convolutional neural networks (CNNs) and generative adversarial networks (GANs) are two fundamental technologies that have played key roles in the advancement of computer vision.
To summarize what has already been discussed throughout the article, CNNs are inspired by the structure and functioning of the human visual system and consist of several layers, including convolutional layers, pooling layers, and fully connected layers.
Generative Adversarial Networks (GANs), on the other hand, consist of two neural networks, a generator and a discriminator, which are trained simultaneously through a competitive process. The generator attempts to create fake data that cannot be distinguished from real data, while the discriminator attempts to differentiate between real and fake data. GANs are best known for their generative capabilities and their potential to create realistic images, but they also have applications in image processing for computer vision. A compact sketch of this adversarial setup follows.
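Here is a compact PyTorch sketch of that setup, with both networks reduced to tiny multilayer perceptrons and random tensors standing in for real data; every size here is an illustrative placeholder:

```python
import torch
import torch.nn as nn

latent_dim, data_dim, batch = 16, 64, 8  # illustrative sizes

generator = nn.Sequential(        # noise in, fake sample out
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, data_dim), nn.Tanh())

discriminator = nn.Sequential(    # sample in, probability "real" out
    nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid())

bce = nn.BCELoss()
real = torch.randn(batch, data_dim)               # stand-in for real data
fake = generator(torch.randn(batch, latent_dim))  # generated samples

# Discriminator objective: label real samples 1 and fake samples 0
d_loss = bce(discriminator(real), torch.ones(batch, 1)) + \
         bce(discriminator(fake.detach()), torch.zeros(batch, 1))

# Generator objective: make the discriminator output 1 on fakes
g_loss = bce(discriminator(fake), torch.ones(batch, 1))
print(d_loss.item(), g_loss.item())  # in training, each loss drives its own optimizer
```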
If we compare one with the other, the main differences are as follows:
Feature extraction
CNNs are designed for feature extraction, using convolutional layers to capture hierarchical features from the input data. This is crucial for tasks such as recognition, segmentation, and object detection. In addition, they can learn discriminative features directly from the data.
GANs, on the other hand, are not intrinsically designed for feature extraction. Although their discriminator learns to differentiate features, that is not their primary purpose; they focus on data generation and manipulation.
Data generation
As discussed earlier, the primary function of CNNs is to process existing data for classification, detection, or segmentation tasks; they are not typically used for data generation. However, related generative architectures, such as variational autoencoders (VAEs), can be adapted for data generation.
By contrast, GANs excel at this task: they can create highly realistic images, making them the best choice for use cases such as image synthesis, super-resolution, and style transfer.
Transfer learning
In this case, CNNs are well suited to transfer learning: pre-trained models are widely available and easy to fine-tune for specific tasks, which is very beneficial when working with limited data (a brief fine-tuning sketch follows below).
GANs, in turn, are less frequently used in this scenario, although some pre-trained models do exist.
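As an illustration of the CNN side, this is roughly what fine-tuning a pre-trained model looks like with PyTorch and torchvision (assuming a recent torchvision release; the 5-class target task is hypothetical):

```python
import torch.nn as nn
from torchvision import models

# Load a CNN pre-trained on ImageNet (downloads the weights on first use)
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pre-trained feature extractor so its weights are not updated
for param in model.parameters():
    param.requires_grad = False

# Swap the final fully connected layer for the new, hypothetical 5-class task;
# only this layer's parameters will now be trained
model.fc = nn.Linear(model.fc.in_features, 5)
```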
Model type
CNNs are discriminative models, as they focus on distinguishing and classifying data.
By contrast, GANs are generative models, designed to create new data that cannot be distinguished from real data.
Realism and quality
CNNs process existing images, not produce them, so their performance depends on the quality and quantity of data used for training. Although they are capable of achieving high accuracy in recognition tasks, they do not inherently generate realistic images.
GANs have the ability to produce images that can “fool” humans due to their realism, so they have set benchmarks in image generation and quality.
Computational complexity
On the one hand, CNNs require significant computational effort during training, especially with deep architectures and large datasets. Once trained, however, they are relatively fast at making predictions.
GANs, on the other hand, are also computationally intensive, since two networks must be trained simultaneously. These demands can make real-time applications challenging.
Ethical and safety issues
Here we find similarities, as both face ethical and security challenges that have yet to be resolved. For example, CNNs have raised ethical concerns related to privacy, bias, and surveillance, especially with regard to facial recognition.
GANs are not far behind in these challenges, as their use to create deepfakes raises significant ethical and security issues, as the technology can be used for misinformation, impersonation, or other malicious purposes.
Therefore, both CNNs and GANs have their own strengths and applications in the field of AI, especially in computer vision. The choice between one or the other depends on the specific requirements of the task at hand. CNNs are the ideal choice for tasks that involve recognizing and analyzing existing data, while GANs excel in tasks that require data generation, image synthesis, and creative expression.
However, these technologies are not mutually exclusive; in some applications, they can complement each other. As computer vision continues to evolve, both will play a key role in shaping the future of visual perception and understanding.
Convolutional Neural Network in your business
CNNs have revolutionized the field of AI and offer numerous benefits in a variety of industries. In fact, other advances such as hardware enhancements, new data collection methods, and advanced architectures such as capsule networks can further optimize CNNs and integrate them into more technologies, resulting in expanded use cases.
If you want to take your business to the next level with the artificial intelligence solutions that best fit your case, the experts at Plain Concepts can help. We’ll design your strategy, protect your environment, choose the best solutions, close technology and data gaps, and establish rigorous oversight to achieve responsible AI. So you can achieve rapid productivity gains and build the foundation for new business models based on hyper-personalization or continuous access to relevant data and information.
We have a team of experts who have been successfully applying this technology in numerous projects, ensuring the security of our clients. We have been bringing AI to our clients for more than 10 years and now we propose an AI Adoption Framework:
- Unlock the potential of end-to-end generative AI.
- Accelerate your AI journey with our experts.
- Understand how your data should be structured and governed.
- Explore generative AI use cases that fit your goals.
- Create a tailored plan with realistic timelines and estimates.
- Build the patterns, processes, and teams you need.
- Deploy AI solutions to support your digital transformation.
Start your journey to innovation today!