MRI image analysis methods and applications: an algorithmic perspective using brain tumors as an exemplar
Vachan Vadmal, Grant Junno, Chaitra Badve, William Huang, Kristin A Waite, Jill S Barnholtz-Sloan, MRI image analysis methods and applications: an algorithmic perspective using brain tumors as an exemplar, Neuro-Oncology Advances, Volume 2, Issue 1, January-December 2020, vdaa049, https://doi.org/10.1093/noajnl/vdaa049
The use of magnetic resonance imaging (MRI) in healthcare and the emergence of radiology as a practice are both relatively new compared with the classical specialties in medicine. Having its naissance in the 1970s and later adoption in the 1980s, the use of MRI has grown exponentially, consequently engendering exciting new areas of research. One such development is the use of computational techniques to analyze MRI images much like the way a radiologist would. With the advent of affordable, powerful computing hardware and parallel developments in computer vision, MRI image analysis has also witnessed unprecedented growth. Due to the interdisciplinary and complex nature of this subfield, it is important to survey the current landscape, examine the current approaches for analysis, and identify trends moving forward.

Keywords: MRI imaging, analytics, imaging informatics, deep learning.

The past decade has seen a remarkable change in the availability of powerful, inexpensive computer hardware that has been a major driving force for the progression of machine vision in medical research. This has resulted in advances in digital MRI image analysis that range from simple tumor identification to the assessment of tumor response and treatment in clinical oncology. 1 Due to the interdisciplinary nature of the field, principles from physics, computer science, and computer graphics are used to address medical imaging informatics problems. With the existence of vast amounts of imaging data procured during standard clinical practice, a primary focus among investigators has been to use image analysis to augment current standards of tumor detection and to gain new insights about the nature of diseases. The stages in a typical workflow are image acquisition, preprocessing, segmentation, and feature extraction. These key terms that define a typical workflow were queried to find current literature in repositories such as Elsevier, IEEE Xplore, Radiology, PubMed, and Google Scholar.
This review discusses past and current methods employed in each of these stages as well as the rising popularity of artificial intelligence (AI)-based approaches, using brain tumors as an exemplar. A glossary of key terms is provided in the supplementary materials for ease of reference as these topics are presented.

Preprocessing

The first step in any data-driven study is to preprocess the raw images. Preprocessing removes noise by ensuring there is a degree of parity among all the images, which in turn makes the following segmentation and feature extraction steps more effective. 2 This involves performing operations to remove artifacts, modify image resolution, and address contrast differences that arise from different acquisition hardware and parameters. One common source of noise is bias fields, which are caused by low-frequency signals emitted from the MRI machine combined with patient anatomy that ultimately lead to inhomogeneities in the magnetic field. 3 The resulting images, therefore, have variations in intensity for the same tissue when each tissue should correspond to a specific intensity level. 4,5 Another source of noise arises from temporal data. During the course of treatment, patients often have a series of pre- and post-images. These imaging series are valuable for analytics, but it is almost impossible for the patient to be in the same exact position for the pre- and post-scans. This can make it difficult to discern the status of the tumor not only for imaging software, but also for radiologists. Thus, images taken over a timeframe must be aligned in a process known as image registration. To address the contrast differences in studies where images are taken from multiple sources and machines, images undergo normalization of color or grayscale values. 6 Normalization is almost universal in controlled imaging studies and is necessary when employing machine-learning techniques.
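As a minimal, hypothetical sketch of what such a normalization step might look like (the function name, min-max strategy, and default range are illustrative choices, not taken from any specific pipeline), gray values can be linearly rescaled into a common target range:

```python
import numpy as np

def normalize_intensity(image, new_min=0.0, new_max=1.0):
    """Linearly rescale gray values into a common target range so that
    images acquired on different scanners become comparable."""
    img = image.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:  # constant image: map everything to the lower bound
        return np.full_like(img, new_min)
    return (img - lo) / (hi - lo) * (new_max - new_min) + new_min
```

In practice, percentile-based bounds are often preferred over the raw min and max, which are sensitive to outlier voxels.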
Normalization effectively defines a new range of color values relative to other images in the data set. Before normalization, it may be necessary to remove noise existing on scans of any modality, including the signal from the patient’s skull for patients with brain tumors. Skull stripping is employed to reduce noise from the scans and increase the relative signal intensity.

MR Bias Correction

Despite the use of higher field strength MRI scanners, inhomogeneities in the magnetic field coupled with general anatomical noise from tissue attenuation will result in minute, visually undetectable intensity variations in the resulting images. 5,7 Because these nonuniformities can skew the results of segmentation and statistical feature detection, they need to be corrected before proceeding with the rest of the analytical pipeline. 5 The 2 main methodologies for reducing bias field are prospective and retrospective methods. 3 Prospective approaches attempt to reduce the bias field by altering the image capture sequence on the MRI hardware side. Retrospective approaches apply postprocessing strategies to the already captured image. Retrospective methods can be classified into 4 main subcategories: filtering, surface fitting, segmentation, and histogram.

Filtering Methods

Filtering-based methods are perhaps the oldest, easiest, and least computationally demanding of the 4 categories. Filtering removes components that meet or do not meet a specified threshold. For MR images, the noise removed consists of artifacts corresponding to low frequencies. However, because the filtering is rather crude, there is a high probability of removing valid signal when using low-pass filtering techniques, along with the chance of creating new artifacts called edge effects. Research has been conducted to mitigate edge effects, but the overall result still shows residual bias field.
3 This is important when analyzing brain tumor images, as it is crucial to properly identify the structural differences that change as the disease progresses, such as the necrotic area and the tumor. The 2 main classical filtering methods still used today are homomorphic filtering and homomorphic unsharp masking. In homomorphic filtering, the image is first log transformed and then transformed into the frequency domain. The bias field is then removed via a low-pass filter, with the corrected image being the difference between the original image and the bias field. This bias field image is often called the background image. 4 Homomorphic unsharp masking performs the same operations without log transforming the image.

Surface Fitting

The surface fitting approach is parametric in that it attempts to extract the background image by representing the image as a parametric surface and fitting a 2D image to it. 3,7 The 2 main categories of surface fitting methods are intensity based and gradient based. Intensity-based methods operate under the precondition that there is no significant intensity variation for a single tissue type. Similarly, gradient-based methods operate with the assumption that there is an even dispersion of bias field, which is corrected by estimating the variation in intensity gradients. 3

Segmentation-Based Methods

Because accurate segmentation of regions of interest (ROI) is the goal of bias correction, the 2 steps can be combined. The 2 main segmentation-based approaches are both iterative algorithms: expectation maximization (EM) and fuzzy c-means. The EM algorithm is a machine learning-based approach used to iteratively converge a parametric model’s parameters based on the maximum likelihood probability. The EM approach can use different criteria to estimate the model’s parameters. The fuzzy c-means method also segments iteratively by minimizing a cost function as it steps through a vector of the image’s pixel intensities. 4 EM-based approaches have fallen out of favor relative to fuzzy c-means.
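Before moving on, the homomorphic filtering scheme described above can be sketched with NumPy. This is a minimal illustration, not the method of any cited reference: the circular low-pass mask and its cutoff fraction are assumptions made for the example.

```python
import numpy as np

def homomorphic_bias_correct(image, cutoff_frac=0.05):
    """Estimate a low-frequency bias field by low-pass filtering the
    log-transformed image, then subtract it in log space (i.e. divide
    the original image by the estimated bias field)."""
    log_img = np.log1p(image.astype(np.float64))  # log transform; log1p avoids log(0)
    spectrum = np.fft.fftshift(np.fft.fft2(log_img))

    # A circular low-pass mask around the spectrum centre keeps only the
    # slowly varying component, i.e. the estimated bias field.
    rows, cols = image.shape
    cy, cx = rows // 2, cols // 2
    y, x = np.ogrid[:rows, :cols]
    radius = cutoff_frac * max(rows, cols)
    mask = (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2

    log_bias = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    corrected = np.expm1(log_img - log_bias)  # the corrected image
    bias_field = np.expm1(log_bias)           # the "background image"
    return corrected, bias_field
```

Skipping the initial log transform in this sketch would correspond to the homomorphic unsharp masking variant mentioned above.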
Histogram Methods

A histogram is a list that runs the length of the number of intensity values and counts the frequency of each pixel intensity for a given image. An example of a histogram showing the 8-bit pixel value distribution of a slice can be seen in Figure 1b and 1c. Approaches that use intensity distributions are popular and a standard way many research studies correct bias in MR images. 3 The nonparametric nonuniform normalization method (N3) has, since its inception in 1998, been shown to produce the best bias correction. Since then, the N3 method has been upgraded, and the current standard for bias correction is the N4 method. A popular implementation of N4 bias correction can be found in the Nipype Python package. Chang and coworkers used the N4 bias correction in a deep learning-based study utilizing TensorFlow to predict isocitrate dehydrogenase status in low- and high-grade gliomas. 8 Although there are several approaches to address bias correction, the area remains one of active research.

Figure 1. (a) An axial slice near the middle of the brain and its associated histograms. (b) A histogram of all gray-level values (0–255). (c) A histogram of all gray-level values but 0 (1–255).

Image Registration

Image registration is the process by which 2 images are spatially aligned using a combination of geometric transformations governed by an optimizer. An image can be geometrically represented and transformed in multiple ways, each with its own pros and cons. It is crucial that key biological landmarks are in the same location for an accurate comparison and analysis. For example, studies may have multitemporal (occurring over a period of time) and/or multimodal (having different contrasts) patient MR imaging data. Due to the breadth of the different kinds of problems that exist when registering images, no one method works for all cases.
9,10 In cases that involve brain tumors, especially well-defined glioblastoma multiforme, image registration is crucial as the extraction of accurate morphological features depends on correct alignment of the tumor region. Registration can be divided into 4 main components: feature space, search space, search strategy, and the similarity metric. Each provides vital information to determine which registration technique to use. 10 Feature space refers to the area of interest to be used as the basis for registration, for example, edges, outlines, or tumors. Search space refers to how the image will be transformed to align with the source. Search strategy follows up by determining what transformation to choose based on previous transformation results. The similarity metric is a comparison measure between the source and target images that are being aligned. Together, these form the basis of how to frame the registration problem. Recently, advances in image registration research have made this less experimental and more applicable. In practice, the de facto standard for research-based MR image registration and segmentation is the software suite Insight ToolKit (ITK). 9 ITK (version 5.0) consists of a robust set of algorithms and a structured framework used in many medical imaging-based software packages such as 3D Slicer and ITK-Snap. In addition to ITK, the FMRIB Software Library (FSL) 11 also offers a set of robust image registration frameworks: FMRIB’s Linear Image Registration Tool and its nonlinear counterpart, FMRIB’s Nonlinear Image Registration Tool. Both ITK and FSL are highly regarded for registration. Links to the mentioned software can be found in the Supplementary Materials.

Traditional Registration

Principal Axes Transformation

Principal axes transformation, first reported in the 1990s, is a classical way of registering images based on the rigid body rotation concept in Newtonian dynamics. 11 Using brain tumors as the exemplar, we start with the brain.
The rigid body is the overall shape of the brain. The brain is treated as a body of mass (an ellipse), exhibiting the properties of a mass body such as a center of mass. In this algorithm, the center of mass, or centroid, of the head is calculated. It is important to note that the centroid is computed from the bounding surface of the brain and not the actual dimensions of the image. This is computed by finding the mean intensity level for the x and y axes, calculated by 11:

x̄ = Σ x·I(x, y) / Σ I(x, y), ȳ = Σ y·I(x, y) / Σ I(x, y)

where I refers to the intensity of the pixel at coordinate (x, y) and the sums run over all pixels. The moment of inertia matrix of the rigid body is also calculated. This is a standard property of the rigid body that describes the rotational moment about the center of mass. The eigenvectors are then calculated from the inertia matrix and used to find the axes of the ellipse of the head. This is done for both target and source images. The eigenvector with the largest eigenvalue is used to calculate the angle with the horizontal axis, which is then compared between source and target images. The difference in angle between source and target dictates how much to rotate the source to align it with the target. 11 Advantages of this algorithm are as follows: (1) it is easier to register images of different contrasts (modalities), for example PD to T2, and (2) it is a completely unsupervised process.

Finite Fourier Transform

Another unsupervised, rigid-body-based method compares the source and target images in the frequency domain through Fourier transformations. 12 The basis of this algorithm is that, given 2 images, the source s0(x, y) and the target or translated image s1(x, y), the target s1 is assumed to be rotated by an angle θ and translated by pixel distances (Δx, Δy).
12 Thus, the problem now is to find the translation distances (Δx, Δy) and θ, which is accomplished by Fourier transforming s0(x, y) and s1(x, y) to S0(ξ, η) and S1(ξ, η), converting the problem from the spatial domain to the frequency domain. In this process, the image is a discrete source of information, and the underlying Fourier transform becomes a discrete Fourier transform:

F(u, v) = Σx Σy f(x, y) e^(−2πi(ux/M + vy/N)), f(x, y) = (1/MN) Σu Σv F(u, v) e^(2πi(ux/M + vy/N))

These equations describe the conversion from a 2D matrix representation of the image, f(x, y), of size M × N to the frequency domain F and the reverse process. Following that, the ratio of the 2 images is taken in the frequency domain to determine the rotation angle of the target needed to align with the source.

ITK Registration Methods

ITK takes an input of 2 images: the source and target. The source is the image to be transformed so as to align with the target. The source and target are input into 2 interpolators with a similarity metric process that assesses how closely aligned the source is to the target. With a predefined threshold set, the process iterates through a loop driven by the optimizer algorithm, which continues transforming the image until convergence is met. There are 4 main software components of the ITK registration workflow: transformations, the interpolator, the similarity metric, and the optimizer.

ITK Transformations

Transformations in the context of image registration and ITK move points from one space to another, that is, from the input space to the output space. 13 Medical images and MR scans are in a voxel coordinate space and need to be converted into physical coordinate space before any transformations can occur. ITK has its own C++ classes representing certain important geometric properties of images for optimal transformation. These geometric objects are ITK Point, Vector, and CovariantVector. ITK also requires the Jacobian matrix in order to perform transformations.
In the matrix, the elements represent the degree of change a transformation will have on the input space to the output space for each point.

Linear Geometric Transforms

These are transformations where a function maps the pixels from one space to another, expressed as T : R^n → R^m. In order for the transform to be linear, it must meet the following criteria: T(x + y) = T(x) + T(y) and T(cx) = cT(x) for all vectors x, y and scalars c. All linear transformations are achieved using matrix multiplication and addition, keeping the vector space the same.

Affine Transformation

The affine transformation is the simplest and most widely used of the linear transforms and treats the image as a rigid body. The affine family encompasses all rigid body transforms and contains operations including uniform and nonuniform scaling, rotations, shears, and reflections. It provides 12 degrees of freedom in 3D space. The mathematical operations applied are straightforward and not computationally intensive. The matrix operation below, for a rotation in 2D, illustrates an affine transformation 14:

[x′, y′]ᵀ = [[cos θ, −sin θ], [sin θ, cos θ]] [x, y]ᵀ

These affine transforms are composited together to produce the desired alignment, dictated by the metric and optimizer. The crux of the registration difficulty comes with optimizing the transform. An example of a translation of a point is as follows: (x′, y′) = (x + tx, y + ty).

ITK Interpolators

The interpolator functions similarly to interpolation in general image processing. Interpolation is needed because, when one image is remapped onto a new image space through transformations, the new pixel locations rarely fall exactly on the original grid. Since the advent of image processing and manipulation software such as Adobe Photoshop, some default interpolators have been used universally for general image manipulation, and ITK employs these as well. ITK utilizes the following interpolation algorithms: nearest neighbor, linear, B-spline, and windowed sinc interpolation (higher order).
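Returning to the affine transform for a moment, the composition of rotation, uniform scale, and translation can be sketched in a few lines of NumPy. The function name and parameter choices are illustrative, not ITK's API; points are row vectors transformed as p′ = s·R(θ)·p + t:

```python
import numpy as np

def affine_2d(points, theta_deg=0.0, scale=1.0, translation=(0.0, 0.0)):
    """Apply rotation, uniform scale, and translation to an Nx2 array of
    points: p' = scale * R(theta) @ p + t."""
    pts = np.asarray(points, dtype=float)
    t = np.radians(theta_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])  # 2D rotation matrix
    return scale * pts @ R.T + np.asarray(translation, dtype=float)
```

During registration, the optimizer's job is to search over exactly these parameters (θ, s, t) until the similarity metric stops improving.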
13

ITK Similarity Metrics

The similarity metric is primarily responsible for comparing how close 2 images are to each other based on a predefined parameter of comparison. This is a crucial process that can significantly affect the resulting registration. A similarity metric can also be used during texture analysis. The metric that is utilized depends on the kind of image data. With unimodal images, it is preferable to use an intensity-based metric. In contrast, a multimodal image set is better suited to a mutual information similarity metric. Since ITK v3, the set of similarity metrics has been refactored and reduced. Metrics included in ITK v5 are as follows: mean squares, correlation, mutual information, joint histogram/mutual information, demons, and ANTS neighborhood correlation metrics.

Mean Squares

The mean squares method for assessing similarity between images compares pixel intensities at a given coordinate. This method is pixel intensity driven in the grayscale and is quick to compute. If images A and B are represented by a matrix, i is the pixel index, and N is the total number of pixels, the mean squares metric is calculated as follows 13:

MS(A, B) = (1/N) Σi (Ai − Bi)²

A value of 0 indicates that A and B are the same, with increasing values indicating increasing dissimilarity.

Mutual Information

The mutual information method is an area-based method and can be readily applied in assessing the similarities of 2 images being registered. The basis of mutual information comes from the entropy of one random variable relative to another. Entropy is the measure of randomness of a random variable, computed using the formula for Shannon entropy 15:

H(X) = −Σi p(xi) log p(xi)

The mutual information in terms of entropy is written in the following 3 equivalent ways:

I(A; B) = H(A) + H(B) − H(A, B)
I(A; B) = H(A) − H(A|B)
I(A; B) = H(B) − H(B|A)

The mutual information expressions above are analogous to conditional probability. I(A; B) in the second equation states that, based on the knowledge of B, there is a decrease in the uncertainty of A.
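Both metrics can be sketched directly from these definitions. This is an illustrative NumPy version, not ITK's implementation; the joint-histogram estimator for mutual information and the bin count are assumptions made for the example:

```python
import numpy as np

def mean_squares(a, b):
    """MS(A, B) = (1/N) * sum_i (A_i - B_i)^2; 0 means identical images."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return np.mean((a - b) ** 2)

def mutual_information(a, b, bins=32):
    """I(A; B) = H(A) + H(B) - H(A, B), with the joint distribution
    estimated from a 2D histogram of corresponding pixel intensities."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pab = joint / joint.sum()   # joint probability estimate
    pa = pab.sum(axis=1)        # marginal of A
    pb = pab.sum(axis=0)        # marginal of B

    def entropy(p):
        p = p[p > 0]            # 0 log 0 is taken as 0
        return -np.sum(p * np.log2(p))  # Shannon entropy, in bits

    return entropy(pa) + entropy(pb) - entropy(pab.ravel())
```

An image compared against itself yields mean squares of 0 and maximal mutual information, while two unrelated images yield mutual information near 0.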
For MRI images, the random variables are the source and target. To interpret mutual information in the context of the second equation: the mutual information of pixels a and b is the uncertainty of the pixel intensity at a minus the uncertainty of that intensity given the corresponding pixel intensity at b. 16 Achieving the maximum mutual information indicates a successful registration. Because the metric performs a relative comparison of uncertainties between source and target rather than comparing raw intensity values directly, mutual information works well on multimodal image sets and is closer in spirit to feature- and area-based methods than to intensity-based methods.

ITK Optimizers

Optimization is the last step in the iterative process of registration. The optimizer evaluates a cost function that takes the output value from the similarity metric and determines the next set of transform parameters so as to decrease the metric value. This is an iterative process, and ITK offers many optimizers to choose from depending on the transform and metric used. 13

Normalization

Normalization is the process by which gray or color values across multiple images are scaled to a common set of relative gray values. This ensures that variation in acquisition parameters among scanners is accounted for and that similar tissues appear in a common range of values across all images. The classic method for normalization is histogram matching; however, other methods are better suited for MRI images, such as nonparametric nonuniform intensity normalization. 17,18

Skull Stripping Used When Studying the Brain

Skull stripping, or brain extraction, is a computational process that removes extraneous material not critical for analysis, such as the skull, fat, and skin. 19,20 The removal of extraneous information reduces the amount of noise in the system, creating a cleaner platform from which features can be segmented and further analyzed.
Because the problem is well defined, the process has been refined to the point where fully automated methods often do a clean job. The skull appears as a bright ring surrounding the brain, allowing an accurate mask to be created for brain extraction. The Brain Extraction Tool (BET) from FSL is an excellent, fully automated tool that performs this task with great success.

Segmentation

Segmentation occurs after preprocessing and is where an image is divided into disparate, nonoverlapping regions whose texture features share degrees of homogeneity. In patients with brain cancers, the goal would be to delineate the ROI containing tumor, edema, or other distinguishing features. Segmentation of tumors is a very important part of general clinical diagnosis that also forms the basis of imaging studies. In most segmentation challenges, segmentation algorithms are assessed by the accuracy of segmentation of white matter, gray matter, and cerebrospinal fluid. Over the years, segmentation strategies have been developed and categorized in different ways. There are 3 main types of segmentation that range in their degree of computer-aided automation: manual, supervised, and unsupervised. 6 Manual segmentation requires the expertise of a neuroradiologist to draw a perimeter around the area containing the pathology and is completely computer unaided. Supervised segmentation involves input from the user, instructing the algorithm how to perform and what constraints to abide by. The most difficult is the unsupervised method, which requires no user input. Unsupervised segmentation is an area of active research. It is especially problematic with gliomas due to the nature of the disease and surrounding tissue. In some cases, regions can be segmented during the registration process, as some alignment functions also recognize distinct regions. A visualization of some of the common segmentation filters applied to an example image can be found in Figure 2.
Figure 2. The application of common filters used for segmentation in Insight ToolKit. From left to right and top to bottom, the filters are as follows: simple thresholding, binary thresholding, Otsu’s thresholding, region growing, confidence connected, gradient magnitude, fast marching, and watershed. It is important to note that none of the parameters have been tuned for any of these filters.

Segmentation Methods

Region-Growing Algorithms

Region growing is a contextual form of segmentation that accounts for the distance of pixels to the current region at hand. Region growing algorithms are considered classical methods that form the foundation for more complex permutations of region-growing-based methods. The basis of region growing is that a pixel (the seed point) is selected either manually or by the computer, and the region around that chosen pixel is compared to its neighbors. Similar pixels are grouped according to some parameter as the region grows out from the seed point. Although this method is conceptually quite simple, it can be overly sensitive. Thus, most software packages that utilize region-growing-type algorithms take those shortcomings into account and have developed some complexity.

Connected Threshold

One type of region growing method implemented in ITK is thresholding, specifically connected thresholding. Thresholding turns a grayscale image into a black and white image by changing each pixel to either black or white depending on a specified gray value cutoff. For example, a simple rule may specify that all pixel values less than a constant T will be black and those greater than or equal will be white. The connected threshold method in ITK takes several parameters as user input: the seed coordinates, and upper and lower bounds for the intensities of the region growing algorithm, represented as I(X) ∈ [lower, upper]. 13 As these 3 parameters are required, it is a semiautomatic process.
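The connected-threshold idea can be sketched as a small breadth-first search. This is a minimal illustration assuming 4-connectivity; the function name and defaults are not ITK's:

```python
import numpy as np
from collections import deque

def connected_threshold(image, seed, lower, upper):
    """Grow a region from `seed`, accepting 4-connected neighbours whose
    intensity lies in [lower, upper]; returns a binary mask."""
    mask = np.zeros(image.shape, dtype=bool)
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        if not (0 <= y < image.shape[0] and 0 <= x < image.shape[1]):
            continue  # outside the image
        if mask[y, x] or not (lower <= image[y, x] <= upper):
            continue  # already visited, or intensity out of range
        mask[y, x] = True
        queue.extend([(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)])
    return mask
```

Note that a second bright structure not connected to the seed is left out of the mask, which is exactly what distinguishes connected thresholding from plain global thresholding.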
The bounds for the intensities can be determined by observing where the maxima lie on the histogram, calculated either before running the main region growing algorithm or through observation. Usually, the threshold values will lie between 2 maxima. Once the 3 parameters are calculated and input, the process of region growing and thresholding begins by visiting neighboring pixels and determining whether they fall within the interval. The process is quick, with low computational requirements. Due to the simplicity of the algorithm, it is susceptible to noise and complicated patterns such as inhomogeneities and disconnected regions. This algorithm is ideal for quick prototyping but is limiting.

Neighborhood Connected Segmentation

The neighborhood connected method is similar to the connected threshold method. The main difference is that instead of only looking at the next pixel from the current working pixel, the algorithm looks at a neighborhood of pixels and their intensities, like that of a kernel. In this context, a kernel is a fixed square matrix with real number values that iterates over an image from its center point. Depending on the filter, a set of algebraic operations is performed on the current working pixel I(x, y), replacing its value with the new one computed from the kernel.

Otsu’s Segmentation

MR images are grayscale with a typical bit depth of 8 (ie, 8-bit images), where each pixel can take one of 256 (2^8) gray-level values. Otsu’s algorithm is an automated binarization method that attempts to separate the foreground from the background by minimizing the within-class variance. The problem is essentially divided into 2 parts: background and foreground. For each part, the weight, mean, and variance are calculated as the algorithm iterates along each threshold value (0–255). Although this algorithm works, it is not the most computationally efficient. The process can run faster by using the between-class variance and optimizing for the largest value.
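The faster between-class-variance formulation can be sketched as follows; this is an illustrative NumPy version for 8-bit images, not the ITK filter itself:

```python
import numpy as np

def otsu_threshold(image):
    """Scan all 8-bit thresholds and keep the one maximising the
    between-class variance (equivalent to minimising within-class)."""
    counts = np.bincount(image.ravel().astype(np.uint8), minlength=256)
    p = counts / counts.sum()  # probability of each gray level
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()       # class weights
        if w0 == 0 or w1 == 0:
            continue  # one class is empty at this threshold
        mu0 = (levels[:t] * p[:t]).sum() / w0   # class means
        mu1 = (levels[t:] * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t
```

For a cleanly bimodal image, the returned threshold falls between the two histogram peaks, which is also where the connected-threshold bounds discussed earlier would typically be placed.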
Confidence Connected

This semiautomatic method utilizes basic statistical features of the image to apply the filter. Here, the user provides a numerical constant and a starting seed location. The method calculates the mean intensity and standard deviation of the region and defines an interval based on the constant value provided. For a given image, this interval is represented as I(X) ∈ [μ − cσ, μ + cσ]. Neighboring pixels that fall within the interval are found and recorded. This process iterates either for a specified number of iterations or until no more pixels fall within the interval. A pitfall of this method is that the region growing is susceptible to incorrect segmentation when the tissue is statistically inhomogeneous. The output is a binary image with a mask, where the segmented region appears in white and the rest in black.

Watershed Algorithm

In nature, land topography dictates how water flows, and watershed algorithms in segmentation emulate this. By analyzing the topography of the landscape, the problem is redefined using gradient descent. Gradient descent is an iterative optimization algorithm that attempts to find the local minima of a function and is widely used in machine learning. In this case, the image is represented as a height function whose minimum is sought. There are 2 ways to optimize the function: either starting from the bottom and finding the maximum, or starting from the top and finding the minimum. The ITK framework employs the latter.

Level Set Algorithms

The level set family of algorithms originated from the research conducted by Sethian and coworkers, who developed an algorithm that can automatically track curves in any dimension. 21 The level set methodologies have been applied to other fields, including medical image analysis, and form the basis of a family of segmentation algorithms. The fundamental problem is to accurately model a curve.
The straightforward way is to parameterize the curve with a set of explicit equations. This approach, however, is both complex and computationally intensive. Additionally, limitations arise when boundaries intersect, divide, and rejoin over time. To solve this problem, the level set method builds the curve as it propagates in space. The initial level set, where the curve has no change in elevation, is called the zero level set and is represented by φ(x, y) = 0. The 2 main ways to describe the curve are through its normal and tangent vectors, N and T, both of which are related to the gradient of φ. The other main property of a propagating curve is its velocity V. The normal vector is defined by N = −∇φ/|∇φ| and the tangent by T = ∇φ/|∇φ|. The normal vector is negated to ensure it points in the inward direction of the curve. The curve’s movement is described in terms of both the explicit curve C and the implicit curve φ. The curve C and its movement are described as a function of time by dC/dt = V, which is related to the implicit definition by dφ/dt = V|∇φ|. This forms the basis of the level set methodology. ITK represents the level set function as a higher dimensional function from the beginning, Ψ(X, t), where the zero level set is Γ(X, t) = {Ψ(X, t) = 0}. 14 Here, X refers to the n-dimensional surface and t the time step. Internally, ITK evolves the level set via the following general partial differential equation:

dΨ/dt = −α A(x)·∇Ψ − β P(x)|∇Ψ| + γ Z(x)κ|∇Ψ|

In the equation, α, β, and γ are constants that serve as weights to influence the advection term A, the propagation term P, and the spatial modifier Z for the curvature κ, respectively.

Fast Marching Segmentation

The fast marching method is a level set method that can quickly resolve shapes when the problem is fairly simple. In fast marching, the problem is framed around movement of the curve starting from the zero level set φ(x, y) = 0 and the propagating speed of the curve F(x, y) > 0.
22 Fast marching works by solving the Eikonal partial differential equation, an equation used to model many physical phenomena. The solution to this equation is a set of accepted points that make up the curve. In ITK, this starting set of points is user-provided as seed points from which the algorithm propagates its curve. Because the level set family of algorithms is able to merge with other growing curves, it is often preferable to use multiple seed points for efficient computation.

Shape Detection

Shape detection was pioneered by Malladi and Sethian and forgoes the parameterized, geometric Lagrangian approach taken by earlier "snake" methods in favor of level sets. 21 The ITK shape detection filter implements Malladi's principles and requires 2 inputs: the initial ITK image as a level set, and its edge potential image, produced via a sigmoid filter, which helps determine the speed of front propagation. Before the original image goes through the shape detection module, it is first preprocessed with a Gaussian filter, followed by the sigmoid filter, to create its complementary edge potential image. Briefly, the process is:

1. Read the image with ITK
2. Smooth with an anisotropic diffusion filter
3. Smooth again with a Gaussian filter
4. Produce the edge potential image with a sigmoid filter
5. Use seeds and distance parameters to create a level set from the ITK image
6. Pass the level set and edge potential image into the shape detection module and postprocess with a binary filter to reveal the segmented image

Geodesic Active Contour

The geodesic active contour method, proposed by Caselles et al., sought to overcome the limitations of the classical "snake"-based method of curve tracking, which fails when topological changes are present.
18 This is achieved by starting from the classical snake's energy-based representation of the curve, E(C), expressed as

E(C) = α ∫ |C′(q)|² dq + λ ∫ g(|∇I(C(q))|)² dq

Here, α and λ are constants greater than 0; the first integral represents the contour's smoothness, and the second the attraction of the contour to an arbitrary object in the image I. 23 Maupertuis' and Fermat's principles are combined with Sethian's level set method to derive the implicit parameterization of curves as geodesics. In ITK, this underlying theory is abstracted into a workflow similar to that of shape detection. For the ITK geodesic pipeline, the adjustable parameters affect the propagation, curvature, and advection of the curves drawn from the source image: the seed coordinate, distance, σ for the sigmoid filter, the α and β constants, and a propagation scaling value.

The last commonly used segmentation method found in ITK, as well as in SciKit and OpenCV, is Canny edge detection, which works by calculating the gradient of the image and using the resulting matrices to locate edges. A Gaussian filter is commonly applied first to remove noise and smooth edges. In the ITK workflow, the 2 modifiable parameters are the variance of the Gaussian filter and a threshold value for the binary thresholding at the very end.

Atlas-Based Segmentation with a Focus on Brain Imaging

Unlike the previous techniques, atlas-based segmentation is not a de novo technique. An atlas is a template that outlines and defines the main anatomical structures and their coordinates, typically on the 3 anatomical planes (axial, sagittal, coronal). For brain imaging, an atlas of a healthy human brain is used to perform and aid in segmenting features. Several standards and types of atlases exist, including the Talairach Atlas and the Allen Brain Atlas. The 2 main types of atlases are topological (deterministic) and probabilistic.
24 The Talairach atlas is topological and attempts to map a healthy male and female brain volumetrically using a combination of imaging modalities such as CT and MR; it is often sourced from only one sample. Probabilistic atlases, in contrast, are created from multiple subjects in order to estimate the probability of a certain feature appearing in a certain region of the brain, akin to building a probability distribution over brains. These atlases address the shortcomings of the Talairach atlas by establishing a probabilistic map of brain tissue features, often produced from a large sample of subjects. A major source of data for probabilistic atlases is the UCLA Brain Mapping Center, part of the International Consortium for Brain Mapping. Using these atlases, it is possible to segment features from new scans. The first step is typically preprocessing, involving skull stripping and image registration to the atlas. Once completed, there are 3 main atlas-based segmentation strategies that can be utilized: label propagation, multiatlas propagation, and probabilistic atlas segmentation. The simplest method is label propagation, which assumes that once the image is registered to the atlas, many of the major anatomical structures occupy approximately the same voxels. Label propagation algorithms attempt to map the labels from the atlas onto the image of interest. These mappings are effectively a continuation of registration, as they often use the same mathematical techniques, such as affine transformations and principal axes. More complex methods, such as level set-based approaches, can also be used. Label propagation is limited in that it simply outlines major contours and cannot identify new features. Multiatlas propagation is the application of label propagation across multiple atlases.
The biggest challenge with this approach is choosing how to aggregate and register the labels across multiple atlases. One common method is to use a weighting function for each atlas to classify voxels, which has shown favorable accuracy. For general probabilistic segmentation, the approach is Bayesian: a voxel is assigned the class that maximizes p(l(x)|c) ∗ p(c), the conditional probability of the pixel intensity l(x) given a class c (the label of a feature) times the class prior p(c). 24 The probabilistic strategy tends to work best when segmenting new features, such as tumors.

Segmentation and Brain Tumors

The preprocessing steps prior to segmentation are necessary to increase the probability and quality of accurate segmentation. In the clinical setting, a licensed radiologist parses a patient's data, identifies key features through segmentation, and reports the findings, an arduous process that takes years of experience and considerable time. Computationally driven segmentation of brain tumors is needed to reduce this overhead while procuring the same quality of information for data-driven studies. However, gliomas, the most common type of malignant brain tumor in adults, can manifest in any region of the brain and are much harder to detect at lower grades. Fortunately, the availability of neural network frameworks has given researchers a new tool to address the segmentation challenge. Once the regions of interest (ROI) have been accurately segmented and classified, the next step is to extract meaning from the newly sorted information through feature extraction. The major feature classes are first-order, gray-level co-occurrence, structural, and transform features. Each of these provides different information about an image or image series.

First-Order Features

First-order statistical features are those computed directly from the gray-value intensities in the image. These are computationally simple and form the basis of second- and higher-order features.
Table 1 summarizes the most commonly used first-order features for texture analysis, where m, n, and f refer to the length, the width, and the image function, respectively. 13, 17, 25

Table 1. A Summary of Common First-Order Statistical Features and Their Significance in Regard to a Grayscale Image
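As an illustration, several first-order statistics of this kind can be computed in a few lines of NumPy. The particular features chosen below (mean, variance, skewness, kurtosis, and histogram entropy) are a representative sample, not a reproduction of Table 1:

```python
import numpy as np

def first_order_features(image):
    """First-order statistics computed directly from the gray values.

    Assumes nonnegative integer-valued gray levels for the entropy
    histogram; returns a dict of feature name -> value.
    """
    x = np.asarray(image, dtype=float).ravel()
    mean = x.mean()
    var = x.var()
    skew = ((x - mean) ** 3).mean() / var ** 1.5 if var > 0 else 0.0
    kurt = ((x - mean) ** 4).mean() / var ** 2 if var > 0 else 0.0
    # Shannon entropy of the normalized gray-level histogram
    counts = np.bincount(x.astype(int))
    p = counts[counts > 0] / x.size
    entropy = -(p * np.log2(p)).sum()
    return {"mean": mean, "variance": var, "skewness": skew,
            "kurtosis": kurt, "entropy": entropy}
```

For a 2 × 2 image with two gray levels in equal proportion, for example, the histogram entropy is exactly 1 bit.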
Gray-Level Co-occurrence Matrices

Gray-level co-occurrence matrices (GLCM), developed by Haralick et al., record the occurrence of gray levels per pixel relative to other pixels. 26 These matrices can capture the run length of gray-level values in 4 directions θ: 0°, 45°, 90°, 135°. Among the most common related matrices are the gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), neighborhood gray tone difference matrix, and gray-level dependence matrix. A GLCM is square, with dimensions equal to the number of gray values. The matrix is computed by counting the frequency with which gray value i co-occurs with gray value j under a chosen spatial relationship; the most common spatial relationship is adjacency to the current pixel. A GLRLM maps how many runs of contiguous gray-level values exist in the image along a defined angle θ. For example, if an image has 8 gray-level values and dimensions of 10 × 10 pixels, the resulting GLRLM has dimensions 8 × 10: the rows represent each gray-level value, the columns represent the length of contiguous pixels of that gray-level value, and the number of unique occurrences is counted. A GLSZM is similar to the GLRLM in that contiguous gray-level values are counted, with the added condition that it counts every connected instance, not restricted to an angle θ; this results in only one matrix. With the statistics taken from the first-order features and the GLCM family, models to characterize images or image series (3D) are built.

Structural Features

Structural features, or morphological features, describe the shapes of an ROI. Common 3D structural features are volume and shape metrics. 27 Volumetric features include the volumes of contrast-enhanced tumor, peritumoral edema, necrosis, and nonenhancing tumor; ratios of these regions can be computed as comparative metrics. Shape features include the bounding ellipsoid volume ratio, which is the ratio of the tumor's volume to the volume of the smallest ellipsoid that bounds the tumor.
The orientation of the ellipsoid highlights the spatial position of the tumor. Sphericity metrics, which measure roundness, compare the surface area of the tumor to the surface area of a sphere of equivalent volume. In addition to 3D features, 2D shape features are computed on a per-slice basis and include the tumor's centroid, mean radial distance, radial distance standard deviation, mass circularity, entropy of radial distance, area ratio, zero-crossing count, and mass boundary roughness.

Transform Features

The other approach to extracting features is decomposing the image into the frequency domain, allowing for spectral analysis. Common transforms are the wavelet, Fourier, and discrete cosine transforms.

Statistical Tests

Once the features have been extracted, perhaps the largest hurdle is interpreting the results and performing statistical testing. Depending on whether the data are normally distributed, tests such as the t-test and ANOVA (parametric) or the Kruskal–Wallis and Mann–Whitney tests (nonparametric) are used. When multiple groups with multiple features exist, the Tukey honest significant difference test or the Benjamini–Hochberg procedure is utilized. For regression-type analyses, the standard Cox regression model and receiver operating characteristic analysis are often used. 28

Biomarker Recording

All of the features discussed quickly create a massive matrix of data, making it difficult to properly maintain records of each biological feature, or biomarker. To address this data management problem, the image biomarker standardisation initiative (IBSI) was founded to devise a set of rules to standardize the extraction and naming of imaging biomarkers, enhance reproducibility, suggest workflows, and establish biomarker reporting guidelines. 29 The IBSI has proposed a general scheme for biomedical image processing workflows. As the field is dynamic, this scheme is not permanent but rather a guideline for investigators.
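Returning to the gray-level co-occurrence matrices described earlier, the counting procedure can be sketched directly. This is a didactic NumPy version for a single spatial offset, not the optimized implementations found in radiomics packages:

```python
import numpy as np

def glcm(image, levels, offset=(0, 1)):
    """Gray-level co-occurrence matrix for one spatial offset.

    image  : 2D integer array with values in [0, levels)
    levels : number of gray levels (the matrix is levels x levels)
    offset : (dr, dc) pixel displacement; (0, 1) is the 0-degree neighbor
    Returns the co-occurrence count matrix m, where m[i, j] counts how
    often gray value i has gray value j at the given displacement.
    """
    image = np.asarray(image)
    dr, dc = offset
    m = np.zeros((levels, levels), dtype=np.int64)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip pairs off the edge
                m[image[r, c], image[r2, c2]] += 1
    return m
```

Changing the offset to (1, 1), (1, 0), or (1, −1) yields the 45°, 90°, and 135° directions mentioned above.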
This review has been structured to follow IBSI's scheme: data acquisition, preprocessing, segmentation, image interpolation (optional), feature extraction, and feature data. A high-level visualization of this workflow can be found in the flowchart in Figure 3. Interpolation is optional for cases where patients do not have the same number of slices, which often occurs in multi-institutional studies; interpolating missing images for parity is sometimes necessary, depending on the type of analysis to be performed.

Figure 3. A flowchart of a general MR image analytics workflow and a potential use of AI-based methods in the segmentation process block.

The main quantitative image features that the IBSI outlines are morphology, local intensity, intensity-based statistics, intensity histogram, intensity volume histogram, gray-level co-occurrence, run length, size zone, distance zone matrix, neighborhood gray tone difference, and neighborhood gray-level dependence matrix. For each feature in these groups, IBSI assigns a standard code; for example, the mean intensity statistical feature is assigned the ID Q4LE. To further remove ambiguity and avoid the misuse of terminology, the nomenclature for common radiomics terms is also defined. The guidelines set forth by IBSI are thorough and outline a typical image processing workflow for investigators.

Approaches Using AI

AI is a relatively young field that emerged in the mid-20th century. AI can be broadly defined as the study of rational agents, their composition, and their construction, encompassing both machine learning and deep learning approaches. 30 Minsky and Papert formulated the early theory of the perceptron, a generalizable model that established the foundation of the neural networks underlying many modern deep learning models. Neural networks are made up of nodes (neurons), an activation a, and a set of parameters Θ = {W, B}, which are the weights and biases, respectively.
The activation is the transfer function σ applied to a linear combination of the input x with the parameters, expressed as a = σ(wᵀx + b). 25 Common transfer functions are the sigmoid and hyperbolic tangent functions. The inputs x undergo numerous such transformations, which form the hidden layers of a deep neural network (DNN). One of the most widely used DNNs, especially in MRI imaging, is the convolutional neural network (CNN), 31 whose use can be observed in all parts of the workflow. Among the most widely used network architectures are ResNet, generative adversarial networks (GANs), and U-Nets; the last is particularly important because its authors designed the architecture specifically for segmenting medical data. 32 The foundational blocks that comprise a CNN are its convolutional and pooling layers. A convolutional layer takes a layer of neurons as input and applies a filter to it. The raw input image is the initial layer; filters applied to it produce a feature map of the original data, which is then passed further through the network. A feature map represents what the network deems a distinctive feature. Often, the convolutional layers and their filters produce a vast number of features in the map; when this occurs, a pooling layer is added to condense the feature map and reduce its size. The third important block is batch normalization. As the name suggests, batch normalization normalizes the inputs to each layer, which accelerates learning in the CNN. In constructing these CNNs, network architects have considerable freedom in choosing the location and number of convolutions, in addition to other features. This ability allows for the generation of unique networks tailored to individual imaging goals.
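The convolution and pooling blocks described above can be sketched in NumPy. As in most deep learning frameworks, the "convolution" below is implemented as cross-correlation (no kernel flip); this is a minimal illustration of the two operations, not a framework implementation:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution: slide the filter over the image and take
    the elementwise product-sum at each position, producing a feature map."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: condense the feature map by keeping
    only the largest response in each size x size block."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))
```

Stacking conv2d, a nonlinearity, and max_pool repeatedly is, in essence, the CNN pattern described in the text.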
In contrast to traditional methods, where registration and segmentation are addressed by reframing the problem in different ways, deep learning accomplishes these tasks by constructing a novel network, training it with a large data set, and assessing the results. 33 Deep learning approaches also start and end differently from traditional methods: they require a training set of already preprocessed, normalized images cropped to the same dimensions. This is crucial, as the quality of the input dictates the quality of the output. Although the aforementioned ITK methods can successfully segment disparate regions, they require manual tweaking, which becomes cumbersome with large data sets. With deep learning, the CNN performs this tuning automatically while iterating through the convolutions. Deep learning has been applied to all aspects of MRI image data, including image registration, segmentation, feature extraction, and classification. 18 Beyond addressing these traditional problems, CNNs can be used unconventionally to generate artificial data via GANs, which has prompted research into using GANs to denoise data and find artifacts. 31 A recent development in this area is the use of GANs to super-sample low-resolution MRI images, creating data with effectively higher spatial resolution than the source while maintaining its structural integrity. 34

Neural Networks and Brain Tumors

When applied to general imaging analytics, neural networks have had some success compared with prior methods, particularly for tumor segmentation in MRI images. 35 Segmentation of brain tumor features is challenging due to the wide variability in presentation and progression of disease, making the accuracy of CNNs attractive for use in this complex disease. Unlike tabulated information, 3D MRI scans contain vast amounts of information.
When training a model on imaging data, a CNN can often create millions of parameters as it attempts to find and classify features. 35 Typically, MRI images are fed into a model either by dividing each slice into patches or by supplying the whole slice image. Zhao et al. employed a fully convolutional neural network that proved more efficient by reading the full slice. 35 As these novel approaches are more commonly applied to brain tumors, discoveries that translate to patient care are expected to follow.

MRI image analysis has advanced significantly since the advent of computer vision and computer graphics. Many advances were made in parallel and led to the creation of key tools such as ITK and FSL, both widely used among researchers and under continued refinement. AI is being applied to many areas, including MRI image analysis, which is now moving at an accelerated pace as new deep learning-based research is conducted. This application of AI will undoubtedly open new areas of research and investigation, particularly for challenging diseases such as brain tumors.

Funding. This work was supported through developmental funds from CWRU School of Medicine and University Hospitals Research Division.

Conflict of interest statement. None of the authors have any conflicts of interest to disclose.

Authorship statement. All authors participated in the manuscript draft and revision.

References

1. Cai WL, Hong GB. Quantitative image analysis for evaluation of tumor response in clinical oncology. Chronic Dis Transl Med. 2018;4(1):18-28.
2. van Ginneken B, Schaefer-Prokop CM, Prokop M. Computer-aided diagnosis: how to move from the laboratory to the clinic. Radiology. 2011;261(3):719-732.
3. Song S, Zheng Y, He Y. A review of methods for bias correction in medical images. Biomed Eng Rev. 2017;3(1). doi:10.18103/bme.v3i1.1550
4. Juntu J, Sijbers J, Dyck D, Gielen J. Bias field correction for MRI images. In: Kurzyński M, Puchała E, Woźniak M, Żołnierek A, eds. Computer Recognition Systems. Vol 30. Berlin/Heidelberg, Germany: Springer; 2005:543-551. doi:10.1007/3-540-32390-2_64
5. Leger S, Löck S, Hietschold V, Haase R, Böhme HJ, Abolmaali N. Physical correction model for automatic correction of intensity non-uniformity in magnetic resonance imaging. Phys Imaging Radiat Oncol. 2017;4:32-38.
6. Iqbal S, Khan MUG, Saba T, Rehman A. Computer-assisted brain tumor type discrimination using magnetic resonance imaging features. Biomed Eng Lett. 2018;8(1):5-28.
7. Brinkmann BH, Manduca A, Robb RA. Optimized homomorphic unsharp masking for MR grayscale inhomogeneity correction. IEEE Trans Med Imaging. 1998;17(2):161-171.
8. Chang K, Bai HX, Zhou H, et al. Residual convolutional neural network for the determination of IDH status in low- and high-grade gliomas from MR imaging. Clin Cancer Res. 2018;24(5):1073-1081.
9. Avants BB, Tustison NJ, Stauffer M, Song G, Wu B, Gee JC. The Insight ToolKit image registration framework. Front Neuroinform. 2014;8:44.
10. Kostelec PJ, Periaswamy S. Image registration for MRI. Modern Signal Processing. 2013;46:161-184.
11. Alpert NM, Bradshaw JF, Kennedy D, Correia JA. The principal axes transformation: a method for image registration. J Nucl Med. 1990;31(10):1717-1722.
12. De Castro E, Morandi C. Registration of translated and rotated images using finite Fourier transforms. IEEE Trans Pattern Anal Mach Intell. 1987;9(5):700-703.
13. Johnson HJ, McCormick MM. The ITK Software Guide Book 2: Design and Functionality. 4th ed. Vol 536. Clifton Park, NY: Kitware, Inc.
14. Jenkinson M, Smith S. A global optimisation method for robust affine registration of brain images. Med Image Anal. 2001;5(2):143-156.
15. Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948;27:379-423.
16. Pluim JPW, Maintz JBA, Viergever MA. Mutual-information-based registration of medical images: a survey. IEEE Trans Med Imaging. 2003;22(8):986-1004.
17. Aggarwal N, Agrawal R. First and second order statistics features for classification of magnetic resonance brain images. J Signal Inf Process. 2012;3(2):146-153.
18. Litjens G, Kooi T, Bejnordi BE, et al. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60-88.
19. Bahadure NB, Ray AK, Thethi HP. Image analysis for MRI based brain tumor detection and feature extraction using biologically inspired BWT and SVM. Int J Biomed Imaging. 2017;2017:9749108.
20. Varuna Shree N, Kumar TNR. Identification and classification of brain tumor MRI images with feature extraction using DWT and probabilistic neural network. Brain Inform. 2018;5(1):23-30.
21. Malladi R, Sethian JA, Vemuri BC. Shape modeling with front propagation: a level set approach. IEEE Trans Pattern Anal Mach Intell. 1995;17(2):158-175.
22. Sethian JA. Evolution, implementation, and application of level set and fast marching methods for advancing fronts. J Comput Phys. 2001;169(2):503-555.
23. Caselles V, Kimmel R, Sapiro G. Geodesic active contours. In: Proceedings of IEEE International Conference on Computer Vision. Cambridge, MA: IEEE Computer Society Press; 1995:694-699.
24. Cabezas M, Oliver A, Lladó X, Freixenet J, Cuadra MB. A review of atlas-based segmentation for magnetic resonance brain images. Comput Methods Programs Biomed. 2011;104(3):e158-e177.
25. Nabizadeh N, Kubat M. Brain tumors detection and segmentation in MR images: Gabor wavelet vs. statistical features. Comput Electr Eng. 2015;45:286-301.
26. Haralick RM, Shanmugam K, Dinstein I. Textural features for image classification. IEEE Trans Syst Man Cybern. 1973;SMC-3(6):610-621.
27. Sanghani P, Ang BT, King NKK, Ren H. Overall survival prediction in glioblastoma multiforme patients from volumetric, shape and texture features using machine learning. Surg Oncol. 2018;27(4):709-714.
28. Varghese BA, Cen SY, Hwang DH, Duddalwar VA. Texture analysis of imaging: what radiologists need to know. AJR Am J Roentgenol. 2019;212(3):520-528.
29. Zwanenburg A, Leger S, Vallières M, Löck S. Image biomarker standardisation initiative. arXiv:1612.07003. 2019. http://arxiv.org/abs/1612.07003. Accessed January 23, 2020.
30. Russell SJ, Norvig P, Davis E. Artificial Intelligence: A Modern Approach. 3rd ed. Upper Saddle River, NJ: Prentice Hall; 2010.
31. Selvikvåg Lundervold A, Lundervold A. An overview of deep learning in medical imaging focusing on MRI. Z Med Phys. 2018.
32. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. arXiv:1505.04597. 2015. http://arxiv.org/abs/1505.04597. Accessed April 16, 2019.
33. Işın A, Direkoğlu C, Şah M. Review of MRI-based brain tumor image segmentation using deep learning methods. Procedia Comput Sci. 2016;102:317-324.
34. Lyu Q, Shan H, Wang G. Multi-contrast super-resolution MRI through a progressive network. arXiv:1908.01612. 2019. http://arxiv.org/abs/1908.01612. Accessed November 20, 2019.
35. Zhao X, Wu Y, Song G, Li Z, Zhang Y, Fan Y. A deep learning model integrating FCNNs and CRFs for brain tumor segmentation. Med Image Anal. 2018;43:98-111.
Medical image analysis based on deep learning approach

Muralikrishna Puttagunta, Department of Computer Science, School of Engineering and Technology, Pondicherry University, Pondicherry, India

Medical imaging plays a significant role in different clinical applications, such as medical procedures used for early detection, monitoring, diagnosis, and treatment evaluation of various medical conditions. A grasp of the basics of the principles and implementations of artificial neural networks and deep learning is essential for understanding medical image analysis in computer vision. The deep learning approach (DLA) to medical image analysis is a fast-growing research field, and DLA has been widely used in medical imaging to detect the presence or absence of disease. This paper presents the development of artificial neural networks and a comprehensive analysis of DLA, which delivers promising medical imaging applications. Most DLA implementations concentrate on X-ray images, computerized tomography, mammography images, and digital histopathology images. The paper provides a systematic review of articles on the classification, detection, and segmentation of medical images based on DLA, guiding researchers toward appropriate developments in DLA-based medical image analysis.

Introduction

In the health care system, there has been a dramatic increase in demand for medical image services, e.g., radiography, endoscopy, computed tomography (CT), mammography images (MG), ultrasound images, magnetic resonance imaging (MRI), magnetic resonance angiography (MRA), nuclear medicine imaging, positron emission tomography (PET), and pathological tests. In addition, medical images can often be challenging and time-consuming to analyze due to the shortage of radiologists. Artificial intelligence (AI) can address these problems.
Machine learning (ML) is an application of AI that can function without being explicitly programmed, learning from data and making predictions or decisions based on past data. ML uses three learning approaches: supervised learning, unsupervised learning, and semi-supervised learning. Classical ML techniques involve the extraction of features, and selecting suitable features for a specific problem requires a domain expert. Deep learning (DL) techniques solve this feature selection problem. DL is one part of ML that can automatically extract essential features from raw input data [88]. The concept of DL algorithms was introduced from cognitive and information theories. In general, DL has two properties: (1) multiple processing layers that can learn distinct features of data through multiple levels of abstraction, and (2) unsupervised or supervised learning of feature representations on each layer. A large number of recent review papers have highlighted the capabilities of advanced DLA in the medical field: MRI [8], radiology [96], cardiology [11], and neurology [155]. Different forms of DLA were borrowed from the field of computer vision and applied to specific medical image analysis tasks. Recurrent neural networks (RNNs) and convolutional neural networks are examples of supervised DL algorithms. Unsupervised learning algorithms have also been studied in medical image analysis; these include deep belief networks (DBNs), restricted Boltzmann machines (RBMs), autoencoders, and generative adversarial networks (GANs) [84]. DLA is generally applicable for detecting abnormalities and classifying specific types of disease. When DLA is applied to medical images, convolutional neural networks (CNNs) are ideally suited for classification, segmentation, object detection, registration, and other tasks [29, 44]. A CNN is an artificial visual neural network structure used for medical image pattern recognition based on the convolution operation.
Deep learning (DL) applications in medical images are visualized in Fig. 1: (a) an X-ray image with pulmonary masses [121], (b) a CT image with a lung nodule [82], and (c) a digitized histopathological tissue image [132].

Neural networks

History of neural networks

The study of artificial neural networks and deep learning derives from the ambition to create a computer system that simulates the human brain [33]. A neurophysiologist, Warren McCulloch, and a mathematician, Walter Pitts [97], developed a primitive neural network based on what was known of biological structure in the early 1940s. In 1949, the book "Organization of Behavior" [100] was the first to describe the process of updating synaptic weights, now referred to as the Hebbian learning rule. In 1958, Frank Rosenblatt's [127] landmark paper defined the structure of the neural network called the perceptron for the binary classification task. In 1962, Widrow [172] introduced a device called the adaptive linear neuron (ADALINE), implementing the design in hardware. The limitations of perceptrons were emphasized by Minsky and Papert (1969) [98]. The concept of backward propagation of errors for purposes of training is discussed in Werbos (1974) [171]. In 1979, Fukushima [38] designed an artificial neural network called the Neocognitron, with multiple pooling and convolution layers. In 1989, Yann LeCun [71] combined CNNs with backpropagation to effectively perform automated recognition of handwritten digits. One of the most important breakthroughs in deep learning occurred in 2006, when Hinton et al. [9] implemented the deep belief network, with several layers of restricted Boltzmann machines, greedily training one layer at a time in an unsupervised fashion. Figure 2 shows important advancements in the history of neural networks that led to the deep learning era.
Figure 2: Significant developments in the history of neural networks [ 33 , 134 ].

Artificial neural networks

Artificial Neural Networks (ANNs) form the basis for most DLA. An ANN is a computational model with performance characteristics similar to biological neural networks. An ANN comprises simple processing units, called neurons or nodes, interconnected by weighted links. A biological neuron can be described mathematically as in Eq. ( 1 ). Figure 3 shows the simplest artificial neural model, known as the perceptron [ 77 ].

Training a neural network with backpropagation (BP)

In neural networks, the learning process is modeled as an iterative optimization of the weights to minimize a loss function. Based on network performance, the weights are modified on a set of examples belonging to the training set. The training procedure consists of forward and backward phases. For neural network training, an activation function is selected for forward propagation, and BP training is used to update the weights. The BP algorithm enables a multilayer feed-forward neural network (FFNN) to learn input-output mappings from training samples [ 16 ]. Forward propagation and backpropagation are explained for a one-hidden-layer neural network in the following algorithm.
Feed-forward propagation:
- Each hidden unit ( z_j , j = 1, 2, …, p ) computes its net input z_j_in = b_j + Σ_{i=1}^{n} x_i w_ij and activation z_j = f ( z_j_in ).
- Each output unit ( y_k , k = 1, 2, …, m ) computes its net input y_k_in = b_k + Σ_{j=1}^{p} z_j w_jk and activation y_k = f ( y_k_in ).

Backpropagation:
- At the output-layer neurons: δ_k = ( t_k − y_k ) f′( y_k_in )
- At the hidden-layer neurons: δ_j = f′( z_j_in ) Σ_{k=1}^{m} δ_k w_jk

Weight and bias updates:
- Each output unit ( y_k , k = 1, 2, …, m ) updates its weights ( j = 0, 1, …, p ) and bias: w_jk(new) = w_jk(old) + η δ_k z_j ; b_k(new) = b_k(old) + η δ_k
- Each hidden unit ( z_j , j = 1, 2, …, p ) updates its weights ( i = 0, 1, …, n ) and bias: w_ij(new) = w_ij(old) + η δ_j x_i ; b_j(new) = b_j(old) + η δ_j
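The one-hidden-layer training procedure above can be sketched in a few lines of numpy. This is a minimal illustrative sketch, assuming a sigmoid activation (so f′( a ) = f ( a )(1 − f ( a ))); all variable names (W_ih, W_ho, eta, etc.) are stand-ins, not from the original text.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_step(x, t, W_ih, b_h, W_ho, b_o, eta=0.5):
    """One forward/backward pass for a single training example (in-place updates)."""
    # Forward: hidden activations z, then outputs y
    z_in = b_h + x @ W_ih          # z_j_in = b_j + sum_i x_i w_ij
    z = sigmoid(z_in)
    y_in = b_o + z @ W_ho          # y_k_in = b_k + sum_j z_j w_jk
    y = sigmoid(y_in)
    # Backward: output deltas, then hidden deltas
    delta_k = (t - y) * y * (1 - y)            # f'(y_in) = y(1-y) for sigmoid
    delta_j = z * (1 - z) * (W_ho @ delta_k)   # f'(z_in) * sum_k delta_k w_jk
    # Weight and bias updates with learning rate eta
    W_ho += eta * np.outer(z, delta_k)
    b_o += eta * delta_k
    W_ih += eta * np.outer(x, delta_j)
    b_h += eta * delta_j
    return y

# Toy usage: drive a single output toward target t = 1 for one input
rng = np.random.default_rng(0)
W_ih = rng.normal(scale=0.5, size=(2, 4)); b_h = np.zeros(4)
W_ho = rng.normal(scale=0.5, size=(4, 1)); b_o = np.zeros(1)
x, t = np.array([1.0, 0.0]), np.array([1.0])
for _ in range(500):
    y = train_step(x, t, W_ih, b_h, W_ho, b_o)
```

After a few hundred iterations the output y approaches the target, illustrating how the delta-rule updates reduce the error step by step.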
Activation function

The activation function is the mechanism by which an artificial neuron processes and transfers information [ 42 ]. Various types of activation functions can be used in neural networks depending on the characteristics of the application. Activation functions are non-linear and continuously differentiable; differentiability is important mainly when training a neural network using the gradient descent method. Some widely used activation functions are listed in Table 1 (Activation functions).
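A short sketch of three activation functions of the kind listed in Table 1, together with the derivatives that gradient descent needs; the function names are illustrative, not from the table itself.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_grad(a):
    s = sigmoid(a)
    return s * (1.0 - s)          # f'(a) = f(a)(1 - f(a))

def tanh_grad(a):
    return 1.0 - np.tanh(a) ** 2  # f'(a) = 1 - tanh(a)^2

def relu(a):
    return np.maximum(0.0, a)

def relu_grad(a):
    return (a > 0).astype(float)  # subgradient: 0 for a <= 0, 1 for a > 0

a = np.array([-2.0, 0.0, 2.0])
print(sigmoid(a))  # values squashed into (0, 1)
print(relu(a))     # negative inputs clamped to 0
```

Note that ReLU is not differentiable at 0; in practice a subgradient (here 0) is used, which is one reason it remains compatible with gradient-based training.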
Deep learning

Deep learning is a subset of the machine learning field that deals with the development of deep neural networks inspired by biological neural networks in the human brain.

Autoencoder

The autoencoder (AE) [ 128 ] is a deep learning model that exemplifies the principle of unsupervised representation learning, as depicted in Fig. 4a. An AE is useful when unlabelled data are far more plentiful than labeled data. The AE encodes the input x into a lower-dimensional space z; the encoded representation is then decoded to an approximation x′ of the input x through one hidden layer z. Figure 4 shows (a) an autoencoder [ 187 ], (b) a Restricted Boltzmann Machine with n hidden and m visible units [ 88 ], and (c) Deep Belief Networks [ 88 ]. A basic AE consists of three main steps. Encode: convert the input vector x ∈ R^m into h ∈ R^n, the hidden layer, by h = f ( wx + b ), where w ∈ R^{n×m} and b ∈ R^n; m and n are the dimensions of the input vector and the hidden state, the dimension of the hidden layer h is smaller than that of x, and f is an activation function. Decode: based on h, reconstruct the input as z = f′( w′h + b′ ), where w′ ∈ R^{m×n} and b′ ∈ R^m; the activation f′ is the same as above. Calculate the squared error: L_recons ( x , z ) = ∥ x − z ∥², the reconstruction-error cost function. Reconstruction error is minimized by optimizing this cost function (2). Another unsupervised representation algorithm is the Stacked Autoencoder (SAE). The SAE comprises stacks of autoencoder layers mounted on top of each other, where the output of each layer is wired to the input of the next layer. The Denoising Autoencoder (DAE) was introduced by Vincent et al. [ 159 ]; a DAE is trained to reconstruct the input from a copy of the input corrupted with random noise.
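The three AE steps above (encode, decode, reconstruction error) can be sketched in numpy. This is a minimal sketch: the weights here are random stand-ins rather than trained parameters, and in practice they would be learned by minimizing L_recons.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3                        # input dimension m, code dimension n < m
W = rng.normal(size=(n, m))        # encoder weights w in R^{n x m}
b = np.zeros(n)                    # encoder bias b in R^n
W2 = rng.normal(size=(m, n))       # decoder weights w' in R^{m x n}
b2 = np.zeros(m)                   # decoder bias b' in R^m
f = np.tanh                        # activation function

x = rng.normal(size=m)             # input vector x in R^m
h = f(W @ x + b)                   # encode: hidden code h in R^n
z = f(W2 @ h + b2)                 # decode: reconstruction z in R^m
loss = np.sum((x - z) ** 2)        # L_recons(x, z) = ||x - z||^2
```

The bottleneck (n < m) is what forces the code h to be a compressed representation; a stacked autoencoder simply repeats the encode step with the previous code as input.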
The variational autoencoder (VAE) [ 66 ] modifies the encoder so that the latent vector space representing the images follows a unit Gaussian distribution. There are two losses in this model: a mean squared error, and the Kullback-Leibler divergence loss, which measures how closely the latent variables match the unit Gaussian distribution. Sparse autoencoders [ 106 ] and variational autoencoders have applications in unsupervised and semi-supervised learning and in segmentation.

Restricted Boltzmann machine

A Restricted Boltzmann Machine (RBM) is a Markov Random Field (MRF) associated with a two-layer undirected probabilistic generative model, as shown in Fig. 4b. An RBM contains visible (input) units v and hidden (output) units h. A significant feature of this model is that there are no direct connections between any two visible units or between any two hidden units. In binary RBMs, the random variables take values ( v , h ) ∈ {0, 1}^{m + n}. Like the general Boltzmann machine [ 50 ], the RBM is an energy-based model. The energy of the state { v , h } is defined in (3), where v_j , h_i are the binary states of visible unit j ∈ {1, 2, …, m} and hidden unit i ∈ {1, 2, …, n}, b_j , c_i are the biases of the visible and hidden units, and w_ij is the symmetric interaction term between units v_j and h_i. The joint probability of ( v , h ) is given by the Gibbs distribution in Eq. ( 4 ), where Z is a "partition function" obtained by summing over all possible pairs of visible v and hidden h (5).
In terms of probability, the conditional distributions p ( h | v ) and p ( v | h ) factorize as (6) p ( h | v ) = ∏_{i=1}^{n} p ( h_i | v ). For a binary RBM, the conditional distributions of the visible and hidden units are given by (7) and (8), where σ(·) is the sigmoid function. The RBM parameters ( w_ij , b_j , c_i ) are efficiently estimated using the contrastive divergence learning method [ 150 ]. A batch version of k-step contrastive divergence learning (CD-k) is given in the algorithm below [ 36 ].

Deep belief networks

The Deep Belief Network (DBN) proposed by Hinton et al. [ 51 ] is a non-convolutional model that can extract features and learn a deep hierarchical representation of training data. DBNs are generative models constructed by stacking multiple RBMs. A DBN is a hybrid model: the top two layers are like an RBM, and the remaining layers form a directed generative model. A DBN has one visible layer v and a series of hidden layers h^(1), h^(2), …, h^(l), as shown in Fig. 4c. The DBN models the joint distribution between the observed units v and the l hidden layers h^(k) ( k = 1, …, l ) as (9), where v = h^(0) and P ( h^(k) | h^(k+1) ) is the conditional distribution (10) for layer k given the units of layer k + 1. A DBN has l weight matrices W^(1), …, W^(l) and l + 1 bias vectors b^(0), …, b^(l). P ( h^(l), h^(l−1) ) is the joint distribution of the top-level RBM (11). The probability distribution of the DBN is given by Eq. ( 12 ).

Convolutional neural networks (CNN)

Among neural networks, CNNs are a unique family of deep learning models. The CNN is a major artificial visual network for the identification of medical image patterns. The CNN family primarily emerges from the organization of the animal visual cortex [ 55 , 116 ]. The major problem with a fully connected feed-forward neural network is that, even for shallow architectures, the number of neurons may be very high, which makes it impractical for image applications.
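The conditionals (7)-(8) and the contrastive divergence update can be sketched for a single training example. This is a hedged sketch of CD-1 (k = 1) on one binary vector, rather than the batch CD-k algorithm of [ 36 ]; all shapes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = lambda a: 1.0 / (1.0 + np.exp(-a))  # sigmoid from Eqs. (7)-(8)

m, n = 6, 4                                 # m visible units, n hidden units
W = rng.normal(scale=0.1, size=(n, m))      # interaction terms w_ij
b = np.zeros(m)                             # visible biases b_j
c = np.zeros(n)                             # hidden biases c_i

def cd1_update(v0, lr=0.1):
    """One CD-1 parameter update from a single binary visible vector v0."""
    global W, b, c
    ph0 = sigma(c + W @ v0)                      # p(h_i = 1 | v0)
    h0 = (rng.random(n) < ph0).astype(float)     # sample hidden state
    pv1 = sigma(b + W.T @ h0)                    # p(v_j = 1 | h0)
    v1 = (rng.random(m) < pv1).astype(float)     # one-step reconstruction
    ph1 = sigma(c + W @ v1)
    # CD gradient estimate: data statistics minus reconstruction statistics
    W += lr * (np.outer(ph0, v0) - np.outer(ph1, v1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)

v = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
for _ in range(100):
    cd1_update(v)
```

CD-k would simply run the Gibbs chain (sample h, reconstruct v) k times before taking the statistics; k = 1 is the common practical choice.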
The CNN reduces the number of parameters, allowing a network to be deeper with fewer parameters. CNNs are designed around three architectural ideas: shared weights, local receptive fields, and spatial sub-sampling [ 70 ]. The essential element of a CNN is the handling of unstructured data through the convolution operation. Convolution of the input signal x ( t ) with a filter signal h ( t ) creates an output signal y ( t ) that may reveal more information than the input signal itself. The 1D convolution of discrete signals x ( t ) and h ( t ) is given in (13). A digital image x ( n_1 , n_2 ) is a 2-D discrete signal; the convolution of images x ( n_1 , n_2 ) and h ( n_1 , n_2 ) is given in (14), where 0 ≤ n_1 ≤ M − 1 and 0 ≤ n_2 ≤ N − 1. The function of the convolution layer is to detect local features x^l from input feature maps x^{l−1} using kernels k^l by the convolution operation (*), i.e., x^{l−1} ∗ k^l. This convolution operation is repeated for every convolutional layer, subject to a non-linear transform (15), where k_{mn}^l represents the weights between feature map m at layer l − 1 and feature map n at layer l, x_m^{l−1} is feature map m of layer l − 1, x_n^l is feature map n of layer l, b_m^l is the bias parameter, f (·) is the non-linear activation function, and M^{l−1} denotes the set of feature maps. A CNN significantly reduces the number of parameters compared with a fully connected neural network because of local connectivity and weight sharing. The depth, zero-padding, and stride are three hyperparameters that control the volume of the convolution layer's output. A pooling layer follows the convolutional layer to subsample the feature maps. The goal of pooling layers is to achieve spatial invariance by reducing the spatial dimension of the feature maps passed to the next convolution layer. Max pooling and average pooling are the two pooling operations commonly used to achieve downsampling.
Let the pooling region be of size M × M with elements x_j = ( x_1 , x_2 , …, x_{M×M} ), and let the output after pooling be x_i. Max pooling and average pooling are described in Eqs. ( 16 ) and ( 17 ). The max-pooling method chooses the strongest invariant feature in a pooling region, while the average-pooling method takes the average of all features in the pooling area. Thus, max pooling preserves texture information and can lead to faster convergence, whereas average pooling preserves background information [ 133 ]. Spatial pyramid pooling [ 48 ], stochastic pooling [ 175 ], Def-pooling [ 109 ], multi-activation pooling [ 189 ], and detail-preserving pooling [ 130 ] are other pooling techniques in the literature. A fully connected layer is used at the end of the CNN model. Fully connected layers perform like a traditional neural network [ 174 ]: the input to this layer is a vector of numbers (the output of the pooling layer), and the output is an N-dimensional vector (N being the number of classes). After the pooling layers, the feature maps of the previous layer are flattened and connected to the fully connected layers. The first successful seven-layer CNN, LeNet-5, was developed by Yann LeCun in 1990 for handwritten digit recognition. Krizhevsky et al. [ 68 ] proposed AlexNet, a deep convolutional neural network composed of 5 convolutional and 3 fully connected layers. AlexNet replaced the sigmoid activation function with the ReLU activation function to make model training easier. K. Simonyan and A. Zisserman invented VGG-16 [ 143 ], which has 13 convolutional and 3 fully connected layers. The Visual Geometry Group (VGG) released a series of CNNs: VGG-11, VGG-13, VGG-16, and VGG-19. The main intention of the VGG group was to understand how the depth of convolutional networks affects the accuracy of image classification and recognition models.
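The two core operations described above, the 2-D convolution of Eq. (14) and max pooling of Eq. (16), can be sketched directly in numpy. This is a minimal sketch using the cross-correlation form (no kernel flip), as most DL libraries do; names and the averaging kernel are illustrative.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D convolution of image x with kernel k (no padding, stride 1)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output value is the sum of an elementwise product
            # between the kernel and one local receptive field.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, s=2):
    """Non-overlapping s x s max pooling (Eq. 16)."""
    H, W = x.shape
    return x[:H // s * s, :W // s * s].reshape(H // s, s, W // s, s).max(axis=(1, 3))

x = np.arange(16.0).reshape(4, 4)   # toy 4x4 "image"
k = np.ones((3, 3)) / 9.0           # 3x3 averaging kernel
y = conv2d(x, k)                    # output shape (2, 2)
p = max_pool(x)                     # output shape (2, 2)
```

Note how both operations shrink the spatial dimensions: the valid convolution by (kernel size − 1), and pooling by the factor s, which is exactly the subsampling role the text describes.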
The largest variant, VGG19, has 16 convolutional layers and 3 fully connected layers, while the smallest, VGG11, has 8 convolutional layers and 3 fully connected layers; the last three fully connected layers are the same across the VGG variants. Szegedy et al. [ 151 ] proposed GoogLeNet, an image classification network consisting of 22 layers. The main idea behind GoogLeNet is the introduction of inception layers, each of which convolves the input in parallel using different filter sizes. Kaiming He et al. [ 49 ] proposed the ResNet architecture, which has 33 convolutional layers and one fully connected layer. Many models adopted the principle of using multiple hidden layers and extremely deep neural networks, but it was then realized that such models suffered from the vanishing or exploding gradient problem. To mitigate the vanishing gradient problem, skip layers (shortcut connections) were introduced. DenseNet, developed by Huang et al. [ 54 ], consists of several dense blocks with transition blocks placed between adjacent dense blocks. A dense block layer consists of batch normalization followed by ReLU and a 3 × 3 convolution operation; the transition blocks are made of batch normalization, a 1 × 1 convolution, and average pooling. Compared to state-of-the-art handcrafted feature detectors, CNNs are an efficient technique for detecting features of an object and achieving good classification performance. CNNs do have drawbacks: relationships among features, and their size, perspective, and orientation, are not taken into account. To overcome the information loss caused by the pooling operation in CNNs, Capsule Networks (CapsNets) are used to obtain spatial information and the most significant features [ 129 ]. Their special neurons, called capsules, can efficiently detect distinct information.
The capsule network consists of four main components: matrix multiplication, scalar weighting of the input, a dynamic routing algorithm, and a squashing function.

Recurrent neural networks (RNN)

The RNN is a class of neural networks used for processing sequential information. The structure of the RNN shown in Fig. 5a is like that of an FFNN, except that recurrent connections are introduced among the hidden nodes. In a generic RNN at time t, the recurrently connected hidden unit h_t receives input activation from the present data x_t and the previous hidden state h_{t−1}, and the output y_t is calculated from the hidden state h_t. This can be represented by Eqs. ( 18 ) and ( 19 ). Figure 5 shows (a) Recurrent Neural Networks [ 163 ], (b) Long Short-Term Memory [ 163 ], and (c) Generative Adversarial Networks [ 64 ]. Here f is a non-linear activation function, w_hx is the weight matrix between the input and hidden layers, w_hh is the matrix of recurrent weights between the hidden layer and itself, w_yh is the weight matrix between the hidden and output layers, and b_h and b_y are biases that allow each node to learn an offset. While the RNN is a simple and efficient model, in practice it is unfortunately difficult to train properly. The Real-Time Recurrent Learning (RTRL) algorithm [ 173 ] and Back-Propagation Through Time (BPTT) [ 170 ] are used to train RNNs, but training with these methods frequently fails because of the vanishing (multiplication of many small values) or exploding (multiplication of many large values) gradient problem [ 10 , 112 ]. Hochreiter and Schmidhuber (1997) designed a new RNN model, Long Short-Term Memory (LSTM), that overcomes error back-flow problems with the aid of a specially designed memory cell [ 52 ]. Figure 5b shows an LSTM cell, which is typically configured with three gates: an input gate g_t, a forget gate f_t, and an output gate o_t; these gates add or remove information from the cell.
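The recurrence of Eqs. (18)-(19) can be sketched as a single step function unrolled over a sequence. This is a minimal sketch assuming h_t = tanh( w_hx x_t + w_hh h_{t−1} + b_h ) and a linear readout y_t = w_yh h_t + b_y; the weights are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)
n_in, n_h, n_out = 3, 5, 2
W_hx = rng.normal(scale=0.1, size=(n_h, n_in))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(n_h, n_h))    # recurrent hidden-to-hidden weights
W_yh = rng.normal(scale=0.1, size=(n_out, n_h))  # hidden-to-output weights
b_h, b_y = np.zeros(n_h), np.zeros(n_out)

def rnn_step(x_t, h_prev):
    """One RNN time step: Eqs. (18)-(19)."""
    h_t = np.tanh(W_hx @ x_t + W_hh @ h_prev + b_h)
    y_t = W_yh @ h_t + b_y
    return h_t, y_t

# Unroll over a short input sequence, carrying the hidden state forward
h = np.zeros(n_h)
seq = rng.normal(size=(4, n_in))
outputs = []
for x_t in seq:
    h, y = rnn_step(x_t, h)
    outputs.append(y)
```

The repeated multiplication by W_hh during unrolling is precisely where the vanishing/exploding gradient problem mentioned above originates: backpropagated gradients accumulate a product of many Jacobians of this recurrence.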
An LSTM can be represented by Eqs. ( 20 ) to ( 25 ).

Generative adversarial networks (GAN)

Among deep generative models, Generative Adversarial Networks (GANs) were introduced by Goodfellow et al. [ 43 ]. GANs are neural networks that can generate synthetic images that closely imitate the original images. In a GAN, shown in Fig. 5c, there are two neural networks, a generator and a discriminator, which are trained simultaneously. The generator G generates counterfeit data samples that aim to "fool" the discriminator D, while the discriminator attempts to correctly distinguish true from false samples. In mathematical terms, D and G play a two-player minimax game with the cost function of (26) [ 64 ], where x represents the original image and z is a noise vector of random numbers. p_data( x ) and p_z( z ) are the probability distributions of x and z, respectively. D ( x ) represents the probability that x comes from the actual data p_data( x ) rather than the generated data, and 1 − D ( G ( z )) is the probability that the sample was generated from p_z( z ). The expectation over x drawn from the real data distribution is E_{x∼p_data(x)}, and the expectation over z sampled from the noise distribution is E_{z∼p_z(z)}. The training goal for the discriminator is to maximize this objective, while the training objective for the generator is to minimize the term log (1 − D ( G ( z ))). The main uses of GANs in medical image analysis are data augmentation (generating new data) and image-to-image translation [ 107 ]. Trustability of the generated data, unstable training, and evaluation of the generated data are three major drawbacks of GANs that might hinder their acceptance in the medical community [ 183 ]. Ronneberger et al. [ 126 ] proposed the CNN-based U-Net architecture for segmentation of biomedical image data.
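The two sides of the minimax objective (26) can be sketched numerically on a batch of discriminator outputs. This is an illustrative sketch only: the d_real/d_fake values below are hypothetical placeholders standing in for D(x) and D(G(z)), not outputs of a trained network.

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Negative of the discriminator's objective (so lower is better for D):
    -E[log D(x)] - E[log(1 - D(G(z)))]."""
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def g_loss(d_fake):
    """Generator's minimax term E[log(1 - D(G(z)))], which G minimizes."""
    return np.mean(np.log(1.0 - d_fake))

d_real = np.array([0.9, 0.8])  # hypothetical D outputs on real samples
d_fake = np.array([0.2, 0.1])  # hypothetical D outputs on generated samples
print(d_loss(d_real, d_fake))
print(g_loss(d_fake))
```

As the generator improves, D(G(z)) rises toward D(x), g_loss becomes more negative, and d_loss grows, which is the adversarial tension the text describes; in practice the non-saturating variant (maximizing E[log D(G(z))]) is often used for more stable generator gradients.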
The architecture consists of a contracting path (left side) to capture context and a symmetric expansive path (right side) that enables precise localization. U-Net is a general-purpose DLA used for quantification tasks such as cell detection and shape measurement in medical image data [ 34 ].

Software frameworks

Several software frameworks are available for implementing DLA, and they are regularly updated as new approaches and ideas emerge. DLA encapsulate many levels of mathematical principles based on probability, linear algebra, calculus, and numerical computation. Deep learning frameworks include Theano, TensorFlow, Caffe, CNTK, Torch, Neon, pylearn, etc. [ 138 ]. Globally, Python is probably the most commonly used programming language for DL, and PyTorch and TensorFlow were the most widely used libraries for research in 2019. Table 2 compares various deep learning frameworks based on their core language and supported interface languages.
Use of deep learning in medical imaging

X-ray image

Chest radiography is widely used in diagnosis to detect heart pathologies and lung diseases such as tuberculosis, atelectasis, consolidation, pleural effusion, pneumothorax, and hyper cardiac inflation. X-ray images are accessible and affordable, involve a lower radiation dose than other imaging methods, and are a powerful tool for mass screening [ 14 ]. Table 3 presents a description of the DL methods used for X-ray image analysis (An overview of the DLA for the study of X-ray images).
S. Hwang et al. [ 57 ] proposed the first deep CNN-based tuberculosis screening system using a transfer learning technique. Rajaraman et al. [ 119 ] proposed modality-specific ensemble learning for the detection of abnormalities in chest X-rays (CXRs); the model predictions are combined using various ensemble techniques to minimize prediction variance, and class-selective relevance mapping (CRM) is used to visualize the abnormal regions in the CXR images. Loey et al. [ 90 ] proposed a GAN with deep transfer learning for COVID-19 detection in CXR images, where the GAN was used to generate additional CXR images owing to the scarcity of COVID-19 data. Waheed et al. [ 160 ] proposed the CovidGAN model, based on the Auxiliary Classifier Generative Adversarial Network (ACGAN), to produce synthetic CXR images for COVID-19 detection. S. Rajaraman and S. Antani [ 120 ] introduced weakly labeled data augmentation to enlarge the training dataset and improve COVID-19 detection performance in CXR images.

Computerized tomography (CT)

CT uses computers and rotating X-ray equipment to create cross-sectional images of the body. CT scans show the soft tissues, blood vessels, and bones in different parts of the body. CT has high detection ability, reveals small lesions, and provides a more detailed assessment. CT examinations are frequently used for pulmonary nodule identification [ 93 ], and the detection of malignant pulmonary nodules is fundamental to the early diagnosis of lung cancer [ 102 , 142 ]. Table 4 summarizes the latest deep learning developments in CT image analysis (A review of articles that use DL techniques for the analysis of the CT image).
AUC: area under the ROC curve; FROC: area under the free-response ROC curve; SN: sensitivity; SP: specificity; MAE: mean absolute error; LIDC: Lung Image Database Consortium; LIDC-IDRI: Lung Image Database Consortium-Image Database Resource Initiative. Li et al. 2016 [ 74 ] proposed a deep CNN for the detection of three types of nodules: semisolid, solid, and ground-glass opacity. Balagourouchetty et al. [ 5 ] proposed a GoogLeNet-based ensemble FCNet classifier for liver lesion classification; for feature extraction, the basic GoogLeNet architecture is modified in three ways. Masood et al. [ 95 ] proposed the multidimensional Region-based Fully Convolutional Network (mRFCN) for lung nodule detection/classification and achieved a classification accuracy of 97.91%. In lung nodule detection, future work includes the detection of micronodules (less than 3 mm) without loss of sensitivity and accuracy. Zhao and Zeng 2019 [ 190 ] proposed DLA based on supervised MSS U-Net and 3D U-Net to automatically segment kidneys and kidney tumors from CT images. In the present pandemic situation, Fan et al. [ 35 ] and Li et al. [ 79 ] used deep learning-based techniques for COVID-19 detection from CT images.

Mammography (MG)

Breast cancer is one of the world's leading causes of death among women with cancer. MG is a reliable tool and the most common modality for the early detection of breast cancer. MG is a low-dose X-ray imaging method used to visualize the breast structure for the detection of breast diseases [ 40 ]. Detecting breast cancer on mammography screening is a difficult image classification task because tumors constitute only a small part of the breast image. Analyzing breast lesions from MG involves three steps: detection, segmentation, and classification [ 139 ]. The automatic classification and detection of masses at an early stage in MG is still a hot research topic.
Over the past decade, DLA have achieved significant advances in breast cancer detection and classification. Table 5 summarizes the latest DLA developments in mammogram image analysis (Summary of DLA for MG image analysis).
MIAS: Mammographic Image Analysis Society dataset; DDSM: Digital Database for Screening Mammography; BI-RADS: Breast Imaging Reporting and Data System; WBCD: Wisconsin Breast Cancer Dataset; DIB-MG: data-driven imaging biomarker in mammography; FFDM: full-field digital mammogram; MAMMO: Man and Machine Mammography Oracle; FROC: free-response receiver operating characteristic analysis; SN: sensitivity; SP: specificity. Fonseca et al. [ 37 ] proposed a breast composition classification according to the ACR standard based on CNN feature extraction. Wang et al. [ 161 ] proposed a twelve-layer CNN to detect breast arterial calcifications (BACs) in mammogram images for risk assessment of coronary artery disease. Ribli et al. [ 124 ] developed a CAD system based on Faster R-CNN for the detection and classification of benign and malignant lesions on mammogram images without any human involvement. Wu et al. [ 176 ] present a deep CNN trained and evaluated on over 1,000,000 mammogram images for breast cancer screening exam classification. Conant et al. [ 26 ] developed a deep CNN-based AI system to detect calcified and soft-tissue lesions in digital breast tomosynthesis (DBT) images. Kang et al. [ 62 ] introduced the fuzzy fully connected layer (FFCL) architecture, which primarily fuses fuzzy rules with a traditional CNN for semantic BI-RADS scoring; the proposed FFCL framework achieved superior results in BI-RADS scoring for both triple and multi-class classifications.

Histopathology

Histopathology is the study of human tissue on glass slides under a microscope to identify diseases such as kidney cancer, lung cancer, and breast cancer. Staining is used in histopathology to visualize and highlight specific parts of the tissue [ 45 ]. For example, Hematoxylin and Eosin (H&E) staining gives a dark purple color to the nucleus and a pink color to other structures.
The H&E stain has played a key role in the diagnosis of different pathologies and in cancer diagnosis and grading over the last century. The most recent imaging modality is digital pathology. Deep learning is emerging as an effective method in the analysis of histopathology images, including nucleus detection, image classification, cell segmentation, and tissue segmentation [ 178 ]. Tables 6 and 7 summarize the latest deep learning developments in pathology. In digital pathology image analysis, the latest development is the introduction of whole-slide imaging (WSI), which allows digitizing glass slides with stained tissue sections at high resolution. Dimitriou et al. [ 30 ] reviewed the challenges of analyzing multi-gigabyte WSI images when building deep learning models. A. Serag et al. [ 135 ] discuss different public "Grand Challenges" that have driven innovations using DLA in computational pathology. Summary of articles using DLA for digital pathology image - Organ segmentation
Summary of articles using DLA for digital pathology image - Detection and classification of disease
NODE: Neural Ordinary Differential Equations; IoU: mean Intersection over Union coefficient.

Other images

Endoscopy is the insertion of a long, nonsurgical tube directly into the body for the detailed visual examination of an internal organ or tissue. Endoscopy is useful for studying several systems inside the human body, such as the gastrointestinal tract, the respiratory tract, the urinary tract, and the female reproductive tract [ 60 , 101 ]. Du et al. [ 31 ] reviewed the applications of deep learning in the analysis of gastrointestinal endoscopy images. Wireless capsule endoscopy (WCE) is a revolutionary device for the direct, painless, and non-invasive inspection of the gastrointestinal (GI) tract to detect and diagnose GI diseases (ulcers, bleeding). Soffer et al. [ 145 ] performed a systematic analysis of the existing literature on the implementation of deep learning in WCE. The first deep learning-based framework for the detection of hookworm in WCE images was proposed by He et al. [ 46 ]: two integrated CNNs, one for edge extraction and one for hookworm classification, detect hookworm, and since tubular structures are crucial elements for hookworm detection, the edge extraction network was used for tubular region detection. Yoon et al. [ 185 ] developed a CNN model for early gastric cancer (EGC) identification and prediction of invasion depth; the depth of tumor invasion in EGC is a significant factor in deciding the method of treatment. For the classification of endoscopic images as EGC or non-EGC, the authors employed a VGG-16 model. Nakagawa et al. [ 105 ] applied a CNN-based DL technique to enhance the diagnostic assessment of oesophageal wall invasion using endoscopy. J. Choi et al. [ 22 ] discuss the feature aspects of DL in endoscopy.
Positron Emission Tomography (PET) is a nuclear imaging tool in which particular radioactive tracers are injected to visualize molecular-level activities within tissues. T. Wang et al. [ 168 ] reviewed applications of machine learning in PET attenuation correction (PET AC) and low-count PET reconstruction, and discussed the advantages of deep learning over machine learning in PET imaging applications. A. J. Reader et al. [ 123 ] reviewed PET image reconstruction in which deep learning can be used either directly or as part of traditional reconstruction methods. The primary purpose of this paper is to review numerous publications in the field of deep learning applications to medical images. Classification, detection, and segmentation are essential tasks in medical image processing [ 144 ]. For specific deep learning tasks in medical applications, training deep neural networks needs a lot of labeled data, but in the medical field even thousands of labeled examples are often unavailable. This issue is alleviated by a technique called transfer learning. Two transfer learning approaches are popular and widely applied: using a pre-trained network as a fixed feature extractor, and fine-tuning a pre-trained network. In the classification process, deep learning models are used to classify images into two or more classes. In the detection process, deep learning models identify tumors and organs in medical images. In the segmentation task, deep learning models segment the region of interest in medical images for processing.

Segmentation

For medical image segmentation, deep learning has been widely used, and several articles have been published documenting its progress in the area. Segmentation of breast tissue using deep learning alone has been successfully implemented [ 104 ]. Xing et al.
[ 179 ] used a CNN to acquire the initial shape of the nucleus and then isolated the actual nucleus using a deformable model. Qu et al. [ 118 ] suggested a deep learning approach that can segment the individual nucleus and classify it as a tumor, lymphocyte, or stroma nucleus. Pinckaers and Litjens [ 115 ] show on a colon gland segmentation dataset (GlaS) that Neural Ordinary Differential Equations (NODE) can be used within the U-Net framework to obtain better segmentation results. Sun 2019 [ 149 ] developed a deep learning architecture for gastric cancer segmentation that shows the advantage of combining multi-scale modules with specific convolution operations. U-Net is the most commonly used network for segmentation (Fig. 6: U-Net architecture for segmentation, comprising encoder (downsampling) and decoder (upsampling) sections [ 135 ]).

Detection

The main challenge posed by lesion detection methods is that they can give rise to multiple false positives while missing a good proportion of true positives. Tuberculosis detection using deep learning methods has been applied in [ 53 , 57 , 58 , 91 , 119 ], and pulmonary nodule detection using deep learning has been successfully applied in [ 82 , 108 , 136 , 157 ]. Shin et al. [ 141 ] discussed the effect of pre-trained CNN architectures and transfer learning on the identification of enlarged thoracoabdominal lymph nodes and the diagnosis of interstitial lung disease on CT scans, and considered transfer learning to be helpful given that natural images differ from medical images. Litjens et al. [ 85 ] introduced a CNN for the identification of prostate cancer in biopsy specimens and breast cancer metastasis in sentinel lymph nodes; the CNN has four convolution layers for feature extraction and three classification layers. Ribli et al.
[ 124 ] proposed a Faster R-CNN model for the detection of mammography lesions and classified these lesions as benign or malignant; this model finished second in the Digital Mammography DREAM Challenge. Figure 7 shows a CNN (VGG) architecture for detection [ 144 ]. An object detection framework named Clustering CNN (CLU-CNNs) was proposed by Z. Li et al. [ 76 ] for medical images. CLU-CNNs use Agglomerative Nesting Clustering Filtering (ANCF) and a BN-IN Net to avoid much of the computational cost of processing medical images. Image saliency detection aims at locating the most eye-catching regions in a given scene [ 21 , 78 ]. It also acts as a pre-processing tool in different applications, including video saliency detection [ 17 , 18 ], object recognition, and object tracking [ 20 ]. Saliency maps are a commonly used tool for determining which areas of the input image are most important to the prediction of a trained CNN [ 92 ]. N. T. Arun et al. [ 4 ] evaluated the performance of several popular saliency methods on the RSNA Pneumonia Detection dataset and found that GradCAM was sensitive to the model parameters and model architecture.

Classification

In classification tasks, deep learning techniques based on CNNs have seen several advancements. The success of CNNs in image classification has led researchers to investigate their usefulness as a diagnostic method for identifying and characterizing pulmonary nodules in CT images. The classification of lung nodules using deep learning [ 74 , 108 , 117 , 141 ] has been successfully implemented. Breast parenchymal density is an important indicator of the risk of breast cancer, and DL algorithms used for density assessment can significantly reduce the burden on the radiologist. Breast density classification using DL has been successfully implemented [ 37 , 59 , 72 , 177 ]. Ionescu et al.
[ 59 ] introduced a CNN-based method to predict the Visual Analog Score (VAS) for breast density estimation. Figure 8 shows the AlexNet architecture for classification.

Figure 8: CNN architecture for classification [ 144 ]

Alcoholism, or alcohol use disorder (AUD), affects the brain, and its structural effects have been observed using neuroimaging. S. H. Wang et al. [ 162 ] proposed a 10-layer CNN for the AUD problem using dropout, batch normalization, and PReLU techniques; the model obtained a sensitivity of 97.73%, a specificity of 97.69%, and an accuracy of 97.71%. Cerebral microbleeds (CMBs) are small chronic brain hemorrhages that can result in cognitive impairment, long-term disability, and neurologic dysfunction, so early-stage identification of CMBs for prompt treatment is essential. S. Wang et al. [ 164 ] proposed a transfer learning-based DenseNet to detect CMBs; the DenseNet-based model attained an accuracy of 97.71% (Fig. 8).

Limitations and challenges

The application of deep learning algorithms to medical imaging is fascinating, but many challenges slow its progress. One limitation to the adoption of DL in medical image analysis is inconsistency in the data itself (resolution, contrast, signal-to-noise), typically caused by procedures in clinical practice [ 113 ]. The non-standardized acquisition of medical images is another limitation. The need for comprehensive medical image annotations further limits the applicability of deep learning. The major challenge is limited data: compared with other domains, the sharing of medical data is incredibly complicated. Medical data privacy is both a sociological and a technological issue that needs to be discussed from both viewpoints. Building DLAs requires a large amount of annotated data.
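As a brief aside on the performance figures quoted above: sensitivity, specificity, and accuracy are standard confusion-matrix metrics. A minimal sketch of how they are computed (the counts below are hypothetical, not data from any study cited here):

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts.

    tp/fp/tn/fn: true positives, false positives, true negatives, false negatives.
    """
    sensitivity = tp / (tp + fn)                 # true positive rate (recall)
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # overall fraction correct
    return sensitivity, specificity, accuracy

# Hypothetical counts, not taken from any study cited above
sens, spec, acc = classification_metrics(tp=97, fp=3, tn=96, fn=4)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}, accuracy={acc:.2%}")
```

High sensitivity means few missed positives, while high specificity means few false alarms; balancing the two is exactly the trade-off that the lesion-detection methods discussed earlier struggle with.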
Annotating medical images is another major challenge: labeling requires radiologists' domain knowledge, so annotating adequate medical data is time-consuming. Semi-supervised learning, which makes combined use of existing labeled data and vast unlabeled data, could alleviate the issue of limited labeled data. Another way to address data scarcity is to develop few-shot learning algorithms that work with considerably smaller amounts of data. Despite the successes of DL technology, many restrictions and obstacles remain in the medical field. Whether DL can reduce medical costs, increase medical efficiency, and improve patient satisfaction has not yet been adequately verified. It is therefore necessary to demonstrate the efficacy of deep learning methods in clinical trials and to develop guidelines for their application to medical image analysis.

Conclusion and future directions

Medical imaging is a primary source of the information necessary for clinical decisions. This paper discusses new algorithms and strategies in the area of deep learning. This brief introduction to DLAs in medical image analysis has two objectives: the first is to introduce the field of deep learning and its associated theory; the second is to provide a general overview of medical image analysis using DLAs. It began with the history of neural networks since the 1940s and ended with recent breakthroughs of DL algorithms in medical applications. Several supervised and unsupervised DL algorithms were first discussed, including auto-encoders, recurrent networks, CNNs, and restricted Boltzmann machines. Several optimization techniques and frameworks, including Caffe, TensorFlow, Theano, and PyTorch, were discussed. After that, the most successful DL methods were reviewed in various medical image applications, including classification, detection, and segmentation.
Applications of the RBM network are rarely published in the medical image analysis literature. In classification and detection, CNN-based models have achieved good results and are the most commonly used. Several existing solutions to medical challenges are available; however, many issues in medical image processing still need to be addressed with deep learning. Many current DL implementations are supervised algorithms, while the field is slowly moving toward unsupervised and semi-supervised learning to manage real-world data without manual human labels. DLAs can support clinical decisions for next-generation radiologists: they can automate radiologist workflow, facilitate decision-making for inexperienced radiologists, and aid physicians by automatically identifying and classifying lesions to provide a more precise diagnosis. DLAs can help physicians minimize medical errors and increase efficiency in medical image analysis. DL-based automated diagnostic results from medical images are expected to be widely used for patient treatment in the next few decades. Therefore, physicians and scientists should seek the best ways to provide better care to the patient with the help of DLAs. A promising direction for future research in medical image analysis is the design of deep neural network architectures. Enhancing the design of network structures has a direct impact on medical image analysis; manual design of DL model structures requires rich expertise, so neural architecture search will probably replace manual design [ 73 ]. The design of new activation functions is also a meaningful research direction. Radiation therapy is crucial for cancer treatment, and different medical imaging modalities play a critical role in treatment planning. Radiomics is defined as the extraction of high-throughput features from medical images [ 28 ].
In the future, deep-learning analysis of radiomics will be a promising tool in clinical research for clinical diagnosis, drug development, and treatment selection for cancer patients. Due to limited annotated medical data, unsupervised, weakly supervised, and reinforcement learning methods are emerging research areas in DL for medical image analysis. Overall, deep learning, a new and fast-growing field, offers various obstacles as well as opportunities and solutions for a range of medical image applications.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information: Muralikrishna Puttagunta, Email: murali93940@gmail.com. S. Ravi, Email: sravicite@gmail.com.
Explainable Deep Learning Models in Medical Image Analysis

Contents: 1. Introduction; 2. Taxonomy of explainability approaches (2.1 model specific vs. model agnostic; 2.2 global methods vs. local methods; 2.3 pre-model vs. in-model vs. post-model; 2.4 surrogate methods vs. visualization methods); 3. Attribution-based explainability methods (3.1 perturbation-based methods: occlusion; 3.2 backpropagation-based methods); 4. Applications (4.1 attribution based: brain imaging, retinal imaging, breast imaging, CT imaging, X-ray imaging, skin imaging; 4.2 non-attribution based: attention based, concept vectors, expert knowledge, similar images, textual justification, intrinsic explainability); 5. Discussion; Conflicts of interest.
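The occlusion method listed under perturbation-based approaches (Section 3.1 of this article's contents) attributes importance by masking patches of the input and measuring how much the model's class score drops. A minimal sketch of the idea, using a toy stand-in scoring function rather than any model from the article:

```python
def occlusion_map(image, score_fn, patch=2, baseline=0.0):
    """Attribution by occlusion: mask each patch and record the score drop.

    image:    2D list of pixel values
    score_fn: callable mapping an image to a scalar class score
    patch:    side length of the square occlusion window
    """
    h, w = len(image), len(image[0])
    base = score_fn(image)
    heat = [[0.0] * w for _ in range(h)]
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            occluded = [row[:] for row in image]        # copy the image
            for dy in range(y, min(y + patch, h)):      # grey out one patch
                for dx in range(x, min(x + patch, w)):
                    occluded[dy][dx] = baseline
            drop = base - score_fn(occluded)            # importance of the patch
            for dy in range(y, min(y + patch, h)):
                for dx in range(x, min(x + patch, w)):
                    heat[dy][dx] = drop
    return heat

# Toy "model": the class score is the brightness of the top-left 2x2 corner,
# so occlusion should highlight exactly that region.
def score(img):
    return img[0][0] + img[0][1] + img[1][0] + img[1][1]

img = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(img, score, patch=2)
print(heat[0][0] > heat[3][3])  # True: the top-left patch matters most
```

A real application would replace `score_fn` with a trained network's class logit; the principle (score drop under occlusion as a proxy for importance) is the same.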
Cite as: Singh, A.; Sengupta, S.; Lakshminarayanan, V. Explainable Deep Learning Models in Medical Image Analysis. J. Imaging 2020, 6, 52. https://doi.org/10.3390/jimaging6060052
Images Research Guide: Image Analysis

Analyze images along these dimensions:
- Content analysis
- Visual analysis
- Contextual information
- Image source
- Technical quality

Developed by Denise Hattwig, [email protected]

More Resources:
- National Archives document analysis worksheets
- Visual literacy resources
Quick Links: ENC 1101 - Prof. Berkley
Images Databases
Image Analysis Essay Assignment Description: Write an argumentative essay based on an image. The argument should focus on the image and the message the image conveys. All evidence for your argument should come from the image. The analysis should come from you. An excellent essay will analyze the image in a way that conveys a deeper meaning than one gets from simply observing the image. Assignment Outcomes: The Image Analysis Essay should demonstrate your ability to make a logical argument that is well supported by evidence and correct use of MLA format and citation style. Assignment Requirements: Write an argumentative essay on an image. The image cannot include any text. Have an arguable thesis that is well supported by every paragraph of the essay. Have a conclusion that answers the question, "So what?" The only required source is the image itself. If necessary for your argument, you may bring in other sources that give the historical era, artist's information, or other background material that provides context for the image. All sources must be from a credible, academic source like those found in the Broward College databases. Correctly cite and document sources according to MLA format, using both in-text citations and the works cited list. Essays must be 800-1,000 words minimum. Advice: Choose an image that evokes a strong reaction in you. Look for an image that is rich, so you have plenty of material with which to work. You may also want to tie it thematically to the research you've done in the other two essays. Norman Rockwell Museum (works best in Explorer).
How to Write a Visual Analysis Essay: Mastering Artful Interpretations

Samuel Gorbold

Setting itself apart from other essays, the visual analysis essay necessitates a thorough examination of design elements and principles. Whether it's the mysterious smile of the 'Mona Lisa' or a striking photograph capturing a fleeting moment, visual art has the power to move us. Writing this kind of paper is like peeling back the layers of a visual story, uncovering its meanings, and unraveling its impact. Think of it as decoding the secrets a picture holds. Imagine standing in front of a famous painting, like the 'Mona Lisa' in the Louvre. Millions are drawn to it, captivated by the tale it tells. Your essay lets you share your perspective on the stories hidden in images. If you're feeling unsure about tackling this kind of essay, don't worry: this blog offers a straightforward guide, and the expert team at our essay service online will walk you through each step of writing the essay, offering tips and examples along the way.

What Is a Visual Analysis Essay

A visual analysis essay is a unique form of writing that delves into the interpretation of visual elements within an image, such as a painting, photograph, or advertisement. Rather than focusing solely on the subject matter, this type of essay scrutinizes the design elements and principles employed in the creation of the visual piece. Design Elements: These include fundamental components like color, size, shape, and line. By dissecting these elements, you gain a deeper understanding of how they contribute to the overall composition and convey specific messages or emotions. Design Principles: Equally important are the design principles: balance, texture, contrast, and proportion. These principles guide the arrangement and interaction of the design elements, influencing the visual impact of the entire composition.
Purpose: The goal is not only to describe the visual content but also to decipher its underlying meaning and the artistic choices made by the creator. It goes beyond the surface level, encouraging the writer to explore the intentions behind the visual elements and how they communicate with the audience.

Stepwise Approach: To tackle this essay, follow a stepwise approach. Begin by closely observing the image, noting each design element and principle. Then, interpret how these choices contribute to the overall message or theme. Structure your essay to guide the reader through your analysis, providing evidence and examples to support your interpretations.
Sample Visual Analysis Essay Outline

This sample outline offers a framework for organizing a comprehensive structure for a visual analysis essay, ensuring a systematic exploration of design elements and principles. Adjustments can be made based on the specific requirements of the assignment and the characteristics of the chosen visual piece. Now, let's delve into how to start a visual analysis essay using this template.

I. Visual Analysis Essay Introduction
  A. Briefly introduce the chosen visual piece
  B. Provide a thesis statement
II. Description of the Visual Piece
  A. Present an overview of the visual content
III. Design Elements Analysis
  B. Size and Shape
IV. Design Principles Analysis
  C. Contrast
  D. Proportion
V. Interpretation and Analysis
  A. Explore the overall meaning or message conveyed by the visual piece
VI. Conclusion
  A. Summarize the key points discussed in the analysis
  B. Restate the thesis in the context of the insights gained
  C. Conclude with a reflection on the overall impact and effectiveness of the visual piece

An In-Depth Guide to Analyzing Visual Art

This in-depth guide on how to start a visual analysis essay begins with establishing a contextual foundation, progresses to a meticulous description of the painting, and culminates in a comprehensive analysis that unveils the intricate layers of meaning embedded in the artwork. As we navigate through each step of writing a visual analysis paper, the intention is not only to see the art but to understand the language it speaks and the stories it tells.

Step 1: Introduction and Background

Analyzing the art requires setting the stage with a solid analysis essay format: introduction and background. Begin by providing essential context about the artwork, including details about the artist, the time period, and the broader artistic movement it may belong to. This preliminary step allows the audience to grasp the significance of the painting within a larger cultural or historical framework.

Step 2: Painting Description

The next crucial phase in visual analysis involves a meticulous examination and description of the painting itself. Take your audience on a vivid tour through the canvas, unraveling its visual elements such as color palette, composition, shapes, and lines. Provide a comprehensive snapshot of the subject matter, capturing the essence of what the artist intended to convey. This step serves as the foundation for the subsequent in-depth analysis, offering a detailed understanding of the visual elements at play.

Step 3: In-Depth Analysis

With the groundwork laid in the introduction and the painting description, now it's time to dive into the heart of writing a visual analysis paper. Break down the visual elements and principles, exploring how they interact to convey meaning and emotion.
Discuss the deliberate choices made by the artist in terms of color symbolism, compositional techniques, and the use of texture. Consider the emotional impact on the viewer and any cultural or historical influences that might be reflected in the artwork. According to our custom essay service experts, this in-depth analysis goes beyond the surface, encouraging a profound exploration of the artistic decisions that shape the overall narrative of the visual piece.

How to Write a Visual Analysis Essay: A Proper Structure

Using the conventional five-paragraph essay structure proves to be a reliable approach for your essay. When examining a painting, carefully select the relevant aspects that capture your attention and analyze them in relation to your thesis. Keep it simple and adhere to the classic essay structure; it's like a reliable roadmap for your thoughts.

Introduction

The gateway to a successful visual analysis essay lies in a compelling introduction. Begin by introducing the chosen visual piece, offering essential details such as the title, artist, and date. Capture the reader's attention by providing a brief overview of the artwork's significance. Conclude the introduction with a concise thesis statement, outlining the main point of your analysis and previewing the key aspects you will explore. Crafting a robust thesis statement is pivotal in guiding your analysis. Clearly articulate the primary message or interpretation you aim to convey through your essay. Your thesis should serve as the roadmap for the reader, indicating the specific elements and principles you will analyze and how they contribute to the overall meaning of the visual piece.

The body is where the intricate exploration takes place. Divide this section into coherent paragraphs, each dedicated to a specific aspect of your analysis. Focus on the chosen design elements and principles, discussing their impact on the composition and the intended message.
Support your analysis with evidence from the visual piece, providing detailed descriptions and interpretations. Consider the historical or cultural context if relevant, offering a well-rounded understanding of the artwork.

Conclude with a concise yet impactful conclusion. Summarize the key points discussed in the body of the essay, reinforcing the connection between design elements, principles, and the overall message. Restate your thesis in the context of the insights gained through your analysis. Leave the reader with a final thought that encapsulates the significance of the visual piece and the depth of understanding achieved through your exploration.

In your essays, it's important to follow the usual citation rules to give credit to your sources. When you quote from a book, website, journal, or movie, use in-text citations according to the style your teacher prefers, like MLA or APA. At the end of your essay, create a list of all your sources on a page called 'Sources Cited' or 'References.' The good news for your analysis essays is that citing art is simpler. You don't need to stress about putting art citations in the middle of your sentences. In your introduction, just explain the artwork you're talking about, mentioning details like its name and who made it. After that, in the main part of your essay, you can mention the artwork by its name, such as 'Starry Night' by Vincent van Gogh. This way, you can keep your focus on talking about the art without getting tangled up in the details of citing it in your text. Always keep in mind that using citations correctly makes your writing look more professional.

Visual Analysis Essay Example

To provide a clearer illustration of a good paper, let's delve into our sample essay, showcasing an exemplary art history visual analysis essay example.

Unveiling the Details in Image Analysis Essay

Have you ever gazed at an image and wondered about the stories it silently holds?
Describing images in visual analysis papers is not just about putting what you see into words; it's about unraveling the visual tales woven within every pixel. So, how do you articulate the unspoken language of images? Let's examine below:
Final Thoughts

As we conclude our journey, consider this: how might your newfound appreciation for the subtleties of visual description enhance your understanding of the world around you? Every image, whether captured in art or everyday life, has a story to tell. Will you be the perceptive storyteller, wielding the brush of description to illuminate the tales that images whisper? The adventure of discovery lies in your hands, and the language of images eagerly awaits your interpretation. How will you let your descriptions shape the narratives yet untold? Keep exploring, keep questioning, and let the rich tapestry of visual storytelling unfold before you. And if you're looking for a boost on how to write a thesis statement for a visual analysis essay, order an essay online, and our experts will gladly handle it for you!

FAQ: How do you make a good conclusion to a visual analysis essay? How do you write a visual analysis essay thesis? What is a good approach to writing a visual analysis paper formally?

Samuel Gorbold, a seasoned professor with over 30 years of experience, guides students across disciplines such as English, psychology, political science, and many more. Together with EssayHub, he is dedicated to enhancing student understanding and success through comprehensive academic support.
III. Rhetorical Situation

3.14 Writing a Visual Analysis

Terri Pantuso

While visuals such as graphs and charts can enhance an argument when used to present evidence, visuals themselves can also present an argument. Every time you encounter an ad for a certain product, stop and consider what exactly the creators of that visual want you to believe. Who is the target audience? Does the message resonate more with one group of people than another? While most advertisements or political cartoons seem to be nebulous conveyors of commerce, if you look closely you will uncover an argument presented to you, the audience. So how do you write a visual rhetorical analysis essay? First, you'll want to begin by examining the rhetorical strengths and weaknesses of your chosen visual. If your purpose is to write an argument about the visual, such as what artworks are considered "fine art," then your focus will be on demonstrating how the visual meets the criteria you establish in your thesis. To do this, try a method adapted from one on working with primary sources where you Observe, Reflect, and Question. [1]

Arguments About a Visual

Take for example Vincent Van Gogh's "The Starry Night" (Figure 3.14.1). [2] If you want to argue that the painting is a classic example of fine art, you'll first have to define the criteria for your terms "classic" and "fine art." Next, you'll want to look for elements within the painting to support your claim. As you study the painting, try the following strategy for analysis: Describe/Observe; Respond/Reflect; Analyze/Question.

Describe/Observe

First, describe what you see in the visual quite literally. Begin by focusing on colors, shading, shapes, and font if you're analyzing an advertisement.
In the case of "The Starry Night," you might begin by describing the various shades of blue, the black figures that resemble buildings, or shades of yellow that cast light. As you describe them, observe the texture, shape, contour, etc. of each element. For this initial stage, you are simply describing what you observe. Do not look deeper at this point.

Respond/Reflect

Next, respond to the ways in which the things you described have impacted you as a viewer. What emotions are evoked from the various shadings and colors used in the ad or painting? If there are words present, what does the artist's font selection do for the image? This is where you'll want to look for appeals to ethos and pathos. In the case of "The Starry Night," how does the use of black create depth, and for what reason? Reflect on how the intermittent use of shades of blue impacts the overall impression of the painting. At this stage, you are questioning the elements used so that you may move to the final stage of analysis.

Analyze/Question

After you've described and reflected upon the various elements of the visual, question what you have noted and decide if there is an argument presented by the visual. This assessment should be made based upon what you've observed and reflected upon in terms of the content of the image alone. Ask yourself if the arrangement of each item in the visual impacts the message. Could there be something more the artist wants you to gather from this visual besides the obvious? Question the criteria you established in your thesis and introduction to see if it holds up throughout your analysis. Now you are ready to begin writing a visual rhetorical analysis of your selected image.

Arguments Presented By/Within a Visual

In the summer of 2015, the Bureau of Land Management ran an ad campaign with the #mypubliclandsroadtrip tag.
The goal of this campaign was to "explore the diverse landscapes and resources on [our] public lands, from the best camping sites to cool rock formations to ghost towns." [3] The photo below (Figure 3.14.2) [4] is of the King Range National Conservation Area (NCA) in California, which was the first NCA designated by Congress in 1970. [5] Returning to the Observe, Reflect, and Question method, analysis of this photo might focus on what the image presents overall as well as arguments embedded within the image. As with "The Starry Night," you might start by describing what you see in the visual quite literally. Begin by focusing on colors, shading, shapes, and font. With the Bureau of Land Management ad, you could begin by describing the multiple shades of blues and browns in the landscape. Next, you might focus on the contrasts between the sea and land, and the sea and sky. Making note of textures presented by various rock formations and the sand would add depth to your analysis. You might also note the solitary person walking along the shoreline. Finally, you would want to observe the placement of the sun in the sky at the horizon. Next, respond to the ways in which the things you described have impacted you as a viewer. What emotions are evoked from the various shadings and colors used in the photo? How does the artist's font selection impact the image? Through these observations, you will be able to identify appeals to ethos and pathos. In the Bureau of Land Management ad, you might respond to the various shades of blue as seemingly unreal yet reflect on their natural beauty as a way of creating an inviting tone. Next, reflect on the textures presented by the rocks and sand as a way of adding texture to the image. This texture further contributes to the welcoming mood of the image. By focusing on the solitary person in the image, you might respond that this landscape offers a welcoming place to reflect on life decisions or to simply enjoy the surroundings.
Finally, you might respond to the placement of the sun as being either sunrise or sunset. After describing and reflecting on the various elements of the visual, question what you have noted and decide if there is an argument presented by the image. Again, this assessment should be made based upon what you’ve observed and reflected upon in terms of the content of the image alone. Using the Bureau of Land Management ad, you might ask if the font choice was intentional to replicate the rolling waves, or if the framing around the edges of the image is done intentionally to tie back into the Bureau logo in the upper right-hand corner. Once you’ve moved beyond the surface image, question the criteria you established in your thesis and introduction to see if it holds up throughout your analysis. Now you are ready to begin writing a visual rhetorical analysis of an argument presented by/within your selected image.
Glossary. Resonate: to resound, reverberate, or vibrate; to produce a positive emotional response about a subject. Nebulous: cloudy, hazy, or murky; ambiguous, imprecise, or vague. Thesis: a statement, usually one sentence, that summarizes an argument that will later be explained, expanded upon, and developed in a longer essay or research paper; in undergraduate writing, a thesis statement is often found in the introductory paragraph of an essay (the plural of thesis is theses). Intermittent: ceasing and beginning or stopping and starting in a recurrent, cyclical, or periodic pattern.

3.14 Writing a Visual Analysis Copyright © 2023 by Terri Pantuso is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

Quantum Probes for Far-field Thermal Sensing and Imaging

Quantum-enhanced approaches enable high-resolution imaging and sensing with signal-to-noise ratios beyond classical limits. However, operating in the quantum regime is highly susceptible to environmental influences and experimental conditions. Implementing these techniques necessitates highly controlled environments or intricate preparation methods, which can restrict their practical applications. This thesis explores the practical applications of quantum sensing, focusing on thermal sensing with bright quantum sources in biological and electronic contexts. Additionally, I discuss the development of a multimode source for quantum imaging applications and an on-chip atomic interface for scalable light-atom interactions. I built all the experimental setups from scratch: a microscope setup for nanodiamond-based thermal sensing inside living cells, a four-wave mixing setup using a Rb cell for thermal imaging of microelectronics and a multimode source, and a vacuum chamber for an on-chip atomic interface. Quantum sensing can be realized using atomic spins or optical photons carrying quantum information.
Among these, color centers inside diamonds stand out as robust quantum spin defects (effective atomic spins), maintaining their quantum properties even in ambient conditions. In this thesis, I studied the role of an ensemble of color centers inside nanodiamonds as a probe of temperature in a living cell. Our approach involves incubating nanodiamonds in endothelial culture cells to achieve sub-kelvin sensitivity in temperature measurement. The results reveal a temperature error of 0.38 K and a sensitivity of 3.46 K/sqrt(Hz) after 83 seconds of measurement. Furthermore, I discuss the constraints of nanodiamond temperature sensing in living cells, propose strategies to surmount these limitations, and explore potential applications arising from such measurements. Another ubiquitous quantum probe is light with quantum properties. Photons, the particles of light, can carry quantum correlations and have minimal interactions with each other and, to some extent, the environment. This capability theoretically allows for quantum-enhanced imaging or sensing of sample’s properties. In this thesis, I report on the demonstration of quantum-enhanced temperature sensing in microelectronics using bright quantum optical signals. I discuss the first demonstration of quantum thermal imaging used to identify hot spots and analyze heat transport in electronic systems. To achieve this, we employed lock-in detection of thermoreflectivity, enabling us to measure temperature changes in a micro-wire induced by an electric current with an accuracy better than 0.04 degrees, averaged over 0.1 seconds. Our results demonstrate a nearly 50 % improvement in accuracy compared to using classical light at the same power, marking the first demonstration of below-shot-noise thermoreflectivity sensing. We applied this imaging technique to both aluminum and niobium-based circuits, achieving a thermal resolution of 42 mK during imaging. 
We scanned a 48 × 48 μm area with 3-4 dB of squeezing compared to classical measurements. Based on these results, we infer the possibility of generating a 256×256 pixel image with a temperature sensitivity of 42 mK within 10 minutes. This quantum thermoreflective imaging technique offers a more accurate method for detecting electronic hot spots and assessing heat distribution, and it may provide insights into the fundamental properties of electronic materials and superconductors. In transitioning from single-mode to multimode quantum imaging, I conducted further research on techniques for generating multimode quantum light. This involved an in-depth analysis of the correlation characteristics essential for utilizing quantum light sources in imaging applications. To achieve the desired multimode correlation regime, I developed a system centered on warm rubidium vapor with nonlinear gain and feedback processes. The dynamics of optical nonlinearity in the presence of gain and feedback can lead to complexity, even chaos, in certain scenarios. Instabilities in the temporal, spectral, spatial, or polarization aspects of optical fields may arise from chaotic responses within an optical χ^(2) or χ^(3) nonlinear medium positioned between two cavity mirrors or preceding a single feedback mirror. However, the complex mode dynamics, high-order correlations, and transitions to instability in such systems remain insufficiently understood. In this study, we focused on a χ^(3) medium featuring an amplified four-wave mixing process, investigating noise and correlations among multiple optical modes. While individual modes displayed intensity fluctuations, we observed a reduction in relative intensity noise approaching the standard quantum limit, constrained by the camera speed. Remarkably, we recorded a relative noise reduction exceeding 20 dB and detected fourth-order intensity correlations among four spatial modes.
Moreover, this process demonstrated the capability to generate over 100 distinct correlated quadruple modes. In addition to conducting multimode analysis to develop a scalable imaging system, I explored methodologies for miniaturizing light-atom interactions on a chip for the scalable generation of quantum correlations. While warm atomic vapors have been used to generate and store quantum correlations, they are plagued by challenges such as inhomogeneous broadening and short coherence times. Enhancing control over the velocity, location, and density of atomic gases could significantly improve light-atom interaction. Although laser cooling is a common technique for cooling and trapping atoms in a vacuum, its implementation in large-scale systems poses substantial challenges. As an alternative, I focused on developing an on-chip system integrated with atomic vapor controlled by surface acoustic waves (SAWs). Surface acoustic waves are induced by an RF signal along the surface of a piezoelectric material and have already proven effective for manipulating particles within microfluidic channels. Expanding on this concept, I investigated the feasibility of employing a similar approach to manipulate atoms near the surface of a photonic circuit. The interaction between SAWs and warm atomic vapor is expected to provide a mechanism for controlling atomic gases in proximity to photonic chips for quantum applications. Through theoretical analysis spanning the molecular-dynamics and fluid-dynamics regimes, I identified the experimental conditions necessary to observe acoustic wave behavior in atomic vapor. To validate this theory, I constructed an experiment comprising a vacuum chamber housing Rb atoms and a lithium niobate chip featuring interdigital transducers for launching SAWs. However, preliminary experimental results yielded no significant signals from SAW-atom interactions.
Subsequent analysis revealed that observing such interactions requires sensitivity and signal-to-noise ratio (SNR) beyond the capabilities of the current setup. Multiple modifications, including increasing the buffer gas pressure and mitigating RF cross-talk, are essential for conclusively observing and controlling these interactions.

Funding: STTR Program (Contract No. FA864920P0542) awarded by the United States Air Force Research Lab; Kirk grant awarded by Purdue's Birck Nanotechnology Center; CAREER: Active Nonlinear Photonics with Applications in Quantum Networks (Directorate for Engineering); DoD-NDEP Award number HQ0034-21-1-0014.
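As a sanity check on the figures quoted in the abstract above, the relation between sensitivity and temperature error (error ≈ sensitivity / √T for averaging time T, which assumes uncorrelated white measurement noise), the pixel-dwell budget for the inferred 256×256 image, and the noise-power factor implied by 3-4 dB of squeezing can all be verified with a short script. The variable names are illustrative, not drawn from the thesis:

```python
import math

# Nanodiamond thermometry: error ~ sensitivity / sqrt(averaging time),
# under the (assumed) white-noise scaling model.
sensitivity_k_per_rthz = 3.46      # K/sqrt(Hz), from the abstract
averaging_time_s = 83.0            # s
temperature_error_k = sensitivity_k_per_rthz / math.sqrt(averaging_time_s)
# ≈ 0.38 K, matching the quoted temperature error

# Quantum thermoreflective imaging: dwell time per pixel for a
# 256x256 image acquired within 10 minutes.
pixels = 256 * 256                 # 65,536 pixels
total_time_s = 10 * 60             # 600 s
dwell_ms_per_pixel = 1000.0 * total_time_s / pixels  # ≈ 9.2 ms per pixel

# 3-4 dB of squeezing corresponds to a noise-power reduction factor
# of 10**(dB/10), i.e. roughly 2x to 2.5x below the classical level.
power_factor_3db = 10 ** (3 / 10)
power_factor_4db = 10 ** (4 / 10)
```

These back-of-the-envelope numbers are consistent with one another, which is the usual internal check one applies to quoted sensitivities and imaging budgets.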
This thesis is based on the following papers, which are referred to in the text by their Roman numerals. I Solorzano, L., Partel, G., Wählby, C. "TissUUmaps: Interactive ... digital image analysis along with data analysis and data visualization. Understanding how diseases work and how to stop them is key and is today
dependencies among image patches and enhance the learning diversity. Also, the architecture uses Monte Carlo (MC) dropout for measuring the uncertainty of image predictions and deciding whether an input image is accurate based on the generated uncertainty score. The third contribution of the thesis introduces a novel model
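The MC-dropout uncertainty idea this snippet describes can be sketched in a few lines: dropout is kept active at prediction time, and the spread of the sampled outputs serves as the uncertainty score. The toy network, its shapes, and the dropout rate below are illustrative assumptions, not the cited thesis's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fixed-weight "network" with one hidden layer (illustrative only).
W1 = rng.normal(size=(16, 32))
W2 = rng.normal(size=(32, 2))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mc_dropout_predict(x, n_samples=100, p=0.5):
    """Run n_samples stochastic forward passes with dropout left ON.

    Returns the mean class probabilities and a simple variance-based
    uncertainty score (total variance across samples).
    """
    outs = []
    for _ in range(n_samples):
        h = np.maximum(x @ W1, 0.0)          # ReLU hidden layer
        mask = rng.random(h.shape) > p       # drop units with probability p
        h = h * mask / (1.0 - p)             # inverted-dropout rescaling
        outs.append(softmax(h @ W2))
    outs = np.array(outs)
    return outs.mean(axis=0), outs.var(axis=0).sum()

x = rng.normal(size=16)
mean_probs, uncertainty = mc_dropout_predict(x)
```

In a pipeline like the one the snippet describes, an input would be flagged (or rejected) when `uncertainty` exceeds a chosen threshold; the threshold itself is application-specific.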
Thesis directed by Associate Professor Catalin Grigoras. Abstract: Due to the widespread availability of image processing software, it has become easier to produce visually convincing image forgeries. To overcome this issue, there has been considerable work in the digital image analysis field to determine forgeries when no visual indications exist.
Image analysis is used as a fundamental tool for recognizing, differentiating, and quantifying diverse types of images, including grayscale and color images and multispectral images, for a few ...
Medical image analysis often requires developing elaborate algorithms that are implemented as computational pipelines. A growing number of large medical imaging studies necessitate development of robust and flexible pipelines. In this thesis, we present contributions of two kinds: (1) an open source framework for building pipelines
Medical imaging plays a significant role in different clinical applications such as medical procedures used for early detection, monitoring, diagnosis, and treatment evaluation of various medical conditions. Basics of the principles and implementations of artificial neural networks and deep learning are essential for understanding medical image analysis in computer vision. Deep Learning ...
This thesis develops deep learning models and techniques for medical image analysis, reconstruction and synthesis. In medical image analysis, we concentrate on understanding the content of the medical images and giving guidance to medical practitioners. In particular, we investigate deep learning ways to address classification, detection ...
Explanations justify the development and adoption of algorithmic solutions for prediction problems in medical image analysis. This thesis introduces two guiding principles for creating and exploiting explanations of deep networks and medical image data. The first guiding principle is to use explanations to expose inefficiencies in the design of models and image datasets. The second principle ...
With the advent of affordable, powerful computing hardware and parallel developments in computer vision, MRI image analysis has also witnessed unprecedented growth. Due to the interdisciplinary and complex nature of this subfield, it is important to survey the current landscape and examine the current approaches for analysis and trends ...
This Master Thesis provides a summary overview on the use of current deep learning-based object detection methods for the analysis of medical images, in particular from microscopic tissue sections, and aims at making the results reproducible.
Introduction. Medical image analysis is a critical component of modern healthcare, allowing physicians to diagnose, monitor, and treat a wide range of medical conditions. However, the ...
This paper discusses the new algorithms and strategies in the area of deep learning. In this brief introduction to DLA in medical image analysis, there are two objectives. The first one is an introduction to the field of deep learning and the associated theory. The second is to provide a general overview of the medical image analysis using DLA.
Recent advances in machine learning has enabled notable progress in many aspects of image analysis. In this thesis, we present three applications to exemplify such advancement, including shadow detection, satellite image forensics and eating scene segmentation and clustering. Shadow detection and removal are of great interest to the image processing and image forensics community. In this ...
ISSN 1651-6214 ISBN 978-91-554-9567-1. urn:nbn:se:uu:diva-283846. Dissertation presented at Uppsala University to be publicly examined in 2446, ITC, Lägerhyddsvägen 2, Hus 2, Uppsala, Thursday, 9 June 2016 at 10:15 for the degree of Doctor of Philosophy. The examination will be conducted in English.
The analysis is broken down into Section 4.1 and Section 4.2 depending upon the use of attributions or other methods of explainability. The evolution, current trends, and some future possibilities of the explainable deep learning models in medical image analysis are summarized in Section 5.
Visual analysis is an important step in evaluating an image and understanding its meaning. It is also important to consider textual information provided with the image, the image source and original context of the image, and the technical quality of the image. The following questions can help guide your analysis and evaluation. Content analysis.
3D Slicer (Slicer) 1 is a free, open source software application for medical image analysis that is actively used in neurosurgical planning, guidance, and follow-up. Started as a master's thesis in 1995, it is developed today mostly by professional engineers in close collaboration with algorithm developers and application domain scientists.
Sample Outline of Visual Analysis Essay. Introduction: Tell the basic facts about the art (see citing your image). Get the reader interested in the image by using one of the following methods: Describe the image vividly so the reader can see it. Tell about how the image was created. Explain the purpose of the artist.
Assignment Outcomes: The Image Analysis Essay should demonstrate your ability to make a logical argument that is well supported by evidence and correct use of MLA format and citation style. Assignment Requirements: Write an argumentative essay on an image. The image can not include any text. Have an arguable thesis that is well supported by ...
Methodology: Based on Scopus, this paper surveys the development trajectory of destination image using a literature review with the solo keyword "destination image" from 1990 to 2019.
Step 1: Introduction and Background. Analyzing the art requires setting the stage with a solid analysis essay format - introduction and background. Begin by providing essential context about the artwork, including details about the artist, the time period, and the broader artistic movement it may belong to.
Question the criteria you established in your thesis and introduction to see if it holds up throughout your analysis. Now you are ready to begin writing a visual rhetorical analysis of your selected image. Arguments Presented By/Within a Visual. In the summer of 2015, the Bureau of Land Management ran an ad campaign with the # ...