Abbreviation | Definition |
---|---|
ACDM | Autoregressive Cascade Multiscale Diffusion |
ACDMSR | Accelerated Conditional Diffusion Model for Image Super-Resolution |
AIoT | Artificial Intelligence of Things |
BerDiff | Bernoulli Diffusion Model |
BLIP | Bootstrapped Language-Image Pretraining |
BuilDiff | Building Diffusion |
CDDM | Conditional Denoising Diffusion Model |
CDMs | Classifier-guided Diffusion Models |
CIDEr | Consensus-based Image Description Evaluation |
CLE | Controllable Light Enhancement Diffusion |
CLIP | Contrastive Language-Image Pre-training |
CLIPSonic | Controlled Language-Image Pretraining Sonic |
CMDs | Conditional Diffusion Models |
DDIM | Denoising Diffusion Implicit Models |
DDPMs | Denoising Diffusion Probabilistic Models |
DeScoD-ECG | Denoising Score-based Diffusion for Electrocardiogram |
DiffWave | Diffusion Waveform |
DiffDreamer | Diffusion Dreamer |
DiffLL | Diffusion Model for Low-Light |
DMs | Diffusion Models |
DMSEtext | Diffusion Model for Speech Enhancement text |
DisC-Diff | Disentangled Conditional Diffusion |
DSBID | Diffusion-based Stochastic Blind Image Deblurring |
DSC | Dice Similarity Coefficient |
DICDNet | Deep Interpretable Convolutional Dictionary Networks |
EquiDiff | Equivariant Diffusion |
FID | Fréchet Inception Distance |
GED | Generalized Energy Distance |
GNNs | Graph Neural Networks |
HiFi-Diff | Hierarchical Feature Conditional Diffusion |
HQS | Hybrid Quality Score |
ID3PM | Identity Denoising Diffusion Probabilistic Model |
IS | Inception Score |
KID | Kernel Inception Distance |
LEs | Learnable Unauthorized Examples |
LDMs | Latent Diffusion Models |
LPIPS | Learned Perceptual Image Patch Similarity |
MAAT | Metric Anomaly Anticipation |
MAD | Mean Absolute Deviation |
MAE | Mean Absolute Error |
MatFusion | Material Fusion |
MOS | Mean Opinion Score |
MPJPE | Mean Per-Joint Position Error |
NILM | Non-Intrusive Load Monitoring |
NASDM | Nuclei-Aware Semantic Diffusion Model |
NIQE | Natural Image Quality Evaluator |
OMOMO | Object Motion Guided Human Motion Synthesis |
PatchDDM | Patch-based Diffusion Denoising Model |
PRD | Percent Root Mean Square Difference |
PSNR | Peak Signal-to-Noise Ratio |
RGB-D-Fusion | Red-Green-Blue Depth Fusion |
RNNs | Recurrent Neural Networks |
RMSE | Root Mean Square Error |
SAG | Self-Attention Guidance |
SBDMs | Score-Based Diffusion Models |
SDEs | Stochastic Differential Equations |
SDG | Semantic Diffusion Guidance |
SegDiff | Segmentation Diffusion |
SketchFFusion | Sketch-Driven Fusion |
SMOS | Style Similarity MOS |
SSIM | Structural Similarity Index Measure |
TFDPM | Temporal and Feature Pattern-based Diffusion Probabilistic Model |
VDMs | Variational Diffusion Models |
VTF-GAN | Visible-to-Thermal Facial GAN |
Year | Proposed Algorithm | Used Datasets | Applications |
---|---|---|---|
2020 | DDPMs [2] | CIFAR-10 [16], LSUN [17], CelebA [18] | Image generation |
2020 | Score-Based DMs [19] | CIFAR-10, CelebA, LSUN | Image generation |
2020 | SDEs [9] | CIFAR-10, CelebA, LSUN, FFHQ | Image generation |
2021 | Classifier-guided DMs (CDMs) [20] | ImageNet, LSUN, CIFAR-10 | Image generation |
2021 | Variational Diffusion Models (VDMs) [21] | CIFAR-10, CelebA, LSUN | Image generation |
2021 | Improved DDPMs [22] | CIFAR-10, CelebA, LSUN | Image generation |
2021 | Diffusion Waveform (DiffWave) [23] | LJSpeech, VCTK | Audio generation |
2021 | Segmentation Diffusion (SegDiff) [24] | Cityscapes, Pascal VOC | Image segmentation |
2021 | Guided Language to Image Diffusion for Generation and Editing (GLIDE) [25] | MS-COCO, ImageNet | Text-guided image generation and editing
2022 | Latent Diffusion Models (LDMs) [4] | LAION-400M, CelebA-HQ | Image generation, Text-to-image |
2022 | Image Transformers [26] | ImageNet, COCO | Image generation |
2022 | Multiscale Diffusion Models [27] | ImageNet, CIFAR-10, LSUN | Image generation |
2022 | Video-DDPM [28] | Kinetics-600, UCF-101 | Video generation |
2023 | Adaptive Diffusion Models [29] | CIFAR-10, CelebA, FFHQ | Image generation |
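The entries above share the same underlying machinery. As a brief, generic reminder (following the standard DDPM formulation of [2], not any one cited variant), the forward noising process, the learned reverse process, and the simplified training objective can be written as:

```latex
% Forward (noising) process with variance schedule \beta_1, ..., \beta_T:
q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\big)

% Direct sampling of x_t from x_0, where \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s):
q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t)\, I\big)

% Learned reverse (denoising) process:
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big)

% Simplified noise-prediction objective:
L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\big)\big\|^2\Big]
```

Score-based models [19] and the SDE formulation [9] generalize the same idea to continuous time.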
Ref. | Algorithms | Applications | Dataset | Evaluations | Limitations |
---|---|---|---|---|---|
[35] | DT for Content Shielding in Stable DMs | Content shielding in text-to-image Diffusion Models using DT to prevent generation of unwanted concepts | COCO 30K | FID post-DT: 13.04, IS post-DT: 38.25 | DT may limit model’s flexibility for diverse contexts. |
[36] | TFDPM | Detecting cyber-physical system attacks using TFDPM with Graph Attention Networks for channel data correlation | PUMP, SWAT, WADI | Pr: 0.96, Re: 0.91, F1: 0.91 | Struggles with discrete signal modeling, needs SDE frameworks for better generative capabilities. |
[37] | Maat: Anomaly Anticipation for Cloud Services | Anomaly anticipation using a two-stage Diffusion Model for cloud services, integrating metric forecasting and anomaly detection | AIOps18, Hades, Yahoo!S5 | Pr: 0.97, Re: 0.91, F1: 0.91 | Limited generalizability and adaptability post-training. |
[31] | Diffusion-based Stochastic Blind Image Deblurring | Blind image deblurring using Diffusion Models for multiple reconstructions | GoPro | FID: 4.04, KID: 0.98, LPIPS: 0.06, PSNR: 31.66, SSIM: 0.95 | High computational demands, needs optimized sampling or network architecture. |
[32] | Come-Closer-Diffuse-Faster | Accelerating CMDs for applications like super-resolution and MRI reconstruction | FFHQ, AFHQ, fastMRI | FID varies; PSNR: 33.41 (best MRI case) | Optimal starting values (t0) vary, needs automation for practical deployment. |
[33] | Selective Diffusion Distillation | Image manipulation balancing fidelity and editability without excessive noise trade-offs | N/A | FID: 6.07, CLIP Similarity: 0.23 | Reliance on correct timestep selection for semantic guidance may limit flexibility. |
[34] | Object Motion Guided Human Motion Synthesis (OMOMO) | Full-body human motion synthesis guided by object motion using a conditional Diffusion Model | Custom dataset | MPJPE: 12.42, Troot: 18.44, Cprec: 0.82, Crec: 0.70, F1: 0.72 | Limited representation of dexterous hand movements and intermittent contact scenarios. |
[2] | DDPMs | Image generation using DDPMs | CIFAR-10, LSUN, CelebA | FID: 3.17, IS: 9.46 | High computational cost and slow sampling speed. |
[19] | Improved Techniques for Training Score-Based Generative Models | Improved image generation using score-based models | CIFAR-10, CelebA, LSUN | FID: 2.87, IS: 9.68 | Training complexity and large computational resources required. |
[9] | Score-Based Generative Modeling through SDEs | Image generation using SDEs for better quality | CIFAR-10, CelebA, LSUN, FFHQ | FID: 2.92, IS: 9.62 | SDE-based models can be computationally expensive. |
[20] | Diffusion Models Beat Generative Adversarial Networks (GANs) on Image Synthesis | Image synthesis outperforming GANs using Diffusion Models | ImageNet, LSUN, CIFAR-10 | FID: 2.97, IS: 9.57 | Large model size and slow training times. |
[21] | VDMs | Image generation using variational Diffusion Models | CIFAR-10, CelebA, LSUN | FID: 3.12, IS: 9.53 | Complex model design and high computational cost. |
[22] | Improved Denoising Diffusion Probabilistic Models | Enhanced DDPMs for better image quality | CIFAR-10, CelebA, LSUN | FID: 3.05, IS: 9.50 | Requires extensive hyperparameter tuning. |
[4] | High-Resolution Image Synthesis with Latent Diffusion Models (LDMs) | High-resolution image and text-to-image synthesis | LAION-400M, CelebA-HQ | FID: 1.97, IS: 10.32 | High memory usage and computational cost. |
[26] | Image Transformers with Autoregressive Models for High-Fidelity Image Synthesis | High-fidelity image synthesis using transformers | ImageNet, COCO | FID: 2.30, IS: 9.95 | Transformer models are computationally intensive. |
[27] | Cascaded Diffusion Models for High-Fidelity Image Generation | High-fidelity image generation using multiscale Diffusion Models | ImageNet, CIFAR-10, LSUN | FID: 2.15, IS: 9.88 | Cascaded models require extensive computational resources. |
[29] | Optimizing Diffusion Models for Image Synthesis | Adaptive Diffusion Models for better image synthesis | CIFAR-10, CelebA, FFHQ | FID: 1.89, IS: 10.45 | Adaptive models can be complex and resource-intensive. |
[23] | DiffWave | Audio generation using Diffusion Models | LJSpeech, VCTK | FID: 3.67, PSNR: 34.10 | High computational cost and slow sampling speed. |
[28] | Video Diffusion Models (Video-DDPM) | Video generation using Diffusion Models | Kinetics-600, UCF-101 | FID: 3.85, SSIM: 0.92 | High computational demands and slow training times. |
[24] | SegDiff | Image segmentation using Diffusion Models | Cityscapes, Pascal VOC | FID: 3.50, SSIM: 0.87 | Limited scalability to larger datasets. |
[25] | GLIDE | Photorealistic image generation and editing with text guidance | MS-COCO, ImageNet | FID: 3.21, IS: 9.67 | Text-guided models require extensive training data. |
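Since FID and IS dominate the evaluation columns above, a minimal sketch of the FID computation is given for reference: the Fréchet distance between Gaussians fitted to Inception-v3 activation statistics of real and generated images. Feature extraction is assumed to happen elsewhere; `real_feats` and `fake_feats` are hypothetical N x 2048 arrays of activations.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    """FID = ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^{1/2}),
    computed from two sets of Inception activations (N x D arrays)."""
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)

    # Matrix square root of the product of covariances; drop tiny imaginary parts
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

# Hypothetical usage with pre-extracted Inception-v3 features:
# fid = frechet_inception_distance(real_feats, fake_feats)
```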
Ref. | Algorithms | Applications | Dataset | Evaluations | Limitations |
---|---|---|---|---|---|
[40] | SAG in DDMs | Image generation improvement | ImageNet, LSUN | FID: 2.58, sFID: 4.35 | Needs broader application integration. |
[41] | Learnable State-Estimator-Based Diffusion Model | Inverse imaging problems (inpainting, deblurring, JPEG restoration) | FFHQ, LSUN-Bedroom | PSNR: 27.98, LPIPS: 0.09, FID: 25.45 | Limited generative capabilities, needs domain adaptation. |
[51] | Score Dynamics (SD) | Accelerating molecular dynamics simulations | Alanine dipeptide, short alkanes in aqueous solution | Wall-clock speedup up to 180X | Requires large datasets; generalization challenges. |
[42] | CMDs for Speech Enhancement (DMSEtext) | Speech enhancement for TTS model training | Real-world recordings | MOS Cleanliness: 4.32 ± 0.08, Overall Impression: 4.17 ± 0.06, PER: 17.6% | Needs text conditions for best results. |
[52] | Conditional Diffusion Model for HDR Reconstruction | HDR image reconstruction from LDR images | Benchmark datasets for HDR imaging | PSNR-µ: 44.11, PSNR-L: 41.73, SSIM-µ: 0.99, SSIM-L: 0.99, HDR-VDP-2: 65.52, LPIPS: 0.01, FID: 6.20 | Slow inference speed; improve distortion metrics. |
[44] | CLIPSonic | Text-to-audio synthesis using unlabeled videos | VGGSound, MUSIC | FAD: CLIPSonic-ZS on MUSIC 19.30, CLIPSonic-PD on MUSIC 13.51; CLAP score: CLIPSonic-ZS on MUSIC 0.28, CLIPSonic-PD on MUSIC 0.25 | Performance drop in zero-shot modality transfer. |
[46] | SDG | Fine-grained image synthesis with text and image guidance | FFHQ, LSUN | FID: 14.37 (image guidance on FFHQ), 28.38 (text guidance on FFHQ); Top-5 Retrieval Accuracy: 0.742 (image guidance), 0.878 (text guidance) | Potential misuse in image generation. |
[47] | DiffDreamer: Conditional Diffusion Model for Scene Extrapolation | Unsupervised 3D scene extrapolation | LHQ, ACID | Achieves low FID scores across various step intervals, e.g., 20 steps: FID: 34.49; 100 steps: FID: 51.00 on LHQ | Real-time synthesis not feasible; limited content diversity. |
[48] | Diffusart: Conditional Diffusion Probabilistic Models for Line Art Colorization | Interactive line art colorization with user guidance | Danbooru2021 | SSIM: 0.81, LPIPS: 0.14, FID: 6.15 | Bias towards white; limits color diversity. |
[49] | SketchFFusion: A Conditional Diffusion Model for Sketch-guided Image Editing | Sketch-guided image editing for local fine-tuning using generated sketches | CelebA-HQ, COCO-AIGC | FID: 9.07, PSNR: 26.74, SSIM: 0.88 | Supports only binary sketches; limits color editing. |
[50] | Semantic-Conditional Diffusion Networks for Image Captioning | Image captioning using semantic-driven Diffusion Models | COCO | B@1: 79.0, B@2: 63.4, B@3: 49.1, B@4: 37.3, CIDEr: 131.6 | Lacks real-time processing; needs optimization.
[38] | EquiDiff: Deep Generative Model for Vehicle Trajectory Prediction | Trajectory prediction for autonomous vehicles using a deep generative model with SO(2)-equivariant transformer | NGSIM | RMSE for 5s trajectory prediction shows competitive results | Effective short-term; higher errors in long-term predictions. |
[53] | Efficient MRI Synthesis with Conditional Diffusion Probabilistic Models | Efficient synthesis of 3D brain MRIs using a conditional Diffusion Model | ADNI-1, UCSF, SRI International | MS-SSIM: 78.6% | Focused on T1-weighted MRIs; explore more types. |
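Most of the conditional models summarized above steer the reverse process with some form of guidance. A common generic mechanism is classifier-free guidance, sketched below; this is an illustration of the general idea rather than the exact rule used in each cited work (SDG [46], for instance, relies on gradient-based semantic guidance instead).

```latex
% Classifier-free guidance: the denoiser is evaluated with and without the
% condition c, and the two predictions are blended with a guidance scale w.
\tilde{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing)
  + w\,\big[\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\big]
```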
Ref. | Algorithms | Applications | Dataset | Evaluations | Limitations |
---|---|---|---|---|---|
[54] | Autoregressive conditional Diffusion-based Models (ACDM) | NVS from a single image | RealEstate10K, MP3D, CLEVR | LPIPS: 0.33, PSNR: 15.51 on RealEstate10K; LPIPS: 0.50, PSNR: 14.83 on MP3D; FID: 26.76 on RealEstate10K; FID: 73.16 on MP3D | Requires complex geometric consistency and heavy computational resources for extrapolating views. |
[55] | CLE Diffusion | Low light enhancement | LOL, MIT-Adobe FiveK | PSNR: 29.81, SSIM: 0.97 on MIT-Adobe FiveK; PSNR: 25.51, SSIM: 0.89, LPIPS: 0.16, LI-LPIPS: 0.18 on LOL | Slow inference speed and limited capability in handling complex lighting and blurry scenes. |
[56] | Diffusion-based inpainting model for 3D facial BRDF reconstruction | Facial texture completion and reflectance reconstruction from a single image | MultiPIE | PSNR: 26.00, SSIM: 0.93 at 0° angle on MultiPIE; Sampling time: 17 sec | Limited by input image quality and potential under-representation of ethnic diversity in training data. |
[52] | CMDs | HDR reconstruction from multi-exposed LDR images | Benchmark datasets for HDR imaging | PSNR-µ: 22.25, SSIM-µ: 0.84, LPIPS: 0.03 on Hu’s dataset | Slow inference speed due to iterative denoising process. |
[57] | RGB-D-Fusion diffusion probabilistic models | Depth map generation and super-resolution from monocular images | Custom dataset with ≈ 25,000 RGB-D images from 3D models of people | MSE: 1.48, IoU: 0.99, VLB: 16.95 with UNet3+ model | High computational resources required for training and sampling. |
[58] | Disentangled CMDs (DisC-Diff) | Multi-contrast MRI super-resolution | IXI dataset and clinical brain MRI dataset | PSNR: 37.77 dB, SSIM: 0.99 on 2× scale in clinical dataset | Requires accurate condition sampling for model precision. |
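PSNR and SSIM recur throughout the restoration and super-resolution rows above. For reference, a minimal PSNR computation is sketched below; SSIM involves windowed local statistics and is typically taken from a library such as scikit-image. The image arrays and `max_val` are hypothetical inputs.

```python
import numpy as np

def psnr(reference: np.ndarray, estimate: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)

# Hypothetical usage; SSIM can be computed with scikit-image:
# from skimage.metrics import structural_similarity as ssim
# score = ssim(reference, estimate, data_range=max_val)
```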
Ref. | Algorithms | Applications | Dataset | Evaluations | Limitations |
---|---|---|---|---|---|
[59] | DocDiff conditional Diffusion Model | Document image enhancement including deblurring, denoising, and watermark removal | Document Deblurring Dataset | MANIQA: 0.72, MUSIQ: 50.62, DISTS: 0.06, LPIPS: 0.03, PSNR: 23.28, SSIM: 0.95 | May lose high-frequency information, leading to distorted text edges. Relies on the quality of low-frequency content recovery by the Coarse Predictor module. |
[60] | VTF-GAN | Thermal facial imagery generation for telemedicine | Eurecom and Devcom datasets | FID: 47.35, DBCNN: 34.34%, MSE: 0.88, SPEC: -1.1% for VTF-GAN with Fourier Transform-Guided (FFT-G) | Generation constrained to static environments; performance untested in dynamic, variable conditions affecting thermal emission. |
[61] | ID3PM | Inversion of pre-trained face recognition models, generating identity-preserving face images | LFW, AgeDB-30, CFP-FP datasets | LFW: 99.20%, AgeDB-30: 94.53%, CFP-FP: 96.13% with ID3PM using InsightFace embeddings | Generation quality may vary with the diversity of embeddings; control over the generation process might need fine-tuning for specific applications. |
[62] | FreeDoM | Conditional image and latent code generation | Multiple datasets for segmentation maps, sketches, texts | Distance: 1696.1, FID: 53.08 for segmentation maps with FreeDoM | High sampling time cost; struggles with fine-grained control in large data domains; may produce poor results with conflicting conditions. |
Ref. | Algorithms | Applications | Dataset | Evaluations | Limitations |
---|---|---|---|---|---|
[65] | BerDiff: Conditional Bernoulli Diffusion for Medical Image Segmentation | Advanced medical image segmentation using Bernoulli diffusion to produce accurate and diverse segmentation masks | LIDC-IDRI, BRATS 2021 | GED: 0.24, HM-IoU: 0.60 on LIDC-IDRI; Dice: 89.7 on BRATS 2021 (state-of-the-art) | Focuses only on binary segmentation and requires significant time for iterative sampling.
[66] | NASDM: Nuclei-Aware Semantic Tissue Generation Framework | Generative modeling of histopathological images conditioned on semantic instance masks | Colon dataset | FID: 15.7, IS: 2.7, indicating high-quality and semantically accurate synthetic image generation. | Further development is required for varied histopathological settings and end-to-end tissue generation that includes mask synthesis. |
[67] | Hierarchical Feature Conditional Diffusion (HiFi-Diff) | MR image super-resolution with arbitrary reduction of inter-slice spacing | HCP-1200 dataset | PSNR: 39.50±2.29, SSIM: 0.99 for ×4 SR task | Slow sampling speed, suggesting potential improvements through faster algorithms or knowledge distillation. |
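The segmentation results above are reported as Dice scores (DSC in the abbreviation list). A minimal computation for binary masks is sketched below; the mask arrays are hypothetical inputs.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient for binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

# Hypothetical usage:
# dsc = dice_coefficient(predicted_mask, ground_truth_mask)
```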
Ref. | Algorithms | Applications | Dataset | Evaluations | Limitations |
---|---|---|---|---|---|
[72] | Diffusion Classifier using text-to-image Diffusion Models | Zero-shot classification using generative models | Standard image classification benchmarks (e.g., ImageNet, CIFAR-10) | Zero-shot classification accuracy on ImageNet using Diffusion Classifier: 58.9% | Performance gap in zero-shot recognition compared to SOTA discriminative models
[73] | CMDs for semantic image synthesis | Semantic synthesis for abdominal CT, used in data augmentation | Not specified | FID: 10.32, PSNR: 16.14, SSIM: 0.64, DSC: 95.6% for mask-guided DDPM at 100k iterations | High sampling time and computational cost |
[74] | Learnable Unauthorized Examples (LEs) using joint-CMDs | Countermeasure to unlearnable examples in Machine Learning models | CIFAR-10, CIFAR-100, SVHN | Test accuracy on CIFAR-10 using LE: 94.0%, CIFAR-100: 67.8%, SVHN: 94.9% | Limited by distribution mismatches |
[79] | Diffusion-based Non-Intrusive Load Monitoring (DiffNILM) Diffusion Probabilistic Model | Non-intrusive Load Monitoring (NILM) for appliance power consumption pattern disaggregation | REDD and UKDALE datasets | F1-Score: 0.79 for refrigerator on REDD, MAE: 4.54 for microwave on UKDALE | Generation of power waveforms not always sufficiently smooth; computational efficiency not optimized |
[80] | Noise-Robust Expressive Text-to-Speech model (NoreSpeech) | Expressive TTS in noise environments | Not specified | MOS: 4.11, SMOS: 4.14 for NoreSpeech with T-SSL in noisy conditions | Dependent on quality of style teacher model |
[81] | Diffusion-based data augmentation for nuclei segmentation | Nuclei segmentation in histopathology image analysis | MoNuSeg and Kumar datasets | Dice score: 0.83, AJI: 0.68 with 100% augmented data on MoNuSeg dataset | Dependent on the quality of synthetic data |
[75] | AT-VarDiff | Atmospheric turbulence (AT) correction | Comprehensive synthetic atmospheric turbulence dataset | LPIPS: 0.11, FID: 32.69, NIQE: 6.46 | May not generalize well to real-life atmospheric turbulence images. |
[76] | MatFusion Diffusion Models (unconditional and conditional) | SVBRDF estimation from photographs | Large set of 312,165 synthetic spatially varying material exemplars | RMSE on property maps: 0.04, LPIPS error on renders: 0.21 | Limited by the variation in lighting conditions. |
[77] | Point cloud Diffusion Models with image conditioning schemes | 3D building generation from images | BuildingNet-SVI and BuildingNL3D datasets | CD: 3.14, EMD: 10.84, F1 score: 21.41 on BuildingNet-SVI | Constrained by specific image viewing angles. |
[78] | ACDMSR: Accelerated Conditional Diffusion Model for Image Super-Resolution | Enhancing super-resolution using Diffusion Models conditioned on pre-super-resolved images | DIV2K, Set5, Set14, Urban100, BSD100, Manga109 | LPIPS: 0.08, PSNR: 25.95, SSIM: 0.67 | Challenges remain in processing images with more complex degradation patterns.