# Innovative FPGA Solutions for Volumetric Integral Based Image Compression/Decompression with Vivado HLS

# <sup>1</sup>S. Sumathi, <sup>2</sup>A.E Prabhu, <sup>3</sup>K. Pranitha

<sup>1</sup>Professor, Department of ECE, Sairam Engineering College, Chennai. <sup>2</sup>Assistant Professor, Rajalakshmi Engineering College, Chennai. <sup>3</sup>Full-time Research Scholar, Anna University, Chennai.

Corresponding author email: pranithalaya13@gmail.com

## Abstract

This paper focuses on efficient image compression and decompression using Vivado HLS (High-Level Synthesis). A Volumetric Integral-Based Compression approach for image decompression is proposed and implemented with 'C' code and VHDL design blocks on the Arty Z7-20 kit. The evaluation covers software and hardware simulation, design block synthesis, implementation, and bitstream generation. Results from the hardware implementation are observed directly on the Arty Z7-20 kit, with emphasis on key performance metrics such as compression ratio and latency.

**Keywords**— Vivado HLS, Vivado HLX, Image compression/decompression, VHDL, Implementation

# **I INTRODUCTION**

Image decompression applications are essential in telehealth for storage and communication, particularly in tasks such as region detection and segmentation for identifying tumor cells and skeletal structures such as bones. Effective segmentation aids early cancer detection and the assessment of knee cartilage thickness. Image decompression also plays a crucial role in transferring medical images between centers, which benefits telemedicine in rural and remote areas. Depending on the image characteristics, compression methods may be lossy or lossless.

Our implementation uses Vivado HLS on the Arty Z7-20 kit. The Xilinx Vivado Design Suite supports IP module integration from its catalog, including tools such as System Generator for DSP designs and an HLS repository for custom designs. Third-party IPs are packaged using Vivado's IP packager tool, which supports both project and non-project modes. In project mode, RTL designs follow Xilinx guidelines and progress through simulation, synthesis, and implementation. Non-project mode uses Xilinx Core Instance (XCI) files for synthesis and simulation outputs. Vivado's implementation process places and routes the netlist while adhering to logical, physical, and timing constraints. It supports both Synopsys Design Constraints (SDC) and Xilinx Design Constraints (XDC), and Vivado HLS converts C, C++, and SystemC code to RTL designs with optimizations for area and throughput.

The implementation sub-processes include design optimization, power optimization, placement, and routing, culminating in bitstream generation with the 'write_bitstream' command. This produces a '.bit' file ready for FPGA deployment, with logic placement and routing optimized for efficient performance.

# **II CURRENT APPROACH**

Marcin Kowalczyk et al. [1] [27], in "Real Time System Implementation for Video Processing", implement the Canny edge detection algorithm on a real-time hardware/software video processing system on the Zynq FPGA platform. The work was carried out in the Xilinx environment using the Vivado HLS tool and achieves efficient edge detection for a 1080p full HD input stream in real time with low time consumption. Kai Zhu et al. [2] [20], in "A New RSA image encryption algorithm based on singular value decomposition", propose an RSA encryption algorithm based on singular value decomposition and analyse the datasets in terms of their statistical properties and security. Better encryption efficiency is obtained, and the method is applied to military, medical, and digital images.

Jia Zhaoyang et al. [3] [26], in "Study on digital image inpainting method based on multispectral image decomposition synthesis", report on image reproduction for damaged paintings. The proposed multispectral image decomposition synthesis method removes the structure and texture of the input image. During inpainting, the affected area reduces the accuracy of the output image. To avoid this, a synthesis correlation is performed between the colour components present in the input image. The study concentrates on two parameters, MSE and PSNR, and reports MSE = 2.7951 and PSNR = 44.1681.

Amir Hajirassoulina et al. [4] [5] [19], in "High Throughput 2D spatial image filters on FPGAs", explain the implementation of two-dimensional spatial filters on FPGAs. The work shows improved performance from DSP blocks, which are able to produce an effective pixel arrangement. FPGAs are now a widely exploited heterogeneous resource that allows user functions to be implemented with higher performance, lower power, and less area. The DSP blocks are extended with a number of features, interconnections, and architectures to provide an efficient two-dimensional spatial filter implementation.

Xin Zhong et al. [6] [7] [25], in "A High-capacity reversible watermarking scheme based on shape decomposition for medical image", apply bottom-up saliency detection to medical images to detect the region of interest (ROI). The algorithm generates square shapes for the non-region of interest (NROI). It was evaluated on the OASIS medical image dataset, which consists of 416 subjects. The proposed algorithm produced an effective watermarking capacity and improved image fidelity.

A. Cortes et al. [8] [21] [24], in "High level synthesis: Productivity, performance and software constraints", explain the efficiency of the HLS tool and evaluate how applicable it is to real-world applications. Stereo matching is reviewed for applications in image denoising, image retrieval, feature extraction, and face recognition.

The proposed method provides a good platform for interconnecting software and hardware designs using HLS. The stereo matching algorithm achieved speedups of 3.5x to 67.9x with a 5x reduction in design effort compared with manual RTL design.

Xilinx UG940, UG898 [9] [10] [23], "Design and Evaluation of an FPGA based Hardware Accelerator for Deflate Decompression": this work aims to provide efficient data transfer using Deflate lossless data compression/decompression with the proposed hardware accelerators. It describes an implementation of the Deflate decompression algorithm using HLS designs coded in C++ and deployed on a Xilinx Virtex UltraScale+ class FPGA. During decompression, the input (output) throughputs are 70.7 (246.4) MB/s and 130.6 (386.6) MB/s for dynamically and statically encoded files, respectively. A maximum throughput of up to 375 MB/s can be achieved.

Xilinx UG986, UG939 [11] [12], "HLS-Based Optimization and Design Space Exploration for Applications with Variable Loop Bounds": this paper introduces a design space exploration (DSE) framework to improve the efficiency of FPGAs and to obtain efficient FPGA design parameters. The DSE framework is built with a high-level synthesis (HLS) tool. HLS-based FPGA optimization and the DSE framework provide high-performance design blocks even when loop bounds vary. The reported throughput is 75x that of the baseline implementation.

Xilinx UG948, UG1027, UG1037 [13] [14] [15], "Design of Embedded Architecture for Pedestrian Detection in Image and Video": detecting a pedestrian within a given time is the main requirement, and a hardware architecture for a pedestrian detection system is proposed. The system consists of an effective feature extractor and classifier that help to detect an object within that time. The hardware architecture consists of many design blocks, which are designed using the Xilinx HLS tool and the software development kit (SDK) for hardware-software co-design. The implementation shows effective classification with low energy and time consumption. The proposed system is capable of detecting pedestrians in high-definition (HD) video at 180 frames per second.

Pranitha et al. [16] [17] [7] [9] surveyed image compression methods and compared the continuous wavelet transform, the stationary wavelet transform, data compression using 2D wavelet analysis, and the proposed Bisectional Cylindrical Wavelet Transform. Z. Wang et al. [18] proposed a speculative parallel decompression algorithm to improve decompression efficiency and implemented it in Apache Spark, achieving a 2.6x improvement in decompression rate. J. Ouyang et al. [22] proposed a compression and decompression technique for an FPGA-based accelerator in order to achieve a cost-effective FPGA and to decrease the cost of the IDC; the low-power FPGA yields effective resource utilization. Pranitha et al. [18] [26] extended this research to implementation using Vivado HLS, with hardware and software co-design carried out in Vivado HLS. The implementation process and block design were carried out on the ARTY and ZYBO kits.

## **III MATERIALS AND METHODS**

The proposed work describes efficient image decompression using the Bisectional Cylindrical Wavelet Transform (BCWT), implemented on the Arty Z7 20 kit. Image compression was first carried out in the MATLAB environment, where 87.5% compression of a satellite image was obtained. The same input image, with a pixel size of 1300 x 775, is then applied to BCWT for image decompression in the Vivado HLS environment.

This work mainly targets applications in remote sensing, the medical field, and telehealth. The decompression process is designed for fast transmission of image datasets. In remote sensing, compressed and decompressed images are transferred for weather forecasting, so that disasters can be detected and precautionary steps taken to manage them. The images are first captured and then compressed and decompressed with BCWT so that the weather condition can be identified clearly. For earth observation satellite images and aerial views, the method provides clear decompressed images.

The Bisectional Cylindrical Wavelet Transform is written in 'C' code in the Vivado HLS environment. It first compresses the image to a pixel size of 256 x 256 and then decompresses it to 1300 x 775 using a user-defined function declared in the Vivado HLS environment. A 'core data' file is created according to the input and function values. This file must be checked in 'makefile.rules' before the 'C' simulation process can proceed. 'C' simulation generates the 'csim.exe' and 'csim.mk' files. 'C synthesis' and the build process follow, and the corresponding reports are generated. As a result, compressed and decompressed images are obtained.

#### Steps to be executed in the Vivado HLS environment:

In the Vivado HLS IDE, the input image with a resolution of 1300 x 775 is first used for compression. A unit integer function is declared with a size of 1080 x 1920, and a memory copy function handles the compression and decompression data. The image filter with AXI-Stream and Inter-Pix treats the image as a 4x4 bit stream. The HLS dataflow uses the 's_axilite' interface for control and creates data ports with an 'm_axi' depth. The 'conv' function converts the integer unit function to image format. Dataflow coefficients [3][3] are used for the loop operations. The 'mat' function compares the HLS data flow with the image pixel values. The 'AXI to mat' function manages image formats and data flows, storing the 256 x 256 compressed output in the anchor port and the decompressed output, which matches the original size, in the destination port. A kernel function removes noise, and the 'C' simulation and synthesis processes interface with the HLX design blocks. After successful runs of the Vivado HLS and HLX tools, the compression and decompression outputs can be viewed, including hardware simulation and latency reports.
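As an illustration of the loop over [3][3] coefficients and the noise-removing kernel described above, the following is a minimal sketch in plain C. It is not the authors' kernel: the function name `smooth3x3`, the all-ones averaging coefficients, the 8-bit grayscale buffer layout, and the copy-through border handling are all assumptions made for the example. Interface directives (such as the 's_axilite' and 'm_axi' settings) are omitted so that the code compiles with any C compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical 3x3 averaging filter on an 8-bit grayscale image stored
 * row by row. Border pixels are copied unchanged; interior pixels are
 * replaced by the rounded mean of their 3x3 neighbourhood.
 */
void smooth3x3(const uint8_t *src, uint8_t *dst, size_t width, size_t height)
{
    /* Illustrative coefficients: all ones gives a plain mean filter. */
    static const int coeff[3][3] = {
        {1, 1, 1},
        {1, 1, 1},
        {1, 1, 1}
    };
    const int coeff_sum = 9;

    /* The filter is not in-place: source and destination must differ. */
    assert(src != NULL && dst != NULL && src != dst);

    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            /* Pass border pixels through, since the window would leave the image. */
            if (y == 0 || x == 0 || y + 1 == height || x + 1 == width) {
                dst[y * width + x] = src[y * width + x];
                continue;
            }

            /* Weighted sum over the 3x3 window centred on (x, y). */
            int acc = 0;
            for (int ky = 0; ky < 3; ++ky) {
                for (int kx = 0; kx < 3; ++kx) {
                    acc += coeff[ky][kx] *
                           src[(y + ky - 1) * width + (x + kx - 1)];
                }
            }

            /* Divide by the coefficient sum with rounding to nearest. */
            dst[y * width + x] = (uint8_t)((acc + coeff_sum / 2) / coeff_sum);
        }
    }
}
```

A caller would pass two separate buffers of `width * height` bytes each. In an HLS flow the two inner loops are the natural candidates for pipelining or unrolling, but those choices depend on the target design and are not shown here.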

#### Steps to be executed in the Vivado HLX environment:

In the Vivado HLX environment, VHDL code is used to create the architecture design blocks. A 'Design Wrapper' is created with VHDL code for the clock ports, DDS compiler, I/O ports, AXI streams for input and output, PWM ports, HDMI receiver and transmitter ports, LED and RGB ports, and software interfacing ports. Some ports, such as clock, address, DDR, and HDMI receiver, are defined as 'inout' ports. Specific values are set for HDMI transmitter and receiver data, output LED, output RGB, and software interfacing.

These values are declared before block design. The 'RGB to DVI' video encoder is used as a source for image preprocessing, and a video timing controller manages image IN and OUT timings.

Next, the 'AXI Video Direct Memory Access' block processes the AXI stream input and output values. The 'AXI stream subset converter' converts 'AXI' to 'Mat' values for a compression size of [256x256]. The 'AXI Stream' handles image IN and OUT for decompression, as specified in the 'HLS' environment. A video timing controller monitors image IN and OUT timings. Finally, the 'DVI2RGB' image decoder (sink) is used for post-processing, taking reset and encoding values into account. External pin connections are made accordingly. After the architecture blocks are designed, software simulation, synthesis, and implementation are completed, and the bitstream file is generated.

## IV RESULTS AND DISCUSSIONS

Software Simulation results obtained in Vivado HLX environment



Figure 1 Simulation result using Arty Z7 20 kit



Figure 2 Synthesis result using Arty Z7 20 kit



Figure 3 Implementation result using Arty Z7 20 kit

# **Project Summary Report**

| Overview          |                                          |
|-------------------|------------------------------------------|
| Project name:     | Arty-Z7-20-OOB                           |
| Project location: | D: 1vivado_proj … Arty-Z7-20-OOB-2018.2- |
| Product family:   | Zynq-7000                                |
| Project part:     | xc7z020clg400-1                          |
| Top module name:  | design_1_wrapper                         |
| Target language:  | Verilog                                  |

| Synthesis        |                                  |
|------------------|----------------------------------|
| Status:          | Complete                         |
| Messages:        | 405 warnings                     |
| Active run:      | synth_2                          |
| Part:            | xc7z020clg400-1                  |
| Strategy:        | Vivado Synthesis Defaults        |
| Report strategy: | Vivado Synthesis Default Reports |

| Power                               |                 |
|-------------------------------------|-----------------|
| Total on-chip power:                | 1.872 W         |
| Junction temperature:               | 46.6 °C         |
| Thermal margin:                     | 38.4 °C (3.2 W) |
| Effective θJA:                      | 11.5 °C/W       |
| Power supplied to off-chip devices: | 0 W             |
| Confidence level:                   | Low             |

| Implementation Summary |                                       |
|------------------------|---------------------------------------|
| Status:                | write_bitstream complete              |
| Messages:              | 1 critical warning, 23 warnings       |
| Active run:            | impl_2                                |
| Part:                  | xc7z020clg400-1                       |
| Strategy:              | Vivado Implementation Defaults        |
| Report strategy:       | Vivado Implementation Default Reports |



Figure 4 Project Summary Report using Arty Z7 20 kit

#### Software Simulation results obtained in Vivado HLX environment



Figure 5 Input image with the size of 1300 x 775 in JPEG format

**Compressed output image in JPEG format** 



Figure 6 Compressed output image with the size of 256x256

Decompressed output image in JPEG format



Figure 7 Decompressed output image with the size of 1300 x 775

#### **Disclosure Statement**

No potential conflict of interest was reported by the authors.

#### **Performance Parameters**

The proposed Volumetric Integral-Based Compression method achieved a minimum latency of 248 and a maximum latency of 2132572. The compression ratio is defined as uncompressed size / compressed size, and the value obtained is 5.3. The compression percentage obtained is 88.5%.
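The compression ratio definition above can be written as a small helper in C. This is a sketch for illustration only. The function names are hypothetical, and the space-saving formula in the second function is a commonly used definition that is assumed here; the text does not state how the reported compression percentage was derived, so it may follow a different definition.

```c
#include <assert.h>
#include <stddef.h>

/* Compression ratio as defined in the text: uncompressed size / compressed size. */
double compression_ratio(size_t uncompressed_bytes, size_t compressed_bytes)
{
    assert(compressed_bytes > 0);
    return (double)uncompressed_bytes / (double)compressed_bytes;
}

/*
 * Space saving in percent: (1 - compressed / uncompressed) * 100.
 * Assumed definition, not taken from the paper.
 */
double space_saving_percent(size_t uncompressed_bytes, size_t compressed_bytes)
{
    assert(uncompressed_bytes > 0);
    return 100.0 * (1.0 - (double)compressed_bytes / (double)uncompressed_bytes);
}
```

For example, with made-up sizes of 1000 bytes before and 250 bytes after compression, `compression_ratio(1000, 250)` gives 4.0 and `space_saving_percent(1000, 250)` gives 75.0.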

#### Implementation on the Arty Z7 20 kit



Figure 8 Implementation result using Arty Z7 20 kit

# Conclusion

This paper demonstrates efficient image compression and decompression using the proposed Volumetric Integral-Based Compression method, which achieved a compression ratio of 5.3 with a latency between 248 and 2132572. Implementation on the Arty Z7-20 kit was successful, culminating in bitstream generation. Screenshots from software simulation, synthesis, implementation, and hardware simulation are included alongside a project summary report. The interfacing of Vivado HLX and HLS with the Arty Z7-20 kit was successful and has been evaluated accordingly.

## REFERENCES

- [1] Marcin Kowalczyk et al., Real time implementation of contextual image processing operations for 4K video stream in Zynq UltraScale+ MPSoC, 2018, 978-1-5386-8237-1/18, IEEE.
- [2] Kai Zhu et al., A New RSA image encryption algorithm based on singular value decomposition, International Journal of Pattern Recognition and Artificial Intelligence, 2018 [World Scientific].
- [3] Pranitha K., Dr. G. Kavya, Literature Survey of Image Compression/Decompression Techniques for Space and Telehealth Applications, Oxidation Communications, Book 2, Volume 42 (2019), pp. 151-159.
- [4] Jia ZhaoYang et al., Study on digital image inpainting method based on multispectral image decomposition synthesis, International Journal of Pattern Recognition and Artificial Intelligence, 2018 [World Scientific].
- [5] Amir Hajirassoulina et al., Suitability of recent hardware accelerators (DSPs, FPGAs, and GPUs) for computer vision and image processing algorithms, Signal Processing: Image Communication, 68 (2018) 101-119 [Elsevier].
- [6] Ghislain Takam Tchendjou et al., Fuzzy logic based objective image quality assessment with FPGA implementation, Journal of System Architecture, 2017, doi: 10.1016/j.sysarc.2017.12.002.
- [7] Xin Zhong et al., A High capacity reversible watermarking scheme based on shape decomposition for medical images, International Journal of Pattern Recognition and Artificial Intelligence, 2018 [World Scientific].
- [8] Pranitha K., Dr. G. Kavya, Data compression with high peak signal to noise ratio using Bisectional Cylindrical wavelet transform for a satellite image, International Journal of Engineering and Technology (UAE), Volume 7, No. (4.6) (2018).
- [9] A. Cortes et al., High level synthesis using Vivado HLS for Zynq SoC: image processing case studies, 978-1-5090-4565-5/16, 2016, IEEE.
- [10] Pranitha K., Dr. G. Kavya, A Systematic Method for Hardware Software Codesign using Vivado HLS, International Journal of Recent Technology and Engineering (IJRTE), Volume 8, Issue 4, November 2019, pp. 467-472.
- [11] Xilinx. High-Level Synthesis. http://www.xilinx.com/support/documentation/sw\_manuals/xilinx2015\_4/ ug902-vivadohigh-level-synthesis.pdf, November 2015.
- [12] Xilinx. High-Level Synthesis. http://www.xilinx.com/support/documentation/sw\_manuals/xilinx2015\_4/ ug898-vivadohigh- level-synthesis.pdf, December 2018.
- [13] Xilinx. High-Level Synthesis. http://www.xilinx.com/support/documentation/sw\_manuals/xilinx2015\_4/ ug940- Hardware design user guide.pdf, June 2018.
- [14] Pranitha K., Dr. G. Kavya, M. Arun Kumar, Implementation and Elaborated Block Design for Zybo Kit Using Vivado High Level Synthesis Tool, Test Engineering and Management, 2020, pp. 14623-14629.
- [15] Xilinx. High-Level Synthesis. http://www.xilinx.com/support/documentation/sw\_manuals/xilinx2015\_4/ ug939- Designing with IP tutorial.pdf, June 2018.
- [16] Xilinx. High-Level Synthesis. http://www.xilinx.com/support/documentation/sw\_manuals/xilinx2015\_4/ ug986- Implementation user guide.pdf, June 2018.
- [17] Pranitha K., Dr. G. Kavya, M. Arun Kumar, A Detailed Illustration of VLSI Block Design Implementation Process Using Vivado HLS and Arty Kit, Universal Journal of Electrical and Electronics Engineering, 7(3): 201-208, 2020, DOI: 10.13189/ujeee.2020.070304.
- [18] Xilinx. High-Level Synthesis. http://www.xilinx.com/support/documentation/sw\_manuals/xilinx2015\_4/ ug948- Model based design using system generator.pdf, December 2018.
- [19] Xilinx. High-Level Synthesis. http://www.xilinx.com/support/documentation/sw\_manuals/xilinx2015\_4/ ug1037- AXI reference Guide.pdf, July 2017.
- [20] Pranitha K., Dr. G. Kavya, An efficient image compression architecture based on optimized 9/7 wavelet transform with hybrid post processing and entropy encoder module, Microprocessors and Microsystems 98 (2023) 104821.