November 2017, Volume 4, Issue 11 JETIR (ISSN-2349-5162)
Journal of Emerging Technologies and Innovative Research (JETIR) www.jetir.org
LOSSLESS VERSUS LOSSY COMPRESSION:
THE TRADE-OFFS
1Ayush Raniwala
Student, Computer Science Dept.
SVKM's NMIMS Mukesh Patel School of Technology Management & Engineering, Mumbai, India
2Shreyas Singhvi
Student, Computer Science Dept.
SVKM's NMIMS Mukesh Patel School of Technology Management & Engineering, Mumbai, India
ABSTRACT—This paper discusses lossless and lossy data compression as applied to text and images and compares their performance. The paper first selects the best of the basic lossless compression techniques considered and then compares it with a basic lossy technique. The techniques considered are Huffman coding, Shannon–Fano coding, and the SPIHT technique. So far the preferred approach for image compression has been lossy, since its performance is found to be far superior to that of lossless compression, but using it can sometimes lead to the loss of important information. This paper therefore aims to help in selecting the best possible algorithm for the task at hand.
Keywords— Image Compression, Lossless Compression, Lossy Compression, Huffman Coding, Shannon–Fano Coding, SPIHT-based Image Compression
I. INTRODUCTION
Compression is the process of encoding information using fewer bits than the original representation, so that it consumes less storage space and less transmission time when conveyed over a network [1].
Compression can be classified into two types, lossless and lossy, which differ in how the recovered data compares to its initial form. As the name suggests, in lossless compression techniques no information is lost; in other words, the data reconstructed from the compressed data is identical to the original. In lossy compression, some information is lost, i.e. the decompressed data is similar to the original but not identical. Data compression is possible because of the redundancy present in everyday data.
Fig 1. Data Compression and Decompression [2]
Uncompressed multimedia (graphics, audio and video) data
requires considerable storage capacity and transmission
bandwidth. Despite rapid progress in mass-storage density,
processor speeds, and digital communication system
performance, demand for data storage capacity and data-
transmission bandwidth continues to outstrip the capabilities of
available technologies. The recent growth of data-intensive multimedia web applications has not only sustained the need for more efficient ways to encode signals and images but has also made compression of such signals central to storage and communication technology. With this in mind, it has become very important to find the right technique for image compression: in certain applications the loss of even a bit of data is intolerable yet the data must be transmitted quickly, an example being an X-ray sent to a patient's doctor, whereas in other applications a minimal loss of data does not matter if the compression ratio is higher.
II. LOSSLESS COMPRESSION
Lossless data compression is a procedure in which compression algorithms reduce the size of the data while allowing the exact original data to be reconstructed from the compressed data, as shown in Fig 1.
The popular ZIP file format is an application of the lossless data compression approach. Lossless compression is used when it is vital that the original data and the decompressed data be identical. Most lossless techniques exploit the redundancy in data by encoding repeated patterns compactly during compression. In image compression, further savings can be achieved by removing unnecessary metadata from JPEG and PNG files. RAW, BMP, GIF, and PNG are all lossless image formats. The only disadvantage of lossless image compression is that the resulting file size is comparatively higher than it would be with lossy compression.
Lossless compression techniques may be classified by the kind of data they are designed to compress. Most lossless compression schemes use two different kinds of algorithms: one that builds a statistical model of the input data, and another that maps the input data to bit strings using this model, in such a way that frequently encountered data produces shorter output than improbable data [2].
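As an illustration of this two-stage view (not taken from the paper), the following Python sketch builds the statistical-model stage by estimating symbol probabilities, and shows the ideal code length, about -log2(p) bits, that the coding stage would assign to a symbol of probability p, so that frequent symbols receive shorter codes:

```python
from collections import Counter
from math import log2

def symbol_model(data: str) -> dict:
    """Stage 1: build a statistical model -- the probability of each symbol."""
    counts = Counter(data)
    total = sum(counts.values())
    return {sym: n / total for sym, n in counts.items()}

text = "aaaabbc"
model = symbol_model(text)

# Stage 2 would map symbols to bit strings using this model; an ideal coder
# assigns a symbol of probability p a code of about -log2(p) bits, so the
# frequent symbol "a" gets a shorter code than the rare symbol "c".
ideal_bits = {sym: -log2(p) for sym, p in model.items()}
```

In this toy input "a" occurs four times out of seven symbols, so its ideal code length is under one bit per occurrence, while the single "c" warrants almost three bits.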
A. Basic Lossless Techniques
The techniques briefly studied are:
1. Huffman coding
2. Shannon–Fano Coding
1. Huffman Coding:
Huffman coding is a lossless compression technique that uses the probability of occurrence of each symbol to remove redundant data. It follows a bottom-up approach in which a binary tree is built from the leaves upward to generate an optimal result: the characters in a data file are converted to binary codes such that the most common characters in the file receive the shortest binary codes and the least common characters the longest. A Huffman code can be determined by successively constructing a binary tree as shown in Fig 3, where the leaves represent the characters to be encoded, every node contains the probability of occurrence of the characters in the subtree beneath it, and the edges are labelled with the bits 0 and 1. It is also known as prefix coding. Illustrated in Fig 2 is the method for reading off the codes of the respective symbols after the binary tree is formed.
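The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it uses character frequencies in place of probabilities, repeatedly merges the two least frequent subtrees with a min-heap, and labels the two merged subtrees' edges 0 and 1 so that each symbol's code is the path from the root to its leaf:

```python
import heapq
from collections import Counter

def huffman_codes(data: str) -> dict:
    """Build a Huffman code bottom-up: repeatedly merge the two least
    frequent subtrees until one tree remains, then read off the codes."""
    freq = Counter(data)
    # Each heap entry: (frequency, unique tie-breaker, {symbol: code-so-far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)   # least frequent subtree
        n2, i2, c2 = heapq.heappop(heap)  # second least frequent subtree
        # Label the edges: prepend 0 to one subtree's codes, 1 to the other's.
        merged = {sym: "0" + code for sym, code in c1.items()}
        merged.update({sym: "1" + code for sym, code in c2.items()})
        heapq.heappush(heap, (n1 + n2, i2, merged))
    return heap[0][2]

codes = huffman_codes("aaaabbc")  # "a" is most frequent, "c" least
```

For the input "aaaabbc" the most frequent symbol "a" ends up with a one-bit code while "b" and "c" get two-bit codes, and because every symbol sits at a leaf, no code is a prefix of another.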