Seminars 2021
Title: | Database Alignment: Fundamental Limits and Efficient Algorithms |
Speaker: | Professor Negar Kiyavash |
Date: | 30 March 2021 |
Time: | 5.00pm |
Venue: | Zoom (ID and PW will be given upon registration) |
Host: | Assistant Professor Kiah Han Mao |
Abstract: | As data collection becomes ubiquitous, understanding the potential benefits as well as the risks posed by the availability of such large amount of data becomes more pressing. Identifying how data from different sources relate to each other, could allow to merge and augment data. On the positive side, this could help for instance in deducing functionality of proteins by comparing protein interaction networks of different species. On the negative side, such alignment could cause unintended exposure of confidential information. A famous case of such breach occurred when customer data from the anonymous Netflix Prize database was revealed through alignment with public IMDB profiles. In this talk we present information-theoretic results on database alignment, showing how the size of databases and the correlation between their elements determines the success of alignment. Database alignment is closely related to equally interesting problem of network alignment, a generalization of the graph isomorphism problem. View Seminar Recording |
Title: | Computational and statistical techniques for scientific analysis in gravitational-wave astronomy |
Speaker: | Dr Alvin Chua |
Date: | 26 March 2021 |
Time: | 10.00am |
Venue: | Zoom (ID and PW will be given upon registration) |
Host: | Associate Professor Chan Song Heng |
Abstract: | Gravitational-wave astronomy is the study of astrophysical phenomena through their gravitational radiation. The field is barely out of its infancy, having been born from the first direct detection of gravitational waves by the LIGO detectors in 2015. Scientific analysis in gravitational-wave astronomy is uniquely challenging: Difficult computational and statistical problems are posed both by the modeling of possible signals from highly relativistic sources, and by the extraction and characterization of actual signals in noise-dominated data. In this talk, I discuss various ways in which techniques from areas such as reduced-order modeling, deep learning, and information geometry can be brought to bear on such problems, so as to facilitate or even enable the science that can be achieved with contemporary and near-future detectors. View Seminar Recording |
Title: | Information-theoretic Privacy Watchdog: Properties, Optimization and Generalizations |
Speaker: | Professor Parastoo Sadeghi |
Date: | 16 March 2021 |
Time: | 9.00am |
Venue: | Zoom (ID and PW will be given upon registration) |
Host: | Assistant Professor Kiah Han Mao |
Abstract: | In this talk, we explore the use of information density in scoring privacy risks in data sharing. Assume that the useful data to be shared is correlated with some sensitive data to be protected. We discuss how information density and its generalizations can be used in a privacy watchdog method that classifies useful data into low-privacy-risk and high-privacy-risk symbols. We will then present insights into the type of privacy-preserving mappings (for releasing useful data) that minimize information leakage from high-privacy-risk symbols. We will discuss practical methods to strike a balance between utility in the released data and privacy about the sensitive variable, where both utility and privacy are measured by mutual information. The basis of this talk is our recent extension and generalization of the information theory watchdog privacy method in the literature (by Hsu, Asoodeh and Calmon at ISIT 2019). At the same time, our work provides an in-depth understanding of a specific case of the privacy funnel method. The generalization of information density that we will present is in the direction of its stochastic alpha-norm. We will also discuss the relation of this generalized privacy-risk score to alpha Sibson mutual information. View Seminar Recording |
Title: | Service Rates in Distributed Systems with Redundancy |
Speaker: | Professor Emina Soljanin |
Date: | 2 March 2021 |
Time: | 9.00am |
Venue: | Zoom (ID and PW will be given upon registration) |
Host: | Assistant Professor Kiah Han Mao |
Abstract: | Distributed systems running applications such as machine learning and edge computing strive to maximize the number of concurrent data access service requests that the system can support. Redundant data storage has emerged as an efficient and robust way to enable simultaneous access of different data files by many users competing for the system’s resources. Replication and coding of data affect the service rates at which the users can access the stored files. This talk will introduce the notion of a service rate region. It will postulate the service rate region as an essential consideration in the design of erasure-coded distributed systems. Service rate regions for some code families will be derived. We will explain the recently recognized connections with batch and switch codes and combinatorial optimization on graphs. We will discuss some systems issues as well. View Seminar Recording |
Title: | DNA Punch-Cards: Implementations and Coding-Theoretic Questions |
Speaker: | Professor Olgica Milenkovic |
Date: | 23 February 2021 |
Time: | 9.00am |
Venue: | Zoom (ID and PW will be given upon registration) |
Host: | Assistant Professor Kiah Han Mao |
Abstract: | Recent implementations of synthetic DNA-based data storage systems have demonstrated several promising applications of macromolecular recorders. However, the proposed systems suffer from high cost, read-write latency and error-rates that render them noncompetitive with traditional silicon-based devices. One means to avoid synthesizing DNA is to use readily available, naturally occurring DNA. As the nucleotide sequences of native DNA are fixed, they cannot be edited to accommodate arbitrary user-defined content. Hence, instead of changing the sequence content, one may adopt an alternative recording strategy -- akin to card punching -- that modifies the topology of native DNA to encode desired information. We describe the first macromolecular storage paradigm in which data is written in the form of “nicks” at predetermined positions on the sugar-phosphate backbone of double–stranded native DNA. The platform accommodates parallel nicking on one and multiple genomic DNA fragments, and paired nicking and disassociation for creating “toehold” regions that enable single-bit random access and strand displacement. It also provides a large mass of inexpensive DNA that may be used for multiple, errorfree readout cycles via current sequencing technologies. As a proof of concept, we used the multiple-turnover programmable artificial restriction enzymes to punch both text and image files into the PCR products of Escherichia coli genomic DNA fragments in vitro. The encoded data was reliably reconstructed through sequence alignment and read coverage analysis. The described storage implementation is accompanied by a number of new coding-theoretic questions and constructions connecting combinatorial designs and set discrepancy theory.
|
Title: | Capacity and Coding for Binary-Input Memoryless Channels with Runlength-Limited Input Constraints |
Speaker: | Professor Navin Kashyap |
Date: | 2 February 2021 |
Time: | 5.00pm |
Venue: | Zoom (ID and PW will be given upon registration) |
Host: | Assistant Professor Kiah Han Mao |
Abstract: | Discrete memoryless channels (DMCs) with constraints imposed on input sequences form an important class of finite-state channels, which model a variety of situations in digital communications and storage. Of particular importance are binary-input DMCs (or BI-DMCs) with (d,k)-runlength-limited (or (d,k)-RLL) input constraints, as they serve as useful models of random noise affecting data stored on magnetic or optical storage media. These channel models have a long history, starting with the work of Zehavi and Wolf (1988) on the binary symmetric channel with (d,k)-RLL input constraints. However, our understanding of the information-theoretic capacity of these channels is still quite limited. In this talk, we survey some of the recent developments in this field, paying special attention to the feedback capacity and coding for these channels, about which we now know a great deal more. For concreteness, we will focus on the case of the binary erasure channel with (d,k)-RLL constraints. The talk is based on joint work with Oron Sabag, Bashar Huleihel, Haim Permuter, Aashish Tolambiya, and Arvind Rameshwar. View Seminar Recording |
Title: | Equiangular lines and frames over finite fields |
Speaker: | Dr Gary R.W. Greaves |
Date: | 27 January 2021 |
Time: | 3.30pm |
Venue: | Zoom (ID and PW will be given upon registration) |
Host: | Associate Professor Chan Song Heng |
Abstract: | Given some dimension d, what is the maximum number of lines in d dimensions such that the angle between any pair of lines is constant? (Such a system of lines is called "equiangular".) This classical problem was initiated by Haantjes in 1948 in the context of elliptic geometry, and it is currently enjoying renewed interest due to its connection to quantum information theory. In this talk, I will give an overview of recent progress on equiangular line systems. Next, I will report on the solution to the Euclidean version of the problem in dimensions 14 and 16. Finally, I will present an analogous problem over finite fields, highlighting some curiosities one encounters when working with finite characteristics. |