Maria K. Michael, University of Cyprus, Cyprus
The era of nanoscale technology has ushered in designs of unprecedented complexity and immense integration densities. Billions of transistors now populate modern multicore microprocessor chips, and the trend is only expected to grow, leading to single-chip many-core systems. Diminutive feature sizes, however, put undue strain on the reliability and long-term endurance of these modern systems, making them increasingly vulnerable. Increased early-life and aging/wear-out-induced failures can be accommodated, in a gracefully degraded system, via run-time mechanisms that protect against undesired system behavior by facilitating monitoring, detection, mitigation, and/or recovery from faults throughout the lifetime of the system. This lecture focuses on recent research efforts to develop techniques for (i) on-line monitoring and fault detection, (ii) test scheduling and system availability optimization, and (iii) efficient check-pointing and roll-back mechanisms to assist recovery, in the presence of permanent fault(s) in shared-memory multicore systems. An O/S-integrated, fine-grained framework will be presented which allows for real-time observation and on-demand Software-Based Self-Test (SBST) of the system’s cores. This new SBST paradigm aims to reduce testing time by avoiding the unnecessary over-testing of the cores’ functional units and by taking advantage of O/S-optimized scheduling, load balancing and cache-aware mechanisms, ultimately leading towards an efficient and resilient system. The lecture will conclude with a brief discussion on how to extend the framework for aging prediction and mitigation.
Maria K. Michael is an Associate Professor at the Department of Electrical and Computer Engineering at the University of Cyprus. She is also a founding member and the Director of Education and Training at the KIOS Research and Innovation Center of Excellence, also at the University of Cyprus. Maria has a Ph.D. degree from the ECE Department of Southern Illinois University Carbondale, USA. Her research expertise falls in the areas of test and reliability of digital circuits and chip-level architectures, with emphasis on embedded and general-purpose multicore systems reliability and on-line testing, dynamic/intelligent parallel CAD algorithms for automatic testing and fault simulation, intelligent methods for design, test and fault tolerance, delay test, and emerging fault models. Her recent research interests extend to the design and optimization of embedded systems and other chip-level architectures, dynamic self-detecting and self-healing architectures, and dependability and security in the hardware backbone of cyber-physical systems. She has published more than 70 papers in high-caliber refereed journals and international conferences, and she serves on steering, organizing and program committees of several IEEE and ACM conferences in the areas of test and reliability. She is a co-recipient of a Best Paper Award at MSE’2009. She is a member of the IEEE and the ACM.
Anton Klotz, Cadence Design Systems, Germany
Ten years ago, a press release announced a new initiative called the Cadence Academic Network, aimed at strengthening the collaboration between Cadence and academia. Ten years later, the Academic Network can be called a full success, both for Cadence and for the network members: numerous universities, research institutes and startup companies. I will present the history, structure and activities of the Academic Network, with an outlook on future plans and ideas.
Anton Klotz studied Technical Computer Science at Mannheim University. In 2004 he joined Cadence, where he worked for 10 years as a Service Application Engineer for Physical Verification. Since 2015, Anton has been the University Program Manager for the EMEA region, running the Cadence Academic Network initiative.
Artur Jutman, Testonica Lab OÜ, Estonia
The state-of-the-art strategies for fault tolerance and error mitigation do not contribute to self-health awareness: they merely help mitigate non-permanent faults and operate until the accumulated wear-out can no longer be tolerated. The technology trend observed in microelectronics today leads to system resilience which, in addition to physical hardening and classical fault tolerance, would also exploit the extensive natural redundancy of multi-core systems together with smarter task scheduling. As a result, the presence of permanent defects should not mean the end of life if the electronic system is able to assess and become aware of the health status of its components and sub-systems, thereby continuing to operate on partially damaged HW while coping with the reduced processing capacity. Such a health-awareness framework requires massive in-situ on-chip collection of fault events and process data from sensors, checkers and protection logic, which represents a real challenge. There are some well-known domain-specific health-awareness techniques, such as the S.M.A.R.T. technology in computer hard drives. However, no established general-purpose industrial solutions for SoCs are known so far. The purpose of this presentation is to introduce the basic concepts, present an on-chip Health Monitoring and Fault Management infrastructure specifically targeting complex SoCs, and invite the audience for further discussion.
Artur Jutman is the Managing Director of Testonica Lab. He received his PhD degree in computer engineering from TU Tallinn, Estonia in 2004. He has also been a visiting researcher at TU Darmstadt and TU Ilmenau in Germany, Linköping University and Jönköping University in Sweden, as well as TU Warsaw in Poland. Dr. Jutman is a member of the executive committee of the Nordic Test Forum society. He has been actively involved in numerous FP5, FP6 and FP7 R&D projects and served as coordinator of the FP7 STREP project BASTION and the EUROSTARS project COMBOARD. Dr. Jutman has co-authored over 150 research papers. He has been invited to give keynotes, invited talks, and embedded and full tutorials (incl. several TTEP tutorials) at international conferences and symposia around the world.
Maximilien Glorieux, IROC Technologies, France (tentatively)
Hardware is intrinsically unreliable. External and internal perturbations can cause data corruption, faulty states and unpredictable circuit behavior. At the same time, electronic devices face tough requirements in terms of performance, functionality, features, cost and reliability, while the industry is under pressure to reduce design and manufacturing cycle times. In many high-reliability applications (automotive, aerospace, networking, high-performance computing, medical), transient faults (including single events and soft errors) are the premier cause of errors and failures, producing effects that range from intermittent data errors, easily detectable and correctable through protocol or software approaches, to permanent failures that require maintenance. Today's reliability standards (such as ISO 26262) set extraordinary reliability requirements that are difficult to fulfill through classical methods. In this course, we will investigate modern fault, error and failure analysis methods for complex designs; hardening and protection methods to minimize the functional impact of the event; and, finally, reliability budgeting approaches allowing the reliability engineer and system architect to build a fault-managed, high-quality system.
Maximilien Glorieux received his master's degree from the ISEN engineering school, Lille, France in 2010. He prepared his Ph.D. as part of a collaboration between Aix-Marseille University (France) and STMicroelectronics (Crolles, France), and defended it in 2014. He has been working with IROC Technologies since 2014. His activities include the reliability analysis, testing and hardening of complex circuits for aerospace, automotive, networking and medical applications. He is also involved in several research projects with major partners from academia, industry and agencies. His research topics include fault propagation analysis in digital circuits and advanced mitigation solutions for highly reliable applications.
Paolo Rech, UFRGS, Brazil
Radiation experiments using accelerated particle beams are one of the most common ways to measure the radiation-induced error rates of electronic devices and systems. Preparing a setup for beam experiments and understanding the particle interactions can be challenging.
In the talk, we will describe how particles are accelerated and how their flux is measured. Then, we will show some of the most used facilities for measuring the error rate of electronic devices (e.g., LANSCE, Los Alamos, NM; ChipIR, Didcot, UK; etc.). We will share some of the practical know-how necessary to successfully perform a radiation experiment and recount some (funny) stories about beam experiments. Finally, we will discuss how to test complex systems and applications, including error quantification and qualification, with examples that range from object-detection neural networks for automotive applications to physical simulations for high-performance computing applications.
Paolo Rech received his master's and Ph.D. degrees from Padova University, Padova, Italy, in 2006 and 2009, respectively. He was a postdoc at LIRMM, Montpellier, France from 2010 to 2012. He is currently an associate professor at the Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil. He actively collaborates with the Los Alamos National Labs, NM, USA, the Jet Propulsion Lab., Pasadena, USA, NVIDIA, and AMD. His main research interests include the evaluation and mitigation of radiation-induced errors in modern computing systems for HPC and safety-critical applications. During his career he has performed more than 20 beam test campaigns with heavy ions, neutrons, and protons.
Raoul Velazco, TIMA Laboratory, France
Integrated circuits operating in harsh radiation environments (space, nuclear facilities, the Earth's atmosphere at high altitudes, …) can be perturbed by direct or indirect ionization resulting from the reactions of energetic particles (heavy ions, protons, neutrons, …) with atoms of the circuit's substrate.
These phenomena, gathered under the acronym SEE (Single Event Effects), have a common consequence: the generation, at a random instant and site, of a current pulse which, if it occurs in a sensitive area of the circuit, may result in transient faults and even in destructive faults. Representative examples are Single Event Upsets (SEUs) and Single Event Latchups (SELs).
SEUs, also called “upsets”, “soft errors” or “bit flips”, are responsible for changing the content of the circuit's memory cells, while SELs may result in a short circuit between ground and power supply by triggering a parasitic structure which disrupts the proper functioning of the part, possibly even leading to its destruction due to overcurrent.
Over the last 40 years, SEEs have represented a main challenge for space applications, due to the presence of very energetic particles, such as heavy ions, issued from galactic cosmic rays. The permanent progress in IC manufacturing technologies makes it possible for SEE phenomena to occur in circuits built in advanced manufacturing technologies and operating in the Earth's atmosphere. Indeed, SEUs have been observed in commercial aircraft and even at ground level. It is important to note that in such cases the impinging particles are mainly neutrons, which are non-ionizing, but ionization is produced as a consequence of the interaction of the impinging neutron with atoms present in the silicon substrate.
The evaluation of the SEE sensitivity of an integrated circuit is generally done via so-called “accelerated tests”, during which the circuit is exposed to a particle beam (heavy ions, protons, neutrons) produced by facilities such as cyclotrons, linear accelerators or neutron sources. SEE testing of programmable circuits (microprocessors, microcontrollers, DSPs and FPGAs) is not an easy task. Indeed, these devices have a huge number of memory cells with a large scope of functions, and as these cells are the main targets of SEEs, a deep study is required to deal with changes of their content. The sensitivity of such programmable circuits with respect to SEUs strongly depends on the executed application, and thus programs of similar complexity should be used during the radiation test campaigns.
This talk will present the context of the research on radiation effects on complex integrated circuits carried out by the RIS (Robust Integrated Systems) team of the TIMA laboratory. This will be illustrated by results from so-called “real life experiments” performed on board commercial flights and balloons, together with results obtained at particle accelerators. An error-rate prediction method will be presented and illustrated with results from its application to representative complex circuits studied during the research done at TIMA/RIS.
Raoul VELAZCO was born in Montevideo (Uruguay). He received the PhD and the Doctor ès Sciences degrees in Computer Science in 1982 and 1990, respectively, both from INPG in Grenoble. He has been with the CNRS (the French national scientific research agency) since 1984, where he is Director of Research, and he is presently a co-leader of the RIS research group at the TIMA laboratory (Grenoble).
His main research activities focus on the study of radiation effects on microelectronic circuits, design hardening techniques, and the development and exploitation of experiments devoted to operation on board satellites. He has supervised 21 PhD students and is author or co-author of more than 200 scientific publications, 35 of them in IEEE Transactions on Nuclear Science.
Since 2005, he has been the general co-chair of SERESSA (the International School on the Effects of Radiation on Embedded Systems for Space Applications). SERESSA 2018 will be held in the Netherlands in cooperation with the European Space Agency.
Victor Champac, INAOE, Puebla, Mexico
FinFET technology has been adopted starting at the 22 nm technology node for high-performance and power-efficient applications. Smartphones and computers are two key drivers for FinFET technology; other end-user applications are tablets, wearables, high-end networks and automotive. A dramatic gain in performance at low operating voltage is obtained. Important challenges in design and test tasks are addressed in this talk. The quantized transistor width, which is a function of the number of fins, adds complexity to increasing the driving capability of digital logic gates. The use of complex interconnect structures (MOL) and of multi-fin and multi-finger devices in circuits based on FinFET technology poses a challenge for the design and test of circuits. The complexity of testing important defects, like open-gate and short defects, in logic cells based on FinFET technology is analyzed. It is shown that significant test effort is needed to catch these defects in FinFET technology in order to obtain electronic products with higher quality.
Victor Champac received the Ph.D. degree in 1993 from the Polytechnic University of Catalonia (UPC), Spain. Since 1993 he has been with the National Institute for Astrophysics, Optics and Electronics (INAOE, Mexico), where he is Titular Professor. Dr. Champac is an IEEE Senior Member. He was a co-founder of the Test Technology Technical Council-Latin America of the IEEE Computer Society. He was the co-General Chair of the 2nd, 9th, 14th and 16th IEEE Latin American Test Workshop (a symposium since its 16th edition). He is a member of the editorial board of the Journal of Electronic Testing: Theory and Applications (JETTA). He participates in the program committees of several international conferences and serves as a reviewer for several international conferences and journals. He has published over 120 papers at international conferences and in journals. His research lines include defect modeling in leading technologies, development of new test strategies for advanced technologies, reliable circuit design under aging, and circuit design under process variations.
Zainalabedin Navabi, University of Tehran, Iran / Worcester Polytechnic Institute, USA
Today’s complex digital systems require methodology and abstraction far beyond the existing RTL. In recent years, a new abstraction for the design and description of hardware has evolved to cope with this complexity. This abstraction is generally referred to as ESL (Electronic System Level). ESL is a new design culture that includes a way of thinking, methodology, abstraction, languages, and tools. One such tool is an environment for design space exploration (DSE). ESL separates computation from communication in an abstract system definition and uses the C++-based SystemC language as its de facto description language. An abstract DSE environment must include ways of incorporating low-level physical conducts into upper-level descriptions, in order to allow such conducts to be considered and dealt with at the design level.
This talk presents a SystemC-based ESL DSE environment that allows a system-level description to have access to low-level physical communication conducts within an abstract definition of the system. The talk begins with an introduction to design abstraction levels, followed by a presentation of SystemC at RTL, continuing into the ESL description of a system. Various types of ESL communication mechanisms, i.e., channels and TLM-2.0, will be discussed. We will then show how physical conducts of communication links modeled in C++ can be turned into SystemC modules to be incorporated into channel and TLM communications.
Dr. Zainalabedin Navabi is a professor of Electrical and Computer Engineering at the University of Tehran, and an adjunct professor at Worcester Polytechnic Institute. Dr. Navabi is the author of several textbooks and computer-based training courses on VHDL, Verilog and related tools and environments. Dr. Navabi's involvement with hardware description languages began in 1976, when he started the development of a register-transfer-level simulator for one of the very first HDLs. In 1981 he completed the development of a synthesis tool that generated MOS layout from an RTL description. Since 1981, Dr. Navabi has been involved in the design, definition and implementation of hardware description languages. He has written numerous papers on the application of HDLs in simulation, synthesis and test of digital systems. He started one of the first full HDL courses at Northeastern University in 1990. Since then he has conducted many short courses and tutorials on this subject in the United States, Europe and Asia. Since the early 1990s he has been involved in developing, producing, and broadcasting online and video lectures on HDLs, digital system test, and various aspects of automated design. In addition to being a professor, he is also a consultant to CAE companies. Dr. Navabi received his M.S. and Ph.D. from the University of Arizona in 1978 and 1981, and his B.S. from the University of Texas at Austin in 1975. He is a senior member of the IEEE, a member of the IEEE Computer Society, and a member of ASEE and the ACM.
Fabian Vargas, Catholic University – PUCRS, Brazil
Technology scaling, which has made electronics accessible and affordable for almost everyone on the globe, has advanced ICs and electronics since the sixties. Nevertheless, it is well recognized that such scaling has introduced new (and major) reliability challenges to the semiconductor industry. This tutorial addresses the background mechanisms impacting the reliability of very deep submicron (VDSM) integrated circuits (ICs). Issues like total ionizing dose (TID), single-event effects (SEEs) and electromagnetic interference (EMI) are presented, and their combined effects on the reliability of modern ICs are discussed. Reliability failure mechanisms for radiation, the way they are modeled and how they impact IC lifetime will be covered. Laboratory test setups and recent results from experimental measurements are described. Classic design solutions to counteract TID, SEEs and EMI in VDSM ICs, as well as recent achievements in the development of on-chip sensors for leveraging the robustness of embedded systems for critical applications, are introduced.
Fabian Vargas obtained his Ph.D. degree in Microelectronics from the Institut National Polytechnique de Grenoble (INPG), France, in 1995. At present, he is a Full Professor at the Catholic University (PUCRS) in Porto Alegre, Brazil. His main research domains involve HW-SW co-design and test of systems-on-chip (SoCs) for critical applications, system-level design methodologies for radiation, accelerated aging and electromagnetic compatibility, and embedded sensor design for characterization, reliability and aging binning. Among several activities, Prof. Vargas has served as a Technical Committee Member or Guest Editor for many IEEE-sponsored conferences and journals. He holds six Brazilian and international patents, has co-authored a book and has published over 200 refereed papers. Prof. Vargas has been an associate researcher of the Brazilian National Science Foundation since 1996. He co-founded the IEEE Computer Society Latin American Test Technology Technical Council (LA-TTTC) in 1997 and the IEEE Latin American Test Symposium - LATS (formerly the Latin American Test Workshop - LATW) in 2000. Prof. Vargas has received the Meritorious Service Award of the IEEE Computer Society several times for providing significant services as chair of the IEEE Latin American Regional TTTC Group and of the LATS. Prof. Vargas is a Golden Core Member of the IEEE Computer Society and a Senior Member of the IEEE.
Maarja Kruusmaa, IT Faculty of Tallinn University of Technology, Estonia
Publishing is an essential, core part of every researcher's work, and writing is a skill that most of us use but only a few of us master. This presentation addresses ethical aspects of academic writing and discusses acceptable practices in publishing, the author guidelines of established journals and the unwritten norms that research communities tend to follow, noting that rules and norms can vary from one community to another. We will discuss the notion of plagiarism, how to cite your sources and how to avoid self-plagiarism. We define the concept of authorship and how it might vary across research communities. The presentation will include examples and thought experiments on how to solve hypothetical ethical dilemmas.
Prof. Maarja Kruusmaa is a vice-dean of the IT Faculty of Tallinn University of Technology and the manager of the IT PhD program. She has successfully supervised 13 PhD theses and continues supervising PhD students and postdocs, as well as shaping the PhD education in the faculty and advising the enrolled PhD students. Her own research field is underwater robotics and bio-robotics, where she runs the research group of the Centre for Biorobotics and coordinates and participates in several European research consortia. She is also a visiting professor at the Norwegian University of Science and Technology.