Talk title: Reliability-Aware Energy Management in Multicore Real-Time Systems with Task Replication
Speaker: Prof. Frederic Vivien, INRIA (French National Institute for Research in Computer Science and Automation)
Host: Prof. Liu Jing (刘静)
Low energy consumption is vital in multicore real-time systems because of their ever-increasing computation requirements and the fact that they are mostly battery-powered. Task replication is usually a powerful way to achieve high reliability targets. We propose a reliability-aware energy management approach based on task replication and analyze its complexity. We explicitly take into account the coverage factor of the fault detection techniques and the negative impact of Dynamic Voltage Scaling (DVS) on the rate of transient faults leading to soft errors. Our approach extends both the execution time and the period of tasks while preserving their utilization. The period extension is exploited to stretch task executions for energy management, which in turn makes it possible to optimize task duplication.
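The trade-off described in the abstract (lower frequency saves energy but raises the soft-error rate, and replication restores reliability) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the talk: the exponential fault-rate model, the cubic power model, the parameter values (`lam0`, `d`, `f_min`), and all function names. The sketch also ignores the coverage factor of fault detection, which the actual approach accounts for.

```python
import math

def fault_rate(f, lam0=1e-6, d=3.0, f_min=0.4):
    """Transient-fault rate at normalized frequency f (f_min <= f <= 1).

    Assumed model: the rate grows exponentially as frequency/voltage is scaled down.
    """
    return lam0 * 10 ** (d * (1.0 - f) / (1.0 - f_min))

def task_reliability(wcet, f):
    """Probability that one execution at frequency f suffers no soft error.

    wcet is the execution time at f = 1; running at f takes wcet / f.
    """
    return math.exp(-fault_rate(f) * wcet / f)

def replicated_reliability(wcet, f, copies):
    """Probability that at least one of `copies` independent replicas is fault-free."""
    return 1.0 - (1.0 - task_reliability(wcet, f)) ** copies

def dynamic_energy(wcet, f, copies=1):
    """Dynamic energy, assuming power proportional to f**3 and time wcet / f."""
    return copies * (f ** 3) * (wcet / f)

if __name__ == "__main__":
    wcet = 10.0
    for f, copies in [(1.0, 1), (0.5, 1), (0.5, 2)]:
        print(f, copies,
              replicated_reliability(wcet, f, copies),
              dynamic_energy(wcet, f, copies))
```

Under these assumed models, a single copy at half speed is less reliable than at full speed, while two copies at half speed recover part of that loss and still use less dynamic energy than one copy at full speed.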
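Frederic Vivien is a professor at the French National Institute for Research in Computer Science and Automation (Institut national de recherche en informatique et en automatique, INRIA). He has worked for many years on system reliability and high-performance computing, and has published more than one hundred papers in journals such as IEEE Transactions on Parallel and Distributed Systems and Theory of Computing Systems, and at leading international conferences such as HiPC. His current research includes workflow scheduling for multicore real-time systems, fault-tolerance algorithms for multicore systems, and hybrid computing.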
Speaker: Prof. Yves Robert, ENS Lyon (Ecole Normale Supérieure de Lyon)
Host: Prof. Liu Jing (刘静)
We consider the problem of orchestrating the execution of workflow applications structured as Directed Acyclic Graphs (DAGs) on parallel computing platforms that are subject to fail-stop failures. The objective is to minimize the expected overall execution time, or makespan. A solution to this problem consists of a schedule of the workflow tasks on the available processors and a decision on which application data to checkpoint to stable storage, so as to mitigate the impact of processor failures. To address this challenge, we consider a restricted class of graphs, Minimal Series-Parallel Graphs (M-SPGs), which is relevant to many real-world workflow applications. For this class of graphs, we propose a recursive list-scheduling algorithm that exploits the M-SPG structure to assign sub-graphs to individual processors, and uses dynamic programming to decide how to checkpoint these sub-graphs. We assess the performance of our algorithm on production workflow configurations, comparing it to an approach in which all application data is checkpointed and to an approach in which no application data is checkpointed. Results demonstrate that our algorithm outperforms both the former approach, because of lower checkpointing overhead, and the latter approach, because of better resilience to failures.
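To give a flavor of the dynamic-programming step, the following Python sketch chooses checkpoint positions for a simple linear chain of tasks on one processor. It is not the M-SPG algorithm from the talk. It assumes exponentially distributed fail-stop failures with rate `lam`, uniform checkpoint and recovery costs, a mandatory checkpoint after the last task, and the classical closed form for the expected time to complete a segment followed by a checkpoint. All names and parameter values are hypothetical.

```python
import math

def expected_segment_time(work, ckpt, rec, lam, downtime=0.0):
    """Expected time to run `work` and then checkpoint, under exponential failures.

    Assumed classical formula: e^{lam*rec} * (1/lam + downtime) * (e^{lam*(work+ckpt)} - 1).
    """
    return math.exp(lam * rec) * (1.0 / lam + downtime) * (math.exp(lam * (work + ckpt)) - 1.0)

def place_checkpoints(weights, ckpt, rec, lam, downtime=0.0):
    """Return (expected makespan, checkpoint positions) for a chain of tasks.

    A position i means "checkpoint after the i-th task" (1-based).
    best[i] is the minimal expected time to finish and checkpoint the first i tasks.
    """
    n = len(weights)
    prefix = [0.0]
    for w in weights:
        prefix.append(prefix[-1] + w)

    best = [0.0] + [math.inf] * n
    prev = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(i):  # last checkpoint before i was taken after task j
            segment = expected_segment_time(prefix[i] - prefix[j], ckpt, rec, lam, downtime)
            if best[j] + segment < best[i]:
                best[i], prev[i] = best[j] + segment, j

    # Walk the predecessor links back from n to recover the chosen positions.
    cuts, i = [], n
    while i > 0:
        cuts.append(i)
        i = prev[i]
    cuts.reverse()
    return best[n], cuts

if __name__ == "__main__":
    makespan, cuts = place_checkpoints([30.0, 5.0, 40.0, 10.0, 25.0], ckpt=2.0, rec=2.0, lam=0.01)
    print(makespan, cuts)
```

Because checkpointing after every task and checkpointing only at the end are both candidate solutions of this recurrence, the result is never worse than either of those two baselines under the assumed model.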
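Yves Robert is a professor at Ecole Normale Supérieure de Lyon and a Fellow of the IEEE. He has worked for many years on system reliability and high-performance computing, and has published 158 papers in journals such as IEEE Trans. Computers and IEEE Trans. Parallel Distributed Systems, as well as 253 papers at international conferences, and has given keynote talks at several international conferences. His current research includes multi-criteria workflow scheduling, fault-tolerance algorithms for multicore systems, and stochastic scheduling.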