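"Zhixin Forum" (智信讲坛) No. 89 Academic Talk - Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University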



Title: Performance Optimization for Large-Scale Distributed Machine-Learning Applications: A Swiss-Army-Knife Approach
　　Speaker: Yonggang Wen
　　Time: Tuesday, April 25, 2017, 10:00 AM
　　Venue: Room 403, Dianxin Building (电信楼)
　　Speaker biography:
　　
　　Dr. Yonggang Wen is an associate professor with the School of Computer Science and Engineering (SCSE) at Nanyang Technological University (NTU), Singapore. He is also the Assistant Chair for Innovation at SCSE and the founding director of the SCSE Innovation Lab at NTU. He received his PhD degree in Electrical Engineering and Computer Science (minor in Western Literature) from the Massachusetts Institute of Technology (MIT), Cambridge, USA, in 2008. Previously, he worked at Cisco, leading product development in content delivery networks, which had a revenue impact of 3 billion US dollars globally. Dr. Wen has published over 170 papers in top journals and prestigious conferences. His systems research has gained global recognition. His work on Multi-Screen Cloud Social TV has been featured by global media (more than 1600 news articles from over 29 countries) and received the ASEAN ICT Award 2013 (Gold Medal). His work on Cloud3DView for Data Centre Life-Cycle Management, as the only academic entry, won the 2015 Data Centre Dynamics Awards – APAC (the ‘Oscar’ award of the data centre industry) and the 2016 ASEAN ICT Awards (Gold Medal). He is the winner of the 2017 Nanyang Award for Innovation and Entrepreneurship, the highest recognition at NTU. He is a co-recipient of Best Paper Awards at 2016 IEEE Globecom, the 2016 IEEE Infocom MuSIC Workshop, 2015 EAI Chinacom, 2014 IEEE WCSP, 2013 IEEE Globecom and 2012 IEEE EUC, and a co-recipient of the 2015 IEEE Multimedia Best Paper Award. He serves on the editorial boards of IEEE Communications Surveys & Tutorials, IEEE Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, IEEE Wireless Communications, IEEE Transactions on Signal and Information Processing over Networks, IEEE Access and Elsevier Ad Hoc Networks, and was elected Chair of the IEEE ComSoc Multimedia Communications Technical Committee (2014-2016).
　　His research interests include cloud computing, green data centers, big data analytics, multimedia networking and mobile computing.
　　
　　Abstract:
　　
　　The parameter server (PS) framework is widely used to train machine learning (ML) models in parallel. Many distributed ML systems have been designed based on the PS framework, such as Petuum, MXNet, TensorFlow and Factorbird. The framework tackles the big data problem by having worker nodes perform data-parallel computation and having server nodes maintain globally shared parameters. When training big models, worker nodes frequently pull parameters from server nodes and push updates to server nodes, often resulting in high communication overhead. Our investigations show that modern distributed ML applications can spend up to 5 times more time on communication than on computation. To address this problem, we propose a novel communication layer for the PS framework named Parameter Flow (PF), which employs three techniques. First, we introduce an update-centric communication (UCC) model to exchange data between worker and server nodes via two operations: broadcast and push. Second, we develop a dynamic value-bounded filter (DVF) to reduce network traffic by selectively dropping updates before transmission. Third, we design a tree-based streaming broadcasting (TSB) system to efficiently broadcast aggregated updates among worker nodes. The proposed PF can significantly reduce network traffic and communication time. Extensive performance evaluations have shown that PF can speed up popular distributed ML applications by a factor of up to 4.3 in a dedicated cluster, and up to 8.2 in a shared cluster, compared to a generic PS system without PF.
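　　As a rough illustration of the pull/push pattern and of dropping small updates before transmission, the following is a minimal sketch in Python. It is not the PF implementation: the class `ParameterServer`, the function `filter_updates`, and the fixed `threshold` are all hypothetical names and assumptions made for this example. In particular, the abstract describes the DVF as dynamic and does not say how its bound is chosen, so a constant bound is used here only as a stand-in.

```python
from typing import Dict


class ParameterServer:
    """Toy in-process stand-in for a server node (hypothetical, for illustration)."""

    def __init__(self, params: Dict[str, float]) -> None:
        self.params = dict(params)

    def pull(self) -> Dict[str, float]:
        # A worker fetches a copy of the current global parameters.
        return dict(self.params)

    def push(self, updates: Dict[str, float]) -> None:
        # A worker sends additive updates; the server applies them.
        for key, delta in updates.items():
            self.params[key] += delta


def filter_updates(updates: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep only updates whose magnitude reaches the bound.

    Assumption: a fixed bound. The filter named in the abstract is dynamic.
    """
    return {k: v for k, v in updates.items() if abs(v) >= threshold}


# Usage: one worker computes local updates, filters them, then pushes.
server = ParameterServer({"w0": 0.0, "w1": 0.0, "w2": 0.0})
local_updates = {"w0": 0.5, "w1": -0.001, "w2": 0.0004}
sent = filter_updates(local_updates, threshold=0.01)
server.push(sent)
print(f"sent {len(sent)} of {len(local_updates)} updates")
```

　　In this toy run only one of the three updates crosses the assumed bound, so only that one is transmitted. A real system would also have to decide what happens to the dropped values, which this sketch does not address.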
　　
　　
　　
　　
　　
　　All faculty and students are welcome to attend!
　　
　　　

Published: 2017-04-24
