Intelligent Computing in Signal Processing and Pattern Recognition: International Conference on Intelligent Computing, ICIC 2006, Kunming, China (Lecture Notes in Control and Information Sciences)

Lecture Notes in Control and Information Sciences 345 Editors: M. Thoma, M. Morari

De-Shuang Huang, Kang Li, George William Irwin (Eds.)

Intelligent Computing in Signal Processing and Pattern Recognition International Conference on Intelligent Computing, ICIC 2006 Kunming, China, August 16–19, 2006


Series Advisory Board F. Allgöwer, P. Fleming, P. Kokotovic, A.B. Kurzhanski, H. Kwakernaak, A. Rantzer, J.N. Tsitsiklis

Editors

De-Shuang Huang
Institute of Intelligent Machines
Chinese Academy of Sciences
Hefei, Anhui, China
E-mail: [emailprotected]

Kang Li
Queen’s University Belfast, UK
E-mail: [emailprotected]

George William Irwin
Queen’s University Belfast, UK
E-mail: [emailprotected]

Library of Congress Control Number: 2006930912
ISSN print edition: 0170-8643
ISSN electronic edition: 1610-7411
ISBN-10: 3-540-37257-1 Springer Berlin Heidelberg New York
ISBN-13: 978-3-540-37257-8 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com

© Springer-Verlag Berlin Heidelberg 2006

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Cover design: design & production GmbH, Heidelberg
Printed on acid-free paper

SPIN: 11816515    89/techbooks    5 4 3 2 1 0

Preface

The International Conference on Intelligent Computing (ICIC) was formed to provide an annual forum dedicated to the emerging and challenging topics in artificial intelligence, machine learning, bioinformatics, and computational biology. It aims to bring together researchers and practitioners from both academia and industry to share ideas, problems and solutions related to the multifaceted aspects of intelligent computing.

ICIC 2006, held in Kunming, Yunnan, China, August 16-19, 2006, was the second International Conference on Intelligent Computing, building upon the success of ICIC 2005 held in Hefei, China. This year, the conference concentrated mainly on the theories and methodologies as well as the emerging applications of intelligent computing. It intended to unify the contemporary intelligent computing techniques within an integral framework that highlights the trends in advanced computational intelligence and bridges theoretical research with applications. In particular, bio-inspired computing has emerged in recent years as playing a key role in the pursuit of novel technology, and the resulting techniques vitalize life science engineering and daily life applications. In light of this trend, the theme of this conference was “Emerging Intelligent Computing Technology and Applications”. Papers related to this theme were especially solicited, including theories, methodologies, and applications in science and technology.

ICIC 2006 received over 3000 submissions from 36 countries and regions. All papers went through a rigorous peer review procedure and each paper received at least three review reports. Based on the review reports, the Program Committee finally selected 703 high-quality papers for presentation at ICIC 2006. These papers cover 29 topics and 16 special sessions, and are included in five volumes of proceedings published by Springer: one volume of Lecture Notes in Computer Science (LNCS), one volume of Lecture Notes in Artificial Intelligence (LNAI), one volume of Lecture Notes in Bioinformatics (LNBI), and two volumes of Lecture Notes in Control and Information Sciences (LNCIS).

This volume of Lecture Notes in Control and Information Sciences (LNCIS) includes 149 papers covering the topic of Intelligent Computing in Signal Processing and Pattern Recognition and the Special Session on Computing for Searching Strategies to Control Dynamic Processes.

The organizers of ICIC 2006, including Yunnan University, the Institute of Intelligent Machines of the Chinese Academy of Sciences, and Queen’s University Belfast, made an enormous effort to ensure the success of ICIC 2006. We hereby would like to thank the members of the ICIC 2006 Advisory Committee for their guidance and advice, the members of the Program Committee and the referees for their collective effort in reviewing and soliciting the papers, and the members of the Publication Committee for their significant editorial work. We would like to thank


Alfred Hofmann, executive editor at Springer, for his frank and helpful advice and guidance throughout, and for his support in publishing the proceedings in the Lecture Notes series. In particular, we would like to thank all the authors for contributing their papers. Without the high-quality submissions from the authors, the success of the conference would not have been possible. Finally, we are especially grateful to the IEEE Computational Intelligence Society, the International Neural Network Society and the National Science Foundation of China for their sponsorship.

June 2006

De-Shuang Huang, Institute of Intelligent Machines, Chinese Academy of Sciences, China
Kang Li, Queen’s University Belfast, UK
George William Irwin, Queen’s University Belfast, UK

ICIC 2006 Organization

General Chairs

De-Shuang Huang, China
Song Wu, China
George W. Irwin, UK

International Advisory Committee

Aike Guo, China
Alfred Hofmann, Germany
DeLiang Wang, USA
Erke Mao, China
Fuchu He, China
George W. Irwin, UK
Guangjun Yang, China
Guanrong Chen, Hong Kong
Guoliang Chen, China
Harold Szu, USA
John L. Casti, USA
Marios M. Polycarpou, USA
Mengchu Zhou, USA
Michael R. Lyu, Hong Kong
MuDer Jeng, Taiwan
Nanning Zheng, China
Okyay Kaynak, Turkey
Paul Werbos, USA
Qingshi Zhu, China
Ruwei Dai, China
Sam Shuzhi Ge, Singapore
Sheng Zhang, China
Shoujue Wang, China
Songde Ma, China
Stephen Thompson, UK
Tom Heskes, Netherlands
Xiangfan He, China
Xingui He, China
Xueren Wang, China
Yanda Li, China
Yixin Zhong, China
Youshou Wu, China
Yuanyan Tang, Hong Kong
Yunyu Shi, China
Zheng Bao, China

Program Committee Chairs

Kang Li, UK
Prashan Premaratne, Australia

Steering Committee Chairs:

Sheng Chen, UK
Xiaoyi Jiang, Germany
Xiao-Ping Zhang, Canada

Organizing Committee Chairs:

Yongkun Li, China
Hanchun Yang, China
Guanghua Hu, China

Special Session Chair:

Wen Yu, Mexico


Tutorial Chair:

Sudharman K. Jayaweera, USA

Publication Chair:

Xiaoou Li, Mexico

International Liaison Chair:

Liyanage C. De Silva, New Zealand

Publicity Chairs:

Simon X. Yang, Canada
Jun Zhang, Sun Yat-Sen University, China
Cheng Peng, China

Exhibition Chair:

Program Committee:

Aili Han, China
Arit Thammano, Thailand
Baogang Hu, China
Bin Luo, China
Bin Zhu, China
Bing Wang, China
Bo Yan, USA
Byoung-Tak Zhang, Korea
Caoan Wang, Canada
Chao Hai Zhang, Japan
Chao-Xue Wang, China
Cheng-Xiang Wang, UK
Cheol-Hong Moon, Korea
Chi-Cheng Cheng, Taiwan
Clement Leung, Australia
Daniel Coca, UK
Daqi Zhu, China
David Stirling, Australia
Dechang Chen, USA
Derong Liu, USA
Dewen Hu, China
Dianhui Wang, Australia
Dimitri Androutsos, Canada
Donald C. Wunsch, USA
Dong Chun Lee, Korea
Du-Wu Cui, China
Fengling Han, Australia
Fuchun Sun, China

Guang-Bin Huang, Singapore
Guangrong Ji, China
Hairong Qi, USA
Hong Qiao, China
Hong Wang, China
Hongtao Lu, China
Hongyong Zhao, China
Huaguang Zhang, China
Hui Wang, China
Vitoantonio Bevilacqua, Italy
Jiangtao Xi, Australia
Jianguo Zhu, Australia
Jianhua Xu, China
Jiankun Hu, Australia
Jian-Xun Peng, UK
Jiatao Song, China
Jie Tian, China
Jie Yang, China
Jin Li, UK
Jin Wu, UK
Jinde Cao, China
Jinwen Ma, China
Jochen Till, Germany
John Q. Gan, UK
Ju Liu, China
K. R. McMenemy, UK
Key-Sun Choi, Korea

Luigi Piroddi, Italy
Maolin Tang, Australia
Marko Hočevar, Slovenia
Mehdi Shafiei, Canada
Mei-Ching Chen, Taiwan
Mian Muhammad Awais, Pakistan
Michael Granitzer, Austria
Michael J. Watts, New Zealand
Michiharu Maeda, Japan
Minrui Fei, China
Muhammad Jamil Anwas, Pakistan
Muhammad Khurram Khan, China
Naiqin Feng, China
Nuanwan Soonthornphisaj, Thailand
Paolo Lino, Italy
Peihua Li, China
Ping Guo, China
Qianchuan Zhao, China
Qiangfu Zhao, Japan
Qing Zhao, Canada
Roberto Tagliaferri, Italy
Rong-Chang Chen, Taiwan
RuiXiang Sun, China


Girijesh Prasad, UK
Sanjay Sharma, UK
Seán McLoone, Ireland
Seong G. Kong, USA
Shaoning Pang, New Zealand
Shaoyuan Li, China
Shuang-Hua Yang, UK
Shunren Xia, China
Stefanie Lindstaedt, Austria
Sylvia Encheva, Norway
Tai-hoon Kim, Korea
Tai-Wen Yue, Taiwan
Takashi Kuremoto, Japan
Tarık Veli Mumcu, Turkey
Tian Xiang Mei, UK

Liangmin Li, UK
Tim B. Littler, UK
Tommy W. S. Chow, Hong Kong
Uwe Kruger, UK
Wei Dong Chen, China
Wenming Cao, China
Wensheng Chen, China
Willi Richert, Germany
Worapoj Kreesuradej, Thailand
Xiao Zhi Gao, Finland
Xiaoguang Zhao, China
Xiaojun Wu, China
Xiaolong Shi, China
Xiaoou Li, Mexico
Xinge You, Hong Kong
Xiwen Zhang, China


Saeed Hashemi, Canada
Xiyuan Chen, China
Xun Wang, UK
Yanhong Zhou, China
Yi Shen, China
Yong Dong Wu, Singapore
Yuhua Peng, China
Zengguang Hou, China
Zhao-Hui Jiang, Japan
Zhen Liu, Japan
Zhi Wang, China
Zhi-Cheng Chen, China
Zhi-Cheng Ji, China
Zhigang Zeng, China
Ziping Chiang, Taiwan

Reviewers Xiaodan Wang, Lei Wang, Arjun Chandra, Angelo Ciaramella, Adam Kalam, Arun Sathish, Ali Gunes, Jin Tang, Aiguo He, Arpad Kelemen, Andreas Koschan, Anis Koubaa, Alan Gupta, Alice Wang, Ali Ozen, Hong Fang, Muhammad Amir Yousuf , An-Min Zou, Andre Döring, Andreas Juffinger, Angel Sappa, Angelica Li, Anhua Wan, Bing Wang, Rong Fei, Antonio Pedone, Zhengqiang Liang , Qiusheng An, Alon Shalev Housfater, Siu-Yeung Cho, Atif Gulzar, Armin Ulbrich, Awhan Patnaik, Muhammad Babar, Costin Badica, Peng Bai, Banu Diri, Bin Cao, Riccardo Attimonelli, Baohua Wang, Guangguo Bi, Bin Zhu, Brendon Woodford, Haoran Feng, Bo Ma, Bojian Liang, Boris Bacic, Brane Sirok, Binrong Jin, Bin Tian, Christian Sonntag, Galip Cansever, Chun-Chi Lo, ErKui Chen, Chengguo Lv, Changwon Kim, Chaojin Fu, Anping Chen, Chen Chun , C.C. Cheng, Qiming Cheng, Guobin Chen, Chengxiang Wang, Hao Chen, Qiushuang Chen, Tianding Chen, Tierui Chen, Ying Chen, Mo-Yuen Chow, Christian Ritz, Chunmei Liu, Zhongyi Chu, Feipeng Da, Cigdem Turhan, Cihan Karakuzu, Chandana Jayasooriya, Nini Rao, Chuan-Min Zhai, Ching-Nung Yang, Quang Anh Nguyen, Roberto Cordone, Changqing Xu, Christian Schindler, Qijun Zhao, Wei Lu, Zhihua Cui, Changwen Zheng, David Antory, Dirk Lieftucht, Dedy Loebis, Kouichi Sakamoto, Lu Chuanfeng, Jun-Heng Yeh, Dacheng Tao, Shiang-Chun Liou, Ju Dai , Dan Yu, Jianwu Dang, Dayeh Tan, Yang Xiao, Dondong Cao, Denis Stajnko, Liya De Silva, Damien Coyle, Dian-Hui Wang, Dahai Zhang, Di Huang, Dikai Liu, D. Kumar, Dipak Lal Shrestha, Dan Lin, DongMyung Shin, Ning Ding, DongFeng Wang, Li Dong, Dou Wanchun, Dongqing Feng, Dingsheng Wan, Yongwen Du, Weiwei Du, Wei Deng, Dun-wei Gong, DaYong Xu, Dar-Ying Jan, Zhen Duan, Daniela Zaharie,


ZhongQiang Wu, Esther Koller-Meier, Anding Zhu, Feng Pan, Neil Eklund, Kezhi Mao, HaiYan Zhang, Sim-Heng Ong, Antonio Eleuteri, Bang Wang, Vincent Emanuele, Michael Emmerich, Hong Fu, Eduardo Hruschka, Erika Lino, Estevam Rafael Hruschka Jr, D.W. Cui, Fang Liu, Alessandro Farinelli, Fausto Acernese, Bin Fang, Chen Feng, Huimin Guo, Qing Hua, Fei Zhang, Fei Ge, Arnon Rungsawang, Feng Jing, Min Feng, Feiyi Wang, Fengfeng Zhou, Fuhai Li, Filippo Menolascina, Fengli Ren, Mei Guo, Andrés Ferreyra, Francesco Pappalardo, Chuleerat Charasskulchai, Siyao Fu, Wenpeng Ding, Fuzhen Huang, Amal Punchihewa, Geoffrey Macintyre, Xue Feng He, Gang Leng, Lijuan Gao, Ray Gao, Andrey Gaynulin, Gabriella Dellino, D.W. Ggenetic, Geoffrey Wang, YuRong Ge, Guohui He, Gwang Hyun Kim, Gianluca Cena, Giancarlo Raiconi, Ashutosh Goyal, Guan Luo, Guido Maione, Guido Maione, Grigorios Dimitriadis, Haijing Wang, Kayhan Gulez, Tiantai Guo, Chun-Hung Hsieh, Xuan Guo, Yuantao Gu, Huanhuan Chen, Hongwei Zhang, Jurgen Hahn, Qing Han, Aili Han, Dianfei Han, Fei Hao, Qing-Hua Ling, Hang-kon Kim, Han-Lin He, Yunjun Han, Li Zhang, Hathai Tanta-ngai, HangBong Kang, Hsin-Chang Yang, Hongtao Du, Hazem Elbakry, Hao Mei, Zhao L, Yang Yun, Michael Hild, Heajo Kang, Hongjie Xing, Hailli Wang, Hoh In, Peng Bai, Hong-Ming Wang, Hongxing Bai, Hongyu Liu, Weiyan Hou, Huaping Liu, H.Q. Wang, Hyungsuck Cho, Hsun-Li Chang, Hua Zhang, Xia Huang, Hui Chen, Huiqing Liu, Heeun Park, Hong-Wei Ji, Haixian Wang, Hoyeal Kwon, H.Y. Shen, Jonghyuk Park, Turgay Ibrikci, Mary Martin, Pei-Chann Chang, Shouyi Yang, Xiaomin Mu, Melanie Ashley, Ismail Altas, Muhammad Usman Ilyas, Indrani Kar, Jinghui Zhong, Ian Mack, Il-Young Moon, J.X. Peng , Jochen Till, Jian Wang, Quan Xue, James Govindhasamy, José Andrés Moreno Pérez, Jorge Tavares, S. K. Jayaweera, Su Jay, Jeanne Chen, Jim Harkin, Yongji Jia, Li Jia, Zhao-Hui Jiang, Gangyi Jiang, Zhenran Jiang, Jianjun Ran, Jiankun Hu, Qing-Shan Jia, Hong Guo, Jin Liu, Jinling Liang, Jin Wu, Jing Jie, Jinkyung Ryeu, Jing Liu, Jiming Chen, Jiann-Ming Wu, James Niblock, Jianguo Zhu, Joel Pitt, Joe Zhu, John Thompson, Mingguang Shi, Joaquin Peralta, Si Bao Chen, Tinglong Pan, Juan Ramón González González, JingRu Zhang, Jianliang Tang, Joaquin Torres, Junaid Akhtar, Ratthachat Chatpatanasiri, Junpeng Yuan, Jun Zhang, Jianyong Sun, Junying Gan, Jyh-Tyng Yau, Junying Zhang, Jiayin Zhou, Karen Rosemary McMenemy, Kai Yu, Akimoto Kamiya, Xin Kang, Ya-Li Ji, GuoShiang Lin, Muhammad Khurram, Kevin Curran, Karl Neuhold, Kyongnam Jeon, Kunikazu Kobayashi, Nagahisa Kogawa, Fanwei Kong, Kyu-Sik Park, Lily D. Li, Lara Giordano, Laxmidhar Behera, Luca Cernuzzi, Luis Almeida, Agostino Lecci, Yan Zuo, Lei Li, Alberto Leva, Feng Liang, Bin Li, Jinmei Liao, Liang Tang, Bo Lee, Chuandong Li, Lidija Janezic, Jian Li, Jiang-Hai Li, Jianxun Li, Limei Song, Ping Li, Jie Liu, Fei Liu, Jianfeng Liu, Jianwei Liu, Jihong Liu, Lin Liu, Manxi Liu, Yi Liu, Xiaoou Li, Zhu Li, Kun-hong Liu, Li Min Cui, Lidan Miao, Long Cheng , Huaizhong Zhang, Marco Lovera, Liam Maguire, Liping Liu, Liping Zhang, Feng Lu, Luo Xiaobin, Xin-ping Xie, Wanlong Li, Liwei Yang, Xinrui Liu, Xiao Wei Li, Ying Li, Yongquan Liang, Yang Bai, Margherita Bresco, Mingxing Hu, Ming Li, Runnian Ma, Meta-Montero Manrique, Zheng Gao, Mingyi Mao, Mario Vigliar, Marios Savvides, Masahiro Takatsuka, Matevz Dular, Mathias Lux, Mutlu Avci, Zhifeng Hao, Zhifeng Hao, Ming-Bin Li, Tao Mei, Carlo Meloni, Gennaro Miele, Mike Watts, Ming Yang,


Jia Ma, Myong K. Jeong, Michael Watts, Markus Koch, Markus Koch, Mario Koeppen, Mark Kröll, Hui Wang, Haigeng Luo, Malrey Lee, Tiedong Ma, Mingqiang Yang, Yang Ming, Rick Chang, Nihat Adar, Natalie Schellenberg, Naveed Iqbal, Nur Bekiroglu, Jinsong Hu, Nesan Aluha, Nesan K Aluha, Natascha Esau, Yanhong Luo, N.H. Siddique, Rui Nian, Kai Nickel, Nihat Adar, Ben Niu, Yifeng Niu, Nizar Tayem, Nanlin Jin, Hong-Wei Ji, Dongjun Yu, Norton Abrew, Ronghua Yao, Marco Moreno-Armendariz, Osman Kaan Erol, Oh Kyu Kwon, Ahmet Onat, Pawel Herman, Peter Hung, Ping Sun, Parag Kulkarni, Patrick Connally, Paul Gillard, Yehu Shen, Paul Conilione, Pi-Chung Wang, Panfeng Huang, Peter Hung, Massimo Pica Ciamarra, Ping Fang, Pingkang Li, Peiming Bao, Pedro Melo-Pinto, Maria Prandini, Serguei Primak, Peter Scheir, Shaoning Pang, Qian Chen, Qinghao Rong, QingXiang Wu, Quanbing Zhang, Qifu Fan, Qian Liu, Qinglai Wei, Shiqun Yin, Jianlong Qiu, Qingshan Liu, Quang Ha, SangWoon Lee , Huaijing Qu, Quanxiong Zhou , Qingxian Gong, Qingyuan He, M.K.M. Rahman, Fengyuan Ren, Guang Ren, Qingsheng Ren, Wei Zhang, Rasoul Milasi, Rasoul Milasi, Roberto Amato, Roberto Marmo, P. Chen, Roderick Bloem, Hai-Jun Rong, Ron Von Schyndel, Robin Ferguson, Runhe Huang, Rui Zhang, Robin Ferguson, Simon Johnston, Sina Rezvani, Siang Yew Chong, Cristiano Cucco, Dar-Ying Jan, Sonya Coleman, Samuel Rodman, Sancho SalcedoSanz, Sangyiel Baik, Sangmin Lee, Savitri Bevinakoppa, Chengyi Sun, Hua Li, Seamus McLoone, Sean McLoone, Shafayat Abrar, Aamir Shahzad, Shangmin Luan, Xiaowei Shao, Shen Yanxia, Zhen Shen, Seung Ho Hong, Hayaru Shouno, Shujuan Li, Si Eng Ling, Anonymous, Shiliang Guo, Guiyu Feng, Serafin Martinez Jaramillo, Sangwoo Moon, Xuefeng Liu, Yinglei Song, Songul Albayrak, Shwu-Ping Guo, Chunyan Zhang, Sheng Chen, Qiankun Song, Seok-soo Kim, Antonino Staiano, Steven Su, Sitao Wu, Lei Huang, Feng Su, Jie Su, Sukree Sinthupinyo, Sulan Zhai, Jin Sun, Limin Sun, Zengshun Zhao, Tao Sun, Wenhong Sun, Yonghui Sun, Supakpong Jinarat, Srinivas Rao Vadali, Sven Meyer zu Eissen, Xiaohong Su , Xinghua Sun, Zongying Shi, Tony Abou-Assaleh, Youngsu Park, Tai Yang, Yeongtak Jo, Chunming Tang, Jiufei Tang, Taizhe Tan, Tao Xu, Liang Tao, Xiaofeng Tao, Weidong Xu, Yueh-Tsun Chang, Fang Wang, Timo Lindemann, Tina Yu, Ting Hu, Tung-Kuan Liu, Tianming Liu, Tin Lay Nwe, Thomas Neidhart, Tony Chan, Toon Calders, Yi Wang, Thao Tran, Kyungjin Hong, Tariq Qureshi, Tung-Shou Chen, Tsz Kin Tsui, Tiantian Sun, Guoyu Tu, Tulay Yildirim, Dandan Zhang, Xuqing Tang, Yuangang Tang, Uday Chakraborty, Luciana Cariello, Vasily Aristarkhov, Jose-Luis Verdegay, Vijanth Sagayan Asirvadam, Vincent Lee, Markus Vincze, Duo Chen, Viktoria Pammer, Vedran Sabol, Wajeeha Akram, Cao Wang , Xutao Wang, Winlen Wang, Zhuang Znuang, Feng Wang, Haifeng Wang, Le Wang, Wang Linkun, Meng Wang, Rongbo Wang, Xin Wang, Xue Wang, Yan-Feng Wang, Yong Wang, Yongcai Wang, Yongquan Wang, Xu-Qin Li, Wenbin Liu, Wudai Liao, Weidong Zhou, Wei Li, Wei Zhang, Wei Liang, Weiwei Zhang, Wen Xu, Wenbing Yao, Xiaojun Ban, Fengge Wu, Weihua Mao, Shaoming Li, Qing Wu, Jie Wang, Wei Jiang, W Jiang, Wolfgang Kienreich, Linshan Wang, Wasif Naeem, Worasait Suwannik, Wolfgang Slany, Shijun Wang , Wooyoung Soh, Teng Wang, Takashi Kuremoto, Hanguang Wu, Licheng Wu, Xugang Wang, Xiaopei Wu, ZhengDao Zhang, Wei Yen, Yan-Guo Wang, Daoud Ait-Kadi, Xiaolin Hu, Xiaoli Li, Xun


Wang, Xingqi Wang, Yong Feng, Xiucui Guan, Xiao-Dong Li, Xingfa Shen, Xuemin Hong, Xiaodi Huang, Xi Yang, Li Xia, Zhiyu Xiang, Xiaodong Li, Xiaoguang Zhao, Xiaoling Wang, Min Xiao, Xiaonan Wu, Xiaosi Zhan, Lei Xie, Guangming Xie, Xiuqing Wang, Xiwen Zhang, XueJun Li, Xiaojun Zong, Xie Linbo, Xiaolin Li, Xin Ma, Xiangqian Wu, Xiangrong Liu, Fei Xing, Xu Shuzheng, Xudong Xie, Bindang Xue, Xuelong Li, Zhanao Xue, Xun Kruger, Xunxian Wang, Xusheng Wei, Yi Xu, Xiaowei Yang, Xiaoying Wang, Xiaoyan Sun, YingLiang Ma, Yong Xu, Jongpil Yang, Lei Yang, Yang Tian, Zhi Yang, Yao Qian, Chao-bo Yan, Shiren Ye, Yong Fang, Yanfei Wang, Young-Gun Jang, Yuehui Chen, Yuh-Jyh Hu, Yingsong Hu, Zuoyou Yin, Yipan Deng, Yugang Jiang, Jianwei Yang, Yujie Zheng, Ykung Chen, Yan-Kwang Chen, Ye Mei, Yongki Min, Yongqing Yang, Yong Wu, Yongzheng Zhang, Yiping Cheng, Yongpan Liu, Yanqiu Bi, Shengbao Yao, Yongsheng Ding, Haodi Yuan, Liang Yuan, Qingyuan He, Mei Yu, Yunchu Zhang, Yu Shi, Wenwu Yu, Yu Wen, Younghwan Lee, Ming Kong, Yingyue Xu, Xin Yuan, Xing Yang, Yan Zhou, Yizhong Wang, Zanchao Zhang, Ji Zhicheng, Zheng Du, Hai Ying Zhang, An Zhang, Qiang Zhang, Shanwen Zhang, Shanwen Zhang, Zhang Tao, Yue Zhao, R.J. Zhao, Li Zhao, Ming Zhao, Yan Zhao, Bojin Zheng, Haiyong Zheng, Hong Zheng, Zhengyou Wang, Zhongjie Zhu, Shangping Zhong, Xiaobo Zhou, Lijian Zhou, Lei Zhu, Lin Zhu, Weihua Zhu, Wumei Zhu, Zhihong Yao, Yumin Zhang, Ziyuan Huang, Chengqing Li, Z. Liu, Zaiqing Nie, Jiebin Zong, Zunshui Cheng, Zhongsheng Wang, Yin Zhixiang, Zhenyu He, Yisheng Zhong, Tso-Chung Lee, Takashi Kuremoto Tao Jianhua, Liu Wenjue, Pan Cunhong, Li Shi, Xing Hongjie, Yang Shuanghong, Wang Yong, Zhang Hua, Ma Jianchun, Li Xiaocui, Peng Changping, Qi Rui, Guozheng Li, Hui Liu, Yongsheng Ding, Xiaojun Liu, Qinhua Huang

Table of Contents

Intelligent Computing in Signal Processing and Pattern Recognition

An 802.11-Based Location Determination Approach for Context-Aware System
Chun-Dong Wang, Ming Gao, Xiu-Feng Wang . . . . . . . . . . . . . . 1

A Face Recognition System on Distributed Evolutionary Computing Using On-Line GA
Nam Mi Young, Md. Rezaul Bashar, Phill Kyu Rhee . . . . . . . . . . . . . . 9

A Fuzzy Kohonen’s Competitive Learning Algorithm for 3D MRI Image Segmentation
Jun Kong, Jianzhong Wang, Yinghua Lu, Jingdan Zhang, Jingbo Zhang . . . . . . . . . . . . . . 19

A Hybrid Genetic Algorithm for Two Types of Polygonal Approximation Problems
Bin Wang, Chaojian Shi . . . . . . . . . . . . . . 30

A Hybrid Model for Nondestructive Measurement of Internal Quality of Peach
Yongni Shao, Yong He . . . . . . . . . . . . . . 42

A Novel Approach in Sports Image Classification
Wonil Kim, Sangyoon Oh, Sanggil Kang, Kyungro Yoon . . . . . . . . . . . . . . 54

A Novel Biometric Identification Approach Based on Human Hand
Jun Kong, Miao Qi, Yinghua Lu, Shuhua Wang, Yuru Wang . . . . . . . . . . . . . . 62

A Novel Color Image Watermarking Method Based on Genetic Algorithm
Yinghua Lu, Jialing Han, Jun Kong, Gang Hou, Wei Wang . . . . . . . . . . . . . . 72

A Novel Emitter Signal Recognition Model Based on Rough Set
Guan Xin, Yi Xiao, He You . . . . . . . . . . . . . . 81

A Novel Model for Independent Radial Basis Function Neural Networks with Multiresolution Analysis
GaoYun An, QiuQi Ruan . . . . . . . . . . . . . . 90


A Novelty Automatic Fingerprint Matching System Tianding Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 Abnormal Pattern Parameters Estimation of Control Chart Based on Wavelet Transform and Probabilistic Neural Network Shaoxiong Wu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 An Error Concealment Technique Based on JPEG-2000 and Projections onto Convex Sets Tianding Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 An Extended Learning Vector Quantization Algorithm Aiming at Recognition-Based Character Segmentation Lei Xu, Bai-Hua Xiao, Chun-Heng Wang, Ru-Wei Dai . . . . . . . . . . . . . 131 Improved Decision Tree Algorithm: ID3+ Min Xu, Jian-Li Wang, Tao Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 Application of Support Vector Machines with Binary Tree Architecture to Advanced Radar Emitter Signal Recognition Gexiang Zhang, Haina Rong, Weidong Jin . . . . . . . . . . . . . . . . . . . . . . . . 150 Automatic Target Recognition in High Resolution SAR Image Based on Electromagnetic Characteristics Wen-Ming Zhou, Jian-She Song, Jun Xu, Yong-An Zheng . . . . . . . . . . 162 Boosting in Random Subspace for Face Recognition Yong Gao, Yangsheng Wang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 Component-Based Human Body Tracking for Posture Estimation Kyoung-Mi Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Computation of the Probability on the Number of Solution for the P3P Problem Jianliang Tang, Xiao-Shan Gao, Wensheng Chen . . . . . . . . . . . . . . . . . . 191 Context-Awareness Based Adaptive Classifier Combination for Object Recognition Mi Young Nam, Battulga Bayarsaikhan, Suman Sedai, Phill Kyu Rhee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201 Detecting All-Zero Coefficient Blocks Before Transformation and Quantization in H.264/AVC Zhengyou Wang, Quan Xue, Jiatao Song, Weiming Zeng, Guobin Chen, Zhijun Fang, Shiqian Wu . . . . . . . . . . . . . . . . . . . . . . . . . . 211


Efficient KPCA-Based Feature Extraction: A Novel Algorithm and Experiments Yong Xu, David Zhang, Jing-Yu Yang, Zhong Jing, Miao Li . . . . . . . . 220 Embedded System Implementation for an Object Detection Using Stereo Image Cheol-Hong Moon, Dong-Young Jang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230 Graphic Editing Tools in Bioluminescent Imaging Simulation Hui Li, Jie Tian, Jie Luo, Yujie Lv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241 Harmonics Real Time Identification Based on ANN, GPS and Distributed Ethernet Zhijian Hu, Chengxue Zhang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 The Synthesis of Chinese Fine-Brushwork Painting for Flower Tianding Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263 Hybrid Bayesian Super Resolution Image Reconstruction Tao Wang, Yan Zhang, Yong Sheng Zhang . . . . . . . . . . . . . . . . . . . . . . . . 275 Image Hiding Based Upon Vector Quantization Using AES Cryptosystem Yanquan Chen, Tianding Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285 Image Ownership Verification Via Unitary Transform of Conjugate Quadrature Filter Jianwei Yang, Xinxiang Zhang, Wen-Sheng Chen, Bin Fang . . . . . . . . . 294 Inter Layer Intra Prediction Using Lower Layer Information for Spatial Scalability Zhang Wang, Jian Liu, Yihua Tan, Jinwen Tian . . . . . . . . . . . . . . . . . . 303 Matching Case History Patterns in Case-Based Reasoning Guoxing Zhao, Bin Luo, Jixin Ma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312 Moment Invariant Based Control System Using Hand Gestures P. Premaratne, F. Safaei, Q. Nguyen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 Multiple-ROI Image Coding Method Using Maxshift over Low-Bandwidth Kang Soo You, Han Jeong Lee, Hoon Sung Kwak . . . . . . . . . . . . . . . . . . 334 Multi-resolution Image Fusion Using AMOPSO-II Yifeng Niu, Lincheng Shen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343


Multiscale Linear Feature Extraction Based on Beamlet Transform Ming Yang, Yuhua Peng, Xinhong Zhou . . . . . . . . . . . . . . 353 Multisensor Information Fusion Application to SAR Data Classification Hai-Hui Wang, Yan-Sheng Lu, Min-Jiang Chen . . . . . . . . . . . . . . 364 NDFT-Based Audio Watermarking Scheme with High Robustness Against Malicious Attack Ling Xie, Jiashu Zhang, Hongjie He . . . . . . . . . . . . . . 374 New Multiple Regions of Interest Coding Using Partial Bitplanes Scaling for Medical Image Compression Li-bao Zhang, Ming-quan Zhou . . . . . . . . . . . . . . 382 Particle Swarm Optimization for Road Extraction in SAR Images Ge Xu, Hong Sun, Wen Yang . . . . . . . . . . . . . . 392 Pattern Recognition Without Feature Extraction Using Probabilistic Neural Network Övünç Polat, Tülay Yıldırım . . . . . . . . . . . . . . 402 Power Transmission Towers Extraction in Polarimetric SAR Imagery Based on Genetic Algorithm Wen Yang, Ge Xu, Jiayu Chen, Hong Sun . . . . . . . . . . . . . . 410 Synthesis Texture by Tiling s-Tiles Feng Xue, Yousheng Zhang, Julang Jiang, Min Hu, Tao Jiang . . . . . . . . . . . . . . 421 Relaxation Labeling Using an Improved Hopfield Neural Network Long Cheng, Zeng-Guang Hou, Min Tan . . . . . . . . . . . . . . 430 Adaptive Rank Indexing Scheme with Arithmetic Coding in Color-Indexed Images Kang Soo You, Hyung Moo Kim, Duck Won Seo, Hoon Sung Kwak . . . . . . . . . . . . . . 440 Revisit to the Problem of Generalized Low Rank Approximation of Matrices Chong Lu, Wanquan Liu, Senjian An . . . . . . . . . . . . . . 450 Robust Face Recognition of Images Captured by Different Devices Guangda Su, Yan Shang, Baixing Zhang . . . . . . . . . . . . . . 461 Robust Feature Extraction for Mobile-Based Speech Emotion Recognition System Kang-Kue Lee, Youn-Ho Cho, Kyu-Sik Park . . . . . . . . . . . . . . 470


Robust Segmentation of Characters Marked on Surface Jong-Eun Ha, Dong-Joong Kang, Mun-Ho Jeong, Wang-Heon Lee . . . 478 Screening of Basal Cell Carcinoma by Automatic Classifiers with an Ambiguous Category Seong-Joon Baek, Aaron Park, Daejin Kim, Sung-Hoon Hong, Dong Kook Kim, Bae-Ho Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488 Segmentation of Mixed Chinese/English Documents Based on Chinese Radicals Recognition and Complexity Analysis in Local Segment Pattern Yong Xia, Bai-Hua Xiao, Chun-Heng Wang, Yao-Dong Li . . . . . . . . . . 497 Sigmoid Function Activated Blocking Artifacts Reduction Algorithm Zhi-Heng Zhou, Sheng-Li Xie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507 Simulation of Aging Effects in Face Images Junyan Wang, Yan Shang, Guangda Su, Xinggang Lin . . . . . . . . . . . . . 517 Synthetic Aperture Radar Image Segmentation Using Edge Entropy Constrained Stochastic Relaxation Yongfeng Cao, Hong Sun, Xin Xu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528 The Influence of Channel Coding on Information Hiding Bounds and Detection Error Rate Fan Zhang, Xinhong Zhang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538 Wavelet Thinning Algorithm Based Similarity Evaluation for Offline Signature Verification Bin Fang, Wen-Sheng Chen, Xinge You, Tai-Ping Zhang, Jing Wen, Yuan Yan Tang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547 When Uncorrelated Linear Discriminant Analysis Are Combined with Wavelets Xue Cao, Jing-Yu Yang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556 2D Direct LDA for Efficient Face Recognition Un-Dong Chang, Young-Gil Kim, Dong-Woo Kim, Young-Jun Song, Jae-Hyeong Ahn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566 3-D Curve Moment Invariants for Curve Recognition Dong Xu, Hua Li . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572 3D Ear Reconstruction Attempts: Using Multi-view Heng Liu, Jingqi Yan, David Zhang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578


A Class of Multi-scale Models for Image Denoising in Negative Hilbert-Sobolev Spaces Jun Zhang, Zhihui Wei . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584 A Detection Algorithm of Singular Points in Fingerprint Images Combining Curvature and Orientation Field Xiaolong Zheng, Yangsheng Wang, Xuying Zhao . . . . . . . . . . . . . . . . . . . 593 A Mathematical Framework for Optical Flow Computation Xiaoxin Guo, Zhiwen Xu, Yueping Feng, Yunxiao Wang, Zhengxuan Wang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600 A Method for Camera Pose Estimation from Object of a Known Shape Dong-Joong Kang, Jong-Eun Ha, Mun-Ho Jeong . . . . . . . . . . . . . . . . . . . 606 A Method of Radar Target Recognition Basing on Wavelet Packets and Rough Set Hong Wang, Shanwen Zhang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 614 A Multi-resolution Image Segmentation Method Based on Evolution of Local Variance Yan Tian, Yubo Xie, Fuyuan Peng, Jian Liu, Guobo Xing . . . . . . . . . . 620 A New Denoising Method with Contourlet Transform Gangyi Jiang, Mei Yu, Wenjuan Yi, Fucui Li, Yong-Deak Kim . . . . . . 626 A Novel Authentication System Based on Chaos Modulated Facial Expression Recognition Xiaobin Luo, Jiashu Zhang, Zutao Zhang, Hui Chen . . . . . . . . . . . . . . . 631 A Novel Computer-Aided Diagnosis System of the Mammograms Weidong Xu, Shunren Xia, Huilong Duan . . . . . . . . . . . . . . . . . . . . . . . . . 639 A Partial Curve Matching Method for Automatic Reassembly of 2D Fragments Liangjia Zhu, Zongtan Zhou, Jingwei Zhang, Dewen Hu . . . . . . . . . . . . 645 A Split/Merge Method with Ranking Selection for Polygonal Approximation of Digital Curve Chaojian Shi, Bin Wang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651 A Training Strategy of Class-Modular Neural Network Classifier for Handwritten Chinese Character Recognition Xue Gao . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 657


Active Set Iteration Method for New L2 Soft Margin Support Vector Machine Liang Tao, Juan-juan Gu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 663 Adaptive Eigenbackground for Dynamic Background Modeling Lei Wang, Lu Wang, Qing Zhuo, Huan Xiao, Wenyuan Wang . . . . . . . 670 Adaptive Content-Based Image Retrieval Using Optimum Fuzzy Weight Value Dong-Woo Kim, Young-Jun Song, Un-Dong Chang, Jae-Hyeong Ahn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 676 An Adaptive MRF-MAP Motion Vector Recovery Algorithm for Video Error Concealment Zheng-fang Li, Zhi-liang Xu, De-lu Zeng . . . . . . . . . . . . . . . . . . . . . . . . . . 683 An Efficient Segmentation Algorithm Based on Mathematical Morphology and Improved Watershed Ge Guo, Xijian Ping, Dongchuan Hu, Juanqi Yang . . . . . . . . . . . . . . . . 689 An Error Concealment Based on Inter-frame Information for Video Transmission Youjun Xiang, Zhengfang Li, Zhiliang Xu . . . . . . . . . . . . . . . . . . . . . . . . . 696 An Integration of Topographic Scheme and Nonlinear Diffusion Filtering Scheme for Fingerprint Binarization Xuying Zhao, Yangsheng Wang, Zhongchao Shi, Xiaolong Zheng . . . . . 702 An Intrusion Detection Model Based on the Maximum Likelihood Short System Call Sequence Chunfu Jia, Anming Zhong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 709 Analysis of Shell Texture Feature of Coscinodiscus Based on Fractal Feature Guangrong Ji, Chen Feng, Shugang Dong, Lijian Zhou, Rui Nian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715 Associative Classification Approach for Diagnosing Cardiovascular Disease Kiyong Noh, Heon Gyu Lee, Ho-Sun Shon, Bum Ju Lee, Keun Ho Ryu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721 Attentive Person Selection for Human-Robot Interaction Diane Rurangirwa Uwamahoro, Mun-Ho Jeong, Bum-Jae You, Jong-Eun Ha, Dong-Joong Kang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 728


Basal Cell Carcinoma Detection by Classification of Confocal Raman Spectra Seong-Joon Baek, Aaron Park . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735 Blind Signal-to-Noise Ratio Estimation Algorithm with Small Samples for Wireless Digital Communications Dan Wu, Xuemai Gu, Qing Guo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741 Bootstrapping Stochastic Annealing EM Algorithm for Multiscale Segmentation of SAR Imagery Xian-Bin Wen, Zheng Tian, Hua Zhang . . . . . . . . . . . . . . . . . . . . . . . . . . 749 BP Neural Network Based SubPixel Mapping Method Liguo Wang, Ye Zhang, Jiao Li . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755 Cellular Recognition for Species of Phytoplankton Via Statistical Spatial Analysis Guangrong Ji, Rui Nian, Shiming Yang, Lijian Zhou, Chen Feng . . . . 761 Combination of Linear Support Vector Machines and Linear Spectral Mixed Model for Spectral Unmixing Liguo Wang, Ye Zhang, Chunhui Zhao . . . . . . . . . . . . . . . . . . . . . . . . . . . 767 Combining Speech Enhancement with Feature Post-processing for Robust Speech Recognition Jianjun Lei, Jun Guo, Gang Liu, Jian Wang, Xiangfei Nie, Zhen Yang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 773 Conic Section Function Neural Networks for Sonar Target Classification and Performance Evaluation Using ROC Analysis Burcu Erkmen, Tulay Yildirim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 779 3D Map Building for Mobile Robots Using a 3D Laser Range Finder Zhiyu Xiang, Wenhui Zhou . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 785 Construction of Fast and Robust N-FINDR Algorithm Liguo Wang, Xiuping Jia, Ye Zhang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791 Dental Plaque Quantification Using Cellular Neural Network-Based Image Segmentation Jiayin Kang, Xiao Li, Qingxian Luan, Jinzhu Liu, Lequan Min . . . . . . 797 Detection of Microcalcifications Using Wavelet-Based Thresholding and Filling Dilation Weidong Xu, Zanchao Zhang, Shunren Xia, Huilong Duan . . . . . . . . . . 803


ECG Compression by Optimized Quantization of Wavelet Coefficients Jianhua Chen, Miao Yang, Yufeng Zhang, Xinling Shi . . . . . . . . . . . . . . 809 Effects on Density Resolution of CT Image Caused by Nonstationary Axis of Rotation Yunxiao Wang, Xin Wang, Xiaoxin Guo, Yunjie Pang . . . . . . . . . . . . . 815 Embedded Linux Remote Control System to Achieve the Stereo Image Cheol-Hong Moon, Kap-Sung Kim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 821 Estimation of Omnidirectional Camera Model with One Parametric Projection Yongho Hwang, Hyunki Hong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 827 Expert Knowledge Guided Genetic Algorithm for Beam Angle Optimization Problem in Intensity-Modulated Radiotherapy Planning Yongjie Li, Dezhong Yao . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 834 Extracting Structural Damage Features: Comparison Between PCA and ICA Luo Zhong, Huazhu Song, Bo Han . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 840 Face Alignment Using an Improved Active Shape Model Zhenhai Ji, Wenming Zheng, Ning Sun, Cairong Zou, Li Zhao . . . . . . 846 Face Detection with an Adaptive Skin Color Segmentation and Eye Features Hang-Bong Kang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 852 Fall Detection by Wearable Sensor and One-Class SVM Algorithm Tong Zhang, Jue Wang, Liang Xu, Ping Liu . . . . . . . . . . . . . . . . . . . . . . 858 Feature Extraction and Pattern Classification on Mining Electroencephalography Data for Brain-Computer Interface Qingbao Liu, Zongtan Zhou, Yang Liu, Dewen Hu . . . . . . . . . . . . . . . . . 864 Feature Extraction of Hand-Vein Patterns Based on Ridgelet Transform and Local Interconnection Structure Neural Network Yu Zhang, Xiao Han, Si-liang Ma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 870 Fuzzy Support Vector Machines for Automatic Infant Cry Recognition Sandra E. Barajas-Montiel, Carlos A. Reyes-Garc´ıa . . . . . . . . . . . . . . . . 876 Geodesic Gabriel Graph Based Supervised Nonlinear Manifold Learning Huajie Chen, Wei Wei . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 882


Grouping Sampling Reduction-Based Linear Discriminant Analysis Yan Wu, Li Dai . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 888 Hierarchical Adult Image Rating System Wonil Kim, Han-Ku Lee, Kyoungro Yoon . . . . . . . . . . . . . . . . . . . . . . . . . 894 Shape Representation Based on Polar-Graph Spectra Haifeng Zhao, Min Kong, Bin Luo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 900 Hybrid Model Method for Automatic Segmentation of Mandarin TTS Corpus Xiaoliang Yuan, Yuan Dong, Dezhi Huang, Jun Guo, Haila Wang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 906 ICIS: A Novel Coin Identification System Adnan Khashman, Boran Sekeroglu, Kamil Dimililer . . . . . . . . . . . . . . . 913 Image Enhancement Method for Crystal Identification in Crystal Size Distribution Measurement Wei Liu, YuHong Zhao . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 919 Image Magnification Using Geometric Structure Reconstruction Wenze Shao, Zhihui Wei . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 925 Image-Based Classification for Automating Protein Crystal Identification Xi Yang, Weidong Chen, Yuan F. Zheng, Tao Jiang . . . . . . . . . . . . . . . 932 Inherit-Based Adaptive Frame Selection for Fast Multi-frame Motion Estimation in H.264 Liangbao Jiao, De Zhang, Houjie Bi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 938 Intelligent Analysis of Anatomical Shape Using Multi-sensory Interface Jeong-Sik Kim, Hyun-Joong Kim, Soo-Mi Choi . . . . . . . . . . . . . . . . . . . . 945 Modeling Expressive Music Performance in Bassoon Audio Recordings Rafael Ramirez, Emilia Gomez, Veronica Vicente, Montserrat Puiggros, Amaury Hazan, Esteban Maestre . . . . . . . . . . . . . 951 Modeling MPEG-4 VBR Video Traffic by Using ANFIS Zhijun Fang, Shenghua Xu, Changxuan Wan, Zhengyou Wang, Shiqian Wu, Weiming Zeng . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 958 Multiple Textural Features Based Palmprint Authentication Xiangqian Wu, Kuanquan Wang, David Zhang . . . . . . . . . . . . . . . . . . . . 964


Neural Network Deinterlacing Using Multiple Fields Hyunsoo Choi, Eunjae Lee, Chulhee Lee . . . . . . . . . . . . . . . . . . . . . . . . . . 970 Non-stationary Movement Analysis Using Wavelet Transform Cheol-Ki Kim, Hwa-Sei Lee, DoHoon Lee . . . . . . . . . . . . . . . . . . . . . . . . . 976 Novel Fault Class Detection Based on Novelty Detection Methods Jiafan Zhang, Qinghua Yan, Yonglin Zhang, Zhichu Huang . . . . . . . . . 982 Novel Scheme for Automatic Video Object Segmentation and Tracking in MPEG-2 Compressed Domain Zhong-Jie Zhu, Yu-Er Wang, Zeng-Nian Zhang, Gang-Yi Jiang . . . . . . 988 Offline Chinese Signature Verification Based on Segmentation and RBFNN Classifier Zhenhua Wu, Xiaosu Chen, Daoju Xiao . . . . . . . . . . . . . . . . . . . . . . . . . . 995 On-Line Signature Verification Based on Wavelet Transform to Extract Characteristic Points LiPing Zhang, ZhongCheng Wu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1002 Parameter Estimation of Multicomponent Polynomial Phase Signals Han-ling Zhang, Qing-yun Liu, Zhi-shun Li . . . . . . . . . . . . . . . . . . . . . . . 1008 Parameters Estimation of Multi-sine Signals Based on Genetic Algorithms Changzhe Song, Guixi Liu, Di Zhao . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1013 Fast Vision-Based Camera Tracking for Augmented Environments Bum-Jong Lee, Jong-Seung Park . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1018 Recognition of 3D Objects from a Sequence of Images Daesik Jang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1024 Reconstruction of Rectangular Plane in 3D Space Using Determination of Non-vertical Lines from Hyperboloidal Projection Hyun-Deok Kang, Kang-Hyun Jo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1030 Region-Based Fuzzy Shock Filter with Anisotropic Diffusion for Adaptive Image Enhancement Shujun Fu, Qiuqi Ruan, Wenqia Wang, Jingnian Chen . . . . . . . . . . . . . 1036 Robust Feature Detection Using 2D Wavelet Transform Under Low Light Environment Jihoon Lee, Youngouk Kim, Changwoo Park, Changhan Park, Joonki Paik . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1042


Robust Music Information Retrieval in Mobile Environment Won-Jung Yoon, Kyu-Sik Park . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1051 Robust Speech Feature Extraction Based on Dynamic Minimum Subband Spectral Subtraction Xin Ma, Weidong Zhou, Fang Ju . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1056 Searching Algorithm for Shadow Areas Using Correlation in Fourier Domain and Its Application Choong Ho Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1062 Shadow Detection Based on rgb Color Model Baisheng Chen, Duansheng Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1068 Shape Analysis for Planar Barefoot Impression Li Tong, Lei Li, Xijian Ping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1075 Statistical Neural Network Based Classifiers for Letter Recognition Burcu Erkmen, Tulay Yildirim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1081 The Study of Character Recognition Based on Fuzzy Support Vector Machine Yongjun Ma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1087 Tracking, Record, and Analysis System of Animal’s Motion for the Clinic Experiment Jae-Hyuk Han, Young-Jun Song, Dong-Jin Kwon, Jae-Hyeong Ahn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1093 VEP Estimation with Feature Enhancement by Whiten Filter for Brain Computer Interface Jin-an Guan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1101 Weight Estimation for Audio-Visual Multi-level Fusion in Bimodal Speaker Identification Zhiyong Wu, Lianhong Cai, Helen M. Meng . . . . . . . . . . . . . . . . . . . . . . . 1107

Special Session on Computing for Searching Strategies to Control Dynamic Processes

A Study on Optimal Configuration for the Mobile Manipulator Considering the Minimal Movement Jin-Gu Kang, Kwan-Houng Lee, Jane-Jin Kim . . . . . . . . . . . . . . 1113


Multi-objective Flow Shop Scheduling Using Differential Evolution Bin Qian, Ling Wang, De-Xian Huang, Xiong Wang . . . . . . . . . . . . . . . 1125 A Genetic Algorithm for the Batch Scheduling with Sequence-Dependent Setup Times TsiuShuang Chen, Lei Long, Richard Y.K. Fung . . . . . . . . . . . . . . . . . . . 1137 A Study on the Configuration Control of a Mobile Manipulator Base Upon the Optimal Cost Function Kwan-Houng Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1145 An Effective PSO-Based Memetic Algorithm for TSP Bo Liu, Ling Wang, Yi-hui Jin, De-xian Huang . . . . . . . . . . . . . . . . . . . 1151 Dual-Mode Control Algorithm for Wiener-Typed Nonlinear Systems Haitao Zhang, Yongji Wang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1157 NDP Methods for Multi-chain MDPs Hao Tang, Lei Zhou, Arai Tamio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1163 Research of an Omniberaing Sun Locating Method with Fisheye Picture Based on Transform Domain Algorithm Xi-hui Wang, Jian-ping Wang, Chong-wei Zhang . . . . . . . . . . . . . . . . . . 1169 Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1175

An 802.11-Based Location Determination Approach for Context-Aware System

Chun-Dong Wang 1,2, Ming Gao 2, and Xiu-Feng Wang 1

1 College of Information Technical Science, Nankai University, Tianjin 300071, China
[emailprotected], [emailprotected]
2 Department of Computer Science & Engineering, Tianjin University of Technology, Tianjin 300191, China
{Michael3769, Ten_minutes}@163.com

Abstract. WLAN location determination systems are gaining increasing attention due to the value they add to wireless networks. This paper focuses on how to determine a mobile device’s indoor location using signal strength (SS) in an 802.11-based system. We propose an 802.11-based location determination technique, Nearest Neighbor in Signal Space (NNSS), which locates mobile objects via the sensed power strengths. Based on NNSS, we present a modification, Modified Nearest Neighbor in Signal Space (MNNSS), which enhances location determination accuracy by taking into account the signal strength of more reference points when estimating the location of a mobile object. In NNSS, we compare the measured SS with the SS of each reference point recorded in a database to find the best match; in MNNSS, we compare the measured SS not only with that of each reference point but also with that of the reference points around it, which increases location determination precision. The experimental results show that the location information provided by MNNSS assures higher correctness than NNSS. Implementation of this technique in the WLAN location determination system shows that the average system accuracy is improved by more than 0.5 meters. This significant enhancement in the accuracy of WLAN location determination systems helps increase the set of context-aware applications implemented on top of these systems.

1 Introduction

With the development of wireless networks, many techniques and applications for location determination [1-3], especially context-aware applications [4], have been put forward. According to current research on location determination approaches, indoor location determination suffers from lower accuracy and higher complexity because of the influence of barriers and other factors, and is therefore more difficult. The communication system [5] or the Global Positioning System (GPS) [6,7] is usually used to provide location information in outdoor location determination. GPS is a widely used technique in which several satellites are used to position objects. For indoor location determination, however, GPS is not an appropriate technique because of its larger standard error, and indoor barriers may block its signal.


Another approach for outdoor location determination is the cellular system, which has a similar disadvantage: it is less accurate and easily influenced by barriers. It is therefore difficult to position objects indoors using GPS, and it is necessary to develop an indoor positioning system with higher accuracy. Recently many indoor location determination techniques have emerged, for example the Received Signal Strength (RSS) method [8-10]. We favor the RSS method, and this paper builds on it. The remainder of this paper is organized as follows. In Section 2, we classify location determination approaches into three categories and introduce the main idea of each category. In Section 3, we propose the MNNSS algorithm. The experiments and comparisons are described in Section 4, and the conclusion is drawn in Section 5.

2 Related Work

2.1 Location Determination Approach

The location determination approaches used in mobile computing systems can be classified into three categories. The first category applies the Time of Arrival (TOA) or Time Difference of Arrival (TDOA) schemes to locate mobile terminals. The principle of TOA and TDOA is to estimate the distance between the receiver and each sender according to the traveling time from the senders to the receiver, and then calculate the position of the receiver with the help of the known positions of three senders. The second category applies the Angle of Arrival (AOA) scheme to locate mobile terminals. The principle of AOA is to estimate the angle of the arriving signal and then calculate the position of the sender from the known positions of the receivers and the angles of the arriving signals detected by each receiver. The last category utilizes the attenuation of the Received Signal Strength (RSS) of nearby senders to locate mobile terminals.

Each category of approaches has its advantages and disadvantages. Although TOA and TDOA can produce more accurate location determination results, these technologies often require the senders to be equipped with extremely accurate synchronized timers. Besides, the distance between the senders should be large enough to ensure that the differences in signal arrival times are distinguishable. These constraints make TOA and TDOA inappropriate for indoor location determination. On the other hand, the AOA approach requires the receiver to be able to detect the direction of arriving signals, which requires the access point (AP) to be equipped with extra components such as smart antennas. Besides, reflections caused by indoor features such as walls and pillars often lead to inaccurate location determination results.
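To make the TOA principle above concrete, the following minimal sketch (our illustration, not taken from the paper) turns three TOA-derived distances into a 2D position estimate; subtracting the first range equation from the others removes the quadratic unknown, so an ordinary least-squares solve suffices. All identifiers are hypothetical.

```python
import numpy as np

def toa_position(senders, distances):
    """Estimate a 2D receiver position from TOA-derived distances to
    senders with known positions (three or more senders).

    Subtracting the first range equation |x - s_0|^2 = d_0^2 from each
    |x - s_i|^2 = d_i^2 cancels |x|^2 and leaves the linear system A x = b.
    """
    senders = np.asarray(senders, dtype=float)
    d = np.asarray(distances, dtype=float)
    A = 2.0 * (senders[1:] - senders[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + np.sum(senders[1:] ** 2, axis=1) - np.sum(senders[0] ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Noiseless check: three senders, distances measured from the point (2, 3).
senders = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
target = np.array([2.0, 3.0])
distances = np.linalg.norm(senders - target, axis=1)
print(toa_position(senders, distances))  # -> approximately [2. 3.]
```

TDOA replaces the absolute distances with distance differences but leads to a similar linear system; the tight synchronization requirement noted above is what makes the distance values trustworthy in the first place.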

2.2 WLAN Location Determination Systems

As 802.11-based wireless LANs become more ubiquitous, the importance of WLAN location determination systems [11-14,16-19] increases. Such systems are purely software-based and therefore add to the value of the wireless network. A large class of applications, including [15] location-sensitive content delivery, direction finding, asset tracking, and emergency notification, can be built on top of such systems. This set of applications can be broadened as the accuracy of WLAN location determination systems increases.

WLAN location determination systems usually work in two phases: an offline training phase and an online location determination phase. During the offline phase, the signal strength received from the access points (APs) at selected locations in the area of interest is tabulated, resulting in a so-called radio map. During the location determination phase, the signal strength samples received from the access points are used to “search” the radio map to estimate the user location. Radio-map based techniques can be categorized into two broad categories: deterministic techniques and probabilistic techniques. Deterministic techniques [11,12,17] represent the signal strength of an access point at a location by a scalar value, for example the mean value, and use non-probabilistic approaches to estimate the user location. For example, in the Radar system [11,12] the authors use nearest-neighborhood techniques to infer the user location. On the other hand, probabilistic techniques [13,14,18,19] store information about the signal strength distributions from the access points in the radio map and use probabilistic techniques to estimate the user location. For example, the Horus system uses a Bayesian-based approach to estimate the user location. Youssef et al. (2005) use a multivariate analysis for a probabilistic approach to estimate the user location. WLAN location determination systems need to deal with the noisy characteristics of the wireless channel to achieve higher accuracy.

In this paper, we use the RSS approach. The advantage of the RSS approach is that it can be easily applied: we can obtain the signal strength of the access points at the mobile terminal in networks that support the 802.11 protocol. If we can locate objects with this information, this category of approach is surely the most cost-efficient one. The disadvantage of the RSS approach is that the environment can easily influence the signal strength, which is more serious indoors.
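As a concrete illustration of the offline training phase described above, the sketch below (ours, not the paper’s implementation) builds a deterministic radio map by averaging the signal-strength samples collected at each reference point; a probabilistic system would store a per-AP distribution, for example a histogram of samples, instead of the mean. All identifiers are hypothetical.

```python
from collections import defaultdict

def build_radio_map(offline_samples):
    """Offline phase: tabulate the mean SS per (location, AP) pair.

    offline_samples: iterable of (location, {ap_id: ss_dBm}) pairs,
    typically with many samples per surveyed location.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for loc, reading in offline_samples:
        for ap, ss in reading.items():
            sums[loc][ap] += ss
            counts[loc][ap] += 1
    # Deterministic radio map: one scalar (the mean SS) per AP per location.
    return {loc: {ap: sums[loc][ap] / counts[loc][ap] for ap in aps}
            for loc, aps in sums.items()}

samples = [
    ("room601", {"ap1": -40.0, "ap2": -62.0, "ap3": -71.0}),
    ("room601", {"ap1": -42.0, "ap2": -60.0, "ap3": -73.0}),
    ("lobby",   {"ap1": -55.0, "ap2": -48.0, "ap3": -66.0}),
]
radio_map = build_radio_map(samples)
print(radio_map["room601"])  # {'ap1': -41.0, 'ap2': -61.0, 'ap3': -72.0}
```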

3 Modified Nearest Neighbor(s) in Signal Space Algorithm

3.1 Nearest Neighbor(s) in Signal Space (NNSS) Algorithm

NNSS is a kind of RSS approach [8]. We first maintain the power signature of a set of positions. For position i, we define (s_i(a_1), s_i(a_2), ..., s_i(a_n)) as the power signature, where s_i(a_j) denotes the signal strength (SS) received from access point a_j at position i, and n is the number of APs. A position whose power signature is maintained in the database is called a reference point. We define (s_i'(a_1), s_i'(a_2), ..., s_i'(a_n)) as the actually measured power signature, where s_i'(a_j) denotes the SS that a mobile terminal currently receives from access point a_j at position i. We then compare the power signature measured by the mobile terminal with the data recorded in the database and estimate the position of the mobile terminal.

When we estimate the mobile terminal’s position, we determine the location that best matches the observed SS of the mobile terminal. We need a metric and a search methodology to compare multiple locations and pick the one that best matches the observed signal strength. The idea is to compute the distance (in signal space)


between the observed set of SS measurements and the recorded SS at the reference points, and then pick the location that minimizes the distance. We can use the Euclidean distance measure, i.e.,

E_d = \sqrt{ \sum_{j=1}^{n} \big( s_i'(a_j) - s_i(a_j) \big)^2 } .  (1)
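To make the matching step concrete, the following sketch implements the NNSS lookup in Python. It is a minimal illustration under assumed inputs, not the authors' code; the radio-map contents and AP names are hypothetical placeholders.

import math

# Hypothetical radio map: reference point -> power signature,
# i.e., recorded signal strength (dBm) per access point.
radio_map = {
    "P1": {"ap1": -48.0, "ap2": -62.5, "ap3": -71.0},
    "P2": {"ap1": -55.0, "ap2": -58.0, "ap3": -66.5},
    "P3": {"ap1": -60.5, "ap2": -52.0, "ap3": -74.0},
}

def signal_distance(measured, signature):
    """Euclidean distance in signal space, Eq. (1)."""
    return math.sqrt(sum((measured[ap] - ss) ** 2
                         for ap, ss in signature.items()))

def nnss(measured):
    """Return the reference point whose signature is nearest."""
    return min(radio_map, key=lambda p: signal_distance(measured, radio_map[p]))

print(nnss({"ap1": -50.0, "ap2": -61.0, "ap3": -70.0}))  # -> "P1"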

3.2 The Principle of MNNSS

To illustrate our method, we assume the situation shown in Fig. 1.

Fig. 1. The 8 reference points around reference point m; s1–s8 denote the SS of the 8 neighbors of m. When we estimate whether a position is the nearest neighbor of the mobile terminal, we should consider not only the SS of the center point m, but also the SS of the reference points around that position.

In MNNSS, we define l layers around each reference point. We calculate the Euclidean distance for the reference points of each layer separately, average these results with weight values, and then use this weighted average to estimate the position. When we evaluate reference point i, we must calculate the Euclidean distance of each layer around it. For reference point i, i(u,v) denotes neighbor v in layer u of reference point i. In layer 1 there is only one reference point (i.e., i itself), and we calculate the Euclidean distance S1(i) of layer 1 according to the approach described in the NNSS algorithm:

S_1(i) = \sqrt{ \sum_{j=1}^{n} \big( s_i'(a_j) - s_i(a_j) \big)^2 } .  (2)

Analogously, in layer u (u ≥ 2),

S_u(i) = \frac{1}{8(u-1)} \sum_{v=1}^{8(u-1)} \sqrt{ \sum_{j=1}^{n} \big( s_i'(a_j) - s_{i(u,v)}(a_j) \big)^2 } .  (3)


S_u(i) denotes the average Euclidean distance in layer u around reference point i. As mentioned before, sometimes we cannot measure the signal strength at a particular position, so the actual number of reference points in layer u may be less than 8(u−1). In that case, we should replace 8(u−1) with the actual number in formula (3).

Fig. 2. The layers around reference point O. We define 3 layers. Layer 1 is the position O itself. In layer 2, there are 8 reference points around position O, and there are 16 reference points in layer 3. Analogously, layer u has 8(u−1) reference points. But sometimes we cannot measure the signal strength at a particular position, so the actual number of reference points in layer u may be less than 8(u−1); thus, we must replace 8(u−1) with the actual number in the above formula.

Then we define

S(i) = \frac{1}{n} \sum_{u=1}^{n} w_u S_u(i) ,  (4)

in which \sum_{u=1}^{n} w_u = 1. Here w_u is the weight value; it denotes how important layer u is in estimating the result, and n denotes the number of layers. We can use different sequences of w_u in different applications, but obviously w_u must be a decreasing sequence, since the layers near the center should play a more important role in the calculation. We then choose the position with the minimum S(i) as the nearest neighbor of the mobile terminal.
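The layered scoring of Eqs. (2)–(4) can be sketched as follows. This is an illustrative Python fragment, not the authors' implementation; the grid layout, the layer enumeration by Chebyshev rings, and the two-layer weights (1, 0.5), as used in the experiments below, are assumptions.

import math

def layer_neighbors(grid, i, u):
    """Reference points on the ring of Chebyshev radius (u-1) around point i.

    `grid` maps integer coordinates (x, y) to power signatures; layer 1
    is the point itself, layer u has at most 8(u-1) neighbors.
    """
    x, y = i
    if u == 1:
        return [i]
    r = u - 1
    ring = [(x + dx, y + dy) for dx in range(-r, r + 1)
            for dy in range(-r, r + 1) if max(abs(dx), abs(dy)) == r]
    return [p for p in ring if p in grid]  # drop unmeasurable positions

def signal_distance(measured, signature):
    return math.sqrt(sum((measured[ap] - ss) ** 2 for ap, ss in signature.items()))

def mnnss_score(grid, measured, i, weights=(1.0, 0.5)):
    """Weighted average of per-layer mean distances, Eqs. (2)-(4)."""
    total = 0.0
    for u, w in enumerate(weights, start=1):
        pts = layer_neighbors(grid, i, u)
        if pts:  # use the actual neighbor count, as noted for Eq. (3)
            s_u = sum(signal_distance(measured, grid[p]) for p in pts) / len(pts)
            total += w * s_u
    return total / len(weights)

def mnnss(grid, measured):
    return min(grid, key=lambda i: mnnss_score(grid, measured, i))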

4 Experiments and Results

The experiments were carried out on the sixth floor of the Dept. of Computer Science, Tianjin University of Technology. The client is implemented on an Intel-processor laptop with a wireless network card. The OS of the laptop is Red Flag Linux 4.0.


We placed 2 APs on the fifth floor and one AP on the sixth floor. We classified the whole area into 11 regions: rooms 601–609, the corridor, and the lobby. Fig. 3 shows the layout of the sixth floor of the Dept. of Computer Science, Tianjin University of Technology.

Fig. 3. The layout of the sixth floor of Dept of Computer Science, Tianjin University of Technology

The performance of the location determination algorithm was measured using two metrics: the error distance and the correctness of region classification. The error distance was defined as the spatial distance between the original position and the position calculated by the location determination system.

Table 1. The error distance (m) in each position (NNSS)

Testing point  Error Distance  Testing point  Error Distance
P1   2.45   P12  0.98
P2   2.13   P13  1.14
P3   1.89   P14  2.51
P4   1.78   P15  2.21
P5   1.57   P16  3.20
P6   3.65   P17  1.29
P7   3.22   P18  2.43
P8   2.33   P19  1.34
P9   0.78   P20  1.56
P10  2.45   P21  2.34
P11  1.78   P22  3.21

In our experiment, we defined 178 reference points in the whole area. At each reference point, we measured the power signature 100 times and stored the average. Then we selected 22 testing positions over the whole region (2 points per region) and recorded their actual locations and received signal strengths. We first used NNSS to position the mobile terminal. NNSS classified all testing positions into their regions correctly except for 2 positions, and the mean error distance was 2.10 meters. Table 1 shows the error distance at each position. Then we used MNNSS to position the mobile terminal. For each comparison, we used 2 layers and set the weight values w1 = 1, w2 = 0.5. MNNSS corrected the two errors and classified all testing positions correctly. The mean error distance was 1.47 meters. Table 2 shows the error distance at each position.


Table 2. The error distance in each position (MNNSS)

Testing point  Error Distance  Testing point  Error Distance
P1   1.25   P12  0.98
P2   2.13   P13  1.14
P3   1.43   P14  2.51
P4   1.78   P15  1.38
P5   1.57   P16  0.75
P6   0.59   P17  1.29
P7   1.35   P18  2.43
P8   2.33   P19  1.34
P9   0.98   P20  1.56
P10  1.23   P21  2.34
P11  1.78   P22  0.26

5 Conclusions

This paper discussed how to use MNNSS to determine an object’s location in an 802.11-based location determination system. We assume that all data needed for location determination have been acquired in advance, and we did not discuss how to deploy the APs or how to select the reference points; the emphasis is on the location determination algorithm. NNSS can position objects using only the received signal strength measured by each mobile terminal. This algorithm is cost-efficient, and no significant modification is needed to the communication devices of the mobile terminals or the service provider. MNNSS is a modification of NNSS. In MNNSS, the disturbance of incidental events on the location determination results can be reduced by using more reference points in each comparison. Further research can focus on the following aspects. Using more layers in the calculation can improve location determination accuracy, but the cost of calculation will increase; how to balance them is a direction for further research. We are sure that the weight values used in averaging the Euclidean distances must be decreasing, but how to define them is still an important problem. If the sequence decreases too sharply, the advantage of MNNSS is weakened and MNNSS differs little from NNSS. If it decreases too slowly, the outer layers become as important as the central layers, which is not reasonable.

Acknowledgements

This work was supported by the Tianjin Municipal Education Commission under the Higher Education Institution Science and Technology Development Fund (No. 20041615).


References

1. Bahl, P., Balachandran, A., Padmanabhan, V.N.: Enhancements to the RADAR User Location and Tracking System. Microsoft Research Technical Report, February (2000)
2. Orr, R.J., Abowd, G.D.: The Smart Floor: A Mechanism for Natural User Identification and Tracking. Proceedings of the 2000 Conference on Human Factors in Computing Systems (CHI 2000), The Hague, Netherlands (2000) 1-6
3. Priyantha, N.B., Chakraborty, A., Balakrishnan, H.: The Cricket Location-Support System. Proc. 6th ACM MOBICOM, Boston, MA (2000) 32-43
4. Mitchell, S., et al.: Context-Aware Multimedia Computing in the Intelligent Hospital. (2000) 13-18
5. Liu, T., Bahl, P.: Mobility Modeling, Location Tracking, and Trajectory Prediction in Wireless ATM Networks. IEEE JSAC, Vol. 16 (1998) 922-936
6. Enge, P., Misra, P.: Special Issue on GPS: The Global Positioning System. Proceedings of the IEEE, 87 (1999) 3-15
7. Garmin Corp.: About GPS. Website, 2001, http://www.garmin.com/aboutGPS/
8. Bahl, P., Padmanabhan, V.N.: RADAR: An RF-Based In-Building User Location and Tracking System. Proc. IEEE Infocom (2000) 236-241
9. Jin, M.H., Wu, E.H.K., Liao, Y.B., Liao, H.C.: 802.11-based Positioning System for Context Aware Applications. Proceedings of Communication Systems and Applications (2004) 236-239
10. Ni, L.M., Liu, Y.H., Lau, Y.C., Patil, A.P.: LANDMARC: Indoor Location Sensing Using Active RFID. Proceedings of the First IEEE International Conference on Pervasive Computing and Communications (PerCom'03) (2003) 239-249
11. Bahl, P., Padmanabhan, V.N.: RADAR: An In-Building RF-based User Location and Tracking System. IEEE Infocom 2000, Vol. 2 (2000) 775-784
12. Bahl, P., Padmanabhan, V.N., Balachandran, A.: Enhancements to the RADAR User Location and Tracking System. Technical Report MSR-TR-00-12, Microsoft Research (2000)
13. Castro, P., Chiu, P., Kremenek, T., Muntz, R.: A Probabilistic Location Service for Wireless Network Environments. Ubiquitous Computing 2001, September (2001)
14. Castro, P., Muntz, R.: Managing Context for Smart Spaces. IEEE Personal Communications (2000) 412-421
15. Chen, G., Kotz, D.: A Survey of Context-Aware Mobile Computing Research. Dartmouth Computer Science Technical Report TR2000-381 (2000)
16. Ganu, S., Krishnakumar, A.S., Krishnan, P.: Infrastructure-based Location Estimation in WLAN Networks. IEEE Wireless Communications and Networking Conference, March (2004) 236-243
17. Krishnan, P., Krishnakumar, A., Ju, W.H., Mallows, C., Ganu, S.: A System for LEASE: Location Estimation Assisted by Stationary Emitters for Indoor RF Wireless Networks. IEEE Infocom, March (2004) 39-42
18. Ladd, A.M., Bekris, K., Rudys, A., Marceau, G., Kavraki, L.E., Wallach, D.S.: Robotics-Based Location Sensing Using Wireless Ethernet. 8th ACM MOBICOM, Atlanta, GA, September (2002) 69-72
19. Roos, T., Myllymaki, P., Tirri, H.: A Statistical Modeling Approach to Location Estimation. IEEE Transactions on Mobile Computing, Vol. 1 (2002) 59-69

A Face Recognition System on Distributed Evolutionary Computing Using On-Line GA

Nam Mi Young, Md. Rezaul Bashar, and Phill Kyu Rhee

Dept. of Computer Science & Engineering, Inha University, 253 Yong-Hyun Dong, Nam-Gu, Incheon, South Korea
{rera, bashar}@im.inha.ac.kr, [emailprotected]

Abstract. Although there has been much research on face recognition, limitations remain, especially under varying illumination and pose. This paper addresses a novel framework to overcome the illumination barrier and to build a robust vision system. The key ideas of this paper are distributed evolutionary computing and on-line GA, which combine the concepts of context-awareness and genetic algorithms. This research employs Fuzzy ART, which carries out context awareness, modeling, and identification of the context environment, so that the system can also distinguish changing environments. On-line GA stores experience as context knowledge that is used for on-line adaptation. Finally, supervised learning is applied to carry out recognition experiments. Experimental results on the FERET data set show that On-line GA based face recognition significantly outperforms the application of the existing GA classification.

1 Introduction

For high-security purposes, biometric technologies have been developed, and face recognition is a basic and elementary step in realizing these technologies. To build a high-security application, the system must be robust, tolerant, and as error-free as possible in both accuracy and efficiency. The increasing use of biometric technologies in high-security applications and beyond has stressed the requirement for highly dependable face recognition systems. Face recognition specialists are concentrating on making recognition systems more error-free. A survey [1] shows that the accuracy of state-of-the-art algorithms is fairly high under constrained conditions, but degrades significantly for images exhibiting pose, illumination, and facial expression variations. Current research on face recognition strives to achieve insensitivity to such variations along three main directions [2]: (a) introduction of new classification techniques and similarity measurement analysis, (b) compensation of appearance variations, and (c) reinforcement of existing systems with additional modalities that are insensitive to these variations. Knowledge or experience plays a vital role in the accuracy and efficiency of a recognition system: a robust system should recognize an object within a minimal time period. There is currently tremendous research interest in the optimization of


execution time for functions having a large number of variables, which is referred to as evolutionary computation [3]. Genetic Algorithms (GAs) [4, 5, 6], with their strong capability for learning and encoding knowledge in terms of biological chromosomes, create a human-like brain for recognizing objects. However, producing such a “brain” takes a lot of time. To overcome this limitation, this paper proposes the concepts of distributed evolutionary computing and on-line GA (OLGA). The main focus of this paper is to exploit an on-line genetic algorithm for on-line evolution under illumination variations, and distributed evolutionary computing to make the recognition scheme robust with very low time cost, especially in new environments. The efficiency and robustness of the proposed system is demonstrated on a standard face dataset (FERET) of significant size and is compared with state-of-the-art compensation techniques. In the following sections we review related work in this field by different researchers, highlighting the novelties of the proposed work. Previous work related to the illumination barrier is discussed in Section 2, while evolutionary computing is briefly described in Section 3. Section 4 describes the experimental environment and Section 5 illustrates a design example. The performance of the algorithms is evaluated with extensive experiments in Section 6, and Section 7 gives the concluding remarks.

2 Previous Work

An image-capturing device captures a quality image in the presence of light: with an adequate amount of light, pictures are clear and recognition is robust. Much research [1, 7] has found that varying illumination seriously affects the performance of face recognition systems. At the same time, face recognition experts are taking more interest in the problem of coping with illumination variations, and significant progress has been achieved. Several techniques have been proposed in this area, which may be roughly classified into two main categories [2]. The first category contains techniques looking for illumination-insensitive representations of face images. In this category, different preprocessing and filtering techniques are used to remove the illumination variation. For example, Hong Liu et al. [8] proposed a multi-method integration (NMI) scheme using grey, log, and normal histogram techniques to compensate for variations of illumination. Jinho Lee et al. [9] generated an illumination subspace for arbitrary 3D faces based on the statistics of measured illuminations under variable lighting circumstances; in their experiments, bilinear illumination model and shape-specific illumination subspace techniques were employed and applied to the FRGC dataset. Marios Savvides et al. [10] presented an illumination-tolerant face recognition system using minimum average correlation energy (MACE) combined with PCA. Laiyun et al. [6] modeled a face recognition system with a relighting process and applied it to the CMU-PIE database. Phill Kyu Rhee et al. [11] developed a context-aware evolvable system with the concept of a basic genetic algorithm under dynamic and uneven environments. The second approach is based on the development of generative appearance models, like the active shape model and active appearance model, which are able to reconstruct novel gallery images similar to the illumination in the probe images.


In parallel to these efforts, computer graphics scientists have achieved significant progress in realistic image-based rendering and relighting of faces and in the estimation of the reflectance properties of faces [6]. This research has inspired computer vision work on illumination compensation. In the first approach proposed in this paper, since we try to relight the probe image so that it resembles the illumination in the gallery images, we propose preprocessing and retinex [11, 12] filtering methods to generate a convenient image. Fuzzy Adaptive Resonance Theory [12] is exploited to categorize objects under varying illumination.

3 Distributed Evolutionary Computing (DEC)

Over the last decade, there has been tremendous interest in the development of the theory and applications of evolutionary computing [3, 5] techniques, both in industry and in the laboratory. Evolutionary computing (EC) is the collection of algorithms based on the evolution of a population towards a solution of a certain problem. These algorithms have been exploited successfully in many applications requiring the optimization of a certain multidimensional function. The population of possible solutions evolves from one generation to the next, ultimately arriving at a satisfactory solution to the specified problem. These algorithms differ in the way a new population is generated from the existing one and in the way the members are represented within the algorithm. Three types [5] of evolutionary computing techniques are widely reported: genetic algorithms, genetic programming, and evolutionary algorithms (EAs). The EAs can be divided into evolutionary strategies (ES) and evolutionary programming. A Genetic Algorithm (GA) is a search technique used to find approximate solutions to optimization and search problems that relies on a linear representation of genetic material: genes, or genotypes [4, 5]. In a GA, a candidate solution for a specific problem is called an individual or a chromosome, made up of genes and represented by a binary string. To manipulate the genetic composition of a chromosome, GAs use three types of operators: selection, crossover, and mutation; a minimal sketch is given below. The term DEC refers to the technique where chromosomes reside at a distant place (e.g., on a server), not in the executing system. DEC makes the system more convenient and faster. The OLGA technique, a new idea related to GA, is also introduced to extend the existing GA and make it more efficient.
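As an illustration of the three operators just described, here is a minimal GA loop in Python. It is a generic sketch under assumed settings (a toy one-max fitness and a bit-string population), not the recognition system's actual encoding or objective.

import random

POP_SIZE, CHROM_LEN, GENERATIONS = 20, 16, 50
MUT_RATE = 0.01

def fitness(chrom):
    # Toy objective (one-max): count of 1-bits; a real system would
    # instead evaluate recognition performance of the encoded actions.
    return sum(chrom)

def select(pop):
    # Tournament selection: the fitter of two random individuals survives.
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    # One-point crossover exchanges tail segments between parents.
    cut = random.randrange(1, CHROM_LEN)
    return p1[:cut] + p2[cut:]

def mutate(chrom):
    # Flip each bit with a small probability.
    return [bit ^ 1 if random.random() < MUT_RATE else bit for bit in chrom]

pop = [[random.randint(0, 1) for _ in range(CHROM_LEN)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP_SIZE)]
print(max(pop, key=fitness))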

4 Proposed Experimental Environment

The proposed scheme works in two phases: categorizing the environmental context into clusters, and recognizing the individual objects within a cluster.

4.1 Environmental Context-Awareness

Environmental context-awareness is achieved by means of environmental context data, defined as any observable and relevant attributes and their interaction with other entities and/or the surrounding environment at an instance of time [11].


For identifying and categorizing environmental context data, Fuzzy Adaptive Resonance Theory (FART), a variation of the first-generation ART algorithm [12], is adopted. The first ART, named ART1, works with binary inputs, while FART is a synthesis of the ART algorithm and fuzzy operators that allows both binary and continuous input patterns. The image space of object instances with varying illuminations must be clustered properly so that the location error can be minimized. Thus the FART method, which shows robustness in subjective and ambiguous applications, is preferred for adaptation in order to achieve optimal illumination context clustering. The performance of clustering is improved by observing previously clustered data repeatedly. For example, if a dynamic environment has individual objects Obj = {O1, O2, …, On}, then the FART system produces clusters CLS1, CLS2, …, CLSm, where CLSi = {O1, O2, …, Oj}, j < n and CLSi ⊆ Obj.

4.2 On-Line Genetic Algorithm (OLGA)

The designed OLGA operates in two modes: the evolutionary mode and the action mode. In the evolutionary mode, it accumulates knowledge by exploring its application environments, while in the action mode it performs its designated task using the accumulated knowledge. For example, a system requires time t for the evolutionary mode and starts the action mode after time t. The evolutionary mode can use online or offline adaptation. For offline adaptation, the environmental context is categorized or identified according to some predefined characteristics (here, illumination) and a genetic algorithm is employed for learning. For online adaptation, when a new context is encountered, it directly interacts with the action mode. Whenever an application environment changes, the system accumulates and stores environmental context knowledge in terms of context categories and their corresponding actions. FART has an on-line learning capability that provides clustering for an on-line system in a dynamic environment: as with the usual procedure for separating environmental contexts, FART looks for an unknown type of cluster and, if it finds one, creates a new cluster (a compact sketch of this mechanism is given after this paragraph). In Fig. 1, the context category module (CCM) performs these operations. Initially, the system accumulates knowledge and stores it in the context knowledge (CK) base, which guarantees optimal performance for each identified context. The CK stores the expressions of identifiable contexts and their matched actions, which are performed by the adaptation module (AM), consisting of one or more action primitives, i.e., preprocessing, feature representation, etc. This knowledge is also stored on a server to build an extensive knowledge database, so that when a new context arrives for recognition, the executing system can share knowledge from the server. The matched or provided action can be decided either by experimental trial-and-error or by some automated procedure. At operation time, the context expression is determined from the derived context representation, where the derived context is decided from the context data. The evolution action module (EAM) searches for the best combining structure of action primitives for an identified context. These action primitives are stored in the CK with the corresponding context expression.
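The FART clustering just described can be sketched compactly. The following Fuzzy ART fragment is a generic textbook formulation under assumed parameter values (vigilance rho, choice alpha, learning beta); the paper does not report its settings.

import numpy as np

class FuzzyART:
    """Minimal Fuzzy ART: complement-coded inputs in [0, 1]^dim."""

    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho, self.alpha, self.beta = rho, alpha, beta
        self.w = []                     # one weight vector per cluster

    def _code(self, x):
        x = np.asarray(x, dtype=float)
        return np.concatenate([x, 1.0 - x])   # complement coding

    def learn(self, x):
        i = self._code(x)
        # Choice function T_j = |i ^ w_j| / (alpha + |w_j|), best first
        scores = [np.minimum(i, w).sum() / (self.alpha + w.sum()) for w in self.w]
        for j in np.argsort(scores)[::-1]:
            match = np.minimum(i, self.w[j]).sum() / i.sum()
            if match >= self.rho:       # vigilance test passed
                self.w[j] = self.beta * np.minimum(i, self.w[j]) \
                            + (1 - self.beta) * self.w[j]
                return j
        self.w.append(i.copy())         # no category matched: new cluster
        return len(self.w) - 1

An input that fails the vigilance test against every stored category opens a new cluster, which mirrors the on-line behavior described above.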


Fig. 1. On-line learning

OLGA works in two phases. First it performs off-line evolution to accumulate environmental context knowledge and store it in the CK; then it performs on-line evolution. During off-line evolution, the CCM categorizes the environmental context into clusters, and the EAM searches for the best population and, if found, updates the CK, as shown in Fig. 2.

Fig. 2. Off-line evolution

The adaptive task is carried out using the knowledge of the CK evolved in the evolutionary mode, and then the action mode is performed. For on-line evolution, when new context data are found, the system creates a new cluster and collects data, searches for a matching existing cluster, selects its action primitives if a match is found, and otherwise sends a request to the server to provide primitives; it then performs those action primitives with the help of the EAM and the CK, and finally updates the CK, as shown in Fig. 3. A control-loop sketch is given below.
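The following Python sketch summarizes this on-line control loop under stated assumptions; the helper names (categorize, evolve_primitives, apply_primitives) are hypothetical stand-ins for the CCM, EAM, and AM, and the server is modeled as a shared dictionary.

def evolve_primitives(sample):
    # EAM stand-in: would run a GA over combinations of action primitives.
    return ("histogram_equalization", "gabor", "cosine_distance")

def apply_primitives(primitives, image):
    # AM stand-in: would run the selected preprocessing/feature/classify chain.
    return {"applied": primitives, "image": image}

def olga_online_step(sample, categorize, context_knowledge, server):
    """One on-line OLGA iteration: categorize, fetch or evolve actions, act."""
    cluster = categorize(sample["context"])       # CCM: may open a new cluster
    if cluster not in context_knowledge:
        # Unknown context: ask the server for primitives evolved elsewhere
        # (DEC), falling back to local evolution (EAM) if none exist yet.
        primitives = server.get(cluster) or evolve_primitives(sample)
        context_knowledge[cluster] = primitives   # update the local CK
        server[cluster] = primitives              # share the knowledge
    return apply_primitives(context_knowledge[cluster], sample["image"])

# Toy usage with a trivial categorizer (bucket mean brightness into 4 bins)
ck, srv = {}, {}
sample = {"context": [0.41], "image": "probe.png"}
print(olga_online_step(sample, lambda c: int(c[0] * 4), ck, srv))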


Fig. 3. On-line evolution

5 Design Example

The AM of OLGA consists of three stages: preprocessing, feature extraction, and classification. The action primitives in the preprocessing stage are histogram equalization, contrast stretching, and retinex [12]. The action primitives in the feature extraction stage are PCA and Gabor representation [12], and finally cosine distance measurement is used for classification. The proposed framework is applied in the field of visual information processing, i.e., face recognition. Face images with different illumination are preferred for this experiment because their spatial boundaries make it easy to distinguish among the environmental contexts.

Fig. 4. Example of face images clustered on different illumination



Fig. 5. The visualization of face image clusters

In this research, input images with 128 × 128 spatial resolution and 256 gray levels are considered, together with a hybrid vectorization technique [12]. FART constructs clusters according to the variation of illumination using the hybrid vectorization technique, as shown in Fig. 4, and Fig. 5 shows a visualization of the clustering result.

6 Experimental Results


We have conducted several experiments to evaluate the effectiveness of the proposed OLGA in the area of face recognition. Two variants of GA are considered in the experiments: off-line GA (the usual GA) and on-line GA, compared with respect to clusters and time. A large number of FERET face images, with normal illumination (fafb) and bad illumination (fafc), are employed for making artificial environmental contexts and for an artificial DEC. First, experiments on off-line GA are presented in Fig. 6, where FART has constructed 9 types of cluster.

Fig. 6. Performance of face recognition with off-line GA


The recognition rate of the proposed OLGA-based face recognition for a real-time system is shown in Table 1. For the first span of time, while gathering information from environmental data, 6 clusters are encountered for off-line evolution, while 9 and 13 clusters are encountered for the real-time system. Fig. 7 describes the recognition rate for off-line GA.

Table 1. Face recognition ratio and number of clusters over time for the proposed OLGA method

Illumination context   Time 0    Time 1    Time 2
Cluster 0              96.09%    94.41%    96.42%
Cluster 1              97.45%    97.65%    100.00%
Cluster 2              96.94%    97.33%    99.05%
Cluster 3              95.27%    96.45%    95.65%
Cluster 4              99.36%    94.64%    97.96%
Cluster 5              92.62%    95.96%    97.53%
Cluster 6              -         96.46%    97.78%
Cluster 7              -         100.00%   91.48%
Cluster 8              -         97.22%    98.99%
Cluster 9              -         -         96.19%
Cluster 10             -         -         97.85%
Cluster 11             -         -         96.70%
Cluster 12             -         -         -
Average                96.29%    96.68%    97.13%


Fig. 7. Face recognition rate over time for off-line GA


Initially the system accumulated knowledge from the environmental context through offline evolution, and it produces more than 96% accuracy. However, when many context categories are present, evolution takes comparatively more time, and as a result the recognition rate decreases. Fig. 8 describes the recognition rate for OLGA. After gathering knowledge from offline evolution for clusters 0 to 5, the on-line evolution starts, and for some time it achieves better performance than the previous offline system. Later, as the number of contexts increases, the recognition rate decreases; when the evolution is finished, it reaches its highest recognition rate. Finally, Fig. 9 shows the comparison between on-line and off-line GA based face recognition, where OLGA shows better performance than off-line GA.


Fig. 8. Face recognition rate for on-line GA


Fig. 9. Comparison between on-line GA and off-line GA

7 Conclusion

This paper contributes to the effort toward robust face recognition systems, describing the new concept of OLGA in the area of dynamic environmental objects and


the concept of DEC to make the system efficient. The proposed system not only produces a highly robust, real-time face recognition system on images categorized by illumination, but also establishes the new concept of OLGA, which reduces the execution time of the traditional genetic algorithm while achieving higher performance. As demonstrated by extensive experimental evaluation, the proposed OLGA leads to superior face recognition rates.

References

1. Bowyer, K.W., Chang, K., Flynn, P.: A Survey of Approaches and Challenges in 3D and Multi-modal 3D + 2D Face Recognition. Computer Vision and Image Understanding, Vol. 101, Issue 1, January (2006) 1-15
2. Malassiotis, S., Strintzis, M.G.: Robust Face Recognition Using 2D and 3D Data: Pose and Illumination Compensation. Pattern Recognition, Vol. 32, Issue 2, December (2005) 28-39
3. Tang, K.W., Jarvis, R.A.: An Evolutionary Computing Approach to Generating Useful and Robust Robot Team Behaviors. IEEE International Conference on Intelligent Robots and Systems, Sendai, Japan, September 28 - October 2 (2004)
4. Juang, C.-F.: Combination of Online Clustering and Q-Value Based GA for Reinforcement Fuzzy System Design. IEEE Transactions on Fuzzy Systems, Vol. 13, No. 3, June (2005)
5. Vonk, E., Jain, L.C., Hibbs, R.: Integrating Evolutionary Computation with Neural Networks. IEEE Conference 0-8186-7085 (1995) 1-95
6. Qing, L., Shan, S., Gao, W., Du, B.: Face Recognition Under Generic Illumination Based on Harmonic Relighting. International Journal of Pattern Recognition and Artificial Intelligence, Vol. 19, No. 4 (2005) 513-531
7. Phillips, P.J., Moon, H., Rauss, P.J., Rizvi, S.: The FERET Evaluation Methodology for Face Recognition Algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 1090-1100
8. Lee, J., et al.: A Bilinear Illumination Model for Robust Face Recognition. 10th IEEE International Conference on Computer Vision (ICCV'05) (2005)
9. Liu, H., et al.: Illumination Compensation and Feedback of Illumination Feature in Face Detection. Proc. International Conferences on Info-tech and Info-net, Beijing, Vol. 3 (2001) 444-449
10. Savvides, M., et al.: Corefaces - Robust Shift-Invariant PCA Based Correlation Filter for Illumination Tolerant Face Recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'04) (2004)
11. Rhee, P.K., et al.: Context-Aware Evolvable System Framework for Environment Identifying Systems. KES 2005 (2005) 270-283
12. Nam, M.Y., et al.: Hybrid Filter Fusion for Robust Visual Information Processing. KES 2005 (2005) 186-194

A Fuzzy Kohonen’s Competitive Learning Algorithm for 3D MRI Image Segmentation

Jun Kong 1,2,*, Jianzhong Wang 1,2, Yinghua Lu 1, Jingdan Zhang 1, and Jingbo Zhang 1

1 Computer School, Northeast Normal University, Changchun, Jilin Province, China
2 Key Laboratory for Applied Statistics of MOE, China
{kongjun, wangjz019, luyh, zhangjd358}@nenu.edu.cn

Abstract. Kohonen’s self-organizing feature map (SOFM) is a two-layer feedforward competitive learning network that has been used as a competitive learning clustering algorithm in brain MRI image segmentation. However, most brain MRI images present overlapping gray-scale intensities for different tissues. In this paper, fuzzy methods are integrated with Kohonen’s competitive algorithm to overcome this problem (we name the algorithm F_KCL). The F_KCL algorithm fuses competitive learning with the fuzzy c-means (FCM) clustering characteristic and can improve the segmentation result effectively. Moreover, in order to enhance robustness to noise and outliers, a kernel-induced method is exploited in our study to measure the distance between the input vector and the weights (KF_KCL). The efficacy of our approach is validated by extensive experiments using both simulated and real MRI images.

1 Introduction

In recent years, various imaging modalities have become available for acquiring complementary information on different aspects of anatomy. Examples are MRI (Magnetic Resonance Imaging), ultrasound, and X-ray imaging including CT (Computed Tomography). Moreover, with the increasing size and number of medical images, the use of computers in facilitating their processing and analysis has become necessary [1]. Many issues inherent to medical images make segmentation a difficult task. The objects to be segmented from a medical image are true (rather than approximate) anatomical structures, which are often non-rigid and complex in shape, and exhibit considerable variability from person to person. Moreover, there are no explicit shape models yet available for fully capturing the deformations in anatomy. MRI produces high contrast between soft tissues and is therefore useful for detecting anatomy in the brain. Segmentation of brain tissues in MRI images plays a crucial role in three-dimensional (3-D) volume visualization, quantitative morphometric analysis, and structure-function mapping for both scientific and clinical investigations. *

* Corresponding author. This work is supported by the Science Foundation for Young Teachers of Northeast Normal University, No. 20061002, China.



Because of the advantages of MRI over other diagnostic imaging modalities [2], the majority of research in medical image segmentation pertains to its use for MR images, and there are many methods available for MRI image segmentation [1]. Image segmentation is a way to partition image pixels into similar regions. Clustering methods are tools for partitioning a data set into groups with similar characteristics, so clustering algorithms are naturally applied to image segmentation [4] [5]. However, uncertainty is widely present in MRI data because of the noise and blur in acquisition and the partial volume effects originating from the low sensor resolution. In particular, the transitional regions between tissues are not clearly defined and their membership is intrinsically vague. Therefore, fuzzy clustering methods such as Fuzzy C-Means (FCM) are particularly suitable for MRI segmentation [6] [7]. However, these FCM-based algorithms are sensitive to noise and depend on the weighting exponent parameter without a learning scheme [8] [9]. Conversely, neural-network-based segmentation can be used to overcome these adversities [10] [11] [12]. Among these neural network techniques, Kohonen’s self-organizing map (SOM) is used most in MRI segmentation [13] [14] [15]. In this paper we address the segmentation problem in the context of isolating the brain tissues in MRI images. Kohonen’s self-organizing feature map (SOFM) is exploited as a competitive learning clustering algorithm in our work. However, the transitional regions between tissues in MRI images are not clearly defined, and the noise in the image leads to further degradation of the segmentation. Therefore, fuzzy methods and kernel methods are integrated with Kohonen’s competitive algorithm in this study to overcome the above problems. The rest of this paper is organized as follows. Section 2 presents the fuzzy Kohonen’s competitive algorithm (F_KCL). The kernel-induced distance measure is incorporated into the F_KCL algorithm by replacing the Euclidean distance (KF_KCL) in Section 3. Experimental results are presented in Section 4, and we conclude this paper in Section 5.

2 Fuzzy Kohonen’s Competitive Algorithm

2.1 Conventional Kohonen’s Competitive Learning Algorithm

The SOFM consists of an input layer and a single output layer of nodes which usually form a two-dimensional array. The training of the SOFM is usually performed using Kohonen’s competitive learning (KCL) algorithm [16]. There are two phases of operation: the similarity matching phase and the weight adaptation phase. Initially, the weights are set to small random values and a vector is presented to the input nodes of the network. During the similarity matching phase, the distances d_j between the inputs and the weights are computed as follows:

d_j = \| x_i - w_{ij} \|^2 , \quad j = 1, 2, \ldots, M .  (1)

where x_i is the i-th input vector of X = (x_1, x_2, …, x_N), N is the number of input vectors, M is the number of output nodes, and w_{ij} is the weight from input node i to output node j. Next, the output node g having the minimum distance d_g is chosen and declared the “winner” node. In the weight adaptation phase, the weights from the inputs to the “winner” node are adapted. The weight changes are based on the following rule:

w_{ij}(t+1) = w_{ij}(t) + a(t)\, h_{ij}(t)\, ( x_i - w_{ij}(t) ) .  (2)

with

h_{ij}(t) = 1 \ \text{if} \ \| x_i - w_{ij}(t) \| = \min_{1 \le n \le M} \| x_i - w_{in}(t) \| , \ \text{and} \ 0 \ \text{otherwise.}  (3)

The parameter a(t) is the learning rate of the algorithm, and h_{ij}(t) denotes the degree of neuron excitation. It can be seen from Equation (3) that only the weight of the “winner” node is updated during a training iteration. Generally, the learning rate a(t) is a monotonically decreasing function of time [16]. A typical choice for a(t) is

a(t) = a_0 \left( 1 - \frac{t}{T} \right) .  (4)

The training procedure is repeated for a number of steps T which is specified a priori.

2.2 Fuzzy Kohonen’s Competitive Learning Algorithm

Though the conventional Kohonen’s competitive algorithm possesses some very useful properties, it is still a hard partition method. As we have mentioned above, most brain MRI images present overlapping gray-scale intensities for different tissues, particularly in the transitional regions of gray matter and white matter, or cerebrospinal fluid and gray matter. Therefore, fuzzy methods are more suitable for brain MRI image segmentation because they can retain more information from the original image. The most widely used fuzzy method for image segmentation is the fuzzy c-means (FCM) algorithm. The FCM clustering algorithm assigns a fuzzy membership value to each data point based on its proximity to the cluster centroids in the feature space. The standard FCM objective function for partitioning a dataset X = (x_1, x_2, …, x_n) into c clusters is

J_m(\mu, v) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \| x_j - v_i \|^2 \quad \text{subject to} \quad \sum_{i=1}^{c} \mu_{ij} = 1 .  (5)

where ||·|| stands for the Euclidean norm, v_i is the fuzzy cluster centroid, \mu_{ij} gives the membership of the j-th data point in the i-th cluster c_i, and m is the index of fuzziness. The objective function is minimized when pixels close to the centroids of their clusters are assigned high membership values, and low membership values are assigned to pixels far from the centroid. The membership function and cluster centers are updated by the following:

\mu_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\| x_j - v_i \|}{\| x_j - v_k \|} \right)^{2/(m-1)} \right]^{-1} .  (6)

and

v_i = \frac{ \sum_{j=1}^{n} \mu_{ij}^m x_j }{ \sum_{j=1}^{n} \mu_{ij}^m } .  (7)

Based on Equation (6), Karen et al. proposed a generalized Kohonen’s competitive learning algorithm [20]. In their method, the degree of neuron excitation h(t) and the learning rate a(t) in Equation (2) are approximated using the FCM membership functions \mu_{ij} as follows:

h_{ij}(t) = \left[ \frac{ \mu_{ij} }{ \min_{1 \le i \le c} \mu_{ij} } \right]^{ 1 + f(t)/c } , \quad i = 1, 2, \ldots, c .  (8)

and

a_i(t) = \frac{ a_0 }{ \left( a_0 / a_i(t-1) \right) + h_{ij}(t) } .  (9)

where \mu_{ij} is the FCM membership in (6) and f(t) is a positive, strictly monotonically increasing function of t which controls the degree of neuron excitation. In general, f(t) = \sqrt{t} is chosen. Although the experimental results in [20] show that their method is valid, there are still some problems. Firstly, the two functions in Equations (8) and (9) are very complicated and time-consuming. Secondly, the degree of neuron excitation h_{ij}(t) in Equation (8) becomes extremely large as the time t increases. This is because, as the iterations proceed, the network tends towards convergence: for each input, if its value is close to one of the centroids, its membership to the class it belongs to will be very high, and its memberships to the other classes will be low, sometimes even zero. Thus, the quotient obtained in (8) will be large, the neuron excitation will be huge after the exponential operation, and the computational complexity evidently increases.

2.3 Our Proposed F_KCL Algorithm

With the aim of overcoming the problems of Equations (8) and (9), in this section we present a new low-complexity method to approximate the neuron excitation and the learning rate, as follows:

h_{ij}(t) = \exp\left( t \left( \mu_{ij} - \frac{1}{c} \right) \right) , \quad i = 1, 2, \ldots, c .  (10)

and

a_i(t) = \frac{ a_i(t-1) }{ a_0 + h_{ij}(t) } .  (11)

Transparently, in our proposed method, the neuron excitation and the learning rate are also determined by the membership function, but the h_{ij}(t) in our method will not become too large as the time t increases. It can clearly be seen that the learning rate a_i(t) in (11) monotonically decreases to zero as time t increases.

3 F_KCL Based on Kernel-Induced Distance

In Section 2, we described the fuzzy Kohonen’s competitive learning algorithm (F_KCL). By integrating FCM clustering with Kohonen’s competitive learning algorithm, the F_KCL algorithm can successfully deal with overlapping grayscale intensities and with borders between tissues that are not clearly defined. However, the FCM algorithm suffers from sensitivity to noise and outliers [20]; thus, the F_KCL segmentation result degrades when applied to noise-corrupted images. Another drawback of the standard FCM is that it is not suitable for revealing non-Euclidean structure of the input data, due to its use of the Euclidean distance (L2 norm). In order to avoid these disadvantages, Chen and Zhang proposed a kernel-induced distance measure method in [21] and [22]. Kernel methods are one of the most researched subjects within the machine learning community in recent years and have been widely applied to pattern recognition and function approximation. In Chen’s study, kernel functions are used to substitute the inner products, realizing an implicit mapping into feature space so that the corresponding kernelized versions are constructed. The major characteristic of their approach is that they do not adopt a dual representation for the data centroids, but directly transform all centroids in the original space, together with the given data samples, into a high-dimensional feature space with a mapping Φ. Through the kernel substitution, a new class of non-Euclidean distance measures in the original data space is obtained as

\| \Phi(x_j) - \Phi(v_i) \|^2 = ( \Phi(x_j) - \Phi(v_i) )^T ( \Phi(x_j) - \Phi(v_i) )
= \Phi(x_j)^T \Phi(x_j) - 2\, \Phi(x_j)^T \Phi(v_i) + \Phi(v_i)^T \Phi(v_i)
= K(x_j, x_j) + K(v_i, v_i) - 2 K(x_j, v_i) .  (12)

and the kernel function K(x, y) is taken as the radial basis function (RBF) to simplify (12); the typical RBF kernel is:

K(x, y) = \exp\left( - \frac{ \left( \sum_{i=1}^{d} | x_i - y_i |^a \right)^b }{ \delta^2 } \right) .  (13)

where d is the dimension of vector x. Obviously, for all x and RBF kernels, we get K(x, x) = 1. With the above formulations, the kernel version of the FCM objective function and its membership function are:

J_m^\Phi(\mu, v) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \| \Phi(x_j) - \Phi(v_i) \|^2 = 2 \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \left( 1 - K(x_j, v_i) \right) .  (14)

and

\mu_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{ 1 - K(x_j, v_i) }{ 1 - K(x_j, v_k) } \right)^{1/(m-1)} \right]^{-1} .  (15)

In our study, for the sake of overcoming the sensitivity to noise and outliers, we also incorporate the kernel distance measure into the F_KCL algorithm (we name the algorithm KF_KCL). The KF_KCL algorithm can be summarized in the following steps:

KF_KCL Algorithm
Step 1) Fix the number of clusters c and the training time T;
Step 2) Initialize the weights and the learning rate a(0) = a_i(0) = 1;
Step 3) For t = 1, 2, …, T:
  For j = 1, 2, …, n:
    Set v_i = w_{ij}(t), i = 1, 2, …, c;
    Calculate \mu_{ij} using (15);
    Calculate h_{ij}(t) using (10);
    Calculate a_i(t) using (11);
    Update all nodes using the following equation:

w_{ij}(t+1) = w_{ij}(t) + a(t)\, h_{ij}(t)\, K(x_i, w_{ij}(t))\, ( x_i - w_{ij}(t) )  (16)

Step 4) End.
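A compact NumPy sketch of these steps is given below. It is an illustrative reading of the algorithm, not the authors' Matlab code; the Gaussian kernel width and the toy data are assumptions.

import numpy as np

def rbf(x, y, delta=1.0):
    """RBF kernel of Eq. (13) with a = 2, b = 1 (Gaussian)."""
    return np.exp(-np.sum((x - y) ** 2) / delta ** 2)

def kf_kcl(X, c, T, m=2.0, a0=1.0, delta=1.0, seed=0):
    """Kernelized fuzzy Kohonen competitive learning (KF_KCL sketch)."""
    rng = np.random.default_rng(seed)
    w = X[rng.choice(len(X), c, replace=False)].astype(float)  # prototypes
    a = np.ones(c)                                             # per-node rates
    for t in range(1, T + 1):
        for x in X:
            d = np.array([1.0 - rbf(x, w[i], delta) for i in range(c)])
            d = np.maximum(d, 1e-12)
            # Kernel FCM membership, Eq. (15)
            mu = 1.0 / np.sum((d[:, None] / d[None, :]) ** (1.0 / (m - 1)), axis=1)
            h = np.exp(t * (mu - 1.0 / c))          # excitation, Eq. (10)
            a = a / (a0 + h)                        # learning rate, Eq. (11)
            for i in range(c):                      # weight update, Eq. (16)
                w[i] += a[i] * h[i] * rbf(x, w[i], delta) * (x - w[i])
    return w

# Toy usage: three 1-D intensity clusters
X = np.concatenate([np.random.normal(mu, 0.05, 50) for mu in (0.2, 0.5, 0.8)])[:, None]
print(kf_kcl(X, c=3, T=5).ravel())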

4 Experimental Results

The proposed KF_KCL algorithm was implemented in Matlab and tested on both simulated MRI images obtained from the BrainWeb Simulated Brain Database at the McConnell Brain Imaging Centre of the Montreal Neurological Institute (MNI), McGill University [17], and on real MRI data obtained from the Internet Brain Segmentation Repository (IBSR) [18]. Extra-cranial tissues are removed from all images prior to segmentation.


4.1 Results Analysis and Comparison

In this section, we apply our algorithm to a simulated data volume with a T1-weighted sequence, slice thickness of 1 mm, volume size of 21 . The number of tissue classes in the segmentation is set to three, corresponding to gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF). Background pixels are ignored in our experiment. For image data, there is strong correlation between neighboring pixels; to produce a meaningful segmentation, the spatial relationship between pixels is considered in our experiment. The input vector of each pixel is constructed from the intensity of the current pixel and the mean value of its neighborhood. The 3-D neighborhood used in our study is a six-point neighborhood, i.e., north, east, south, west of the center voxel, plus the voxels immediately before and after it. The parameters in the RBF kernel function are set as σ = 400, a = 2 and b = 1. The brain image in Fig. 1(a) is a slice of the simulated 3-D volume; the segmentation result using our proposed KF_KCL algorithm is given in Fig. 1(b), and Fig. 1(c) and (d) are the segmentation results obtained by the standard KCL algorithm and the FCM clustering algorithm. The “ground truth” of Fig. 1(a) is shown in Fig. 1(e). Though visual inspection shows that the images in Fig. 1(b), (c), and (d) are nearly the same and all similar to the “ground truth”, when we compare the three images with the “ground truth” and compute the similarity indices [19] in Table 1, we can see that the result of our proposed algorithm is better than those of the standard KCL and FCM algorithms. In addition, KF_KCL converges faster than KCL.
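As a sketch of this input-vector construction, the fragment below pairs each voxel's intensity with its six-neighborhood mean; the array names and the edge-padding behavior are illustrative assumptions.

import numpy as np

def voxel_features(vol):
    """Stack each voxel's intensity with the mean of its 3-D six-neighborhood."""
    padded = np.pad(vol.astype(float), 1, mode="edge")
    shifts = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    neigh = sum(np.roll(padded, s, axis=(0, 1, 2)) for s in shifts)
    neigh = neigh[1:-1, 1:-1, 1:-1] / 6.0
    return np.stack([vol, neigh], axis=-1)  # shape (X, Y, Z, 2)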


Fig. 1. (a) The original slice of the 3-D brain image (z = 70). (b) Segmentation result by KF_KCL algorithm with 5 iterations. (c) Segmentation result by standard KCL algorithm with 10 iterations. (d) Segmentation result by FCM. (e) The ground truth of (a).


Table 1. Similarity index ρ (%) for different methods in Fig. 1

        KF_KCL   Standard KCL   FCM
WM      98.85    97.37          97.38
GM      97.76    95.10          96.79
CSF     96.93    92.00          96.99

After testing and comparing our algorithm on the noise-free image, we then applied the KF_KCL algorithm to images corrupted by noise and other imaging artifacts. Fig. 2(a) shows the same slice as Fig. 1(a) but with 5% Rician noise; Fig. 2(b) is the segmentation result of the KF_KCL algorithm, and the images in Fig. 2(c) and (d) are the segmentation results of the standard KCL and FCM algorithms. Clearly, the result of the proposed KF_KCL algorithm is better than those of the other two methods. This is because both the kernel-induced distance measure and the spatial constraints reduce the medical image noise effectively. The similarity indices of the images in Fig. 2 are also calculated and shown in Table 2.


Fig. 2. (a) The slice of the 3-D brain image (z = 70) with 5% noise. (b) The segment result using KF_KCL. (c) The segment result using KCL. (d) The segment result using FCM.

Table 2. Similarity index ρ (%) for different methods in Fig. 2

        KF_KCL   Standard KCL   FCM
WM      94.09    92.21          92.55
GM      92.65    90.46          91.53
CSF     93.10    92.02          90.32

Figure 3(a) shows a slice of the simulated images corrupted by 3% noise and 20% intensity non-uniformity (INU); Fig. 3(b) is the KF_KCL segmentation result. Although there is no bias estimation or correction step in our algorithm, comparison with the “ground truth” in Fig. 3(c) gives similarity indices for WM, GM and CSF of 93.69%, 90.07% and 92.51%, respectively. The similarity


Fig. 3. (a) A slice of the 3-D brain image (z = 130) with 3% noise and 20% INU. (b) The segment result using KF_KCL. (c) The ground truth.

Fig. 4. Segmentation results for the whole brain and the white matter


Fig. 5. Segmentation of real MRI images. (a) and (e) are original images. (b) and (f) are the segment result by our proposed KF_KCL algorithm. (c) and (g) are FCM segmentation result. (d) and (h) are the segment result by standard KCL.


index ρ > 70% indicates an excellent similarity [19]. In our experiments, the similarity indices ρ of all the tissues are larger than 90%, even in a bad condition with noise and INU, which indicates an excellent agreement between our segmentation results and the “ground truth”. Figure 4 shows the 3-D view of the segmentation results for the whole brain and the white matter.

4.2 Performance on Actual MRI Data

The images in Fig. 5(a) and (e) are two slices of real T1-weighted MRI images. Fig. 5(b) and (f) are the KF_KCL segmentation results. Fig. 5(c) and (g) show the clustering results using FCM, and the KCL segmentation results are shown in Fig. 5(d) and (h). Visual inspection shows that our approach produces better segmentations than the other algorithms.

5 Conclusions

A novel fuzzy Kohonen’s competitive learning algorithm with a kernel-induced distance measure (KF_KCL) is presented in this paper. Because the transitional regions between tissues in MRI brain images are not clearly defined, fuzzy methods are integrated with the Kohonen’s competitive learning (KCL) algorithm to deal with this problem. Though KCL-based segmentation techniques are useful in reducing image noise, kernel methods are also incorporated in our work to further increase the segmentation accuracy. Kernel methods have been widely applied to unsupervised clustering in recent years, and the kernel distance measure can effectively overcome the disadvantages of the Euclidean distance measure, e.g., its sensitivity to noise and outliers. Finally, we consider the spatial relationships between image pixels in our experiments. The proposed KF_KCL algorithm is applied to both simulated and real MRI images and compared with the KCL and FCM algorithms. The results reported show that our approach is better than the others.

References

1. Pham, D.L., Xu, C.Y., Prince, J.L.: A Survey of Current Methods in Medical Image Segmentation. [Technical report version, JHU/ECE 99-01, Johns Hopkins University], Ann. Rev. Biomed. Eng. 2 (2000) 315-337
2. Wells, W.M., Grimson, W.E.L., Kikinis, R., Arrdrige, S.R.: Adaptive Segmentation of MRI Data. IEEE Trans. Med. Imaging 15 (1996) 429-442
3. Gerig, G., Martin, J., Kikinis, R., Kubler, D., Shenton, M., Jolesz, F.A.: Unsupervised Tissue Type Segmentation of 3D Dual-echo MR Head Data. Image Vision Comput. 10 (1992) 349-360
4. Liew, A.W.C., Yan, H.: An Adaptive Spatial Fuzzy Clustering Algorithm for 3-D MR Image Segmentation. IEEE Transactions on Medical Imaging 22 (9) (2003) 1063-1075
5. Philips, W.E., Velthuizen, R.P., Phuphanich, S., Hall, L.O., Clarke, L.P., Silbiger, M.L.: Application of Fuzzy C-means Segmentation Technique for Differentiation in MR Images of a Hemorrhagic Glioblastoma Multiforme. Mag. Reson. Imaging 13 (1995) 277-290


6. Pham, D.L., Prince, J.L.: Adaptive Fuzzy Segmentation of Magnetic Resonance Images. IEEE Trans. Med. Imaging 18 (1999) 737-752
7. Bezdek, J.: Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press (1981)
8. Wu, K.L., Yang, M.S.: Alternative C-means Clustering Algorithms. Pattern Recognition 35 (2002) 2267-2278
9. Hall, L.O., Bensaid, A.M., Clarke, L.P., Velthuizen, R.P., Silbiger, M.S., Bezdek, J.C.: A Comparison of Neural Network and Fuzzy Clustering Techniques in Segmenting Magnetic Resonance Images of the Brain. IEEE Trans. Neural Networks 3 (1992) 672-682
10. Ozkan, M., Dawant, B.M., Maciunas, R.J.: Neural-network-based Segmentation of Multi-modal Medical Images: A Comparative and Prospective Study. IEEE Trans. Medical Imaging 12 (1993) 534-544
11. Reddick, W.E., Glass, J.O., Cook, E.N., Elkin, T.D., Deaton, R.: Automated Segmentation and Classification of Multispectral Magnetic Resonance Images of Brain Using Artificial Neural Networks. IEEE Trans. Med. Imaging 16 (1997) 911-918
12. Reddick, W.E., Mulhern, R.K., Elkin, T.D., Glass, J.O., Merchant, T.E., Langston, J.W.: A Hybrid Neural Network Analysis of Subtle Brain Volume Differences in Children Surviving Brain Tumors. Mag. Reson. Imaging 16 (1998) 413-421
13. Chuang, K.H., Chiu, M.J., Lin, C.C., Chen, J.H.: Model-free Functional MRI Analysis Using Kohonen Clustering Neural Network and Fuzzy C-means. IEEE Trans. Medical Imaging 18 (1999) 1117-1128
14. Glass, J.O., Reddick, W.E., Goloubeva, O., Yo, V., Steen, R.G.: Hybrid Artificial Neural Network Segmentation of Precise and Accurate Inversion Recovery (PAIR) Images From Normal Human Brain. Mag. Reson. Imaging 18 (2000) 1245-1253
15. Kohonen, T.: Self-Organizing Maps. New York: Springer-Verlag (1995)
16. Kwan, R.S., Evans, A., Pike, G.: MRI Simulation-based Evaluation of Image-processing and Classification Methods. IEEE Trans. Med. Imaging 18 (11) (1999) 1085-1097. Available: http://www.bic.mni.mcgill.ca/brainweb
17. Kennedy, D.N., Filipek, P.A., Caviness, V.S.: Anatomic Segmentation and Volumetric Calculations in Nuclear Magnetic Resonance Imaging. IEEE Transactions on Medical Imaging 8 (1989) 1-7. Available: http://www.cma.mgh.harvard.edu/ibsr/
18. Zijdenbos, A., Dawant, B.: Brain Segmentation and White Matter Lesion Detection in MR Images. Crit. Rev. Biomed. Eng. 22(5-6) (1994) 401-465
19. Lin, K.C.R., Yang, M.S., Liu, H.C., Lirng, J.F., Wang, P.N.: Generalized Kohonen's Competitive Learning Algorithm for Ophthalmological MR Image Segmentation. Magnetic Resonance Imaging 21 (2003) 863-870
20. Chen, S.G., Zhang, D.Q.: Robust Image Segmentation Using FCM with Spatial Constraints Based on New Kernel-Induced Distance Measure. IEEE Transactions on SMC-Part B 34 (2004) 1907-1916
21. Zhang, D.Q., Chen, S.C.: A Novel Kernelized Fuzzy C-means Algorithm with Application in Medical Image Segmentation. Artificial Intelligence in Medicine 32 (2004) 37-50

A Hybrid Genetic Algorithm for Two Types of Polygonal Approximation Problems

Bin Wang 1,* and Chaojian Shi 1,2

1 Department of Computer Science and Engineering, Fudan University, Shanghai, 200433, P. R. China
2 Merchant Marine College, Shanghai Maritime University, Shanghai, 200135, P. R. China
[emailprotected], [emailprotected]

Abstract. A hybrid genetic algorithm combined with split-and-merge techniques (SMGA) is proposed for two types of polygonal approximation of digital curves, i.e., the Min-# problem and the Min-ε problem. Its main idea is that two classical methods, the split and merge techniques, are applied to repair infeasible solutions. In this scheme, an infeasible solution can not only be repaired rapidly, but also be pushed to a locally optimal location in the solution space. In addition, unlike existing genetic algorithms, which can only solve one type of polygonal approximation problem, SMGA can solve both types. The experimental results demonstrate that SMGA is robust and outperforms other existing GA-based methods.

1 Introduction

In image processing, the boundary of an object can be viewed as a closed digital curve. How to represent it so as to facilitate subsequent image analysis and pattern recognition is a key issue. Polygonal approximation is a good representation method for a closed digital curve. Its basic idea is that a closed digital curve is divided into a finite number of segments and each segment is approximated by the line segment connecting its two end points. The whole curve is then approximated by the polygon formed by these line segments. Polygonal approximation is a simple and compact representation method which can approximate the curve with any desired level of accuracy. Therefore, this method is widely studied in image processing, pattern recognition, computer graphics, digital cartography, and vector data processing. In general, there are two types of polygonal approximation problems which have attracted many researchers’ interest. They are described as follows:

Min-# problem: Given a closed digital curve, approximate it by a polygon with a minimum number of line segments such that the approximation error does not exceed a given tolerance error ε.

Min-ε problem: Given a closed digital curve, approximate it by a polygon with a given number of line segments such that the approximation error is minimized.

∗ Corresponding author.



Both of the above polygonal approximation problems can be formulated as combinatorial optimization problems. Since an exhaustive search for the optimal solution in the potential solution space results in exponential complexity [1], many existing methods for polygonal approximation problems yield suboptimal results to save computational cost. Some existing methods are based on local search techniques. They can be classified into the following categories: (1) sequential tracing approach [2], (2) split method [3], (3) merge method [4], (4) split-and-merge method [5], and (5) dominant point method [6]. These methods work very fast, but their results may be very far from the optimal ones because of their dependence on the selection of starting points or on the given initial solutions. In recent years, many nature-inspired algorithms, such as genetic algorithms (GA) [1,8,9,10,11], ant colony optimization (ACO) [12], particle swarm optimization (PSO) [13] and so on, have been applied to the Min-# problem or the Min-ε problem and have presented promising approximation results. In this paper, we focus on GA-based methods for polygonal approximation problems. The power of GA arises from crossover, which causes a structured, yet randomized, exchange of genetic material between solutions, with the possibility that 'good' solutions can generate 'better' ones. However, crossover may also generate infeasible solutions; that is, two feasible parents may generate an infeasible child. This arises especially in combinatorial optimization where the encoding is the traditional bit-string representation and crossover is the general-purpose crossover [11]. Therefore, how to cope with infeasible solutions is the main problem involved in using GA-based methods for polygonal approximation problems. Among existing GA-based methods for polygonal approximation problems, there are two schemes for coping with infeasible solutions. One is to modify the traditional crossover and constrain it to yield feasible offspring; here, we term it the constraining method. Yin [8] and Huang [10] adopt this method for the Min-ε problem and the Min-# problem, respectively. Both adopt a modified version of the traditional two-cut-point crossover. In the traditional two-cut-point crossover (shown in Fig. 4), two crossover sites are chosen randomly; however, it may generate infeasible solutions. They modified it by choosing appropriate crossover points on the chromosome which maintain the feasibility of the offspring. However, this requires repeated testing of candidate crossover points on the chromosome and results in an expensive cost in time. Furthermore, in some cases such crossover sites cannot be obtained for the Min-# problem. For the Min-ε problem, Chen and Ho [11] proposed a novel crossover termed orthogonal-array crossover which maintains the feasibility of the offspring. However, the complexity of this kind of crossover is also high, and it is only suitable for the Min-ε problem and not for the Min-# problem. The other method for coping with infeasible solutions is the penalty-function method. Yin [1] adopted this scheme for the Min-# problem. Its main idea is that a penalty function is added to the fitness function for decreasing the survival


probability of the infeasible solution. However, it is usually difficult to determine an appropriate penalty function. If the strength of the penalty function is too large, more time will be spent on finding feasible solutions than on searching for the optimum, and if the strength of the penalty function is too small, more time will be spent on evaluating infeasible solutions [11]. To solve the above problems involved in coping with infeasible solutions, we propose a hybrid genetic algorithm combined with split and merge techniques (SMGA) for solving the Min-ε problem and the Min-# problem. The main idea of SMGA is that the traditional split and merge techniques are employed to repair infeasible solutions. SMGA has the following three advantages over existing GA-based methods. (1) SMGA does not require developing a special penalty function, or modifying and constraining the traditional two-cut-point crossover to avoid yielding infeasible solutions. In SMGA, an infeasible solution can be transformed into a feasible one through a simple repairing operator. (2) SMGA combines the strong global search ability of GA with the strong local search ability of the traditional split and merge techniques. This improves both the solution quality and the convergence speed of the GA. (3) Unlike existing GA-based methods, which are designed to solve either the Min-ε problem or the Min-# problem alone, SMGA is developed to solve both. We use four benchmark curves to test SMGA; the experimental results show its superior performance.

2 Problem Formulation

Definition 1. A closed digital curve C can be represented by a clockwise ordered sequence of points, that is, $C = \{p_1, p_2, \ldots, p_N\}$, and this sequence is circular, namely $p_{N+i} = p_i$, where N is the number of points on the digital curve.

Definition 2. Let $\widehat{p_i p_j} = \{p_i, p_{i+1}, \ldots, p_j\}$ represent the arc starting at point $p_i$ and continuing through point $p_j$ in the clockwise direction along the curve, and let $\overline{p_i p_j}$ denote the line segment connecting points $p_i$ and $p_j$.

Definition 3. The approximation error between $\widehat{p_i p_j}$ and $\overline{p_i p_j}$ is defined as

$$e(\widehat{p_i p_j}, \overline{p_i p_j}) = \sum_{p_k \in \widehat{p_i p_j}} d^2(p_k, \overline{p_i p_j}), \qquad (1)$$

where $d(p_k, \overline{p_i p_j})$ is the perpendicular distance from point $p_k$ to the line segment $\overline{p_i p_j}$.


Definition 4. The polygon V approximating the contour $C = \{p_1, p_2, \ldots, p_N\}$ is a set of ordered line segments $V = \{\overline{p_{t_1}p_{t_2}}, \overline{p_{t_2}p_{t_3}}, \ldots, \overline{p_{t_{M-1}}p_{t_M}}, \overline{p_{t_M}p_{t_1}}\}$, such that $t_1 < t_2 < \ldots < t_M$ and $\{p_{t_1}, p_{t_2}, \ldots, p_{t_M}\} \subseteq \{p_1, p_2, \ldots, p_N\}$, where M is the number of vertices of the polygon V.

Definition 5. The approximation error between the curve $C = \{p_1, p_2, \ldots, p_N\}$ and its approximating polygon V is defined as

$$E(V, C) = \sum_{i=1}^{M} e(\widehat{p_{t_i} p_{t_{i+1}}}, \overline{p_{t_i} p_{t_{i+1}}}). \qquad (2)$$

Then the two types of polygonal approximation problems are formulated as follows:

Min-# problem: Given a digital curve $C = \{p_1, p_2, \ldots, p_N\}$ and an error tolerance ε, let Ω denote the set of all polygons which approximate the curve C, and let $S_P = \{V \mid V \in \Omega \wedge E(V, C) \le \varepsilon\}$. Find a polygon $P \in S_P$ such that

$$|P| = \min_{V \in S_P} |V|, \qquad (3)$$

where |P| denotes the cardinality of P.

Min-ε problem: Given a digital curve $C = \{p_1, p_2, \ldots, p_N\}$ and an integer M with 3 ≤ M ≤ N, let Ω denote the set of all polygons which approximate the curve C, and let $S_P = \{V \mid V \in \Omega \wedge |V| = M\}$, where |V| denotes the cardinality of V. Find a polygon $P \in S_P$ such that

$$E(P, C) = \min_{V \in S_P} E(V, C). \qquad (4)$$
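To make the formulation concrete, the following minimal sketch computes the arc error of Eq. (1) and the polygon error of Eq. (2) for a curve given as a list of (x, y) points; the function names and the index-based polygon representation are our illustrative assumptions, not part of the paper.

```python
import numpy as np

def point_segment_dist(p, a, b):
    """Perpendicular distance from point p to the line segment ab."""
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    ab = b - a
    if not ab.any():                      # degenerate chord
        return float(np.linalg.norm(p - a))
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.linalg.norm(p - a - t * ab))

def arc_error(curve, i, j):
    """e of Eq. (1): sum of squared distances of the arc points between
    indices i and j (clockwise, circular) from the chord p_i p_j."""
    n = len(curve)
    ks = range(i, j + 1) if i <= j else list(range(i, n)) + list(range(j + 1))
    return sum(point_segment_dist(curve[k], curve[i], curve[j]) ** 2 for k in ks)

def polygon_error(curve, vertices):
    """E of Eq. (2): total error of the polygon whose vertices are the
    curve points at the given sorted indices."""
    m = len(vertices)
    return sum(arc_error(curve, vertices[s], vertices[(s + 1) % m])
               for s in range(m))
```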

3 Overview of Split and Merge Techniques

3.1 Split Technique

The traditional split technique is a very simple method for solving the polygonal approximation problem. It is a recursive method starting from an initial curve segmentation. At each iteration, a split procedure is conducted to split a segment at a selected point until the obtained polygon satisfies the specified constraint condition. The split procedure is described as follows. Suppose that curve C is segmented into M arcs $\widehat{p_{t_1}p_{t_2}}, \ldots, \widehat{p_{t_{M-1}}p_{t_M}}, \widehat{p_{t_M}p_{t_1}}$, where the $p_{t_i}$ are segment points. A split operation on curve C is then: for each point $p_i \in \widehat{p_{t_j}p_{t_{j+1}}}$, calculate its distance to the corresponding chord, $D(p_i) = d(p_i, \overline{p_{t_j}p_{t_{j+1}}})$, and seek the point $p_u$ on the curve which satisfies $D(p_u) = \max_{p_i \in C} D(p_i)$. Suppose that $p_u \in \widehat{p_{t_k}p_{t_{k+1}}}$; then the arc $\widehat{p_{t_k}p_{t_{k+1}}}$ is segmented at the point $p_u$ into the two arcs $\widehat{p_{t_k}p_u}$ and $\widehat{p_u p_{t_{k+1}}}$, and $p_u$ is added to the set of segment points. Fig. 1 shows a split process. The function of the split operator is to find a new possible vertex in a heuristic way.
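Under the same assumptions, one split step might be sketched as follows, reusing point_segment_dist from the sketch in Section 2; vertices is a sorted list of curve-point indices and is assumed to leave at least one interior point on some arc.

```python
def split_once(curve, vertices):
    """One split step: find the curve point farthest from its covering
    chord and insert it as a new vertex."""
    n, m = len(curve), len(vertices)
    best_d, best_k = -1.0, None
    for s in range(m):
        i, j = vertices[s], vertices[(s + 1) % m]
        interior = range(i + 1, j) if i < j else list(range(i + 1, n)) + list(range(j))
        for k in interior:
            d = point_segment_dist(curve[k], curve[i], curve[j])
            if d > best_d:
                best_d, best_k = d, k
    return sorted(vertices + [best_k])
```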



Fig. 1. Split operation


Fig. 2. Merge operation

3.2 Merge Technique

The merge technique is another simple method for yielding an approximating polygon of a digital curve. It is a recursive method starting from an initial polygon which regards all the points of the curve as its vertices. At each iteration, a merge procedure is conducted to merge two selected adjacent segments of the current polygon until the obtained polygon satisfies the specified constraint condition. The merge procedure is described as follows. Suppose that curve C is segmented into M arcs $\widehat{p_{t_1}p_{t_2}}, \ldots, \widehat{p_{t_{M-1}}p_{t_M}}, \widehat{p_{t_M}p_{t_1}}$, where the $p_{t_i}$ are segment points. A merge operation on curve C is then defined as: for each segment point $p_{t_i}$, calculate its distance to the line segment connecting its two adjacent segment points, $Q(p_{t_i}) = d(p_{t_i}, \overline{p_{t_{i-1}}p_{t_{i+1}}})$, and select the segment point $p_{t_j}$ which satisfies $Q(p_{t_j}) = \min_{p_{t_i} \in V} Q(p_{t_i})$, where V is the set of the current segment points. Then the two arcs $\widehat{p_{t_{j-1}}p_{t_j}}$ and $\widehat{p_{t_j}p_{t_{j+1}}}$ are merged into the single arc $\widehat{p_{t_{j-1}}p_{t_{j+1}}}$, and $p_{t_j}$ is removed from the set of the current segment points. Fig. 2 shows a merge process. The function of the merge operator is to remove a possibly redundant vertex in a heuristic way.
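A matching sketch of one merge step, under the same assumptions as the split sketch above:

```python
def merge_once(curve, vertices):
    """One merge step: remove the vertex closest to the chord joining
    its two neighbouring vertices (indices are circular)."""
    m = len(vertices)
    costs = [point_segment_dist(curve[vertices[s]],
                                curve[vertices[s - 1]],
                                curve[vertices[(s + 1) % m]])
             for s in range(m)]
    victim = costs.index(min(costs))
    return vertices[:victim] + vertices[victim + 1:]
```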

4 The Proposed Genetic Algorithm (SMGA)

4.1 Chromosome Coding Scheme and Fitness Function

The encoding mechanism maps each approximating polygon to a unique binary string which represents a chromosome. Each gene of the chromosome corresponds to a point of the curve; the corresponding curve point is considered a vertex of the approximating polygon if and only if the gene's value is 1. The number


Fig. 3. Mutation

of genes whose value is 1 equals the number of vertices of the approximating polygon. For instance, given a curve $C = \{p_1, p_2, \ldots, p_{10}\}$ and a chromosome '1010100010', the approximating polygon represented by the chromosome is $\{\overline{p_1 p_3}, \overline{p_3 p_5}, \overline{p_5 p_9}, \overline{p_9 p_1}\}$. Assume a chromosome $\alpha = b_1 b_2 \ldots b_N$. For the Min-ε problem, the fitness function f(α) is defined as follows:

$$f(\alpha) = E(\alpha, C) \qquad (5)$$

For the Min-# problem, the fitness function f(α) is defined as follows:

$$f(\alpha) = \sum_{i=1}^{N} b_i \qquad (6)$$

For the above fitness functions, the smaller the function value is, the better the individual is.
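A minimal sketch of the decoding step and the two fitness functions of Eqs. (5) and (6), assuming a chromosome is a list of 0/1 genes and reusing polygon_error from the sketch in Section 2:

```python
def decode(chromosome):
    """Vertex indices are the positions of the 1-bits; e.g. the
    chromosome 1010100010 decodes to vertex indices [0, 2, 4, 8]."""
    return [i for i, b in enumerate(chromosome) if b == 1]

def fitness_min_eps(chromosome, curve):
    """Eq. (5): total approximation error E(alpha, C), minimised."""
    return polygon_error(curve, decode(chromosome))

def fitness_min_num(chromosome):
    """Eq. (6): number of 1-bits, i.e. number of polygon sides."""
    return sum(chromosome)
```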

4.2 Genetic Operators

Selection. Select two individuals from the population at random and keep the better one.

Mutation. Randomly select a gene with value 1 on the chromosome, shift it one site to the left or right at random, and set the original gene site to 0 (shown in Fig. 3).

Crossover. Here, we use the traditional two-cut-point crossover. Its detail is as follows: randomly select two sites on the chromosome and exchange the two chromosomes' substrings between the two selected sites. For example, given two parent chromosomes '1010010101' and '1011001010' and randomly selected crossover sites 4 and 7, the two children yielded by the two-cut-point crossover are '1011001101' and '1010010010' (shown in Fig. 4).
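The three operators might be sketched as follows; the names are illustrative, and the corner case in which the shifted 1-bit lands on another 1-bit is simply allowed to merge them, which the paper does not discuss.

```python
import random

def select(population, fitness):
    """Binary tournament: draw two individuals at random and keep the
    better one (both objectives are minimised)."""
    a, b = random.sample(population, 2)
    return a if fitness(a) <= fitness(b) else b

def mutate(chromosome):
    """Shift a randomly chosen 1-bit one site left or right (circular),
    zeroing the original site."""
    c = list(chromosome)
    i = random.choice([k for k, b in enumerate(c) if b == 1])
    j = (i + random.choice((-1, 1))) % len(c)
    c[i], c[j] = 0, 1
    return c

def two_cut_point_crossover(p1, p2):
    """Exchange the parents' substrings between two random cut points."""
    a, b = sorted(random.sample(range(len(p1) + 1), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]
```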

4.3 Chromosome Repairing

Two-cut-point crossover may yield infeasible offspring. Here, we develop a method that uses the split and merge techniques introduced in Section 3 to repair the infeasible offspring.

For the Min-ε problem: Suppose that the specified number of sides of the approximating polygon is M. Then for an infeasible solution α we have L(α) ≠ M,



Fig. 4. Two-cut-point crossover

where L(α) denotes the number of sides of the approximating polygon α. The infeasible solution α can then be repaired through the following process: if L(α) > M, repeat the merge operation until L(α) = M; if L(α) < M, repeat the split operation until L(α) = M.

For the Min-# problem: Suppose that the specified error tolerance is ε. Then for an infeasible solution α we have E(α) > ε, where E(α) is the approximation error. The infeasible solution α can be repaired through the following process: while E(α) > ε, repeat the split operation until E(α) ≤ ε.

Computational complexity: Suppose that the number of curve points is n and the number of sides of the infeasible solution is k. From the definitions of the split and merge operations, the complexity of the split procedure is O(n − k) and that of the merge procedure is O(k). For the Min-ε problem, suppose that the specified number of sides of the approximating polygon is m. If k < m, then repairing the current infeasible solution requires calling the split procedure m − k times, so the complexity of the repairing process is O((n − k)(m − k)). If k > m, then repairing requires calling the merge procedure k − m times, so the complexity of the repairing process is O(k(k − m)). For the Min-# problem, it is difficult to compute the complexity of the repairing process exactly; here we give the complexity of the worst case. In the worst case, all the curve points have to be added to the approximating polygon to restore feasibility, in which case the approximation error equals 0 and the split procedure is called n − k times. Therefore, the complexity of the repairing process in the worst case is O((n − k)^2).
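Putting the pieces together, a sketch of the repair operator, reusing split_once, merge_once and polygon_error from the earlier sketches:

```python
def repair_min_eps(curve, vertices, M):
    """Min-ε repair: merge while L(alpha) > M, split while L(alpha) < M."""
    while len(vertices) > M:
        vertices = merge_once(curve, vertices)
    while len(vertices) < M:
        vertices = split_once(curve, vertices)
    return vertices

def repair_min_num(curve, vertices, eps):
    """Min-# repair: split until E(alpha) <= eps."""
    while polygon_error(curve, vertices) > eps:
        vertices = split_once(curve, vertices)
    return vertices
```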

4.4 Elitism

Elitism is implemented by preserving the best chromosome, which is passed unchanged to the next generation.

5 Experimental Results and Discussion

To evaluate the performance of the proposed SMGA, we utilize four commonly used benchmark curves, as shown in Fig. 5. Among these curves, (a) is a figure-8 curve, (b) is a chromosome-shaped curve, (c) is a curve with four semicircles and (d) is a leaf-shaped curve. The numbers of their curve points are 45, 60, 102 and 120, respectively. Literature [6] presents their chain codes. Two groups of experiments are conducted to evaluate the performance of SMGA: one applies SMGA to the Min-ε problem, the other applies SMGA to the Min-# problem. All experiments are conducted on a computer with a Pentium-M 1.5 CPU under Windows XP. The parameters of SMGA are set as follows: population size Ns = 31, crossover probability pc = 0.7, mutation probability pm = 0.3 and maximum number of generations Gn = 80.

Fig. 5. Four benchmark curves: (a) figure-8, (b) chromosome, (c) semicircle, (d) leaf

Table 1. Experimental results of SMGA and EEA [11] for the Min-ε problem

Curves                M    BEST ε           AVERAGE ε        VARIANCE
                           EEA      SMGA    EEA      SMGA    EEA      SMGA
semicircle (N = 102)  10   38.92    38.92   44.23    42.89   78.50    25.98
                      12   26.00    26.00   29.42    27.80   4.68     2.05
                      14   17.39    17.39   20.14    18.55   4.69     1.41
                      17   12.22    12.22   14.46    13.37   2.31     1.11
                      18   11.34    11.19   12.79    12.56   1.47     0.91
                      19   10.04    10.04   11.52    11.22   0.97     0.50
                      22   7.19     7.01    8.63     7.73    0.56     0.32
                      27   3.73     3.70    4.87     4.05    0.57     0.15
                      30   2.84     2.64    3.67     2.93    0.33     0.04
figure-8 (N = 45)     6    17.49    17.49   18.32    17.64   0.45     0.12
                      9    4.54     4.54    4.79     4.71    0.15     0.06
                      10   3.69     3.69    3.98     3.73    0.05     0.02
                      11   2.90     2.90    3.19     3.15    0.04     0.01
                      13   2.04     2.04    2.36     2.05    0.06     0.00
                      15   1.61     1.61    1.87     1.69    0.04     0.01
                      16   1.41     1.41    1.58     1.51    0.03     0.01
chromosome (N = 60)   8    13.43    13.43   15.56    13.99   2.42     1.26
                      9    12.08    12.08   13.47    12.76   1.76     0.55
                      12   5.82     5.82    6.75     5.86    0.88     0.00
                      14   4.17     4.17    5.13     4.56    0.59     0.06
                      15   3.80     3.80    4.27     4.07    0.14     0.04
                      17   3.13     3.13    3.57     3.21    0.16     0.03
                      18   2.83     2.83    3.04     2.95    0.05     0.01


Fig. 6. The comparative results of SMGA and EEA [11] for the Min-ε problem, where M is the specified number of sides of the approximating polygon and ε is the approximation error: (a) EEA, M = 18, ε = 11.34; (b) EEA, M = 22, ε = 7.19; (c) EEA, M = 27, ε = 3.73; (d) EEA, M = 30, ε = 2.84; (e) SMGA, M = 18, ε = 11.19; (f) SMGA, M = 22, ε = 7.01; (g) SMGA, M = 27, ε = 3.70; (h) SMGA, M = 30, ε = 2.64

5.1 For the Min-ε Problem

Ho and Chen [11] proposed a GA-based method, the Efficient Evolutionary Algorithm (EEA), which adopts the constraining method to cope with infeasible solutions when solving the Min-ε problem. Here we use three curves, semicircle, figure-8 and chromosome, to test SMGA and compare it with EEA. For each curve and each specified number of sides M, the simulation conducts ten independent runs of SMGA and EEA, respectively. The best solution, average solution and variance of the solutions over the ten independent runs for SMGA and EEA are listed in Table 1. Parts of the simulation results of SMGA and EEA are shown in Fig. 6, where M is the specified number of sides of the approximating polygon and ε is the approximation error. From Table 1 and Fig. 6, we can see that, for the same number of polygon sides, SMGA obtains approximating polygons with smaller approximation error than EEA. The average computation times of EEA for the three benchmark curves, semicircle, figure-8 and chromosome, are 0.185 s, 0.078 s and 0.104 s respectively, while SMGA only requires 0.020 s, 0.011 s and 0.015 s. It can be seen that SMGA outperforms EEA in convergence speed.

5.2 For the Min-# Problem

Yin [1] proposed a GA-based method for solving the Min-# problem (we term it YGA). YGA adopts the penalty-function method to cope with infeasible solutions.


Table 2. Experimental results of SMGA and YGA [1] for the Min-# problem

Curves                ε    BEST M         AVERAGE M        VARIANCE
                           YGA    SMGA    YGA     SMGA     YGA    SMGA
Leaf (N = 120)        150  15     10      15.4    10.1     0.5    0.0
                      100  16     12      16.2    12.6     0.3    0.0
                      90   17     12      17.4    12.8     0.4    0.0
                      30   20     16      20.3    16.0     0.3    0.0
                      15   23     20      23.1    20.0     0.4    0.0
Chromosome (N = 60)   30   7      6       7.6     6.0      0.2    0.0
                      20   8      7       9.1     7.0      0.3    0.0
                      10   10     10      10.4    10.0     0.4    0.0
                      8    12     11      12.4    11.0     0.3    0.0
                      6    15     12      15.4    12.0     0.4    0.0
Semicircle (N = 102)  60   12     10      13.3    10.0     0.3    0.0
                      30   13     12      13.6    12.1     0.4    0.0
                      25   15     13      16.3    13.0     0.5    0.0
                      20   19     14      19.5    14.0     0.3    0.0
                      15   22     15      23.0    15.2     0.7    0.0

Fig. 7. The comparative results of SMGA and YGA [1] for the Min-# problem, where ε is the specified error tolerance and M is the number of sides of the approximating polygon: (a) YGA, ε = 30, M = 20; (b) YGA, ε = 15, M = 23; (c) YGA, ε = 6, M = 15; (d) YGA, ε = 15, M = 22; (e) SMGA, ε = 30, M = 16; (f) SMGA, ε = 15, M = 20; (g) SMGA, ε = 6, M = 12; (h) SMGA, ε = 15, M = 15

Here, we run SMGA and YGA on three benchmark curves: leaf, chromosome and semicircle. For each curve and each specified error tolerance ε, the simulation conducts ten independent runs of SMGA and YGA, respectively. The best solution, average solution and variance of the solutions over the ten independent runs for SMGA and YGA are listed in Table 2. Parts of the simulation results of SMGA


and YGA are shown in Fig. 7, where ε is the specified error tolerance and M is the number of sides of the approximating polygon. From Table 2 and Fig. 7, we can see that, for the same error tolerance, SMGA yields approximating polygons with fewer sides than YGA. The average computation times of YGA for the three benchmark curves, leaf, chromosome and semicircle, are 0.201 s, 0.09 s and 0.137 s respectively, while SMGA only requires 0.025 s, 0.015 s and 0.023 s for them. It can be seen that SMGA outperforms YGA in convergence speed.

6 Conclusion

We have proposed SMGA to successfully solve two types of polygonal approximation problems for digital curves, the Min-# problem and the Min-ε problem. The proposed chromosome-repairing technique based on the split and merge techniques effectively overcomes the difficulty of coping with infeasible solutions. The simulation results have shown that, on both types of polygonal approximation problems, the proposed SMGA outperforms existing GA-based methods which use other techniques for coping with infeasible solutions.

Acknowledgement The research work in this paper is partially sponsored by Shanghai Leading Academic Discipline Project, T0603.

References 1. Yin, P.Y.: Genetic Algorithms for Polygonal Approximation of Digital Curves. Int. J. Pattern Recognition Artif. Intell. 13 (1999) 1–22 2. Sklansky, J., Gonzalez, V.: Fast Polygonal Approximation of Digitized Curves. Pattern Recognition 12 (1980) 327–331 3. Douglas, D.H., Peucker, T.K.: Algorithm for the Reduction of the Number of Points Required to Represent a Line or Its Caricature. The Canadian Cartographer 10(2) (1973) 112–122 4. Leu, J.G., Chen, L.: Polygonal Approximation of 2D Shapes through Boundary Merging. Pattern Recognition Letters 7(4) (1988) 231–238 5. Ray, B.K., Ray, K.S.: A New Split-and-Merge Technique for Polygonal Approximation of Chain Coded Curves. Pattern Recognition Letters 16 (1995) 161–169 6. Teh, H.C., Chin, R.T.: On the Detection of Dominant Points on Digital Curves. IEEE Trans. Pattern Anal. Mach. Intell. 11(8) (1989) 859–872 7. Yin, P.Y.: A Tabu Search Approach to the Polygonal Approximation of Digital Curves. Int. J. Pattern Recognition Artif. Intell. 14 (2000) 243–255 8. Yin, P.Y.: A New Method for Polygonal Approximation Using Genetic Algorithms. Pattern Recognition Letters 19 (1998) 1017–1026 9. Huang, S.-C., Sun, Y.-N.: Polygonal Approximation Using Genetic Algorithms. Pattern Recognition 32 (1999) 1409–1420


10. Sun, Y.-N., Huang, S.-C.: Genetic Algorithms for Error-bounded Polygonal Approximation. Int. J. Pattern Recognition and Artificial Intelligence 14(3) (2000) 297–314 11. Ho, S.-Y., Chen, Y.-C.: An Efficient Evolutionary Algorithm for Accurate Polygonal Approximation. Pattern Recognition 34 (2001) 2305–2317 12. Yin, P.Y.: Ant Colony Search Algorithms for Optimal Polygonal Approximation of Plane Curves. Pattern Recognition 36 (2003) 1783–1797 13. Yin, P.Y.: A Discrete Particle Swarm Algorithm for Optimal Polygonal Approximation of Digital Curves. Journal of Visual Communication and Image Representation 15 (2004) 241–260

A Hybrid Model for Nondestructive Measurement of Internal Quality of Peach

Yongni Shao and Yong He

College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310029, China, [emailprotected]

Abstract. A nondestructive optical method for determining the sugar and acidity contents of peach was investigated. Two types of preprocessing were used before the data were analyzed with the multivariate calibration methods of principal component analysis (PCA) and partial least squares (PLS). A hybrid model combining PLS with PCA was put forward. The spectral data, recorded as the logarithm of the reciprocal of reflectance, were analyzed to build the best model for predicting the sugar and acidity contents of peach. A model with a correlation coefficient of 0.94/0.92, a standard error of prediction (SEP) of 0.50/0.07 and a bias of 0.02/-0.01 showed an excellent prediction performance for sugar/acidity. At the same time, the sensitive wavelengths corresponding to the sugar content and acidity of peaches, or to some element at a certain band, were proposed on the basis of the regression coefficients from PLS.

1 Introduction

Peach is one of the most important fruits in the agricultural markets of China and is favored by many people. However, different varieties of peach have different tastes and quality. Both the appearance (shape, color, size, tactility, etc.) and the interior qualities (sugar content, acidity, vitamin content, etc.) can be used as quality criteria for peach; among these, sugar and acid contents are the most important evaluation criteria affecting consumers' appreciation for selection. Most of the methods for measuring these qualities are based on complex processing of samples, expensive chemical reagents and so on. Marco et al. applied high-performance liquid chromatography (HPLC) to test and analyze the quality of peach [1]. Wu et al. also used HPLC to analyze the change of sugar and organic acid in peach during its maturation [2]. Steinmetz et al. used sensor fusion technology to analyze peach quality [3]. Corrado et al. used an electronic nose and visible spectra to analyze peach qualities including SSC and acidity [4]. The near infrared spectroscopy (NIR) technique has several attractive features, including fast analytical speed, ease of operation and its nondestructive nature. The most important one is that it can give the response of the molecular transitions of the corresponding chemical constituents to the spectrum, such as O-H, N-H, and C-H. In recent years, NIR has attracted considerable attention for the purpose of discrimination between sets of similar biological materials such as citrus oils [5], yogurt variety [6], honey [7], and apple


variety [8]. It is also regarded as a method for nondestructive sensing of fruit quality. Lammertyn et al. examined the prediction capacity for quality characteristics like acidity, firmness and soluble solids content of Jonagold apples within a wavelength range between 380 and 1650 nm [9]. Carlini et al. used visible and near infrared spectra to analyze soluble solids in cherry and apricot [10]. Lu evaluated the potential of NIR reflectance for measurement of the firmness and sugar content of sweet cherries [11]. McGlone et al. used Vis/NIR spectroscopy to analyze mandarin fruit [12]. Pedro and Ferreira predicted solids and carotenoids in tomato by using NIR [13]. There are many multivariate calibration methods used for quantitative analysis of sample constituents in NIRS. Principal Components Regression (PCR), Partial Least Squares (PLS) and Artificial Neural Networks (ANN) are the most useful multivariate calibration techniques [14, 15, 16]. PCR can effectively compress the dimensions of the original independent variables by constructing the relationship between the original independent variables and new reduced-dimension independent variables. However, the degree of correlation between the original independent variables and the new reduced-dimension independent variables is decreased, which leads to low prediction precision. ANN, a popular non-linear calibration method in chemometrics, has a high quality in non-linear approximation; nevertheless, its weaknesses, such as low training speed, ease of becoming trapped at a local minimum, and over-fitting, should be taken into account [17]. PLS is usually considered for a large number of applications in fruit and juice analysis and is widely used in multivariate calibration. One important practical aspect of PLS is that it takes into account errors in both the concentration estimates and the spectra; therefore, PLS is certainly an invaluable linear calibration tool. Thus, this paper proposes PLS to predict the sugar and acid contents of peach. Although NIR-based non-destructive measurements have been investigated on some fresh fruits, information about peach is limited. It is known that SSC and pH values vary as a function of storage time and temperature. Slaughter showed that Vis/NIR spectroscopy could be used to measure non-destructively the internal quality of peaches and nectarines [18]. Pieris et al. studied the spatial variation in soluble solids content of peaches using an NIR spectrometer [19]. Ortiz et al. used impact response and NIR to identify woolly peaches [20]. Golic and Walsh used calibration models based on near infrared spectroscopy for the in-line grading of peach for total soluble solids content [21]. The objective of this research is to examine the feasibility of using Vis/NIR spectroscopy to detect the sugar and acid contents of intact peach through a hybrid model which combines PLS with principal component analysis (PCA), and at the same time to find sensitive wavelengths corresponding to the sugar and acidity contents of peach.

2 Materials and Methodology

2.1 Experimental Material

To get dependable prediction equations from NIRS, it is necessary that the calibration set covers the range of fruit sources to which it will be applied. Three kinds of peaches:


Milu peach (from Fenghua, Zhejiang, China), Dabaitao peach (from Jinhua, Zhejiang, China) and Hongxianjiu peach (from Shandong, China) were used in this experiment. A total of 80 peaches used for the experiment were purchased at a local market and stored for two days at 20°C. By screening all samples with PCA, two peaches were detected as outliers and deleted. So, 48 peaches were finally used for the calibration model, and 30 samples were used for the prediction model. The peaches to be measured were selected to cover the two parameters (sugar and acidity contents). All the peaches were cut in half and the juice extracted using a manual fruit squeezer (model HL-56, Shanghai, China). Samples of the filtered juice were then taken for sugar content measurement using a digital refractometer (model PR-101, ATAGO, Japan) according to the China standard for sugar content measurement in fruit (GB12295-90). The acidity was measured with a pH meter (SJ-4A, Exact Instrument Co., Ltd., Shanghai, China), also according to the China standard.

2.2 Spectra Collection

For each peach, reflectance spectra were taken at three equidistant positions approximately 120° apart around the equator, and at each position the scan number was 10 at exactly the same spot, so the total number of scans for one peach was 30. The spectra were collected with a spectrograph (FieldSpec Pro FR (325-1075 nm)/A110070, a trademark of Analytical Spectral Devices, Inc. (ASD)) using RS2 software for Windows. Considering its 20° field-of-view (FOV), the spectrograph was placed at a height of approximately 100 mm above the sample, and a light source of a Lowell pro-lam 14.5 V bulb/128690 tungsten halogen (Vis/NIRS) was placed about 300 mm from the center of the peach so that the angle between the incident light and the detector was optimally about 45°. To avoid a low signal-to-noise ratio, only the wavelengths ranging from 400 to 1000 nm were used in this investigation. In order to obtain enough sensitivity to measure the diffuse reflectance of the intact peach, each spectrum was recorded as log(1/R), where R = reflectance.

2.3 Processing of the Optical Data

To test the influence of the preprocessing on the prediction of the calibration model, two types of preprocessing were used. First, to reduce the noise, Savitzky-Golay smoothing was used, with a window of 9 data points. The second type of preprocessing was multiplicative scatter correction (MSC), which was used to correct additive and multiplicative effects in the spectra. Once these preprocessing procedures were completed, a hybrid method combining PLS with PCA was used to develop calibration models for predicting the sugar content and the acidity. The preprocessing and calculations were carried out using 'The Unscrambler V9.2' (CAMO PROCESS AS, Oslo, Norway), a statistical software package for multivariate calibration.

2.4 A Hybrid Method Combining PLS with PCA

PLS is a bilinear modeling method where the original independent information (X-data) is projected onto a small number of latent variables (LVs) to simplify the relationship between X and Y for prediction with the smallest number of LVs. The standard error of calibration (SEC), the standard error of prediction (SEP) and the correlation coefficient (r) were used to judge the success and accuracy of the PLS model.
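A minimal sketch of this preprocessing chain in NumPy/SciPy; the paper fixes only the 9-point Savitzky-Golay window, so the polynomial order and the exact MSC formulation below are our assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess(reflectance):
    """reflectance: (n_samples, n_wavelengths) array of R values."""
    absorbance = np.log10(1.0 / reflectance)             # log(1/R)
    smoothed = savgol_filter(absorbance, window_length=9,
                             polyorder=2, axis=1)        # 9-point window
    # Multiplicative scatter correction: regress each spectrum against
    # the mean spectrum, then remove the fitted offset and slope.
    mean_spec = smoothed.mean(axis=0)
    corrected = np.empty_like(smoothed)
    for k, spec in enumerate(smoothed):
        slope, offset = np.polyfit(mean_spec, spec, 1)
        corrected[k] = (spec - offset) / slope
    return corrected
```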


In this paper, PCA combined with PLS regression was used to derive the first 20 principal components from the spectral data for further analysis, to examine the relevant and interpretable structure in the data as well as for outlier detection [22]. It was also used to eliminate defective spectra and, at the same time, some unnecessary wavelengths. Because PCA does not consider the concentration of the target property, here the sugar and acidity, PLS was used for the further analysis of the sensitive wavelengths corresponding to the sugar and acidity of peaches.
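A sketch of the hybrid PCA-screening-plus-PLS workflow using scikit-learn; the 3-standard-deviation outlier rule is our assumption, since the paper does not state its outlier criterion.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

def pca_outlier_mask(X, n_pc=20, z=3.0):
    """Flag samples whose score on any of the first n_pc principal
    components lies more than z standard deviations from the mean."""
    scores = PCA(n_components=n_pc).fit_transform(X)
    dev = np.abs(scores - scores.mean(axis=0)) / scores.std(axis=0)
    return (dev > z).any(axis=1)

def pls_model(X_cal, y_cal, X_val, y_val, n_lv=4):
    """Fit PLS with n_lv latent variables and report SEC, SEP and r."""
    pls = PLSRegression(n_components=n_lv).fit(X_cal, y_cal)
    sec = float(np.sqrt(np.mean((pls.predict(X_cal).ravel() - y_cal) ** 2)))
    y_hat = pls.predict(X_val).ravel()
    sep = float(np.sqrt(np.mean((y_hat - y_val) ** 2)))
    r = float(np.corrcoef(y_hat, y_val)[0, 1])
    return pls, sec, sep, r
```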

3 Results and Discussion

3.1 PCA on the Full Wavelength Region

Spectra were exported from the ViewSpec software for multivariate analysis. First, the pretreatment method of multiplicative scatter correction (MSC) was used to correct for additive and multiplicative effects in the spectra after Savitzky-Golay smoothing. Then, PCA was used to derive the first 20 principal components from the spectral data for further analysis, to examine the relevant and interpretable structure in the data as well as for outlier detection. PCA was performed on the whole wavelength range from 400 nm to 1000 nm for the total of 80 peaches in the training set, and two peaches were detected as outliers, which may have been caused by human error when collecting the spectral curves. It was also noticed that the first four PCs together explained over 98% of the total population variance while the remainder accounted for little. Thus, the first four PCs were appropriate for the characteristic description of the peach spectral curves.

3.2 Selection of Optimal Wavelengths

Fig. 1 shows the loadings of the first four principal components from the 78 samples across the entire spectral region; it is called 'the loading plot of PC1 to PC4'. As described above,

Fig. 1. Loadings of first four principal components from 78 peaches across the entire spectral region


the cumulative reliabilities of PC1 to PC4 were very high, so the loadings of PC1 to PC4 should be considered as the basis for eliminating unnecessary spectral regions when establishing the calibration model. The loading figure also shows that the wavelengths before 700 nm have more wave crests than the wavelengths after 700 nm. This indicates that the wavelengths in the visible spectral region played a more important role than those in the near-infrared region. However, this may be caused by the color difference of the peaches, not by the sugar or acidity, so further PLS analysis was used to ascertain the sensitive wavelengths for the sugar and acidity of peach.

3.3 Sugar Content Prediction

After the PCA, two peaches were detected as outliers, and some unnecessary spectral regions were eliminated before establishing the calibration model. PLS was finally used to establish the model for peach quality analysis. All 78 samples were separated randomly into two groups: a calibration set with 48 samples, while the remaining 30 samples were used as the prediction set. The correlation coefficient of calibration between the NIR measurements and the sugar content was as high as 0.94, with a SEC of 0.52. When the model was used to predict the 30 unknown samples, the correlation coefficient was 0.94, with a SEP of 0.50 and a bias of 0.02 (Fig. 2).

Fig. 2. Vis/NIR prediction results of sugar content for 30 unknown samples from the PLS model

3.4 Acidity Prediction

The same procedures were used to predict the acidity of peach. The correlation coefficient of calibration between the NIR measurements and the acidity was as high as 0.94, with a SEC of 0.08. In prediction, the correlation coefficient was 0.92, with a SEP of 0.07 and a bias of -0.01 (Fig. 3).


Fig. 3. Vis/NIR prediction results of acidity for 30 unknown samples from the PLS model

3.5 Analysis of the Sensitive Wavelengths Using Loading Weights and Regression Coefficients

In the above discussion of the prediction results from the PLS model, no consideration was given to the contributions of the individual wavelengths to the prediction results.

Fig. 4. Loading weights for sugar content of peaches by PLS


This is because the PLS method first applies a linear transform to the entire set of individual wavelength data. As a result, it is often difficult to ascertain how individual wavelengths are directly related to the quantities to be predicted. However, it would be helpful to examine how the sugar and acidity contents are related to individual wavelengths so that a better understanding of NIR reflectance spectroscopy may be obtained. As to the sugar content, after the PLS analysis carried out with the 48 calibration samples, the number of latent variables (LVs) in the PLS analysis was determined as 4 by cross-validation (Fig. 4). By choosing the spectral wavebands with the highest loading weights in each of those LVs across the entire spectral region, the optimal wavelengths were chosen: 905-910 nm, 692-694 nm, 443-446 nm, 480-484 nm (in PC1), 975-978 nm, 990-992 nm, 701-703 nm, 638-642 nm (in PC2), 984-988 nm (in PC3), and 580-583 nm (in PC4), which were taken as the characteristic wavelengths. The reflectance values of those 42 wavelengths were set as PLS variables to establish the prediction model. The prediction results were better than those using the entire spectral region (Fig. 5). For the acidity measurement, the number of LVs in the PLS analysis was also determined as 4 by cross-validation (Fig. 6). By choosing the spectral wavelengths with the highest loading weights in each of those LVs across the entire spectral region, 38 wavelengths were chosen as the optimal ones and set as PLS variables to establish the acidity prediction model. The prediction result was not as good as that using the entire spectral region (Fig. 7).

Fig. 5. Vis/NIR prediction results of sugar content for 30 unknown samples from the PLS model using several narrower spectral regions
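The loading-weight selection described above might look like the following sketch; x_weights_ is scikit-learn's PLS loading-weight matrix, and per_lv is an illustrative parameter rather than a value taken from the paper.

```python
import numpy as np

def top_wavebands(pls, wavelengths, per_lv=10):
    """Pool, over all latent variables, the wavelengths with the largest
    absolute loading weights (pls.x_weights_: n_wavelengths x n_LVs)."""
    chosen = set()
    for lv in range(pls.x_weights_.shape[1]):
        order = np.argsort(np.abs(pls.x_weights_[:, lv]))[::-1]
        chosen.update(float(wavelengths[i]) for i in order[:per_lv])
    return sorted(chosen)
```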

To further analyze the wavelengths sensitive to sugar content and acidity, the regression coefficients were also examined; the results were similar to the loading weights, as shown in Fig. 8 and Fig. 9. From Fig. 8, we can find that the wavelengths 905-910 nm and 975-998 nm might be of particular importance for the sugar content calibration, while peaks at visible wavelengths such as 488-494 nm may be caused by the color or shape of the peaches. The peak at 968 nm may be caused by the 2ν1+ν3 stretching


Fig. 6. Loading weights for acidity of peaches by PLS

Fig. 7. Vis/NIR prediction results of acidity for 30 unknown samples from the PLS model using several narrower spectral regions

vibration of water. The regression coefficients shown in Fig. 9 also have strong peaks and valleys at certain wavelengths, such as 900-902 nm and 980-995 nm, related to acidity. The visible spectral regions related to acidity were similar to those for the sugar content,


Fig. 8. Regression coefficients with corresponding wavelengths for sugar content

Fig. 9. Regression coefficients with corresponding wavelengths for acidity

because organic acids do not absorb in this region of the spectrum. The wavelengths between 700 and 950 nm possibly result from a 3rd overtone stretch of C-H and 2nd and 3rd overtones of O-H in peaches, as referred to by Rodriguez-Saona et al. in their article about the rapid analysis of sugars in fruit juices by


FT-NIR spectroscopy [23]. Sasic and Ozaki also proposed detailed band assignments for the short-wave NIR region useful for various biological fluids [24]. So in our research, for sugar content the wavelengths 905-910 nm and 975-998 nm might be of particular importance, and for acidity 900-902 nm and 980-995 nm were better. This finding is similar to the earlier literature; for example, He found that a wavelength of 914 nm was sensitive to the sugar content of satsuma mandarins, and that wavelengths near 900 nm were sensitive to the organic acids of oranges [25].

4 Conclusions

The results from this study indicate that it is possible to use a nondestructive technique to measure the sugar and acidity contents of peach using Vis/NIR spectroscopy. Through a hybrid method of PCA and PLS, a correlation was established between the absorbance spectra and the parameters of sugar content and acidity. The results were quite encouraging, with a correlation coefficient of 0.94/0.92 and a SEP of 0.50/0.07 for sugar/acidity, which shows an excellent prediction performance. At the same time, the sensitive wavelengths corresponding to the sugar content and acidity of peaches, or to some element at a certain band, were proposed on the basis of the regression coefficients from PLS. For sugar content, the wavelengths 905-910 nm and 975-998 nm might be of particular importance, and for acidity 900-902 nm and 980-995 nm were better. The sensitive-wavelength analysis is very useful in the field of food chemistry. Further research on other fruits is needed to improve the reliability and precision of this technology. Even for peaches, different growing phases and growing conditions may lead to different results. It would also be interesting to determine whether there are nondestructive optical techniques for measuring the maturity indices of peaches, like skin color and flesh firmness, which could be combined with sugar content and acidity.

Acknowledgments This study was supported by the Teaching and Research Award Program for Outstanding Young Teachers in Higher Education Institutions of MOE, PRC, Natural Science Foundation of China (Project No: 30270773), Specialized Research Fund for the Doctoral Program of Higher Education (Project No: 20040335034) and Natural Science Foundation of Zhejiang Province, China (Project No: RC02067).

References 1. Marco, E., Maria, C.M., Fiorella, S., Antonino, N., Luigi, C., Ennio, L.N., Giuseppe, P.: Quality Evaluation of Peaches and Nectarines by Electrochemical and Multivariate Analyses: Relationships between Analytical Measurements and Sensory Attributes. Food Chemistry, 60(4) (1997) 659-666 2. Wu, B.H., Quilot, B., Genard, M., Kervella, J., Li, S.H.: Changes in Sugar and Organic Acid Concentrations during Fruit Maturation in Peaches, P. Davidiana and Hybrids as Analyzed by Principal Component Analysis. Scientia Horticulturae, 103(4) (2005) 429-439


3. Steinmetz, V., Sevila, F., Bellon-Maurel, V.: A Methodology for Sensor Fusion Design: Application to Fruit Quality Assessment. Journal of Agricultural Engineering Research 74(1) (1999) 21-31 4. Corrado, D.N., Manuela, Z.S., Antonella, M., Roberto, P., Bernd, H., Arnaldo, D.A.: Outer Product Analysis of Electronic Nose and Visible Spectra: Application to the Measurement of Peach Fruit Characteristics. Analytica Chimica Acta 459(1) (2002) 107-117 5. Steuer, B., Schulz, H., Lager, E.: Classification and Analysis of Citrus Oils by NIR Spectroscopy. Food Chemistry 72(1) (2001) 113-117 6. He, Y., Feng, S.J., Deng, X.F., Li, X.L.: Study on Lossless Discrimination of Varieties of Yogurt Using the Visible/NIR-spectroscopy. Food Research International 39(6) (2006) 645-650 7. Downey, G., Fouratier, V., Kelly, J.D.: Detection of Honey Adulteration by Addition of Fructose and Glucose Using Near Infrared Spectroscopy. Journal of Near Infrared Spectroscopy 11(6) (2004) 447-456 8. He, Y., Li, X.L., Shao, Y.N.: Quantitative Analysis of the Varieties of Apple Using Near Infrared Spectroscopy by Principal Component Analysis and BP Model. Lecture Notes in Artificial Intelligence 3809 (2005) 1053-1056 9. Lammertyn, J., Nicolay, B., Ooms, K., De Smedt, V., De Baerdemaeker, J.: Non-destructive Measurement of Acidity, Soluble Solids and Firmness of Jonagold Apples Using NIR-spectroscopy. Transactions of the ASAE 41(4) (1998) 1089-1094 10. Carlini, P., Massantini, R., Mencarelli, F.: Vis-NIR Measurement of Soluble Solids in Cherry and Apricot by PLS Regression and Wavelength Selection. Journal of Agricultural and Food Chemistry 48(11) (2000) 5236-5242 11. Lu, R.: Predicting Firmness and Sugar Content of Sweet Cherries Using Near-infrared Diffuse Reflectance Spectroscopy. Transactions of the ASAE 44(5) (2001) 1265-1271 12. McGlone, V.A., Fraser, D.G., Jordan, R.B., Kunnemeyer, R.: Internal Quality Assessment of Mandarin Fruit by Vis/NIR Spectroscopy. Journal of Near Infrared Spectroscopy 11(5) (2003) 323-332 13. Pedro, A.-M.K., Ferreira, M.-M.C.: Nondestructive Determination of Solids and Carotenoids in Tomato Products by Near-Infrared Spectroscopy and Multivariate Calibration. Analytical Chemistry 77(8) (2005) 2505-2511 14. He, Y., Zhang, Y., Xiang, L.G.: Study of Application Model on BP Neural Network Optimized by Fuzzy Clustering. Lecture Notes in Artificial Intelligence 3789 (2005) 712-720 15. Zhang, Y.D., Dong, K., Ren, L.F.: Pattern Recognition of Laser-induced Autofluorescence Spectrum from Colorectal Cancer Tissues Using Partial Least Square and Neural Network. China Medical Engineering 12(4) (2004) 52-59 16. Dou, Y., Sun, Y., Ren, Y.Q., Ren, Y.L.: Artificial Neural Network for Simultaneous Determination of Two Components of Compound Paracetamol and Diphenhydramine Hydrochloride Powder on NIR Spectroscopy. Analytica Chimica Acta 528(1) (2005) 55-61 17. Fu, X.G., Yan, G.Z., Chen, B., Li, H.B.: Application of Wavelet Transforms to Improve Prediction Precision of Near Infrared Spectra. Journal of Food Engineering 69(4) (2005) 461-466 18. Slaughter, D.C.: Non-Destructive Determination of Internal Quality in Peaches and Nectarines. Transactions of the ASAE 38(2) (1995) 617-623 19. Pieris, K.-H.S., Dull, G.G., Leffler, R.G., Kays, S.J.: Spatial Variability of Soluble Solids or Dry-matter Content within Individual Fruits, Bulbs, or Tubers: Implications for the Development and Use of NIR Spectrometric Techniques. HortScience 34(1) (1999) 114-118


20. Ortiz, C., Barreiro, P., Correa, E., Riquelme, F., Ruiz-Altisent, M.: Non-destructive Identification of Woolly Peaches Using Impact Response and Near-infrared Spectroscopy. Journal of Agricultural Engineering Research 78(3) (2001) 281-289 21. Golic, M., Walsh, K.B.: Robustness of Calibration Models Based on Near Infrared Spectroscopy for the In-line Grading of Stonefruit for Total Soluble Solids Content. Analytica Chimica Acta 555(2) (2006) 286-291 22. Naes, T., Isaksson, T., Fearn, T., Davies, A.M.: A User-friendly Guide to Multivariate Calibration and Classification. NIR Publications, UK (2002) 23. Rodriguez-Saona, L.E., Fry, F.S., McLaughlin, M.A., Calvey, E.M.: Rapid Analysis of Sugars in Fruit Juices by FT-NIR Spectroscopy. Carbohydrate Research 336(1) (2001) 63-74 24. Sasic, S., Ozaki, Y.: Short-Wave Near-Infrared Spectroscopy of Biological Fluids. 1. Quantitative Analysis of Fat, Protein, and Lactose in Raw Milk by Partial Least-Squares Regression and Band Assignment. Analytical Chemistry 73(1) (2001) 64-71 25. He, Y.D.F.: The Method for Near Infrared Spectral Analysis. In: Yan, Y.L., Zhao, L.L., Han, D.H., Yang, S.M. (Eds.), The Analysis Basic and Application of Near Infrared Spectroscopy, 354. Light Industry of China, Beijing (1998)

A Novel Approach in Sports Image Classification∗

Wonil Kim(1), Sangyoon Oh(2), Sanggil Kang(3), and Kyungro Yoon(4,∗∗)

(1) College of Electronics and Information Engineering at Sejong University, Seoul, Korea, [emailprotected]
(2) Computer Science Department at Indiana University, Bloomington, IN, U.S.A., [emailprotected]
(3) Department of Computer Science, The University of Suwon, Gyeonggi-do, Korea, [emailprotected]
(4) School of Computer Science and Engineering at Konkuk University, Seoul, Korea, [emailprotected]

Abstract. It will be very effective and useful if an image classification system uses a standardized feature set such as the MPEG-7 descriptors. In this paper, we propose a sports image classification system that properly classifies sports images into one of eight classes. The proposed system uses normalized MPEG-7 visual descriptors as the input of a neural network. The experimental results show that the MPEG-7 descriptors can be used as the main features of an image classification system.

1 Introduction

In this paper, we propose a sports image classification system that classifies images into one of eight classes: Taekwondo, Field and Track, Ice Hockey, Horse Riding, Skiing, Swimming, Golf, and Tennis. These eight sports are selected according to the particular features of the given sports. The proposed system uses MPEG-7 visual descriptors as the main input features of the classification system. We first analyze several MPEG-7 descriptors regarding color, texture, and shape, after which we discuss several descriptors that perform well on sports image classification. This approach is effective and requires no computationally intensive processing, and it can serve as a de facto standard for real-time image classification. The simulation shows that the visual MPEG-7 descriptors can be effectively used as the main features of the image classification process, and that the proposed system can successfully rate images into multiple classes depending on the employed descriptors. In the next chapter, we discuss previous research on neural-network-based image classification, image classification systems using MPEG-7 descriptors, and sports image classification. The proposed system is explained in the section that follows. The simulation environment and the results are discussed in Chapter 4, and Chapter 5 concludes.

∗ This paper is supported by Seoul R&BD program.
∗∗ Author for correspondence: +82-2-450-4129.



2 Related Works

2.1 Sports Image Classification

Due to the large amount of digitized media being generated and the popularity of sports, sports image classification has become an area that requires all the techniques and methods described in this section. Jung et al. [1] proposed a sports image classification system using a Bayesian network. In this work, they showed that an image mining approach using a statistical model can produce promising results on sports image classification. Existing CBIRSs like QBIC and VisualSEEK [2] provide image retrieval based on methods that are limited to low-level features such as texture, shape, and color histograms. There are studies that apply various techniques to specific image domains, such as sports images. For automatic multimedia classification, Ariki and Sugiyama present a general study of the classification problem for TV sports news and propose a method using a multi-space approach that provides a sports category with more than one subspace corresponding to its typical scenes. Discrete Cosine Transformation (DCT) components are extracted from the whole image and used as the classification features [3]. Their other paper [4] contains in-depth experimental results. The Digital Video | Multimedia (DVMM) Lab [5] of Columbia University has conducted many studies in the image classification area, one of which concerns structural and semantic analysis of digital videos [6]. Chang and Sundaram develop algorithms and tools for segmentation, summarization, and classification of video data. For each area, they emphasize the importance of understanding domain-specific characteristics, and discuss classification techniques that exploit spatial structure constraints as well as temporal transitional methods. One of the key problems in achieving efficient and user-friendly retrieval in the domain of images and videos is developing a search mechanism that guarantees the delivery of high-precision information. One of the restrictions of an image retrieval system is that it should have a sample object or a sample texture. Khan et al. [7, 8, 9, 10] propose an image processing system which examines the relationships among objects in images to help achieve a more detailed understanding of the content and meaning of individual images. It uses a domain-dependent ontology to create a meaning-based index structure through the design and implementation of a concept-based model, and they propose a new mechanism to generate the ontology automatically for a scalable system. Their approach is applied to the sports image domain. ASSAVID is an EU-sponsored project to develop a system for automatic segmentation and semantic annotation of sports video. Messer, Christmas and Kittler describe their method for the automated classification of unknown sports video in their paper [11]. The technique is based on the concept of "cues" which attach semantic meaning to low-level features computed on the video. The paper includes experimental results with sports videos. Hayashi et al. [12] present a method to classify scenes based on motion information. Compared to previous works that use object trajectories and optical flow fields as motion information, they use the instantaneous motions of multiple objects in each image. To deal with the varying number of objects in a scene, moment statistics are used as features in the method.


The method consists of two phases: scenes in the learning data are clustered in the learning phase, and a newly observed scene is classified in the recognition phase.

2.2 Neural Networks and MPEG-7 Descriptors

Neural networks have long been used to develop methods for high-accuracy pattern recognition and image classification. Kanellopoulos and Wilkinson [13] performed experiments using different neural networks and classifiers to categorize images, including multi-layer perceptron neural networks and a maximum likelihood classifier. The paper examines best practice in areas such as network architecture selection, algorithm optimization, input data scaling, enhanced feature sets, and hybrid classifier methods, and gives recommendations and strategies for the effective and efficient use of neural networks as well. It is known that the neural networks used for modeling an image classification system should make different errors to be effective. Giacinto and Roli [14] propose an approach to the automatic design of neural network ensembles. Their approach aims to select, from a given large set of neural networks, the subset that forms the most error-independent nets. The approach consists of an overproduction phase and a choice phase, which chooses the subset of neural networks. The overproduction phase is studied by Partridge [15], and the choice phase is sub-divided into an unsupervised learning step for identifying subsets and a final ensemble creation step that selects among the subsets from the previous step. In contrast to the relatively long period of study of neural networks in image classification and content-based image retrieval systems, MPEG-7 [16] is a recently emerging standard in this area. It is not a standard dealing with the actual encoding and decoding of video and audio, but a standard for describing media content. It uses XML to store metadata, and it solves the problem of the lack of a standard for describing visual image content. The aim, scope, and details of the MPEG-7 standard are well described by Sikora of the Technical University Berlin in his paper [17]. There is a series of studies that use various MPEG-7 descriptors. Ro et al. [18] present a study of a texture-based image description and retrieval method using an adapted version of the homogeneous texture descriptor of MPEG-7. Other studies of image classification use descriptors like a contour-based shape descriptor [19], an edge histogram descriptor [20], and a combination of color structure and homogeneous texture descriptors [21]. As a part of the EU aceMedia project research, Spyrou et al. propose three image classification techniques based on fusing various low-level MPEG-7 visual descriptors [22]. Since a direct inclusion of descriptors would be inappropriate and incompatible, fusion is required to bridge the semantic gap between the target semantic classes and the low-level visual descriptors. The three different image classification techniques are a merging fusion, a back-propagation fusion, and a fuzzy-ART neuro-fuzzy network. There is a CBIRS that combines neural networks and the MPEG-7 standard: researchers at Helsinki University of Technology developed a neural, self-organizing system to retrieve images based on their content, the PicSOM (Picture + self-organizing map, SOM) [23]. The technique used to develop the PicSOM system is based on pictorial examples and relevance feedback (RF), and the system is implemented using tree-structured SOMs.
The MPEG-7 content descriptor is provided for the system. In the paper, they compare the PicSOM indexing technique with a reference system based on vector quantization (VQ). Their results show the MPEG-7 content descriptor can be used in the PicSOM system despite the fact that Euclidean distance calculation is not optimal for all of them.

3 The Proposed Sports Image Classification System

3.1 Feature Extraction Module

Our system for classifying sports images is composed of two modules, the feature extraction module and the classification module, connected in series as shown in Fig. 1. The feature extraction module contains three engines. The MPEG-7 XM engine extracts the features of images in XML description format. The parsing engine parses the raw descriptions and transforms them into numerical values suitable for a neural network implementation. The preprocessing engine normalizes the numerical values to the 0-1 range. Normalizing the input features prevents features with a large numeric scale from dominating the output of the neural network classifier (NNC) over features with a small scale.
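As an illustration of the preprocessing engine's role, the following minimal sketch min-max scales each feature column to the 0-1 range; the NumPy representation and the function name are our own assumptions rather than part of the described system.

```python
import numpy as np

def normalize_features(X, eps=1e-12):
    """Min-max normalize each feature column of X to the 0-1 range.

    X: (n_samples, n_features) array of parsed MPEG-7 descriptor values.
    Columns with a constant value are mapped to 0 to avoid division by zero.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    return (X - col_min) / np.where(col_range < eps, 1.0, col_range)
```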

Fig. 1. The schematic of our sports image classification system

3.2 Classification Module

Using the data set of normalized input features and sports classes, we can model an NNC in the classification module. Fig. 2 shows an example of an NNC with three layers: one input layer, one hidden layer, and one output layer. The number of input features varies with the MPEG-7 descriptors used. Let us denote the input feature vector obtained from the first MPEG-7 descriptor as $X_{D1} = (x_{D1,1}, x_{D1,2}, \ldots, x_{D1,i}, \ldots, x_{D1,n_1})$, where $x_{D1,i}$ is the $i$th input feature extracted from MPEG-7 descriptor 1 and the subscript $n_1$ is the dimension of the input features from the first descriptor. In the same way, the input feature vector obtained from the last MPEG-7 descriptor $k$ can be expressed as $X_{Dk} = (x_{Dk,1}, x_{Dk,2}, \ldots, x_{Dk,i}, \ldots, x_{Dk,n_k})$.

Fig. 2. An example of a three-layered neural network classifier

Also, the output vector can be expressed as $Y = (y_1, y_2, \ldots, y_i, \ldots, y_s)$, where $y_i$ is the output from the $i$th output node and the subscript $s$ is the number of classes. By utilizing the hard limit function in the output layer, we can obtain a binary value, 0 or 1, for each output node $y_i$, as in Equation (1):

$$y_i = f_o(\mathrm{netinput}_o) = \begin{cases} 1, & \mathrm{netinput}_o \ge 0 \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

where $f_o$ is the hard limit function at the output node and $\mathrm{netinput}_o$ is the net input of $f_o$. As shown in Equation (2), the net input can be expressed as the product of the output vector of the hidden layer, denoted $Y_h$, and the weight vector $W_o$ at the output layer:

$$\mathrm{netinput}_o = W_o Y_h^{T} \qquad (2)$$

In the same way, the hidden layer output vector $Y_h$ can be computed by applying the activation function to the product of the input weight vector and the input vector. Thus, the accuracy of the NNC depends on the values of all the weight vectors. To obtain the optimal weight vectors, the NNC is trained using the back-propagation algorithm, which is commonly utilized for training neural networks. The training is done after coding each class of sports into an $s$-dimensional orthogonal vector. For example, with eight classes the classes are coded as (1, 0, 0, 0, 0, 0, 0, 0), (0, 1, 0, 0, 0, 0, 0, 0), ..., (0, 0, 0, 0, 0, 0, 0, 1). Once an optimal weight vector is obtained, we evaluate the performance of the NNC using test data unseen during the training phase.
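The following is a minimal sketch of such a three-layer classifier in NumPy. Because the hard-limit function of Equation (1) is not differentiable, the sketch trains a differentiable surrogate (linear output with squared error) and applies the hard limit only at prediction time; the learning rate, epoch count and initialization are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_nnc(X, Y, hidden=32, lr=0.01, epochs=5000):
    """Train a three-layer classifier: tanh hidden layer, linear output.

    X: (n, d) normalized features; Y: (n, s) one-hot class codes in {0, 1}.
    Targets are mapped to {-1, +1} so thresholding at 0 matches Eq. (1).
    """
    n, d = X.shape
    s = Y.shape[1]
    W1 = rng.normal(0, 0.1, (d, hidden))
    W2 = rng.normal(0, 0.1, (hidden, s))
    T = 2 * Y - 1
    for _ in range(epochs):
        H = np.tanh(X @ W1)      # hidden layer outputs Y_h
        O = H @ W2               # net input of the output layer
        err = O - T
        W2 -= lr * H.T @ err / n                          # MSE gradient wrt W2
        W1 -= lr * X.T @ ((err @ W2.T) * (1 - H**2)) / n  # backpropagated to W1
    return W1, W2

def predict(X, W1, W2):
    """Hard-limit outputs of Eq. (1): 1 if net input >= 0, else 0."""
    return (np.tanh(X @ W1) @ W2 >= 0).astype(int)
```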

4 Experiment

4.1 Experimental Environment

We implemented our sports image classification system using image data for 8 sports, such as Taekwondo, Field & Track, Ice Hockey, etc. As explained in the previous section, we extracted input features from query images in the feature extraction module using four MPEG-7 descriptors: Color Layout (CL), Edge Histogram (EH), Homogeneous Texture (HT), and Region Shape (RS). The input feature values were normalized into the 0-1 range. A total of 2,544 images were collected. For training the NNC, 2,400 images (300 per sport) were used, and 144 images (18 per sport) were reserved for testing; the training and testing sets are mutually exclusive. We structured a three-layered NNC in the classification module. The hyperbolic tangent sigmoid function and the hard limit function were used in the hidden layer and the output layer, respectively. The hidden layer contains 32 nodes.

Table 1. The classification accuracies for 4 different MPEG-7 descriptors (%). Rows give the true sport and descriptor; columns give the predicted sport.

| Sport | Desc. | Taekwondo | Field & Track | Ice Hockey | Horse Riding | Skiing | Swimming | Golf | Tennis |
|---|---|---|---|---|---|---|---|---|---|
| Taekwondo | CL | 77.78 | 11.11 | 5.56 | 0.00 | 0.00 | 0.00 | 0.00 | 5.56 |
| Taekwondo | RS | 50.00 | 5.56 | 11.11 | 5.56 | 5.56 | 11.11 | 11.11 | 0.00 |
| Taekwondo | HT | 87.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.50 | 0.00 |
| Taekwondo | EH | 27.78 | 22.22 | 0.00 | 0.00 | 5.56 | 16.67 | 22.22 | 5.56 |
| Field & Track | CL | 0.00 | 66.67 | 5.56 | 16.67 | 0.00 | 0.00 | 0.00 | 11.11 |
| Field & Track | RS | 5.56 | 50.00 | 11.11 | 5.56 | 0.00 | 16.67 | 11.11 | 0.00 |
| Field & Track | HT | 0.00 | 86.67 | 0.00 | 0.00 | 0.00 | 0.00 | 13.33 | 0.00 |
| Field & Track | EH | 11.11 | 50.00 | 0.00 | 16.67 | 0.00 | 0.00 | 11.11 | 11.11 |
| Ice Hockey | CL | 0.00 | 11.11 | 72.22 | 0.00 | 5.56 | 5.56 | 0.00 | 5.56 |
| Ice Hockey | RS | 0.00 | 0.00 | 33.33 | 38.89 | 11.11 | 5.56 | 11.11 | 0.00 |
| Ice Hockey | HT | 4.65 | 0.00 | 55.81 | 6.98 | 0.00 | 9.30 | 23.26 | 0.00 |
| Ice Hockey | EH | 5.56 | 44.44 | 11.11 | 33.33 | 0.00 | 0.00 | 0.00 | 5.56 |
| Horse Riding | CL | 0.00 | 0.00 | 5.56 | 83.33 | 0.00 | 11.11 | 0.00 | 0.00 |
| Horse Riding | RS | 0.00 | 5.56 | 5.56 | 77.78 | 0.00 | 5.56 | 0.00 | 5.56 |
| Horse Riding | HT | 5.56 | 16.67 | 5.56 | 33.33 | 11.11 | 16.67 | 5.56 | 5.56 |
| Horse Riding | EH | 11.11 | 27.78 | 22.22 | 33.33 | 0.00 | 0.00 | 5.56 | 0.00 |
| Skiing | CL | 0.00 | 0.00 | 5.56 | 5.56 | 83.33 | 5.56 | 0.00 | 5.56 |
| Skiing | RS | 0.00 | 5.56 | 5.56 | 0.00 | 72.22 | 11.11 | 5.56 | 0.00 |
| Skiing | HT | 0.00 | 0.00 | 0.00 | 0.00 | 59.09 | 4.55 | 36.36 | 0.00 |
| Skiing | EH | 5.56 | 0.00 | 0.00 | 16.67 | 33.33 | 22.22 | 16.67 | 5.56 |
| Swimming | CL | 5.56 | 5.56 | 5.56 | 0.00 | 11.11 | 66.67 | 0.00 | 0.00 |
| Swimming | RS | 5.56 | 0.00 | 0.00 | 11.11 | 5.56 | 44.44 | 16.67 | 16.67 |
| Swimming | HT | 5.71 | 0.00 | 0.00 | 17.14 | 2.86 | 54.29 | 20.00 | 0.00 |
| Swimming | EH | 16.67 | 11.11 | 11.11 | 5.56 | 27.78 | 22.22 | 5.56 | 0.00 |
| Golf | CL | 16.67 | 5.56 | 0.00 | 5.56 | 0.00 | 0.00 | 72.22 | 0.00 |
| Golf | RS | 0.00 | 11.11 | 5.56 | 11.11 | 11.11 | 11.11 | 33.33 | 16.67 |
| Golf | HT | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 25.00 | 68.18 | 6.82 |
| Golf | EH | 5.56 | 11.11 | 0.00 | 11.11 | 22.22 | 11.11 | 33.33 | 5.56 |
| Tennis | CL | 11.11 | 11.11 | 0.00 | 5.56 | 16.67 | 11.11 | 0.00 | 44.44 |
| Tennis | RS | 5.56 | 11.11 | 16.67 | 11.11 | 5.56 | 5.56 | 16.67 | 27.78 |
| Tennis | HT | 0.00 | 0.00 | 30.95 | 4.76 | 0.00 | 19.05 | 11.90 | 33.33 |
| Tennis | EH | 11.11 | 5.56 | 11.11 | 5.56 | 27.78 | 11.11 | 5.56 | 22.22 |

For training the NNC, we chose the back-propagation algorithm because of its training ability. To obtain the optimal weight vectors, a large number of iterations (500,000 in this experiment) was used.

4.2 Experimental Results

Table 1 shows the accuracy of our sports image classification system for each sport according to the 4 different MPEG-7 descriptors. As seen in the table, the input features extracted from the Color Layout descriptor provide the best overall performance (about 70% accuracy) for all sports except Field & Track, whose images contain both track and field scenes. While the results from the Region Shape descriptor are poor for most sports, its input features work relatively well for speedy sports such as Horse Riding (77.78%) and Skiing (72.22%). The results from Homogeneous Texture for outdoor sports such as Field & Track (86.67%) and Golf (68.18%) are also acceptable. From this analysis, we can say that our system shows promising results for classifying sports images when the input features extracted from the Color Layout descriptor are used as inputs of the NNC. The other descriptors can serve as complementary features depending on the images and the domain.

5 Conclusion

This paper proposed a novel system for classifying sports images using a neural network classifier. From the experimental results, we conclude that the system provides acceptable classification performance (about 70%) when the Color Layout MPEG-7 descriptor is used to extract the input features of the neural network classifier. As further research for improving the classification performance, we will continue to search for the best combination of MPEG-7 descriptors through heuristic algorithms and empirical experiments. We also plan to extend the number of supported sports from the 8 mentioned in this paper to more than 20.

References
1. Jung, Y., Hwang, I., Kim, W.: Sports Image Classification Using Bayesian Approach. Lecture Notes in Computer Science, Vol. 3697. Springer-Verlag, Berlin Heidelberg New York (2003) 426-437
2. Smith, J., Chang, S.: Tools and Techniques for Color Image Retrieval. In Proceedings of the Symposium on Electronic Imaging: Science and Technology, Storage and Retrieval for Image and Video Databases (1996) 426-437
3. Ariki, Y., Sugiyama, Y.: Classification of TV Sports News by DCT Features Using Multi-subspace Method. In Proceedings of the 14th International Conference on Pattern Recognition, Vol. 2 (1998) 1488-1491
4. Sugiyama, Y., Ariki, Y.: Automatic Classification of TV Sports News Video by Multiple Subspace Method. Systems and Computers in Japan, Vol. 31, No. 6 (2000) 90-98
5. Digital Video Multi Media (DVMM) Lab of Columbia University, http://www.ctr.columbia.edu/dvmm/newHome.htm
6. Chang, S., Sundaram, H.: Structural and Semantic Analysis of Video. In Proceedings of the IEEE International Conference on Multimedia and Expo (2000) 687
7. Khan, L., McLeod, D., Hovy, E.: Retrieval Effectiveness of an Ontology-based Model for Information Selection. The VLDB Journal: The International Journal on Very Large Databases, Vol. 13, No. 1. ACM/Springer-Verlag (2004) 71-85
8. Khan, L., Wang, L.: Automatic Ontology Derivation Using Clustering for Image Classification. In Proceedings of the 8th International Workshop on Multimedia Information Systems (2002) 56-65
9. Breen, C., Khan, L., Kumar, A., Wang, L.: Ontology-based Image Classification Using Neural Networks. In Proceedings of SPIE Internet Multimedia Management Systems III (2002) 198-208
10. Breen, C., Khan, L., Ponnusamy, A.: Image Classification Using Neural Networks and Ontologies. In Proceedings of the 13th International Workshop on Database and Expert Systems Applications, Vol. 2 (2002) 98-102
11. Messer, K., Christmas, W., Kittler, J.: Automatic Sports Classification. In Proceedings of the 16th International Conference on Pattern Recognition, Vol. 2 (2002) 1005-1008
12. Hayashi, A., Nakashima, R., Kanbara, T., Suematsu, N.: Multi-object Motion Pattern Classification for Visual Surveillance and Sports Video Retrieval. In Proceedings of the 15th International Conference on Vision Interface (2002)
13. Kanellopoulos, I., Wilkinson, G.: Strategies and Best Practice for Neural Network Image Classification. International Journal of Remote Sensing, Vol. 18, No. 4 (1997) 711-725
14. Giacinto, G., Roli, F.: Design of Effective Neural Network Ensembles for Image Classification Purposes. Image and Vision Computing, Vol. 19, No. 9-10 (2001) 699-707
15. Partridge, D.: Network Generalization Differences Quantified. Neural Networks, Vol. 9, No. 2 (1996) 263-271
16. MPEG-7 Overview, http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm
17. Sikora, T.: The MPEG-7 Visual Standard for Content Description - An Overview. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 6 (2001) 696-702
18. Ro, Y., Kim, M., Kang, H., Manjunath, B., Kim, J.: MPEG-7 Homogeneous Texture Descriptor. ETRI Journal, Vol. 23, No. 2 (2001) 41-51
19. Bober, M.: The MPEG-7 Visual Shape Descriptors. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 6 (2001) 716-719
20. Won, C., Park, D., Park, S.: Efficient Use of MPEG-7 Edge Histogram Descriptor. ETRI Journal, Vol. 24, No. 1 (2002) 23-30
21. Pakkanen, J., Ilvesmaki, A., Iivarinen, J.: Defect Image Classification and Retrieval with MPEG-7 Descriptors. Lecture Notes in Computer Science, Vol. 2749. Springer-Verlag, Berlin Heidelberg New York (2003) 349-355
22. Spyrou, E., Borgne, H., Mailis, T., Cooke, E., Avrithis, Y., O'Connor, N.: Fusing MPEG-7 Visual Descriptors for Image Classification. Lecture Notes in Computer Science, Vol. 3697. Springer-Verlag, Berlin Heidelberg New York (2005) 847-852
23. Laaksonen, J., Koskela, M., Oja, E.: PicSOM - Self-organizing Image Retrieval with MPEG-7 Content Descriptors. IEEE Transactions on Neural Networks, Special Issue on Intelligent Multimedia Processing, Vol. 13, No. 4 (2002) 841-853

A Novel Biometric Identification Approach Based on Human Hand∗

Jun Kong1,2,∗∗, Miao Qi1,2, Yinghua Lu1, Shuhua Wang1,2, and Yuru Wang1,2

1 Computer School, Northeast Normal University, Changchun, Jilin Province, China
2 Key Laboratory for Applied Statistics of MOE, China
{kongjun, qim801, luyh, wangsh946, wangyr950}@nenu.edu.cn

Abstract. At present, hand-based identification as a biometric technique is being widely researched. A novel personal identification approach is presented in this paper. In contrast with existing approaches, this system extracts multimodal features, including hand shape, palm-print and finger texture, to facilitate coarse-to-fine dynamic identification. Five hand shape geometrical features are used to guide the selection of a small set of similar candidate samples at the coarse-level matching stage. At the fine-level matching stage, the features of one palm-print region and six finger regions segmented from the three middle fingers are used for the final confirmation. Gabor filters and wavelet moments are used to extract the palm-print features. In addition, the maximum matching method and a fusion matching mechanism are applied at the decision stage. The experimental results show the effectiveness and reliability of the proposed approach.

1 Introduction

Hand-based recognition systems verify a person's identity by analyzing his or her physical features. Biometric features such as fingerprints, face, iris and retina have been widely used in many personal identification applications because they possess desirable physiological properties: acceptability, uniqueness and resistance to duplication. It has been reported in [1] that hand-based identification is one of the most acceptable biometrics. The human hand contains many visible characteristic features, including hand shape, principal lines, wrinkles, ridges and finger texture, which are unique to an individual and stable with age. How to extract these features is a key step for identification. From the viewpoint of feature extraction, existing hand-based recognition approaches mainly include line-based approaches [2-4], texture-based approaches [5-7] and appearance-based approaches [8-10]. Most existing systems are based on a single palm-print feature, which can lead to low recognition rates. Therefore, multimodal biometric identification systems integrating two or more different biometric features are being developed.

∗ This work is supported by the science foundation for young teachers of Northeast Normal University, No. 20061002, China.
∗∗ Corresponding author.

Fig. 1. Block diagram of the proposed identification system

In our multimodal biometric identification system, hand geometrical features, palm-print region of interest (PROI) features and six finger strip of interest (FROI) features are employed, and a coarse-to-fine dynamic identification strategy is adopted to implement a reliable, real-time personal identification system. A block diagram of the proposed system is shown in Fig. 1, where hand geometrical features and texture features are stored in Library1 and Library2, respectively. First, a handprint image is captured by a flatbed scanner used as the input device. Then, a series of pre-processing operations is employed for the segmentation of the PROI and FROI, and geometry features are also obtained during pre-processing. The hand shape geometry features are first used for coarse-level identification. The 2-D Gabor filters and wavelet moments are then used to extract the PROI features for fine-level identification. At the decision stage, the maximum matching method and the fusion matching mechanism are employed to output the identification result. The rest of this paper is organized as follows. Section 2 introduces image acquisition and the segmentation of the FROI and PROI. Section 3 describes the Gabor filters and the wavelet moment function in brief. The process of personal identification is depicted in Section 4. The experimental results are reported in Section 5. The conclusions are given in Section 6.


2 Image Acquisition and Pre-processing

2.1 Image Acquisition

In our system, no guidance pegs are fixed on the flatbed scanner. The users place their right hands freely on the platform of the scanner; the collected images are shown in Fig. 2. The advantage of this scanning manner is that the palm need not be inked and no docking device is required on the scanner to constrain the hand position. In this way, the user does not feel uncomfortable during image acquisition.

Fig. 2. (a)-(d), (e1)-(e4): original gray-level handprint images scanned from different persons and from the same person, respectively

2.2 Pre-processing

Before feature extraction, the segmentation of one PROI and six FROI is performed. In our approach, the segmentation mainly includes two steps, border tracing with key point location, and PROI/FROI generation, detailed as follows.

Step 1: Binarize the hand image by Otsu's method [11]. Then trace the border starting from the top left to obtain the contour of the hand shape, represented by a sequential set of coordinates. The finger-web location algorithm proposed by Lin [12] is used to obtain the seven key points (a-g) (as shown in Fig. 3).

Step 2: Based on Step 1, the regions of interest, one PROI and six FROI, are segmented, and five hand shape features are extracted:

1. Find the points h and k, the intersections of lines db and df with the hand contour. Then compute the midpoints m1, m2, m3 and m4 of lines hb, bd, df and fk.
2. Find line AB, which is parallel to line bf; the distance L between line AB and line bf is 50 pixels.
3. Form five length features by computing the lengths of lines am1, cm2, em3, gm4 and AB. Locate the top-left corner R1 and top-right corner R2 of the PROI. As shown in Fig. 3, line fR2 is perpendicular to line bf and the length of line fR2 is 20 pixels. In addition, line R1R2 is 20 pixels longer than line bf. Fig. 4(a) shows the segmented square region R1R2R4R3 as the PROI.
4. Extract two finger strips of interest (FSOI) on the ring finger, with sizes 50 × 32 and 60 × 32, along line cm2 (see Fig. 4(b)).
5. Find two FROI of size 32 × 32 with the maximal entropy value on the two FSOI segmented in Step 4 (a sketch of this entropy-based selection follows the list).
6. Repeat Steps 4 and 5 to find the other four FROI on the middle and index fingers, based on lines em3 and gm4. Then store the six FROI (see Fig. 4(c)) as templates.
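A minimal sketch of the entropy-based window selection used in Step 5, assuming 8-bit gray-level strips stored as NumPy arrays; the histogram bin count is an assumption, since the paper does not state how the entropy is computed.

```python
import numpy as np

def block_entropy(gray_block, bins=32):
    """Shannon entropy of the gray-level histogram of one block."""
    hist, _ = np.histogram(gray_block, bins=bins, range=(0, 256))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def best_froi(strip, size=32):
    """Slide a size x size window down a finger strip (H x 32 array)
    and return the window with maximal entropy, as in Step 5 above."""
    h = strip.shape[0]
    scores = [block_entropy(strip[y:y + size, :]) for y in range(h - size + 1)]
    y0 = int(np.argmax(scores))
    return strip[y0:y0 + size, :]
```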

Fig. 3. The process of handprint segmentation

Fig. 4. (a) The segmented PROI. (b) The FSOI on the three middle fingers. (c) The FROI segmented from (b).


PROI images segmented from different persons, and even from the same person, may differ in size; the size varies with the degree to which the hand is stretched. The PROI is therefore normalized to 128 × 128 pixels in our work.

3 Feature Extraction

3.1 Gabor Filtering

The Gabor filter is widely used for feature extraction [13-16] and has been demonstrated to be a powerful tool in texture analysis. A circular 2-D Gabor filter is defined as:

$$G(x, y, \theta, u, \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\left\{-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right\} \exp\left\{2\pi i\,(ux\cos\theta + uy\sin\theta)\right\} \qquad (1)$$

where $i=\sqrt{-1}$, $u$ is the frequency of the sinusoidal wave, $\theta$ controls the orientation of the function and $\sigma$ is the standard deviation of the Gaussian envelope. By visual inspection of PROI images from different persons, we found that the dominant texture lines mainly lie in the $\pi/8$, $3\pi/8$ and $3\pi/4$ directions, but more pseudo texture lines appear in the $3\pi/4$ direction of captured images owing to differing tensility and pressure. Therefore, the Gabor filter is convolved with the PROI in two directions in our study, $\pi/8$ and $3\pi/8$, and the real part of the filtered image is used. An appropriate threshold value is then selected to binarize the filtered image.
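A minimal sketch of the circular 2-D Gabor kernel of Equation (1); the kernel size and the example parameter values in the comments are assumptions, as the paper does not report its filter parameters.

```python
import numpy as np

def circular_gabor(size, theta, u, sigma):
    """Complex circular 2-D Gabor kernel following Eq. (1)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    gauss = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    wave = np.exp(2j * np.pi * (u * x * np.cos(theta) + u * y * np.sin(theta)))
    return gauss * wave

# Example use (illustrative parameter values, not the paper's):
# from scipy.signal import convolve2d
# resp = convolve2d(proi, np.real(circular_gabor(17, np.pi / 8, 0.1, 4.0)), mode="same")
# binary = resp > resp.mean()   # some appropriate threshold
```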

Fig. 5. (a) Binary images of the PROI filtered using Gabor filters in the two directions π/8 and 3π/8, from two different persons. (b) The results of (a) processed by sequential morphological operations.

Fig. 5(a) shows a sample of the binarized results. Finally, morphological operators, including clean, spur and label, are employed to remove spurs and isolated pixels and to trim short feature lines (shown in Fig. 5(b)).

3.2 Wavelet Moment Representation

The wavelet moment method is particularly suitable for extracting local discriminative features of normalized images. Its translation, rotation and scale invariance make it a widely used feature extraction approach [17]. The family of wavelet basis functions is defined as:

$$\psi_{a,b}(r) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{r-b}{a}\right) \qquad (2)$$

where $a$ is the dilation parameter and $b$ is the shifting parameter. The cubic B-spline in Gaussian approximation form is:

$$\psi_{\beta_{n}}(r) = \frac{4a^{n+1}}{\sqrt{2\pi(n+1)}}\,\sigma_{w}\cos\!\big(2\pi f_{0}(2r-1)\big)\exp\left\{-\frac{(2r-1)^{2}}{2\sigma_{w}^{2}(n+1)}\right\} \qquad (3)$$

where $n=3$, $a=0.697066$, $f_{0}=0.409177$, and $\sigma_{w}^{2}=0.561145$. Since the size $r$ of an image is always restricted to the domain [0, 1], let both parameters be set to 0.5; the domains for $m$ and $n$ can then be restricted as follows:

$$a = 0.5^{m},\; m = 0, 1, \ldots, M, \qquad b = n \cdot 0.5 \cdot 0.5^{m},\; n = 0, 1, \ldots, 2^{m+1} \qquad (4)$$

Then the wavelet defined along a radial axis in any orientation can be rewritten as:

$$\psi_{m,n}^{\beta_{n}}(r) = 2^{m/2}\,\psi_{\beta_{n}}(2^{m}r - 0.5n) \qquad (5)$$

The cubic B-spline wavelet moments (WMs) are defined as:

$$W_{m,n,q} = \iint f(r,\theta)\,\psi_{m,n}^{\beta_{n}}(r)\,e^{-jq\theta}\,r\,dr\,d\theta \qquad (6)$$

If $N$ is the number of pixels along each axis of the image, the cubic B-spline WMs for a digital image $f(r,\theta)$ can be computed as:

$$W_{m,n,q} = \sum_{x}\sum_{y} f(x,y)\,\psi_{m,n}^{\beta_{n}}(r)\,e^{-jq\theta}\,\Delta x\,\Delta y, \qquad r=\sqrt{x^{2}+y^{2}} \le 1,\; \theta=\arctan(y/x) \qquad (7)$$
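A direct, unoptimized sketch of Equations (3) and (7); the image is assumed to be the normalized 128 × 128 PROI mapped onto the unit disk. Variable names are ours, and the spline order n of Equation (3) is fixed at 3 inside the helper to avoid clashing with the shift index n of Equation (7).

```python
import numpy as np

def cubic_bspline_wavelet(r, n=3, a=0.697066, f0=0.409177, sw2=0.561145):
    """Gaussian approximation of the cubic B-spline wavelet, Eq. (3).
    Here n is the spline order; sw2 is sigma_w squared."""
    return (4 * a**(n + 1) / np.sqrt(2 * np.pi * (n + 1)) * np.sqrt(sw2)
            * np.cos(2 * np.pi * f0 * (2 * r - 1))
            * np.exp(-(2 * r - 1)**2 / (2 * sw2 * (n + 1))))

def wavelet_moment(img, m, n, q):
    """Discrete cubic B-spline wavelet moment W_{m,n,q} of Eq. (7).
    img is assumed square and centered so that r = sqrt(x^2+y^2) <= 1."""
    N = img.shape[0]
    coords = np.linspace(-1, 1, N)
    x, y = np.meshgrid(coords, coords)
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    psi = 2**(m / 2) * cubic_bspline_wavelet(2**m * r - 0.5 * n)
    mask = r <= 1                      # restrict to the unit disk
    dxdy = (2.0 / N) ** 2              # area element for the [-1, 1]^2 grid
    return (img * psi * np.exp(-1j * q * theta) * mask).sum() * dxdy
```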

4 Identification

4.1 Coarse-Level Identification

Although the geometrical length features are not highly discriminative, they can be used in coarse-level matching so that the system works on a small set of candidates. Five hand shape length values are obtained in the pre-processing block. There are M training samples for every person X in the enrollment stage, and μ is the template, i.e., the mean vector of the M feature vectors. The similarity between a testing sample and the template is measured by the Manhattan distance defined as follows:

$$d(x, \mu) = \sum_{i=1}^{L} |x_{i} - \mu_{i}| \qquad (8)$$

If the distance is smaller than a pre-defined threshold value, the index number of the template is recorded into an index vector R for fine-level identification.

4.2 Fine-Level Identification

The index vector R has been recorded at the coarse-level identification stage. In this section, the testing image is further matched against the templates whose index numbers are in R. One PROI and six FSOI regions are segmented from the testing sample, as shown in Fig. 4. The correlation function is adopted to compute the correlation value between an FSOI and the template. The matching rule is that a template in Library2 moves from top to bottom over the FSOI of the testing sample, producing one correlation value per position; the maximal value is selected as the correlation value. The PROI of the testing sample is convolved with the Gabor filter in the two directions, and the feature vector of the PROI is then computed by wavelet moments. The correlation function is used again to measure the similarity degree. The outputs of the eight matching results are combined at the matching-score level using a fusion procedure, expressed as the following equation:

$$S = \sum_{i=1}^{8} w_{i} \cdot s_{i} \qquad (9)$$

where $w_{i}$ is the weight factor associated with each of the hand parts, fulfilling the condition $w_{1} + w_{2} + \cdots + w_{8} = 1$; the values are set to $w_{1} = w_{8} = 0.13$, $w_{2} = 0.14$, $w_{3} = 0.12$, $w_{4} = w_{5} = w_{6} = 0.11$ and $w_{7} = 0.15$.
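A minimal sketch of the fusion rule of Equation (9) with the weight values quoted above. How the eight part scores are produced, and the assumption that a larger fused score is better and must pass the fine-level threshold T2, follow our reading of the text rather than an explicit specification.

```python
import numpy as np

# w1..w8 as given in the text; they sum to 1.
W = np.array([0.13, 0.14, 0.12, 0.11, 0.11, 0.11, 0.15, 0.13])

def fused_score(part_scores):
    """Weighted matching-score fusion of Eq. (9): S = sum_i w_i * s_i."""
    return float(W @ np.asarray(part_scores, dtype=float))

def identify(candidate_scores, t2):
    """Pick the best template among the coarse-level candidates.

    candidate_scores: dict mapping template index -> eight part scores s_1..s_8.
    Returns the winning index, or None (attacker) if no fused score
    reaches the fine-level threshold t2 (direction assumed, see lead-in).
    """
    fused = {idx: fused_score(s) for idx, s in candidate_scores.items()}
    best = max(fused, key=fused.get)
    return best if fused[best] >= t2 else None
```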

5 Experimental Results

In this section, the proposed approach is evaluated for effectiveness and accuracy. A handprint database of 1,000 handprint images was collected from the right hands of 100 individuals using our flatbed scanner. The size of all images is 500×500 at a resolution of 300 dpi. Five images per user were used for training and the remaining five for testing. Each image was processed by the procedure involving pre-processing, segmentation and feature extraction. At the coarse-level identification stage, the threshold value is set to 30. The final identification results are usually quantified by the false rejection rate (FRR) and the false acceptance rate (FAR), which vary with the threshold p. The distributions of FRR(p) and FAR(p) are depicted in Fig. 6. Another threshold T2 is selected for fine-level identification. More than one template may pass T2 in the final outputs; we select the template closest to the query sample as the final identification result. If the number of outputs is zero, the query sample is declared an attacker. The accuracy of personal identification is measured by the correct match rate (CMR), defined as:

$$CMR = 1 - [FRR(p_{0}) + FAR(p_{0})] \qquad (10)$$

As seen from Fig. 6, the CMR reaches 96.21% when $p_{0} = 0.815$, with FRR = 1.97% and FAR = 1.82%. Compared with single palm-print methods for personal identification, our approach fuses multiple features to facilitate fine-level identification, which increases the reliability of decisions. Identification fails for some handprints, mainly because of variations of pressure and tensility while acquiring handprint images; the pseudo texture lines at the side of the PROI (as shown in Fig. 7) lead to mismatches.

Fig. 6. The distributions of FRR(p) and FAR(p)

Fig. 7. The PROI images with different pressure and tensility


6 Conclusions

The proposed coarse-to-fine dynamic matching strategy has three main advantages. First, a peg-free scanning mode is adopted to capture the handprint image, which does not make users feel uncomfortable. However, identification may fail for some handprint images because pseudo texture lines appear at the side of the PROI owing to variations of pressure and tensility, or because the hand moves during acquisition; this is why the proposed system cannot reach a very high CMR. Second, our system adopts a coarse-to-fine dynamic matching strategy, which enables real-time operation. Third, the system adopts a multimodal approach rather than concentrating on just one area of the hand, which increases the reliability of decisions. 2-D Gabor filters and wavelet moments are employed to capture the texture features of the PROI. The cubic B-spline wavelet function is near-optimal in terms of space-frequency localization and has the wavelet's inherent property of multi-resolution analysis. The maximum matching method and the fusion matching mechanism are applied at the decision stage. The experimental results show that the proposed multimodal personal identification approach is feasible and reliable.

References
1. Jain, A.K., Bolle, R., Pankanti, S. (eds.): Biometrics: Personal Identification in Networked Society. Kluwer Academic (1999)
2. Gonzalez, R.C., Woods, R.E.: Digital Image Processing Using Matlab. IEEE Press, New York (2001) 405-407
3. Wu, X., Zhang, D., Wang, K., Huang, B.: Palmprint Classification Using Principal Lines. Patt. Recog. 37 (2004) 1987-1998
4. Wu, X., Wang, K.: A Novel Approach of Palm-line Extraction. Proceedings of the International Conference on Image Processing, New York (2004)
5. Han, C.C., Cheng, H.L., Lin, C.L., Fan, K.C.: Personal Authentication Using Palm-print Features. Patt. Recog. 36 (2003) 371-381
6. You, J., Li, W., Zhang, D.: Hierarchical Palmprint Identification Via Multiple Feature Extraction. Patt. Recog. 35 (2003) 847-859
7. Zhang, D., Kong, W.K., You, J., Wong, M.: On-line Palmprint Identification. IEEE Trans. Patt. Anal. Mach. Intell. 25 (2003) 1041-1050
8. Jing, X.Y., Zhang, D.: A Face and Palmprint Recognition Approach Based on Discriminant DCT Feature Extraction. IEEE Transactions on Systems, Man and Cybernetics 34 (2004) 2405-2415
9. Wu, X., Zhang, D., Wang, K.: Fisherpalms Based Palmprint Recognition. Patt. Recog. Lett. 24 (2003) 2829-2838
10. Connie, T., Jin, A.T.B., Ong, M.G.K., Ling, D.N.C.: An Automated Palmprint Recognition System. Image and Vision Computing 23 (2005) 501-515
11. Ribaric, S., Fratric, I.: A Biometric Identification System Based on Eigenpalm and Eigenfinger Features. IEEE Trans. Patt. Anal. Mach. Intell. 27 (2005) 1698-1709
12. Lin, C.-L., Chuang, T.C., Fan, K.-C.: Palmprint Verification Using Hierarchical Decomposition. Patt. Recog. 38 (2005) 2639-2652
13. Kong, W.K., Zhang, D., Li, W.: Palmprint Feature Extraction Using 2-D Gabor Filters. Patt. Recog. 36 (2003) 2339-2347
14. Sanchez-Avila, C., Sanchez-Reillo, R.: Two Different Approaches for Iris Recognition Using Gabor Filters and Multiscale Zero-crossing Representation. Patt. Recog. 38 (2005) 231-240
15. Ahmadian, M.A.: An Efficient Texture Classification Algorithm Using Gabor Wavelet. Proceedings of the 25th Annual International Conference of the IEEE EMBS, Cancun, Mexico (2003) 17-21
16. Lee, T.S.: Image Representation Using 2-D Gabor Wavelets. IEEE Trans. Patt. Anal. Mach. Intell. 18 (1996) 959-971
17. Pan, H., Xia, L.Z.: Exact and Fast Algorithm for Two-dimensional Image Wavelet Moments via the Projection Transform. Patt. Recog. 38 (2005) 395-402

A Novel Color Image Watermarking Method Based on Genetic Algorithm

Yinghua Lu1, Jialing Han1,2, Jun Kong1,2,*, Gang Hou1,3, and Wei Wang1

1 Computer School, Northeast Normal University, Changchun, Jilin Province, China
2 Key Laboratory for Applied Statistics of MOE, China
3 College of Humanities and Science, Northeast Normal University, Changchun, China
{kongjun, hanjl147, luyh}@nenu.edu.cn

Abstract. In the past few years, many watermarking approaches have been proposed for solving copyright protection problems. Most watermarking schemes employ gray-level images to embed the watermarks, whereas their application to color images is scarce and usually works on the luminance or individual color channels. In this paper, a novel intensity-adaptive color image watermarking algorithm based on the genetic algorithm (CIWGA) is presented. The adaptive embedding scheme in the wavelet coefficients of texture-active regions in all three channels not only improves image quality, but also greatly enhances the security and robustness of the watermarked image. The experimental results show that our method is more flexible than traditional methods and successfully achieves a compromise between robustness and image quality.

1 Introduction

With the widespread use of digital multimedia and developments in the computer industry, digital multimedia content suffers from copyright infringement owing to its digital nature: unlimited duplication, easy modification and quick transfer over the Internet. As a result, copyright protection has become a serious issue, and digital watermarking has become an active research area [1] [2] [4]. In the past few years, most watermarking schemes have employed gray-level images to embed the watermarks, whereas their application to color images is scarce and usually works on the luminance or individual color channels. Fleet [3] embedded watermarks into the frequency domain of the yellow-blue channel. Kutter et al. [5] proposed another color image watermarking scheme that embeds the watermark into the blue channel of each pixel by modifying its pixel value. But they did not consider that the capacity for hiding information in each color channel varies from image to image. In this paper, a novel watermark embedding method based on the genetic algorithm (GA) is proposed.

* Corresponding author. This work is supported by the science foundation for young teachers of Northeast Normal University, No. 20061002, China.

GA is applied to analyze the influence of embedding on the original image and each channel's capacity to resist attacks; the optimized intensity is then selected for every color channel. Using GA can improve image quality and at the same time greatly enhance the security and robustness of the watermarked image. The algorithm thus achieves an optimal compromise between robustness and image quality. This paper is organized as follows: the watermark embedding algorithm and extraction algorithm are described in Section 2 and Section 3, respectively. Experimental results are presented in Section 4. Finally, conclusions are given in Section 5.

2 The Embedding Algorithm

2.1 Host Image Analysis

Every host image has its own color information and texture features. Based on the characteristics of the human visual system, edges and complex textures have a good visual masking effect, so the watermark is always embedded into such regions to ensure imperceptibility [4]. In our study, to find the active regions of the host image quickly, the block-variance method is employed: the host image is divided into sub-blocks and each sub-block's variance is computed to detect texture-active regions. The process is block-wise, as follows (a code sketch of the block test appears below):

1. Separate the three channel sub-images R, G, B from the host image I.
2. Divide each channel sub-image into non-overlapping 8 × 8 sub-blocks in the spatial domain.
3. Compute each image sub-block's variance; variance measures the relative smoothness and contrast of the intensity in a region.
4. Compute each sub-image's average variance and compare each block's variance with it. If a block's variance is greater than the average value, the block is classified as a texture-active region.

The results for the green channels of the 'Lena' and 'Baboon' images after texture region analysis using our algorithm are shown in Fig. 1; the unchanged image sub-blocks are the relatively active regions.
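A minimal sketch of the block-variance test in steps 2-4, assuming a single-channel image as a NumPy array; the function name is illustrative.

```python
import numpy as np

def texture_active_mask(channel, block=8):
    """Flag 8x8 blocks whose variance exceeds the channel's mean
    block variance (steps 2-4 above). Returns a boolean block grid."""
    h, w = channel.shape
    bh, bw = h // block, w // block
    blocks = channel[:bh * block, :bw * block].reshape(bh, block, bw, block)
    var = blocks.transpose(0, 2, 1, 3).reshape(bh, bw, -1).var(axis=2)
    return var > var.mean()
```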

Fig. 1. (a) and (b): result images of 'Lena' and 'Baboon' texture region analysis using the block variance method


Fig. 2. (a) and (b): the texture active region sub-images of Fig. 1

We extract the sub-blocks belonging to the texture-active regions from the three sub-images R, G and B, and form three new sub-images, called texture active region sub-images, from these blocks; they are depicted in Fig. 2. Finally, our watermark is embedded into the frequency domain of these sub-images.

2.2 Intensity Optimization Using GA

For the texture active region sub-images, the discrete wavelet decomposition is adopted in the frequency domain to embed watermarks. The multi-resolution nature of the wavelet transform and its compatibility with the JPEG-2000 compression standard [7] make the embedded watermark robust to compression. The intensity optimal selection algorithm is described as follows (a sketch of the additive embedding of step 2 appears below):

1. Transform the three texture active region sub-images using the discrete wavelet transform.
2. Select coefficients $w\_co$ to embed the watermark W, and insert the watermark signal using additive modulation. Every color channel has its own embedding intensity $\alpha(i)$, and $w\_co^{w}$ denotes the wavelet coefficients after embedding:

$$w\_co^{w} = w\_co + \alpha(i) \times W, \qquad i \in \{1, 2, 3\} \qquad (1)$$

3. Perform the inverse discrete wavelet transform on $w\_co^{w}$.
4. Embed the watermarked sub-images back into the original host image to get the watermarked color image I'.
5. Apply the attacking schemes to I', and then adopt the GA training process to search for the optimal intensity for each channel.

The flowchart of the intensity optimal selection algorithm using GA is shown in Fig. 3. Not all watermarking applications require robustness to all possible signal processing operations, and the watermarked image after attack must remain worth using or transmitting; therefore attacks such as image cropping are not employed in our GA training procedure. In this paper, three major attacking schemes are employed: an additive noise attack, a median filtering attack, and a JPEG attack with quality factor 50%.
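A minimal sketch of the additive embedding of Eq. (1) for one channel using PyWavelets. The paper does not say which wavelet or which coefficient band carries the watermark, so the Haar wavelet and the horizontal detail band here are assumptions.

```python
import numpy as np
import pywt  # PyWavelets

def embed_channel(region, watermark, alpha, wavelet="haar"):
    """Additive embedding of Eq. (1): w_co^w = w_co + alpha * W.

    region: one texture-active-region sub-image (single channel).
    watermark: array at least as large as one detail band (cropped to fit).
    alpha: the channel's embedding intensity, optimized by the GA.
    Band choice (horizontal detail cH) is an assumption, see lead-in.
    """
    cA, (cH, cV, cD) = pywt.dwt2(region.astype(float), wavelet)
    cH = cH + alpha * watermark[:cH.shape[0], :cH.shape[1]]
    return pywt.idwt2((cA, (cH, cV, cD)), wavelet)
```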

Fig. 3. The flowchart of the intensity optimizing algorithm

The quality of the watermark extracted from the embedded image I' is measured by the normalized correlation (NC). The NC between the embedded watermark W(i, j) and the extracted watermark W'(i, j) is defined as:

$$NC = \frac{\sum_{i=1}^{H}\sum_{j=1}^{L} W(i,j) \times W'(i,j)}{\sum_{i=1}^{H}\sum_{j=1}^{L} [W(i,j)]^{2}} \qquad (2)$$

The watermarked image's quality is represented by the peak signal-to-noise ratio (PSNR) between the original color image I and the watermarked image I', as follows:

$$PSNR = 10 \times \log_{10}\left(\frac{M \times N \times \max\big(I^{2}(i,j)\big)}{\sum_{i=1}^{M}\sum_{j=1}^{N}\big[I(i,j) - I'(i,j)\big]^{2}}\right) \qquad (3)$$
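The two quality measures of Eqs. (2) and (3) translate directly into code; the sketch below assumes single-channel NumPy arrays.

```python
import numpy as np

def nc(w, w_ext):
    """Normalized correlation of Eq. (2) between the embedded and
    extracted watermarks."""
    w = w.astype(float)
    w_ext = w_ext.astype(float)
    return (w * w_ext).sum() / (w ** 2).sum()

def psnr(host, marked):
    """PSNR of Eq. (3) between the original and watermarked images."""
    host = host.astype(float)
    marked = marked.astype(float)
    mse_sum = ((host - marked) ** 2).sum()
    return 10 * np.log10(host.size * (host ** 2).max() / mse_sum)
```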

After obtaining the PSNR of the watermarked image and the three NC values after the attacks, we are ready to adopt the GA training process. The fitness function in the mth iteration is defined as:

$$f_{m} = -\left(PSNR_{m} + \lambda \sum_{i=1}^{3} NC_{m,i}\right) \qquad (4)$$

where $f_{m}$ is the fitness value and λ is the weighting factor for the NC values. Because the PSNR values are dozens of times larger than the NC values in the GA fitness function, the NC values are magnified by the weighting factor λ to balance the influences of the imperceptibility and robustness requirements.
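A sketch of the fitness evaluation of Eq. (4) for one GA individual, reusing the nc and psnr helpers sketched above. The embed, extract and attack routines are placeholders for the system's own procedures, and the λ value is an assumption; the paper does not state it.

```python
LAMBDA = 30.0  # weighting factor lambda; illustrative value only

def fitness(intensities, host, watermark, embed, extract, attack_suite):
    """Fitness of Eq. (4) for one individual (alpha_R, alpha_G, alpha_B).

    attack_suite: the three attacks (additive noise, median filter,
    JPEG QF=50). The sign is negative because the GA minimizes fitness.
    """
    marked = embed(host, watermark, intensities)
    ncs = [nc(watermark, extract(attack(marked))) for attack in attack_suite]
    return -(psnr(host, marked) + LAMBDA * sum(ncs))
```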

2.3 Watermark Embedding

The first five steps of the watermark embedding algorithm are the same as those of the intensity optimal selection algorithm; the obtained optimal intensities are then used to form the watermarked image. Fig. 4 is the block diagram of the embedding algorithm.

Fig. 4. The block diagram of the embedding algorithm

3 Watermark Extraction

The watermark extraction algorithm is the exact inverse of the embedding algorithm. The watermark can be extracted only when the optimal intensities are available as the secret keys.

4 Experimental Results

The performance of a digital watermarking system can be characterized by the following aspects: imperceptibility, security and robustness. All these aspects are evaluated by experimental results in our study. In our simulation, the 'Lena' and 'Baboon' images, of size 256 × 256, are taken as test images, and the 64 × 64 watermark is shown in Fig. 8(d). The watermarked versions of 'Lena' and 'Baboon' are shown in Fig. 5(b) and Fig. 6(b). When free of any attacks, the PSNR of the watermarked 'Lena' is 35.8487 with NC equal to 1, and the PSNR of the watermarked 'Baboon' is 36.3028 with NC equal to 1. In the GA training process, ten individuals are used per iteration. The crossover operation is the scattered function of the MATLAB Genetic Algorithm Toolbox.

Fig. 5. (a) Original host image 'Lena'; (b) the watermarked image

Fig. 6. (a) Original host image 'Baboon'; (b) the watermarked image

The selection operation is the stochastic uniform function, and the mutation operation is a Gaussian function with scale value 1.0 and shrink value 1.0. The number of training iterations is set to 200, after which the fitness values converge, as can be seen from Fig. 7. The optimized intensities with the optimal fitness value are 62, 64 and 94 for the R, G and B channels, respectively. The result images under different attacks and the extracted watermarks are depicted in Fig. 8. As seen from Table 1, we can conclude that our algorithm is robust to the attacks commonly encountered in image processing and transmission.

Fig. 7. The evolution of the fitness value during GA training


Fig. 8. (a) Watermarked 'Baboon' under additive noise attack, (b) watermarked image under filtering attack, (c) watermarked image under compression attack, (d) original watermark, (e)-(g) watermarks extracted from (a)-(c) using our method, respectively, (h) watermark extracted from (c) using Kutter's method

Table 1. Experimental results under different attacks of our scheme (measured by NC)

| Attack type | Baboon | Lena | Airplane |
|---|---|---|---|
| Attack-free | 1 | 1 | 1 |
| Additive noising | 0.9137 | 0.9139 | 0.9479 |
| Filtering | 0.9320 | 0.9536 | 0.9139 |
| JPEG QF=80 | 0.9957 | 0.9830 | 0.9957 |
| JPEG QF=50 | 0.9801 | 0.9547 | 0.9861 |
| JPEG QF=30 | 0.9639 | 0.9390 | 0.9752 |

Table 2. Experimental results under different attacks of Kutter's scheme (measured by NC)

| Attack-free | Noising | Filtering | JPEG QF=80 | JPEG QF=50 | JPEG QF=30 |
|---|---|---|---|---|---|
| 0.9684 | 0.9546 | 0.9362 | 0.6386 | 0.5925 | 0.5071 |

To evaluate the robustness of the proposed watermarking scheme, Kutter's algorithm is simulated for comparison; its results under several attacks are shown in Table 2. Compared with Table 1, it can be concluded that our algorithm is more robust than Kutter's, especially in resisting additive noising and JPEG compression. To evaluate the performance of watermarking techniques, Pao-Ta Yu et al. [9] used the mean square error (MSE) as a quantitative index. Another quantitative index for robustness is the mean absolute error (MAE). These two indices are defined respectively as:

$$MSE = \frac{1}{3 \times M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\left[(R_{ij} - R'_{ij})^{2} + (G_{ij} - G'_{ij})^{2} + (B_{ij} - B'_{ij})^{2}\right] \qquad (5)$$

$$MAE = \frac{1}{H \times L}\sum_{i=1}^{H}\sum_{j=1}^{L}\left|W_{ij} - W'_{ij}\right|$$

where M × N and H × L denote the sizes of the host color image and the watermark image, respectively. Note that the quantitative index MAE is used to measure the similarity between the original watermark and the extracted watermark. Table 3 compares our method and Pao-Ta Yu's method in terms of the two quantitative indices above; it shows that our algorithm has better robustness than Pao-Ta Yu's.

Table 3. Experimental results under different attacks, compared with Pao-Ta Yu's method

| Attacks | Images | MSE | MAE (proposed method) | MAE (Pao-Ta Yu's) |
|---|---|---|---|---|
| Attack-free | Lena | 1.597 | 0.00149 | 0.00195 |
| Attack-free | Baboon | 1.667 | 0.02344 | |
| Filtering | Lena | 38.714 | 0.0206 | 0.0205 |
| Filtering | Baboon | 345.778 | 0.0337 | 0.16211 |
| JPEG | Lena | 21.103 | 0.0801 | 0.08887 |
| JPEG | Baboon | 62.631 | 0.0947 | 0.23535 |

5 Conclusion

A novel embedding-intensity-adaptive CIWGA is proposed in this paper. A color image is first divided into three channels. The genetic algorithm is then applied to analyze the influence of embedding on the original image and each channel's capacity to resist attacks, and the watermark is embedded in the R, G and B channels respectively. Using the genetic algorithm not only improves image quality but also greatly enhances the security and robustness of the watermarked image. The algorithm achieves an optimal compromise between robustness and image quality.

References
1. Cheung, W.N.: Digital Image Watermarking in the Spatial and Transform Domains. Proceedings of TENCON 2000, Sept. 24-27 (2000)
2. Zhang, X.D., Feng, J., Lo, K.T.: Image Watermarking Using Tree-Based Spatial-Frequency Feature of Wavelet Transform. Journal of Visual Communication and Image Representation 14 (2003) 474-491
3. Fleet, D., Heeger, D.: Embedding Invisible Information in Color Images. Proc. 4th IEEE International Conference on Image Processing, Santa Barbara, USA, 1 (1997) 532-535
4. Kong, J., Wang, W., Lu, Y.H., Han, J.L., Hou, G.: Joint Spatial and Frequency Domains Watermarking Algorithm Based on Wavelet Packets Transform. The 18th Australian Joint Conference on Artificial Intelligence (2005)
5. Kutter, M., Jordan, F., Bossen, F.: Digital Watermarking of Color Images Using Amplitude Modulation. J. Electron. Imaging 7(2) (1998) 1064-1087
6. Gen, M., Cheng, R.: Genetic Algorithms and Engineering Design. Wiley, New York, NY (1997)
7. Suhail, M.A., Obaidat, M.S., Ipson, S.S., Sadoun, B.: A Comparative Study of Digital Watermarking in JPEG and JPEG 2000 Environments. Information Sciences 151 (2003) 93-105
8. Shieh, C.S., Huang, H.C., Wang, F.H., Pan, J.S.: Genetic Watermarking Based on Transform-domain Techniques. Pattern Recognition 37 (2004) 555-565
9. Yu, P.-T., Tsai, H.-H., Lin, J.-S.: Digital Watermarking Based on Neural Networks for Color Images. Signal Processing 81 (2001) 663-671
10. Cox, I.J., Kilian, J., Leighton, F.T., Shamoon, T.: Secure Spread Spectrum Watermarking for Multimedia. IEEE Trans. Image Process 6(12) (1997) 1673-1687
11. Ganic, E., Eskicioglu, A.M.: Robust DWT-SVD Domain Image Watermarking: Embedding Data in All Frequencies. Proceedings of the 2004 Multimedia and Security Workshop on Multimedia and Security (2004) 166-174

A Novel Emitter Signal Recognition Model Based on Rough Set

Guan Xin, Yi Xiao, and He You

Research Institute of Information Fusion, Naval Aeronautical Engineering Institute, Yantai, 264001, P.R. China
[emailprotected]

Abstract. On the basis of classification, rough set theory regards knowledge as a partition over data using equivalence relations. Rough set theory is studied in depth in this paper and introduced into the problem of emitter recognition, and a new emitter signal recognition model is presented. At the same time, a new method of determining weight coefficients is proposed that is independent of a priori knowledge, and a new classification rule is also presented. Finally, an application example is given, which demonstrates that the new method is accurate and effective. Moreover, a computer simulation of recognizing radar emitter purpose is carried out and compared with fuzzy pattern recognition and a classical statistical recognition algorithm. The experimental results demonstrate the excellent performance of the new recognition method compared to the two existing pattern recognition techniques, providing a new direction for research on emitter recognition.

1 Introduction

With the development of sensor technology, many regular and special emitters are widely used, and emitter recognition has become an important issue in military intelligence, surveillance, and reconnaissance. In practice, a priori knowledge is hard to obtain and emitter signals overlap to a great degree, so regular algorithms for emitter recognition do not always give good performance. Research on emitter recognition over the past years includes expert systems [1], fuzzy recognition methods [2], artificial neural networks [3], and attribute mathematics recognition methods [4], etc. Indeterminate mathematics methods should be developed to solve this problem objectively, practically and rationally. Rough set theory was developed by Zdzislaw Pawlak in 1982 [5]. The main goal of rough set theory is to synthesize approximations of concepts from the acquired data. On the basis of classification, it regards knowledge as a partition over data using equivalence relations. Rough set theory has been conceived as a tool to conceptualize, organize and analyze various types of data, in particular to deal with inexact, uncertain or vague knowledge in applications related to artificial intelligence [6-8]. Its main advantage is that it does not need any preliminary or additional information about the data. For the special traits of emitter recognition, a new emitter recognition method based on rough set theory is presented with its detailed steps, and a new approach to



determining weight coefficients is proposed, which is independent of a priori knowledge. A new classification rule based on the decision table is generated. Finally, an application example is given, which demonstrates that the new method is accurate and effective. Moreover, a computer simulation of recognizing the radar emitter is carried out and compared with a fuzzy recognition approach and classical statistical pattern recognition.

2 Basic Definitions of Rough Set Theory

2.1 Information Systems and Indiscernibility

A data set is represented as a table in which each row represents a case, an event, or simply an object, and every column represents an attribute that can be measured for each object. This table is called an information system. More formally, an information system is a pair $(U, A)$, where $U$ is a non-empty finite set of objects called the universe and $A$ is a non-empty finite set of attributes such that $a: U \to V_a$ for every $a \in A$. The set $V_a$ is called the value set of $a$. Any subset $B$ of $A$ determines a binary relation $I(B)$ on $U$, called an indiscernibility relation and defined as follows: $x\,I(B)\,y$ if and only if $a(x) = a(y)$ for every $a \in B$, where $a(x)$ denotes the

a for element x . Obviously, I ( B ) is an equivalence relation. If ( x, y ) belongs to I ( B ) , we will say that x and y are B -indiscernible. Equivalence classes of the relation I ( B ) (or blocks of the partition U / B ) are refereed to as B -elementary sets. value of attribute

2.2 Reduction of Attributes Attribute reduction is one of the major concerns in the research on rough set theory. In an information system, some attributes may be redundant regarding a certain classification. Rough set theory introduces notions, which help reducing attributes without declining ability of classification. Let R be a set of equivalence relation, and r ∈ R . An attribute r is dispensable in R if ind ( R ) = ind ( R − {r}) . Otherwise, r is indispensable. The dispensable attribute does not improve or reduce the classification when it is present or absent. The set of all attributes indispensable in P is called a core of P , denoted as core( P ) . The core contains attributes that can not be removed from P without losing the original classification. 2.3 Decision Table

S = (U , R,V , f ) can be represented in terms of a decision table, assume that R = C D and C D = φ , where C is the condition attributes An information system

A Novel Emitter Signal Recognition Model Based on Rough Set

and D is the decision attributes. The information system deterministic if C

→ D , otherwise, it is non-deterministic.

83

S = (U , R,V , f ) is

3 The Algorithm of Emitter Signal Recognition The detailed recognition steps of the proposed emitter signal recognition model are given as follows: 3.1 Constructing Relationship Data Model Assume that we have r radar classes. Every radar class has multi-pattern. Assume that the total mode in known radar template library is n . Regard the characteristic parameters of radar emitter signals as condition attributes, marked as C =

{c1 , c2 , , cm } . Regard the class of radar emitters as decision attributes, marked as

D = {d1 , d 2 ,, d r } . Let us denote some sample ut in the known radar template library as u t = (c1,t , c 2,t , , c m ,t ; d t ) . The universe U = {u1 , u 2 , , u n } is also called

sample

set.

Then,

the

attribute

values

of

ut

are

c i (u t ) =

c i ,t (i = 1, 2 , , m ; t = 1, 2 , , n ) , d (ut ) ∈ D . The two dimension table constituted by

u t (t = 1,2,, n ) is relationship data model of radar emitter

recognition. 3.2 Constructing Knowledge Systems and Discretization In order to analyse the dependency among knowledge and importance among attributes from samples, classification must be done to the universe utilizing attributes and knowledge system must also be constructed in the universe. Discretization must be done to continuous attributes before classifying, because that rough set theory cannot deal with continuous attributes. Assume that every object in the universe U is discretized, we can determine equivalence relation on U . Then, knowledge system can be constructed. Assume that

C ′ ⊆ C , define a binary relation as RC = {(u , v ) ∈ U × U

RC = ci (u ) = ci (v ), ∀ci ∈ C ′} . In like manner, define R D as R D =

{(u , v ) ∈ U × U d (u ) = d (v )} . Obviously,

RC and R D are both equivalence

relation on U . So, these two relation can be used to determine knowledge system on U . Real data sets from emitter signals include continuous variables. Partitioning continuous variables into categories must be done. Considering of the traits of emitter recognition problem, some simple applicable discretization methods can be used, such

84

G. Xin, Y. Xiao, and H. You

as equidistance, equifrequency, Naive Scaler algorithm, Semi Naive Scaler[9], etc. The result of discretization impacts on classification quality directly. 3.3 Conformation of Weight Coefficients In general, a prior information is hard to obtain to show the importance of each characteristic parameter of emitter, thus average weight coefficients can be adopted. But, in fact, the importance of every characteristic parameter is not always equivalence. So it is much better to adopt weighed processing. Some researches have been conducted, such as entropy analytical method, comparable matrix method, analytic hierarchy processed and so on. Here, we adopt a new method to adjust weight coefficients. This method changes the weight coefficients problem to expression of significance of attributes, which is independent of a prior knowledge. Different attribute with the decision set D may be takes on different importance. Significance of an attribute ci in the decision table can be evaluated by measuring

∈ C form the attribute set C on the positive region defined by the decision table. The number γ C (D ) expresses the degree of dependency between attributes C and D , or accuracy of approximation of U / D by C . We can ask how the coefficient γ C (D ) change when an attribute ci is removed, i.e., what is the difference between γ C (D ) and γ C−{c } (D) . Thus, we can i the effect of removing of an attribute ci

define the significance of an attribute

ci as

σ D (ci ) = γ C (D) − γ C−{ci } (D) . where bigger

σ D (ci )

is, the more important attribute

(1)

ci is.

The steps of determining weight coefficients are describes as follows. Step 1. Calculate the degree of dependency between R D and RC . That is to say, calculate the degree of dependency between emitter attribute set

(

C and emitter’s type D .

)

§ · card ¨ RC [ y ]RD ¸ [ y ] ∈(U RD ) © − ¹ . γ RC (R D ) = R D card (U ) ¦

(2)

card(S ) expresses the cardinal number of S . Step 2. Calculate the degree of dependency of R D on R C − {c i } .

where,

(

§ card¨¨ RC −{ci } [ y ]RD [ y ]R ∈(U RD ) © − γ RC −{c } (RD ) = D i card(U ) ¦

)·¸¸

¹ , i = 1, 2 , , m .

(3)

A Novel Emitter Signal Recognition Model Based on Rough Set

Step 3. According to eq.(1), calculate the significance of the

i th attribute.

σ D (c i ) = γ RC (R D ) − γ RC −{c } (R D ) , i = 1,2, , m . i

Step 4. Calculate the weight coefficient of the

λi =

σ D (c i )

¦ σ (c ) m

D

85

(4)

i th attribute.

, i = 1, 2 , m .

(5)

j

j =1

3.4 Classification Rules Based on the Decision Table

After discretization and reduction of the incomplete decision table, the following classification rule can be derived.

Rule 1: Calculate the accordance degree of the characteristic parameters of the pending unknown signal with the condition attributes of each class, then choose the decision rule with the biggest accordance degree to assign the pending signal:

$$\mu(X_i) = \frac{\mathrm{card}(X_i \cap F_x)}{\mathrm{card}(F_x)}, \qquad (6)$$

where $F_x$ is the characteristic parameter set of the pending unknown signal, $X_i$ is the condition attribute set of the decision table, and $X_i \cap F_x$ is the set of characteristics in $F_x$ that meet the characteristic conditions of $X_i$.

It is easy to see that average weighting is adopted in Rule 1. In fact, however, the influence of each attribute on the decision is different, so weighted processing of the condition attributes is better for recognition. A new classification rule based on an accordance degree matrix is presented here.

Rule 2: Compare the characteristic parameters of the pending unknown signal $x_0$ with those of each known signal $x_i$ in the template library; a matrix $S_{n \times m}$ is then obtained, where $n$ is the number of known samples in the template library. Assume that C is the set of characteristic parameters and $m$ is the number of characteristic parameters. If $c_j(x_0) = c_j(x_i)$, $c_j \in C$, $j = 1, 2, \ldots, m$, then $s_{ij} = 1$; otherwise $s_{ij} = 0$. Denote the weight coefficients as $a = (a_1, a_2, \ldots, a_m)'$; then the accordance degree matrix can be described as $\mu_{n \times 1} = S_{n \times m} \times a$. If $\mu_{i_0} = \max_i \mu_i \ (i = 1, 2, \ldots, n)$, then the pending signal $x_0$ belongs to the same class as the $i_0$th emitter in the template library.
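As a minimal sketch of Rule 2 (the weighted accordance degree matrix), the snippet below matches one discretized measurement against a template library. The template values come from Tables 3–4 below; the helper names and the weight values are our own illustrative assumptions (in practice the weights come from Eq. (5)).

```python
import numpy as np

def classify(x0, templates, classes, weights):
    """Rule 2: mu = S a; assign x0 to the template with the largest accordance degree."""
    S = (templates == np.asarray(x0)).astype(float)   # s_ij = 1 iff attribute j matches
    mu = S @ weights                                  # accordance degree vector (n x 1)
    return classes[int(np.argmax(mu))]

templates = np.array([[4,5,6],[2,2,8],[2,2,8],[2,7,5],[8,5,3],[4,9,2],[6,2,6],
                      [6,4,6],[3,3,4],[1,1,9],[3,4,5],[3,3,3],[3,6,7],
                      [7,10,1],[5,8,3],[7,8,1],[7,10,1]])
classes = [1]*8 + [2]*5 + [3]*4
weights = np.array([0.4, 0.3, 0.3])    # illustrative weights; use Eq. (5) in practice

# First discretized measurement from Table 4: (a, b, c) = (3, 5, 6)
print(classify([3, 5, 6], templates, classes, weights))   # -> 1
```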


4 Simulation Analysis

To test the validity and applicability of the new recognition method proposed in this paper, it is applied in the example below to identify the purpose of hostile radar emitters using simulated radar data.

4.1 Application Example

Assume that the radar characteristic vector comprises three characteristic parameters: radio frequency (RF), pulse repetition frequency (PRF) and pulse width (PW). Radars of three different purposes are selected from the template library. The extracted incomplete sample characteristic parameters are shown in Table 1. For convenience of notation, we use the following correspondences: No.--$U$, $U = \{x_1, x_2, \ldots, x_{17}\}$; RF--$a$; PRF--$b$; PW--$c$.

Table 1. Extracted example data of radar emitters

No.  Class  RF(MHz)  PRF(Hz)  PW(μs)
 1     1    2774.3     428     3
 2     1    1280       300     4
 3     1    1313       301     4.1
 4     1    1251       601     1.7
 5     1    9214       429     0.8
 6     1    2746      1873     0.6
 7     1    2985       325     2.6
 8     1    3109       375     2.1
 9     2    2695       335     1.1
10     2     160       190     7
11     2    2700       375     1.7
12     2    2700       330     0.8
13     2    2000       600     3.5
14     3    3400      2500     0.5
15     3    2970      1250     0.8
16     3    9000      1750     0.25
17     3    3700      2250     0.37

Because of the influence of stochastic factors on radar signals during emission, transmission and reception, the radar characteristic vector takes on a statistical character. Four measured radar emitter samples are given in Table 2.

Table 2. Measured radar characteristic parameters

Measured Sample  RF(MHz)  PRF(Hz)  PW(μs)
      1          2682.2    429      2.81
      2          1285.5    617.6    1.7402
      3          2673.4    326.8    0.8291
      4          3821.4   2216.6    0.3732


Discretization must be performed on the extracted radar emitter data. The Naive Scaler discretization approach [9] is adopted here; the results can be seen in Table 3.

Table 3. Discretized extracted example data

 U   d   a   b   c
 1   1   4   5   6
 2   1   2   2   8
 3   1   2   2   8
 4   1   2   7   5
 5   1   8   5   3
 6   1   4   9   2
 7   1   6   2   6
 8   1   6   4   6
 9   2   3   3   4
10   2   1   1   9
11   2   3   4   5
12   2   3   3   3
13   2   3   6   7
14   3   7  10   1
15   3   5   8   3
16   3   7   8   1
17   3   7  10   1

After discretization, the measured radar signals are as shown in Table 4.

Table 4. Discretized measured radar signal characteristic parameters

 U   a   b   c
 1   3   5   6
 2   2   7   5
 3   3   2   3
 4   7  10   1

Using the recognition rule described in Section 3.4, the pending unknown radar signals are assigned to classes No. 1, No. 1, No. 2 and No. 3 respectively. The recognition result accords with the facts.

4.2 Comparison Experiment

In our simulation, 10 radars of different purposes are first selected from a known radar template library. The simulated template library, built by consulting the measured parameter ranges of a reconnaissance receiver, contains 4016 records. The condition attribute set is {radio frequency, pulse width, pulse repetition frequency, antenna scan type, antenna polarization type, antenna scan period, radio frequency type, pulse repetition frequency type, inner pulse modulation type}, and the decision attribute set is {radar emitter purpose}.


The observation vector comprises two parts: a randomly accessed known characteristic vector and measurement noise. The measurement noise is assumed to be zero-mean white noise with standard deviation σ. In our simulation, two different noise environments are selected, whose measurement-noise standard deviations are 2 and 5 percent of the corresponding characteristic parameter, respectively. For continuous attributes, the equifrequency discretization approach is adopted. Table 5 shows the recognition results of the rough-set-based emitter recognition algorithm compared with existing fuzzy pattern recognition and statistical pattern recognition techniques, obtained over 100 Monte Carlo experiments. Either a Cauchy or a normal membership function can be used in fuzzy pattern recognition [2]; in our simulation, normal membership is adopted for continuous attributes, while for discrete attributes the membership function equals 1 when the modulation modes match and 0 otherwise.

Table 5. Correct recognition rate of three methods for 100 runs of stochastic simulation

Noise          Rough set recognition              Fuzzy pattern recognition     Statistical pattern recognition
environment    Reduced attributes  Correct rate   Attributes  Correct rate      Attributes  Correct rate
Environment 1        4                90.6%           9          81.3%              9          72.1%
Environment 2        4                82.1%           9          76.8%              9          61%

Based on the above experiments, we can draw the following conclusions. (1) The rough set emitter signal recognition algorithm is not merely a classifier. It can obtain a minimal representation of knowledge while retaining the key information, it can identify and evaluate correlative relations, and it can obtain rule knowledge from empirical data. It can be seen from Table 5 that the proposed rough set recognition approach outperforms the fuzzy pattern recognition and traditional statistical pattern recognition methods in such a practical reconnaissance environment. (2) Subjectivity enters the fuzzy pattern recognition method when the membership function is determined, which becomes an obstacle to its application; in contrast, the rough set signal recognition model is independent of a priori knowledge and depends on samples only. (3) The rough set signal recognition model shows obvious advantages on large sample sets. (4) To improve the correct recognition rate, discretization algorithms should be selected to suit the practical application.

5 Conclusions

Emitter recognition is a key technology in multisensor information fusion systems. A new emitter signal recognition model based on rough set theory is presented in this


paper. At the same time, a new method of determining weight coefficients is given and a new classification rule is presented. Finally, detailed simulation experiments are conducted to demonstrate the new method, and it is compared with fuzzy pattern recognition and classical statistical pattern recognition through simulation. The new recognition approach shows promising performance and proves effective and feasible for emitter recognition.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 60572161) and the Excellent Ph.D. Paper Author Foundation of China (Grant Nos. 200036 and 200237).

References

1. Cheng, X.M., Zhu, Z.W., Lu, X.L.: Research and Implementation on a New Radar Radiating-Source Recognizing Expert System. Systems Engineering and Electronics, Vol. 22, 8 (2000) 58–62
2. Wang, G.H., He, Y.: Radar ID Methods Based on Fuzzy Closeness and Inexact Reasoning. Systems Engineering and Electronics, 1 (1995) 25–30
3. Shen, Y.J., Wang, B.W.: A Fast Learning Algorithm of Neural Network with Tunable Activation Function. Science in China, Ser. F Information Sciences, Vol. 47, 1 (2004) 126–136
4. Guan, X., He, Y., Yi, X.: Attribute Measure Recognition Approach and Its Applications to Emitter Recognition. Science in China Series F, Information Sciences, Vol. 48, 2 (2005) 225–233
5. Pawlak, Z.: Rough Sets. International Journal of Information and Computer Science, 11 (1982) 341–356
6. Pawlak, Z.: Rough Set Theory and Its Application to Data Analysis. International Journal of Cybernetics and Systems, 29 (1998) 661–688
7. Li, M., Zhang, H.G.: Research on the Method of Neural Network Modeling Based on Rough Sets Theory. Acta Automatica Sinica, 1 (2002) 27–33
8. Cho, Y., Lee, K., Yoo, J., Park, M.: Autogeneration of Fuzzy Rules and Membership Functions for Fuzzy Modeling Using Rough Set Theory. IEE Proceedings of Control Theory and Applications, Vol. 145, 5 (1998) 437–442
9. Wang, G.Y.: Rough Set Theory and Knowledge Acquisition. Press of Xi'an Jiaotong University (2001)

A Novel Model for Independent Radial Basis Function Neural Networks with Multiresolution Analysis GaoYun An and QiuQi Ruan Institute of Information Science, Beijing Jiaotong University, Beijing, China, 100044 [emailprotected], [emailprotected]

Abstract. The classical radial basis function (RBF) neural network directly projects input samples into a high-dimensional feature space through radial basis functions and does not take account of the high-order statistical relationships among the variables of the input samples. Yet these high-order statistical relationships play an important part in pattern recognition (classification). In order to exploit them in a neural network, a novel independent radial basis function (IRBF) neural network is proposed in this paper. A new hybrid system combining multiresolution analysis, principal component analysis (PCA) and the proposed IRBF neural network is then also proposed for face recognition. In experiments on the FERET face database, the proposed approach outperforms the recently proposed ICA algorithm, and it is also confirmed to be more robust than ICA to facial expression, illumination and aging in face recognition.

1 Introduction

Up to now, there have been many successful algorithms for face recognition. Principal Component Analysis (PCA) [6], Fisher's Linear Discriminant (FLD) [7] and Independent Component Analysis (ICA) [1] are three basic algorithms for subspace analysis in face recognition and have been well developed. But there are still outliers which impact the performance of face recognition algorithms: facial expression, illumination, pose, masking, occlusion, etc. So how to make current algorithms robust to these outliers, or how to develop more powerful classifiers, is the main task in face recognition. As a useful tool for multiresolution analysis, wavelet decomposition has also been introduced into face recognition to make algorithms much more robust to facial expression, pose and small occlusion, as in the work of [2] and [3]. In [3] it was demonstrated that the Daubechies 4 (db4) wavelet outperforms other wavelets in computation time and recognition accuracy. Lai et al. [2] combined wavelet decomposition and the Fourier transform to propose the spectroface representation for face recognition, which is robust to facial expression, translation, scale and on-the-plane rotation. So, inspired by [2] and [3], the db4 wavelet will be adopted to


extract subband face images which are robust to facial expression and small occlusion for our proposed approach. From another point of view, the algorithms proposed in [1]-[3], [6] and [7] serve only the feature extraction stage of an identification system; powerful classifiers are needed to classify the extracted features. Meng et al. [4] tried to use the radial basis function (RBF) neural network to classify features extracted by FLD. A classical RBF neural network is formed by an input layer, a hidden layer and an output layer. It directly projects the input samples into a high-dimensional feature space through radial basis functions and does not take account of the high-order statistical relationships among the variables of the input samples. As is known, these high-order statistical relationships play an important part in pattern recognition (classification). So, in order to take advantage of them, a novel independent radial basis function (IRBF) neural network is proposed in this paper, together with a novel hybrid system for face recognition. In the hybrid system, the proposed IRBF neural network classifies the PCA features extracted from enlarged subband face images obtained by wavelet decomposition. The details of the proposed IRBF neural network and the new hybrid system for face recognition are discussed in the following sections.

2 IRBF Neural Networks with Multiresolution Analysis

In order to take advantage of the information reflected by the high-order statistical relationships among variables, a novel IRBF neural network is proposed in this section, and a novel hybrid system for face recognition is then built on it. The hybrid system contains three main sub-models: a multiresolution analysis sub-model, a PCA sub-model and an IRBF neural network sub-model. Given a matrix $X$ ($n \times N$) of training samples, where $N$ is the number of samples and each sample $x_i \in \mathbb{R}^n$, the whole hybrid system for face recognition can be described as follows.

2.1 Multiresolution Analysis

Wavelet decomposition is a powerful tool for multiresolution analysis. Here, the db4 wavelet is chosen to extract subband images for the new hybrid system. The subband images of every sample in $X$ are extracted as follows. Let $V^2(R)$ denote the vector space of measurable, square-integrable 1D functions. The continuous wavelet decomposition of a 1D signal $s(t) \in V^2(R)$ is defined as

$$(W_a s)(b) = \int s(t)\, \phi_{a,b}(t)\, dt, \qquad (1)$$

where the wavelet basis function can be expressed as $\phi_{a,b}(t) = a^{-1/2} \phi((t-b)/a)$, and the arguments $a$ and $b$ denote the scale and location parameters, respectively. Eq. (1) can be discretized by restricting $a$ and $b$ to a discrete lattice ($a = 2^n$, $b \in \mathbb{Z}$).


The discrete wavelet decomposition of 2D images can be defined similarly, by applying the 1D discrete wavelet decomposition to each dimension of the image separately. In this paper, only the enlarged subband images corresponding to the low-frequency components in both the vertical and horizontal directions of the original images are chosen for later processing, because according to wavelet theory these subband images are smoothed versions of the original images and are insensitive to facial expression and small occlusion in face recognition. Fig. 1 illustrates some examples: the left two images are the original images (one with a normal facial expression and glasses, the other with a smiling expression and no glasses), the middle two images are the chosen subband images, and the right two images are the enlarged versions of the corresponding subband images. From the right two images it can be noticed that the facial expression and the glasses are removed. If we define the correlation factor of two images as their Euclidean distance, the correlation factor of the right two images is much smaller than that of the left two images. So the enlarged subband images corresponding to the low-frequency components in both the vertical and horizontal directions of the original images are very useful for face recognition and are chosen in our approach.
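A minimal sketch of this subband extraction step using PyWavelets: a 2D db4 decomposition is taken, the low-frequency (LL) approximation is kept, and it is enlarged back to the original size. This is an illustration under our own parameter choices, not the authors' code; the decomposition level and the nearest-neighbour enlargement are assumptions.

```python
import numpy as np
import pywt

def enlarged_ll_subband(image, level=2, wavelet="db4"):
    """Keep only the LL (low/low) subband and enlarge it to the original size."""
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    ll = coeffs[0]                                   # smoothed approximation image
    factor = 2 ** level                              # nearest-neighbour enlargement
    enlarged = np.kron(ll, np.ones((factor, factor)))
    return enlarged[:image.shape[0], :image.shape[1]]

face = np.random.rand(64, 64)        # stand-in for a grayscale face image
print(enlarged_ll_subband(face).shape)
```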

Fig. 1. Illustration of subband images of the same person with different facial expression and small occlusion (glasses); the correlation factors of the two image pairs are 187.73 (original images) and 106.38 (enlarged subband images)

So, for every sample $x_i$, $y_i = B(W_A(x_i))$, where the function $B(\cdot)$ …

$$\frac{1}{T}\sum_{\tau=0}^{T-1} d(x_1, \tau)\, d(x_2, \tau) = \begin{cases} \sigma, & x_1 = x_2 \\ \varepsilon(x), & x_1 \neq x_2, \ |\varepsilon(x)| \leq \delta \end{cases} \qquad (1)$$

The set I(e) is correlated with the gray level of the image P(e) and can serve as its identification code; a change of v(e) will cause the set I(e) to change. When an input face image P(h) needs to be recognized, it is chaos modulated to form the corresponding chaos memory vector

$$D(p(h), \tau) = \{ d(p_1(h), \tau), \ldots, d(p_n(h), \tau) \}^T \qquad (7)$$

It is then correlated with the chaos memory vectors over the different sets I(e), e ∈ Q, respectively, namely

$$R(h, e) = \sum_{i \in I(e)} \left\{ \frac{1}{T} \sum_{\tau=0}^{T-1} m_i(\tau)\, d(p_i(h), \tau) \right\} \qquad (8)$$

According to the orthogonality of the chaos signals, Eq. (1), we can get

$$R(h, e) = \sum_{r=1}^{q} \left( n'(e, r)\, \sigma + n''(e, r)\, \varepsilon(p) \right) \qquad (9)$$

where $n'(e, r)$ is the number of elements of $\Gamma(e, r) = \{ i \mid p_i(r) = p_i(h),\ i \in I(e) \cap I(r) \}$, and $n''(e, r)$ is the number of elements of $\Gamma'(e, r) = \{ i \mid p_i(r) \neq p_i(h),\ i \in I(e) \cap I(r) \}$. $n(e, r) = n'(e, r) + n''(e, r)$ is the total number of elements of the set $I(e) \cap I(r)$. From Eq. (9) we can obtain:

A: $e = h$:

$$R(h, h) = \Big\{ n(h, h) + \sum_{\substack{r=1 \\ r \neq h}}^{q} n'(h, r) \Big\} \sigma + \sum_{\substack{r=1 \\ r \neq h}}^{q} n''(h, r)\, \varepsilon(p).$$

Denote its minimum as

$$R(h, h)_{\min} = \Big\{ n(h, h) + \sum_{\substack{r=1 \\ r \neq h}}^{q} n'(h, r) \Big\} \sigma - \Big\{ \sum_{\substack{r=1 \\ r \neq h}}^{q} n''(h, r) \Big\} \delta \qquad (10)$$


B: $e \neq h$:

$$R(h, e) = \Big\{ \sum_{r=1}^{q} n'(e, r) \Big\} \sigma + \sum_{r=1}^{q} n''(e, r)\, \varepsilon(p).$$

Denote its maximum as

$$R(h, e)_{\max} = \Big\{ \sum_{r=1}^{q} n'(e, r) \Big\} \sigma + \Big\{ \sum_{r=1}^{q} n''(e, r) \Big\} \delta \qquad (11)$$

If $e \in Q$, we can recognize the image P(h) by

$$R(h, h)_{\min} - R(h, e)_{\max} > 0 \qquad (12)$$

or

$$\frac{\sigma}{\delta} > \frac{\sum_{\substack{r=1 \\ r \neq h}}^{q} n''(h, r) + \sum_{r=1}^{q} n''(e, r)}{\Big\{ n(h, h) + \sum_{\substack{r=1 \\ r \neq h}}^{q} n'(h, r) \Big\} - \sum_{r=1}^{q} n'(e, r)} = K(e) \qquad (13)$$

and then take the maximum

$$R(h, e^*) = \max_{e \in Q} R(h, e).$$

3.4 Experimental Results

In this paper, a human facial expression is represented by many feature blocks. Because a feature block can condense the point information within its range, facial expression recognition can be regarded as a point-distribution classification problem. In the expression authentication experiment, the faces display the seven basic emotional expressions: neutral, happy, surprised, sad, disgusted, angry and afraid. A boosting learning method is adopted: after training on the feature points of the input facial expression images one by one, the weight of every point within its local feature area can be computed, and the best classifying model kernel structure is then sought. Fig. 4 illustrates the flow chart. In this paper, 27 local feature points distributed on the corresponding shapes are obtained; their topology forms the facial expression space structure. To test the algorithm described in the previous sections, we use two different databases: a database collected by us and the Cohn-Kanade AU-coded facial expression database [7]. Some of the expression sequences are shown in Fig. 4. We describe the face authentication procedure as follows.

Step 1. The face image is detected and located, the face area is segmented from the image, and the eye centers are located.
Step 2. The face image is normalized based on the centers of the two eyes.
Step 3. The improved Census Transform is applied.
Step 4. The face feature information is input as vectors to the chaos neural network [8], which then modulates the face authentication.
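The chaos correlation decision used in Step 4 (the procedure of Eqs. (8)–(13)) amounts to correlating the chaos-modulated input against each stored memory and taking the identity with the largest response. The sketch below illustrates this with logistic-map chaos sequences; the modulation scheme (one sequence keyed per pixel value) and all parameter values are our own assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def chaos_sequence(seed, T=512):
    """Logistic-map chaos signal keyed by a seed in (0, 1), made zero-mean."""
    x, out = seed, np.empty(T)
    for t in range(T):
        x = 3.99 * x * (1.0 - x)
        out[t] = x
    return out - out.mean()

def modulate(image, T=512):
    """One chaos sequence per pixel, keyed by its gray level (cf. Eq. 7)."""
    return np.stack([chaos_sequence((v + 0.5) / 256.0, T)
                     for v in image.ravel()])

def recognize(probe, memories):
    """Eq. (8): correlate the probe against every stored memory, pick the max."""
    d = modulate(probe)
    scores = {e: float(np.sum(np.mean(m * d, axis=1)))  # sum_i of (1/T) sum_tau
              for e, m in memories.items()}
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(0)
faces = {e: rng.integers(0, 256, size=(8, 8)) for e in range(3)}
memories = {e: modulate(img) for e, img in faces.items()}
best, _ = recognize(faces[1], memories)
print(best)   # expected: 1, since matching pixels correlate strongly (sigma > epsilon)
```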


Fig. 4. Flow chart of the training system composition

The algorithm was tested on a PC (P4, 1.8 GHz, 256 MB DDR) with single 50×50-pixel facial images. The recognition time was less than 20 ms and the correct recognition rate was over 91%. Table 1 compares the facial expression recognition results with those analysed in [9].

Table 1. Comparison of facial expression recognition results

Classification method                                Recognition rate
Linear discriminant rule based on PCA                      74%
Personalized galleries and elastic graph matching          81%
2D emotion space (PCA) & minimum distance                  84.5%
PCA and LDA of the labeled-graph vectors                   75%-92%
BP learning neural network                                 85%-90%
Our algorithm                                              91%

4 Conclusion

We considered that the use of personal attributes is able to improve the robustness and reliability of facial authentication. As an authentication method using personal attributes, we proposed the chaos modulated facial expression recognition method. Building on our former work, this leads to an efficient real-time detector with high recognition rates and very few false positives. Experimental results showed the improvement of the discriminating power. We integrated the classifiers and a face recognition system to build a real-time facial expression authentication system. The security of the system is improved through the online input of face images. A combined Web access control scheme has been implemented, improving the security of remote Web access.

Acknowledgment The project is supported by the National Natural Science Foundation of China under Grant No. 60572027, by the Outstanding Young Researchers Foundation of Sichuan Province Grant No.03ZQ026-033 and by the Program for New Century Excellent Talents in University of China under grant No. NCET-05-0794. We would also like to thank J. Cohn for kindly providing facial expression database used in this paper.


References

1. Canavan, J.E.: Fundamentals of Network Security. Artech House, Boston (2001)
2. Ortega-Garcia, J., Bigun, J., Reynolds, D., Gonzalez-Rodriguez, J.: Authentication Gets Personal with Biometrics. IEEE Signal Processing Magazine, Vol. 21, Iss. 2 (2004) 50–62
3. Marchany, R.C., Tront, J.G.: E-Commerce Security Issues. In: Sprague, R.H. (ed.): Proceedings of the 35th Hawaii International Conference on System Sciences. IEEE Computer Soc., Los Alamitos, CA, USA (2002) 2500–2508
4. Zabih, R., Woodfill, J.: Non-Parametric Local Transforms for Computing Visual Correspondence. In: Eklundh, J.O. (ed.): Proceedings of the 3rd European Conference on Computer Vision. Lecture Notes in Computer Science, Vol. 801. Springer-Verlag, Stockholm, Sweden (1994) 151–158
5. Froba, B., Ernst, A.: Face Detection with the Modified Census Transform. In: Azada, D. (ed.): Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition. IEEE Computer Soc., Los Alamitos, USA (2004) 91–96
6. Ling, X.T., Zhou, H.: Chaotic Moderation and Correlative Detection Method for Image Classification and Recognition. Acta Electronica Sinica, Vol. 25, No. 1 (1997) 54–57
7. Kanade, T., Cohn, J., Tian, Y.: Comprehensive Database for Facial Expression Analysis. In: Crowley, J. (ed.): Fourth IEEE International Conference on Automatic Face and Gesture Recognition 2000. IEEE Computer Soc., Los Alamitos, CA, USA (2000) 46–53
8. Rowley, H.A.: Neural Network-Based Face Detection. PhD thesis, Carnegie Mellon University, Pittsburgh (1999)
9. Pantic, M., Valstar, M., Rademaker, R., Maat, L.: Automatic Analysis of Facial Expression: The State of the Art. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 12 (2000) 1424–1445

A Novel Computer-Aided Diagnosis System of the Mammograms* Weidong Xu1,2, Shunren Xia1, and Huilong Duan1 1

The Key Laboratory of Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310027, China 2 Automation College, HangZhou Dianzi University, Hangzhou 310018, China [emailprotected]

Abstract. Breast cancer is one of the most dangerous tumors for middle-aged and older women in China, and mammography is the most reliable detection method. In order to assist the radiologists in detecting the mammograms, a novel computer-aided diagnosis (CAD) system was proposed in this paper. It carried out a new algorithm using optimal thresholding and Hough transform to suppress the pectoral muscle, applied an adaptive method based on wavelet and filling dilation to extract the microcalcifications (MCs), used a model-based location and segmentation technique to detect the masses, and utilized MLP to classify the MCs and the masses. A high diagnosis precision with a low false positive rate was finally achieved to validate the proposed system.

1 Introduction

Recently, breast cancer has become one of the most dangerous tumors for middle-aged and older women in China, and mammography is the most reliable detection technique for it. In mammograms, the most important focuses are masses and microcalcifications (MCs). The detection of these symptoms usually costs radiologists so much time and energy that they often tire and miss important focuses, which usually appear indistinct. Many computer-aided diagnosis (CAD) techniques have therefore been developed to assist radiologists in reading mammograms [1,2]. These methods achieved high detection precision for focuses, but adaptability and robustness often were not emphasized, so when focuses with special features are processed, the precision drops sharply and the false positive (FP) rate can hardly be suppressed. In this paper, a novel CAD system is proposed, which uses models to represent the symptom features, applies appropriate algorithms and adjustable parameters to the targets, and overcomes the defects of the conventional methods; a high diagnosis precision with a low FP rate is realized. In this experiment, all the mammograms were taken from the 1st Affiliated Hospital of Zhejiang University, with a gray-level resolution of 12 bits and a spatial resolution of 1500×2000.

Supported by the Natural Science Foundation of China (No. 60272029) and the Natural Science Foundation of Zhejiang Province of China (No. M603227).



Fig. 1. Primary parts of the mammogram

Fig. 2. Thresholding result (a) and segmentation result (b) of the pectoral muscle

2 Pectoral Muscle Suppression

The pectoral muscle is the triangular region at the corner of the breast region in MLO (medio-lateral oblique) mammograms, where breast cancer focuses cannot exist, so the detection region can be reduced by removing it. A model-based method was applied to this end [3]. Firstly, a series of ROIs (regions of interest) of different sizes were placed at the corner of the breast region. In each ROI, an iterative thresholding technique was used to compute the optimal threshold, and all of these thresholds were combined into a curve. Then the local mean square deviation (MSD) at each point of the threshold curve was computed and combined into an MSD curve. Each peak of the MSD curve marks an inflection point of the threshold curve, which indicates an abrupt change of the gray-level distribution. With the MSD curve, the optimal threshold of the pectoral muscle could be determined and the corresponding region segmented. Each point of the edge of the thresholded region was extracted, and based on these points a zonal Hough transform was applied to detect the direction of the region edge. Different from the ordinary Hough transform, the zonal Hough transform registers the number of points lying on all parallel straight lines of the current direction within the current zone, instead of the number of points on a single straight line; it is used to detect low-curvature curves that approach straight lines. Based on this direction, two straight lines were used to fit the pectoral muscle boundary, and elastic thread and polygon approximation techniques were carried out for refinement. Thus the pectoral muscle was segmented and removed accurately.
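The iterative optimal-threshold computation used inside each ROI can be sketched as follows; this is the classic two-class mean-splitting iteration, our reading of "iterative thresholding technique", not the authors' exact code:

```python
import numpy as np

def iterative_threshold(roi, tol=0.5):
    """Iterate T = (mean below T + mean above T) / 2 until it stabilizes."""
    t = roi.mean()
    while True:
        low, high = roi[roi <= t], roi[roi > t]
        if low.size == 0 or high.size == 0:
            return t
        t_new = 0.5 * (low.mean() + high.mean())
        if abs(t_new - t) < tol:
            return t_new
        t = t_new

roi = np.random.randint(0, 4096, size=(128, 128))   # 12-bit ROI stand-in
print(iterative_threshold(roi))
```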

3 Microcalcification Detection

MCs are early focuses of breast cancer; they appear as small pieces with high intensity and contrast (Fig. 3(a)) and can be represented by the high-frequency (HF) information of the mammogram. Wavelet analysis developed rapidly in the 1990s. It decomposes signals into HF and low-frequency (LF) domains level by level, which is called MRA (multiresolution analysis). Due to its smoothness and locality, the wavelet has been widely applied in


many research fields. A common wavelet-based technique is the discrete wavelet transform (DWT). At each resolution, the image is decomposed into four subbands: the LF subband LL and the HF subbands LH, HL and HH. The three high subbands are combined into a uniform HF domain, i.e. |LH| + |HL| + |HH|, and the HF signal denoting the MCs usually lies in the 2nd and 3rd levels of the wavelet domain. Thresholding with hysteresis was then applied to extract the high-intensity wavelet coefficients in the HF domain. Firstly, the signals in the HF domain were processed with a global threshold: if the modulus was < $T_0$, the coefficient was deleted. Secondly, the reserved signals were processed with another global threshold: if the modulus was > $T_1$, the signal was accepted as an MC. Finally, a local thresholding was carried out on the neighborhood around each accepted MC, and the remaining signals near the accepted MCs were also accepted as MCs if their modulus was > $T_2$. Thus the useful information with comparatively low HF modulus was extracted, while noise with similar HF modulus was suppressed. By reconstructing the accepted signals in the HF domain, all the MCs were located accurately (Fig. 3(b)). Next, filling dilation was used to segment the MCs. The regions $R_0$ reconstructed above were taken as the original regions, and the contrast in their neighborhood was enhanced with an intensity-remapping method. Then $R_0$ expanded outwards through an iterative dilation process based on a cross-shaped structure element $B$, i.e. $R_1 = R_0 \oplus B, \ldots, R_{n+1} = R_n \oplus B$. A newly combined point during the dilation process was not accepted into the MC region if its gray-level intensity $f(x, y)$ could not satisfy $|f(x, y) - f_{\mathrm{ker}}| \leq T_3$ and $|f(x, y) - \bar{f}| \leq T_4$, where $f_{\mathrm{ker}}$ is the mean intensity of $R_0$, $\bar{f}$ is the mean intensity of the accepted points in the neighborhood, and $T_3$, $T_4$ are two thresholds.
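A compact sketch of the hysteresis step on the combined HF modulus map (|LH| + |HL| + |HH|): coefficients above T1 seed MCs, and weaker neighbors above T2 are attached to them. Using one wavelet level and a connected-component formulation via scipy is our own rendering of the neighborhood rule, not the authors' code.

```python
import numpy as np
import pywt
from scipy import ndimage

def locate_mcs(image, t0, t1, t2, level=3):
    """Hysteresis thresholding of the combined HF modulus at one wavelet level."""
    coeffs = pywt.wavedec2(image, "db4", level=level)
    lh, hl, hh = coeffs[1]                       # detail subbands of one level
    hf = np.abs(lh) + np.abs(hl) + np.abs(hh)    # uniform HF domain
    hf[hf < t0] = 0.0                            # first global threshold: delete weak coefficients
    sure = hf > t1                               # second global threshold: assured MCs
    candidate = hf > t2                          # weaker signals, kept only near assured ones
    labels, _ = ndimage.label(candidate)
    keep = np.unique(labels[sure])               # components containing an assured coefficient
    return np.isin(labels, keep[keep > 0])

mammo = np.random.rand(256, 256)
print(locate_mcs(mammo, t0=0.1, t1=0.8, t2=0.3).sum())
```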

Fig. 3. Original MCs (a), located MCs (b) and segmented MCs (c)

Fig. 4. Different appearances of the masses


To make the detection more accurate and adaptive, an adaptive-network-based fuzzy inference system (ANFIS) is used to adjust the detection parameters ($T_0$, $T_1$, $T_2$, $T_3$, $T_4$) automatically according to the background features. ANFIS is an artificial neural network (ANN) technique based on the Sugeno fuzzy model; it has high approximation precision and good generalization ability and can avoid local minima [4], so it can be applied to auto-control the MC detection process. Through experiments, the optimal values of the parameters in different backgrounds were measured, with three features of the neighborhood (mean intensity, contrast and fractal dimension) extracted simultaneously. Using ANFIS, the relation between these optimal values and the background features was learned. When a new mammogram is processed, its background features in each region are extracted first, and the appropriate parameter values are then determined by ANFIS accordingly.

4 Mass Detection

The masses are the most important focuses of breast cancer. In mammograms, a mass usually appears as a high-intensity lump with a certain area inside the breast tissue: some masses appear as solid blocks, some as roundish pies, and some as flocky starfish shapes. In some cases there are MCs within the mass region. Thus, two models are proposed to represent all kinds of masses [5]. Model A represents the masses in denser tissue (Fig. 4(a), 4(b)). In this model there is a solid central part in the mass region, whose pixels have nearly the same gray-level intensity; the other pixels of the mass region have different intensities, and the closer to the center, the higher the intensity. The intensity on the edge of the mass is close to the background. Model B represents the masses in fatty tissue (Fig. 4(c), 4(d)). In this model the mass appears distinct and is easy to segment, but there is no obvious solid part in the region; the variance of intensity of the pixels inside the mass region is much lower than that on the edge. Whatever a mass appears as, it can be represented by one of these two models. The suspicious regions were extracted first, by peeling off the fatty tissue around the denser tissue and the masses. Iterative thresholding is applied to fulfill this task, because the suspicious regions have high intensity and contrast. In this way, not only the suspicious regions but also the masses matching Model B, which appear as isolated lumps, were extracted from the breast. To locate the masses matching Model A, which are buried deep in the denser tissue, the DWT was used to decompose the suspicious regions with high intensity. If a mass with a solid central part lies in these regions, the modulus of the HF information at the corresponding position must be very low. Hence, in the 2nd and 3rd levels of the wavelet domain, the black-hole positions, where the modulus of the HF signals in a neighborhood is close to zero, were registered into a map, which usually denotes the solid central parts of masses. Then a region registration process was carried out on the position map, to remove minor structures and label the black-hole regions where masses probably lie.

Fig. 5. Iterative thresholding result

Fig. 6. Segmentation results of the masses

Afterwards, filling dilation was applied to extract the masses matching Model A, whose central parts had been located above. For the sake of extraction precision, the Canny edge detector, which is based on the local edge normal directions and the zero-crossings of the 2nd derivatives, was used to restrict the segmentation process. In this way, the gradients of the boundaries inside the breast region could be extracted and regarded as one of the segmentation restrictions: the gradient extracted with the Canny edge detector acted like a barrier, preventing the dilated mass region from getting across. During the segmentation process, besides the detection criteria used for the MCs, another criterion can be described as $I_{\mathrm{grad}}(x, y) \leq T_{\mathrm{grad}}$, where $I_{\mathrm{grad}}(x, y)$ is the modulus of the gradient and $T_{\mathrm{grad}}$ is a threshold. Simultaneously, ANFIS was utilized to adjust the detection parameters ($T_3$, $T_4$, $T_{\mathrm{grad}}$) adaptively according to the background features (mean intensity, contrast and fractal dimension), just like the auto-control introduced in Section 3. Thus, the regions of the masses matching Model A could be segmented accurately (Fig. 6).
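The gradient-restricted filling dilation can be sketched as below: the seed region grows by one cross-shaped dilation per iteration, and a candidate pixel is accepted only if it passes the intensity tests and does not sit on a strong gradient. The stopping rule, the use of the whole-region mean for f̄, and the helper names are our own simplifications.

```python
import numpy as np
from scipy import ndimage

CROSS = ndimage.generate_binary_structure(2, 1)    # cross-shaped structure element B

def filling_dilation(image, seed, grad, t3, t4, t_grad, max_iter=200):
    """Grow `seed` (R0); accept new pixels meeting the intensity and gradient criteria."""
    f_ker = image[seed].mean()                     # mean intensity of R0
    region = seed.copy()
    for _ in range(max_iter):
        ring = ndimage.binary_dilation(region, CROSS) & ~region
        f_bar = image[region].mean()               # mean intensity of accepted points
        ok = (ring
              & (np.abs(image - f_ker) <= t3)      # |f - f_ker| <= T3
              & (np.abs(image - f_bar) <= t4)      # |f - f_bar| <= T4
              & (grad <= t_grad))                  # Canny gradient barrier
        if not ok.any():
            break
        region |= ok
    return region

img = np.random.rand(64, 64)
seed = np.zeros_like(img, dtype=bool); seed[30:34, 30:34] = True
grad = np.abs(np.gradient(img)[0])
print(filling_dilation(img, seed, grad, t3=0.3, t4=0.2, t_grad=0.5).sum())
```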

5 Classification and Experiments

With the algorithms in Sections 3 and 4, the MCs and masses in the mammograms were located and segmented, together with a number of FPs. Finally, an MLP (multi-layer perceptron) was used for classification, reducing the FPs while retaining the focuses. The MLP is a conventional ANN technique with high approximation precision and good generalization ability. Compared with locally approximating networks, the MLP requires fewer training samples for the same precision and can deal with high-dimensionality problems, so it is well suited to the medical-image processing field. In this experiment, ten features were selected to represent the MCs: area, mean intensity, contrast, coherence, compactness, ratio of pits, number of hollows, elongatedness, fractal dimension, and clustering number. Here, coherence is defined as the MSD of the region, compactness is the roundness, the ratio of pits is the ratio of the number of pits on the boundary to the circumference, elongatedness is the ratio of length to width, and the clustering number is the number of MCs around the current one. Another ten features were used to represent the masses:


area, mean intensity, contrast, coherence, compactness, elongatedness, fractal dimension, edge contrast, boundary gradient intensity, and boundary direction entropy. Here, edge contrast is the MSD near the edge, boundary gradient intensity is the mean modulus of the gradients on the boundary, and boundary direction entropy is the entropy of the gradient-direction distribution histogram on the boundary. 60 MLO mammograms were used to test the pectoral muscle segmentation method of Section 2: of the 52 samples in which the pectoral muscle exists, 49 were detected; of the 8 samples without a pectoral muscle, 6 were identified as non-pectoral-muscle mammograms, while 2 were mistaken. 60 mammograms were used to test the MC detection method of Section 3: of the 163 true MCs, 162 were detected, while 511 FPs were extracted at the same time. The true MC regions were segmented manually by the radiologists and the result was regarded as the criterion. The extraction quality of the MCs could thus be evaluated by computing the ratio of the common area (the overlap of the auto-extracted region and the criterion region) to the criterion area; the mean value was 94.7%. 60 mammograms were used to test the mass detection algorithm of Section 4: of the 78 true masses, 75 were detected, while 449 FPs were extracted simultaneously; the mean extraction quality of the masses was 94.2%. The MLP classifier introduced above was finally defined as: 3 layers, 10 input nodes, 20 hidden nodes, and 1 output node. The segmented MCs and masses were input to the classifier, with the result that 158 true MCs were identified, with 12 FPs, and 73 true masses were identified, with 38 FPs. Combining the segmentation and classification results, the true positive rates for the MCs and the masses were 96.9% (158/163) and 93.6% (73/78) respectively, with only 0.2 and 0.63 FPs per image. The performance of this system was much better than that of conventional methods. In this system a series of new, effective techniques were utilized, with emphasis on their adaptability and robustness. Modeling was applied to represent the MCs and the masses, so that appropriate methods could be carried out adaptively on problems with different features, and ANFIS was used for auto-adjustment of the detection. Even when focuses with special features and backgrounds are encountered, this system can still obtain a satisfying result.
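As an illustration of the stated 10–20–1 topology, a minimal classifier setup is sketched below, using scikit-learn as a stand-in for the authors' MLP; the training data here are placeholders, not the paper's feature vectors.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# 10 input features per candidate region, 20 hidden units, 1 (binary) output.
clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)

X = np.random.rand(200, 10)          # placeholder feature vectors
y = np.random.randint(0, 2, 200)     # 1 = true focus, 0 = false positive
clf.fit(X, y)
print(clf.predict(X[:5]))
```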

References

1. Xia, S.R., Lv, W.X.: Advances in the Research of Computer-aided Diagnosis on Mammograms. Foreign Medical Science: Biomedical Engineering, Vol. 23 (2000) 24–28
2. Thangavel, K., Karnan, M., Sivakumar, R., Mohideen, A.K.: Automatic Detection of Microcalcification in Mammograms - a Review. ICGST International Journal on Graphics, Vision and Image Processing, Vol. 5 (2005) 31–61
3. Xu, W.D., Wang, X.Y., Xia, S.R., Yan, Y.: Study on Model-based Pectoral-Muscle Segment Algorithm in Mammograms. J. of Zhejiang Univ. (Eng. Sci.), Vol. 39 (2005) 437–432
4. Xu, W.D., Xia, S.R., Xie, H.: Application of CMAC-based Networks on Medical Image Classification. Lecture Notes in Computer Science, Vol. 3173 (2004) 953–958
5. Xu, W.D., Xia, S.R., Duan, H.L., Xiao, M.: Segmentation of Masses in Mammograms Using a Novel Intelligent Algorithm. International Journal of Pattern Recognition and Artificial Intelligence, Vol. 20 (2006) 255–270

A Partial Curve Matching Method for Automatic Reassembly of 2D Fragments Liangjia Zhu1 , Zongtan Zhou1 , Jingwei Zhang2 , and Dewen Hu1 1 Department of Automatic Control, College of Mechatronics and Automation, National University of Defense Technology, Changsha, Hunan, 410073, P.R. China [emailprotected] 2 Hunan Supreme People’s Court, Changsha, Hunan, 410001, P.R. China

Abstract. An important step in automatic reassembly of 2D fragments is to find candidate matching pairs for adjacent fragments. In this paper, we propose a new partial curve matching method to find the candidate matches. In this method, the fragment contours are represented by their turning functions. The matching segments between two fragment contours are found by analyzing the difference curve between two turning functions directly. The performance of our method is illustrated with randomly shredded document fragments.

1 Introduction

Automatic reassembly of 2D fragments to reconstruct original objects is an interesting problem with applications in forensics [1], archaeology [2,3], and other disciplines. The fragments are often represented by their boundary curves, and candidate matches between different fragments are usually found by curve matching. Since a match between two fragments usually occurs over only a fraction of their boundaries, partial curve matching is needed. The 2D fragment reassembly problem is similar to the automatic reassembly of jigsaw puzzles, which has been widely studied [4,5]. However, solutions exploiting specific features or a priori knowledge, e.g. that puzzle pieces have smooth edges and well-defined corners, are impractical in many real applications. More generally, the fragment reassembly problem can be considered a special case of the partial curve matching problem. Researchers have proposed many solutions to this problem for different applications. These solutions can be roughly divided into two kinds, according to whether the fragment contour is sampled uniformly or not. One kind is string-matching based methods, which represent fragment contours with uniformly sampled points. In [2], curvature-encoded fragment contours are compared, at progressively increasing scales of resolution, using an incremental dynamic programming sequence-matching algorithm. Wolfson [6] proposed an algorithm that converts the curves into shape signature strings and applies string matching techniques to find the longest matching substrings; this is also a curvature-like algorithm. However, the calculation of numerical curvature is not as trivial a task as expected when noise exists [7]. The other kind is


feature-based matching methods. In [3], fragment contours are re-sampled using polygonal approximation and the potential matching pairs are found by optimizing an elastic energy. However, a difference in the relative sampling rate of aligned contour segments can affect the optimal correspondence and the match cost [8]. In this paper, we propose a partial curve matching method to find candidate matching fragment pairs. The fragment contours are represented by their turning functions, and the matching segments are found by analyzing the difference curve between two turning functions directly. The curve similarity is evaluated as the residual distance of corresponding points after the optimal transformation between two matching segments. This paper is organized as follows: Section 2 presents our partial curve matching method, Section 3 presents some experimental results, and Section 4 draws our conclusions.

2 Partial Curve Matching Based on Turning Functions

We assume that the fragment contours have been extracted successfully from the scanned fragment image. The method of comparing two fragment contours can be formulated as follows.

2.1 Contour Representation

We first build the turning function θ(s) for each fragment contour, as in [6]. Then all $\theta_i(s)$, $i = 1 : N$, are sampled with the same spacing δ and stored as character strings $C_i$, $i = 1 : N$, in clockwise order. Note that the common segments of two matched fragments traverse in opposite directions. A sketch of this construction is given below.
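A turning function maps arc length to cumulative tangent direction. The sketch below computes a sampled turning function for a closed contour given as a point sequence; the uniform re-sampling by spacing δ is our own simplification of the paper's construction.

```python
import numpy as np

def turning_function(points, delta):
    """Sampled turning function theta(s): cumulative tangent direction vs arc length."""
    pts = np.asarray(points, dtype=float)
    edges = np.roll(pts, -1, axis=0) - pts             # closed contour: p_{i+1} - p_i
    angles = np.arctan2(edges[:, 1], edges[:, 0])
    turns = np.diff(angles, prepend=angles[0])
    turns = (turns + np.pi) % (2 * np.pi) - np.pi      # wrap each turning angle
    theta = angles[0] + np.cumsum(turns) - turns[0]    # unwrapped direction per edge
    lens = np.linalg.norm(edges, axis=1)
    s = np.concatenate([[0.0], np.cumsum(lens)])       # arc length at each vertex
    grid = np.arange(0.0, s[-1], delta)
    idx = np.searchsorted(s, grid, side="right") - 1   # theta is constant on each edge
    return theta[idx]

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(turning_function(square, delta=1.0))
```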

2.2 Histogram Analysis on Δθ

Suppose the two fragment contours to be compared are $C_A = (a_1, a_2, \cdots, a_m)$ and $C_B = (b_1, b_2, \cdots, b_n)$ with $m \leq n$. At a given moment, $C_A$ has been shifted by $d$ positions ($d$ an integer) to become $C_A^d = (a_{1+d}, a_{2+d}, \cdots, a_{m+d}) = (a_1^d, a_2^d, \cdots, a_m^d)$, and the corresponding turning function becomes $\theta_A^d = \theta_A(s_i + d\delta)$, $i = 1 : m$. The difference between $\theta_A^d$ and $\theta_B$ is defined as

$$\Delta\theta_{AB}^d = \theta_B - \theta_A^d = (b_1 - a_1^d, b_2 - a_2^d, \cdots, b_m - a_m^d) \qquad (1)$$

At this moment, if there exist two sufficiently similar segments on $C_A^d$ and $C_B$, the corresponding part of $\Delta\theta_{AB}^d$ will be almost constant. Draw the histogram of $\Delta\theta_{AB}^d$ to count the number of points lying in each sampling interval $[i\lambda, (i+1)\lambda]$, $i = 0 : t_n$; there must then be a peak on the histogram corresponding to the matching segments. Here $t_n$ is the number of sampling intervals, determined by

$$t_n = \frac{\Delta\theta_{AB}^d(m) - \Delta\theta_{AB}^d(1)}{\lambda} \qquad (2)$$


Denote the indices of the start and end points of each segment by start and end respectively. We only check the peaks with height $H > H_{\max}/2$ and $end - start > m/t_l$ for candidate pairs of start and end points, where $H_{\max}$ is the maximum of the histogram and $t_l$ is an integer. $m/t_l$ is the parameter controlling the minimum length of the permitted matching segments between contours A and B. An example of the relation between Δθ(s) and the histogram is given in Figure 1. The dash-dot lines mark the mean value of the selected segment.

Fig. 1. The relation between Δθ(s) and the histogram

This is just a primary selection for finding the correct pairs of start and end points. The candidate match pairs are selected according to the following decision rule.

Decision rule: For a segment $(\Delta\theta_{start}, \cdots, \Delta\theta_{end})$ on $\Delta\theta_{AB}^d$, compute the standard deviation $std$, average deviation $avd$ and angle change number $acn$ as

$$std = \sqrt{\frac{\sum_{i=start}^{end} (\Delta\theta_i - mean)^2}{end - start}} \qquad (3)$$

$$avd = \frac{\sum_{i=start}^{end} \sin(\Delta\theta_i - mean)}{end - start} \qquad (4)$$

$$acn = \sum_{i=start}^{end-1} acn_i \qquad (5)$$

where

$$mean = \frac{\sum_{i=start}^{end} \Delta\theta_i}{end - start}, \qquad acn_i = \begin{cases} 1, & \text{if } |\Delta\theta_{i+1} - \Delta\theta_i| > t_0 \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$

If (1) $std < t_1$; and (2) $avd < t_2$; and (3) $acn > t_3$, then the corresponding segments are selected as candidate matches.
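A sketch of the shift-and-histogram search combined with the decision-rule statistics of Eqs. (3)–(6) follows; the thresholds are illustrative, and the peak bookkeeping is simplified to one segment per shift (the paper tracks contiguous start/end index pairs).

```python
import numpy as np

def segment_stats(dtheta, t0=0.05):
    """std, avd and acn of Eqs. (3)-(6) for one candidate segment of Delta theta."""
    mean = dtheta.mean()
    std = np.sqrt(np.mean((dtheta - mean) ** 2))
    avd = np.mean(np.sin(dtheta - mean))
    acn = int(np.sum(np.abs(np.diff(dtheta)) > t0))
    return std, avd, acn

def candidate_shifts(theta_a, theta_b, lam=0.2, t_l=15, t1=0.3, t2=0.1, t3=3):
    """Scan all shifts d of the shorter contour; keep shifts whose histogram peak passes."""
    m, hits = len(theta_a), []
    for d in range(len(theta_b)):
        dtheta = theta_b[:m] - np.roll(theta_a, -d)        # Delta theta_AB^d (Eq. 1)
        bins = np.arange(dtheta.min(), dtheta.max() + lam, lam)
        if len(bins) < 2:
            continue
        hist, _ = np.histogram(dtheta, bins=bins)
        p = int(np.argmax(hist))                           # most populated interval
        seg = dtheta[(dtheta >= bins[p]) & (dtheta < bins[p + 1])]
        if len(seg) > m // t_l:                            # minimum-length check (m / t_l)
            std, avd, acn = segment_stats(seg)
            if std < t1 and abs(avd) < t2 and acn > t3:
                hits.append((d, len(seg)))
    return hits
```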


Condition (1) reflects the fact that if two segments are sufficiently similar, the overall angle-turning tendency will be almost the same; condition (2) means that the difference curve of two well-matched segments should be distributed nearly uniformly around its mean value; and condition (3) is used to avoid matching an almost straight segment with another segment. Other constraints can also be added to these conditions. One or more segments may be found each time the shorter contour is shifted one step further. For comparing any two different fragment contours, we have to shift the shorter contour $C_A$ $n$ times, where $n$ is the number of samples on contour $C_B$; computing $\Delta\theta_{AB}^d$ for each shift $d$ takes $m$ comparisons, where $m$ is the number of samples on contour $C_A$. Hence, the complexity of the histogram analysis is O(mn).

2.3 Recovering the Transformation and Similarity

Given a pair of start points and end points, we compute the appropriate matching contour segments in the $(x, y)$ plane. Denote these contour segments by $X$ and $Y$; then the optimal transformation $E_{opt}$ between these two segments will minimize the $l_2$ distance between $EX$ and $Y$:

$$|E_{opt} X - Y|^2 = \min_E |EX - Y|^2 \qquad (7)$$

As in [9], transform $X$ with $E_{opt}$ in the $(x, y)$ plane to get the transformed segment $X'$. Then $X'$ and $Y$ are evenly sampled and represented by two point sequences $\{u_i\}$ and $\{v_j\}$. The curve similarity is evaluated by

$$S = \frac{\sum_{i=1}^{m} d(u_i, Y) + \sum_{j=1}^{n} d(v_j, X')}{(\min(l_1, l_2))^2}, \qquad d(u_i, Y) = \min_{\forall v_j \in Y} |u_i - v_j| \qquad (8)$$

Here, $m$ and $n$ are the numbers of points in $X'$ and $Y$, and $l_1$ and $l_2$ are the lengths of the two segments respectively.
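A sketch of this step: the optimal transformation of Eq. (7) is recovered with the standard SVD (Kabsch) procedure, and the similarity of Eq. (8) is the symmetric sum of nearest-point distances normalized by the squared shorter length. Restricting the transformation class to rigid motions (rotation plus translation) is our assumption.

```python
import numpy as np

def optimal_rigid_transform(X, Y):
    """Least-squares rotation R and translation t with R @ x + t ~ y (Kabsch/SVD)."""
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    U, _, Vt = np.linalg.svd((X - cx).T @ (Y - cy))
    D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # keep a proper rotation
    R = Vt.T @ D @ U.T
    return R, cy - R @ cx

def similarity(X, Y):
    """Eq. (8): residual nearest-point distances after the optimal transformation."""
    R, t = optimal_rigid_transform(X, Y)
    Xp = X @ R.T + t
    d = lambda P, Q: sum(np.linalg.norm(Q - p, axis=1).min() for p in P)
    l1 = np.linalg.norm(np.diff(Xp, axis=0), axis=1).sum()
    l2 = np.linalg.norm(np.diff(Y, axis=0), axis=1).sum()
    return (d(Xp, Y) + d(Y, Xp)) / min(l1, l2) ** 2

seg = np.cumsum(np.random.rand(30, 2), axis=0)
th = 0.7
R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
print(similarity(seg, seg @ R.T + np.array([3.0, -1.0])))   # ~0 for a true match
```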

3 Experimental Results

We used randomly shredded document fragments to test the algorithm. The algorithm was implemented on a Windows platform, and the programming language was C#. An AGFA e50 scanner was used as the image acquisition device. The fragments were digitized at 150 dpi. Figure 2(a) shows the image of the scanned fragments; its size is 730 × 953. The scanned image was thresholded in RGB space to obtain a binary image, and the contour of each fragment was extracted from this binary image. Figure 2(b) shows the extracted contours. In the test, the number of fragments was N = 16. The parameters were set as δ = 3.57, λ = 0.2, $t_l$ = 15, $t_0$ = 0.05, $t_1$ = 0.3, $t_2$ = 0.1, $t_3$ = 3 and $t_s$ = 1. In comparing any two different fragment contours, we may get several possible matches with curve similarity smaller than $t_s$; in this case, we only select the

Fig. 2. (a) The image of scanned fragments, (b) extracted contours

Fig. 3. The first 24 candidates returned by our partial curve matching method. The similarity S of each candidate match is shown at the bottom left of each grid. The true matches are marked with a star (*).

Table 1. Comparison between our method and Stolfi's method [2]

Method    Object     Resolution   T    R    Recognition Rate
Ours      Document   150 dpi      24   16   66.7%
Stolfi's  Ceramic    300 dpi      73   46   63.0%

most similar one as the candidate match. In this test, there were 24 true matches in the original document; let T denote this set and R denote the recognized true matches from T. The algorithm started with 128 initial possible matches and returned 30 matches with S < 1, of which 16 were true. Figure 3 shows the first 24 candidate matches, in order of increasing S. Note that candidates 1-10, 12-13, 15, 17, 18 and 20 are all correct. Table 1 shows the comparison results between our method and Stolfi's method. It is hard to make a strict comparison between the performance of these two


methods because the test fragments are different. However, one thing to note is that our method depends much less on the scan resolution.

4 Conclusions and Future Work

A turning-function-based partial curve matching method has been proposed to find candidate matches for the automatic reassembly of 2D fragments. The accuracy of the method was verified by our experiment. Finding the candidate matches is only the first step in reassembling the original objects. We are now working on the global reconstruction problem, to eliminate the ambiguities resulting from the partial curve matching. Our recent results will be reported in the near future.

Acknowledgement This work is supported by the Distinguished Young Scholars Fund of China (60225015), National Science Foundation (60575044), Ministry of Education of China (TRAPOYT Project), and Specialized Research Fund for the Doctoral Program of Higher Education of China (20049998012).

References

1. De Smet, P., De Bock, J., Corluy, E.: Computer Vision Techniques for Semi-automatic Reconstruction of Ripped-up Documents. Proceedings of SPIE. 5108 (2003) 189–197
2. Leitão, H.C.G., Stolfi, J.: A Multiscale Method for the Reassembly of Two-dimensional Fragmented Objects. IEEE Transactions on Pattern Analysis and Machine Intelligence. 24 (2002) 1239–1251
3. Kong, W., Kimia, B.B.: On Solving 2D and 3D Puzzles Using Curve Matching. Proceedings of Computer Vision and Pattern Recognition. 2 (2001) 583–590
4. Burdea, C., Wolfson, H.J.: Solving Jigsaw Puzzles by a Robot. IEEE Transactions on Robotics and Automation. 5 (1989) 752–764
5. Yao, F.H., Shao, G.F.: A Shape and Image Merging Technique to Solve Jigsaw Puzzles. Pattern Recognition Letters. 24 (2003) 1819–1835
6. Wolfson, H.J.: On Curve Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence. 12 (1990) 483–489
7. Calabi, E., Olver, P., Shakiban, C., Tannenbaum, A., Haker, S.: Differential and Numerically Invariant Signature Curves Applied to Object Recognition. International Journal of Computer Vision. 26 (1998) 107–135
8. Sebastian, T.B., Klein, P.N., Kimia, B.B.: On Aligning Curves. IEEE Transactions on Pattern Analysis and Machine Intelligence. 25 (2003) 116–125
9. Pajdla, T., van Gool, L.: Matching of 3-D Curves Using Semi-differential Invariants. Proceedings of the International Conference on Computer Vision. (1995) 390–395

A Split/Merge Method with Ranking Selection for Polygonal Approximation of Digital Curve Chaojian Shi1,2 and Bin Wang2, 1

2

Merchant Marine College, Shanghai Maritime University, Shanghai, 200135, P. R. China Department of Computer Science and Engineering, Fudan University, Shanghai, 200433, P. R. China [emailprotected], [emailprotected]

Abstract. Polygonal approximation of digital curves is an important problem in image processing and pattern recognition. The traditional split-and-merge method (SM) suffers from dependence on the given initial solution. To solve this problem, a novel split-and-merge method (RSM), which applies the ranking selection scheme of genetic algorithms to the split and merge process, is proposed. Experiments using two benchmark curves to test RSM are conducted and show its good performance.

1 Introduction

Polygonal approximation of digital curves is a hot topic in pattern recognition and image processing and has found wide practical application in vectorization, map services, CAD and GIS. The polygonal approximation problem can be stated as follows: given a digital curve with N points, approximate it by a polygon with a given total number of segments M so that the total approximation error is minimized. The polygonal approximation problem is NP-hard, and the size of the search space is C(N, M) [1]. In the past decades, many approaches have been proposed to solve the polygonal approximation problem. Some are based on local search strategies such as sequential tracing [2], the split-and-merge method [3] and dominant point detection [4]. Others are based on global search techniques such as genetic algorithms [5,1] and ant colony methods [6]. The local-search-based methods work very fast; however, as their results depend on the selection of the starting point or the given arbitrary initial solution, they usually lack optimality. The approaches based on genetic algorithms, tabu search and ant colony methods can obtain better results but require more computation time, so they are hardly fit for real applications. In this paper, we propose a novel split-and-merge method (RSM). Different from SM, RSM applies the ranking selection scheme of genetic algorithms to the split and merge process and effectively solves the problem of the final solution's

Corresponding author.



dependence on the initial solution. Experiments using two benchmarks to test RSM are conducted and show good performance.

2 Problem Statement

A closed digital curve C can be represented by a clockwise-ordered sequence of points $C = \{p_1, p_2, \ldots, p_N\}$, where N is the number of points on the curve and $p_{i+N} = p_i$. We define the arc $\widehat{p_i p_j}$ as the consecutive points $p_i, p_{i+1}, \ldots, p_j$, and the chord $\overline{p_i p_j}$ as the line segment connecting points $p_i$ and $p_j$. The approximation error between $\widehat{p_i p_j}$ and $\overline{p_i p_j}$ is defined as

$$e(\widehat{p_i p_j}, \overline{p_i p_j}) = \sum_{p_k \in \widehat{p_i p_j}} d^2(p_k, \overline{p_i p_j}) \qquad (1)$$

2

Problem Statement

A closed digital curve C can be represented by a clockwise ordered sequence of points C = {p1 , p2 , . . . , pN }, where N is the number of points of on the curve and pi+N = pi . We define arc p i pj as the consecutive points pi , pi+1 , . . . , pj , and chord pi pj as the line segment connecting points pi and pj . The approximation error between p i pj and pi pj is defined as e(p d2 (pk , pi pj ) (1) i pj , pi pj ) = pk ∈pi pj

where d(pk , pi pj ) is the perpendicular distance from point pk to the line segment pi pj . The polygon V approximating the digital curve C is defined as a set of ordered line segments V = {pt1 pt2 , pt2 pt3 , . . . , ptM −1 ptM , ptM pt1 }, such that t1 < t2 < . . . < tM and {pt1 , pt2 , . . . , ptM } ⊆ {p1 , p2 , . . . , pN }, where M is the number of vertices of the polygon V . The approximation error between the curve C and its approximating polygon V is defined as follows: E(V, C) =

M

e(pti pti+1 , pti pti+1 )

(2)

i=1

Then the polygonal approximation problem is formulated as follows: Given a closed digital curve C = {p1 , p2 , . . . , pN } and an integer number 3 ≤ M ≤ N . Let SP be the set of all the polygons which approximate the curve C. Let SSP = {V | V ∈ SP ∧ |V | = M }, where |V | denotes the cardinality of V . Find a polygon P ∈ SSP such that E(P, C) = min E(V, C) V ∈SSP

3

(3)

The Traditional Split-and-Merge Method

The traditional split-and-merge method (SM) is a recursive method starting with a initial polygon V = {pt1 pt2 , pt2 pt3 , . . . , ptM −1 ptM , ptM pt1 }, which approximates the curve. At each iteration, firstly, a split process is performed. Among all the curve’s points, select the point pk with the farthest distance from its corresponding edge pti pti+1 , and then remove the edge pti pti+1 and add two new edges pk pti and pk pti+1 to the polygon. We consider the process as splitting the edge pti ptj at point pk and term the point pk splitting point. Secondly, the merge process is performed. Among all the vertices of the polygon, select the vertex ptj which has the minimum distance from the line segment connecting two adjacent vertices ptj−1 and ptj+1 , and then remove the edges ptj−1 ptj and ptj ptj+1 and add edge

A Split/Merge Method with Ranking Selection

653

Fig. 1. Split-and-merge process

ptj−1 ptj+1 to the polygon. We consider the process as merging the edges ptj−1 ptj and ptj ptj+1 at vertex ptj and term the vertex ptj merging point. Fig. 1 give an example to illustrate the split and merge processes. Repeat the above processes until the number of iteration is equal to a pre-specified number. The disadvantage of this method is that, if a bad initial polygon is given, the obtained final solution may be far away from the optimal one. Therefore, SM is not stable and depends on the given initial solution.

4

The Proposed Method

In this section, a novel split-and-merge method (RSM), which applies ranking selection scheme of genetic algorithms to the split-and-merge process, is proposed. 4.1

Splitting Strength and Merging Strength

Let C = {p1 , p2 , . . . , pN } be a digital curve and V = {pt1 pt2 . . . , ptM −1 ptM , ptM pt1 } be its approximating polygon. In the following, we give the definitions of the splitting strength at the point of the curve C and merging strength at the vertex of the polygon V . ptk+1 and ptk ptk+1 ∈ V , the splitting strength at Definition 1. Suppose pi ∈ ptk the point pi is defined as S(pi ) = d(pi , ptk ptk+1 )/(1 + d(pi , ptk ptk+1 )).

(4)

Definition 2. Assume that ptk be a vertex of the polygon V , ptk−1 and ptk+1 be its two adjacent vertices. The merging strength of the vertex ptk is defined as M (ptk ) = 1/(1 + d(ptk , ptk−1 ptk+1 )).

(5)

654

4.2

C. Shi and B. Wang

Ranking Selection Strategy

Selection is an important phase of genetic algorithms (GA). A wide variety of selection strategies have been proposed. Most of them are based on fitnessproportionate selection and may lead to premature convergence. To avoid premature convergence, Baker proposed a ranking selection scheme in [9]. The idea of this strategy is that: at each generation, all the individuals in the population are sorted according to their fitness value, and each individual is assigned a rank in the sorted population. For N individual in the population, the best individual gets rank 1, whereas the worst receives rank N. The selection probabilities of the individuals are given by some function of their rank. Let P = {x1 , x2 , . . . , xN } denote the sorted population and f (x1 ) ≥ f (x2 ) ≥ . . . ≥ f (xN ), where f (·) is the fitness function of the individual. Then the selection probability p(xi ) must satisfies the following conditions: (1) p(x1 ) ≥ p(x2 ) . . . ≥ p(xN ) and (2) N p(xi ) = 1. i=1

Inspired by the above selection strategy, we apply it to the traditional splitand-merge method for the selection of splitting and merging points. A function for calculating the selection probabilities is developed here. Assume that C = {x1 , x2 , . . . , xM } be an ordered set of points. Here, we let the ordered set C corresponds to a sorted population and each point of C corresponds to an individual. Then we can use the above ranking selection strategy to perform the selection of points in C. For each point xi , we assign a selection probability p(xi ) to it and calcaulate the p(xi ) via the following equations: ⎧ ⎨ p(xi ) = p(xi−1 ) · e−t/(i−1) , i = 2, . . . , M M ⎩ p(xi ) = 1

(6)

i=1

where t is a parameter which is used to adjust the probability distribution. In general, we empirically set the parameter t in [1.4, 2.4]. 4.3

Algorithm Flow

The proposed algorithm has two parameters, one is the parameter t for adjusting the probability distribution, the other is the number G of iterations. input. The digital curve C and the number of polygon’s sides M . output. The polygon B with M edges which approximates C. step 1. Generate an initial polygon V with M edges by randomly selecting M points from C as the vertices of the polygon. Set B = V and k = 0 . step 2. For those points of C which are not the vertices of the polygon, calculate their splitting strength using Eq. 4. step 3. Sort these points by their splitting strength value in descending order and select a point by the ranking selection strategy. Then, perform splitting process at the selected point.

A Split/Merge Method with Ranking Selection

655

step 4. For each vertex of V , calculate its merging strength value using Eq. 5. step 5. Sort these vertices by their merging strength in descending order and select a vertex using the ranking selection strategy. Then, perform merging process at the selected vertex. step 6. Compute the approximation error of the polygon V using Eq. 2. If it is smaller than the approximation error of polygon B, then replace B with V . step 7. Set k + 1 to k, if k T , where T is a given threshold. Although the eigenbackground model exploits the correlation of pixels and offer less computational load compared to pixel-based methods, it fails to deal with the dynamic background because the eigenspace is learned from the training set off-line and do not update during the detection procedure.

672

3

L. Wang et al.

Adaptive Background Modeling

In order to model dynamic background, we propose an incremental method that updates the eigenspace of the background model using a variant sequential Karhunen-Loeve algorithm which in turns is based on the classis R-SVD method. In addition, linear prediction model is employed to make the detection more robust. 3.1

Incremental Update of Eigenspace

The SVD of d × n matrix X = U SV T . The R-SVD algorithm provides an efficient way to carry out the SVD of a larger matrix X ∗ = [X, E], where E = [In+1 , In+2 , · · · , In+k ] is a d × k matrix containing k incoming observations as follows [5]: 1. Use an orthonormalization process (e.g., Gram-Schmidt algorithm) on [U, E] to ˜ obtain an orthonormal matrix " U #= [U, E].

V 0 be a (n+k)×(n+k) where Ik is a k dimensional 0 Ik identity matrix. It follows then, " T# " # " T # " # U S UT E U XV U T E V 0 S = U T X ∗ V = ˜ T [X, E] = ˜T = . (1) 0 Ik E E XV E˜ T E 0 E˜ T E

2. Let the matrix V =

˜ S˜V˜ T and the SVD of X ∗ is 3. Compute the SVD of S = U ˜ )S( ˜ V˜ T V T ). ˜ S˜V˜ T )V T = (U U X ∗ = U (U

(2)

˜ is an d × (n + k) columnwhere S˜ is a diagonal (n + k) × (n + k) matrix, U U ˜ orthonormal matrix and V V is an (n + k) × (n + k) column-orthonormal matrix. Based on the R-SVD method, the sequential Karhunen-Loeve algorithm is able to perform the SVD computation of larger matrix X ∗ efficiently using the smaller matrices U , V and the SVD of smaller matrix S . Note that this algorithm enables us to store the background model for a number of previous frames and perform a batch update instead of updating the background model every frame. 3.2

Detection

We use linear prediction [6,7] to detect foreground. This method employs a Wiener filter to estimate pixel intensity value of each pixel using latest P frames. Let I (t − 1), I (t − 2), · · · , I (t − P ) present the projections of latest P frames onto the eigenspace, i.e. I (t − i) = ΦTM (I(t − i) − μb ), i = 1, 2, · · · , P . The projection of current frame onto the eigenspace can be predicted as:

Ipred (t) =

P i=1

ai I (t − i).

(3)

Adaptive Eigenbackground for Dynamic Background Modeling

673

the current frame can be computed as:

Ipred (t) = ΦM Ipred (t) + μb .

(4)

differences between the predicted frame and the current frame are computed and thresholded, the foreground points are detected at the locations: |I(t) − Ipred (t)| > T , where T is a given threshold. 3.3

The Proposed Method

Put the initialization, detection and eigenspace update modules together, we obtain the adaptive background modeling algorithm as follows: 1. Construct an initial eigenspace: From a set of N training images of background {Ii }t=1···N , the average image μb is computed and mean-subtracted images X are obtained, then the SVD of X is performed and the best M eigenvectors are stored in an eigenvector matrix ΦM . 2. Detection: For an incoming image I, the predicted projection Ipred is first computed then it is reconstructed as Ipred , foreground points are detected at locations where |I − Ipred | > T . 3. Update the eigenspace: Store the background model for a number of previous frames and perform a batch update of the eigenspace using sequential Karhunen-Loeve algorithm. 4. Go to step 2.

(a)Input images

(b)Detection Results Fig. 1. Detection results of the first image sequence

674

4

L. Wang et al.

Experiments

In order to confirm the effectiveness of the proposed method, we conduct experiments using three different image sequences. The first is the scene of the ocean front which involves waving water surface. The second is the scene of the fountain which involves long term changes due to fountaining water and illumination

(a)Input images

(b)Detection Results Fig. 2. Detection results of the second image sequence

(a)Input images

(b)Detection Results Fig. 3. Detection results of the third image sequence

Adaptive Eigenbackground for Dynamic Background Modeling

675

changes. The third is the scene of a lobby where the lights switch. In order to reduce complexity, the images are divided into equal size blocks and each block is updated and detected individually in our experiments. Experimental results are shown in Fig. 1, Fig. 2 and Fig. 3. We can see from the results that the proposed method is able to give good performance when the appearance of the background changes dramatically. Our current implementation of the proposed method in MATLAB runs about six frames per seconds on a Pentium IV 2.4GHz processor and can certainly be improved to operate in real time.

5

Conclusion

In this paper, we extend the eigenbackground by proposing an effective and adaptive background modeling approach that 1)updates the eigenspace on-line using the sequential Karhunen-Loeve algorithm; 2)employs linear prediction model for object detection. The advantage of the proposed approach is its ability to model dynamic background. Through experiments, we claim that the proposed method is able to model the background and detect moving objects under various type of background scenarios and with close to real-time performance.

References 1. Oliver, N.M., Rosario, B., Pentland, A.P.: A Bayesian Computer Vision System for Modeling Human Interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 831–843 2. Friedman, N., Russell, S.: Image Segmentation in Video Sequences. In: Proceedings of the Thirteeth Conference on Uncertainty in Artifical Intelligence. (1997) 175–181 3. Stauffer, C., Grimson, E.: Adaptive Background Mixture Models for Real-time Tracking. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Volume 2. (1999) 246–252 4. Mittal, A., Paragios, N.: Motion-based Background Subtraction using Adaptive Kernel Density Estimation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Volume 2. (2004) 302–309 5. Ross, D., Lim, J., Yang, M.H.: Adaptive Probabilistic Visual Tracking with Incremental Subspace Update. In: Proceedings of the Eighth European Conference on Computer Vision. Volume 2. (2004) 470–482 6. Monnet, A., Mittal, A., Paragios, N., Ramesh, V.: Background Modeling and Subtraction of Dynamic Scenes. In: Proceedings of the Ninth IEEE International Conference on Computer Vision. (2003) 1305–1312 7. Toyama, K., Krumm, J., Brumitt, B., Meyers, B.: Wallflower: Principles and Practice of Background Maintenance. In: Proceedings of the Seventh IEEE International Conference on Computer Vision. Volume 1. (1999) 255–261

Adaptive Content-Based Image Retrieval Using Optimum Fuzzy Weight Value Dong-Woo Kim, Young-Jun Song, Un-Dong Chang, and Jae-Hyeong Ahn Chungbuk National University, 12 Gaeshin-dong, Heungduk-gu, Chungbuk, Korea {[emailprotected], [emailprotected], [emailprotected], [emailprotected]}

Abstract. As a result of development of the internet and increase of digital contents, management of image information has become an important field. And appearance of content-based image retrieval has been developing the systematic management of image information much more. The existing method used several features such as color, shape, texture, etc, and set them as weight value, which caused every image to have big precision difference. The study used the fuzzy-integral method to improve the above problem, so that it has produced the optimum weight value for each image. And the proposed method, as a result of being applied to 1,000 color images, has showed better precision than the existing.

1 Introduction Today, development of computer technology and digital contents has made it possible to easily acquire and store various image as well as text. Such image information has been easy to store and increased in use, but has been more difficult in management. Particularly, image retrieval, at the early stage, used text-based image retrieval [1], but a variety of data like image had a limit in retrieval methods using text or keyword. Therefore, effective image management needed new retrieval methods, so CBIR(content-based image retrieval)[2] has appeared which makes objective and automatic image retrieval possible by automatically extracting and retrieving features from image itself. The major problem of the above method is extracting features, and finding similarity between queried image and images within database. This study has proposed an adaptive content-based image retrieval method that extracts features from color, texture, and shape, and uses fuzzy-integral image retrieval. As for the content-based image retrieval method, first, the typical technique retrieving color information uses color histogram proposed by Swain[3]. Second, the technique using texture took advantage mostly of frequency transformation domain; Wu et al. [4] used DCT (discrete cosine transform), and Yuan et al. [5] proposed a method using wavelet. Third, as a method using shape, Jain et al. [6] proposed a retrieval method used in limited applications like logo or trade-mark retrieval. Now, the method[7] using a mixture of 2~3 features, not using just each of 3 features, and the method[8] using neural network are proposed. D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 676 – 682, 2006. © Springer-Verlag Berlin Heidelberg 2006

Adaptive Content-Based Image Retrieval Using Optimum Fuzzy Weight Value

677

On the other hand, fuzzy set proposed by Zadeh[9] in 1965 considers as fuzziness the degree of ambiguity resulting from human subjective judgment, and treats its degree as a fixed quantity. Fuzzy measure and fuzzy integral, mathematical conception proposed by Sugeno[10] in 1972, tries to overcome the limitation of such ambiguity through the fuzzy evaluation that transits the general additive into nonadditive method. In comparing similarity, such fuzzy integral increases precision by giving the optimum weight value between several features. The rest of the paper is organized as follows. Section 2 describes the proposed method, and Section 3 shows the experimental results. Last, Section 4 discusses the consequence.

2 Feature Extract and Similarity Comparison 2.1 Feature-Region Extract Extraction of feature region mostly uses color, texture, and shape information. Of them, color and texture features use after dividing the existing region[11]. At this time, color information acquired for each region is RCFV (region color feature vector); it is expressed as the following eq.(1) if region number is rk, and histogram ratio pi. RCFV = [rk , pi ], (k = 1, 2,, N , i = 1, 2,, M )

(1)

Here, M means quantization color level and N means the total number of blocks dividing the region of image; M for the study is 12, and N is 16. N is the experimental value. Texture information compensated the part using only the DC of the DCT by adding AC information. As using all AC increases calculation complexity, AC coefficients are recomposed of just each direction component. The size of each region of the proposed method is 64×64, so DCT transformation into 8×8 block can acquire 64 DC coefficients and synthesized AC coefficients. The average of DC coefficients and the average of AC coefficients are expressed as dk and akj each, and used as texture feature so that the acquired coefficients may be used as feature vector for each region. The acquired texture feature is expressed as a vector type like eq. (2) if the acquired texture feature is RTFV (region texture feature vector), region number rk, DC value dk, and the average of horizontal and vertical and diagonal line and the rest AC coefficient akj. RTFV = [ rk , d k , a kj ],

( k = 1, 2, , N ,

j = 1, 2, 3, 4)

(2)

Shape information uses edge. Edge pixel is selected just as over 128, the central value of lightness, for detecting only important edge. The selected edge pixel, in order to exclude minute edges, can be recognized as edge only when linked consecutively 3 times. Each edge histogram extracted from image is acquired according to each region, and used as shape feature. The acquired each-region edge histogram (RSFV: region shape feature vector) is expressed as a vector type like eq. (3) if region number is rk, and region edge histogram ek. RSFV = [rk , ek ], (k = 1, 2,, N )

(3)

678

D.-W. Kim et al.

Color feature vector (RCFV), texture feature vector (RTFV), and shape feature vector (RSFV) can be merged if the same-size regions are used. That is, merging color, shape, and texture in each region can raise precision. Equation (4) expresses RFV(region feature vector), which merges RCFV, RTFV, and RSFV. RFV = [ rk , pi , d k , a kj , ek ] (4) The acquired RFV has 1 shape feature, 5 texture features, and 12 color features according to the 12 levels of quantization for each of the 16 regions. 2.2 Comparison of Fuzzy-Integral Similarity Various methods for similarity comparison have been proposed[12]. Of them, the study used histogram intersection function with less calculation than the others. At this time, using several features arbitrarily fixes the weight value of each feature or manually sets its weight value. Therefore, setting weight value, when using fuzzy integral, can raised the efficiency of retrieval. As for the proposed method, fuzzy measure is set as item X = {x1, x2, x3}; x1 is established by color, x2 by texture, and x3 by shape. H, the power set of each item, is ij, {x1}, {x2}, {x3}, {x1, x2}, {x1, x3}, {x2, x3}, {x1, x2, x3}. At this time, g(xi), the fuzzy measure of each set, is shown in table 1 as precision appearing in retrieving optional 30 images just up to the 10th order by the method chosen as a power set. The values of fuzzy measures are experimental values. Table 1. Fuzzy measures

H ĭ x1 x2 x3 x1, x2 x1, x3 x2, x3 x1, x2, x3

Means ij Color Texture Shape Color, Texture Color, Shape Texture, Shape Color, Shape, Texture

g(xi) 0.00 0.80 0.35 0.20 0.92 0.85 0.43 1.00

The weight values are applied to fuzzy measure for database image and each queried image each. Equation (5) expresses the chosen fuzzy measure when normalized measure (Nxi) is xi as a single case. And equation (6) expresses Wxi(weight value applied to fuzzy measure) when the chosen measure is xm. Nxi =

g ( xi )

, i ∈{1, 2, 3}

3

¦ g(x

j

)

(5)

j =1

­ Nx i + ( Nx m × Nx m ), ° § ° ¨ Wx i = ® Nx − Nx × Nx m × ° i ¨¨ m °¯ ©

xi = x m · Nx i ¸ , i ∈ {1, 2, 3} ¸, x i ≠ x m Nx ¦ j¸ j∉{m} ¹

(6)

Adaptive Content-Based Image Retrieval Using Optimum Fuzzy Weight Value

679

The weight value of each feature is expressed in table 2 when it is substituted in eq.(5), and (6) by the fuzzy measure in table 1. Table 2. The weighting value of features

Selected feature x1 x2 x3 x1, x2 x1, x3 x2, x3

Color(Wx1) 0.96 0.56 0.59 0.48 0.60 0.39

Texture(Wx2) 0.03 0.32 0.25 0.43 0.14 0.41

Shape(Wx3) 0.01 0.12 0.16 0.09 0.26 0.20

As a result, the final similarity, as in eq.(7), results from what multiplication the original similarity by weight values. 3

¦ (x

i

i =1

× Wxi )

(7)

Fig.1 shows the whole block diagram of proposed method. In fig. 1, the solid lines mean the creation of feature vectors from input image and the dotted lines query processing. Input image

Query image

Feature extraction Color

Feature extraction

Fuzzy integral

Compare to similarity

Retrieval

Texture

& Fuzzy measure

Shape Feature DB

Fig. 1. The whole block diagram of proposed method

3 Experimental Results The study evaluated the performance of the proposed content-based image retrieval system of 1,000 natural images. 1,000 pieces of images were divided into 10 groups with 100 pieces each; the same group was composed of similar images. Each of the images is 256×384 size or 24bit color jpeg of 384×256 size, and often used in content-based image retrieval [13]. The study used precision and recall for evaluating the efficiency of the system [12]. The experiment compared 2 methods; one was a method making the weight value of 3 features fixed by adding shape information to the existing method [11], the other was the proposed method applying the optimum weight value by using fuzzy integral.

680

D.-W. Kim et al.

The whole performance of each method is shown by table 3 acquiring the average precision of up to the 10th order. As for the whole performance, it has been found that the proposed method has better performance than the existing method using fixed weight value. According to the result of table 3, the highlydependent-on-color image group like horses, flowers, buses showed good precision even when compared by color-centered fixed weight value; particularly, a highlydependent-on-color image like flowers showed a little better precision than by the proposed method. But it has been found that a less-dependent-on-color image like African remains showed much lower precision by the existing method. The proposed method, however, compared with the existing method, has been found to increase precision by decreasing the weight value of color and increasing the weight value of texture and shape information. Table 3. The precision of each method

Image group Horse Flower Bus Africa Ruins

Existing method 0.93 0.94 0.88 0.70 0.61

Proposed method 0.94 0.92 0.91 0.79 0.72

Fig.2 shows the result of retrieval comparing the existing and the proposed method by querying remains images. Remains images are comparatively hard to retrieve, but because remains are mostly buildings, taking good advantage of shape information can improve problems hard to retrieve only with color.

(a)

(b) Fig. 2. The result of query(ruins), where (a) is the result of existing method and (b) is the result of proposed method

Adaptive Content-Based Image Retrieval Using Optimum Fuzzy Weight Value

681

precisio n

According to the retrieval results, fig. 2(a) retrieved color, texture, and shape information by the existing fixed weight-value method; so wrongly retrieved mountain image at the 5th order and elephant image at the 9th order. As for fig. 2(b), weight value was adaptively applied by the proposed method, which showed better retrieval result than by the existing method even though image was wrongly retrieved at the 10th order. Therefore, the proposed method showed better precision. Fig.3 shows a graph of the acquired recall and precision of remains images. The proposed method, taking optimum advantage of texture and shape information, showed better performance than the existing method. 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0.1

0.2

0.3

0.4

rec all pro pos ed method

exis tin g method

Fig. 3. Precision vs recall(ruins)

4 Conclusions Today, the content-based image retrieval system uses several multiple features, not just one feature. These methods, when comparing similarities between extracted features, give the weight value of features by human subjective judgment or set weight value manually. In this case, weight value wrongly established for image decreases precision. The study has proposed a method using fuzzy integral for the weight value of each feature in order to improve the existing method. As the result of experimenting 1,000 color images, the weight-value similarity retrieval method using fuzzy integral, which the study proposes, has been found to be more excellent in objective performance (precision and recall) than the existing method.

Acknowledgments This work was supported by the Regional Research Centers Program of the Ministry of Education & Human Resources Development in Korea.

682

D.-W. Kim et al.

References 1. Chang, S.K., Yan, C.W., Dimitroff, D.C., and Arndt, T.: An Intelligent Image Database System. IEEE Trans. Software Eng. Vol. 14. No. 5. (1988) 681–688 2. Saha, S.K., Das, A.K., Chanda, B.: CBIR using erception ased exture and olour Measures. Proceedings of Pattern Recognition ICPR 2004. Vol. 2. (2004) 985–988 3. Swain, M.J., Ballard, D. H.: Color Indexing. International Journal of Computer Vision. Vol. 7. No. 1. (1991) 11–32 4. Wu, Y.G. and Liu, J.H.: Image Indexing in DCT Domain. Proceedings of ICITA 2005. Vol. 2. (2005) 401–406 5. Yuan, H., Zhang, X.P., Guan, L.: A Statistical Approach for Image Feature Extraction in the Wavelet Domain. Proceedings of IEEE CCECE 2003, Vol. 2. (2003) 1159–1162 6. Jain, A.K. and Vailaya, A.: Shape-based Retrieval: A Case Study with Trademark Image Databases. Pattern Recognition, Vol. 31. No. 9. (1998) 1369–1390 7. Besson, L., Costa, A.D., Leclercq, E., Terrasse, M.N.: A CBIR -Framework- using Both Syntactical and Semantical Information for Image Description. Proceedings of Database Engineering and Applications Symposium 2003. (2003) 385–390 8. Han, J.H., Huang, D.S., Lok, T.M., Lyu, M.R.: A Novel Image Retrieval System based on BP Neural Network. Proceedings of IJCNN2005. Vol. 4. (2005) 2561–2564 9. Zadeh, L.A.: Fuzzy Sets. Information and Control. Vol. 8. (1965) 89–102 10. Sugeno, M.: Fuzzy Measures and Fuzzy Integrals: A Survey. Fuzzy Automata and Decision Processes. (1977) 89–102 11. Kim, D.W., Kwon, D.J., Kwak, N.J., Ahn, J. H.: A Content-based Image Retrieval using Region based Color Histogram. Proceedings of ICIC 2005. (2005) 12. Vittorio, C., Lawrence, D. B.: Image Database. John Wiley & Sons Inc. (2002) 379–385 13. Wang, J.Z., Li, J., Wiederhold, G.: Simplicity: Semantics-Sensitive Integrated Matching for Picture Libraries. IEEE Trans. on Pattern Analysis and Machine Intelligence.Vol. 23. No. 9. (2001) 947–963

An Adaptive MRF-MAP Motion Vector Recovery Algorithm for Video Error Concealment* Zheng-fang Li 1, Zhi-liang Xu 2,1, and De-lu Zeng1 1

College of Electronic & Information Engineering, South China University of Technology Guangzhou, 510641, China. 2 Department of Electronic communication Engineering, Jiang Xi Normal University Nanchang, 330027, China. [emailprotected]

Abstract. Error concealment is an attractive approach to combat channel errors for video transmission. A motion vector recovery algorithm for temporal error concealment is proposed. The motion vectors field is modeled as Gauss-Markov Random Field (GMRF) and the motion vectors of the damaged image macroblocks can be recovered adaptively by Maximum a Posteriori (MAP). Simulation results show that the proposed method offers significant improvement on both objective PSNR measurement and subjective visual quality of restored video sequence.

1 Introduction Most of international video coding standards can obtain high image quality at low bit rate, based on block discrete cosine transform (BDCT), motion compensation (MC), and variable length coding (VLC) techniques. However the highly compressed video data will be more sensitive to the channel error. The loss of one single bit often results in the loss of the whole block or several consecutive blocks, which seriously affects the visual quality of decoded images at the receiver. Error concealment (EC) technique is an attractive approach that just takes advantage of the spatial or temporal information that come from the current frame or the neighboring frames to recover the corrupted areas of the decoded image. EC technique requires neither the additional bit rate nor the modification of the standard coding algorithms. Traditional EC methods include BMA [1], AVMV [2], MMV [3], and so on. Recently, many creative works [4-7,10] in this field have been presented. In this paper, we focus on temporal EC to conceal the missing image blocks, which belong to inter-coded frame. *

The work is supported by the National Natural Science Foundation of China for Excellent Youth (60325310), the Guangdong Province Science Foundation for Program of Research Team (04205783), the Specialized Prophasic Basic Research Projects of Ministry of Science and Technology, China (2005CCA04100), the Growing Foundation of Jiangxi Normal university for Youth (1336).

D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 683 – 688, 2006. © Springer-Verlag Berlin Heidelberg 2006

684

Z.-f. Li, Z.-l. Xu, and D.-l. Zeng

2 Motion Vector Recovery Based on MRF-MAP 2.1 Motion Vectors Field Model of MRF The motion vectors field was modeled as MRF by Salama [9]. The potential functions are chosen such that N1 −1 N 2 −1 3

¦Vc (v) = ¦ c∈C

i =0

¦ ¦b j = 0 m =0

m i, j

§ ρ ( Dm (vi , j ) · ¨ ¸ σ © ¹

(1)

where N1 and N2 are the number of MBs on vertical and horizontal direction of a frame image respectively, vi , j is the motion vector of MB(i,j), ρ is a cost function,

bim, j is weighting coefficients, σ is a scaling factor, c is cliques, the set of cliques is C={{(i,j-1), (i,j)}, {(i-1,j+1), (i,j)}, {(i-1,j), (i,j)},{ (i-1,j-1), (i,j)}}.

Dm (⋅) has the

following form:

D0 (vi , j ) = vi , j −1 − vi , j , D1 (vi , j ) = vi −1, j +1 − vi , j D2 (vi , j ) = vi −1, j − vi , j , D3 (vi , j ) = vi −1, j −1 − vi , j

(2)

The minimum of equation (3) can be obtained by means of iterative conditional modes (ICM) algorithm. The MAP estimate of motion vector vˆi , j of MB(i,j), given its neighboring motion vectors is i +1 j +1 3 § ρ ( Dm (vl , k ) · ˆvi , j = arg min ¦¦¦ blm,k ¨ ¸ vi , j l =i k = j m = 0 σ © ¹

where the parameters

(3)

blm,k and σ were set to unit value by Salama [6], ρ was cho-

sen as Huber function. The estimate of motion vector of the damaged MB can be obtained by equation (3). However, in this motion vector estimate algorithm, the spatial correlation among neighboring MBs hasn’t been considered in this algorithm (MRF-MAP) proposed by Salama [9]. In order to improve the precision of the estimated motion vector based on MRF-MAP, we propose an adaptive MRF-MAP (AMRF-MAP) algorithm to estimate the vector of the damaged MB. Considering a GMRF model, ρ is a quadratic function. The minimization of equation (3) yields a unique global solution. σ is set to unit value, so the estimated motion vector

vˆi , j =

vˆi , j of MB (i, j ) is given by

¦

( k ,l )∈U

where

bi , j →k ,l ⋅ vk ,l

¦

( k ,l )∈U

bi , j →k ,l

(4)

bi , j →k ,l is the weight assigned to the difference between the values of the mo-

(i, j ) and the motion vector of MB (k , l ) , U is the set of neighboring MBs of MB (i, j ) . tion vector of MB

An Adaptive MRF-MAP Motion Vector Recovery Algorithm

685

2.2 Adaptive Weight Selection In our adaptive MRF model, the weight is selected adaptively, based on the spatial information (neighboring pixels) and temporal information (neighboring motion vectors).

Fig. 1. Missing block and neighboring blocks

Let the size of MB is N × N . The dark block is the damaged MB, and the light blocks are the neighboring MBs as shown in Fig. 1. The size of the damaged Mb is enlarged to ( N + 1) × ( N × 1) . Let the motion vector of the damaged MB is V . When the motion compensation is performed, there will be single pixel overlapping (the grid area as shown in Fig.1) between the concealed MB and the neighboring MBs. In order to measure the degree of smoothness of the overlapping v

area, a function S is defined as follows: N −1

S Lv = ¦ f ( x0 + i, y0 − 1, n) − f ( x0 + vx + i, y0 − 1 + v y , n − 1) i =0

N −1

S Rv = ¦ f ( x0 + i, y0 + N , n) − f ( x0 + i + vx , y0 + N + v y , n − 1) i=0

N −1

STv = ¦ f ( x0 − 1, y0 + i, n) − f ( x0 − 1 + vx , y0 + v y + i, n − 1) i =0

N −1

S Bv = ¦ f ( x0 + N , y0 + i, n) − f ( x0 + N + vx , y0 + v y + i, n − 1) i=0

S v = S Lv + S Rv + STv + S Bv where

(5)

( x0 , y0 ) is the upper left coordinate of the enlarged damage MB, n represents

current frame and n-1 represents the referenced frame. The motion vector of the damaged MB is V , and its x , y components by Vx and Vy respectively. L , R , T , B represent the left, right, top and bottom directions respectively. motion vectors of the neighboring MBs, ( k , l ) ∈ U . If

S

vk ,l

V = Vk ,l , the corresponding

can be obtained by equation (5). The smaller value of

ability that

V equals to Vk ,l .

Vk ,l is one of the

S

vk ,l

, the bigger prob-

686

Z.-f. Li, Z.-l. Xu, and D.-l. Zeng

Fig. 2. Classification of the motion vectors

In addition, since neighboring MBs in one frame often move in a similar fashion, we can group motion vectors that have similar motions into a number of groups. The motion vectors are sorted into 9 classes according to the direction and magnitude information [8] as shown in Fig.2. Let G1, G2, G3,…, G9 denote the set of 9 classes. There is a counter Ci (i= 1, 2, ….9) for each of the nine classes. The counter Ci be used for store the number of motion vectors which belong to corresponding Gi. The bigger value of Ci, the bigger probability that the motion vector V of the damaged MB belongs to Gi. According to the above analysis, we define bi , j →k ,l as follows:

bi , j →k ,l = Ck ,l ⋅ where

min( S

S

vm,n

)

(6)

vk ,l

(m, n) ∈ U , Ck ,l ∈ Ci ( i = 1, 2,3,...9 ).

Substituting (6) into (4), the estimated motion vector

vˆi , j =

¦

Ck , l ⋅

( k ,l )∈U ( m , n )∈U

min( S

S

vm,n

vk ,l

)

⋅ vk ,l

vˆi , j of MB (i, j ) becomes:

¦

( k ,l )∈U ( m , n )∈U

Ck , l ⋅

min( S

S

vm,n

vk ,l

)

(7)

3 Simulation Results Four YUV ( 144× 176 ) grayscale video sequences are used to evaluate the performance of the proposed algorithm. The size of the missing MB is 8 × 8 , and isolated block loss and consecutive block loss are considered. Fig.3(a) is the 92nd frame of Forman with 20.2% isolated blocks loss. The (b), (c), (d), (e), (f) of Fig.3 show the results of BMA [1], AVMV [2], MMV [3], MRF-MAP [9], and our proposed AMRF-MAP algorithms respectively. From Fig.3 (b) and (c), the recovered images by BMA and AVMV algorithm are not smooth and still have serious blocking artifacts. We can see that the proposed algorithm AMRF-MAP recovers the images with edges more successfully than the MMV and the MRF-MAP according to the comparison of Fig.3 (d), (e) and (f).

An Adaptive MRF-MAP Motion Vector Recovery Algorithm

(a) Corrupted frame

(d) MMV

(b) BMA

687

(c) AVMV

(e)MRF MAP

(f)AMRF MAP

Fig. 3. Visual quality comparison by different error concealment methods for Foreman sequence with 20.2% isolated blocks lost rate Table 1. Multi-frame average PSNR(dB) comparison for different video sequences by different methods with block lost rate 20.2% Video sequences Carphone

BMA

AVMV

MMV 30.2

MRFMAP 30.7

AMRFMAP 32.2

26.8

28.0

Foreman Claire Coastguard

27.2 31.1 21.8

28.7 35.3 28.5

30.3 37.5 29.1

28.9 39.1 28.6

30.9 39.4 29.8

Table 2. Multi-frame average CPUtime(s) comparison for different video sequences by different methods with block lost rate 20.2% Video sequences Carphone

BMA

AVMV

MMV 1.27

MRFMAP 1.85

AMRFMAP 2.33

2.86

0.63

Foreman Claire Coastguard

2.98 3.07 3.11

0.67 0.64 0.69

1.48 1.40 1.37

2.00 2.10 1.94

2.50 2.44 2.40

Total fifteen consecutive frames of the Video sequences are used to be simulated with 20.2% isolated MBs missing. The size of the damaged MB is 8×8. In Table 1, we provide the comparison of average PSNR of the recovered image by different of

688

Z.-f. Li, Z.-l. Xu, and D.-l. Zeng

methods. Table 2 is the average CPUtime comparison. From Table 1, it is observed that the proposed algorithm outperforms the other algorithms obviously, and the complexity of the proposed algorithm is moderate demonstrated by Table 2.

4 Conclusion Aim at to recover the damaged MBs, which belong to inter-coded model, an effective temporal EC is proposed in this paper. The motion vectors field is modeled as GMRF,and the weight is selected adaptively based on the spatial information and temporal information. The simulation results show that the proposed method outperforms the existing error concealment methods.

References 1. Lam, W. M., Reilbman, A. R., Liu, B.: Recovery of Lost or Erroneously Received Motion Vectors. IEEE Proceeding ICASSP, 5 (1993) 417-420 2. Sun, H., Challapali, K., Zdepski, J.: Error Concealment in Digital Simulcast AD-HDTV decoder. IEEE Trans. Consumer Electron., 38 (3) (1992) 108-116 3. Haskell, P., Messerschmitt, D.: Resynchronization of Motion Compensated Video Affected by ATM Cell Loss. Proceeding ICASSP’92, San Francisco, CA, 3 (1992) 545-548 4. Zhou, Z. H., Xie, S. L.: New Adaptive MRF-MAP Error Concealment of Video Sequences. Acta Electronica Sinica, 34 (4) (2006) 29-34 5. Zhou, Z. H., Xie, S. L.: Error Concealment Based on Robust Optical Flow. IEEE International Conference on Communications, Circuits and Systems, (2005) 547-550 6. Zhou, Z. H., Xie S. L.: Video Sequences Error Concealment Based on Texture Detection. International Conference on Control, Automation, Robotics and Vision, (2004) 1118-1122 7. Zhou Z. H., Xie S. L.: Selective Recovery of Motion Vectors in Error Concealment. Journal of South China University of Technology, 33 (7) (2005) 11-14 8. Ghanbari, S., Bober, M. Z.: A Cluster Based Method for the Recovery of the Lost Motion Vectors in Video Coding. International Workshop on Mobile and Wireless Communications Network, (2002) 583-586 9. Salama, P., Shroff, N. B., Delp, E. J.: Error Concealment in MPEG Video Streams over ATM Networks. IEEE J. Select. Areas Commun.,18 (2000)1129-1144 10. Xie, S. L., He, Z. S., Gao, Y.: Adaptive Theory of Signal Processing. 1st ed. Chinese Science Press, Beijing (2006) 255-262

An Efficient Segmentation Algorithm Based on Mathematical Morphology and Improved Watershed Ge Guo, Xijian Ping, Dongchuan Hu, and Juanqi Yang Information Science Department, Zhengzhou Information Science and Technology Institute, Zhengzhou, Henan 450002 Mailbox 1001, 837# [emailprotected]

Abstract. Image separation is a critical issue toward the recognition and analysis phase in many image processing tasks. This paper describes an efficient segmentation algorithm based on mathematical morphology and improved watershed which uses the immersion-based watershed transform applied to the fusion image of multi-scale morphological gradient and distance image to decrease the known drawback of watershed, oversegmentation, notably. Furthermore, oversegmentation is posteriorly reduced by a region merging strategy to obtain meaningful results. The presented segmentation technique is tested on a series of images and numerical validation of the results is provided, demonstrating the strength of the algorithm for image segmentation.

1 Introduction Image segmentation technology is one of the most popular subjects of considerable research activity over the last forty decades. Many separation algorithms have been elaborated and present and were extensively reviewed by Clarke et. al.[1]which said that fully-automated segmentation technology will still be a difficult task and fully automatic segmentation procedures that are far from satisfying in many realistic situations. Watershed transform is a popular division tool based on morphology and has been widely used in many fields of image segmentstion due to the advantages that it possesses: it is simple; by it a continuous thin watershed line can be found quickly; it can be parallelized and produces a complete separation which avoids the need for any kind of contour joining. However, some notalble drawbacks also exist which have been seriously affect its practicability. Among those the most important are oversegmentation resulted from its sensitivity to noise and poor detection of significant areas with low contrast boundaries. To solve above problems an image fusion method is presented where both geometry and intensity information are considered to get satisfied division. Furthermore an automatically merging method is proposed to reduce overseparated regions. The actual segmentation procedure consists out of three parts: 1) Image fusion of multi-scale morphological gradient and distance image; 2) Segmentation by the immersion simulation approach described by Vincent and Soille; 3) Reduction of the oversegmentation by small region merging method. D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 689 – 695, 2006. © Springer-Verlag Berlin Heidelberg 2006

690

G. Guo et al.

2 Image Fusion The division result based on watershed transform depends much on the quality of the referenced image. The difficulty of watershed is to check if the objects and their background are marked by a minimum and if the crest lines outline the objects. If not, the transform for original image is needed so that the contours to be calculated correspond to watershed lines and the objects to catchment basins. Gang Lin[2]gives a gradient-weighted distance transform to get more suitable referenced image, however, problems arise when abundant noise exit or intensity contrast is low in boundary. A modified fusion method is proposed to improve such situation. 2.1 Multiscale Morphological Gradient Morphological gradient is defined as the arithmetic difference between dilation and erudition of structuring element B. It emphasizes pixel changes much more than other gradient operators. However, the main problem of morphological gradient is the selection of the structuring element size. So a multiscale morphological gradient is taken into account to enhance blurt edge combining the respective advantages of large structuring element and small structuring element[3]which is described as follows:

M( f ) =

1 n ¦{[( f ⊕ Bi ) − ( f ΘBi )]ΘBi −1} . n i =1

(1)

Where Bi ( 1 ≤ i ≤ n ) is a group of foursquare structure element with size of

(2i + 1) × (2i + 1) . The multi-scale morphological gradient is less sensitive to noise than traditional morphological gradient because the former adopts the average value of each scale. Besides, such gradient has a stronger ability of resisting the interaction between two connected contours. 2.2 Morphological Filter for Gradient Image

The purpose here is to smooth the gradient image in order to reduce oversegmentation due to noises while retaining the salient image edges. To this aim, the smoothing of gradient image is in urgent need. Morphological filters [4] composed of morphological opening and closing are proved to be attractive for this task. It possesses the property of simplifying an image by producing flat zones while preserving efficiently sharp edges due to the flat zones connectivity. For image f , the opening and closing operations are defined as follows

­ Morphological Opening : γ B ( f ) = δ B (ε B ( f )) . ® ¯ Morphological Closing : ϕ B ( f ) = ε B (δ B ( f ))

(2)

Where δ B , ε B are denoted as the dilation and erosion operation with structuring element B . An opening (closing) operation can only preserve (fill) the small structures that have the same shape as the structuring element. In order to preserve the useful

An Efficient Segmentation Algorithm

691

parts and fill the small holes as much as possible, it is needed to construct a series of opening and closing operation with different shape to suit different demand and the output of the filter group is adopted by:

­° Γ ( f ) = Max{γ B1 ( f ), γ B2 ( f ) γ Bn ( f )} . ® °¯ Ψ ( f ) = Min{ϕ B1 ( f ), ϕ B2 ( f ) ϕ Bn ( f )}

(3)

Here Bi figures one structuring element. It is clearly that the more structuring elements taken, the more details (holes) will be reserved (filled). In considerations of noise removal and detail preserving abilities, following structuring elements in Figure 1 are taken into account.

Fig. 1. Different structuring elements

Figure2(a) is a rice image added salt noise which produces large numbers of minimums in gradient image which makes oversegmentation seriously after watershed segmentation (Figure2(b)). Figure2(c) shows the watershed result on multi-scale morphological gradient smoothed by the method described in 2.2 where the regions reduced a lot.

(a)

(b)

(c)

Fig. 2. (a)Rice image added noise;(b)Watershed on traditional gradient;(c)Watershed on multiscale morphological gradient after morphological filter

2.3 Distance Transform

To separate single object is to find the connected points of two different regions. Distance transform is an operation for binary image which transform the position information into intensity information by assigning to every point (to both those in objects as well as those in the background) the minimum distance from that particular point to the nearest point on the border of an object. In general Chamfer algorithm is used to approximate Euclidean distance and the detailed steps are as follows: 1) Original image binerization: To reduce computing time, considering the automaticity, we can adopt the conventional threshoding methods based on histogram. Here we choose the iterative algorithm proposed by Ridler and Calvard which possesses some good properties such as stability, speed and consistency.

692

G. Guo et al.

2) Region connection: Thresholding sometimes results in some small isolated objects due to the existence of dense or uneven distribution. To remove these artificial objects, a minor region removal algorithm based on region area is used. After thresholding all the connected components are identified and the sizes of all the isolated components are calculated. Then object smaller than a set threshold is considered to be an artificial region and its intensity is changed to be the same value with its biggest neighboring object. 3) Chafer distance transform: Choose the 5×5 mask (figure3) to realize Chamfer distance transform [5]. Two scans are taken orderly that the former one is left to right, top to bottom; and the back one is right to left, bottom to top. When the mask is moving, at each position, the sum of the local distance in each mask point and the value of the point it covers are computed, and the new value of the point corresponding to 0 is the minimum of these sums.

(a)

(b)

Fig. 3. (a) Forward pass template (b) Backward pass template

2.4 Fusion Method

Multi-scale grad reflects the intensity information which is very sensitive to noise and usually results in oversegmentation. Chamfer distance reflects the position information which is geometrical and is good at separating objects with regular shapes. If we can find suitable method combining above-mentioned transforms to represent pixels’ character, edge detection of watershed will certainly become easier. Let M be the multi-scale grad, D be the Chamfer distance and g be fusion result, the fusion formula is given by:

g (i, j ) = (max( g (2) (i, j )) − g (2) (i, j )) .

(4)

Where: g (1) (i, j ) = D(i, j )[(1 + a ) − a g (2) (i, j ) =

M (i, j ) − M min ]. M max − M min

255* g (1) (i, j ) . (1) g max

(5)

(6)

Equation (6) is utilized to void g (i, j ) to overstep 255. α is a gradient weight controlling factor that is determined experientially according to edge’s blurt degree. The fainter the edge is, the bigger α is, and when edge is stronger α become smaller. The fusion image represents two characters of one point including position information

An Efficient Segmentation Algorithm

693

and intensity information. And it is clear that g (i, j ) is lower when (i, j ) is close to the center of the object where gradient is lower nevertheless higher when pixel (i, j ) is close to boundary where gradient is lower And it is clear that g (i, j ) is lower when (i, j) is close to the center of the object where gradient is lower nevertheless higher when pixel (i, j) is close to boundary where gradient is lower.

3 Immersion-Based Watershed The fusion image of multi-scale gradient and distance image is considered as a topographic relief where the brightness value of each pixel corresponds to a physical elevation. Of all watershed transforms the immersion technique developed by Vincent and Soille [6]was shown to be the most efficient one in terms of edge detection accuracy and processing time. The operation of their technique can simply be described by figuring that holes are pierced in each local minimum of the topographic relief. In the end, the surface is slowly immersed into a ‘lake’, by that filling all the catchment basins, starting from the basin which is associated to the global minimum. As soon as two catchment basins tend to merge, a dam is built. The procedure results in a partitioning of the image in many catchment basins of which the borders define the watersheds.

4 Region Merging After the watershed segmentation algorithm has been carried out on the fusion image, oversegmentation can be nearly eliminated, but there still remain a small quantity of regions that could by merging yield a meaningful segmentation. In the next step, the partitioning is additional diminished by a properly region merging process which is done by merging neighboring regions having similar characteristics. Suppose i is the current region with size of Ri and k neighboring partitions recorded as R j ( j = 1, 2, k , j ≠ i ) . Let Li , j be the mean strength for the shared boundaries between two adjacent regions i and j . If j is one of i ' s neighboring regions, the adjudication function Pi , j used in this work is defined as: Pi , j =

Ri × R j Ri + R j

μi − μ j Li , j ( j = 1, 2 k ) .

(7)

It is clearly that the smaller Pi , j is, the similar the two regions are. The merging process starts by joining two regions with the smallest P value. During the merging process, all the information of the two regions such as area, mean intensity and so on is combined and the P value is updated. Then the merging process is continued by again merging two regions with the smallest P value and the process is stopped when all the adjudication functions of any two regions satisfy Pi , j > Threshold ( Threshold is a set threshold).

694

G. Guo et al.

5 Experiments and Discussion Evaluation of the proposed algorithms was carried out using a cell image of 500 × 375 pixels. Figure4 (a) shows the segmentation process of the cell image with complex background. Several strategies including multi-scale morphological gradient and morphological filter are taken in our algorithm to reduce noise in the algorithm introduced above which decreases the oversegmentation visibly. Figure4(c) is the reference image obtained by fusing multi-scale morphological gradient and distance image which brings out reasonable result as is shown in Figure4 (c). The final output after region merging step is shown in Figure4(d) where one meaningful divided region corresponds to one single cell.

(a)

(b)

(c)

(d)

Fig. 4. (a) Original cell image; (b) Fusion image; (c) Segmentation result by our method; (d) Final result after small region merging

As comparison, Figure5 shows the watershed result on morphological gradient images. It can be seen that better segmentation performance can be available from our method described above.

Fig. 5. Segmentation result on morphological gradient image

6 Conclusion In this paper, an improved watershed algorithm is introduced where a modified fusion method and region merging strategy are applied in turn to reduce oversegmentation. The proposed algorithm was tested on a series of images with different types. Results proved that our algorithm is much suitable and easily to segment objects with corre-

An Efficient Segmentation Algorithm

695

spondingly regular shapes. However, just as all the papers on segmentation said, no algorithm can fit for all types of images, and our method, when applied to objects with greadtly erose shapes the division may be not so satisfied.

References 1. Clarke L. P,Velthuizen R. P,Camacho M. A.: MRI Segmentation: Methods and Applications. Magnetic Resonance Imaging (1995) 343-368 2. Lin G., Umesh Adiga, Kathy Olson.: A Hybrid 3D Watershed Algorithm Incorporating Gradient Cues and Object Models for Automatic Segmentation of Nuclei in Confocal Image Stacks. Cytometry. (2003) 23-36 3. Lu, G. M., Li, S. H.: Multiscale Morphological Gradient Algorithm and Its Application in Image Segmentation. (2001) 37-40 4. Mal, Zhang, Y.: A Skeletionization Algorithm Based on EDM and Modified Retinal Model. Journal of Electronics(China). (2001) 272-276 5. Borgefors, G.: Distance Transformations in Digital Images. Comput. Vis. Graph. Image Process. (1986) 344-371 6. Vincet, L., Soille, P..: Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations. IEEE Trans. Pat. Anal. Machine Intell. (1991) 583-598

An Error Concealment Based on Inter-frame Information for Video Transmission* Youjun Xiang1, Zhengfang Li1, and Zhiliang Xu2,1 1 College

of Electronic & Information Engineering, South China University of Technology, Guangzhou 510641, China [emailprotected] 2 College of Physics and Electronic Communication Engineering, Jiangxi Normal University, Nanchang 330027, China

Abstract. Transmission of encoded video signals in error-prone environments usually leads to packet erasures which often results in a number of missing image blocks at the decoder. In this paper, an efficient error concealment algorithm for video transmission is proposed which based on the inter-frame information. The missing blocks are classified into low activity ones and high activity ones by using the motion vector information of the surrounding correctly received blocks. The low activity blocks are concealed by the simple average motion vector (AVMV) method. For the high activity blocks, several closed convex sets are defined, and the method of projections onto convex sets (POCS) is used to recover the missing blocks by combining frequency and spatial domain information. Experimental results show that the proposed algorithm achieves improved visual quality of the reconstructed frames with respect to other classical algorithms, as well as better PSNR results.

1 Introduction Following the development of the technical of multimedia, the demand of real-time video transmission is rapidly increasing now. When the video images are transmitted on the error-prone channel, the loss of one single bit often results in the loss of the whole block or several consecutive blocks, which seriously affects the visual quality of decoded images at the receiver. Error concealment (EC) technique is an attractive approach that just takes advantage of the spatial or temporal information that comes from the current frame or the neighboring frames to recover the corrupted areas of the decoded image. When the missing blocks belong to the inter-coded mode, they can be recovered by the temporal error concealment methods. The classical temporal error concealment methods are average motion vector (AVMV)[1] and boundary match algorithm (BMA)[2]. These methods have the advantage of low computational complexity. However, when the estimated motion vector of the missing block is *

The work is supported by the National Natural Science Foundation of China (60274006), the Natural Science Key Fund of Guang Dong Province, China (020826) and the National Natural Science Foundation of China for Excellent Youth (60325310).

D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 696 – 701, 2006. © Springer-Verlag Berlin Heidelberg 2006

An Error Concealment Based on Inter-frame Information for Video Transmission

697

unreliable, there will be serious blocking artifacts in the reconstructed image, which degrade the quality of the reconstructed image. In [3], [9-10], some effective error concealment algorithms are presented to successfully recovering the motion vectors and image blocks lost. Some new ideas based on adaptive MRF-MAP in [4-5] are also presented to address this problem. In order to overcome the deficiency of the AVMV algorithm, a combination temporal error concealment algorithm is proposed in this paper.

2 Restoration Algorithm Using Projections Method 2.1 Iterative Approach Based on the Theory of POCS The main inspiration below this approach has been the technique employed by Hirana and Totsuka [6] for removal of wires and scratches from still images. In order to restore the missing blocks in video images, we improve the technique. The first step of the algorithm consists in selecting a subimage, which is a neighborhood of the missing block (called repair subimage) and a same or similar subimge matched from neighboring frame (called sample subimage). Repair subimage provides a hint for about the local spatial information and sample subimage for the frequency information. Example of these subimages can be seen in Fig.1. r is the missing block, f is the repair subimage and s is the sample subimage. f and s have the same dimension.

Fig. 1. Selection of subimages: (a) repair subimage; (b) sample subimage

The second step is to formulate the desired properties in terms of convex constraints. To characterize such properties, the following constraints and projections are considered.

1) The first projection operator that we use,

P_{min\text{-}DC}(f) = \mathrm{IFFT}\big(M\, e^{i\,\mathrm{phase}(F)}\big), \qquad (1)

where

M(f) = \begin{cases} \min(|F(u,v)|,\,|S(u,v)|) & \text{if } (u,v) \neq (0,0) \\ |F(0,0)| & \text{if } (u,v) = (0,0), \end{cases} \qquad (2)

F = \mathrm{FFT}(f) and S = \mathrm{FFT}(s), is a projection onto the underlying set

C_{min\text{-}DC} = \{ f : |F(u,v)| \le |S(u,v)|,\ (u,v) \neq (0,0) \}. \qquad (3)


Generally, the observed signal can be modeled as the multiplication of the unknown signal by a time-limited binary window function. In the frequency domain, the convolution of the unknown signal spectrum with the window spectrum leads to a blurred and spread spectrum of the observed signal, in general with increased magnitude. In order to eliminate the influence of the known window spectrum, we use the sample-spectrum as a template for improving the repair-spectrum by correcting the spectrum magnitude. M defined in Eq. (2) is a kind of minimum-taking operation on |F(u,v)| and |S(u,v)|. The only exception is at DC, (u,v) = (0,0), where the value of |F(0,0)| is retained. The motivation for not modifying the DC value of the repair-spectrum is that it contains the overall intensity of the repair subimage. While reshaping the spectrum magnitude, we leave the phase of the repair-spectrum untouched for automatic alignment of global features.

2) A constraint for continuity within the surrounding neighborhood of a restored block is imposed for smooth reconstruction of a damaged image. The projection onto the smooth constraint set is

P_{smooth}(f) = \Theta(f), \qquad (4)

where \Theta(x) denotes the median filtering operator applied to image x.

3) The third projection operator P_{clip}(f) imposes constraints on the range of the restored pixel values. It operates in the spatial domain. The convex set corresponding to clipping to the feasible range [s_{min}, s_{max}] is

C_{clip} = \{ f : s_{min} \le f(k,l) \le s_{max},\ \text{for } f(k,l) \in r \}. \qquad (5)

4) Since the foregoing operations affect even the pixels outside the missing block r, these must now be corrected in the spatial domain. This is done simply by copying the known pixel values around r from the original repair subimage. The convex set corresponding to known pixel replacement is

C_{replace} = \{ f : f(i,j) = f_0(i,j),\ (i,j) \notin r \}. \qquad (6)

The appropriate projection onto C_{replace} is

P_{replace}(f) = f(1 - w) + f_0 w, \qquad (7)

where w is the binary mask which is 0 at missing pixel locations and 1 otherwise. Missing pixels are restored iteratively by alternately projecting onto the specified constraint sets. Thus the algorithm can be written as

f_{k+1} = P_{replace} \cdot P_{clip} \cdot P_{min\text{-}DC} \cdot P_{smooth} \cdot f_k, \qquad k = 0, 1, \ldots \qquad (8)

where k is the iteration index. The scheme is presented in Fig. 2.

2.2 Proposed Method

To combine the low computational complexity of AVMV with the significantly better performance of POCS, a combined temporal error concealment algorithm is proposed in this paper.

Fig. 2. Scheme of the POCS-based algorithm

Fig. 3. Flow chart of the proposed algorithm

Fig. 3 gives a flowchart of the algorithm. The missing blocks are classified into low-activity and high-activity blocks by using the motion vector information of the surrounding correctly received blocks. A missing low-activity block is concealed by the simple average motion vector (AVMV) method. For a missing high-activity block, several closed convex sets are defined, and the method of projections onto convex sets (POCS) is used to recover the block by combining frequency- and spatial-domain information. While global features and large textures are captured in the frequency domain, local continuity and sharpness are maintained in the spatial domain. In the algorithm, we define the block activity criterion as

the block is a \begin{cases} \text{high-activity block,} & \text{for } \frac{1}{N}\sum_{i=1}^{N} |vx_i - \overline{vx}| \ge \alpha \ \text{ or } \ \frac{1}{N}\sum_{i=1}^{N} |vy_i - \overline{vy}| \ge \alpha \\ \text{low-activity block,} & \text{otherwise,} \end{cases} \qquad (9)

where (vx_i, vy_i), i = 1, 2, \ldots, N, are the motion vectors of the surrounding correctly received blocks, (\overline{vx}, \overline{vy}) is the average of these MVs, and \alpha is a predetermined threshold.
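For illustration, the following is a minimal Python/NumPy sketch of the combined scheme — the activity test of Eq. (9) followed by either AVMV or the POCS loop of Eq. (8). The function names, the threshold value and the motion-compensation conventions are our assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import median_filter

ALPHA = 2.0  # activity threshold alpha of Eq. (9); the paper leaves its value unspecified

def is_high_activity(mvs, alpha=ALPHA):
    """Classify a missing block via Eq. (9).
    mvs: (N, 2) array of (vx, vy) motion vectors of surrounding received blocks."""
    dev = np.abs(mvs - mvs.mean(axis=0)).mean(axis=0)  # mean absolute deviation per component
    return dev[0] >= alpha or dev[1] >= alpha

def conceal_avmv(prev_frame, pos, size, mvs):
    """AVMV: copy the motion-compensated block from the previous frame."""
    vx, vy = np.round(mvs.mean(axis=0)).astype(int)
    y, x = pos
    return prev_frame[y + vy : y + vy + size, x + vx : x + vx + size]

def pocs_restore(f0, s, w, s_min=0.0, s_max=255.0, n_iter=20):
    """POCS restoration of the repair subimage, Eq. (8).
    f0: repair subimage; s: sample subimage (same shape);
    w : binary mask, 1 at known pixels, 0 inside the missing block r."""
    S_mag = np.abs(np.fft.fft2(s))
    f = f0.astype(float).copy()
    for _ in range(n_iter):
        f = median_filter(f, size=3)                    # P_smooth, Eq. (4)
        F = np.fft.fft2(f)                              # P_min-DC, Eqs. (1)-(2)
        M = np.minimum(np.abs(F), S_mag)
        M[0, 0] = np.abs(F[0, 0])                       # keep the DC magnitude
        f = np.real(np.fft.ifft2(M * np.exp(1j * np.angle(F))))
        f = np.clip(f, s_min, s_max)                    # P_clip, Eq. (5)
        f = f * (1 - w) + f0 * w                        # P_replace, Eqs. (6)-(7)
    return f
```

A decoder would run is_high_activity on each missing block and dispatch to conceal_avmv or pocs_restore accordingly, following the flowchart of Fig. 3.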

3 Simulation Results

The video sequences "Carphone" and "Foreman" are used to evaluate the performance of the proposed algorithm. The size of the missing blocks is 8×8 or 16×16, and both isolated block loss and consecutive block loss are considered. Fig. 4(a) is the 52nd frame of "Carphone" with 20.2% isolated block loss and a missing-block size of 8×8. Figs. 4(b)-(f) show the results of the MRF [7], SR [8], BMA [2], AVMV [1] and our proposed algorithms, respectively. In Fig. 4, it can be seen that the corrupted components of the edge of the car's window are recovered more faithfully by our algorithm and the SR algorithm than by the BMA [2] and AVMV [1] algorithms. There are still serious blocking artifacts in Fig. 4(d) and Fig. 4(e). There is obvious discontinuity between the recovered missing components and the undamaged blocks in Fig. 4(c). It is also noticeable that the proposed algorithm recovers the missing block in the eye region more faithfully than the MRF [7] and SR [8] algorithms do.


Table 1 compares the PSNR of the images recovered by the different algorithms. When the missing blocks are isolated and their size is 8×8, the PSNR of SR is higher than that obtained by BMA and AVMV. From Table 1, it is observed that the proposed algorithm clearly outperforms the other algorithms in almost every block-loss situation.

Fig. 4. Recovered 52nd frame of the "Carphone" sequence with 20.2% isolated blocks missing: (a) corrupted frame; (b) MRF; (c) SR; (d) BMA; (e) AVMV; (f) proposed algorithm

Table 1. Comparison of the PSNR in different situations (PSNR in dB)

Corrupted video sequence                                      MRF    SR     BMA    AVMV   Ours
Foreman (92nd)   8×8 discrete missing (20% loss rate)         29.1   29.5   27.5   28.8   30.0
Foreman (92nd)   8×8 consecutive missing (20% loss rate)      23.0   22.8   26.5   27.4   27.7
Foreman (92nd)   16×16 discrete missing (20% loss rate)       25.7   25.3   27.5   31.5   32.1
Foreman (92nd)   16×16 consecutive missing (20% loss rate)    20.3   22.8   27.3   31.8   33.2
Carphone (52nd)  8×8 discrete missing (20% loss rate)         25.0   28.7   27.2   26.0   33.2
Carphone (52nd)  8×8 consecutive missing (20% loss rate)      21.2   20.8   26.6   27.6   29.7
Carphone (52nd)  16×16 discrete missing (20% loss rate)       23.9   24.5   27.4   30.5   32.7
Carphone (52nd)  16×16 consecutive missing (27% loss rate)    20.8   21.4   28.3   32.6   32.3


4 Conclusions

In this paper, an efficient error concealment algorithm for video transmission based on inter-frame information is proposed. In our approach, AVMV and POCS are combined to fully exploit the advantages of each method. The missing blocks are classified into low-activity and high-activity blocks by using the motion vector information of the surrounding correctly received blocks. A low-activity block is concealed by the simple AVMV method. For a high-activity block, several closed convex sets (C_{min-DC}, C_{clip}, C_{smooth} and C_{replace}) are defined, and POCS is used to recover the missing block by combining frequency- and spatial-domain information, which solves the problem that the MVs estimated by AVMV are unreliable for areas with fast motion and object boundaries. Experimental results show that the proposed algorithm achieves improved visual quality of the reconstructed frames with respect to other classical error concealment algorithms, as well as better PSNR results.

References
1. Sun, H., Challapali, K., Zdepski, J.: Error Concealment in Digital Simulcast AD-HDTV Decoder. IEEE Trans. Consumer Electron., Vol. 38, No. 3 (1992) 108-116
2. Lam, W.M., Reibman, A.R., Liu, B.: Recovery of Lost or Erroneously Received Motion Vectors. Proc. ICASSP, Vol. 5 (1993) 417-420
3. Zhou, Z.H., Xie, S.L.: Selective Recovery of Motion Vectors in Error Concealment. Journal of South China University of Technology, Vol. 33, No. 7 (2005) 11-14
4. Zhou, Z.H., Xie, S.L.: New Adaptive MRF-MAP Error Concealment of Video Sequences. Acta Electronica Sinica, Vol. 34, No. 4 (2006) 29-34
5. Zhou, Z.H., Xie, S.L.: Error Concealment Based on Adaptive MRF-MAP Framework. Advances in Machine Learning and Cybernetics, Lecture Notes in Artificial Intelligence 3930 (2006) 1025-1032
6. Hirani, A., Totsuka, T.: Combining Frequency and Spatial Domain Information for Fast Interactive Image Noise Removal. Proc. SIGGRAPH'96 Conf. (1996) 269-276
7. Shirani, S., Kossentini, F., Ward, R.: A Concealment Method for Video Communications in an Error-prone Environment. IEEE J. Select. Areas Commun., Vol. 18, No. 6 (2000) 1122-1128
8. Li, X., Orchard, M.T.: Edge-directed Prediction for Lossless Compression of Natural Images. IEEE Trans. on Image Processing, Vol. 10, No. 6 (2001) 813-817
9. Zhou, Z.H., Xie, S.L.: Error Concealment Based on Robust Optical Flow. IEEE International Conference on Communications, Circuits and Systems, Hong Kong (2005) 547-550
10. Xie, S.L., He, Z.S., Gao, Y.: Adaptive Theory of Signal Processing. 1st ed. Chinese Science Press, Beijing (2006) 255-262

An Integration of Topographic Scheme and Nonlinear Diffusion Filtering Scheme for Fingerprint Binarization Xuying Zhao, Yangsheng Wang, Zhongchao Shi, and Xiaolong Zheng Institute of Automation, Chinese Academy of Sciences NO.95 Zhongguancun East Road, Beijing, P.R. China {xuying.zhao, yangsheng.wang, zhongchao.shi, xiaolong.zheng}@ia.ac.cn

Abstract. This paper proposes an approach to fingerprint binarization integrating a nonlinear diffusion filtering scheme and a topographic scheme, in which the properties of the essential flow-like patterns of fingerprints are deliberately analyzed from different points of view. The filtering scheme is based on the coherent structures, while the topographic scheme is based on an analysis of the underlying 2D surface. The fingerprint image is smoothed along the coherent structures and binarized according to the sign of the trace of the Hessian matrix. The integration method is tested with a series of experiments and the results reveal the good performance of our algorithm.

1 Introduction

Although several schemes have been proposed to extract features directly from the grey-level fingerprint image [1,2], the extraction process is generally intractable because of the noise generated by such factors as the presence of scars, variations of the pressure between the finger and the acquisition sensor, worn artifacts, the environmental conditions during the acquisition process, and so forth. Therefore, an input gray-scale fingerprint image is first transformed by the enhancement algorithm into a binary representation of the ridge pattern, called the binary ridge-map image [3], to reduce the noise present in the image and detect the fingerprint ridges. Fingerprint image binarization, which classifies each pixel into ridge or valley regions, heavily influences the performance of the feature extraction process and hence the performance of the overall automated fingerprint identification system. The binary image obtained is then used by subsequent processes to extract features, such as detecting and classifying the minutiae points. Most of the proposed methods [4,5,6] of fingerprint binarization require a global or local threshold to discriminate between ridge and valley regions, where the threshold is more or less arbitrarily chosen based on a restricted set of images. Wang has made the observation that if we consider the grey-scale image to be a surface, then its topographical features correspond to shape features of the original image. He investigated the properties of geometric features in the context of OCR and gave an analysis of the practicality and effectiveness of using geometric features for text character recognition [7].


Similarly, Tico has proposed a method of fingerprint binarization based on the topographic properties of the fingerprint image [8]. In Tico's scheme, the discrete image is treated as a noisy sampling of an underlying continuous surface, and ridge and valley regions are discriminated by the sign of the maximum normal curvature of this surface. In fact, we observed that a point of a fingerprint can be classified as a ridge point or as a valley point by the properties of the surface with no need to calculate the normal curvature. Also, the assumption of a continuous surface is often invalid for fingerprint images of poor quality. Consequently, fingerprint image enhancement is the first step in our recognition algorithm, to reduce noise and increase the contrast between ridges and valleys in the gray-scale fingerprint images. An essential question is how to enhance flow-like patterns to improve the quality of the fingerprint without destroying, for instance, semantically important singularities such as the minutiae. This problem has been addressed by a multiscale simplification of the original image, embedding it into a scale-space in order to obtain a subsequently coarser and more global impression of the main flow-like structures. The idea of scale-space filtering, derived from the multiscale description of images, has been introduced, well developed and widely used in computer vision [9,10,11,12,13,14]. As far as fingerprint images are concerned, such a scale-space should take into account the coherence of the structures by smoothing mainly along their preferred orientation instead of perpendicular to it [14]. The technique of coherence-enhancing anisotropic diffusion filtering combines ideas of nonlinear diffusion filtering with orientation analysis by means of the structure tensor. Weickert [15] also showed that the direction sensitivity constitutes an additional problem for the design of appropriate algorithms for diffusion filtering that had not been addressed in the computer vision literature before. The difficulty can be handled by the use of specific first-order derivative filters that have been optimized with respect to rotation invariance [16]. In this paper, we first present the approach for nonlinear diffusion filtering with optimized rotation invariance in Section 2. In Section 3, we introduce our method of fingerprint binarization based on the properties of geometric features. Experimental results for the integration of the two schemes are given in Section 4. Finally, we present some concluding remarks in Section 5.

2 Scheme for Filtering

The essential idea of the approach to scale-space filtering can be briefly described as follows. Embed the original image in a family of derived images u(x, y, t) obtained by convolving the original image u_0(x, y) with a Gaussian kernel G(x, y; t) of variance t:

u(x, y, t) = u_0(x, y) * G(x, y; t). \qquad (1)

Larger values of t, the scale-space parameter, correspond to images at coarser resolutions.
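As a quick illustration, a minimal SciPy sketch of this embedding (assuming a kernel of variance t corresponds to a standard deviation of sqrt(t); function name is ours):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(u0, ts):
    """Return the family of derived images u(x, y, t) of Eq. (1)
    for a list of scale parameters ts (variances of the Gaussian kernel)."""
    return [gaussian_filter(u0.astype(float), sigma=np.sqrt(t)) for t in ts]

# Example: progressively coarser versions of a fingerprint image
# images = scale_space(fingerprint, ts=[1.0, 4.0, 16.0])
```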


The one-parameter family of derived images may equivalently be viewed as the solution of the heat diffusion equation:

\partial u(x,y,t)/\partial t = \Delta u(x,y,t), \quad (x,y) \in \Omega,\ t > 0,
u(x,y,0) = u_0(x,y), \quad (x,y) \in \Omega,
\partial u(x,y,t)/\partial n = 0, \quad (x,y) \in \partial\Omega,\ t > 0. \qquad (2)

For analyzing flow-like patterns, numerous nonlinear diffusion filters have been proposed, most of which use a scalar diffusivity. Weickert surveyed methods for describing coherence in images and constructed a coherence-enhancing diffusion which smooths along coherent flow-like structures [14]. This approach to nonlinear diffusion filtering enables true anisotropic behaviour by adapting the diffusion process not only to the location, but also by allowing different smoothing in different directions.

2.1 Nonlinear Diffusion Filtering

Denote the fingerprint image as I, with pixels I(x, y). The principle of nonlinear diffusion filtering is as follows. We calculate a processed version u(x, y, t) of I(x, y) with a scale parameter t ≥ 0 as the solution of the diffusion equation with I as initial condition and reflecting boundary conditions:

\partial_t u = \mathrm{div}(D \nabla u), \quad (x,y) \in I,\ t > 0,
u(x,y,0) = I(x,y), \quad (x,y) \in I,\ t = 0,
\langle D \nabla u, n \rangle = 0, \quad (x,y) \in \Gamma,\ t > 0. \qquad (3)

Hereby, n denotes the outer normal and ⟨·,·⟩ the usual inner product, while Γ is the boundary of the image I and D is the symmetric positive definite diffusion tensor. For the purpose of fingerprint enhancement, we should choose the diffusion tensor D as a function of the local image structure, i.e. the structure tensor J_ρ(∇u_σ), to adapt the diffusion process to the image itself. The structure tensor can be obtained by convolving the tensor product of the vector-valued structure descriptor ∇u_σ with a Gaussian K_ρ:

J_\rho(\nabla u_\sigma) = \begin{pmatrix} j_{11} & j_{12} \\ j_{21} & j_{22} \end{pmatrix} = K_\rho * (\nabla u_\sigma \otimes \nabla u_\sigma), \qquad (4)

where the parameter σ is called the local scale, and the integration scale ρ reflects the characteristic size of the fingerprint image. The symmetric matrix J_ρ is positive semidefinite and possesses orthonormal eigenvectors w_1, w_2 with

w_{1,2} \parallel \begin{pmatrix} 2 j_{12} \\ j_{22} - j_{11} \pm \sqrt{(j_{11} - j_{22})^2 + 4 j_{12}^2} \end{pmatrix} \quad \text{(normalized to unit length)} \qquad (5)

if j_{11} \neq j_{22} or j_{12} \neq 0. The corresponding eigenvalues are

\mu_{1,2} = \frac{1}{2}\left( j_{11} + j_{22} \pm \sqrt{(j_{11} - j_{22})^2 + 4 j_{12}^2} \right), \qquad (6)

where the + sign belongs to μ_1. The difference

\mu_1 - \mu_2 = \sqrt{(j_{11} - j_{22})^2 + 4 j_{12}^2} \qquad (7)

measures the coherence within a window of scale ρ and plays an important role in the construction of the diffusion filter. To adapt to the local structure, the diffusion tensor D should possess the same eigenvectors w_1, w_2 as the structure tensor J_ρ(∇u_σ). So it can be given by

D(J_\rho(\nabla u_\sigma)) = \begin{pmatrix} a & b \\ b & c \end{pmatrix} = (w_1 | w_2) \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \begin{pmatrix} w_1^T \\ w_2^T \end{pmatrix}. \qquad (8)

The eigenvalues of D are chosen as:

\lambda_1 = \alpha, \qquad \lambda_2 = \begin{cases} \alpha & \text{if } \mu_1 = \mu_2, \\ \alpha + (1 - \alpha)\, e^{-\beta/(\mu_1 - \mu_2)^{2m}} & \text{else}. \end{cases} \qquad (9)

Herein, λ_1 is given empirically by λ_1 = α = 0.01, which defines the diffusion in the direction orthogonal to the ridge. λ_2 is an increasing function of (μ_1 − μ_2)^2 with the restriction parameter β = 3, while m decides the speed of the diffusion process.

2.2 Filtering with Optimized Rotation Invariance

The first derivative operators with optimized rotation invariance [16] can be described as:

F_x = \frac{1}{32} \begin{pmatrix} -3 & 0 & 3 \\ -10 & 0 & 10 \\ -3 & 0 & 3 \end{pmatrix} \quad \text{and} \quad F_y = \frac{1}{32} \begin{pmatrix} 3 & 10 & 3 \\ 0 & 0 & 0 \\ -3 & -10 & -3 \end{pmatrix}. \qquad (10)

It has been shown that they approximate rotation invariance significantly better than related popular operators like the Sobel operator. Now we can calculate the structure tensor J_ρ(∇u_σ) in (4) using the optimized derivative operators F_x, F_y in (10), and assemble the diffusion tensor D(J_ρ(∇u_σ)) in (8) as a function of the structure tensor. Decompose and rewrite the divergence operator in (3) as

j_1 = a \partial_x u + b \partial_y u, \quad j_2 = b \partial_x u + c \partial_y u, \quad \mathrm{div}(D \nabla u) = \partial_x j_1 + \partial_y j_2. \qquad (11)

Thereby, the flux components j_1, j_2 and div(D∇u) are each calculated by means of the optimized derivative operators. Updating in an explicit way until the result is stable, or for a limited number of steps, we obtain the enhanced fingerprint image as the input of binarization.
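The following Python/NumPy sketch outlines one explicit update step of this scheme under Eqs. (3)-(11); the parameter values and the overall structure are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.ndimage import convolve, gaussian_filter

FX = np.array([[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]]) / 32.0   # Eq. (10)
FY = np.array([[3, 10, 3], [0, 0, 0], [-3, -10, -3]]) / 32.0

def diffusion_step(u, sigma=0.5, rho=4.0, alpha=0.01, beta=3.0, m=1, tau=0.2):
    """One explicit step of coherence-enhancing diffusion (Eqs. (3)-(11))."""
    us = gaussian_filter(u, sigma)
    ux, uy = convolve(us, FX), convolve(us, FY)
    # Structure tensor, Eq. (4)
    j11 = gaussian_filter(ux * ux, rho)
    j12 = gaussian_filter(ux * uy, rho)
    j22 = gaussian_filter(uy * uy, rho)
    # Coherence mu1 - mu2, Eq. (7)
    coh = np.maximum(np.sqrt((j11 - j22) ** 2 + 4.0 * j12 ** 2), 1e-12)
    # Eigenvalues of D, Eq. (9)
    lam1 = alpha
    lam2 = alpha + (1.0 - alpha) * np.exp(-beta / coh ** (2 * m))
    # Eigenvector w1 ~ (2*j12, j22 - j11 + coh), Eq. (5); assemble D, Eq. (8)
    v1, v2 = 2.0 * j12, j22 - j11 + coh
    norm = np.sqrt(v1 ** 2 + v2 ** 2) + 1e-12
    c1, c2 = v1 / norm, v2 / norm
    a = lam1 * c1 ** 2 + lam2 * c2 ** 2
    b = (lam1 - lam2) * c1 * c2
    c = lam1 * c2 ** 2 + lam2 * c1 ** 2
    # Flux decomposition, Eq. (11), again with the optimized operators
    ux, uy = convolve(u, FX), convolve(u, FY)
    return u + tau * (convolve(a * ux + b * uy, FX) + convolve(b * ux + c * uy, FY))
```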

3 Binarization Based on Geometric Properties

An enhanced fingerprint image can be approximately regarded as a continuous two-dimensional surface defined mathematically by the equation z = u(x, y). The geometric properties at a certain point (x, y) are determined by the gradient vector ∇u and the Hessian matrix H computed at that point. The gradient vector ∇u is oriented in the direction of maximum change of the image value, i.e. of the two-dimensional function u(x, y), which is physically the same as in Section 2. The Hessian matrix is defined in terms of second-order partial derivatives:

H = \begin{pmatrix} \frac{\partial^2 u}{\partial x^2} & \frac{\partial^2 u}{\partial x \partial y} \\ \frac{\partial^2 u}{\partial y \partial x} & \frac{\partial^2 u}{\partial y^2} \end{pmatrix}. \qquad (12)

Let ω_1, ω_2 be the unit eigenvectors of H, and λ_1, λ_2 the corresponding eigenvalues with |λ_1| ≥ |λ_2|. λ_1, λ_2 are real and ω_1, ω_2 are orthogonal to each other because H is symmetric. H determines the normal curvature, i.e. the value of the second-order derivative of u(x, y) in a given direction ω, as follows:

\frac{\partial^2 u}{\partial \omega^2} = \omega^T H \omega, \qquad (13)

where the direction vector ω is expressed as a two-dimensional column vector. Consequently, the second directional derivative is extremized along the two directions defined by the Hessian eigenvectors ω_1 and ω_2, and λ_1, λ_2 are the corresponding extreme values of the normal curvature. The orthogonal directions ω_1, ω_2 are also called principal directions, whereas the normal curvatures λ_1, λ_2 are also called principal curvatures of the surface. Detailed mathematical descriptions of various topographic properties of two-dimensional surfaces are given in [7], based on the concepts of the gradient vector, Hessian eigenvectors and Hessian eigenvalues. In a fingerprint image, neighboring ridges and valleys have the same orientation in most of the image area, and the gray level in every transversal section exhibits a quite sinusoidal shape. So we can conclude that the maximum principal curvature is given by λ_1, due to the relationship of λ_1 and λ_2. Accordingly, the sign of λ_1 can be used to discriminate between ridge and valley regions, i.e., a point in the fingerprint is classified as a ridge point if λ_1 is positive or as a valley point if λ_1 is negative. The trace of the 2 × 2 square matrix H is defined by

\mathrm{Tr}(H) = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}. \qquad (14)

The following is a useful property of the trace:

\mathrm{Tr}(H) = \lambda_1 + \lambda_2. \qquad (15)

Considering the relation |λ_1| ≥ |λ_2|, it is not hard to verify that the sign of λ_1 is equal to that of Tr(H). Hence the fingerprint image can be binarized according to the sign of the trace of the Hessian.
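Since Tr(H) is just the Laplacian of the image, the binarization itself reduces to a sign test; a possible Python sketch (function name is ours):

```python
import numpy as np
from scipy.ndimage import laplace

def binarize_fingerprint(u):
    """Binarize an enhanced fingerprint image by the sign of Tr(H), Eqs. (14)-(15).
    Tr(H) = u_xx + u_yy is the Laplacian; a positive trace marks a ridge point."""
    return (laplace(u.astype(float)) > 0).astype(np.uint8)
```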

4 Experimental Results and Analysis

The method proposed in this paper has been tested on a public-domain collection of fingerprint images, DB3 Set A of FVC2002, and on our own database V20. The former contains 800 fingerprint images of size 300×300 pixels captured by a capacitive sensor from 100 fingers (eight impressions per finger). The latter consists of 4000 images from 200 fingers of people of different ages and job categories. The fingerprint images in our own database V20 were collected using a capacitive Veridicom sensor at a resolution of 500 dpi and quantified into 256 gray levels. Fig. 1 shows the results obtained with our scheme on some typical images from DB3 Set A and V20. We can see that our method is able to connect interrupted ridges effectively and eliminate most burrs and smudges.

Fig. 1. Fingerprint binarization example. The images, from left to right, represent the original images, the enhanced images and the binarized images obtained with our scheme.

5 Conclusions

In this paper, we introduced an approach to fingerprint binarization integrating a nonlinear diffusion filtering scheme and a topographic scheme, both of which build on an analysis of the properties of the flow-like patterns of fingerprint images. A series of experiments validates our algorithm, which takes advantage of both the nonlinear


diffusion process and geometric features. Additionally, the fingerprint enhancement can be iterated in an explicit way and stopped after a very limited number of steps in most cases, since that is enough to discriminate ridge and valley regions by the sign of the trace of the Hessian. Therefore, the algorithm is computationally efficient and can be applied in on-line fingerprint verification systems.

References
1. Maio, D., Maltoni, D.: Direct Gray-Scale Minutiae Detection in Fingerprints. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 1 (1997) 27-40
2. Jiang, X., Yau, W.: Detecting the Fingerprint Minutiae by Adaptive Tracing the Gray-level Ridge. Pattern Recognition (2001) 999-1023
3. Tico, M., Onnia, V., Kuosmanen, P.: Fingerprint Image Enhancement Based on Second Directional Derivative of the Digital Image. EURASIP Journal on Applied Signal Processing (2002) 1135-1144
4. Moayer, B., Fu, K.S.: A Tree System Approach for Fingerprint Pattern Recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence (1986) 376-387
5. Wahab, A., Chin, S.H., Tan, E.C.: Novel Approach to Automated Fingerprint Recognition. IEE Proc. Vis. Image Signal Process (1998) 160-166
6. Nalini, K., Chen, S., Jain, K.: Adaptive Flow Orientation-Based Feature Extraction in Fingerprint Images. Pattern Recognition (1995) 1657-1672
7. Wang, L., Pavlidis, T.: Direct Gray-Scale Extraction of Features for Character Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 10 (1993) 1053-1067
8. Tico, M., Kuosmanen, P.: A Topographic Method for Fingerprint Segmentation. Proceedings of the 1999 International Conference on Image Processing, ICIP99, Kobe, Japan (1999) 36-40
9. Babaud, J., Witkin, A., Baudin, M., et al.: Uniqueness of the Gaussian Kernel for Scale-Space Filtering. IEEE Trans. Pattern Anal. Machine Intelligence, Vol. 8 (1986) 309-320
10. Yuille, A., Poggio, T.: Scaling Theorems for Zero Crossings. IEEE Trans. Pattern Anal. Machine Intelligence, Vol. 8 (1986) 150-158
11. Koenderink, J.: The Structure of Images. Biological Cybernetics, Vol. 50 (1984) 363-370
12. Hummel, A.: Representations Based on Zero-crossings in Scale-Space. Proc. IEEE Computer Vision and Pattern Recognition Conf. (1987) 204-209
13. Weickert, J.: Multiscale Texture Enhancement. Computer Analysis of Images and Patterns, Lecture Notes in Computer Science, Vol. 970 (1995) 230-237
14. Weickert, J.: Coherence-Enhancing Diffusion Filtering. International Journal of Computer Vision, Vol. 31, No. 2/3 (1999) 111-127
15. Weickert, J., Scharr, H.: A Scheme for Coherence-Enhancing Diffusion Filtering with Optimized Rotation Invariance. Journal of Visual Communication and Image Representation, Vol. 13 (2002) 103-118
16. Jahne, B., Scharr, H., Korkel, S.: Principles of Filter Design. Handbook on Computer Vision, Vol. 2: Signal Processing and Pattern Recognition, Academic Press, San Diego (1999) 125-152

An Intrusion Detection Model Based on the Maximum Likelihood Short System Call Sequence

Chunfu Jia1,2 and Anming Zhong1

1 College of Information Technology and Science, Nankai University, Tianjin 300071, China
2 College of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
[emailprotected], [emailprotected]

Abstract. The problem of intrusion detection based on sequences of system calls is studied. Using a Markov model to describe the transition rules of the system calls of a process, an intrusion detection model based on the maximum likelihood short system call sequence is proposed. During the training phase, the Viterbi algorithm is used to obtain the maximum likelihood short system call sequences, which form the normal profile database of a process. During the detecting phase, the system call sequence generated by a process is compared with the maximum likelihood sequences in its normal profile database to detect intrusions. Experiments reveal the good detection performance and quick computation speed of this model.

1 Introduction

With the rapid development of computer networks, intrusion detection systems (IDS) draw more and more attention from researchers and have begun to take a critical role in many real systems. The first problem an intrusion detection system faces is the selection of source data. Previous research reveals that the sequence of system calls can reflect the essential action characteristics of a process and can be used as an effective type of source data. Research by Forrest [1] and Kosoresow [2] shows that the short sequences of system calls generated by a process at a certain length are stable and reliable, so the behaviour pattern of a process can be described by its short sequences of system calls. In the sequence time-delay embedding (STIDE [1]) model, a profile of normal behaviour is built by enumerating all unique, contiguous sequences of a predetermined, fixed length T in the training data. The Markov model was first introduced into the field of IDS by Denning [3]. Ye [4] used a Markov model to calculate the occurrence probability of a certain short system call sequence; if the probability is smaller than a threshold, anomalies are assumed to occur in the current process. HMMs (Hidden Markov Models) are also used to detect intrusions in the same manner. Zhong [5] studied the performance of HMMs through experiments and concluded that the first-order HMM has better intrusion detection performance than the second-order HMM.


Based on the Markov model of Ye [4] and the short system call sequence model of Forrest [1], we present a new intrusion detection model using maximum likelihood short system call sequences. In this model, a Markov chain is used to describe the transition rules of system calls. There are two phases involved in our model: the training phase and the detecting phase. During the training phase, the Viterbi algorithm is used to obtain the maximum likelihood short system call sequences, from which the normal profile database of a process is built. During the detecting phase, the system call sequences generated by a process are compared with the maximum likelihood sequences in its normal database, and the difference is used to judge whether the current process is normal. The remaining parts of this paper are organized as follows: Section 2 introduces the model and main algorithms, Section 3 describes our experiments on this model, and in Section 4 the features of our model are discussed and some analyses of the experimental results are presented.

2 The Model and Main Algorithms

We use a Markov chain to describe the transition rules between system calls. The Markov chain is defined as

\lambda = (S, \pi, A, N), \qquad (1)

where S = \{1, 2, \ldots, N\} is the set of states; each system call corresponds to a state;

\pi = (\pi_i)_N, \pi_i = P(s_1 = i), i \in S, is the distribution vector of the initial states; A = (a_{ij})_{N \times N}, a_{ij} = P(s_{t+1} = j \mid s_t = i), i, j \in S, is the transition probability matrix; and N = |S| is the number of states. Our model has five modules: the System Call Collecting Module, Pre-processing Module, Markov Training Module, Markov Testing Module and Outputting Module.

• System Call Collecting Module: Collects the system calls generated by a process to be used as the data source for intrusion detection. This module can be implemented by different technologies on different operating system platforms; for example, BSM can be used in Solaris and LKM in Linux.

• Pre-processing Module: Constructs the state set of the Markov chain based on the system calls. Research by Ye [4] shows that taking every system call as a state of the Markov chain can achieve good detection performance, but in this way we would get too many states. Matthias [6] reported that the unpopular system calls are valuable for intrusion detection, so we cannot simply drop those unpopular system calls. In this model, we construct the Markov state set as follows:

1) Scan through the system call sequence and count the occurrences of every system call. For each system call s, compute its frequency P(s) in the system call sequence.

2) Sort the system calls in descending order according to their frequency and give a serial number to each system call.

3) Compute the least integer N such that \sum_{i=1}^{N-1} P(i) \ge c (where i is the serial number of a system call, and c is a preset probability value near 1, such as 0.99); then take every system call whose number is between 1 and N−1 as a state of the Markov chain, and take all the other system calls (i.e., the unpopular system calls) as one state of the Markov chain. The Markov chain with N states is thus constructed. In the above steps we do not discriminate between different unpopular system calls but treat them as one state of the Markov chain, so the state number of the Markov chain and the computation cost are both reduced. After pre-processing, the sequence of system calls is converted to a sequence of Markov states, which can be denoted as s_1 s_2 … s_t …. A sketch of this construction is given below.
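As an illustration, a minimal Python sketch of this state-set construction (the function names are ours, not the paper's):

```python
from collections import Counter

def build_state_map(calls, c=0.99):
    """Map system calls to Markov states; unpopular calls share one state.
    calls: training sequence of system call identifiers."""
    freq = Counter(calls)                                        # step 1: count occurrences
    total = float(len(calls))
    state_map, cum = {}, 0.0
    for idx, (s, n) in enumerate(freq.most_common(), start=1):   # step 2: sort by frequency
        state_map[s] = idx                                       # step 3: popular calls get own states
        cum += n / total
        if cum >= c:
            break
    n_states = len(state_map) + 1                                # one shared state for the rest
    return state_map, n_states

# states, N = build_state_map(training_calls)
# markov_states = [states.get(s, N) for s in training_calls]    # unpopular calls -> state N
```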

• Markov Training Module: Responsible for establishing the normal rule database for each process. The normal rule database is composed of the maximum likelihood short state sequences. This module works as follows:

1) Compute the transition matrix A = (a_{ij})_{N \times N} of the Markov chain:

a_{ij} = n_{ij} / n_i, \qquad (2)

where a_{ij} is the transition probability from state i to state j; n_{ij} is the number of observation pairs (s_t, s_{t+1}) with s_t in state i and s_{t+1} in state j; and n_i is the number of observation pairs (s_t, s_{t+1}) with s_t in state i and s_{t+1} in any one of the states 1, 2, …, N.

2) Use the Viterbi algorithm to calculate the maximum likelihood short state sequence for each process. The maximum likelihood sequence starting with state s at length T can be denoted as O_s = s, s_2, …, s_T. Based on the Markov property, we have

P(O_s \mid \lambda) = P(s, s_2, \ldots, s_T \mid \lambda) = a_{s s_2}\, a_{s_2 s_3} \cdots a_{s_{T-1} s_T}. \qquad (3)

3) Since each a_{s_{t-1} s_t} term in (3) is less than 1 (generally significantly less than 1), it can be seen that as T starts to get bigger (e.g., 10 or more), P(O_s | λ) starts to head exponentially to zero. For sufficiently large T (e.g., 20 or more), the dynamic range of the P(O_s | λ) computation will exceed the precision range of essentially any machine. A reasonable way of performing the computation is by logarithms. By defining

U(s, s_2, \ldots, s_T) = -\left[ \ln a_{s s_2} + \sum_{t=3}^{T} \ln a_{s_{t-1} s_t} \right], \qquad (4)

we can get

P(O_s \mid \lambda) = \exp\big( -U(s, s_2, \ldots, s_T) \big). \qquad (5)

The maximum likelihood state sequence (starting with state s at length T) O_s = s, s_2, …, s_T should satisfy the following equation:

O_s = \arg\max_{\{s_t\}_{t=2}^{T}} P(s, s_2, \ldots, s_T \mid \lambda) = \arg\min_{\{s_t\}_{t=2}^{T}} U(s, s_2, \ldots, s_T). \qquad (6)

We define ω_{ij} (the weight from state i to state j) as ω_{ij} = −ln(a_{ij}). Then the problem of finding the maximum likelihood state sequence is converted to the problem of finding the shortest state path through a directed weighted graph, and can be solved by the Viterbi algorithm. To discuss the Viterbi algorithm in detail, we introduce two parameters δ_t(j) and ψ_t(j), 1 ≤ j ≤ N, where


\delta_t(j) = \max_{\{s_\tau\}_{\tau=2}^{t-1}} P(s, s_2, s_3, \ldots, s_{t-1}, s_t = j \mid \lambda), \qquad (7)

i.e., δ t ( j ) is the best score (least accumulative weight) along a single path, at time t, which accounts for the first t states and ends in state j. By induction we have

\delta_t(j) = \min_{1 \le i \le N} \big( \delta_{t-1}(i) + \omega_{ij} \big). \qquad (8)

To actually retrieve the state sequence, we need to keep track of the argument which minimized (8), for each t and j. We do this via the array ψ_t(j). The complete procedure for finding the maximum likelihood state sequence can now be stated in pseudo code as follows: for each s ∈ S { for (i = 1; i
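A hedged Python sketch of the recursion in Eq. (8), including the ψ_t(j) backtracking array (function and variable names are ours, not the paper's):

```python
import numpy as np

def max_likelihood_sequence(A, s, T):
    """Find the maximum likelihood state sequence O_s = s, s_2, ..., s_T
    as a shortest path over the weights w_ij = -ln(a_ij), Eqs. (6)-(8)."""
    N = A.shape[0]
    w = -np.log(A + 1e-300)          # avoid log(0) for unobserved transitions
    delta = w[s].copy()              # delta_2(j): path s -> j
    psi = np.zeros((T + 1, N), dtype=int)
    for t in range(3, T + 1):
        cand = delta[:, None] + w    # cand[i, j] = delta_{t-1}(i) + w_ij
        psi[t] = cand.argmin(axis=0)
        delta = cand.min(axis=0)
    # backtrack from the best final state via psi
    seq = [int(delta.argmin())]
    for t in range(T, 2, -1):
        seq.append(int(psi[t][seq[-1]]))
    seq.append(s)
    return seq[::-1]                 # states indexed 0..N-1, length T
```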

… sup(Ri) > sup(Rj); or 3. CO(Ri) = CO(Rj) and sup(Ri) = sup(Rj), but length(Ri) > length(Rj); or 4. CO(Ri) = CO(Rj), sup(Ri) = sup(Rj) and length(Ri) = length(Rj), but Ri is generated earlier than Rj. A rule R1: {t → C} is said to be a general rule w.r.t. a rule R2: {t′ → C′} if and only if t is a subset of t′. Given two rules R1 and R2, where R1 is a general rule w.r.t. R2, we prune R2 if R1 also has a higher rank than R2.

4 Experiments and Results

Coronary arteriography is performed in patients with angina pectoris, unstable angina, previous myocardial infarction, or other evidence of myocardial ischemia. Patients with stenosis with luminal narrowing greater than 50% were recruited as the CAD group; the others were classified as the control group (normal). By using angiography, 390 patients with abnormal (CAD) and 280 patients with normal coronary arteries (control) were studied. The accuracy was obtained by using the methodology of stratified 10-fold cross-validation.

Table 5. Description of summary results

Classifier     Precision  Recall  F-Measure  Class    Root Mean Squared Error
Naïve Bayes    0.814      0.576   0.675      CAD      0.4825
               0.659      0.862   0.747      Control
C4.5           0.88       0.889   0.884      CAD      0.334
               0.882      0.872   0.877      Control
CBA            0.921      0.939   0.93       CAD      0.2532
               0.935      0.915   0.925      Control
CMAR           0.945      0.896   0.92       CAD      0.2788
               0.889      0.941   0.914      Control
Our Model      0.959      0.939   0.949      CAD      0.2276
               0.938      0.957   0.947      Control


We compare our classifier with NB [10] and state-of-the-art classifiers: the widely known decision tree induction algorithm C4.5 [11]; an association-based classifier, CBA [11, 14]; and CMAR [13], a recently proposed classifier extending NB using long itemsets. We used precision, recall, F-measure and root mean squared error to evaluate the performance. The results are shown in Table 5. As can be seen from the table, our classifier outperforms NB, C4.5, CBA and CMAR. We are also satisfied with these results because our model was more accurate than the Bayesian classifier and decision tree, which make the assumption of conditional independence.

5 Conclusions

Most of the parameters employed in diagnosing diseases have both strong and weak points simultaneously. Therefore, it is important to provide multi-parametric indices for diagnosing these diseases in order to enhance the reliability of the diagnosis. The purpose of this paper is to develop an accurate and efficient classification algorithm to automatically diagnose cardiovascular disease. To achieve this purpose, we have introduced an associative classifier that extends CMAR by using a cohesion measure to prune redundant rules. With this technique, we can extract new multi-parametric features that are then used together with clinical information to diagnose cardiovascular disease. The accuracy and efficiency of the experimental results obtained by our classifier are rather high. In conclusion, our proposed classifier outperforms other classifiers, such as NB, C4.5, CBA and CMAR, in regard to accuracy.

References
1. Cohen: Biomedical Signal Processing. CRC Press, Boca Raton, FL (1988)
2. Conumel, P.: ECG: Past and Future. Annals NY Academy of Sciences, Vol. 601 (1990)
3. Pan, J.: A Real-time QRS Detection Algorithm. IEEE Trans. Biomed. Eng. 32 (1985) 230-236
4. Taddei, A., Costantino, G., Silipo, R.: A System for the Detection of Ischemic Episodes in Ambulatory ECG. Computers in Cardiology, IEEE Comput. Soc. Press (1995) 705-708
5. Meste, O., Rix, H., Caminal, P.: Ventricular Late Potentials Characterization in Time-frequency Domain by Means of a Wavelet Transform. IEEE Trans. Biomed. Eng. 41 (1994) 625-634
6. Thakor, N.V., Yi-Sheng, Z.: Applications of Adaptive Filtering to ECG Analysis: Noise Cancellation and Arrhythmia Detection. IEEE Trans. Biomed. Eng. 38 (1991) 785-794
7. Kuo, D., Chen, G.Y.: Comparison of Three Recumbent Positions on Vagal and Sympathetic Modulation Using Spectral Heart Rate Variability in Patients with Coronary Artery Disease. American Journal of Cardiology 81 (1998) 392-396
8. Guzzetti, S., Magatelli, R., Borroni, E.: Heart Rate Variability in Chronic Heart Failure. Autonomic Neuroscience: Basic and Clinical 90 (2001) 102-105
9. Duda, R., Hart, P.: Pattern Classification and Scene Analysis. John Wiley, New York (1973)
10. Quinlan, J.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)
11. Liu, W., Ma, Y.: Integrating Classification and Association Rule Mining. In: Proc. of the 4th International Conference on Knowledge Discovery and Data Mining (1998)


12. Han, J., Pei, J., Yin, Y.: Mining Frequent Patterns without Candidate Generation. In: SIGMOD'00, Dallas, TX (2000)
13. Li, W., Han, J., Pei, J.: CMAR: Accurate and Efficient Classification Based on Multiple Association Rules. In: Proc. of the 2001 International Conference on Data Mining (2001)
14. Kim, J.S., Lee, H.G., Seo, S., Ryu, K.H.: CTAR: Classification Based on Temporal Class-Association Rules for Intrusion Detection. In: Proc. of the 4th International Workshop on Information Security Applications (2003) 101-113

Attentive Person Selection for Human-Robot Interaction

Diane Rurangirwa Uwamahoro1, Mun-Ho Jeong1, Bum-Jae You1, Jong-Eun Ha2, and Dong-Joong Kang3

1 Korea Institute of Science and Technology, {diana, mhjeong, ybj}@kist.re.kr
2 Seoul National University of Technology, [emailprotected]
3 Pusan National University, [emailprotected]

Abstract. We present a method that enables a robot to select the most attentive person for communication from among multiple persons, and to give its attention to the selected person. Our approach is a common components-based HMM in which all HMM states share the same components. The common components are probability density functions of the interaction distance and of people's head direction toward the robot. In order to cope with the fact that the number of people in the robot's field of view is changeable, the number of states with common components can increase and decrease in our proposed model. In the experiments we used a humanoid robot with a binocular stereo camera. The robot considers the people in its field of view at a given time and automatically shifts its attention to the person with the highest probability. We confirmed that the proposed system works well in the selection of the attentive person to communicate with the robot.

Keywords: Common components, Hidden Markov Model, attention.

1 Introduction

Human attention is essential for HRI (Human-Robot Interaction) as well as HCI (Human-Computer Interaction), since attention, sensed in many ways such as by audio signal processing and computer vision technologies, expresses the human intention to communicate with robots or computers. The selection of the attentive person might be of little importance in single-person-to-robot interaction; however, it becomes critical in multiple-person-to-robot interaction, since it is the first step toward interaction or communication. Human attention has been studied in the area of HCI. Ted Selker applied visual attentiveness from eye-motion and eye-gesture to drive some interfaces [1,2]. C. J. Lee et al. designed the Attention Meter, a vision-based input toolkit, which measures attention using camera-based input and was applied to an interactive karaoke space and an interactive night-market [3]. Their primary concerns are how to measure attentiveness or how to apply it to interactive user interfaces, but not how to select an attentive person for interaction.


There have been some studies dealing with human attention for multiple-person-to-robot interaction. Tasaki et al. proposed a method that enables robots to communicate with multiple people using a selection priority for the interactive partner based on the concept of proxemics [4]. Lang et al. presented an attention system for a mobile robot that enables the robot to shift its attention to the person of interest and to maintain attention during interaction [5]. With regard to attentive person selection, however, both systems just used simple rules, like digital gates, combining sound localization with people detection. With this way of treating the locations of sound and people, they fail to capture the continuousness and the uncertainties of the attentiveness clues, such as how far people are from the robot and who is talking. The difficulties in the assessment of attention and the selection of the attentive person result from the following: first, we should choose measurable features adequate for the assessment of attentiveness in HRI. Second, it is difficult to measure the features due to environmental noise. Third, we should consider the history of the features and their uncertainty. Lastly, the selection of the attentive person for interaction depends not only on the above features but also on the interaction strategies and the complex atmosphere of conversation, as in human-to-human interaction. In this paper we present a method to select the most attentive person for multiple-person-to-robot interaction. To express the continuousness and uncertainty of the attentiveness clues, we use probability density functions called common components, which represent human attentiveness related to head direction, distance to the robot, sound location, body motion, voice recognition, and so on. The robot is in a dynamic environment where the number of people in its field of view is variable and they take actions to communicate with the robot over time. To model the variable presence of people and the selection of the most attentive person, we suggest a common components-based HMM (Hidden Markov Model) incorporating common components into an HMM framework. In this model a person corresponds to a state that can vanish and appear. Therefore, the optimal state with the maximum likelihood at a certain time represents the most attentive person, with the highest intention to communicate with the robot. The remainder of this paper is organized as follows. In the following section we explain the common components-based HMM approach proposed in this paper. In Section 3 we give the application of the common components-based HMM to most attentive person selection. Section 4 concludes this paper.

2 Common Components-Based HMM

A Hidden Markov Model (HMM) is a stochastic process in which a time series is generated and analyzed by a probability model [6]. HMMs and their modified types have been widely used in areas of HRI such as gesture recognition [7,8,9], face recognition [10] and speech recognition [11].


Fig. 1. Common Component

Fig. 2. Complement of Common Component

A conventional HMM has a fixed number of states representing probability distributions, and transition probabilities between the states. This property expresses complex distributions of time series well, but it rules out applications in which the number of states changes over time. The common components-based HMM proposed in this section can overcome this limitation, since it utilizes an appointed rule forming states with probabilistic components shared by all states.

2.1 State with Common Components

Common components, f(c_k), are probability density functions of observations and compose the basis functions of the states as follows:

p(x) = f(c_1)\, f(c_2) \cdots f(c_K), \qquad x = (c_1, c_2, \ldots, c_K)^T, \qquad (1)

where K is the size of the measurement vector x. f(c_k) is illustrated in Fig. 1 and should satisfy the condition of a probability density function (pdf), mathematically expressed as

\int_a^b f(c_k)\, dc_k = 1, \qquad 1 \le k \le K, \qquad (2)

where the section between a and b represents the sensing scope. We also define the complement of the basis function of the states,

\bar{p}(x) = \bar{f}(c_1)\, \bar{f}(c_2) \cdots \bar{f}(c_K). \qquad (3)

\bar{f}(c_k) is shown in Fig. 2, where M is a constant and \bar{f}(c_k) is obtained by

\bar{f}(c_k) = M - f(c_k), \qquad \int_a^b \bar{f}(c_k)\, dc_k = 1. \qquad (4)

Based on the basis functions of the states, we define the probability distributions of the states by

P(o_t \mid q_t = s_1) = p(x_{s_1})\, \bar{p}(x_{s_2})\, \bar{p}(x_{s_3}) \cdots \bar{p}(x_{s_{N_t}}),
P(o_t \mid q_t = s_2) = \bar{p}(x_{s_1})\, p(x_{s_2})\, \bar{p}(x_{s_3}) \cdots \bar{p}(x_{s_{N_t}}),
\vdots
P(o_t \mid q_t = s_{N_t}) = \bar{p}(x_{s_1})\, \bar{p}(x_{s_2}) \cdots \bar{p}(x_{s_{N_t-1}})\, p(x_{s_{N_t}}). \qquad (5)

where q_t ∈ {s_1, s_2, …, s_{N_t}} is the state variable, o_t = (x_{s_1}, x_{s_2}, …, x_{s_{N_t}})^T is the observation vector and N_t is the variable number of states. Equations (1) and (5) show the good scalability of the states based on the common components: when the number of states changes, the probability distributions of the states are easily updated by the rule shown in Equation (5), noting that the observation vector does not have a fixed size, due to N_t. A short sketch of this construction is given below.
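For illustration, a minimal Python sketch of Eqs. (1), (3) and (5), assuming the per-component pdf f and its complement f̄ are given (all names are ours):

```python
import numpy as np

def state_likelihoods(obs, f, f_bar):
    """P(o_t | q_t = s_i) for every state, per Eq. (5).
    obs  : list of measurement vectors x_s, one per currently visible person
    f    : basis function, f(x) = prod_k f(c_k)            (Eq. (1))
    f_bar: its complement, f_bar(x) = prod_k f_bar(c_k)    (Eq. (3))"""
    p = np.array([f(x) for x in obs])
    p_bar = np.array([f_bar(x) for x in obs])
    n = len(obs)
    out = np.empty(n)
    for i in range(n):
        # the state's own person contributes p; all the others contribute p_bar
        out[i] = p[i] * np.prod(np.delete(p_bar, i))
    return out
```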

2.2 Optimal State Selection

We defined the states of the common components-based HMM in the previous section. The other elements necessary for constituting the common components-based HMM are similar to the typical ones of an HMM, such as the initial state probabilities π_s and the state transition probabilities A_{s_i s_j}. A small difference from the conventional HMM is caused by the fact that those elements should be updated according to changes in the number of states. The Viterbi algorithm is used to find the most probable sequence of hidden states maximizing P(Q_t, O_t), where Q_t = {q_1 q_2 … q_t} and O_t = {o_1 o_2 … o_t}. Fortunately, the Viterbi algorithm can be extended to the proposed model by the use of N_t instead of the fixed number of states N of the conventional HMM.

3 Application to Most Attentive Person Selection

The proposed method was implemented in a humanoid robot, MAHRU [12], with a binocular stereo vision camera system to capture images. The robot has to select one person from among the many people in its field of view and give its attention to the selected person. The number of people in the field of view of the robot (participants) varies over time. Experiments were conducted to ascertain the functions of the proposed method. The states of the common components-based HMM at a particular time correspond to the participants at that time. We defined the common components of the distance between the robot and a person, and of the head direction of a person, using Gaussian distributions as follows:

f(c_k^{(s)}) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\!\left( \frac{-(c_k^{(s)} - \mu_k)^2}{2\sigma_k^2} \right) + \epsilon, \qquad (6)

where k = 1 for the distance, k = 2 for the head direction, and ε is a constant value for the fulfillment of (2). The measurement vector of each state is defined as

x_s = (c_1^{(s)}, c_2^{(s)}). \qquad (7)
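A small Python sketch of these components; the experiments reported below only suggest that the preferred interaction distance is near 1.5 m, so all parameter values here are illustrative guesses:

```python
import numpy as np

EPS = 1e-3                      # the constant added for fulfillment of Eq. (2)
MU = np.array([1.5, 0.0])       # preferred distance (m) and frontal head direction (deg)
SIGMA = np.array([0.5, 15.0])   # spreads -- illustrative values only

def common_component(c, k):
    """f(c_k^(s)) of Eq. (6) for k = 0 (distance) or k = 1 (head direction)."""
    g = np.exp(-(c - MU[k]) ** 2 / (2.0 * SIGMA[k] ** 2)) / (np.sqrt(2 * np.pi) * SIGMA[k])
    return g + EPS

def component_complement(c, k, M=1.0):
    """The complement of Eq. (4): f_bar = M - f, with M a constant."""
    return M - common_component(c, k)
```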

(7)

732

D.R. Uwamahoro et al.

(a) Step 1

(b) Step 2

(c) Step 3

Fig. 3. Most Attentive Person Without Changes in the Number of People . Step 1 : There are two participants; Person A (left) and person B (right). They are at the same distance from the robot’s location, A is looking at the robot and B is not looking at the robot. A is selected (the circle within the face shows the selected person). Step 2 : B came near the robot looking at the robot, A stayed at the same location as in Step 1 and is looking at the robot. B is selected. Step 3 : A came near the robot with a very short distance less than 0.5 meters between him and the robot, B went back a little bit and continued to look at the robot. B is selected.

The observation vector at time t noted by ot groups the measurements of each state at time t, (s )

(s )

(s

)

(s

)

ot = (xs1 , . . . , xsNt ) = (c1 1 , c2 1 , . . . , c1 Nt , c2 Nt ).

(8)

The dimension of the observation vector at time t depends on the number of states at that time multiplied by the number of measurements; in this case it is given by dim o_t = K · N_t = 2N_t. The initial probabilities of the states are defined as

\pi_{s_i} = \frac{1}{N_0}, \qquad s_1 \le s_i \le s_{N_0},\ N_0 \neq 0. \qquad (9)

The transition probabilities between the states are set to

A_{s_i s_i} = 0.95, \qquad A_{s_i s_j} = \frac{0.05}{N_t - 1},\ i \neq j, \qquad s_1 \le s_i, s_j \le s_{N_t}. \qquad (10)
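A small sketch of how these elements can be rebuilt whenever N_t changes (a minimal illustration, not the authors' code):

```python
import numpy as np

def init_probs(n):
    """Initial state probabilities, Eq. (9): uniform over the n visible people."""
    return np.full(n, 1.0 / n)

def transition_matrix(n, stay=0.95):
    """Transition probabilities, Eq. (10): strong self-transition, rest shared."""
    if n == 1:
        return np.ones((1, 1))
    A = np.full((n, n), (1.0 - stay) / (n - 1))
    np.fill_diagonal(A, stay)
    return A

# Whenever a person enters or leaves the field of view, simply rebuild:
# pi, A = init_probs(N_t), transition_matrix(N_t)
```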

Using the Open Source Computer Vision Library (OpenCV) face detector [13], we detect human faces in the captured images and calculate their head directions. The distance from the robot's location is calculated by estimating the average depth within the regions of the detected faces. Figure 3 shows the experimental result for the most attentive person in the case where the number of participants is fixed. The results show that the proposed method describes human attentiveness well, considering the continuousness and uncertainty of measurements such as distance and direction. As said, the proposed common components-based HMM allows the state size to vary to cope with changes in the number of participants, as Fig. 4 confirms.

Fig. 4. Most attentive person with changes in the number of people ((a)-(f) Steps 1-6). Step 1: A (left), B (centre) and C (right). A is not looking straight at the robot. B is looking straight at the robot and his distance from the robot is closer to 1.5 meters - the best distance - than those of A and C. C is not looking at the robot. B is selected (the circle within the face shows the selected person). Step 2: A came close to the robot and is looking straight at it; B went back and continues to look straight at the robot; C stayed at the same location and is looking at the robot. A is selected. Step 3: There are three people, A, B and C, but the robot can see only one face: face A cannot be recognized because it is only partially visible, and B's head direction is greater than 45°, in which case it cannot be recognized as a face. C is the only participant and is selected. Step 4: There are three participants, A, C and D (a new participant on the right). All are looking straight at the robot but are at different distances from it; C is at a distance closer to 1.5 meters than A and D. C is selected. Step 5: There are four participants, A, B, C and D. C and D are looking straight at the robot but at different distances; D is at a better distance than the others. D is selected. Step 6: Three participants remain, A, B and D. D moved close to the robot; A and B kept their locations from the previous step and are looking straight at the robot. D is selected.

4 Conclusion and Further Work

In this paper, we have presented a common components-based HMM to select the most attentive person for multiple-person-to-robot interaction. The use of common components and an appointed rule forming states made it possible to overcome the limitation of the HMM due to its fixed state size. We also found that the Viterbi algorithm remains feasible when the number of states is variable. We have implemented the proposed method in a humanoid robot, MAHRU [12], to enable the robot to select one person for communication from among many participants. While participants moved in and out and changed their head directions, the robot successfully shifted its attention to the participant with the highest intention to communicate.


There are two main directions for future work. The first is to estimate the parameters of the common components-based HMM by learning, and to show the effectiveness of the rule forming the states with common components theoretically. The second, in order to make the most of the scalability of the states based on the common components, is to incorporate sound localization and body motion into the common components.

References
1. Selker, T., Snell, J.: The Use of Human Attention to Drive Attentive Interfaces. Invited paper, CACM (2003)
2. Selker, T.: Visual Attentive Interfaces. BT Technology Journal, Vol. 22, No. 4 (2004) 146-150
3. Lee, C.H.J., Jang, C.Y.I., Chen, T.H.D., Wetzel, J., Shen, Y.T.B., Selker, T.: Attention Meter: A Vision-based Input Toolkit for Interaction Designers. CHI 2006 (2006)
4. Tasaki, T., Matsumoto, S., Ohba, H., Toda, M., Komatani, K., Ogata, T., Okuno, H.G.: Dynamic Communication of Humanoid Robot with Multiple People Based on Interaction Distance. Proc. of 2nd International Workshop on Man-Machine Symbiotic Systems (2004) 329-339
5. Lang, S., Kleinehagenbrock, M., Hohenner, S., Fritsch, J., Fink, G.A., Sagerer, G.: Providing the Basis for Human-Robot-Interaction: A Multi-modal Attention System for a Mobile Robot. International Conference on Multimodal Interfaces (2003) 28-35
6. Rabiner, L.R.: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. IEEE, Vol. 77 (1989) 257-286
7. Jeong, M.H., Kuno, Y., Shimada, N., Shirai, Y.: Recognition of Shape-Changing Hand Gestures. IEICE Trans. on Information and Systems, Vol. E85-D (2002) 1678-1687
8. Jeong, M.H., Kuno, Y., Shimada, N., Shirai, Y.: Recognition of Two-Hand Gestures Using Coupled Switching Linear Model. IEICE Trans. on Information and Systems, Vol. E86-D (2002) 1416-1425
9. Starner, T., Pentland, A.: Real Time American Sign Language Recognition from Video Using Hidden Markov Models. Technical Report 375, MIT Media Lab (1996)
10. Othman, H., Aboulnasr, T.: A Tied-Mixture 2D HMM Face Recognition System. Proc. 16th International Conference on Pattern Recognition (ICPR'02), Vol. 2 (2002) 453-456
11. Peinado, A., Segura, J., Rubio, A., Sanchez, V., Garcia, P.: Use of Multiple Vector Quantisation for Semicontinuous-HMM Speech Recognition. Vision, Image and Signal Processing, IEE Proceedings, Vol. 141 (1994) 391-396
12. MAHRU: http://humanoid.kist.re.kr (2006)
13. Intel: Open Source Computer Vision (OpenCV) Library. http://www.intel.com/technology/computing/opencv (Retrieved October 2005)

Basal Cell Carcinoma Detection by Classification of Confocal Raman Spectra

Seong-Joon Baek and Aaron Park

The School of Electronics and Computer Engineering, Chonnam National University, Gwangju, South Korea, 500-757 [emailprotected]

Abstract. In this study, we propose a simple preprocessing method for the classification of basal cell carcinoma (BCC), one of the most common skin cancers. The preprocessing step consists of data clipping with a half Hanning window and dimension reduction with principal components analysis (PCA). The application of the half Hanning window de-emphasizes the peak near 1650 cm⁻¹ and improves classification performance by lowering the false positive ratio. Classification results with various classifiers are presented to show the effectiveness of the proposed method. The classifiers include the maximum a posteriori (MAP) probability, k-nearest neighbor (KNN), and artificial neural network (ANN) classifiers. Classification with the ANN on 216 confocal Raman spectra preprocessed with the proposed method gave 97.3% sensitivity, a very promising result for automatic BCC detection.

1 Introduction

Skin cancer is one of the most common cancers in the world. Recently, the incidence of skin cancer has increased dramatically due to excessive exposure of the skin to UV radiation caused by ozone layer depletion, environmental contamination, and other factors. If detected early, skin cancer has a cure rate of 100%. Unfortunately, early detection is difficult because diagnosis is still based on morphological inspection by a pathologist. There are two common skin cancers: basal cell carcinoma (BCC) and squamous cell carcinoma (SCC). Both are nonmelanoma skin cancers, and BCC is the most common skin neoplasm [1]. Accurate detection of BCC has therefore attracted much attention from clinical dermatologists, since it is difficult to distinguish BCC tissue from surrounding noncancerous tissue. The routine diagnostic technique for detecting BCC is pathological examination of biopsy samples. However, this method relies upon subjective judgment, which depends on the level of experience of the individual pathologist. Thus, a fast and accurate diagnostic technique for the initial screening and selection of lesions for further biopsy is needed [2]. Raman spectroscopy has the potential to resolve this problem. It can be applied to provide an accurate medical diagnosis distinguishing BCC tissue from



surrounding normal (NOR) tissue. Recently, a direct observation method based on the confocal Raman technique was presented for the dermatological diagnosis of BCC [2]. According to that study, confocal Raman spectra provided promising results for the detection of precancerous and noncancerous lesions without special treatment. Hence, with confocal Raman spectra, we can design an automatic classifier with robust detection results. In this paper, we propose a simple preprocessing method for the classification of BCC. Experiments with three kinds of classifiers, MAP, KNN and ANN, were carried out to verify the effectiveness of the proposed method.

2 Raman Measurements and Preprocessing of Data

The tissue samples were prepared with the conventional treatment, exactly as in [2]. BCC tissues were sampled from 10 patients using a routine biopsy. Cross sections of 20 μm were cut with a microtome at −20 °C and stored in liquid nitrogen. Two thin sections from every patient were used for the experiments. One section was used for classification; the other was stained with H&E and used as a reference after an expert pathologist located the boundaries between BCC and NOR with a routine cancer diagnosis. The confocal Raman spectra of the skin samples are shown in Fig. 1, where no strong background noise is observed. In Fig. 1A there is a clear distinction between BCC and NOR tissues, and most of the spectra belong to this case. Fig. 1B shows the case where a BCC spectrum is measured in the vicinity of the boundary between BCC and NOR. Since a peak near 1600 cm⁻¹ is a distinctive feature of BCC spectra, as seen in Fig. 1A, the BCC spectrum in Fig. 1B can be classified as BCC even though the feature is not so evident. Fig. 1C shows an outlier, where the BCC spectrum was obtained in the middle of the BCC region but looks very similar to that of NOR. A similar spectrum can be found in Fig. 2B (g); this case is discussed in a later section. The skin biopsy was performed in the direction perpendicular to the skin surface, and the same holds for the spectral measurements; that is the direction from the epidermis to the dermis in Figs. 2A and 2B. Raman spectra of BCC

Fig. 1. Confocal Raman spectra from three patients at different spots (panels A–C; NOR and BCC traces; axis: Raman shift, 600–1800 cm⁻¹)

Fig. 2. Confocal Raman profiles of skin tissue with an interval of 30–40 μm (panels A and B; traces a–j spanning NOR, BCC and NOR regions from the skin surface downward; axis: Raman shift, 600–1800 cm⁻¹)

tissues were measured at different spots with an interval of 30–40 μm. In this way, 216 Raman spectra were collected from the 10 patients. We normalized the spectra so that they fall in the interval [−1, 1], the so-called min-max method. There are, of course, many normalization methods; for example, one can normalize a data set so that the inputs have zero mean and unit standard deviation, or so that all spectra have the same area. According to our preliminary experiments, however, the min-max method gave the best results, so we adopted this simple normalization. After normalization, we applied a clipping window to discard unnecessary data, which generally degrade the performance of a classifier. According to previous work [2], the main spectral differences between BCC and NOR lie in the regions 1220–1300 cm⁻¹ and 1640–1680 cm⁻¹, which are also observed in Fig. 1. We therefore discarded the data below 1200 cm⁻¹. In addition, the data in the 1600–1800 cm⁻¹ region were windowed by a half Hanning window. The presence of a high peak near 1650 cm⁻¹ is a marker of NOR tissue, while a high peak near 1600 cm⁻¹ is a marker of BCC tissue, as seen in Fig. 1A. BCC spectra measured in the vicinity of the boundary often possess both peaks and are classified as NOR even though the characteristics of the other regions are similar to those of BCC. Thus the application of a half Hanning window should improve the classification rates by lowering the peak near 1650 cm⁻¹, or at least lower the false positive ratio. A half Hanning window is defined as w[n] = 0.5 − 0.5 cos(2πn/M), 0 ≤ n ≤ M/2. The overall data window used in the experiments is plotted in Fig. 3. For dimension reduction, the well-known PCA was adopted. PCA identifies orthogonal bases on which projections are uncorrelated; dimension is reduced by discarding transformed components with low variance, as measured by the corresponding eigenvalues. The number of retained principal components was determined experimentally to be 5. A sketch of this preprocessing chain is given below.
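The following Python sketch illustrates the preprocessing chain described above: min-max normalization to [−1, 1], clipping below 1200 cm⁻¹, a half Hanning taper over the 1600–1800 cm⁻¹ region, and PCA reduction to 5 components. The array layout, band edges and function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import PCA

def preprocess(spectra, wavenumbers, n_pc=5):
    """Sketch: min-max normalization, clipping, half Hanning taper, PCA.
    Assumes spectra is an (n_spectra, n_bands) array and wavenumbers
    is sorted in ascending order (both are illustrative assumptions)."""
    lo = spectra.min(axis=1, keepdims=True)
    hi = spectra.max(axis=1, keepdims=True)
    x = 2.0 * (spectra - lo) / (hi - lo) - 1.0     # min-max to [-1, 1]

    keep = wavenumbers >= 1200                      # discard low-shift region
    x, w = x[:, keep], wavenumbers[keep]

    # Half Hanning window w[n] = 0.5 - 0.5*cos(2*pi*n/M), 0 <= n <= M/2,
    # oriented to equal 1 near 1600 cm^-1 and 0 at 1800 cm^-1 so that
    # the NOR marker peak near 1650 cm^-1 is de-emphasized.
    band = (w >= 1600) & (w <= 1800)
    m = band.sum()
    ramp = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(m) / (2 * m - 2))
    taper = np.ones(w.size)
    taper[band] = ramp[::-1]
    x = x * taper

    # PCA: keep the components with the largest variance
    return PCA(n_components=n_pc).fit_transform(x)
```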

Fig. 3. The clipping data window combined with a half Hanning window (window value 0–1 over Raman shift 1200–1800 cm⁻¹)

3 Experimental Results

Three types of classifiers, MAP, KNN and ANN, were examined. In MAP classification, we select the class w_i that maximizes the posterior probability P(w_i | x). Given equal prior probabilities, this is equivalent to selecting the class that maximizes the class conditional probability density. Let w₁ and w₂ be the BCC and NOR classes, respectively. The MAP classification rule is then expressed as follows [3]: decide w₁ if P(x | w₁) ≥ P(x | w₂), where the conditional probability density is modeled as a multivariate Gaussian. We used the Mahalanobis distance for KNN classification. The discriminant function of the KNN classifier, g_i(x), is the number of class-i training data among the k nearest neighbors of x; k was set experimentally to 5. The KNN algorithm requires an amount of computation proportional to the number of training data, but many fast algorithms are available; in the experiments we used the algorithm in [4]. As the ANN, multilayer perceptron (MLP) networks were employed. The extreme flexibility of an MLP often calls for careful control of overfitting and detection of outliers [5], but for well-separated data, overly careful parameter adjustment is not necessary. In the experiments, the number of hidden units was set to 9 and a sigmoid activation function was used. Since there are only two classes, we used one output unit. The ANN models were trained with the back-propagation algorithm to output −1 for the NOR class and +1 for the BCC class; at the classification stage, the output value is hard-limited to give the classification result. Because the performance of an MLP varies with the initial conditions, the experiments were carried out 20 times and the results averaged. The overall 216 data were divided into two groups, a training set and a test set: the data from 9 patients were used for training and the data from the remaining patient for testing. Once classification is complete, the data from one patient are removed from the training set and used as new test data, and the previous test data are inserted into the training set. In this way the data from every patient were used as a test set. The average numbers of BCC and NOR spectra are 8 and 14 in the test set, and 68 and 126 in the training set, respectively. A sketch of this protocol is given below.
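The following Python sketch illustrates the Gaussian MAP rule together with the leave-one-patient-out protocol described above; equal priors are assumed, and all names and array layouts are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def map_classify(x, params):
    """Gaussian MAP rule: decide the class whose multivariate normal
    log-density is largest (equal priors assumed)."""
    scores = []
    for mean, cov in params:
        d = x - mean
        sign, logdet = np.linalg.slogdet(cov)
        scores.append(-0.5 * (d @ np.linalg.inv(cov) @ d + logdet))
    return int(np.argmax(scores))

def leave_one_patient_out(features, labels, patient_ids):
    """Hold out all spectra of one patient at a time, train on the rest."""
    correct = total = 0
    for pid in np.unique(patient_ids):
        test = patient_ids == pid
        params = []
        for c in (0, 1):  # 0 = NOR, 1 = BCC (illustrative coding)
            xc = features[~test & (labels == c)]
            params.append((xc.mean(axis=0), np.cov(xc, rowvar=False)))
        for x, y in zip(features[test], labels[test]):
            correct += map_classify(x, params) == y
            total += 1
    return correct / total
```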


The classification results without the data windows are summarized in Table 1. In the table, we can see that the sensitivity of every method is over 91.5%. Among them, MAP and ANN show sensitivity over 93% and outperform KNN. Since there were not enough BCC data, nonparametric methods such as KNN may be inferior to the others; the specificity of KNN, however, is nearly equal to that of the others for NOR detection.

Table 1. Classification results with original data (%). Stars indicate the decision of an expert pathologist.

         MAP            KNN            ANN
         BCC    NOR     BCC    NOR     BCC    NOR
BCC*     93.0    7.0    91.8    8.2    93.2    6.8
NOR*      4.2   95.8     6.9   97.1     5.6   96.4

To show the effectiveness of the proposed data window, another set of experiments was carried out with the window. The results are shown in Table 2. Even with the simple clipping window alone the classification performance improves, and it improves further when the clipping window is combined with a half Hanning window. With the half Hanning window, the sensitivity of every method is over 94%. The average sensitivity increased by 0.73% and the average specificity by 0.53%, which indicates that the half Hanning window contributes more to lowering the false positive ratio than the false negative ratio. In the case of the ANN, the false positive ratio was reduced from 6.2% to 3.1%, and the overall true classification rate is about 97%. Considering that this performance gain is achieved at essentially no cost, the use of the proposed data window is easily justified.

Table 2. Classification results with data windowing (%). Stars indicate the decision of an expert pathologist.

                          MAP            KNN            ANN
                          BCC    NOR     BCC    NOR     BCC    NOR
Simple Clipping   BCC*    94.5    5.5    94.6    5.4    96.5    3.5
                  NOR*     2.9   97.1     3.8   96.2     5.6   96.4
Half Hanning      BCC*    94.6    5.4    95.9    4.1    97.3    2.7
                  NOR*     2.3   97.7     2.9   97.1     3.5   96.5

Even though the classification error rates are already small, there is room to improve performance further. Careful examination of the erroneous data reveals an interesting pattern: many of the false positive errors arise in the middle of the BCC region. Fig. 1C and Fig. 2B (g) are such examples. Considering that confocal Raman spectroscopy focuses on a very small region, normal tissue could by chance be in focus instead of BCC tissue. Since BCC tissue is marked as a region, there is a possibility of false marking. Hence, we


are currently investigating methods to fix this kind of problem. Taking this into consideration, we could say that the classification is almost perfect, especially for the detection of BCC.

4 Conclusion

In this paper, we proposed a simple preprocessing method for the classification of basal cell carcinoma (BCC), one of the most common skin cancers. The preprocessing step consists of data clipping with a half Hanning window and dimension reduction with principal components analysis (PCA). The experimental results with and without the data window show that applying the window lowers the false positive ratio. The ANN classification performance on the 216 Raman spectra was about 97% when the data were processed with the proposed window. With these promising results, we are currently developing automatic BCC detection tools.

Acknowledgement

This work was supported by grant No. RTI-04-03-03 from the Regional Technology Innovation Program of the Ministry of Commerce, Industry and Energy (MOCIE) of Korea.

References
1. Nijssen, A., Schut, T.C.B., Heule, F., Caspers, P.J., Hayes, D.P., Neumann, M.H., Puppels, G.J.: Discriminating Basal Cell Carcinoma from its Surrounding Tissue by Raman Spectroscopy. Journal of Investigative Dermatology 119 (2002) 64–69
2. Choi, J., Choo, J., Chung, H., Gweon, D.-G., Park, J., Kim, H.J., Park, S., Oh, C.H.: Direct Observation of Spectral Differences between Normal and Basal Cell Carcinoma (BCC) Tissues Using Confocal Raman Microscopy. Biopolymers 77 (2005) 264–272
3. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. John Wiley & Sons, Inc. (2001)
4. Baek, S.J., Sung, K.-M.: Fast KNN Search Algorithm for Nonparametric Classification. IEE Electronics Letters 35 (2000) 2104–2105
5. Gniadecka, M., Wulf, H., Mortensen, N., Nielsen, O., Christensen, D.: Diagnosis of Basal Cell Carcinoma by Raman Spectra. Journal of Raman Spectroscopy 28 (1997) 125–129

Blind Signal-to-Noise Ratio Estimation Algorithm with Small Samples for Wireless Digital Communications∗

Dan Wu, Xuemai Gu, and Qing Guo

Communication Research Center of Harbin Institute of Technology, Harbin, China {wudan, xmgu, qguo}@hit.edu.cn

Abstract. To extend the range of blind signal-to-noise ratio (SNR) estimation while reducing complexity, a new algorithm is presented based on a signal-subspace approach that uses the sample covariance matrix of the received signal and the combined information criterion (CIC) from information theory. CIC overcomes both the under-penalization of the Akaike information criterion (AIC) and the over-penalization of the minimum description length (MDL), and its likelihood form is derived. The algorithm needs no prior knowledge of the modulation type, baud rate or carrier frequency of the signals. Computer simulation shows that the algorithm can blindly estimate the SNR of commonly used digital modulation signals in additive white Gaussian noise (AWGN) channels and Rayleigh fading channels with small samples, and the mean estimation error is less than 1 dB for SNR ranging from −15 dB to 25 dB. The accuracy and simplicity of the method make it well suited to engineering applications.

1 Introduction

Signal-to-noise ratio (SNR) is a measure of signal strength relative to background noise, and it is one of the most important criteria for the quality of information transmission. In modern wireless communication systems, precise knowledge of the SNR is often required by many algorithms for their optimal performance. For example, SNR estimates are typically employed in power control, mobile-assisted hand-off, adaptive modulation schemes, and soft decoding procedures [1,2]. Estimating the SNR and providing this estimate to the data detector are essential to the successful functioning of any communications receiver. SNR estimators can be divided into two classes. One class is the data-aided estimator, for which known (or pilot) data is transmitted and the estimator at the receiver uses the known data to estimate the SNR. The other class is the non-data-aided estimator: no known data is transmitted, so the estimator at the receiver has to estimate the SNR "blindly". Although the data-aided estimator performs better than the non-data-aided estimator, it is not suitable for non-cooperative situations. In this paper, a non-data-aided, or blind, SNR estimator is considered. Some methods have been proposed recently.

∗ This work is supported by the National 863 Projects of China (item number 2004AA001210).


In [3], SNR estimation in the frequency domain was introduced, using circular correlation for M-ary phase shift keying (MPSK), but the method is not suitable for other modulation types. A fourth-order moments method was applied in [4] for constant-envelope modulations, and an envelope-based estimation method was proposed in [5] for nonconstant-envelope modulations; both of these methods require prior knowledge of the envelope. In [6], an iterative SNR estimation algorithm for negative SNRs was developed; however, the method has relatively high bias at low SNR (when the SNR is below −10 dB, the bias exceeds 3 dB). Blind SNR estimation can be employed in many active fields of information warfare, such as threat analysis and electronic surveillance systems. These applications place high demands on estimation speed and SNR range. However, the performance of the methods mentioned above decreases when the number of samples is not large enough, and even with an adequate number of samples, performance is not satisfactory when the SNR is below zero. In this paper, a new blind SNR estimation algorithm is presented based on eigenvalue decomposition of the correlation matrix of the received signals and the principle of the combined information criterion (CIC) [7]. Compared with the Akaike information criterion (AIC) and minimum description length (MDL), the algorithm using CIC gives more accurate results in additive white Gaussian noise (AWGN) channels at low SNR with small samples. When applied to Rayleigh fading channels, the performance is also acceptable. This paper is organized as follows. After the statement and formulation of the problem in Section 2, the blind SNR estimation algorithm is introduced in Section 3. Section 4 presents the computer simulation results, and Section 5 draws the conclusions.
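To make the signal-subspace idea concrete before the formal development, the following Python sketch estimates the SNR from the eigenvalues of the sample covariance matrix. It substitutes the classical MDL criterion for the paper's CIC, and the snapshot length L and all names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def blind_snr_db(y, L=20):
    """Illustrative subspace SNR estimator: eigendecompose the L x L
    sample covariance, select the signal-subspace rank with MDL
    (the paper uses CIC instead), then split signal and noise power
    across the eigenvalues."""
    N = len(y) - L + 1
    Y = np.lib.stride_tricks.sliding_window_view(y, L)   # N x L snapshots
    R = (Y.conj().T @ Y) / N
    lam = np.maximum(np.sort(np.linalg.eigvalsh(R))[::-1], 1e-12)

    # Model-order selection: minimize the Wax-Kailath MDL over rank q
    mdl = []
    for q in range(L):
        tail = lam[q:]
        ratio = np.exp(np.mean(np.log(tail))) / np.mean(tail)
        mdl.append(-N * (L - q) * np.log(ratio)
                   + 0.5 * q * (2 * L - q) * np.log(N))
    q = int(np.argmin(mdl))

    sigma2 = lam[q:].mean()                    # noise power estimate
    psig = max((lam[:q] - sigma2).sum() / L, 1e-12)
    return 10.0 * np.log10(psig / sigma2)
```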

2 Problem Formulation

Assume y(t) is the signal received over an AWGN channel. The sampled signal model is then

y(k) = s(k) + n(k),  (1)

where s(k) is a digitally modulated signal of unknown modulation type and n(k) is an independent zero-mean Gaussian random process with second-order moments

E[n(k) n^H(l)] = σ_N² I δ_kl,  (2)

E[n^T(k) n(l)] = 0,  (3)

where x^H denotes the Hermitian transpose of x, x^T the transpose of x, σ_N² the noise power, δ_kl the Kronecker delta, and I the identity matrix. Let Y(k) = [y(k) y(k+1) … y(k+L−1)]; then

Y(k) = S(k) + N(k),  (4)


where S(k) = [s(k) s(k+1) … s(k+L−1)] and N(k) = [n(k) n(k+1) … n(k+L−1)]. The L-order covariance matrix of the received signal is

R_yy = E(Y Y^H) = E((S+N)(S+N)^H) = E(S S^H) + σ_N² I = R_ss + σ_N² I,  (5)

where R_ss is the covariance matrix of the original signal. By the properties of covariance, R_ss is a positive semi-definite matrix whose rank is assumed to be q.

Let φ(·) be the probability density function of a standard normal distribution. In Bayesian unsupervised segmentation using parametric estimation, the segmentation problem rests on model identification. The most commonly used estimator is the ML estimator, classically computed with the EM algorithm [4]. Here, the mixture parameters are estimated by the BSAEM algorithm, which consists of four steps. Given a Bootstrap sample X₀* available from the original image X₀, and after initializing the parameters from the SAR image histogram, the details are as follows.

4.1 Expectation Step

The posterior probability for a pixel X*(s) ∈ X₀* to belong to class k at the current iteration is given by

τ_{s,k} = π_k (1/σ_k) φ(e*_{s,k}/σ_k) / Σ_{k'=1}^{K} π_{k'} (1/σ_{k'}) φ(e*_{s,k'}/σ_{k'}),  k = 1, …, K,  (2)

where e*_{s,k} = X*(s) − a_{k,0} − a_{k,1} X*(sγ) − … − a_{k,p} X*(sγ^p), and X*(sγ) denotes the parent of X*(s).


4.2 Stochastic Step

Construct a Bernoulli random variable z_{s,k} with parameter τ_{s,k}.

4.3 Annealing Step

From z_{s,k} and τ_{s,k}, construct another random variable

w_{s,k} = τ_{s,k} + h_n (z_{s,k} − τ_{s,k}),  (3)

where h_n is a given sequence that slowly decreases to zero during the iterations.

4.4 Maximization Step

In this step, w_{s,k} is treated as the a posteriori probability of X*(s), so that at the next iteration we have

π̂_k = (1/N) Σ_{s: m(s)=0} w_{s,k},  k = 1, …, K,  (4)

σ̂_k² = Σ_{s: m(s)=m} w_{s,k} [X*(s) − â_{k,0} − â_{k,1} X*(sγ) − … − â_{k,p} X*(sγ^p)]² / Σ_{s: m(s)=m} w_{s,k},  k = 1, …, K,  (5)

where (â_{k,0}, â_{k,1}, …, â_{k,p}) satisfy the system of equations

Σ_{s: m(s)=m} w_{s,k} X*(s) μ(X*(s), i) = Σ_{j=0}^{p} â_{k,j} Σ_{s: m(s)=m} w_{s,k} μ(X*(s), j) μ(X*(s), i),  (6)

where μ(X*(s), i) = 1 for i = 0 and μ(X*(s), i) = X*(sγ^i) for i > 0. The parameter estimates are obtained by iterating the four steps until convergence. The parameters K and p_k can be selected by the Bayesian information criterion (BIC). Once the number of SAR image regions has been detected and the model parameters estimated, SAR image segmentation is performed by classifying pixels with the Bayesian classifier, i.e., each X(s) is assigned the class

k(X(s)) = arg max_{1≤j≤K} [π_j (1/σ_j) φ(e_{js}/σ_j)].  (7)
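The four steps can be summarized in the following Python sketch for a plain Gaussian mixture: the multiscale regression residuals e*_{s,k} are simplified to X*(s) − μ_k, and the annealing sequence h_n = 1/(1+n) is an assumption for illustration, not the authors' choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def bsaem(x, K=2, iters=100):
    """Sketch of one BSAEM run on a 1-D Gaussian mixture."""
    n = x.size
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, np.linspace(0.1, 0.9, K))
    sig = np.full(K, x.std())
    for it in range(iters):
        # E-step: posteriors tau_{s,k}, cf. (2)
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sig) ** 2) / sig
        tau = dens / dens.sum(axis=1, keepdims=True)
        # S-step: one-hot class draws z_{s,k} with parameters tau
        u = rng.random((n, 1))
        draw = np.minimum((tau.cumsum(axis=1) < u).sum(axis=1), K - 1)
        z = np.eye(K)[draw]
        # A-step: w = tau + h_n (z - tau), with h_n -> 0, cf. (3)
        h = 1.0 / (1 + it)
        w = tau + h * (z - tau)
        # M-step: weighted parameter updates, cf. (4)-(5)
        wk = w.sum(axis=0)
        pi = wk / n
        mu = (w * x[:, None]).sum(axis=0) / wk
        sig = np.sqrt((w * (x[:, None] - mu) ** 2).sum(axis=0) / wk)
    return pi, mu, sig
```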

4 Experimental Results for SAR Imagery

To demonstrate the segmentation performance of the proposed algorithm, we apply it to two complex SAR images, each of 128 × 128 pixels, consisting of


woodland and cornfield (see Fig. 2(a)). From the complex images, we generate the above-mentioned quadtree representation consisting of L = 3 levels and use a second-order regression. We randomly select 900 representative pixels from the original images. An unsupervised segmentation method based on the BSAEM algorithm is then used for parameter estimation, and Bayesian classification is adopted to classify the pixels. The resampling number and the regression order were chosen this way because we found that increasing the regression order to p = 2, together with this resampling number, achieves a lower probability of misclassification for both cornfield and forest and a good trade-off between modeling accuracy and computational efficiency. Fig. 2(c) shows the results of applying the BSAEM approach to the two SAR images, together with the results of the EM algorithm (Fig. 2(b)) for comparison. Table 1 and Table 2 show the misclassification probabilities and the segmentation times on a Pentium 4 computer, respectively. The results show that the BSAEM algorithm produces better results than the EM method both in segmentation quality and in computing time.

Fig. 2. (a) Original SAR images composed of woodland and cornfield. (b) Segmented images from the EM algorithm. (c) Segmented images from the BSAEM algorithm.

Table 1. Misclassification probabilities for the SAR images in Fig. 2

                  Pmis(· | forest)         Pmis(· | grass)
                  EM (b)    BSAEM (c)      EM (b)    BSAEM (c)
Fig. 2 (top)      1.312     1.296          5.249     4.124
Fig. 2 (bottom)   2.776     3.162          1.527     1.619

Table 2. Segmentation time on a Pentium 4 computer (s)

                  EM       BSAEM
Fig. 2 (top)      2637     470
Fig. 2 (bottom)   4324     793

5 Conclusion

We applied Bootstrap sampling techniques to MMAR-model-based segmentation of SAR images and derived the BSAEM algorithm for the MMAR model of SAR imagery. The algorithm brings a great improvement in ML parameter estimation and considerably reduces the segmentation time. Experimental results show that the BSAEM algorithm gives better results than the classical EM algorithm in the quality of the segmented image.

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China (No. 60375003), the Aeronautics and Astronautics Basal Science Foundation of China (No. 03I53059), the Science and Technology Development Foundation of Tianjin Higher Learning, and the Science Foundation of Tianjin University of Technology.

References
1. Fosgate, C., Irving, W.W., Karl, W., Willsky, A.S.: Multiscale Segmentation and Anomaly Enhancement of SAR Imagery. IEEE Trans. on Image Processing (1997) 7–20
2. Irving, W.W., Novak, L.M., Willsky, A.S.: A Multiresolution Approach to Discrimination in SAR Imagery. IEEE Trans. Aerosp. Electron. Syst. (1997) 1157–1169
3. Kim, A., Kim, H.: Hierarchical Stochastic Modeling of SAR Imagery for Segmentation/Compression. IEEE Trans. on Signal Processing (1999) 458–468
4. Wen, X.B., Tian, Z.: Mixture Multiscale Autoregressive Modeling of SAR Imagery for Segmentation. Electronics Letters (2003) 1272–1274
5. Efron, B., Tibshirani, R.: An Introduction to the Bootstrap. Chapman & Hall, London, U.K. (1993)

BP Neural Network Based SubPixel Mapping Method

Liguo Wang 1,2, Ye Zhang 2, and Jiao Li 2

1 School of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China [emailprotected]
2 Dept. of Information Engineering, Harbin Institute of Technology, Harbin 150001, China; {wlg74327, zhye, lijiao}@hit.edu.cn

Abstract. A new subpixel mapping method based on a BP neural network is proposed to improve the spatial resolution of both raw hyperspectral imagery (HSI) and its fractional images. The network is used to train a model that describes the relationship between a mixed pixel, together with its neighbors, and the spatial distribution within the pixel. A mixed pixel can then be super-resolved at subpixel scale by the trained model. To improve the mapping performance, momentum is employed in the BP learning algorithm and local analysis is adopted in the processing of raw HSI. Comparison experiments were conducted both on synthetic images and on real HSI. The results show that the method has good mapping performance and very low computational complexity for processing both raw HSI and fractional images.

1 Introduction

One major limitation of hyperspectral imagery (HSI) is its spatial resolution, which determines the spatial scale of the smallest detail depicted in an image. In HSI, a significant proportion of pixels are mixtures of more than one distinct material. The presence of mixed pixels severely affects the performance of military analysis, environment understanding, and other applications. Spectral unmixing [1] was therefore introduced to decompose each mixed pixel into disparate components with their respective proportions. Many spectral unmixing methods exist for estimating land cover components, such as linear spectral mixture modeling [2], multilayer perceptrons [3], nearest neighbor classifiers [4] and support vector machines [5]. Generally, spectral unmixing provides a more accurate representation of land cover than hard one-class-per-pixel classification. However, the spatial distribution of each class component within a mixed pixel remains unknown. Subpixel mapping (SM) addresses this problem by dividing each pixel into several smaller units and allocating the target (specified class) to the smaller cells. Only a limited number of SM methods have been presented. Reference [6] made use of a second image of higher spatial resolution to sharpen the output of spectral unmixing, but it is difficult to obtain two coincident images of different spatial resolutions. Reference [7] formulated the spatial distribution of the target component within each pixel as the energy function of a Hopfield neural network (HNN). The results provide an improved


representation of land covers, but it takes considerable computational time to obtain the results. All the methods described above are suited only to processing fractional images. Unfortunately, owing to the complexity of raw images, there has been no effective SM method that applies to raw imagery. In this paper, a novel predictor based on a BP neural network (BPNN) with momentum is proposed for processing both fractional images and raw HSI.

2 BP Learning Algorithms with Momentum

The standard BP algorithm has a slow rate of convergence and is prone to being trapped in local optima. Momentum decreases the BP network's sensitivity to small details in the error surface, and with momentum a network can slide through some shallow local minima. Let x_i (i = 1,2,…,n) be the inputs to the network, y_j (j = 1,2,…,p) the outputs of the hidden layer, o_k (k = 1,2,…,q) the outputs of the output layer, and w_jk the connection weight from the j-th hidden node to the k-th output node. The momentum algorithm is formulated by appending some fraction of the previous weight increment to the standard BP update:

Δw_jk(n) = η δ_k(n) y_j(n) + α Δw_jk(n−1),

where η is the learning rate and α (0 < α < 1) is the momentum factor.
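As a minimal sketch of this update rule (the function and variable names are illustrative, not the authors' implementation):

```python
import numpy as np

def momentum_update(w, grad_term, prev_dw, eta=0.1, alpha=0.9):
    """One momentum step for an output-layer weight matrix:
    dw(n) = eta * delta_k(n) * y_j(n) + alpha * dw(n-1).
    grad_term is the outer product of delta(n) and y(n)."""
    dw = eta * grad_term + alpha * prev_dw
    return w + dw, dw

# Illustrative usage with assumed layer sizes
p, q = 8, 3                       # hidden and output layer sizes
w = np.zeros((q, p))
prev = np.zeros_like(w)
delta = np.random.randn(q)        # output-layer error signal
y = np.random.randn(p)            # hidden-layer outputs
w, prev = momentum_update(w, np.outer(delta, y), prev)
```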

there must be at least one gradually changing course, i.e., a set

J_n = {I_x^n = I_{x1}^n, I_{x2}^n, …, I_{x(j−1)}^n, I_{xj}^n = I_y^n | d(I_{xm}^n, I_{x(m+1)}^n) < ε, ∀m ∈ [1, j−1], m ∈ N} ⊂ I_n.  (3)

3.2 Spatial Coverage Architecture

Spatial geometrical coverage, inclusive of the embedding set I_n, is constructed by the topological product between I_n and a hyper sphere. The selection among multiform distinct spatial bodies leads to tremendous diversification in topological coverage and geometrical realization. In practice, a topological covering set S_n of hyper-sausage type, inclusive of the set I_n for an object O_n, can be designed as

S_n = ∪_i S_i^n,  S_i^n = {x | d(x, y) ≤ r, y ∈ V_i, x ∈ R^d},  V_i = {y | y = α I_{xi}^n + (1−α) I_{x(i+1)}^n, α ∈ [0,1]}.  (4)

(5)

be the distance of x and the line segment x 1 x 2 , where x1 , x2 ∈ R d are two centers.

Thus one hyper sausage set is S(x1, x2; r) = {x | d 2 (x, x1x2 ) ≤ r 2}. 3.3 Probabilistic Spatial Coverage

For the sake of building probability distribution for every object On , we create one kind of density function, called a dipole kernel function, to estimate a segment of class conditional probability function p( I On ) or q( x ") , much like one hyper sausage in spatial coverage. q ( x, x1 , x2 ) ≤ 0 ­hG ( x | x1 , Σ) , ° K ( x, x1 , x2 ) = ®hG ( x | x 2 , Σ) , q( x, x1 , x2 ) ≥ d ( x1 , x 2 ) °hG ( x − q( x, x , x ) × ( x − x ) d ( x , x ) | x , Σ), otherwise 1 2 2 1 1 2 1 ¯

(6)

where G(x u, Σ) is a Gaussian density, q(x, x1, x2 ) = (x − x1) • (x2 − x1) d(x1, x2 ) , h is the

magnitude. On a set of M typical views in an appropriate order from object On ,

q ( x ") = (max iM=1 K i ) M .

764

G. Ji et al.

Bayesian decision rule adopts the Maximum A Posteriori estimation for the hypothesis set, n∗ = arg maxnN=1 p(On I ) . Moreover, Bayesian theorem can decompose the posterior probability p(On I ) into the class conditional probability p(I On ) and the

prior probability p(On ) , p(On I ) = p(I On ) p(On ) p(I ) , where P( I ) is the unconditional probability. Therefore, assuming that all objects have the same prior probability, what we need is only to make a decision on the conditional probability function p(I On ) for

each object On .

4 Simulation Experiment Image database consists of cellular images of regular phytoplankton species in various styles, angles and shapes. Based on global similarity measure to guide decisions, a typical cellular sort procedure was involved to form connectivity frameworks for each phytoplankton species. Fig.1 shows example images from database, including Ceratium, Rhizosolenia, Pleurosigma, Stephanopyxis, Peridinium, Skeletonema, Nitzschia, Chaetoceros and Coscinodiscus.

Fig. 1. Example images

Cellular recognition strategy is composed of phytoplankton statistical learning and spatial coverage analysis. Recognition procedure flows from multi-object world to possible genera or subgenera nodes, and then to species nodes step by step. According to phytoplankton’s own nature, BYY system is first assigned to roughly decide on the united class nodes that multiple species subordinate to at a higher level, and to share automatic parameter learning and model selection in parallel. BYY harmony learning results in a winner-take-all type competition for model selection so that the number of possible genera or subgenera could be predicted for primary recognition. Probabilistic spatial geometrical coverage is then set up to exactly cognize genuine species from the already specified aggregate. Exquisite and subtle inner relationship inside each species could be further explored in a coarse to fine process. The whole hierarchical scheme employs BYY system on Gaussian mixture models in a Backward architecture for the united aggregate, and probabilistic spatial construction with dipole kernel density estimate function like hyper sausages for species, respectively. Pattern recognition was performed in aid of the winner-take-all competition p( y x) , or p( y x, ") and p("x) , for the united class nodes, as well as the conditional

probability computation q ( x ") for every species. The decision was made by Maximum A Posteriori estimation on the particular species where the maximal posterior probability occurred. Probability distribution, from harmony competition in genera or subgenera, and from spatial coverage in species, would increase or decrease

Cellular Recognition for Species of Phytoplankton Via Statistical Spatial Analysis

765

the corresponding object’s activity, and the most active object was used to predict or judge which species the phytoplankton are really from.

5 Result Analysis Test images from unknown species were classified into the most competitive united classes, and matched against all the inclusive species prototypes in those particular aggregate. Predictive genera or subgenera could be estimated in harmony learning stage. The hypothesis set was ranked in a down sequence by probability gained from the stored probabilistic spatial analysis. Training recognition rates were all 100%. With proper parameters, mistaken recognition rate for all images from unlearned species could be 0%, i.e., unlearned species were all rejected without incorrect recognition. Table 1 makes a comparison on average recognition rates with single BYY harmony learning, single probabilistic spatial coverage, as well as their combination. Fig. 2 lists some recognition results with the top three matches. Table 1. Recognition comparison Recognition approaches Single BYY Single probabilistic coverage BYY based spatial coverage

Recognition rates of top three matches (%) 1 2 3 76.92 88.46 92.31 84.62 96.15 98.80 85.38 96.92 98.92

Fig. 2. Recognition results with the top three matches

6 Conclusions In this paper, we present a scheme of cellular pattern recognition for species of phytoplankton via harmony statistical learning and spatial coverage analysis. BYY

766

G. Ji et al.

harmony learning system on Gaussian mixture models in a Backward architecture, is assigned to carry out automatic parameter learning and model selection in parallel, and roughly decide on the most competitive united class for possible genus or subgenus. Probabilistic spatial geometrical coverage with a dipole kernel density estimate function like hyper sausages, is adopted to exactly cognize and match all the inclusive phytoplankton species in a certain united class. The hierarchical hybrid strategy guarantees that species inner knowledge could be explored step by step in a coarse to fine process. The decision is made on Bayesian decision rule. Instead of a single evaluation, prediction is ranked in a sequence by means of probability, which combines visible information from multiple species together, and makes the correct hypothesis more probable. Simulation experiment has achieved probability distribution information, and proved the approach effective, superior and feasible.

Acknowledgements This research was fully supported by the Natural Science Foundation of China (60572064) and the National 863 Natural Science Foundation of P. R. China (2001AA636030).

References 1. Vapnik, V.N: The Nature of Statistical Learning Theory. Springer-Verlag, Berlin, (1995) 2. Xu, L.: Bayesian Ying Yang harmony learning. The handbook of brain theory and neural networks, Arbib, M.A., Cambridge, MA, the MIT Press, (2002) 3. Huang, D.S.: Systematic Theory of Neural Networks for Pattern Recognition. Publishing House of Electronic Industry of China, Beijing (1996) 70-78 4. Wang, S. J.: Biomimetics pattern recognition. INNS, ENNS, JNNS Newletters Elseviers, (2003) 5. Wang, S. J., Wang, B.N.: Analysis and theory of high-dimension spatial geometry for Artificial Neural Networks. Acta Electronica Sinica, Vol. 30 No.1, (2002) 1-4 6. S-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, (2004) 7. Huang, D. S.: Radial basis probabilistic neural networks: Model and application. International Journal of Pattern Recognition and Artificial Intelligence, 13(7), (1999) 10831101

Combination of Linear Support Vector Machines and Linear Spectral Mixed Model for Spectral Unmixing Liguo Wang1, 2, Ye Zhang2, and Chunhui Zhao1 1

School of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China {wangliguo, zhaochunhui}@hrbeu.edu.cn 2 Dept. of Information Engineering, Harbin Institute of Technology, Harbin 150001, China {wlg74327, zhye}@hit.edu.cn

Abstract. Aiming at the shortcoming of linear spectral mixing model (LSMM) and linear support vector machines (LSVM) has potential capability to be used in spectral unmixing, but it is hard to construct too many classifiers when LSVM is used in partial unmixing. In this paper, the equality of LSVM and LSMM is proved concisely, and then a new double-unmixing scheme is proposed by combining the two models. In the first time, LSVM based full unmixing is performed for selecting related class subset. In the second time, appropriate model is selected according to the cardinality of current subset for partial unmixing. Another, least square LSVM has been improved for effective unmixing. Experiments prove the high efficiency of the proposed scheme.

1 Introduction In resent years, hyperspectral remote sensing has been applied widely and the techniques of hyperspectral imagery (HSI) processing have been developed greatly. As a key technique of mixed pixel processing, spectral unmixing [1] catches more and more eyes. Mixed pixel is a pixel that corresponds to more than one class or substance of land cover. Given all endmembers(EMs) or constituent spectra in HSI, the task of spectral unmixing is to work out the fractional abundance of each class. Spectral unmixing can be classified as full spectral unmixing (full unmixing, FU) and partial spectral unmixing (partial unmixing, PU)[2]. The former is performed on the full set of EMs while the latter on the partial set of them. PU has advantage over FU for two reasons. First, the number of classes in whole HSI is large while it is small for a specified mixed pixel. Second, the participation of unrelated classes to a mixed pixel can deteriorate unmixing accuracy of it. Under this condition, selecting class subset is proposed for each mixed pixel in spectral unmixng. But the selection process as in hierarchical unmixing and in stepwise unmixing is usually of expensive computational cost [3]. So far, traditional linear spectral mixing model (LSMM)[1] is dominant in spectral unmixing methods for its clear physical meaning, great convenience and low computational cost. In this method, one and only EM is used as a representation of each D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 767 – 772, 2006. © Springer-Verlag Berlin Heidelberg 2006

768

L. Wang, Y. Zhang, and C. Zhao

class. But in fact, the spectral variety of in-class is often great and one EM cannot represent the whole class in good. This case leads to a discounted unmixing accuracy of the method. Reference [4][5] proves that linear support vector machines (LSVM) [6] has potential capability to be used in spectral unmixing. Automatic selection of pure pixel and flexible processing of linearly nonseparable modeling lead to high unmixing performance of the method. The proof is hard to comprehend for common readers. Another, it is inconvenient to construct too many classifiers for LSVM when it is used in PU in which subset selection is about class set instead of EM one. Under this condition, the method is still not be used in spectral unmixing practically. In this paper, the equality of LSVM and LSMM for spectral umixing is proved concisely, and then a new double-unmixing scheme is proposed by combining the two models.

2 Proof of Equality of LSVM and LSMM for Spectral Umixing For the limitation of space, LSVM and LSMM are omitted here. Let f (•) be the discrimination function of LSVM. In this section, the equality of LSVM (based on 1a-r classifier structure) and LSMM is proved in a concise manner when the same set of EMs is available. For visualization purpose, 3-EM set is considered firstly. Denote general expression Δ xyz is the triangle formed by point x , y and z with area of S ( Δ xyz ) , and Lxy is the line segment formed by point x and y with length of

le( L xy ) . Suppose pixel P is mixed by EM A , B and C (see in fig. 1), and A S (Δ ABC ) equals 1, then it is easy to conclude that FLSMM (P) , the fractional abundance of EM A in pixel P , generated by LSMM based spectral unmixing, equals to S (Δ PBC ) . Let D be the crossing point of LBC and extended L AP , then we get: A FLSMM ( P ) = S (Δ PBC ) =

le( LPD ) . le( L AD )

(1)

A Now we prove that the abundance FLSVM (P) generated by LSVM is just the same

as (1). Prescribe EM A is positive class while B and C (and so D ) are negative ones, i.e.

f ( A) = 1, f ( B ) = f (C ) = f ( D) = 0 .

(2)

For any real number α and β and any pixel x1 and x2 , the following expression holds constantly:

f (αx1 + βx2 ) = αf ( x1 ) + βf ( x2 ) .

(3)

Combination of LSVM and LSMM for Spectral Unmixing

769

A

P

B

D

C

Fig. 1. Equality of LSVM and LSMM A Then, the fractional abundance FLSVM (P ) can be computed as

A FLSVM ( P) = f ( P)

=

le( LPD ) le( L AP ) f ( A) + f ( D) . le( L AD ) le( L AD )

=

le( LPD ) le( L AD )

(4)

From (1) and (4) we can see the two results are the same. As for the other abundances of EM B and C , the same conclusion is drawn. For more complicated cases including bias terms calculation and nonlinear re-estimation, the proof is given in reference [4], in which the unique merits of LSVM are also demonstrated. By the way, 1-a-r (one against rest) classifier structure, which constructs classifier for each class pair, is appropriate to construct LSVM while 1-a-1 (one against one) type, which constructs classifier between each class and the rest classes, is unfeasible for purpose of spectral unmixing.

3 New Spectral Unmixing Scheme and Its Advantages According to the analysis foregoing, PU is superior to FU in terms of unmixing accuracy. In this case, LSVM is confronted with the difficulty of constructing too many subclassifers. For example, when the number of classes is large as 10 and the cardinality of each subset is no more than 3, the number of subclassifers to be constructed is perhaps larger than 400. On the other hand, LSMM has low unmixing accuracy. Considered that the two models have complementary advantage, the combination of the tow models is hopeful of getting better unmixing performance. The new unmixing scheme can be described as the following 3 steps, in which the selection of class subset (and so EM one) is also carried out by exploiting spectral information (unmixing results) and spatial information (neighbor pixels): 1. LSVM based FU is used to decompose each mixed pixel. 2. For each mixed pixel, accumulating each class abundances obtained in step 1 in its 3× 3 local window according to the unmixing results in step 1. After removing

770

L. Wang, Y. Zhang, and C. Zhao

classes with less than 5 points (reference value) accumulated values, only no more than 4 classes (reference value) are retained as related classes to the mixed pixel. 3. If the number of related classes of a mixed pixel is 1, it is considered as pure one and its abundance is adjusted to 1. If the number is 2, decomposing the pixel again by pair-class LSVM. Otherwise, the pixel is reprocessed by LSMM based PU.

4 Improving SVM for Effective Unmixing In this section, we will present a weighted or robust least square SVM for better unmixing. Least square SVM as a version of SVM is widely used for its convenience, and its optimization expression is written as min J ( w, e) = w, b , e

γ 1 2 w + 2 2

n

¦ ei2

.

(5)

i =1

In order to obtain a robust estimation, we Let n +1 and n −1 be sample numbers in class +1 and –1 respectively, φ ( x +1 ) and φ ( x −1 ) be their corresponding class centers in

kernel space, and D( xi , x y ) be the distance between sample x i and its corresponding i

class center. Formulas for computing φ ( x +1 ) , φ ( x −1 ) and D( xi , x y ) are specified as: i

φ ( x +1 ) =

1 n +1

¦ φ ( x ), i

i , yi = +1

φ ( x −1 ) =

1 n −1

¦φ (x ) . i

i , y i = −1

D( xi , x yi ) = φ ( xi ) − φ ( x yi ) = [ K ( xi , xi ) + K ( x yi , x yi ) − 2K ( xi , x yi )]1 / 2 .

(6)

(7)

For more description, one can consult reference [7].

5 Experiments and Results In the first group experiments, a simple comparison of FU effect is performed for LSVM and LSMM on a subscene (40 × 50 pixels, 126 bands) of naval military base HSI acquired in San Diego. The two EMs are the mean spectrum of two classes of training samples (100 samples in each class) selected manually in the subscene. The resulting fractional images are shown in fig. 2. It can be seen LSVM has better unmixing effect than LSMM. In the second group experiments, correction coefficient (CC)[8] is used as an evaluation criterion of spectral unmixing. The HSI comes from an agriculture/forestry landscape AVIRIS data acquired in Indian Pine Test Site (144 × 144 pixels, 200 bands). Four class crops are selected from the HSI (displayed in 4 levels grey) as experimental data. (see in fig. 3). Fig. 3 b) gives the region in which the pixels need to be unmixed again. The comparison of unmixing accuracy between the new scheme and traditional LSMM method is given in table 1. The new scheme gets about 10 points improvement of unmixing accuracy. Fig. 4 gives the comparison of fractional images. In visual, the new scheme has a clear superiority to traditional LSMM method.

Combination of LSVM and LSMM for Spectral Unmixing

771

Table 1. Comparison of unmixing accuracy

LSMM New

Unmixing accuracy of each class 0.7826 0.6858 0.9196 0.8369 0.9003 0.8369 0.9761 0.8868

(1) Original image

Average 0.8062 0.8062 0.9000 0.9000

(2) Fractional image(LSVM) (3) Fractional image(LSMM)

Fig. 2. Comparison of unmixing performance

a) Original 4 class crops

b) Mixed area

Fig. 3. Extraction of mixed area

a) Fractional images generated by traditional LSMM

b) Fractional images generated by new unmixing scheme Fig. 4. Unmixing fractional images of 4 class crops under different methods

772

L. Wang, Y. Zhang, and C. Zhao

6 Conclusions LSVM and LSMM have different advantages and shortcomings, and the appropriate utilizing of them is helpful of improving unmixing performance. When all classes or only two classes are included in spectral unmixing, LSVM is preferred, and LSMM otherwise. Another, selecting class subset by combination of spatial information and spectral information is effective and free of cumbersome computation. The proposed scheme gives a feasible manner for applying LSVM to spectral unmixing and provides new idea for improving unmixing performance. Furthermore, improved LSVM (such as robust LSVM) instead of original ones can be used for a better unmixing performance. The proposed spectral unmixing scheme has several advantages. First, this scheme makes use of spatial information and partial set of classes in spectral unmixing. This is more effective comparing with other methods based on spectral information only and full set of classes. Second, SVM based spectral unmixing has good properties than traditional LSMM, such as automatic selection of pure pixels, multi-pixel representation of each class, easy extension for non-linear classification, etc. Third, unmixing accuracy can be further improved by improving SVM property. The proposed scheme will be the new thought to develop spectral unmixing. In future, the application of nonlinear SVM in unmixing waits for further argumentation.

References 1. Keshava, N., Mustard, J. F.: Spectral Unmixing. Signal Processing Magazine, IEEE. 19 (2002) 44–57 2. Nielsen, A. A.: Spectral Mixture Analysis: Linear and Semi-Parametric Full and Iterated Partial Unmixing in Multi- and Hyperspectral Image Data. International Journal of Computer Vision, 42 (2001) 17-37 3. Daniel, N.: Evaluation of Stepwise Spectral Unmixing with Hydice Data. SIMG-503 Senior Research Final Report Center for Imaging Science. http://www.cis.rit.edu/research/thesis/ bs/1999/newland/title.html 4. Martin, B., Hugh, G. L., Steve, R. G.: Linear Spectral Mixture Models and Support Vector Machines for Remote Sensing. IEEE Trans. on Geoscience and RemoteSensing, 38 (2000) 2346-2360 5. Martin, B., Gunn, S. R., Lewis, H. G.: Support Vector Machines for Optimal Classification and Spectral Unmixing. Ecol. Modeling, 120 (1999) 167-179 6. Vapnik, V. N.: The Nature of Statistical Learning Theory. New York, Springer Press, NY (1995) 7. Wang, L.G., Zhang Y., Zhang, J.P.: A New Weighted Least Squares Support Vector Machines and Its Sequential Minimal Optimization Algorithm. Chinese J. Electron, 15 (2006), to be appear in No. 3 8. Hassan, E.: Introducing Correctness Coefficient as An Accuracy Measure for Sub Pixel Classification Results. www.ncc.org.ir/articles/poster83/H. Emami.pdf

Combining Speech Enhancement with Feature Post-processing for Robust Speech Recognition Jianjun Lei, Jun Guo, Gang Liu, Jian Wang, Xiangfei Nie, and Zhen Yang School of Information Engineering, Beijing University of Posts and Telecommunications, 100876 Beijing, China [emailprotected]

Abstract. In this paper, we present an effective scheme combining speech enhancement with feature post-processing to improve the robustness of speech recognition systems. At front-end, minimum mean square error log-spectral amplitude (MMSE-LSA) speech enhancement is adopted to suppress noise from noisy speech. Nevertheless this enhancement is not perfect and the enhanced speech retains signal distortion and residual noise which will affect the performance of recognition systems. Thus, at back-end, the MVA feature postprocessing is used to deal with the remaining mismatch between enhanced speech and clean speech. We have evaluated recognition performance under noisy environments using NOISEX-92 database and recorded speech signals in continuous speech recognition task. Experimental results show that our approach exhibits considerable improvements in the degraded environment.

1 Introduction In the past decade, the performance of automatic speech recognition (ASR) has been significantly improved. More and more ASR systems are being deployed in many applications. In many situations, these speech recognition systems must be operated in some adverse environments, where ambient noise becomes the major hurdle to achieve high-accuracy recognition performance. Various methods have been proposed to improve environmental robustness of ASR, which can be broadly classified into three categories [1]. Firstly, an increase of the noise robustness can be achieved by extracting speech features that are inherently less distorted by noise. Cepstral mean normalization (CMN) [2], with the merits of inexpensive computation and good recognition performance, can remove the mean cepstrum from all vectors with the cepstral mean calculated separately from each sentence. Second approaches, such as parallel model combination (PMC) [3], adapt the acoustic models in the recognizer to the changing noise conditions. In the third category, speech features are enhanced before they are fed into the recognizer. This can be achieved either prior to the feature extraction, like speech enhancement [4] [5], or by incorporating extra compensating steps into the feature extraction module, like feature compensation [6]. Such an enhancing step is largely independent of the vocabulary size of the recognizer and also does not require an adaptation of the recognition software. D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp. 773 – 778, 2006. © Springer-Verlag Berlin Heidelberg 2006

774

J. Lei et al.

In our research we focus on the method combining speech enhancement with feature post-processing to improve the robustness of speech recognition systems. In [7] [8], spectral subtraction (SS) and Wiener filter were tested for combining with feature post-processing to improve the robustness of speech recognition systems. Minimum mean square error log-spectral amplitude (MMSE-LSA) [9] estimator is superior to SS and Wiener filter, and results in a much lower residual noise level without further affecting the speech itself. Thus, we adopted the MMSE-LSA enhancement approach in front-end to suppress the corrupted additive noise. The residual noise after MMSELSA processing can be regarded as additive and stationary noise approximately, which ensures that some simplified feature post-processing can be used at back-end. MVA [10] feature post-processing has shown great success in smoothing the speech parameterization, so we use it to deal with the remaining mismatch between enhanced speech and clean speech. Experiments show that our approach achieves significant improvements in the system performance. The remainder of this paper is organized as follows. The next section describes the MMSE-LSA algorithm. Section 3 shows the procedures of MVA feature postprocessing. The experiments and the results are given in section 4 and some conclusions are drawn in section 5.

2 MMSE-LSA Speech Enhancement The speech signal and noise process are assumed statistically independent, and spectral components of each of these two processes are assumed zero mean statistically independent Gaussian random variables. Let X k = Ak e jα k , Dk , and Yk = Rk e jν k , denote the k-th Fourier expansion coefficient of the speech signal, the noise process, and the noisy observations, respectively, in the analysis interval [0 ,T ] . Let

{ } and λ (k ) = E{D } denote, respectively, the variances of the clean

λ x (k ) = E X k

2

2

d

k

and noisy spectral components. MMSE-LSA aims at producing an estimate of Ak whose logarithm is as close as possible to the logarithm of Ak in the MMSE sense. Under the above assumptions on speech and noise, this perceptually motivated criterion results in the estimator given by −t ˆ = ε k exp§¨ 1 ∞ e dt ·¸ R A k ³ ¨ 2 νk t ¸ k 1+ εk © ¹

(1)

εk Rk2 λ (k ) γk ; εk = x ; γk = . 1+ εk λ d (k ) λ d (k )

(2)

where vk =

ε k and γ k are the a priori and a posteriori signal-to-noise ration(SNR), respectively. In order to reduce computational complexity, the exponential integral in (1) may be evaluated using the functional approximation below instead of iterative solutions or tables [11]. Thus, to approximate

Combining Speech Enhancement with Feature Post-processing

ei(v ) =

³v

e−x dx x

775

(3)

with

­ − 2.31log10 (v ) − 0.6, for v < 0.1, ° ~ e i(v ) ≈ ®-1.544log10 (v ) + 0.166, for 0.1 ≤ v ≤ 1, ° 10 − 0.52 v −0.26 , for v > 1. ¯

(4)
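For concreteness, the following is a minimal NumPy sketch of the MMSE-LSA gain defined by Eqs. (1)-(2), using the approximation (4) for the exponential integral. The function names and the way the a priori SNR is supplied are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ei_approx(v):
    """Piecewise approximation (4) of the exponential integral ei(v)."""
    v = np.asarray(v, dtype=float)
    out = np.empty_like(v)
    low, mid, high = v < 0.1, (v >= 0.1) & (v <= 1.0), v > 1.0
    out[low] = -2.31 * np.log10(v[low]) - 0.6
    out[mid] = -1.544 * np.log10(v[mid]) + 0.166
    out[high] = 10.0 ** (-0.52 * v[high] - 0.26)
    return out

def mmse_lsa_gain(eps_k, gamma_k):
    """Gain applied to the noisy spectral magnitude R_k, Eqs. (1)-(2).

    eps_k   : a priori SNR estimate (epsilon_k), e.g. from decision-directed smoothing
    gamma_k : a posteriori SNR estimate, R_k**2 / lambda_d(k)
    """
    v_k = eps_k / (1.0 + eps_k) * gamma_k
    return eps_k / (1.0 + eps_k) * np.exp(0.5 * ei_approx(v_k))
```

The enhanced magnitude is then $\hat{A}_k = G_k R_k$, applied frame by frame to the noisy spectrum.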

3 MVA Feature Post-processing

After MMSE-LSA speech enhancement, a great reduction of the mismatch between noisy speech and clean speech is obtained. Nevertheless, this enhancement is not perfect, and the enhanced speech retains residual noise which affects the performance of recognition systems. MVA post-processing has been demonstrated to be an effective technique for smoothing the speech parameterization [10]. Thus, we use it to deal with the residual mismatch between enhanced speech and clean speech after the speech enhancement. MVA post-processing is quite similar to certain schemes well known to the community, namely variance normalization and mean subtraction; the crucial difference lies in the domain in which the post-processing is applied. In this work, we apply MVA post-processing after delta and delta-delta feature processing. Once speech enhancement has been applied to reduce the mismatch in the spectral domain, the standard mel-frequency cepstral coefficients (MFCC) $c_1 \ldots c_{12}$ and the log energy, along with their delta and delta-delta coefficients, are computed and processed by MVA. For a given utterance, we represent the data by a matrix $C$ whose element $C_{td}$ is the $d$-th component of the feature vector at time $t$, where $t = 1 \ldots T$ indexes the frames in the utterance and $d = 1 \ldots D$ indexes the dimensions of the feature space. The first step is mean subtraction, defined by

$$C'_{td} = C_{td} - \mu_d \tag{5}$$

where

$$\mu_d = \frac{1}{T}\sum_{t=1}^{T} C_{td}. \tag{6}$$

This is followed by variance normalization, defined by

$$\bar{C}_{td} = \frac{C'_{td}}{\sigma_d} = \frac{C_{td} - \mu_d}{\sigma_d} \tag{7}$$

where

$$\sigma_d = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(C_{td} - \mu_d\right)^2}. \tag{8}$$


The third step is a mixed auto-regression moving-average (ARMA) filter, defined by

$$\tilde{C}_{td} = \begin{cases} \dfrac{\sum_{i=1}^{M}\tilde{C}_{(t-i)d} + \sum_{j=0}^{M}\bar{C}_{(t+j)d}}{2M+1}, & \text{if } M < t \le T - M,\\ \bar{C}_{td}, & \text{otherwise}, \end{cases} \tag{9}$$

where M is the order of the ARMA filter.
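A minimal NumPy sketch of the MVA steps (5)-(9), operating on a T x D matrix of (delta-augmented) features, is given below; the function name and the default filter order are assumptions.

```python
import numpy as np

def mva(C, M=2):
    """MVA post-processing of a T x D feature matrix C (Eqs. (5)-(9))."""
    C = np.asarray(C, dtype=float)
    mu = C.mean(axis=0)                             # Eq. (6)
    sigma = np.sqrt(((C - mu) ** 2).mean(axis=0))   # Eq. (8)
    C_bar = (C - mu) / sigma                        # Eqs. (5), (7)

    T = C_bar.shape[0]
    C_tilde = C_bar.copy()                          # boundary frames are left unfiltered
    for t in range(M, T - M):
        past = C_tilde[t - M:t].sum(axis=0)         # already-filtered outputs, i = 1..M
        future = C_bar[t:t + M + 1].sum(axis=0)     # current and future inputs, j = 0..M
        C_tilde[t] = (past + future) / (2 * M + 1)  # Eq. (9)
    return C_tilde
```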

4 Experiments

A continuous HMM-based speech recognition system is used in the recognition experiments to examine the presented approach. The database used in these experiments is selected from the mandarin Chinese corpus provided by the 863 plan (China High-Tech Development Plan) [12]. The training set for training the HMMs includes utterances of 96 speakers (48 males and 48 females). The test set includes utterances of 10 speakers (5 males and 5 females). The white, f-16 and factory1 noise from NOISEX-92 [13] are added to the test set with the signal-to-noise ratio (SNR) varying from 0 dB to 20 dB. The acoustic modeling is based on a set of 61 phones. Each phone is modeled by a three-emitting-state left-right topology with a mixture of 8 Gaussians per state and diagonal covariance matrices. Triphone-based acoustic models are then used in this continuous speech recognition task. In order to test the validity of the approach, four experiments are done: the baseline without noise reduction, enhancement with MMSE-LSA, feature post-processing by MVA, and MMSE-LSA combined with MVA feature post-processing. They are titled "Baseline", "LSA", "MVA" and "LSA-MVA", respectively, in Tables 1-3.

Tests with white noise, shown in Table 1, demonstrate that "LSA", "MVA" and "LSA-MVA" are more effective than "Baseline", especially in low-SNR conditions. For example, in the 10 dB white noise condition, "LSA", "MVA" and "LSA-MVA" achieve absolute improvements of 44.09%, 21.32% and 45.84%, respectively, over "Baseline". On average, "LSA", "MVA" and "LSA-MVA" achieve improvements of 22.77%, 14.09% and 25.35%, respectively. Furthermore, the performance of all approaches in the clean environment is well maintained, between 81% and 84%, so our method hardly affects system performance in clean conditions. To investigate the effectiveness of the approach in nonstationary environments, we test the f-16 and factory1 noisy environments at different SNRs. Tables 2 and 3 show that the approach is also effective for improving system performance there. In the f-16 noisy environments, "LSA", "MVA" and "LSA-MVA" achieve average improvements of 13.67%, 5.01% and 15.30%, respectively, compared with "Baseline". In the factory1 noisy environments, they achieve average improvements of 6.73%, 2.43% and 7.81%, respectively. From another point of view, Tables 1-3 show that "LSA-MVA" is more effective than "LSA", with average improvements of 2.58% for the white noise, 1.63% for the f-16 noise, and 1.08% for the factory1 noise. The experimental results demonstrate that MVA feature post-processing can decrease the residual mismatch between enhanced speech and clean speech after MMSE-LSA speech enhancement.


Table 1. Recognition rates (%) for additive white noise

SNR        0dB     5dB     10dB    15dB    20dB    Clean   Avg.
Baseline   0.15    1.61    8.32    20.73   48.47   83.61   27.15
LSA        8.47    25.51   52.41   61.17   70.66   81.27   49.92
MVA        5.99    14.16   29.64   50.07   64.67   82.88   41.24
LSA-MVA    9.78    26.86   54.16   66.60   76.20   81.42   52.50

Table 2. Recognition rates (%) for additive f-16 noise

SNR        0dB     5dB     10dB    15dB    20dB    Clean   Avg.
Baseline   2.77    13.28   33.87   62.04   73.58   83.61   44.86
LSA        16.80   42.18   62.48   71.53   76.93   81.27   58.53
MVA        12.41   27.01   43.94   61.75   71.24   82.88   49.87
LSA-MVA    17.08   43.21   67.30   73.58   78.39   81.42   60.16

Table 3. Recognition rates (%) for additive factory1 noise

SNR        0dB     5dB     10dB    15dB    20dB    Clean   Avg.
Baseline   2.04    11.39   37.96   62.19   74.89   83.61   45.35
LSA        9.60    29.62   54.74   66.28   70.95   81.27   52.08
MVA        8.18    19.85   41.75   61.61   72.41   82.88   47.78
LSA-MVA    9.64    29.34   55.91   68.18   74.45   81.42   53.16

5 Conclusions

This paper describes a method that combines MMSE-LSA speech enhancement with MVA feature post-processing for robust speech recognition. The MMSE-LSA speech enhancement is adopted to suppress noise in the noisy speech. The residual noise after MMSE-LSA processing can be regarded as approximately additive and stationary, which ensures that simplified feature post-processing can be used at the back-end; the MVA feature post-processing is therefore used to deal with the remaining mismatch. Experimental results show that our method yields considerable improvements under noisy conditions and that MVA feature post-processing can further increase the performance of the system after speech enhancement.


Acknowledgements

This research is partially supported by NSFC (National Natural Science Foundation of China) under Grant No. 60475007, the Key Project of the Chinese Ministry of Education under Grant No. 02029, and the Foundation of the Chinese Ministry of Education for Century Spanning Talent.

References

1. Gong, Y.: Speech Recognition in Noisy Environments: A Survey. Speech Communication, Vol. 16, No. 3 (1995) 261-291
2. Atal, B. S.: Effectiveness of Linear Prediction Characteristics of the Speech Wave for Automatic Speaker Identification and Verification. Journal of the Acoustical Society of America, Vol. 55, No. 6 (1974) 1304-1312
3. Gales, M. J. F., Young, S. J.: Robust Continuous Speech Recognition Using Parallel Model Combination. IEEE Transactions on Speech and Audio Processing, Vol. 4, No. 5 (1996) 352-359
4. Ephraim, Y., Lev-Ari, H., Roberts, W. J. J.: A Brief Survey of Speech Enhancement. The Electronic Handbook, CRC Press, http://ece.gmu.edu/~yephraim/ephraim.html (2005)
5. Ephraim, Y., Cohen, I.: Recent Advancements in Speech Enhancement. The Electrical Engineering Handbook, CRC Press, http://ece.gmu.edu/~yephraim/ephraim.html (2005)
6. Moreno, P. J., Raj, B., Stern, R. M.: A Vector Taylor Series Approach for Environment-Independent Speech Recognition. Proceedings of ICASSP'96 (1996) 733-736
7. Segura, J. C., Benitez, C., de la Torre, A., Rubio, A. J.: Feature Extraction Combining Spectral Noise Reduction and Cepstral Histogram Equalization for Robust ASR. Proceedings of ICSLP'02 (2002) 225-228
8. Segura, J. C., Ramirez, J., Benitez, C., de la Torre, A., Rubio, A. J.: Improved Feature Extraction Based on Spectral Noise Reduction and Nonlinear Feature Normalization. Proceedings of EUROSPEECH'03 (2003) 353-356
9. Ephraim, Y., Malah, D.: Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 33, No. 2 (1985) 443-445
10. Chen, C. P., Bilmes, J., Ellis, D. P. W.: Speech Feature Smoothing for Robust ASR. Proceedings of ICASSP'05 (2005) 525-528
11. Martin, R., Malah, D., Cox, R. V., Accardi, A. J.: A Noise Reduction Preprocessor for Mobile Voice Communication. EURASIP Journal on Applied Signal Processing, Vol. 8 (2004) 1046-1058
12. Zu, Y. Q.: Issues in the Scientific Design of the Continuous Speech Database. http://www.cass.net.cn/chinese/s18_yys/yuyin/report/report_1998.htm
13. Varga, A., Steeneken, H. J. M.: Assessment for Automatic Speech Recognition: II. NOISEX-92: A Database and an Experiment to Study the Effect of Additive Noise on Speech Recognition Systems. Speech Communication, Vol. 12, No. 3 (1993) 247-251

Conic Section Function Neural Networks for Sonar Target Classification and Performance Evaluation Using ROC Analysis

Burcu Erkmen and Tulay Yildirim

Yildiz Technical University, Department of Electronics and Communications Engineering, 34349 Besiktas, Istanbul, Turkey
{bkapan, tulay}@yildiz.edu.tr

Abstract. The remote detection of undersea mines in shallow waters using active sonar is a crucial subject for maintaining the security of important harbors and coastline areas. Neural network classifiers have been widely used in the classification of complex sonar signals owing to their adaptive and parallel processing abilities. In this paper, a Conic Section Function Neural Network (CSFNN) is used to solve the problem of classifying underwater targets. Simulation results support the ability of the CSFNN, which has computational advantages over traditional neural network structures, to tackle this highly complex sonar classification problem. Receiver Operating Characteristic (ROC) analysis has been applied to the neural classifier to evaluate the sensitivity and specificity of the diagnostic procedure. The ROC curve of the classifier, obtained from different threshold settings, demonstrated the excellent classification performance of the CSFNN classifier.

1 Introduction

The automatic identification and classification of underwater targets on the basis of sonar signals is a challenging problem due to the complexity of the ocean environment. Identification by human experts is usually subjective and imposes a very heavy workload. Neural networks, with their adaptive and computational advantages over traditional signal processing and pattern recognition techniques, appear ideally suited to active sonar classification. The pioneering papers by Gorman and Sejnowski [1], [2] were perhaps the first to report the application of neural networks to this area. Since then, there has been growing interest in the use of neural networks for the automatic recognition of sonar targets. Among the several neural classifiers, the Multi-Layer Perceptron (MLP) has been used in many applications of sonar target classification [1]-[7]. Radial Basis Function Networks (RBFN) [3, 8], General Regression Neural Networks [9], and Probabilistic Neural Networks (PNN) [3] have also been efficient feed-forward neural networks widely used to classify sonar signals in the literature. Receiver Operating Characteristic (ROC) analysis [10] is an efficient method of measuring and comparing the diagnostic performance of medical and sonar studies [5, 6, 11, 12].


In this paper, a Conic Section Function Neural Network (CSFNN) has been used to classify sonar returns from two different targets on a sandy ocean bottom: a mine and a cylindrically shaped rock. The idea, brought forward by Dorffner [13], is to generalize the function of a unit to include all these decision regions in only one network, providing a relationship between an MLP unit and an RBFN unit. To evaluate the performance of the neural classifier, the ROC curve has been obtained based on different threshold settings. This paper is organized as follows. Section 2 briefly describes the CSFNN structure. Section 3 presents some concepts related to ROC analysis. In Section 4, the neural classifier designs, simulation results and performance evaluation with ROC analysis are presented. Finally, Section 5 outlines some conclusions.

2 Conic Section Function Neural Networks

The CSFNN is capable of making automatic decisions depending on the data distribution of a given application. The decision boundaries of the MLP (hyperplane) and of the RBF (hypersphere) are special cases of the CSFNN. In between these two cases there are intermediate types of decision boundaries, such as ellipses, hyperbolas or parabolas, which are also all valid decision regions. The propagation rule of the CSFNN (which contains the RBF and MLP propagation rules) can be derived using the analytical equations for a cone. The following form (Eq. 1) is obtained for an N-dimensional input space:

$$F_{pj}(x) = \sum_{i=1}^{N}(x_{pi} - c_{ij})\,w_{ij} - \cos\omega_j\sqrt{\sum_{i=1}^{N}(x_{pi} - c_{ij})^2} \tag{1}$$

where $x_{pi}$ refers to the input vector for pattern $p$, $w_{ij}$ to the weights of each connection between the input and hidden layers, $c_{ij}$ to the center coordinates, and $\omega_j$ to the opening angles; $i$ and $j$ are the indices referring to the units in the input and hidden layers, respectively. This equation consists of two major parts, analogous to the MLP and the RBF. The equation simply turns into the propagation rule of an MLP network, the dot product (weighted sum), when $\omega_j$ is $\pi/2$. The second part of the equation gives the Euclidean distance between the inputs and the centers of an RBF network. The network can be started as an initialized MLP or as an RBF, depending on the opening angles [14].
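The following is a minimal NumPy sketch of the propagation rule in Eq. (1) for a layer of J hidden units; the vectorized layout and names are assumptions. Setting the opening angle to pi/2 reproduces the MLP weighted sum, while smaller angles emphasize the RBF distance term.

```python
import numpy as np

def csfnn_propagation(x, w, c, omega):
    """Conic section propagation rule of Eq. (1) for one input pattern.

    x     : (N,) input pattern
    w     : (N, J) weights between the inputs and the J hidden units
    c     : (N, J) center coordinates
    omega : (J,) opening angles per hidden unit
    """
    diff = x[:, None] - c                          # (N, J) differences to the centers
    dot_part = (diff * w).sum(axis=0)              # MLP-like weighted sum
    dist_part = np.sqrt((diff ** 2).sum(axis=0))   # Euclidean distance to the centers
    return dot_part - np.cos(omega) * dist_part
```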

3 Receiver Operating Characteristic Analysis

ROC analysis is an established method of measuring diagnostic performance in the analysis of radar images. The ROC curve is a good measure when the performance of different schemes needs to be compared. The evaluation criteria are based on the ROC curve, used in sonar target classification systems to indicate the trade-off between the conditional probability of correct classification and the conditional probability of false-alarm responses. Equivalently, the ROC curve is a graphical representation of the trade-off between sensitivity (Sn) and specificity (Sp). Sensitivity and specificity are the basic expressions (Eq. 2 and Eq. 3) for the diagnostic test interpretation of ROC analysis.


$$\text{sensitivity} = \frac{\text{number of true positives}}{\text{number of true positives} + \text{number of false negatives}} \tag{2}$$

$$\text{specificity} = \frac{\text{number of true negatives}}{\text{number of true negatives} + \text{number of false positives}} \tag{3}$$

A simple calculation of sensitivity and specificity for both classes is given above. Furthermore, a simple estimate of the accuracy of the CSFNN classifier is defined by means of the distance $d_{real}$ (Eq. 4) of a real classifier from the ideal one ($d_{ideal} = 0$):

$$d_{real} = \sqrt{(1 - \text{sensitivity})^2 + (1 - \text{specificity})^2} \tag{4}$$

If sensitivity (x-axis) is plotted against specificity (y-axis) for each classifier, $d_{real}$ can be treated as the Euclidean distance of the point (sensitivity, specificity) from the top-right corner (1, 1), which represents the ideal classifier. The smaller the distance, the better the classifier. When artificial neural networks are used as classifiers in sonar applications, the operating points for the ROC curve can be generated by varying the threshold value of the output node. The performance of a diagnostic test is satisfactory when the ROC curve climbs rapidly towards the upper left-hand corner of the graph. On the other hand, unsatisfactory performance is obtained when the ROC curve follows a diagonal path from the lower left-hand corner to the upper right-hand corner.
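As a concrete illustration of Eqs. (2)-(4), the sketch below generates ROC operating points by sweeping the output-node threshold; the variable names and the labeling convention (1 = mine present, 0 = rock) are assumptions.

```python
import numpy as np

def roc_points(scores, labels, thresholds):
    """ROC operating points from network outputs via threshold sweep.

    scores : network outputs in [0, 1]; labels : 1 = mine present, 0 = rock.
    Returns (sensitivity, 1 - specificity, d_real) for each threshold.
    """
    scores, labels = np.asarray(scores), np.asarray(labels)
    pts = []
    for thr in thresholds:
        pred = scores >= thr
        tp = np.sum(pred & (labels == 1))
        fn = np.sum(~pred & (labels == 1))
        tn = np.sum(~pred & (labels == 0))
        fp = np.sum(pred & (labels == 0))
        sn = tp / (tp + fn)                   # sensitivity, Eq. (2)
        sp = tn / (tn + fp)                   # specificity, Eq. (3)
        d_real = np.hypot(1 - sn, 1 - sp)     # distance from the ideal point, Eq. (4)
        pts.append((sn, 1 - sp, d_real))
    return pts
```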

4 Simulation Results

The aim of this study is to employ Conic Section Function Neural Networks to classify sonar returns. The dataset, which is the original sonar data used by Gorman and Sejnowski [1, 2], was taken from the University of California collection of machine-learning databases. Although this dataset is an old one, it is a well-known benchmark for sonar problems. It consists of sonar returns collected from two sources: a metal cylinder and a similarly shaped rock. Both objects were lying on a sand surface, and the sonar chirp projected at them from different angles (aspect angles) produced the variation in the data. The data set consists of 208 returns (111 cylinder-shaped mine returns and 97 rock returns), sorted in increasing order of aspect angle. The transmitted sonar signal is a frequency-modulated chirp, rising in frequency. The entire data set was randomly split into training and test sets (104 samples each). The data was filtered, and a spectral envelope of 60 samples (the inputs, or dimensionality, of our data) was extracted. The network (CSFNN) used here was a fully connected feed-forward neural network composed of 60 input nodes and one output node. The network properties of the sonar classification system are shown in Table 1. As stated before, the CSFNN is capable of combining the decision regions of the MLP and RBFN in only one network. Therefore, an MLP and an RBFN were also used to classify the same dataset, to compare their classification performance with the CSFNN. The classification was performed using MATLAB 7.0.

Table 1. Network properties of the sonar classification neural system

Properties                        Values-Methods
Learning Algorithm                Back Propagation
Number of Input Nodes             60
Neuron Number of Hidden Layer     26
Neuron Number of Output Layer     1
Learning Rate                     0.1
Momentum Rate                     0.8
Activation Function               Logarithmic Sigmoid
Training Method                   Continuous Training
Bias Weights                      Used
Epoch Number                      1000
Initial Weights                   Random
Initialization of CSFNN           RBF case (ω = π/4)

Table 2. Classification rate (%) comparisons for MLP, RBF and CSFNN

           "mine" class. rates (%)   "rock" class. rates (%)   Overall total class. rates (%)
           Train      Test           Train      Test           Train      Test
MLP        100        85.5           100        88             100        86.5
RBF        100        85.5           100        95.24          100        89.42
CSFNN      100        88.71          100        97.62          100        92.3

Table 2 shows the classification rates of the three types of neural classifiers for comparison. The MLP has two hidden layers (30 and 10 neurons) and was trained with the back-propagation algorithm using a momentum parameter m = 0.8 and a learning rate lr = 0.1. The RBFN used an Orthogonal Least Squares algorithm for selecting and optimally locating the centers; in this experiment the RBFN has 70 hidden neurons, corresponding to the RBFN centers. As can be seen from Table 2, the CSFNN reached the same 100% training success rate as the other networks. In the testing phase, however, the CSFNN showed much better classification performance, with a 92.3% success rate. These results show that the CSFNN is an efficient neural network for the sonar classification problem. Finally, ROC analysis has been applied to the test results of the CSFNN classifier to evaluate the classification performance. The evaluation criteria of ROC analysis in sonar detection indicate the trade-off between the probability of true detection and the probability of false detection. Table 3 is employed for these calculations; it is labeled with the classification results on the left side and the mine absent/present status on the top.

Table 3. Diagnostic test interpretation table

Result of the Classifier   Mine Present            Rock Present
"Mine"                     True Positives (TP)     False Positives (FP)
"Rock"                     False Negatives (FN)    True Negatives (TN)


The ROC curve is a graphical representation of the trade-off between sensitivity (Sn) and specificity (Sp). The ROC curve is plotted using Table 4 for the diagnostic test interpretation. The operating points for the ROC curve were generated by varying the threshold value of the output node. $d_{real}$ is also calculated to estimate the accuracy of the CSFNN classifier.

Table 4. Evaluation of the CSFNN classifier performance for each threshold setting

Threshold   TP   FP   FN   TN   Sensitivity   1-Specificity   d_real   Test Rates (%)
0           62   42   0    0    1             1               1        59.61
0.1         60   19   2    23   0.9677        0.4524          0.45     79.81
0.2         59   11   3    31   0.9516        0.2619          0.266    86.53
0.3         55   4    7    38   0.8871        0.0952          0.022    89.42
0.4         55   1    7    41   0.8871        0.0238          0.013    92.3
0.5         55   1    7    41   0.8871        0.0238          0.013    92.3
0.6         55   1    7    41   0.8871        0.0238          0.013    92.3
0.7         47   1    15   41   0.7581        0.0238          0.243    84.61
0.8         42   0    20   42   0.6774        0               0.323    80.76
0.9         31   0    31   42   0.5           0               0.5      70.19
1           0    0    62   42   0             0               1        40.38

As can be observed from Fig. 1, the ROC curve climbs rapidly towards the upper left-hand corner of the graph. This shows that the CSFNN classifier provides excellent classification performance for sonar returns.

Fig. 1. Receiver Operating Characteristic (ROC) Curve for CSFNN

5 Conclusions

The general objective of this work was to classify underwater targets collected from two sources, a metal cylinder and a similarly shaped rock, using a CSFNN. When the performance of the CSFNN is compared with traditional neural classifiers (MLP and


RBF) in terms of successful classification rates, the classification results prove the success of the CSFNN classifier. The ROC curve also demonstrated the excellent classification performance of the CSFNN for the sonar classification problem. This neural classifier will be useful for sonar researchers.

References

1. Gorman, R. P., Sejnowski, T. J.: Learned Classification of Sonar Targets Using a Massively Parallel Network. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 36, No. 7 (1988) 1135-1140
2. Gorman, R. P., Sejnowski, T. J.: Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets. Neural Networks, Vol. 1, No. 1 (1988) 75-89
3. Chen, C. H.: Neural Networks for Active Sonar Classification. Proceedings of the 11th IAPR International Conference on Pattern Recognition, Vol. II, Conference B: Pattern Recognition Methodology and Systems (1992) 438-440
4. Diep, D., Johannet, A., Bonnefoy, P., Harroy, F., Loiseau, P.: Classification of Sonar Data for a Mobile Robot Using Neural Networks. Proceedings of the IEEE International Joint Symposia on Intelligence and Systems (1998) 257-260
5. Haley, T. B.: Applying Neural Networks to Automatic Active Sonar Classification. Proceedings of the 10th International Conference on Pattern Recognition, Vol. 2, 41-44
6. Shazeer, D. J., Bello, M. G.: Minehunting with Multi-layer Perceptrons. IEEE Conference on Neural Networks for Ocean Engineering (1991) 57-68
7. Jing, Y. Y., El-Hawary, F.: A Multilayered ANN Architecture for Underwater Target Tracking. Conference Proceedings of the 1994 Canadian Conference on Electrical and Computer Engineering, Vol. 2 (1994) 785-788
8. Yegnanarayana, B., Chouhan, H. M., Chandra Sekhar, C.: Sonar Target Recognition Using Radial Basis Function Networks. Singapore ICCS/ISITA '92, 'Communications on the Move', Vol. 1 (1992) 395-399
9. Kapanoğlu, B., Yıldırım, T.: Generalized Regression Neural Networks for Underwater Target Classification. NEU-CEE 2nd International Symposium on Electrical and Computer Engineering, Nicosia, North Cyprus (2004) 223-225
10. Woods, K. S., Bowyer, K. W.: Generating ROC Curves for Artificial Neural Networks. Proceedings of the IEEE Seventh Symposium on Computer-Based Medical Systems (1994) 201-206
11. Azimi-Sadjadi, M. R., Yao, D., Huang, Q., Dobeck, G. J.: Underwater Target Classification Using Wavelet Packets and Neural Networks. IEEE Transactions on Neural Networks, Vol. 11, No. 3 (2000)
12. Ward, M. K., Stevenson, M.: Sonar Signal Detection and Classification Using Artificial Neural Networks. 2000 Canadian Conference on Electrical and Computer Engineering, Vol. 2 (2000) 717-721
13. Dorffner, G.: Unified Framework for MLPs and RBFNs: Introducing Conic Section Function Networks. Cybernetics and Systems, Vol. 25 (1994) 511-554
14. Yıldırım, T.: Development of Conic Section Function Neural Networks in Software and Analogue Hardware. PhD Thesis, University of Liverpool (1997)

3D Map Building for Mobile Robots Using a 3D Laser Range Finder

Zhiyu Xiang and Wenhui Zhou

Dept. of Information & Electronic Engineering, Zhejiang University, 310027 Hangzhou, P. R. China
{xiangzy, zhouwh}@zju.edu.cn

Abstract. 3D map building for mobile robots in cluttered indoor environments remains a challenge. The problem lies in two aspects: map consistency and computational complexity. A novel method for this task is proposed. The system employs a specially designed 3D laser range finder, built on the basis of a 2D laser scanner, as the environment-perceiving sensor. The registration of 3D range images is realized by localization on a so-called ceiling map, which is the set of intersecting lines between the ceiling and the walls of the room. By matching the calculated local ceiling map with the global one, accurate sampling positions that do not suffer from accumulative errors can be obtained, and a consistent 3D map can then be built on this basis. The experimental results demonstrate our success.

1 Introduction

Precise 3D maps of environments are greatly valuable for automation in many important areas [1]. The increasing need for rapid characterization and quantification of complex environments has created challenges for data analysis. The challenge lies in several aspects: a) the lack of good, cheap and fast sensors that allow robots to sense the environment in real time; b) algorithms to generate a consistent 3D map at low computational cost. A widely used sensor for 3D perception is the laser range finder (or LADAR). Many localization and 2D map building (SLAM) algorithms have been developed [2] based on the 2D laser range finder. However, a 2D laser range finder can only scan on a fixed plane, acquiring limited range information. To extend it to 3D applications, Zhao et al. [3] acquired 3D information by using two 2D range finders mounted horizontally and vertically on the mobile robot. A 3D laser range finder is a good choice for map building. However, today's commercial 3D laser range finders are large and heavy, built mainly for stationary use. For consistent map building, the registration of the different views acquired by the robot is the key problem. Pairwise matching, such as standard ICP [4] and incremental matching [5], suffers from error accumulation. Meanwhile, most of these algorithms have heavy computational costs and are difficult to implement in real time. In this paper a novel consistent 3D map building method is proposed. The system employs a specially designed 3D laser range finder, built on the basis of a 2D


laser scanner, as the environment-perceiving sensor. The registration of 3D range images is realized by localization on a so-called ceiling map, which is the set of intersecting lines between the ceiling and the walls of the room. Using the ceiling map has three advantages: a) a priori global ceiling map is easily available, either from the design drawings or by manual measurement; b) it is only 2-dimensional, which means computational efficiency in localization; c) due to its height, it is relatively immune to obstruction in cluttered environments. By matching the calculated local ceiling map with the global one, sampling positions that do not suffer from error accumulation can be obtained. A consistent 3D map can then be built on this basis. Furthermore, to reduce the computational cost, the map fusion process is based on planar surfaces instead of raw 3D points. The whole algorithm is computationally effective and can be implemented in real time. The paper is organized as follows: Section 2 briefly introduces the design of the 3D laser range finder. Section 3 presents the algorithms for surface detection and local ceiling map generation. In Section 4 the 3D map building algorithms are described. Section 5 presents the experimental results, and conclusions are given in the last section.

2 3D LADAR System

The system is composed of three components: a) a 2D LADAR; b) mechanical scanning equipment; c) a control and data acquisition unit. The mechanical scanning equipment includes a supporting base and an axle driven by a step motor. The tasks of the control unit are: a) receiving commands from the host computer; b) sending the current pitching angle to the host computer. The host computer saves the high-speed LADAR ranging data (through RS-422) and the pitching angle from the control unit simultaneously. The coordinates of the 3D sampling points in the LADAR coordinate system can be obtained from equation (1):

$$\begin{cases} x = \rho\cos\alpha\\ y = \rho\sin\alpha\cos\theta\\ z = \rho\sin\alpha\sin\theta \end{cases} \tag{1}$$

where $\rho$ is the ranging data, $\alpha$ and $\theta$ represent the horizontal and vertical scanning angles, and $(x, y, z)$ represents the coordinates of the 3D point.
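A direct implementation of Eq. (1) is straightforward; the sketch below (function name assumed) converts one LADAR sample to Cartesian coordinates and works equally well on NumPy arrays of samples.

```python
import numpy as np

def ladar_to_cartesian(rho, alpha, theta):
    """Convert a 3D LADAR sample to Cartesian coordinates, Eq. (1).

    rho   : range reading
    alpha : horizontal scanning angle of the 2D scanner
    theta : pitching (vertical) angle of the rotating axle
    """
    x = rho * np.cos(alpha)
    y = rho * np.sin(alpha) * np.cos(theta)
    z = rho * np.sin(alpha) * np.sin(theta)
    return x, y, z
```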

3 Local Surface and Ceiling Map Generation

3.1 Local Surface Map Building

Local surface map building is carried out in an on-line scheme to accelerate the algorithm. It consists of two major steps: line detection and surface generation.

Line Detection. Upon arrival of each 2D scan from the 3D LADAR, the points whose range is beyond a certain threshold are first discarded to reduce the noise. Then the


neighboring points whose range differences are small enough are clustered into clouds. Within each cloud a recursive splitting process is carried out to sub-divide the cloud into small segments, within each of which all of the points can be fit by one line. This step is carried out in local 2D scanning coordinates.

Surface Generation. After line detection is done, the data is converted into 3D. Based on the detected lines, the surface generation algorithm detects the surfaces in the 3-dimensional scene. The algorithm proceeds in the following steps [1]:

0. The first set of lines, coming from the very first 2D scan, is stored;
1. Every other line is checked against the set of stored lines. If a matching line is found, these two lines are transformed into a surface;
2. If no such matching line exists, the line may be an extension of an already found surface. In this case, the new line is matched with the top line of a surface. This top line is then replaced by the new line, resulting in an enlarged surface;
3. Otherwise the line is stored as a stand-alone line in the set mentioned above.

Two main criteria have to be fulfilled in order to match lines: a) the angle between the two lines has to be small enough; b) the distance from any end point to the other line must be small enough. To achieve real-time capability, the algorithm makes use of the characteristics of the data as it comes from the range finder, i.e., the lines are sorted throughout the whole scene due to their inherited order. Thus an efficient local search can be realized. Fig. 1 shows an example of the planar surfaces detected from a set of 3D data.

3.2 Ceiling Map Generation

In cluttered environments the 2D map at the height level of the LADAR is usually very trivial, leading to difficulties when using it for range image registration. Fig. 2(a) shows such a cluttered 2D map at the height z = 0. However, since few things exist at the height of the ceiling, a clear and relatively complete 2D ceiling map can be obtained. The ceiling map can be easily obtained by intersecting a planar plane, close to the ceiling and parallel to the ground, with all of the detected surfaces. Fig. 2(b) shows a ceiling map example using the same 3D data as in Fig. 2(a). The quality of the ceiling map is represented by the total length of the line segments within the map. We vary the height of the intersection plane to generate several ceiling maps and select only the best one as the final result.
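As a concrete reading of the two line-matching criteria in the surface generation step above, the following sketch tests a pair of 3D segments; the angle and distance thresholds are illustrative assumptions.

```python
import numpy as np

def lines_match(p1, q1, p2, q2, max_angle_deg=5.0, max_dist=50.0):
    """Check the two surface-growing criteria for matching 3D line segments.

    p*, q* : (3,) end points of each segment (e.g. in mm).
    Both the angle between the direction vectors and the end-point-to-line
    distances must be small enough; the thresholds here are assumptions.
    """
    d1, d2 = q1 - p1, q2 - p2
    cosang = abs(d1 @ d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))
    if np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))) > max_angle_deg:
        return False

    def point_line_dist(pt, a, b):
        # distance from point pt to the infinite line through a and b
        ab = b - a
        return np.linalg.norm(np.cross(pt - a, ab)) / np.linalg.norm(ab)

    return all(point_line_dist(pt, p2, q2) <= max_dist for pt in (p1, q1))
```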

4 Consistent 3D Map Building

With a global and a local ceiling map in hand, the determination of the robot's pose becomes a common 2D localization problem. We adopt a scan matching algorithm based on the Extended Kalman Filter (EKF) to accomplish this task. The detailed algorithm can be found in the literature, e.g. [2].


Fig. 1. An example of the detected surfaces from 3D data


Fig. 2. (a) An example of the 2D map at z = 0, where many small segments from the cluttered environment are displayed. (b) An example of the local ceiling map obtained using z = 1900 mm, where the profile of the room is clearly shown.

Since all of the local ceiling maps have been matched to the same global ceiling map, the resulting positions will have the same reference. Therefore a consistent 3D map can be built on the basis of the correct pose of the robot. The 3D map integration process includes four steps: a) filtering out small surfaces; b) coordinate transformation; c) matching between the local and global maps; d) map fusion. Since the environment is cluttered, many small surfaces will exist in the local map. They have little meaning in the visualized 3D model and would increase the computational cost of the following steps. Two criteria are used to filter out the small surfaces: a) the number of scan lines within the surface must be larger than a threshold; b) the average length of the scan lines within the surface must exceed a threshold. After this first step, the number of surfaces is reduced by about half, leaving only relatively large surfaces in the local 3D map. In the second step, the coordinates of all of the surfaces in the local map have to be transformed into the global coordinate system according to the calculated robot position. Deciding whether a surface in the local 3D map


matches another surface in the 3D map under construction depends on two criteria: (a) the angle and the distance between the two surfaces should be small enough; (b) the two surfaces should partially overlap. In the last step, map fusion, new plane parameters are calculated given the vertices of both 3D polygons. Then each polygon is projected from the 3D coordinate space into the 2D coordinate space of the obtained plane. Finally the polygons are merged. The algorithm from Vatti [6] is used for the polygon merging process.
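The two filtering criteria of the first integration step can be expressed as a small helper; the thresholds below are illustrative assumptions.

```python
def filter_small_surfaces(surfaces, min_lines=5, min_avg_length=200.0):
    """Drop small surfaces using the two criteria above (thresholds assumed).

    Each surface is represented here as a list of its scan-line lengths (e.g. in mm);
    a surface is kept only if it contains enough scan lines and their average
    length is large enough.
    """
    kept = []
    for scan_line_lengths in surfaces:
        if len(scan_line_lengths) < min_lines:
            continue
        if sum(scan_line_lengths) / len(scan_line_lengths) < min_avg_length:
            continue
        kept.append(scan_line_lengths)
    return kept
```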

5 Experiment

Fig. 3 shows the trajectory of the robot obtained during the experiment. The bounding rectangle represents the global ceiling map of the environment. The robot started from position A and ended at position B. The small triangles represent the sampling positions and the direction of the robot on the trajectory. The localization results were satisfactory: the covariance of the positioning error during the whole process was less than 1 cm over all x-y-z components.

Fig. 3. Trajectory of the robot calculated by the ceiling map matching algorithm

Fig. 4. 3D planar surface map obtained from the experiment


Fig. 4 shows the integrated 3D planar surface map of the environment. The map has a size of about 3200 mm in the x direction, 5000 mm in the y direction and 3300 mm in the z direction. For clarity, the ceiling has been removed from the map to give a good view of the internal walls and floor. Some parts of the walls and the floor are blank because of obstruction by objects. The global consistency of the map is confirmed by the accurate size and the correct positions of the walls within it.

6 Conclusions

A consistent 3D map building method for mobile robots in indoor environments is proposed. Matching the local ceiling map to the global one satisfies the consistency requirement. The matching is only in 2D and is therefore computationally efficient. Furthermore, the use of the ceiling map enables the robot to deal with cluttered environments very well, since the ceiling is almost unaffected by obstruction. The experimental results demonstrated our success.

Acknowledgments

The research was sponsored by the China National Science Foundation under grant No. 60505017.

References

1. Surmann, H., Nuchter, A., Hertzberg, J.: An Autonomous Mobile Robot with a 3D Laser Range Finder for 3D Exploration and Digitalization of Indoor Environments. International Journal of Robotics and Autonomous Systems, Elsevier, 45(2) (2003) 181-198
2. Chou, H., Traonmilin, M., Ollivier, E., Parent, M.: A Simultaneous Localization and Mapping Algorithm Based on Kalman Filtering. IEEE Intelligent Vehicles Symposium (2004) 631-635
3. Zhao, H., Shibasaki, R.: Reconstructing Textured CAD Model of Urban Environment Using Vehicle-borne Laser Range Scanners and Line Cameras. IEEE Proceedings of the International Workshop on Computer Vision Systems (2001) 453-458
4. Besl, P., McKay, N.: A Method for Registration of 3D Shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14 (1992) 239-256
5. Chen, Y., Medioni, G.: Object Modeling by Registration of Multiple Range Images. Proceedings of the IEEE Conference on Robotics and Automation (1991) 2724-2729
6. Vatti, B.: A Generic Solution to Polygon Clipping. Communications of the ACM (1992) 56-63

Construction of Fast and Robust N-FINDR Algorithm

Liguo Wang 1,2, Xiuping Jia 3, and Ye Zhang 2

1 School of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China
2 Dept. of Information Engineering, Harbin Institute of Technology, Harbin 150001, China
{wlg74327, zhye}@hit.edu.cn
3 School of Electrical Engineering, University College, The University of New South Wales, Australian Defence Force Academy, Campbell, ACT 2600, Australia

Abstract. N-FINDR has been a popular endmember (EM) extraction algorithm owing to its full automation and relative efficiency. Unfortunately, the innumerable volume calculations lead to a low speed, which limits its applications. Additionally, the algorithm is vulnerable to outliers, which widely exist in hyperspectral data. In this paper, a distance measure is adopted in place of the volume measure to speed up the algorithm, and outliers are effectively controlled to endow the algorithm with robustness. Experiments show that the improved algorithm is very fast and robust.

1 Introduction

In recent years, hyperspectral remote sensing has been applied in many fields and the processing techniques for hyperspectral images (HSI) have developed greatly. In HSI, mixed pixels contain a mixture of more than one distinct substance; in these cases, the resulting spectrum is a composite of several spectra. One of the most important HSI processing techniques is spectral unmixing [1], which aims at the analysis of mixed pixels. Spectral unmixing decomposes a mixed pixel into a collection of distinct endmembers (EMs) with a set of fractional abundances that indicate the proportion of each EM. As the basic step of spectral unmixing, EM extraction is crucial for computing fractional abundances accurately. Accordingly, a number of EM extraction methods [2] have been proposed over the past decade. The N-FINDR algorithm [3] is famous and popular among EM extraction methods for its full automation and relative efficiency. The algorithm is essentially an automated technique for finding the purest pixels in an image. The convex nature of hyperspectral data allows this operation to be performed in a relatively straightforward manner. It is based on the theory of convex geometry and is conducted in a reduced-dimensional data space provided by the MNF transformation [4]. N-FINDR is now widely used in anomaly detection, efficient materials mapping, hyperspectral image compression, and hyperspectral image sharpening [5]. It is in commercial use and is available in ENVI, a powerful hyperspectral image analysis package [6].


Although the algorithm has been successfully used in various applications and has been improved in [7], the innumerable volume calculations lead to a very low speed, and outliers usually bring undesirable interference to the final extraction. To obtain a fast and robust N-FINDR algorithm, a distance measure is used instead of the volume measure to reduce the computational burden, and effective control of outliers is implemented to endow the algorithm with robustness.

2 Speeding Up of the N-FINDR Algorithm

Owing to space limitations, the original N-FINDR algorithm is not restated here. Let E be the matrix of the $(N+1)$ pixels $s_1, s_2, \ldots, s_{N+1}$ augmented with a row of ones. In the original N-FINDR algorithm, the volume V(E) of the simplex formed by the pixels is proportional to the determinant of E. This section aims only to reduce the complexity of the volume calculation. For the purpose of visualization, the two-dimensional case is considered first. In Fig. 1, A, B, and C are the vertices of a triangle and $V_{old}$ is the area of the triangle. Let $A_0$ be a point different from A, B, and C, and let $V_{new}$ be the area of the triangle formed by $A_0$, B and C. Let the line segments $AD$ and $A_0D_0$ be the distances from A and $A_0$ to the line segment BC. Whether $A_0$ can take the place of A is determined by the areas of the two triangles. To compare $V_{old}$ and $V_{new}$, the original algorithm computes them directly. It should be noted, however, that the goal is only to determine which volume is larger; it is not necessary to compute them quantitatively. From elementary geometry, the following expression holds:

$$V_{old}/V_{new} = AD/A_0D_0 . \tag{1}$$

Fig. 1. Equivalence of the distance measure and the volume measure in the N-FINDR algorithm

In other words, the distance comparison is equivalent to the volume comparison for renewing EMs. An intuitive illustration is given in Fig. 1, where l is the straight line through B and C, $l_1$ is the parallel straight line passing through vertex A, and $l_2$ is the line symmetric to $l_1$ about l. It can be seen that $A_0$ can substitute for A if and only if $A_0$ lies outside the region bounded by the pair of parallel straight lines $l_1$ and $l_2$.

In N-dimensional space, a simplex with $(N+1)$ vertices is considered. Let $s_1, s_2, \ldots, s_{N+1}$ be these vertices and let $s_0$ be another pixel different from them. Then the distance comparison can be used to estimate whether pixel $s_0$ taking the place of $s_i$ $(1 \le i \le N+1)$ results in an increase of volume. In this case, the straight line and area in two-dimensional space correspond to a hyperplane and a volume, respectively. In this space, the equation of the hyperplane formed by $s_1, s_2, \ldots, s_{i-1}, s_{i+1}, \ldots, s_{N+1}$ can be written as

$$\sum_{i=1}^{N}\beta_i x_i + c = 0 , \tag{2}$$

where $\beta = (\beta_1, \beta_2, \ldots, \beta_N)^T$ is the solution of the following equations:

$$[s_1, s_2, \ldots, s_{i-1}, s_{i+1}, \ldots, s_{N+1}]^T \cdot \beta + c \cdot I_N = 0 . \tag{3}$$

$I_N$ is an $N \times 1$ vector of ones. c takes the value 0 if the origin is linearly dependent on $s_1, s_2, \ldots, s_{i-1}, s_{i+1}, \ldots, s_{N+1}$, and 1 otherwise. In practice, c has little chance of being 0. Moreover, the normalization of the distance from the origin to the hyperplane does not affect the analysis of this paper. The distance $d(s_0)$ from $s_0$ to hyperplane (2) is then proportional to the following expression:

$$d(s_0) = \beta^T \cdot s_0 + c . \tag{4}$$

In the N-FINDR algorithm, the number of hyperplane constructions, equal to the number of EM renewals multiplied by the number of EMs, is much smaller than the number of distance calculations, so the computational cost of hyperplane construction can be neglected. From (4), a distance calculation involves just a dot product of two $N \times 1$ vectors, whose computational cost is linear in the dimension N, while a volume calculation requires the determinant of an $(N+1) \times (N+1)$ matrix, whose computational cost is cubic in N. For brevity, the original N-FINDR algorithm and its fast version are named O-N-FINDR and F-N-FINDR, respectively.
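A minimal NumPy sketch of the distance measure of Eqs. (3)-(4) is given below: the hyperplane coefficients are obtained once per candidate vertex exchange, after which every pixel costs only a dot product. Taking c = 1 reflects the generic case discussed above; the function names are assumptions.

```python
import numpy as np

def hyperplane(vertices):
    """Solve Eq. (3) for the hyperplane through N of the N+1 simplex vertices.

    vertices : (N, N) matrix whose rows are s_1..s_{i-1}, s_{i+1}..s_{N+1}.
    Returns (beta, c) with beta . x + c = 0 on the hyperplane (c = 1 assumed).
    """
    N = vertices.shape[0]
    beta = np.linalg.solve(vertices, -np.ones(N))  # vertices @ beta + 1 = 0
    return beta, 1.0

def plane_distances(beta, c, pixels):
    """Eq. (4): a quantity proportional to each pixel's distance to the plane."""
    return np.abs(pixels @ beta + c)               # pixels: (m, N)
```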

3 Detecting and Removing Outliers

Outliers have a higher probability of being selected as EMs because of their special spatial position, and so they can bring severe interference to the iterative searching of EMs. Outliers widely exist in HSI, and sometimes only one erroneous extraction can lead to the breakdown of the N-FINDR algorithm. The construction of a robust N-FINDR algorithm is therefore very necessary. If outliers can be excluded from the hyperspectral data, their bad influence can be avoided. Because outliers usually exist in an isolated manner, local analysis is helpful for this purpose. Concretely, the number of pixels in a local window centered at each pixel can be counted and used as an outlier index of the center pixel: the larger the index, the greater the degree of isolation, and so the higher the probability of being selected as an outlier. To reduce the computational cost, a square window can approximately take the place of a round one. The robust N-FINDR algorithm is named R-N-FINDR for brevity.
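One plausible realization of this local-window test is sketched below: each pixel's neighbors within a square window in the (reduced-dimensional) feature space are counted, and strongly isolated pixels are excluded before running N-FINDR. The window radius, the cutoff, and this exact reading of the index are assumptions, as the extracted text does not fully specify them.

```python
import numpy as np

def remove_isolated_pixels(data, radius, min_neighbors=3):
    """Exclude isolated pixels before EM extraction (a sketch; parameters assumed).

    data   : (n, d) pixel vectors, e.g. after MNF dimension reduction
    radius : half-width of the square window around each pixel
    """
    keep = np.ones(data.shape[0], dtype=bool)
    for i in range(data.shape[0]):
        # square window: every coordinate within +/- radius of the center pixel
        inside = np.all(np.abs(data - data[i]) <= radius, axis=1)
        if inside.sum() - 1 < min_neighbors:   # exclude the center pixel itself
            keep[i] = False
    return data[keep]
```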

4 Experiments and Results

To obtain a clear evaluation, F-N-FINDR and R-N-FINDR are each compared with O-N-FINDR. In the first group of experiments, 1000 points are generated by mixing three points A(-15,0), B(15,0) and C(0,20), which are prescribed as EMs. The mixed points are uniformly distributed within the triangle spanned by A, B and C (solid triangle in Fig. 2). Independent random Gaussian noise with variance 1 is added to the 1000 points in both the x and y directions. The two algorithms arrive at the same final simplex (dotted triangle in Fig. 2) with different execution times. Table 1 compares their execution time (E-time for short), the number of EM renewals (EM-times) and the number of volume/distance computations (V/D-times). Even for a small N of 3, the algorithm is sped up by about 7 times.

Fig. 2. Scatter plot of synthetic data used in exp. 1

In the second group of experiments, a larger number of EMs, 10, is taken. In a 9-dimensional space, the 9 standard unit vectors and the origin are prescribed as EMs, and 10000 points are generated by linear combinations of the EMs. A detailed comparison of execution time is shown in Table 2. In this case, the improved algorithm achieves a speed-up of more than 40 times compared with O-N-FINDR. In the first two groups of experiments, the equal values of EM-times and V/D-times show the equivalence of the distance measure and the volume measure. In the third group of experiments, 3 consulting spectra from a lab database are used to compose 1000 spectra. Another 30 outliers are generated by imposing noise on the 3 consulting spectra. The first row in Fig. 3 compares the consulting spectra


with the results selected by the different methods. It can be seen that O-N-FINDR suffers severe interference from outliers, while the results of R-N-FINDR are very close to the consulting spectra. In the last group of experiments, 3 consulting spectra (soybean, grass and woods) are generated by averaging three classes of spectra (500 samples per class) from an agriculture/forestry landscape AVIRIS dataset acquired at the Indian Pine Test Site. The second row in Fig. 3 compares the performance of the two methods on this 1500-spectrum dataset. Again, the robust algorithm gives results closer to the consulting spectra. It is known that an ideal EM should correspond to a class center rather than a simplex vertex, so the robust algorithm obtains a more reasonable extraction regardless of the existence of outliers.

Table 1. Comparison of execution time/iteration times (exp. 1)

            O-N-FINDR   F-N-FINDR
E-time      2.9380      0.4210
EM-times    21          21
V/D-times   9504        9504

Table 2. Comparison of execution time/iteration times (exp. 2)

            O-N-FINDR   F-N-FINDR
E-time      961.6       22.75
EM-times    95          95
V/D-times   689843      689843

Fig. 3. Comparison of the EMs selected by the different methods: a) consulting EMs; b) EMs selected by O-N-FINDR; c) EMs selected by R-N-FINDR


5 Conclusions

In this paper, a fast and robust N-FINDR algorithm is constructed. For speed, distance comparison takes the place of volume comparison; the larger the number of EMs, the higher the efficiency gained by the replacement. In addition, constructing a good order for the initialization and renewal of EMs can further speed up the algorithm. For robustness, outliers are effectively controlled, and this control leads to a more reasonable extraction of EMs whether outliers exist or not. With these improvements, the algorithm should find more effective applications.

References

1. Keshava, N., Mustard, J. F.: Spectral Unmixing. IEEE Signal Processing Magazine, 19 (2002) 44-57
2. Cipar, J. J., Eduardo, M., Edward, B.: A Comparison of End Member Extraction Techniques. Proceedings of SPIE - The International Society for Optical Engineering, 4725 (2002) 1-9
3. Winter, M. E.: N-FINDR: An Algorithm for Fast Autonomous Spectral End-member Determination in Hyperspectral Data. SPIE Imaging Spectrometry V (1999) 266-275
4. Green, A., Berman, M., Switzer, P., Craig, M.: A Transformation for Ordering Multispectral Data in Terms of Image Quality with Implications for Noise Removal. IEEE Transactions on Geoscience and Remote Sensing, 26 (1988) 65-74
5. Winter, M. E., Winter, E.: New Developments in the N-FINDR Algorithm. Presented at: IGARSS 2001 International Geoscience and Remote Sensing Symposium, Sydney, Australia. http://www.higp.hawaii.edu/~winter/
6. Xing, Y., Gomez, R. B.: Hyperspectral Image Analysis Using ENVI (Environment for Visualizing Images). Proceedings of SPIE - The International Society for Optical Engineering, 4383 (2001) 79-86
7. Plaza, A., Chang, C.-I: An Improved N-FINDR Algorithm in Implementation. Proceedings of SPIE - Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XI, 5806 (2005) 298-306

Dental Plaque Quantification Using Cellular Neural Network-Based Image Segmentation

Jiayin Kang 1, Xiao Li 2, Qingxian Luan 2, Jinzhu Liu 3, and Lequan Min 1,3

1 School of Information Engineering, University of Science and Technology Beijing, 100083 Beijing, P.R. China
2 School of Stomatology, Peking University, 100081 Beijing, P.R. China
3 School of Applied Science, University of Science and Technology Beijing, 100083 Beijing, P.R. China
{jinzhucn, lqmin}@sohu.com

Abstract. This paper presents an approach for quantifying dental plaque automatically, based on a cellular neural network (CNN) associated with histogram analysis. The approach was applied to a clinical database consisting of 15 subjects. The experimental results showed that this method provides an accurate quantitative measurement of dental plaque compared with traditional manual measurement indices of dental plaque.

1 Introduction

The detection of dental plaque is crucial for patients, their clinicians and also researchers. A number of dental plaque indices have been developed to overcome the difficulties of quantifying the presence of dental plaque. These indices have been devised to allow an easy and semi-quantitative assessment of the distribution of dental plaque, and they vary in their approach to recording the scores (details can be found in references [1], [2]). However, most clinical scoring systems for plaque are subjective, because measurements rely primarily on the clinician's ability to demarcate or score areas of disclosed plaque via visual examination. This may lead to inter-operator errors. Image segmentation is the first step in image analysis and pattern recognition. It is a crucial and essential component of image analysis and pattern recognition systems, is one of the most difficult tasks in image processing, and determines the quality of the final result of the analysis [3]. To understand an image, one needs to isolate the objects in it and find the relations among them. The process of object separation is referred to as image segmentation; in other words, segmentation is used to extract the meaningful objects from the image [4]. Quantification of dental plaque on digital photographs of the anterior teeth of patients is an important research issue. The purpose of this study was to


propose a novel quantitative and automated method to measure dental plaque accumulation via digital image segmentation, based on a cellular neural network (CNN) associated with histogram analysis.

2 Dental Plaque Quantification via Cellular Neural Network

2.1 Cellular Neural Network

The CNN [5], first introduced as an implementable alternative to the fully connected Hopfield neural network, has been widely studied for image processing, robotic and biological vision, and higher brain functions [5], [6]. In particular, CNN templates have been presented for deleting small objects, edge detection, convex corner detection, diagonal line detection, global connectivity detection, and so on (see [7]-[10]). The standard M x N CNN architecture is composed of cells. The dynamics of each cell are given by the following equation:

$$\dot{x}_{i,j} = -x_{i,j} + \sum_{C_{i+k,j+l}\in S_r(i,j)} a_{k,l}\,y_{i+k,j+l} + \sum_{C_{i+k,j+l}\in S_r(i,j)} b_{k,l}\,u_{i+k,j+l} + z_{i,j} = -x_{i,j} + \sum_{k=-r}^{r}\sum_{l=-r}^{r} a_{k,l}\,y_{i+k,j+l} + \sum_{k=-r}^{r}\sum_{l=-r}^{r} b_{k,l}\,u_{i+k,j+l} + z_{i,j} \tag{1}$$

where $x_{i,j}$, $y_{i,j}$, $u_{i,j}$, $z_{i,j}$ represent state, output, input, and threshold, respectively; $S_r(i,j)$ is the sphere of influence with radius r; $a_{k,l}$ and $b_{k,l}$ are the elements of the A-template and the B-template, respectively. The output $y_{i,j}$ is the piecewise-linear function

$$y_{i,j} = \frac{1}{2}\left(|x_{i,j}+1| - |x_{i,j}-1|\right), \quad i = 1, 2, \cdots, M;\ j = 1, 2, \cdots, N .$$

2.2 Standard Threshold CNN

The template of the standard threshold CNN has the form:

$$A = \begin{bmatrix} 0 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}, \quad Z = -z^* , \tag{2}$$

where $-1 < z^* < 1$.

I. Global Task
1. Given: static gray-scale image P and threshold $z^*$.
2. Input: U(t) = arbitrary, or default to U(t) = 0.
3. Initial state: X(0) = P.

4. Output: Y(t) ⇒ Y(∞) = binary image in which every pixel of P with gray-scale intensity $P_{i,j} > z^*$ becomes black.

II. Local Rules
1. $x_{i,j}(0) < z^*$ → $y_{i,j}(\infty)$ = white, independent of neighbors.
2. $x_{i,j}(0) > z^*$ → $y_{i,j}(\infty)$ = black, independent of neighbors.
3. $x_{i,j}(0) = z^*$ → $z^*$, assuming zero noise.
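To see the local rules in action, the sketch below integrates the threshold-CNN dynamics of template (2) by forward Euler; the gray-scale convention (black = +1) and the step size are assumptions.

```python
import numpy as np

def threshold_cnn(P, z_star, dt=0.05, steps=400):
    """Simulate the standard threshold CNN of template (2).

    P : gray-scale image scaled to [-1, 1], used as the initial state X(0).
    Each cell obeys dx/dt = -x + 2*y - z*, which drives pixels with
    x(0) > z* to +1 (black) and pixels with x(0) < z* to -1 (white).
    """
    x = P.astype(float).copy()
    for _ in range(steps):
        y = 0.5 * (np.abs(x + 1) - np.abs(x - 1))  # piecewise-linear output
        x += dt * (-x + 2.0 * y - z_star)
    return 0.5 * (np.abs(x + 1) - np.abs(x - 1))   # binary output image
```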

2.3 Choosing Threshold

Roughly speaking, the "clusters" in a digital tooth image may be classified into two kinds: one kind represents the tooth surfaces, and the other stands for the dental plaque (see Fig. 1(a) and (b)).

Fig. 1. Two digital tooth images with different amounts of dental plaque (red color): (a) a large amount of dental plaque, (b) a small amount of dental plaque

Thresholding is a particularly useful approach for distinguishing different kinds of clusters in images. For well-defined images, the threshold is located at the valley between two peaks in the gray-level histogram [11] (see Fig. 2(a)). However, the digital images of teeth with dental plaque do not always have two distinct peaks for determining the valley (see Fig. 2(b)). One method of choosing the threshold T automatically is the iterative procedure given by the following equation [12]:

$$T_{i+1} = \frac{1}{2}\left\{\frac{\sum_{k=0}^{T_i} h_k\,k}{\sum_{k=0}^{T_i} h_k} + \frac{\sum_{k=T_i+1}^{L-1} h_k\,k}{\sum_{k=T_i+1}^{L-1} h_k}\right\} , \tag{3}$$

where $h_k$ is the number of pixels with gray value k, and L is the total number of gray levels. Formula (3) can easily be implemented in a computer program.
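A minimal implementation of the iteration in formula (3) follows; the starting value and stopping rule are assumptions.

```python
import numpy as np

def iterative_threshold(hist, max_iter=100):
    """Iterative threshold selection of formula (3) on a gray-level histogram.

    hist : h_k, the number of pixels at each gray level k = 0..L-1.
    Iterates T <- (mean below T + mean above T) / 2 until it stops changing.
    """
    hist = np.asarray(hist, dtype=float)
    k = np.arange(hist.size)
    T = int(round((hist * k).sum() / hist.sum()))   # start from the global mean
    for _ in range(max_iter):
        low, high = hist[:T + 1], hist[T + 1:]
        if low.sum() == 0 or high.sum() == 0:
            break
        m_low = (low * k[:T + 1]).sum() / low.sum()
        m_high = (high * k[T + 1:]).sum() / high.sum()
        T_new = int(round(0.5 * (m_low + m_high)))
        if T_new == T:
            break
        T = T_new
    return T
```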

Fig. 2. Gray-level histograms of the tooth images shown in (a) Fig. 1(a) and (b) Fig. 1(b)

3 Experimental Results and Analysis

3.1 Dental Plaque Measurement Using the Traditional Dental Plaque Index

The plaque index developed by Quigley and Hein and modified by Turesky is one of the most frequently used indices in product testing [2]. The index is scored as follows:

0: no plaque;
1: separate flecks of plaque at the cervical margin of the tooth;
2: a thin continuous band of plaque (up to 1 mm) at the cervical margin;
3: a band of plaque wider than 1 mm but covering less than one third of the crown;


4: plaque covering at least one third but less than two thirds of the crown;
5: plaque covering two thirds or more of the crown.

The dental plaque levels (the mean over the 12 teeth of each subject) according to the Turesky index are listed in the third column of Table 1 (clinical database of 15 subjects).

3.2 Dental Plaque Quantification Based on the Proposed Approach

By means of the approaches designed in Sections 2.2 and 2.3, we processed the digital tooth images of the 15 subjects obtained from the clinical trials. In the meantime, the tooth images were indexed by visual observation. The results are listed in Table 1, and two processed tooth images (No. 01 and No. 02) are shown in Fig. 3. As Table 1 shows, the data obtained via our computer-aided approach are more precise than those obtained by visual observation.

Table 1. Dental plaque results based on the proposed method and on the Turesky index, respectively

Patient No.   Proposed method (percentage)   Turesky (level)
01            51.29                          2.75
02            14.30                          1.75
03             8.43                          1.18
04            27.82                          1.63
05            28.10                          1.75
06            12.94                          1.50
07            58.44                          3.42
08             6.74                          1.20
09            22.48                          2.08
10            22.32                          1.63
11            79.64                          4.58
12            11.42                          0.92
13            53.59                          2.75
14             7.98                          1.36
15            16.36                          1.42


Fig. 3. Detected dental plaque (black color) of images shown in (a) Fig. 1(a), and (b) Fig. 1(b)

4 Conclusion

A method to quantify dental plaque based on cellular neural networks combined with histogram analysis was proposed. The experimental results show that the method presented in this paper is an automated, objective, and quantitative alternative to current indices for quantifying dental plaque without the need for a clinician. As a result, it helps to provide a standard for clinical evaluations. Furthermore, it may be an attractive method for monitoring the trend of dental plaque growth in longitudinal investigations.

Acknowledgement. The authors would like to thank the National Natural Science Foundation of China (Grant Nos. 60074034, 70271068) and the Research Fund for the Doctoral Program of Higher Education (Grant No. 20020008004) of the Ministry of Education of China for their financial support.

References
1. Carter, K., Landini, G., Walmsley, A.D.: Automated Quantification of Dental Plaque Accumulation Using Digital Imaging. J. of Dentistry 32 (2004) 623–628
2. Pretty, I.A., Edgar, W.M., Smith, P.W., Higham, S.M.: Quantification of Dental Plaque in the Research Environment. J. of Dentistry 33 (2005) 193–207
3. Cheng, H.D., Jiang, X.H., Sun, Y., Wang, J.L.: Color Image Segmentation: Advances and Prospects. Pattern Recognition 34 (2001) 2259–2281
4. Deshmukh, K.S., Shinde, G.N.: An Adaptive Color Image Segmentation. Electronic Letters on Computer Vision and Image Analysis 5 (4) (2005) 12–23
5. Chua, L.O., Yang, L.: Cellular Neural Networks: Theory and Application. IEEE Trans. Circuits Syst. 35 (1988) 1257–1272, 1273–1290
6. Chua, L.O., Roska, T.: Cellular Neural Networks and Visual Computing. Cambridge University Press (2002)
7. Cai, H., Min, L.Q.: A Kind of Two-input CNN with Application. Int. J. of Bifurcation and Chaos 15 (12) (2005) 4007–4011
8. Li, G.D., Min, L.Q., Zang, H.Y.: Design for Robustness Edge-gray Detection CNN. 2004 Int. Conf. on Communications, Circuits and Systems II (2004) 1061–1065
9. Liu, J.Z., Min, L.Q.: Design for CNN Templates with Performance of Global Connectivity Detection. Communications in Theoretical Physics 41 (1) (2004) 151–156
10. Su, Y.M., Min, L.Q.: Robustness Designs of Templates of Directed Overstrike CNNs with Applications. J. of Signal Processing 11 (2004) 449–454
11. Kim, D.Y., Park, J.W.: Connectivity-based Local Adaptive Thresholding for Carotid Artery Segmentation Using MRA Images. Image and Vision Computing 23 (2005) 1277–1287
12. Rafael, C.G., Richard, E.W., Steven, L.E.: Digital Image Processing Using MATLAB. Publishing House of Electronics Industry, Beijing (2004)

Detection of Microcalcifications Using Wavelet-Based Thresholding and Filling Dilation∗

Weidong Xu1,2, Zanchao Zhang1, Shunren Xia1, and Huilong Duan1

1 The Key Laboratory of Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310027, China
2 Automation College, HangZhou Dianzi University, Hangzhou 310018, China
[emailprotected]

Abstract. Microcalcifications (MCs) are the main symptoms of breast cancer in the mammograms. This paper proposed a new computer-aided diagnosis (CAD) algorithm to detect the MCs. At first, discrete wavelet transform (DWT) was applied to extract the high-frequency signal, and thresholding with hysteresis was used to locate the suspicious MCs. Then, filling dilation was utilized to segment the desired regions. During the detection process, ANFIS was applied for auto-adjustment, making the CAD more adaptive. Finally, the segmented MCs were classified with MLP, and a satisfying result validated this method.

1 Introduction

Breast cancer continues to be one of the most dangerous tumors for middle-aged and older women in China. Among all detection methods, mammography performs most effectively. In mammograms, the early symptoms of breast cancer are microcalcifications (MCs). Since the MCs always appear tiny and indistinct, their detection usually costs the radiologists much time and energy. Many computer-aided diagnosis (CAD) methods have been developed to assist the radiologists. Nishikawa proposed a difference-image technique based on image subtraction, which could extract the high-frequency (HF) image signal [1]. Choe applied the Haar wavelet to decompose the images, applied cubic mapping to enhance the wavelet coefficients, and detected the MCs from the reconstructed images with thresholding [2]. Gulsrud reduced the variation of the background with image subtraction, eliminated random noise with median filtering, and used an optimal filter to locate the MCs [3]. These conventional methods performed well in detecting MCs with similar features and backgrounds, but for MCs in unusual backgrounds the detection results were usually less satisfactory. This paper proposes a novel detection algorithm, which uses the discrete wavelet transform (DWT) to extract the HF signal, applies filling dilation to segment the suspicious regions, and utilizes ANFIS to adjust the detection process. It overcomes the defects of the conventional methods and achieved high detection precision with a low false positive (FP) rate in the experiments.

∗ Supported by the Natural Science Foundation of China (No. 60272029) and the Natural Science Foundation of Zhejiang Province of China (No. M603227).



2 Discrete Wavelet Transform and Thresholding with Hysteresis

MCs appear as tiny pieces with high intensities, which can be represented by HF signals. In conventional methods, an image-subtracting technique is often applied to extract the HF information. Compared with it, the wavelet is more appropriate for extracting the desired HF information. The wavelet has great smoothness and locality, which makes it very effective for information extraction. With wavelets, the low-frequency (LF) and HF information of the signals can be decomposed level by level, which is called multi-resolution analysis (MRA). In this way, the desired signal can be extracted and processed according to its resolution. A common wavelet-based technique is the discrete wavelet transform (DWT). With the two-scale sequences of the scale function and the wavelet function, the signals can easily be decomposed and reconstructed level by level. With the DWT, the image information at each resolution is decomposed into four subbands: LL stores the low-frequency information, while LH, HL and HH store the HF information. These three subbands can be combined into a uniform HF domain, i.e. |LH| + |HL| + |HH|. In practical work, the HF information that denotes the MCs always lies in the 2nd and 3rd levels of the wavelet domain [4].
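A minimal sketch of this subband combination, written with PyWavelets rather than the authors' implementation, is shown below; the 'db2' mother wavelet is an assumption, since the paper does not name the wavelet it decomposes with.

import numpy as np
import pywt

def hf_domains(image, level=3, wavelet='db2'):
    """Combine the three detail subbands per DWT level into |LH|+|HL|+|HH|.

    Returns a dict mapping decomposition level -> combined HF magnitude map;
    per the paper, the maps of levels 2 and 3 are used to locate the MCs.
    """
    coeffs = pywt.wavedec2(image.astype(float), wavelet, level=level)
    # coeffs[0] is the LL approximation; each coeffs[i] (i >= 1) is a detail
    # triple, ordered from the coarsest level down to the finest
    hf = {}
    for lev, (lh, hl, hh) in zip(range(level, 0, -1), coeffs[1:]):
        hf[lev] = np.abs(lh) + np.abs(hl) + np.abs(hh)
    return hf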


Fig. 1. Original Images (a) and Location Result of the MCs (b)

Thresholding with hysteresis was used to extract the high-intensity coefficients in the HF domain of the 2nd and 3rd levels. Firstly, the coefficients were processed with a global thresholding: if the modulus was < T0 , the signal was deleted. Secondly, the reserved coefficients were processed with another global thresholding: if the modulus was > T1 , the signal was considered as the MCs. Finally, a local thresholding was applied on the neighborhood around each assured MC, and the remaining coefficients near the assured MCs were considered as the MCs, if their modulus were > T2 . In


this way, those useful signals with comparatively low HF modulus could be extracted, leaving the noise with similar HF modulus suppressed. With the reconstruction of the assured signals in the HF domain, all the MCs could be located accurately (Fig. 1).
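A sketch of the three-stage rule on a combined HF map is given below; it is illustrative rather than the authors' code, and the neighborhood radius of the local stage is an assumed parameter.

import numpy as np

def hysteresis_threshold(hf, T0, T1, T2, radius=2):
    """Three-stage thresholding with hysteresis on an HF coefficient map, a sketch.

    T0 deletes weak coefficients, T1 marks assured MCs, and T2 keeps the
    remaining coefficients in the neighborhood of an assured MC.
    """
    mag = np.abs(hf)
    kept = mag >= T0                        # stage 1: global deletion below T0
    assured = kept & (mag > T1)             # stage 2: assured MC coefficients
    grown = assured.copy()
    ys, xs = np.nonzero(assured)            # stage 3: local thresholding around each assured MC
    for y, x in zip(ys, xs):
        y0, y1 = max(y - radius, 0), y + radius + 1
        x0, x1 = max(x - radius, 0), x + radius + 1
        grown[y0:y1, x0:x1] |= kept[y0:y1, x0:x1] & (mag[y0:y1, x0:x1] > T2)
    return grown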

Fig. 2. Filling Dilation Result of the MCs in Fig.1

Fig. 3. ANFIS

3 Filling Dilation

Then a filling-dilation technique was used to segment the suspicious MCs. The process resembles a liana expanding over terrain: the mammogram is like a relief map, and its gray-level intensity is like the altitude of the landform. The MC region R_0 located in Section 2 was taken as the original region of the liana, and the gray-level contrast in its neighborhood was enhanced with an intensity-remapping method. The liana then expands outwards through an iterative dilation based on a cross-shaped structuring element B: R_1 = R_0 ⊕ B, ..., R_{n+1} = R_n ⊕ B. A newly combined point is not accepted into the current region if its gray intensity f(x, y) fails to satisfy |f(x, y) − f_ker| ≤ T_3 and |f(x, y) − f̄| ≤ T_4, where f_ker is the mean intensity of R_0, f̄ is the mean intensity of the accepted points in the neighborhood, and T_3, T_4 are two thresholds.

If the altitude of the current position was quite different from that of the environment, it means that the liana cannot climb to it. In this way, the regions of the suspicious MCs were extracted accurately and adaptively, as shown in Fig. 2.
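A minimal sketch of this region growing is given below; it is not the authors' implementation, and it approximates the neighborhood mean f̄ by the mean of the region accepted so far.

import numpy as np
from scipy import ndimage

def filling_dilation(img, seed_mask, T3, T4, max_iter=50):
    """Iterative filling dilation with a cross-shaped structuring element, a sketch.

    seed_mask marks the located MC region R0 (boolean); points joined by dilation
    are kept only if they are close to both the seed mean (T3) and the mean of
    the accepted points (T4).
    """
    B = ndimage.generate_binary_structure(2, 1)     # cross-shaped element
    region = seed_mask.copy()
    f_ker = img[seed_mask].mean()                   # mean intensity of R0
    for _ in range(max_iter):
        dilated = ndimage.binary_dilation(region, structure=B)
        candidates = dilated & ~region
        if not candidates.any():
            break
        f_bar = img[region].mean()                  # mean of accepted points so far
        ok = (np.abs(img - f_ker) <= T3) & (np.abs(img - f_bar) <= T4)
        new_pts = candidates & ok
        if not new_pts.any():
            break
        region |= new_pts
    return region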

4 ANFIS-Based Parameter Adjustment

The location and segmentation of the MCs are accomplished in Sections 2 and 3. The values of the five detection parameters (T_0, T_1, T_2, T_3, T_4) were found to be quite important in the CAD. These thresholds should be auto-adjusted during detection according to the gray-level characteristics at the corresponding position. In this experiment, ANFIS (adaptive-network-based fuzzy inference system, Fig. 3)


was used as the auto-controller. ANFIS has high nonlinear approximation precision and good generalization ability, and has been widely applied in many research fields [5]. ANFIS is divided into six layers. Each sample S_n has N_d dimension signals, and each dimension signal X_i (i ∈ [0, N_d − 1]) is input to a set of fuzzy membership functions μ_{F_{i;k}}(X_i) (k ∈ [0, M_i − 1], where M_i is the number of fuzzy sets F_{i;k} of the current dimension); the function value is the fuzzy membership degree. Thus X_i is mapped to a 1 × M_i vector, and S_n has N_d such vectors. The fuzzy membership function prototype is defined as a Gaussian, which has two premise parameters: the fuzzy center c = c_{i;k} and the fuzzy width σ = σ_{i;k}:

μ_{c,σ}^{Gaussian}(x) = e^{−(1/2)((x−c)/σ)²}.    (1)

The fuzzy membership degrees of the different dimension signals are cross-multiplied to generate A_y (= ∏_{i=0}^{N_d−1} M_i) excitation intensities Y_j (= μ_{F_{0;k_0}} × ... × μ_{F_{i;k_i}} × ... × μ_{F_{N_d−1;k_{N_d−1}}}) of the corresponding fuzzy association rules R_j (= F_{0;k_0} × ... × F_{i;k_i} × ... × F_{N_d−1;k_{N_d−1}}). In the fifth layer, the Sugeno fuzzy model is applied to draw the conclusion of each fuzzy association rule, i.e. if X_0 is F_{0;k_0}, ..., and X_{N_d−1} is F_{N_d−1;k_{N_d−1}}, then f_j(S_n) = W_{N_d;j} + Σ_{i=0}^{N_d−1} W_{i;j} X_i. W_{i;j} is called a conclusion parameter, and the fuzzy conclusion O_j is computed by multiplying Y_j and f_j:

O_j(S_n) = Y_j f_j(S_n) = (W_{N_d;j} + Σ_{i=0}^{N_d−1} W_{i;j} X_i) Y_j / Σ_{j=0}^{A_y−1} Y_j.    (2)

Finally, all O_j are added up to give the output of ANFIS, i.e. F(S_n) = Σ_{j=0}^{A_y−1} O_j.
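A compact sketch of this forward pass (layers 1-6, Eqs. (1)-(2)) is shown below; it is illustrative rather than the authors' implementation, and the data layout of the premise and conclusion parameters is an assumption.

import itertools
import numpy as np

def anfis_forward(x, centers, widths, W):
    """Forward pass of a first-order Sugeno ANFIS, a sketch of Eqs. (1)-(2).

    x       : numpy vector of the N_d input signals
    centers : list of per-dimension arrays c[i][k] (fuzzy centers)
    widths  : matching arrays sigma[i][k] (fuzzy widths)
    W       : (A_y, N_d + 1) conclusion parameters; last column is the bias W_{Nd;j}
    """
    Nd = len(x)
    # layers 1-2: Gaussian membership degrees, Eq. (1)
    mu = [np.exp(-0.5 * ((x[i] - centers[i]) / widths[i]) ** 2) for i in range(Nd)]
    # layer 3: one rule per combination of fuzzy sets; Y_j is the product
    rules = list(itertools.product(*[range(len(c)) for c in centers]))
    Y = np.array([np.prod([mu[i][k] for i, k in enumerate(r)]) for r in rules])
    # layers 4-5: Sugeno conclusions f_j weighted by normalized Y_j, Eq. (2)
    f = W[:, :-1] @ x + W[:, -1]
    O = Y * f / Y.sum()
    # layer 6: summed output F(S_n)
    return O.sum()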

The least mean square (LMS) method is adopted to train the ANFIS. If the ideal output of S_n is d, the mean square error is E(S_n) = (d(S_n) − F(S_n))² / 2. In order to minimize E, W_{i;j} should be adjusted in the direction opposite to ∂E/∂W:

W_{i;j}(n+1) = W_{i;j}(n) − η ∂E/∂W = W_{i;j}(n) − η (∂E/∂e)(∂e/∂F)(∂F/∂O)(∂O/∂W)
             = W_{i;j}(n) + η e Y_j X_i = W_{i;j}(n) + η Y_j(S_n) X_i (d(S_n) − F(S_n)),    (3)

W_{N_d;j}(n+1) = W_{N_d;j}(n) + η Y_j(S_n)(d(S_n) − F(S_n)).    (4)

And c_{i;k} and σ_{i;k} can be adjusted likewise:

c_{i;k}(n+1) = c_{i;k}(n) − η ∂E/∂c = c_{i;k}(n) − η (∂E/∂e)(∂e/∂F)(∂F/∂O)(∂O/∂Y)(∂Y/∂μ)(∂μ/∂c)
             = c_{i;k}(n) + η (d(S_n) − F(S_n)) Σ_m (∂Y_m/∂μ_{F_{i;k}}) f_m(S_n) e^{−(1/2)((x−c_{i;k}(n))/σ_{i;k}(n))²} (x − c_{i;k}(n)) / σ_{i;k}²(n),    (5)

σ_{i;k}(n+1) = σ_{i;k}(n) − η ∂E/∂σ = σ_{i;k}(n) − η (∂E/∂e)(∂e/∂F)(∂F/∂O)(∂O/∂Y)(∂Y/∂μ)(∂μ/∂σ)
             = σ_{i;k}(n) + η (d(S_n) − F(S_n)) Σ_m f_m(S_n) (∂Y_m/∂μ_{F_{i;k}}) e^{−(1/2)((x−c_{i;k}(n))/σ_{i;k}(n))²} (x − c_{i;k}(n))² / σ_{i;k}³(n),    (6)

∂Y_m/∂μ_{F_{i;k}} = (Σ_{j=0}^{A_y−1} Y_j(S_n) − Y_m(S_n)) Y_m(S_n) / (μ_{F_{i;k}} (Σ_{l=0}^{A_y−1} Y_l(S_n))²),    (7)

where η is the step length of the training, m (or l) (∈ [0, A_y − 1]) is the serial number of the fuzzy association rules R_m (or R_l) that contain the current fuzzy membership function μ_{F_{i;k}}(X_i), and there are A_y/M_i such rules in total.

Through experiments, the optimal values of the detection parameters under different backgrounds were measured, and three background features (mean intensity, contrast, and fractal dimension) were extracted simultaneously. With ANFIS, the relation between these optimal values and the background features can be learned. When a new mammogram is processed, its background features are extracted first, and the appropriate parameter values are then determined by ANFIS, making the location and segmentation adaptive and accurate.

5 Experiments and Discussion

60 mammograms taken from the First Affiliated Hospital of Zhejiang University, with a size of 1500 × 2000 and 12-bit gray levels, were used to test the proposed algorithm. There were 163 MCs in these mammograms; with the proposed algorithm, 162 MCs were detected, with 511 FPs. Finally, an MLP (multi-layer perceptron) was applied to remove the FPs and retain the true MCs. Before that, ten features of the MCs were extracted: area, mean intensity, contrast, coherence, compactness, ratio of pits, number of hollows, elongatedness, fractal dimension, and clustering number. Coherence is the mean square deviation of the region, compactness is the roundness, the ratio of pits is the ratio of the number of pits (concave points of the boundary) to the circumference, elongatedness is the ratio of length to width, and the clustering number is the number of surrounding MCs.


The MLP classifier was defined with 3 layers: 10 input nodes, 20 hidden nodes, and 1 output node. The classification result was that 158 true MCs were detected and 499 false MCs were identified, so the true positive rate was 96.9% while the number of FPs per image was 0.2. Then the true MC regions were segmented manually by the radiologists, and the result was taken as the segmentation criterion, so that the extraction effect for the MCs could be evaluated by computing the ratio of the common area (the overlap of the auto-extracted area and the criterion area) to the criterion area. For MCs with strong HF information, the mean extraction effect was 97.2%; for MCs with comparatively weak HF signals, it was 90.5%; for all MCs, it was 94.7%. The performance of the proposed algorithm was much better than that of the conventional methods because adaptability and robustness were emphasized in this paper: ANFIS was used for auto-adjustment of the detection process, and filling dilation extracted the desired regions accurately, so that feature extraction of the MCs could be applied to accurate regions and the precision of the final classification could be ensured. Even for MCs with very special characteristics and backgrounds, the algorithm achieved satisfactory results, which the conventional techniques could not. The algorithm can therefore deal with all kinds of MCs with different backgrounds and features with high detection precision, while a very low FP rate is achieved simultaneously.

References
1. Nishikawa, R.M., Jiang, Y., Giger, M.L., Doi, K., Vyborny, C.J., Schmidt, R.A.: Computer-aided Detection of Clustered Microcalcifications. IEEE International Conference on Systems, Man and Cybernetics, Chicago, USA (1992) 1375–1378
2. Choe, H.C., Chan, A.K.: Microcalcification Cluster Detection in Digitized Mammograms Using Multiscale Techniques. IEEE Southwest Symposium on Image Analysis and Interpretation, Tucson, USA (1998) 23–28
3. Gulsrud, T.O., Husoy, J.H.: Optimal Filter-based Detection of Microcalcifications. IEEE Trans. Biomed. Eng., Vol. 48 (2001) 1272–1280
4. Yoshida, H., Zhang, W., Cai, W.D., Doi, K., Nishikawa, R.M., Giger, M.L.: Optimizing Wavelet Transform Based on Supervised Learning for Detection of Microcalcifications in Digital Mammograms. IEEE International Conference on Image Processing, Lausanne, Switzerland (1995) 152–155
5. Xu, W.D., Xia, S.R., Xie, H.: Application of CMAC-based Networks on Medical Image Classification. 1st IEEE International Symposium on Neural Networks, Dalian, China (2004) 953–958

ECG Compression by Optimized Quantization of Wavelet Coefficients

Jianhua Chen, Miao Yang, Yufeng Zhang, and Xinling Shi

Department of Electronic Engineering, Yunnan University, Kunming, P.R. China, 650091
[emailprotected]

Abstract. The optimization of the parameters of a uniform scalar dead zone quantizer used in a wavelet-based ECG data compression scheme is presented. Two quantization parameters: a threshold T and a step size Δ are optimized for a target bit rate through the particle swarm optimization algorithm. Experiment results on several records from the MIT-BIH arrhythmia database show that the optimized quantizer produces improved compression performance.

1 Introduction

The purpose of electrocardiogram (ECG) compression is to reduce, as far as possible, the number of bits needed to transmit and store digitized ECG data with a reasonable implementation complexity while maintaining clinically acceptable signal quality. Among the various transform coding techniques, those based on the discrete wavelet transform (DWT) play an interesting role due to their easy implementation and efficiency. In [1],[2], the embedded zerotree wavelet (EZW) and the set partitioning in hierarchical trees (SPIHT) algorithms, which have shown very good results in image coding, were applied to ECG signals. In [3],[4], the wavelet coefficients were thresholded to set small coefficients to zero, and the nonzero coefficients were then uniformly quantized and coded. In the above-mentioned algorithms, the wavelet transform coefficients are in fact quantized by a special kind of uniform quantizer which has a larger central quantization bin around zero, called the dead zone. However, the width of the dead zone was either determined by a threshold (in [3],[4]) or fixed to 2 times the width of the other quantization bins (in [1],[2]). For good rate-distortion performance, the relationship between the width of the dead zone and the width of the other quantization bins should be optimized for the desired bit rate, since an improper selection of these parameters may lead to higher distortion at that bit rate. Particle Swarm Optimization (PSO) is a population-based evolutionary optimization technique developed by J. Kennedy and R. Eberhart in 1995, inspired by the social behavior of bird flocking and fish schooling [5]. PSO shares many similarities with other evolutionary computation techniques such as the Genetic Algorithm (GA), but PSO does not use crossover or mutation operations; rather, it


has memory and tracks the best solutions achieved in the past. PSO is attractive because it has few parameters to adjust and reaches good results faster and with less computation than many other methods. In this paper, the optimization of the parameters of a uniform scalar dead zone quantizer (USDZQ) is presented. Two quantization parameters, a threshold T and a step size Δ, are optimized for a desired bit rate through the PSO algorithm. The quantizer is applied in a wavelet-based ECG data compression scheme where the quantized wavelet coefficients are entropy coded using Exp-Golomb coding and Golomb-Rice coding. The rest of this paper is organized as follows: in Section 2, the USDZQ is introduced; the objective function for the quantization optimization and the PSO algorithm are discussed in Section 3; implementation details and experimental results are presented in Section 4.

2 Uniform Scalar Dead Zone Quantizer

The USDZQ is described by

I_k = [−3δ, −T) if k = −1;  (−T, T) if k = 0;  [T, 3δ) if k = 1;  [(2k − 1)δ, (2k + 1)δ) otherwise,    (1)

R_k = 0 if k = 0;  R_k = 2kδ if k = ±1, ±2, ....    (2)

Here, k is the quantizer output index, I_k describes the kth decision interval and R_k is the corresponding reconstruction level. δ is half of the quantization step size Δ (i.e., Δ = 2δ), and T is the threshold around zero, with δ < T < 2δ. In fact, the quantization performed by this quantizer can be viewed as a threshold operation followed by common midtread uniform quantization. For most natural signals, many high-frequency subband wavelet coefficients are so small that no significant information is lost in the reconstructed signals when these coefficients are quantized to zero. The numerous zero-valued quantization indices produced by samples falling within the central quantization bin, the dead zone, can be coded efficiently by an entropy coder. It is therefore desirable to have a larger dead zone that sets more high-frequency coefficients to zero, so as to increase the compression performance without losing much quality in the reconstructed signals. In the USDZQ, the width of the dead zone is 2T, while all other decision intervals have width Δ except for the two decision intervals next to the dead zone. However, all reconstruction levels are equally spaced. In this way, the dequantization performed during decompression to approximate the original coefficients simply consists of multiplying each quantization index by the step size.
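A minimal sketch of the quantizer and its dequantization, following Eqs. (1)-(2), is shown below (illustrative, not the authors' code):

import numpy as np

def usdzq_quantize(c, T, delta):
    """Uniform scalar dead zone quantizer of Eqs. (1)-(2), a sketch.

    Coefficients with |c| < T map to index 0 (the dead zone); outside it the
    decision intervals [(2k-1)*delta, (2k+1)*delta) have width Delta = 2*delta.
    """
    mag = np.abs(c)
    k = np.floor((mag + delta) / (2.0 * delta))       # interval index for |c| >= T
    return np.where(mag < T, 0, np.sign(c) * k).astype(int)

def usdzq_dequantize(k, delta):
    """Reconstruction levels are equally spaced: R_k = 2*k*delta."""
    return 2.0 * delta * np.asarray(k)

With δ < T < 2δ, the floor expression reproduces the decision intervals above, including the two narrower bins adjacent to the dead zone.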

3 Optimization of the Quantizer by PSO

For a given signal, let H(T, Δ) be the output bit rate of the coding system based on the USDZQ with parameters T and Δ, and let D(T, Δ) be a measure of the distortion introduced into the ECG signal by the quantization. The quantizer optimization problem can then be stated as: for a given target bit rate H_target, determine T and Δ so as to maintain the equality H(T, Δ) = H_target while minimizing D(T, Δ). Optimization can be achieved by minimizing the cost function

J = D(T, Δ) + λ|H(T, Δ) − H_target|    (3)
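As a sketch (not the authors' code), the cost function can be packaged as a closure over rate and distortion evaluators; the default λ = 100 is the value used for record 117 in Section 4.

def make_cost(D, H, H_target, lam=100.0):
    """Cost function of Eq. (3): distortion plus a penalty holding the rate at
    the target. D and H map a (T, delta) pair to PRD_m and bit rate (assumed
    callables supplied by the coding system)."""
    def J(params):
        T, delta = params
        return D(T, delta) + lam * abs(H(T, delta) - H_target)
    return J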

for the given target bit rate H_target. From rate-distortion theory [6], the rate-distortion function R(D) is non-increasing in D and is convex. Since D(T, Δ) is non-decreasing in T, Δ, H(T, Δ) is non-increasing in T, Δ. Apparently, |H(T, Δ) − H_target| reaches its minimum when H(T, Δ) = H_target. For a large enough parameter λ, minimizing the cost J is equivalent to minimizing the distortion D(T, Δ) under the constraint H(T, Δ) = H_target. In ECG compression applications, H(T, Δ) is measured in bits/sample and the distortion D is measured by the Percent Root-Mean-Square Difference:

PRD = sqrt( Σ_{i=1}^{N} (x_i − x̂_i)² / Σ_{i=1}^{N} x_i² ) × 100   or   PRD_m = sqrt( Σ_{i=1}^{N} (x_i − x̂_i)² / Σ_{i=1}^{N} (x_i − x̄)² ) × 100,    (4)

where x_i is the original signal, x̂_i is the reconstructed signal, x̄ is the mean value of the original signal, and N is the number of samples in the signal. The modified version (PRD_m) is not sensitive to the signal mean and is therefore used as the distortion D in (3). However, the PRD measure is also calculated for comparison with other methods. This optimization problem can be solved using the PSO algorithm. Like most evolutionary computation techniques, PSO starts with a population of solutions, called particles, randomly selected from the solution space, and searches for the optima determined by a fitness function. Each particle, representing one potential solution, flies in the search space with a velocity adjusted according to the best position in its own flying experience (Pb) and the best position in the flying experience of all its companions (Gb). In this work, the fitness function is defined as the cost function in (3). The standard procedure of particle swarm optimization is described as follows:

1) Set the iteration number k to zero. For a population of M particles, randomly assign the initial position X_i^(0) (potential solution) and velocity V_i^(0) for each particle i in the two dimensions of the (T, Δ) problem space. The initial Pb_i^(0) for each particle is set to its original position. Calculate the optimization fitness function J(X_i^(0)) for each particle and store the value. Find the best fitness


among all particles and store the value and its corresponding position as the initial Gb^(0). In this work, a population size of 10 is used.

2) Update the velocity V_i^(k) and the position X_i^(k) of each particle according to (5) and (6), respectively:

V_i^(k+1) = w^(k) · V_i^(k) + c_1 · rand() · (Pb_i^(k) − X_i^(k)) + c_2 · rand() · (Gb^(k) − X_i^(k)),    (5)

X_i^(k+1) = X_i^(k) + V_i^(k+1),    (6)

where w^(k) is the inertia weight and c_1 and c_2 are the cognitive and social acceleration constants. In order to restrict the particles from traveling out of the solution space, a limit Vmax is usually placed on the velocity; when the velocity exceeds this limit in any dimension, it is set to the limit. A Vmax of 2 works well in this study.

3) For each particle i (i = 1, ..., M), update Pb_i^(k+1) based on the new fitness evaluation J(X_i^(k+1)) and the fitness of its own best position in the last iteration, J(Pb_i^(k)):

Pb_i^(k+1) = Pb_i^(k) if J(X_i^(k+1)) ≥ J(Pb_i^(k));  Pb_i^(k+1) = X_i^(k+1) if J(X_i^(k+1)) < J(Pb_i^(k)).    (7)

Update the best global position Gb^(k+1) based on the fitness of the best global position of the last iteration, J(Gb^(k)), and the new best fitness of every particle, J(Pb_i^(k+1)), i = 1, ..., M:

Gb^(k+1) = Gb^(k) if min_i(J(Pb_i^(k+1))) ≥ J(Gb^(k));  Gb^(k+1) = Pb_m^(k+1) if J(Pb_m^(k+1)) = min_i(J(Pb_i^(k+1))) < J(Gb^(k)).    (8)

4) Repeat steps 2) and 3) until the required number of iterations K is met.

The inertia weight w^(k) in (5) improves the performance of the PSO algorithm and is decreased linearly during a PSO search:

w^(k) = w_max − (w_max − w_min) × k / K.    (9)

In this work, we set wmax = 0.9 and wmin = 0.4. c1 and c2 are both set to 2.
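The whole loop, steps 1)-4) with the settings above (M = 10, Vmax = 2, c1 = c2 = 2, wmax = 0.9, wmin = 0.4), is sketched below; it is illustrative rather than the authors' code, and the clipping of positions to a rectangular search box is an added assumption.

import numpy as np

def pso_optimize(J, bounds, M=10, K=50, vmax=2.0, c1=2.0, c2=2.0,
                 wmax=0.9, wmin=0.4, seed=0):
    """Standard PSO of steps 1)-4) over the 2-D (T, Delta) space, a sketch.

    J      : fitness function mapping a (T, Delta) pair to the cost of Eq. (3)
    bounds : array [[T_min, T_max], [D_min, D_max]] for the search box (assumed)
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    X = rng.uniform(lo, hi, size=(M, 2))             # step 1: random positions
    V = rng.uniform(-vmax, vmax, size=(M, 2))        # and velocities
    Pb, Pb_val = X.copy(), np.array([J(x) for x in X])
    g = Pb_val.argmin()
    Gb, Gb_val = Pb[g].copy(), Pb_val[g]
    for k in range(K):
        w = wmax - (wmax - wmin) * k / K             # Eq. (9): linear inertia decay
        V = (w * V + c1 * rng.random((M, 2)) * (Pb - X)
                   + c2 * rng.random((M, 2)) * (Gb - X))   # Eq. (5)
        V = np.clip(V, -vmax, vmax)                  # velocity limit Vmax
        X = np.clip(X + V, lo, hi)                   # Eq. (6), kept inside the box
        vals = np.array([J(x) for x in X])
        better = vals < Pb_val                       # Eq. (7): personal bests
        Pb[better], Pb_val[better] = X[better], vals[better]
        g = Pb_val.argmin()
        if Pb_val[g] < Gb_val:                       # Eq. (8): global best
            Gb, Gb_val = Pb[g].copy(), Pb_val[g]
    return Gb, Gb_val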

4 Implementation and Results

The 9/7-tap biorthogonal filters are used to implement the discrete wavelet transform (the 'bior4.4' filters in MATLAB). For entropy coding, Exp-Golomb codes are used to code the lengths of the runs of zero quantization indices, and Golomb-Rice codes are used to code the nonzero indices [7]. Although the DWT implemented here is not orthonormal, which means the mean square error between the transform coefficients and their quantized values is not the same


as the mean square error between the original samples and the reconstructed samples, we observe that these two errors are still directly proportional to each other. Therefore, we only calculate the P RDm measure between the transform coefficients and their quantized values such that the inverse transform is not calculated during each optimization iteration. By minimizing this distortion measure, the distortion between the original samples and the reconstructed ones is also minimized and the computational cost is reduce accordingly. For the same consideration, the actual Golomb coding is not implemented during the optimization procedure. We only accumulate the number of bits of the codewords for lengths of zero runs and nonzero coefficients. When the optimal T, Δ pair is found, Golomb codewords are produced to achieve the actual compression. The proposed algorithm is tested on several records from the MIT-BIH ECG arrhythmia database. All ECG data used here are sampled at 360 Hz, and the resolution of each sample is 11 bits/sample. Since the bit rate is H(T, Δ), the compression ratio (CR) can be defined as CR = 11/H(T, Δ) . Table 1 describes the process of the algorithm’s convergence on record 117 of the MIT-BIT database. Here, the target CR is 5, λ is set to 100. Due to the convex nature of the absolute value function in the second term of equation (3), the convergence of the algorithm to the target CR will be fast. From Table 1, it can be found that during the quantization optimization, the PSO algorithm converges rapidly on the target CR (after 4 iterations). However, the reduction of the distortion term in equation (3) and hence the reduction of the cost function itself is slow after the 5th iteration. Table 1. The process of convergence iteration fitness compression number evaluation ratio 1 4.2326 5.0724 2 2.8048 5.0388 3 1.6386 5.0122 4 1.2756 4.9959 5 1.2756 4.9959 10 1.1726 4.9983 20 1.1035 4.9998 50 1.0939 4.9999

For compression performance evaluation, a set of signals is used consisting of 1 min of data from record numbers 104, 107, 111, 112, 115, 116, 117, 118, 119, 201, 207, 208, 209, 212, 213, 214, 228, 231 and 232 of the MIT-BIH database. The averaged PRD results of the proposed algorithm at different CRs are listed in Table 2, together with the results reported in [7] on the same data set for comparison. It can be seen that the proposed algorithm yields improved PRD performance at all tested compression ratios.

Table 2. Averaged PRD (%) performance comparison

CR             8:1    10:1   12:1   16:1   20:1
Proposed PRD   2.34   2.85   3.38   4.58   6.01
Chen [7] PRD   2.39   2.93   3.46   4.67   6.13

5 Concluding Remarks

The threshold T and the quantization step size Δ of the uniform scalar dead zone quantizer (USDZQ) used in an ECG signal compression scheme are optimized through the PSO algorithm such that the input signals can be compressed at a bit rate very close to the desired one with the minimum distortion. The quantized coefficients are further entropy coded by using the Exp-Golomb coding and the Golomb-Rice coding. Experiment results show that the optimized quantizer produces improved coding performance. The algorithm does not require any a priori signal information for training, and thus can be applied to any ECG signal.

Acknowledgement This work is supported by Yunnan University, P.R. China.

References
1. Hilton, M.L.: Wavelet and Wavelet Packet Compression of Electrocardiograms. IEEE Trans. Biomed. Eng., Vol. 44 (1997) 394–402
2. Lu, Z.T., Kim, D.Y., Pearlman, W.A.: Wavelet Compression of ECG Signals by the Set Partitioning in Hierarchical Trees Algorithm. IEEE Trans. Biomed. Eng., Vol. 47 (2000) 849–856
3. Benzid, R., Marir, F., Boussaad, A., Benyoucef, M., Arar, D.: Fixed Percentage of Wavelet Coefficients to be Zeroed for ECG Compression. Electronics Letters, Vol. 39, No. 11 (2003) 830–831
4. Blanco-Velasco, M., Cruz-Roldán, F., Godino-Llorente, J.I., Barner, K.E.: ECG Compression with Retrieved Quality Guaranteed. Electronics Letters, Vol. 40, No. 23 (2004) 1466–1467
5. Kennedy, J., Eberhart, R.: Particle Swarm Optimization. Proc. IEEE Inter. Conf. on Neural Networks, Vol. 4 (1995) 1942–1948
6. Yeung, R.W.: A First Course in Information Theory. Springer (Kluwer Academic/Plenum Publishers) (2002)
7. Chen, J., Ma, J., Zhang, Y., Shi, X.: ECG Compression Based on Wavelet Transform and Golomb Coding. Electronics Letters, Vol. 42, No. 6 (2006) 322–324

Effects on Density Resolution of CT Image Caused by Nonstationary Axis of Rotation

Yunxiao Wang, Xin Wang, Xiaoxin Guo, and Yunjie Pang

Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street 2699#, Changchun, 130012, P.R. China
[emailprotected]

Abstract. This paper discusses the negative effect of a nonstationary axis of rotation on the density resolution of computerized tomography (CT) images. The noise introduced and its propagation are analyzed according to the integral formula for reconstructing a CT image. Through analysis and derivation, we obtain the signal-to-noise ratio (SNR) influenced by position noise at special positions in some simple situations. The SNR at other positions is obtained by numerical computation, leading to the conclusion that the SNR drops at the edge of the image while the main part of the image is almost unaffected. The simulated image agrees well with this conclusion. The result provides a theoretical reference for confining the shaking of the rotation axis in large CT systems.

1 Introduction

Computerized tomography (CT) is a method of reconstructing a section image from the projection data of a certain section of the measured object taken in every direction. Rotating the measured object, or the radiation source and detector, around a certain axis is the customary way to obtain projection data in all azimuths in practical CT applications. The position of the axis is very significant for image reconstruction; it is well known that all kinds of image reconstruction algorithms regard the position of the axis as the origin of coordinates. In the process of measurement and data acquisition, the localization precision of the mechanical equipment, the deformation of the measured object, etc., make this axis of rotation unstable. A nonstationary axis induces an additional translation while the measured object is rotating. The translation causes the projection data of different positions to overlap, and thus influences the quality of the CT image [1],[2]. In CT image theory, the two key technical parameters are spatial resolution and density resolution. The effects on spatial resolution were discussed in [3]. Density resolution describes the capability of distinguishing the minimum density interval in the object being imaged; it is generally represented as the percentage ratio of the smallest distinguishable density difference to the average density of the object. The density resolution of a CT system is mainly decided by the SNR of the reconstructed image, and the reciprocal of the SNR may be considered the density resolution [4]. Density resolution depends largely on the noise of the system, which is


generally determined by the statistical fluctuations of the intensity of the ray source, the statistical fluctuations of the interaction of the rays with the substance, the fluctuations of the rays captured by the detector, the noise of the head amplifier, the noise of the A/D conversion, etc. A nonstationary axis of rotation introduces statistical fluctuation noise that propagates to the final reconstructed image; consequently, the SNR of the CT system is degraded and the density resolution drops. In order to make the conclusions general, all the following calculations are based on parallel projection and filtered back projection, because parallel projection is the most basic way of gathering data (other projection beams may be converted into parallel projection) and the filtered back projection algorithm is the most frequently used image reconstruction method [5].

2 Analyzing the Effect on Density Resolution

Supposing the projection data in the Φ direction is λ_Φ(X_r) (see Fig. 1 of [3]), the filtered back projection algorithm first requires each set of projection data to be convolved with the filter function q1(X_r), which gives the filtered projection λ'_Φ(X_r):

λ'_Φ(X_r) = ∫_{−∞}^{∞} λ_Φ(X_r') q1(X_r − X_r') dX_r' ≡ λ_Φ(X_r) ∗ q1(X_r).    (1)

Equation (1) does not consider the factor of a nonstationary axis. Suppose now that the center of rotation moves to the point (δ_r, δ_θ) in polar coordinates when the projection data are sampled at angle Φ. The projection data then have an excursion of δ_r cos(δ_θ − Φ), so the actual projection value becomes λ_Φ(X_r + δ_r cos(δ_θ − Φ)). According to the filtered back projection algorithm, we back-project the filtered projection data and accumulate them, obtaining the reconstructed image function (with q1(x) a filter function):

μ(r, θ) = (1/π) ∫_0^π dφ ∫_{−∞}^{∞} λ_φ(X_r' + δ_r cos(δ_θ − φ)) q1(X_r − X_r') dX_r' |_{X_r = r cos(θ − φ)}.    (2)

The density resolution of CT is mainly limited by various statistical noises. In Equation (2), λ is a random quantity even when the positional uncertainty is not considered, because the radiation source, the detector, the ray interactions, etc., are all stochastic processes. The ratio between the noise caused by these factors and the ideal reconstructed signal can be regarded as the density resolution of CT [5]. When only the shaking of the rotation center is considered, the randomness of λ is influenced solely by δ_r and δ_φ. We rewrite Equation (2), treat δ_r as a small quantity, and expand λ in a Taylor series to obtain Equation (3):

μ_i(r, θ) = (1/π) ∫_0^π dφ ∫_{−∞}^{∞} λ_Φ(X_r' + δ_r cos(δ_φ)) q1(r cos(θ − φ) − X_r') dX_r'
         ≈ (1/π) ∫_0^π dφ ∫_{−∞}^{∞} λ_Φ(X_r') q1(r cos(θ − φ) − X_r') dX_r'
         + (1/π) ∫_0^π dφ ∫_{−∞}^{∞} λ_Φ^(1)(X_r') δ_r cos(δ_φ) q1(r cos(θ − φ) − X_r') dX_r'.    (3)


In Equation (3), all length quantities are nondimensional; multiplied by 2d they give the actual lengths. We analyzed in [3] that δ_r and δ_φ are one-dimensional stationary stochastic processes of the projection angle, and that they do not change with X_r' within the same angle. Considering E(δ_r cos δ_φ) = 0, we may take the mean value of μ_i(r, θ) to be the first term of Equation (3) and the variance to be given by the second term, if the higher-order terms are ignored. We find that the variance of the stochastic fluctuation depends on λ_Φ(X_r) as well as on the shaking of the rotation axis; that is, it depends on the reconstructed image itself. For example, when reconstructing an image that is, in theory, infinitely large and absolutely uniform, the random shaking of the rotation center does not have any influence on the reconstructed image. But if the image is not infinitely large or not absolutely uniform, the random shaking of the rotation center introduces noise, and the noise depends on the concrete image. This characteristic makes it difficult to analyze the influence of the shaking rotation center on the density resolution of the CT image.

2.1 Noise Analysis and Numerical Computation at the Points of a Uniform Circle

For a uniform circle of radius R, the one-dimensional function of its projection is

λ_φ(X_r) = 2 sqrt(R² − X_r²).    (4)

Its first-order derivative is shown in Equation (5):

λ_φ^(1)(X_r) = −2X_r / sqrt(R² − X_r²).    (5)

Substituting Equation (5) into the second term of Equation (3) and letting r = 0, we get

D{ (1/π) ∫_0^π dφ ∫_{−∞}^{∞} λ_Φ^(1)(X_r') δ_r cos(δ_φ) q1(r cos(θ − φ) − X_r') dX_r' }|_{r=0}
= ( (1/π) ∫_0^π dφ ∫_{−∞}^{∞} λ_Φ^(1)(X_r') q1(−X_r') dX_r' )² D{δ_r cos(δ_φ)}
= ( (1/π) ∫_0^π dφ ∫_{−R}^{R} (−2X_r' / sqrt(R² − X_r'²)) q1(X_r') dX_r' )² D{δ_r cos(δ_φ)}.    (6)

Because λ_φ^(1)(X_r) is an odd function and q1(X_r) is an even function, their product is an odd function. The integral of an odd function over a symmetric interval is zero, so the value of Equation (6) equals 0, which means the variance is 0. In other words, the reconstructed image's value at the center of a uniform circle remains a definite quantity rather than a random one even though the rotation axis shakes. Thus the SNR at this point is infinite and the density resolution there is not affected at all. For the case r ≠ 0, the integral is not as simple as for r = 0, and at present the data can be obtained only by numerical calculation. Before the numerical calculation, we must first determine the value of R. The detector interval is of the order of magnitude of 0.2 mm, and the section diameter of the measured object is


commonly around 1 m, so R should be around 0.5 m / (2 × 0.2 mm) = 1250. Let R = 1500. For r = 0, 150, 300, ..., 1500, we calculated the mean given by the first term of Equation (3) and the variance given by the second term of Equation (3); the results are shown in Table 1. The SNR at a point of the reconstructed image is defined in [4] as

SNR(r, θ) = E(μ_i(r, θ)) / σ(μ_i(r, θ)),    (7)

and the reciprocal of the SNR is considered the density resolution [5]. In Table 1, σ(δ_r cos δ_φ) is the mean square deviation of the random variable δ_r cos δ_φ. Because the shaking of the center of rotation is very small (in general δ_r is smaller than 0.7), σ(δ_r cos δ_φ) is smaller than 1. Taking into account the sharp changes of the SNR when r is close to R, the sampling density is increased tenfold when r is larger than 1350. It can be seen from the computation results in Table 1 that the nonstationary axis has hardly any effect on density resolution when r is smaller than 1485, but the SNR drops sharply when r is close to R, and the density resolution declines along with it.

Table 1. Numerical calculation of the density resolution of the reconstructed image of a uniform circle at different r, when the shaking quantity of the nonstationary axis is tiny

r      E(μi(r,θ))     σ(μi(r,θ))                      σ(μi(r,θ))/E(μi(r,θ))
0      6399.948453    0.000000 × σ(δr cos δφ)         0.000000 × σ(δr cos δφ)
150    6399.988609    0.000476 × σ(δr cos δφ)         0.000000 × σ(δr cos δφ)
450    6399.991562    0.000946 × σ(δr cos δφ)         0.000000 × σ(δr cos δφ)
750    6400.001474    0.001821 × σ(δr cos δφ)         0.000000 × σ(δr cos δφ)
1050   6400.031210    0.004600 × σ(δr cos δφ)         0.000001 × σ(δr cos δφ)
1350   6400.258385    0.039047 × σ(δr cos δφ)         0.000006 × σ(δr cos δφ)
1380   6400.363610    0.054264 × σ(δr cos δφ)         0.000008 × σ(δr cos δφ)
1410   6400.585839    0.103632 × σ(δr cos δφ)         0.000016 × σ(δr cos δφ)
1440   6401.106782    0.221732 × σ(δr cos δφ)         0.000035 × σ(δr cos δφ)
1470   6403.395663    0.926066 × σ(δr cos δφ)         0.000145 × σ(δr cos δφ)
1500   5300.472977    3298.659848 × σ(δr cos δφ)      0.622333 × σ(δr cos δφ)

2.2 Effects on Density Resolution Under a Relatively Uniform Background

In general, a CT system is mainly used to check defects of objects, and the sections it images therefore consist of a uniform background containing some defects. This meets the condition of a relatively uniform background, so the conclusion of Section 2.1 may be generalized to CT image reconstruction under other uniform background conditions. That is to say, under a relatively uniform background, the influence of a nonstationary axis of rotation on the density resolution of a CT image is mainly a drop of the SNR at the


image edge, while the main body of the image is hardly affected. We illustrate this conclusion in what follows. Firstly, we have proved that the filter function q1(x) satisfies Equation (8) (because of the length limit of this paper, the proof is not given):

∫_m^n q1(x) dx ≡ 0   (m and n are integers).    (8)

So from Equation (3) we get Equation (9):

σ(μ_i(r, θ)) = [ (1/π) ∫_0^π dφ ∫_{−∞}^{∞} λ_φ^(1)(X_r') q1(r cos(θ − φ) − X_r') dX_r' ] × σ(δ_r cos(δ_φ))
            = [ (1/π) ∫_0^π dφ ( Σ_{i=−∞}^{+∞} ∫_i^{i+1} λ_φ^(1)(r cos(θ − φ) − X_r') q1(X_r') dX_r' ) ] × σ(δ_r cos(δ_φ)).    (9)

Under a relatively uniform background, the change of λ_φ^(1)(r cos(θ − φ) − X_r') in Equation (9) is generally small while X_r' runs from i to i + 1, so it can be replaced by a constant, and by Equation (8) the corresponding term of the sum is 0. λ_φ^(1)(r cos(θ − φ) − X_r') changes relatively sharply only at the edge, and only there is the corresponding term of the sum nonzero. In other words, the fluctuation that the nonstationary axis of rotation brings to the reconstructed image under a uniform background is very small except at the edge; compared with other factors these effects on density resolution may be neglected, and they merely make the SNR drop at the edge.

ª 1 π ∞ (1) ' ' 'º = « ³ dφ³ λφ (Xr ) q1(r cos(θ −φ) − Xr )dXr » ×σ(δr cos(δφ )) . ¬π 0 −∞ ¼ ª 1 π § +∞ i+1 (1) ' ' ' ·º = « ³ dφ¨ ¦³ λφ (r cos(θ −φ) − Xr ) q1(Xr )dXr ¸» ×σ(δr cos(δφ )) 0 ¹¼ ©i=−∞ i ¬π Under the relatively uniform background condition, the changes of Ȝ (1)(rcos(ș- )in Equation (9) is not big in general while Xr ' is from i to i+1 and it can be substituted by a constant., thus the corresponding sum to integral term is 0.Ȝ (1)(rcos(ș)-Xr’) changes relatively sharp only on the edge and so the corresponding sum to integral term is not 0. In other word, it is very small that the fluctuation of reconstructed image which nonstationary axis of rotation brings to under the uniform background except edge and these effects on density resolution may be neglected compared with other factors and just will make the SNR drop on the edge. Xr’)

3 Experimental Results

According to the simulation conditions for reconstructed images provided in [3], we reconstructed images of four small holes under a uniform background. The value of μ of the holes is 1.2 times the value of μ_0 of the background. Figure 1 shows the simulation results. It can be seen that the edge blurring of the holes intensifies as the shaking quantity δ_rm of the axis increases.

Fig. 1. Reconstructed images under a uniform background with different shaking quantities of the axis: (a) δ_rm; (b) δ_rm = 0.25; (c) δ_rm = 0.4; (d) δ_rm = 0.5


4 Conclusions

The nonstationary axis of rotation is one of the important factors influencing CT image quality in the process of image reconstruction, and density resolution is a key parameter of CT image quality. In this paper, we have presented a new method for analyzing the effects on density resolution in the situation of an unstable axis. Synthesizing the data given in Table 1, the analysis of Equation (9), and the simulation results in Figure 1, we may conclude that the effects on density resolution caused by an unstable axis of rotation appear only at the edge of the image, where the density resolution drops, while the main part of the image is almost unaffected. Density resolution measures the capability of a CT system to differentiate density intervals on a large scale; from this standpoint, the blurred edge does not affect the density resolution. We therefore conclude that the effects may be neglected for the objects measured by CT systems. This paper provides a new theoretical basis for CT systems and can be applied to future CT systems.

References
1. Concepcion, J.A., Carpinelli, J.D., Kuo-Petravic, G., et al.: CT Fan Beam Reconstruction with a Nonstationary Axis of Rotation. IEEE Trans. on Medical Imaging (1992) 111–116
2. Kijewski, M.F., Judy, P.F.: The Effects of Misregistration of the Projections on Spatial Resolution of CT Scanners. Med. Phys. (1983) 169–175
3. Wang, Y., et al.: Effects on Spatial Resolution of CT Image Caused by Nonstationary Axis. Proceedings of the 2005 International Conference on Machine Learning and Cybernetics (2005) 5473–5478
4. Barrett, H.H., Swindell, W.: The Theory of Radiological Imaging, Image Forming, Checking and Processing. Science Press (1998) 395–486
5. Lapin, G.D., Groothuis, D.R.: The Necessity of Image Registration for Dynamic CT Studies. Proceedings of the Annual Conference on Eng. in Medicine and Biology 13 (1991) 283–284

Embedded Linux Remote Control System to Achieve the Stereo Image

Cheol-Hong Moon1 and Kap-Sung Kim2

1 Gwangju University, Gwangju, Korea
[emailprotected]
http://web2.gwangju.ac.kr/ chmoon/
2 Gwangju University, Gwangju, Korea
[emailprotected]

Abstract. A new embedded SoC (System on a Chip) system, which enables the remote control and capture of a stereo image, was developed and used to measure distance and provide 3D information. This is a simple and economical stereo system that overcomes the limitations of the systems currently in operation, which require high-end equipment and software. The stereo image system developed in this study consists of two CCDs (Charge Coupled Devices), two image decoders to convert the cameras' analog output signals into digital form, and a TFT-LCD (Thin Film Transistor Liquid Crystal Display) to display the captured image. An I2C control, a stereo block control and an LCD IP control were used to control the system, employing embedded Linux for real-time operation. A web server was also installed in the embedded stereo system to allow remote networking with clients. An HTML (Hyper Text Markup Language) file and a CGI (Common Gateway Interface) program were designed to control the embedded system from the server. In this system, remote control and image capture from a PC were enabled on the Web.

1 Introduction

Home networking has become an essential part of everyday life. In particular, the Internet provides people with almost infinite information and convenience. This study investigated a system to acquire stereo images through a network. The new system uses a SoC (System on a Chip) to reduce the volume, power consumption and size of the remote control system, which used to be quite complicated and bulky [1]. This study configured an embedded Linux remote control system using a SoC system and designed the IP (Intellectual Property) for the stereo system [2-5]. An IP is an individual component in a chip; a Stereo IP, a Stereo Control IP and an LCD IP were designed in this study [6]. The embedded system, which optimizes the system for its designated functions, can be used more effectively with an open-source OS such as Linux than with a conventional commercial OS [7-9]. Because Linux performs better in the embedded system, embedded Linux was ported onto the SoC system hardware configured to acquire stereo images. The image was acquired and stored by accessing the embedded system


via a web browser and controlling it remotely. The acquired images were confirmed on TFT-LCD through a network and transferred to the client through a web browser.

2 Software and Hardware Design

2.1 Software Design

The software design in this article included both the development environment and the embedded environment.

Fig. 1. Structure of the Kernel

The Linux kernel was used in the embedded system because it satisfies the requirements of the embedded system. Red Hat 9.0 (Red Hat, Inc.) was installed for this study. The kernel is the core of the operating system: it resides in the DRAM of the target board, configures the environment necessary for the system to operate, and schedules program operation. Kernels can be divided into microkernels and monolithic kernels. A microkernel is minimal and performs only the core functions of the kernel, with the remainder being performed as service processes. A monolithic kernel includes many of the service routines essential for system operation; it has the advantages of easy implementation and effective management of system resources, but it is difficult to port to systems with various environments, and the kernel size becomes large. Figure 1 shows the kernel structure, indicating the process management, memory management, file system management, device management and network management. Development Environment Configuration. In this article, the environment to develop the embedded software was configured as follows. Minicom was installed to monitor and transfer data through a serial port, and a cross compiler was installed to construct the cross environment. TFTP was installed to construct the network environment, and NFS (Network File System) was implemented to construct the file system through a network.


Embedded Environment Configuration. To configure the embedded system, the devices were configured first when the system was turned on. The boot loader that calls the kernel and the file system was compiled and installed in the flash memory. The kernel, the core of Linux, was compiled and installed in the flash memory. Finally, the file system for the users was constructed and installed in the flash. Web Server Construction. For the purposes of this article, the system was configured as a web server. In the file system, the web server, the core of the remote system, and the CGI (Common Gateway Interface) program that performs the stereo image acquisition in the web server were designed and installed. The web server installed in this system was BoA web server version 0.94.9. A SoC (System on a Chip) is a technology that condenses not only hardware logic but also a processor, ROM, RAM, a controller and peripheral circuits into a single chip. This means that the SoC is the ultimate goal for various system-related companies as well as for semiconductor companies. The chip used in this article was the Excalibur, produced by Altera. It has an ARM922T 32-bit processor, and the IP is connected to an AMBA (Advanced Microcontroller Bus Architecture) bus. The Excalibur chip used in this article includes an APEX20K 400,000-gate FPGA region.

Fig. 2. Stereo IP Block Diagram

Figure 2 shows the whole block diagram of the Stereo IP. The block diagram is divided into the Processor Area, the PLD Area and the External Hardware Area. The Processor Area is composed of the ARM922T 32-bit RISC (Reduced Instruction Set Computer) core, the Stripe-to-PLD Bridge to access the PLD Area, and the PLD-to-Stripe Bridge to access the Processor Area from the PLD Area. The PLD Area is the logically designed part, written in the VHDL language, which is implemented as hardware. The Slave Decoder outputs the Select signal for each module by decoding the addresses of the modules, so that the Processor can access the modules in the PLD Area. The I2C Control Logic is the control logic that sets the inner registers to make an image decoder output YUV422 16-bit data. The Stereo Control Logic transfers the image data of the valid data zone of the image decoder to the SRAM Control Logic using the Capture Start signal when the I2C configuration is complete. The SRAM Control Logic passes the data transmitted from the Stereo Control Logic to SRAM together with the control signal to write the data. The Stereo DMA Logic reads the stereo image stored in SDRAM through the PLD-to-Stripe Bridge. The LCD Driver issues the various signals needed to operate the


640x480 TFT LCD and displays the data read from the Stereo DMA Controller Logic on the LCD. The Default Slave Logic is selected by default when none of the three modules (SRAM Control, Stereo Control, I2C Control) is selected. The External Hardware Area is composed of two image decoders for the stereo image and the SRAM where the image data is stored.

Fig. 3. I2C IP Status Diagram

Fig. 4. LCD IP Block Diagram

Figure 3 shows the status diagram of the I2C Control IP. The I2C Control IP divides the input clock down to 50 kHz and uses it as the I2C main clock. The I2C Control IP has the address A0000000; the Slave Decoder selects the I2C Control Logic when the Stripe accesses address A0000000. Beginning with the start condition, the I2C Control IP converts parallel data, such as a device address, a decoder inner address and data, into serial data. It sets the decoder registers and completes the data transmission by transmitting the stop condition. Figure 4 shows the block diagram of the Stereo DMA and LCD IP, which is divided into an Excalibur Stripe Area and an FPGA Area. Through the AHB Slave Interface, the SDRAM address from which the image data will be read and the size of the image to be displayed on the LCD are stored in the Register Bank. The LCD Driver issues the Pixel Clock, Hsync, Vsync and DE signals to drive a 640x480 TFT LCD on the enable signal of the Excalibur Stripe. The DMA Controller detects the Hblank and Vblank signals, the slots in which no image data are displayed on the LCD, from the LCD Driver through the AHB Master Interface, accesses the Excalibur Stripe zone through the PLD-to-Stripe Bridge and stores the data in DPRAM (Dual Port RAM). The LCD Driver reads the data from DPRAM in the slot where it will display the image and then displays it on the LCD.

3 Experiments and Results

The experiment was performed according to the pathway shown in Figure 5, which shows the system operation flow chart. The system is divided into three parts: it begins with the client system on the far right, passes through the Web SoC embedded system in the middle, and ends with the Linux program on the far left. (1) Initially, the system is accessed through the client part. When the address (http://220.69.22.111) is entered in the address window of a PC browser, the web server


Fig. 5. System Operation Flow Diagram

installed in Linux can be accessed via the Ethernet chip of the embedded system through the network. The web server delivers the main screen to the client system over the Ethernet. (2) When the Image Creation button on the client is pressed to convert the image, the CGI program is executed after first acquiring the image. The program checks the request value through getword and initializes the image decoder through the decoder initialization routine. After initialization, the system acquires data through a CCD camera and stores it in SRAM. The data is then converted to a RAW file through makeraw, and the resulting RAW file is converted to a BMP file through the RAW-to-BMP conversion and saved. (3) After the image conversion, the saved file is transmitted from the web server to the client system when the Image Acquisition button on the client is clicked. (4) A stereo image can then be seen. Simulation. Figure 6 shows the results of a simulation of the LCD DMA Controller IP. Figures 7 and 8 show the embedded system image when it accesses the web server and the screen when the image was acquired.
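The acquisition path can be summarized by the following hedged Python sketch. The control flow mirrors the description above (parameter parsing, decoder initialization, capture, RAW and BMP conversion), but every hardware-facing helper is a hypothetical stub rather than the paper's actual CGI code.

# Sketch of the CGI acquisition flow described above. Every hardware-
# facing helper is a hypothetical stub standing in for board-specific
# routines; only the control flow mirrors the description in the text.
import urllib.parse

def parse_query(query_string):
    # analogue of the 'getword' step: read the request parameters
    return dict(urllib.parse.parse_qsl(query_string))

def init_decoder():
    pass        # stub: would program the image decoder registers over I2C

def capture_frame():
    return bytes(640 * 480 * 2)   # stub: placeholder for a YUV422 frame

def make_raw(frame):
    return frame                  # stub: would dump the buffer to a RAW file

def raw_to_bmp(raw):
    return b"BM" + raw            # stub: would build a real BMP header

def handle_image_creation(query_string):
    params = parse_query(query_string)   # e.g. which camera was requested
    init_decoder()
    return raw_to_bmp(make_raw(capture_frame()))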

Fig. 6. LCD DMA Controller IP Simulation

Fig. 7. Embedded System Image

Fig. 8. Web Server Image Acquisition

4 Conclusion

An embedded stereo system was implemented using a SoC chip, and embedded Linux was installed to acquire a moving image from the outside through a network. Previous network moving-image systems were quite bulky, consumed a large amount of electricity, and were difficult to control over a network. To overcome the shortcomings of such conventional systems, a SoC was applied and a remote program was designed that allows an existing image device to be controlled remotely. The system contains a controller that controls and initializes the stereo image IP. The IP was designed in the PLD part using Quartus II, and an I2C controller, a Stereo Block controller and an LCD control IP were implemented. Embedded Linux and its file system, which process multiple tasks simultaneously and in real time, were ported to the system. A web server for Linux was installed so that the system can be accessed through the web. A CGI program was created to control the web server and the embedded SoC system, and a stereo image was acquired by controlling the system remotely through the web server.

Acknowledgements This research was supported by the Program for the Training of Graduate Students in Regional Innovation and the Technical Development of Regional Industry, conducted by the Ministry of Commerce, Industry and Energy of the Korean Government.


Estimation of Omnidirectional Camera Model with One Parametric Projection Yongho Hwang and Hyunki Hong Dept. of Image Eng., Graduate School of Advanced Imaging Science, Multimedia and Film, Chung-Ang Univ. [emailprotected], [emailprotected]

Abstract. This paper presents a new self-calibration algorithm for an omnidirectional camera from uncalibrated images. First, a one-parametric non-linear projection model of the omnidirectional camera is estimated using known rotation and translation parameters. After deriving the projection model, we can compute an essential matrix for unknown camera motions and then determine the camera positions. In addition, we show that LMS (Least Median of Squares) is more suitable for inlier sampling in our model than the other methods considered: the 8-point algorithm and RANSAC (RANdom SAmple Consensus). The simulation results demonstrate that the proposed algorithm achieves a precise estimation of the omnidirectional model and the extrinsic parameters.

1 Introduction The seamless integration of synthetic objects with real photographs or video images has long been one of the central topics in computer vision and computer graphics [1,2]. Generating a high-quality synthesized image first requires matching the geometric characteristics of the synthetic and real cameras. Since a fisheye lens has a wide field of view, it is widely used to capture the scene and illumination from all directions using far fewer omnidirectional images. This paper presents a new self-calibration algorithm for estimating the omnidirectional camera model from uncalibrated images. First, we derive a one-parametric non-linear projection model of the omnidirectional camera and estimate the model by minimizing the distance between the projected points and the epipolar curves. In this model-estimation step, our method uses known rotation and translation parameters. After deriving the projection model, however, we can compute an essential matrix for unknown camera motions and then determine the relative rotation and translation. The experimental results show that the proposed method can reconstruct the 3D scene structure and generate photo-realistic images. Animators and visual effects and lighting experts in the film industry are expected to benefit from it.

2 Previous Studies Many approaches to self-calibration and 3D reconstruction from omnidirectional images have been proposed. In addition, these approaches are widely combined with IBL (Image-Based Lighting) [2, 5] due to their merits.


Xiong et al. register four fisheye-lens images to create a spherical panorama while self-calibrating its distortion and field of view [3]. However, a particular camera setting is required, and the calibration results may be incorrect for some lenses because the method is based on the equi-distance camera model. Sato et al. simplify the user's direct specification of a geometric model of the scene by using an omnidirectional stereo algorithm, and measure the radiance distribution. However, because omnidirectional stereo is used, a strong camera calibration of the capturing positions and internal parameters is required in advance, which is a complex and difficult process [5]. Although previous studies on the calibration of omnidirectional images have been widely presented, there are few methods for estimating a one-parametric model together with the extrinsic parameters of the camera [6~8]. Pajdla et al. mentioned that a one-parametric non-linear projection model is less likely to fit outliers, and explained that the simultaneous estimation of a camera model and the epipolar geometry may be affected by the sampling of corresponding points between a pair of omnidirectional images [9]. However, this requires further consideration of the various inlier sampling methods: the 8-point algorithm, RANSAC, and LMS (Least Median of Squares) [10]. This paper presents a robust calibration algorithm for the one-parametric model with relative efficiency and determines which method is most suitable for inlier sampling in our model.

3 Projection Model Estimation The camera projection model describes how the 3D scene is transformed into the 2D image. The light rays emanate from the camera center, which is the camera position, and are determined by a rotationally symmetric mapping function f:

f(u, v) = f(u) = r / tan θ ,   (1)

where r = \sqrt{u^2 + v^2} is the radius of a point (u, v) with respect to the camera center and θ is the angle between a ray and the optical axis. The mapping function f has various forms determined by the lens construction [7,11]. The precise two-parametric non-linear model for the Nikon FC-E8 fisheye converter is as follows:

θ = a r / (1 + b r^2) ,   r = ( a − \sqrt{a^2 − 4 b θ^2} ) / (2 b θ) ,   (2)

where a, b are the parameters of the model. On the assumption that the maximal view angle θ_max is known, the maximal radius r_max corresponding to θ_max can easily be obtained from the normalized view-field image. This allows us to obtain the one-parametric non-linear model:

θ = a r / ( 1 + ( a r_max / θ_max − 1 ) ( r / r_max )^2 ) .   (3)


In order to estimate the one-parametric non-linear projection model, we use two omnidirectional images with known relative camera rotation and translation. Twenty corresponding points between the images were established using the commercial program MatchMover pro3.0 [12].


Fig. 1. Input images taken by a Nikon FC-E8 fisheye converter mounted on a Nikon Coolpix995 at 1530×1530 pixels, with 20 correspondences marked by red circles. (a) omnidirectional image captured at the reference position; (b) image at a relatively rotated and translated position (rotation R: -30° around the y-axis, unit translation vector T: (tx, ty, tz) = (0.9701, 0, 0.2425)).

Since the relative rotation and translation parameters are known during estimation of the camera model, we can draw the epipolar curves. We then obtain the parameter a by minimizing the distance between the epipolar curves and the projected points:

a* = arg min_a (1/N) Σ_{i=1}^{N} d(curve_i, pt_i) ,   (4)

where N is the number of correspondences, d(·,·) is the Euclidean distance between a curve and a point, curve_i is the i-th epipolar curve, and pt_i is the i-th corresponding point.
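As an illustration of Eqs. (3) and (4), the following Python sketch evaluates the one-parametric model and grid-searches the parameter a that minimizes the mean point-to-curve distance. The maximal view angle, the search grid, and the sampled-curve representation are assumptions made for this sketch, not values from the paper.

# Sketch of Eqs. (3)-(4): evaluate the one-parametric model and pick the
# a that minimizes the mean point-to-curve distance over correspondences.
import numpy as np

THETA_MAX = np.deg2rad(91.5)   # assumed maximal view angle of the lens
R_MAX = 1.0                    # maximal radius in the normalized image

def theta_of_r(r, a):
    # one-parametric model of Eq. (3)
    b = (a * R_MAX / THETA_MAX - 1.0) / R_MAX ** 2
    return a * r / (1.0 + b * r ** 2)

def point_to_curve(pt, curve_pts):
    # Euclidean distance from a point (2,) to a sampled curve (M, 2)
    return float(np.min(np.linalg.norm(curve_pts - pt, axis=1)))

def estimate_a(curves_for, points, a_grid=np.linspace(0.5, 2.5, 201)):
    # curves_for(a): list of (M, 2) arrays, one epipolar curve per
    # correspondence under the candidate model; points: list of (2,)
    best_a, best_err = a_grid[0], np.inf
    for a in a_grid:
        dists = [point_to_curve(p, c) for p, c in zip(points, curves_for(a))]
        err = float(np.mean(dists))
        if err < best_err:
            best_a, best_err = a, err
    return best_a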


Fig. 2. (a) Sum of distances between epipolar curves and corresponding points as the parameter a varies; (b) estimated projection model with parameter a = 1.3


The distance error graph for the parameter a is shown in Fig. 2(a). We obtained the minimum error when a is 1.3, and the estimated projection model is shown in Fig. 2(b).

4 Experimental Results Correspondences between the two images (Fig. 1) were established by MatchMover pro 3.0. After estimating the omnidirectional camera model, we can obtain the relative parameters, including rotation and translation, which were computed from the essential matrix. Fig. 3 shows that the computed epipolar curves are located precisely on the 20 corresponding points. In these results, the average pixel error of the 20 feature points is 0.010531 and the angular error is 9.174×10^-6. Fig. 4 shows the epipolar curves on the omnidirectional image obtained from the essential matrix of the image. Here the average pixel error of the 55 feature points is 0.0155 and the angular error is 2.0×10^-5. To ascertain our performance, we have compared the estimated results with the known camera parameters in Table 1.

Fig. 3. Computed epipolar curves on feature points

Fig. 4. Feature points and corresponding epipolar curves

The input images (800×800) and the 70 corresponding points between the two views are shown in Fig. 6. These were detected manually so that evenly distributed points are selected. We have compared three methods, the 8-point algorithm, RANSAC and LMS, to determine which is most suitable for inlier sampling.


In the three algorithms, the sizes of the point subsets are 70, 47 and 20, respectively. Our experimental results (Table 2) show that LMS has the smallest errors among the three methods. Inlier selection, i.e., where the corresponding points should be located in the image, is important for robust estimation of the camera model and the essential matrix.

Table 1. Comparison of real and estimated camera parameters

R: -15° about the y-axis, T: (1, 0, 0)
Real parameters:
[ 0.9659  0       -0.2588  1 ]
[ 0       1        0       0 ]
[ 0.2588  0        0.9659  0 ]
[ 0       0        0       1 ]
Estimated parameters:
[ 0.9574  0.0019  -0.2887  0.9944 ]
[ 0.0010  0.9999   0.0100  0.0117 ]
[ 0.2887  0.0099   0.9574  0.1053 ]
[ 0       0        0       1      ]

R: -30° about the y-axis, T: (0.9701, 0, 0.2425)
Real parameters:
[ 0.866  0  -0.5    0.9701 ]
[ 0      1   0      0      ]
[ 0.5    0   0.866  0.2425 ]
[ 0      0   0      1      ]
Estimated parameters:
[ 0.8489  0.0081  -0.5285  0.9163 ]
[ 0.0011  0.9999   0.0172  0.0016 ]
[ 0.5286  0.0152   0.8487  0.4004 ]
[ 0       0        0       1      ]

Table 2. Comparison of three algorithms for inlier selection

Method            | Pixel error (average) | Angular error (average)
8-point algorithm | 0.0237                | 5.068×10^-9
RANSAC            | 0.0142                | 3.171×10^-9
LMS               | 0.0106                | 1.167×10^-9
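For reference, a minimal Python sketch of LMS-style inlier sampling is given below: random minimal subsets are drawn, a model is fitted to each, and the model with the smallest median squared epipolar residual is kept. The minimal solver is left as a placeholder; it could be, for example, an 8-point estimate on the subset. The structure is an illustration of the idea, not the paper's implementation.

# Minimal LMS sampling sketch for the essential matrix: fit models on
# random minimal subsets and keep the one with the smallest median
# squared epipolar residual. estimate_E_from_sample is a placeholder
# for any minimal solver, e.g. an 8-point estimate on the subset.
import numpy as np

def epipolar_residuals(E, x1, x2):
    # squared algebraic residuals (x2_n^T E x1_n)^2 for rays of shape (N, 3)
    return np.einsum("ni,ij,nj->n", x2, E, x1) ** 2

def lms_select(x1, x2, estimate_E_from_sample, n_iters=500, sample_size=8,
               seed=0):
    rng = np.random.default_rng(seed)
    best_E, best_med = None, np.inf
    for _ in range(n_iters):
        idx = rng.choice(len(x1), size=sample_size, replace=False)
        E = estimate_E_from_sample(x1[idx], x2[idx])
        med = float(np.median(epipolar_residuals(E, x1, x2)))
        if med < best_med:
            best_med, best_E = med, E
    return best_E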

Fig. 5. Feature points and corresponding epipolar curves

Fig. 6. Generated images by rendering synthetic objects into a real-world scene


In addition, our method is able to identify the 3D positions of the light sources with respect to the camera positions estimated by the omnidirectional calibration. Sampling the bright regions in the image enables the user to detect a light source in the scene [14]. We reconstructed only the illumination environment, including the light positions, and experimented with integrating synthetic objects (sphere, torus, table, cone) into the real scene. Fig. 6 shows the generated images and the animation results.

5 Conclusions This paper presents a new self-calibration algorithm for a one-parametric non-linear projection model of an omnidirectional camera. From the estimated projection model, we can compute an essential matrix for unknown camera motions and determine the camera positions. In the simulation results, the LMS method gave the most precise inlier sampling for our model. Using the hemispherical coordinates of both cameras, we identify the 3D positions of the light sources with respect to the camera positions. In addition, photo-realistic scenes can be generated in the reconstructed illumination environment. Further study will include an integration of scene and illumination reconstruction for photo-realistic image synthesis.

Acknowledgements This work was supported by the Ministry of Education, Korea, under the BK21 project.

References
1. Fournier, A., Gunawan, A., Romanzin, C.: Common Illumination between Real and Computer Generated Scenes. Proc. of Graphics Interface (1993) 254-262
2. Debevec, P.: Rendering Synthetic Objects into Real Scenes: Bridging Traditional and Image-based Graphics with Global Illumination and High Dynamic Range Photography. Proc. of Siggraph (1998) 189-198
3. Xiong, Y., Turkowski, K.: Creating Image Based VR Using a Self-calibrating Fisheye Lens. Proc. of Computer Vision and Pattern Recognition (1997) 237-243
4. Nene, S.A., Nayar, S.K.: Stereo with Mirrors. Proc. of Int. Conf. on Computer Vision (1998) 26-35
5. Sato, I., Sato, Y., Ikeuchi, K.: Acquiring a Radiance Distribution to Superimpose Virtual Objects onto a Real Scene. IEEE Trans. on Visualization and Computer Graphics, Vol. 5, No. 1 (1999) 99-136
6. Bunschoten, R., Krose, B.: Robust Scene Reconstruction from an Omnidirectional Vision System. IEEE Trans. on Robotics and Automation, Vol. 19, No. 2 (2003) 23-69
7. Micusik, B., Pajdla, T.: Estimation of Omnidirectional Camera Model from Epipolar Geometry. Proc. of Computer Vision and Pattern Recognition (2003) 485-490
8. Micusik, B., Martinec, D., Pajdla, T.: 3D Metric Reconstruction from Uncalibrated Omnidirectional Images. Proc. of Asian Conf. on Computer Vision (2004) 545-550
9. Micusik, B., Pajdla, T.: Omnidirectional Camera Model and Epipolar Geometry Estimation by RANSAC with Bucketing. IEEE Scandinavian Conf. on Image Analysis (2003)


10. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge Univ. Press (2000)
11. Kumler, J., Bauer, M.: Fisheye Lens Designs and Their Relative Performance. http://www.realviz.com
12. Oliensis, J.: Exact Two-image Structure from Motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 12 (2002) 1618-1633
13. Agarwal, S., Ramamoorthi, R., Belongie, S., Jensen, H.: Structured Importance Sampling of Environment Maps. Proc. of Siggraph (2003) 605-612

Expert Knowledge Guided Genetic Algorithm for Beam Angle Optimization Problem in Intensity-Modulated Radiotherapy Planning∗ Yongjie Li and Dezhong Yao School of Life Science and Technology, University of Electronic Science and Technology of China, 610054 Chengdu, China {Liyj, Dyao}@uestc.edu.cn

Abstract. In this paper, a useful tool is proposed to find the optimal beam configuration within a clinically acceptable time using a genetic algorithm (GA) guided by the expert knowledge. Two types of expert knowledge are employed: (1) beam orientation constraints, and (2) beam configuration templates. The knowledge is used to reduce the search space and guide the optimization process. The beam angles are selected using GA, and the intensity maps are optimized using the conjugate gradient (CG) method. The comparisons of the optimization on a clinical prostate case with and without expert knowledge show that the proposed algorithm can improve the computation efficiency.

1 Introduction Intensity-modulated radiotherapy (IMRT) is a powerful technology for improving the therapeutic ratio by using modulated beams from multiple directions to irradiate the tumor. Conventional IMRT planning starts with the selection of suitable beam angles, followed by an optimization of the beam intensity maps under the guidance of an objective function [1][2]. The set of beams should be chosen such that the plan produces highly three-dimensionally conformal dose distributions to the target, while sparing the organs-at-risk (OARs) and normal tissues as much as possible. Many studies have shown that the selection of suitable beam angles is most valuable for a plan with a small number of beams (9) in some complicated cases, where the tumor volume surrounds a critical organ or is surrounded by multiple critical organs [3][4]. Beam angle selection is important but also challenging for IMRT planning because of the inherent complexity, mainly the large search space and the coupling between the beam configuration and the beam intensity maps [4][5]. In current clinical practice, beam angle selection is generally based on the experience of the human planners. Several trial-and-error attempts are normally needed to find a group of acceptable beam angles, mainly because of the facts

Supported by grants from NSFC of China (30500140 & 60571019).



that beam directions are case-dependent and coupled with the intensity maps of the incident beams, which makes manual selection less straightforward than in conventional conformal radiotherapy (CRT) [3]. To date, extensive efforts have been made by many researchers to facilitate computer-assisted beam angle selection for IMRT planning [3~9]. Although fruitful improvements have been achieved, it still cannot serve as a routine tool in clinical practice because of the intrinsic extensive computation time. Two directions for future studies to further improve the optimization performance are: (1) the optimization algorithms themselves, and (2) external intervention or guidance of the optimization process, such as the use of the expert knowledge accumulated by oncologists and physicists over time. This study concentrates on improving the optimization efficiency by utilizing expert knowledge to guide the evolution process of the genetic algorithm (GA). The rest of this paper is organized as follows. In Section 2, the framework of the knowledge-guided GA is briefly introduced, followed by a detailed description of beam angle optimization with GA. The objective function is also described in this section. In Section 3, a clinical prostate case is employed to show the performance of the proposed algorithm. Finally, some conclusions and discussion are given in Section 4.

2 Materials and Methods In order to decrease the computational burden, the optimization is separated into two iterative steps: beam angle selection using GA [7], and optimization of the intensity maps for the selected beams using a conjugate gradient (CG) method [2]. Note that the terms plan, chromosome and individual are used interchangeably throughout the paper, unless they have different meanings that are clearly stated. 2.1 Knowledge Guided Genetic Algorithm Two types of expert knowledge are used: (1) beam orientation constraints, which define orientation scopes through which no beam may pass, and (2) beam configuration templates, which are the beam angles considered most suitable for the current tumor. The first type is used to reduce the search space by discarding the defined scopes from the whole 360° space; the remainder of the 360° is divided into discrete angles with a given increment, such as 10°. The second type of knowledge is used (1) to initialize some of the individuals in the first generation of GA (the remaining individuals are initialized randomly), and (2) to replace the worst individual in each new generation. The scheme of the expert knowledge guided GA is shown in Fig. 1, and a sketch of the template seeding follows below. No more than a quarter of the individuals in the first generation of GA are allowed to be initialized from templates, to prevent the knowledge from dominating the GA operations at the beginning of the optimization. If plan templates remain after the initialization, they are used to replace the worst individual in each new generation until no template remains. When the optimization finishes, the optimized results are saved in the database to serve as templates.
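The template-seeding mechanism just described can be sketched in Python as follows; the population layout and all helper names are illustrative only, not the authors' code.

# Sketch of the knowledge-guided initialization and replacement: at most
# a quarter of the first generation is seeded from templates, and the
# leftover templates later replace the worst individual of a generation.
import random

def init_population(pop_size, n_beams, templates, angles,
                    rng=random.Random(0)):
    seeded = [list(t) for t in templates[: pop_size // 4]]
    leftover = [list(t) for t in templates[pop_size // 4:]]
    population = seeded
    while len(population) < pop_size:
        population.append(rng.sample(angles, n_beams))  # random individuals
    return population, leftover

def inject_template(population, fitnesses, leftover):
    # replace the worst individual with a stored template, if any remain
    if leftover:
        worst = min(range(len(population)), key=lambda i: fitnesses[i])
        population[worst] = leftover.pop(0)
    return population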


Fig. 1. The scheme of expert knowledge guided GA for beam angle optimization

Fig. 2. The coding scheme and the genetic operations for beam angle optimization

2.2 Genetic Algorithm for Beam Angle Optimization This study adopts a one-dimensional integer-coding scheme, in which a combination of beam angles is represented by a chromosome whose length is the user-specified beam number; each gene in the chromosome represents a trial beam angle. The genes in one chromosome differ from each other, meaning that no two beams in one plan may have the same angle. This study adopts four genetic operations: selection, crossover, mutation, and pseudo-immunity (Fig. 2). Parent individuals with higher fitness are selected into the next generation with higher probability. To any two selected parent individuals, a crossover operation is applied according to a given crossover probability. A mutation operation is then randomly applied to the two children according to a mutation probability. Finally, a pseudo-immunity operation is applied to the two children. For example, in the first individual generated after mutation, the angles in the first and last genes are 310° and 300°, respectively, which are two angles with too little separation for an acceptable plan; it is therefore reasonable to replace one of them with a suitable angle (310° is randomly selected and randomly replaced by 220° in Fig. 2). 2.3 Objective Function and Fitness Value For each new individual (i.e., a new plan), a CG method is employed to optimize the corresponding beam intensity maps [2][7]. The optimization aims to minimize the difference between the prescribed and the calculated dose distributions, which can be described mathematically by the following objective function:


F_obj(x) = Σ_{j=1}^{N_T} δ · w_j · ( d_j(x) − p_j )^2 ,   (1)

d_j(x) = Σ_{m=1}^{N_ray} a_{jm} · x_m ,   (2)

where x = (x_1, x_2, …, x_{N_B}) is the beam set and N_B is the number of beams in a treatment plan. All of the selected beams in x are divided into rays, and N_ray is the total number of rays. F_obj(x) is the value of the objective function for x. N_T is the number of points in the volume; δ = 1 when the calculated point dose in the volume breaks the user-defined constraints, and δ = 0 otherwise; w_j is the weight of the j-th point; d_j and p_j are the calculated and prescribed doses at the j-th point in the volume; a_{jm} is the dose deposited at the j-th point by the m-th ray with unit weight; and x_m is the intensity of the m-th ray. The quality of each individual is evaluated by a fitness value, and the purpose is to find the individual (plan) with maximum fitness. The fitness value is calculated by

Fitness(s) = F_max − F_obj(s) ,   s = (s_1, s_2, …, s_{N_angle}) ,   (3)

where F_max is a rough estimate of the maximum value of the objective function, which ensures that all fitness values are positive, a requirement of the selection operation; s is a group of angles to be selected, and N_angle is the number of beam angles in the plan. Both F_max and F_obj(s) are calculated using Eqs. (1)-(2). The whole optimization terminates when no better plan is found within a specified number of successive GA generations, and the individual with the highest fitness in the last generation is regarded as the optimal set of beam angles [7].
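The following Python sketch illustrates how Eqs. (1)-(3) and the operators of Section 2.2 fit together: the dose objective, the fitness, integer-coded mutation, and a pseudo-immunity repair for too-close angles. The dose matrix A, the weights, the 20° separation threshold and the simplification that every point violates its constraint (δ = 1) are assumptions of this sketch, not values from the paper.

# Dose objective of Eqs. (1)-(2), fitness of Eq. (3), plus mutation and
# pseudo-immunity for an integer-coded beam-angle chromosome.
import numpy as np

ANGLES = list(range(0, 360, 10))      # discrete candidate beam angles
MIN_SEPARATION = 20                   # assumed repair threshold (degrees)

def objective(A, prescribed, weights, x):
    # Eqs. (1)-(2): d_j = sum_m a_jm * x_m, then weighted squared error
    d = A @ x
    return float(np.sum(weights * (d - prescribed) ** 2))

def fitness(F_max, A, prescribed, weights, x):
    return F_max - objective(A, prescribed, weights, x)   # Eq. (3)

def mutate(plan, rng, p_mut=0.01):
    plan = list(plan)
    for i in range(len(plan)):
        if rng.random() < p_mut:
            free = [a for a in ANGLES if a not in plan]
            plan[i] = int(rng.choice(free))
    return plan

def pseudo_immunity(plan, rng):
    # replace one of any two angles that are closer than MIN_SEPARATION
    plan = sorted(plan)
    for i in range(len(plan)):
        gap = (plan[(i + 1) % len(plan)] - plan[i]) % 360
        if 0 < gap < MIN_SEPARATION:
            free = [a for a in ANGLES if a not in plan]
            plan[i] = int(rng.choice(free))
    return plan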

3 Results A clinical case with a prostate tumor, shown in Fig. 3, was optimized using the proposed method. Four OARs are considered: the rectum, the bladder, and the left and right femur heads. The sizes and relative positions of the volumes change substantially from slice to slice. Seven 6 MV coplanar photon beams are used to irradiate the tumor. The selection of the GA parameters is important for the optimization performance. Although some theoretical studies of parameter selection exist [10], in engineering applications the parameters are mostly selected empirically [11]. For the seven-beam plan of this case, the population size was experimentally set to 20, and the crossover and mutation probabilities were set to 0.9 and 0.01, respectively. The results of optimization runs with and without expert knowledge were compared. For our new method, two beam orientation constraints were defined ((a) and (b) in Fig. 3), and a plan configuration candidate with beam angles of 0°, 50°, 100°, 150°, 210°, 260° and 310° was provided as expert knowledge, shown as the solid white straight lines in Fig. 3. This plan candidate has become an informal standard for prostate cases in clinical IMRT practice. The algorithm was run independently 20 times with and without the knowledge. All of the runs found the same optimal angles: 10°, 60°, 110°, 155°, 200°, 250° and 300°,


shown as the dotted white straight lines in Fig. 3. Table 1 lists the statistical results. In summary, about 46 minutes are taken when no knowledge is used, but the computation time drops to about 32 minutes when all the knowledge is incorporated into the optimization process; about 30% of the optimization time is saved. The influence of the quality and quantity of the knowledge on performance was also studied. First, the influence of bad knowledge was evaluated by providing unreasonable plans as templates, such as (10°, 20°, 30°, 40°, 50°, 60° and 70°). We found that bad knowledge has little influence on the optimization performance; even when all the individuals in the first generation were initialized with quite bad templates, the computation time increased only very slightly. Second, the influence of good knowledge was studied. Good templates were produced by randomly sampling around the known optimal plan given by the runs above. Table 2 shows the computation time for different numbers of good templates. From the table we find that the computation time decreases meaningfully as the number of good templates increases.

Fig. 3. The prostate case and the dose distribution of the optimized plan. The arcs (a) and (b) are the two orientation constraints. The solid white straight lines are the angles of a plan template (i.e., expert knowledge). The dotted white straight lines are the optimized beam angles. Table 1. Comparison of the optimization runs with and without expert knowledge

With expert knowledge | Run times | Maximum computation time | Minimum computation time | Mean computation time
No                    | 20        | 52 min 42 s              | 40 min 37 s              | 45 min 43 s
Yes                   | 20        | 38 min 19 s              | 29 min 05 s              | 32 min 26 s

Table 2. The computation time for different number of good templates

Template Number 5 10 15 20 Comput. Time 28 min 12 s 26 min 45 s 25 min 23 s 23 min 37 s

4 Discussion and Conclusions In this paper, a new technique was developed for beam angle optimization in IMRT planning. The beam angles are selected with GA guided by the expert knowledge. The


results on a clinical prostate tumor case show that the optimization efficiency is improved by incorporating expert knowledge into the optimization process. The optimization of beam angles is a difficult process because of the extensive computation. By making full use of the plentiful expert knowledge accumulated by oncologists over time, the presented algorithm should be more feasible for routine IMRT planning. The optimization efficiency is improved to a degree that depends on the quantity and quality of the prior knowledge provided by the planner, and the optimized angles are better than, or at least no worse than, those obtained without knowledge. The principle of GA is quite simple; however, it is not easy for GA to solve specific engineering optimization problems efficiently. It is now a trend to explore novel schemes for incorporating expert knowledge into the optimization process. This study has provided a pilot framework for the combination of knowledge with GA. We are currently working on building an easily accessed knowledge database and on more valid schemes for guiding GA with knowledge.

References
1. Webb, S.: Intensity-modulated Radiation Therapy. Institute of Physics Publishing, Bristol and Philadelphia (2000)
2. Spirou, S.V., Chui, C.S.: A Gradient Inverse Planning Algorithm with Dose-volume Constraints. Med. Phys. 25 (1998) 321-333
3. Pugachev, A., Boyer, A.L., Xing, L.: Beam Orientation Optimization in Intensity-modulated Radiation Treatment Planning. Med. Phys. 27 (2000) 1238-1245
4. Hou, Q., Wang, J., Chen, Y., Galvin, J.M.: Beam Orientation Optimization for IMRT by a Hybrid Method of the Genetic Algorithm and the Simulated Dynamics. Med. Phys. 30 (2003) 2360-2376
5. Gaede, S., Wong, E., Rasmussen, H.: An Algorithm for Systematic Selection of Beam Directions for IMRT. Med. Phys. 31 (2004) 376-388
6. Djajaputra, D., Wu, Q., Wu, Y., Mohan, R.: Algorithm and Performance of a Clinical IMRT Beam-angle Optimization System. Phys. Med. Biol. 48 (2003) 3191-3212
7. Li, Y., Yao, J., Yao, D.: Automatic Beam Angle Selection in IMRT Planning Using Genetic Algorithm. Phys. Med. Biol. 49 (2004) 1915-1932
8. Souza, W.D., Meyer, R.R., Shi, L.: Selection of Beam Orientations in Intensity-modulated Radiation Therapy Using Single-beam Indices and Integer Programming. Phys. Med. Biol. 49 (2004) 3465-3481
9. Wang, X., Zhang, X., Dong, L., Liu, H., Wu, Q., Mohan, R.: Development of Methods for Beam Angle Optimization for IMRT Using an Accelerated Exhaustive Search Strategy. Int. J. Radiat. Oncol. Biol. Phys. 60 (2004) 1325-1337
10. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, Massachusetts (1989)
11. Yu, Y., Schell, M.C.: A Genetic Algorithm for the Optimization of Prostate Implants. Med. Phys. 23 (1996) 2085-2091

Extracting Structural Damage Features: Comparison Between PCA and ICA∗ Luo Zhong1, Huazhu Song1, and Bo Han2 1

School of Computer Science and Technology, Wuhan University of Technology, Wuhan, Hubei 430070, China [emailprotected] 2 Center for Information Science and Technology, Temple University, Philadelphia, PA 19122, U.S.A

Abstract. How to effectively extract structural features from structural damage signals is a persistent problem in the structural engineering domain. In this paper, principal component analysis (PCA) and independent component analysis (ICA) are discussed in detail for selecting features from measured time series data. Because structural engineering data have unknown covariance and different scales, a sample-based standardized PCA is used. In order to speed up the computation of components, a second-order-statistics spatio-temporal decorrelation algorithm is applied. The components from PCA and from different ICA algorithms are then tested on the benchmark dataset of the IASC-ASCE SHM group at the University of British Columbia. The results show that both PCA and ICA can effectively reduce the influence of noise; different cumulative contribution rates in PCA play different roles, and 99% is preferred. For the two-damage level, both PCA and ICA perform well, but for the multi-damage level, ICA is better than PCA with a 99% cumulative contribution rate. Therefore, ICA extracts structural features more accurately than PCA.

1 Introduction The identification of damage from vibration data is still a topic of ongoing intensive scientific research. A comprehensive literature review was made by Doebling, and some successful methodologies are shown in his report [1,2]. Feature extraction is the process of identifying damage-sensitive properties, derived from the measured dynamic response, which allow one to distinguish between the undamaged and the damaged structure. Therefore, how to extract features from the dynamic-response time series data is the most important step in detecting civil structural damage. In addition, the data measured from sensors are sensitive to environmental factors such as noise and temperature, making them difficult to recognize and interpret. Some researchers have paid particular attention to the processing of signals measured from sensors. Yuen proposed a two-stage structural health monitoring

This Project is supported by the Chinese Ministry of Education for Advanced University Action Plan (2004XD-03).



approach for the Phase I benchmark studies [3]. Structural experts wish to find new robust approaches to structural feature selection. Recently, principal component analysis (PCA) and independent component analysis (ICA) have become very popular for feature selection in data mining. PCA involves a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. It can discover or reduce the dimensionality of the data set and identify new meaningful underlying variables, which removes the influence of noise. ICA, mostly used for feature reduction from time series data, is a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements or signals. Zang [4] and Song [5] applied ICA to model damaged structures. Their results showed that ICA is a more robust method for feature selection and leads to more accurate classification. However, they did not compare the two methods in detail. In this paper, the measured time series signals from sensors are processed by ICA and PCA. To make it convenient to compare PCA and ICA, the framework combining PCA/ICA with a classifier, either a support vector machine (SVM) or an artificial neural network (ANN), is shown in brief; next, PCA and SOS-ICA are discussed in Section 2. In Section 3, the experiments, based on the benchmark data from the University of British Columbia, are presented. Finally, the conclusion is given in Section 4.

2 Methodology 2.1 Frame for Structural Damage Identification The framework integrating PCA/ICA and a classifier is shown in Fig. 1. The original time-domain data X1, X2, …, Xn and the noise measured by the sensors are first used as the input to the PCA/ICA model, which produces the component matrix z; z serves as the input attributes for the classifier (an SVM or ANN model), and the output of the classifier is the status of the structure. This is a non-physics-based model; it can therefore be treated as a general model for structural damage identification, where ICA/PCA performs feature selection and the classifier performs classification. ANN is used in this paper.


Fig. 1. Frame for structural damage identification

2.2 Feature Selection by PCA and ICA PCA finds orthogonal directions of greatest variance in the data. It extracts the principal components according to a variance-maximizing rotation of the original structural features; the algorithm is shown in [6].


ICA techniques provide statistical signal-processing tools for optimal linear transformations of multivariate data. Considering the volume of structural damage data, a fast method for computing components is needed. AMUSE, described by François Vialatte and Andrzej Cichocki [7], arranges components not only in order of decreasing variance, but also in order of decreasing linear predictability. It belongs to the group of second-order-statistics spatio-temporal decorrelation algorithms (SOS-ICA) and uses the simple principles that the estimated components should be spatio-temporally decorrelated and less complex than any mixture of the sources. The components are ordered according to decreasing singular values of a time-delayed covariance matrix. The method applies two consecutive PCAs: the first PCA is applied to the input data, and the second PCA is applied to the time-delayed covariance matrix of the output of the previous stage. Unlike in many ICA algorithms, all components estimated by SOS-ICA are uniquely defined and consistently ranked. SOS-ICA is therefore faster than some other ICA algorithms.
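A minimal sketch of this two-PCA (AMUSE-style) procedure is given below, under the assumption that the input is an (n_samples × n_channels) array and that a delay of one sample is used; it is an illustration of the idea, not the reference implementation of [7].

# Minimal AMUSE-style SOS-ICA sketch: standardize, whiten with a first
# PCA, then eigendecompose a symmetrized time-delayed covariance (the
# "second PCA") and rank the components by |eigenvalue|.
import numpy as np

def amuse(X, tau=1):
    # X: (n_samples, n_channels) time series; assumes no constant channel
    X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardization
    C0 = np.cov(X, rowvar=False)
    w, V = np.linalg.eigh(C0)                       # first PCA
    keep = w > 1e-10
    W = V[:, keep] / np.sqrt(w[keep])               # whitening matrix
    Z = X @ W
    C_tau = (Z[:-tau].T @ Z[tau:]) / (len(Z) - tau) # delayed covariance
    C_sym = 0.5 * (C_tau + C_tau.T)                 # symmetrize
    d, U = np.linalg.eigh(C_sym)
    order = np.argsort(np.abs(d))[::-1]             # consistent ranking
    return Z @ U[:, order]                          # ranked components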

3 Experiments In this section, we use PCA and ICA to extract features from structural data, use both undamaged and damaged data as training data to construct an ANN classifier (three layers: one input layer, a hidden layer with 10 nodes, and an output layer with 1 node and a sigmoid function), and then apply the ANN model to unseen test data to examine whether they are correctly recognized. In addition, different cumulative contribution rates are discussed for the practical structural damage domain. To compare different ICA algorithms, the SOS-ICA and FastICA algorithms are applied in the experiments. The non-quadratic function g(y) = tanh(a1×y) is used to measure non-Gaussianity in FastICA [9]. Two-damage and multi-damage levels are used to test PCA and ICA. 3.1 Data Sets A popular benchmark, developed by the IASC-ASCE SHM Task Group at the University of British Columbia, is used to test the classification accuracy. The structure is a 4-story, 2-bay by 2-bay steel-frame scale-model structure in the Earthquake Engineering Research Laboratory [8]. In our experiments, we mainly use seven data sets from the ambient data of this benchmark, where C01 is an undamaged dataset and C02∼C07 are datasets with different damage types. Detailed information is given in [9]. There are 15 attributes in each dataset, corresponding to the signals from the 15 sensors located on the steel frame. The benchmark dataset also provides an additional noise attribute. From the original config01∼config07 data we randomly extract 6000 samples each to form C01∼C07. 3.2 Identifying Two-Damage Level Experiment by PCA and SOS-ICA with ANN The two-damage level is often encountered in civil structural damage. C01 is treated as undamaged data whose output is '1'; and so on, up to C07, which is damaged data whose output is '7'.


Some of C01 and one of C02 to C07 are used as training data; the corresponding remaining data are test data, and the damage prediction values are shown in Table 1. PCA with a 95% or 99% cumulative contribution rate and SOS-ICA can all identify the two damage levels well, so any of these methods can be used. Table 1. Prediction values for the two-damage level by PCA and SOS-ICA

Pair    | 1st class | PCA (95%) | PCA (99%) | SOS-ICA | 2nd class | PCA (95%) | PCA (99%) | SOS-ICA
C01/C02 | 1         | 1.0000    | 1.0001    | 0.9999  | 2         | 2.0002    | 1.9869    | 1.9912
C01/C03 | 1         | 1.0000    | 1.0010    | 0.9999  | 3         | 2.9993    | 2.9758    | 2.9737
C01/C04 | 1         | 1.0002    | 1.0385    | 0.9999  | 4         | 3.9445    | 3.9998    | 3.9308
C01/C05 | 1         | 1.5242    | 1.0019    | 0.9999  | 5         | 4.3959    | 4.9867    | 4.9665
C01/C06 | 1         | 1.1758    | 1.0008    | 1.0001  | 6         | 5.6817    | 5.9733    | 5.9542
C01/C07 | 1         | 1.0135    | 0.9995    | 1.0003  | 7         | 7.0003    | 6.8456    | 6.9339

3.3 Identifying Multi-damage Level Experiment by PCA and ICA with ANN To further test the identification methods, PCA and FastICA are used to select features, and the results are used as the input to the ANN. 3.3.1 Components from PCA and ICA By applying PCA with different cumulative contribution rates and the FastICA algorithm to the original C01 signal shown in Fig. 2, the components of C01 shown in Fig. 3∼Fig. 5 are obtained. We then compute the correlation coefficients between the noise and the component data and count the number of attributes correlated with noise; the result is shown in Table 2, where we observe that PCA and ICA can effectively reduce the noise in structural signals. 3.3.2 Identifying Multi-damage For the multi-damage level experiment, C01 is treated as undamaged data whose output is '1'; and so on, up to C07, which is damaged data whose output is '7'. Some of C01∼C07 are used as


Fig. 2. Original C01 signal


Fig. 3. PCA(95%) components of C01



Fig. 4. PCA(99%) components of C01

Fig. 5. ICA components of C01

Table 2. Number of attributes correlated with noise

Source data | Noise-correlated attributes (source) | ICA ICs# | ICA noise # | PCA(95%) PCs# | PCA(95%) noise # | PCA(99%) PCs# | PCA(99%) noise #
C01         | 9                                    | 10       | 3           | 3             | 0                | 6             | 2
C02         | 2                                    | 7        | 0           | 7             | 0                | 11            | 1
C03         | 2                                    | 7        | 0           | 9             | 0                | 11            | 1
C04         | 3                                    | 7        | 0           | 7             | 0                | 10            | 2
C05         | 10                                   | 8        | 4           | 2             | 1                | 5             | 2
C06         | 6                                    | 9        | 2           | 3             | 0                | 6             | 0
C07         | 4                                    | 10       | 1           | 10            | 1                | 13            | 2


Fig. 6. Predict by PCA(95%)


Fig. 7. Predict by PCA(99%)


Fig. 8. Predict by ICA

training data; the rest of C01∼C07 are test data, and the damage prediction values are shown in Fig. 6∼Fig. 8. When 95% is used as the PCA cumulative contribution rate, the predictions for C03∼C06 are wrong and about half of C02 is mispredicted; PCA with a 95% cumulative contribution rate therefore cannot meet the requirements. When the rate is changed to 99%, all seven damage situations of C01∼C07 can be forecast accurately; compared with C05∼C06, the forecast accuracy for C01∼C04 and C07 is much higher. The forecasts of ICA are excellent: it distinguishes the seven different damage levels accurately and has


a high accuracy. The experiments therefore indicate that both PCA (99%) and ICA can distinguish the different damage levels, and that ICA surpasses PCA in extracting structural features.

4 Conclusions In this article, the statistical methods PCA and ICA are applied to extract structural features from the original measured data, which have complex relations among them. Different ICA algorithms and different cumulative contribution rates are discussed. Our experiments with the benchmark data from the IASC-ASCE SHM Task Group verify our solution: PCA and the different ICA algorithms (SOS-ICA and FastICA) can reduce the effect of noise. For the two-damage level, PCA with a 95% or 99% cumulative contribution rate and ICA all give accurate predictions; for the multi-damage level, PCA with a 99% cumulative contribution rate can predict the damage level accurately, but ICA is the better way. Therefore, PCA is inferior to ICA in damage feature extraction.

References
1. Doebling, S.W., Farrar, C.R., Prime, M.B., Shevitz, D.W.: Damage Identification and Health Monitoring of Structural and Mechanical Systems from Changes in Their Vibration Characteristics: a Literature Review. Los Alamos National Laboratory. The Shock and Vibration Digest, 30 (1998) 91-105
2. Fritzen, C.P., Jennewein, D., Kiefer, Th.: Damage Detection Based on Model Updating Methods. Mechanical Systems and Signal Processing, 12 (1998) 163-186
3. Yuen, K.V.: Two-stage Structural Health Monitoring Approach for Phase I Benchmark Studies. Journal of Engineering Mechanics, ASCE, 130 (2004) 16-33
4. Zang, C.: Structural Damage Detection Using Independent Component Analysis. Structural Health Monitoring, Int. J. 3 (2004) 69-84
5. Song, H.: Structural Damage Detection by Integrating Independent Component Analysis and Artificial Neural Networks. MLMTA'05, CSREA Press (2005) 190-196
6. Zhong, L.: Structural Damage Identification Based on PCA and ICA. Accepted, Journal of Wuhan University of Technology, No. 7 (2006)
7. Vialatte, F., Cichocki, A.: Early Detection of Alzheimer's Disease by Blind Source Separation, Time Frequency Representation, and Bump Modeling of EEG Signals. ICANN 2005, Springer-Verlag Berlin Heidelberg LNCS, 3696 (2005) 683-692
8. http://wusceel.cive.wustl.edu/asce.shm/benchmarks.htm
9. Song, H.: Structural Damage Detection by Integrating Independent Component Analysis and Support Vector Machine. ADMA 2005, Springer LNAI, 3584 (2005) 670-677

Face Alignment Using an Improved Active Shape Model Zhenhai Ji1, 2, Wenming Zheng1, Ning Sun1, 2, Cairong Zou2, and Li Zhao2 1

Research Center of Learning Science, Southeast University, Nanjing, Jiangsu, 210096, P.R.China 2 Department of Radio Engineering, Southeast University, Nanjing, Jiangsu, 210096, P.R.China {ji*zhenhai, sunning}@seu.edu.cn

Abstract. The Active Shape Model (ASM), composed of a global shape model and a local texture model, is a powerful tool for face alignment. However, its performance is often affected by factors such as the initial location and illumination, which frequently lead to local minima in the optimization. Making full use of the local information of each landmark, we propose an improved ASM in which the traditional local texture model is extended into three sub-models by adding two further local models; combining the three models, we construct a more robust ASM. Experiments show that the improved method solves the local minima problem efficiently and demonstrates better accuracy and a wider search region in the target face image.

1 Introduction Face alignment is a vital step in the fields of face recognition and facial expression recognition. Beinglass et al. [1] described a scheme for locating such objects using a Generalized Hough Transform with the point of articulation as the reference point for each subpart. Kass et al. [2] introduced Active Contour Models, an energy-minimization approach to shape alignment. Wiskott et al. [3] developed the Elastic Bunch Graph to locate facial features using Gabor wavelets. Nastar et al. [4] used a finite-element approach to model the vibration modes of a shape. In particular, Cootes et al. [5], [6] proposed the Active Shape Model and the Active Appearance Model (AAM), respectively, and these have become the representative statistical modeling methods owing to their good performance and wide application in many fields. ASM includes the global shape model and the local texture model, which both derive from the point distribution model. However, its performance often suffers from factors such as the initial condition, illumination and facial expression, all of which frequently lead to local minima in the optimization. To solve the local minima problem, many improved methods have been proposed in recent years. Zhao et al. [7] proposed a new shape evaluation for ASM and used it to drag the search out of local minima, and Jiao et al. [8] proposed an improved ASM, called W-ASM, to solve the local minima problem. Different from the above two


methods, and from another consideration of the essence of ASM, we fully explore the local information of each landmark and extend the original local texture model into a more robust model that includes three sub-models: the original local texture model and two additional models. The two added sub-models carry more information than the original single model, so the combination of the three sub-models can capture more sufficient local information. The rest of the paper is arranged as follows. The traditional ASM algorithm is described in Section 2. In Section 3 we present the improved method. Experimental results are presented in Section 4, and the last section gives the conclusions.

2 Algorithm of Traditional ASM The traditional ASM method is constructed on the theory of the Point Distribution Model (PDM), which is derived from the landmarks around the shape; Figure 1 shows a face example, hand-annotated with 58 landmarks. Suppose that we have a training set of face images, and that the coordinates of the landmarks of every face image are connected sequentially into a vector.

Fig. 1. Annotated image with landmarks

Fig. 2. Sampled process sketch of the three sub-models

Thus, we can obtain a vector set corresponding to the training set. After aligning these vectors using Generalised Procrustes Analysis and applying Principal Component Analysis (PCA), we can generate the ASM global shape model, in which each face shape is approximately represented by a few main mode parameters. The global shape model can be written as

s ≈ s̄ + P b ,   (1)

where s̄ is the mean shape, b is the shape parameter, and P is the transformation matrix composed of the principal component vectors. The other part of ASM is the local texture model, which describes the local feature character of each landmark; it is constructed from the first derivatives of the intensity vectors sampled perpendicular to the landmark contour. When searching for facial feature points in a target face image, we calculate the Mahalanobis distance using formula (2), and the point giving the minimum value is the optimal candidate point. Here l̄_p and Σ_p represent the mean texture vector and the normalized sampled local texture covariance matrix at landmark p, respectively; q is a nearby point on the line perpendicular to the contour at landmark p, l_q is the normalized local texture vector sampled at point q, and q_opt is the optimal candidate point for landmark p:

q_opt = arg min_q [ (l_q − l̄_p)^T Σ_p^{-1} (l_q − l̄_p) ] .   (2)

3 Improved Active Shape Model From the construction of ASM we recognize that the local texture model plays the key role in searching for the candidate point of each landmark. Based on this consideration, we improve the local texture model and extend it into three sub-models; the following describes their construction in detail. The first sub-model is the original local texture model, which we rename the middle local texture model for convenience, to distinguish it from the other two sub-models. The other two sub-models, named the internal local texture model and the external local texture model respectively, are constructed in a similar way to the middle local texture model; the difference lies in the sampling location. The internal local texture model samples mainly inside the face contour, the external local texture model samples mainly outside the face contour, and the middle local texture model samples on the contour itself. Figure 2 sketches this sampling process. Suppose that p0 is a point on the edge of the face contour, p1 is the corresponding point inside the face contour, and p2 is the corresponding point outside the face contour; λ1 and λ2 are the distances from p0 to p1 and from p0 to p2, respectively. Taking p0, p1 and p2 as center points in turn, we sample an intensity profile of some length along the two sides of each center point, and thus obtain the corresponding sampled vectors for each landmark. After normalizing these sampled vectors as described in Section 2, we obtain three mean vectors and three covariance matrices for a given landmark p, denoted l̄_p^m, l̄_p^i, l̄_p^e and Σ_p^m, Σ_p^i, Σ_p^e respectively, where l̄ and Σ denote the mean vector and covariance matrix, and the superscripts m, i, e denote the three kinds of local texture model. With this extension of the local texture model, the evaluation function corresponding to formula (2) generalizes to

q_opt = arg min_q [ α (l_q^i − l̄_p^i)^T (Σ_p^i)^{-1} (l_q^i − l̄_p^i) + β (l_q^m − l̄_p^m)^T (Σ_p^m)^{-1} (l_q^m − l̄_p^m) + γ (l_q^e − l̄_p^e)^T (Σ_p^e)^{-1} (l_q^e − l̄_p^e) ] ,   (3)

where l_q^i, l_q^m, l_q^e denote the corresponding three local texture vectors at point q, and α, β, γ are nonnegative weights whose sum equals 1. Comparing formulas (2) and (3), we can see that formula (2) is a special case of the extension of the original ASM, obtained when the weights satisfy α = γ = 0 and β = 1. On the other hand, because it covers a wider region when searching for the target point, the evaluation function of formula (3) may yield a better candidate point q_opt. The construction of the improved local texture model involves the selection of five coefficients: λ1, λ2, α, β, γ. In this paper, since the middle local texture model should play the most important role owing to its central location, while the internal and external local texture models play similar roles, the values of α, β, γ are set to 0.25, 0.5 and 0.25, respectively. λ1 and λ2 are distances between two points, and their values should be neither too small nor too large; as a tradeoff, λ1 and λ2 are both set to 4 pixels in this paper, chosen experimentally.
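As an illustration, the following Python sketch evaluates the combined cost of formula (3) for a candidate point, given the three texture profiles and the three trained sub-models; profile sampling from the image is abstracted away, and all names are hypothetical.

# Illustrative evaluation of formula (3) for one candidate point q.
# 'profiles' holds the internal (i), middle (m) and external (e) texture
# vectors already extracted at q; 'models' holds (mean, inverse
# covariance) per sub-model. All names here are hypothetical.
import numpy as np

def mahalanobis_sq(l_q, l_mean, cov_inv):
    d = np.asarray(l_q) - np.asarray(l_mean)
    return float(d @ cov_inv @ d)

def candidate_cost(profiles, models, alpha=0.25, beta=0.5, gamma=0.25):
    weights = {"i": alpha, "m": beta, "e": gamma}
    return sum(w * mahalanobis_sq(profiles[k], *models[k])
               for k, w in weights.items())

def best_candidate(candidate_profiles, models):
    # q_opt: candidate with minimal weighted cost; with alpha = gamma = 0
    # and beta = 1 this reduces to formula (2)
    costs = [candidate_cost(p, models) for p in candidate_profiles]
    return int(np.argmin(costs))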

4 Experimental Results

Our database contains 160 different frontal face images [9] of 40 different subjects, with 4 images per subject; the size of each image is 120*160. Each image is manually annotated with 58 landmarks, as in Fig. 1. We select 80 images as the training set (2 images chosen randomly from each of the 40 subjects) and use the remaining 80 images as the test set. In addition, we select 40 images randomly from the Yale face database B [10] to supplement the test set.

Fig. 3. Solving the problem of local minima: (a) traditional ASM, (b) improved ASM

4.1 Solving the Problem of Local Minima in Optimization

ASM may fall into local minima during optimization when searching for feature points in an unknown target face image. By imposing the intern local texture model and the extern local texture model on such a point, we can drag it out of the local minimum; Fig. 3 shows the result.

4.2 Comparisons on Capture Range

For each test image, we begin searching from the identical mean model at different displacements of up to ±10 pixels in the x coordinate, and then perform searches attempting to make the displaced mean model converge to the original optimal position. After convergence, we calculate the point-to-point error from the converged points to the labeled points. Fig. 4 plots the point-to-point errors against the different displacements. From the two curves we can see that our improved method has a wider capture range than the traditional ASM.
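The point-to-point error is not spelled out in the paper; a reasonable reading, sketched below, is the mean Euclidean distance between corresponding found and labeled landmarks (the function name is ours).

    import numpy as np

    def point_to_point_error(found, labeled):
        # found, labeled: (n_landmarks, 2) arrays of (x, y) coordinates
        return float(np.mean(np.linalg.norm(found - labeled, axis=1)))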

Fig. 4. Point-to-Point errors with different initial displacements

4.3 Comparison on Point Location Accuracy

We displace the landmarks of each test image from their accurate positions by different displacements in x, and then run the searches to make the displaced points return to the original accurate positions. After convergence, we compare the estimated points with the original labeled points. Fig. 5 shows the experimental result: the x coordinate is the distance from the found points to the labeled points, and the y coordinate is the percentage of points. We can see that the improved method is more accurate than the traditional ASM.

Fig. 5. Percentage of Point Number vs. Distance from Found Points to Labeled Points in x

5 Conclusions

The conventional ASM is composed of a global shape model and a local texture model. However, its performance is often influenced by factors that frequently lead to local minima in optimization. By fully using the local information, we construct a more robust model. Unlike the original ASM, this model includes three sub-models, which are combined to search for the new candidate point. Experiments show that the improved method solves the local minima problem efficiently and demonstrates better accuracy and a wider search region in the target face image.

Acknowledgment This work was partly supported by the National Natural Science Foundation of China under grant 60503023, and partly by the Natural Science Foundation of Jiangsu Province under grant BK2005407.

References
1. Beinglass, A., Wolfson, H.J.: Articulated Object Recognition, or: How to Generalize the Generalized Hough Transform. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (1991) 461-466
2. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active Contour Models. 1st International Conference on Computer Vision, London (1987) 259-268
3. Wiskott, L., Fellous, J.M., Kruger, N., Christoph, V.M.: Face Recognition by Elastic Graph Matching. In: Jain, L.C., et al. (eds.): Intelligent Biometric Techniques in Fingerprint and Face Recognition (1999) 355-396
4. Nastar, C., Ayache, N.: Fast Segmentation, Tracking and Analysis of Deformable Objects. International Conference on Computer Vision, IEEE Comput. Soc. Press (1993) 275-279
5. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active Shape Models - Their Training and Application. Computer Vision and Image Understanding, 61 (1995) 38-59
6. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active Appearance Models. European Conf. on Computer Vision, Springer, 2 (1998) 484-498
7. Zhao, M., Li, S.Z., Chen, C., Bu, J.J.: Shape Evaluation for Weighted Active Shape Models. The Asian Conference on Computer Vision, Korea, 2 (2004) 1074-1079
8. Jiao, F., Li, S.Z., Shum, H.Y., Schuurmans, D.: Face Alignment Using Statistical Models and Wavelet Features. Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, 1 (2003) 321-327
9. Stegmann, M.B., Ersbøll, B.K., Larsen, R.: FAME - A Flexible Appearance Modeling Environment. IEEE Trans. on Medical Imaging, 22 (2003) 1319-1331
10. Athinodoros, S.G., Peter, N.B.: From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose. IEEE Trans. PAMI, 23 (2001) 643-660

Face Detection with an Adaptive Skin Color Segmentation and Eye Features

Hang-Bong Kang

Dept. of Computer Eng., Catholic University of Korea, #43-1 Yokkok 2-dong, Wonmi-Gu, Puchon City, Kyonggi-Do, Korea
[emailprotected]

Abstract. In this paper, we propose a new method of face detection using an adaptive skin color model and eye features. First, we detect skin color segments adaptively using a two-dimensional Gaussian model of the CrCb skin color space. Shape analysis is performed on these skin color segments to reduce false alarms. Then, eye feature points for the face are extracted, and the possible eye feature points are compared with normalized eye features obtained from the training data for verification; at this step, we use a modified Hausdorff distance. Experimental results demonstrate our face detection approach on slanted face images and under different lighting conditions.

1 Introduction

Detecting human faces is the first step in many multimedia applications such as automatic face recognition, face tracking, and surveillance systems. Several different cues such as skin color, motion, facial shape, and facial appearance can be used in face detection [1]. In particular, color has been suggested as a powerful fundamental cue for face detection [2,3]. In recent years, various statistical color models have been used to discriminate skin pixels from non-skin pixels for face detection: single Gaussian models [2,3], color histograms [4], and a Gaussian mixture density model [5] have been suggested. Wang and Chang [2] used color thresholding for face detection by means of a suitable CrCb skin color distribution. Tsapatsoulis et al. [3] proposed a model that combines a two-dimensional Gaussian color model and shape features with template matching. Jones and Rehg [4] presented a comprehensive analysis of skin and non-skin color models and showed that histogram models were slightly superior to Gaussian mixture models in terms of skin color classification. In this approach, Bayesian detectors based on the skin color histogram produced higher face detection results, but their adaptation involves increased computational cost. Even though a color-based face detection system may work fast, there are still limitations, because the color of a face varies with changes in illuminant color, viewing geometry, and miscellaneous sensor parameters. It is therefore desirable to develop an adaptive algorithm to handle various situations. In this paper, we propose a new face detection approach that combines an adaptive two-dimensional Gaussian color model


and eye features with template matching. Section 2 describes the adaptive skin color segmentation method. Section 3 explains face candidate extraction, and Section 4 discusses the template matching. Section 5 presents our experimental results.

2 Adaptive Skin Color Segmentation

Generally, the skin color subspace in the YCrCb color space covers a small area of the CrCb chrominance plane. However, it is very difficult to construct a single skin color model that detects faces efficiently in all images. One possible solution is to use an adaptive skin color model to accommodate apparent changes in color due to varying lighting conditions. First, we model the CrCb skin-tone color distribution as a two-dimensional Gaussian distribution:

P(x \mid \mu_0, \Sigma) = \frac{\exp\big\{ -\tfrac{1}{2} (x - \mu_0)^T \Sigma^{-1} (x - \mu_0) \big\}}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}   (1)

where d = 2, μ_0 is the mean vector, and Σ is the covariance matrix. These parameters are initially estimated from training data consisting of face data of different races obtained from the Web. Secondly, we detect skin segments by computing the likelihood of each pixel with the threshold value T = μ_0 + σ. From the pixels classified as skin, we extract connected regions. After that, we compute the average likelihood of the largest connected region R as

\mu_c = \frac{1}{N} \sum_{i \in R} p(x(i))

where N is the number of pixels in region R. Finally, we update the mean of the Gaussian model as

\mu_0 = (1 - \gamma)\, \mu_0 + \gamma\, \mu_c   (2)

where γ is the learning rate, which should be chosen according to the lighting variations. In our experiments, γ = 0.7 provided good results. If no skin regions are detected, the update of Eq. (2) is not executed. Using the adaptive color model, we can handle faces with new color characteristics or illumination changes.
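As an illustration of Eqs. (1)-(2), here is a hedged NumPy sketch of the likelihood computation and the mean update. The function names are ours, and we read μ_c as the mean chrominance of the largest detected skin region, consistent with the later statement that the mean chrominance vector is updated from the current image; the paper's own wording (an "average likelihood") admits other readings.

    import numpy as np

    def skin_likelihood(crcb, mu0, cov):
        # Per-pixel 2-D Gaussian likelihood in CrCb space, Eq. (1).
        # crcb: (H, W, 2) float array of Cr/Cb values.
        d = crcb.reshape(-1, 2) - mu0
        cov_inv = np.linalg.inv(cov)
        expo = -0.5 * np.einsum('ni,ij,nj->n', d, cov_inv, d)
        norm = 2.0 * np.pi * np.sqrt(np.linalg.det(cov))  # (2*pi)^(d/2)|cov|^(1/2), d = 2
        return (np.exp(expo) / norm).reshape(crcb.shape[:2])

    def adapt_mean(mu0, crcb, region_mask, gamma=0.7):
        # Eq. (2): pull the model mean toward the largest skin region.
        if not region_mask.any():                 # no skin detected: skip the update
            return mu0
        mu_c = crcb[region_mask].mean(axis=0)     # assumed reading of mu_c
        return (1.0 - gamma) * mu0 + gamma * mu_c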

3 Face Candidate Extraction

From the list of skin segments, we execute morphological operations such as opening and closing with the structuring element "+". After that, it is necessary to reduce false alarms: we compute shape features, because the extracted segments are sometimes unrelated to human faces.


Since the shape of a face is elliptical, it is desirable to compute the similarity of the shape of an extracted segment to an ellipse. The shape feature is computed from the bounding rectangle of each skin segment: we compute the ratio of the width (or short side) to the height (or long side) of the bounding rectangle. If the ratio is in the range between 0.4 and 0.9, we classify the segment as a face candidate. To select probable face segments, we compute facial features from the face candidates. Among the various facial features such as eyes, nose, and mouth, we select eye candidates as salient features because they are important for characterizing faces under different viewing geometries. To extract eye candidates, we transform the face candidate segments into a gray scale image and then make a binary image using a threshold value determined from the cumulative histogram: when the cumulative histogram reaches 15% of the whole histogram, its value is chosen as the initial threshold. From the binary image, we extract connected components of sufficient size. If the number of salient features at this threshold value is not large enough, we increase the threshold by a further 5% of the cumulative histogram; once the number of extracted features is large enough, we stop increasing the threshold, as sketched below. The detected feature points are labeled as in Figure 1(a). To compute the possibility that detected points are eye features, we divide the face segment into four regions, since eyes are located in the upper part of the face. First, we divide the bounding rectangle of the face segment at a 3-to-2 ratio in height. Then we compute the average of the x coordinates of the feature points in the window and divide the window vertically at this average value, as shown in Figure 1(b). For each feature point in the upper left window, we execute a dilation operation with a disk structuring element and compute the compactness of the dilated feature points. If the compactness value is near 1, we choose the point as an eye candidate. The feature points are then sorted according to their compactness values and saved as a list.
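A small sketch of the cumulative-histogram thresholding described above, using SciPy to count connected components; the function name and the minimum feature count are illustrative assumptions.

    import numpy as np
    from scipy import ndimage

    def adaptive_binarize(gray, start=0.15, step=0.05, min_features=4):
        # gray: 2-D uint8 image of a face-candidate segment.
        hist = np.bincount(gray.ravel(), minlength=256)
        cdf = np.cumsum(hist) / hist.sum()
        frac = start
        binary = gray <= int(np.searchsorted(cdf, frac))
        while frac <= 1.0:
            thresh = int(np.searchsorted(cdf, frac))  # gray level at this quantile
            binary = gray <= thresh                   # dark pixels (eyes, brows) -> 1
            _, n = ndimage.label(binary)              # count connected components
            if n >= min_features:                     # enough salient features: stop
                break
            frac += step                              # otherwise raise threshold by 5%
        return binary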

Fig. 1. Eye feature points extracted from a skin segment

4 Template Matching

In this section, we discuss the face verification stage. Using the eye feature list, we assess whether a segment is a face or not by template matching. For this purpose, we construct a normalized eye feature template from training data; Figure 2 shows the normalized eye features. Before matching the extracted features, we first compute the gradient of the line connecting the two eye features. If the eye line is not horizontal, we rotate it to the horizontal.


Then, we choose the eye region from the rotated images. By doing this, our method can detect slanted faces efficiently. To compute the similarity between the extracted eye features and a template, we measure the modified Hausdorff distance [6], defined as follows:

H(A, B) = \max\big( h_f(A, B),\; h_f(B, A) \big)   (3)

where h_f(A, B) = f\text{-th}_{a \in A} \min_{b \in B} \| a - b \| and f\text{-th}_{x \in X}\, g(x) denotes the f-th quantile value of g(x) over the set X. For each possible pair of eye features, we compute the modified Hausdorff distance from the template, and we choose the eye feature candidates with the minimum Hausdorff distance from the normalized eye features. After finding the eye candidates, we compute the face area from the angle between the eye line and the chin; this angle is computed from the training images. After detecting the eyes and chin, we localize the face in the image data.
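The modified (partial) Hausdorff distance of Eq. (3) can be sketched directly with NumPy; the quantile fraction f = 0.9 is an illustrative choice, since the paper does not state the value used.

    import numpy as np

    def partial_hausdorff(A, B, f=0.9):
        # f-th quantile over a in A of min_b ||a - b||; A, B: (n, 2) point sets
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
        return float(np.quantile(d.min(axis=1), f))

    def modified_hausdorff(A, B, f=0.9):
        # H(A, B) = max(h_f(A, B), h_f(B, A)), Eq. (3)
        return max(partial_hausdorff(A, B, f), partial_hausdorff(B, A, f))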

Fig. 2. Normalized eye features

Fig. 3. Skin color segments detection

5 Experimental Results

Our proposed method consists of three modules: a skin segment extraction module, a face candidate extraction module, and a template matching module. We tested the method on 14 video sequences captured with a web camera; four sequences have severe illumination changes. First, we detected skin segments using the adaptive two-dimensional Gaussian distribution model, with the mean chrominance vector updated from the current image. Figure 3(b) shows the skin segments detected using the initial mean vector of the image, and Figure 3(c) shows the updated detected segments. From the skin segments, we analyzed the shape features of the bounding rectangle of each skin segment and then extracted eye feature candidates. We computed the gradient of the eye lines and rotated the input image for template matching: Figure 4(a) shows an input image, which is rotated by 13.5 degrees in Figure 4(b). Finally, we cut the eye regions from the image for template matching. The size of the eye regions was adjusted by expanding or shrinking according to the normalized image. Sobel edge detection was performed, and the modified Hausdorff distance was computed for each pair of eye features. We selected the best eye points and computed the face area using the angle between the eyes and the chin.


Figure 5 shows a detected slanted face, and Table 1 shows the face detection results. The accuracy on our test sequences is 96.8% for sequences with slanted faces and 95.1% for sequences with illumination changes. For sequences with illumination changes or slanted faces, the results of our method are slightly better than those of boosting-based methods [7].

Fig. 4. (a) input image, (b) rotated image

Fig. 5. Face detection result

Table 1. Face detection results

Sequences                              Total   Detected   False Alarm   Accuracy
Normal sequences with slanted faces    3206    3104       102           96.8%
Sequences with illumination changes    1467    1396       71            95.1%

6 Conclusions

In this paper, we proposed a new face detection method based on an adaptive CrCb Gaussian color model and eye features. Skin color segments are extracted from the adapted Gaussian model, and shape analysis is then performed on the bounding rectangles of the skin color segments to reduce false alarms. From the extracted skin segments, we select eye candidates as salient features because they are important for characterizing faces under different viewing geometries. Template matching with the eye features is performed using the modified Hausdorff distance measure. The proposed face detection method can be used in various multimedia applications such as video indexing and surveillance systems.

References
1. Yang, M., Kriegman, D., Ahuja, N.: Detecting Faces in Images: A Survey. IEEE PAMI (2002) 34-58
2. Wang, H., Chang, S.: A Highly Efficient System for Automatic Face Region Detection in MPEG Video. IEEE Trans. Cir. Sys. Video Tech. (1997) 615-628
3. Tsapatsoulis, N., Avrithis, Y., Kollias, S.: Efficient Face Detection for Multimedia Applications. Int. Conf. ICIP (2000) 26-44
4. Jones, M., Rehg, J.: Statistical Color Models with Application to Skin Detection. Proc. IEEE CVPR'99 (1999) 63-87
5. Raja, Y., McKenna, S., Gong, S.: Colour Model Selection and Adaptation in Dynamic Scenes. Proc. ECCV (1998) 65-91
6. Rucklidge, W.: Locating Objects Using the Hausdorff Distance. Proc. ICCV'95 (1995)
7. Viola, P., Jones, M.: Rapid Object Detection Using a Boosted Cascade of Simple Features. IEEE CVPR (2001) 78-85

Fall Detection by Wearable Sensor and One-Class SVM Algorithm

Tong Zhang 1, Jue Wang 1, Liang Xu 2, and Ping Liu 1

1 Key Laboratory of Biomedical Information Engineering of Education Ministry, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, P. R. China
{zhangtong3000, liuhuangs}@163.com, [emailprotected]
2 School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, P. R. China
[emailprotected]

Abstract. The fall is a crucial problem in elderly people's daily life, and early detection of falls is very important for rescuing the subject and avoiding a bad prognosis. In this paper, we use a wearable tri-axial accelerometer to capture the movement data of the human body, and propose a novel fall detection method based on a one-class support vector machine (SVM). The one-class SVM model is trained with positive samples from the falls of young volunteers and a dummy, and with outliers from the non-fall daily activities of young and elderly volunteers. Preliminary results show that this method detects falls effectively and reduces the probability of elderly people being injured in the experiments.

1 Introduction

The fall is a common unexpected event in daily life. It usually injures young people only slightly, but it is a crucial problem for elderly people: about 10% to 15% of falls cause serious injury in the elderly, and more than one third of persons aged over 65 fall at least once per year [1,2]. Early detection of falls is very important for rescuing the subject and avoiding a bad prognosis [2,3]. Wearable-sensor-based fall detection means embedding micro sensors into clothes, a girdle, etc., to monitor the movement parameters of the human body in real time and to determine whether a fall has occurred based on the analysis of these parameters. Current wearable-sensor-based fall detection systems usually collect acceleration, vibration, and tilt signals, set thresholds on these signals respectively, and make decisions by detecting whether one or several values exceed the thresholds [1,2,3]. But such algorithms have many problems, including lack of adaptability and deficient classification precision. For example, Hwang et al. [3] use a tilt switch to trigger the detection program: when the tilt of the person's upper body exceeds 70°, the program starts processing the acceleration signals to determine whether a fall has occurred. However, if the person slips and falls while going downstairs, he will in general sit down on the stairs with only a small tilt of the upper body, and the detection program will not be triggered.


In this paper, we use a tri-axial accelerometer to capture the movement signals of the human body and propose a novel one-class SVM based method to detect falls. The one-class SVM model is trained with positive samples from the falls of young volunteers and a dummy, and with outliers from the non-fall daily activities of young and elderly volunteers. The preliminary results show that this method detects falls effectively while at the same time reducing the probability of elderly people being injured in the experiments.

2 Methods

2.1 Materials

A tri-axial accelerometer, the MMA7260Q, is selected; it is a low-g micro-machined resistance accelerometer with a volume of 4*4*1 mm. The chosen measuring range is 4g and the sampling rate is 512 points per second. In the static state, the vector sum of the signals from the x, y, and z axes equals the acceleration of gravity. The sensor is affixed to a belt and bound to the human body, as shown in Fig. 1.

Fig. 1. The accelerometer (MMA7260Q) and its fixed position

2.2 One-Class SVM Algorithm

One-class SVM is an extension of SVM [4,5]. It divides all samples into an objective field and a non-objective field, mapping all samples into a high-dimensional feature space by means of a kernel function. In the feature space, one-class SVM computes the surface of a minimal hypersphere that encloses all the objective data; this minimal hypersphere is the classifier. Slack variables are introduced to realize the trade-off between the radius of the hypersphere and the number of samples outside it. All samples inside the hypersphere are regarded as positive samples, and those outside as outliers. Let X be a positive sample set, X = {x_i, i = 1, ..., l}, x_i ∈ R^d. We use a nonlinear mapping to find a minimal hypersphere in the high-dimensional feature space; let the vector a be the centre and R the radius of the hypersphere, which should enclose as many samples as possible. This is the following optimization problem:

\min_{R \in \mathbb{R},\, \xi \in \mathbb{R}^{l},\, a \in F} \; R^{2} + \frac{1}{\nu l} \sum_{i} \xi_{i}   (1)

s.t. \; \| \Phi(x_{i}) - a \|^{2} \le R^{2} + \xi_{i}, \quad \xi_{i} \ge 0, \; i \in [1, l]

where F is the feature space, \xi_i are the slack variables, 1/(\nu l) determines the trade-off between the volume of the hypersphere and the number of samples segmented outside it, \nu \in (0, 1), and l is the number of all samples. Based on the KKT conditions, and introducing the kernel function

K(x, y) = \langle \Phi(x), \Phi(y) \rangle   (2)

the dual expression of the optimization problem (1) is:

\min_{\lambda} \; \sum_{i,j} \lambda_{i} \lambda_{j} K(x_{i}, x_{j}) - \sum_{i} \lambda_{i} K(x_{i}, x_{i})   (3)

s.t. \; 0 \le \lambda_{i} \le \frac{1}{\nu l}, \quad \sum_{i} \lambda_{i} = 1

The centre of the hypersphere is:

a = \sum_{i} \lambda_{i} \Phi(x_{i})   (4)

After training, a group of support vectors is obtained, and we can calculate the radius R by the following equation:

R^{2} = \sum_{i,j} \lambda_{i} \lambda_{j} K(x_{i}, x_{j}) + K(x_{s}, x_{s}) - 2 \sum_{i} \lambda_{i} K(x_{i}, x_{s})   (5)

where x_s is any support vector. The decision function is then:

f(x) = \operatorname{sgn}\Big( R^{2} - \sum_{i,j} \lambda_{i} \lambda_{j} K(x_{i}, x_{j}) + 2 \sum_{i} \lambda_{i} K(x_{i}, x) - K(x, x) \Big)   (6)
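A compact sketch of evaluating the trained classifier, Eqs. (5)-(6), assuming the dual coefficients λ have already been obtained from a quadratic-programming solver; the RBF kernel and all function names are our own assumptions, as the paper does not specify them.

    import numpy as np

    def rbf(x, y, sigma=1.0):
        # Gaussian kernel; assumed, since the paper does not name its kernel
        return float(np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2)
                            / (2.0 * sigma ** 2)))

    def radius_sq(X, lam, s, k=rbf):
        # Eq. (5): R^2 computed from any support vector X[s]
        K = np.array([[k(a, b) for b in X] for a in X])
        return float(lam @ K @ lam + K[s, s] - 2.0 * (lam @ K[:, s]))

    def decide(x, X, lam, R2, k=rbf):
        # Eq. (6): +1 inside the hypersphere (positive / fall), -1 outside (outlier)
        K = np.array([[k(a, b) for b in X] for a in X])
        kx = np.array([k(a, x) for a in X])
        return int(np.sign(R2 - lam @ K @ lam + 2.0 * (lam @ kx) - k(x, x)))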

For any sample, if f(x) > 0 the sample is classified as positive; if f(x) \le 0, it is classified as an outlier.

q(x) = 1 \text{ for } f(x) > 0, \qquad q(x) = 2 \text{ for } f(x) \le 0   (2)

The n-class generalization involves a set of discriminant functions f_y : X \subseteq \mathbb{R}^n \to \mathbb{R}, y \in Y = \{1, 2, ..., c\}, defined as

f_y(x) = \alpha_y \cdot k_s(x) + b_y, \quad y \in Y   (3)

Let the matrix A = [\alpha_1, ..., \alpha_c] be composed of all weight vectors and b = [b_1, ..., b_c]^T be a vector of all biases. The multi-class classification rule q : X \to Y = \{1, 2, ..., c\} is defined as

q(x) = \arg\max_{y \in Y} f_y(x)   (4)

In this formulation, however, unclassifiable regions remain, where several f_y(x) take the same value. Reference [3] proposes Fuzzy Support Vector Machines for the conventional one-to-(n-1) formulation to resolve the unclassifiable regions.

4 Fuzzy Support Vector Machines

In this section we present the Fuzzy Support Vector Machines (FSVM) proposed in [3]. FSVM were introduced in order to decide on unclassifiable regions. In an n-class problem, one-dimensional membership functions m_{ij}(x) are defined for class i on the directions orthogonal to the optimal separating hyperplanes f_j(x) = 0, as follows. For i = j:

m_{ii}(x) = \begin{cases} 1 & \text{for } f_i(x) > 1, \\ f_i(x) & \text{otherwise.} \end{cases}   (5)

For i \ne j:

m_{ij}(x) = \begin{cases} 1 & \text{for } f_j(x) < -1, \\ -f_j(x) & \text{otherwise.} \end{cases}   (6)


The class i membership function of x is defined using the minimum operator over the m_{ij}(x), j = 1, ..., n:

m_i(x) = \min_{j = 1, ..., n} m_{ij}(x)   (7)

The datum x is classified into the class

\arg\max_{i = 1, ..., n} m_i(x)   (8)

In realizing the fuzzy pattern classification, it is not necessary to implement the membership functions m_i(x) given by (7). The classification procedure is as follows:
1. For x, if f_i(x) > 0 is satisfied for only one class, the input is classified into that class. Otherwise, go to Step 2.
2. If f_i(x) > 0 is satisfied for more than one class i (i = i_1, ..., i_l, l > 1), classify the datum into the class with the maximum f_i(x) (i ∈ {i_1, ..., i_l}). Otherwise, go to Step 3.
3. If f_i(x) ≤ 0 is satisfied for all classes, classify the datum into the class with the minimum absolute value of f_i(x).
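The three-step rule can be implemented directly from the discriminant values, without constructing the membership functions; a minimal sketch (the function name is ours):

    import numpy as np

    def fsvm_classify(f_values):
        # f_values: 1-D array of f_i(x), one value per class
        f = np.asarray(f_values, dtype=float)
        positive = np.flatnonzero(f > 0)
        if len(positive) == 1:                 # step 1: exactly one class claims x
            return int(positive[0])
        if len(positive) > 1:                  # step 2: break the tie by largest f_i
            return int(positive[np.argmax(f[positive])])
        return int(np.argmin(np.abs(f)))       # step 3: smallest |f_i| wins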

5 Implementation

For the present experiments we worked with a corpus of infant cry patterns labeled with information such as infant age and the reason for the cry. The infant cries were collected in recordings made directly by medical doctors. After filtering and normalizing, each signal wave was divided into segments of one second duration. Acoustic features were then extracted by means of frequencies in the Mel scale (MFCC), with the freeware program Praat v4.0.8 [8]. Every one-second sample is divided into 50-millisecond frames, and from each frame we extract 16 coefficients; this procedure generates vectors of 304 coefficients per sample. In order to reduce the dimensionality of the sample vectors we apply Principal Component Analysis (PCA). Our corpus is composed of 209 samples of pain cries, 759 samples of hunger cries, and 659 samples representing the class no-pain-no-hunger, this last set including the sleepy and uncomfortable types. For the classification of pathological cries we had a corpus of 1627 samples from normal babies, 879 samples from deaf babies, and 340 samples from asphyxiating babies. All the parameter values of the classifier were established heuristically after several experiments. During the experiments the 10-fold cross validation technique was used to evaluate the performance and reliability of the classifier.

6 Experiments and Preliminary Results

Two different classification tasks were performed. In the first, to identify pathologies, we worked with cry samples belonging to the Normal-Deaf-Asphyxia (N-D-A) classes. In the second task the corpus consisted of samples from normal babies, used to identify the Pain-Hunger-No-Pain-No-Hunger (P-H-NPNH) classes. The results of the experiments are shown in Table 1 and Table 2, respectively. Each table shows the percentage of correct classification using SVM and FSVM. In each classification task different numbers of principal components (PC) were tested: 2, 3, 10, 16, and 50. The experiments show that the best results are achieved when 10 PC and FSVM were used.

Table 1. Results of Normal-Deaf-Asphyxia (N-D-A) infant cry classification with Support Vector Machines and Fuzzy Support Vector Machines

              (N-D-A) % Classification Accuracy
Problem   PCA2      PCA3      PCA10     PCA16     PCA50
SVM       74.1304   91.4493   77.7536   94.7741   77.667
FSVM      75.3913   91.9710   78.5507   94.9816   82.8986

Table 2. Results of Pain-Hunger-No-Pain-No-Hunger (P-H-NPNH) infant cry classification with Support Vector Machines and Fuzzy Support Vector Machines

              (P-H-NPNH) % Classification Accuracy
Problem   PCA2      PCA3      PCA10     PCA16     PCA50
SVM       58.3436   89.1411   55.0307   97.7914   54.0307
FSVM      59.8160   89.6687   58.6135   97.8282   55.9816

When working with pathological and normal cry samples, the maximum correct classification obtained was 94.9816%. The most poorly classified class was asphyxia, perhaps because of the lower number of samples available. In the second classification task (P-H-NPNH) the best classification score was 97.8282%, and the class with the most identification problems was the hunger class; one reason might be that this class has characteristics similar to the uncomfortable cries.

7 Conclusions

In this paper we presented the automatic classification of infant cries by means of Fuzzy Support Vector Machines, which were introduced to resolve the unclassifiable regions that remain when conventional Support Vector Machines are used. We worked with two different 3-class infant cry problems: in the first, to identify pathologies, the cry samples were divided into the Normal, Deaf, and Asphyxia classes (N-D-A); in the second we used samples only from normal babies, labeled to identify the Pain, Hunger, and No-Pain-No-Hunger classes (P-H-NPNH). In the kind of problems explored in this work, the Fuzzy Support Vector Machines showed improved classification performance over conventional SVM. We obtained the best correct classification in both tasks when using 10 principal component vectors. In the N-D-A problem we obtained 94.77% correct classification with SVM and 94.98% with FSVM, an average improvement of 0.21% in classification accuracy. In the P-H-NPNH problem the correct classification percentage was 97.79% for SVM and 97.82% for FSVM, an average improvement of 0.03% between the models. The infant cry classification results obtained so far with FSVM are very encouraging. We believe that with a larger number of samples we could generalize our results better, moving closer to a robust system applicable to real life and to other pathologies related to the central nervous system. Collecting more samples will also allow us to include a larger number of normal cry classes, and perhaps to deal with the identification of deafness levels.

Acknowledgments This work is part of a project that is being financed by CONACYT-Mexico (C01-46753).

References
1. Cortes, C., Vapnik, V.: Support Vector Networks. Machine Learning, Vol. 20 (1995) 1-25
2. Wan, V., Campbell, W.M.: Support Vector Machines for Speaker Verification and Identification. IEEE International Workshop on Neural Networks for Signal Processing, Sydney, Australia (2000)
3. Inoue, T., Abe, S.: Fuzzy Support Vector Machines for Pattern Classification. Proceedings of the International Joint Conference on Neural Networks (IJCNN '01), Vol. 2 (2001) 1449-1454
4. Abe, S.: Pattern Classification: Neuro-Fuzzy Methods and their Comparison. Springer-Verlag, London (2001)
5. Levinson, S.E., Roe, D.B.: A Perspective on Speech Recognition. IEEE Communications Magazine (1990) 28-34
6. Orosco, J., Reyes, C.A.: Mel-Frequency Cepstrum Coefficients Extraction from Infant Cry for Classification of Normal and Pathological Cry with Feed-Forward Neural Networks. Proc. International Joint Conference on Neural Networks, Portland, Oregon, USA (2003) 3140-3145
7. Reyes, O., Reyes, C.A.: Clasificación de Llanto de Bebés para Identificación de Hipoacusia y Asfixia por medio de Redes Neuronales (Classification of Infant Cries for Identification of Hypoacusis and Asphyxia by Means of Neural Networks). Proc. of the II Congreso Internacional de Informática y Computación de la ANIEI, Zacatecas, México (2003) 20-24
8. Vojtech, F., Vaclav, H.: Statistical Pattern Recognition Toolbox. Czech Technical University, Prague (1999)
9. Boersma, P., Weenink, D.: Praat v 4.0.8: A System for Doing Phonetics by Computer. Institute of Phonetic Sciences of the University of Amsterdam (2002)

Geodesic Gabriel Graph Based Supervised Nonlinear Manifold Learning

Huajie Chen and Wei Wei

College of Electrical Engineering, Zhejiang University, HangZhou 310027, China
[emailprotected]

Abstract. For discriminant analysis on a nonlinear manifold, a geodesic Gabriel graph based supervised manifold learning algorithm is proposed. Using the geodesic distance to discover the intrinsic geometry of the manifold, the geodesic Gabriel graph is constructed to locate the key local regions where local linear classifiers are learned. The global nonlinear classifier is achieved by merging the multiple local classifiers under the soft margin criterion, which assigns the optimal weight to each local classifier in an iterative way without any assumption about the distribution of the example data. The superiority of this algorithm is confirmed by experiments on synthesized data and face image databases.

1 Introduction

Many kinds of high-dimensional data in real-world applications such as computer vision and pattern recognition can be modeled as data points lying on a low-dimensional nonlinear manifold. Several dimensionality reduction methods have been proposed for learning complex embedding manifolds, e.g., Locally Linear Embedding (LLE) [1] and Isomap [2], which use local geometric metrics within a single global coordinate system. However, these methods are designed to best preserve data localities or similarities in the embedding space and consequently cannot guarantee good discriminating capability. Only a few extended manifold learning algorithms explicitly address classification problems [3], [4]. Because the corresponding labels are required to map example data to the low-dimensional space, extending the mapping function to new test data remains a difficult issue. The underlying idea of the above manifold learning methods is that the local geometry of the manifold is obtained first, and then aligned and extended to the global geometry. Following this idea, we propose a supervised manifold learning algorithm that merges local linear classifiers. The geodesic distance [2], instead of the Euclidean distance, is applied to reflect the intrinsic geometry of the example data, and a geodesic Gabriel graph is constructed to find the key local regions. Local classifiers are learned in these local regions and merged into a final global classifier using the soft margin criterion. In this way, the whole learning task is decomposed into two sub-tasks, local learning and classifier merging, and is therefore significantly simplified.


2 Geodesic Gabriel Graph Construction

We assume that a d-dimensional manifold M embedded in an m-dimensional space (d < m) can be represented by the following function:

f : C \to \mathbb{R}^m, \quad C \subset \mathbb{R}^d   (1)

where C is a compact subset of R^d. Two points are said to be Gabriel neighbors if their diametral sphere does not contain any other point. For every potential pair of neighbors A and B, we simply verify whether any other point P is contained in the diametral sphere:

d^2(P, A) + d^2(P, B) < d^2(A, B)   (2)

where d is some metric, such as the well-known Euclidean distance. A graph in which all pairs of Gabriel neighbors are connected with an edge is called the Gabriel graph [5]. For a unit-speed curve on a surface, the length of the surface-tangential component of acceleration is the geodesic curvature k_δ; a curve with k_δ = 0 is called a geodesic. For data lying on a nonlinear manifold, the intrinsic distance between two data points is the geodesic distance on the manifold, i.e., the distance along the surface of the manifold rather than the straight-line Euclidean distance, as Fig. 1 demonstrates. When using the geodesic distance to capture the relations among data points lying on a nonlinear manifold, formula (2) is replaced by:

g^2(P, A) + g^2(P, B) > g^2(A, B)   (3)

where g is the geodesic distance. Two points A and B are called geodesic Gabriel neighbors (GGN) if formula (3) holds for all other points P in the set. A graph in which all pairs of GGNs are connected with an edge is called the geodesic Gabriel graph (GGG).

Fig. 1. Geodesic distance (solid line) and Euclidean distance (dashed line)

The basic idea of the geodesic distance estimation is that, for a neighborhood of points on a manifold, the Euclidean distances provide a good approximation to the geodesic distance, while for faraway points the geodesic distance is estimated by the length of the shortest path through neighboring points. For a new test datum x_t, we first calculate the Euclidean distances between x_t and the training data, and the k closest training data form the neighborhood set Z(x_t). The geodesic distances between x_t and the other, faraway training data are estimated as:

g(x_t, x_i) = \min_{j : x_j \in Z(x_t)} \big( g(x_j, x_i) + d(x_t, x_j) \big)   (4)
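A sketch of the two building blocks of this section, assuming SciPy: pairwise geodesic distances approximated by shortest paths over a k-nearest-neighbor graph (as in Isomap [2]), and the GGN test of formula (3). The function names and k = 8 are illustrative assumptions.

    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.sparse.csgraph import shortest_path

    def geodesic_matrix(X, k=8):
        # Euclidean edges inside each k-NN neighborhood, shortest paths elsewhere
        D = cdist(X, X)
        W = np.full_like(D, np.inf)
        for i in range(len(X)):
            nn = np.argsort(D[i])[1:k + 1]
            W[i, nn] = D[i, nn]
            W[nn, i] = D[nn, i]
        return shortest_path(W, method='D', directed=False)

    def is_ggn_pair(G, a, b):
        # Formula (3): a, b are GGNs if every other point p satisfies
        # g^2(p, a) + g^2(p, b) > g^2(a, b)
        others = [p for p in range(G.shape[0]) if p not in (a, b)]
        return bool(np.all(G[others, a] ** 2 + G[others, b] ** 2 > G[a, b] ** 2))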

3 Local Linear Classifier Learning

We first estimate the conditional probability of each example point for the local classifiers according to the geodesic distance between the example and the corresponding GGN, and then obtain the local classifier employing weighted regularized linear discriminant analysis (WRLDA). Let {(x_1^k, x_2^k) | k = 1, 2, ..., K} be a set of GGNs that lie on a low-dimensional manifold embedded in a high-dimensional observed space. For each GGN (x_1^k, x_2^k), a local classifier f_k is given. The conditional probability for f_k of an arbitrary datum x on this manifold can be obtained as [6]:

p(f_k \mid x) = p^k(x) \Big/ \sum_{j=1}^{K} p^j(x), \qquad p^k(x) = \exp(-\alpha_k(x))   (5)

where α_k(x) is the activity signal of the datum for the k-th local classifier, which can be estimated with the width constant t:

\alpha_k(x) = \big( g(x, x_1^k) + g(x, x_2^k) \big)^2 / t   (6)

In order to construct the local classifier f_k, each example datum is represented by a feature vector of its geodesic distances to the GGN (x_1^k, x_2^k):

x_i \to z_i = \big[\, g(x_i, x_1^k),\; g(x_i, x_2^k) \,\big]^T   (7)

The global intra-class scatter matrix can be represented as:

S_W = w^T \Big( \sum_{j=1,2} (Z_j - \bar{Z}_j)(Z_j - \bar{Z}_j)^T \Big) w = w^T M_W w   (8)

where w is the projection mapping matrix, the i-th column vector of Z_j \in \mathbb{R}^{2 \times N_j} (N_j is the number of examples belonging to the j-th class) is Z_j(i) = z_i \cdot p(f_k \mid x_i), and each column vector of \bar{Z}_j is the weighted mean of the examples of the j-th class. The global inter-class scatter matrix can be represented as:

S_B = w^T \Big( \sum_{j=1,2} N_j (\bar{Z}_j - \bar{Z})(\bar{Z}_j - \bar{Z})^T \Big) w = w^T M_B w   (9)

where \bar{Z} is the weighted mean of all the examples.


The Fisher criterion can be used in traditional LDA to find the optimal projection maximizing the ratio of the determinant of the inter-class scatter matrix S_B to that of the intra-class scatter matrix S_W. However, it is well known that applying the Fisher criterion to high-dimensional pattern classification tasks such as face recognition often suffers from the so-called "small sample size" (SSS) problem, arising from the small number of available training samples compared with the dimensionality of the sample space [7]. Considering the possible distributional sparseness of the example data in real-world applications, the regularized Fisher criterion is adopted:

w^{*} = \arg\max_{w} \; \frac{w^T M_B w}{\lambda\, w^T M_B w + w^T M_W w}   (10)

where 0 ≤ λ ≤ 1 is a regularization parameter. This problem has a closed-form solution and can be computed directly using the eigen-decomposition of the transformed matrices of M_B and M_W [7]. Thereafter, the corresponding local classifier is obtained as f_k(x_i) = w^T z_i.
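Maximizing (10) is equivalent to the generalized symmetric eigenproblem M_B w = μ (λ M_B + M_W) w, so the closed-form solve can be sketched in a few lines, assuming λ M_B + M_W is positive definite; the function name is ours.

    import numpy as np
    from scipy.linalg import eigh

    def wrlda_direction(M_B, M_W, lam=0.1):
        # Generalized eigenproblem for Eq. (10); eigh returns ascending
        # eigenvalues, so the last column is the maximizing direction w*.
        vals, vecs = eigh(M_B, lam * M_B + M_W)
        return vecs[:, -1]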

4 Local Classifiers Merging Based on Soft Margin Criterion

A global nonlinear classifier is obtained by merging the local classifiers:

F(x) = \sum_{k=1}^{K} \alpha_k\, p(f_k \mid x)\, f_k(x)   (11)

where f_k denotes the k-th local classifier and α_k is the corresponding weight coefficient. The Fisher criterion and some of its variations, like formula (10), are powerful in many linear classification cases for finding the optimal α = {α_k}, under the assumption that the data classes are Gaussian with equal covariance structure. Unfortunately, this assumption does not always hold for data lying on a complex nonlinear manifold, e.g., face images [8]. Margin-based optimization methods concern the margins of the examples, i.e., the difference between the number of correct votes and the maximum number of votes received by any incorrect label, without any limitation on the examples' distribution [9]. Following this basic idea of the margin, our merging algorithm maximizes the minimal margin by updating the weight coefficients stepwise, with the soft margin criterion:

\arg\min_{\alpha} \sum_{i} \exp\big( -y_i F(x_i) \big)   (12)

where F is the global classifier. Given the training example set {x_i | i = 1, 2, ..., N}, the procedure of the optimization algorithm is as follows:
1) Start with weights w_i = 1/N, i = 1, 2, ..., N.
2) Repeat for m = 1, 2, ..., M:
   a) With {w_i}, calculate the weight coefficients {α_k^m} by using WRLDA;
   b) Construct the classifier F_m(x) according to formula (12);
   c) Update w_i ← w_i exp(−y_i F_m(x_i)), i = 1, 2, ..., N, and renormalize so that Σ_i w_i = 1.
3) Output the final result α_k = Σ_m α_k^m.
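A minimal sketch of this merging loop, with labels y_i in {-1, +1}. Here fit_alphas is a hypothetical callback standing in for the WRLDA fit under the current example weights, and local_outputs[i, k] is assumed to hold p(f_k|x_i) f_k(x_i); neither name comes from the paper.

    import numpy as np

    def merge_local_classifiers(y, local_outputs, fit_alphas, M=20):
        # y: (N,) labels in {-1, +1}; local_outputs: (N, K) matrix
        N, K = local_outputs.shape
        w = np.full(N, 1.0 / N)            # step 1: uniform example weights
        alpha = np.zeros(K)
        for m in range(M):
            a_m = fit_alphas(w)            # step 2a: WRLDA under weights w
            F_m = local_outputs @ a_m      # step 2b: F_m(x_i) for all examples
            w *= np.exp(-y * F_m)          # step 2c: soft-margin reweighting
            w /= w.sum()                   #          and renormalization
            alpha += a_m                   # step 3: accumulate coefficients
        return alpha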

5 Experiments

In order to evaluate the proposed algorithm, experiments of face recognition on real face images are presented in this section. We tested the proposed algorithm against the Eigenface, Fisherface, Isomap, and LLE methods using the publicly available Yale (http://cvc.yale.edu/projects/yalefaces/yalefaces.html) and UMIST (http://images.ee.umist.ac.uk/danny/database.html) databases. There are in total 165 face images belonging to 11 objects in the Yale database, varying in expression and lighting conditions. The face blocks were cropped, aligned, and then scaled to 28 (height) × 24 (width). For each object, 7 images were selected randomly as training examples and the other 8 images as test examples. The UMIST database contains 564 images belonging to 20 objects varying in pose. For each object, 10 images from frontal to profile were selected, including 5 training examples and 5 test examples, with resolution 28 (height) × 23 (width). The nearest neighbor rule was applied to classify the examples after dimension reduction for the Fisherface, Isomap, and LLE methods. Some examples from these two databases and the experimental results are shown in Fig. 2. It is obvious that our algorithm improves the recognition performance significantly.

Fig. 2. Examples and experimental results: error rates (%) of Eigenface, Fisherface, Isomap, LLE, and our algorithm on (a) the Yale and (b) the UMIST databases

6 Conclusions

In this paper, we present a GGG-based algorithm for supervised learning on nonlinear manifolds. The algorithm uses the geodesic distance instead of the Euclidean distance to capture the global geometry of data points lying on some sort of manifold. The GGG then segments the whole manifold into key local regions, so that the global nonlinear classification problem is replaced by local linear classification problems. Multiple local classifiers are merged into a global classifier using the soft margin criterion, without any limitation on the example data's distribution. Experiments on synthesized data and real face images justify the effectiveness of this algorithm. We intend to apply it to other real-world computer vision tasks, including expression detection and face recognition under pose variation.

Acknowledgments This work is supported by a grant from the Natural Science Fund for distinguished scholars of Zhejiang Province (No.R105341).

References
1. Roweis, S.T., Saul, L.K.: Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 290 (2000) 2323-2327
2. Tenenbaum, J.B., de Silva, V., Langford, J.C.: A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290 (2000) 2319-2323
3. Wu, Y.M., Chan, K.L.: An Extended Isomap Algorithm for Learning Multi-Class Manifold. IEEE International Conference on Machine Learning and Cybernetics, Shanghai, 26-29 August, 6 (2004) 3429-3433
4. Chen, H.-T., Chang, H.-W., Liu, T.-L.: Local Discriminant Embedding and Its Variants. Computer Vision and Pattern Recognition, IEEE Computer Society, 2 (2005) 846-853
5. Zhang, W., King, I.: A Study of the Relationship between Support Vector Machine and Gabriel Graph. International Joint Conference on Neural Networks, 1 (2002) 239-244
6. Roweis, S., Saul, L., Hinton, G.: Global Coordination of Local Linear Models. Advances in Neural Information Processing Systems, 14 (2001) 889-896
7. Lu, J.W., Plataniotis, K.N., Venetsanopoulos, A.N.: Regularization Studies of Linear Discriminant Analysis in Small Sample Size Scenarios with Application to Face Recognition. Pattern Recognition Letters, 26 (2005) 181-191
8. Kim, T.K., Kittler, J.: Locally Linear Discriminant Analysis for Multimodally Distributed Classes for Face Recognition with a Single Model Image. IEEE Trans. on Pattern Analysis and Machine Intelligence, 27 (2005) 318-327
9. Schapire, R., Freund, Y., Bartlett, P., et al.: Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods. Annals of Statistics, 26 (1998) 1651-1686

Grouping Sampling Reduction-Based Linear Discriminant Analysis

Yan Wu and Li Dai

Dept. of Computer Science and Technology, Tongji University, 1239 Si Ping Road, Shanghai 200092, P.R. of China
[emailprotected]

Abstract. This paper proposes a new feature extraction method called grouping sampling reduction-based linear discriminant analysis. It solves the small sample size problem by using grouping sampling reduction to generate virtual samples with larger number and lower dimension than the original samples. The experimental results show its efficiency and high recognition rate.

1 Introduction

Linear Discriminant Analysis (LDA) is a classical feature extraction method that has been widely used in pattern recognition. Traditional LDA can only solve problems in which the within-class scatter matrix is nonsingular. However, in some problems, such as face recognition, the dimension of the sample space is so high that it is difficult or impossible to find enough samples to make the within-class scatter matrix nonsingular. This is called the small sample size problem. How to extract the Fisher optimal discriminant feature in a small sample size problem has been an acknowledged difficulty and has attracted wide interest in recent years. The methods proposed to solve it can be divided into two sorts:
1. From the view of the algorithm, develop a new LDA-based algorithm. Yu and Yang [1] proposed Direct LDA (DLDA), which removes the null space of the between-class scatter matrix and seeks the optimal discriminant vectors in the remaining subspace. Another method, the null space method [2], first removes the null space of the total scatter matrix, then projects samples onto the null space of the within-class scatter matrix, and finally removes the null space of the between-class scatter matrix to obtain optimal classification performance. Two-dimensional FLD (2DFLD) [3], proposed recently, directly projects the image matrix under a specific projection criterion rather than using the stretched image vector.
2. From the view of the pattern samples, preprocess the samples to reduce their dimension or increase their number before performing LDA. Belhumeur [4] proposed Fisherface, which performs principal component analysis (PCA) to reduce the dimension of the samples and then performs LDA in the lower-dimensional space. Another method, the sample-set-enlarging method, adds new samples to the original sample set to increase the number of samples [5]. Based on this method, we propose grouping sampling reduction-based linear discriminant analysis.


2 Linear Discriminant Analysis

Suppose n d-dimensional samples, x_1, ..., x_n, belong to c different categories, and the subset D_i of n_i samples belongs to category ω_i (i = 1, 2, ..., c). The within-class scatter matrix S_W and the between-class scatter matrix S_B are defined as follows:

S_W = \sum_{i=1}^{c} \sum_{x \in D_i} (x - m_i)(x - m_i)^T = Q_W Q_W^T   (1)

S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T = Q_B Q_B^T   (2)

In (1) and (2), m_i = \frac{1}{n_i} \sum_{x \in D_i} x is the mean vector of D_i, and m = \frac{1}{n} \sum_x x = \frac{1}{n} \sum_{i=1}^{c} n_i m_i is the mean vector of the whole set. The goal of LDA is to find an optimal projection W:

W = \arg\max_{W} \frac{|W^T S_B W|}{|W^T S_W W|}   (3)

3 Grouping Sampling Reduction-Based LDA The sample-set-enlarging method increases the number of the samples to eliminate the singularity of the within-class scatter matrix by adding enough virtual samples to the training sample set. Virtual samples are generated from the original samples by doing simple geometric transformation such as rotation, translation, vertical mirroring and scale. The traditional sample-set-enlarging method [5][6][7] usually generates virtual samples by doing rotation or vertical mirroring transformations, which never change the sample’s size. So the virtual samples have the same dimension as the original samples and can be directly added to the original sample set. Since the dimension is not reduced and the number of the original training samples is much smaller than the dimension of the sample space, the traditional sample-set-enlarging method has to generate numerous virtual samples to make the within-class scatter

890

Y. Wu and L. Dai

matrix full rank and perform LDA in the original high-dimensional sample space, resulting in a waste of storage space and high computational complexity. A good method to overcome the limitations of the traditional sample-set-enlarging method is to generate lower-dimensional virtual samples. Considering that LDA should be performed on the sample set in which the samples have the same dimension, the lower-dimensional virtual samples can not be directly added to the original sample set, but replace the original samples to compose a new sample set. Then LDA is performed on the new sample set. Because the original samples are removed and LDA is performed completely on the virtual samples, the virtual samples should hold the discriminant information of the original samples. Only in this way is it possible to extract the optimal feature. To sum up, the generated virtual samples should satisfy these conditions: having larger number and lower dimension than the original samples to make the within-class scatter matrix nonsingular, and holding original samples’ information to ensure extracting the optimal feature. Sampling reduction is a method to reduce the digital image. To reduce an image 2 into the size 1 N of the original, it segments the image into a group of subdomains with the size of N × N , and assigns the values of the pixels at the top left corner of each subdomain to the pixels of the reduced image. In other words, it samples the original image every other N − 1 pixels in x and y directions. Since only one pixel in a subdomain is taken as representative and all the other N × N − 1 pixels are lost, the reduced image doesn’t hold enough information of the original image.

Fig. 1. (a) The original human face, (b) The result of grouping sampling reduction when N=2, (c) The result of grouping sampling reduction when N=4

If each pixel of a subdomain takes turns to be a certain pixel of the reduced image, we will get N 2 different reduced images. In this way, from an original sample, we 2 can generate a group of N 2 virtual samples whose dimension is reduced to 1 N the dimension of the original sample. We call this method grouping sampling reduction, and N the enlarging factor. Although a single virtual sample can’t hold much information of the original sample, a group of virtual samples generated from an original sample by grouping sampling reduction can hold nearly all the information of the original sample. For example, in Fig. 1, we can get the discriminant information from (b) and (c) as much as that from (a). It is mentioned in Section 2 that SW is a d × d matrix and the rank of SW is not more than n − c . So the condition, d ≤ n − c , must be satisfied if SW is nonsingular.

Grouping Sampling Reduction-Based Linear Discriminant Analysis

891

Since the original sample set doesn’t satisfy this condition, we use grouping sampling reduction to generate virtual samples, and then replace the original samples with the virtual samples to compose a new sample set. In the new sample set, the number and 2 the dimension of the samples are nN 2 and d N . So the within-class scatter matrix d d SW′ constructed by the new sample set is a matrix and × N2 N2 rank (S W′ ) ≤ nN 2 − c . The necessary condition to make SW′ nonsingular is d / N 2 ≤ nN 2 − c

(4)

Obviously, SW′ will be nonsingular if N is large enough, but not the larger the better. Because grouping sampling reduction distributes the pixel information of an original sample into each virtual sample generated from it; if N is too large, the pixel information is too much distributed to produce the discriminant information. So it is the best to assign N with the minimum integer making SW′ nonsingular. We choose the minimum integer satisfying (4) as the value of N . Although (4) is a necessary condition, usually the value of N chosen in this way can make SW′ nonsingular. However, the exceptional case must exist, since the rank of SW′ is not known before training and it may be any integer not more than nN 2 − c . In the exceptional case, to make SW′ nonsingular, add an extra 1 to the value of N . Grouping sampling reduction can generate the virtual samples with larger number and lower dimension than the original samples and eliminate singularity of the withinclass scatter matrix by choosing an appropriate value of the enlarging factor N . The generated virtual samples hold nearly all the discriminant information of the original samples and can represent human face patterns. So, after grouping sampling reduction, we can get the optimal discriminant feature by directly performing the traditional LDA on the new sample set completely composed of virtual samples. Ultimately, grouping sampling reduction based LDA we proposed in this paper solves the small sample size problem effectively. In addition, in the testing period, we still first use grouping sampling reduction to generate a group of virtual samples from an original testing sample, and then perform the recognition algorithm on the group of virtual samples to get each virtual sample’s category, and finally count the virtual samples which belong to the same category and take the category with maximum counting number as the original testing sample’s category.

4 Experimental Results The ORL face image database is used to compare our method with PCA, Fisherface, DLDA, Null space method and 2DFLD. This database contains 40 persons, each having 10 different images varying lighting slightly, facial expression and facial details. We test the recognition rates with different numbers of the training samples. k ( k =2, ,8 images of each person are randomly selected from the database for

892

Y. Wu and L. Dai

training and the remaining 10 − k images of each person for testing. For each value of k , 20 runs performs with different partition between the training set and the testing set. No any preprocessing is done, and the final dimension is chosen to be 39 ( c - 1). The nearest neighbor algorithm under the Euclidean distance is employed to classify the test images. Table 1 shows the average recognition rates. Table 1. Recognition rates (%) on ORL database

k   PCA       Fisherface   DLDA      Null space method   2DFLD     Our method
2   80.6094   80.6094      83.8125   85.9688             85.7031   87.7656
3   87.4286   90.9107      89.9107   90.6071             90.7143   92.6607
4   91.8333   94.4792      93.3750   94.5000             93.6458   95.5417
5   93.7250   95.3500      94.8250   95.7000             95.4250   97.0750
6   95.5938   96.6875      96.2188   96.8125             96.2813   97.8125
7   96.4167   97.0000      96.8333   97.0000             97.0833   98.2917
8   97.6250   97.4375      97.6875   97.6250             97.6250   98.8125

Fig. 2. Comparison of the six methods’ recognition rates over variation of k

Fig. 2 shows that for each value of k, the recognition rate of our method is the highest. From Table 1, we can see that the recognition rate of our method is about 1% higher than those of the other methods, and when k is small (k = 2, 3) it is about 2% higher. The experimental results show that our method outperforms the other methods on the whole, especially when the number of training samples is small, which is often the case in face recognition and other pattern recognition tasks.


5 Conclusions

Grouping sampling reduction based LDA, proposed in this paper, is a new and effective method for solving the small sample size problem. It first uses grouping sampling reduction to generate virtual samples which are more numerous and lower-dimensional than the original samples and can represent human face patterns, then uses these virtual samples to compose a new sample set, and finally performs LDA on the new sample set to extract the optimal features. Since it increases the number of samples and reduces their dimension at the same time, it has two advantages over other methods for the small sample size problem:
1. Compared with dimension-reducing methods, it increases the number of samples, so it only needs to reduce the dimension to a smaller extent, which decreases the loss of discriminant information.
2. Compared with traditional sample-set-enlarging methods, it reduces the dimension of the samples, so it needs to generate fewer virtual samples and performs LDA on a low-dimensional sample space. Therefore, it saves storage space and reduces computational complexity.


Hierarchical Adult Image Rating System*

Wonil Kim1, Han-Ku Lee2,**, and Kyoungro Yoon3

1 College of Electronics and Information Engineering at Sejong University, Seoul, Korea [emailprotected]
2 School of Internet and Multimedia Engineering at Konkuk University, Seoul, Korea [emailprotected]
3 School of Computer Science and Engineering at Konkuk University, Seoul, Korea [emailprotected]

Abstract. Although the popularity and improvement of the Internet, with its explosive proliferation of multimedia contents, have brought us the era of digital information, this unexpected popularity also has its own dark side. Every day, young children are exposed to images that should not be delivered to them. In this paper, we propose an adult image rating system that classifies an image into one of multiple classes, such as swimming suit, topless, all nude, sex image, and normal. The simulation results show that the proposed system successfully rates images into multiple classes with a success rate of over 70%.

1 Introduction

In this paper, we propose an image rating system that rates images according to their harmfulness to minors. The development of the Internet has made people's lives much more convenient than ever before; however, it has also brought its own dark sides. Among the millions of Web sites, more than 500,000 are related to adult contents that children should never see [1,6]. The proposed system uses MPEG-7 descriptors as the main input features of the classification task. We first analyze several MPEG-7 descriptors and then create a prototype that extracts image features from adult image data. Using these descriptors, we rate adult images into multiple classes via an effective hierarchical image classification technique. The proposed adult image rating system employs multiple neural network modules that each perform a binary classification task. The simulation results show that the MPEG-7 descriptors can be used effectively as the main features of the image classification process. The proposed system indeed successfully rates images into multiple classes, such as swim suit, topless, nude, sex, and normal images, with a success rate of over 70%. In the next section, we discuss common approaches to adult image detection. The proposed Hierarchical Adult Image Rating System is explained in Section 3. The simulation environment and the results are shown in Section 4. Section 5 concludes.

* This paper is supported by Seoul R&BD program.
** Author for correspondence: +82-2-2049-6082.



2 Adult Image Detection

Research on automated adult image detection started with the popularity of the World Wide Web (WWW) in the mid 90's. In the early days, most algorithms detected the existence of naked people in images [2, 7]. The main idea of this research is as follows: skin regions are masked effectively using colors and textures (the skin filter), and if the skin regions that pass the mask tests match persons' figures, it is assumed that there are naked parts in the image (the geometric filter). Such an algorithm tries to detect whether or not the filtered skin color matches a specific body part in the whole image, rather than extracting primitive features that help effective classification [3]. The most common method in adult image classification is based on the color histogram, which is an effective feature for large amounts of data; a simple experiment based on the color histogram yields an 80% detection rate with an 8.5% false positive rate [4]. (A crude sketch of a colour-based skin filter of this kind is given at the end of this section.)

A feature extraction technique using MPEG-7 descriptors for adult image classification was proposed by Yoo in 2003 [5]. He used the three descriptors that are most effective for adult image classification among the various standardized MPEG-7 visual descriptors. His system compares the descriptor values of a given image to those of the images in a database, and retrieves the 10 most similar images together with their class information; the given image is assigned the class to which the majority of the retrieved images belong. Even though the results showed that MPEG-7 descriptors can be very efficient features for adult image classification, the classification method used in his paper is a simple k-nearest neighbor method.

Thanks to the blooming of the data mining field, image classification techniques based on statistical methods have also improved greatly, and a new field of study called image mining was born [8]. Thus, many research groups study the field of image classification via image mining techniques. The image classification technologies can be categorized as the Neural Network, the Decision-Tree Model, the Bayesian classification and the Support Vector Machine.

An Artificial Neural Network (ANN) is an information processing paradigm inspired by biological nervous systems, such as the brain. The key element of this paradigm is the novel structure of the information processing system. In the adult image rating system, the network concentrates on the decision-boundary surface that distinguishes adult images from non-adult images via the computer-based classification rule called the perceptron [9, 10]. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example: an ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process, and learning in biological systems likewise involves adjustments to the synaptic connections between the neurons. In this paper, we employ a hierarchically structured neural network for the classification module. Inputs to the neural network are the feature values extracted from MPEG-7 descriptors [11]. Since each descriptor represents specific features of a given image, a proper evaluation process is required to choose the best one for adult image classification.
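The skin-filter idea above can be illustrated with a few per-pixel colour rules. The following Python sketch uses a well-known rule-of-thumb RGB skin heuristic as a stand-in; it is not the exact skin model of the cited papers [2, 3].

```python
import numpy as np

def skin_mask(rgb):
    """Crude illustrative skin-colour mask over an (H, W, 3) uint8 RGB image.

    A generic rule-of-thumb filter, not the cited papers' skin model.
    """
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    return ((r > 95) & (g > 40) & (b > 20)
            & (r - np.minimum(g, b) > 15)      # enough red dominance
            & (np.abs(r - g) > 15) & (r > g) & (r > b))

def skin_ratio(rgb):
    """Fraction of skin-coloured pixels: a simple first-pass adult cue."""
    return float(skin_mask(rgb).mean())
```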


3 The Proposed Hierarchical Adult Image Rating System

3.1 The Proposed Architecture

Figure 1 illustrates the classification flow of the proposed Hierarchical Adult Image Rating System. The proposed system consists of two stages: the Feature Extraction Module and the Neural Network Classifier Module. The features defined in the MPEG-7 descriptors are extracted for the given query images and then used as inputs for the classifier module.

Fig. 1. Classification flow of the Hierarchical Adult Image Rating System

The hierarchical neural network classifier module categorizes a given image into one of five classes: swim suit images (S), topless images (T), nude images (N), sex images (X), and normal images (I). By using a multi-module hierarchical structure, each stage can employ the descriptors, such as Color Layout for color, hom*ogeneous Texture for texture, or Region Shape for shape, that are most suitable for the given classification task. This is based on the fact that classifying normal versus adult related images is heavily dependent on color information, whereas classifying nude and sexual images is likely to depend on shape or texture descriptors. At the top decision stage, the module classifies images into adult related images and normal images. In the next stage, if the images are classified as adult related, they are again categorized into 2 classes: topless images / swim suit images for category 1, and nude images / sexual images for category 2. These categorized images, which are all adult related, are further classified in the next stages: the category 1 images are either topless images or swim suit images, and the category 2 images are either nude images or sexual images.

3.2 Feature Extraction and Classification Module

The features of training images are extracted in XML format through the execution of the MPEG-7 XM software. This feature information in XML form is parsed in the


next step and is normalized into values between 0 and 1 with respect to the values generated by each descriptor. These normalized values are used as inputs to the neural network classifier. The class information attached to the feature values differs depending on the stage in which it is used. The four modules used for classification employ a neural network architecture. Each neural network classifier learns the relation between the feature values and the corresponding class by modifying the weight values between nodes. We use the backpropagation algorithm to train the network. The number of input nodes depends on the dimension of each descriptor, whereas the number of output nodes is two. The class information for the two output nodes is represented as (1, 0) or (0, 1) depending on the stage and images, as mentioned above. In the testing process, as in the training process, the system extracts features from query images using MPEG-7 descriptors and classifies the images using the neural networks generated in the training process. The four modules are connected hierarchically starting from Module 1 (normal vs. adult related images), and are then traversed down if necessary. A sketch of this hierarchical decision flow is given below.
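The following Python sketch shows how such a cascade of binary modules could be wired together; the module objects, their `predict` interface, and the feature-dictionary keys are hypothetical stand-ins for the trained backpropagation networks, not the authors' code.

```python
def rate_image(features, m1, m2, m3, m4):
    """Traverse the four binary modules hierarchically.

    Each module is assumed to expose predict(x) -> 0 or 1, where x is
    the normalized MPEG-7 descriptor vector that module was trained on.
    """
    if m1.predict(features["color_layout"]) == 0:   # Module 1: adult vs. normal
        return "normal"
    if m2.predict(features["texture"]) == 0:        # Module 2: category 1 vs. 2
        # category 1: topless / swim suit
        return "topless" if m3.predict(features["shape"]) == 0 else "swim suit"
    # category 2: nude / sex
    return "nude" if m4.predict(features["shape"]) == 0 else "sex"
```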

4 Simulation

A total of 4000 images (500 swim suit, 500 topless, 500 nude, 500 sex, and 2000 normal images) are used for training, and 800 images (100 swim suit, 100 topless, 100 nude, 100 sex, and 400 normal images) for testing. The system employs 5 visual descriptors for feature values: Color Layout (12), Color Structure (256), Edge Histogram (80), hom*ogeneous Texture (30), and Region Shape (35), where the values in parentheses indicate the input dimension. The inputs consist of the normalized MPEG-7 descriptor values. Table 1 shows the overall performance of each classification module. Table 2 presents the test results of each descriptor on the 800 images that were not used in the training process. In the cases of the Color Layout, the Edge Histogram, and the Region Shape, the results are better than for the rest of the descriptors. We reason that each descriptor has its own advantages and disadvantages when used as feature (input) values for the network. For example, the Color Layout descriptor provides an excellent result in adult vs. normal image classification, but produces very poor results in topless vs. swim suit classification, degrading the overall performance. This reflects the fact that classifying normal versus adult related images is heavily dependent on color information, whereas classifying nude and sexual images is likely to depend on the Region Shape descriptor. A better strategy would be to use different descriptors, or even combinations of different descriptors, in each module. Finding the best combination would be very difficult if we were to consider the cases for many descriptors. Moreover, the best combination heavily depends on the domain of the images: a strategy that works fine on one domain, such as sports image classification, may not work for other domains, such as flower image classification. The remaining descriptors can be used effectively for other domains.

Table 1. Classification rates of each Module

Module 1 (Adult Images / Normal Images)
                 classified Adult   classified Normal
Adult Images     71.4 %             28.6 %
Normal Images    14.5 %             85.5 %

Module 2 (Topless, Swim suit / Nude, Sex)
                     classified Topless/Swim suit   classified Nude/Sex
Topless/Swim suit    73 %                           27 %
Nude/Sex             23 %                           77 %

Module 3 (Topless / Swim suit)
            classified Topless   classified Swim suit
Topless     75 %                 25 %
Swim suit   32 %                 68 %

Module 4 (Nude / Sex)
        classified Nude   classified Sex
Nude    87 %              13 %
Sex     10 %              90 %

Table 2. Test Results of Hierarchical Classifier using each descriptor

(rows: true class; columns: assigned class; all values in %)

Color Layout (Total = 800, Correct = 589, Error = 211, 73.625 %)
               S      T      N      X      I (Normal)
S (Swimsuit)   42     21     21     9      7
T (Topless)    17     42     15     12     14
N (Nude)       11     12     64     6      7
X (Sex)        8      7      7      74     4
I (Normal)     2.25   3.25   1.5    1.25   91.75

Color Structure (Total = 800, Correct = 521, Error = 279, 65.125 %)
               S      T      N      X      I (Normal)
S (Swimsuit)   34     19     17     14     16
T (Topless)    24     22     20     22     12
N (Nude)       6      7      66     11     10
X (Sex)        18     13     4      54     11
I (Normal)     5.75   5.25   0.75   2      86.25

Edge Histogram (Total = 800, Correct = 563, Error = 237, 70.375 %)
               S      T      N      X      I (Normal)
S (Swimsuit)   56     19     10     11     4
T (Topless)    20     38     11     15     16
N (Nude)       15     7      53     10     15
X (Sex)        13     21     9      50     7
I (Normal)     1.5    2.25   2.5    2.25   91.5

hom*ogeneous Texture (Total = 800, Correct = 538, Error = 262, 67.25 %)
               S      T      N      X      I (Normal)
S (Swimsuit)   61     14     7      4      14
T (Topless)    21     39     11     8      21
N (Nude)       11     8      61     7      23
X (Sex)        7      11     9      61     12
I (Normal)     2.75   8      5.25   5      79

Region Shape (Total = 800, Correct = 553, Error = 247, 69.125 %)
               S      T      N      X      I (Normal)
S (Swimsuit)   55     11     5      2      27
T (Topless)    11     55     8      3      23
N (Nude)       5      13     57     10     15
X (Sex)        3      5      4      70     18
I (Normal)     7.25   5      3.75   5      79


5 Conclusion

This paper proposes a hierarchical adult image rating system using neural networks. The system consists of four modules that have a hierarchical structure. Each module learns its classification task from the feature values extracted from MPEG-7 descriptors; the selected MPEG-7 descriptors are used as inputs to the network. The system classifies images into multiple classes (5 classes), and the simulation shows that it achieved a success rate above 70% for this hard task. It is also shown that using different descriptors, or even combinations of different descriptors, in each module is a better strategy for the classification task, although finding the best combination would be very difficult if we were to consider the cases for many descriptors. Consequently, the proposed multi-module hierarchical classification system is useful for well defined features, and the framework can be effectively used as the kernel of web contents rating systems.

References
1. Arentz, W., Olstad, B.: Classifying Offensive Sites Based on Image Contents. Computer Vision and Image Understanding, Vol. 94 (2004) 293-310
2. Fleck, M., Forsyth, D., Bregler, C.: Finding Naked People. Proc. 1996 European Conference on Computer Vision, Vol. 2 (1996) 592-602
3. Jones, M., Rehg, J.: Statistical Color Models with Application to Skin Detection. Technical Report Series, Cambridge Research Laboratory (1998)
4. Jiao, F., Gao, W., Duan, L., Cui, G.: Detecting Adult Image Using Multiple Features. Proc. IEEE Conference, Vol. 3 (2001) 378-383
5. Yoo, S.: Intelligent Multimedia Information Retrieval for Identifying and Rating Adult Images. Lecture Notes in Computer Science, Vol. 3213. Springer-Verlag, Berlin Heidelberg, New York (2004) 165-170
6. Hammami, M., Chahir, Y., Chen, L.: WebGuard: Web Based Adult Content Detection and Filtering System. Proc. IEEE/WIC International Conference on Web Intelligence (2003) 574-578
7. Forsyth, D., Fleck, M.: Identifying Nude Pictures. Proc. IEEE Workshop on the Applications of Computer Vision (1996) 103-108
8. Hand, D., Mannila, H., Smyth, P.: Principles of Data Mining. MIT Press (2001) 343-347
9. Rosenblatt, F.: The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychology Review 65 (1958) 386-408
10. Rumelhart, D., Hinton, G., Williams, R.: Learning Representations by Back-Propagating Errors. Nature (London), Vol. 323 (1986) 533-536
11. Kim, W., Lee, H., Yoo, S., Baik, S.: Neural Network Based Adult Image Classification. Lecture Notes in Computer Science, Vol. 3696. Springer-Verlag, Berlin Heidelberg, New York (2005) 481-486

Shape Representation Based on Polar-Graph Spectra

Haifeng Zhao1, Min Kong1,2, and Bin Luo1,*

1 Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230039, P.R. China [emailprotected], [emailprotected]
2 Department of Machine and Electron Engineering, West Anhui University, Liuan 237012, P.R. China

Abstract. In this paper, a new shape representation method is proposed. We exploit our strategy in three steps. First, we calculate the centroid and the centroid distances of an image contour. Then, based on a polar coordinate system, contour points are selected to construct a graph, which we call a Polar-Graph. The spectra of these graphs are finally organized as feature vectors for subsequent clustering or retrieval. Our experiments show that the proposed representation is invariant to scale, translation and rotation, and is insensitive to slight distortion and occlusion to some extent.

1 Introduction

The rapid development of digital technology and the Internet has led to huge and ever-growing archives of image, audio and video data, which confronts people with a large amount of visual information to handle. Shape is one of the most important features representing the visual content of an image. However, shape retrieval is still a very difficult task because of the ambiguous and incomplete information about shape available in an image. Shape representation is the key step for shape applications. Barrow et al. [1] and Fischler et al. [2] were among the first to demonstrate the potential of relational graphs as abstractions for pictorial information. Since then, graph-based representations have been exploited widely for the purposes of shape representation, segmentation, matching and retrieval [3],[4],[5],[6]. However, there are two main problems with graph-based shape representation. One problem is how to construct the graph from the shape contour: if we try to build a graph with all contour points, the complexity of the resulting graph makes it useless in practice. The other problem is to measure the similarity of large sets of graphs, which hinders the manipulation of graph methods. The graph-spectral approach is an efficient way to solve this problem [7]. We aim to construct a graph directly from a shape. The centroid is an invariant of the shape. After calculating the centroid and centroid distances of the object shape, we construct the graph by selecting contour points based on a polar coordinate

* Corresponding author. Tel./fax: 086-551-5108445.



system. The graph representation also provides scale invariance by itself. The spectra of these graphs are then obtained as feature vectors for clustering or retrieval. Our experiments show that the proposed representation is invariant to scale, translation and rotation, even under slight distortion and occlusion. Furthermore, according to the polar angles of the selected contour points, the shape descriptor can be used as a kind of hierarchical coarse-to-fine representation.

2 Polar Graph

Edge detection is the preprocessing step to get the shape contour. The accuracy and reliability of edge detection is critical to the shape representation. In our implementation, we use the Canny edge detection algorithm [8], known to many as the optimal edge detector, to extract the contour points. First, we calculate the centroid of the object as a reference point. Next, relative to the centroid, the distance and angle of the contour points are computed for the polar space. Because we place the pole at the centroid of the object, the representation of a distance sequence is translation invariant. To achieve rotation invariance, we choose the maximum distance, associate it with the angle of zero, and normalize all the angles of the contour points. Locating the centroid of a contour is key for shape representation. In order to obtain invariance to translation, rotation and scaling, the geometric center of the object is selected as a reference point in each image. We use the average value of the N contour coordinates (x_i, y_i) to compute the centroid (x_c, y_c): x_c = (1/N) Σ_{i=1}^{N} x_i, y_c = (1/N) Σ_{i=1}^{N} y_i, where N is the number of all contour points. The centroid distance ρ_i is the distance of the contour point (x_i, y_i) to the centroid (x_c, y_c):

ρ_i = (Δx² + Δy²)^{1/2}, where Δx = (x_i − x_c) and Δy = (y_i − y_c).   (1)

The centroid angle θ_i is the anti-clockwise angle between the centroid distance line and the x-axis. θ_i is calculated by equation (2):

θ_i = − arctan(Δx/Δy)       if Δx > 0 and Δy ≤ 0
      π/2                   if Δx = 0 and Δy < 0
      arctan(Δx/Δy) + π     if Δx < 0 and Δy = 0   (2)
      3π/2                  if Δx = 0 and Δy ≥ 0
      2π − arctan(Δx/Δy)    if Δx > 0 and Δy > 0

If we take the pole of a polar space at the centroid (x_c, y_c), the contour point (x_i, y_i) can be represented in the polar space as (ρ_i, θ_i), which is shown in Figure 1. In order to compare and select points for the graph easily, we change the radian form θ_i ∈ [0, 2π) to the integer degree form θ_i ∈ {0, 1, 2, ..., 359}. To obtain rotation invariance of the shape, we rotate the polar space so that the maximum distance ρ_max is associated with 0 degrees; the other θ_i are then calculated relative to the direction θ_max of ρ_max, as shown in Figure 1. It is impossible to select all contour points to build the graph because of the complexity. Assume that many rays are sent out from the centroid; the rays will


Fig. 1. Polar Space and Rotation

Fig. 2. Two problematic examples

intersect with the contour. But it is hard to compute the coordinates of the intersection points, because there is no equation for a complicated contour. Instead, we can select the contour points by θ_i at a fixed interval Δθ, such as an interval of 10 degrees; with a 10-degree interval we get exactly 36 feature points. If we use different intervals (for example, 40, 35, 30, 25, ..., 5), a hierarchical coarse-to-fine representation can be carried out. Because of the intricate information of contours, some points may have the same angle θ_i, and the degrees in [0, 359] may be discontinuous, especially when the centroid is close to the boundary. These two situations are shown in Figure 2. For the first situation, multiple points sharing one θ_i, we use the average point of all contour points with the same degree. Secondly, if one degree α is missing, we take these steps: we substitute the average of the points at degrees α ± 1 for the missing degree point; if the points at degrees α ± 1 cannot be obtained, we use the average of the points at degrees α ± 2. The worst situation is that no point exists in the α ± 2 degree range; in this case, we use the centroid in the α degree direction. Therefore, the minimal interval can be as small as 5 degrees; that is to say, the maximal number of feature points is 72. Because graphs with the same number of vertices are easier for the subsequent processing, it is better to select the same number of contour points for each image. From these selected points we build Delaunay graphs, as shown in Figure 3. Obviously, a graph with a smaller Δθ represents the original object more accurately. We call this kind of graph a Polar-Graph. Polar-Graphs have several distinct advantages: a Polar-Graph can be generated automatically in the polar coordinate system; a Polar-Graph is a whole description of the image contour, so every point selected for the graph is important; and at the same time, we can construct graphs of the same size using the same interval, because the inter-comparison of graphs with

Fig. 3. Coarse-to-fine Polar-Graphs: original object; Δθ = 20 degrees; Δθ = 10 degrees


the same size is relatively easy. Compared with the shock graph [4], which is a shape abstraction that decomposes a shape into hierarchically organized primitive parts, a Polar-Graph is built directly from the original contour points. Therefore, it keeps the shape information to the maximum extent. A sketch of the point-selection step is given below.
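The following Python sketch illustrates the point selection described above, assuming a contour already extracted (e.g., by the Canny detector the paper uses); the `np.arctan2` angle convention and the simple per-bin averaging (with a centroid fallback for empty bins) are our own simplifications of the paper's rules.

```python
import numpy as np

def polar_graph_points(contour, dtheta=10):
    """Select contour points at a fixed angular interval around the centroid.

    contour: (N, 2) array of (x, y) contour coordinates.
    Returns a (360 // dtheta, 2) array of feature points, averaging the
    contour points falling into each angular bin.
    """
    centroid = contour.mean(axis=0)
    d = contour - centroid
    rho = np.hypot(d[:, 0], d[:, 1])
    theta = np.degrees(np.arctan2(d[:, 1], d[:, 0])) % 360
    # rotate so the farthest point sits at 0 degrees (rotation invariance)
    theta = (theta - theta[np.argmax(rho)]) % 360
    points = []
    for a in range(0, 360, dtheta):
        mask = (theta >= a) & (theta < a + dtheta)
        points.append(contour[mask].mean(axis=0) if mask.any() else centroid)
    return np.array(points)
```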

3 Graph Spectral Decomposition

A key problem that hinders the manipulation of large sets of graphs is measuring their similarity once we have constructed Polar-Graphs to represent object shapes. The problem arises in many situations where the graphs must be matched or clustered together. Here, we work with the spectral decomposition of the weighted adjacency matrices.

3.1 Graph Spectra

We are concerned with a set of Polar-Graphs (G_1, ..., G_k, ..., G_n). The k-th graph is denoted by G_k = (V_k, E_k), where V_k is the set of vertices and E_k ⊆ (V_k × V_k) is the edge set. For each G_k we compute the weighted adjacency matrix A_k, a |V_k| × |V_k| matrix whose element with row index i and column index j is

A_k(i, j) = exp(−d²(i, j)/σ²) if (i, j) ∈ E_k, and 0 otherwise,   (3)

where d(i, j) is the Euclidean distance between points i and j. From the adjacency matrices A_k, k = 1, ..., n, we can calculate the eigenvalues λ_k^ω by solving the equation |A_k − λ_k^ω I| = 0, where ω = 1, 2, ..., |V_k| is the eigenvalue index. We order the eigenvalues in descending order, i.e., |λ_k^1| > |λ_k^2| > ... > |λ_k^{|V_k|}|. Furthermore, we can acquire the associated eigenvectors φ_k^ω of λ_k^ω by solving the system of equations A_k φ_k^ω = λ_k^ω φ_k^ω. The eigenvectors are stacked in order to construct the modal matrix Φ_k = (φ_k^1 | φ_k^2 | ... | φ_k^{|V_k|}). With the eigenvalues and eigenvectors of the adjacency matrix at hand, the spectral decomposition of the adjacency matrix of graph k is

A_k = Σ_{ω=1}^{|V_k|} λ_k^ω φ_k^ω (φ_k^ω)^T.   (4)

For each graph, we use only the first d eigenmodes of the adjacency matrix; the truncated modal matrix is Φ_k = (φ_k^1 | φ_k^2 | ... | φ_k^d).

3.2 Spectral Features

Our goal is to use the spectral features computed from the eigenmodes of the adjacency matrices for Polar-Graphs to construct feature vectors. To avoid the difficulty of correspondence, we employ the order of eigenvalues to establish


the order of the feature vectors. Features suggested by spectral graph theory include the leading eigenvalues, eigenmode volume, eigenmode perimeter, Cheeger constant, and inter-mode adjacency matrix and distance, which are analyzed in Luo's paper [7]. Here we simply use the leading eigenvalues in our experiments to test the shape representation. For graph k, the vector is L_k = (λ_k^1, λ_k^2, ..., λ_k^d)^T. The vector L_k represents the spectrum of the graph; a sketch of this feature computation follows.
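The following NumPy sketch assembles the spectral feature vector for one Polar-Graph from its selected points; `polar_graph_points` refers to the earlier sketch, and weighting all vertex pairs (rather than an explicit Delaunay edge set) is our simplifying assumption.

```python
import numpy as np

def spectral_feature(points, d=3, sigma=1.0):
    """Leading-eigenvalue feature vector of the weighted adjacency matrix.

    points: (m, 2) array of Polar-Graph vertices.
    Builds A(i, j) = exp(-dist(i, j)**2 / sigma**2) over all vertex pairs
    (a complete-graph simplification of eq. (3)), then keeps the d
    eigenvalues of largest magnitude, ordered descending.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist2 = (diff ** 2).sum(-1)
    A = np.exp(-dist2 / sigma ** 2)
    np.fill_diagonal(A, 0.0)            # no self-loops
    eigvals = np.linalg.eigvalsh(A)     # A is symmetric
    order = np.argsort(-np.abs(eigvals))
    return eigvals[order][:d]
```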

3.3 Experiments

We use the "heart" sequence, with a total of 20 images, from the MPEG-7 CE-1-SetB shape database, shown in Figure 4. The sequence involves translation, rotation, deformation, and occlusion. With our proposed method, we first extract the contour points to construct Delaunay graphs. Since the Delaunay graph is the neighborhood graph of the Voronoi tessellation, i.e., the locus of the median lines between adjacent points, it may be expected to reflect changes in the contour. To test

Fig. 4. Heart sequence for experiments

Fig. 5. 3D projection of the spectra of the 20 graphs

Fig. 6. Distance map of spectral feature vectors

the validity of the new approach to shape representation, we keep just the first three eigenvalues as the graph spectral feature. Figure 5 shows the 3D projection of the spectra of the twenty graphs after constructing the graphs from the shape contours (Δθ = 10 degrees). We can see that the No.10, No.18, No.19 and


No.20 graphs deviate comparatively more than the other graphs. Examining the corresponding images, we can easily see that these deviated graphs correspond to images that are actually deformed to a great extent; the reason is that these images have shapes that differ more from the other images. This demonstrates that our representation reflects the real situation. In Figure 6, we show the matrix of pairwise Euclidean distances between every two feature vectors (best viewed in color). The matrix has 20 rows and 20 columns (one for each of the sequence images), and the graph indexes are ordered according to the positions of the images in the sequence. Through its different colors, Figure 6 also visually shows that our representation is correct and effective.

4 Conclusions

In this paper, we present a novel approach to shape representation based on Polar-Graph spectra. Our results demonstrate its robustness in the presence of translation, rotation and scaling, even under slight distortion and occlusion. Based on Polar-Graph spectra, it will be convenient to develop graph matching and clustering algorithms for the purpose of shape retrieval.

Acknowledgements

This research is supported by the National Natural Science Foundation of China (No. 60375010), the Excellent Young Teachers Program of the Ministry of Education of China, the Innovative Research Team of 211 Project in Anhui University, and the Natural Science Project of Anhui Provincial Education Department (No. 2006KJ053B).

References
1. Barrow, H.G., Burstall, R.M.: Subgraph Isomorphism, Matching Relational Structures and Maximal Cliques. Inform. Process. Lett. 4 (1976) 83-84
2. Fischler, M., Elschlager, R.: The Representation and Matching of Pictorial Structures. IEEE Trans. Comput. 22 (1973) 67-92
3. Luo, B., Robles-Kelly, A., Torsello, A., Wilson, R.C., Hanco*ck, E.R.: Learning Shape Categories by Clustering Shock Trees. Proceedings of International Conference on Image Processing. 3 (2001) 672-675
4. Sebastian, T.B., Klein, P.N., Kimia, B.B.: Recognition of Shapes by Editing Their Shock Graphs. IEEE Trans. Pattern Analysis and Machine Intelligence. 26 (2004) 550-571
5. Badawy, O.E., Kamel, M.: Shape Representation Using Concavity Graphs. Proceedings of 16th International Conference on Pattern Recognition. 3 (2002) 461-464
6. Siddiqi, K., Shokoufandeh, A., Dickinson, S.J., Zucker, S.W.: Shock Graphs and Shape Matching. Sixth International Conference on Computer Vision (1998) 222-229
7. Luo, B., Wilson, R.C., Hanco*ck, E.R.: Spectral Embedding of Graphs. Pattern Recognition. 36 (2003) 2213-2223
8. Canny, J.: A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 8 (1986)

Hybrid Model Method for Automatic Segmentation of Mandarin TTS Corpus

Xiaoliang Yuan1, Yuan Dong1,2, Dezhi Huang2, Jun Guo1, and Haila Wang2

1 School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, P. R. China [emailprotected], {yuandong, guojun}@bupt.edu.cn
2 France Telecom R&D Beijing Co., Ltd., 2 Science Institute South Road, Haidian District, Beijing 100080, P. R. China {dezhi.huang, haila.wang}@francetelecom.com

Abstract. For a corpus-based Mandarin text-to-speech system, the quality of the synthesized speech is highly affected by the accuracy of the unit boundaries. In this paper, we propose a hybrid model method for automatic segmentation of a Mandarin text-to-speech corpus. The boundaries of the acoustic units are categorized into eleven phonetic groups. For a given phonetic group of boundaries, the proposed method selects an appropriate model from an initial-final monophone-based HMM, a semi-syllable monophone-based HMM and an initial-final triphone-based HMM. The experimental results show that the hybrid model method can achieve better performance than the single model method, in terms of error rate and time shift of boundaries.

1 Introduction

The corpus-based method has been applied in most Mandarin text-to-speech (TTS) synthesis systems, because it enables these systems to produce synthesized speech with high articulation and intelligibility [1]. At the same time, this method also inflates the urgent need for a high-quality speech corpus. In particular, the accuracy of the boundaries of the acoustic units highly impacts the quality of the synthesized speech. The classical solution for segmenting the acoustic units is to label the speech signals automatically or manually. Obviously, manual labelling requires tremendous and laborious human work and introduces a great cost. Besides, it is difficult to keep consistency among different labelers, especially when the error threshold is set to 10 ms. In practice, most Mandarin TTS systems build different speech corpora according to their special applications. Consequently, methods for automatic segmentation of speech corpora have gained great attention nowadays. Many methods for automatic segmentation have been proposed in the past several years [2][3][4][5][6]. Most of them adapt a phonetic recognizer to the task of acoustic alignment with a given phonetic transcription. They often comprise two stages: (1) the alignment stage, in which the boundaries are roughly


estimated by applying forced alignment using a hidden Markov model (HMM) or a Gaussian mixture model (GMM); (2) the refinement stage, in which the boundaries are refined by high time-resolution analysis and a refinement process, depending on acoustic characteristics or checking rules. The literature [2] introduced a local refinement stage, in which the acoustic features and the phonetic information provided by the forced alignment are combined to improve the segmentation results. In reference [3], a post-refining method with fine contextual-dependent GMMs is used for the automatic segmentation. In reference [4], seven acoustic features, as well as statistical pattern recognition, are adopted to identify the most valuable features for each phonetic group. Among the former methods, most studies on automatic segmentation are based upon a single model, either context-dependent or context-independent. However, strong evidence from [2] points out that a context-dependent model achieves different performance from a context-independent model. Moreover, an inherent problem of the single model method is that each boundary is estimated only once. Unlike the former methods, this paper seeks to improve the performance by using hybrid models in the alignment stage. In this method, each boundary has several estimates, and a mapping rule is trained and applied to select the best one from these estimates. The experimental results show that the proposed method can increase the performance in the alignment stage from 79.7% within 20 ms and 86.7% within 30 ms to 86.1% within 20 ms and 91.8% within 30 ms.

2 The Hybrid Model Method

2.1 Boundary Grouping

As is known, a syllable in Mandarin includes an optional initial and a final; the Mandarin tone is always manifested in the final. Acoustically, a syllable always has a stable inner structure. Therefore, many Mandarin TTS systems adopt the syllable as the basic acoustic unit. For this paper, the segmentation task is to locate the boundaries of a syllable (i.e., its starting and ending times).

Table 1. Four phonetic categories defined over the Mandarin consonant phonemes: fricative and affricate, unaspirated stop, aspirated stop, and voiced consonant

Phonetic Category         SAMPA                               Chinese Pinyin
Fricative and affricate   f x 6 S s t6 t6_h TS TS_h ts ts_h   f h x sh s j q zh ch z c
Unaspirated stop          p t k                               b d g
Aspirated stop            p_h t_h k_h                         p t k
Voiced consonant          m n l Z                             m n l r

It is also known that phonemes have dissimilar acoustic characteristics in Mandarin, which makes it difficult to develop a general model that can segment all the possible boundary groups. The classification of consonants and vowels should be done separately. Hence, we divide all Mandarin consonants into four


categories according to their acoustic characteristics: fricative and affricate, unaspirated stop, aspirated stop, and voiced consonant [4], as listed in Table 1. With regard to the transitions between phonetic categories, the boundaries are further divided into the eleven groups in Table 2. The phonetic categories on the left of a boundary are silence and vowel; those on the right are zero initial, fricative and affricate, unaspirated stop, aspirated stop, and voiced consonant. The boundary structure "vowel + zero initial" is effectively equivalent to "vowel + vowel".

Table 2. Boundaries are divided into eleven groups according to the phonetic categories on the left and right of the boundary

Group ID   Left Phonetic Category   Right Phonetic Category
B0         vowel                    fricative and affricate
B1         silence                  fricative and affricate
B2         vowel                    unaspirated stop
B3         silence                  unaspirated stop
B4         vowel                    aspirated stop
B5         silence                  aspirated stop
B6         vowel                    voiced consonant
B7         silence                  voiced consonant
B8         vowel                    vowel
B9         silence                  vowel
B10        vowel                    silence

2.2 The Hybrid Models

The boundaries in Mandarin always lie between the previous syllable's final and the following syllable's initial, and the acoustic characteristics of both directly affect the positions of the boundaries. Therefore, when deciding which model to choose, the proposed method takes into account the phonetic groups of the phonemes adjacent to the boundaries. Firstly, the initial-final monophone-based model (IFMM), the semi-syllable monophone-based model (SSMM) and the initial-final triphone-based model (IFTM) are trained on a large corpus uttered by many speakers; these are speaker-independent (SI) models. All the sentences in the training and test sets are used to adapt the SI models mentioned above, and the adapted models are each employed to do forced alignment. Secondly, a mapping rule between models and boundary groups is trained with the training set: all boundaries are categorized into the eleven phonetic groups, and an adapted model is selected to match a given phonetic group by voting over all the boundaries of that group, as sketched after this paragraph. Finally, the test set is used to evaluate the mapping rule, which can later be modified continuously according to feedback from the test set.
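A minimal sketch of training such a mapping rule by voting might look as follows; the per-boundary error computation against the manual labels and the names used here are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def train_mapping_rule(boundaries, models):
    """Pick, for each boundary group, the model whose forced-alignment
    estimate is closest to the manual label most often.

    boundaries: iterable of (group_id, manual_time, {model_name: est_time}).
    models: list of model names, e.g. ["IFMM", "SSMM", "IFTM"].
    Returns {group_id: best_model_name}.
    """
    votes = defaultdict(lambda: defaultdict(int))
    for group, manual, estimates in boundaries:
        best = min(models, key=lambda m: abs(estimates[m] - manual))
        votes[group][best] += 1      # this boundary votes for its best model
    return {g: max(v, key=v.get) for g, v in votes.items()}
```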


Since the most suitable model is selected for each phonetic category, the proposed approach can achieve better performance than using a single model, as our experiments confirm.

3 Experiments

3.1 Mandarin Speech Corpus

A Mandarin speech corpus was uttered by a professional female speaker, recorded in a sound-proof room and sampled at 16 bits, 16000 Hz. For the sake of performance evaluation, the syllables of 4,000 sentences, our test set, were labeled in a completely manual way. Another 400 randomly selected manually labeled sentences, our training set, were used to train the mapping rule. There are 142,493 boundaries in our test set. The distribution of the boundary groups is depicted in Fig. 1 (boundary counts: B0 46728, B1 11084, B2 9234, B3 9567, B4 4602, B5 2749, B6 13154, B7 2219, B8 10748, B9 3392, B10 29016).

Fig. 1. The distribution of the boundary groups in our test set

3.2 Evaluation of Boundaries

Many measures to evaluate segmentation performance are mentioned in [2],[4],[7], such as measuring the word error rate of a recognizer that uses a segmentation stage, or measuring the subjective quality of a speech synthesizer obtained by automatic segmentation. In this paper, the performance is evaluated by comparing the automatic segmentation with the manual segmentation and computing the rates of errors smaller than a threshold of 20 ms or 30 ms. In order to analyze the effects of the various models on the boundary groups, we also employ the mean errors and root mean square (RMS) errors of the boundary shifts, computed from the distances between the automatic segmentation and the manually labeled data; a compact sketch of these metrics follows.
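The metrics just described can be computed as in the sketch below, under the assumption that boundary times are given in seconds; this is a generic implementation of the stated definitions, not the authors' tooling.

```python
import numpy as np

def boundary_metrics(auto_times, manual_times, tolerances=(0.020, 0.030)):
    """Rate of boundaries within each tolerance, plus mean and RMS shift."""
    shifts = np.abs(np.asarray(auto_times) - np.asarray(manual_times))
    within = {t: float((shifts <= t).mean()) for t in tolerances}
    return within, float(shifts.mean()), float(np.sqrt((shifts ** 2).mean()))
```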

3.3 Baseline Performance in Several Models

An IFMM, an SSMM and an IFTM SI model are trained on a large speaker-independent corpus using the HTK toolkit. Then the 4,000 sentences without manual labels and the 400 manually labeled sentences mentioned before are used to adapt


the SI HMM models. With regard to the test set, the adapted IFMM, SSMM and IFTM models are each used to do forced alignment. The baseline performance is listed in Table 3.

Table 3. Baseline performance (%) of IFMM, SSMM and IFTM for errors smaller than thresholds of 20 ms and 30 ms

Tolerance   IFMM   SSMM   IFTM
20 ms       79.7   77.2   76.7
30 ms       86.7   84.5   83.7

3.4 Experimental Results

The experiment is designed to demonstrate the different results of IFMM, SSMM and IFTM on the eleven boundary groups; in addition, the mapping rule can be derived from this experiment. After forced alignment and boundary group decision, the mean and RMS of the boundary shifts for IFMM, SSMM and IFTM are given in Table 4.

Table 4. The mean and RMS of boundary shifts in IFMM, SSMM and IFTM

        IFMM            SSMM            IFTM
Group   Mean    RMS     Mean    RMS     Mean    RMS
B0      0.007   0.017   0.007   0.014   0.009   0.020
B1      0.013   0.030   0.019   0.031   0.026   0.037
B2      0.008   0.027   0.009   0.023   0.015   0.023
B3      0.008   0.019   0.016   0.019   0.032   0.029
B4      0.008   0.016   0.008   0.016   0.010   0.023
B5      0.014   0.022   0.015   0.022   0.019   0.024
B6      0.018   0.027   0.019   0.021   0.021   0.027
B7      0.018   0.032   0.018   0.032   0.020   0.038
B8      0.031   0.034   0.028   0.031   0.032   0.044
B9      0.018   0.028   0.024   0.042   0.022   0.036
B10     0.018   0.025   0.026   0.035   0.014   0.025

It is clear that IFMM achieves better performance in many groups than

SSMM and IFTM. A theoretical explanation for this observation was presented in [8], where the author claimed that the reason is the loss of alignment between the context-dependent models and the phones [9] during the training process. A further reason is given by [2], where it is argued that context-dependent models are always trained with realizations of phones in the same context, which is why such models have no information to discriminate between


the phone and its context. Therefore, IFMM can achieve higher performance because it is trained with realizations of phones in different contexts. The mapping rule obtained in our experiment is shown in Table 5.

Table 5. Mapping rule of the proposed method

Group   B0     B1     B2     B3     B4     B5     B6     B7     B8     B9     B10
Model   SSMM   IFMM   IFMM   IFMM   IFMM   IFMM   SSMM   IFMM   SSMM   IFMM   IFTM

The results shown in Table 6 confirm the validity of the hybrid model method. Moreover, comparing with the results of the two representative papers [3][4] that use a single model method, the hybrid model method achieves superior performance in the alignment stage. Theoretically, the proper boundary in the alignment stage is selected according to the mapping rule, which leads to the overall improvement.

Table 6. Performance comparison (%)

                  20 ms   30 ms
baseline          79.7    86.7
proposed method   86.1    91.8

4 Conclusion

In this paper, we proposed a hybrid HMM-based method for automatic segmentation of a Mandarin TTS corpus. Although a single HMM-based method can be used to roughly segment the boundaries, we have noticed that a given HMM performs remarkably differently on different groups of boundaries. As an alternative, hybrid model automatic segmentation adopts boundary grouping and model selection to improve the accuracy of the forced alignment. The experimental results show that the proposed method achieves better performance than a single model method. From the experimental results, we also found that the performance on group B8 (vowel followed by vowel) is still not good enough. In the near future, we will develop new methods to improve the accuracy on group B8 in the refinement stage.

Acknowledgement

The authors would like to thank GuangRi Cui from FTR&D Beijing for the fruitful discussions and great help in training the mentioned HMMs.


References
1. Cai, L., Huang, D., Cai, R.: Fundamentals and Applications of Modern Speech Technology. Tsinghua University Press, Beijing (2003)
2. Toledano, D., Gómez, L., Grande, L.: Automatic Phonetic Segmentation. IEEE Trans. on Speech and Audio Processing, 11 (2003) 617-625
3. Wang, L., Zhao, Y., Chu, M., et al.: Refining Segmental Boundaries for TTS Database Using Fine Contextual-Dependent Boundary Models. In: Proc. of ICASSP, Montreal (2004) 641-644
4. Lin, C., Jang, J., Chen, K.: Automatic Segmentation and Labeling for Mandarin Chinese Speech Corpus for Concatenation-based TTS. Computational Linguistics and Chinese Language Processing, 10 (2005) 145-166
5. Zhu, D., Hu, Y., Wang, R.: Automatic Segmentation and Labeling of Speech Corpus Based on HMM with Adaptation. In: Proc. of ISCSLP (2000) 351-354
6. Tao, J., Hain, H.: Syllable Boundaries Based Speech Segmentation in Demi-Syllable Level for Mandarin with HTK. In: Proc. of Oriental COCOSDA (2002)
7. Cox, S., Brady, R., Jackson, P.: Techniques for Accurate Automatic Annotation of Speech Waveforms. In: Proc. of ICSLP (2002) 1947-1950
8. Malfrère, F., Deroo, O., Dutoit, T.: Phonetic Alignment: Speech Synthesis Based vs. Hybrid HMM/ANN. In: Proc. of ICSLP, 4 (1998) 1571-1574
9. Wightman, C., Talkin, D.: The Aligner: Text-to-speech Alignment Using Markov Models. In: Progress in Speech Synthesis, Springer-Verlag Inc., New York (1997) 313-323

ICIS: A Novel Coin Identification System

Adnan Khashman1, Boran Sekeroglu2, and Kamil Dimililer1

1 Electrical & Electronic Engineering Department
2 Computer Engineering Department
Near East University, Lefkosa, Mersin 10, Turkey
[emailprotected], [emailprotected], [emailprotected]

Abstract. When developing intelligent recognition systems, our perception of patterns can be simulated using neural networks. An intelligent coin identification system that uses coin patterns for classification helps prevent confusion between different coins of similar physical dimensions. Currently, coin identification by machines relies on the assessment of a coin's physical parameters. In this paper, a rotation-invariant intelligent coin identification system (ICIS) is presented. ICIS uses a neural network and pattern averaging to recognize coins rotated by various degrees. Slot machines in Europe accept the new Turkish 1-Lira coin as a 2-Euro coin due to physical similarities, although a 2-Euro coin is roughly worth 4 times the new Turkish 1-Lira. ICIS was implemented to identify the 2 EURO and 1 TL coins, and the results were found to be encouraging.

1 Introduction

Artificial neural networks can be used to simulate our perception of objects and pattern recognition in intelligent machines. We can easily recognize familiar patterns or objects regardless of differences in their size or orientation. This is due to our intelligent system of perception, which has been trained to recognize the objects over time. Coin identification using pattern recognition has an advantage over the conventional identification methods used commonly in slot machines. Most coin testers in slot machines work by testing physical properties of coins, such as size, weight and materials, using dimensioned slots, gates and electromagnets. However, if physical similarities exist between coins of different currencies, then the traditional coin testers fail to distinguish the different coins. One such case is the identification of the 2-Euro (EURO) and the new Turkish 1-Lira (TL) coins [1]. The 1 TL coin closely resembles the 2 EURO coin in both weight and size, and both coins seem to be recognized and accepted by slot machines as a 2 EURO coin, which is roughly worth 4 times more than a 1 TL coin [2], [3]. Several coin recognition systems were previously developed and showed encouraging results. f*ckumi et al. [4] described a system based on a rotation-invariant neural network that is capable of identifying Japanese coins. This has the advantage of identifying coins rotated by any degree; however, the use of slabs is time consuming [5]. Other methods for coin identification were also suggested, such as the use of coin surface colour [6] and the use of edge detection of coin patterns [7]. The


use of colour seems to increase the computational costs unnecessarily, whereas edge-based pattern recognition has the problem of noise sensitivity. The aim of the work presented within this paper is to develop and implement an intelligent coin identification system, abbreviated as ICIS, that uses coin patterns for classification. ICIS uses image processing and pattern averaging to pre-process coin images prior to training a back propagation neural network on these images. ICIS is a rotation-invariant system that identifies the obverse and reverse sides of coins rotated in steps of 15°. A real life application is presented by implementing ICIS to correctly identify the 2 EURO and 1 TL coins.

2 Coin Image Database

There are 12 European countries in the euro area; all 2 EURO coins have the same design on the obverse side but a different design for each European country on the reverse side [8]. The implementation of ICIS involves distinguishing the 2 EURO coins from the 1 TL coin. Five coins are used for this purpose: one 1 TL coin and four 2 EURO coins of Germany, France, Spain and Netherlands, as shown in Figure 1.

Fig. 1. Coin Samples (a) 2 EURO common obverse side (b) 2 EURO reverse sides of Germany, France, Spain and Netherlands (c) 1 TL obverse side (d) 1 TL reverse side

Images of the obverse and reverse sides of the five coins were captured using a Creative WebCam (Vista Plus). The coins were rotated at intervals of 15° as shown in Figure 2, and images of the rotated coins were captured. For example, rotation by 15° results in 48 images of the 1 TL coin (24 obverse sides and 24 reverse sides) and 120 images of the 2 EURO coins (24 obverse sides, and 24 reverse sides for each of Germany, France, Spain and Netherlands). Training the neural network within ICIS uses 28 of the captured images (at 0°, 90°, 180° and 270° rotations) covering all coins and both sides. The remaining 140 images of the various coins at different rotations are used for testing the trained neural network within ICIS. Table 1 shows the number of coin images obtained using a rotation interval of 15°. Figure 3 shows examples of rotated coins.

Fig. 2. Rotation Degrees of Germany 2 EURO Coin (labels at 15° steps from 0° to 345°)

Fig. 3. Rotated Coins (a) 2 EURO Obverse Common Side (b) 2 EURO France at 105° (c) 2 EURO Spain at 45° (d) 1 TL Obverse Side at 270°

Table 1. Number of Coin Patterns Using 15° Rotations

                Obverse   Reverse   Total
2 EURO images   24        96        120
1 TL images     24        24        48
Total           48        120       168

3 ICIS Implementation

The implementation of ICIS consists of two phases: an image processing phase, where coin images undergo compression, segmentation and pattern averaging in preparation for the second phase, and a neural network phase, where a back propagation neural network is trained on these images. Once the neural network converges and learns, the second phase consists only of one forward pass that yields the identification result.

3.1 Image Processing Phase

This phase is a data preparation phase for neural network training. Coin images undergo mode conversion, cropping, thresholding, compression, trimming and pattern averaging. The original captured coin image is in RGB color with dimensions of 352x288 pixels. First, the image is converted to grayscale. Second, the grey coin image is cropped to an even-sized image of 250x250 pixels. Third, the cropped grey coin image undergoes thresholding using a threshold value of 135 (as shown in equation (1)), converting it into a black and white image. Finally, the thresholded coin image is compressed to 125x125 pixels and then trimmed to a 100x100 pixel image that contains the patterns of the coin side.


P[x, y] = 0, if P[x, y] ≤ 135;  P[x, y] = 255, otherwise.   (1)

The 100x100 pixel image provides the input data for the neural network training and testing. However, in order to provide a faster identification system while maintaining meaningful learning, the 100x100 pixel image is further reduced to a 20x20 bitmap that represents the original coin image. This is achieved by segmenting the image into segments of size 5x5 pixels and taking the average pixel value within each segment, as shown in the following equations:

Seg_i = (Sum_i / D) / 256   (2)

D = (TP_x · TP_y) / S   (3)

where Seg_i is the value of segment i, Sum_i is the summation of the pixel values within the segment, D is the number of pixels per segment, TP_x and TP_y denote the x and y pixel sizes of the image, and S is the total number of segments. Pattern averaging provides meaningful learning and marginally reduces the processing time. For the work presented within this paper, pattern averaging overcomes the problem of varying pixel values within the segments as a result of rotation, thus providing a rotation-invariant system. Using a segment size of 5x5 pixels results in a 20x20 bitmap of averaged pixel values that is used as the input for the second phase, namely neural network training and generalization. A sketch of this preprocessing pipeline is given below.
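The preprocessing chain can be sketched in a few lines of NumPy; the exact crop offsets are not given in the paper, so centered cropping is our assumption.

```python
import numpy as np

def preprocess_coin(gray):
    """Reduce a grayscale coin image to the 20x20 averaged input bitmap.

    gray: 2D uint8 array (the paper starts from 352x288 RGB captures,
    converted to grayscale beforehand).
    """
    def center_crop(img, size):
        h, w = img.shape
        top, left = (h - size) // 2, (w - size) // 2
        return img[top:top + size, left:left + size]

    img = center_crop(gray, 250)                       # crop to 250x250
    img = np.where(img <= 135, 0, 255).astype(float)   # threshold, eq. (1)
    img = img[::2, ::2]                                # compress to 125x125
    img = center_crop(img, 100)                        # trim to 100x100
    # 5x5 pattern averaging, normalized to [0, 1], eqs. (2)-(3)
    return img.reshape(20, 5, 20, 5).mean(axis=(1, 3)) / 256.0
```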

3.2 Neural Network Phase

ICIS uses a 3-layer back propagation neural network with 400 input neurons, 30 hidden neurons and 2 output neurons, classifying the 1 TL and the 2 EURO coins. This phase comprises training and generalization (testing).

Fig. 4. ICIS Neural Network Topology (400 input nodes, 30 hidden nodes, and 2 output nodes: Euro Coin / Turkish Lira Coin)


Table 2. Final Neural Network Parameters

Input Nodes | Hidden Nodes | Output Nodes | Learning Rate | Momentum Rate | Minimum Error | Training Iterations | Time
400         | 30           | 2            | 0.0099        | 0.80          | 0.001         | 2005                | 46 seconds*

*using a 2.4 GHz PC with 256 MB of RAM, Windows XP OS and a Borland C++ compiler

The neural network is trained using only 28 of the available 168 coin images. The 28 training images are of coins rotated at 0°, 90°, 180° and 270°, giving 8 images of the 1 TL coin (4 obverse and 4 reverse) and 20 images of the 2 EURO coins (4 of the obverse common side and 4 of the reverse side of each of Germany, France, Spain and the Netherlands). The remaining 140 coin images are testing images that are not exposed to the network during training; they are used to test the robustness of the trained neural network in identifying the coins despite rotation. During the learning phase, the learning rate and the momentum rate were adjusted over various experiments in order to achieve the required minimum error value and meaningful learning. An error value of 0.001 was considered sufficient for this application. Figure 4 shows the topology of the neural network within ICIS, and Table 2 shows the final parameters of the trained network.
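A minimal NumPy sketch of such a 400-30-2 back-propagation network with the parameters of Table 2 is shown below. The weight initialization, sigmoid activation and error definition are our assumptions; the original system was implemented in C++.

```python
import numpy as np

rng = np.random.default_rng(0)

# Topology and training constants from Table 2.
N_IN, N_HID, N_OUT = 400, 30, 2
ETA, ALPHA, TARGET_ERR = 0.0099, 0.80, 0.001

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, T, max_iters=5000):
    """Plain backpropagation with momentum on a 400-30-2 network.
    X: (n_patterns, 400) averaged coin bitmaps; T: (n_patterns, 2) targets."""
    W1 = rng.uniform(-0.5, 0.5, (N_IN, N_HID))
    W2 = rng.uniform(-0.5, 0.5, (N_HID, N_OUT))
    dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)
    for it in range(max_iters):
        err = 0.0
        for x, t in zip(X, T):
            h = sigmoid(x @ W1)                   # hidden layer (30)
            y = sigmoid(h @ W2)                   # output layer (2)
            d_out = (t - y) * y * (1 - y)         # output error term
            d_hid = (d_out @ W2.T) * h * (1 - h)  # backpropagated error
            dW2 = ETA * np.outer(h, d_out) + ALPHA * dW2  # momentum update
            dW1 = ETA * np.outer(x, d_hid) + ALPHA * dW1
            W2 += dW2
            W1 += dW1
            err += np.mean((t - y) ** 2)
        if err / len(X) < TARGET_ERR:   # the paper reports 2005 iterations
            break
    return W1, W2
```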

4 Results and Analysis

The Intelligent Coin Identification System (ICIS) was implemented using the C programming language. The neural network learnt and converged after 2005 iterations and within 46 seconds. The run time of the generalized neural network (one forward pass) together with the image preprocessing phase was a fast 0.02 seconds. The robustness, flexibility and speed of this novel intelligent coin identification system have been demonstrated through this application.

Coin identification using the training image set yielded 100% recognition, as would be expected. ICIS identification results using the testing image sets were successful and encouraging: 158 of the available 168 coin images across all rotation degrees were correctly identified, a rate of 94.04%, even though only coin images rotated at multiples of 90° were used for training the neural network. The results are summarized in Table 3.

Table 3. ICIS Identification Results

Coin   | Image Set | Recognition Rate
2 EURO | Training  | 20/20 (100%)
2 EURO | Testing   | 96/100 (96%)
2 EURO | Combined  | 116/120 (96.66%)
1 TL   | Training  | 8/8 (100%)
1 TL   | Testing   | 34/40 (85%)
1 TL   | Combined  | 42/48 (87.5%)
Total  |           | 158/168 (94.04%)

5 Conclusions

In this paper, a novel coin identification system, named ICIS, has been presented. The system uses image preprocessing and a neural network. Image preprocessing is the first phase in ICIS and aims at providing meaningful representations of the coin patterns while reducing the amount of data for training the neural network, which receives the optimized data representing the coin images and learns the coin patterns. ICIS has been successfully applied to identifying the 2 EURO and 1 TL coins. The neural network training and generalization used the Turkish 1-Lira coin and the four 2-Euro coins of Germany, France, Spain and the Netherlands. This addresses a real-life problem, where the physical similarity between these coins has led to abuse of slot machines in Europe. An overall correct identification rate of 94.04% has been achieved, with 158 out of 168 variably rotated coin images correctly identified. These results are very encouraging when the time costs are considered: the neural network training time was 46 seconds, whereas the ICIS run time for both phases (image preprocessing and neural network generalization) was 0.02 seconds.


Image Enhancement Method for Crystal Identification in Crystal Size Distribution Measurement

Wei Liu and YuHong Zhao

Institute of Industrial Control, Zhejiang University, 310027 Hangzhou, China
{wliu, yhzhao}@iipc.zju.edu.cn

Abstract. The control of crystal size distribution is critically important in crystallization processes, so the measurement of crystal size distribution attracts much attention. Image analysis is an advanced method recently developed for crystal size distribution estimation. A feasible image enhancement method is proposed for crystal identification in crystallization images, applying histogram equalization and a Laplacian mask algorithm sequentially. The experimental results indicate that the quality of the crystal image is improved markedly, and the crystals can be identified more easily and exactly.

1 Introduction

Control of crystallization processes is critical in a number of industries, including microelectronics, food, and pharmaceuticals, which constitute a significant and growing fraction of the world economy [1]. In the pharmaceutical industry, the primary bottleneck in the operation of production-scale drug manufacturing facilities is the difficulty of controlling the size and shape distribution of crystals produced by complex crystallization processes. Since Crystal Size Distribution (CSD) is the key controlled variable, an accurate measurement of the crystal size distribution is extremely important for crystallization control.

The method of measuring CSD customarily used is to sieve samples first and then analyze the size distribution using a Coulter Counter. Another method in current use is Focused Beam Reflectance Measurement (FBRM), which is based on laser scattering theory. However, neither of these methods is satisfactory in accuracy and convenience [2]. An advanced method proposed recently is based on image analysis; improvements in image processing have made online video microscopy a promising technology for such measurements [3]. After obtaining an image of the crystal solution with a microscope at an appropriate scale, the major axis of the best-fit ellipse (the approximate crystal length) can be lined out either manually or automatically. The length of the axis in the image can then be calculated by distance measurement methods, and the real crystal size obtained through scale conversion.

However, because of the characteristics of the crystal image, it is not easy to identify the crystals correctly and completely, so image enhancement is needed. One of the primary characteristics of digital image processing is the strong dependence of processing algorithms on the characteristics of the image. In other words, a certain


algorithm has a good effect only on certain limited types of image; for others it will not have the same effect, and may even make the result worse [4]. So finding a proper and feasible image enhancement method for crystal identification is a key problem. Since automatic identification techniques have not yet matured, most measurements depend on manual work, and in this paper an image enhancement method is provided for identifying the crystals manually. Histogram equalization and a Laplacian mask are applied to the crystal images in combination. As a result, the target parts of the image are brought out and the crystals can be identified easily. A large number of crystal images have been processed by the proposed enhancement method, and all experiments have shown good results. Owing to length restrictions, only two of them are given in this paper to demonstrate the performance of the proposed method.

In the following, the characteristics of crystal images are analyzed and the enhancement method is described in Section 2. Histogram equalization and the Laplacian mask algorithm are introduced in Sections 3 and 4 respectively, together with the results of each processing step. Conclusions are drawn and further research indicated in the last section.

2 Characteristics and Enhancement Method of Crystal Image

Two typical crystal images obtained from the crystal solution are shown in Figure 1 as (a) and (b); their gray histograms are shown correspondingly in Figure 2 as (a) and (b).

Fig. 1. Two typical crystal images

From the figures above, the following characteristics of a crystal image can be derived:

1) The gray values concentrate in a narrow range and exhibit a unimodal distribution, which results in low contrast at the edges of the crystals. It is therefore hard for the computer to find a division point at which to segment the image into foreground and background.
2) Most crystals in the image partly share the same gray values as the background.
3) Within a single crystal, the image has markedly different gray values, and generally has obvious internal edges because of the different crystal planes.


Because of these characteristics of the crystal image, no general and effective method yet exists to identify the crystals in the image automatically. Identifying the crystals manually is the most common approach when the correctness of the result matters. However, even for human eyes it is not easy to identify the crystals correctly, especially in an image such as Figure 1(a). An enhancement method is therefore necessary to make the crystals easier to identify by manual work.

Fig. 2. Gray histograms of Figure 1

Based on the characteristics discussed above, a feasible image enhancement method is proposed: histogram equalization is adopted first to increase the contrast of the images significantly, and the Laplacian mask algorithm is then used to make the images visibly sharper.

3 Histogram Equalization

Histogram equalization is an image enhancement method based on a cumulative distribution function (CDF) transformation. Let the variable r represent the gray levels of the image to be enhanced, and let p_r(r) denote the probability density function of the random variable r. The CDF transformation is

s = T(r) = \int_0^r p_r(\omega)\, d\omega    (1)

where ω is a dummy variable of integration. This transformation function satisfies the following conditions: (a) T(r) is single-valued and monotonically increasing in the interval 0 ≤ r ≤ 1; (b) 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1. It can be shown that the transformation in Eq. (1) yields an s characterized by a uniform probability density function, so the transformation spreads the gray values over a wider range. The transformation results for Figure 1 are shown in Figure 3, and the corresponding gray histograms in Figure 4.
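A discrete NumPy sketch of this transformation for 8-bit images follows; rounding the scaled CDF to integer grey levels is the usual discrete approximation of Eq. (1).

```python
import numpy as np

def equalize_histogram(img):
    """Discrete histogram equalization: map each grey level r through the
    cumulative distribution of grey levels, stretching the narrow
    unimodal histogram over the full [0, 255] range. `img` is uint8."""
    hist = np.bincount(img.ravel(), minlength=256)
    p_r = hist / img.size                  # empirical p_r(r)
    cdf = np.cumsum(p_r)                   # s = T(r), Eq. (1)
    lut = np.round(255 * cdf).astype(np.uint8)
    return lut[img]                        # apply T to every pixel
```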


From these figures it can be seen that the histograms have spread to the full gray scale and the edges of the crystals are more distinct; as a result, the crystals in the image are clearer to identify. However, it is still not easy to line out the crystals manually, especially in Figure 3(a), so further processing is needed. Experiments show that the Laplacian mask algorithm is an appropriate next step.

Fig. 3. Transformation results of Figure 1

Fig. 4. Gray histograms of Figure 3

4 Image Enhancement Using the Laplacian Mask Algorithm

The Laplacian is the simplest isotropic derivative operator and is frequently used in image sharpening. For a function (image) f(x, y) of two variables, the discrete form of the Laplacian is expressed as

\nabla^2 f = \frac{\partial^2 f(x, y)}{\partial x^2} + \frac{\partial^2 f(x, y)}{\partial y^2} = f(i+1, j) + f(i-1, j) + f(i, j+1) + f(i, j-1) - 4 f(i, j)    (2)

In practice, this is usually implemented in one pass with a single mask, whose coefficients follow from the equation

g(i, j) = f(i, j) - \nabla^2 f(i, j) = 5 f(i, j) - f(i+1, j) - f(i-1, j) - f(i, j+1) - f(i, j-1)    (3)
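Equations (2) and (3) reduce to a single 3×3 convolution. A brief sketch is given below; clipping the result back to [0, 255] is our addition.

```python
import numpy as np
from scipy.ndimage import convolve

# Eq. (3) as a single 3x3 mask: 5*f(i,j) minus the four neighbours.
LAPLACIAN_SHARPEN = np.array([[ 0, -1,  0],
                              [-1,  5, -1],
                              [ 0, -1,  0]], dtype=float)

def sharpen(img):
    """One-pass Laplacian sharpening, g = f - lap(f), per Eq. (3)."""
    out = convolve(img.astype(float), LAPLACIAN_SHARPEN, mode='nearest')
    return np.clip(out, 0, 255).astype(np.uint8)
```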

The enhancement results for the original images (Figure 1) and the histogram-equalized images (Figure 3) are shown in Figures 5 and 6 respectively.

Fig. 5. Laplacian mask enhancement results of original images

Fig. 6. Laplacian mask enhancement results of histogram-equalized images

Contrasted with the flat background, most of the crystals in each image are visually embossed by the Laplacian mask. The crystals in the images that were first processed by histogram equalization are, however, clearer, so applying the two methods in combination improves the visual quality of crystal images markedly. After these two steps, the crystals in the images are much easier to identify and line out manually.

5 Conclusions

Crystal size distribution is important for the control of crystallization, and image analysis is becoming an attractive method for measuring the CSD. The characteristics of crystal images have been analyzed in this paper, and a feasible digital image processing method provided accordingly. Histogram equalization and the Laplacian mask algorithm are used in combination, and as a result the target crystals can be identified easily. Comparison of the images processed by the proposed method with the original images, and with images processed by either single method, demonstrates that the image quality is clearly improved. After this processing, the crystals can be lined out more easily and correctly, either manually or automatically, and the real crystal size can be obtained through distance measurement algorithms and scale conversion. For the purpose of CSD control, further study will focus on the automatic identification of the crystals; the processing method provided in this paper remains a necessary step in that research.

Acknowledgements

The project is supported by the National Natural Science Foundation of China (No. 60503065).

References

1. Richard, D.B.: Advanced Control of Crystallization Processes. Annual Reviews in Control, Vol. 26 (2002) 87-99
2. Daniel, B.P.: Crystal Engineering Through Particle Size and Shape Monitoring, Modeling, and Control. Ph.D. Dissertation, University of Wisconsin-Madison (2002)
3. Paul, L., James, R.: Crystallization Studies Using In-situ, High-Speed, Online Video Imaging. TWMCC (2004)
4. Lang, R.: Digital Image Processing & Achieving by VC++. Beijing Hope Electronic Press, Beijing (2002)
5. Rafael, C.G., Richard, E.W.: Digital Image Processing. Publishing House of Electronics Industry, Beijing (2002)

Image Magnification Using Geometric Structure Reconstruction

Wenze Shao1 and Zhihui Wei2

1 Department of Computer Science and Engineering, Nanjing University of Science and Technology, 210094 Nanjing, China
2 Graduate School, Nanjing University of Science and Technology, 210094 Nanjing, China

Abstract. Although many magnification methods have been proposed in the literature, magnification in this paper is approached as reconstructing the geometric structures of the original high-resolution image. The structure tensor can estimate the orientation of both edges and flow-like textures, which makes it well suited to magnification. First, an edge-enhancing PDE and a corner-growing PDE are proposed, both based on the structure tensor. The two PDEs are then combined into a novel one, which not only enhances the edges and flow-like textures but also preserves the corner structures. Finally, the novel PDE is applied to image magnification. The method is simple, fast, and robust to both noise and blocking artifacts. Experimental results demonstrate the effectiveness of our approach.

1 Introduction

Image magnification aims to produce the original high-resolution (HR) image from a single low-resolution (LR) and perhaps noisy image. Taking into account the insufficient density of the imaging sensor, it is reasonable to consider the observation model g = Du + n, where g, u, and n are respectively the column-ordered vectors of the M × N LR image g, the original qM × qN HR image u, and the additive random noise n. The variable q represents the undersampling factor, and the matrix D describes the nonideal sampling process, i.e., first local averaging and then down-sampling.

Many magnification algorithms [1, 2, 3, 4, 5, 6] have been proposed in the literature, and currently the PDE-based level-set approaches [7, 8] are the most popular choices, being fast, edge-enhancing, and robust to noise. Generally, the level-set magnification approaches can be unified in the following PDE

\partial u / \partial t = c_1 D^2 u(\eta, \eta) + c_2 D^2 u(\xi, \xi)

with the initial image taken as the bilinear or bicubic interpolation of the LR image. In the above PDE, η = ∇u / |∇u| and ξ = ∇u⊥ / |∇u| are orthonormal vectors in the gradient and tangent directions respectively, and c₁, c₂ are diffusivity functions of the gradient magnitude |∇u| in each direction. Nevertheless, in the case of nonideal sampling, image sharpening has to be incorporated into the interpolation process. Therefore, we propose the following PDE, which incorporates shock filtering [9, 10]:

\partial u / \partial t = c_1 D^2 u(\eta, \eta) + c_2 D^2 u(\xi, \xi) - \beta\, \mathrm{sign}(D^2 u(\eta, \eta)) |\nabla u|    (1)

where β is a positive controlling parameter. PDE (1) essentially magnifies images driven by the level curves; however, level curves do not capture all the geometric information needed to analyze image content. Hence, more flexible PDEs should be exploited to handle different geometric structures.

Image magnification in this paper is approached as reconstructing the geometric structures of the original HR image. The structure tensor can estimate the orientation of both edges and flow-like textures, which makes it well suited to magnification. First, an edge-enhancing PDE and a corner-growing PDE are proposed, both based on the structure tensor. The two PDEs are then combined into a novel one, which not only enhances the edges and flow-like textures but also preserves the corner structures. Finally, the novel PDE is applied to image magnification. The method is simple, fast, and robust to both noise and blocking artifacts. Experimental results demonstrate the effectiveness of our approach.

The paper is organized as follows. Section 1 has given a brief review of previous PDE-based magnification algorithms. In Section 2, the edge-enhancing and corner-growing PDEs are proposed based on the structure tensor and then combined into a novel PDE. The novel PDE is applied to image magnification in Section 3, and conclusions are given in Section 4.

2 Edge-Enhancing PDE and Corner-Growing PDE

To estimate the orientation of local geometric structures, Weickert [11] proposed the well-known structure tensor

J_\rho(\nabla u_\sigma) = G_\rho * (\nabla u_\sigma \otimes \nabla u_\sigma), \quad \rho \ge 0    (2)

denoted componentwise as (J_{m,n})_{m=1,2; n=1,2}, where u_σ is the version of u regularized with a Gaussian kernel N(0, σ²), making the edge detection insensitive to noise at scales smaller than σ; the tensor ∇u_σ ⊗ ∇u_σ is convolved with a Gaussian kernel N(0, ρ²), making the structure analysis more robust for flow-like structures and noise. The matrix J_ρ is symmetric and positive semidefinite, and hence has orthonormal eigenvectors, denoted w and w⊥. The vector w is defined as

w = \begin{pmatrix} 2 J_{12} \\ J_{22} - J_{11} + \sqrt{(J_{22} - J_{11})^2 + 4 J_{12}^2} \end{pmatrix}, \quad w := w / |w|    (3)

It points in the direction of largest contrast, and the orthonormal vector w⊥ points in the structure direction. Their corresponding eigenvalues μ and μ⊥ can be used as descriptors of local structure: constant areas are characterized by μ = μ⊥ = 0, straight edges by μ ≫ μ⊥ = 0, and corners by μ ≥ μ⊥ > 0. Moreover, (μ − μ⊥)² is a measure of the local coherence, given by (μ − μ⊥)² = (J₂₂ − J₁₁)² + 4J₁₂².

2.1 Edge-Enhancing PDE

Based on the two eigenvectors w and w⊥ provided by the structure tensor J_ρ, we generalize PDE (1) to the following PDE

\partial u / \partial t = c_1 D^2 u(w, w) + c_2 D^2 u(w^\perp, w^\perp) - \beta\, \mathrm{sign}(D^2 u(w, w)) |\nabla u|    (4)

where β is a positive parameter, and c₁ and c₂ are diffusivity functions of the local coherence (μ − μ⊥)². In particular, when c₁ is a monotonically decreasing function ranging from 1 to 0 and c₂ is equal to 1, the behavior of PDE (4) is easy to interpret. On homogeneous regions of u, the function c₁ has values near 1; the first two terms of the PDE then combine into Δu, yielding isotropic diffusion. In regions with edges and flow-like textures, the function c₁ has values near 0 and the first term of PDE (4) vanishes; the second term forces PDE (4) to smooth the image in the structure direction, and therefore preserves the edges and flow-like textures. The last term in PDE (4) corresponds to shock filtering for image sharpening, with the vector η in PDE (1) replaced by the vector w; the same modification has also been proposed in [12]. Though PDE (4) performs well at preserving edges and flow-like textures, it does not preserve corner structures, as Fig. 2(d) shows.

2.2 Corner-Growing PDE

To overcome the blurring of corner structures in image diffusion, we propose the following PDE for corner growing:

\partial u / \partial t = c_3 (\nabla u)^T \cdot (\nabla \cdot (w^\perp \otimes w^\perp))    (5)

where c₃ is a positive controlling parameter, and the divergence operator ∇· for a 2×2 matrix M is defined as

\nabla \cdot M = \begin{pmatrix} \nabla \cdot (m_{11}\; m_{12})^T \\ \nabla \cdot (m_{21}\; m_{22})^T \end{pmatrix}, \quad M = \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix}

Here we demonstrate the performance of PDE (5) with an illustrative experiment shown in Fig. 1. Fig. 1(b) shows that PDE (5) plays the role of corner growing; the rate of corner growing is determined by both ρ and c₃. We therefore combine PDEs (4) and (5) into a novel PDE (6), which not only enhances the edges and flow-like textures but also preserves the corner structures:

\partial u / \partial t = c_1 D^2 u(w, w) + c_2 D^2 u(w^\perp, w^\perp) + c_3 (\nabla u)^T \cdot (\nabla \cdot (w^\perp \otimes w^\perp)) - \beta\, \mathrm{sign}(D^2 u(w, w)) |\nabla u|    (6)

In contrast to PDE (6), Weickert [11] proposed the PDE

\partial u / \partial t = \nabla \cdot (D(J_\rho(\nabla u_\sigma)) \nabla u)

where D(J_\rho(\nabla u_\sigma)) = c_1 (w \otimes w) + c_2 (w^\perp \otimes w^\perp) is called the diffusion tensor. To achieve image sharpening, the modified shock filtering can also be incorporated into the above PDE, giving

\partial u / \partial t = \nabla \cdot (D(J_\rho(\nabla u_\sigma)) \nabla u) - \beta\, \mathrm{sign}(D^2 u(w, w)) |\nabla u|    (7)

Nevertheless, PDE (6) is more powerful than PDE (7) at preserving corner structures, as the experimental results in Fig. 2 confirm.

Fig. 1. (a) Original image, (b) Fig. 1(a) diffused with PDE (5), (c) Gaussian noisy image (μ = 0, σ = 10), (d) Fig. 1(c) diffused with PDE (4), (e) Fig. 1(c) diffused with PDE (6)

Fig. 2. (a) Original image, (b) Image convolved with a Gaussian kernel (σ = 2), (c) Fig. 2(b) diffused with PDE (7), (d) Fig. 2(b) diffused with PDE (6)

3 Geometry-Driven Image Magnification

In this section we use PDE (6) for image magnification, with u(x, 0) = u₀(x) as the initial image (the bilinear or bicubic interpolation of the LR image g). Each term in PDE (6) now has its physical meaning in magnification: the first and second terms combine to perform isotropic diffusion in homogeneous regions; the second term smooths blocking artifacts in the structure direction; the third term preserves the corner structures; and the fourth term overcomes the blurring introduced in the interpolation process. Since PDE (6) accounts for almost all of the geometric structures in images, it is much better suited to magnification than the level-set approaches.


For a vector ϖ = (ϖ₁, ϖ₂)ᵀ, D²u(ϖ, ϖ) = ϖ₁² u_xx + 2ϖ₁ϖ₂ u_xy + ϖ₂² u_yy. The first-order partial derivatives ∂_x and ∂_y are calculated with recently proposed optimized derivative filters [13], which offer rotation invariance, accuracy, and avoidance of blurring effects. The corresponding numerical scheme of PDE (6), with x denoting the pixel position, is

u_x^{t+1} = u_x^t + \tau \left[ c_1 D^2 (u_\sigma^t)_x(w, w) + c_2 D^2 (u_\sigma^t)_x(w^\perp, w^\perp) + c_3 (\nabla u_x^t)^T \cdot (\nabla \cdot (w^\perp \otimes w^\perp)) - \beta\, \mathrm{sign}(D^2 (u_\sigma^t)_x(w, w)) |\nabla u_x^t| \right]    (8)

PDE (6) can then be implemented by the following steps:

1. Calculate the initial image u₀(x) using bilinear interpolation;
2. Calculate the structure tensor J_ρ(∇u_σ) = G_ρ ∗ (∇u_σ ⊗ ∇u_σ) using (2);
3. Calculate the dominant vector w using (3);
4. Calculate the local coherence and the diffusivity functions c₁ and c₂;
5. Calculate (u_σ)_x, (u_σ)_y, (u_σ)_xx, (u_σ)_xy, (u_σ)_yy, D²u_σ(w, w), and D²u_σ(w⊥, w⊥);
6. Calculate u_x, u_y, |∇u| and (∇u)ᵀ · (∇ · (w⊥ ⊗ w⊥));
7. Update the iteration using (8) (the number of iteration steps is T).

A sketch of steps 2-4 follows.
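The sketch below computes the structure tensor fields needed by the scheme. Simple central-difference gradients stand in for the optimized derivative filters of [13], and the eigenvector formula follows Eq. (3); the function name and parameter defaults are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor_fields(u, sigma=1.5, rho=2.0):
    """Steps 2-4 of the scheme: regularize u, build J_rho (Eq. 2), and
    return the dominant orientation w (Eq. 3) and the local coherence."""
    u_s = gaussian_filter(u, sigma)          # u_sigma
    uy, ux = np.gradient(u_s)                # central differences stand in
                                             # for the filters of [13]
    # Componentwise Gaussian smoothing of the outer product (Eq. 2).
    J11 = gaussian_filter(ux * ux, rho)
    J12 = gaussian_filter(ux * uy, rho)
    J22 = gaussian_filter(uy * uy, rho)
    # Eq. (3): eigenvector of the larger eigenvalue, then normalized.
    root = np.sqrt((J22 - J11) ** 2 + 4.0 * J12 ** 2)
    w1, w2 = 2.0 * J12, J22 - J11 + root
    norm = np.hypot(w1, w2) + 1e-12
    w = np.stack([w1 / norm, w2 / norm])     # direction of largest contrast
    coherence = root ** 2                    # (mu - mu_perp)^2
    return w, coherence
```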

Fig. 3. (a) Original image, (b) Level-set approach (T = 20, τ = 0.24, c = 1, c₂ = 1, β = 0.15), (c) Our proposed approach (T = 20, τ = 0.24, σ = 1.5, ρ = 2, c = 1, c₂ = 1, c₃ = 1.5, β = 0.15)

Fig. 4. Magnified portions corresponding to (a) Fig. 3(a), (b) Fig. 3(b), (c) Fig. 3(c)


The diffusivity function c₁ in PDEs (1) and (6) is chosen as c₁(t) = c / (1 + t²) for gray images ranging from 0 to 255, and c₂ is taken as a constant for simplicity. There are thus eight parameters in the numerical scheme (8): T, τ, σ, ρ, c, c₂, c₃, β. T is the number of iteration steps, determined by the undersampling factor q and the noise level; τ is the step size, bounded in the interval (0, 0.25) for numerical stability; σ regularizes the noisy image and is mainly determined by the noise level; ρ controls the neighborhood size, with larger neighborhoods making the estimation of the structure orientation more robust to interrupted lines, texture details, and random noise; c controls the strength of the isotropic diffusion, also determined by the noise level; c₂ controls the strength of diffusion in the structure direction, mainly determined by q; c₃ controls the strength of corner enhancement, determined by c₁, c₂ and q; and β controls the strength of image sharpening, determined by c, c₁, c₂ and q. Empirically, τ ∈ (0, 0.24], σ ∈ (0, 3], ρ ∈ (1, 3], c ∈ [1, 2], c₂ ∈ [1, 5], c₃ ∈ [1, 2], and β ∈ [0.1, 0.3].

The shock-filtering level-set approach (1) and our proposed PDE (6) were both used for image magnification, with the bilinear interpolation of the LR image as the initial guess. Fig. 3 shows the magnification results for an undersampling factor of 4, and Fig. 4 shows enlarged portions of Fig. 3. Our approach clearly achieves much better visual quality than the level-set approach, which not only removes the corners but also shortens the level curves of the original HR image.

4 Conclusions

This paper has proposed an alternative PDE approach to image magnification based on the proposed edge-enhancing and corner-growing PDEs, which not only enhances the edge structures but also preserves the corner structures. The method is simple, fast, and robust to both noise and blocking artifacts. Experimental results demonstrate the effectiveness of our approach.

References

1. Blu, T., Thévenaz, P., Unser, M.: Linear Interpolation Revisited. IEEE Transactions on Image Processing, 13 (2004) 710-719
2. Li, X., Orchard, T.: New Edge-Directed Interpolation. IEEE Transactions on Image Processing, 10 (2001) 1521-1527
3. El-Khamy, S.E., Hadhoud, M.M., Dessouky, M.I., Salam, B.M., El-Samie, F.E.: Efficient Implementation of Image Interpolation as an Inverse Problem. Digital Signal Processing, 15 (2005) 137-152
4. Schultz, R.R., Stevenson, R.L.: A Bayesian Approach to Image Expansion for Improved Definition. IEEE Transactions on Image Processing, 3 (1994) 233-242
5. Guichard, F., Malgouyres, F.: Total Variation Based Interpolation. EUSIPCO'98, 3 (1998) 1741-1744
6. Chan, T.F., Shen, J.H.: Mathematical Models for Local Nontexture Inpaintings. SIAM J. Appl. Math., 62(3) (2002) 1019-1043
7. Belahmidi, A., Guichard, F.: A Partial Differential Equation Approach to Image Zoom. Proceedings of the International Conference on Image Processing (2004)
8. Morse, B.S., Schwartzwald, D.: Isophote-Based Interpolation. 5th IEEE International Conference on Image Processing (1998)
9. Osher, S.J., Rudin, L.I.: Feature-Oriented Image Enhancement Using Shock Filters. SIAM J. Numer. Anal., 27 (1990) 919-940
10. Alvarez, L., Mazorra, L.: Signal and Image Restoration Using Shock Filters and Anisotropic Diffusion. SIAM J. Numer. Anal., 31(2) (1994) 590-605
11. Weickert, J.: Coherence-Enhancing Diffusion Filtering. International Journal of Computer Vision, 31(2/3) (1999) 111-127
12. Weickert, J.: Coherence-Enhancing Shock Filters. Pattern Recognition, Lecture Notes in Computer Science, Vol. 2781, Springer-Verlag, Berlin Heidelberg (2003) 1-8
13. Weickert, J.: A Scheme for Coherence-Enhancing Diffusion Filtering with Optimized Rotation Invariance. Journal of Visual Communication and Image Representation, 13(1/2) (2002) 103-118

Image-Based Classification for Automating Protein Crystal Identification

Xi Yang1, Weidong Chen1, Yuan F. Zheng1,2, and Tao Jiang3

1 Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
2 Electrical & Computer Engineering, The Ohio State University, USA
3 National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Science, Beijing 100101, China

Abstract. A technology for the automatic evaluation of images from protein crystallization trials is presented in this paper. In order to minimize interference from environmental factors, the droplet is first segmented from the entire image. The algorithm then extracts features from the pixels within the droplet, forming a 16-dimensional feature vector that is fed to a classifier. Each image is classified into one of the following classes: "Clear", "Precipitate" and "Crystal". We have achieved an accuracy rate of 84.8% with our algorithm.

1 Introduction

The analysis of protein structure is an important component of protein crystallography, which has been one of the most popular research areas in recent years. Studying the function of protein crystals helps us understand the mechanism of a protein as well as the interplay between a protein molecule and other molecules [1]. High-throughput protein crystallization systems can prepare thousands of trials per day. Conventionally, the outcomes of the protein crystallization trials are assessed by human experts, a procedure that is slow and inefficient. An automatic technology is therefore needed to replace the manual work. Several methods have been proposed by other researchers [2-5]; the best result was achieved by Bern et al. in 2004 [6]. However, when their algorithm is applied to our image set, the accuracy rate is not acceptable.

In this paper, we propose an automatic protein crystallization classification algorithm based on digital image processing. All image samples obtained from the protein crystallization equipment are classified into three classes: "Clear" (no substance is produced), "Precipitate" (the primary substances produced are precipitates), and "Crystal" (the primary substances produced are crystals).

2 Methodology

The procedure of the algorithm, shown in Fig. 1, consists of three steps: image segmentation, feature extraction and classification. Otsu automatic thresholding [7], Canny edge detection [8] and an Active Contour Model (ACM) [9] are utilized to locate the boundary of the droplet. Image features are derived by calculating the Gray Level Co-occurrence Matrix (GLCM) [10], the Hough Transform and the Discrete Fourier Transform (DFT) of the pixels belonging to the droplet. The classification procedure is divided into two stages, each of which is a two-class problem.

Fig. 1. Procedure of the algorithm

3 Algorithm

3.1 Image Segmentation

In our algorithm, the image is first converted from gray scale to a binary image, using a threshold computed by the Otsu algorithm. An ACM is then applied to the image. An ACM is a dynamic contour that can change its shape, based on its energy function, to adapt to a local feature such as the boundary of the droplet. The energy function of the ACM can be expressed as equation (1):

E = E_{int} + E_{ext}    (1)

E_int is the internal energy based on the shape of the dynamic contour, and E_ext is the external energy based on local image features; interested readers are referred to [9] for the full formulas. When the energy function is minimized, the internal energy smooths the dynamic contour and the external energy adapts it to the edges detected in the image. The ACM must be initialized with a position from which it begins to change its shape; the minimum circle surrounding the connected component with the maximum area is selected as the initial location of the contour. Canny edge detection is employed to detect the edge of the droplet, which is taken as the final position of the dynamic contour. After several iterations, the contour converges to the boundary of the droplet, and all pixels within the contour can be segmented from the image. The procedure of image segmentation is shown in Fig. 2.
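A sketch of this pipeline using scikit-image is given below. The snake parameters are illustrative, and skimage's active_contour derives its external energy internally from the smoothed image rather than from an explicit Canny edge map, so this is a simplified stand-in for the formulation in [9].

```python
import numpy as np
from skimage import filters, measure, segmentation

def segment_droplet(gray):
    """Otsu binarisation, an initial circle around the largest connected
    component, then an active contour converging to the droplet edge."""
    bw = gray > filters.threshold_otsu(gray)         # Otsu threshold

    labels = measure.label(bw)                       # connected components
    biggest = max(measure.regionprops(labels), key=lambda r: r.area)
    cy, cx = biggest.centroid
    r0, c0, r1, c1 = biggest.bbox
    radius = 0.5 * max(r1 - r0, c1 - c0)             # enclosing circle

    theta = np.linspace(0, 2 * np.pi, 200)           # initial snake points
    init = np.column_stack([cy + radius * np.sin(theta),
                            cx + radius * np.cos(theta)])

    # Energy minimisation (E = E_int + E_ext); the smoothed image
    # supplies the external (edge) energy here.
    snake = segmentation.active_contour(filters.gaussian(gray, 3),
                                        init, alpha=0.015, beta=10.0)
    return snake    # (row, col) boundary of the droplet
```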

Fig. 2. Procedure of the segmentation. (a): original image; (b): binary image converted from (a); (c): initial circle; (d): image with edges detected; (e): the nodes represent the boundary detected by the ACM.


3.2 Feature Extraction

The classification is made based on features extracted from the image. Individual features are extracted first and then combined into a feature vector. The features include texture, geometry and frequency features derived from the pixels inside the droplet, described in detail as follows.

A notable characteristic of the image is its texture, captured by statistical analysis: images with and without crystals present different texture features in terms of contrast, correlation, etc. Texture features are obtained by computing the GLCM of the pixels belonging to the droplet. The GLCM is defined by the following equation:

p(i, j) = \#\left\{ (x, y) \;\middle|\; f(x, y) = i \text{ and } \big( f(x + \Delta x, y + \Delta y) = j \text{ or } f(x - \Delta x, y - \Delta y) = j \big) \right\}, \quad x, y = 0, 1, \ldots, N - 1    (2)

where p(i, j) is an element of the GLCM, x and y are the coordinates of a pixel, f(x, y) is the gray-scale value of that pixel, and #{Ω} denotes the number of elements in the set. Four properties are computed from the GLCM (Entropy, Energy, Contrast and Correlation); these are selected as the texture features, together with the mean value and the standard deviation of the gray-scale values of all pixels within the droplet.

Another significant property of the image is the set of straight lines detected in the droplet: the edges of crystals usually appear as straight lines, while the edges of precipitates are usually curves. The Hough Transform is used to detect the straight lines, and two values are taken as the geometry features, as mentioned by Cumbaa et al. [5]: the total length and the maximum length of the lines detected in the image.

Note that the edges of protein crystals are always clear and sharp, while the edges of precipitates are fluffy and smooth. Consequently, when the images are converted from the spatial to the frequency domain, images with precipitates have more energy in the high-frequency components than images with crystals. We select the mean values and the standard deviations of the image energy in four different frequency bands as the frequency features. A sketch of the texture-feature computation follows.
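The sketch below computes the texture part of the feature vector with scikit-image. The (Δx, Δy) offsets of Eq. (2) are expressed as distance/angle pairs, and entropy is computed directly from the normalized GLCM since graycoprops does not provide it; the offset choices are our assumptions.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def texture_features(patch):
    """Texture part of the 16-D vector: contrast, correlation, energy and
    entropy from the GLCM (Eq. 2), plus grey-level mean and std.
    `patch` is the uint8 droplet region."""
    glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    p = glcm.mean(axis=3)[:, :, 0]                 # average over offsets
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return np.array([graycoprops(glcm, 'contrast').mean(),
                     graycoprops(glcm, 'correlation').mean(),
                     graycoprops(glcm, 'energy').mean(),
                     entropy,
                     patch.mean(),
                     patch.std()])
```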

3.3 Classification

The classification proceeds in two steps. In the first step, each image is classified as "Clear" (no substance is generated) or "Not Clear" (something, either precipitates or crystals, is generated). In the second step, the images labeled "Not Clear" are classified into "Precipitate" and "Crystal". The parameters used in the first step are defined as follows:

A1: the number of grids whose gray-scale standard deviation exceeds TC1.
A2: the number of grids whose gray-scale standard deviation exceeds TC2.
A3: the number of grids whose entropy exceeds TC3.
K: the number of connected components detected within the binary image.

TC1, TC2, TC3 and TA1, TA2, TA3 are manually determined thresholds.

First, the image is divided into 30 × 30 pixel grids. If the A1 value of the image exceeds TA1, the image is marked "Not Clear". Otherwise, the image is converted from gray scale to binary by a self-adaptive threshold algorithm, the connected components within the binary image are detected, and K is computed. If K is nonzero, a rectangle is drawn around each connected component and divided into 5 × 5 grids; if the A2 value of any rectangle exceeds TA2, the image is "Not Clear". If K is zero or A2 is smaller than TA2, the segmented image is again divided into 30 × 30 pixel grids; if the A3 value of the image is greater than TA3, the image is "Not Clear". Otherwise, the image is "Clear". The flowchart in Fig. 3 describes the first-step algorithm, and a simplified sketch follows the figure.

Fig. 3. The flowchart of the algorithm used in the first step of the classification
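A simplified, self-contained sketch of this cascade is given below. The threshold values are placeholders (the paper tunes TC1-TC3 and TA1-TA3 by hand), and the connected-component (A2/K) branch is omitted for brevity.

```python
import numpy as np

# Placeholder thresholds; TC1/TC3 and TA1/TA3 are hand-tuned in the paper.
TC1, TC3 = 20.0, 4.0
TA1, TA3 = 3, 3

def grid_stats(img, size=30):
    """Yield (std, entropy) for each size x size grid of a uint8 image."""
    h, w = img.shape
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            g = img[r:r + size, c:c + size]
            p = np.bincount(g.ravel(), minlength=256) / g.size
            ent = -np.sum(p[p > 0] * np.log2(p[p > 0]))
            yield g.std(), ent

def is_clear(img):
    """First-stage cascade of Fig. 3 without the A2/K branch:
    A1 tests grid contrast, A3 tests grid entropy."""
    stds, ents = zip(*grid_stats(img))
    a1 = sum(s > TC1 for s in stds)     # A1: high-contrast grids
    if a1 > TA1:
        return False                    # "Not Clear"
    a3 = sum(e > TC3 for e in ents)     # A3: high-entropy grids
    if a3 > TA3:
        return False
    return True                         # "Clear"
```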

In the second step, a Fisher classifier, trained on a human-labeled learning set, makes the classification. The learning set consists of images with crystals (positive samples) and images with precipitates (negative samples). The 16-dimensional feature vector f is obtained from each image, and a projection vector w is computed such that, when each f is projected onto w, the positive and negative samples are maximally separated. For an image of unknown class, its feature vector f is computed; if f · w exceeds a scalar quantity l, the image is labeled "Crystal", and otherwise "Precipitate", as shown in equation (3). The scalar quantity l can be determined from prior knowledge.

f \cdot w \begin{cases} \ge l & \rightarrow \text{Crystal} \\ < l & \rightarrow \text{Precipitate} \end{cases}    (3)
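A NumPy sketch of the Fisher step follows. The within-class scatter is regularized slightly, since 20 samples in 16 dimensions can make it singular, and the midpoint choice of l is our assumption (the paper determines l from prior knowledge).

```python
import numpy as np

def fisher_train(F_pos, F_neg):
    """Fisher projection w = S_w^{-1}(m_pos - m_neg) separating crystal
    (positive) from precipitate (negative) feature vectors."""
    m_pos, m_neg = F_pos.mean(axis=0), F_neg.mean(axis=0)
    S_w = (np.cov(F_pos, rowvar=False) * (len(F_pos) - 1)
           + np.cov(F_neg, rowvar=False) * (len(F_neg) - 1))
    S_w += 1e-6 * np.eye(S_w.shape[0])     # regularize: 20 samples, 16 dims
    w = np.linalg.solve(S_w, m_pos - m_neg)
    l = 0.5 * (m_pos @ w + m_neg @ w)      # midpoint threshold (our choice)
    return w, l

def classify(f, w, l):
    """Eq. (3): threshold the projection f . w against l."""
    return "Crystal" if f @ w >= l else "Precipitate"
```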

4 Experimental Results

The learning set is formed by 10 images with crystals and 10 images with precipitates. The testing set is a combination of 52 "Clear" images, 12 "Precipitate" images and 46 "Crystal" images. The experiment was performed on a PC with the Windows XP operating system and an AMD 2500+ CPU. With the projection vector w derived from the learning set, we achieve the results shown in Table 1.

Table 1. Result of the experiment

True \ Detected    | "Clear"    | "Precipitate" | "Crystal"
"Clear" (52)       | 82.7% (43) | 1.9% (1)      | 15.3% (8)
"Precipitate" (12) | 8.3% (1)   | 58.3% (7)     | 33.3% (4)
"Crystal" (46)     | 2.2% (1)   | 13.0% (6)     | 84.8% (39)

Typical images processed in the experiment are shown in Fig. 4.

Fig. 4. Typical images processed in the experiment. (a), (b) and (c) are classified correctly, where (a) is "Clear", (b) is "Crystal", and (c) is "Precipitate"; (d): a "Clear" image classified as "Crystal" due to the light reflection shown in the block; (e): an image containing grainy crystals falsely classified as "Precipitate".

5 Conclusion

The algorithm proposed in this paper proves effective and efficient: 84.8% of the "Crystal" images were recognized correctly in the experiment. To increase the accuracy rate, new features should be considered, for example those suggested by Bern et al. [6]: corners, transparency and closed outer contours. Besides the DFT used in our algorithm, the wavelet transform could also be utilized to obtain more information. Finally, although images with crystals can be differentiated from those with precipitates, the capability and quality of each protein crystallization trial remain unknown; the number and size of the crystals generated need to be studied in future work to evaluate the performance of each trial.

Acknowledgement

This work is partly supported by the National Hi-Tech Research and Development Program under grant 2005AA420010.

References

1. Abola, E., Kuhn, P., Earnest, T., Stevens, R.: Automation of X-ray Crystallography. Nature Structural Biology, 7 (2000) 973-977
2. Wilson, J.: Towards the Automatic Evaluation of Crystallization Trials. Acta Crystallographica D, Vol. 58 (2002) 1907-1914
3. Spraggon, G., Lesley, S.A., Kreusch, A., Prestle, J.P.: Computational Analysis of Crystallization Trials. Acta Crystallographica D, Vol. 58 (2002) 1915-1923
4. Jurisica, I., Rogers, P., Glasgow, J.I., Fortier, S., Luft, J.R., Woilfley, J.R.: Intelligent Support for Protein Crystal Growth. IBM Systems Journal, Vol. 40, No. 2 (2001) 394-409
5. Cumbaa, C.A., Lauricella, A., Fehrman, N., Veatch, C.: Automatic Classification of Sub-microlitre Protein-crystallization Trials in 1536-well Plates. Acta Crystallographica D, Vol. 59 (2003) 1619-1627
6. Bern, M., Goldberg, D., Stevence, R.C., Kuhn, P.: Automatic Classification of Protein Crystallization Images Using a Curve-tracking Algorithm. Journal of Applied Crystallography, Vol. 37 (2004) 279-287
7. Otsu, N.: A Threshold Selection Method from Gray-level Histograms. IEEE Trans. Systems, Man and Cybernetics, Vol. 9, No. 1 (1979) 62-66
8. Canny, J.: A Computational Approach to Edge Detection. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 8, No. 6 (1986) 679-698
9. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active Contour Models. International Journal of Computer Vision, Vol. 1, No. 4 (1987) 321-330
10. Haralick, R., Shanmugam, K., Dinstein, I.: Textural Features for Image Classification. IEEE Trans. Systems, Man and Cybernetics, Vol. SMC-3, No. 6 (1973) 610-621

Inherit-Based Adaptive Frame Selection for Fast Multi-frame Motion Estimation in H.264

Liangbao Jiao1,3, De Zhang1,2, and Houjie Bi2

1 Institute of Acoustics, State Key Lab of Modern Acoustics, Nanjing University, 210093
2 Institute of Communication Technique, Nanjing University, 210093
3 Nanjing Institute of Technology, 210000

Abstract. H.264 allows motion estimation to be performed over multiple reference frames and seven block modes. This new feature significantly improves the prediction accuracy of inter-coded blocks, but it is extremely computationally intensive, because the complexity of multi-frame motion estimation grows quickly with the number of reference frames. Moreover, the distortion gain contributed by each reference frame in the various modes is correlated, so scanning all candidate frames in all seven modes is inefficient. In this paper, a novel inherit-based adaptive frame selection method is proposed to reduce the complexity of the multi-frame motion estimation process. A new reference list for the ME (Motion Estimation) of a lower-level mode is constructed adaptively from the ME results of the upper-level mode. Simulation results show that the proposed method saves about 15% to 50% of the computation while achieving almost the same rate-distortion performance as a full scan.

1 Introduction

H.264/MPEG-4 AVC [1] is the latest video coding standard developed by the Joint Video Team (JVT). One of its significant advantages is high compression efficiency: it can halve the bit rate compared with H.263 [2]. This improvement in compression efficiency comes at the cost of increased computation and complexity. Because of the extensive use of inter-frame coding, up to 80% of an encoder's computational power is consumed by ME [3]. To reduce the computation, block matching algorithms (BMA) are generally adopted in ME; the new three-step search [4], four-step search [5], diamond search [6], and cross-diamond search [7] are some of the fast BMAs.

In H.264, motion estimation may search over multiple reference frames to further reduce temporal redundancy. This adds computational load as the number of reference frames increases, and the cost of motion estimation comes to dominate the complexity of the video codec. This is a particular challenge for mobile computing devices with limited computing power, so a faster motion estimation strategy is urgently required in H.264.

Early work [8, 9] on reference frame selection aimed at optimizing streaming quality. A novel frame selection method was proposed in [10] to speed up the multi-frame motion estimation in H.264: based on the center-biased MVP distribution characteristic of real-world sequences, a center-biased frame selection path is applied to efficiently locate an ultimate frame. The simulations in [10] show that the computation cost is significantly reduced, but the rate-distortion performance may drop because large parts of the search area are skipped. The main focus of this paper, by contrast, is the linear growth of the ME computation cost with the number of reference frames: we aim to decrease the computation cost while maintaining the RD performance, even in the worst case.

In this paper, we present a simple and effective method to reduce the computational cost without significant quality degradation in H.264. Except for ME in the 16×16 mode, only part of the reference frames are scanned in the other modes; the new reference frame list is constructed adaptively according to the ME results of the upper-level mode. In Section 2, we analyze the distribution of the lowest ME cost among the reference frames and the ME cost correlation between upper- and lower-level modes; the results motivate the proposed method. In Section 3, the inherit-based adaptive frame selection algorithm is described in detail. Simulation results are given in Section 4, and conclusions follow.

2 Analysis and Observations

In the JM (JVT Model) software, RD optimization in ME searches for the lowest ME cost attainable when encoding a macroblock. ME may thus be computed in seven inter-frame modes (16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4) and in intra-frame mode on all reference frames. For each macroblock, a total of 16×7 = 112 inter-frame ME passes would be needed if 16 reference frames were used, which is a large computational cost. It is therefore worth analyzing the cost contributed by each reference frame, and the relationship in ME cost between a macroblock (MB) and the co-located blocks of the different modes.

Generally, the temporal redundancy between two frames increases as they become closer in time. Among the candidate reference frames, the three frames closest to the current frame therefore have a higher probability of yielding the least ME cost than the later frames. Table 1 shows the probability of each reference frame position yielding the least ME cost when six QCIF (176×144) sequences are coded in all seven modes with the number of reference frames set to 16. The sequences are Foreman, Mother&Daughter, Hall_Monitor, News, Carphone and Container, represented by the numbers 1-6 in the tables of this paper.

Table 1. The probability distribution of the least ME cost

Reference frame Pos | 1st (%) | 2nd (%) | 3rd (%) | Total (%)

Table 2. The accordant probability between correlative code modes (columns 1-6 are the test sequences)

Up level mode | Low level mode | 1     | 2     | 3     | 4     | 5     | 6
16×16         | 16×8           | 67.13 | 75.26 | 80.86 | 85.19 | 64.59 | 77.59
16×16         | 8×16           | 65.32 | 73.12 | 80.49 | 82.91 | 63.24 | 78.61
16×16         | 8×8            | 54.24 | 67.45 | 77.67 | 80.04 | 51.74 | 72.90
8×8           | 8×4            | 83.70 | 83.98 | 89.34 | 88.29 | 81.37 | 90.09
8×8           | 4×8            | 77.78 | 83.58 | 91.85 | 88.59 | 78.47 | 89.76
8×8           | 4×4            | 74.15 | 80.51 | 90.41 | 86.73 | 74.12 | 87.91
Average       |                | 70.39 | 77.32 | 85.10 | 85.29 | 68.92 | 82.81

From Table 1 it can be seen that the average probability of the latest three reference frames yielding the least ME cost is more than 75%.

Secondly, in the ME of a macroblock in the JM software, the 16×16 mode is scanned first, then the 16×8 and 8×16 modes, and finally the 8×8 mode and its sub-modes. In general, the block processed in a lower-level mode is part of the block processed in the upper-level mode, so the ME result of the upper-level mode can guide the ME of the lower-level modes. For example, the reference frame that has the least cost in the ME of the 16×16 mode is likely to have the least cost in the 8×8 mode as well. Table 2 shows the probability that the reference frame with the lowest ME cost agrees between correlated coding modes, for the six QCIF sequences coded with 16 reference frames. The agreement probability between the 16×16 and 16×8 modes is 67.13% for the Foreman sequence, and 89.34% between the 8×8 and 8×4 modes for the Hall_Monitor sequence.

3 Proposed Scheme

Designing an H.264 encoder that searches exhaustively at any cost is inefficient; a fast algorithm is needed that keeps the RD performance while reducing complexity. On the basis of the analysis in Section 2, a sufficiently high agreement probability for the ME result of each mode is guaranteed even when only the latest three reference frames are processed. In addition, an optimized reference frame list can be constructed adaptively from the frames with the lowest cost in the upper-level mode's ME. These two observations form the basis of the inherit-based adaptive frame selection algorithm proposed in this paper.

In the algorithm, a new optimized reference list is constructed for each mode before ME, inherited as follows. For the 16×16 mode, all candidate reference frames are used, because this ME result seeds the reference lists of all other modes. For the 16×8 and 8×16 modes, the new reference list (named Reflist_1 in the algorithm) contains the 1st, 2nd and 3rd reference frames plus the three frames with the least ME cost in the 16×16 mode.

Fig. 1. 8×4 and 4×8 mode frame selection reference

Fig. 2. The ME time using different reference frame numbers for the Mother&Daughter sequence

For the 8×8 mode, the new reference list contains not only all the frames in Reflist_1 but also the 4th and 5th reference frames, because the ME result of the 8×8 mode is used in the reference list construction of the 8×4, 4×8 and 4×4 modes. The reference list construction for the 8×4 and 4×8 modes is somewhat more involved: it contains the 1st, 2nd and 3rd reference frames, the three frames with the least cost in the ME of the 8×8 mode, and the two frames with the least cost in the ME of the 16×8 (or 8×16) mode. The relation between an 8×4 (4×8) block and a 16×8 (8×16) block in reference frame selection is shown in Figure 1: the reference lists of blocks 1-4 (8×4 or 4×8 mode) are constructed from the ME result of block A (16×8 or 8×16 mode), and those of blocks 5-8 from the ME result of block B. For the 4×4 mode, the 1st, 2nd and 3rd frames and the three frames with the least ME cost in the 8×8 mode form the new reference list. A sketch of the list construction follows.
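A compact sketch of the list construction is given below. The frame indices and the cost-dictionary representation are illustrative assumptions; index 0 denotes the most recent reference frame.

```python
def build_reference_lists(cost16x16, cost16x8, cost8x16, cost8x8):
    """Inherit-based list construction. Each cost argument maps a
    reference-frame index to the ME cost of the already-searched
    upper-level mode; indices 0, 1, 2 are the three latest frames."""
    def best(costs, n):
        return sorted(costs, key=costs.get)[:n]

    latest3 = [0, 1, 2]
    # 16x8 / 8x16: latest three frames + best three frames of 16x16 ME.
    reflist_1 = sorted(set(latest3 + best(cost16x16, 3)))
    # 8x8: Reflist_1 plus the 4th and 5th frames, since its ME result
    # seeds the sub-mode lists below.
    reflist_8x8 = sorted(set(reflist_1 + [3, 4]))
    # 8x4 / 4x8 (per block pair, cf. Fig. 1): latest three + best three
    # of 8x8 ME + best two of the co-located 16x8 (or 8x16) ME.
    reflist_8x4 = sorted(set(latest3 + best(cost8x8, 3) + best(cost16x8, 2)))
    reflist_4x8 = sorted(set(latest3 + best(cost8x8, 3) + best(cost8x16, 2)))
    # 4x4: latest three + best three of 8x8 ME.
    reflist_4x4 = sorted(set(latest3 + best(cost8x8, 3)))
    return reflist_1, reflist_8x8, reflist_8x4, reflist_4x8, reflist_4x4
```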

4 Simulation Results

In the simulation, the six sequences above are used from frame 0 to 299 (300 frames in total). All-P-frame coding and fast ME (UseFME=1) are adopted for convenience, and the search range is set to 16. To compare computation and RD performance, both the original JM8.6 and the JM8.6 modified with the proposed algorithm are used to encode the sequences.

In traditional real-time or wireless applications, no more than 10 (normally 5) reference frames are used, because the computation cost grows quickly with the number of reference frames. Figure 2 shows the ME time versus the number of reference frames for the traditional and modified algorithms on the Mother&Daughter sequence. With the improved JM8.6, the ME time increases only slowly as the number of reference frames grows, so more reference frames can be used to ensure high RD quality. For a detailed comparison, the METime (the time used in ME) and the RD performance (bit rate and luminance PSNR) of the six QCIF (176×144) sequences are shown in Table 3; each value is averaged over QP (quantization parameter) values from 25 to 34. Table 3 shows that, compared with the traditional JM8.6 with 5 reference frames, the modified JM8.6 with 10 reference frames saves more than 10% of the METime while the RD performance improves: averaged over the six QCIF sequences, PSNR gains 0.03 dB and the bit rate is reduced by about 1%.

Table 3. RD performance and ME time of different coding methods

New Method with 10 reference frames
Test sequence | SNRY (dB) | BitRate (kbit/s @30Hz) | METime (s)
1   | 34.91 | 120.14 | 63.16
2   | 36.35 | 37.19  | 39.08
3   | 36.17 | 45.30  | 28.56
4   | 35.50 | 65.15  | 35.85
5   | 35.74 | 122.99 | 52.12
6   | 34.81 | 33.34  | 31.43
Ave | 35.58 | 70.68  | 41.70

Traditional JM8.6 with 5 reference frames
Test sequence | SNRY (dB) | BitRate (kbit/s @30Hz) | METime (s)
1   | 34.87 | 121.25 | 68.17
2   | 36.31 | 37.65  | 44.20
3   | 36.16 | 45.27  | 35.36
4   | 35.49 | 65.25  | 41.78
5   | 35.68 | 124.06 | 59.39
6   | 34.78 | 34.13  | 37.97
Ave | 35.55 | 71.27  | 47.81

New Method with 16 reference frames
Test sequence | SNRY (dB) | BitRate (kbit/s @30Hz) | METime (s)
1   | 34.95 | 118.87 | 75.18
2   | 36.37 | 36.91  | 45.69
3   | 36.19 | 45.38  | 33.44
4   | 35.51 | 65.03  | 42.58
5   | 35.77 | 122.42 | 60.55
6   | 34.81 | 33.39  | 37.59
Ave | 35.60 | 70.68  | 49.17

Traditional JM8.6 with 10 reference frames
Test sequence | SNRY (dB) | BitRate (kbit/s @30Hz) | METime (s)
1   | 34.95 | 119.75 | 146.07
2   | 36.36 | 37.10  | 89.94
3   | 36.17 | 45.27  | 67.22
4   | 35.51 | 65.03  | 83.71
5   | 35.78 | 122.64 | 139.10
6   | 34.82 | 33.23  | 70.85
Ave | 35.60 | 70.50  | 99.48

Figure 3 shows the RD curves of the traditional JM8.6 with 5 reference frames and the improved JM8.6 with 10 reference frames for the Carphone sequence; the RD performance of the proposed algorithm is better than that of the traditional one.

Fig. 3. The RD curves of the traditional JM8.6 with 5 reference frames and the improved JM8.6 with 10 reference frames for the Carphone sequence

Fig. 4. The RD curves of the traditional JM8.6 with 10 reference frames and the improved JM8.6 with 16 reference frames for the Mother&Daughter sequence

As is well known, the RD performance can be enhanced by using more reference frames, for example, 10 reference frames in the traditional JM8.6. These simulation results are shown in Table 3 as well. However, the METime is then nearly double that with 5 reference frames. To keep the same RD performance, 16 reference frames should be used in the modified JM8.6, yet the METime is only 50% of that with 10 reference frames in the traditional JM8.6. Figure 4 shows the RD curves of the traditional JM8.6 with 10 reference frames and the improved JM8.6 with 16 reference frames for the Mother&Daughter sequence, which demonstrates that the RD performance of the modified JM8.6 with 16 reference frames is almost the same as that of the traditional JM8.6 with 10 reference frames. It should be noted that the new algorithm consumes more memory. However, for mobile equipment the significant decrease in computation is worth more than the increase in memory. Moreover, compared with the motion estimation itself, the computation cost of the sorting is so low that it can be neglected.

5 Conclusion

In this paper, a novel reference frame selection method is proposed to speed up multi-frame motion estimation in H.264. Based on the distribution of the ME cost over different reference frames and the inheritance of the least-cost reference frames among the seven ME modes, an adaptive reference frame selection method is adopted to construct a new reference list for each ME mode, which saves a large amount of ME time. Simulations verify that, when many reference frames are used (e.g., 16), more than 50% of the ME time is saved while the RD performance is retained. The new algorithm also solves the problem that the ME computation cost increases quickly as the number of reference frames grows. The proposed algorithm is highly suitable for real-time video-conferencing applications and mobile equipment.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant No. 10234060).

References

1. Joint Video Team of ITU-T and ISO/IEC JTC 1: Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, JVT-G050 (2003)
2. Girod, B., Flierl, M.: Multi-Frame Motion-Compensated Video Compression for the Digital Set-Top Box, Proc. IEEE ICIP (2002)
3. Pirsch, P., Demassieux, N., Gehrke, W.: VLSI Architectures for Video Compression - A Survey, Proc. IEEE 83 (1995) 220-246
4. Li, R., Zeng, B., Liou, M.L.: A New Three-Step Search Algorithm for Block Motion Estimation, IEEE Trans. Circuits System Video Technology 4 (1994) 438-443
5. Po, L.M., Ma, W.C.: A Novel Four-Step Search Algorithm for Fast Block Motion Estimation, IEEE Trans. Circuits System Video Technology 6 (1996) 313-317
6. Tham, J.Y., Ranganath, S., Ranganath, M., Kassim, A.A.: A Novel Unrestricted Center-Biased Diamond Search Algorithm for Block Motion Estimation, IEEE Trans. Circuits System Video Technology 8 (1998) 369-377
7. Cheung, C.H., Po, L.M.: A Novel Small-Cross Diamond Search Algorithm for Fast Video Coding and Video Conferencing Applications, Proc. IEEE ICIP (2002)


8. Wiegand, T., Farber, N., Girod, B.: Error-Resilient Video Transmission Using Long-Term Memory Motion-Compensated Prediction, IEEE J. Select. Areas Comm. 18 (2000) 1050-1062
9. Liang, Y., Flierl, M., Girod, B.: Low-Latency Video Transmission over Lossy Packet Networks Using Rate-Distortion Optimized Reference Picture Selection, IEEE International Conference on Image Processing, Rochester, NY (2002)
10. Ting, C.W., Po, L.M., Cheung, C.H.: Center-Biased Frame Selection Algorithms for Fast Multi-Frame Motion Estimation in H.264, Proceedings of the 2003 IEEE International Conference on Neural Networks and Signal Processing, Nanjing, China (2003) 1258-1261

Intelligent Analysis of Anatomical Shape Using Multi-sensory Interface Jeong-Sik Kim, Hyun-Joong Kim, and Soo-Mi Choi School of Computer Engineering, Sejong University, Seoul, Korea [emailprotected]

Abstract. This paper presents a method for intelligent shape analysis of the hippocampus in the human brain using a multi-sensory interface. To analyze the shape difference between two groups of hippocampi, we first extract quantitative shape features from the input images, and then perform statistical shape analysis using a parametric representation and the Support Vector Machine (SVM) learning algorithm. Results suggest that the presented shape representation and a polynomial-kernel SVM can effectively discriminate between normal controls and epilepsy patients. To provide a more immersive and realistic analysis environment, we combined a stereoscopic display with a 6-DOF force-feedback haptic device. The presented multi-sensory environment improves space and depth perception and provides users with touch feedback while making it easier to manipulate 3D objects.

1 Introduction

Typically, image-based statistical studies of morphology have been based on simple measurements of size, area and volume. Shape-based intelligent analysis can provide much more detailed descriptions of morphological changes and can minimize the need for expert intervention. Thus, users with insufficient knowledge of anatomy can easily understand the morphological changes when comparing patients with normal controls. For instance, it is known that an abnormal shape of the hippocampus is associated with neurological diseases such as epilepsy, schizophrenia, and Alzheimer's disease. In order to estimate shape deformation of the hippocampus by computer, it is essential to select an efficient shape representation scheme. Then, a powerful classifier is used to discriminate the patient group from the normal one. It is difficult for a user to perceive real spatial and haptic (sense of touch) effects because anatomical structures in a virtual scene are usually represented visually in 2D or 2.5D. For a long time, the haptic modality was considered inferior to the visual modality in terms of perceptual accuracy [1]. "Co-location" is a term used to describe a haptic and a visual display that share the same coordinate system. Although results suggest that a co-located display offers no significant advantage over a traditional 2D mouse interface held to one side of the body in a translational positioning task, co-location of the hand and virtual workspace improved performance in tasks involving object rotation [2]. Therefore, a multi-sensory interface is very useful for interactive medical applications.


In our work, we develop a method for intelligent shape analysis using a parametric representation and an SVM algorithm. For better understanding and improved manipulation of the anatomical structure, we construct and experiment with a multi-sensory virtual environment using a haptic device and a stereoscopic display.

2 Related Work

Intelligent shape analysis based on statistical models has been used to diagnose and treat diseases of 3D human organs extracted from medical imaging data sets. Zhu [3] introduced a parametric modeling method for the lateral ventricle in statistical shape analysis, and Styner [4] proposed an approach applying the SPHARM (spherical harmonics) representation to 3D shape analysis. PCA (Principal Components Analysis) is the most commonly used algorithm for separating two groups. PCA reduces the dimensionality of the shape representation space and can be used for the binary classification problem, but it is of limited use for constructing an efficient maximum-likelihood classifier because of the small sample sizes involved. In recent years, artificial neural networks and SVM-based classifiers have been used in statistical shape analysis. In particular, the SVM has turned out to be a powerful classifier, since it is guaranteed to converge to an optimal solution even for a small set of training samples. Research on estimating the performance of multi-sensory interfaces combining haptic and visual information in virtual environments has generally focused on time and accuracy. Basdogan [5] introduced an auto-stereoscopic and haptic visualization method for spatial exploration and task design. In particular, he built a multi-modality environment in which one can touch and manipulate a virtual object. This method has the advantage of using a non-invasive auto-stereoscopic display as a substitute for a shutter-glasses-based stereo display, but it reported no significant comparison results for the performance of the co-located interface. Wall [6] investigated the effect of haptic feedback and stereo graphics in a 3D target acquisition task. The equipment consisted of a Reachin Developer Display with a PHANToM haptic feedback device equipped with the instrumented stylus. As a result, haptic feedback improved the subjects' accuracy, but did not improve the time taken to reach the target.

3 Intelligent Shape Analysis Using a Multi-sensory Interface

Generally, an SVM-based shape analysis method consists of three main steps. First, we extract quantitative shape features from the medical data set. These can be used to create a generative model capturing the variation in the sample data set, or to build a discriminative model for classification between two groups; we focus on the latter. Fig. 1 shows the overall procedure for 3D shape analysis. Initially, we build parametric models using a PDM (Point Distribution Modeling) method from a set of mesh models. We then construct two average models, each representing one shape group statistically. Finally, we execute a classification task based on an SVM classifier. The procedure for creating a parametric model consists of five steps. First, we find the center of mass of the model and the principal axes of its surface points. Then we create an initial super-ellipsoid and triangulate it. Here, we use a single 3D blob


element as suggested in [7]. After the triangulation, we map the mesh vertices to FEM nodes to achieve physics-based shape deformation. The mode shape vectors form an orthogonal object-centered coordinate system for describing feature locations. We transform these into nodal displacement vectors and iteratively compute the new positions of the points on the deformable model using 3D Gaussian interpolation functions. We obtain the final deformed positions when the energy value falls below a threshold or the iteration count exceeds a threshold.

Fig. 1. Overall procedure for SVM-based intelligent shape analysis

Once the feature vectors are extracted from the parametric model, they can be used to analyze the shape differences between populations, in our case normal controls and epilepsy patients. In this section, we briefly describe our approach based on a discriminative modeling method using the SVM [8]. First, we train a classifier for labeling new examples into one of the two groups. Each training data set is composed of the coordinates of the deformable meshes. We then extract an explicit description of the differences between the two groups captured by the classifier. This method detects statistical differences between two populations. In order to acquire the optimal solution, it is important to select a good classifier function. The SVM is known to be robust and free from the over-fitting problem. Given a training data set $\{(\mathbf{x}_k, y_k),\ 1 \le k \le n\}$, where the $\mathbf{x}_k$ are observations and the $y_k$ are the corresponding groups, and a kernel function $K : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, the SVM classification function is

$$y(\mathbf{x}) = \sum_{k=1}^{n} a_k y_k K(\mathbf{x}, \mathbf{x}_k) + b. \qquad (1)$$

The coefficients $a_k$ and $b$ are determined by solving a quadratic optimization problem constructed by maximizing the margin between the two classes. For non-linear classification, we employ the commonly used polynomial kernel $K(\mathbf{x}, \mathbf{x}_k) = (\mathbf{x} \cdot \mathbf{x}_k + 1)^d$, where the kernel $K$ of two objects $\mathbf{x}$ and $\mathbf{x}_k$ is the inner product of their vectors in the feature space and the parameter $d$ is the degree of the polynomial. In order to estimate the accuracy of the resulting classifier and decide the optimal parameters in


the non-linear case, we use cross-validation. To obtain the error, recall, and precision, we evaluate the performance of three types of SVM kernels (polynomial, RBF, sigmoid) and the linear case; results are described in Section 4. In order to support efficient user interaction in the virtual environment, we design and implement a multi-modality interface integrating a stereo graphics display with haptic feedback. Fig. 2 shows the hardware and software setup of the multi-sensory interface.
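As a concrete illustration of the classifier just described, the sketch below trains polynomial-kernel SVMs of several degrees and scores them with cross-validation; scikit-learn stands in for the authors' implementation, and the feature matrix, labels and degrees are placeholders.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 50))    # 80 hypothetical shape-feature vectors
y = np.repeat([0, 1], 40)        # 0 = normal control, 1 = epilepsy patient

# K(x, xk) = (x . xk + 1)^d, matching the polynomial kernel in the text;
# the degree d is chosen by cross-validation
for d in (2, 3, 4):
    clf = SVC(kernel="poly", degree=d, gamma=1.0, coef0=1.0)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"degree {d}: mean CV accuracy = {scores.mean():.3f}")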

Fig. 2. Overview of the co-located interface: (left) software setup; (right) hardware setup

In our work, we use a haptic device to touch virtual objects, to feel the material of an object's surface, and to manipulate the grasped object. We choose the point-based haptic rendering technique. In point-based haptic rendering, the tip point of the end-effector, the HIP (Haptic Interface Point), is digitized via encoders and used to detect collisions with virtual objects. A reaction force is then calculated based on the depth of penetration and reflected to the user through the haptic device. We used a PHANToM haptic device and the OpenHaptics library to compute the force model. Consequently, a user can control a point object to touch an object using the stylus handle, and then touch and manipulate a static scene object [9]. The separation distance of the human eyes is about 65 mm. Because of the binocular parallax created by this distance, our brain perceives two different 2D images of a single object, and it builds a sense of perspective by fusing these images. Using this principle, we can simulate our optical system in a computer application. In our work, we set up a stereoscopic hardware environment using a CRT monitor and special glasses synchronized with the monitor by an infrared emitter. Additionally, we developed a software module for stereoscopic rendering using the OpenGL library. First, we create views for the left and right eye. Then, we control how each rendering is displayed to the user to create the desired stereoscopic effect. To generate two slightly different view frustums of the same scene, we use two perspective cameras, one for each eye. The main issue here is how to set up these two different frustums given the binocular parallax. A "toe-in" method creates viewing frustums that point each eye toward a single focus; this is not a good solution for accurate perspective because the two frustums do not match. We therefore use the "asymmetric frustum perspective projection" method, in which the two frustums generated for the eyes are


asymmetric. Finally, we display the stereo images by rendering the two different images to separate hardware buffers using the final frustums [10, 11].
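A minimal sketch of the asymmetric-frustum computation described above; the eye separation, focal distance and field of view are assumed values, and the returned tuple maps directly onto OpenGL's glFrustum parameters.

import math

def stereo_frustum(eye, fov_y_deg=45.0, aspect=4/3, near=0.1, far=100.0,
                   eye_sep=0.065, focal=2.0):
    # eye = -1 for the left eye, +1 for the right eye. Both frustums share
    # a focal plane at distance `focal`, which makes them asymmetric,
    # unlike the "toe-in" method criticized in the text.
    top = near * math.tan(math.radians(fov_y_deg) / 2.0)
    half_w = top * aspect
    shift = (eye_sep / 2.0) * near / focal   # horizontal frustum shift
    left = -half_w - eye * shift
    right = half_w - eye * shift
    return left, right, -top, top, near, far

print(stereo_frustum(eye=-1))   # left-eye frustum parameters
print(stereo_frustum(eye=+1))   # right-eye frustum parameters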

4 Experimental Results

In our experiments, we estimate the performance of the SVM-based classifier and of the parametric modeling method, and we also investigate the effect of the multi-sensory interface consisting of a haptic device and a stereo rendering display. Initially, we collected two template 3D models (normal controls and epileptic patients) from real MRI data. We also generated 80 deformed models using a modeling tool in order to assess the capability of our deformable modeling method.

Fig. 3. The result of the training test using SVM for four types of kernels

Fig. 4. Shape analysis using the multi-sensory interface: (left) screenshot of the shape analysis for hippocampi; (right) result of stereo rendering

We used 3D parametric, deformable meshes as shape features. To implement the non-linear classifier (SVM), we tested three types of kernels: RBF, polynomial, and sigmoid functions. We also adopted the cross-validation (CV) technique in order to overcome the problem of small training sets. In our experiment, we tested four conditions: 1) sequential without CV, 2) sequential with CV, 3) randomized


without CV, 4) randomized with CV. As a result, we found that the polynomial kernel shows the best performance (Fig. 3). In our experiment, we used the multi-sensory interface to control the camera view and the virtual objects (3D shape and octree). In order to validate the quantitative result of the shape difference, a user wears the stereo glasses and grasps the haptic stylus handle; one can then change the camera viewpoint, manipulate the object (translating, rotating), and pick and select octree sub-spaces using the haptic device with stereo visual cues. Fig. 4 shows the result of the shape analysis using our interface.

5 Conclusion

In this paper, we presented a framework for intelligent 3D shape analysis based on an SVM classifier and on the shape differences between a normal group and an epilepsy patient group. We also set up a multi-sensory interface and investigated its effect in analyzing the qualitative and quantitative results of the shape analysis using a haptic device and a stereo display. Our parametric modeling method is effective for constructing a statistical model from 3D model data, and the SVM classifier with a polynomial kernel shows good performance in discriminating the two groups. The multi-sensory interface, using haptic feedback and stereo visual cues, provides an immersive sense of depth for exploring the shape, so a user can explore and manipulate objects in the virtual environment intuitively.

Acknowledgments This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2005-205-D00105).

References

1. Loftin, R.B.: Multisensory Perception: Beyond the Visual in Visualization. Computing in Science & Eng., Vol. 5 (2003) 56-58
2. Wall, S.A., Harwin, W.S.: Quantification of the Effects of Haptic Feedback during a Motor Skills Task in a Simulated Environment. In Proc. of the 2nd PHANToM Users Research Symposium (2000) 61-69
3. Zhu, L., Jiang, T.: Parameterization of 3D Brain Structures for Statistical Shape Analysis. In Proc. of SPIE Medical Imaging, Vol. 5370 (2004) 1254-1262
4. Styner, M., Gerig, G.: Statistical Shape Analysis of Neuro-anatomical Structures Based on Medial Models. Medical Image Analysis, Vol. 7 (2003) 207-220
5. Basdogan, C., et al.: Autostereoscopic and Haptic Visualization for Space Exploration and Mission Design. IEEE Virtual Reality Conference (2002) 271-276
6. Wall, S.A., et al.: The Effect of Haptic Feedback and Stereo Graphics in a 3D Target Acquisition Task. Proc. of Eurohaptics (2002) 23-29
7. Choi, S.M., et al.: Shape Reconstruction from Partially Missing Data in Modal Space. Computers & Graphics, Vol. 26 (2002) 701-708

Modeling Expressive Music Performance in Bassoon Audio Recordings Rafael Ramirez, Emilia Gomez, Veronica Vicente, Montserrat Puiggros, Amaury Hazan, and Esteban Maestre Music Technology Group Pompeu Fabra University Ocata 1, 08003 Barcelona, Spain Tel:+34 935422165, Fax:+34 935422202 {rafael,vicente,puiggross,hazan,maestre,gomez}@iua.upf.es

Abstract. In this paper, we describe an approach to inducing an expressive music performance model from a set of audio recordings of XVIII century bassoon pieces. We use a melodic transcription system which extracts a set of acoustic features from the recordings, producing a melodic representation of the expressive performance played by the musician. We apply machine learning techniques to this representation in order to induce a model of expressive performance. We use the model both for understanding and for generating expressive music performances.

1 Introduction

Expressive performance is an important issue in music which has been studied from different perspectives (e.g. [2]). The main approaches to the empirical study of expressive performance have been based on statistical analysis (e.g. [11]), mathematical modelling (e.g. [13]), and analysis-by-synthesis (e.g. [1]). In all these approaches, it is a person who is responsible for devising a theory or mathematical model which captures different aspects of musical expressive performance. The theory or model is later tested on real performance data in order to determine its accuracy. In this paper we describe an approach to investigating musical expressive performance based on machine learning [7]. Instead of manually modelling expressive performance and testing the model on real musical data, we let a computer use an inductive logic programming algorithm to automatically discover regularities and performance principles from real performance data (i.e. bassoon audio performances). The rest of the paper is organized as follows: Section 2 describes how the acoustic features are extracted from the monophonic recordings. In Section 3 our approach to learning rules of expressive music performance is described. Section 4 reports on related work, and finally Section 5 presents some conclusions and indicates some areas of future research.

2 Melodic Description

In order to obtain a symbolic description of the expressive audio recordings, we compute descriptors related to two different temporal scopes: some related to an analysis frame, and some related to a note segment. Firstly, we divide the audio signal into analysis frames, and a set of low-level descriptors is computed for each analysis frame. Then, we perform note segmentation using the low-level descriptor values. Once the note boundaries are known, the note descriptors are computed from the low-level and fundamental frequency values. The main low-level descriptors we use to characterize expressive performance are instantaneous energy and fundamental frequency. Energy is computed in the spectral domain, using the values of the amplitude spectrum. For the estimation of the instantaneous fundamental frequency we use a harmonic matching model, the Two-Way Mismatch procedure (TWM) [5]. First of all, we perform a spectral analysis of a portion of sound, called an analysis frame. Secondly, the prominent spectral peaks are detected from the spectrum magnitude; these peaks are defined as the local maxima of the spectrum whose magnitude is greater than a threshold. The spectral peaks are compared to a harmonic series, and a TWM error is computed for each fundamental frequency candidate. The candidate with the minimum error is chosen as the fundamental frequency estimate. Note segmentation is performed using a set of frame descriptors: the energy computed in different frequency bands and the fundamental frequency. Energy onsets are first detected following a band-wise algorithm that uses some psycho-acoustical knowledge [3]. In a second step, fundamental frequency transitions are also detected. Finally, both results are merged to obtain the note boundaries. We compute the note descriptors using the note boundaries and the low-level descriptor values. The low-level descriptors associated with a note segment are computed by averaging the frame values within the segment. Pitch histograms have been used to compute the pitch and the fundamental frequency that represent each note segment, as in [6]. This is done to avoid taking mistaken frames into account in the fundamental frequency mean computation.
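A minimal sketch of the frame-level analysis just described: per-frame energy from the amplitude spectrum, and prominent spectral peaks as thresholded local maxima. The frame/hop sizes and the threshold are assumed values, and the TWM error computation of [5] itself is omitted.

import numpy as np

def frame_descriptors(signal, sr, frame=1024, hop=512, peak_db=-60.0):
    window = np.hanning(frame)
    results = []
    for start in range(0, len(signal) - frame, hop):
        spec = np.abs(np.fft.rfft(signal[start:start + frame] * window))
        energy = float(np.sum(spec ** 2))     # energy from the amplitude spectrum
        mag_db = 20 * np.log10(spec + 1e-12)
        # prominent peaks: local maxima of the magnitude above a threshold
        peaks = [k for k in range(1, len(spec) - 1)
                 if mag_db[k] > peak_db
                 and mag_db[k] > mag_db[k - 1] and mag_db[k] > mag_db[k + 1]]
        peak_freqs = [k * sr / frame for k in peaks]   # FFT bin -> Hz
        results.append((energy, peak_freqs))
    return results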

3 Learning the Expressive Performance Model

In this section, we describe our inductive approach to learning an expressive performance model from audio performances of bassoon pieces. Our aim is to find note-level rules which predict, for a significant number of cases, how a particular note in a particular context should be played (e.g. longer than its nominal duration). We are aware that not all the expressive transformations regarding tempo (or any other aspect) performed by a musician can be predicted at the local note level. Musicians perform music considering a number of abstract structures (e.g. musical phrases), which makes expressive performance a multilevel phenomenon. In this context, our ultimate aim is to obtain an integrated


model of expressive performance which combines note-level rules with structure-level rules. Thus, the work presented in this paper may be seen as a starting point towards this ultimate aim. The training data used in our experimental investigations are monophonic audio recordings of XVIII century bassoon pieces performed by a professional musician. Each piece has been recorded at 3 different tempos: for pieces marked adagio the recorded tempos are 50, 60 and 100 ppm; for pieces marked allegro moderato and affectuoso the recorded tempos are 60, 92 and 120 ppm. In this paper, we are concerned with expressive transformations of note duration, onset, energy and trills. The note-level performance classes which interest us are: lengthen, samedur and shorten for note duration; advance, ontime and delay for note onset; louder, medium and softer for note energy; and few, average and many for a trilled note. A note is considered to belong to class lengthen if its performed duration is 20% or more longer than its nominal duration, i.e. its duration according to the score. Class shorten is defined analogously. A note is considered to be in class advance if its performed onset is 5% of a bar earlier (or more) than its nominal onset. Class delay is defined analogously. A note is considered to be in class louder if it is played louder than its predecessor and louder than the average level of the piece. Class softer is defined analogously. Finally, a note is considered to be in class few, average or many if the number of trills is less than 4, between 5 and 9, or more than 10, respectively. For synthesizing trills, we apply a nearest-neighbor algorithm which selects the most similar trill (in terms of musical context) among the training examples and adapts it to the new musical context (e.g. the key of the piece). Each note in the training data is annotated with its corresponding class and a number of attributes representing both properties of the note itself and some aspects of the local context in which the note appears. Information about intrinsic properties of the note includes the note's duration, pitch and metrical position, while information about its context includes the durations of the previous and following notes, the extension and direction of the intervals between the note and both the previous and the subsequent note, the note's Narmour groups [8], and the tempo of the performance. Using this data, we apply a greedy set covering algorithm in order to induce an expressive performance model. We obtain an ordered set of first-order rules, each of which characterises a subset of the training data. We define four predicates to be learned: duration/4, onset/4, energy/4, and trills/4. For each note of our training set, each predicate corresponds to a particular type of transformation: duration/4 refers to duration transformation, onset/4 to onset deviation, energy/4 to energy transformation, and trills/4 refers to note alteration. For each predicate we use the complete training set and consider background knowledge containing the note's local information (context/6 predicate) and the Narmour structures (narmour/2 predicate), as well as predicates for specifying an arbitrary-size context (i.e. any number of successors and predecessors) of a note (succ/2 predicate), and auxiliary predicates (e.g. member/3). Once we obtain a set of rules for a particular concept, e.g. duration, we collect the examples correctly covered by each rule and apply a linear regression to their numerical values. The numerical values of the covered examples are approximated by a linear regression in the same way that a model tree approximates examples at its leaves. The difference with a model tree is that the induced rules do not form a tree. The algorithm is as follows:

SEQ-COVERING(Target_attribute, Attributes, Examples, Threshold)
  Learned_classification_rules := {}
  Learned_regression_rules := {}
  Rule := LEARN-ONE-RULE(Target_attribute, Attributes, Examples)
  while PERFORMANCE(Rule, Examples) > Threshold do
    Learned_classification_rules := Learned_classification_rules + Rule
    Examples := Examples - {examples correctly classified by Rule}
    Rule := LEARN-ONE-RULE(Target_attribute, Attributes, Examples)
  for each Rule in Learned_classification_rules do
    collect the examples correctly covered by Rule
    approximate the examples' numerical values by a linear regression LR
    construct Rule_1 as:
      body(Rule_1) := body(Rule)
      head(Rule_1) := LR
    Learned_regression_rules := Learned_regression_rules + Rule_1
  return Learned_regression_rules

SEQ-COVERING learns rules until it can no longer learn a rule whose performance is above the given Threshold. The LEARN-ONE-RULE subroutine generates one rule by performing a general-to-specific search through the space of possible rules, looking for a rule with high accuracy. It organises the hypothesis space search in the same general fashion as the CN2 algorithm, maintaining a list of the k best candidates at each step. In order to handle three classes (e.g., in the case of note duration, lengthen, shorten and same), we have forced the LEARN-ONE-RULE subroutine to learn rules that cover positive examples of one class only. Initially, it learns rules that cover positive examples of one of the classes (e.g. lengthen) and considers the examples of the other two classes (e.g. shorten and same) as negative examples. Once the rules for the first class have been learned, LEARN-ONE-RULE learns rules that cover only positive examples of a second class (e.g. shorten) in the same way it did for the first class, and similarly for the third class. The PERFORMANCE procedure computes the function $tp^{\alpha}/(tp + fp)$, where $tp$ is the number of true positives, $fp$ is the number of false positives and $\alpha$ is a parameter which provides a trade-off between the rule's accuracy and coverage. For each type of rule, depending on the exact number of positive examples, we tuned both the parameter $\alpha$ and the Threshold to constrain the minimum number of positive examples as well as the ratio of positive to negative examples covered by the rule. That is, using $\alpha$ and Threshold we restrict the area in the coverage space¹ in which the induced rules must lie. Inductive logic programming has proved to be an extremely well suited technique for learning expressive performance rules. This is mainly due to three reasons. Firstly, inductive logic programming allows the induction of first-order logic rules. First-order logic rules are substantially more expressive than the traditional propositional rules used in most rule learning algorithms (e.g. the widely used C4.5 algorithm [9]), which allows specifying musical knowledge in a more

¹ Coverage spaces are ROC spaces based on absolute numbers of covered examples.


natural manner. Secondly, inductive logic programming allows considering an arbitrary-size note context without explicitly defining extra attributes. Finally, the possibility of introducing background knowledge into the learning task provides great advantages in learning musical concepts, where there is often a great amount of available background information (i.e. music theory knowledge). Synthesis Tool. We have implemented a tool which transforms an inexpressive input melody into an expressive one following the induced model tree. The tool can either generate an expressive MIDI performance from an inexpressive MIDI description of a melody, or generate an expressive audio file from an inexpressive audio file.
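As a concrete illustration of the note-level class labels defined in this section, a minimal sketch follows; this is not the authors' code, and the boundary handling for trills (exactly 4 or 10) is an assumption, since the text leaves those cases open.

def duration_class(performed, nominal):
    # lengthen/shorten if the performed duration deviates by 20% or more
    ratio = performed / nominal
    if ratio >= 1.2:
        return "lengthen"
    if ratio <= 0.8:
        return "shorten"
    return "samedur"

def onset_class(performed_onset, nominal_onset, bar_duration):
    # advance/delay if the onset deviates by 5% of a bar or more
    dev = (performed_onset - nominal_onset) / bar_duration
    if dev <= -0.05:
        return "advance"
    if dev >= 0.05:
        return "delay"
    return "ontime"

def trill_class(num_trills):
    # few: fewer than 4; average: 5-9; many: more than 10 (boundaries assumed)
    if num_trills < 4:
        return "few"
    if num_trills <= 9:
        return "average"
    return "many"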

4 Related Work

Widmer [14,15] reported on the task of discovering general rules of expressive classical piano performance from real performance data via inductive machine learning. The performance data used for the study are MIDI recordings of 13 piano sonatas by W.A. Mozart performed by a skilled pianist. In addition to these data, the music score was also coded. The resulting substantial data set consists of information about the nominal note onsets, duration, metrical information and annotations. When trained on the data, an inductive rule learning algorithm discovered a small set of quite simple classification rules [14] that predict a large number of the note-level choices of the pianist. Tobudic et al. [12] describe a relational instance-based approach to the problem of learning to apply expressive tempo and dynamics variations to a piece of classical music at different levels of the phrase hierarchy. The different phrases of a piece and the relations among them are represented in first-order logic. The description of the musical scores through predicates (e.g. contains(ph1,ph2)) provides the background knowledge. The training examples are encoded by another predicate whose arguments encode information about the way the phrase was played by the musician. Their learning algorithm recognizes similar phrases in the training set and applies their expressive patterns to a new piece. Ramirez et al. [10] report on a system capable of generating expressive audio saxophone performances of jazz standards. The system is based on an approach similar to the one presented here, where different acoustic features of real saxophone jazz performances are extracted and used to induce an expressive performance model. Lopez de Mantaras et al. report on SaxEx [4], a performance system capable of generating expressive solo performances in jazz. Their system is based on case-based reasoning, a type of analogical reasoning where problems are solved by reusing the solutions of similar, previously solved problems. In order to generate expressive solo performances, the case-based reasoning system retrieves, from a memory containing expressive interpretations, those notes that are similar to the input inexpressive notes. The case memory contains information about metrical strength, note duration, and so on, and uses this information to retrieve the appropriate notes.

5 Conclusion

This paper describes an inductive logic programming approach for learning an expressive performance model from recordings of XVIII century bassoon pieces by a professional musician. With this aim, we have extracted a set of acoustic features from the recordings, resulting in a symbolic representation of the performed pieces, and then applied a rule-based algorithm to the symbolic data together with information about the context in which the data appeared. In this context, the algorithm has proved to be an extremely well suited technique for learning an expressive performance model. It naturally allows background knowledge (i.e. music theory knowledge) to play an important role in the learning process, and permits considering an arbitrary-size note context without explicitly defining extra attributes for each context extension. Currently, we are in the process of increasing the amount of training data as well as experimenting with different information encoded in it. Increasing the training data, extending the information in it and combining it with background musical knowledge will certainly generate a more complete set of rules.

Acknowledgments. This work is supported by the Spanish TIC project ProMusic (TIC 2003-07776-C02-01).

References

1. Friberg, A.: A Quantitative Rule System for Musical Performance. PhD Thesis, KTH, Sweden (1995)
2. Gabrielsson, A.: The Performance of Music. In D. Deutsch (Ed.), The Psychology of Music (2nd ed.), Academic Press (1999)
3. Klapuri, A.: Sound Onset Detection by Applying Psychoacoustic Knowledge. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP (1999)
4. Lopez de Mantaras, R., Arcos, J.L.: AI and Music, from Composition to Expressive Performance. AI Magazine, 23-3 (2002)
5. Maher, R.C., Beauchamp, J.W.: Fundamental Frequency Estimation of Musical Signals Using a Two-Way Mismatch Procedure. Journal of the Acoustical Society of America, vol. 95 (1994) 2254-2263
6. McNab, R.J., Smith, Ll.A., Witten, I.H.: Signal Processing for Melody Transcription. SIG working paper, vol. 95-22 (1996)
7. Mitchell, T.M.: Machine Learning. McGraw-Hill (1997)
8. Narmour, E.: The Analysis and Cognition of Basic Melodic Structures: The Implication Realization Model. University of Chicago Press (1990)
9. Quinlan, J.R.: C4.5: Programs for Machine Learning. San Francisco, Morgan Kaufmann (1993)
10. Ramirez, R., Hazan, A., Gómez, E., Maestre, E.: A Machine Learning Approach to Expressive Performance in Jazz Standards. MDM/KDD'04, Seattle, WA, USA (2004)
11. Repp, B.H.: Diversity and Commonality in Music Performance: an Analysis of Timing Microstructure in Schumann's 'Traumerei'. Journal of the Acoustical Society of America 104 (1992)


12. Tobudic, A., Widmer, G.: Relational IBL in Music with a New Structural Similarity Measure. Proceedings of the International Conference on Inductive Logic Programming, Springer Verlag (2003)
13. Todd, N.: The Dynamics of Dynamics: a Model of Musical Expression. Journal of the Acoustical Society of America 91 (1992)
14. Widmer, G.: Machine Discoveries: A Few Simple, Robust Local Expression Principles. Journal of New Music Research 31(1) (2002) 37-50
15. Widmer, G.: In Search of the Horowitz Factor: Interim Report on a Musical Discovery Project. Invited paper. In Proceedings of the 5th International Conference on Discovery Science (DS'02), Lübeck, Germany. Berlin: Springer-Verlag (2002)

Modeling MPEG-4 VBR Video Traffic by Using ANFIS Zhijun Fang, Shenghua Xu, Changxuan Wan, Zhengyou Wang, Shiqian Wu, and Weiming Zeng School of Information Technology, Jiangxi University of Finance & Economics Nanchang, Jiangxi 330013, China [emailprotected],[emailprotected],[emailprotected], [emailprotected],[emailprotected], [emailprotected]

Abstract. Video traffic prediction and modeling are very important for compressed video transmission. Traditional methods describe the process by a rigid model with several parameters, which are difficult to estimate. In this paper, MPEG-4 VBR (Variable Bit Rate) video traffic is modeled by ANFIS (Adaptive Neuro-Fuzzy Inference System), which is then applied to modeling and predicting the MPEG-4 VBR video traffic. Simulations show that the GoP (Group of Pictures) loss probabilities of the actual video traffic are very close to those of the ANFIS-modeled traffic under the same experimental conditions, and that the prediction errors (1/SNR) are very small.

1 Introduction

Nowadays, video applications such as videophone, real-time videoconferencing and streaming stored video have become major components of broadband multimedia services. In the year 2000, MPEG-4 (Moving Picture Experts Group) became an international standard; it is a digital multimedia standard with associated protocols for representing, manipulating and transporting natural and synthetic multimedia content over a very broad range of communication infrastructures. It is also an object-based compression and streaming standard, where a scene can be composed of a set of semantically meaningful objects (i.e. audio and video objects). Compared to conventional frame-based coding techniques, MPEG-1 or MPEG-2 for example, the object-based coding and representation of the audio and video information enables MPEG-4 to cover a very wide scope of emerging and future applications [1]. However, the Quality of Service (QoS) for transported video is frequently inconsistent and unpredictable, since the Internet provides only a best-effort service. Therefore traffic modeling and prediction is a key solution for offering good QoS. Conventional traffic modeling and prediction schemes use a model-and-parameter approach to provide QoS guarantees while maintaining a high utilization of network resources. However, the application of this type of approach to MPEG-4 VBR video services involves several problems [2]. First, it is well known that modeling and predicting


MPEG-4 video traffic with only a few parameters is very difficult due to its complex traffic characteristics. Second, the high burstiness of MPEG-4 VBR video traffic causes large queues, delays, and excessive cell losses. Third, characterizing the input video traffic prior to call setup is only possible for video applications that use prerecorded streams, such as video on demand [2]. In this paper, an adaptive neuro-fuzzy inference system (ANFIS) [3] is presented to model and predict MPEG-4 VBR video traffic without any predetermined parameters. It is used to test for packet loss at different rates of bandwidth usage. Simulation results show that the group of pictures (GoP) loss probabilities of the ANFIS-modeled traffic closely approximate those of the actual traffic, and that the prediction errors (1/SNR) for different video sequences (Silence of The Lambs, Alpin Ski, and Jurassic Park (I)) are very small. Consequently, this method is promising for MPEG-4 VBR video traffic modeling, prediction and resource reservation. After introducing the ANFIS algorithm in Section 2, the experimental results are presented in Section 3, and the conclusions are drawn in Section 4.

2 The ANFIS Algorithm

For simplicity, it is assumed that the fuzzy inference system under consideration has two inputs, $x$ and $y$, and one output, $z$. Suppose that the rule base contains two fuzzy if-then rules of Takagi and Sugeno's type [3]:

Rule 1. If $x$ is $A_1$ and $y$ is $B_1$, then

$$f_1 = p_1 x + q_1 y + r_1. \qquad (1)$$

Rule 2. If $x$ is $A_2$ and $y$ is $B_2$, then

$$f_2 = p_2 x + q_2 y + r_2. \qquad (2)$$

If the membership functions of the fuzzy sets $A_i$, $B_i$, $i = 1, 2$, are represented by $\mu_{A_i}$, $\mu_{B_i}$, and we choose the product for the T-norm (logical AND) [4] in evaluating the rules, then:

1) Evaluating the rule premises results in

$$w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \qquad i = 1, 2. \qquad (3)$$

2) Evaluating the implication and the rule consequences yields

$$f(x, y) = \frac{w_1(x, y)\, f_1(x, y) + w_2(x, y)\, f_2(x, y)}{w_1(x, y) + w_2(x, y)}, \qquad (4)$$

or, more simply:

$$f = \frac{w_1 f_1 + w_2 f_2}{w_1 + w_2}. \qquad (5)$$

This can be separated into phases by first defining

$$\bar{w}_i = \frac{w_i}{w_1 + w_2}, \qquad (6)$$

hence $f$ can be written as

$$f = \bar{w}_1 f_1 + \bar{w}_2 f_2. \qquad (7)$$

The structure of the ANFIS is shown in Figure 1 [4].

Fig. 1. Structure of the ANFIS

In this paper, ANFIS uses a hybrid learning algorithm to identify the membership function parameters of a Takagi-Sugeno-type fuzzy inference system (FIS). A combination of least squares and backpropagation gradient descent methods is used to train the FIS membership function parameters to model the MPEG-4 trace data. Using 10 bell membership functions for each input and 1000 training epochs, an initial Takagi-Sugeno-type FIS is generated for ANFIS training using a grid partition.
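A minimal sketch of the two-rule Sugeno evaluation of Eqs. (1)-(7), using the generalized-bell membership function mentioned above; all parameter values are placeholders, and actual ANFIS training (least squares plus backpropagation) is omitted.

import numpy as np

def gbell(x, a, b, c):
    # generalized bell membership: 1 / (1 + |(x - c)/a|^(2b))
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def sugeno_two_rules(x, y, mfs, consequents):
    # mfs[i] = ((a,b,c) of A_i, (a,b,c) of B_i); consequents[i] = (p,q,r)
    w = np.array([gbell(x, *pa) * gbell(y, *pb) for pa, pb in mfs])  # Eq. (3)
    wbar = w / w.sum()                                               # Eq. (6)
    f = np.array([p * x + q * y + r for p, q, r in consequents])     # Eqs. (1)-(2)
    return float(wbar @ f)                                           # Eq. (7)

mfs = [((1.0, 2.0, 0.0), (1.0, 2.0, 0.0)),    # rule 1 premises (placeholders)
       ((1.0, 2.0, 2.0), (1.0, 2.0, 2.0))]    # rule 2 premises
consequents = [(0.5, 0.3, 0.1), (-0.2, 0.8, 1.0)]
print(sugeno_two_rules(0.7, 1.1, mfs, consequents))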

3 Simulation Results

The MPEG-4 VBR video traces studied in [5] are adopted in this paper. The YUV information of each video is encoded into an MPEG-4 bit stream with the MOMUSYS MPEG-4 video software [6]. We set the number of video objects to 1 (i.e., the entire scene is one video object). The video format is QCIF, whose size is 176 × 144 pixels with a depth of 8 bits; no rate control or scalable-layer coding was used in the encoding. The frame rate was set to 25 frames per second. The GoP pattern was IBBPBBPBBPBB, and the quantization parameters were fixed at 10 for I-frames (VOPs), 14 for P-frames, and 18 for B-frames. We observed 8197 frames of traffic on the basis of GoP-level data. Let $1/SNR = \sum_m v^2(m) / \sum_m x^2(m)$ denote the overall performance metric for prediction; obviously, the smaller the $1/SNR$, the better the forecast.

Fig. 2. Comparison of GoP loss probability for the Silence of The Lambs sequence, original trace vs. ANFIS at utilizations U = 90%, 80%, 70% (x-axis: buffer size in ms; y-axis: GoP loss probability); 1/SNR = 0.0111

Fig. 3. Comparison of GoP loss probability for the Alpin Ski sequence, same legend and axes as Fig. 2; 1/SNR = 0.0167

Fig. 4. Comparison of GoP loss probability for the Jurassic Park (I) sequence, same legend and axes as Fig. 2; 1/SNR = 0.1095

In this experiment, the network access speed was $5 \times 10^5$ bit/s. Under different bandwidth utilization rates (U = 90%, 80%, 70%), the GoP loss probability of the actual bit stream is compared with that of the ANFIS algorithm; the simulations are shown in Figures 2-4. Figure 2 shows the result for Silence of The Lambs (1/SNR = 0.0111), Figure 3 the result for Alpin Ski (1/SNR = 0.0167), and Figure 4 the result for Jurassic Park (I) (1/SNR = 0.1095).

4 Conclusions

In this paper, an MPEG-4 VBR video traffic model based on ANFIS is analyzed and discussed. The ANFIS model does not require any extra parameters to be estimated, and it is applied to modeling and predicting MPEG-4 VBR video traffic. Simulations show that this model is concise and effective. A comparison of the GoP loss probabilities of the actual traffic under different utilizations (U = 70%, 80%, 90%) illustrates that the ANFIS-modeled traffic approximates it closely under the same experimental conditions, and the prediction errors (1/SNR) for the different video sequences (Silence of The Lambs, Alpin Ski, Jurassic Park (I)) are very small.

Acknowledgments. This project was supported by the NSFC (No. 60462003), the Science and Technology Research Project of the Education Department of Jiangxi Province (No. 2005-115 and 2006-231) and the Jiangxi University of Finance & Economics Innovation Fund.


References

1. Ahmed, T., Buridant, G., Mehaoua, A.: Delivering of MPEG-4 Multimedia Content over Next Generation Internet. Lecture Notes in Computer Science, 2216 (2001) 110-127
2. Yoo, S.J.: Efficient Traffic Prediction Scheme for Real-Time VBR MPEG Video Transmission over High-Speed Networks. IEEE Trans. Broadcasting, 48 (2002) 10-18
3. Jang, J.-S.R.: ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Trans. Systems, Man, and Cybernetics, 23 (1993) 665-685
4. Koivo, H.: ANFIS (Adaptive Neuro-Fuzzy Inference System). Online: http://www.control.hut.fi
5. Fitzek, F.H.P., Reisslein, M.: MPEG-4 and H.263 Video Traces for Network Performance Evaluation. IEEE Network, 15 (2001) 40-54
6. Heising, G., Wollborn, M.: MPEG-4 Version 2 Video Reference Software Package, ACTS AC098 MOMUSYS (1999)

Multiple Textural Features Based Palmprint Authentication Xiangqian Wu1 , Kuanquan Wang1 , and David Zhang2 1 School of Computer Science and Technology, Harbin Institute of Technology (HIT), Harbin 150001, China {xqwu, wangkq}@hit.edu.cn http://biometrics.hit.edu.cn 2 Biometric Research Centre, Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong [emailprotected]

Abstract. This paper proposes two novel palmprint textural features, the orientationCode and the diffCode, and investigates the fusion of these features at the score level for personal recognition. The orientationCode and diffCode are first defined using four directional templates and a differential operation, respectively. Then matching scores are computed to measure the similarity of the features. Finally, several fusion strategies are investigated for the matching scores of the orientationCode and diffCode. Experimental results show that the orientationCode and diffCode can describe a palmprint effectively, and that the Sum, Product and Fisher's Linear Discriminant (FLD) fusion strategies can greatly improve the accuracy of palmprint authentication.

1 Introduction

Computer-aided personal recognition is becoming increasingly important in our information society. Biometrics is one of the most important and reliable methods in this field [1]. The palmprint, as a relatively new biometric feature, has several advantages compared with other currently available features [1]: palmprints contain more information than fingerprints, so they are more distinctive; palmprint capture devices are much cheaper than iris devices; palmprints contain additional distinctive features such as principal lines and wrinkles, which can be extracted from low-resolution images; and a highly accurate biometric system can be built by combining all the features of palms, such as palm geometry, ridge and valley features, and principal lines and wrinkles. It is for these reasons that palmprint recognition has recently attracted an increasing amount of attention from researchers [5, 2, 3, 6, 4]. A palmprint contains the following basic elements: principal lines, wrinkles, delta points and minutiae. These basic elements constitute various palmprint features, such as palm lines [5] and textural features [4]. Different palmprint features reflect different characteristics of a palmprint. Fusion of multiple palmprint features may enhance the performance of a palmprint authentication


system. Up to now, textural-feature-based algorithms have been the most effective for palmprint recognition. This paper investigates some textural features and their fusion. Two novel palmprint textural features, the orientationCode and the diffCode, are computed using directional templates and a differential operation, respectively, and a matching score is then computed for each feature. Finally, several strategies are investigated to fuse these two matching scores for personal authentication. When palmprints are captured, the position, direction and amount of stretching of a palm may vary, so that even palmprints from the same palm may exhibit a little rotation and translation. Furthermore, palms differ in size. Hence palmprint images should be oriented and normalized before feature extraction and matching. In this paper, we use the preprocessing technique described in [4] to align and normalize the palmprints. After preprocessing, the central part of the image, which is 128 × 128, is cropped to represent the whole palmprint.

2 Feature Extraction

2.1 OrientationCode Extraction

We devise several directional templates to define the orientation of each pixel. The $0^{\circ}$-directional template is devised as below:

$$T_{0^{\circ}} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 3 & 3 & 3 & 3 & 3 & 3 & 3 & 3 & 3 \\ 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix} \qquad (1)$$

The $\alpha$-directional template $T_{\alpha}$ is obtained by rotating $T_{0^{\circ}}$ by the angle $\alpha$. Denote by $I$ an image. The magnitude of $I$ in the direction $\alpha$ is defined as

$$M_{\alpha} = I * T_{\alpha} \qquad (2)$$

where "$*$" is the convolution operation. $M_{\alpha}$ is called the $\alpha$-directional magnitude ($\alpha$-DM). Since the gray level of a pixel on a palm line is smaller than that of the surrounding pixels which are not on a palm line, we take the direction in which the magnitude is minimum as the orientation of the pixel. That is, the orientation of pixel $(i, j)$ in image $I$ is computed as

$$O(i, j) = \arg\min_{\alpha} M_{\alpha}(i, j) \qquad (3)$$

$O$ is called the OrientationCode of the palmprint. Four directional templates ($0^{\circ}$, $45^{\circ}$, $90^{\circ}$ and $135^{\circ}$) are used to extract the OrientationCode in this paper. Extra experiments show that an image of size 32×32 is sufficient for OrientationCode extraction. Therefore, before computing the OrientationCode, we resize the image from 128×128 to 32×32; hence the size of the OrientationCode is 32×32. Figure 1 shows some examples of OrientationCodes.

Fig. 1. Some examples of OrientationCodes: (a) and (b) are two palmprints; (c) and (d) are the corresponding OrientationCodes
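A minimal sketch of the OrientationCode computation of Eqs. (1)-(3). SciPy stands in for whatever implementation the authors used, and the rotated 45° and 135° templates are interpolated approximations of the exact templates implied by the text.

import numpy as np
from scipy.signal import convolve2d
from scipy.ndimage import rotate

T0 = np.array([[1]*9, [2]*9, [3]*9, [2]*9, [1]*9], dtype=float)  # Eq. (1)

def orientation_code(img):
    # img: 32x32 grayscale palmprint (already resized from 128x128)
    mags = []
    for angle in (0, 45, 90, 135):
        T = rotate(T0, angle, reshape=True, order=1)   # rotated template
        mags.append(convolve2d(img, T, mode="same", boundary="symm"))  # Eq. (2)
    return np.argmin(np.stack(mags), axis=0)           # Eq. (3), values 0..3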

2.2 DiffCode Extraction

Let $I$ denote a palmprint image and $G_{\sigma}$ denote a 2D Gaussian filter with variance $\sigma$. The palmprint is first filtered by $G_{\sigma}$:

$$I_f = I * G_{\sigma} \qquad (4)$$

where "$*$" is the convolution operator. Then the difference of $I_f$ in the horizontal direction is computed as follows:

$$D = I_f * b \qquad (5)$$

$$b = [-1, 1] \qquad (6)$$

Finally, the palmprint is encoded according to the sign of each pixel of $D$:

$$C(i, j) = \begin{cases} 1, & \text{if } D(i, j) > 0; \\ 0, & \text{otherwise.} \end{cases} \qquad (7)$$

$C$ is called the diffCode of the palmprint $I$. Extra experiments also show that an image of size 32×32 is sufficient for diffCode extraction. Therefore, before computing the diffCode, we resize the image from 128×128 to 32×32; hence the size of the diffCode is also 32×32. Figure 2 shows some examples of DiffCodes.

Fig. 2. Some examples of DiffCodes: (a) and (b) are two palmprints; (c) and (d) are the corresponding DiffCodes
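A minimal sketch of the diffCode of Eqs. (4)-(7); the value of σ is an assumption, since the paper does not state it.

import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.signal import convolve2d

def diff_code(img, sigma=1.0):
    # img: 32x32 grayscale palmprint (already resized from 128x128)
    If = gaussian_filter(img.astype(float), sigma)              # Eq. (4)
    D = convolve2d(If, np.array([[-1.0, 1.0]]), mode="same")    # Eqs. (5)-(6)
    return (D > 0).astype(np.uint8)                             # Eq. (7)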

3 Feature Matching

According to the definitions of the orientationCode and diffCode, both features have the same size, i.e. 32×32. Let $C_1$ and $C_2$ denote two features of the same type (orientationCode or diffCode). Since $C_1$ and $C_2$ have the same length, we can use the Hamming distance to define their similarity. The Hamming distance between $C_1$ and $C_2$, $H(C_1, C_2)$, is defined as the number of places where the corresponding values of $C_1$ and $C_2$ differ. That is,

$$H(C_1, C_2) = \sum_{i=1}^{32} \sum_{j=1}^{32} C_1(i, j) \otimes C_2(i, j) \qquad (8)$$

where $\otimes$ is the logical XOR operation. The matching score of $C_1$ and $C_2$ is then defined as:

$$S(C_1, C_2) = 1 - \frac{H(C_1, C_2)}{32 \times 32} \qquad (9)$$

Actually, $S(C_1, C_2)$ is the percentage of places where $C_1$ and $C_2$ have the same values. Obviously, $S(C_1, C_2)$ lies between 0 and 1, and the larger the matching score, the greater the similarity between $C_1$ and $C_2$. The matching score of a perfect match is 1.
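The matching score of Eqs. (8)-(9) reduces to a few lines; for the four-valued orientationCode, the XOR is interpreted here as element-wise inequality, which matches the "same value" reading given above.

import numpy as np

def matching_score(c1, c2):
    # fraction of the 32x32 positions where the two codes agree;
    # 1.0 means a perfect match
    hamming = np.count_nonzero(c1 != c2)   # Eq. (8)
    return 1.0 - hamming / c1.size         # Eq. (9)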

4 Score Fusion

Denote by $x_1$ and $x_2$ the matching scores of the orientationCode and diffCode, respectively. We fuse these two scores by the following strategies to obtain the final matching score $x$.

S1: Maximum strategy:

$$x = \max(x_1, x_2) \qquad (10)$$

S2: Minimum strategy:

$$x = \min(x_1, x_2) \qquad (11)$$

S3: Product strategy:

$$x = \sqrt{x_1 x_2} \qquad (12)$$

S4: Sum strategy:

$$x = \frac{x_1 + x_2}{2} \qquad (13)$$

S5: Fisher's Linear Discriminant (FLD) strategy:

$$W_{opt} = \arg\max_{W} \frac{W^T S_B W}{W^T S_W W} \qquad (14)$$

$$x = W_{opt}^T \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \qquad (15)$$

where $S_B$ and $S_W$ are the between-class scatter matrix and the within-class scatter matrix of the genuine and impostor matching scores [7].
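A minimal sketch of the five fusion strategies of Eqs. (10)-(15). The closed-form FLD direction W = Sw^-1 (m1 - m2) used here is the standard two-class solution of Eq. (14); the training score arrays are hypothetical.

import numpy as np

def fuse(x1, x2, strategy, w=None):
    if strategy == "max":     return max(x1, x2)          # Eq. (10)
    if strategy == "min":     return min(x1, x2)          # Eq. (11)
    if strategy == "product": return (x1 * x2) ** 0.5     # Eq. (12)
    if strategy == "sum":     return (x1 + x2) / 2.0      # Eq. (13)
    if strategy == "fld":     return float(w @ [x1, x2])  # Eq. (15)

def fld_direction(genuine, impostor):
    # genuine, impostor: (n, 2) arrays of (x1, x2) training score pairs;
    # two-class FLD: W proportional to Sw^-1 (m1 - m2), maximizing Eq. (14)
    m1, m2 = genuine.mean(axis=0), impostor.mean(axis=0)
    Sw = np.cov(genuine.T) * (len(genuine) - 1) \
       + np.cov(impostor.T) * (len(impostor) - 1)
    return np.linalg.solve(Sw, m1 - m2)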

5 Experimental Results

We employed the PolyU Palmprint Database [8] to test our approach. This database contains 600 grayscale images captured from 100 different palms by a CCD-based device. The orientationCode matching scores and the diffCode matching scores of each pair of samples in this database are computed and then fused to obtain the final scores. All of the described fusion strategies were tested; their ROC curves are plotted in Figure 3 and their equal error rates (EER) are listed in Table 1.

Fig. 3. The ROC curves of the orientationCode, the diffCode, and the different fusion strategies (S1, S2, S3, S4 and S5); x-axis: false acceptance rate (%), y-axis: false reject rate (%)

Table 1. EERs of the orientationCode, the diffCode, and the different fusion strategies (S1, S2, S3, S4 and S5)

Strategy   orientationCode   diffCode   S1     S2     S3     S4     S5
EER (%)    0.73              0.64       0.65   0.66   0.45   0.45   0.49

According to Figure 3 and Table 1, the performances of the maximum strategy and minimum strategy are worse than that of the diffCode, while the Sum, Product and FLD strategies can greatly improve the accuracy. The performances of the Sum, Product and FLD strategies are similar. However, the Sum strategy is much faster than the other two because of its lower computational complexity. Therefore, the Sum fusion strategy is more suitable for on-line palmprint authentication.

6 Conclusions

This paper proposed two novel palmprint textural features and investigated the fusion of these features. Several fusion strategies have been investigated. The maximum strategy and minimum strategy cannot improve the accuracy of palmprint recognition, while the Sum, Product and FLD strategies greatly outperform both the orientationCode and the diffCode. Considering the computational complexity, the Sum strategy is more suitable for an on-line palmprint recognition system.

Acknowledgements This work is supported by the National Natural Science Foundation of China (No. 60441005), the Key-Project of the 11th Five-Year Plan of Educational Science of Heilongjiang Province, China (No. HZG160), and the Development Program for Outstanding Young Teachers in Harbin Institute of Technology.

References
1. Jain, A., Ross, A., Prabhakar, S.: An Introduction to Biometric Recognition. IEEE Transactions on Circuits and Systems for Video Technology 14 (2004) 4–20
2. Wu, X., Wang, K., Zhang, D.: Fisherpalms Based Palmprint Recognition. Pattern Recognition Letters 24 (2003) 2829–2838
3. Duta, N., Jain, A., Mardia, K.: Matching of Palmprint. Pattern Recognition Letters 23 (2001) 477–485
4. Zhang, D., Kong, W., You, J., Wong, M.: Online Palmprint Identification. IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (2003) 1041–1050
5. Wu, X., Wang, K., Zhang, D., Huang, B.: Palmprint Classification Using Principal Lines. Pattern Recognition 37 (2004) 1987–1998
6. Han, C., Chen, H., Lin, C., Fan, K.: Personal Authentication Using Palm-print Features. Pattern Recognition 36 (2003) 371–381
7. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. John Wiley & Sons, Inc. (2001)
8. PolyU Palmprint Database. http://www.comp.polyu.edu.hk/∼biometrics/

Neural Network Deinterlacing Using Multiple Fields∗

Hyunsoo Choi, Eunjae Lee, and Chulhee Lee

Dept. Electrical and Electronic Engineering, Yonsei University, 134 Shinchon-Dong, Sedaemoon-Gu, Seoul, 120-749, South Korea
{piyagihs, ejlee, chulhee}@yonsei.ac.kr

Abstract. In this paper, we propose a deinterlacing algorithm using neural networks for the conversion of interlaced videos to progressive videos. The proposed method uses multiple fields: a previous field, a current field, and a next field. Since the proposed algorithm uses multiple fields, the neural network is able to take into account the motion patterns which might exist in adjacent fields. Experimental results demonstrate that the proposed algorithm provides better performance than existing neural network deinterlacing algorithms that use a single field.

1 Introduction

Since the invention of analog TV over 80 years ago, the interlaced scan has been the industrial standard widely adopted in various TV broadcasting standards, including NTSC, PAL and SECAM. The interlaced scan doubles the frame rate compared to that of the progressive scan using the same bandwidth occupation [1]. However, the interlaced scan also introduces undesirable artifacts such as line crawling, interline flickering, and line twitter. These artifacts can impair the visual quality of videos. In addition, interlaced scanning is unsuitable for display devices such as LCD-TVs, PDPs, and PC monitors that require progressive formats. For example, it is necessary to convert interlaced DVD videos into the progressive format for display on PC monitors and LCD-TV monitors. Furthermore, recent HDTV monitors and multimedia PCs require conversion between interlaced and progressive video sequences. A large number of techniques have been proposed for interlaced-to-progressive scan conversion [1-8]. Some methods are based on intra-field deinterlacing; the main advantage of such algorithms is easy implementation. These algorithms include line doubling, vertical averaging and edge-based line averaging (ELA) [2]. The ELA technique performs interpolation in the direction which has the highest correlation. However, these intra-field deinterlacing methods fail to provide good performance in motion areas of video sequences. To improve the performance within motion areas, motion compensation methods [6-8] have been introduced. These methods are the most

∗ This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment) (IITA-2005(C1090-0502-0027)).



advanced approaches in deinterlacing and involve estimating specific motion trajectories. However, these methods suffer from high computational complexity. In addition, incorrect motion estimation produces performance degradation. Recently, deinterlacing methods based on neural networks have been proposed [9], which use the present field to obtain the inputs of a neural network. In this paper, we propose to use a neural network for deinterlacing which uses inputs from the previous, current, and next fields.

2 Neural Network Deinterlacing

2.1 Neural Network Deinterlacing Using a Single Field

In this section, we briefly describe deinterlacing methods using neural networks which use a single field. The multilayer feed-forward network is one of the most popular neural network architectures [10]. The neural network has shown good performance in many applications such as pattern recognition and data optimization. Typically, the back-propagation algorithm is used to adjust the weight vector, which is updated so that the following error is reduced:

E = (1/2) Σ_k (t_k − o_k)²  (1)

where t_k is a target value and o_k is an output value of the neural network. During the training phase, the weight vector is updated as follows:

Δw_kj = −η ∂E/∂w_kj,  Δw_ji = −η ∂E/∂w_ji  (2)

where η is the learning rate. Plaziac proposed a deinterlacing method based on neural networks which uses a single field [9]. In the algorithm of [9], the neural network has 30 inputs, 16 hidden neurons, and 3 outputs, as shown in Fig. 1.

Fig. 1. Pixels used for inputs and outputs of Plaziac's line-doubling method (legend: pixels to be interpolated; existing pixels from the decimated image; pixels chosen as the input of the neural network; pixels used to test the neural network accuracy)

2.2 Neural Network Deinterlacing Using Multiple Fields

In interlaced videos, adjacent fields provide valuable information for filling in the missing lines in the current field. In order to utilize the information the adjacent fields may provide, the proposed deinterlacing method uses three fields: the previous, current, and next fields. Fig. 2 shows the input pixels the proposed algorithm uses. It is noted that the proposed method uses 20 inputs. The proposed neural network has 16 hidden neurons and 1 output neuron. From the present field, 10 pixels are selected as inputs. The previous and next fields provide an additional 10 pixels. In other words, the input vector and the output vector can be represented as follows:

A = {a_1, a_2, a_3, …, a_20},  B = {b_1}.  (3)

Fig. 2. Pixels used for the inputs and output in the proposed method (a1-a5 from the previous field, a6-a15 from the current field around the target pixel b1, a16-a20 from the next field)
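The following sketch illustrates one way to assemble the 20-dimensional input vector of Fig. 2 and the 20-16-1 topology. The exact window positions are an assumption read off the figure (a 5-pixel horizontal window per contributing line), all three fields are assumed stored on the full frame grid, and the boundary mirroring described later is omitted; `gather_inputs` and scikit-learn's `MLPRegressor` stand in for the authors' own implementation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def gather_inputs(prev_f, cur_f, next_f, y, x):
    """Collect the 20 input pixels of Fig. 2 around missing pixel (y, x)."""
    cols = np.arange(x - 2, x + 3)      # 5 horizontal neighbours per line
    return np.concatenate([
        prev_f[y, cols],                # a1..a5:   previous field, same line
        cur_f[y - 1, cols],             # a6..a10:  current field, line above
        cur_f[y + 1, cols],             # a11..a15: current field, line below
        next_f[y, cols],                # a16..a20: next field, same line
    ])

# 20 inputs -> 16 hidden neurons -> 1 output, matching the proposed topology
net = MLPRegressor(hidden_layer_sizes=(16,), activation="logistic", max_iter=1000)
```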

3 Experiments and Results

Experiments were conducted in order to evaluate the performance of the proposed deinterlacing method. First, interlaced sequences were made by eliminating the even or odd lines of progressive videos. Fig. 3 shows the process of progressive-to-interlaced conversion.

Fig. 3. Progressive-to-interlaced conversion (scan line decimation) [11]


Table 1. Average PSNRs (dB)

Format  Video              ELA    NNDSF (by Plaziac)  NNDMF (Proposed)
QCIF    Coastguard         26.38  26.13               31.92
QCIF    Container          26.62  26.37               35.24
QCIF    Foreman            33.38  30.55               35.00
QCIF    Hall & Monitor     28.65  28.34               34.91
QCIF    Mobile             23.04  23.49               30.25
QCIF    Mother & Daughter  35.98  33.98               39.35
QCIF    Silent             32.92  32.07               37.28
QCIF    Stefan             23.63  24.09               26.49
QCIF    Table              26.76  26.22               28.44
CIF     Coastguard         28.09  28.16               31.03
CIF     Container          27.77  28.27               36.16
CIF     Hall & Monitor     30.82  30.45               35.22
CIF     Mobile             23.40  24.92               28.71
CIF     Akiyo              37.86  35.29               41.46
CIF     Miss               40.94  37.09               41.73
CIF     Mother & Daughter  38.60  35.97               40.17
CIF     Silent             33.91  32.65               36.96
CIF     Singer             33.42  32.22               37.05
CIF     Stefan             26.07  26.85               25.84
CIF     Table              29.39  29.98               27.61
        Average            30.38  29.66               34.04

Fig. 4. PSNR results of the three algorithms for the mobile sequence (QCIF format)

Then, the neural network is trained by using 5 QCIF video sequences: foreman, coastguard, container, hall&monitor, and mobile. From the 5 video sequences, the first 100 fields are used as training data. At the boundaries of images, pixels are mirrored to produce input vectors. After training, test videos of two video formats (QCIF, CIF) are used for performance evaluation. First, 9 QCIF videos (foreman, coastguard, container, hall&monitor, mobile, mother&daughter, silent, stefan, and table) are used as test data. Next, 11 CIF videos (coastguard, container, hall&monitor, mobile, akiyo, miss, mother&daughter, silent, singer, stefan, and table) are used as test data.


The proposed method is compared with ELA and the neural network deinterlacing which uses a single field [9]. The peak signal-to-noise ratio (PSNR) is used as the criterion. Table 1 shows the performance comparison of the three methods: ELA, NNDSF (neural network deinterlacing using a single field), and NNDMF (the proposed method). As can be seen, the proposed algorithm provides the best performance. Fig. 4 shows the frame PSNRs of the proposed method and the other two methods for the mobile video sequence in QCIF format. Fig. 5 shows the reconstructed images and the original image of the 27th frame of the mobile video sequence.


Fig. 5. Reconstructed images of the 3 methods and the original image of the mobile sequence (27th frame). (a) ELA (b) NNDSF (c) NNDMF (Proposed method) (d) Original image.

4 Conclusion and Discussion

In this paper, we proposed to use a neural network for deinterlacing with inputs taken from several fields. The proposed method uses three fields: the previous, current, and next fields. Experimental results show that the proposed method significantly outperforms the existing methods.

References
1. Li, R.X., Zeng, B., Liou, M.L.: Reliable Motion Detection/Compensation for Interlaced Sequences and Its Applications to Deinterlacing. IEEE Trans. Circuits and Systems for Video Technology 10(1) (2000) 23-29
2. Doyle, T., Looymans, M.: Progressive Scan Conversion Using Edge Information. Signal Processing of HDTV II, L. Chairglione, Ed. Amsterdam, The Netherlands: Elsevier (1990) 711-721
3. Unser, M.: Splines: A Perfect Fit for Signal and Image Processing. IEEE Signal Processing Magazine 16(6) (1999) 22-38
4. Bock, M.: Motion Adaptive Standards Conversion between Formats of Similar Field Rates. Signal Processing: Image Commun. 6(3) (1994) 275-280
5. Kovacervic, J., Safrank, R.J., Yeh, E.M.: Deinterlacing by Successive Approximation. IEEE Trans. Image Processing 6(2) (1997) 339-344
6. Woods, J.W., Han, S.C.: Hierarchical Motion Compensated Deinterlacing. In Proc. SPIE, vol. 1605 (1991) 819-825
7. Bellers, E.B., de Haan, G.: Advanced Motion Estimation and Motion Compensated Deinterlacing. In Proc. Int. Workshop HDTV, Los Angeles, CA, Oct., session A2 (1996)
8. Kwon, O., Sohn, K., Lee, C.: Deinterlacing Using Directional Interpolation and Motion Compensation. IEEE Trans. Consumer Electronics 49(1) (2003) 198-203
9. Plaziac, N.: Image Interpolation Using Neural Networks. IEEE Trans. Image Processing 8(11) (1999) 1647-1651
10. Patterson, D.W.: Artificial Neural Networks. Prentice Hall (1995)
11. Jack, K.: A Handbook for the Digital Engineer. Fourth Edition, Elsevier (2004)

Non-stationary Movement Analysis Using Wavelet Transform

Cheol-Ki Kim¹, Hwa-Sei Lee¹, and DoHoon Lee²

¹ Department of Design, Pusan National University, South Korea
² School of Computer Science & Engineering, Pusan National University, South Korea
[emailprotected], [emailprotected]

Abstract. This paper presents a method that automatically detects an insect's abnormal movements. In general, ecological data are difficult to analyze due to the complexity residing in systems whose variables vary in a non-stationary fashion. Therefore, efficient methods are needed that can cope with measurements from various environmental conditions. In this paper the wavelet transform is introduced as an alternative tool for extracting local and global information out of complex ecological data, and we discuss how the method is applicable to various related fields.

1 Introduction

Non-stationary movement data are difficult to analyze in ecological systems. Various mathematical methods have been used in ecology to analyze computational behaviors [1][2]. Recently, IT techniques in ecological informatics have been applied to the extraction of information in various fields such as forecasting and patterning [3][4]. In real situations, however, local information in ecological data may also be important in revealing the states of individual specimens or ecological systems. The information compressed into parameters is usually too brief to address local and global information sufficiently at the same time. In this regard, wavelets can be considered as an alternative tool to extract local and global information at the same time. Wavelet theory has been one of the most successful tools to analyze, visualize, and manipulate complex time-based data, for which the traditional Fourier methods cannot be applied directly [5]-[7]. Analysis of data by wavelets allows one to study the data as if one studied material objects with a microscope capable of many levels of magnification. The wavelet approach is also flexible in handling irregular data sets. It can represent complex structures without knowledge of the underlying function that generated the structure, and it can precisely locate jump discontinuities and singularities in dynamical systems [6]-[9]. Wavelets are especially useful for finding scale-dependent regularities in ecological data measured in experimental and field conditions. In implementation, wavelets have been used efficiently to extract local information in time development and have been useful for characterizing or identifying changes in the shapes of curves [8][9]. One of the first applications of wavelets in ecology can be found in [10], analyzing the coherent structure existing between atmosphere and forest. This paper outlines the application of wavelets to ecological data of various types, and demonstrates the usage of wavelets for behavioral monitoring of indicator species treated with a toxic chemical.

2 The Proposed Method

Wavelets were further used for monitoring continuous movement of an indicator species. DWT was applied to detect changes in the shape of the movement tracks of Chironomid larvae after the larvae were individually treated with an insecticide, carbofuran, at a concentration of 0.1 mg/l. Figure 1 shows typical examples of the movement tracks of C. samoensis in the long term after treatment with the chemical. This type of data could be obtained continuously from the image processing system [12]-[13]. In this case the typical responding behaviors, the shaking and highly-curved circular movements, could be observed from the movement tracks located at the center and upper right side of the cage in Figure 1. In preliminary tests, various variables such as angle speed, angle acceleration, speed, acceleration, location, and maximum movement length were checked for revealing the states of the testing specimens. Among these variables, phase angle was selected as the input variable to detect the response behaviors of the specimens in this study, since the typical symptomatic movements of the specimens after treatment with the chemical were the highly shaking movements (Fig. 1). The response behaviors were characterized by changes in phase angles. The relationships of abnormal states of the test specimens with other characterizing variables will be discussed elsewhere. In the example segment shown in Fig. 1, the movement tracks were characterized by different phases. Usually the highly shaking movements with sharp changes in phase angles were observed when the species showed a number of small circular movements in a limited area.

Fig. 1. The movement tracks of the specimens of Chironomus samoensis larvae in 2D for a period of approximately 82000 s during the first days after the treatment of carbofuran (0.1 mg/l)

Since the movement tracks were recorded in 2 dimensions from top view, we obtained the phase angle of the movement tracks in the usual manner for complex variables for a point, x + yi, as shown in equation (1):

Z = x + yi,  θ = angle(Z),  Z = |Z| · exp(iθ) = R · exp(iθ)  (1)
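In NumPy, Eq. (1) is essentially a one-liner; `track_phase_angles` is a hypothetical helper name:

```python
import numpy as np

def track_phase_angles(x, y):
    """Eq. (1): Z = x + yi, theta = angle(Z), R = |Z|."""
    z = np.asarray(x, float) + 1j * np.asarray(y, float)
    return np.angle(z), np.abs(z)   # phase in radians, magnitude
```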


In the equation, θ = angle(Z) returns the phase angle in radians for each element of the complex array Z, while the magnitude is given by R = |Z|. The phase angles were obtained continuously every 0.25 s during the observation period. Fig. 2a shows changes in the phase angles of the movement tracks of the specimens after the treatments with the chemical. Different patterns in phase angles were correspondingly observed compared with Figure 1. The phase of movement in the limited area from the start to the middle period presented as flat curves with small-scale fluctuation, while the longer circular turning movements matched clear periodic changes in phase angles in the latter part of the observation (Fig. 2a).

Fig. 2. Changes in the values of phase angle in the movement tracks (approximately 760 s) of Chironomus samoensis shown in Fig. 1 (a), and the corresponding amplitude terms in different frequency components: the first level (D1) (b) and the 8th level at scaling minimum (D8) (c), in the 8-step decomposition with the Daubechies 4 function

After preliminary tests, Daubechies 4 was selected from various base functions as the base function for DWT to detect changes in the phase angles of the movement tracks. To extract information from the wavelets, hierarchical processes were applied to the phase-angle data. Figure 3 shows the filtering procedure for obtaining coefficients in the wavelet analysis in this study. High-pass (H_1) and low-pass (H_0) filters were applied to the data at every step of filtering. When the signal data were initially provided to the filters, the signal was decomposed into two components: high frequency (D1) and low frequency (A1). Subsequently, A1 was decomposed into D2 and A2, and the low-frequency component (A2) was further decomposed in the third step of filtering. For simplicity of presentation, the process up to D3 and A3 is shown in Figure 3. This process was repeated until A8 and D8 were obtained. Consequently, the formula relating θ to the amplitudes at different frequencies is as follows:

θ = A8 + D1 + D2 + ⋯ + D8  (2)
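With PyWavelets, the 8-step decomposition of Eq. (2) can be sketched as follows; the phase-angle series here is synthetic, standing in for the 0.25 s samples:

```python
import numpy as np
import pywt

# Synthetic phase-angle series standing in for the recorded data
theta = np.cumsum(np.random.randn(2048)) * 0.01

# 8-level DWT with Daubechies 4; wavedec returns [A8, D8, D7, ..., D1],
# so reconstructing and summing the parts recovers theta (Eq. 2)
coeffs = pywt.wavedec(theta, "db4", level=8)
a8, d8, d1 = coeffs[0], coeffs[1], coeffs[-1]
```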

While the high-frequency component at the first level (D1) provided short-term information with good time resolution, the high-frequency component at the eighth level (D8) carried long-term information. Among the 8 components, we chose two, the minimum (D1) and maximum (D8) levels, for detection of the movement patterns (Fig. 2b and 2c). The minimum-level component represents the highest frequency (D1) with the finest time resolution (Fig. 2b). With D1, changes in phase angle at the lowest scale could be detected efficiently with the highest time resolution; slight changes in phase angle over the shortest time period could be detected. On the other hand, the maximum-level component (D8) could detect somewhat longer-range changes in the phase angles (Fig. 2c). In preliminary tests, these two levels sufficed for revealing the changes of behavioral states.

Fig. 3. Wavelet decomposition procedure, where S is the original signal to be decomposed, and H_0 and H_1 are lowpass and highpass filters, respectively

The changes in the amplitude terms in the decomposition were differently observable in D1 and D8. The curve of the changes in amplitude for D1 was sharp (Fig. 2b), while the corresponding curve for D8 was smoother, indicating overall changes in the amplitude terms (Fig. 2c). By combining these two components, the time points of changes in the variables could be detected through DWT at both high and low frequency resolution. We selected the points whose amplitude terms were above the threshold for both D1 and D8 at the same time, through AND logic. The characterizing coefficients at D1 were initially selected if the level was higher than a threshold (θ = 0.01). Subsequently, the coefficients at D8 were selected if the level was higher than a threshold (θ = 0.05). Finally, the points satisfying both conditions were chosen to indicate the changes in the patterns of the movement tracks.
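A minimal sketch of the two-threshold AND detection follows. It assumes the detail coefficients are brought back to signal length with `pywt.upcoef` so they can be thresholded pointwise; the paper does not state how the coefficient and time axes are aligned, so this alignment step is an assumption.

```python
import numpy as np
import pywt

theta = np.cumsum(np.random.randn(2048)) * 0.01   # phase-angle series
coeffs = pywt.wavedec(theta, "db4", level=8)      # [A8, D8, ..., D1]

# Reconstruct D1 and D8 at full signal length for pointwise comparison
rec_d1 = pywt.upcoef("d", coeffs[-1], "db4", level=1, take=len(theta))
rec_d8 = pywt.upcoef("d", coeffs[1], "db4", level=8, take=len(theta))

# AND logic: both the finest (D1) and coarsest (D8) detail must exceed
# their thresholds (0.01 and 0.05 in the paper) at the same time point
detected = (np.abs(rec_d1) > 0.01) & (np.abs(rec_d8) > 0.05)
```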

3 Experimental Results

We implemented the proposed method on a Pentium IV 2.3 GHz with 512 MB of memory, using Matlab 6.5 and Visual C++. Detection was subsequently possible when a stream of input phase-angle data fed to the model met the criteria stated above (Fig. 4). The time points of the movement tracks with higher levels of change in phase angle were sequentially detected (bold in Fig. 4a). The detected patterns (bold in Fig. 4a) were concentrated in the early phase, when the specimens moved in a limited area in this case. An example of an enlarged segment detected by the model is shown in Figure 4b. The shaking and limited movement (bold) and the normal movements not detected by the model were both observed. The model was evaluated with the movement data of different individuals before and after the treatments. The data for the movement tracks were divided every 757.5 s, and the total detection time was calculated in each segment. The detection time significantly increased after the treatments for the different specimens (Tab. 1).


Fig. 4. The movement tracks detected by DWT. (a) Track for the period of approximately 757.5 s three days after the treatment. (b) Track for the period of approximately 50 s (from 200 s to 250 s). (c) Track for the period of approximately 50 s (from 450 s to 500 s).

Table 1. Changes in detection rate of the movement patterns of Chironomus samoensis larvae before and after the treatments by using DWT (Student's t-test)

Specimens  Treat   n   Mean    S.D     T       Probability (p)
1          Before  22  99.50   57.14   -4.142
           After   38  271.84  189.64
2          Before  35  196.40  63.88
           After   35  383.91  155.58
3          Before  15  49.07   53.67
           After   15  203.67  98.67

Sf = {P | N(P) > 2, P ∈ Sb}.  (3)

The steps of the segmentation algorithm are described below (a sketch follows the list):
• Step 1: Extract all fork points and add them to Sf.
• Step 2: Convert the value of all fork points to 0, i.e. S(P) = 0, P ∈ Sf. Now there are only two kinds of points: end points and connective points.
• Step 3: Trace out all segment curves between end points to constitute the segment set, by scanning the skeleton image after fork-point removal following the natural writing order, top-to-bottom and left-to-right.
• Step 4: Clear the segments in which the number of black points is smaller than a certain value (5 in our experiments).
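A minimal sketch of Steps 1-4, assuming fork points are skeleton pixels with more than two 8-neighbours (the printed Eq. (3) is garbled in this copy) and approximating the writing-order trace of Step 3 by connected-component labelling:

```python
import numpy as np
from scipy.ndimage import convolve, label

def segment_skeleton(skel, min_points=5):
    """Steps 1-4: remove fork points, then collect the remaining curves."""
    skel = skel.astype(bool)
    # Count the 8-neighbours of every pixel
    kernel = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])
    neighbours = convolve(skel.astype(int), kernel, mode="constant")
    # Steps 1-2: zero out fork points (assumed: > 2 skeleton neighbours)
    pruned = skel & (neighbours <= 2)
    # Step 3 (simplified): each remaining connected curve is one segment
    labels, n = label(pruned, structure=np.ones((3, 3), int))
    segments = [labels == i for i in range(1, n + 1)]
    # Step 4: drop segments with fewer than min_points black points
    return [s for s in segments if s.sum() >= min_points]
```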

Fig. 3. Sample result of segmentation

This segmentation method differs from the stroke extraction methods used for Chinese character recognition mainly in two aspects. First, the definition of fork point is different. The definition described by equation 3 enlarges the number of fork points. Because the fork points in our method serve only to disconnect the connections of multiple strokes, expanding the quantity of fork points does not increase the computation cost. On the contrary, it removes some bug points introduced at the thinning stage. Second, there is no broken-stroke connection part in our segmentation method, since standard strokes are not necessary for Chinese signature verification, as mentioned above. This simple segmentation algorithm has three advantages: (1) decreasing the computational cost, (2) increasing the robustness to the large variations of Chinese signatures, and (3) preserving more of the individuality of the signature. Figure 3 shows a sample result of this segmentation.

3.3 Feature Extraction

After preprocessing and segmentation, the signature has a series of segments, each with the same size as the signature. In order to find the best-matching segments between the test and the reference signature, every segment is represented by a set of six features for comparison. The first two features are the relative horizontal and vertical centers of the segment:

C_x^s = (Σ_{y=1}^{N} Σ_{x=1}^{M} x · s(x, y)) / (Σ_{y=1}^{N} Σ_{x=1}^{M} s(x, y)) / M,   C_y^s = (Σ_{x=1}^{M} Σ_{y=1}^{N} y · s(x, y)) / (Σ_{x=1}^{M} Σ_{y=1}^{N} s(x, y)) / N.  (4)

where M denotes the width of the signature, N denotes the height of the signature, and s(x, y) is the image of the segment s. The other four features reflect the trace or slant information of the segment s. They are (1) the number (P_h^s) of points which have a horizontal neighbor, (2) the number (P_v^s) of points which have a vertical neighbor, (3) the number (P_o^s) of points which have a positive-diagonal neighbor, and (4) the number (P_e^s) of points which have a negative-diagonal neighbor. The four kinds of neighbor are shown in Figure 4. All four values are normalized to [0, 1]. They are given as

R_k^s = P_k^s / Σ_{k∈{h,v,o,e}} P_k^s,  k ∈ {h, v, o, e}.  (5)

Fig. 4. Four kinds of neighbor
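A sketch of the six-feature vector of Eqs. (4)-(5); which diagonal counts as "positive" and the 1-based pixel coordinates are assumptions:

```python
import numpy as np

def segment_features(seg):
    """Six features of one segment image s(x, y) (Eqs. 4-5)."""
    s = np.asarray(seg).astype(bool)
    n_rows, n_cols = s.shape                 # N = height, M = width
    ys, xs = np.nonzero(s)
    # Relative centres (Eq. 4): mean coordinate divided by image size
    cx = (xs + 1).mean() / n_cols
    cy = (ys + 1).mean() / n_rows
    # Neighbour counts (Eq. 5): horizontal, vertical, two diagonals
    p = np.array([
        np.count_nonzero(s[:, :-1] & s[:, 1:]),     # horizontal neighbour
        np.count_nonzero(s[:-1, :] & s[1:, :]),     # vertical neighbour
        np.count_nonzero(s[:-1, :-1] & s[1:, 1:]),  # one diagonal
        np.count_nonzero(s[:-1, 1:] & s[1:, :-1]),  # other diagonal
    ], dtype=float)
    r = p / p.sum() if p.sum() else p               # normalise to [0, 1]
    return np.concatenate(([cx, cy], r))
```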

4 Comparison and Experimental Results

4.1 Similarity Calculation Between Two Signatures

After segmentation and feature extraction, the signature is represented by a series of 6-dimensional vectors, each composed of the set of six features. Each of these vectors represents a segment of the signature. For a pair of signatures, the best-corresponding segment in signature B for each segment in signature A is found by using a feature matching approach based on Euclidean distance. Let there be n segments in signature A and m segments in signature B. For each segment (segment i) in signature A, the 2·m Euclidean distances are calculated as

d_ij^1 = √((C_x^i − C_x^j)² + (C_y^i − C_y^j)²),  1 ≤ j ≤ m,  (6)

d_ij^2 = Σ_{k∈{h,v,o,e}} (R_k^i − R_k^j)²,  1 ≤ j ≤ m.  (7)

Then the correspondent of segment i in signature A is segment q in signature B, where q is calculated by the formula below:

q = k, if min{d_ik², d_il², d_is²} = d_ik²;  l, if min{d_ik², d_il², d_is²} = d_il²;  s, if min{d_ik², d_il², d_is²} = d_is²,   {k, l, s} ⊂ [1, m],  (8)

where k, l and s are the serial numbers of the three segments in signature B that have the three smallest distances d¹ to segment i of signature A, i.e. d_it^1 ≤ d_ij^1, t ∈ {k, l, s}; j ∈ [1, m] and j ≠ k, l, s.

For each segment in signature A, the value v_i is given as

v_i = 1, if d_iq_i^1 + d_iq_i^2 < T;  0, otherwise,  (9)

where v_i = 1 means that segment i of signature A matches segment q_i of signature B, and v_i = 0 means that segment i of signature A has no matching segment in signature B; T is a distance threshold which is experimentally decided in the training stage. Let mat_ab represent the number of matching segments of signature A; mat_ab is given as

mat_ab = Σ_{i=1}^{n} v_i.  (10)

mat_ba represents the number of matching segments of signature B and is computed vice versa. The similarity degree between signatures A and B is computed as

sim_ab = min{ (mat_ab / n) × 100, (mat_ba / m) × 100 }.  (11)
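Putting Eqs. (6)-(11) together, a minimal sketch of the similarity degree, with the paper's T = 0.4 and feature vectors laid out as [C_x, C_y, R_h, R_v, R_o, R_e]:

```python
import numpy as np

def similarity(feats_a, feats_b, t=0.4):
    """sim_ab of Eq. (11); feats_* are (n, 6) arrays of segment features."""
    def matches(fa, fb):
        count = 0
        for f in fa:
            d1 = np.linalg.norm(fb[:, :2] - f[:2], axis=1)       # Eq. (6)
            d2 = np.sum((fb[:, 2:] - f[2:]) ** 2, axis=1)        # Eq. (7)
            cand = np.argsort(d1)[:3]                            # 3 nearest by d1
            q = cand[np.argmin(d2[cand])]                        # Eq. (8)
            if d1[q] + d2[q] < t:                                # Eq. (9)
                count += 1
        return count                                             # Eq. (10)
    n, m = len(feats_a), len(feats_b)
    return min(matches(feats_a, feats_b) / n * 100.0,
               matches(feats_b, feats_a) / m * 100.0)            # Eq. (11)
```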

4.2 The RBFNN Classifier and Comparison Stage

The RBFNN classifier is a three-layer feedforward network: an input layer with four neurons, a hidden layer with four to nine neurons, and an output layer with two neurons. The hidden units use radially symmetric functions as activation functions. Two kinds of basis functions, the Gaussian function and the thin plate spline function, are applied in the experiments. A two-stage training algorithm is introduced: in stage 1, 500 iterations of EM are used to position the centers; in stage 2, the pseudo-inverse of the design matrix is used to find the second-layer weights. The input of the classifier is a 4-dimensional vector composed of the 4 similarity degrees between the questioned sample and 4 reference samples. The output of the classifier is a simple linear discriminant that outputs a weighted sum of the basis functions. There are two stages in the comparison phase: training and verification. The training stage has two aims. One is experimentally adjusting the system parameter, the distance threshold T (see equation 9), which decides whether two compared signature segments match; we set T equal to 0.4 in our experiments. The other is training the RBFNN classifier. In the verification stage, all 9184 (287×32) similarity-degree vectors in the second database are input to the RBFNN classifier. A type I error occurs when a genuine sample is identified as a forgery; conversely, when a forgery is identified as a genuine sample, a type II error occurs.

4.3 Experimental Results

Table 2 shows the results obtained using the second database. The experiments have shown promising results in terms of general error rate. The simulated-forgery acceptance rate was high because the features extracted were not sufficient to identify this type of forgery.

Table 2. Experimental results using the second signature database (GF: Gaussian function; TPSF: thin plate spline function; HNN: number of neurons in the hidden layer)

GF
HNN  Type I error (%)  Type II error (%): Random  Simple  Simulated
4    5.05              0.03                        6.25    13.75
5    5.29              0.03                        6.25    13.13
6    5.65              0.03                        5.47    11.87
7    6.13              0.03                        5.47    10.63
8    6.13              0.03                        5.47    10.63
9    6.37              0.03                        5.86    12.50

TPSF
HNN  Type I error (%)  Type II error (%): Random  Simple  Simulated
4    4.09              0.03                        6.64    11.25
5    5.05              0.03                        6.25    12.50
6    4.57              0.03                        6.25    12.50
7    3.85              0.04                        7.42    15.63
8    4.09              0.03                        8.59    15.63
9    3.00              0.04                        8.59    15.63

5 Conclusion

A novel off-line Chinese signature verification approach is proposed. This method is based on feature extraction from every segment of the signature skeleton and a general RBFNN classifier model. The simple segmentation method proposed requires lower computational cost and is more robust than stroke extraction methods, which is achieved by simplifying the definition of feature points and getting rid of the broken-stroke connection part. By using a global model instead of setting up an independent model for each writer, this method reduces the number of genuine samples required from each writer in the training phase. For each writer, only 4 genuine samples are required as references in this method.


On-Line Signature Verification Based on Wavelet Transform to Extract Characteristic Points

LiPing Zhang¹,² and ZhongCheng Wu¹

¹ Center for Biomimetic Sensing and Control Research, Institute of Intelligent Machines, CAS, Hefei, 230031 Anhui, China
² Department of Automation, University of Science & Technology of China, Hefei, 230026 Anhui, China
[emailprotected], [emailprotected]

Abstract. On-line signature verification is one of the most accepted means of personal verification. This paper proposes an on-line signature verification method based on the Wavelet Transform (WT). Firstly, the method uses the wavelet transform to extract characteristic points from the 3-axis force and 2-dimensional coordinates of signatures obtained with the F-Tablet. It then builds 5-dimensional feature sequences and dynamically creates multiple templates using clustering. Finally, after fusion over the 5-dimensional feature sequences, whether the signature is genuine or not is decided by a majority voting scheme. Experimenting on a signature database acquired with the F-Tablet, the performance evaluation in even EER (Equal Error Rate) was improved to 2.83%. The experimental results show that the method not only reduces the amount of data to be stored, but also minimizes the duration of the whole authentication process and increases the efficiency of signature verification.

1 Introduction

As one of the biometric authentication methods, signature verification is considered the most convenient and non-intrusive. Moreover, the signature has been the primary form of identity verification throughout a long history and has been widely accepted. Plamondon et al. [1] categorized the various signature verification methodologies into two types: functional approaches and parametric approaches. In parametric algorithms, the task of selecting the right set of parameters is not trivial. One of the major issues of function-based approaches is how to compare two signature patterns despite their different durations and their non-linear distortion with respect to the time parameter. DTW (Dynamic Time Warping) has provided a major tool for overcoming this problem [2]. Although this method has been highly successful in signature verification processes, it still has a high computational complexity due to the repetitive nature of its operations in the optimization process. Some authors have tried to improve this method. Y.J. Bae et al. [3] propose a parallel DTW algorithm, which results in a reduction of time complexity. Hao Feng et al. [4] present a new extreme-points warping technique for the functional approach, which improves the equal error rate and reduces the computation time. But the raw data of signatures is sizeable, so the computational complexity of DTW is hard to improve greatly. In this paper, we particularly insist on the extraction of characteristic points representing signatures based on the wavelet transform, and fuse 3-axis force and 2-dimensional coordinate information. This makes it possible to reduce the amount of signature data to be stored and makes the verification process as fast and as efficient as possible.

2 Signature Verification System

The block diagram of the proposed on-line handwritten signature verification system is shown in Fig. 1. Our method first extracts the characteristic points of signatures by wavelet transform and builds 5-dimensional feature sequences; it then matches these sequences between the test signature and its corresponding templates and fuses the five decisions by a majority voting strategy to make the final decision.

Fig. 1. The block diagram of the proposed on-line signature verification system

2.1 Signature Database and Preprocessing

Differing from other tablets, we use the F-Tablet to capture 3-axis force and 2-dimensional coordinate information of the pen-tip at 100 samples per second [5]. Fig. 2 shows the on-line signature system interface and the F-Tablet. There are 32 writers enrolled in our database. Writers signed in their most natural way. Each signer supplied 30 genuine signatures in two sessions over a period of one month. We also generated 20 forgeries (10 random forgeries and 10 skilled ones) for each writer. To collect skilled forgeries, forgers had free access to the trajectory and writing sequence of each signature, with no limitation on training time. Before verification, the method does some preprocessing, such as discarding the head and tail empty strokes of signatures, and filtering and normalizing the 3-axis force and 2-dimensional coordinate information of the signatures. Fig. 3 displays the information sequences after preprocessing.


Fig. 2. Online signature system interface and the F-Tablet 100

300

fx fy fz

200

100

x y

90

80

80

Positions

70

60

F

100

60

50

40

40

30

20

20

10

-100 0

100

t

200

300

0 0

400

(a1) The 3-Axis force curves

100

t

200

300

400

100

fx fy fz

30

40

50

60

70

80

90

100

(a3) Genuine signature

100

x y

90

80

80

300

Positions

70

F

200

100

60

60

40

40

50

30

0 20

-100

-200 0

20

(a2) 2-dim coordinate curves

500

400

10

20

10

100

200

t

300

400

0 0

500

100

200

t

300

400

500

0 0

10

20

(b1) The 3-Axis force curves (b2) 2-dim coordinate curves

30

40

50

60

70

80

90

100

(b3) Forgery

Fig. 3. The 3-Axis force, 2-dim coordinate information and the corresponding signatures

2.2 Extraction of Characteristic Points by Wavelet Transform

Wavelets are a family of functions that are able to cut up a signal into different frequency components and then study each component with the resolution matched to its scale [6]. Usually, a continuous wavelet transform can be written as:

WT_x(a, b) = (1/√a) ∫ x(t) ψ((t − b)/a) dt = ∫ x(t) ψ_{a,b}(t) dt = ⟨x(t), ψ_{a,b}(t)⟩.  (1)

where the symbol ⟨·,·⟩ is the inner product operation, and WT_x(a, b) is called the wavelet coefficient of x(t) with respect to the mother wavelet ψ(t); a is the dilation factor and b is the distance of translation. S. Mallat [7] showed that sharp variation points are among the most meaningful features for characterizing signals, and that the zero-crossings of a wavelet transform provide the locations of the signal's sharp variation points at different scales; the completeness and stability of a signal representation based on the zero-crossings of a wavelet transform at the scales 2^j, for integer j, are studied there. Because the pen motion in the y-direction is more prominent during writing, the function y is decomposed by the wavelet transform. Then the zero-crossings of the detail at a certain level are extracted, ZC_i^y (1 ≤ i ≤ N, where N is the number of zero-crossings), and the corresponding points of the 3-axis force and 2-dimensional coordinate sequences, namely the characteristic points, represented by (F_x^i, F_y^i, F_z^i, x^i, y^i) (1 ≤ i ≤ N), are taken as the feature functions of signatures. Characteristic points extracted by the wavelet transform are shown in Fig. 4.
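A sketch of the characteristic-point extraction, assuming db10 at level 3 (the configuration the experiments favour) and mapping coefficient-domain zero-crossings to time indices by reconstructing the detail to signal length; the paper does not state this alignment step explicitly.

```python
import numpy as np
import pywt

def characteristic_points(fx, fy, fz, x, y, level=3):
    """Keep the samples where the chosen detail of y(t) crosses zero."""
    # Coarsest detail coefficients of the y-coordinate at the chosen level
    d = pywt.wavedec(y, "db10", level=level)[1]
    # Bring the detail back to signal length, then locate sign changes
    rec = pywt.upcoef("d", d, "db10", level=level, take=len(y))
    zc = np.nonzero(np.signbit(rec[:-1]) != np.signbit(rec[1:]))[0]
    # Characteristic points (Fx, Fy, Fz, x, y) at the zero-crossing indices
    return np.column_stack([fx[zc], fy[zc], fz[zc], x[zc], y[zc]])
```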

Fig. 4. Characteristic points extracted by WT: (a) genuine signature, (b) forgery of signature

2.3 Templates and Thresholds

Using the five feature sequences (F_x^i, F_y^i, F_z^i, x^i, y^i) obtained above, the method creates reference templates and then selects three reference templates from the ten training signatures by clustering. As a result, there are multi-reference templates R1, R2, R3, R4 and R5 for the five feature sequences. At the enrollment stage, the verification system calculates the average distance between each training signature and the multi-reference templates, from which the expectation μ_j and standard deviation σ_j are obtained. The thresholds TH_j are then given by the following formula:

TH_j = μ_j + w·σ_j,  (2)

where the value w is chosen to adjust the thresholds so as to ensure even error rates of verification. As a consequence, the method obtains five thresholds, one for each information sequence: TH_k, 1 ≤ k ≤ 5.

2.4 Decision Fusion and Verification

For each feature sequence T_k of a signature T, the average distance d(T_k, R_k) between T_k and the multi-reference R_k is first calculated and compared to the corresponding threshold. The decision u_k is then defined by:

u_k = 1, if d(T_k, R_k) ≤ TH_k;  0, otherwise,  1 ≤ k ≤ 5.  (3)

This paper uses the majority voting strategy to combine the five decisions u_k and make the final decision u_0. The total number of votes is given by α = Σ_{k=1}^{5} u_k and compared to the majority-voting threshold TH: if α is above TH, u_0 is 1, otherwise 0. Depending on the value of u_0, the verification system judges whether the signature is genuine or not.
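The decision rule of Eq. (3) and the vote then amount to a few lines; `vote_threshold` plays the role of TH (3, 4 or 5 in the experiments):

```python
def verify(distances, thresholds, vote_threshold=3):
    """Eq. (3) plus the majority vote over the five feature channels."""
    # u_k = 1 if the average distance to the multi-template is within TH_k
    votes = [1 if d <= th else 0 for d, th in zip(distances, thresholds)]
    alpha = sum(votes)                          # total votes, alpha
    return 1 if alpha > vote_threshold else 0   # final decision u_0
```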

3 Experimentation and Results

The proposed method was tested using the above-mentioned database. In our experiments, w ∈ (1, 5) and TH ∈ {3, 4, 5} were adjusted to obtain the performance evaluation in EER (Equal Error Rate). Tests were executed with the mother wavelets Daubechies 1 (Haar), Biorthogonal 5.5 (bior5.5), Daubechies 10 (db10) and Symlets 6 (sym6) at different levels of resolution (2, 3, 4). Results for this configuration are presented in Table 1 and Table 2. It is obvious that the mother wavelet and resolution level impact the performance of verification. The amount of signature data reduces as the resolution level increases. A low resolution level may include unimportant points, which can disturb the verification, while a high level loses some characteristic points. Therefore, in general, the level of resolution is 3 in our experiments. We also performed an investigation to produce better results for each writer by substituting the mother wavelet. Table 3 shows the even EER of the single information sequences and of information fusion, compared to the method without using the wavelet transform to extract characteristic points. The results indicate that, after extracting characteristic points, less data needs to be stored and the operating speed is faster. Moreover, the authentication efficiency is enhanced, and information fusion also improves the verification performance. In this way, on-line signature verification can be well applied to actual authentication systems.

Table 1. EER for different mother wavelets (level = 3)

Mother wavelet  Db1 (Haar)  Bior5.5  Db10    Sym6
Writer 1        7.5%        10%      6.67%   6.67%
Writer 2        10%         5%       5%      5%
Writer 3        36.67%      6.67%    0%      6.67%
Writer 4        13.33%      23.33%   3.333%  10%
Writer 5        10%         0%       3.33%   10%

Table 2. EER for different resolution levels (using mother wavelet db10)

Resolution level  2        3       4
Writer 1          15%      6.67%   12%
Writer 2          36.67%   6.67%   15%
Writer 3          3.33%    0%      5%
Writer 4          15%      5%      11%
Writer 5          15%      5%      6.67%

Table 3. Comparison of Even EER between the proposed method and the one without WT

                     fx       fy       fz       x        y        Even EER
Without WT           26.13%   26.97%   24.45%   25.74%   28.48%   11.98%
The proposed method  23.59%   22.29%   17.08%   20.25%   23.89%   2.83%


4 Conclusion and Future Work

This paper proposes a method based on the wavelet transform. The method extracts the characteristic points of the 3-axis force and 2-dimensional coordinate signature information and creates 5-dimensional feature sequences. Using these feature sequences, it verifies on-line signatures, and the even EER (Equal Error Rate) is just 2.83%. Experiments show that the method not only reduces the amount of stored data, but also minimizes the duration of the authentication phase and increases the efficiency of signature verification. Future work is to take the relationships between characteristic points into consideration to make the verification more robust, and to extract characteristic points from other information sequences, such as velocity and acceleration, for signature verification.

Acknowledgments This research is supported by the National Natural Science Foundation of China under Grants No. 60475005, 60575058 and 10576033. The authors express their thanks for these supports and would also like to thank Dr. Meng Ming, Mrs. Shen Fei, Mr. Wei Ming-xu and Kang Le for their support of this work.

References
1. Plamondon, R., Lorette, G.: Automatic Signature Verification and Writer Identification - The State of the Art. Pattern Recognition 22(2) (1989) 107-131
2. Sato, Y., Kogure, K.: On-line Signature Verification Based on Shape, Motion, and Writing Pressure. Proc. 6th Int. Conf. on Pattern Recognition (1982) 823-826
3. Bae, Y.J., Fairhurst, M.C.: Parallelism in Dynamic Time Warping for Automatic Signature Verification. In ICDAR'95, Vol. 1 (1995) 426-429
4. Hao, F., Chan, C.W.: Online Signature Verification Using a New Extreme Points Warping Technique. Pattern Recognition Letters 24 (2003) 2943-2951
5. Fang, P., Wu, Z.C., Meng, M., Ge, Y.J., Yu, Y.: A Novel Tablet for On-Line Handwriting Signal Capture. Proceedings of the 5th World Congress on Intelligent Control and Automation, Vol. 6 (2004) 3714-3717
6. Graps, A.: An Introduction to Wavelets. IEEE Computational Science and Engineering 2(2) (1995) 1-18
7. Mallat, S.: Zero-crossings of a Wavelet Transform. IEEE Transactions on Information Theory 37(4) (1991) 1019-1033

Parameter Estimation of Multicomponent Polynomial Phase Signals

Han-ling Zhang¹, Qing-yun Liu², and Zhi-shun Li²

¹ College of Computer and Communications, Hunan University, ChangSha, 410082, China
² School of Marine Engineering, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
[emailprotected]

Abstract. This paper addresses the issue of detection and parameter estimation of multicomponent polynomial phase signals (mc-PPSs) embedded in noise, based on the high-order ambiguity function (HAF). We first show how existing PHAF-based techniques (PHAF: product HAF) are inadequate, mainly in providing reliable detection for mc-PPSs. The main contribution of this paper is a novel parameter estimation method. Firstly, given a set of time delays, it produces a set of estimates of phase coefficients based on the HAF. Then it produces a final estimate of each phase coefficient by means of voting. The new method improves the probability of detection and the estimation accuracy while avoiding the issue of threshold selection. Computer simulations are carried out to illustrate the advantage of the proposed method over existing techniques.

1 Introduction

In certain communication and signal processing applications, such as synthetic aperture radar imaging and mobile communications, the phase of the observed signal can be modeled as a polynomial function of time. Such signals are commonly known as polynomial phase signals (PPS). The analysis and estimation of PPS has been an area of recent interest [1-5]. The existing HAF-based methods work very well when applied to single-component PPS. But when applied to multicomponent polynomial phase signals (mc-PPS), these methods may fail due to the existence of spurious peaks in the HAF. In order to suppress the spurious peaks, PHAF-based methods were proposed in [2,3]. In this paper, a novel method providing more reliable detection and higher estimation accuracy than PHAF-based methods is proposed. This paper is organized as follows. In Section 2, the limitation inherent in PHAF-based methods is briefly discussed. In Section 3, the novel method is presented. The final remarks are presented in Section 4.

2 Signal Model and Limitations of PHAF-Based Methods

We assume an observation model composed of the sum of discrete-time polynomial phase signals embedded in additive white Gaussian noise:

x(n) = Σ_{l=1}^{L} b_l exp(j Σ_{i=0}^{M_l} a_{i,l} n^i),  n = 0, 1, 2, …, N − 1,  (1)

where L is the number of PPS components, M_l is the (highest) polynomial phase order for the lth component, and {a_{i,l}}_{i=1}^{M_l} are the polynomial phase coefficients for the lth component. The amplitudes b_l are assumed to be real and positive constants. The results developed here for constant amplitudes can also be extended to the time-varying amplitude case, provided that the amplitude variation is slow. The notations used here are the same as those in [2]. If more than one PPS component shares the same leading coefficient a_m, the mth-order high-order ambiguity function (HAF), P_m[x, ω, τ], at any frequency ω, is the sum of vectors. For a given τ and ω = m!τ^{m−1}a_m, P_m[x, ω, τ] may be very small, or even zero. So for any τ of a given set of lags {τ_h}_{h=1}^{H}, the assumption that P_m[x, ω, τ] peaks at ω = m!τ^{m−1}a_m is not always true. When this assumption is not true, the product high-order ambiguity function (PHAF) of order m, PM_m[x, ω], will not peak at ω = m!τ^{m−1}a_m, so it is not possible to estimate a_m.

Here, we give an example to illustrate the disadvantages of the existing PHAF-based methods. We suppose that L = M = 2, b_1 = b_2 and a_{1,2} = a_{2,2} = K; then the second-order instantaneous moments are

p_2[x(n), τ] = Σ_{l=1}^{2} b_1² exp[j2π(Kτn + a_{1,l}τ + (1/2)Kτ²)] + Σ_{m=1}^{2} Σ_{l=1, l≠m}^{2} b_1² exp{j2π[(a_{1,l} − a_{1,m} + Kτ)n + a_{1,l}τ + (1/2)Kτ² + a_{0,l} − a_{0,m}]}.  (2)

If τ meets the condition

τ = (2k + 1) / (2(a_{1,1} − a_{1,2})),  (3)

where k is any integer, there is no sinusoidal signal with frequency Kτ in p_2[x(n), τ]. So P_2[x, ω, τ] and PM_2[x, ω] will not peak at the frequency corresponding to K, let alone estimate K.

of a given set of lags

{τ h }hH=1 (we assume,

1010

H.-l. Zhang , Q.-y. Liu, and Z.-s. Li

without loss of generality, that

τ 1 < τ 2 < < τ H ≤ N m ), compute {aˆ m, g ,τ

}

G

h

g =1

,

the estimates of phase coefficients corresponding to the locations of the first G (G is given in advance) strongest peaks in Pm x, ω , τ h , then we get a set of estimates of

[

{{aˆ } }

]

H

phase coefficients

G . m , g ,τ h g =1 h =1

Intuitively speaking, if the given mc-PPS consists

of at least one mth-order PPS whose phase coefficient of order corresponding to

m is a m , the peaks

a m should emerge in most of Pm [x, ω , τ h ] h=1 . In other words, the H

} time and again. If no mth-order PPS } } would be disorderly and exists in the given mc-PPS, the elements of {{aˆ estimate of

{{

a m should appear in aˆ m, g ,τ h

}

G

H

g =1 h =1

H G m , g ,τ h g =1 h =1

unsystematic. This is the basis of our proposed method. It is well known that the larger the τ h , the higher the estimation accuracy of phase

{

coefficient. So we take all elements of aˆ m , g ,τ H

}

G

as the finial estimates of different

g =1

phase coefficients. At the same time, we think that all elements of

[

{{aˆ } }

H −1 G m , g ,τ h g =1 h =1

which are within the immediate neighborhood aˆ m , g ,τ H − δ , aˆ m , g ,τ H + δ determined by physical frequency resolution of DFT and

τ H ) of aˆ m, g ,τ

H

] (δ

is

should be

considered as different estimates of the same phase coefficient corresponding to aˆ m, g ,τ H . As to element of aˆ m , g ,τ h G which is not an estimate of a certain phase

{

{

coefficient as the one of aˆ m , g ,τ k

}

}

G g =1

g =1

, where k > h , we take it as the finial estimate of

another phase coefficient, and all elements of

[

{{aˆ } }

d G m , g ,τ h g =1 h =1

( d < h ) which is

]

within the immediate neighborhood aˆ m , g ,τ h − δ , aˆ m , g ,τ h + δ of aˆ m , g ,τ h as different estimates of the same phase coefficient. We denote q , R as the number of elements of

{{aˆ } }

G H m, g ,τ h g =1 h=1

which belongs to different estimates of the same phase coefficient

and the product of peak intensities of different estimates, respectively. If q ≥ γ ( γ is given in advance), and at the same time, there is at least one time that intensity of an estimate satisfies detection criterion proposed in [2], we think that an mth-order PPS is present in x ( n ) . We take the one whose product of q and R largest as the estimate of the parameter of this component. To demonstrate the validity of our proposed method, we consider the estimation of second –order phase coefficients of a six-component PPS, which consists of one harmonic, five chirps. The amplitudes and second-order phase parameters of these signal components are given in Table 1. Data length used was N = 5 1 2 . Additive

Parameter Estimation of Multicomponent Polynomial Phase Signals

noise

was

white

Gaussian

noise

{6 4 , 9 6 ,1 2 8 ,1 6 0 ,1 9 2 , 2 2 4 , 2 5 6 } and

with

variance

1.0.

The

lag

1011

sets

is

G = 12 . The sum of second-order HAF

of the chirps whose chirp rate are 20 is zero when

τ

equals to N 4 .

Table 1. Amplitude and second-orderphase parameters of a six component LFM signal

l b1 a2,l

1 1.0 20

2 0.7 20

3 0.7 20

4 1.0 35

5 1.0 35

6 1.0 0

Fig. 1 shows the second-order phase parameter estimates obtained from 300 independent Monte Carlo simulations. The estimates based on the novel method (q > 4) are shown in Fig. 1(a). The estimates based on PHAF are shown in Figs. 1(b) and (c). To show the difference in detection performance between the novel method and the PHAF-based methods used in Figs. 1(a), (b) and (c), Table 2 shows the number of missed detections and false alarms of these methods when the noise variance is 1.414. If the algorithm fails to detect any one of the PPS components, the event is considered a missed detection. Similarly, if the algorithm estimates a non-existent PPS component, the event is considered a false alarm.

Fig. 1. The second-order phase parameter estimates obtained from 300 independent Monte Carlo simulations. (a) The estimates based on the novel method (q > 4). (b) The estimates based on PHAF using the set of lags (τ_1 = N/8 and τ_2 = N/4). (c) The estimates based on PHAF using the set of lags (τ_1 = N/4 and τ_2 = N/2).

From Table 2, we observe that the new method improves the probability of detection. Since we always take the phase coefficient estimate corresponding to the larger τ as the final estimate, we can also see from Fig. 1, roughly, that the new method improves the estimation accuracy.


Table 2. The number of missed detections and false alarms of the methods used in Fig. 1 when the noise variance is 1.414

Method      Missed Detection    False Alarm
Fig. 1(a)   0                   0
Fig. 1(b)   32                  27
Fig. 1(c)   62                  62

As to the minimum signal-to-noise ratio (SNR) at which our proposed method still works, the answer depends on many factors, including the number of PPS components, their relative strengths, the highest PPS order in each component, and the data length. Taking the example used in Section 5 of [2] and using the same evaluation measure, our proposed method works down to -6 dB, far below the minimum SNR of 9.5 dB in [2].

4 Conclusion

In this paper, we investigated the detection and estimation of a sum of PPS components embedded in noise. The existing PHAF-based techniques do not perform well in providing reliable detection for mc-PPS. We presented a novel detection and estimation method: given a set of time delays, it produces a set of phase-coefficient estimates based on the HAF, and then forms a final estimate of each phase coefficient by means of voting. The new method improves the probability of detection and the estimation accuracy while avoiding the issue of threshold selection. Computer simulations are carried out to illustrate the advantage of the proposed method over existing techniques.

References

1. O'Shea, P.: A New Technique for Instantaneous Frequency Rate Estimation. IEEE Signal Processing Letters 9(8) (2002) 251-252
2. Ikram, M.Z., Tong Zhou, G.: Estimation of Multicomponent Polynomial Phase Signals of Mixed Orders. Signal Processing 81 (2001) 2293-2308
3. Barbarossa, S., et al.: Product High-Order Ambiguity Function for Multicomponent Polynomial-Phase Signal Modeling. IEEE Trans. Signal Processing 46(3) (1998) 691-708
4. Wang, Y., Tong Zhou, G.: On the Use of High-Order Ambiguity Function for Multicomponent Polynomial Phase Signals. Signal Processing 65 (1998) 283-296
5. Peleg, S., Friedlander, B.: Multicomponent Signal Analysis Using the Polynomial Phase Transform. IEEE Trans. Aerospace and Electronic Systems 32(1) (1996) 378-386
6. Ikram, M.Z., et al.: Estimating the Parameters of Chirp Signals: An Iterative Approach. IEEE Trans. Signal Processing 46(12) (1998) 3436-3440

Parameters Estimation of Multi-sine Signals Based on Genetic Algorithms∗ Changzhe Song, Guixi Liu, and Di Zhao Department of Automation, P.O. Box 185, Xidian University, Xi’an, 710071, China [emailprotected], [emailprotected]

Abstract. An improved Genetic Algorithm (GA) for parameter estimation of multi-sine signals (PEMS) is proposed. The strategies of self-adaptive elite criterion, two-point crossover and cataclysmic mutation are employed in this algorithm to improve the performance of the GA. To simplify the computation, one complicated operating process is converted into several simple processes. A model of PEMS is also built which is conveniently applied to the GA. Simulation results show that the proposed method is effective and superior to the least-mean-squares (LMS) method.

1 Introduction

Parameter estimation of multi-sine signals is a classical problem in signal processing and is becoming increasingly important in radar, sonar, biomedical and geophysical applications. There are two main approaches to this problem. One is the FFT-based methods [1]. They are simple and convenient, but the disadvantages of the FFT, such as the fence effect and leakage, affect the estimation accuracy. The other is optimization algorithms based on time-domain searching, such as the LMS algorithm [2]. These have certain weaknesses, such as local convergence and low accuracy. An improved GA is proposed in this paper to overcome these barriers. The strategies of self-adaptive elite criterion [3], two-point crossover [4] and cataclysmic mutation [5] are introduced in this algorithm to improve the performance of the GA. To reduce the complexity of the GA, we estimate the sinusoids one by one. Simulation results illustrate the effectiveness of the proposed algorithm, and a comparison demonstrates the better performance of the GA over LMS.

2 Problem Description and Model Building In this paper, consider the case where the sinusoidal signals are corrupted by additive Gaussian noise, i.e.

$y(t) = \sum_{i=1}^{K} A_i \sin(2\pi f_i t + \Phi_i) + w(t)$  (1)

This research is supported by the Preliminary Research Foundation of National Defence Science and Technology (51416060205DZ0147).


where $A_i$, $f_i$, $\Phi_i$ are the parameters to be estimated, representing the amplitude, frequency and initial phase respectively. Parameter $K$ is the number of sine signals. The element $w(t)$ is Gaussian noise with mean 0 and variance $\sigma^2$. When $K = 1$ (a single sinusoid), let the vector $\tilde{\theta} = [\tilde{A}, \tilde{f}, \tilde{\Phi}]^T$. The criterion function is defined as:

$\sigma(\tilde{\theta}) = \mathrm{abs}[\,y(t) - \tilde{A}\sin(2\pi \tilde{f}\,t + \tilde{\Phi})\,]$  (2)

It is clear that when $\sigma(\tilde{\theta})$ reaches its minimum, the vector $\tilde{\theta}$ gives the estimated values of the parameters. When $K > 1$ (multi-sine signals), we write equation (2) as follows:

$\sigma_i(\tilde{\theta}_i) = \mathrm{abs}[\,y(t) - \tilde{A}_i\sin(2\pi \tilde{f}_i\,t + \tilde{\Phi}_i)\,], \quad i = 1, \dots, K$  (3)

Following the same rule as above, equation (3) is used as the criterion function for multi-sine signals. When $\sigma_i(\tilde{\theta}_i)$ achieves its minimum, the vector $\tilde{\theta}_i = [\tilde{A}_i, \tilde{f}_i, \tilde{\Phi}_i]^T$ gives the estimated parameters of sinusoid $i$. Sinusoid $i$ is then deleted from the observation data, and from the residual the parameters of another sinusoid are estimated. Running this operation circularly, all parameters are estimated. It is well known that for more sinusoids and/or longer data records the computation of a GA is very complicated [6], but this course of estimating the sinusoids one by one reduces the computational complexity greatly.

3 Parameters Estimation Method Based on Genetic Algorithms

3.1 Modified Genetic Algorithms

The GA is an iterative process of selection, crossover and mutation. It is a kind of self-adaptive global searching optimization algorithm, different from conventional optimization algorithms [4]. To improve the performance of the GA, several strategies are utilized to modify it.

1) Self-adaptive elite criterion. In the selection operation, the self-adaptive elite criterion [3] is adopted in addition to Roulette Wheel Selection [4]. According to the bias between the best fitness and the average fitness, the method self-adjusts the number of elitists, which are reproduced directly into the offspring in every generation. This helps ensure that the GA converges to the global optimum.

2) Two-point crossover. One-point crossover has a major drawback: certain combinations of schemata cannot be produced in some situations. Two-point crossover can be introduced to overcome this problem [4].

3) Cataclysmic mutation. In addition to the traditional mutation operator, cataclysmic mutation is used in the algorithm. When the individuals of a population are over-converged, the mutation process is applied with a probability much larger than usual. This retains the diversity of the population and prevents premature convergence effectively [5].

3.2 Proposed Genetic Algorithm Implementation

According to the model described above, the sinusoids are estimated one by one. Equation (3) is chosen as the objective function. Our purpose is to find the minimum of the objective function, so the fitness function can be constructed as follows [4]:

$F_i(\tilde{\theta}_i) = C_{\max} - \sigma_i(\tilde{\theta}_i)$  (4)

where $C_{\max}$ is a sufficiently large constant satisfying $C_{\max} \ge \max[\sigma_i(\tilde{\theta}_i)]$, which keeps $F_i(\tilde{\theta}_i) \ge 0$. The complete process of the GA proposed in this paper is summarized in the following steps.

Step 1: Initialization. Choose the population size N, crossover rate p_c, mutation rate p_m and the maximum generation G. Encode the parameters as binary strings.
Step 2: Set t = 0. Generate the initial population p(t) randomly.
Step 3: Calculate the fitness of the individuals of the population, and rank the individuals by their fitness values.
Step 4: Selection. Choose N × P_k individuals from the parent population and reproduce them directly into the offspring. The parameter P_k is proportional to the average fitness of the current population and inversely proportional to the maximum fitness of the population. The other individuals are obtained by Roulette Wheel Selection.
Step 5: Two-point crossover.
Step 6: Mutation. Combine the traditional mutation operator and the cataclysmic mutation operator.
Step 7: If t ≤ G, let t = t + 1 and go to Step 3. If t > G, export the results.
Step 8: Subtract the estimated sine component from the sampled data. If a sine component remains, go to Step 2; otherwise, finish.

A sketch of this estimate-and-subtract loop is given below.
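The following is a rough sketch only, not the authors' code: a real-coded GA with arithmetic crossover stands in for the binary-coded two-point operators, and the parameter bounds and mutation scale are assumptions.

```python
import numpy as np

def fitness(theta, y, t, c_max):
    """Fitness (4): C_max minus criterion (3) for one candidate sinusoid."""
    A, f, phi = theta
    return c_max - np.abs(y - A * np.sin(2 * np.pi * f * t + phi)).sum()

def run_ga(y, t, c_max, pop=60, gens=400, pc=0.82, pm=0.045, seed=0):
    """Tiny real-coded GA standing in for the paper's binary-coded one."""
    rng = np.random.default_rng(seed)
    lo = np.array([0.0, 0.0, 0.0])          # lower bounds on [A, f, phi]
    hi = np.array([3.0, 5.0, 2 * np.pi])    # upper bounds (assumed ranges)
    P = rng.uniform(lo, hi, size=(pop, 3))
    for _ in range(gens):
        fit = np.array([fitness(p, y, t, c_max) for p in P])
        elite = P[np.argmax(fit)].copy()     # elitism, greatly simplified
        w = fit - fit.min() + 1e-9           # roulette-wheel selection
        P = P[rng.choice(pop, size=pop, p=w / w.sum())]
        mates = P[rng.permutation(pop)]      # arithmetic crossover stand-in
        mask = rng.random(pop) < pc
        a = rng.random((pop, 1))
        P[mask] = a[mask] * P[mask] + (1 - a[mask]) * mates[mask]
        mut = rng.random(P.shape) < pm       # mutation
        P[mut] += rng.normal(0.0, 0.1, P.shape)[mut]
        P = np.clip(P, lo, hi)
        P[0] = elite                         # reinsert the elite
    fit = np.array([fitness(p, y, t, c_max) for p in P])
    return P[np.argmax(fit)]

def estimate_multisine(y, t, k, c_max=1e4):
    """Estimate k sinusoids one by one, subtracting each (Step 8)."""
    residual, params = y.astype(float).copy(), []
    for _ in range(k):
        A, f, phi = run_ga(residual, t, c_max)
        params.append((A, f, phi))
        residual -= A * np.sin(2 * np.pi * f * t + phi)
    return params
```

With the simulation settings of Section 4, for example, `estimate_multisine(y, np.arange(128) / 20.0, k=2)` would return (A, f, Φ) estimates for the two components.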

4 Simulations

In this section, a computer simulation is given to show the performance of the proposed GA. An example of two sine signals corrupted by Gaussian noise is considered. The signal is expressed by equation (5):

$y(t) = \sum_{i=1}^{2} A_i \sin(2\pi f_i t + \Phi_i) + w(t)$  (5)

where $A_1 = 2$, $f_1 = 2.5$ Hz, $\Phi_1 = 1.5$ rad, $A_2 = 1$, $f_2 = 1.5$ Hz, $\Phi_2 = 2.5$ rad. Component $w(t)$ is Gaussian noise with mean 0 and variance 1. The sampling frequency is $f = 20$ Hz. The binary coding length of each parameter is 15. The population size is $N = 60$. The crossover rate is $p_c = 0.82$, and the mutation rate is $p_m = 0.045$. The maximum generation is $G = 400$. The data length is $L = 128$. Fig. 1 shows the signal $y(t)$ at SNR = -5 dB; clearly, the noise strongly distorts the signal. Fig. 2 shows the curves of the best fitness and the average fitness. Both curves converge to certain values and almost coincide eventually, so we can conclude that the proposed GA is a stable and convergent method.

Fig. 1. Waveform of the signal y(t)

Fig. 2. Curves of the best fitness and average fitness

Fig. 3. Evolution curves of parameters of component 1

Fig. 4. Evolution curves of parameters of component 2

The evolution curve of each parameter is shown in Fig. 3 and Fig. 4. It is clear that every parameter converges near its true value by about generation 100; the method converges quickly. These figures give a qualitative impression of the estimation accuracy. Each parameter is estimated 20 times and the average is used as its value. In Table 1, we present the estimation results of GA and LMS at SNR = -5 dB, 10 dB, 30 dB and 100 dB, which give a quantitative picture of the estimation accuracy. The


proposed method has a high degree of accuracy, and the accuracy of the proposed GA is consistently higher than that of LMS. This sufficiently demonstrates the global searching performance of the proposed GA.

Table 1. Estimated values of the parameters

Parameter       -5 dB             10 dB             30 dB             100 dB
                GA      LMS       GA      LMS       GA      LMS       GA      LMS
A1 = 2          2.0758  2.6140    1.9732  2.1710    1.9928  1.9897    1.9999  2.0003
f1 = 2.5 Hz     2.5073  2.3796    2.5042  2.4375    2.5034  2.5072    2.5005  2.5009
Φ1 = 1.5 rad    1.4101  1.0512    1.4354  1.7230    1.4554  1.5549    1.4966  1.4886
A2 = 1          1.0861  1.1041    1.0324  1.0167    1.0049  0.9880    1.0014  1.0027
f2 = 1.5 Hz     1.5103  1.5901    1.5074  1.4588    1.5013  1.5073    1.5001  1.4993
Φ2 = 2.5 rad    2.3125  2.6507    2.3441  2.6914    2.4762  2.5461    2.4809  2.5159

5 Conclusions

This paper introduces a modified GA to estimate the parameters of sinusoids. The GA is a fast and effective global searching optimization algorithm, able to overcome the weaknesses of conventional optimization algorithms such as local convergence and low accuracy. However, a GA is too complicated to be applied directly in real-time operation; in this paper, we estimate the sinusoids one by one to reduce its complexity. Simulation results show the excellent performance of the improved GA and demonstrate that the proposed algorithm is effective for estimating the parameters of sine signals.

References

1. Zhang, X.D.: Modern Signal Processing. Beijing: Tsinghua University Press (1995)
2. Mayyas, K.: Performance Analysis of the Deficient Length LMS Adaptive Algorithm. IEEE Transactions on Signal Processing 53(8) (2005) 2727-2734
3. Yu, W., Nie, Y.F.: Genetic Algorithm Approach to Blind Source Separation. J. of Wuhan Uni. of Sci. & Tech. (Natural Science Edition) 26(3) (2003) 297-300
4. Xuan, G.N., Cheng, R.W.: Genetic Algorithm and Engineering Optimization. Beijing: Tsinghua University Press (2004)
5. Liu, J.: Application in Parameter Estimation of Nonlinear System Based upon Improved Genetic Algorithms of Cataclysmic Mutation. Journal of Chongqing Normal University (Natural Science Edition) 21(4) (2004) 13-16
6. Tang, K.S., Man, K.F., Wong, S.K., He, Q.: Genetic Algorithms and Their Applications. IEEE Signal Processing Magazine (1996) 22-37

Fast Vision-Based Camera Tracking for Augmented Environments Bum-Jong Lee and Jong-Seung Park Dept. of Computer Science & Engineering, University of Incheon, 177 Dohwa-dong, Nam-gu, Incheon, 402-749, Republic of Korea {leeyanga, jong}@incheon.ac.kr

Abstract. This article describes a fast and stable camera tracking method aimed at real-time augmented reality applications. From the feature tracking of a known marker on a single frame, we estimate the camera rotation and translation parameters. The entire pose estimation process is linear and initial estimates are not required. As an experimental setup, we implemented a video augmentation system that replaces detected markers with virtual 3D graphical objects. Experimental results showed that the proposed camera tracking method is robust and fast enough for interactive augmented reality applications.

1 Introduction

Augmented reality applications involve seamless insertion of computer-generated 3D graphical objects into a live-action video stream of unmodeled real scenes. The primary requirement for the practical augmented reality system is a method of accurate and reliable camera tracking. Most augmented reality applications require the camera pose in online mode to project computer generated 3D models into the real world view in real-time. Hence, utilization of a fiducial marker is a natural choice for the fast feature tracking as well as for the computation of the initial camera pose. The projective camera model is frequently used in computer vision algorithms. The projective model has eleven unknowns and it is unnecessarily complex for many applications. Since the camera tracking for an augmented reality application must be fast enough to handle real-time interactions, appropriate restrictions on the camera model should be introduced as long as the approximation is not far from the optimal solution. This paper describes a stable real-time marker-based camera tracking method for augmented reality systems working in unknown environments. In the next sections, we propose a fast linear camera matchmoving algorithm which does not require the initial estimates.

2 Camera Matchmoving

In the perspective model, the relations between image coordinates (u and v) and model coordinates (x, y and z) are expressed by non-linear equations. By

imposing some restrictions on the projection matrix P, linearized approximations of the perspective model are possible. A well-known linearized approximation of the perspective camera model is the weak-perspective model [1]. The weak-perspective model can be used instead of the perspective model when the dimensions of the object are relatively small compared to the distance between the object and the camera. In the weak-perspective model, all object points lie on a plane parallel to the image plane and passing through the centroid $\bar{\mathbf{x}} = [\bar{x}\ \bar{y}\ \bar{z}\ 1]^T$ of the object points. Hence, all object points have the same depth $\bar{z} = (\bar{\mathbf{x}} - \mathbf{t}) \cdot \mathbf{r}_z$, where $R = [\mathbf{r}_x\ \mathbf{r}_y\ \mathbf{r}_z]^T$ and $\mathbf{t} = [t_x\ t_y\ t_z]^T$ represent the relative rotation and translation between the object and the camera. The projection equations for an object point $\mathbf{x}$ to an image point $\mathbf{m} = [u\ v\ 1]^T$ are expressed by $u = (1/\bar{z})(\mathbf{r}_x \cdot (\mathbf{x} - \mathbf{t}))$ and $v = (1/\bar{z})(\mathbf{r}_y \cdot (\mathbf{x} - \mathbf{t}))$. Assume $\bar{\mathbf{x}} = \mathbf{0}$; then $\bar{z} = -\mathbf{t} \cdot \mathbf{r}_z$ and it leads to the projection equations:

$u = \tilde{\mathbf{r}}_x \cdot \mathbf{x} + \bar{x}/\bar{z}, \quad v = \tilde{\mathbf{r}}_y \cdot \mathbf{x} + \bar{y}/\bar{z}$  (1)

where $\tilde{\mathbf{r}}_x = \mathbf{r}_x/\bar{z}$, $\tilde{\mathbf{r}}_y = \mathbf{r}_y/\bar{z}$, $\bar{x} = -\mathbf{t} \cdot \mathbf{r}_x$, and $\bar{y} = -\mathbf{t} \cdot \mathbf{r}_y$.

Equation (1) corresponds to an orthogonal projection of each model point onto a plane passing through the origin of the object space and parallel to the image plane, followed by a uniform scaling by the factor $1/\bar{z}$. Equation (1) can be solved by the orthographic factorization method [2] with the constraints $|\tilde{\mathbf{r}}_x| = |\tilde{\mathbf{r}}_y| = 1$ and $\tilde{\mathbf{r}}_x \cdot \tilde{\mathbf{r}}_y = 0$. Once $\tilde{\mathbf{r}}_x$ and $\tilde{\mathbf{r}}_y$ are obtained, the motion parameters $\mathbf{r}_x$ and $\mathbf{r}_y$ can be computed by normalizing $\tilde{\mathbf{r}}_x$ and $\tilde{\mathbf{r}}_y$, i.e., $\mathbf{r}_x = \tilde{\mathbf{r}}_x/|\tilde{\mathbf{r}}_x|$ and $\mathbf{r}_y = \tilde{\mathbf{r}}_y/|\tilde{\mathbf{r}}_y|$, and the translation along the optical axis is computed by $\bar{z} = 1/|\tilde{\mathbf{r}}_x|$.

The weak-perspective model approximates the perspective projection by assuming that all the object points are roughly at the same distance from the camera. This holds when the distance between the object and the camera is much greater than the size of the object. We assume that all points of the object are roughly at the same depth. All depths of the object points can be set to the depth of a specific object point, called a reference point. Let $\mathbf{x}_0$ be such a reference point in the object, with all other points at roughly the same depth denoted by $z_0$. We set the reference point $\mathbf{x}_0$ as the origin of the object space, and all other coordinates of the object points are defined relative to $\mathbf{x}_0$. Consider an object point $\mathbf{x}_i$ and its image $\mathbf{m}_i = [u_i\ v_i\ 1]^T$, which is the scaled orthographic projection of $\mathbf{x}_i$. From equation (1), the relation can be written:

$\tilde{\mathbf{r}}_x \cdot \mathbf{x}_i = u_i - u_0, \quad \tilde{\mathbf{r}}_y \cdot \mathbf{x}_i = v_i - v_0$  (2)

where $\tilde{\mathbf{r}}_x = \mathbf{r}_x/z_0$, $\tilde{\mathbf{r}}_y = \mathbf{r}_y/z_0$, $u_0 = -\mathbf{t} \cdot \mathbf{r}_x/z_0$, $v_0 = -\mathbf{t} \cdot \mathbf{r}_y/z_0$, and $\mathbf{m}_0 = [u_0\ v_0\ 1]^T$ is the image of the reference point $\mathbf{x}_0$. Since the object points are already known and their image coordinates $\mathbf{m}_i$ ($0 \le i < N$) are available, the equations (2) are linear with respect to the unknowns $\tilde{\mathbf{r}}_x$ and $\tilde{\mathbf{r}}_y$. For the $N-1$ object points $(\mathbf{x}_1, \dots, \mathbf{x}_{N-1})$ and their image coordinates $(\mathbf{m}_1, \dots, \mathbf{m}_{N-1})$, we construct a linear system using equation (2) by introducing the $(N-1) \times 3$ argument matrix $A = [\tilde{\mathbf{x}}_1\ \tilde{\mathbf{x}}_2 \cdots \tilde{\mathbf{x}}_{N-1}]^T$, the $(N-1)$-vector $\mathbf{u} = [\tilde{u}_1\ \tilde{u}_2 \cdots \tilde{u}_{N-1}]^T$, and the $(N-1)$-vector $\mathbf{v} = [\tilde{v}_1\ \tilde{v}_2 \cdots \tilde{v}_{N-1}]^T$, where $\tilde{\mathbf{x}}_i = \mathbf{x}_i - \mathbf{x}_0$, $\tilde{u}_i = u_i - u_0$ and $\tilde{v}_i = v_i - v_0$. All the coordinates are given by


column vectors in non-homogeneous form. The unknowns $\tilde{\mathbf{r}}_x$ and $\tilde{\mathbf{r}}_y$ can be obtained by solving the two linear least-squares problems:

$A\tilde{\mathbf{r}}_x = \mathbf{u}$ and $A\tilde{\mathbf{r}}_y = \mathbf{v}$.  (3)

The solution is easily obtained using the singular value decomposition (SVD). The parameters $\mathbf{r}_x$ and $\mathbf{r}_y$ are computed by normalizing $\tilde{\mathbf{r}}_x$ and $\tilde{\mathbf{r}}_y$. Once the unknowns $\mathbf{r}_x$ and $\mathbf{r}_y$ have been computed, more exact values can be obtained by an iterative algorithm. Dementhon [3] showed that the relation between the perspective image coordinates ($u_i$ and $v_i$) and the scaled orthographic image coordinates ($u_i'$ and $v_i'$) can be expressed by $u_i' = u_i + \alpha_i u_i$ and $v_i' = v_i + \alpha_i v_i$, in which $\alpha_i$ is defined as $\alpha_i = \mathbf{r}_z \cdot \mathbf{x}_i / z_0$, where $\mathbf{r}_z = \mathbf{r}_x \times \mathbf{r}_y$. Hence, in equations (2), we replace $u_i$ and $v_i$ by $u_i'$ and $v_i'$ and obtain:

$\tilde{\mathbf{r}}_x \cdot \mathbf{x}_i = (1+\alpha_i)u_i - u_0, \quad \tilde{\mathbf{r}}_y \cdot \mathbf{x}_i = (1+\alpha_i)v_i - v_0.$  (4)

Once we have obtained initial estimates of $\tilde{\mathbf{r}}_x$ and $\tilde{\mathbf{r}}_y$, we can compute $\alpha_i$ for each $\mathbf{x}_i$. Hence, equations (4) are linear in the unknowns $\tilde{\mathbf{r}}_x$ and $\tilde{\mathbf{r}}_y$. The term $\alpha_i$ is the z-coordinate of $\mathbf{x}_i$ in the object space divided by the distance of the reference point from the camera. Since the ratio of object size to $z_0$ is small, $\alpha_i$ is also small, which means only a few iterations may be enough for the approximation.

3 Marker-Based Pose Estimation

Assume the N object points $\mathbf{x}_0, \mathbf{x}_1, \dots, \mathbf{x}_{N-1}$ are observed in a frame and their image coordinates are given by $\mathbf{m}_0, \mathbf{m}_1, \dots, \mathbf{m}_{N-1}$ in a single frame. All the points are given by column vectors in non-homogeneous form. We automatically choose the most preferable reference point, namely the one which minimizes the depth variation. The focal lengths in the two image directions ($f_x$ and $f_y$) and the coordinates of the principal point ($p_x$ and $p_y$) are also used for accurate pose estimation. The overall steps of the algorithm are as follows:

Step 1 (Selecting $\mathbf{x}_k$): Choose a reference point $\mathbf{x}_k$ satisfying $\arg\min_k \sum_i (z_i - z_k)^2$, where $z_i$ is the z-coordinate of $\mathbf{x}_i$.
Step 2 (Normalizing coordinates): Translate all the input object points by $-\mathbf{x}_k$ so that the reference point $\mathbf{x}_k$ becomes the origin of the object space. Also, translate all the input image points by $[-p_x\ -p_y]^T$ so that the principal point becomes the origin of the image space.
Step 3 (Establishing A, u and v): Using the object points $\mathbf{x}_i$ and their corresponding image points $\mathbf{m}_i$ ($i = 1, \dots, N-1$), build the $(N-1) \times 3$ argument matrix $A$, the $(N-1)$-vector $\mathbf{u}$, and the $(N-1)$-vector $\mathbf{v}$ shown in equation (3). Set $\mathbf{u}_t = \mathbf{u}$ and $\mathbf{v}_t = \mathbf{v}$.
Step 4 (Computing $\tilde{\mathbf{r}}_x$ and $\tilde{\mathbf{r}}_y$): Solve the two linear least-squares problems, $A\tilde{\mathbf{r}}_x = \mathbf{u}_t$ and $A\tilde{\mathbf{r}}_y = \mathbf{v}_t$, for the unknowns $\tilde{\mathbf{r}}_x$ and $\tilde{\mathbf{r}}_y$. The solution is easily obtained using the singular value decomposition (SVD).


Step 5 (Computing $z_k$, $\mathbf{r}_x$, and $\mathbf{r}_y$): Compute $z_k$ by $z_k = 2f_x f_y/(f_y|\tilde{\mathbf{r}}_x| + f_x|\tilde{\mathbf{r}}_y|)$, where $f_x$ and $f_y$ are the camera focal lengths along the x- and y-axes. Compute $\mathbf{r}_x$ and $\mathbf{r}_y$ by $\mathbf{r}_x = \tilde{\mathbf{r}}_x/|\tilde{\mathbf{r}}_x|$ and $\mathbf{r}_y = \tilde{\mathbf{r}}_y/|\tilde{\mathbf{r}}_y|$.
Step 6 (Computing $\alpha_i$): Compute $\alpha_i$ by $\alpha_i = \mathbf{r}_z \cdot \mathbf{x}_i/z_k$, where $\mathbf{r}_z = \mathbf{r}_x \times \mathbf{r}_y$. If $\alpha_i$ is nearly the same as the previous one, stop the iteration.
Step 7 (Updating $\mathbf{u}_t$ and $\mathbf{v}_t$): Update $\mathbf{u}_t$ and $\mathbf{v}_t$ by $\mathbf{u}_t = (1+\alpha_i)\mathbf{u}$ and $\mathbf{v}_t = (1+\alpha_i)\mathbf{v}$. Go to Step 4.

The rotation matrix R is the arrangement of the three orthonormal vectors: $R = [\mathbf{r}_x\ \mathbf{r}_y\ \mathbf{r}_z]^T$. The translation vector $\mathbf{t}$ is the vector from the origin of the camera space to the reference point. Hence, once we have found $\tilde{\mathbf{r}}_x$ and $\tilde{\mathbf{r}}_y$, the depth of the reference point $z_k$ is computed by $z_k = 2f_x f_y/(f_y|\tilde{\mathbf{r}}_x| + f_x|\tilde{\mathbf{r}}_y|)$. Then, the translation is obtained by $\mathbf{t} = [z_k u_k/f_x\ \ z_k v_k/f_y\ \ 2/(|\tilde{\mathbf{r}}_x| + |\tilde{\mathbf{r}}_y|)]^T$. A compact sketch of the whole procedure follows.
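The following is a paraphrase of Steps 1–7 under the equations above, not the authors' implementation; `X` are the N model points and `m` their image points, assumed to be supplied by the marker-detection front end.

```python
import numpy as np

def estimate_pose(X, m, fx, fy, px, py, iters=10):
    """Iterative weak-perspective pose from model points X (N,3) and image
    points m (N,2), following Steps 1-7."""
    z = X[:, 2]
    # Step 1: reference point minimizing the depth variation
    k = int(np.argmin([np.sum((z - zk) ** 2) for zk in z]))
    # Step 2: normalize object and image coordinates
    Xn = X - X[k]
    mn = m - np.array([px, py])
    # Step 3: build A, u, v from all points except the reference
    idx = np.arange(len(X)) != k
    A = Xn[idx]
    u = mn[idx, 0] - mn[k, 0]
    v = mn[idx, 1] - mn[k, 1]
    ut, vt = u.copy(), v.copy()
    alpha = np.zeros(len(A))
    for _ in range(iters):
        # Step 4: two linear least-squares problems (SVD-based lstsq)
        rx_t = np.linalg.lstsq(A, ut, rcond=None)[0]
        ry_t = np.linalg.lstsq(A, vt, rcond=None)[0]
        # Step 5: reference depth and normalized rotation rows
        zk = 2 * fx * fy / (fy * np.linalg.norm(rx_t) + fx * np.linalg.norm(ry_t))
        rx = rx_t / np.linalg.norm(rx_t)
        ry = ry_t / np.linalg.norm(ry_t)
        rz = np.cross(rx, ry)
        # Step 6: perspective correction terms; stop when they settle
        alpha_new = A @ rz / zk
        if np.allclose(alpha_new, alpha, atol=1e-6):
            break
        alpha = alpha_new
        # Step 7: update the right-hand sides
        ut = (1 + alpha) * u
        vt = (1 + alpha) * v
    R = np.vstack([rx, ry, rz])
    t = np.array([zk * mn[k, 0] / fx, zk * mn[k, 1] / fy,
                  2 / (np.linalg.norm(rx_t) + np.linalg.norm(ry_t))])
    return R, t
```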

4 Experimental Results

To demonstrate the effectiveness of the proposed method we implemented the camera pose tracking system that relies on known marker tracking from a real video stream. For each frame, the camera pose for the current frame is calculated using the tracked feature points on a marker from a single frame. The implemented system recognizes two types of markers (Cube and TagMarker, as shown in Fig. 1). The continuous marker tracking and re-initialization are robust and not sensitive to illumination changes. From the marker features, we estimate the camera pose. Then, we project 3D graphical objects onto the frame and render the projected virtual object together with the original input frame. Fig. 1 shows the AR application which inserts a virtual object into a live video stream. The upper figure shows the insertion of a virtual flowerpot at the cube marker position and the lower figure shows the insertion of a building with a helicopter attached on it at the AR marker position. We compared the estimation accuracy of the proposed method with the linear scaled orthographic method (SOP) [4] and the iterative scaled orthographic method (POSIT) [3]. The projection error of the proposed method is under 2 pixels in most cases and is less than that of SOP and POSIT (see Table 1). The comparison of accuracy and stability is shown in Fig. 2. In the left figure, the error is measured as the average reprojection error when the depth variance of scene points relative to the reference point increases. We also measured the error according to the relative distance of the marker from the camera divided by the marker radius, which is shown in the right figure.

Table 1. Accuracy and computing time with respect to marker types

marker type   #frames   #feature points   avg accuracy (pixel)   time (ms)
CubeSparse    1000      5                 1.64                   1.73
CubeDense     700       72                0.327                  2.113
TagMarker     8         6                 0.948                  1.756


Fig. 1. Augmented reality applications to insert virtual objects into a video stream

Fig. 2. Comparison of camera tracking accuracy of three different methods (left: error (pixel) vs. point distribution (m); right: error (pixel) vs. distance(m)/radius(m); curves: Proposed, POSIT, SOP)


The processing speed of the proposed camera tracking method is about 27 frames per second on a Pentium 4 (2.6GHz) computer including all steps of the system such as frame acquisition, marker detection, feature extraction, pose estimation, and 3D rendering. The camera pose estimation process and the registration process roughly take 4 ms and 8 ms, respectively, and the speed is sufficiently fast for real-time augmented reality applications. The time is proportional to the number of feature points and the accuracy is inversely proportional to the number of feature points. Overall numerical values indicate that the type of markers does not affect the pose accuracy critically.

5 Conclusion

This article has presented a real-time camera pose estimation method assuming a known marker is visible to the video camera. From the marker tracking on a single frame, the camera rotation and translation parameters are estimated using a linear approximation. The pose estimation process is fast enough for real-time applications since the entire process is linear and initial estimates are not required. Compared with previous fast camera pose estimation methods, the camera pose accuracy is greatly improved without paying extra computing time. As an application of the proposed method, we implemented an augmented reality application which inserts computer-generated 3D graphical objects into a live-action video stream of unmodeled real scenes. Using the recovered camera pose parameters, the marker in the image frames is replaced by a virtual 3D graphical object during the marker tracking from a video stream. Experimental results showed that the proposed camera tracking method is robust and fast enough for interactive video-based applications.

Acknowledgement This work was supported in part by the Ministry of Commerce, Industry and Energy (MOCIE) through the Incheon IT Promotion Agency (IITPA) and in part by the Brain Korea 21 Project in 2006.

References

1. Carceroni, R., Brown, C.: Numerical Methods for Model-Based Pose Recovery. (1997)
2. Tomasi, C., Kanade, T.: Shape and Motion from Image Streams under Orthography: A Factorization Approach. IJCV 9 (1992) 137-154
3. Dementhon, D., Davis, L.: Model-Based Object Pose in 25 Lines of Code. IJCV 15 (1995) 123-141
4. Poelman, C., Kanade, T.: A Paraperspective Factorization Method for Shape and Motion Recovery. IEEE T-PAMI 19 (1997) 206-218

Recognition of 3D Objects from a Sequence of Images Daesik Jang Department of Computer Information Science Kunsan National University, Gunsan, South Korea [emailprotected]

Abstract. The recognition of relatively big and rarely movable objects such as refrigerators and air conditioners is necessary because these objects can serve as crucial global features for Simultaneous Localization and Map building (SLAM) in indoor environments. In this paper, we propose a novel method to recognize such big objects using a sequence of 3D scenes. Particles which represent an object to be recognized are scattered into the 3D scene captured from an environment, and the probability of each particle is calculated by matching the 3D lines of the object model with those of the environment. Based on the probabilities and the degree of convergence of the particles, the object in the environment can be recognized and its position can be estimated. The experimental results show the feasibility of the suggested method based on particle filtering and its applicability to SLAM problems.

1 Introduction

Object recognition has been one of the most challenging issues in computer vision, intensively investigated for several decades. In particular, object recognition has played an important role in manipulation and SLAM in robotics. Many researchers have suggested various 3D object recognition approaches. Among them, the model-based approach considered in this paper is the most general one for recognizing shapes and objects: it recognizes objects by matching features extracted from an object in the scene with features of the object stored in advance [1]. Some well-known model-based recognition studies are as follows. The method suggested by Fischler and Bolles [2] uses RANSAC for recognizing objects. It projects points of all models into the scene, decides whether the projected points are similar to those of the captured scenes, and recognizes the object based on the similarity. This method is not very efficient because the hypothesize-and-verify procedure is repeated many times to get an accurate result. In addition, Johnson and Hebert [4] proposed a spin-image based recognition algorithm for cluttered 3D scenes, and Andrea Frome et al. [3] compared the performance of 3D shape contexts with that of spin-images. Jean Ponce et al. [5] introduced a 3D object recognition approach using affine-invariant patches. However, these methods work well only when accurate 3D data or fully textured environments are provided, while our approach makes it possible to recognize objects when there is considerable noise and uncertainty in the captured scenes stemming from low-quality sensors.


In this paper we propose a new approach to recognize big and rarely movable objects in a sequence of images by applying a probabilistic model in noisy and textureless environments.

2 Extraction of 3D Line Features

We use 3D lines as key features for object recognition because 3D data can be obtained robustly on the boundaries of objects even in texture-less environments, as shown in Fig. 1(a).

(a) 3D data from stereo camera

(b) Noisy 3D data

Fig. 1. 3D data

(a) Experimental environment

(b) Edges extracted

(c) 2D lines

(d) 3D lines

Fig. 2. Results of the line feature extraction in 2D and 3D

Due to the poor accuracy of the 3D data, as illustrated in Fig. 1(b), all lines are first extracted from 2D images, and the 2D lines are transformed to 3D lines by mapping the corresponding 3D points (2D images and their corresponding 3D points are captured at the same time from a stereo camera). We devised a simple algorithm to find 2D lines based on the edge-following approach so that most of the lines in a scene can be found efficiently. First, the edges are produced by the Canny edge detection algorithm. Then, we categorize the edges as horizontal, vertical and diagonal line segments based on the connectedness of the edges. 2D lines are found by connecting each line segment with adjoining line segments, taking the aliasing problem into account. 3D lines can be obtained if there are corresponding 3D points at the pixels of the 2D lines; the 2D lines are transformed into 3D lines by assigning the 3D positions of the corresponding 3D points. Fig. 2 shows the results of line extraction in 2D and 3D; a schematic of this front end is sketched below.
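The following is a generic OpenCV-style sketch, not the authors' edge-following implementation: the probabilistic Hough transform stands in for the segment-grouping step, and all parameter values are arbitrary.

```python
import cv2
import numpy as np

def extract_2d_lines(gray):
    """Detect edges with Canny, then group them into 2D line segments."""
    edges = cv2.Canny(gray, 50, 150)
    segs = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                           threshold=40, minLineLength=30, maxLineGap=3)
    return [] if segs is None else [s[0] for s in segs]  # [x1, y1, x2, y2]
```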

3 Object Recognition Based on Particle Filter

3.1 Initial Particle Generation

An object to be recognized is modeled with 3D lines, and we consider this line set as a particle. Many particles representing this object are spread over possible object positions in a 3D scene during initial particle generation. Fig. 3(a) illustrates the particle of an object. At every scene, particles are initially generated to find other possible positions of an object which could not be extracted in previous scenes.

(a) The model of a refrigerator

(b) Vertical line

(c) Horizontal line

(d) Relationship of lines

Fig. 3. A particle of one object and the initial particle generation using lines (dot : particles generated, solid: 3D line in the scene)

Fig. 3 shows how particles are generated by using the directional features of lines. In case (b), many particles can be generated by rotating vertical lines about a core vertical line. All particles located apart from the floor are eliminated, because a refrigerator does not stand with a gap above the floor.

3.2 Determination and Updating of the Probabilities of Particles

The probability is obtained from the positive and negative similarity of each particle, computed using the 3D lines in space. The positive similarity shows how well the lines composing a particle match the lines in space after projecting them into the scene. It is decided by the following two elements. The first element, S1, is the degree to which the lines composing a particle match lines in space. S1 is determined as follows: it is tested whether there is a 3D line around each line of a particle, and the similarities in length, orientation, and distance of the corresponding lines are verified. The second element, S2, shows how many lines of a particle are matched with the lines in space. These two similarity elements S1 and S2 are combined by a weighted sum. On the other hand, the negative similarity shows how well lines in space match those of the particles. For example, if an air conditioner having the same shape and dimensions as a refrigerator exists in the scene, the positive similarity of the air conditioner is identical to that of a refrigerator; but in this case, the number of lines belonging to the air conditioner is greater than the number of lines comprising a refrigerator due to the difference in geometric shape. All particles at time t-1 are propagated to the scene at time t to update the particle probabilities. The probabilities of particles newly generated at time t and of existing particles are updated based on (1):

$P_t(n) = \sum_{n=0}^{N_p} \sum_{m=0}^{N_p} \frac{1}{d}\, P_{t-1}(m)\, P_t(n)$  (1)

where $d$ is the difference between the poses of $P_{t-1}(m)$ and $P_t(n)$.

(a) 2D image of environment

(b) The particles at first scene

(c) The particles at second scene

(d) The particles at third scene

Fig. 4. The distribution of particles in a sequence of images (The particles are represented by green boxes)

After that, particles are resampled according to their probabilities. Particles with high probability generate more particles, while those with low probability disappear. Fig. 4 shows how particles are updated over consecutive scenes; a schematic of this cycle is sketched below.
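This is an illustrative sketch only; the `likelihood` function abstracting the S1/S2 similarity terms is an assumption.

```python
import numpy as np

def update_particles(particles, probs, likelihood, rng=np.random.default_rng()):
    """One cycle: weight particles by line-matching likelihood, then resample
    so high-probability particles spawn more copies and weak ones vanish."""
    w = probs * np.array([likelihood(p) for p in particles])
    w = w / w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    resampled = [particles[i] for i in idx]
    return resampled, np.full(len(particles), 1.0 / len(particles))
```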

4 Experimental Results

This work aims to let a navigating robot know the locations of large objects such as refrigerators. We assume that all objects to which this approach is applied have enough prominent straight-line features for recognition. In order to get 2D images and 3D data, a stereo camera named Bumblebee is used. The stereo camera is mounted on the end effector of an arm in an eye-on-hand configuration. Fig. 5 shows the stereo camera and the robot used for the experiment and the eye-on-hand configuration.

(a) Stereo camera

(b) Robot

(c) Eye on hand

Fig. 5. Equipments for the experiment

(a) 0°

(b) 30°

(c) 60°

Fig. 6. The recognition results from different viewpoints

(a) Occluded by mannequin

(b) Occlusion with another view angle (30°)

Fig. 7. The recognition results with static occlusions

Fig. 6 shows experimental results with different viewpoints. Three sequences of images were used, with the robot looking at and approaching the object from directions of 0, 30 and 60 degrees respectively. Although the probability of each particle decreases as the angle between the robot and the object increases from 0 to 60 degrees, the results are acceptable since the particle probabilities remain over a predefined threshold of 0.7. The blue box in the scene marks the estimated model position after recognition. Fig. 7 shows the result in the presence of occlusions. A mannequin was placed in front of the object to create occlusions. Even though the mannequin stands in front of the refrigerator, the position of the object was recognized successfully within 3 consecutive scenes.

5 Conclusion

In this paper, a method to recognize objects and estimate their poses using sequential scenes in noisy environments is suggested. Under the assumption that the object to be recognized is known in advance, particles of the object are scattered into the 3D scene and the probability of each particle is determined by matching 3D lines. The probabilities of the particles are updated in the same way after reading the next scene, and the object is then detected and its pose estimated. This method can be applied to recognize large objects such as refrigerators, air conditioners and bookcases that have many line features. Experiments show that the method is robust to orientation changes and occlusions. Moreover, it can be used to perform SLAM more reliably by providing the positions and poses of the recognized objects as landmarks.

Acknowledgement This work was supported by IITA through IT Leading R&D Support Project.

References

1. Farias, M.F.S., Carvalho, J.M.: Multi-view Technique for 3D Polyhedral Object Recognition Using Surface Representation. Revista Controle & Automacao (1999) 107-117
2. Fischler, M.A., Bolles, R.C.: Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Comm. Assoc. Comp. Mach. 24(6) (1981) 381-395
3. Frome, A., Huber, D., Kolluri, R., Bulow, T., Malik, J.: Recognizing Objects in Range Data Using Regional Point Descriptors. European Conference on Computer Vision, Prague, Czech Republic (2004)
4. Johnson, A.E., Hebert, M.: Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(5) (1999) 433-449
5. Fred, R., Svetlana, L., Cordelia, S., Jean, P.: 3D Object Modeling and Recognition Using Affine-Invariant Patches and Multi-View Spatial Constraints. CVPR (2003) 272-280

Reconstruction of Rectangular Plane in 3D Space Using Determination of Non-vertical Lines from Hyperboloidal Projection Hyun-Deok Kang and Kang-Hyun Jo Intelligent Systems Laboratory, School of Electrical Engineering, University of Ulsan, 680-749 San 29, Muger 2 - dong, Ulsan, Korea {hdkang,jkh2005}@islab.ulsan.ac.kr

Abstract. This paper describes the 3D reconstruction of planar objects using parallel lines from a single panoramic image. The determination of non-vertical lines depends on the position of the vanishing points of two lines in the panoramic image. A vertical 3D line is projected as a radial line in the omnidirectional image, while a horizontal line is projected as a curve or arc in the panoramic image. Two parallel vertical lines converge on the center point in the calibrated panoramic image. On the contrary, two parallel horizontal lines have a pair of vanishing points on the circle at infinity in the panoramic image. We reconstruct planar objects with parallel lines using the vanishing points and the parallelism properties of lines. Finally, we analyze and present the results of 3D reconstruction of planar objects on synthetic and real images.

1 Introduction

This paper describes the 3D reconstruction of objects from a single panoramic image. In general, two different views are needed to acquire the 3D information of objects. We calculate the 3D information of objects using the geometric constraints of the camera and curved mirror in a catadioptric imaging system. Tracing the previous work on structure from motion with omnidirectional vision systems, geometric information has been reconstructed using two omnidirectional cameras, or a single camera in a motion-stereo fashion [3,8,10]. Previous works acquiring geometric information from a single panoramic image have used systems with a conical mirror and camera; one of the properties of the conical mirror is the non-SVP (non-single-viewpoint) constraint. Brassart measured the location of a robot and the geometric information of features with the SYCLOP (Conical SYstem for LOcalization and Perception) system [6]. In particular, Pinciroli explained how the conical mirror condition and non-SVP constraints are used in 3D line reconstruction [15]. As described by Pinciroli, a conical mirror system is good for reconstructing the spatial information of the environment; however, it frequently blurs the image due to the non-SVP property, and it also has a limited vertical field of view because of the mirror's shape. A method for line reconstruction from single panoramic images has been presented, using prior information about parallelism and coplanarity [5]. In this paper, we present


the conditions under which straight lines in 3D space can be reconstructed from a single panoramic image using vanishing points and line properties such as parallelism and coplanarity [15]. In Section 2, we describe the camera model with a hyperboloidal mirror, which consists of a combination of a curved mirror and a conventional camera. A horizontal line is converted to an arc (curve) with a pair of vanishing points on the circle at infinity by the polar-coordinate transformation. In Section 3, we explain the estimation of the trajectory of features and the vanishing points. Given part of a vertical or horizontal line, we also discuss how to calculate the point where the extended line intersects the xy-plane, and how to obtain the plane parallel to the xy-plane. In the experiments, we test the proposed method in synthetic and real environments. Finally, we discuss the measurement error based on the experimental data.

2 Geometric Model of Vertical Line Segments in Catadioptric Camera

Consider the triangle formed by the mirror focal point $F'$ and the 3D points $P, H, Q$, and the figure formed by the mirror focal point $F'$ and the mirror-surface points $P_m, H_m, Q_m$, as shown in Fig. 1. The lengths of the line segments $\overrightarrow{F'P_m}$, $\overrightarrow{F'H_m}$ are calculated from the camera model [14]. The angle $\varphi$ is then calculated from the inner product of the two vectors $\overrightarrow{F'P_m}$, $\overrightarrow{F'H_m}$; in the same way, $\theta$ is calculated from the inner product of the two vectors $\overrightarrow{F'H_m}$, $\overrightarrow{F'Q_m}$:

$\varphi = \cos^{-1}\dfrac{\overrightarrow{F'P_m} \cdot \overrightarrow{F'H_m}}{|\overrightarrow{F'P_m}|\,|\overrightarrow{F'H_m}|}, \qquad \theta = \cos^{-1}\dfrac{\overrightarrow{F'H_m} \cdot \overrightarrow{F'Q_m}}{|\overrightarrow{F'H_m}|\,|\overrightarrow{F'Q_m}|}.$  (1)

Finally, we calculate the distance $R$ between the camera and the vertical line segment on the xy-plane, and the height $Z_H$ of the vertical line segment, using the angles $\varphi$, $\theta$ and the distance $2e$ between the camera and mirror focal points:

$R = 2e\cot(\theta)$, $\quad Z_H = R\tan(\varphi) + 2e$.  (2)

Thus we obtain the 3D information of a vertically located feature in the environment.
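As a toy numeric check of equation (2) (all values made up):

```python
import math

e = 0.05                         # half-distance between the two focal points (m)
theta = math.radians(3.0)        # angle to the foot of the vertical segment
phi = math.radians(25.0)         # angle to the top of the vertical segment

R = 2 * e / math.tan(theta)      # R = 2e cot(theta)
Z_H = R * math.tan(phi) + 2 * e  # Z_H = R tan(phi) + 2e
print(R, Z_H)                    # roughly 1.91 m and 0.99 m
```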

3 The Estimation of Horizontal Lines and Vanishing Points

For 3D localization of features, we have to know the intersection points between the ground plane and the vertical line segments on walls. It is difficult to extract such an intersection point because of occlusion by objects, or because only a partial arc of the horizontal curve appears in the image. Since we use a hyperboloidal mirror, we know that the curve lies on a circle. The shape of a corridor resembles a pseudo-sinusoidal function in the panoramic image. If we regard the curve of the corridor shape as a pseudo-sinusoidal function, we can derive the intersection points. The synthetic and panoramic images are shown in Fig. 2. Our synthetic corridor has a T-type junction and therefore three vanishing points. We calculate the


vanishing points which lie on the circle at infinity and compute the 3D localization of features in the image. The trajectory of the shape of the corridor is shown in Fig. 2(d); the arcs are similar to pseudo-sinusoidal functions with different periods. For estimation of the pseudo-sinusoidal function, we need to transform between Cartesian and polar coordinates. The arc is part of a circle in the omnidirectional image and is transformed to a pseudo-sinusoidal function in the panoramic image. Consider a set of feature points $p_i = (x_i, y_i)^T$ with $i = 1, 2, \dots, N$ on a circle:

$(x_i - a)^2 + (y_i - b)^2 = \rho^2.$  (3)

The pseudo-sinusoidal function is

$r_i = (a\cos\theta_i + b\sin\theta_i) \pm \sqrt{\rho^2 - (a^2 + b^2) + (a\cos\theta_i + b\sin\theta_i)^2}.$  (4)
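Equation (4) is just the polar form of the circle (3); a small check (illustrative parameter values):

```python
import numpy as np

def pseudo_sinusoid(theta, a, b, rho):
    """Radial trajectory (4) of a circle with center (a, b) and radius rho."""
    c = a * np.cos(theta) + b * np.sin(theta)
    return c + np.sqrt(rho**2 - (a**2 + b**2) + c**2)   # '+' branch of (4)

theta = np.linspace(0.0, 2.0 * np.pi, 8)
print(pseudo_sinusoid(theta, a=0.2, b=0.1, rho=1.0))
```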

Fig. 1. Geometry of the camera and the feature points of a vertical line (mirror focal point $F'$, camera focal point $F$, 3D points $P, H, Q$ and their mirror-surface points $P_m, H_m, Q_m$, angles $\varphi$ and $\theta$, focal separation $2e$, ground distance $R$ and height $Z_H$)

Fig. 2. Synthetic corridor (T-junction): (a) omnidirectional image with feature points having circular trajectories; (b) panoramic image and the pseudo-sinusoidal trajectories of the transformed points


4 Experiments

To illustrate and verify the method, experiments with synthetic and real images and a hyperbolic mirror are presented. We extract corner points or vertical edges as features. In order to get the vertical segments, the omnidirectional image is translated to a panoramic image. The width of the transformed image represents the azimuth and elevation of features around the robot, from which vertical line segments are extracted with respect to angle. The extraction of vertical line segments as features is described in [9].

Fig. 3. The reconstruction of synthetic corridor: (a) arbitrary viewpoint, (b) xy plane, (c) xz plane, (d) yz plane

Fig. 4. Estimation of curve and vanishing points: (a) Input image with provided features, (b) Estimated curves. (c) The results of curves as pseudo-sinusoidal functions with different periods


4.1 Estimation of Curves Using Geometrical Constraints

We extract the curves of the provided features in the image, which correspond to distorted horizontal lines. The purpose of the preliminary experiments is to analyze where the intersection points are located on the estimated curve together with the vanishing points. The estimated curve is regarded as a pseudo-sinusoidal function, and its intersection points are regarded as the vanishing points of the corridor. Geometrical constraints here mean properties of lines such as coplanarity, perpendicularity and parallelism [15]. Results of the reconstruction of features are shown in Figs. 3 and 5.

Fig. 5. Result of reconstruction of features (up to scale)

5 Conclusion

We proposed a method to acquire the 3D geometric information of features using the geometric properties of the mirror and the circle at infinity in a single omnidirectional image. The location of features, or 3D reconstruction, is usually calculated by a motion-stereo method or from at least two images with different viewpoints. Alternatively, it is possible to obtain the spatial information of features located vertically with respect to the ground plane by using the constraints of planes and lines and the geometric relation of camera and mirror in an omnidirectional vision system. These methods have the merit that the 3D spatial information of features, and hence the location of the robot during navigation, can be obtained from a single image. We use a hyperboloidal mirror with high curvature in order to view features located high in the corridor, and we have tested our proposed method. We are now analyzing the fitting and estimation of the curves of feature points in images. We also plan to experiment with and analyze curve estimation by fitting the pseudo-sinusoidal function in a real environment.

Acknowledgement This work was originally motivated and supported by Research Fund of University of Ulsan in part. Also, we would like to thank Ministry of Commerce, Industry and Energy and Ulsan Metropolitan City which partly supported this research through the Network-based Automation Research Center (NARC) at University of Ulsan.


References

1. Yamazawa, K., Yagi, Y., Yachida, M.: Omnidirectional Imaging with Hyperboloidal Projections. Proc. IROS (1993)
2. Baker, S., Nayar, S.K.: A Theory of Catadioptric Image Formation. Int. Conf. Computer Vision (1998) 35-42
3. Gluckman, J., Nayar, S.: Ego-motion and Omnidirectional Cameras. Int. Conf. Computer Vision (1998) 999-1005
4. Criminisi, A., Reid, I., Zisserman, A.: Single View Metrology. Proc. of the 7th Int. Conf. on Computer Vision (1999)
5. Sturm, P.: A Method for 3D Reconstruction of Piecewise Planar Objects from Single Panoramic Images. Proc. IEEE Workshop OMNIVIS, USA (2000)
6. Brassart, E., Delahoche, L., Cauchois, C., Drocourt, C., Pegard, C., Mouaddib, E.M.: Experimental Results Obtained with the Omnidirectional Vision Sensor: SYCLOP. Proc. IEEE Workshop OMNIVIS (2000) 145-152
7. Schaffalitzky, F., Zisserman, A.: Planar Grouping for Automatic Detection of Vanishing Lines and Points. Int'l Journal of Image and Vision Computing 18 (2000) 647-658
8. Zhu, Z.: Omnidirectional Stereo Vision. Workshop on Omnidirectional Vision Applied to Robotic Orientation and Nondestructive Testing (NDT), The 10th IEEE Int'l Conf. on Advanced Robotics, Budapest, Hungary (invited talk) (2001)
9. Kang, H.D., Jo, K.H.: Self-localization of Autonomous Mobile Robot from the Multiple Candidates of Landmarks. Int. Conf. on Optomechatronic Systems III, Vol. 4092, Germany (2002) 428-435
10. Svoboda, T., Pajdla, T.: Epipolar Geometry for Central Catadioptric Cameras. Int. J. Computer Vision 49(1) (2002) 23-37
11. Cauchois, C., Brassart, E., Delahoche, L., Clerentin, A.: 3D Localization with Conical Vision. Proc. IEEE Workshop OMNIVIS (2003)
12. Ying, X., Hu, Z.: Catadioptric Line Feature Detection Using Hough Transform. Int. Conf. Pattern Recognition, Vol. 4 (2004) 839-842
13. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. 2nd Edition. Cambridge University Press (2004)
14. Caglioti, V., Gasparini, S.: On the Localization of Straight Lines in 3D Space from Single 2D Images. Proc. Conf. Computer Vision and Pattern Recognition, Vol. 1, USA (2005) 1129-1134
15. Pinciroli, C., Bonarini, A., Matteucci, M.: Robust Detection of 3D Scene Horizontal and Vertical Lines in Conical Catadioptric Sensors. Proc. IEEE Workshop OMNIVIS, China (2005)

Region-Based Fuzzy Shock Filter with Anisotropic Diffusion for Adaptive Image Enhancement∗

Shujun Fu 1,2, Qiuqi Ruan 2, Wenqia Wang 1, and Jingnian Chen 3

1 School of Mathematics and System Sciences, Shandong University, Jinan, 250100, China
2 Institute of Information Science, Beijing Jiaotong University, Beijing, 100044, China
3 School of Arts and Science, Shandong University of Finance, Jinan, 250014, China
[emailprotected]

Abstract. A region-based fuzzy shock filter with anisotropic diffusion is presented for image noise removal and edge sharpening. An image is divided into three-type different regions according to image features. For different regions, a binary shock-type backward diffusion or a fuzzy backward diffusion is performed in the gradient direction to the isophote line, incorporating a forward diffusion in the tangent direction. Gaussian smoothing to the second normal derivative results in a robust process against noise. Experiments on real images show that this method produces better visual results of the enhanced images than some related equations.

1 Introduction

Image enhancement and sharpening are important operations in image processing and computer vision, and many different methods have been put forth in the past [1]. However, a major drawback of these methods is that they also enhance noise in the image, and ringing artifacts may occur along both sides of an edge. More importantly, traditional image sharpening methods mainly increase the gray-level difference across an edge while its width remains unchanged; for a wide and blurry edge, simply increasing its contrast produces only a very limited effect.

As the extension of conventional (crisp) set theory, L. A. Zadeh put forward fuzzy set theory to model the vagueness and ambiguity in complex systems; it is a useful tool for handling the uncertainty associated with vagueness and/or imprecision. Images and their processing bear some fuzziness in nature, and fuzzy set theory has therefore been successfully applied to image processing and computer vision [2]. In the past decades there has been a growing amount of research concerning partial differential equations in image enhancement, such as anisotropic diffusion filters [3-6] for edge-preserving noise removal and shock filters [7-9] for edge sharpening. Incorporating anisotropic diffusion with a shock filter, we present a region-based fuzzy shock filter with anisotropic diffusion to remove image noise and to sharpen edges by reducing their width simultaneously.∗

This work is supported by the National Natural Science Foundation of China (No. 60472033), the Key Laboratory Project of Information Science & Engineering of Railway of the National Ministry of Railways, China (No. TDXX0510), and the Technological Innovation Fund for Excellent Doctoral Candidates of Beijing Jiaotong University, China (No. 48007).


2 Region-Based Fuzzy Shock Filter with Anisotropic Diffusion

2.1 Some Related Work

One of the most influential works using partial differential equations (PDEs) in image processing is the anisotropic diffusion (AD) filter, proposed by P. Perona and J. Malik [4] for image denoising, enhancement, etc. Let $(x, y) \in \Omega \subset R^2$ and $t \in [0, +\infty)$; a multiscale image $u(x, y, t): \Omega \times [0, +\infty) \to R$ is evolved according to the following equation:

$\dfrac{\partial u(x,y,t)}{\partial t} = \mathrm{div}\big(g(|\nabla u(x,y,t)|)\,\nabla u(x,y,t)\big), \qquad g(|\nabla u|) = \dfrac{1}{1 + (|\nabla u|/K)^2}.$  (1)

where $K$ is a gradient threshold. The scalar diffusivity $g(|\nabla u|)$, chosen as a non-increasing function, governs the behaviour of the diffusion process. By performing a backward diffusion for $|\nabla u| > K$ along the gradient direction $N$, this equation can sharpen the edge. Different from the nonlinear parabolic diffusion process, L. Alvarez and L. Mazorra [7] proposed an anisotropic diffusion with shock filter (ADSF) equation by adding a hyperbolic equation, the shock filter introduced by S.J. Osher and L.I. Rudin [8], for noise elimination and edge sharpening:

$\dfrac{\partial u}{\partial t} = -\,\mathrm{sign}(G_\sigma * u_{NN})\,\mathrm{sign}(G_\sigma * u_N)\,|\nabla u| + c\,u_{TT}.$  (2)

where $G_\sigma$ is a Gaussian function with standard deviation $\sigma$ and $c$ is a positive constant. A more advanced scheme was proposed by P. Kornprobst et al. [9], which combines image coupling, restoration and enhancement (CRE) in the following equation:

$\dfrac{\partial u}{\partial t} = -a_f(u - u_0) + a_r(h_\tau u_{NN} + u_{TT}) - a_e(1 - h_\tau)\,\mathrm{sign}(G_\sigma * u_{NN})\,|\nabla u|.$  (3)

where $a_f$, $a_r$ and $a_e$ are constants and $u_0$ is the original image; $h_\tau = h_\tau(|G_\sigma * u_N|) = 1$ if $|G_\sigma * u_N| < \tau$, and 0 elsewhere. The first term on the right is a fidelity term which provides a stabilization effect. In order to reinforce robustness against noise, G. Gilboa et al. [10] generalized the real-valued diffusion to the complex domain by incorporating the free Schrödinger equation. They utilized the imaginary part to approximate the smoothed second derivative when the complex diffusion coefficient approaches the real axis, and proposed an interesting complex diffusion process (CDP):

(4)

where Im(x) is the imaginary part of a complex variable x, λ = reiθ is a complex scalar, θ is a small angle, λ is a real scalar; and a is a parameter to control the sharpness of the slope near zero.

1038

S. Fu et al.

2.2 Region-Based Fuzzy Shock Filter with Anisotropic Diffusion

In equations (2) and (3), however, to enhance an image using the symbol function sign(x) is a binary decision process. This is a hard partition without middle transition. Unfortunately, the obtained result is a false piecewise constant image, where a bad visual quality is produced in some areas. In Fig.1, zoomed part of results obtained by the binary shock filter to blurry images, such as the Lena and the Peppers, is shown respectively. One can see obviously unnatural illumination transition and annoying artifacts in the image enhancement process.

Fig. 1. Zoomed part of results by the binary shock filter: left, the Lena; right, the Peppers

Fuzzy set theory captures the fuzziness of the information that humans receive from nature. Fuzzy techniques are powerful tools for knowledge representation and processing, and they can manage vagueness and ambiguity efficiently. In image processing applications, many difficulties arise because the data, tasks, and results are uncertain [2]. Denote the fuzzy set S on the region R as:

S = ∫_{x∈R} μ_S(x) / x .    (5)

where μ_S(x) ∈ [0, 1] is called the membership function of S on R. Chen et al. [11] further extended the above set to the generalized fuzzy set, denoting a generalized membership function (GMF) μ_S(x) ∈ [−1, 1] to substitute for μ_S(x) ∈ [0, 1]. An image comprises regions with different features, such as edges, textures and details, and flat areas, which should be treated differently to obtain a better result in an image processing task. We divide an image into three types of regions by its smoothed gradient magnitude: big gradients (such as boundaries between different objects), medium gradients (such as textures and details) and small gradients (such as smoother segments inside different areas). In our algorithm, for edges between different objects, a shock-type backward diffusion is performed in the gradient direction, normal to the isophote line (edge), incorporating a forward diffusion along the isophote line direction. For textures and details, shock filters with the sign function enhance image features in a binary decision process, which unfortunately produces a false piecewise-constant result. We notice that the variation of texture and detail is fuzzy in these areas. In


order to approach this variation, we extend the binary decision to a fuzzy one by substituting for sign(x) a hyperbolic tangent membership function th(x), which guarantees a natural smooth transition in these areas by softly controlling changes of the gray levels of the image. As a result, a fuzzy shock-type backward diffusion is introduced to enhance these features while preserving a natural transition in these areas. The normal derivative of the smoothed image is used to detect image features. Finally, an isotropic diffusion is used to smooth flat areas simultaneously. Thus, incorporating a shock filter with anisotropic diffusion, we develop a region-based fuzzy shock filter with anisotropic diffusion (RFSFAD) process to reduce noise and to sharpen edges while enhancing image features simultaneously:

v = G_σ ∗ u
∂u/∂t = c_N u_NN + c_T u_TT − w(v_N) sign(v_NN) u_N    (6)

with Neumann boundary conditions, where the parameters are chosen as follows according to the different image regions:

Region               c_N   c_T                   w(v_N)
v_N > T_1            1     1/(1 + l_1 u_TT²)     1
T_2 < v_N ≤ T_1      1     1/(1 + l_1 u_TT²)     th(l_2 v_NN)
else                 1     1                     0

c_N and c_T are the normal and tangent flow control coefficients, respectively. The tangent flow control coefficient is used to prevent excess smoothing of smaller details; l_2 is a parameter controlling the gradient of the membership function th(x); T_1 and T_2 are two thresholds which divide the image into the three different types of regions; l_1 and l_2 are constants.
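As an illustration, the following Python sketch performs one explicit time step of our reading of Eq. (6) and the parameter table above; the finite-difference helper, the thresholds T1 and T2, and the constants l1, l2, dt, sigma are illustrative assumptions, not the authors' settings.

import numpy as np
from scipy.ndimage import gaussian_filter

def directional_derivs(u, eps=1e-8):
    # second derivatives along the gradient (N) and level-set (T) directions
    uy, ux = np.gradient(u)
    uyy, _ = np.gradient(uy)
    uxy, uxx = np.gradient(ux)
    g2 = ux**2 + uy**2 + eps
    u_nn = (ux**2 * uxx + 2 * ux * uy * uxy + uy**2 * uyy) / g2
    u_tt = (uy**2 * uxx - 2 * ux * uy * uxy + ux**2 * uyy) / g2
    return u_nn, u_tt, np.sqrt(g2)

def rfsfad_step(u, dt=0.1, sigma=1.5, T1=30.0, T2=8.0, l1=0.01, l2=0.05):
    v = gaussian_filter(u, sigma)                  # v = G_sigma * u
    u_nn, u_tt, u_grad = directional_derivs(u)
    v_nn, _, v_grad = directional_derivs(v)
    edges = v_grad > T1                            # boundaries between objects
    texture = (v_grad > T2) & ~edges               # textures and details
    flat = ~(edges | texture)                      # smooth interior regions
    c_n = np.ones_like(u)
    # damp tangential smoothing where detail (large u_TT) is present
    c_t = np.where(flat, 1.0, 1.0 / (1.0 + l1 * u_tt**2))
    # shock weight: crisp sign on edges, fuzzy th(.) on textures, off elsewhere
    shock = np.where(edges, np.sign(v_nn),
                     np.where(texture, np.tanh(l2 * v_nn), 0.0))
    return u + dt * (c_n * u_nn + c_t * u_tt - shock * u_grad)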

3 Numerical Implementation and Experiments

We develop a fast shock-capturing scheme using the MS limiter function [12], and present results obtained with our scheme (6), comparing its performance with the related methods above. First, we compare the performance of the related methods on the blurred Peppers image (Gaussian blur, σ = 2.2) with added middle-level noise (SNR = 21 dB). Locally enlarged results are shown in Fig. 2. As can be seen, although AD denoises the image well, especially in the smoother segments, it produces a blurry image with unsharp edges; its ability to sharpen edges is limited because of its poor sharpening process, with an improper diffusion coefficient along the gradient direction. Moreover, with the diffusion coefficient in inverse proportion to the image gradient magnitude along the tangent direction, it does not diffuse fully in this direction and presents rough contours. ADSF and CRE sharpen edges very well, but through a binary decision process they yield false piecewise-constant images, which look unnatural


Fig. 2. Zoomed parts of above results (from top-left to bottom-right): a noisy blurry image, results by AD, ADSF, CRE, CDP and RFSFAD respectively

with a discontinuous transition in the homogeneous areas. Further, ADSF cannot reduce noise well with only a single directional diffusion in the smoother regions. Performing a complex diffusion process, CDP presents a relatively good result. But on edges with big gradient magnitude between different objects, because the diffusion process is weighted by arctan(x), the sharpness of its result is somewhat lower than that obtained using sign(x). It should also be pointed out that image enhancement by complex computation is more time-consuming than by real computation.


4 Conclusions

This paper deals with image enhancement for noisy blurry images. By reducing the width of edges, a region-based fuzzy shock filter with an anisotropic diffusion process is proposed to remove noise and to sharpen edges. Our model performs a powerful process on noisy blurry images, by which we not only remove noise and sharpen edges effectively but also smooth image contours even in the presence of high-level noise. Enhancing image features such as edges, textures and details with a natural transition in interior areas, this method produces better visual quality than the related methods.

References
1. Castleman, K.R.: Digital Image Processing. Prentice Hall (1995)
2. Hamid, R.T.: Fuzzy Image Processing: Introduction in Theory and Applications. Springer-Verlag (1997)
3. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. Applied Mathematical Sciences, Vol. 147. Springer-Verlag (2001) 125-164
4. Perona, P., Malik, J.: Scale-space and Edge Detection Using Anisotropic Diffusion. IEEE Trans. Pattern Anal. Machine Intell., Vol. 12 (1990) 629-639
5. Nitzberg, M., Shiota, T.: Nonlinear Image Filtering with Edge and Corner Enhancement. IEEE Transactions on PAMI, Vol. 14 (1992) 826-833
6. You, Y.L., Xu, W., Tannenbaum, A., Kaveh, M.: Behavioral Analysis of Anisotropic Diffusion in Image Processing. IEEE Transactions on Image Processing, Vol. 5 (1996) 1539-1553
7. Alvarez, L., Mazorra, L.: Signal and Image Restoration Using Shock Filters and Anisotropic Diffusion. SIAM J. Numer. Anal., Vol. 31 (1994) 590-605
8. Osher, S.J., Rudin, L.I.: Feature-oriented Image Enhancement Using Shock Filters. SIAM J. Numer. Anal., 27 (1990) 919-940
9. Kornprobst, P., Deriche, R., Aubert, G.: Image Coupling, Restoration and Enhancement via PDE's. IEEE ICIP, 2 (1997) 458-461
10. Gilboa, G., Sochen, N., Zeevi, Y.Y.: Image Enhancement and Denoising by Complex Diffusion Processes. IEEE Transactions on PAMI, 26(8) (2004) 1020-1036
11. Chen, W.F., Lu, X.Q., Chen, J.J., Wu, G.X.: A New Algorithm of Edge Detection for Color Image: Generalized Fuzzy Operator. Science in China (Series A), 38(10) (1995) 1272-1280
12. Liu, R.X., Shu, Q.W.: Some New Methods in Computing Fluid Dynamics. Science Press of China, Beijing (2004)

Robust Feature Detection Using 2D Wavelet Transform Under Low Light Environment

Jihoon Lee¹, Youngouk Kim¹,², Changwoo Park², Changhan Park¹, and Joonki Paik¹

¹ Image Processing and Intelligent Systems Laboratory, Department of Image Engineering, Graduate School of Advanced Imaging Science, Multimedia, and Film, Chung-Ang University
² Precision Machinery Center, Korea Electronics Technology Institute, 401-402 B/D 193, Yakdae-Dong, WonMi-Gu, Puchon-Si, KyungGi-Do 420-140, Korea
[emailprotected]

Abstract. A novel local feature detection method is presented for a mobile robot's visual simultaneous localization and map building (v-SLAM). Camera-based visual localization can handle complicated problems, such as kidnapping and shadowing, which arise with other types of sensors. A fundamental requirement of robust self-localization is robust keypoint extraction under affine transforms and illumination change. In particular, localization under a low-light environment is crucial for guidance and navigation. This paper presents an efficient local feature extraction method for low-light environments. A more efficient local feature detector and a scheme to compensate for noise due to low-contrast images are proposed. The proposed scene recognition method is robust against scale, rotation, and noise in the local feature space. We adopt the framework of the scale-invariant feature transform (SIFT), where the difference-of-Gaussian (DoG)-based scale-invariant feature detection module is replaced by a difference of wavelets (DoW).

1 Introduction

SLAM requires multi-modal sensors, such as ultrasound sensors, range sensors, infrared (IR) sensors, encoders (odometers), and multiple visual sensors. Recognition-based localization is considered the most promising method for image-based SLAM [1,2]. IR-LED cameras have recently been used to deal with such complicated conditions. Map building becomes more prone to illumination change and affine variation when the robot is moving randomly. The most popular solution for robust recognition is the scale-invariant feature transform (SIFT) approach, which transforms an input image into a large collection of local feature vectors, each of which is invariant to image translation, scaling, and rotation [3]. The feature vector is partially invariant to illumination changes and affine (or three-dimensional) projection. Such local descriptor-based approaches are generally robust against occlusion and scale variance. In spite of many promising factors, SIFT has many parameters to be controlled, and it requires an optimal Gaussian pyramid for acceptable performance. Intensity-based local feature extraction methods cannot avoid estimation error because of low light-level


noise. Corner detection [5] and local descriptor-based [2] methods fall into this category. An alternative approach is moment-based invariant feature extraction, which is robust against both geometric and photometric changes [9,11]. This approach is usually effective for still-image recognition. While a robot is moving, however, the moment-based method frequently has to recognize non-planar objects, and can hardly extract invariant regions under illumination change. This paper presents a real-time local keypoint extraction method in the two-dimensional wavelet transform domain. The proposed method is robust against illumination change and low light-level noise, and free from the manual adjustment of many parameters. The paper is organized as follows. In Section 2, a noise-adaptive spatio-temporal filter (NAST) is proposed to remove low light-level noise as a preprocessing step [6]. Section 3 describes the proposed real-time local feature extraction method in the wavelet transform domain. Section 4 summarizes various experimental results comparing the DoW and SIFT methods, and Section 5 concludes the paper.

2 Noise Adaptive Spatio-temporal Filter

The proposed NAST algorithm adaptively processes the acquired image to remove low light-level noise. Depending on the statistics of the image, the information of neighboring pixels, and motion, the NAST algorithm selects a proper filtering algorithm for each type of noise. A conceptual flowchart of the proposed algorithm is illustrated in Fig. 1. The proposed NAST algorithm has four different operations, which are applied to the low-light images.

Fig. 1. Conceptual flowchart of the proposed algorithm

2.1 Noise Detection Algorithm

The output of the noise detection block determines the operation of the filtering blocks. The proposed spatial hybrid filter (SHF) can be represented as

y(i, j) = n(i, j) · x̂(i, j) + (1 − n(i, j)) · x(i, j) ,    (1)

where x̂(i, j) represents a pixel filtered by the SHF and n(i, j), the result of the noise detection process, takes 1 at the position of photon counting noise (PCN)

1044

J. Lee et al.

pixels and 0 elsewhere. In equation (1), x(i, j) and y(i, j) denote the (i, j)-th pixels in the noisy and filtered images, respectively. In the proposed noise detection scheme, n(i, j) forms a binary noise map denoted by N, which is used to filter out uncorrelated noise and to indicate the reference points for the subsequent filtering of correlated noise.

2.2 Filtering Mechanism of SHF

If the central pixel in the window W is considered to be noise (i.e., n(i, j) = 1 in the noise map N), it is substituted by the median value of the window, as in a normal median filter. Then the noise cancellation scheme in the SHF is extended to the correlated pixels in the local neighborhood (x(i, j) where n(i, j) ≠ 1 and at least one n(k, l) = 1 in W). In order to handle the correlated noise, the de-noised pixel value x′(i, j) can be

defined as

x′(i, j) = (σ²(i, j) · x(i, j) + x̄²(i, j)) / (σ²(i, j) + x̄(i, j)) ,    (2)

where x̄(i, j) and σ²(i, j) respectively represent the mean and variance of the window W.
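For concreteness, here is a minimal Python sketch of the SHF mechanism as we read Eqs. (1)-(2); it assumes a precomputed binary noise map, 3×3 windows, and our reconstruction of Eq. (2). All names are illustrative.

import numpy as np
from scipy.ndimage import median_filter, uniform_filter

def shf(x, noise_map):
    x = x.astype(float)
    med = median_filter(x, size=3)
    y = np.where(noise_map == 1, med, x)          # Eq. (1): replace PCN pixels
    mean = uniform_filter(y, size=3)              # local mean over window W
    var = uniform_filter(y**2, size=3) - mean**2  # local variance over W
    # correlated-noise candidates: clean pixels with a detected neighbor in W
    near = (uniform_filter(noise_map.astype(float), size=3) > 0) & (noise_map == 0)
    x_prime = (var * y + mean**2) / (var + mean + 1e-8)  # Eq. (2), as reconstructed
    return np.where(near, x_prime, y)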

2.3 Statistical Domain Temporal Filter (SDTF) for False Color Noise (FCN) Detection and Filtering

We use a new SDTF for removing FCN. The sum of absolute differences (SAD) between the two working windows of consecutive frames is used for motion detection, to avoid motion blur due to temporal averaging. Let x̂(i, j, t) and x̂(i, j, t−1) denote intensity values at the (i, j)-th pixel in the spatially filtered frames at time t and t−1, respectively; the proposed temporal filter can then be realized as

y(i, j, t) = { x̂(i, j, t−1),  if S_T(i, j, t) > S_T(i, j, t−1)
             { x̂(i, j, t),    if S_T(i, j, t) ≤ S_T(i, j, t−1)    (3)

where y(i, j, t) represents the final result of the proposed NAST and S_T is the local statistic defined as

S_T(i, j, t) = (x(i, j, t) − x̄(i, j, t))² − σ²(i, j, t)    (4)

3 A New Method for Local Feature Detection Using the 2D Discrete Wavelet Transform

In this section the 2D discrete wavelet transform is briefly described as theoretical background [7]. Based on the theory and implementation of the 2D discrete wavelet transform, the DoW-based local extrema detection method is presented.


3.1 Characteristics of 2D Wavelet Transform

Human visual characteristics are widely used in image processing. One example is the use of the Laplacian pyramid for image coding. SIFT falls into the category that uses a Laplacian pyramid for scale-invariant feature extraction [3]. The wavelet transform, on the other hand, is a multiresolution transform that repeatedly decomposes the input signal into lowpass and highpass components, like subband coding [7,8]. The wavelet-based scale-invariant feature extraction method does not increase the number of samples in the original image, as is the case in the Gaussian pyramid-based SIFT method. The wavelet transform can easily reflect the human visual system through multiresolution analysis using orthogonal bases [12]. Because the wavelet-based method does not increase the number of samples, computational redundancy is greatly reduced, and its implementation is suitable for parallel processing.

3.2 Difference of Wavelet in the Scale Space

The most popular wavelet functions include the Daubechies [7] and biorthogonal wavelets [10]. Although Daubechies designed a perfect-reconstruction wavelet filter, it does not have symmetry. In general image processing applications a symmetric biorthogonal filter is particularly suitable [10], but we used the Daubechies coefficient set {DB2, DB10, DB18, DB26, DB34, DB42} purely for efficient feature extraction.

Fig. 2. Structure of Difference of Wavelet

A. Parameter Decision for the Wavelet Pyramid. In order to construct the wavelet pyramid, we decide the number of Daubechies coefficients and approximation levels, which can be considered a counterpart of the DoG-based scale expansion. Fig. 3 shows that DB6 provides the optimum local key points, and Fig. 4 shows that approximation level 3 is the most efficient for matching. Although larger coefficients have better decomposition ability, we used DB2 as the first filter and increased the step by 8. Because all DB filters have even-numbered supports, the difference between adjacent DB filters' supports is recommended to be larger than or equal to 4 for easy alignment. In this work we used a difference of 8, because a difference of 4 provides almost the same filtered images. Table 1 summarizes experimental results of processing time and matching rate using different wavelet filters in the SIFT framework. The coefficient set in the first row provides the best keypoint extraction result with significantly reduced computational overhead; the combination given in the second row is the best in the sense of matching time and rate.



Fig. 3. The number of extracted keypoints versus the number of wavelet coefficients


Fig. 4. Matching rate versus the number of approximation levels

Table 1. Various sets of Daubechies coefficients in the SIFT framework, measuring processing time and matching rate under a low-light (0.05 lux) condition

Coefficient set                                           Processing time (msec)   Matching rate (%)
DB2, DB6, DB10, DB14, DB18, DB22                          121                      34.72
DB2, DB10, DB18, DB26, DB34, DB42                         130                      71.92
DB2, DB14, DB26, DB38, DB50, DB62                         173                      72.37
DB2, DB18, DB34, DB50, DB68, DB86                         213                      72.87
SIFT [4] (σ = 1.6, k = √2, 1D Gaussian kernel size = 11;
  images per octave = 6, number of octaves = 3)           925                      57.68

B. Wavelet-like Subband Transform. As shown in Fig. 2, the proposed wavelet pyramid is constructed using six Daubechies coefficient sets with three approximation levels. Because the length of each filter is an even number, we need an appropriate alignment method for matching different scales, as shown in Fig. 5, where DB10 is used for 320 × 240 input images.


Fig. 5. Proposed alignment method for different approximation levels

3.3 Local Extrema Detection and Local Image Descriptors

In the previous subsection we described the detailed construction method for the wavelet pyramid and DoW. In the keypoint extraction step, we used min-max extrema [4], taking into account the alignment of asymmetrically filtered scales.

Fig. 6. Maxima and minima of the difference-of-wavelet images are detected by comparing a pixel (marked with X) to its 26 neighbors in 3 × 3 regions at the current and adjacent scales (marked with circles)

In order to extract scale-invariant feature points, we compute the DoW in the scale space and locate the minimum and maximum pixels among the 8 neighboring pixels in the current scale and the 18 pixels in the upper- and lower-scale images. Such extrema become scale-invariant features. The DoW-based scale space is constructed as shown in Fig. 2. For each octave of scale space, the initial images are repeatedly convolved with the corresponding wavelet filter to produce the set of scale-space images shown on the left; DoW images are shown in the center; and on the right, maxima and minima of the difference-of-wavelet images are detected by comparing a pixel, marked with ×, to its 26 neighbors in three 3 × 3 templates, marked with circles. For the discrete wavelet transform, we used six different sets of Daubechies coefficients to generate a single octave, and make each difference image by using three octaves as

DoW1 = DB10_L1 − DB2_L1,   DoW2 = DB18_L1 − DB10_L1
DoW3 = DB26_L1 − DB18_L1,  DoW4 = DB34_L1 − DB26_L1
DoW5 = DB42_L1 − DB34_L1    (5)

1048

J. Lee et al.

Equation (5) defines how to make a DoW image using two wavelet-transformed images. Feature points obtained by the proposed method are mainly located in the neighborhood of strong edges. DoW also has a computational advantage over DoG because many octaves can be generated in parallel.
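The DoW construction of Eq. (5) can be sketched with PyWavelets as follows, under the assumption that the paper's DBk labels denote filter support length k (so DBk corresponds to pywt's db(k/2)); a crude crop-to-common-size stands in for the alignment method of Fig. 5.

import pywt  # PyWavelets

def dow_images(image, supports=(2, 10, 18, 26, 34, 42)):
    approx = []
    for k in supports:
        # level-1 approximation; 'db' order = half the filter support
        cA, _ = pywt.dwt2(image.astype(float), 'db%d' % (k // 2))
        approx.append(cA)
    # crop every approximation to the smallest common size (crude alignment)
    h = min(a.shape[0] for a in approx)
    w = min(a.shape[1] for a in approx)
    approx = [a[:h, :w] for a in approx]
    return [b - a for a, b in zip(approx, approx[1:])]  # DoW1 ... DoW5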

4 Experimental Results

We first enhanced the low light-level image quality using the NAST filter, whose result is shown in Fig. 7. A comparison between the DoG-based SIFT and the proposed DoW method is shown in Fig. 8. As shown there, the proposed DoW method outperforms the DoG-based SIFT in terms of both the stability of the extracted keypoints and computational efficiency. Fig. 9 compares the performance of the combined NAST and DoG method with the DoG-based SIFT algorithm.


Fig. 7. (a) Input low light-level image with significant noise and (b) NAST filtered image


Fig. 8. Keypoint extraction results: (a) DoG, (b) DoW, and (c, d) translation of (a) and (b), respectively


Fig. 9. Keypoint extraction results under a low light-level condition using (a) DoG, (b) DoG with NAST, and (c) DoW with NAST

Table 2 shows the performance evaluation in terms of processing time, matching rate, and the PSNR in dB obtained with each pre-filtering algorithm. A low-pass filter (LPF) [13] was simulated for comparison with the NAST filter. In order to measure PSNR, we add synthetic noise (20 dB PCN and 15 dB FCN) to the acquired low-light images. This work was tested on a personal computer with a 3.0 GHz Pentium processor.


Table 2. Performance evaluation of DoG and DoW with the NAST filter

Type of method                 Processing time (msec)   PSNR (dB)   Matching rate (%)
DoG under low light            925                      -           68.88
NAST + DoG under low light     1,104                    39.48       70.98
LPF + DoW under low light      254                      37.13       73.69
NAST + DoW under low light     355                      39.50       77.24

5 Conclusion

This paper presents a local feature detection method for vSLAM-based self-localization of mobile robots. The extraction of strong feature points enables accurate self-localization under various conditions. We first proposed the NAST pre-processing filter to enhance low light-level input images. The SIFT algorithm was then modified by adopting the wavelet transform instead of Gaussian pyramid construction. The wavelet-based pyramid outperformed the original SIFT in terms of processing time and the quality of the extracted keypoints. A more efficient local feature detector and a scheme to compensate for noise due to low-contrast images were also proposed. The proposed scene recognition method is robust against scale, rotation, and noise in the local feature space.

Acknowledgement This research was supported by Korean Ministry of Science and Technology under the National Research Laboratory Project, by Korean Ministry of Education under the BK21 Project, and by Seoul Future Content Convergence Cluster established by Seoul Industry-Academy-Research Cooperation Project.

References
1. Dissanayake, M.W.M.G., et al.: A Solution to the Simultaneous Localization and Map Building (SLAM) Problem. IEEE Trans. (2001) 229-241
2. Lionis, G.S., Kyriakopoulos, K.J.: A Laser Scanner Based Mobile Robot SLAM Algorithm with Improved Convergence Properties. IEEE International Conference, 1 (2002) 582-587
3. Lowe, D.G.: Object Recognition from Local Scale-invariant Features. Proc. of 7th Int'l Conf. on Computer Vision, 2 (1999) 1150-1157
4. Lowe, D.G.: Distinctive Image Features from Scale Invariant Keypoints. Int'l Journal of Computer Vision, 60 (2004) 91-110
5. Zhang, Z., Deriche, R., Faugeras, O., Luong, Q.T.: A Robust Technique for Matching Two Uncalibrated Images through the Recovery of the Unknown Epipolar Geometry. Artificial Intelligence (1995) 87-119
6. Lee, S., Maik, V., Jang, J., Shin, J., Paik, J.: Noise-Adaptive Spatio-Temporal Filter for Real-Time Noise Removal in Low Light Level Images. IEEE Trans. Consumer Electronics, 51 (2005) 648-653


7. Daubechies, I.: Orthonormal Bases of Compactly Supported Wavelets. Comm. Pure Appl. Math., 41 (1988) 909-996
8. Mallat, S.G.: Multifrequency Channel Decompositions of Images and Wavelet Models. IEEE Trans. on ASSP, 37 (1989) 2091-2110
9. Mindru, F., et al.: Moment Invariants for Recognition under Changing Viewpoint and Illumination. Computer Vision and Image Understanding, 94 (2003) 3-27
10. Feauveau, J.C., Mathieu, P., Barlaud, M., Antonini, M.: Recursive Biorthogonal Wavelet Transform for Image Coding. Proc. IEEE ICASSP'91 (1991) 2649-2652
11. Heikkila, J.: Pattern Matching with Affine Moment Descriptors. Pattern Recognition, 37 (2004) 1825-1834
12. Irie, K., Kishimoto, R.: A Study on Perfect Reconstructive Subband Coding. IEEE Trans. on CAS for Video Technology, 1 (1991) 42-48
13. Richardson, I.: Video Codec Design. 1st ed., John Wiley & Sons, West Sussex (2002) 195-209

Robust Music Information Retrieval in Mobile Environment

Won-Jung Yoon and Kyu-Sik Park

Dankook University, Division of Information and Computer Science, San 8, Hannam-Dong, Yongsan-Ku, Seoul, Korea, 140-714
{helloril, kspark}@dankook.ac.kr

Abstract. In this paper, we propose a music information retrieval (MIR) system for the real mobile environment, where a query music signal is captured by a cellular phone. A major problem in this environment is the distortion of the query sound's features caused by the mobile network and environmental noise. In order to alleviate these noises, a signal subspace noise reduction algorithm is applied. Then a robust feature extraction method called Multi-Feature Clustering (MFC), combined with SFS feature optimization, is implemented to improve and stabilize the system performance. The proposed system has been tested using cellular phones in the real world, and it shows an average retrieval success rate of about 65%.

1 Introduction

A number of content-based music retrieval methods are available in the literature, as in [1-3]. However, these studies mainly concern PC-based music retrieval systems under noise-free conditions. Such methods tend to fail when the query music signal contains background noise and network errors, as in the mobile environment. MIR in the mobile environment is a relatively new field of study. Burges et al. [4] proposed an automatic dimensionality reduction algorithm called Distortion Discriminant Analysis (DDA) for a mobile audio fingerprinting system. Kurozumi et al. [5] combined local time-frequency-region normalization and robust subspace spanning to search for music signals acquired by cellular phone. Philips [6] introduced a new approach to audio fingerprinting that extracts 32-bit energy differences along the frequency and time axes to identify the query music. In contrast to previous works, this paper focuses on the following issues in mobile music information retrieval. Firstly, the proposed system accepts query sounds captured by a cellular phone in the real mobile environment. In order to reduce the noise due to the mobile network and environment, a signal subspace noise reduction algorithm is applied; a further effort to extract noise-robust features is made with the SFS (sequential forward selection) feature optimization method. Secondly, the music retrieval results corresponding to different input query patterns (or portions) within the same music file may differ considerably. In order to overcome this problem, a robust feature extraction method called MFC (multi-feature clustering) combined with SFS is proposed.


2 Robust Music Feature Extraction, Selection and Clustering

Before feature extraction, a well-known signal subspace noise reduction algorithm [8] is applied to the query signal acquired by the cellular phone to reduce the mobile noise. Then, at a sampling rate of 22 kHz, the music signals are divided into 23 ms frames with a 50% overlapped Hamming window between two adjacent frames. Two types of features are computed from each frame. One is the timbral features, such as spectral centroid, spectral rolloff, spectral flux and zero crossing rate; the other is coefficient-domain features, such as thirteen mel-frequency cepstral coefficients (MFCC) and ten linear predictive coefficients (LPC). The means and standard deviations of these six original features and their delta values are computed over the frames of each music file to form a 102-dimensional feature vector. In order to reduce the computational burden and so speed up the search process, an efficient feature selection method is desired. As described in [9], a sequential forward selection (SFS) method is used to meet these needs: first, the best single feature is selected, and then one feature is added at a time, chosen so that, in combination with the previously selected features, it maximizes the classification accuracy. This process continues until all 102 features are selected; after completing the process, we pick the best feature subset, the one that maximizes the classification accuracy (a sketch of this loop is given below). As pointed out earlier, the classification results corresponding to different query patterns within the same music file may differ considerably, which causes serious uncertainty in the system performance. In order to overcome this problem, a new robust feature extraction method, multi-feature clustering (MFC), is implemented on top of the preceding feature selection procedure. The key idea is to extract the pre-defined features over the full-length music signal in steps of a large 20-second window and then cluster these features into four disjoint subsets (centroids) using the LBG-VQ clustering technique.
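A minimal sketch of the SFS loop described above; `evaluate` is a user-supplied callback returning the classification accuracy for a candidate feature subset, and the cut-off of 20 features is an assumption mirroring the choice made later in Section 3.

def sfs(n_features, evaluate, n_select=20):
    """Greedy sequential forward selection over feature indices."""
    remaining = list(range(n_features))
    selected, history = [], []
    while remaining and len(selected) < n_select:
        # add the feature that maximizes accuracy together with those selected
        score, best = max((evaluate(selected + [f]), f) for f in remaining)
        selected.append(best)
        remaining.remove(best)
        history.append(score)
    return selected, history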

3 Experimental Setup and Simulation Results

The proposed algorithm has been implemented and used to retrieve music data from a database of 240 music files: 60 music samples were collected for each of the four genres Classical, Hiphop, Jazz, and Rock. The excerpts of the dataset were taken from radio, compact disks, and internet MP3 music files. Fig. 1 shows the block diagram of the experimental setup. In order to demonstrate the system performance, two sets of experiments have been performed: one with the proposed signal subspace noise reduction technique and MFC-SFS feature optimization (dashed line), and the other without any noise reduction technique or feature optimization.

Fig. 1. Two sets of experimental setup


The proposed mobile MIR system works as follows. Firstly, a query music signal is picked up by the single microphone of the cellular phone and transmitted to the MIR server, where the signal is acquired by an Intel Dialogic D4PCI-U board at an 8 kHz sampling rate, 16 bit, mono. Secondly, a signal subspace noise reduction algorithm is applied to the query signal. Thirdly, the pre-defined set of features is extracted from the enhanced query signal. At this point, a trained music DB is available in which the music files were indexed by MFC feature clustering with the SFS feature selection method. Finally, the queried music is identified from the music DB using a simple similarity measure, and the retrieval result is transmitted via an SMS server. The similarity measure between the queried music and a music file in the DB is based on the minimum Euclidean distance (see the sketch below). Two sets of experiments have been conducted in this paper.
• Experiment 1: Demonstration of the retrieval performance of the proposed MIR system and comparison analysis
• Experiment 2: Retrieval test using the MFC method with different query patterns
Fig. 2 shows the average retrieval accuracy of the system with the noise reduction algorithm and MFC-SFS feature optimization with respect to music queries captured by cellular phone. From the figure, we can see that the retrieval performance increases with the number of features up to a certain point, after which it remains almost constant and then decreases. Thus, based on the observation of these boundaries in Fig. 2, we select the first 20 features up to the boundary and ignore the rest.
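The following sketch illustrates the MFC indexing and the minimum-Euclidean-distance matching just described; SciPy's k-means stands in for the LBG-VQ step, and all names are ours.

import numpy as np
from scipy.cluster.vq import kmeans

def index_music(song_features, n_centroids=4):
    # cluster per-window feature vectors of the full-length song into
    # four centroids; k-means stands in for LBG-VQ here
    centroids, _ = kmeans(song_features.astype(float), n_centroids)
    return centroids

def retrieve(query_vec, db):
    # db: song_id -> centroid array; pick the song whose nearest centroid
    # minimizes the Euclidean distance to the query feature vector
    def dist(c):
        return np.min(np.linalg.norm(c - query_vec, axis=1))
    return min(db, key=lambda sid: dist(db[sid]))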

Fig. 2. MFC-SFS feature selection procedure

Table 1. MIR statistics for the system with and without NR and MFC-SFS feature optimization

MIR system                   Noisy query   Query with NR and MFC-SFS
Retrieval accuracy           44.3%         65.2%
Feature dimension            102           20

As seen in Table 1, the proposed method achieves more than 20% higher accuracy than the system without noise reduction and MFC-SFS, even with only a 20-feature set. To verify the performance of the proposed MFC-SFS method, seven excerpts with a fixed duration of 5 s were extracted from different positions in the same query


music: at the beginning of the music and at the 10%, 20%, 30%, 40%, 50%, and 80% positions after the beginning of the music signal. Fig. 3 shows the retrieval results for the seven excerpts at the prescribed query positions.


Fig. 3. Retrieval results at different query portions with MFC-SFS

As expected, the retrieval results without MFC-SFS depend greatly on the query position, and performance worsens as the query portion moves toward the two extremes, the beginning and ending positions of the music signal. On the other hand, we find quite stable retrieval performance with the MFC-SFS method, which yields relatively high accuracy rates in the range of 55%-67%. Even at the two extreme positions, the system with MFC-SFS achieves a classification accuracy as high as 62%, more than a 20% improvement over the system without MFC-SFS. This is a consequence of the MFC property of building a robust musical feature set over the full-length music signal.

4 Conclusion

In this paper, we propose a music information retrieval (MIR) system for the mobile environment. The proposed system has been tested using cellular phones in the real mobile environment, and it shows an average retrieval success rate of about 65.2%. Experimental comparisons of music retrieval with several query excerpts from different positions are presented, demonstrating the superiority of the MFC-SFS method.

Acknowledgment This work was supported by grant No. R01-2004-000-10122-0 from the Basic Research Program of the Korea Science & Engineering Foundation

References
1. Tzanetakis, G., Cook, P.: Musical Genre Classification of Audio Signals. IEEE Trans. on Speech and Audio Processing, Vol. 10, No. 5 (2002) 293-302
2. Wold, E., Blum, T., Keislar, D., Wheaton, J.: Content-based Classification, Search, and Retrieval of Audio. IEEE Multimedia, Vol. 3, No. 2 (1996) 26-39


3. Foote, J.: Content-based Retrieval of Music and Audio. Proc. SPIE Multimedia Storage and Archiving Systems II, Vol. 3229, C.C.J. Kuo et al., Eds. (1997) 138-147
4. Burges, C.J.C., Platt, J.C., Jana, S.: Extracting Noise Robust Features from Audio Data. Proceedings of ICASSP (2002) 1021-1024
5. Kurozumi, T., Kashino, K., Murase, H.: A Robust Audio Searching Method for Cellular-phone-based Music Information Retrieval. IAPR 16th ICPR, Vol. 3 (2002) 991-994
6. Haitsma, J., Kalker, T.: A Highly Robust Audio Fingerprinting System. 3rd Int. Symposium on Music Information Retrieval (ISMIR), Oct. (2002) 14-17
7. Ephraim, Y.: A Signal Subspace Approach for Speech Enhancement. IEEE Transactions on Speech and Audio Processing, Vol. 3, No. 4, July (1995) 251-266
8. Liu, M., Wan, C.: A Study on Content-Based Classification and Retrieval of Audio Database. Proc. of the International Database Engineering & Applications Symposium (2001) 339-345

Robust Speech Feature Extraction Based on Dynamic Minimum Subband Spectral Subtraction

Xin Ma, Weidong Zhou, and Fang Ju

College of Information Science and Engineering, Shandong University, Jinan, Shandong, 250100, P.R. China
{max, wdzhou, jufang}@sdu.edu.cn

Abstract. Based on a theoretical analysis of nonlinear feature extraction, we propose a new method called dynamic minimum subband spectral subtraction (DMSSS) and discuss its effects on speech recognition results. We illustrate the process of removing corrupted components by subtracting the estimated dynamic minimum of the short-time spectra. Experimental results show that the proposed method is stable and yields good ASR performance under noisy environments. Combined with the peak isolation method, DMSSS can improve recognition performance significantly.

1 Introduction

Noise robustness is an important research aspect of speech recognition; it mainly involves noise-robust speech feature extraction [1], acoustic model adaptation [2], noise compensation [3], parallel models [4], etc. The aim of robust speech feature extraction is to extract noise-resistant features. Its main idea is to preserve the components that are insensitive to noise while containing linguistic information, and to suppress the noise-sensitive components by taking advantage of the masking property of speech. Peak isolation [5] is one such noise-robust feature extraction method; its algorithm is simple and can improve recognizer performance under noisy environments. Noise compensation methods such as spectral subtraction [6] are usually realized by directly eliminating the noise. Model adaptation uses a certain amount of test data to adapt the HMM parameters to the noisy environment. Parallel model combination uses parallel hidden Markov models performing simultaneous processes for noise and speech. When the SNR decreases or the fluctuation of the noise increases, some noise-robust feature extraction methods can lose more useful information than usual methods, especially for unvoiced sounds. Focusing on noise-robust speech feature extraction and adequately considering the nonstationarity of noise, we propose a method named dynamic minimum subband spectral subtraction (DMSSS), which can suppress the noise-sensitive components of speech while maintaining noise-robust speech features. It can enhance the noise resistance of a recognizer with no prior knowledge of the noise, and its computational cost is low.


2 Theory and Realization of DMSSS

It is difficult to estimate the power spectrum accurately because the power spectrum is not time-invariant even for stationary noise. But for a short segment of noise, its fluctuation is limited: if the analysis window is short, the power spectrum of the noise can be regarded as stationary within this window. We can increase the SNR by removing the prominent increments of the spectra caused by noise. If noise and speech are assumed to be independent in an additive noise model, then the noisy speech signal y(t) is the sum of the speech signal s(t) and the noise signal n(t),

y(t) = s(t) + n(t) .    (1)

After the spectral analysis we obtain the power spectrum representation

Y(e^{jω}) = S(e^{jω}) + N(e^{jω}) .    (2)

But this summation does not hold for the amplitude spectra, because their phases are not consistent. If the noise and speech are independent and we assume the distribution of the noise is Gaussian, then for one subband of the power spectra of a given analysis window, the discrete short-term power spectra satisfy

E[|Y_k|²] = E[|S_k|²] + E[|N_k|²] .    (3)

If we use λ(k) to stand for the expectation of |N_k|², for a short time we can write

|Y_k|² = |S_k|² + λ(k) .    (4)

λ(k) cannot be obtained directly; however, the relation between the minima of the power spectra within a short time span is known:

min_T [|Y_k|²] = min_T [|S_k|²] + λ(k) .    (5)

and we have the following relation:

|Y_k|² − min_T [|Y_k|²] = |S_k|² − min_T [|S_k|²] .    (6)

where T is the time span over which the above relations are analyzed. After the above subtraction is made, the relative forms of the spectra are retained while the noise effects on the spectra are suppressed. If T is selected reasonably, the minimum of the subband power spectra will be close to the noise subband power spectrum in this time span. As Mel energy spectra are often used for recognition, we can use Mel energies instead of power spectra. First, the time segments for estimating the minimum of the Mel energy are constructed. Supposing the number of frames for estimating the minimum of the Mel energy is N, we can select forward, backward or bi-directional approaches. For example, if the number of frames is even, then the forward approach uses


the current frame and the past N−1 frames to make up one segment, which is used for calculating min[Mel_{t−N+1}, Mel_{t−N+2}, ..., Mel_t], where the subscript t stands for the time of the current frame. The backward approach applies the current and the following N−1 frames, and the bi-directional approach uses the current, past and following frames to form the required segment. As speech changes slowly, adjacent values of min Mel(t) should not change sharply, so after calculating min Mel(t) we smooth it along the time direction. A simple way of smoothing is to average the adjacent min Mel(t) using formula (7),

min Mel^{SP}(t) = [min Mel(t − 1) + min Mel(t + 1)] / 2 .    (7)

where min Mel^{SP}(t) is the smoothed minimum of the Mel energy in one subband. Then we can subtract it from the subband Mel energy as in (8):

MelS(t) = Mel(t) − min Mel^{SP}(t) .    (8)

MelS(t) is the subband Mel energy after subtracting the dynamic minimum. To avoid MelS(t) becoming too small, we must set a positive threshold ε and modify formula (8) as follows:

MelS(t) = max{Mel(t) − min Mel^{SP}(t), ε} .    (9)

where the function max(·, ·) selects the larger of its two parameters. In our experience, ε is chosen between 10 and 100. Fig. 1 shows the three-dimensional Mel energy spectrogram of noisy speech (corrupted by 0 dB additive noise) before and after DMSSS processing.
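A compact Python sketch of Eqs. (5)-(9) using the forward approach: per subband, the running minimum over the last N frames is smoothed and subtracted with a positive floor ε. Shapes and defaults are illustrative assumptions.

import numpy as np

def dmsss(mel, n=32, eps=50.0):
    # mel: (frames, subbands) Mel energies; forward segments of n frames
    mel = np.asarray(mel, dtype=float)
    min_mel = np.empty_like(mel)
    for t in range(mel.shape[0]):
        lo = max(0, t - n + 1)
        min_mel[t] = mel[lo:t + 1].min(axis=0)     # segment minimum, Eq. (5)
    sm = min_mel.copy()
    sm[1:-1] = 0.5 * (min_mel[:-2] + min_mel[2:])  # smoothing, Eq. (7)
    return np.maximum(mel - sm, eps)               # subtraction with floor, Eq. (9)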


Fig. 1. Three-dimensional spectrogram of noisy speech. (a) Noisy speech before DMSSS enhancement, (b) Noisy speech after DMSSS enhancement.

Fig. 2 compares the log Mel energy spectra of the clean, noisy, and DMSSS-processed noisy signals. It can be seen that the effects of noise mainly concentrate in the valleys of the log energy spectra. After DMSSS, the noise effects are noticeably alleviated.



Fig. 2. Log Mel energy amplitude of one frame of the speech “zhiyue”: the dotted, dash-dotted, and solid lines show the noisy speech, enhanced noisy speech, and clean speech, respectively

4 Experiments and Results

Speech recognition experiments are conducted on the “Chinese 863” speech recognition database (a database widely used for Chinese speech recognition) [7]. The input speech is sampled at 16 kHz and segmented into 25 ms frames with an overlap of 15 ms; the preemphasis coefficient is 0.95. Triphone HMMs with 5 states and a single Gaussian per state are trained with HTK 3.2 [8]. The features evaluated in these experiments include MFCCs, and MFCCs enhanced with DMSSS, PKISO, and DMSSS combined with PKISO. The baseline feature vectors have 39 elements consisting of 12 MFCCs, the 0'th cepstral parameter, and their delta and acceleration coefficients. The DMSSS and PKISO methods use the improved MFCCs introduced by us and by Strope et al. [5], respectively. Training is conducted with clean signals, and recognition is done with noisy speech signals at different SNRs. The noisy speech signals are generated artificially by mixing speech signals with Gaussian white and babble noises. To evaluate the results, the recognition accuracy rate defined in the HTK book [8] is used as our criterion. The syllable recognition rates are shown in Table 1.

Table 1. Syllable recognition accuracy rates (%) of BASELINE, PKISO, DMSSS (N=32), and PKISO+DMSSS (N=32)

            Babble                                      Gaussian white
SNR (dB)    BASELINE  PKISO   DMSSS   PKISO+DMSSS       BASELINE  PKISO   DMSSS   PKISO+DMSSS
0           0.07      1.82    25.56   28.10             2.86      -3.10   25.99   31.56
10          35.55     42.76   58.36   66.04             27.16     35.56   53.26   69.66
20          75.42     61.26   72.78   78.88             60.82     62.77   69.77   70.13
30          82.80     76.41   76.56   80.22             80.30     77.65   74.38   80.83

Clearly, for syllable recognition, PKISO, DMSSS, and the PKISO+DMSSS combination can improve recognition performance for speech corrupted by noise; the performance of the PKISO and DMSSS combination is the best among these methods. To further examine the performance of the above methods on different phones, we compile statistics of the recognition results for voiced consonants, unvoiced consonants, and vowels. The results are shown in Table 2. We find that when the SNR is high, PKISO or DMSSS becomes a little poorer than plain MFCCs for unvoiced consonants, but for voiced consonants


and vowels the performance of these techniques is similar. As the SNR decreases, DMSSS and PKISO degrade more slowly than plain MFCCs. When the SNR is less than 20 dB, the best results are always obtained with the PKISO and DMSSS combination, whether for voiced consonants, unvoiced consonants, or vowels.

Table 2. Recognition accuracy rates (%) of BASELINE, PKISO, DMSSS (N=32), and PKISO+DMSSS (N=32) for unvoiced consonants, voiced consonants, and vowels

                                Babble                                  Gaussian white
Type of phone        SNR (dB)   BASELINE PKISO  DMSSS  PKISO+DMSSS      BASELINE PKISO  DMSSS  PKISO+DMSSS
unvoiced consonant   0          -0.54    0.03   6.86   16.80            -1.41    5.58   8.32   20.46
                     10         35.45    42.84  58.89  59.25            11.55    33.14  34.66  57.32
                     20         62.38    51.32  67.08  69.66            51.92    72.25  71.68  76.70
                     30         78.88    61.73  70.05  73.98            82.38    78.61  78.33  84.34
voiced consonant     0          0.80     2.03   16.70  47.12            -0.43    2.87   15.17  44.09
                     10         56.78    72.65  67.16  76.12            38.17    53.58  58.95  75.50
                     20         67.58    89.63  78.08  91.00            75.35    80.45  80.06  90.01
                     30         90.32    91.76  84.55  91.98            86.98    88.76  86.30  90.34
vowel                0          53.63    50.13  65.78  66.09            50.81    55.10  42.94  61.65
                     10         67.91    72.36  76.88  78.57            59.42    75.50  63.52  82.38
                     20         82.88    91.06  86.00  91.02            75.79    82.78  81.68  90.85
                     30         88.84    92.13  88.50  91.81            84.30    91.61  84.30  95.45

5 The Effect of the Length of the Short-Time Segment in DMSSS

The syllable recognition rates at different lengths of the short-time segment (N) for the two noises are illustrated in Fig. 3. For babble noise, the best average recognition rates emerge at approximately N=32, but for Gaussian white noise they emerge at approximately N=24. Clearly, different values of N for different noises change the recognition rate. Generally, if N is too large, the minimum estimated with DMSSS will tend to zero and the ability of DMSSS to suppress noise will decrease; if N is too small, the minimum may change a lot and render the method ineffective.


Fig. 3. The effect of the length of the short-time segment on recognition results with DMSSS at different SNRs (0, 10, 20, 30 dB): (a) for babble noise, (b) for Gaussian white noise


6 Conclusions

This paper presents a novel nonlinear approach for speech recognition, named dynamic minimum subband spectral subtraction (DMSSS). Theoretical analysis indicates the proposed method is easily realized, and experimental results demonstrate its effectiveness in improving robustness in automatic speech recognition. The experiments also show that the length of the short-time segment for the dynamic minimum affects noise suppression; when the length is chosen reasonably, our algorithm yields a significant performance improvement.

References
1. Tyagi, V., Wellekens, C.: On De-emphasizing the Spurious Components in the Spectral Modulation for Robust Speech Recognition. Robust 2004, COST278 and ISCA Tutorial and Research Workshop on Robustness Issues in Conversational Interaction, August (2004) 30-31
2. Nolazco-Flores, J., Young, S.: Continuous Speech Recognition in Noise Using Spectral Subtraction and HMM Adaptation. ICASSP, Vol. I (1994) 409-412
3. Raj, B., Seltzer, M., Stern, R.: Reconstruction of Damaged Spectrographic Features for Robust Speech Recognition. Proceedings ICSLP, Vol. 1, Beijing, China (2000) 375-360
4. Gales, M.J.F., Young, S.J.: Robust Continuous Speech Recognition Using Parallel Model Combination. IEEE Transactions on Speech and Audio Processing, 4 (1996) 352-359
5. Strope, B., Alwan, A.: A Model of Dynamic Auditory Perception and Its Application to Robust Word Recognition. IEEE Transactions on Speech and Audio Processing, 5 (1997) 451-464
6. Boll, S.F.: Suppression of Acoustic Noise in Speech Using Spectral Subtraction. IEEE Transactions on Acoustics, Speech and Signal Processing, 27(2) (1979) 113-120
7. http://www.cass.net.cn/chinese/s18_yys/yuyin/product/product_2.htm
8. http://htk.eng.cam.ac.uk/

Searching Algorithm for Shadow Areas Using Correlation in Fourier Domain and Its Application

Choong Ho Lee

Graduate School of Hanbat National University, San 16-1 Deokmyeong-dong, Yuseong-gu, Daejeon 305-719, Korea
[emailprotected]

Abstract. The searching and enhancement of shadow areas in satellite imagery is of growing interest because of possible new applications in this field. This paper proposes an algorithm to search for the shadow areas caused by buildings, which are very common in satellite imagery of urban areas in Korea. Binarization using a histogram and threshold has the demerit of producing scattered small shadow areas, which should be ignored for some applications. The proposed searching algorithm uses the fast Fourier transform and computes the correlation in the frequency domain. We search for the correlation threshold that yields the shadow areas without the scattered small dark areas. Experimental results show this method is valid for extracting shadow areas from satellite imagery.

1 Introduction

There has been considerable recent interest in the searching and enhancement of shadow areas in 1-m satellite imagery [1, 2]. It has been reported that the shadow areas in satellite imagery are useful for detecting building images semi-automatically [3, 4]. Sohn et al. reported a searching and enhancement algorithm for shadow areas which cannot be performed automatically [1, 2]. K. L. Kim et al. reported a more complex method to search for special areas which uses clustering, labeling, segmentation, feature extraction, and fuzzy theory [5]. K. L. Kim et al. also reported a feature extraction method which can be performed semi-automatically or automatically by comparing color images with grey-scale images [6]. However, no methods to search for and enhance shadow areas semi-automatically or automatically have been reported, as far as the authors know. We present a searching and enhancement algorithm which can be performed semi-automatically or automatically. The searching algorithm uses the correlation to obtain a template for extracting the shadow areas from a satellite image. The algorithm preserves the bright (sunny) area, because the enhancement algorithm is performed only on the shadow area, which is separated from the bright area.

2 Searching and Enhancement Algorithm

Enhancing the picture quality of the shadow area tends to degrade that of the bright area in satellite imagery, as shown in Fig. 1, when we use enhancement


algorithms such as histogram equalization, histogram specification, or contrast stretching, which are suited to processing dark areas in images.

2.1 Extraction of Shadow Area

To prevent the degradation of picture quality in the bright area, we need to separate an image into two parts, the sunny area and the shadow area, apply the algorithms only in the shadow area, and preserve the sunny area as it is.

Fig. 1. A 500x500 satellite image which includes shadow area

Binarization algorithms based on grey levels can lead to the problem of diffuse shadow areas, as shown in Fig. 2. We introduce the correlation to make a template which can be used to extract the shadow area from the satellite image. The correlation C(u, v) can be computed by solving

C(u, v) = Re{F(u, v) * G(u, v)} .    (1)

where f(x, y) and g(x, y) are two image signals and F(u, v) and G(u, v) are their Fourier transforms. In practice, the correlation C(u, v) can be computed by first rotating an image m by 180 degrees and then using FFT-based convolution techniques as follows:

C(u, v) = Re[F⁻¹{F(M) G^π(m)}] .    (2)

where F⁻¹ denotes the inverse Fourier transform, F(M) denotes the Fourier transform of image M, and G^π(m) is the Fourier transform of the image m rotated by 180 degrees. Using the template of Fig. 3, the shadow area can be extracted as shown in Fig. 4. Likewise, the bright area can be extracted as shown in Fig. 5.
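The template construction of Eq. (2) can be sketched with NumPy's FFT as follows; the threshold argument corresponds to the correlation threshold discussed in Section 3, and the circular wrap-around of the unpadded FFT is ignored for brevity. Function and variable names are ours.

import numpy as np

def shadow_template(image, block, threshold):
    # correlate a small shadow block with the image via FFT-based convolution
    # of the 180-degree-rotated block (Eq. (2))
    H, W = image.shape
    kernel = np.rot90(block, 2)
    corr = np.real(np.fft.ifft2(np.fft.fft2(image, s=(H, W)) *
                                np.fft.fft2(kernel, s=(H, W))))
    return corr > threshold                        # binary shadow-area template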


Fig. 2. After binarization at grey level 128

Fig. 3. A template for shadow area obtained by correlation of 8x8 shadow block and original image

Fig. 4. Shadow areas which are extracted from the original image


Fig. 5. Sunny areas which are extracted

2.2 Enhancement of Picture Quality of Shadow Area

To enhance the picture quality of the shadow area, histogram equalization is used. Fig. 6 shows the result.

Fig. 6. Shadow area after histogram equalization
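A minimal sketch of the masked enhancement of Section 2.2: histogram equalization is applied only to the pixels inside the shadow template, assuming an 8-bit grey-scale image, while the bright area is left untouched.

import numpy as np

def equalize_shadow(image, shadow_mask):
    out = image.copy()
    vals = image[shadow_mask]                     # shadow pixels only (uint8 assumed)
    hist, _ = np.histogram(vals, bins=256, range=(0, 256))
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1) * 255.0
    out[shadow_mask] = cdf[vals].astype(image.dtype)  # remap shadow pixels
    return out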

3 Simulation Results

In the experiment, we used an 8×8 block of shadow area to compute the correlation with the satellite imagery; the correlation is computed using Eq. 2 (refer to [7]). Fig. 3 was obtained using the threshold 1,500,000, which is a little less than the maximum value 2,229,975. The correlation provides the template for the shadow areas, which can extract the shadow areas from the original satellite imagery. The shadow area obtained using the template does not contain diffuse point areas. The image quality of the shadow area is improved by histogram equalization while the bright area is preserved as it is; after that process, the shadow area and the bright area are added.


Fig. 7. Reconstructed image

Fig. 7 is the resultant image obtained by histogram equalization of the original image in Fig. 1. Fig. 8 shows the resultant image using the algorithm we suggest. After histogram equalization without our algorithm, the objective picture qualities of the shadow area and the bright area are degraded to 27.99 dB and 22.79 dB in PSNR (peak signal-to-noise ratio), respectively; the bright area degrades more than the shadow area does. Although the subjective picture quality of the shadow area improves, the degradation of the bright area in the satellite imagery can sometimes be critical. Thus, the algorithm we propose is useful for improving the picture quality of the shadow area while preserving that of the bright area.

4 Conclusions

The correlation of a small block of shadow area with the satellite imagery provides a template for shadow areas that is not diffuse. Using the template, the shadow area and the bright area can be separated. While conventional enhancement algorithms degrade the picture quality of the bright area, our algorithm can enhance the picture quality of the shadow area effectively while preserving that of the bright area as it is.

References
1. Sohn, H.G., Yun, K.H., Park, H.K.: Enhanced Urban Information Recognition through Correction of Shadow Effects. In: Proceedings of Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, Busan, Korea (2003) 187-190
2. Sohn, H.G., Yun, K.H., Lee, D.C.: A Study of the Correction of Shadow Effects in Aerial Color Photos (Focusing on Roads in Urban Areas). In: Proceedings of Joint Conf. of Korean Society of GIS and Korean Society of Remote Sensing (2003) 383-387
3. Ye, C.S., Lee, K.H.: Detection Using Shadow Information in KOMPSAT Satellite Imagery. In: Proceedings of Joint Conf. of Korean Society of GIS and Korean Society of Remote Sensing, Vol. 16, No. 3 (2000) 383-387


4. Yoon, T.H.: Semi-automatic Building Segmentation from 1m Resolution Aerial Images. Master's Thesis, Korea Advanced Institute of Science and Technology, Daejeon, Korea
5. Kim, K.L., Kim, U.N., Kim, H.J.: Methods on Recognition and Recovery Process of Censored Areas in Digital Image. In: Proceedings of Korean Society of Surveying, Geodesy, Photogrammetry and Cartography (2002) 1-11
6. Kim, K.L., Kim, U.N., Chun, H.W.: A Study on Semi-automatic Feature Extraction Using False Color Aerial Images. In: Proceedings of Korean Society of Surveying, Geodesy, Photogrammetry and Cartography (2002) 109-115
7. Image Processing Toolbox User's Guide. The MathWorks Inc., MA (1997)

Shadow Detection Based on rgb Color Model

Baisheng Chen and Duansheng Chen

Department of Computer Science, Huaqiao University, Quanzhou 362021, China
{samchen, dschen}@hqu.edu.cn

Abstract. A shadow detection scheme based on the photometric invariant rgb color model is proposed. We first study the photometric invariance of the rgb color model and derive an important property. The algorithm combines cues of moving cast shadows on brightness and chromaticity successively to detect candidate shadow regions in rgb color space; finally, a post-processing step exploits region-based geometry information to exclude pseudo shadow segments. Results are presented for several video sequences representing a variety of illumination conditions and ground materials, with shadows cast on different surface types. The results show that our approach is robust to widely different backgrounds and illuminations.

1 Introduction

Moving cast shadows can cause object merging, object shape distortion, and even object losses (due to the shadow cast over another object). For this reason, moving shadow detection is critical for accurate object detection in vision surveillance applications. Many algorithms that deal with shadows have been proposed in the literature. These approaches are mostly classified as model-based and feature-based. The first class comprises methods designed for special applications, such as aerial image understanding [1] and surveillance [2]. They exploit a priori knowledge of the 3-D geometry of the scene, the objects and the illumination. These model-based approaches have two major limitations: simple rectilinear models can be used only for simple objects, for instance buildings and vehicles, and the a priori knowledge of the illumination and 3-D geometry of the scene is not always available. The second class overcomes these limitations by exploiting shadow geometry, intensity and color properties. For example, [3] uses the rationale that shadows have similar chromaticity but lower brightness than the background to remove shadows in HSV color space. The approach proposed here is shadow-feature based. We exploit the shadow features on brightness and chromaticity to detect shadows and implement the algorithm in the photometric invariant normalized rgb color space. The remainder of this paper is organized as follows. In Section 2, we focus on the photometric invariance of the normalized rgb color model. In Section 3, the normalized rgb color model based shadow detection scheme is described in detail. Experimental results and analysis are given in Section 4. In the last section, we draw conclusions about our work.


2 rgb Color Model

In our work, we focus on the normalized color model defined as follows:

r = R/(R+G+B),  g = G/(R+G+B),  b = B/(R+G+B)  (1)

where r + g + b = 1. The normalized rgb color model defined above possesses photometric invariant features, i.e., it is insensitive to surface orientation, illumination direction and intensity. Photometric invariant features are functions describing the color configuration of each image coordinate discounting local illumination variations, such as shadings and shadows. Given red, green and blue sensors with spectral sensitivities f_R(λ), f_G(λ) and f_B(λ) respectively, for an image of a surface patch illuminated by incident light with SPD e(λ), the measured sensor values are given [4] as

C = m_b(n,s) ∫_λ f_C(λ) e(λ) c_b(λ) dλ + m_s(n,s,v) ∫_λ f_C(λ) e(λ) c_s(λ) dλ  (2)

where c_b(λ) and c_s(λ) are the albedo and Fresnel reflectance respectively, λ denotes the wavelength, n is the surface patch normal, s is the direction of the illumination source, and v is the direction of the viewer. The geometric terms m_b and m_s denote the geometric dependencies of the body and surface reflection respectively. Considering the neutral interface reflection (NIR) model and white illumination, it holds that e(λ) = e and c_s(λ) = c_s. Then the measured sensor values are given by

C_w = e m_b(n,s) k_C + e m_s(n,s,v) c_s ∫_λ f_C(λ) dλ  (3)

for C_w ∈ {R_w, G_w, B_w}, giving the red, green and blue sensor responses under the assumption of a white light source. k_C = ∫_λ f_C(λ) c_b(λ) dλ is a compact formulation depending on the sensors and the surface albedo. If the integrated white condition ∫_λ f_R(λ) dλ = ∫_λ f_G(λ) dλ = ∫_λ f_B(λ) dλ = f holds, we have

C_w = C_b + C_s = e m_b(n,s) k_C + e m_s(n,s,v) c_s f  (4)

According to the body reflection term of Eq. (3), C_b = e m_b(n,s) k_C; then the normalized rgb color model is insensitive to surface orientation, illumination direction and intensity, as can be seen from

r(R_b, G_b, B_b) = e m_b(n,s) k_R / [e m_b(n,s)(k_R + k_G + k_B)] = k_R / (k_R + k_G + k_B)  (5)

It depends only on the sensors and the surface albedo. Equal arguments also hold for the g and b components.
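A minimal sketch of computing the normalized rgb chromaticity of Eq. (1); the small guard against division by zero on black pixels is our own addition, not part of the paper.

```python
import numpy as np

def normalized_rgb(image):
    """Convert an H x W x 3 RGB image (any numeric dtype) to the
    photometric-invariant normalized rgb chromaticity of Eq. (1)."""
    rgb = image.astype(np.float64)
    s = rgb.sum(axis=2, keepdims=True)
    s = np.where(s == 0, 1e-9, s)      # avoid dividing by zero on black pixels
    return rgb / s                      # channels now satisfy r + g + b = 1
```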


3 Normalized rgb Color Model Based Shadow Detection

Our approach, depicted in Fig. 1, is a multistage approach where each stage of the algorithm removes moving-object pixels that cannot be shadow pixels. The input video frame is passed through the system, and Mi is the binary mask of potential shadow pixels updated after each step.

Fig. 1. Steps of the shadow detection algorithm

Step 1—Moving Object Detection: To model the background, the recent history of each pixel x is modeled by a mixture of K Gaussians:

P(x) = Σ_{i=1}^{K} W_i · η(x, μ_i, Σ_i)

where, for each R, G, B channel, P(x) is the probability of observing pixel value x, and η is the Gaussian function whose ith mixture component is characterized by the mean μ_i, covariance Σ_i and weight W_i. The distortion of brightness for each pixel between the incoming frame and the reference image is computed to extract the candidate pixels of moving objects. This process is performed as follows:

F(x) = { 1, if |I(x) − B(x)| ≥ 2σ(x); 0, otherwise }  (6)

where σ(x) is the mean value of the distortion of brightness for the pixel at position x, computed as

σ(x,i) = max(σ_min, α|I(x,i) − B(x)| + (1−α)σ(x,i−1))  (7)
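A small sketch of the brightness-distortion test of Eqs. (6)–(7) with a running per-pixel deviation estimate; the default values of α and σ_min are illustrative assumptions, and `sigma` should initially be an array filled with σ_min.

```python
import numpy as np

def update_foreground(frame, background, sigma, alpha=0.05, sigma_min=2.0):
    """One step of Eqs. (6)-(7): update the per-pixel brightness-distortion
    estimate sigma and return the binary mask of candidate moving pixels."""
    dist = np.abs(frame.astype(np.float64) - background)
    sigma = np.maximum(sigma_min, alpha * dist + (1.0 - alpha) * sigma)  # Eq. (7)
    mask = dist >= 2.0 * sigma                                           # Eq. (6)
    return mask, sigma
```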

where a minimum distortion value σ_min is introduced as a noise threshold to prevent the distortion value from decreasing below a minimum should the background measurements remain strictly constant over a long period. After the initial detection, the binary mask (M1 in Fig. 1) contains the moving object, its shadow and noisy isolated pixels.

Step 2—Luminance Ratio Test: Research in [5] states that the ratio between pixels when illuminated and the same pixels under shadow can be roughly linear. Step 2 exploits this observation to initially segment shadow regions. The following intensity test is applied to moving object and shadow pixels to further reduce their number. Let p(x) be the pixel at position x, where I(x) and B(x) are the corresponding pixel values in the input image and in the background model, respectively: ∀p(x) ∈ M1, if (α

f_o(i,j) = { 1, if |f(i,j) − f_B(i,j)| > T; 0, else }  (1)

At this time, a pixel with f_o(i,j) = 1 becomes a pixel pertinent to the animal. However, in the case f_B(i,j) has a middle-level value pertinent to the shade area, a lower threshold can be applied. Accordingly, the threshold T is adapted as in equation (2), according to the value of the background image pixel:

T′ = T(1 − (128 − |f_B(i,j) − 128|)/256)  (2)

Once the threshold is adapted by the above equation, the object can be extracted reliably even when it enters the shaded area. The processing screen of the suggested tracking program is presented in figure 2. Users can set the experimental time and frequency. In addition, the system can be divided into areas, making it possible to trace several moving animals simultaneously; the maximum number of tracking areas is four, as revealed in figure 3. Similar to the water maze test, for experiments measuring the time the animal needs to reach a specific area, a goal area can be set up and arrival times compared. When the animal reaches the goal area, the experiment finishes automatically.
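A minimal sketch of the adaptive-threshold segmentation of Eqs. (1)–(2), assuming 8-bit grayscale frames; the function name and the default base threshold are ours.

```python
import numpy as np

def segment_animal(frame, background, T=40.0):
    """Background subtraction with the brightness-adaptive threshold of Eq. (2):
    mid-level (shade-like) background pixels get a lower effective threshold."""
    f = frame.astype(np.float64)
    fb = background.astype(np.float64)
    T_adapt = T * (1.0 - (128.0 - np.abs(fb - 128.0)) / 256.0)  # Eq. (2)
    return (np.abs(f - fb) > T_adapt).astype(np.uint8)          # Eq. (1)
```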


3.2 Analysis System

3.2.1 Object's Basic Feature Expression through Shape Descriptors
The experimental animal's size, location, and direction serve as features for comparing and analyzing both the form and the behavior of the animal. Most importantly, before obtaining the features, it is assumed that a binary animal image of size m×n is given. The size feature is expressed as the number of pixels that the animal occupies:

A = Σ_{i=1}^{n} Σ_{j=1}^{m} B(i,j)  (3)

Fig. 3. Shape descriptor

In a binary image, as the experimental animal's central location coincides with the area centroid, its central location (x_c, y_c) can be obtained by equation (4):

x_c = (Σ_{i=1}^{n} Σ_{j=1}^{m} j·B(i,j)) / A,   y_c = (Σ_{i=1}^{n} Σ_{j=1}^{m} i·B(i,j)) / A  (4)

The shape descriptor used in this paper is presented in figure 3 [7]. If the animal's central location is fixed in advance, the animal's posture can be calculated by using the central location and obtaining its major axis, minor axis, and slope as in equation (5). In the animal experiment, as the same animal's motion is recorded over the experimental time, the long and short lengths are measured in the labeled area. Therefore, by comparing the long and short lengths, it can be determined whether the animal stands, crouches, or moves. Equation (5a) describes the spread along the x axis, equation (5b) along the y axis, and equation (5c) the mixed two-dimensional term. Equations (5d) and (5e) are the long and short axes, respectively, and equation (5f) is the direction:

μ_xx = (1/A) Σ_{(x,y)∈R} (x − x_c)²  (5a)
μ_yy = (1/A) Σ_{(x,y)∈R} (y − y_c)²  (5b)
μ_xy = (1/A) Σ_{(x,y)∈R} (x − x_c)(y − y_c)  (5c)
a = √( 2[ μ_xx + μ_yy + √((μ_xx − μ_yy)² + 4μ_xy²) ] )  (5d)
b = √( 2[ μ_xx + μ_yy − √((μ_xx − μ_yy)² + 4μ_xy²) ] )  (5e)
θ = tan⁻¹( −2μ_xy / [ (μ_xx − μ_yy) + √((μ_xx − μ_yy)² + 4μ_xy²) ] ), otherwise;
θ = tan⁻¹( [ (μ_yy − μ_xx) + √((μ_yy − μ_xx)² + 4μ_xy²) ] / (−2μ_xy) ), if μ_yy ≥ μ_xx  (5f)
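A sketch of the descriptors of Eqs. (3)–(5) on a binary mask; the names are ours, and the orientation uses the standard arctan2 closed form rather than the two-branch form of Eq. (5f), so treat it as illustrative.

```python
import numpy as np

def shape_descriptors(B):
    """Area, centroid, major/minor axis lengths and orientation of a
    binary mask B (Eqs. (3)-(5)); rows index i/y, columns index j/x."""
    ys, xs = np.nonzero(B)
    A = xs.size                                   # Eq. (3)
    xc, yc = xs.mean(), ys.mean()                 # Eq. (4)
    dx, dy = xs - xc, ys - yc
    mxx, myy, mxy = (dx**2).mean(), (dy**2).mean(), (dx*dy).mean()  # Eq. (5a-c)
    root = np.sqrt((mxx - myy)**2 + 4*mxy**2)
    a = np.sqrt(2*(mxx + myy + root))             # major axis, Eq. (5d)
    b = np.sqrt(2*(mxx + myy - root))             # minor axis, Eq. (5e)
    theta = 0.5*np.arctan2(2*mxy, mxx - myy)      # orientation, closed form of Eq. (5f)
    return A, (xc, yc), a, b, theta
```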

The second shape descriptor used in the proposed system is the Hu moment invariant (HMI), obtained as a nonlinear combination of geometric moments as suggested by Hu. The main feature of the HMI is its invariance under transformations such as translation, rotation and scaling of the object. In a two-dimensional discrete image space, if each pixel value of an M×N image is f(i,j), the (p+q)-degree moment m_pq is given by equation (6):

m_pq = Σ_{i=0}^{M} Σ_{j=0}^{N} i^p j^q f(i,j)  (6)

If only one object exists and all background values in the image are zero, the center of the object (x_c, y_c) can be determined using equation (7):

x_c = m_10 / m_00,   y_c = m_01 / m_00  (7)

Here, the 0-degree moment is the object area, and the first moments (m_10, m_01) are the distribution values along the i and j axes. After obtaining the center, the central moment and the normalized central moment of equation (8) can be found by summing the pixel values, referred to the center, weighted by the coordinates:

μ_pq = Σ_x Σ_y (x − x_c)^p (y − y_c)^q f(x,y)  (8a)
η_pq = μ_pq / μ_00^{(p+q+2)/2}  (8b)

Using the normalized central moments, the HMI can be determined [5][6]. The HMI originally comprises seven values. Although the values M5 to M7 can describe the object more precisely, they are individually very sensitive to noise. In this system, only the four values M1 to M4 are used, to reduce computation and obtain a fast response time while still perceiving the variation of the animal's outer shape.
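A sketch computing the first four Hu invariants M1–M4 from the normalized central moments of Eq. (8); the standard Hu formulas are assumed here, since the paper does not restate them.

```python
import numpy as np

def hu_m1_to_m4(img):
    """First four Hu moment invariants of a grayscale/binary image,
    built from the normalized central moments eta_pq of Eq. (8)."""
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    f = img.astype(np.float64)
    m00 = f.sum()
    xc, yc = (xs*f).sum()/m00, (ys*f).sum()/m00           # Eq. (7)

    def eta(p, q):                                        # Eqs. (8a)-(8b)
        mu = (((xs - xc)**p) * ((ys - yc)**q) * f).sum()
        return mu / m00**((p + q + 2) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    M1 = n20 + n02
    M2 = (n20 - n02)**2 + 4*n11**2
    M3 = (n30 - 3*n12)**2 + (3*n21 - n03)**2
    M4 = (n30 + n12)**2 + (n21 + n03)**2
    return M1, M2, M3, M4
```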


3.2.2 Function of the Analysis System
First, the animal's trace over the experimental time can be viewed. The starting point appears red and the final point blue, so the trace of a moving object can be visually distinguished. Second, the total distance the animal moves and its average motion speed can be measured. The user can define speed sections, e.g., separating a low-speed section from a high-speed section, and measure the motion time in each. Third, to analyze the distance the animal moves within parts of the maze, the tracking area is divided into districts; the motion distance and occupancy time are measured, and the share of each district is shown as a percentage of the total. Fourth, every analysis is conducted on the tracking area designated by the user; in the analysis program the user can freely designate and analyze areas. Fifth, after the experiment finishes, new and additional analyses are possible, as the user can replay and watch the experiment.

The existing systems do not offer a method to measure and analyze object shape: only the object's central location is recorded, enabling analysis of motion distance, speed, and trace. The developed system proposes a method to model and record the animal's shape, extending the analysis functions. This makes it possible to produce specific analysis data such as the following and offer it to users.

Fig. 4. Analysis program processing screen

First, the long and short axes of the animal are measured and a graph of their relative ratio is offered. The short axis is the horizontal width of the trunk, so it hardly changes during the experimental time. The long axis, however, reflects the body length from tail to head and changes continuously. If the variation of the relative ratio between the long and short axes is relatively low, it


can be concluded that the animal is performing a specific behavior. When the animal stands or crouches, the apparent body length becomes short, which can be used as a criterion to distinguish abnormal animal behavior. This behavior is called rearing, and it can be utilized as analysis data by offering a graph as presented in figure 5.

Fig. 5. Result graphs, where (a) is the long/short axis ratio graph and (b) is the HMI graph

Second, the developed system uses the moment invariants to model the experimental animal's contour. The user can grasp a change in the contour through the variation rate of the HMI values and detect abnormal behavior.

4 Conclusion

The system proposed in this paper consists of a general image capture board, a tracking program which operates in the driver and Windows environment, and an analysis program. Deploying the proposed system automates the experiment; the system can therefore save labor and continue the experiment without limitation as long as disk space is sufficient. Existing systems can measure only in restricted situations, whereas the proposed system can distinguish the animal regardless of background and illumination, making it possible to conduct several kinds of experiments measured in any situation. In addition, when an animal experiment is conducted with manual recording by hand, the experimental data are not objective and the results are sometimes not acknowledged externally. Since the measuring method of this system extracts the data objectively using an instrument, the experimental results can be confirmed, granted objectivity, and, if needed, analyzed several times. The developed technique extracts a moving animal from an image and models, recognizes, and tracks it in various situations. Accordingly, the results of this study are directly applicable to lookout and preservation systems, and so on.


References

1. Naohiro, A., Akihiro, F.: Detection of Obstructions and Tracking of Moving Objects by Image Processing Technique. Electronics and Communications in Japan, Vol. 82 (1999) 28–33
2. Betke, M., Haritaoglu, E., Davis, L.S.: Real-Time Multiple Vehicle Detection and Tracking from a Moving Vehicle. Machine Vision and Applications 12:2 (2000) 69–72
3. http://vision.fe.uni-lj.si/research/trackan/index.html
4. Gharavi, H., Mills, M.: Block-Matching Motion Estimation. IEEE Trans. on Commun., Vol. 38 (1990) 950–953
5. Gouda, I.S., Lynn, A.A.: Moment Invariants and Quantization Effects. IEEE Proceedings (1998) 157–163
6. Mamistvalov, A.G.: n-Dimensional Moment Invariants and Conceptual Mathematical Theory of Recognition of n-Dimensional Solids. IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 20, No. 8 (1998)
7. Haralick, R.M., Shapiro, L.G.: Computer and Robot Vision, Vol. 1. Addison-Wesley (1992)

VEP Estimation with Feature Enhancement by Whiten Filter for Brain Computer Interface*

Jin-an Guan

School of Electronic Engineering, South-Central University for Nationalities, Wuhan, 430074, China
[emailprotected]

Abstract. An imitating-natural-reading paradigm was used to induce robust visual evoked potentials (VEPs), which serve as carriers for a brain-computer interface based mental speller. A support vector machine (SVM) was adopted for single-trial classification on the features. To improve the accuracy of pattern recognition, and thus boost the bit rate of the whole system, a 300 ms window was used to estimate the accurate time of target stimulus presentation from the EEG signals. As spontaneous EEG can be regarded as a stationary random process over a short period, a whitening filter was constructed from the AR parameters calculated from the non-target induced signals. Real-world signals were then passed through this filter, whitening the spontaneous EEG. Finally, a wavelet method was applied to filter the whitened signals. The procedure boosted the classification accuracy by enhancing the target signals.

1 Introduction

We are constructing a mental speller based on a novel technique called the Brain Computer Interface (BCI). A BCI provides a direct communication channel from the user's brain to the external world by reading the electrical signatures of the brain's activity and its responses to external stimuli. These responses can then be translated into computer commands, thus providing a communication link, particularly for people with severe disabilities [1], [2]. Visual Evoked Potentials (VEPs) are usually exploited as communication carriers between brain and computer in a mental speller paradigm. The measured responses are often considered a combination of electrical activity produced by multiple brain generators active in association with the eliciting event, and noise, i.e., brain activity not related to the stimulus together with interference from non-neural sources such as eye blinks and other artifacts. Even though they are dominated by lower frequencies, their form is difficult to estimate on a trial-to-trial basis due to the poor signal-to-noise ratio (SNR) relative to the background electroencephalogram (EEG) [3], [4]. As spontaneous EEG can be regarded as a stationary random process over a short period, we can process the signals using stationary signal models [5]. In order to

* This work is supported by NSF of South-Central University for Nationalities Grant # YZZ05015 to Jin-an Guan.



identify patterns from EEG signals for the speller application, we use a Support Vector Machine (SVM) as our classifier [6]. To increase the bit rate of our system, a relatively short time window for detecting evoked potentials is preferred. This paper presents a novel VEP component, N2, not previously reported in the literature, as a feature for BCI. Before the features were input to the classifier, an AR whitening filter procedure was applied to enhance the N2 components.

2 Methods

2.1 Experimental Setup and Data Acquisition
The experimental model and data come from the cognitive laboratory at South-Central University for Nationalities of China. The objective of the data acquisition was to obtain EEG signals during an imitating-natural-reading paradigm with target onsets and non-target onsets. EEG activity was recorded from Fz, Cz, Pz, and Oz. Subjects viewed a monitor with a 16×16 pixel window in the center containing gray patterns against a black background. A continuous symbol string consisting of target and non-target symbols moved through the window smoothly from right to left at a speed of 160 ms/symbol. Each epoch started with a short tone, reminding the subject to focus on the window where non-target symbols were moving continuously. The delay between the start time and the appearance of the target symbol varied randomly between 1.5–3 s. In each trial, acquisition of EEG started 320 ms (for subject H; 210 ms for subjects M and T) before target onset and halted 880 ms (for subject H; 990 ms for subjects M and T) after target presentation; thus 512 samples were collected over 1.2 seconds. A more detailed description of the experiment can be found in [7].

2.2 Feature Enhancement Using Whiten Filter
Spontaneous EEG is the comprehensive electrophysiological reflection of the neural system on the cortex or scalp. It runs ceaselessly as long as a person is alive, whether he or she is actively thinking, being passively stimulated, or unconscious. Its producing mechanism is very complex, but spontaneous EEG can be regarded as a stationary random process over a short period, so we can process the signals using stationary signal models. For an autoregressive (AR) model, denoting the white noise by w(n), the output of the spontaneous EEG (denoted E(n)) through the system A(z) is white noise:

w(n) = Σ_{k=0}^{p} a_k E(n−k)  (1)

where a_k are the parameters of the AR model and p is its order. Now presume that the trial recordings from the user are a linear mixture of the event-related potential (ERP) and spontaneous EEG, and further hypothesize that these are independent of each other; then the recorded signal x(n) for every stimulus can be represented as

x(n) = s(n) + E(n)  (2)


where s(n) is the ERP component and E(n) is the spontaneous EEG. In order to discriminate the EEG of target and non-target stimuli, we take the non-target EEG as the spontaneous EEG, i.e., E(n). To remove E(n), three steps are taken:

(1) Calculate the parameters of the AR model using the non-target stimuli signals to construct a whitening filter;
(2) Filter x(n) with the whitening filter:

y(n) = Σ_{k=0}^{p} a_k x(n−k) = Σ_{k=0}^{p} a_k {s(n−k) + E(n−k)} = Σ_{k=0}^{p} a_k s(n−k) + w(n)  (3)

Letting Y(n) = Σ_{k=0}^{p} a_k s(n−k) denote the ERP after passing through the whitening filter, we have

y(n) = Y(n) + w(n)  (4)

(3) Filter the whitened signals using the Mallat wavelet algorithm.

Fig. 1 shows the 84-trial averaged waveforms of the target (thick dashed line) and non-target stimuli from subject H. Fig. 2 shows the whitened signals after wavelet filtering. Comparing Fig. 1 with Fig. 2, we find that, apart from a few points at the beginning caused by edge effects, the non-target signal becomes smoother and the target signal more prominent. This suggests that the spontaneous EEG was rejected effectively.
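A compact sketch of the whitening step, with AR parameters fitted to a non-target epoch by least squares; the model order and function names are our own choices, not the paper's.

```python
import numpy as np

def fit_ar(e, p=10):
    """Least-squares AR(p) fit to a 1-D non-target EEG segment e:
    e[n] ~ sum_k c[k]*e[n-k]. Returns whitening coefficients a with
    a[0] = 1 and a[k] = -c[k], so that filtering e with a yields white noise."""
    X = np.column_stack([e[p - k:len(e) - k] for k in range(1, p + 1)])
    c, *_ = np.linalg.lstsq(X, e[p:], rcond=None)
    return np.concatenate(([1.0], -c))

def whiten(x, a):
    """Apply the whitening FIR filter of Eq. (3) to a trial x."""
    return np.convolve(x, a, mode="valid")
```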

Fig. 1. 84-trial averaged waveforms of target (thick dashed line) and non-target stimuli (solid line). Subject H, channel Oz, sample rate 427 Hz; horizontal axis: samples (time interval 150 ms–450 ms); vertical axis: amplitude (μV)


Fig. 2. The signals of Fig. 1 after whitening and wavelet filtering. Subject H, channel Oz; target time interval 150 ms–450 ms, non-target time interval −300 ms–0 ms; vertical axis: relative amplitude

2.3 Single Trial Estimation of ERP Using SVM

We now use the N2 components enhanced by the methods described above as features, to be classified by an SVM classifier. In our experiments, the Matlab 6 toolbox was used to perform the classification, with a radial basis function as the kernel. To prevent overfitting and underestimating the generalization error during training, the dataset of all trials was divided equally into a training set and a testing set. The model parameters of the ν-SVM and the generalization errors were estimated by a 10×10-fold cross-validation procedure performed only on the training set; the values of gamma and ν, and the expected correct classification rate, were determined by the run with the best generalization performance. Finally, using these best parameters, we performed a leave-one-out procedure 30 times to evaluate the averaged classification accuracy on the testing set, which had not appeared in the training stage. At this stage, two steps were performed: first, using all training data but one left-out trial as testing data, a set of classification parameters was obtained; second, these parameters were used to classify the testing set. These steps were repeated 30 times to obtain the averaged performance.
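The paper's classifier ran in Matlab; purely as an illustration, an equivalent ν-SVM with an RBF kernel and cross-validated parameter selection can be set up with scikit-learn (our substitution, not the author's code; the parameter grid is arbitrary).

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import NuSVC

def train_nu_svm(X_train, y_train):
    """Pick nu and gamma by 10-fold cross-validation on the training set,
    mirroring the parameter search described in the text."""
    grid = GridSearchCV(
        NuSVC(kernel="rbf"),
        param_grid={"nu": [0.2, 0.3, 0.4, 0.5], "gamma": [0.01, 0.1, 1.0]},
        cv=10,
    )
    grid.fit(X_train, y_train)
    return grid.best_estimator_, grid.best_params_
```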

3 Results and Discussion

The results tabulated in Table 1 show the classification performance for the three subjects using N2 as the feature. The best averaged correct classification rate, 94.1%, is from subject H; the results for subjects M and T are 84.8% and 82.3%, respectively.


Table 1. Averaged results of 30 leave-one-out cross-validations for the three subjects

Subject         H                 M                 T
            max  min  avg     max  min  avg     max  min  avg
Accuracy(%) 96.0 91.8 94.1    87.6 82.3 84.8    83.5 79.2 82.3

The reason for exploiting the signals from Oz is that the N2 waves at Oz are stronger than at other sites; Fig. 3 shows this is feasible by the grand average over trials. The figure also shows another interesting feature: from forehead to occipital sites, P2 decreases and N2 increases. The averaged peak amplitude at Oz is −9.1 μV, at time point 245 ms. The P3 wave starts at about 300 ms and peaks at about 420 ms. Therefore, in order to evaluate the feasibility of using

Fig. 3. Grand averaged potentials of all trials from subject H

Fig. 4. Comparison of grand averaged EEG signals at electrode Oz from the three subjects (horizontal axis: time, −200 ms to 900 ms; vertical axis: amplitude in μV; the N2 peak of subject H occurs at 245 ms)


N2 waves as features for the brain-computer interface, we intercept only the segment from 0 ms to 300 ms as features for the ν-SVM classifier. Fig. 4 shows the grand averaged EEG signals at electrode Oz for the three subjects. The amplitude of N2 from subject H is the greatest of the three; the N2 from subject T is smaller, and subject M shows almost no elicited N2. This implies that the effectiveness of using N2 as a BCI feature differs between subjects. Therefore, using N2 as a classifier feature should be a subject-specific option.

4 Conclusion

This paper introduced a novel VEP component, N2, as a carrier for constructing a BCI-based speller. In this experiment, the best averaged correct classification rate with single-trial EEG reaches 94%. Compared with our previous work [7], applying an AR whitening filter to enhance the N2 features is a feasible improvement: in [7], without the AR whitening preprocessing, the best classification accuracy was only 90%. The results also suggest that the imitating-natural-reading VEP-inducing paradigm can improve the signal-to-noise ratio significantly, and can thus achieve a higher correct classification rate than VEP-inducing paradigms using flashing stimulation. Due to their robust performance, the N2 components of the VEP can be exploited as features for brain-computer interfaces, but these features are subject-specific because the amplitude of N2 differs between subjects.

References

1. Wolpaw, J.R., Birbaumer, N., McFarland, D.J., et al.: Brain-computer Interfaces for Communication and Control. Clin. Neurophysiol. 113 (2002) 767–791
2. Thulasidas, M., Guan, C., Wu, J.K.: Robust Classification of EEG Signal for Brain-Computer Interface. IEEE Trans. Neural Syst. Rehab. Eng. 14 (2006) 24–29
3. Garrett, D., Peterson, D.A., Anderson, C.W., et al.: Comparison of Linear, Nonlinear, and Feature Selection Methods for EEG Signal Classification. IEEE Trans. Neural Syst. Rehab. Eng. 11 (2003) 141–144
4. Blankertz, B., Muller, K.R., Curio, G., et al.: The BCI Competition 2003: Progress and Perspectives in Detection and Discrimination of EEG Single Trials. IEEE Trans. Biomed. Eng. 51 (2004) 1044–1051
5. Gao, K.F.: Carriers Extraction for Brain Computer Interface Using Wavelet. Master dissertation of SCUEC (2004)
6. Chang, C.-C., Lin, C.-J.: Training Support Vector Classifiers: Theory and Algorithms. Neural Computation 13 (2001) 2119–2147
7. Guan, J.A., Chen, Y.G., Lin, J.R.: N2 Components as Features for Brain Computer Interface. Proc. of the First Int. Conf. on Neural Interface & Control, IEEE Press, Wuhan (2005) 45–49

Weight Estimation for Audio-Visual Multi-level Fusion in Bimodal Speaker Identification

Zhiyong Wu1,2, Lianhong Cai2, and Helen M. Meng1

1 Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
[emailprotected], [emailprotected]
2 Department of Computer, Tsinghua University, Beijing 100084, China
[emailprotected]

Abstract. This paper investigates the estimation of fusion weights under varying acoustic noise conditions for an audio-visual multi-level hybrid fusion strategy in speaker identification. The multi-level fusion combines model level and decision level fusion via dynamic Bayesian networks (DBNs). Support vector regression (SVR) is utilized to estimate the fusion weights directly from the audio features; a Sigma-Pi network sampling method is also incorporated to reduce feature dimensions. Experiments on our homegrown Chinese database and the CMU English database both demonstrate that the method improves the accuracy of audio-visual bimodal speaker identification under dynamically varying acoustic noise conditions.

1 Introduction

Human speech is bimodal. While audio is the major source of speech information, the visual component is considered a valuable supplement in noisy environments because it remains unaffected by acoustic noise. Many studies show that fusion of audio-visual features leads to more accurate speaker identification in noisy environments [1-3]. Audio-visual fusion can be performed at the feature level, decision level or model level [2-3]. We have proposed a multi-level hybrid fusion strategy based on dynamic Bayesian networks (DBNs); it combines model level and decision level fusion to achieve improved performance [4]. In such a strategy, the fusion weights are of great importance, as they must capture the reliability of the inputs, which may vary dynamically. In the literature, fusion weights have usually been determined during training and remain fixed for all subsequent testing [2-3]. Hence the weights may not match the input testing patterns well, leading to accuracies inferior to mono-modal identification, since speech can vary dramatically at the temporal level (noise bursts) in practice. This paper attempts to estimate the fusion weights directly from the audio stream to capture dynamic variations of acoustic noise in a reasonable way. A method known as support vector regression (SVR) [5] is utilized, which performs function approximation based on structural risk minimization. Sigma-Pi network [6] sampling is also incorporated for feature dimension reduction.


2 Multi-level Fusion of Audio-Visual Features

Audio and visual features can complement each other. Furthermore, different levels of fusion strategies can reinforce each other too: model level fusion outperforms decision level fusion in most cases, while the performance of decision level fusion may be better than that of model level fusion in very noisy environments [2, 4].

Fig. 1. DBN based audio-visual multi-level fusion

In view of the advantages of model level and decision level fusion, we proposed a multi-level fusion strategy via DBNs, as illustrated in figure 1. There are three models: audio-only, video-only, and the audio-visual correlative model (AVCM) that performs model level fusion. These models are further combined at the decision level to deliver the final speaker identification result. The AVCM captures the inter-dependencies between audio and visual features and the loose temporal synchronicity between them. Further details of the multi-level fusion are given in [4]. The fusion formula is:

P(O_A, O_V | M_A, M_V, M_AV) = [P(O_A | M_A)]^{λ_A} [P(O_V | M_V)]^{λ_V} [P(O_A, O_V | M_AV)]^{λ_AV}  (1)

where P(O_A | M_A) is the likelihood of the audio observation O_A under the audio-only model M_A, P(O_V | M_V) is the likelihood of the video observation O_V under the video-only model M_V, and P(O_A, O_V | M_AV) is the likelihood under the AVCM model M_AV. λ_A, λ_V and λ_AV are the fusion weights.
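In log-likelihood form, Eq. (1) becomes a weighted sum, which is how such a fused score is typically computed in practice. A minimal sketch (our illustration, with placeholder score names):

```python
import numpy as np

def fused_log_likelihood(ll_audio, ll_video, ll_avcm, w_a, w_v, w_av):
    """Eq. (1) in the log domain: per-speaker fused score from the
    audio-only, video-only and AVCM model log-likelihoods."""
    assert abs(w_a + w_v + w_av - 1.0) < 1e-9     # weights sum to one
    return w_a * ll_audio + w_v * ll_video + w_av * ll_avcm

# identify the speaker with the highest fused score, e.g.:
# speaker_id = int(np.argmax(fused_log_likelihood(LA, LV, LAV, 0.4, 0.2, 0.4)))
```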

3 Estimating Fusion Weight with Support Vector Regression

The estimation of the fusion weights is a key issue. We enforce the constraints λ_A + λ_V + λ_AV = 1 and λ_A, λ_V, λ_AV ≥ 0. We also impose λ_A = λ_AV by assuming that the performances of both the audio-only and AVCM models are equally dependent on the quality of the acoustic speech. Support vector regression (SVR) is used to estimate the audio weight λ_A directly from the original audio features, because of its powerful learning ability and the high degree of generalization it achieves by means of structural risk minimization [5].

3.1 Fusion Weight Estimating Strategy
Figure 2 depicts the processing steps for estimating the audio fusion weight with SVR. Primary audio features are first extracted from the original audio speech, which are re-sampled by


Fig. 2. Processing steps for fusion weight estimation

Sigma-Pi sampling [6] to obtain secondary distribution features that describe the distributions of the original audio features. Finally, SVR is used to predict the fusion weight. The quality of the input audio is estimated from a relatively long time span (e.g., 1500 ms) of primary audio features. If these features were input directly into the SVR module to estimate λ_A, there would be too many dimensions to compute (e.g., a 28-order speech feature vector sampled with a frame shift of 11 ms gives 28×1500/11 ≈ 3818 dimensions). In order to reduce the amount of computation, we propose to use Sigma-Pi networks to sample the primary audio features prior to further processing.

3.2 Dimension Reduction with Sigma-Pi Sampling
Sigma-Pi sampling is defined on sequences of primary audio features; the horizontal axis is time and the vertical axis represents the primary features. It consists of two windows of different sizes with a constant distance in time and feature position. The size of the small window is fixed to 1 and the size of the large window is variable.


Fig. 3. Schematic overview of Sigma-Pi sampling [6]

If the primary audio feature values are p(t,f), the secondary distribution features s(f1, f2, t0, Δt, Δf) are calculated as follows:

s = Σ_t [ p(t, f1) · (1/(ΔtΔf)) Σ_{t'=0}^{Δt−1} Σ_{f'=0}^{Δf−1} p(t + t0 + t', f2 + f') ]  (2)

where f1 is the feature index of the small window, f2 is the feature index of the bottom-left corner of the large window, t0 is the time difference between the two windows, and Δt, Δf are the extensions of the large window in time and feature. The small window value is multiplied by the mean of the large window, and the products are then integrated over time, resulting in a single secondary feature value that reflects the distribution of the original primary audio features.


Assuming that different orders of primary features are independent, the parameters f2 = f1 and Δf = 0 of the Sigma-Pi sampling are fixed, and only t0 and Δt are variable. Sigma-Pi sampling reduces the feature dimensionality greatly: only 28 secondary distribution feature values are calculated from the 3818 primary audio features.
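A sketch of the simplified Sigma-Pi sampling (f2 = f1, Δf = 0) of Eq. (2); the array layout and names are our assumptions. With the paper's 11 ms frame shift, the later setting t0 = 500 ms, Δt = 150 ms corresponds roughly to t0 ≈ 45 and dt ≈ 14 frames.

```python
import numpy as np

def sigma_pi_features(P, t0, dt):
    """Secondary distribution features per Eq. (2) with f2 = f1, Δf = 0.
    P: (T, F) array of primary features (T frames, F feature orders).
    Returns one secondary value per feature order (F values)."""
    T, F = P.shape
    out = np.zeros(F)
    for t in range(T - t0 - dt):
        large_mean = P[t + t0 : t + t0 + dt, :].mean(axis=0)  # mean of large window
        out += P[t, :] * large_mean                            # small window x mean
    return out
```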

4 Databases and Setup

We perform the weight estimation experiment within the scope of audio-visual text-prompted speaker identification. Two databases are used. One is our homegrown audio-visual bimodal database of 60 subjects (38 males, 22 females, aged from 20 to 65), each speaking 30 continuous Chinese digit utterances (up to 6 digits per utterance); each utterance is repeated 3 times at intervals of 1 month. The other is CMU's bimodal database [7], which has 10 subjects (7 males, 3 females) speaking 78 English words repeated 10 times. Artificial white Gaussian noise was added to the original audio data (SNR = 30 dB) to simulate various SNR levels. The fusion models were trained at 30 dB SNR and tested at all SNR levels. We applied cross-validation to every subject's data: 90% of the data was used as the training set and the remaining 10% as the test set, and this partitioning was repeated until all the data had been covered by the test set. The acoustic features include 13 Mel-frequency cepstral coefficients (MFCCs) and 1 energy term (frame size 25 ms, frame shift 11 ms) together with their corresponding delta parameters. The visual features include the mouth width, upper lip height and lower lip height [7] and their delta values. The frame rate of the visual features is 30 frames per second (fps), which is up-sampled to 90 fps (11 ms) to match the audio features by inserting two copied frames between each pair of original visual feature frames.

5 Experiments

5.1 Learning SVR Parameters
Weight estimation is carried out using ν-SVR [5], whose parameters are trained as follows. First, the multi-level AVCM DBNs are trained. A DBN is developed for each word with a left-to-right non-skipping topological structure. The audio sub-model has 5 states and the video sub-model 3 states; each state is modeled by a Gaussian mixture model (GMM) with 3 mixtures. All DBNs are implemented with the GMTK toolkit [8]. Then, for each test set with one specific SNR level, and for each value of the audio fusion weight λ_A varying from 0 to 1 at 0.02 intervals, speaker identification is performed. The words' DBNs are connected to form a whole-sentence model, which is then used to identify the speakers. For each SNR level, the fusion weight λ_A with the best identification accuracy is recorded as the target weight value for SVR training.

5.2 Choosing Sigma-Pi Parameters
The parameters t0 and Δt of the Sigma-Pi sampling are chosen first. During this stage, the value of t0 varies from 100 ms to 1000 ms at 100 ms intervals, and Δt varies from 50 ms to


300 ms at 50 ms intervals. Tests are carried out for all combinations of t0 and Δt. The results show that t0 = 500 ms and Δt = 150 ms give the best performance; these two values are taken as the basic parameters for the following experiments.

5.3 Speaker Identification Results
We conduct the speaker identification experiments under fixed-noise and randomly varying noise conditions (with mean acoustic SNR varying from 30 dB to 0 dB at 10 dB intervals) over whole sentences. Two weight estimation methods are tested: (1) fixed weight, where the fusion weight remains fixed for the test set after training, as in the traditional approach of [2-3]; (2) the proposed method, where the estimated weight changes automatically according to the acoustic noise conditions. The experimental results on our homegrown Chinese database are summarized in Table 1; the experiments are also conducted on the CMU English database to validate the proposed method, with results summarized in Table 2. The proposed method improves the accuracy of speaker identification at different acoustic SNR levels when the noise varies dynamically, compared with the traditional fixed-weight method. When the acoustic noise changes, i.e., the noise condition of the test set does not match the training set, the performance of the traditional fixed-weight method degrades dramatically, while the differences are not significant for the proposed method. This indicates that the proposed method can predict the fusion weight well under dynamically varying acoustic noise conditions and can improve the performance of audio-visual bimodal speaker identification.

Table 1. Accuracies of speaker identification on our own Chinese database

mean SNR            30dB           20dB           10dB           0dB
                 fixed varying  fixed varying  fixed varying  fixed varying
fixed weight     100%  98%      91%   85%      79%   72%      76%   70%
proposed method  100%  100%     91%   90%      80%   78%      77%   75%

Table 2. Accuracies of speaker identification on CMU English database

mean SNR            30dB           20dB           10dB           0dB
                 fixed varying  fixed varying  fixed varying  fixed varying
fixed weight     100%  99%      92%   86%      81%   77%      77%   73%
proposed method  100%  100%     93%   92%      81%   81%      79%   78%

6 Conclusions

We investigated a fusion weight estimation method for multi-level hybrid fusion in audio-visual speaker identification by means of support vector regression (SVR). The proposed method estimates the fusion weights directly from the audio features. In the method, Sigma-Pi network re-sampling is introduced to reduce the dimensionality of the


audio features. The experiments show that the method improves speaker identification performance at different acoustic SNR levels under varying acoustic noise conditions, which indicates that the proposed method can predict the fusion weights well under such circumstances.

Acknowledgments

This work is supported by the joint research fund of NSFC-RGC (National Natural Science Foundation of China - Research Grants Council of Hong Kong) under grants No. 60418012 and N-CUHK417/04.

References

1. Senior, A., Neti, C., Maison, B.: On the Use of Visual Information for Improving Audio-based Speaker Recognition. In: Audio-Visual Speech Processing Conf. (1999) 108–111
2. Nefian, A.V., Liang, L.H., Fu, T.Y., Liu, X.X.: A Bayesian Approach to Audio-Visual Speaker Identification. In: Proc. 4th Int. Conf. AVBPA, Vol. 2688 (2003) 761–769
3. Chibelushi, C.C., Deravi, F., Mason, J.S.D.: A Review of Speech-based Bimodal Recognition. IEEE Trans. Multimedia 4 (2002) 23–37
4. Wu, Z.Y., Cai, L.H., Meng, H.M.: Multi-level Fusion of Audio and Visual Features for Speaker Identification. In: Proc. Int. Conf. Biometrics, LNCS 3832 (2006) 493–499
5. Scholkopf, B., Smola, A.J., Williamson, R.C., Bartlett, P.L.: New Support Vector Algorithms. Neural Computation 12 (2000) 1083–1121
6. Gramß, T., Strube, H.W.: Recognition of Isolated Words based on Psychoacoustics and Neurobiology. Speech Communication 9 (1990) 35–40
7. Chen, T.: Audiovisual Speech Processing. IEEE Signal Processing Magazine 18 (2001) 9–21
8. Bilmes, J., Zweig, G.: The Graphical Models Toolkit: An Open Source Software System for Speech and Time-series Processing. In: Proc. Int. Conf. ICASSP (2002) 3916–3919

A Study on Optimal Configuration for the Mobile Manipulator Considering the Minimal Movement

Jin-Gu Kang1, Kwan-Houng Lee2, and Jane-Jin Kim3

1 Dept. of Visual Broadcasting Media, Keukdong College, DanPyung-Ri 154-1, Gamgog-Myun, Eumsung-Gun, Chungbuk 467-900, Republic of Korea
[emailprotected]
2 School of Electronics & Information Engineering, Cheongju University, Naedok-Dong, Sangdang-Gu, Cheongju-City, Chungbuk 360-764, Republic of Korea
[emailprotected]
3 Dept. of Computer Information, Keukdong College, DanPyung-Ri 154-1, Gamgog-Myun, Eumsung-Gun, Chungbuk 467-900, Republic of Korea
[emailprotected]

Abstract. A mobile manipulator, a serial connection of a mobile robot and a task robot, is redundant by itself. Using its redundant degrees of freedom, a mobile manipulator can perform various tasks. In this paper, to improve task execution efficiency by utilizing the redundancy, optimal configurations of the mobile manipulator are maintained while it is moving to a new task point, and the job the mobile manipulator performs is optimized using a cost function defined as a combination of the squared errors between the desired and actual configurations of the mobile robot and of the task robot. The proposed algorithm is experimentally verified and discussed with a mobile manipulator, PURL-II.

1 Introduction

A mobile robot can expand the size of the workspace but performs no manipulation, while a vertical multi-joint robot or manipulator can manipulate but cannot move. At present, there has been much research on redundant robots, which have more degrees of freedom than the given workspace requires and can therefore attain optimal postures and optimized job performance [6][13]. While much work has been done on control for both mobile robot navigation and fixed manipulator motion, there are few reports on cooperative control of a robot with both movement and manipulation abilities [4]. Unlike a fixed redundant robot, the mobile manipulator has, with respect to the given working environment, the merits of abnormal-movement avoidance, collision avoidance, efficient deployment of the corresponding mechanical parts, and improved adaptability. Because of these characteristics, a mobile manipulator with transportation ability and dexterous handling is desirable in difficult working environments [5]. This paper describes the mobile manipulator PURL-II, a serial combination of a mobile robot with 3 degrees of freedom and a task robot with 5 degrees of freedom for efficient job accomplishment. We have analyzed the kinematics and inverse kinematics of each robot to


define the 'Mobility' of the mobile robot, the most important feature of the mobile manipulator. We researched the optimal position and movement of the robot so that the combined robot can perform the task with minimal joint displacement, adjusting the weighting values using this 'Mobility'. When the mobile robot performs a job in cooperation with the task robot, we investigate the optimization criterion of the task using the gradient method to minimize the movement of the whole robot. The results acquired by implementing the proposed algorithm in computer simulation and in experiments with PURL-II are demonstrated.

2 Mobile Manipulator

2.1 Configuration of the Mobile Manipulator
The robot used in our research is shown in Fig. 1.

Fig. 1. Complete PURL-II

The robot PURL-II consists of a task robot with 5 degrees of freedom and a mobile robot with 3 degrees of freedom. We mounted the ROB3 with 5 joints as the task robot and installed a gripper at the end-effector so it can grip objects. In addition, we mounted a portable PC as the host computer to monitor the controller of the mobile manipulator and the states of the robot.

2.2 Kinematics Analysis of the Mobile Robot
We analyzed the kinematics to calculate the position in the Cartesian coordinate system using the variables of the mobile robot [3]. The coordinate system and modeling for the kinematics are shown in Fig. 2. Let us denote the present position of the mobile robot as p_m, its velocity as ṗ_m, the average velocity of the center of gravity as v_{m,c}, the angular velocity of the mobile robot as ω, and the angle between the X coordinate and the mobile robot as θ_m. The Cartesian velocity ṗ_m is represented in terms of joint variables as follows:

ṗ_m = J(p_m) q̇_m  (1)

[ẋ_m; ẏ_m; ż_m; θ̇_m] = [cos θ_m, 0, 0; sin θ_m, 0, 0; 0, 1, 0; 0, 0, 1] [v_{m,c}; v_{m,z}; ω]  (2)

where v_{m,z} is provided by a ball-screw joint along the Z axis. Under the pure rolling and non-slipping conditions, let us denote the wheel radius as r and the distance between each wheel and the center as l. Then ẋ_m, ẏ_m, and θ̇_m are calculated by (3) as follows [15]:

ẋ_m = (r/2)(q̇_{m,l} + q̇_{m,r}) cos θ_m  (3a)
ẏ_m = (r/2)(q̇_{m,l} + q̇_{m,r}) sin θ_m  (3b)
θ̇_m = (r/2l)(q̇_{m,l} − q̇_{m,r})  (3c)
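A small sketch integrating the differential-drive kinematics of Eq. (3) over a time step; the function name and the simple Euler integration scheme are our choices.

```python
import numpy as np

def step_pose(x, y, theta, qdot_l, qdot_r, r, l, dt):
    """Euler-integrate Eq. (3): wheel rates -> world-frame pose update."""
    v = 0.5 * r * (qdot_l + qdot_r)          # forward speed
    w = 0.5 * r * (qdot_l - qdot_r) / l      # yaw rate, Eq. (3c)
    return (x + v * np.cos(theta) * dt,      # Eq. (3a)
            y + v * np.sin(theta) * dt,      # Eq. (3b)
            theta + w * dt)
```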

2.3 Kinematics Analysis of the Mobile Manipulator
Each robot, designed to accomplish its own objective, must perform its corresponding movement concurrently to complete the task. A trajectory is needed for the kinematics analysis of the whole system, so that the combined robot can perform the task efficiently using the redundant degrees of freedom generated by combining the two robots [9][10]. Fig. 3 shows the Cartesian coordinates of the implemented mobile/task robot system and the link coordinate system of each joint in space [3]. The system is an independent, untethered mobile manipulator. With q_t = [q_t1 q_t2 q_t3 q_t4 q_t5] the joint variable vector of the task robot and q_m = [q_m6 q_m7 q_m8 q_m9] that of the mobile robot, the joint variable vector q of the whole system is defined as

q = [q_t; q_m] = [q_t1 q_t2 q_t3 q_t4 q_t5 q_m6 q_m7 q_m8 q_m9]^T  (4)

The linear and angular velocities of the mobile robot in Cartesian space with respect to the fixed world frame can be expressed as (5):

⁰Ṗ_m = [⁰V_m; ⁰ω_m] = [⁰J_{m,v}; ⁰J_{m,ω}] q̇_m = ⁰J_m q̇_m  (5)


Fig. 2. Mobile robot modeling and coordinate system (world frame x_W–y_W, robot frame x_R–y_R; wheel velocities v_L, v_R, center velocity v_C, heading θ, wheel separation 2L, offsets r_x, r_y)

In view of Fig. 3, the Jacobian of the vector q_t (task robot joint variables) with respect to frame {1} is given in (6) [8]:

ᵐṖ_t = [ᵐV_t; ᵐω_t] = [ᵐJ_{t,v}; ᵐJ_{t,ω}] q̇_t = ᵐJ_t q̇_t  (6)

Given the Jacobians ⁰J_m and ᵐJ_t for each robot, if we express the Jacobian of the mobile manipulator as ⁰J_t, the linear and angular velocity ⁰Ṗ_t = [⁰V_t  ⁰ω_t]^T of the end-effector with respect to the world frame is represented as (7):

⁰Ṗ_t = [⁰V_t; ⁰ω_t] = [⁰V_m; ⁰ω_m] + [⁰ω_m + ⁰R_1 ¹V_t; ⁰R_1 ¹ω_t]
     = ⁰J_m q̇_m + ⁰J_t q̇_t = [⁰J_m  ⁰J_t] [q̇_m; q̇_t]  (7)

Here ⁰R_1 is the rotational transformation from the world frame to the base frame of the task robot. In view of (5)–(7), the movements of the mobile robot and the task robot are both involved in the movement of the end-effector.

3 Algorithm for System Application

3.1 Task Planning for Minimal Movement
Because the position of the base frame of the task robot varies according to the movement of the mobile robot, the inverse-kinematic task planning has many solutions, and we must find the solution that satisfies both optimal and efficient accomplishment of the task.


Fig. 3. Coordinate system of the mobile manipulator

In this paper, our objective is to minimize the movement of the whole robot in performing the task, so we express the state vector of the mobile manipulator as (8):

q = [q_m; q_t]  (8)

where q_m = [x_m y_m z_m θ_m]^T and q_t = [θ_1 θ_2 θ_3 θ_4 θ_5]^T. Here q is the state vector of the mobile manipulator, consisting of q_m, the position and orientation of the mobile robot in Cartesian space, and q_t, the joint variables of the n links of the task robot. To plan the task so as to minimize the whole movement of the mobile manipulator, a cost function L is defined as

L = Δq^T Δq = (q_f − q_i)^T (q_f − q_i)
  = (q_{m,f} − q_{m,i})^T (q_{m,f} − q_{m,i}) + (q_{t,f} − q_{t,i})^T (q_{t,f} − q_{t,i})  (9)

Here q_i = [q_{m,i}  q_{t,i}]^T represents the initial state of the mobile manipulator, and q_f = [q_{m,f}  q_{t,f}]^T represents the final state after the task has been accomplished. In

the final state, the end-effector of the task robot must be placed at the desired position X_{t,d}; for this, equation (10) must be satisfied. In (10), R(θ_{m,f}) denotes the rotational transformation in the X−Y plane and f(q_{t,f}) the kinematic equation of the task robot [14]:

X_{t,d} = R(θ_{m,f}) f(q_{t,f}) + X_{m,f}  (10)

where X_{t,d} represents the desired position of the task robot and X_{m,f} the final position of the mobile robot. We can express the final position of the mobile robot X_{m,f} as a function of the desired coordinate X_{t,d} and the joint variables θ_{m,f} and q_{t,f}; then the cost function representing the robot movement is expressed as the n×1 space function of θ_{m,f} and q_{t,f} in (11):


L = {X_{t,d} − R(θ_{m,f}) f(q_{t,f}) − X_{m,i}}^T {X_{t,d} − R(θ_{m,f}) f(q_{t,f}) − X_{m,i}} + {q_{t,f} − q_{t,i}}^T {q_{t,f} − q_{t,i}}  (11)

In equation (11), the θ_{m,f} and q_{t,f} which minimize the cost function L must satisfy the condition in (12):

∇L = [∂L/∂θ_{m,f}; ∂L/∂q_{t,f}] = 0  (12)

Because the cost function is nonlinear, it is difficult to find the optimum solution of (12) analytically. So in this paper we find the solution numerically using the gradient method described by (13):

[θ_{m,f(k+1)}; q_{t,f(k+1)}] = [θ_{m,f(k)}; q_{t,f(k)}] − η ∇L |_{θ_{m,f(k)}, q_{t,f(k)}}  (13)

This recursive process stops when ∇L < ε ≈ 0; then θ_{m,f(k)} and q_{t,f(k)} are the optimum solutions. From the optimum solutions θ_{m,f} and q_{t,f}, the final robot state q_f can be calculated as (14):

q_f = [q_{m,f}; q_{t,f}] = [X_{t,d} − R(θ_{m,f}) f(q_{t,f}); q_{t,f}]  (14)
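A sketch of the gradient iteration of Eq. (13) using a numerical gradient; the cost callable, step size and stopping tolerance are illustrative stand-ins for the paper's setup.

```python
import numpy as np

def optimize_configuration(cost, z0, eta=0.01, eps=1e-6, max_iter=10_000):
    """Gradient descent of Eq. (13) on z = (theta_mf, q_tf), with a
    central-difference numerical gradient of the cost L of Eq. (11)."""
    z = np.asarray(z0, dtype=float)
    h = 1e-6
    for _ in range(max_iter):
        grad = np.array([(cost(z + h*e) - cost(z - h*e)) / (2*h)
                         for e in np.eye(z.size)])
        if np.linalg.norm(grad) < eps:        # stop when |grad L| ~ 0, Eq. (12)
            break
        z = z - eta * grad                    # Eq. (13)
    return z
```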

There are several efficient search algorithms; however, the simple gradient method is applied in this case.

3.2 Mobility of Mobile Robot
In this research, we define the 'mobility of the mobile robot' as the amount of movement of the mobile robot when the magnitude of the wheel velocity input is unity, that is, the mobility quantifies the movement attainable in any direction [1]. The mobile robot used in this research moves and rotates because each wheel is rotated under independent control. With the left and right wheel velocities (q̇_{m,l}, q̇_{m,r}) and the linear and angular velocities (v_m, ω), the kinematics of the robot satisfies (15):

v_m = (r/2)(q̇_{m,l} + q̇_{m,r})  (15a)
ω = (r/2l)(q̇_{m,l} − q̇_{m,r})  (15b)


Rewriting (15a) and (15b), we get (16a) and (16b):

q̇_{m,r} = (v_m + ωl)/r  (16a)
q̇_{m,l} = (v_m − ωl)/r  (16b)

Mobility is the output-to-input ratio for a unity input vector, i.e., q̇²_{m,l} + q̇²_{m,r} = 1, and the mobility v_m at any angular velocity ω is calculated by (17):

v_m = r √(1/2 − ω² l²/r²)  (17)
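A one-function sketch of Eq. (17); the default wheel radius and half-track values are illustrative, not the PURL-II parameters.

```python
import numpy as np

def mobility(omega, r=0.1, l=0.25):
    """Maximum forward speed v_m for a unit-norm wheel-velocity input
    (q_l^2 + q_r^2 = 1) at a given yaw rate omega, per Eq. (17)."""
    arg = 0.5 - (omega * l / r) ** 2
    return r * np.sqrt(arg) if arg > 0 else 0.0  # no forward motion beyond |omega| = r/(l*sqrt(2))
```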

When the mobile robot has a velocity input of unity norm, the mobility of the mobile robot is as represented in Fig. 4, which shows the output v and ω in the workspace for all direction inputs, i.e., for all combinations of heading change and movement. For any input, the direction of maximum movement is the current robot heading, obtained when the velocities of the two wheels are the same [7]; in that situation no angular movement of the robot occurs.

3.3 Assigning of Weighting Value Using Mobility
From the mobility, we know how easily the robot moves in any direction and thus the adaptability of the present posture of the mobile robot to a given task. If the present posture of the mobile robot is well adapted to the task, that is, the mobility is large in a certain direction,

Fig. 4. Motion generation efficiency (wheel-velocity input space (q̇_{m,l}, q̇_{m,r}) with the resulting mobility from v_m = 0 to v_m = 1 and heading change Δθ)

we impose a lower weighting value on the corresponding term in the cost function of (18) to assign a larger share of the movement to the mobile robot along that direction. If not, a higher weighting value makes the movement of the mobile robot small. Equation (18) is the cost function with weighting values:

L = {X_{t,d} − R(θ_{m,f}) f(q_{t,f}) − X_{m,i}}^T W_m {X_{t,d} − R(θ_{m,f}) f(q_{t,f}) − X_{m,i}} + {q_{t,f} − q_{t,i}}^T W_t {q_{t,f} − q_{t,i}}  (18)

Here, $W_m$ and $W_t$ are weighting matrices imposed on the movement of the mobile robot and the task robot, respectively. In the cost function, the mobility of the mobile robot is


expressed in Cartesian coordinate space, so the weighting matrix $W_m$ of the mobile robot must be applied after decomposing each component along each axis of the Cartesian coordinate system, as shown in Fig. 5; it is represented as (19).

$$W_m = \begin{bmatrix} \omega_x & 0 & 0 & 0 \\ 0 & \omega_y & 0 & 0 \\ 0 & 0 & \omega_z & 0 \\ 0 & 0 & 0 & \omega_\theta \end{bmatrix} \quad (19)$$

where $\omega_x = \dfrac{1}{v\cos(\phi)\cos(\alpha) + e}$, $\omega_y = \dfrac{1}{v\sin(\phi)\sin(\alpha) + e}$, $\omega_z = \dfrac{k_1}{(z_d - f_z(q_t))^2}$, and $\omega_\theta = 1$.

3.4 Mobile Robot Control

The mobile robot carries the task robot to the boundary from which the goal position is reachable, i.e., into the reachable workspace. We establish the coordinate system shown in Fig. 6 so that the robot can achieve the desired posture and position, moving from the initial position to the desired position according to the weighting values assigned to the mobile robot. Starting from the present position $(x_i, y_i)$, the robot reaches the desired position $(x_d, y_d)$. Here the current robot heading $\phi$, the heading error $\alpha$ from the present position toward the desired position, the distance error $e$ to the desired position, and the heading $\theta$ of the mobile robot at the desired position are defined [7,11].

$$\dot e = -v\cos\alpha \quad (20a)$$

$$\dot\alpha = -\omega + \frac{v\sin\alpha}{e} \quad (20b)$$

$$\dot\theta = \frac{v\sin\alpha}{e} \quad (20c)$$

A Lyapunov candidate function is defined as in (21):

$$V = V_1 + V_2 = \tfrac{1}{2}\lambda e^2 + \tfrac{1}{2}(\alpha^2 + h\theta^2) \quad (21)$$

where $V_1$ represents the error energy in distance and $V_2$ the error energy in direction. Differentiating both sides of (21) with respect to time gives (22).

$$\dot V = \dot V_1 + \dot V_2 = \lambda e \dot e + (\alpha\dot\alpha + h\theta\dot\theta) \quad (22)$$

Substituting (20) into the corresponding parts of (22) results in (23).

$$\dot V = -\lambda e v \cos\alpha + \alpha\left[-\omega + \frac{v\sin\alpha}{e}\cdot\frac{(\alpha + h\theta)}{\alpha}\right] \quad (23)$$

Note that $\dot V < 0$ is required for the system to be stable. On this basis, we can design the nonlinear controller of the mobile robot as in (24).


Fig. 5. Decomposing mobility

Fig. 6. Position movement of the mobile robot by the imposed weighting value (start $(x_i, y_i)$ with heading $\phi$; goal $(x_d, y_d)$; errors $e$, $\alpha$, $\theta$; velocities $v$, $\omega$; axes $X$, $Y$)

$$v = \gamma(e\cos\alpha), \quad \gamma > 0 \quad (24a)$$

$$\omega = k\alpha + \gamma\,\frac{\cos\alpha\,\sin\alpha}{\alpha}\,(\alpha + h\theta), \quad k, h > 0 \quad (24b)$$

Therefore, using this controller for the mobile robot, $V$ approaches zero as $t \to \infty$; $e$ and $\alpha$ likewise approach zero, as shown in (25).

$$\dot V = -\lambda(\gamma\cos^2\alpha)\,e^2 - k\alpha^2 \le 0 \quad (25)$$
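A short simulation sketch of the error kinematics (20) in closed loop with (24) illustrates this convergence; the gains and step size below are illustrative assumptions, not values from the paper:

    import numpy as np

    def simulate_posture_control(e, alpha, theta, gam=1.0, k=2.0, h=1.5,
                                 dt=1e-3, steps=20000):
        """Integrate the error kinematics (20) under the control law (24);
        e, alpha, theta should all decay toward zero."""
        for _ in range(steps):
            v = gam * e * np.cos(alpha)                        # (24a)
            # sin(a)cos(a)/a -> 1 as a -> 0; guard the division
            s = np.sin(alpha) * np.cos(alpha) / alpha if abs(alpha) > 1e-9 else 1.0
            w = k * alpha + gam * s * (alpha + h * theta)      # (24b)
            if e < 1e-9:                                       # at the goal
                break
            e     += dt * (-v * np.cos(alpha))                 # (20a)
            alpha += dt * (-w + v * np.sin(alpha) / e)         # (20b)
            theta += dt * ( v * np.sin(alpha) / e)             # (20c)
        return e, alpha, theta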

Fig. 7. The optimal position planning to move the point of action of the robot to (1, 0.5, 0.7); panels (a)-(d) show different viewpoints


4 Simulation

To verify the proposed algorithm, simulations were performed with PURL-II. Fig. 7 shows the simulation results with a 3-DOF task robot and a 3-DOF mobile robot. The goal is to position the end-effector at (1, 0.5, 0.7), while the initial configuration of the mobile robot is (-1.0, -1.0, 1.3, 60°) and that of the task robot is (18°, 60°, 90°). The optimally determined configuration of the mobile robot is (0.0368, -0.497, 1.14, 44.1°) and that of the task robot is (1.99°, 25.57°, 86.63°); the panels of Fig. 7 show the movements of the task robot from different viewpoints.

5 Experiment and Conclusion

Before the real experiments, the following assumptions on the mobile manipulator's operating conditions are made:

1. In the initial stage, the object is held in the end-effector of the task robot.
2. The mobile robot satisfies the pure-rolling and non-slippage conditions.
3. There is no obstacle in the mobile robot's path.
4. There is no disturbance to the total system.

The task robot is configured with joint angles (18°, 60°, 90°), so the coordinate of the end-effector is set to (0.02, 0.04, 1.3). From this location, the mobile manipulator must bring the object to (1, 1.5, 0.5). The optimal path, calculated using the algorithm stated in the previous section, has Wx = 10.0, Wy = 10.0, and Wz = 2.66. In the first move the mobile robot's heading is 76.52° from the X axis, the difference coming from the right wheels moving 0.8 m and the left wheels moving 1.4 m. In the next move, the mobile robot's heading differs from the X axis by 11.64°, with the right wheels moving 0.4 m and the left wheels moving 0.5 m. Hence the total moving distance of the mobile robot is (1.2 m, 1.9 m), the total angle is 88.16°, and the joint angles of the task robot are (-6.45°, 9.87°, 34.92°). The experimental results are shown in the photographs of Fig. 8. In the real experiment, the pure-rolling condition of the wheels is not satisfied; moreover, because velocity is controlled through the robot kinematics, a distance error accumulates from the cumulative velocity error. Since a CPU timer is used for estimating velocity, timer error also contributes to the velocity error. Consequently, the final position of the end-effector is placed at (1.2, 1.5, 0.8) with respect to the object. A new redundancy resolution scheme for a mobile manipulator has been proposed in this paper. While the mobile robot is moving from one task (starting) point to the next task point, the task robot is controlled to take the posture appropriate to the next task, which can be pre-determined based on TOMM [2][16]. Minimizing the cost function by the gradient method leads the mobile manipulator to an optimal configuration at the new task point. These schemes can also be applied to robot trajectory planning. The efficiency of this scheme is verified through real experiments with PURL-II. The difference between the simulation and experimental results is caused by the error between the control input and the actual motion of the mobile robot, due to the roughness of the floor, and by the accumulation of position error through velocity control. In further study, a proper control algorithm should be developed to improve control accuracy as well as efficiency in utilizing redundancy.


Fig. 8. Response of robot posture

References

1. Mason, M.T.: Compliance and Force Control for Computer Controlled Manipulators. IEEE Transactions on Systems, Man, and Cybernetics 11 (1981) 418-432
2. Lee, S.K., Lee, J.M.: Task-Oriented Dual-Arm Manipulability and Its Application to Configuration Optimization. In: Proceedings of the 27th IEEE International Conference on Decision and Control, Austin, TX (1988)
3. Spong, M.W.: Robot Dynamics and Control. John Wiley & Sons (1989) 92-101
4. Francois, G.P.: Using Minimax Approaches to Plan Optimal Task Commutation Configurations for Combined Mobile Platform-Manipulator Systems. IEEE Transactions on Robotics and Automation 10 (1994) 44-53
5. Lewis, F.L.: Control of Robot Manipulators. Macmillan Publishing (1993) 136-140
6. Tsuneo, Y.: Manipulability of Robotic Mechanisms. The International Journal of Robotics Research 4 (1994) 3-9
7. Aicardi, M.: Closed-Loop Steering of Unicycle-like Vehicles via Lyapunov Techniques. IEEE Robotics and Automation Magazine 10 (1995) 27-35
8. You, S.S.: A Unified Dynamic Model and Control System for Robotic Manipulators with Geometric End-Effector Constraints. KSME International Journal 10 (1996) 203-212
9. Jang, J.H., Han, C.S.: The State Sensitivity Analysis of the Front Wheel Steering Vehicle: In the Time Domain. KSME International Journal 11 (1997) 595-604
10. Hong, K.S., Kim, Y.M., Choi, C.: Inverse Kinematics of a Reclaimer: Closed-Form Solution by Exploiting Geometric Constraints. KSME International Journal 11 (1997) 629-638
11. Hare, N., Fing, Y.: Mobile Robot Path Planning and Tracking: An Optimal Control Approach. In: International Conference on Control, Automation, Robotics and Vision (1997) 9-11
12. Lee, J., Cho, H.S.: Mobile Manipulator Motion Planning for Multiple Tasks Using a Global Optimization Approach. Journal of Intelligent and Robotic Systems (1997) 169-190
13. Stephen, L.C.: Task Compatibility of Manipulator Postures. The International Journal of Robotics Research 7 (1998) 13-21


14. You, S.S., Jeong, S.K.: Kinematic and Dynamic Modeling for Holonomic Constrained Multiple Robot Systems through the Principle of Workspace Orthogonalization. KSME International Journal 12 (1998) 170-180
15. James, C.A., John, H.M.: Shortest Distance Paths for Wheeled Mobile Robots. IEEE Transactions on Robotics and Automation 14 (1998) 657-662
16. Lee, J.M.: Dynamic Modeling and Cooperative Control of a Redundant Manipulator Based on Decomposition. KSME International Journal 12 (1998) 642-658

Multi-objective Flow Shop Scheduling Using Differential Evolution

Bin Qian, Ling Wang, De-Xian Huang, and Xiong Wang

Dept. of Automation, Tsinghua University, Beijing 100084, P.R. China
[emailprotected]

Abstract. This paper proposes an effective Differential Evolution (DE) based hybrid algorithm for Multi-objective Permutation Flow Shop Scheduling Problem (MPFSSP), which is a typical NP-hard combinatorial optimization problem. In the proposed Multi-objective Hybrid DE (MOHDE), both DE-based searching operators and some special local searching operators are designed to balance the exploration and exploitation abilities. Firstly, to make DE suitable for solving MPFSSP, a largest-order-value (LOV) rule based on random key representation is developed to convert the continuous values of individuals in DE to job permutations. Then, to enrich the searching behaviors and to avoid premature convergence, a Variable Neighborhood Search (VNS) based local search with multiple different neighborhoods is designed and incorporated into the MOHDE. Simulation results and comparisons with the famous random-weight genetic algorithm (RWGA) demonstrate the effectiveness and robustness of our proposed MOHDE.

1 Introduction

Flow Shop Scheduling Problems (FSSP) are among the most well-known problems in the area of scheduling and have been proved to be strongly NP-hard [1]. Due to its importance in many industrial areas, the FSSP has attracted much attention and wide research in both the Computer Science and Operations Research fields. The Permutation FSSP with $n$ jobs and $m$ machines is commonly defined as follows: each of the $n$ jobs is to be processed sequentially on machines $1, \dots, m$. The processing time $p_{i,j}$ of job $i$

on machine $j$ is given. At any time, each machine can process at most one job and each job can be processed on at most one machine. The sequence in which the jobs are to be processed is the same for each machine. There are various scheduling objectives to be considered, among them makespan, maximum tardiness, total tardiness, maximum flowtime, and total flowtime [2]. Most real-world optimization problems in manufacturing systems are multi-objective, and some researchers have tackled the multi-objective FSSP. For example, Daniels and Chambers [3] considered the tradeoff between the makespan and the maximum tardiness. Rajendran [4] presented a branch-and-bound algorithm and two heuristic algorithms to minimize the total flowtime with a constraint on the makespan. Ishibuchi and Murata [5] applied a genetic-algorithm-based approach to the two-objective and three-objective FSSP.


The Differential Evolution (DE) algorithm [6] is a novel parallel direct search method, which has been proved to be a simple and efficient heuristic for global optimization over continuous spaces. VNS [7] is very effective and can easily be applied to a variety of problems. Recently, Onwubolu and Davendra [8] described a novel optimization method based on a differential evolution (exploration) algorithm for the single-objective FSSP. Tasgetiren [9] presented a PSO algorithm with VNS to solve the FSSP. Both algorithms achieved good results. Recently, hybrid heuristics have become a hot topic in the fields of both Computer Science and Operations Research [15]. It is well known that the performance of evolutionary algorithms can be improved by combining them with problem-dependent local search. Memetic algorithms (MAs) [16] may be considered as a union of population-based global search and local improvement. In MAs, several studies [16][17] have focused on how to achieve a reasonable combination of global search and local search, and how to strike a good balance between exploration and exploitation. Inspired by the spirit of MAs, a hybrid algorithm based on DE and VNS is proposed for the MPFSSP in this paper.

2 Formulation of MPFSSP

The MPFSSP can be described as follows: denote by $c_{i,j}$ the completion time of job $i$ on machine $j$; $c_i$ is the completion time of job $i$, $T_i$ is the tardiness of job $i$, $d_i$ is the due date of job $i$, and let $\pi = (\sigma_1, \sigma_2, \dots, \sigma_n)$ be any processing sequence of all jobs. In our study, we minimize two objectives: makespan $C_{\max}$ and maximum tardiness $T_{\max}$. The mathematical formulation of the MPFSSP to minimize $C_{\max}$ and $T_{\max}$ is given in (1):

$$\begin{cases}
c_{\sigma_1,1} = p_{\sigma_1,1} \\
c_{\sigma_j,1} = c_{\sigma_{j-1},1} + p_{\sigma_j,1}, & j = 2, \dots, n \\
c_{\sigma_1,i} = c_{\sigma_1,i-1} + p_{\sigma_1,i}, & i = 2, \dots, m \\
c_{\sigma_j,i} = \max\{c_{\sigma_{j-1},i},\, c_{\sigma_j,i-1}\} + p_{\sigma_j,i}, & i = 2, \dots, m;\ j = 2, \dots, n \\
C_{\max} = c_{\sigma_n,m} \\
c_i = c_{i,m} \\
T_i = \max\{c_i - d_i,\, 0\} \\
T_{\max} = \max\{T_1, \dots, T_n\}
\end{cases} \quad (1)$$
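As a concrete illustration of (1) (with assumed 0-indexed arrays `p` for processing times and `d` for due dates), both objectives can be computed by a direct sweep:

    def evaluate(perm, p, d):
        """Makespan C_max and maximum tardiness T_max for permutation `perm`
        via the recursion (1). p[i][j]: processing time of job i on machine j;
        d[i]: due date of job i; perm: a 0-indexed job permutation."""
        n, m = len(perm), len(p[0])
        c = [[0.0] * m for _ in range(n)]
        for j, job in enumerate(perm):          # jobs in sequence order
            for i in range(m):                  # machines in order
                prev_job = c[j - 1][i] if j > 0 else 0.0
                prev_mach = c[j][i - 1] if i > 0 else 0.0
                c[j][i] = max(prev_job, prev_mach) + p[job][i]
        c_max = c[n - 1][m - 1]
        t_max = max(max(c[j][m - 1] - d[job], 0.0) for j, job in enumerate(perm))
        return c_max, t_max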

3 Brief Review of DE

The DE algorithm, introduced by Storn and Price [6], is a branch of evolutionary algorithms for optimization problems over continuous domains. DE is a population-based evolutionary computation technique, which uses a simple differential operator to create new candidate solutions and a one-to-one competition scheme to greedily select the new


candidate. The theoretical framework of DE is very simple, and DE is easy to code and implement on a computer. Besides, it is computationally inexpensive in terms of memory requirements and CPU time. Thus, DE has attracted much attention and wide application in various fields. DE starts with the random initialization of a population of individuals in the search space and works on the cooperative behaviors of the individuals in the population. It finds the globally best solution by utilizing the distance and direction information given by the differences among the population. The searching behavior of each individual is adjusted by dynamically altering the direction and step length along which the differentiation is performed. At each generation, the mutation and crossover operators are applied to the individuals, and a new population arises. Then selection takes place, and the corresponding individuals from both populations compete to comprise the next generation. Several variants of the DE algorithm can currently be found at http://www.icsi.berkeley.edu/%7Estorn/code.html.

4 MOHDE for MPFSSP

A general multi-objective optimization problem (MOP) with $w$ objectives can be expressed as follows:

$$\text{minimize } f_1(x), f_2(x), \dots, f_w(x), \quad (2)$$

where $f_1(x), f_2(x), \dots, f_w(x)$ are the $w$ objectives to be minimized and $x$ belongs to the solution set. Considering two solutions $a$ and $b$, solution $a$ is said to dominate $b$ if

$$\forall i \in \{1, 2, \dots, w\}: f_i(a) \le f_i(b) \quad \text{and} \quad \exists j \in \{1, 2, \dots, w\}: f_j(a) < f_j(b). \quad (3)$$

If a solution is not dominated by any other solution of the MOP, that solution is defined as a nondominated solution. The solutions that are nondominated within the entire solution space are defined as Pareto solutions and comprise the so-called Pareto trade-off front. In recent years, various evolutionary algorithms have been presented to solve different MOPs; for a review, see [21][22]. The aim of MOHDE is to try to obtain all Pareto solutions of the MPFSSP. In this section, we explain the implementation of MOHDE for the MPFSSP in detail by illustrating the key techniques used, including the solution representation, the VNS-based local search, and the solution repair mechanism.

4.1 Solution Representation

Because of DE's continuous nature, it cannot be directly adopted for the FSSP, so the applications of DE to combinatorial optimization problems are very limited. The important problem in applying DE to the MPFSSP is to find a suitable mapping between job sequences and the individuals (continuous vectors) in DE. For the $n$-job, $m$-machine problem, each vector contains $n$ dimensions corresponding to the $n$ operations. In this paper, we propose a largest-order-value (LOV) rule based on the random-key representation [10] to convert DE's individual $X_i = [x_{i,1}, x_{i,2}, \dots, x_{i,n}]$ to the job solution/permutation vector $\pi_i = \{\pi_{i,1}, \pi_{i,2}, \dots, \pi_{i,n}\}$.


According to the LOV rule, the components of $X_i = [x_{i,1}, x_{i,2}, \dots, x_{i,n}]$ are first ranked in descending order to obtain the rank sequence $\varphi_i = [\varphi_{i,1}, \varphi_{i,2}, \dots, \varphi_{i,n}]$. Then the job permutation $\pi_i$ is calculated by the following formula:

$$\pi_{i,\varphi_{i,k}} = k. \quad (4)$$

We provide a simple example to illustrate the LOV rule in Table 1. In this instance ($n = 6$), when $k = 1$, then $\varphi_{i,1} = 4$ and $\pi_{i,\varphi_{i,1}} = \pi_{i,4} = 1$; when $k = 5$, then $\varphi_{i,5} = 2$ and $\pi_{i,\varphi_{i,5}} = \pi_{i,2} = 5$; and so on. This representation is unique and simple in terms of finding new permutations.

Table 1. Solution representation

    Dimension k :  1     2     3     4     5     6
    x_{i,k}     :  1.36  3.85  2.55  0.63  2.68  0.82
    phi_{i,k}   :  4     1     3     6     2     5
    pi_{i,k}    :  2     5     3     1     6     4
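A small sketch of the LOV conversion (the helper name is ours), reproducing Table 1:

    def lov_permutation(x):
        """Largest-order-value rule (4): rank components of x in descending
        order, then set pi at position phi_{i,k} to k (1-based jobs)."""
        n = len(x)
        order = sorted(range(n), key=lambda i: -x[i])   # indices by descending x
        phi = [0] * n
        for rank, idx in enumerate(order, start=1):
            phi[idx] = rank                             # phi[i] = rank of x_i
        pi = [0] * n
        for k in range(1, n + 1):
            pi[phi[k - 1] - 1] = k                      # pi_{phi_{i,k}} = k
        return phi, pi

For the $X_i$ of Table 1 this returns $\varphi_i = [4, 1, 3, 6, 2, 5]$ and $\pi_i = [2, 5, 3, 1, 6, 4]$, matching the table.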

4.2 VNS-Based Local Search

In Reeves [19][20] it is observed that, in the context of the FSSP, the solution-space landscape induced by certain operators (i.e., insert, interchange, inverse, etc.) has a "big valley", in which local optima tend to lie relatively close to each other and to the global optimum. This encourages us to develop a local search method that exploits this structure and guides the DE population to the bottom region of the big valley, which contains the global optimum and the better local optima. In addition, the huge search space and the multi-objective nature of the MPFSSP make it difficult for a single neighborhood to achieve good results, so we design a VNS-based local search with multiple different neighborhoods to enrich the local searching behavior and to avoid premature convergence. The neighborhood of the local search is based on the insert+inverse+interchange variant of the VNS method proposed in [7][9]. The pseudocode of the local search is given as follows:

    Convert an individual X_i(t) to a job permutation pi_i according to the LOV rule;
    Set loop = 1;
    do
        k = 0; k_max = 3;
        do
            randomly select u and v, where u != v;
            if (k = 0) then pi_i_1 = insert(pi_i, u, v);
            if (k = 1) then pi_i_1 = inverse(pi_i, u, v);
            if (k = 2) then pi_i_1 = interchange(pi_i, u, v);


            if pi_i_1 dominates pi_i then
                k = 0; pi_i = pi_i_1;
            else
                k = k + 1;
            endif;
        while (k < k_max);

Dual-Mode Control Algorithm for Wiener-Typed Nonlinear Systems

Lemma 1: Let $P \ge 0$, $\mu > 1$, $\tau = 1 + 1/(\mu - 1)$, and

$a$, $b$ be matrices of the same size; then

$$(a + b)^T P (a + b) \le \mu\, a^T P a + \tau\, b^T P b. \quad (12)$$

Theorem 1: Let the linear time-invariant block of the Wiener-typed nonlinear system be given by (2), (3), and the state-space expression of the observer by (4), where the feedback vector $K$ and the observer feedback vector $L$ are stabilizing. In addition, $|\delta[\eta(k)]| \le \sigma$. Then, if the following assumption A1 is satisfied, $S$ and $S_e$ are invariant sets in the sense of (11) and (7), respectively. Moreover, the control law (8) converges to the unconstrained stable feedback control law $u(k) = K\hat x(k)$, and the closed-loop system is asymptotically stable.

A1. There exist $\mu > 1$, $\tilde\mu > 1$ ($\mu, \tilde\mu \in \mathbb{R}$) such that

$$\tilde\mu\, \Xi^T P\, \Xi \le (1 - e^2)\, P. \quad (13)$$

$$\tilde\tau\,(1 + \sigma^2)\, C^T L^T E_x^T P E_x L C \le P_e. \quad (14)$$

$$\mu\, \Psi^T P_e\, \Psi + \tau \sigma^2\, C^T L^T P_e L C \le P_e. \quad (15)$$

where $\tau = 1 + 1/(\mu - 1)$, $\tilde\tau = 1 + 1/(\tilde\mu - 1)$, and $E_x$ is a transform factor such that

$\hat x = E_x^T \hat z$.

Proof: For all $\hat z(k) \in S$, we conclude from (11) and (13) that

$$\tilde\mu\, \hat z^T(k)\, \Xi^T P\, \Xi\, \hat z(k) \le (1 - e^2)\, \hat z^T(k) P \hat z(k) \le 1 - e^2. \quad (16)$$

For all $e(k) \in S_e$, from (14), $\hat x^T(k+1) P \hat x(k+1) \le 1$, and $|\delta[\eta(k)]| \le \sigma$, we have

$$\tilde\tau\, e^T(k) \left[1 + \delta(\cdot)\right]^2 C^T L^T E_x^T P E_x L C\, e(k) \le e^T(k) P_e\, e(k) \le e^2. \quad (17)$$

Moreover, applying Lemma 1, for all $\hat z(k) \in S$ we have

$$\hat z^T(k+1) P \hat z(k+1) \le \tilde\mu\, \hat z^T(k)\, \Xi^T P \Xi\, \hat z(k) + \tilde\tau\, e^T(k)\left(1 + \delta(\cdot)\right)^2 C^T L^T E_x^T P E_x L C\, e(k) \le 1. \quad (18)$$

Hence, $S$ is an invariant set of the extended state $\hat z(k)$. In a similar way, we can conclude from (15) that $S_e$ is an invariant set. Moreover, following the same approach as [8], we can prove the closed-loop stability of the algorithm.


4 Case Study

Plant:

$$x(k+1) = A x(k) + B u(k), \qquad \eta(k) = C x(k). \quad (19)$$

$$y(k) = \eta^4(k)\sin[\eta(k)] - \eta^5(k). \quad (20)$$

where $-1.5 \le u(k) \le 3$, $A = \begin{bmatrix} 2.3 & -1.2 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $C = [1\ \ 0]$, and $x(0) = \begin{bmatrix} 2.0 \\ 2.5 \end{bmatrix}$.

Firstly, we set $K = [-2.4179\ \ 1.1495]$ and $L = [1.0556\ \ 0.3704]^T$. The control performance tracking the $\{40, -40\}$ double-step signal is shown in Fig. 1; the parameters are $n_d = 5$, $e = 0.4$, $\mu = 1.1$, $\tilde\mu = 1.5$, $\sigma = 0.1$. The upper subfigures are the curves of $r$ (dash-dot line: set points), $y$ (solid line), and $\eta$ (dashed line), respectively; the lower subfigure is the curve of $u$. These results validate the feasibility of the proposed algorithm.
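As a rough sketch only, not the paper's dual-mode predictive controller (whose construction lies outside this fragment), the plant (19)-(20) can be simulated under the clipped observer-based feedback $u(k) = K\hat x(k)$, assuming the intermediate signal $\eta$ is available to a standard Luenberger observer:

    import numpy as np

    A = np.array([[2.3, -1.2],
                  [1.0,  0.0]])
    B = np.array([[1.0], [0.0]])
    C = np.array([[1.0, 0.0]])
    K = np.array([[-2.4179, 1.1495]])       # state feedback gain
    L = np.array([[1.0556], [0.3704]])      # observer gain

    x  = np.array([[2.0], [2.5]])           # plant state, x(0)
    xh = np.zeros((2, 1))                   # observer state estimate
    for k in range(50):
        u = np.clip(K @ xh, -1.5, 3.0)      # hard input constraint
        eta = float(C @ x)
        y = eta**4 * np.sin(eta) - eta**5   # Wiener output (20), for record only
        xh = A @ xh + B @ u + L * (eta - float(C @ xh))  # Luenberger update
        x  = A @ x + B @ u                  # linear block update (19)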

Fig. 1. Control performance tracking step signals

Fig. 2. Tracks and ellipsoid invariant sets of system state and estimated state


Fig. 2 shows the tracks and ellipsoidal invariant sets of $x$ and $\hat x$, where the solid and dashed ellipsoids refer to the invariant sets $S$ for $n_d = 5$ and $n_d = 0$, respectively. The dashed lines and circular points are the track of $\hat x$, and the solid lines and star points denote the track of $x$. Fig. 2 illustrates the power of the auxiliary vector, i.e., the tracks of $x$ and $\hat x$ converge to the origin quickly, which validates the superiority of the proposed algorithm.

5 Conclusion

For Wiener-typed nonlinear systems subject to hard input constraints, a control algorithm based on the dual-mode technique is proposed. This algorithm has two advantages: 1) high precision, and 2) a large closed-loop stability region. Its feasibility and superiority are validated by simulation results.

References

1. Chen, H., Allgöwer, F.: A Quasi-Infinite Horizon Nonlinear Model Predictive Control Scheme with Guaranteed Stability. Automatica 34 (1998) 1205-1217
2. Bloemen, H.H.J., et al.: Wiener Model Identification and Predictive Control for Dual Composition Control of a Distillation Column. J. Process Control 11 (2001) 601-620
3. Kalafatis, A., et al.: A New Approach to the Identification of pH Processes Based on the Wiener Model. Chem. Eng. Science 50 (1995) 3693-3701
4. Pajunen, G.A., et al.: Identification of a pH Process Represented by a Nonlinear Wiener Model. IFAC Adaptive Systems in Control and Signal Processing (1983) 91-95
5. Norquay, S.J., et al.: Application of Wiener Model Predictive Control (WMPC) to an Industrial C2-Splitter. J. Process Control 9 (1999) 461-473
6. Wang, X.J., et al.: Weighting Adaptive Control of Wiener Model Based on Multilayer Feedforward Neural Networks. In: Proc. of the 4th World Congress on Intelligent Control and Automation, June 10-14, 2002, Shanghai, China
7. Kalafatis, A.D., Wang, L.: Identification of Wiener-Type Nonlinear Systems in a Noisy Environment. Int. J. Control 66 (1997) 923-941
8. Yang, J.J.: Study on the Model Predictive Control Method of Systems with Input Constraints. Doctoral Thesis, Northeastern University of China (2000)
9. Sznaier, M., et al.: Suboptimal Control of Linear Systems with State and Control Inequality Constraints. In: Proc. IEEE Conf. on Decision and Control (1997) 761-762
10. Forsythe, G.E., Malcolm, M.A., Moler, C.B.: Computer Methods for Mathematical Computations. Prentice-Hall, Englewood Cliffs, New Jersey (1977) 156-166

NDP Methods for Multi-chain MDPs

Hao Tang¹,², Lei Zhou¹, and Arai Tamio²

¹ School of Computer and Information, Hefei University of Technology, Tunxi Road No. 193, Hefei, Anhui 230009, P.R. China
² Dept. of Precision Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
[emailprotected] / {htang,arai}@prince.pe.u-tokyo.ac.jp

Abstract. Simulation optimization techniques are discussed for multichain Markov decision processes (MDPs) through the learning of performance potentials. Different from ergodic or unichain models, where a single sample path suffices for learning the potentials, in the multichain case the underlying Markov chain has more than one recurrent class, so the sample path has to be restarted often in order not to circulate in only one recurrent class. As in unichain models, temporal difference (TD) learning algorithms can be developed for learning the potentials. In addition, by representing the estimates of the potentials with a neural network, one neuro-dynamic programming (NDP) method, i.e., the critic algorithm, is derived, as has been proposed for unichain models. The obtained results are also applicable to general multichain semi-Markov decision processes (SMDPs), and we use a numerical example to illustrate the extension.

1 Introduction

For an ergodic or unichain MDP, we assume each stationary policy has only one recurrent class. However, this assumption may not hold for some practical problems, so it is valuable to discuss the more general case, i.e., multi-chain MDPs, in which the underlying Markov chain corresponding to at least one stationary policy consists of two or more recurrent classes [1]. Performance potentials, introduced mainly by Cao for MDPs, have been proved efficient in optimizing ergodic MDPs [5]. Recently, Cao and Guo extended them to discrete-time multichain models [2]. The potentials of multi-chain MDPs can also be estimated from sample paths as in the unichain case,

Partially supported by the National Natural Science Foundation of China (60404009), the Natural Science Foundation of Anhui Province (050420303), and the Support Project of Hefei University of Technology for Science and Technology Innovation Groups. Corresponding author: received his Ph.D. degree in 2002 from the University of Science and Technology of China, thereafter an associate professor at Hefei University of Technology, P.R. China, and currently also a visiting scholar at the Advanced Robotics with Artificial Intelligence Lab of the University of Tokyo, Japan.



so that neuro-dynamic programming (NDP) methods can be developed [3,4]. We have shown its successful application in ergodic MDPs and SMDPs using potentials [6]. In this paper, we discuss one model of NDP, i.e., the critic algorithm, for multichain processes, where the potentials are learned by TD(λ) learning and approximated by a neural network, followed by approximate policy iteration.

2 Continuous-Time Multi-chain MDPs

Consider an MDP $\{X(t), t \ge 0\}$ with state space $\Phi = \{1, 2, \dots, M\}$, and let $\{X_n : X_n \in \Phi, n = 0, 1, 2, \dots\}$ be its underlying Markov chain, where $X_n$ denotes the system state at the $n$th decision epoch. The set of stationary policies is denoted by $\Omega = \{v \mid v : \Phi \to D\}$, with $D$ being the compact action set. Associated with any policy $v$ is an $M \times M$ matrix $A^v$ whose $ij$-element $a_{ij}(v(i))$ gives the transition rate to state $j$ upon taking action $v(i)$ at state $i$. Now suppose the process corresponding to policy $v$ has $m$ recurrent classes $\Phi_1, \Phi_2, \dots, \Phi_m$ and an additional class $\Phi_{m+1}$ of transient states, so that $\Phi = \Phi_1 \cup \Phi_2 \cup \dots \cup \Phi_m \cup \Phi_{m+1}$. We can assume that the matrix $A^v$ has the canonical form

$$A^v = \begin{bmatrix} A_R^v & 0 \\ L_R^v & L_{m+1}^v \end{bmatrix}, \quad (1)$$

where $A_R^v = \mathrm{diag}(A_1^v, A_2^v, \dots, A_m^v)$, with each $A_i^v$ serving as the infinitesimal generator of recurrent class $\Phi_i$. Our goal is to choose a policy $v^*$ from $\Omega$ that minimizes a given criterion in expectation; that is, for any $v \in \Omega$, $\eta_\alpha^{v^*} \le \eta_\alpha^v$ under the discounted criterion ($\alpha > 0$ is the discount factor) or $\eta^{v^*} \le \eta^v$ under the average criterion. For every $i, j \in \Phi$, let $a_{ij}(v(i))$ be a continuous function defined on the compact set $D$. Since $\Phi$ is finite, we can select a constant $\mu \ge \max_{i \in \Phi,\, v(i) \in D}\{-a_{ii}(v(i))\}$. $A^v$ and the constant $\mu$ then yield a stochastic matrix $\bar P^v = A^v/\mu + I$, where $I$ is the identity matrix. In addition, $\bar P^v$ determines a uniformized Markov chain of the original continuous-time process, with discount factor $\beta = \mu/(\mu + \alpha)$.
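A minimal sketch of this uniformization step (with an assumed generator matrix whose rows sum to zero):

    import numpy as np

    def uniformize(A, mu=None):
        """Uniformization: turn generator A (rows summing to 0) into the
        stochastic matrix P_bar = A/mu + I, with mu >= max_i(-a_ii)."""
        if mu is None:
            mu = float(max(-A[i, i] for i in range(A.shape[0])))
        P_bar = A / mu + np.eye(A.shape[0])
        assert np.allclose(P_bar.sum(axis=1), 1.0) and (P_bar >= -1e-12).all()
        return P_bar, mu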

3 Multiple Sample Path-Based NDP Optimization by the Critic Algorithm

3.1 Learning of Performance Potentials

In this paper, we only consider the average criterion for multichain models. With $\alpha = 0$, let the performance potential vector $g^v = (g^v(1), g^v(2), \dots, g^v(M))^\tau$ satisfy $(-A^v + \mu P^v) g^v = f^v$, where $P^v$ is defined as a Cesàro limit of $\bar P^v$ [1]. By the definition of $\bar P^v$ and $g^v$, let

$$(I - \bar P^v + P^v)\, \bar g^v = f^v \quad (2)$$


with $\bar g^v = \mu g^v$ denoting the potential vector of the uniformized chain. By (2), we get

$$\bar g^v(i) = f(i, v(i)) + \bar P_i(v(i))\, \bar g^v - P_i^v\, \bar g^v \quad (3)$$

where $\bar P_i(v(i))$ and $P_i^v$ denote the $i$th rows of the matrices $\bar P^v$ and $P^v$, respectively. It is easy to prove $\eta^v = \mu P^v g^v = P^v \bar g^v$.

The main idea of NDP is to approximate certain values through parametric architectures. Here, we use a neural network to represent the potentials, and train the architecture parameters on sample paths by TD(λ) learning. TD(λ) learning is a multi-step method, where λ refers to the use of eligibility traces. Suppose the network output $\tilde g^v(i, r)$ is the approximation of $\bar g^v(i)$ for input $i$, where $r$ is the parameter vector of the network. Then the parametric TD formula for the potentials can be derived from (3) as follows:

$$d_n = d(X_n, X_{n+1}, r) = f(X_n, v(X_n)) - \tilde\eta^v(X_n) + \tilde g^v(X_{n+1}, r) - \tilde g^v(X_n, r) \quad (4)$$

where $\tilde\eta^v(X_n)$ is the estimate of the average cost $\eta^v(X_n)$. We consider accumulating traces for TD(λ) learning, which take the form

$$Z_n(i) = \begin{cases} \beta\lambda Z_{n-1}(i) + 1, & X_n = i \\ \beta\lambda Z_{n-1}(i), & \text{otherwise} \end{cases} \quad (5)$$

where $Z_n(i)$ denotes the trace of state $i$ at the $n$th decision epoch. (4) and (5) yield the following parameterized TD(λ) learning:

$$r := r + \gamma Z_n d_n \quad (6)$$

$$Z_n := \lambda Z_{n-1} + \nabla \tilde g^v(X_n, r) \quad (7)$$

where $\gamma$ is a step size and $\nabla$ denotes the gradient with respect to $r$.
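One TD(λ) step combining (4), (6), and (7) can be sketched as follows, with a linear feature map `phi` standing in for the paper's neural network and all argument names being our assumptions:

    import numpy as np

    def td_lambda_step(r, z, phi, x, x_next, cost, eta_est,
                       lam=0.2, gamma=0.05):
        """One parametric TD(lambda) update per (4), (6), (7).
        phi(i): feature vector of state i; r: parameters; z: trace."""
        g = lambda i: float(phi(i) @ r)            # approximate potential
        d = cost - eta_est + g(x_next) - g(x)      # temporal difference (4)
        z = lam * z + phi(x)                       # eligibility trace (7)
        r = r + gamma * d * z                      # parameter update (6)
        return r, z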

3.2 Difficulty of the Learning in Multichain Cases

When we simulate a multichain process, it must ultimately fall into one recurrent class and circulate in that class forever, so only part of the states can be visited by a single sample path. Therefore we have to restart the process often, or use other approaches to derive multiple sample paths, in order to approximate the potentials. Another important characteristic is that there is neither a unique stationary distribution nor a unique average cost, so that $\eta^v(X_n)$ may be different rather than identical for different states $X_n$. The learning or estimation of potentials and average costs thus becomes more difficult in comparison with ergodic or unichain processes. First, if we know the classification of states for any given policy, the learning is easier to treat. Since the average costs are identical for any two states of the same recurrent class, we only need $m$ units to store the $m$ average costs corresponding to $\Phi_1, \Phi_2, \dots, \Phi_m$. Each sample path then behaves like a unichain with recurrent class $\Phi_z$, $z \in \{1, 2, \dots, m\}$, and the average cost of that recurrent class, i.e., $\eta_z^v$, is estimated according to

$$\tilde\eta_z^v := (1 - \delta)\,\tilde\eta_z^v + \delta f(X_n, v(X_n)) \quad (8)$$


where $\delta$ denotes the step size. Note that, no matter which state the sample path starts from, (8) will generate a good approximation of $\eta_z^v$ after sufficiently many steps. For a transient state, its average cost is mainly determined by the average costs of all the recurrent classes and the ultimate transition probabilities to each recurrent class. The learning of average costs for the transient class at the end of a sample path can then take the form

$$\tilde\eta^v(X_n) := (1 - \delta)\,\tilde\eta^v(X_n) + \delta\,\tilde\eta_z^v \quad (9)$$

where $\delta$ can be viewed as the statistical probability of transition from $X_n$ to recurrent class $\Phi_z$. On the other hand, it is very difficult to deal with the situation where the multichain structure is unknown. The straightforward method is to memorize the average cost value for every state, or to directly find the classification for every policy when the model parameters are known. However, this is impractical in large-scale systems because of "the curse of dimensionality" and "the curse of modeling", and there is no good approach to overcome these obstacles in our learning. The only heuristic we may propose is to still use (8) for learning all states visited by a sample path, and to use the average of the values learned in past paths as the initial cost for the next sample path.

3.3 Critic Algorithm Based on Potential Learning

For an optimal policy $v^*$ of the uniformized chain of a multichain MDP, the average costs and potentials satisfy the system of two optimality equations, that is,

$$0 = \min_{v \in \Omega}\{\bar P^v \eta^{v^*} - \eta^{v^*}\} \quad \text{and} \quad 0 = \min_{v(i) \in B_i^*}\{f(i, v(i)) + \bar P_i(v(i))\,\bar g^{v^*} - \bar g^{v^*}(i) - \eta^{v^*}(i)\}$$

with $B_i^* = \{d \mid d \in D,\ \bar P_i(d)\,\eta^{v^*} = \eta^{v^*}(i)\}$. Obviously, these are similar to the optimality equations in [2,1]. The algorithms are as follows.

Algorithm 1. Policy Evaluation
Step 1. Select a neural network and initial parameters $r$, $\lambda$, and $\eta^v$.
Step 2. Select an integer $N$, let $n = 0$, and choose the initial state $X_n$.
Step 3. Simulate the next state $X_{n+1}$ according to $\bar P^v$.
Step 4. Calculate $\tilde\eta^v(X_n)$, $\tilde\eta_z^v$, and $r$ through (8) or (9), (4), (6), and (7).
Step 5. If $n < N$, let $n := n + 1$ and go to Step 3.
Step 6. If the stopping criterion is satisfied, exit; otherwise, go to Step 2.

Algorithm 2. Policy Improvement
Step 1. Let $k = 0$ and select an arbitrary initial policy $v_0$.
Step 2. Evaluate policy $v_k$ by estimating $\tilde\eta^{v_k}$ and $\tilde g^{v_k}$ through Algorithm 1.
Step 3. Try to improve the policy by the following substeps:
Substep 3.1. Choose a policy $\tilde v_{k+1}$ satisfying

$$\tilde v_{k+1} \in \arg\min_{v \in \Omega}\{\bar P^v \tilde\eta^{v_k}\} \quad (10)$$

Substep 3.2. For every $i \in \Phi$, if $\bar P_i(\tilde v_{k+1}(i))\,\tilde\eta^{v_k} = \tilde\eta^{v_k}(i)$, select an action $v_{k+1}(i)$ such that

$$v_{k+1}(i) \in \arg\min_{v(i) \in B_i^k}\left\{f(i, v(i)) + \bar P_i(v(i))\,\tilde g^{v_k}\right\} \quad (11)$$


where $B_i^k = \{d \mid d \in D,\ \bar P_i(d)\,\tilde\eta^{v_k} = \tilde\eta^{v_k}(i)\}$; otherwise, let $v_{k+1}(i) = \tilde v_{k+1}(i)$.
Step 4. If some stopping criterion is satisfied, exit; otherwise, let $k := k + 1$ and go to Step 2.
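A minimal skeleton of this evaluate-improve loop, with a hypothetical `evaluate` routine playing the role of Algorithm 1, a finite candidate action set, and the tie set $B_i^k$ taken relative to the greedy minimum, might look like:

    def policy_iteration(states, actions, P, f, evaluate, tol=1e-9, max_iter=50):
        """Skeleton of Algorithm 2. P(i, j, a): uniformized transition prob.;
        f(i, a): one-step cost; evaluate(v) -> (eta, g) via Algorithm 1."""
        v = {i: actions[0] for i in states}                     # Step 1
        for _ in range(max_iter):
            eta, g = evaluate(v)                                # Step 2
            v_new = {}
            for i in states:
                avg = lambda a: sum(P(i, j, a) * eta[j] for j in states)
                best = min(avg(a) for a in actions)             # Substep 3.1, (10)
                B = [a for a in actions if avg(a) - best < tol] # tie set B_i^k
                v_new[i] = min(B, key=lambda a: f(i, a) +       # Substep 3.2, (11)
                               sum(P(i, j, a) * g[j] for j in states))
            if v_new == v:                                      # Step 4
                break
            v = v_new
        return v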

4 A Numerical Example of an SMDP

Since a semi-Markov decision process can be treated as an equivalent continuous-time MDP by using an infinitesimal generator [7,8], our results can also be extended to multichain SMDPs. An example follows. Consider an SMDP with finite state space $\Phi = \{1, 2, \dots, 25\}$ and compact action set $D = [1, 5]$. There are two recurrent classes, $\Phi_1 = \{1, 2, \dots, 10\}$ and $\Phi_2 = \{11, 12, \dots, 20\}$, and a transient class $\Phi_3 = \{21, 22, \dots, 25\}$. For $i, j \in \Phi_1$, the transition probabilities of the underlying embedded Markov chain satisfy $p_{ij}(v(i)) = \exp(-v(i)/j)/[M(1 + \exp(-v(i)))]$ for $j \ne next_i$, and otherwise $p_{ij}(v(i)) = 1 - \sum_{j \ne next_i} p_{ij}(v(i))$, where $next_i$ denotes the next state of $i$ and $next_{10} = 1$. The sojourn time of state $i \in \Phi_1$ follows a third-order Erlang distribution with parameter $3v(i)$, and the performance cost of state $i$ satisfies $f(i, v(i)) = \ln[(1 + i)v(i)] + \sqrt{i}/(2v(i))$. For $i, j \in \Phi_2$, $p_{ij}(v(i)) = \exp(-v(i)/j)/\sum_{k=11}^{20}\exp(-v(i)/k)$, the sojourn time distribution is $F_{ij}(t, v(i)) = 1 - x_1 \exp(G_{v(i)} t)\, e$, and $f(i, v(i)) = \ln(i/v(i)) + (v(i) + 1)v(i)/i$, where $x_1 = [5/8, 3/8]$ and $G_{v(i)} = v(i)[-1, 0; 0, -3]$. For $i \in \Phi_3$ and $j \in \Phi$ with $j \ne i$, $p_{ij}(v(i)) = \exp[-(v(i) - 50/i)^2/j]/25$, and otherwise $p_{ij}(v(i)) = 1 - \sum_{j \ne i} p_{ij}(v(i))$; in addition, $F_{ij}(t, v(i)) = 1 - x_1 \exp(G_{v(i)} t)\, e$ and $f(i, v(i)) = 0.5 v^2(i) + v(i)/i$. For simplicity, we choose a BP network with a $5 \times 3 \times 1$ topology; the hidden layer is sigmoid and the output layer is linear. We first transform the SMDP to an equivalent MDP [7,8] and then, using the proposed critic algorithm, obtain the optimization results shown in Table 1. By the computation-based policy iteration algorithm, the average costs of recurrent classes $\Phi_1$ and $\Phi_2$ are 2.54140 and 2.37472 respectively, and the average costs of the transient states are 2.42300, 2.42177, 2.42068, 2.41970, and 2.41882 respectively, for stopping parameter $\varepsilon = 0.00001$; the whole computation takes only several seconds. The proposed NDP algorithm yields similar optimization results and needs less memory to store the potentials, although at the cost of much longer computation time. We also illustrate the optimization process in Fig. 1, which shows the generated sequence of average costs based on TD(0) learning; Fig. 2 shows the corresponding process for TD(λ) learning with λ = 0.2.

Table 1. Optimization results using the NDP-based algorithm

    ε        η^{v_ε}(i), i ∈ Φ1   η^{v_ε}(i), i ∈ Φ2   η^{v_ε}(i), i ∈ Φ3                             t_s (s)
    0.01     2.73396              2.37736              2.48065, 2.47802, 2.47567, 2.47358, 2.47170   176.8
    0.001    2.71005              2.37647              2.47309, 2.47063, 2.46844, 2.46648, 2.46473   238.4
    0.00001  2.62910              2.37982              2.45202, 2.45019, 2.44854, 2.44709, 2.44577   903.3


Fig. 1. Average costs under TD(0) learning (α = 0, λ = 0); curves for states 1-10, 11-20, and 21-25 versus iteration number

Fig. 2. Average costs under TD(λ) learning (α = 0, λ = 0.2); curves for states 1-10, 11-20, and 21-25 versus iteration number

5 Conclusions

Using multiple sample paths, we can solve the optimization problems of multi-chain MDPs through potential-based NDP, although this is more complex than for ergodic or unichain models. Many other issues remain worth discussing, such as robust decision schemes for uncertain multichain processes.

References

1. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994)
2. Cao, X.R., Guo, X.P.: A Unified Approach to Markov Decision Problems and Performance Sensitivity Analysis with Discounted and Average Criteria: Multichain Cases. Automatica 40 (2004) 1749-1759
3. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts (1996)
4. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA (1998)
5. Cao, X.R.: From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning. Discrete Event Dynamic Systems: Theory and Applications 13 (2003) 9-39
6. Tang, H., Yuan, J.B., Lu, Y., Cheng, W.J.: Performance Potential-Based Neuro-Dynamic Programming for SMDPs. Acta Automatica Sinica (in Chinese) 31 (2005) 642-645
7. Cao, X.R.: Semi-Markov Decision Problems and Performance Sensitivity Analysis. IEEE Trans. on Automatic Control 48 (2003) 758-769
8. Yin, B.Q., Xi, H.S., Zhou, Y.P.: Queueing System Performance Analysis and Markov Control Processes. Press of University of Science and Technology of China, Hefei (2004)

Research of an Omnibearing Sun Locating Method with Fisheye Picture Based on a Transform-Domain Algorithm

Xi-hui Wang¹, Jian-ping Wang², and Chong-wei Zhang²

¹ Hefei University of Technology, School of Electrical Engineering and Automation, Graduate Student; P.O. Box 545, HFUT South Campus, Hefei, Anhui Province, China
[emailprotected]
² Hefei University of Technology, School of Electrical Engineering and Automation, Professor

Abstract. In this paper, a novel method of locating the sun spot is presented. A mathematical transform-domain algorithm is used to emphasize the brightness of the sun area in a fisheye picture, so that this optical color-filter effect simulates the brightness sensitivity of human vision. The small sun region in the fisheye picture is segmented and transformed to a plane picture, instead of transforming the whole picture. The coordinates of the barycenter of the sun area are then easily mapped from the plane picture back to the fisheye picture, and the azimuth and vertical angles between the vision point and the sun are calculated. Experimental results show that the computational load of the algorithm is greatly reduced; it is accurate, fast, and real-time.

1 Introduction

In research on mobile sun tracking, one of the key technologies is to judge the position of the sun omnidirectionally, quickly, accurately, and dynamically. It is difficult to locate the sun dynamically by traditional latitude-and-longitude orientation; a wide-angle, three-dimensional view can hardly be obtained from a plane picture, and locating the sun with a multi-hole locating system is imprecise [1]. Observed from the ground, the track of the sun's movement is approximately a 0°-180° curve from east to west. The process of simulating how a human judges the position of the sun can be described in the following steps: first obtain a picture of the sky from the viewing point, then locate the brightest point or area in the picture, and finally calculate the azimuth and vertical angles between the vision point and the sun. A fast locating algorithm based on fisheye pictures is presented. A picture of the sky is captured by a fisheye lens from the vision point. The brightness sensitivity of human vision is simulated by emphasizing the high-brightness area in the fisheye picture with an (H, S, I) transform-domain algorithm. The small sun region in the fisheye picture is segmented and transformed to a plane picture instead of the whole picture. The coordinates of the barycenter of the sun area are then easily mapped from the plane picture back to the fisheye picture, and the azimuth and vertical angles between the vision point and the sun are calculated. Finally, omnidirectional, dynamic, fast tracking of the sun is achieved.


2 A Fast Judgment Algorithm for the Brightest Point in the Picture Based on a Transform-Domain Algorithm

It is assumed that the brightest point in the fisheye picture of the sky is the sun spot, and that the naked eye is more sensitive to brightness than to chroma. An optical color filter, which intensifies the brightness information of the picture while suppressing the chroma information of the background, is achieved by a mathematical transform-domain algorithm [2]. The transformation from an RGB picture to an HSI picture is shown below:

$$I = (R + G + B)/3, \qquad S = 1 - \frac{3\min(R, G, B)}{R + G + B} \quad (1)$$

$$H = \begin{cases} \theta, & G \ge B \\ 2\pi - \theta, & G < B \end{cases} \quad (2)$$

In function (2),

$$\theta = \cos^{-1}\left\{\frac{\tfrac{1}{2}\left[(R - G) + (R - B)\right]}{\sqrt{(R - G)^2 + (R - B)(G - B)}}\right\} \quad (3)$$

Only the three-dimensional vectors $F_h$, $F_s$, $F_i$ are needed to identify the space $F$; in other words, $F = \{F_h, F_s, F_i\}$, where $F_h = W_H H$, $F_s = W_S S$, $F_i = W_I I$, and $W_H$, $W_S$, $W_I$ are the three weights.

After analyzing the experimental statistics, in order to emphasize the sun area, which is strong in brightness and shows little change in color, we commonly define $W_H = 0.12$, $W_S = 0.14$, $W_I = 0.65$. In this way, the color of the background, which is weak in brightness, is suppressed, and the brightest point that we are interested in is emphasized. The position of the sun is a bright spot in the fisheye picture. In an HSI picture this spot can simply be marked as the pixel area satisfying both $H_{p_i} \le 50$ and $I_{p_i} \ge 150$:

$$P = \{p_i \mid H_{p_i} \le 50 \,\wedge\, I_{p_i} \ge 150\}, \quad i = 1, 2, \dots, N \quad (4)$$
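A compact sketch of the whole chain (1)-(4), assuming the thresholds $H \le 50$ and $I \ge 150$ are meant on degree and 8-bit intensity scales respectively:

    import numpy as np

    def hsi_sun_mask(rgb):
        """RGB -> HSI per (1)-(3), then threshold per (4) to mask the sun area.
        rgb: float array of shape (h, w, 3)."""
        R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        I = (R + G + B) / 3.0                                        # (1)
        S = 1.0 - np.minimum(np.minimum(R, G), B) / np.maximum(I, 1e-9)
        num = 0.5 * ((R - G) + (R - B))
        den = np.sqrt((R - G) ** 2 + (R - B) * (G - B)) + 1e-9
        theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))  # (3)
        H = np.where(G >= B, theta, 360.0 - theta)                   # (2)
        # S is part of the HSI transform, though the mask (4) uses only H and I
        return (H <= 50.0) & (I >= 150.0)                            # (4)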

It is hard to locate the accurate position of the center of the sun area in the fisheye picture because of the distortion of the fisheye picture, so it is necessary to transform the fisheye picture into a plane picture. Considering the tremendous distance between the vision point and the sun, it is only necessary to transform the small area of the aggregation (4).


3 The Transform Algorithm between the Fisheye Picture and the Plane Picture

In a fisheye picture $f(x, y)$ ($x = 1, 2, \dots, n$; $y = 1, 2, \dots, n$), the shooting angle is $\varphi$ (obtained from the handbook of the fisheye lens). From these parameters we can easily get the paraboloid equation of the fisheye lens [3]. The paraboloid equation is

$$\left(x - \sum x / T_t\right)^2 + \left(y - \sum y / T_t\right)^2 = \left(r_1 \cot\frac{\varphi}{2}\right)^2 - 2\, r_1 \cot\frac{\varphi}{2}\; z \quad (5)$$

In function (5), $T_t$ is the number of effective pixels, $\sum x$ and $\sum y$ are the first-order sums over the effective pixels, and $r_1 = |OE|$.

Fig. 1. Relationship between the fisheye picture and a plane picture at an arbitrary distance

The relationship between the fisheye picture and a plane picture at an arbitrary distance is shown in Fig. 1. $P_1$ is an arbitrary point of the sun area in the fisheye picture. With equation (5), it is easy to calculate the corresponding point $P_2$ on the elliptic paraboloid. Connecting point $O$ with point $P_2$ and prolonging the line $OP_2$ until it crosses the plane $ABCD$ at point $P_3$, the point $P_3$ is the unique point corresponding to $P_1$. Mapping every point of the sun-spot aggregation $P$ into the plane coordinate system yields the sun-spot area in the plane picture. In order to locate the position of the sun, it is necessary to calculate the barycenter of the sun area in the plane picture.

$$x_o = \frac{\sum_{i=0}^{N} x_{p_i}}{N}, \qquad y_o = \frac{\sum_{i=0}^{N} y_{p_i}}{N} \quad (6)$$

In equations (6), $x_{p_i}$ is the horizontal coordinate, $y_{p_i}$ is the vertical coordinate, and $(x_o, y_o)$ is the coordinate of the barycenter of the sun spot in the plane picture.


4 The Calculation of the Angles between the Sun and the Vision Point

The azimuth angle and the vertical angle are the two key elements in a tracking and locating system, and the aim of this paper is to find the functions of these two parameters. Because of the characteristics of the fisheye picture, it is easy to locate the position there, so it is necessary to transform $(x_o, y_o)$ back to coordinates in the fisheye picture; the transformation process follows Section 3. Suppose the coordinates in the fisheye picture corresponding to $(x_o, y_o)$ are $p_1(x_1, y_1)$.

Fig. 2. Point location in fisheye picture

As shown in Fig. 2, every point $p_1(x_1, y_1)$ in the two-dimensional projection plane $OXY$ has a corresponding point $p_2(x_2, y_2, z_2)$ on the paraboloid. According to equation (5), the function for $Z$ is

Z=

(

R 2 − (X 2 + Y 2 ) 2R

)

The relationship between p1 x1, y1 and

(7)

p2 ( x2 , y 2 , z 2 ) is showed below

x2 = x1 y 2 = y1 z2 =

(

R − x1 + y1 2R 2

2

2

(8)

)

Now it is safe and clear to conclude the functions of the azimuth angle vertical angle β

α

and

$$\beta = \arctan\frac{z_2}{x_2} = \arctan\frac{\dfrac{R^2 - (x_1^2 + y_1^2)}{2R}}{x_1}$$

$$\alpha = \arctan\frac{y_2}{\sqrt{x_2^2 + z_2^2}} = \arctan\frac{y_1}{\sqrt{x_1^2 + \left(\dfrac{R^2 - (x_1^2 + y_1^2)}{2R}\right)^2}} \quad (9)$$
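Folding (7)-(9) together, the two angles can be computed directly from the barycenter coordinates; the function below is a sketch with our own naming:

    import math

    def sun_angles(x1, y1, R):
        """Azimuth angle alpha and vertical angle beta per (9), from the sun
        barycenter (x1, y1) in the fisheye picture; R is the lens radius of (7)."""
        z2 = (R**2 - (x1**2 + y1**2)) / (2 * R)       # height on paraboloid, (8)
        beta = math.atan2(z2, x1)                     # vertical angle
        alpha = math.atan2(y1, math.hypot(x1, z2))    # azimuth angle
        return alpha, beta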

5 Experiment Analysis

Fig. 3 is the original picture captured by the fisheye lens. Fig. 4 is the result of processing Fig. 3 with the transform-domain algorithm, which achieves the optical color-filter effect. As seen from Fig. 4, the figure of the sun is clear.

Fig. 3. Original picture captured by the fisheye lens

Fig. 4. Picture after processing


6 Conclusion

The algorithm proposed in this paper for quickly determining the azimuth and vertical angles between the sun and the vision point has several advantages:

1. With a fisheye lens, the whole trace of the sun's movement is captured in one picture.
2. With the transform-domain algorithm, which achieves the optical color-filter effect, the high-brightness part of the picture is quickly and easily segmented.
3. Based on the relationship between the fisheye picture and the plane picture, only the high-brightness part needs to be transformed, which greatly reduces the amount of computation.
4. With the help of this algorithm, the azimuth and vertical angles between the sun and the vision point can be calculated quickly and accurately.

In conclusion, the merits of the algorithm presented in this paper are its low computational load and high accuracy; most importantly, it achieves mobile, omnidirectional, fast locating of the sun. Further key studies should simplify the algorithm and enhance its real-time characteristics.

References

1. Chen, S.E.: QuickTime VR: An Image-Based Approach to Virtual Environment Navigation. In: Proceedings of SIGGRAPH '95, Los Angeles, CA, USA (1995) 29-38
2. Wang, J.P., Qian, B., Jiang, T.: Research on the Segmentation Method of License Plate Based on Space Transform Analysis. Journal of Hefei University of Technology 27(3) (2004) 251-255
3. Wang, J.Y., Yang, X.Q., Zhang, C.M.: Environments of Full View Navigation Based on Pictures Taken by a Fisheye Lens. Journal of System Simulation, Vol. 13, Suppl. (2001) 66-68
4. Shah, S., Aggarwal, J.K.: A Simple Calibration Procedure for Fish-Eye (High Distortion) Lens Cameras. In: Proceedings of the IEEE International Conference on Robotics and Automation, San Diego, CA, USA, 3 (1994) 3422-3427



Vol. 309: Kumar, V.; Leonard, N.; Morse, A.S. (Eds.) Cooperative Control 301 p. 2005 [3-540-22861-6] Vol. 308: Tarbouriech, S.; Abdallah, C.T.; Chiasson, J. (Eds.) Advances in Communication Control Networks 358 p. 2005 [3-540-22819-5]

Vol. 307: Kwon, S.J.; Chung, W.K. Perturbation Compensator based Robust Tracking Control and State Estimation of Mechanical Systems 158 p. 2004 [3-540-22077-1] Vol. 306: Bien, Z.Z.; Stefanov, D. (Eds.) Advances in Rehabilitation 472 p. 2004 [3-540-21986-2] Vol. 305: Nebylov, A. Ensuring Control Accuracy 256 p. 2004 [3-540-21876-9] Vol. 304: Margaris, N.I. Theory of the Non-linear Analog Phase Locked Loop 303 p. 2004 [3-540-21339-2] Vol. 303: Mahmoud, M.S. Resilient Control of Uncertain Dynamical Systems 278 p. 2004 [3-540-21351-1]

Vol. 293: Chen, G. and Hill, D.J. Bifurcation Control 320 p. 2003 [3-540-40341-8] Vol. 292: Chen, G. and Yu, X. Chaos Control 380 p. 2003 [3-540-40405-8] Vol. 291: Xu, J.-X. and Tan, Y. Linear and Nonlinear Iterative Learning Control 189 p. 2003 [3-540-40173-3] Vol. 290: Borrelli, F. Constrained Optimal Control of Linear and Hybrid Systems 237 p. 2003 [3-540-00257-X] Vol. 289: Giarre, L. and Bamieh, B. Multidisciplinary Research in Control 237 p. 2003 [3-540-00917-5] Vol. 288: Taware, A. and Tao, G. Control of Sandwich Nonlinear Systems 393 p. 2003 [3-540-44115-8] Vol. 287: Mahmoud, M.M.; Jiang, J.; Zhang, Y. Active Fault Tolerant Control Systems 239 p. 2003 [3-540-00318-5]

Vol. 302: Filatov, N.M.; Unbehauen, H. Adaptive Dual Control: Theory and Applications 237 p. 2004 [3-540-21373-2]

Vol. 286: Rantzer, A. and Byrnes C.I. (Eds) Directions in Mathematical Systems Theory and Optimization 399 p. 2003 [3-540-00065-8]

Vol. 301: de Queiroz, M.; Malisoff, M.; Wolenski, P. (Eds.) Optimal Control, Stabilization and Nonsmooth Analysis 373 p. 2004 [3-540-21330-9]

Vol. 285: Wang, Q.-G. Decoupling Control 373 p. 2003 [3-540-44128-X]
