From mchang21 at uiuc.edu Mon Mar 2 09:53:34 2009
From: mchang21 at uiuc.edu (Ming-Wei Chang)
Date: Mon, 02 Mar 2009 09:53:34 -0600
Subject: [nl-uiuc] Upcoming talk at the AIIS seminar
Message-ID:

Dear faculty and students,

A Ph.D. candidate in our department, Lin Tan, will give a talk (details below) for the AIIS seminar at 4:00 pm, Mar 5th (this Thursday). The room number is 3405. Hope to see you there!

/* Leveraging Code Comments to Improve Software Reliability */

Software reliability is critically important. This work focuses on addressing fundamental challenges of software reliability: obtaining accurate program specifications and discovering the limitations of tools and languages. In this talk, I will show that comments provide a great data source for obtaining important information, including specifications and problems of current tools and languages. First, I will present a novel approach, iComment, which is the first work to automatically extract specifications from comments written in natural language and use these specifications to detect comment-code inconsistencies, i.e., software bugs and bad comments. Our evaluation on large real-world software, such as the Linux kernel, Mozilla, Apache and Wine, and on 2 types of comments shows that iComment effectively extracted 1832 specifications and detected 60 new bugs and bad comments. iComment combines techniques from different areas, including natural language processing (NLP), machine learning, information retrieval, program analysis and statistics. To help explain the pros and cons of extracting specifications from comments compared to extracting specifications from code, I will briefly discuss AutoISES, which infers security specifications by statically analyzing source code and then directly uses these specifications to automatically detect security bugs/violations.
I will also briefly present cComment, which studies comment semantics and characteristics to further understand what other comments can be utilized, how we can utilize them, and what important problems/limitations they reveal. We discovered many interesting findings that can guide the design of new languages and tools for improving reliability, programmer productivity, software evolution, etc.

Bio:

Lin Tan is a Ph.D. candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Her research areas include software systems, software reliability and security, with a focus on using interdisciplinary techniques such as machine learning, data mining, computer architecture and program analysis to address systems reliability problems. She currently holds an IBM Ph.D. Fellowship. Her recent work on architectural support for intrusion detection has been successfully transferred and licensed since 2006, and was selected for IEEE Micro's Top Picks 2006.

Thank you,
Ming-Wei

From mchang21 at uiuc.edu Thu Mar 5 08:54:33 2009
From: mchang21 at uiuc.edu (Ming-Wei Chang)
Date: Thu, 05 Mar 2009 08:54:33 -0600
Subject: [nl-uiuc] (Reminder) AIIS seminar at 4:00 pm
In-Reply-To: (Ming-Wei Chang's message of "Mon, 02 Mar 2009 09:53:34 -0600")
References:
Message-ID:

Dear faculty and students,

This is a reminder for the AIIS seminar at 4:00 pm this afternoon (room 3405). A Ph.D. candidate in our department, Lin Tan, will give an interesting talk; the details are in the original announcement. Hope to see you there!

Best,
Ming-Wei

From juliahmr at cs.uiuc.edu Mon Mar 9 23:36:22 2009
From: juliahmr at cs.uiuc.edu (Julia Hockenmaier)
Date: Mon, 9 Mar 2009 23:36:22 -0500
Subject: [nl-uiuc] NLP lunch: Tutorial on Constrained Conditional Models for NLP
Message-ID: <0B8D5C42-3EF1-4E38-8FF2-CAE0A813DFF4@cs.uiuc.edu>

Dear all,

this week and next week, we will get a tutorial by Lev Ratinov, Ming-Wei Chang and Dan Roth about Constrained Conditional Models for NLP -- this is practice for their EACL tutorial, so please come and give them feedback!

The schedule for this semester is almost full -- http://nlp.cs.uiuc.edu/lunch.html but if you want to give a talk, please let me know!

Julia

Constrained Conditional Models for NLP
Ming-Wei Chang, Lev Ratinov, Dan Roth

Making decisions in natural language processing problems often involves assigning values to sets of interdependent variables where the expressive dependency structure can influence, or even dictate, what assignments are possible. This setting is of particular significance in structured learning problems such as semantic role labeling, named entity and relation recognition, co-reference resolution, transliteration, summarization and machine translation, but the approach has a broader set of applications such as textual entailment and question answering.
In all these cases, it is natural to either formulate the decision process as a constrained optimization problem, or to break up the complex problem into a set of subproblems and require solutions to be consistent modulo (possibly soft) constraints. In both cases, the resulting objective function is composed of learned models, subject to domain- or problem-specific constraints.

Constrained Conditional Models is a learning and inference framework that augments the learning of conditional (probabilistic or discriminative) models with declarative constraints (written, for example, using a first-order representation) as a way to support decisions in an expressive output space while maintaining modularity and tractability of training and inference. Models of this kind have recently attracted much attention within the NLP community.

Formulating problems as constrained optimization problems over the output of learned models has several advantages. It allows one to focus on the modeling of problems by providing the opportunity to incorporate problem-specific global constraints using a first-order language, freeing the developer from (much of the) low-level feature engineering, and it can also guarantee exact inference. It provides the freedom of decoupling the stage of model generation (learning) from that of the constrained inference stage, often resulting in simplifying the learning stage and the engineering problem of building an NLP system, while improving the quality of the solutions.

The primary goal of this tutorial is to introduce the framework of Constrained Conditional Models (CCMs) to the broader ACL community, motivate it as a generic framework for learning and inference in global NLP decision problems, present some of the key theoretical and practical issues involved in using CCMs, and survey some of its existing applications as a way to promote further development of the framework and additional applications.
The tutorial will thus be useful for many of the senior and junior researchers who have an interest in global decision problems in NLP, providing a concise overview of recent perspectives and research results.

From mchang21 at uiuc.edu Mon Mar 16 16:57:31 2009
From: mchang21 at uiuc.edu (Ming-Wei Chang)
Date: Mon, 16 Mar 2009 16:57:31 -0500
Subject: [nl-uiuc] Upcoming talk at the AIIS seminar
Message-ID:

Dear faculty and students,

A Ph.D. candidate in the CS department, Kevin Small, will give a talk (details below) for the AIIS seminar at 4:00 pm, Mar 19th (this Thursday). The room number is 3405. Hope to see you there!

Interactive Learning Protocols for Natural Language Applications

Statistical machine learning has become an integral technology for solving many informatics applications. In particular, corpus-based statistical techniques have emerged as the dominant paradigm for core natural language processing (NLP) tasks including parsing, machine translation, and information extraction. However, while supervised machine learning is well understood, its successful application to practical scenarios incurs significant costs associated with annotating large data sets and feature engineering.

In this talk, I will describe methods for reducing annotation costs and improving system performance through interactive learning protocols. The first part of the talk describes my research on active learning strategies for the structured output and pipeline model settings, two widely used models for complex application scenarios where obtaining labeled data is particularly expensive. Secondly, I will introduce the interactive feature space construction protocol, which uses a more sophisticated interaction to incrementally add application-targeted domain knowledge into the feature space to improve performance and reduce the need for labeled data.
I will also present empirical results for the semantic role labeling and named entity/relation extraction NLP tasks, demonstrating state-of-the-art performance with significantly reduced annotation requirements.

BIO:

Kevin Small is a Ph.D. candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign. His research interests are in the areas of machine learning, natural language processing, and artificial intelligence. At UIUC, he is a member of the Cognitive Computation Group under the direction of Professor Dan Roth. Kevin's primary research results concern using interactive learning protocols to improve the performance of machine learning algorithms while reducing sample complexity.

Best,
Ming-Wei

From mchang21 at uiuc.edu Thu Mar 19 11:25:03 2009
From: mchang21 at uiuc.edu (Ming-Wei Chang)
Date: Thu, 19 Mar 2009 11:25:03 -0500
Subject: [nl-uiuc] (Reminder) Upcoming talk at the AIIS seminar
In-Reply-To: (Ming-Wei Chang's message of "Mon, 16 Mar 2009 16:57:31 -0500")
References:
Message-ID:

Dear faculty and students,

This is a reminder for the talk by Kevin Small this afternoon. The talk will start at 4:00 pm in SC 3405. Hope to see you there!

Best,
Ming-Wei

From mchang21 at uiuc.edu Mon Mar 30 13:23:19 2009
From: mchang21 at uiuc.edu (Chang Ming-Wei)
Date: Mon, 30 Mar 2009 21:23:19 +0300
Subject: [nl-uiuc] Upcoming Talk in the AIIS Seminar
Message-ID: <6ac2aea20903301123m39813db5v8da3763f08b13714@mail.gmail.com>

Dear faculty and students,

A Ph.D. candidate in the CS department, Alexandre Klementiev, will give a talk (details below) for the AIIS seminar at 4:00 pm, Apr 2nd (this Thursday). The room number is 3405. Hope to see you there!

Title: Learning with Incidental Supervision

Abstract: Moving toward understanding and automatic generation of natural human languages requires a toolbox of core capabilities. It is well accepted today that it is essentially impossible to manually encode many of these capabilities without the aid of machine learning techniques, which automatically acquire them from available natural language data. Corpus-based supervised learning has emerged as the dominant approach, and it relies crucially on the availability of labeled data. However, while unsupervised data is usually plentiful, its annotation is a laborious process for a number of realistic Natural Language Processing tasks, especially those dealing with structured output spaces.

In this talk, I will argue that it is often possible to derive a surrogate supervision signal from a small amount of background knowledge and often plentiful weakly structured unsupervised data. We call this setting "learning with incidental supervision", and study it in the context of the following tasks. First, we consider the problem of Named Entity (NE) annotation transfer to a resource-poor language in a bilingual corpus.
We demonstrate that temporal similarity of NE counterparts across languages can be used as an incidental supervision signal to drive learning of a discriminative transliteration model. Second, we consider the task of unsupervised aggregation of structured output models. We demonstrate for ranked data that agreement between constituent models can serve as an incidental supervision signal sufficient to learn an effective aggregation model.

Bio: Alexandre Klementiev is a Ph.D. candidate at the UIUC Department of Computer Science working with Prof. Dan Roth. His research interests are at the intersection of Machine Learning and Natural Language Processing. More specifically, he is interested in weakly supervised learning problems in NLP, multilingual information extraction, and information fusion.