{"id":1624,"date":"2024-10-10T17:31:38","date_gmt":"2024-10-10T15:31:38","guid":{"rendered":"https:\/\/wordpress-test.app.u-pariscite.fr\/diip\/?p=1624"},"modified":"2024-10-21T17:12:30","modified_gmt":"2024-10-21T15:12:30","slug":"multiple-imputation-for-heterogeneous-biological-data","status":"publish","type":"post","link":"https:\/\/wordpress-test.app.u-pariscite.fr\/diip\/multiple-imputation-for-heterogeneous-biological-data\/","title":{"rendered":"Multiple imputation for heterogeneous biological data"},"content":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;3.22&#8243;][et_pb_row _builder_version=&#8221;3.22&#8243; background_size=&#8221;initial&#8221; background_position=&#8221;top_left&#8221; background_repeat=&#8221;repeat&#8221;][et_pb_column type=&#8221;1_4&#8243; _builder_version=&#8221;3.0.47&#8243;][et_pb_text _builder_version=&#8221;3.22.1&#8243; background_color=&#8221;#072c72&#8243; border_color_all=&#8221;#3255c9&#8243; text_orientation=&#8221;right&#8221; background_layout=&#8221;dark&#8221; custom_padding=&#8221;20px|15px|15px|&#8221; z_index_tablet=&#8221;500&#8243;]<\/p>\n<p><em>2022<\/em><\/p>\n<p><em>Masters Projects<\/em><\/p>\n<p><span data-sheets-root=\"1\">@Mathematics and Statistics<\/span><\/p>\n<p>[\/et_pb_text][et_pb_text _builder_version=&#8221;3.22.1&#8243; text_orientation=&#8221;right&#8221; z_index_tablet=&#8221;500&#8243;]<\/p>\n<p>+Computer Science<\/p>\n<p>+Biology<\/p>\n<p>+Medicine<\/p>\n<p>\u00a0<\/p>\n<p>#Multiple imputation<\/p>\n<p>#complex data<\/p>\n<p>#missing values<\/p>\n<p>#biology<\/p>\n<p>#heterogeneity<\/p>\n<p>#sequential imputation<\/p>\n<p>#clusterwise regression<\/p>\n<p>\u00a0<\/p>\n<p>[\/et_pb_text][\/et_pb_column][et_pb_column type=&#8221;3_4&#8243; _builder_version=&#8221;3.0.47&#8243;][et_pb_text _builder_version=&#8221;3.22.1&#8243; z_index_tablet=&#8221;500&#8243;]<\/p>\n<h3><strong>Project Summary<\/strong><\/h3>\n<p>Identifying clusters of individuals in 
heterogeneous data is a classical task in data mining. However, many cluster analysis methods do not address missing data. Missing data is a widely studied topic in the literature, but work on it remains limited in the context of clustering.<\/p>\n<p>Multiple imputation is an efficient way to tackle missing data, notably in the regression framework and, more recently, in clustering. Its principle is to replace each missing value with several plausible values. Denoting this number by M, the method generates M imputed (i.e. completed) data sets. In the linear regression framework, regression coefficients are then estimated from each imputed data set, leading to M sets of regression coefficients. Finally, these estimates are pooled using the so-called Rubin\u2019s rules, which provide a single point estimate together with a single estimate of its standard error.<\/p>\n<p>One classical way to generate imputed data is to impute variables one by one according to univariate regression models. This technique is well known in the medical field under the name fully conditional specification (FCS), and it is also very popular beyond this field. However, such methods are not designed to deal with heterogeneous data. Some work has recently been done along these lines, but it remains limited.<\/p>\n<p>In this project, we therefore propose a novel fully conditional specification method based on clusterwise linear regression instead of standard linear regression. Clusterwise regression amounts to building a collection of local regression models, so that each group of individuals is associated with its own regression model. In this work, the novel multiple imputation method is presented. Its properties are studied theoretically and assessed through an extensive simulation study.<\/p>\n<p>From a practical point of view, this method offers practitioners an efficient way to handle complex data in their medical studies. 
More generally, this multiple imputation method can be valuable for any cluster analysis with missing values (since clustering implicitly assumes heterogeneity between individuals), and it thus covers a broad range of applications in data science.<\/p>\n<p>\u00a0<\/p>\n<h5><strong>Matthieu Resche-Rigon<\/strong><\/h5>\n<p>\u00a0<\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row custom_margin=&#8221;120px||&#8221; _builder_version=&#8221;3.22.1&#8243; admin_label=&#8221;Row&#8221; locked=&#8221;off&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;3.22.1&#8243;][et_pb_divider _builder_version=&#8221;3.22.1&#8243;][\/et_pb_divider][et_pb_text admin_label=&#8221;\u00c0 lire aussi&#8221; _builder_version=&#8221;3.22.1&#8243; z_index_tablet=&#8221;500&#8243; locked=&#8221;off&#8221;]<\/p>\n<h2><span class=\"st\">Projects in the same discipline<br \/><\/span><\/h2>\n<p>[\/et_pb_text][et_pb_blog posts_number=&#8221;4&#8243; include_categories=&#8221;35&#8243; show_author=&#8221;off&#8221; show_date=&#8221;off&#8221; show_pagination=&#8221;off&#8221; module_id=&#8221;page_type_blog&#8221; _builder_version=&#8221;3.22.1&#8243; header_level=&#8221;h4&#8243; border_width_bottom_fullwidth=&#8221;1px&#8221; border_color_bottom_fullwidth=&#8221;rgba(51,51,51,0.18)&#8221; custom_padding=&#8221;||50px|&#8221; z_index_tablet=&#8221;500&#8243; locked=&#8221;off&#8221;][\/et_pb_blog][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>2022Masters Projects@Mathematics and Statistics +Computer Science+Biology+Medicine\u00a0#Multiple imputation#complex data#missing values#biology#heterogeneity#sequential imputation#clusterwise regression\u00a0 Project SummaryIdentifying clusters of individuals in heterogeneous data is a classical task in data mining. However, many cluster analysis methods do not address missing data. 
Missing data is a widely studied topic in the literature, but work on it remains limited in the context of&hellip; <a class=\"continue\" href=\"https:\/\/wordpress-test.app.u-pariscite.fr\/diip\/multiple-imputation-for-heterogeneous-biological-data\/\">Read more<span> Multiple imputation for heterogeneous biological data<\/span><\/a><\/p>\n","protected":false},"author":560,"featured_media":2263,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[55,1,29,35],"tags":[],"class_list":["post-1624","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-55","category-diip","category-masters-internship","category-mathematics-statistics"],"_links":{"self":[{"href":"https:\/\/wordpress-test.app.u-pariscite.fr\/diip\/wp-json\/wp\/v2\/posts\/1624","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wordpress-test.app.u-pariscite.fr\/diip\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wordpress-test.app.u-pariscite.fr\/diip\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wordpress-test.app.u-pariscite.fr\/diip\/wp-json\/wp\/v2\/users\/560"}],"replies":[{"embeddable":true,"href":"https:\/\/wordpress-test.app.u-pariscite.fr\/diip\/wp-json\/wp\/v2\/comments?post=1624"}],"version-history":[{"count":3,"href":"https:\/\/wordpress-test.app.u-pariscite.fr\/diip\/wp-json\/wp\/v2\/posts\/1624\/revisions"}],"predecessor-version":[{"id":2349,"href":"https:\/\/wordpress-test.app.u-pariscite.fr\/diip\/wp-json\/wp\/v2\/posts\/1624\/revisions\/2349"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wordpress-test.app.u-pariscite.fr\/diip\/wp-json\/wp\/v2\/media\/2263"}],"wp:attachment":[{"href":"https:\/\/wordpress-test.app.u-pariscite.fr\/diip\/wp-json\/wp\/v2\/media?parent=1624"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wordpress-test.app.u-pariscite.fr\/diip\/wp-json\/wp\/v2\/categories?post=1624"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wordpress-test.app.u-pariscite.fr\/diip\/wp-json\/wp\/v2\/tags?post=1624"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}