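I want to compute the Tanimoto coefficient (intersection over union of two sets) for pairs of diseases. Sample data for a single disease pair is below, where Disease 1 is NK cell defects and Disease 2 is adenylosuccinate lyase deficiency.
Set 1 is Disease 1 (NK cell defects), which contains all the genes in the Gene1 column.
Set 2 is Disease 2 (adenylosuccinate lyase deficiency), which contains all the genes in the Gene2 column.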
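For reference, the Tanimoto coefficient of two sets (also called the Jaccard index) is commonly written as below. The symbols $A$ and $B$ are labels introduced here for illustration: $A$ is assumed to be the set of distinct genes for Disease 1 and $B$ the set of distinct genes for Disease 2.

```latex
T(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
```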
| **Gene1** | **Gene2** | **Disease1** | **Disease2** |
|---|---|---|---|
| IMPDH1 | XDH | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | ADA | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | NPR1 | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | IMPDH1 | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | IMPDH2 | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | PPP3R2 | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | RRM1 | NK cell defects | Adenylosuccinate lyase deficiency |
| NPR1 | POLA1 | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | ITGAL | NK cell defects | Adenylosuccinate lyase deficiency |
| ITGAL | NPR1 | NK cell defects | Adenylosuccinate lyase deficiency |
| CASP3 | NPR1 | NK cell defects | Adenylosuccinate lyase deficiency |
| PTK2B | NPR1 | NK cell defects | Adenylosuccinate lyase deficiency |
| TNF | GUCY1A2 | NK cell defects | Adenylosuccinate lyase deficiency |
| PTK2B | GUCY1A2 | NK cell defects | Adenylosuccinate lyase deficiency |
Any suggestions on how to do this in MySQL or R?
Thanks,
Rohan
Can you define intersection and union in this context? Reproducible data would go a long way towards helping people answer. Try using `dput` on your data.frame. – TheComeOnMan
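Set 1 is Disease1, which contains all the genes in the Gene1 column, and Set 2 is Disease2, which contains all the genes in the Gene2 column. The intersection is the number of genes common to Gene1 and Gene2 (IMPDH1, PPP3R2, ITGAL, NPR1). The union is the total number of genes in the Gene1 and Gene2 columns. – Rgeek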
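The definitions above can be sketched in base R roughly as follows. This is a minimal sketch, not a definitive answer: it assumes the sample rows are loaded into a data frame named `df` (a hypothetical name), that duplicate genes within a column count only once, and that the union means the distinct genes across both columns. The helper `tanimoto()` is likewise introduced here for illustration.

```r
# Sample data from the question (one disease pair), typed in by hand.
df <- data.frame(
  Gene1 = c("IMPDH1", "PPP3R2", "PPP3R2", "PPP3R2", "PPP3R2", "PPP3R2", "PPP3R2",
            "NPR1", "PPP3R2", "ITGAL", "CASP3", "PTK2B", "TNF", "PTK2B"),
  Gene2 = c("XDH", "ADA", "NPR1", "IMPDH1", "IMPDH2", "PPP3R2", "RRM1",
            "POLA1", "ITGAL", "NPR1", "NPR1", "NPR1", "GUCY1A2", "GUCY1A2"),
  Disease1 = "NK cell defects",
  Disease2 = "Adenylosuccinate lyase deficiency",
  stringsAsFactors = FALSE
)

# Hypothetical helper: size of the intersection divided by size of the union.
# Assumption: each gene is counted once per set, so duplicates are dropped.
tanimoto <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

# Split the rows by (Disease1, Disease2) so the same code also works
# when the table holds more than one disease pair.
pairs <- split(df, list(df$Disease1, df$Disease2), drop = TRUE)

# One coefficient per disease pair, returned as a named numeric vector.
result <- sapply(pairs, function(p) tanimoto(p$Gene1, p$Gene2))
print(result)
```

Under these assumptions the sample pair has 7 distinct genes in Gene1, 10 in Gene2, and 4 in common, so the union has 13 genes and the coefficient is 4/13, about 0.308. If the union is instead meant to count repeated rows, the helper would need to change accordingly.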