2017-09-05 13 views
1

我有一个关于组合的问题,但在一个相当复杂的情况下,我还没有找到任何帮助。我试图找到一种方法来报告数据集中的所有可能组合。报告列的所有可能的组合

有关土地变化文献调查的数据报告,并指出每篇文章中报告哪些近似和潜在驱动因素。因此,这些行表示单独的文章,并且列表示所有邻近和下面的驱动程序。有六种类型的接近驱动程序和五种类型的基础驱动程序。对于每篇文章,将1放置在该文章中标识的驱动程序的列中,并在驱动程序的列中放置一个0。该表看起来大致像这样:

key | d1 | d2 |...| d6 | i1 |...| i5 | 
-------------------------------------- 
A1 | 1 | 0 |...| 1 | 1 |...| 0 | 
A2 | 0 | 1 |...| 0 | 0 |...| 1 | 

凡文章A1标识D1和D6直接驱动和I1作为间接驱动等

我想什么做的是找出文章的数量报告直接驱动因素,间接驱动因素以及直接驱动因素和间接驱动因素的所有可能组合。举例来说,有多少篇文章标识了d1,d2和i1;多少个确定d1,d2和i2;等等?我的学生在Excel文件中有表格,我在想,也许Calc或Base可能有一个功能来自动执行该过程。有没有人有一个想法,我该如何做到这一点?

谢谢!

+0

所以你想识别所有2^11组合,并计算每个组合的数量?这是2048种不同的组合。 –

+0

这就是为什么我希望简化这个过程。这个想法是确定哪些驱动程序组合在文献中最常出现。 – Napoletano

+0

使用一个小UDF将驱动程序组合成条件字符串(或连接二进制数字)。然后使用数据透视表来统计每个组合字符串的数量。 – MacroMarc

回答

0

我终于放弃了,并采取了蛮力的方法。我将表格导出为文本并将其拉入MySQL,然后使用bash脚本遍历选项。如果其他人有类似的问题,这里是bash脚本:

# Generate array containing factors 
faclis1=(d_inf d_com d_inm d_ind d_agr d_bos i_dem i_eco i_tec i_pol i_cul); 
#faclis=("d_inf" "d_com" "d_inm"); 
a=0 
#echo ${faclis[@]}; 


# Remove output file if exists 
if [ -e permcounts.txt ]; 
    then 
    rm permcounts.txt; 
    fi; 

# Cycle through list of factors 
for f1 in ${faclis1[@]}; 
do 
# only proceed if factor not null 
if [ ${f1} ]; 
then 
# print remaining array just to be sure 
echo "factor list is ${faclis1[@]}"; 
#echo ${faclis[@]}; 
echo "Now on factor ${f1}"; 
echo "FACTOR ${f1}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1;" metamorelia >> permcounts.txt; 
# create sub array without current factor, 2 factors 
faclis2=(${faclis1[@]/${f1}/}); 
#set sub-counter 
b=0 
#echo "${faclis2[@]}"; 
# loop through sub array, two factors 
for f2 in ${faclis2[@]}; 
do 
if [ ${f2} ] && \ 
[ "${f1}" != "${f2}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis3=(${faclis2[@]//${f2}}); 
c=0 
#echo "${faclis3[@]}"; 
# loop through sub-array 
for f3 in ${faclis3[@]}; 
do 
if [ ${f3} ] && \ 
[ "${f1}" != "${f3}" ] && \ 
[ "${f2}" != "${f3}" ]; 

then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis4=(${faclis3[@]//${f3}}); 
d=0 
#echo "${faclis4[@]}"; 
# loop through sub-array 
for f4 in ${faclis4[@]}; 
do 
if [ ${f4} ] && \ 
[ "${f1}" != "${f4}" ] && \ 
[ "${f2}" != "${f4}" ] && \ 
[ "${f3}" != "${f4}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis5=(${faclis4[@]//${f4}}); 
e=0 
#echo "${faclis5[@]}"; 
# loop through sub-array 
for f5 in ${faclis5[@]}; 
do 
if [ ${f5} ] && \ 
[ "${f1}" != "${f5}" ] && \ 
[ "${f2}" != "${f5}" ] && \ 
[ "${f3}" != "${f5}" ] && \ 
[ "${f4}" != "${f5}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4} \ 
AND ${f5}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1 and \ 
${f5} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis6=(${faclis5[@]//${f5}}); 
f=0 
#echo "${faclis6[@]}"; 
# loop through sub-array 
for f6 in ${faclis6[@]}; 
do 
if [ ${f6} ] && \ 
[ "${f1}" != "${f6}" ] && \ 
[ "${f2}" != "${f6}" ] && \ 
[ "${f3}" != "${f6}" ] && \ 
[ "${f4}" != "${f6}" ] && \ 
[ "${f5}" != "${f6}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4} \ 
AND ${f5} \ 
AND ${f6}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1 and \ 
${f5} = 1 and \ 
${f6} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis7=(${faclis6[@]//${f6}}); 
g=0 
#echo "${faclis7[@]}"; 
# loop through sub-array 
for f7 in ${faclis7[@]}; 
do 
if [ ${f7} ] && \ 
[ "${f1}" != "${f7}" ] && \ 
[ "${f2}" != "${f7}" ] && \ 
[ "${f3}" != "${f7}" ] && \ 
[ "${f4}" != "${f7}" ] && \ 
[ "${f5}" != "${f7}" ] && \ 
[ "${f6}" != "${f7}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4} \ 
AND ${f5} \ 
AND ${f6} \ 
AND ${f7}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1 and \ 
${f5} = 1 and \ 
${f6} = 1 and \ 
${f7} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis8=(${faclis7[@]//${f7}}); 
h=0 
#echo "${faclis8[@]}"; 
# loop through sub-array 
for f8 in ${faclis8[@]}; 
do 
if [ ${f8} ] && \ 
[ "${f1}" != "${f8}" ] && \ 
[ "${f2}" != "${f8}" ] && \ 
[ "${f3}" != "${f8}" ] && \ 
[ "${f4}" != "${f8}" ] && \ 
[ "${f5}" != "${f8}" ] && \ 
[ "${f6}" != "${f8}" ] && \ 
[ "${f7}" != "${f8}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4} \ 
AND ${f5} \ 
AND ${f6} \ 
AND ${f7} \ 
AND ${f8}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1 and \ 
${f5} = 1 and \ 
${f6} = 1 and \ 
${f7} = 1 and \ 
${f8} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis9=(${faclis8[@]//${f8}}); 
i=0 
#echo "${faclis9[@]}"; 
# loop through sub-array 
for f9 in ${faclis9[@]}; 
do 
if [ ${f9} ] && \ 
[ "${f1}" != "${f9}" ] && \ 
[ "${f2}" != "${f9}" ] && \ 
[ "${f3}" != "${f9}" ] && \ 
[ "${f4}" != "${f9}" ] && \ 
[ "${f5}" != "${f9}" ] && \ 
[ "${f6}" != "${f9}" ] && \ 
[ "${f7}" != "${f9}" ] && \ 
[ "${f8}" != "${f9}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4} \ 
AND ${f5} \ 
AND ${f6} \ 
AND ${f7} \ 
AND ${f8} \ 
AND ${f9}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1 and \ 
${f5} = 1 and \ 
${f6} = 1 and \ 
${f7} = 1 and \ 
${f8} = 1 and \ 
${f9} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis10=(${faclis9[@]//${f9}}); 
j=0 
#echo "${faclis10[@]}"; 
# loop through sub-array 
for f10 in ${faclis10[@]}; 
do 
if [ ${f10} ] && \ 
[ "${f1}" != "${f10}" ] && \ 
[ "${f2}" != "${f10}" ] && \ 
[ "${f3}" != "${f10}" ] && \ 
[ "${f4}" != "${f10}" ] && \ 
[ "${f5}" != "${f10}" ] && \ 
[ "${f6}" != "${f10}" ] && \ 
[ "${f7}" != "${f10}" ] && \ 
[ "${f8}" != "${f10}" ] && \ 
[ "${f9}" != "${f10}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4} \ 
AND ${f5} \ 
AND ${f6} \ 
AND ${f7} \ 
AND ${f8} \ 
AND ${f9} \ 
AND ${f10}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1 and \ 
${f5} = 1 and \ 
${f6} = 1 and \ 
${f7} = 1 and \ 
${f8} = 1 and \ 
${f9} = 1 and \ 
${f10} = 1;" metamorelia >> permcounts.txt; 

# next sub-array 
faclis11=(${faclis10[@]//${f10}}); 
k=0 
#echo "${faclis11[@]}"; 
# loop through sub-array 
for f11 in ${faclis11[@]}; 
do 
if [ ${f11} ] && \ 
[ "${f1}" != "${f11}" ] && \ 
[ "${f2}" != "${f11}" ] && \ 
[ "${f3}" != "${f11}" ] && \ 
[ "${f4}" != "${f11}" ] && \ 
[ "${f5}" != "${f11}" ] && \ 
[ "${f6}" != "${f11}" ] && \ 
[ "${f7}" != "${f11}" ] && \ 
[ "${f8}" != "${f11}" ] && \ 
[ "${f9}" != "${f11}" ] && \ 
[ "${f10}" != "${f11}" ]; 
then 
echo "FACTOR ${f1} \ 
AND ${f2} \ 
AND ${f3} \ 
AND ${f4} \ 
AND ${f5} \ 
AND ${f6} \ 
AND ${f7} \ 
AND ${f8} \ 
AND ${f9} \ 
AND ${f10} \ 
AND ${f11}" >> permcounts.txt; 
mysql -u harvey -pdavid -e "select count(clave) from genfact where \ 
${f1} = 1 and \ 
${f2} = 1 and \ 
${f3} = 1 and \ 
${f4} = 1 and \ 
${f5} = 1 and \ 
${f6} = 1 and \ 
${f7} = 1 and \ 
${f8} = 1 and \ 
${f9} = 1 and \ 
${f10} = 1 and \ 
${f11} = 1;" metamorelia >> permcounts.txt; 

unset faclis11[k]; 
k=$((${k} + 1)); 
fi; 
done; 
unset faclis10[j]; 
j=$((${j} + 1)); 
fi; 
done; 
unset faclis9[i]; 
i=$((${i} + 1)); 
fi; 
done; 
unset faclis8[h]; 
h=$((${h} + 1)); 
fi; 
done; 
unset faclis7[g]; 
g=$((${g} + 1)); 
fi; 
done; 
unset faclis6[f]; 
f=$((${f} + 1)); 
fi; 
done; 
unset faclis5[e]; 
e=$((${e} + 1)); 
fi; 
done; 
unset faclis4[d]; 
d=$((${d} + 1)); 
fi; 
done; 
unset faclis3[c]; 
c=$((${c} + 1)); 
fi; 
done; 
# Remove analyzed factors from vector 
unset faclis2[b]; 
b=$((${b} + 1)); 
fi; 
done; 
# remove nth item from array (progressively remove one item) 
unset faclis1[a]; 
# increment n for next round 
a=$((${a} + 1)); 
echo ${n}; 
fi; 
done; 

这个脚本是在有点低效我想我加入了很多不必要的操作,但它得到了这份工作完成。 (我想是的,我的学生必须导航输出文件,以确保一切都在那里。)