2017-06-17 38 views

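I want to run the same code repeatedly for different sets of macro variables, as I would in SAS, and then append all of the resulting tables together. Coming from a SAS background, I am confused about how to do this in a PySpark environment. Any help is much appreciated! How do I loop over macro variables in PySpark the way I would in SAS?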

Example code is below:

Step 1: Define the macro variables

lastyear_st=201615 
lastyear_end=201622 

thisyear_st=201715 
thisyear_end=201722 

Step 2: Loop through the various macro variables

Code
customer_spend=sqlContext.sql(""" 
select a.customer_code, 
sum(case when a.week_id between %d and %d then a.spend else 0 end) as spend 
from tableA 
group by a.card_code 
""" 
%(lastyear_st,lastyear_end) 
(thisyear_st,thisyear_end)) 

Step 3: Append each of the populated datasets above to a base table

Answer

import os
import tempfile

# macroVars holds your start and end values as a list of lists,
# where each inner list contains one start value and one end value.
macroVars = [[201615, 201622], [201715, 201722]]

# Create the output location once, so every iteration appends to the same path.
output_path = os.path.join(tempfile.mkdtemp(), 'data')

# Loop through the list of lists.
for start, end in macroVars:

    # Prepare the query using the values of start and end.
    # The table is aliased as "a", and the GROUP BY uses the selected column.
    query = """
        SELECT a.customer_code,
               SUM(CASE WHEN a.week_id BETWEEN {} AND {} THEN a.spend ELSE 0 END) AS spend
        FROM tablea a
        GROUP BY a.customer_code
    """.format(start, end)

    # Execute the query.
    customer_spend = sqlContext.sql(query)

    # Depending on how your base table is set up, use the appropriate write command, for example:
    customer_spend.write.mode('append').parquet(output_path)
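
If the goal is a single combined DataFrame rather than appended files, the per-period results could also be collected and stacked in memory. The following is only a sketch under some assumptions: it assumes Spark 2.0 or later, where `DataFrame.union` is available (Spark 1.x uses `unionAll` instead), and it assumes the table is aliased as `a` and grouped by `customer_code`. `build_queries` and `run_and_union` are hypothetical helper names, not part of PySpark.

```python
from functools import reduce

# Query template with named placeholders for the period boundaries.
# Assumption: tablea is aliased as "a" and grouped by customer_code.
QUERY_TEMPLATE = """
SELECT a.customer_code,
       SUM(CASE WHEN a.week_id BETWEEN {start} AND {end} THEN a.spend ELSE 0 END) AS spend
FROM tablea a
GROUP BY a.customer_code
"""


def build_queries(periods):
    """Return one SQL string per (start, end) pair."""
    return [QUERY_TEMPLATE.format(start=start, end=end) for start, end in periods]


def run_and_union(sql_context, periods):
    """Run each query and stack the results into one DataFrame.

    Assumes sql_context exposes .sql() and that the returned DataFrames
    support .union() (Spark 2.0+; older versions use .unionAll()).
    """
    frames = [sql_context.sql(q) for q in build_queries(periods)]
    return reduce(lambda left, right: left.union(right), frames)


# Building the queries needs no Spark session, so it can be checked on its own.
queries = build_queries([[201615, 201622], [201715, 201722]])
```

With a live session, `run_and_union(sqlContext, macroVars)` would return the stacked result, which can then be written once instead of once per iteration.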

Hi Pushkr, thank you. Can I also use string values in the list? I mean, could it be [['a','b','c'],[1,2,'x']] and so on? –

Yes, you can use strings as well – Pushkr
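
One thing to watch with string values is that `format` inserts them verbatim, so a string meant as a SQL literal needs its own quotes. A minimal sketch, assuming a hypothetical helper `sql_literal` and hypothetical column names `col1`, `col2`, `col3` (none of these come from the original question):

```python
def sql_literal(value):
    """Render a Python value as a SQL literal: quote strings, leave numbers bare."""
    if isinstance(value, str):
        # Double any embedded single quotes, then wrap the value in single quotes.
        return "'" + value.replace("'", "''") + "'"
    return str(value)


# Parameter sets mixing strings and numbers, as in the comment above.
macroVars = [['a', 'b', 'c'], [1, 2, 'x']]

# Hypothetical query; col1..col3 are placeholder column names.
template = "SELECT * FROM tablea WHERE col1 = {} AND col2 = {} AND col3 = {}"

queries = [template.format(*[sql_literal(v) for v in row]) for row in macroVars]
for q in queries:
    print(q)
```

Manual quoting like this is only reasonable for trusted, hard-coded values; it is not a safeguard against untrusted input.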

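Can I also define a macro variable separately, outside the array, and then reference it inside the array? For example: a = """spend > 0 then 1 else 0 end""" with [[a,1,2],[a,2,4]] –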
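
Since a Python list simply holds references to whatever objects are placed in it, a string defined once can be reused in every inner list. A rough sketch of that idea, assuming the fragment is meant to be a complete `CASE` expression and using a hypothetical output column name `active_weeks`:

```python
# A reusable SQL fragment, defined once outside the list.
# Assumption: the fragment is a full CASE expression on a.spend.
flag_expr = "CASE WHEN a.spend > 0 THEN 1 ELSE 0 END"

# Each inner list references the same fragment plus its own start and end values.
macroVars = [[flag_expr, 1, 2], [flag_expr, 2, 4]]

# Named placeholders keep the fragment and the numeric bounds easy to tell apart.
template = (
    "SELECT a.customer_code, "
    "SUM(CASE WHEN a.week_id BETWEEN {start} AND {end} THEN {expr} ELSE 0 END) AS active_weeks "
    "FROM tablea a GROUP BY a.customer_code"
)

queries = [
    template.format(expr=expr, start=start, end=end)
    for expr, start, end in macroVars
]
```

Each resulting string can be passed to `sqlContext.sql` in the same way as in the loop from the answer.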