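I have a CSV file with one column of dates and another column of Twitter follower counts. I want to calculate the month-over-month growth rate of Twitter followers, but the dates may not be exactly 30 days apart. So, if I iterate over the date column in the CSV to calculate the growth rate for each 30-day span, and I have:
- March 10, 2016 with 200 followers
- February 8, 2016 with 195 followers
- January 1, 2016 with 105 followers
How can I iterate over the rows to produce the month-over-month growth rate? I have tried working with pandas but am having difficulty. I thought about using R for this, but I would rather do it in Python, since I will be writing the data out to a new CSV from Python.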
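My team and I use the function below to solve this problem. The code:
def compute_mom(data_list):
    # Pair each value with the one before it: (data[1], data[0]), (data[2], data[1]), ...
    list_tuple = zip(data_list[1:], data_list)
    # Percentage change of each value relative to its predecessor
    raw_mom_growth_rate = [((float(nxt) - float(prev)) / float(prev)) * 100 for nxt, prev in list_tuple]
    return [round(mom, 2) for mom in raw_mom_growth_rate]
Hope this helps.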
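As a quick check of the function above, here is a minimal usage sketch with the three follower counts from the question. It assumes the list is ordered oldest to newest. Note that the function ignores the dates entirely, so unevenly spaced samples are treated as if they were exactly one month apart.

```python
def compute_mom(data_list):
    # Pair each value with its predecessor
    pairs = zip(data_list[1:], data_list)
    raw = [((float(nxt) - float(prev)) / float(prev)) * 100 for nxt, prev in pairs]
    return [round(mom, 2) for mom in raw]

# Follower counts from the question, oldest first (ordering is an assumption)
followers = [105, 195, 200]
growth = compute_mom(followers)
print(growth)  # [85.71, 2.56]
```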
Here is an approach using a defaultdict:
import csv
from collections import defaultdict
from datetime import datetime
path = "C:\\Users\\USER\\Desktop\\YOUR_FILE_HERE.csv"
with open(path, "r") as f:
    # Sum the follower values per (year, month)
    d = defaultdict(int)
    rows = csv.reader(f)
    for dte, followers in rows:
        dte = datetime.strptime(dte, "%Y-%m-%d")
        d[dte.year, dte.month] += int(followers)
print(d)
to_date_followers = 0
for (year, month) in sorted(d):
    # NOTE: d is keyed by (year, month), but this tuple is (month, year),
    # so the d.get() lookup below never matches and old_followers is always 0.
    # The tuple order is kept because the message below prints it as "month, year".
    last_month_and_year = (12, year-1) if month == 1 else (month-1, year)
    old_followers = d.get(last_month_and_year, 0)
    new_followers = d[year, month]
    to_date_followers += new_followers
    print("%d followers gained in %s, %s resulting in a %.2f%% increase from %s (%s followers to date)" % (
        new_followers-old_followers, month, year, new_followers*100.0/to_date_followers, ', '.join(str(x) for x in last_month_and_year), to_date_followers
    ))
For the following input:
2015-12-05,10
2015-12-31,10
2016-01-01,105
2016-02-08,195
2016-03-01,200
2016-03-10,200
2017-03-01,200
it prints:
defaultdict(<type 'int'>, {(2015, 12): 20, (2016, 1): 105, (2016, 3): 400,
(2017, 3): 200, (2016, 2): 195})
20 followers gained in 12, 2015 resulting in a 100.00% increase from 11, 2015 (20 followers to date)
105 followers gained in 1, 2016 resulting in a 84.00% increase from 12, 2015 (125 followers to date)
195 followers gained in 2, 2016 resulting in a 60.94% increase from 1, 2016 (320 followers to date)
400 followers gained in 3, 2016 resulting in a 55.56% increase from 2, 2016 (720 followers to date)
200 followers gained in 3, 2017 resulting in a 21.74% increase from 2, 2017 (920 followers to date)
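If you want a month-to-month comparison, just work with the difference from one month to the next (rather than the ratio of new followers to the running total). This dictionary has all the data you need, namely how many followers there were in a given year and month. You just need to do the calculation on that data. – Bahrom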
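The month-to-month calculation suggested in that comment could be sketched roughly as follows. The function name `month_over_month` is hypothetical, and the input reuses the dictionary values printed above purely as sample data. Months whose previous calendar month is missing (or zero) get `None` instead of a percentage, which is one possible choice rather than the only one.

```python
def month_over_month(monthly):
    """monthly: dict mapping (year, month) -> follower value for that month."""
    changes = {}
    for (year, month) in sorted(monthly):
        # Key of the previous calendar month, in the same (year, month) order
        prev_key = (year - 1, 12) if month == 1 else (year, month - 1)
        prev = monthly.get(prev_key)
        if not prev:
            # No data (or zero) for the previous month: growth is undefined
            changes[(year, month)] = None
        else:
            current = monthly[(year, month)]
            changes[(year, month)] = round((current - prev) * 100.0 / prev, 2)
    return changes

monthly = {(2015, 12): 20, (2016, 1): 105, (2016, 2): 195,
           (2016, 3): 400, (2017, 3): 200}
result = month_over_month(monthly)
for key in sorted(result):
    print(key, result[key])
```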
Thank you very much for the replies. I was able to come up with the code below, which does what I was hoping for (I didn't expect to manage it, but I stumbled on the right function at the right time):
import pandas as pd
df = pd.read_csv('file_name.csv', sep=',')
# This converts our date strings to datetime objects
df['Date'] = pd.to_datetime(df['Date'])
# But we only want the date, so we strip the time part
df['Date'] = df['Date'].dt.date
# This allows us to iterate through the rows in a pandas dataframe
# (rows are expected newest first, so start_date is the later date)
for index, row in df.iterrows():
    if index == 0:
        start_date = df.iloc[0]['Date']
        Present = df.iloc[0]['Count']
        continue
    # This assigns the date of the row to the variable end_date
    end_date = df.iloc[index]['Date']
    delta = start_date - end_date
    # If at least 30 days have elapsed (delta is a timedelta, so compare its .days)
    if delta.days >= 30:
        print("Start Date: {}, End Date: {}, delta is {}".format(start_date, end_date, delta))
        Past = df.iloc[index]['Count']
        # float() avoids integer division on Python 2
        percent_change = ((Present - Past) / float(Past)) * 100
        # df.set_value is deprecated; assign through .loc instead
        df.loc[index, 'MoM'] = percent_change
        # Sets a new start date and new TW FW count
        start_date = df.iloc[index]['Date']
        Present = df.iloc[index]['Count']
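For comparison, the same 30-day loop can be sketched with only the standard library. This is an illustration under assumptions, not a replacement for the pandas version: the function name `thirty_day_changes` and its `min_days` parameter are hypothetical, and the rows are assumed to be sorted newest first, as in the question.

```python
from datetime import date

def thirty_day_changes(rows, min_days=30):
    """rows: list of (date, count) tuples sorted newest first."""
    results = []
    start_date, present = rows[0]
    for end_date, past in rows[1:]:
        delta = start_date - end_date
        # Only compare once at least min_days have elapsed
        if delta.days >= min_days:
            percent_change = (present - past) * 100.0 / past
            results.append((end_date, start_date, round(percent_change, 2)))
            # The older row becomes the new reference point
            start_date, present = end_date, past
    return results

rows = [(date(2016, 3, 10), 200),
        (date(2016, 2, 8), 195),
        (date(2016, 1, 1), 105)]
changes = thirty_day_changes(rows)
for older, newer, pct in changes:
    print("{} -> {}: {}%".format(older, newer, pct))
```

With the question's data, the gaps are 31 and 38 days, so both intervals qualify.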
Could you give us a sample input and a sample output (in CSV format)? – Bahrom
'm = (y2 - y1)/(x2 - x1)', so wouldn't you just do 'rate = (followers - prev_fol)/(time - prev_time)'? That would represent the change in followers over time for any interval –
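The slope formula in that comment could be sketched as follows, measuring time in days. The function name `followers_per_day` is hypothetical, and the samples are assumed to be sorted oldest first with distinct dates (two rows on the same date would divide by zero).

```python
from datetime import date

def followers_per_day(samples):
    """samples: list of (date, followers) tuples sorted oldest first."""
    rates = []
    for (prev_time, prev_fol), (time, followers) in zip(samples, samples[1:]):
        days = (time - prev_time).days
        # rate = (followers - prev_fol) / (time - prev_time)
        rates.append((followers - prev_fol) / float(days))
    return rates

samples = [(date(2016, 1, 1), 105),
           (date(2016, 2, 8), 195),
           (date(2016, 3, 10), 200)]
rates = followers_per_day(samples)
print(rates)  # roughly 2.37 and 0.16 followers per day
```

Multiplying a rate by 30 gives an absolute followers-per-30-days figure, which is a different quantity from a percentage growth rate.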
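I think it really depends on what a "month-over-month" growth rate is supposed to mean. Suppose you want the instantaneous growth rate, normalized to a 30-day month? – Paul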
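One possible reading of "instantaneous growth rate, normalized to a 30-day month" is to assume continuous compounding between two observations and rescale the result to 30 days. The sketch below takes that reading; it is an assumption about what the comment means, and the function name `growth_per_30_days` is hypothetical.

```python
import math
from datetime import date

def growth_per_30_days(prev_date, prev_fol, cur_date, cur_fol):
    """Percentage growth over 30 days, assuming continuous compounding."""
    days = (cur_date - prev_date).days
    # Continuous (log) growth rate per day over the observed interval
    daily_log_rate = math.log(float(cur_fol) / prev_fol) / days
    # Rescale to a 30-day period and convert back to a percentage
    return (math.exp(daily_log_rate * 30) - 1) * 100

# January 1 to February 8, 2016 is 38 days: 105 -> 195 followers
pct = growth_per_30_days(date(2016, 1, 1), 105, date(2016, 2, 8), 195)
print(round(pct, 2))  # about 63, lower than the raw 85.71% over 38 days
```

When the two dates are exactly 30 days apart, this reduces to the ordinary percentage change.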