I want to get percentage frequencies in PySpark. In Python (pandas) I do it as follows; how can I get the same percentage frequencies in PySpark?
Companies = df['Company'].value_counts(normalize = True)
Getting the raw frequencies is fairly straightforward:
# Companies in descending order of complaint frequency
df.createOrReplaceTempView('Comp')
CompDF = spark.sql("SELECT Company, count(*) as cnt \
FROM Comp \
GROUP BY Company \
ORDER BY cnt DESC")
CompDF.show()
+--------------------+----+
| Company| cnt|
+--------------------+----+
|BANK OF AMERICA, ...|1387|
| EQUIFAX, INC.|1285|
|WELLS FARGO & COM...|1119|
|Experian Informat...|1115|
|TRANSUNION INTERM...|1001|
|JPMORGAN CHASE & CO.| 905|
| CITIBANK, N.A.| 772|
|OCWEN LOAN SERVIC...| 481|
How do I get the percentage frequencies from here? I have tried a bunch of things without much luck. Any help would be appreciated.
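One way to get there is to divide each group's count by the grand total inside the same query, using an empty window frame (`sum(count(*)) OVER ()`). A minimal sketch, assuming the same `Comp` temp view registered above (the query is shown as a string so it can be passed to `spark.sql` exactly like the earlier example):

```python
# Sketch: percentage frequency per Company via a window over the whole
# result set; assumes the 'Comp' temp view created with
# df.createOrReplaceTempView('Comp') above.
pct_query = """
SELECT Company,
       count(*) AS cnt,
       100.0 * count(*) / sum(count(*)) OVER () AS pct
FROM Comp
GROUP BY Company
ORDER BY cnt DESC
"""
# Then run it as before:
# CompDF = spark.sql(pct_query)
# CompDF.show()
```

Equivalently, with the DataFrame API you can take `df.groupBy('Company').count()` and divide the `count` column by the total row count `df.count()` to get the same normalized frequencies.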
This is about using the total to compute the percentage. – Suresh
If you found an answer helpful, please accept it - thanks – desertnaut