2016-01-24

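How do I read a CSV word by word in R?

I want to import CSV data into R. It is a single line of data with comma-separated entries. Dummy data is provided below: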

Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD 

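This is a single-line CSV. How do I read it correctly? This is only a dummy dataset; the dataset given to me has 35 variables and 10,000 observations. Can anyone provide me with the correct logic and the relevant code?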

Edit: the desired output is:

Id    SecoId  TertioID  CreateDate     Lat     Long    Duration  Istrue  JournalDate              Post
3232  123     345       30/04/14 2:00  11.726  11.728  5         FALSE   02/04/2014 05:02 +01:00  ABC
3233  124     346       30/04/14 3:00  11.789  11.779  6         TRUE    03/04/2014 06:00 +01:00  BCD

Logic I thought of:

1. Count the number of variables in the dataset.
2. Read the file word by word.
3. Store each value between commas in a cell of the table without altering the spaces inside it, i.e. the CreateDate value "30/04/14 2:00" is accepted as a single value.
4. The loop runs until the last variable is encountered; when it ends, a new row is created and the next observation is stored from there.
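The four steps above can be sketched in R roughly as follows. This is a minimal sketch, not a tested reader for the real 35-column file: it assumes that no field contains a comma and that the header names and the first field of each record (the Id) contain no spaces. The function name `parse_one_line_csv` is hypothetical.

```r
# Hypothetical helper sketching the word-by-word logic described in the question.
# Assumptions: no field contains a comma, and header names and the first field of
# every record contain no spaces, so each record boundary looks like "<last> <first>".
parse_one_line_csv <- function(line) {
  tokens <- strsplit(trimws(line), ",", fixed = TRUE)[[1]]
  # Step 1: the first token containing a space is the header/record boundary,
  # so its position is the number of variables k.
  k <- which(grepl(" ", tokens, fixed = TRUE))[1]
  pieces <- vector("list", length(tokens))  # preallocated, so the loop stays linear
  col <- 0
  # Steps 2-4: walk the tokens and start a new row after every k-th field.
  for (i in seq_along(tokens)) {
    col <- col + 1
    if (col == k && i < length(tokens)) {
      # Boundary token: split it into the last field of this row and the
      # first field of the next row.
      pieces[[i]] <- c(sub(" [^ ]*$", "", tokens[i]), sub("^.* ", "", tokens[i]))
      col <- 1  # the next row's first field has already been stored
    } else {
      pieces[[i]] <- tokens[i]
    }
  }
  m <- matrix(unlist(pieces), ncol = k, byrow = TRUE)
  out <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
  names(out) <- m[1, ]
  # Guess column types (integer, numeric, logical); date-times stay character.
  out[] <- lapply(out, type.convert, as.is = TRUE)
  out
}

line <- "Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD "
df <- parse_one_line_csv(line)
print(df)
```

The pieces are collected in a preallocated list rather than grown with `c()`, because growing a vector one token at a time becomes slow for a file with 10,000 rows and 35 columns.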

However, I have not been able to write working code for it.

If it has to be read word by word in R, can anyone help me with the code?


You should be able to achieve what you want by reading line by line, which is the default behavior of 'read.csv()'. –
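To illustrate the comment: assuming the same dummy records had been stored one per line, `read.csv()` reads them with no word-by-word logic at all. The `txt` string below is only a stand-in for such a file.

```r
# Stand-in for a properly line-broken file: one record per line.
txt <- "Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post
3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC
3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD"

# read.csv() splits fields on commas and rows on line breaks; spaces inside a
# field (e.g. "30/04/14 2:00") are kept as part of the value.
df <- read.csv(text = txt, stringsAsFactors = FALSE)
str(df)
```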


Please see the edit, @TimBiegeleisen –

Answers


Try this:

# The single line of data, as a string
vec <- "Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD"

# Insert a ";" delimiter where each line break should have been (a letter, a space,
# then a digit), then split the data into one string per line.
vec <- unlist(strsplit(gsub("([a-z]|[A-Z]) (\\d)", "\\1;\\2", vec), ";"))

# Split each line into its comma-separated fields
list.split <- strsplit(vec, ",")

# Write the fields out to a character matrix, one line per row
mat.out <- matrix(unlist(list.split), ncol = length(list.split[[1]]), nrow = length(list.split), byrow = TRUE)
# Use the first row as column names, then drop it (drop = FALSE keeps a matrix even with one record)
colnames(mat.out) <- mat.out[1, ]
mat.out <- mat.out[-1, , drop = FALSE]

Thanks @JackeJR, but I have a CSV file with 10,000 observations. –


You could use 'readLines' to read the whole file into a vector. – JackeJR
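A sketch of that suggestion, assuming the data sits in a file: `readLines()` reads it, the lines are pasted into one string, and the splitting steps from the answer run on the result. The temporary file only stands in for the real CSV, and the final `type.convert()` step is an addition for getting typed columns instead of a character matrix.

```r
# Stand-in for the real file: write the dummy single-line CSV to a temp file.
path <- tempfile(fileext = ".csv")
writeLines("Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD", path)

# readLines() returns one string per physical line; collapsing with a space
# gives a single string in case the file turns out to hold several lines.
vec <- paste(readLines(path), collapse = " ")

# The answer's steps: break into records at "letter space digit", then split fields.
vec <- unlist(strsplit(gsub("([a-z]|[A-Z]) (\\d)", "\\1;\\2", vec), ";"))
list.split <- strsplit(vec, ",")
mat.out <- matrix(unlist(list.split), ncol = length(list.split[[1]]), byrow = TRUE)

# Addition: turn the character matrix into a data frame with guessed column types.
df.out <- as.data.frame(mat.out[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df.out) <- mat.out[1, ]
df.out[] <- lapply(df.out, type.convert, as.is = TRUE)
str(df.out)
```

Note that the record split still relies on every record ending in a letter and the next one starting with a digit, as in the dummy data.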


@JackeJR That is my feeling too; I don't think the OP really needs to read one word at a time. –


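If inp is the single input line, compute the number of fields, k, and from that a pattern, pat, that matches them. Use gsub to insert a newline after each pattern match, and finally read the result using read.csv: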

k <- length(read.table(text = inp, comment.char = " ", sep = ",")) # no. of fields
pat <- sprintf("((.*?,){%d}.*? +)", k - 1) # pattern to match k fields
read.csv(text = gsub(pat, "\\1\n", inp), strip.white = TRUE, as.is = TRUE)
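To see what the pattern does, here is the same approach run step by step on the dummy line from the question. One deviation is labelled in the code: `perl = TRUE` is my own choice, added so that the lazy `.*?` quantifiers get Perl regex semantics; the answer itself uses R's default regex engine.

```r
inp <- "Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD"

# Count the fields: comment.char = " " makes read.table() ignore everything
# after the first space, so only the header names are read.
k <- length(read.table(text = inp, comment.char = " ", sep = ","))

# k - 1 comma-terminated fields, then the k-th field up to the space(s)
# that separate it from the next record.
pat <- sprintf("((.*?,){%d}.*? +)", k - 1)

# Insert a newline after every match (perl = TRUE: my assumption, not in the answer).
with.breaks <- gsub(pat, "\\1\n", inp, perl = TRUE)
cat(with.breaks, "\n")

# strip.white removes the trailing space left before each inserted newline.
df <- read.csv(text = with.breaks, strip.white = TRUE, as.is = TRUE)
print(df)
```

For the dummy line, `k` is 10 and `pat` is `((.*?,){9}.*? +)`, so a newline is inserted right after `Post ` and `ABC `. The last record ends without a trailing space, so no match is found there and it is left as is.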

If inp is the input line at the end of the question, the code above outputs this data frame:

  PulseId JourneyId TransmissionId    CreateDate      Lat      Long Speed
1  367515      3237              1 30/04/14 4:02 51.53749 -3.590589     7
2 3657521      3237              1 30/04/14 4:02 51.53704 -3.589859    11
3 3657522      3237              1 30/04/14 4:02 51.53695 -3.589748    12
  Heading HAccuracy Altitude VAccuracy DDuration DDistance DHeading  RSL
1     129        15       98         0         1  8.639347     1292 22.4
2     141        10       99         0         1 11.811534        1 22.4
3     144        10      100         0         1 12.805132        3 22.4
  RSLRoadTypeId RSLValidation RSLCountryId PulseTypeId IsNightTime Congestion
1             2             1          826           2       FALSE          0
2             5             1          826           2       FALSE          0
3             2             1          826           2       FALSE          0
  Idle AccelBrake  Cornering IsNearRailway IsSpeedValid Familiar IntLat3
1    0  0.2038734 1.60655912         FALSE         TRUE        1   51537
2    0  0.0000000 0.01957049         FALSE         TRUE        1   51537
3    0  0.1019367 0.06404887         FALSE         TRUE        1   51537
  IntLong3              LocalDateTime Smoothness PhoneId       PolicyId
1    -3591 30/04/2014 05:02:45 +01:00          2      43 4663627m000010
2    -3590 30/04/2014 05:02:51 +01:00          0      43 4663627m000010
3    -3590 30/04/2014 05:02:52 +01:00          1      43 4663627m000010
                          DevideId        DNA
1 829eba198fa483a49f14b66b8f1dadb5 0.04444444
2 829eba198fa483a49f14b66b8f1dadb5 0.04444444
3 829eba198fa483a49f14b66b8f1dadb5 0.04444444