How do I read a CSV word by word in R?

I want to import CSV data into R. It is a single line of data with comma-separated entries. Dummy data is provided below:

Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD

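This is a single-line CSV. How do I read it correctly? This is just a dummy data set; the data set given to me has 35 variables and 10000 observations. Can anyone provide the correct logic and the relevant code?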

Edit: the required output is:

Id SecoId TertioID CreateDate Lat  Long Duration Istrue  JournalDate   Post 
3232 123  345 30/04/14 2:00 11.726 11.728 5  FALSE 02/04/2014 05:02 +01:00 ABC 
3233 124  346 30/04/14 3:00 11.789 11.779 6  TRUE 03/04/2014 06:00 +01:00 BCD 

Logic I thought of:

1. Count the number of variables in the dataset.
2. Read the file word by word.
3. Store each value between "," in a cell of the table without altering the spaces inside a value, i.e. the CreateDate value "30/04/14 2:00" is accepted as a single value.
4. The loop runs until the last variable is encountered. When the loop ends, a new row is created and the next observation is stored from there.
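The four steps above can be sketched roughly as follows. This is only one possible reading of that logic. It assumes that the number of variables `k` is known in advance and that the first field of each record (here `Id`) never contains a space. The names `tokens`, `boundary` and `df` are made up for illustration.

```r
# Dummy single-line input from the question
inp <- "Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD"
k <- 10  # step 1: number of variables (assumed to be known)

# Steps 2-3: take everything between commas as one token, spaces included
tokens <- strsplit(inp, ",", fixed = TRUE)[[1]]

# Step 4: every (k - 1)-th token after the first straddles two rows,
# e.g. "Post 3232" is the last field of one row plus the first of the next
n_rows   <- (length(tokens) - 1) / (k - 1)      # header row + data rows
boundary <- seq_len(n_rows - 1) * (k - 1) + 1

fields <- as.list(tokens)
fields[boundary] <- lapply(tokens[boundary], function(tok) {
  pos <- regexpr(" [^ ]*$", tok)                # position of the last space
  c(substr(tok, 1, pos - 1), substr(tok, pos + 1, nchar(tok)))
})

# Fill a k-column matrix row by row; the first row holds the column names
mat <- matrix(unlist(fields), ncol = k, byrow = TRUE)
df  <- as.data.frame(mat[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- mat[1, ]
df[] <- lapply(df, type.convert, as.is = TRUE)  # guess column types
```

Splitting the boundary tokens at their last space is what keeps values such as "30/04/14 2:00" intact, because those never sit on a row boundary.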

However, I have not managed to write working code for this.

Can anyone help me read the file word by word in R?

Source

2016-01-24 desmond.carros

You should be able to get what you want by reading line by line, which is the default behaviour of `read.csv()`.

Please read the edit, @TimBiegeleisen.

Try this:

# Read in data 
vec <- "Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD" 

# Put in ";" where the line breaks should have been, then split into one string per line.
# This assumes each row ends with a letter and the next row starts with a digit.
vec <- unlist(strsplit(gsub("([A-Za-z]) (\\d)", "\\1;\\2", vec), ";"))

# Process data for each column 
list.split <- strsplit(vec, ",") 

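# Write out the data to a matrix; the first row holds the column names
mat.out <- matrix(unlist(list.split), ncol = length(list.split[[1]]), nrow = length(list.split), byrow = TRUE)
colnames(mat.out) <- mat.out[1, ]
mat.out <- mat.out[-1, , drop = FALSE] # drop = FALSE keeps a matrix even with a single data row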

Source

2016-01-24 12:01:09 JackeJR

Thanks @JackeJR, but I have a CSV file with 10000 observations.

You can use `readLines` to read the whole file into a vector. – JackeJR
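A minimal sketch of that suggestion, assuming the data sits in a file on disk. The temporary file and its tiny two-column content are hypothetical stand-ins for the real data, and the split reuses the letter-space-digit rule from the answer above.

```r
# Hypothetical file standing in for the real single-line CSV
path <- tempfile(fileext = ".csv")
writeLines("Id,Post 1,A 2,B", path)

# readLines() returns one string per physical line; collapse them in case
# the file contains stray line breaks
inp <- paste(readLines(path, warn = FALSE), collapse = " ")

# Mark the row boundaries with ";" and split into one string per row
rows <- strsplit(gsub("([A-Za-z]) (\\d)", "\\1;\\2", inp), ";")[[1]]

# read.csv() accepts a character vector of lines through its text argument
out <- read.csv(text = rows, as.is = TRUE)
```

Because the whole file ends up in a single string, the number of observations does not change the code, only the length of `inp`.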

@JackeJR That is my feeling as well; I don't think the OP really needs to read one word at a time.

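If inp is the single input line, compute the number of fields, k, and from that build a pattern, pat, that matches k fields. Use gsub to insert a newline after each pattern match, and finally read the result with read.csv: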

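# Treating " " as the comment character stops reading at the first space,
# so only the header row is parsed and its length is the number of fields
k <- length(read.table(text = inp, comment.char = " ", sep = ","))
pat <- sprintf("((.*?,){%d}.*? +)", k-1) # pattern to match k fields and the space(s) after them
read.csv(text = gsub(pat, "\\1\n", inp), strip.white = TRUE, as.is = TRUE)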

If inp is the input line at the end of the question, the code above outputs this data frame:

PulseId JourneyId TransmissionId CreateDate  Lat  Long Speed 
1 367515  3237    1 30/04/14 4:02 51.53749 -3.590589  7 
2 3657521  3237    1 30/04/14 4:02 51.53704 -3.589859 11 
3 3657522  3237    1 30/04/14 4:02 51.53695 -3.589748 12 
    Heading HAccuracy Altitude VAccuracy DDuration DDistance DHeading RSL 
1  129  15  98   0   1 8.639347  1292 22.4 
2  141  10  99   0   1 11.811534  1 22.4 
3  144  10  100   0   1 12.805132  3 22.4 
    RSLRoadTypeId RSLValidation RSLCountryId PulseTypeId IsNightTime Congestion 
1    2    1   826   2  FALSE   0 
2    5    1   826   2  FALSE   0 
3    2    1   826   2  FALSE   0 
    Idle AccelBrake Cornering IsNearRailway IsSpeedValid Familiar IntLat3 
1 0 0.2038734 1.60655912   FALSE   TRUE  1 51537 
2 0 0.0000000 0.01957049   FALSE   TRUE  1 51537 
3 0 0.1019367 0.06404887   FALSE   TRUE  1 51537 
    IntLong3    LocalDateTime Smoothness PhoneId  PolicyId 
1 -3591 30/04/2014 05:02:45 +01:00   2  43 4663627m000010 
2 -3590 30/04/2014 05:02:51 +01:00   0  43 4663627m000010 
3 -3590 30/04/2014 05:02:52 +01:00   1  43 4663627m000010 
          DevideId  DNA 
1 829eba198fa483a49f14b66b8f1dadb5 0.04444444 
2 829eba198fa483a49f14b66b8f1dadb5 0.04444444 
3 829eba198fa483a49f14b66b8f1dadb5 0.04444444
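As a rough check, the same pattern-based approach can be applied to the ten-field dummy line from the top of the question. In this sketch `perl = TRUE` is my own addition, so that the lazy quantifiers follow PCRE rules; the answer's code uses the default regex engine. The name `out` is made up for illustration.

```r
# Dummy single-line input from the question
inp <- "Id,SecoId,TertioID,CreateDate,Lat,Long,Duration,Istrue,JournalDate,Post 3232,123,345,30/04/14 2:00,11.726,11.728,5,FALSE,02/04/2014 05:02 +01:00,ABC 3233,124,346,30/04/14 3:00,11.789,11.779,6,TRUE,03/04/2014 06:00 +01:00,BCD"

# With " " as the comment character only the header row is read
k <- length(read.table(text = inp, comment.char = " ", sep = ","))

# k - 1 comma-terminated fields, then the last field and the space(s) after it
pat <- sprintf("((.*?,){%d}.*? +)", k - 1)

# Insert a newline after every matched row, then parse as ordinary CSV
fixed <- gsub(pat, "\\1\n", inp, perl = TRUE)
out <- read.csv(text = fixed, strip.white = TRUE, as.is = TRUE)
```

The approach relies on the last field of each row containing no spaces; if it did, the pattern would break the row too early.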

Source

2016-01-24 20:14:17
