The code in the previous post was readable to me, but it was neither concise nor elegant. R is built for vector and matrix computation; if we don't use that strength, it is like owning a fine saber and leaving it in its scabbard. So let's draw the saber: below is a more concise version, which also fixes a few logic bugs.
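To make the point concrete, here is a minimal sketch on made-up toy data (the names obs_dates, obs_values and calendar are hypothetical) that places observations onto a daily calendar twice. The first version uses a loop; the second does the same work with a single vectorized match() call, the same idea the script uses to drop records onto their days.

```r
# Toy data (hypothetical): three observed days inside a five-day calendar
obs_dates  <- as.Date(c("2015-01-01", "2015-01-03", "2015-01-05"))
obs_values <- c(10, 30, 50)
calendar   <- seq(as.Date("2015-01-01"), by = "day", length.out = 5)

# Loop version: look up each calendar day one at a time
filled_loop <- rep(NA_real_, length(calendar))
for (i in seq_along(calendar)) {
  hit <- which(obs_dates == calendar[i])
  if (length(hit) == 1) filled_loop[i] <- obs_values[hit]
}

# Vectorized version: match() returns every target position in one call
filled_vec <- rep(NA_real_, length(calendar))
filled_vec[match(obs_dates, calendar)] <- obs_values

print(filled_vec)
# [1] 10 NA 30 NA 50
```

Both vectors come out identical; the vectorized form is shorter and avoids growing per-row logic inside a loop.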
Here is the complete code.
After updating the package, I will explain the details of this code. Thanks.
================================================================
library(openxlsx) # read.xlsx() and write.xlsx()
library(zoo)
library(Hmisc)    # yearDays()
path <- "/Users/user/Desktop/myExData.xlsx"
data <- read.xlsx(path)
#=============
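data[,'Date'] <- as.Date(data[,'Date'], origin = "1899-12-30") # Excel stores dates as day counts from 1899-12-30
colNameVector <- colnames(data) # keep the original column names for the output
data$Date <- as.POSIXlt(data$Date) # transform to POSIXlt type
# every year from the first to the last record, including years with no data
# (POSIXlt counts years from 1900)
year.list <- seq(min(data$Date$year), max(data$Date$year)) + 1900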
### sorting data
inc.order <- order(data$Date, decreasing = FALSE)
data <- data[inc.order,]
### building an empty data frame
final.data <- data.frame(data[,1:length(colNameVector)])
final.data[,] <- NA
final.data$Date <- as.POSIXlt(final.data$Date)
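year <- format(data$Date[1], "%Y") # year of the earliest record (data is sorted)
origin <- paste(year, "-01-01", sep = "")
origin <- as.Date(origin)
diff <- as.Date(data$Date[1]) - origin # days between Jan 1 and the earliest record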
#=============
year.list <- sprintf("%s-01-01", year.list)
year.list <- as.Date(year.list)
yearDays.list <- mapply(yearDays, year.list) # number of days in each year
daySum <- sum(yearDays.list)
daySum <- as.numeric(daySum - diff) # days from the earliest record to Dec 31 of the last year
final.data[1:daySum,] <- NA # extend to daySum all-NA rows, one per calendar day
final.data$Date <- seq(data$Date[1], by = "1 days", length.out = daySum) # daily calendar starting at the earliest record
### setting rownames
rownames(data) <- c(1:nrow(data))
### give data the same column names as final.data (data.frame() may have altered them with make.names)
colnames(data) <- names(final.data)
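my.index <- match(data$Date, as.POSIXlt(final.data$Date)) # row of final.data for each record
final.data[my.index,] <- data[,] # place every record on its day in one assignment
x <- if (which(colnames(data) == "Date") == 1) 2 else 1 # index of a column other than Date
tag <- max(which(!is.na(final.data[, x]))) # last row that holds a record
final.data <- head(final.data,tag) # cut off trailing empty days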
#==============
# days without a record: columns 3-10 take the first record's values, columns 11-14 are set to 0
final.data[is.na(final.data[, 3]), 3:10] <- data[1, 3:10]
final.data[is.na(final.data[, 11]), 11:14] <- 0
#==============
final.data$Date <- as.Date(final.data$Date) # back to Date so Excel shows a plain date without a time
colnames(final.data) <- colNameVector # restore the original column names
### write the result to a new Excel file
write.xlsx(final.data, file = "/Users/user/Desktop/myExData_full.xlsx")
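For readers who want the core idea without the Excel file, here is a minimal base-R sketch of the same steps on a small in-memory data frame. It is an illustration under stated assumptions, not the script itself: the columns price and volume are hypothetical stand-ins for columns 3-10 and 11-14 of the real workbook, the calendar stops at the last record directly instead of running to December 31 and trimming afterwards, and the Date class is kept throughout instead of switching to POSIXlt.

```r
# Hypothetical input: Excel serial dates plus two value columns
raw <- data.frame(
  Date   = c(42009, 42005, 42007),  # deliberately unsorted
  price  = c(12, 10, 11),
  volume = c(300, 100, 200)
)

# 1. Excel serial numbers count days from 1899-12-30
raw$Date <- as.Date(raw$Date, origin = "1899-12-30")

# 2. Sort by date
raw <- raw[order(raw$Date), ]

# 3. One row per calendar day from the first to the last record
calendar <- seq(min(raw$Date), max(raw$Date), by = "day")
full <- data.frame(Date = calendar, price = NA_real_, volume = NA_real_)

# 4. Place each record on its day in a single vectorized assignment
idx <- match(raw$Date, full$Date)
full[idx, c("price", "volume")] <- raw[, c("price", "volume")]

# 5. Fill the gaps as the script does: first record's value, or zero
gap <- is.na(full$price)
full$price[gap]  <- raw$price[1]
full$volume[gap] <- 0

print(full)
```

Staying with Date avoids storing POSIXlt objects inside a data frame, which is possible but tends to be fragile; converting with as.Date() once at the start keeps every later step in a single class.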