我从来没有真正用过VBA或任何其他类型的代码,除了几年前在VB中做过的小事情。这是我尝试编写一些代码,用于搜索客户帐户的Excel表数据库并搜索可能的重复帐户。不幸的是,在我需要运行这台计算机的机器上,它只能处理大约3,500个条目而不会崩溃Excel。我对这两个代码都做出了巨大的贡献,就像机器运行缓慢一样。优化VBA/Excel宏代码(在大表中查找重复项)
可以做些什么来优化下面的代码,以及我将来应该使用哪些VBA中的最佳实践?
'Essentially, this loops through each row in the sheet
'For each row, it loops through every row after it, searching for duplicates of itself (skipping over a rows that have previously been marked as duplicates)
'Duplicates are defined by entries that meet a 'threshhold' of similarity
'The threshhold is defined as the number '5', first and last names are each two points, address and email address are one point
'That means that in order for an entry to meet the thresshold, the first and last name must be the same, and it must also have either the same address or email
'When duplicates are found, the duplicate column is marked as 'Yes' for that row, and the first occurence column is marked with a number defining the row number where the account first appeared
Sub Main():
Dim lNameCol, fNameCol, addressCol, emailCol, duplicateCol, fOccurenceCol As String
'Defines the column letters for the various data fields
lNameCol = "A"
fNameCol = "B"
addressCol = "C"
emailCol = "D"
duplicateCol = "E" 'The column where a entry/row will be marked as being a duplicate
fOccurenceCol = "F" 'The column that contains the row number where a duplicate accounts first occurence was found
Call Duplicates(lNameCol, fNameCol, addressCol, emailCol, duplicateCol, fOccurenceCol)
End Sub
'Gets number of rows in currently active sheet
Function RowCount():
Application.ActiveSheet.UsedRange
RowCount = Worksheets("Sheet1").UsedRange.Rows.Count
End Function
'Finds and labels duplicates
Sub Duplicates(ByVal lNameCol As String, ByVal fNameCol As String, ByVal addressCol As String, ByVal emailCol As String, ByVal duplicateCol As String, ByVal fOccurenceCol As String)
Dim lRowCount As Integer
lRowCount = RowCount()
'Loops through each row in the sheet
For i = 1 To lRowCount
Dim duplicate, lastName, firstName, email, address As String
'Sets these variables' values corresponding cell value in row 'i'
'UCase capitilizes things to make entries case-insensitive
duplicate = UCase(Range(duplicateCol & i).Value)
lastName = UCase(Range(lNameCol & i).Value)
firstName = UCase(Range(fNameCol & i).Value)
email = UCase(Range(emailCol & i).Value)
address = UCase(Range(addressCol & i).Value)
'Checks to make sure row has not already been marked a duplicate, if it hasn't it continues
If (StrComp(duplicate = "YES", vbTextCompare) = 1) Then
'Loops through every row after the current row (row 'i')
For n = (i + 1) To lRowCount
'duplicateThreshold is an integer that defines the threshhold of similarity that rows need to have in order to be labeled a duplicate
Dim duplicateThreshhold As Integer
Dim lastName2, firstName2, email2, address2 As String
duplicateThreshhold = 0
'These are the entry variables for account entry at row 'n' being compared to the account entry at row 'i'
lastName2 = UCase(Range(lNameCol & n).Value)
firstName2 = UCase(Range(fNameCol & n).Value)
email2 = UCase(Range(emailCol & n).Value)
address2 = UCase(Range(addressCol & n).Value)
'Adds 2 points to threshhold if first name is the same
If lastName = lastName2 Then
duplicateThreshhold = duplicateThreshhold + 2
End If
'Adds 2 points to threshold if last name is the same
If firstName = firstName2 Then
duplicateThreshhold = duplicateThreshhold + 2
End If
'The remaining two fields give 1 point each to the thresshold
'As long as the sum of the points given by first and last name is always greater than half of the threshhold, first and last name will always be required
If email = email2 Or address = address2 Then
duplicateThreshhold = duplicateThreshhold + 1
End If
If duplicateThreshhold > 4 Then
'Labels duplicate entries as duplicates
Range(duplicateCol & i).Value = "Yes"
Range(duplicateCol & n).Value = "Yes"
'Labels duplicate entries with the first occurence of that entry
Range(fOccurenceCol & i).Value = i 'Labels first occurence account's row number
Range(fOccurenceCol & n).Value = i
End If
Next
End If
Next
End Sub
为什么不用去重功能在Excel中,而不是使用VBA?在Excel 2013中:突出显示您的专栏,导航至* Data-> Remove Duplicates * –
我在[本网站]上找到了提示(https://blogs.office.com/2009/03/12/excel-vba-性能编码最佳实践/)有助于让我的代码更快运行,好的第一步可能是看看那里。 – eirikdaude
好吧,这听起来像是你正试图建立你自己的'模糊逻辑'的味道(MS已经完成得很好)。看看[设置自动筛选多通配符](http://stackoverflow.com/questions/16602872/),[高级筛选条件可以在VBA而不是范围?](http://stackoverflow.com/问题/ 34532282 /)和[自动筛选器是否可以从Dictionary键中同时采用包容性和非包含性通配符?](http://stackoverflow.com/questions/34614417)。 – Jeeped