
Finding Duplicate Files

mnementh
Posts: 674
Joined: Tue Apr 28, 2009 5:41 am
Location: Dundee, Scotland

Post by mnementh »

Finding duplicate files seems to be a bit of a hot topic on this BBS, and I can see it getting worse as more and more of us go over to PC-based systems.

It's all too easy these days, with huge external hard drives so readily available, to fill them up with thousands of duplicate files.

The problem, as I see it, is that standard duplicate finders simply can't find dupes because of non-standard filenames and inconsistencies in the separators between the fields of a filename, e.g. _-_ instead of " - " (space hyphen space).

However, while this might be the case for the main body of filenames, I've found that the Disc ID is "almost" always correct.

Unfortunately, as I've said in other threads on Dupefiles, I've been unable to find a Duplicate Finder that will "zero" in on specific sections of a filename, so I've had to produce a workaround using Excel.

If I were asked which program best uses the power of a PC and can be readily used for a huge range of processing needs, the spreadsheet is, in my humble opinion, head and shoulders above everything else.

I've written some VBA (Visual Basic for Applications) code for Excel that will perform some simple searches and show duplicate entries in a selected field of a filename.

The requirements for this are fairly simple.

Microsoft Excel

Couldn't be easier, could it?

However, additionally, a utility to change the separators in your filenames would help as the Excel code ONLY works with the standard "space hyphen space" separator.
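As a rough illustration of what such a separator clean-up involves, here is a minimal VBA sketch. The function name NormaliseSeparators and the short list of variants it handles are assumptions made for illustration, not part of the macros posted in this thread. It only changes a text string in the spreadsheet and never touches a file on disk.

```vba
' Hypothetical helper: rewrites a few non-standard separator variants
' to the standard " - " (space hyphen space).
Function NormaliseSeparators(ByVal FileName As String) As String
    Dim Variants As Variant
    Dim i As Long

    ' Assumed list of variants; extend it to match your own files.
    ' A bare "-" is deliberately left alone because Disc IDs often
    ' contain hyphens themselves.
    Variants = Array("_-_", " _ ", " -- ")

    For i = LBound(Variants) To UBound(Variants)
        FileName = Replace(FileName, Variants(i), " - ")
    Next i

    NormaliseSeparators = FileName
End Function
```

Pasted into a standard module, this could be called from a worksheet cell, for example =NormaliseSeparators(B2), to preview the cleaned-up name before renaming anything with a separate utility.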

There are many freeware file renaming utilities out there, but if you're running XP, I'd get ExplorerXP (just Google it). In addition to being an Explorer-type program, it has fairly powerful renaming features such as TRIM, REPLACE, RENUMBER and more.

The next post will describe what the Excel code will and won't do.

Sandy


mnementh
Posts: 674
Joined: Tue Apr 28, 2009 5:41 am
Location: Dundee, Scotland

Post by mnementh »

Hi again.
Here are the cans and can'ts of the Excel code.

Can
List your files
Split your filenames into DISC ID, ARTIST and SONG (ONLY in the spreadsheet, NOT your hard drive)
Search for duplicate files on ANY of the fields generated.
Work with Excel 2000, 2002 & 2003 (not tested with 2007)


Can'ts
Won't work with anything except .ZIP files (will ignore all others)
Won't work with anything but the standard separator (space hyphen space)
CANNOT, UNDER ANY CIRCUMSTANCES ALTER YOUR FILES. If you choose to delete the found duplicates, you MUST do it manually, yourself.

When you run the code (GetFiles) in Excel, you will be asked for the path to your files. It's easiest to use Explorer or ExplorerXP to locate your karaoke files, then copy the address from the address bar. You only need the parent folder as the code will read through all sub folders and list all .ZIP files found.

Assuming your separators are the standard type, the code will then generate columns for FILENAME, DISC ID, ARTIST & SONG.
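The splitting step can be sketched in a few lines of VBA. This is an illustration only: the function name KaraokeField and its Part argument are hypothetical, and it uses VBA's Split function rather than the worksheet formulas in the posted macro. One difference to be aware of: with a limit of three pieces, any extra " - " stays with the song title, whereas the posted formula takes the text after the last separator.

```vba
' Hypothetical helper: returns one field of a filename that uses the
' standard " - " separator. Part = 1 (Disc ID), 2 (Artist) or 3 (Song).
Function KaraokeField(ByVal FileName As String, ByVal Part As Long) As String
    Dim Fields() As String

    ' Split into at most three pieces on the standard separator.
    Fields = Split(FileName, " - ", 3)

    If Part >= 1 And Part - 1 <= UBound(Fields) Then
        KaraokeField = Trim$(Fields(Part - 1))
    Else
        ' Fewer fields than expected: the separator is probably non-standard.
        KaraokeField = ""
    End If
End Function
```

For a made-up filename such as "SC8101-01 - Beatles - Yesterday.zip", KaraokeField(name, 1) gives the Disc ID part "SC8101-01". The song part keeps its .zip extension.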

The second piece of code (SortColumn) will ask you to select the letter at the top of the column you wish to sort, then sort the column automatically, colouring any duplicate cells found RED.

This is where you will see why duplicate files don't show up in standard dupe finder programs.

At this point, you can go to the leftmost column and note the files you might want to delete with Explorer/ExplorerXP.
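For comparison, duplicates can also be flagged by counting values instead of sorting and comparing neighbours. The sketch below is an assumption-laden alternative, not the SortColumn macro described above: the name MarkDuplicates is hypothetical, and it relies on the Windows Scripting.Dictionary object. Unlike a neighbour comparison that leaves the first copy uncoloured, it colours every member of a duplicate group.

```vba
' Hypothetical alternative: colours every cell in the chosen column
' whose value occurs more than once. Nothing on disk is touched.
Sub MarkDuplicates()
    Dim Col As String
    Dim Counts As Object
    Dim LastRow As Long, r As Long
    Dim v As Variant
    Dim Key As String

    Col = UCase$(Trim$(InputBox("Enter letter at top of column to check: ")))
    If Col = "" Then Exit Sub

    Set Counts = CreateObject("Scripting.Dictionary")
    Counts.CompareMode = vbTextCompare ' treat upper and lower case as equal

    With ActiveSheet
        LastRow = .Cells(.Rows.Count, Col).End(xlUp).Row

        ' First pass: count each value below the header row.
        For r = 2 To LastRow
            v = .Cells(r, Col).Value
            If Not IsError(v) Then
                Key = Trim$(CStr(v))
                If Key <> "" Then Counts(Key) = Counts(Key) + 1
            End If
        Next r

        ' Second pass: colour the cells whose value was counted more than once.
        For r = 2 To LastRow
            v = .Cells(r, Col).Value
            If Not IsError(v) Then
                Key = Trim$(CStr(v))
                If Key <> "" Then
                    If Counts(Key) > 1 Then .Cells(r, Col).Interior.Color = RGB(255, 0, 0)
                End If
            End If
        Next r
    End With
End Sub
```

Cells showing a formula error (for instance where a filename has no standard separator) are skipped rather than treated as duplicates of each other.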

The code will follow in the next post.

Sandy
mnementh
Posts: 674
Joined: Tue Apr 28, 2009 5:41 am
Location: Dundee, Scotland

Post by mnementh »

Hi again,
almost there. 8)

If you're familiar with Excel you might already know how to get this code into the program but if not, here's how.

Fire Excel up and when at the main window, press

Alt + F11 (Hold down Alt and press F11)

This will open the VBE window (Visual Basic Editor)

At the top of the screen, click INSERT, then MODULE (you should now get a white window)

SELECT and COPY the code below, from the first Sub to the last End Sub, and PASTE it into the VBE Module window.

Click the "X" at the top right of the screen (this ONLY shuts down the VBE, NOT Excel)

If you now click TOOLS, MACRO, MACROS you should see 2 Macros listed;

GetFiles and SortColumn

Make sure you have the path to your files ready, e.g. J:\karaokefiles or whatever.

Click GetFiles and you will be asked for the path to your files. Enter your file location and click OK

The program will then search through ALL the folders at your location and list your files.

When you have your list, go to TOOLS, MACRO, MACROS again and select SortColumn.

You will be asked which column you want to check (Column "C" has the DISC ID info, for example)

When you click OK, the spreadsheet will sort on your selected column and indicate duplicate cells in RED.

OK, here is the code. REMEMBER YOU NEED TO COPY EVERYTHING FROM Sub to End Sub.

Sub GetFiles()
' Set sheet up

Dim ws As Worksheet
Application.DisplayAlerts = False
Err.Clear
On Error Resume Next
Set ws = Sheets("KaraokeFiles") 'If Err <> 0 Then Exit Sub
ws.Delete

Application.DisplayAlerts = True

Dim wksData As Worksheet

With ThisWorkbook
Set wksData = .Worksheets.Add(After:=.Worksheets(.Worksheets.Count), _
Type:=xlWorksheet)
End With
wksData.Cells(1).Value = "Full Path"
wksData.Cells(1).Font.Bold = True
wksData.Cells(2).Value = "Filename"
wksData.Cells(2).Font.Bold = True
wksData.Cells(3).Value = "Disc ID"
wksData.Cells(3).Font.Bold = True
wksData.Cells(4).Value = "Artist"
wksData.Cells(4).Font.Bold = True
wksData.Cells(5).Value = "Song"
wksData.Cells(5).Font.Bold = True

Rows("1:1").Select
With Selection
.HorizontalAlignment = xlCenter
.VerticalAlignment = xlBottom
End With
Range("A1").Select


Dim FilePath As String
FilePath = InputBox("Enter path to your files here;")

If FilePath = "" Then
Exit Sub
End If

wksData.Name = "KaraokeFiles" 'Name the new sheet explicitly rather than whichever sheet is active

AllFilesBasic FilePath & "\", wksData 'Insert path here

End Sub

Sub AllFilesBasic(TopDir As String, wks As Worksheet)
Dim _
fsoFol As Object, _
fsoSubFol As Object, _
fsoFil As Object, _
rngStartPoint As Range, _
lOffset As Long

Static FSO As Object
If FSO Is Nothing Then Set FSO = CreateObject("Scripting.FileSystemObject")
Set fsoFol = FSO.GetFolder(TopDir)
Set rngStartPoint = wks.Cells(wks.Rows.Count, 1).End(xlUp).Offset(1)

For Each fsoFil In fsoFol.Files
If LCase(Right(fsoFil.Name, 4)) = ".zip" Then
rngStartPoint.Offset(lOffset).Value = fsoFil.Path
lOffset = lOffset + 1
End If
Next

For Each fsoSubFol In fsoFol.SubFolders
AllFilesBasic fsoSubFol.Path, wks
Next
wks.Range("A1").EntireColumn.AutoFit



'Split each listed path into Filename, Disc ID, Artist and Song
Dim LastRow As Long

LastRow = wks.Cells(wks.Rows.Count, "A").End(xlUp).Row
If LastRow < 2 Then Exit Sub 'No .ZIP files listed yet, so leave the headers alone

'Filename: everything after the last backslash in the full path
wks.Range("B2:B" & LastRow).Formula = "=MID(A2,FIND(""*"",SUBSTITUTE(A2,""\"",""*"",LEN(A2)-LEN(SUBSTITUTE(A2,""\"",""""))))+1,LEN(A2))"
wks.Range("B1").EntireColumn.AutoFit
'Disc ID: text before the first " - " (the -1 drops the trailing space)
wks.Range("C2:C" & LastRow).Formula = "=LEFT(B2,FIND("" - "",B2,1)-1)"
wks.Range("C1").EntireColumn.AutoFit
'Artist: text between the first and second " - "
wks.Range("D2:D" & LastRow).Formula = "=LEFT(MID(B2,FIND("" - "",B2)+3,255),FIND("" - "",MID(B2,FIND("" - "",B2)+3,255))-1)"
wks.Range("D1").EntireColumn.AutoFit
'Song: text after the last " - " (still includes the .zip extension)
wks.Range("E2:E" & LastRow).Formula = "=RIGHT(B2,LEN(B2)-2-FIND(CHAR(1),SUBSTITUTE(B2,"" - "",CHAR(1),LEN(B2)-LEN(SUBSTITUTE(B2,"" - "",""**"")))))"
wks.Range("E1").EntireColumn.AutoFit


End Sub



Sub SortColumn()

Cells.Select
Selection.Interior.ColorIndex = xlNone
Dim Col As String
Col = InputBox("Enter letter at top of column to sort: ")
Col = UCase(Col)
Select Case Col
Case "": Exit Sub
Case "A": Range("A" & "1").Select
Case "B": Range("B" & "1").Select
Case "C": Range("C" & "1").Select
Case "D": Range("D" & "1").Select
Case "E": Range("E" & "1").Select
Case "F": Range("F" & "1").Select
Case "G": Range("G" & "1").Select
Case "H": Range("H" & "1").Select
End Select

'Range("a1").Value = Col

Cells.Sort Key1:=ActiveCell, _
Order1:=xlAscending, Header:=xlYes, _
OrderCustom:=1, MatchCase:=False, _
Orientation:=xlTopToBottom

'Walk down the sorted column and colour repeats of the same value red
Application.ScreenUpdating = False
FirstItem = ActiveCell.Value
SecondItem = ActiveCell.Offset(1, 0).Value
Offsetcount = 1
Do While ActiveCell <> ""
If FirstItem = SecondItem Then
ActiveCell.Offset(Offsetcount, 0).Interior.Color = RGB(255, 0, 0)
Offsetcount = Offsetcount + 1
SecondItem = ActiveCell.Offset(Offsetcount, 0).Value
Else
ActiveCell.Offset(Offsetcount, 0).Select
FirstItem = ActiveCell.Value
SecondItem = ActiveCell.Offset(1, 0).Value
Offsetcount = 1
End If
Loop
Application.ScreenUpdating = True

End Sub

The code has limited error checking but be assured, it does NOTHING to your files, it simply lists and/or sorts them.

I hope to improve this code to allow various renaming options and allow for an "autodelete" facility at a later stage.

I'd like to express my thanks to GTO of the VBA Express forum for his help in getting the listing section of the code up and running.

Sandy
Bigdog
Posts: 2937
Joined: Wed Jan 31, 2007 2:15 am

Post by Bigdog »

A few ways to manually delete these files...Which I can't really see as being a national disaster.. :roll: would be from either one of these methods which are basically the same.

#1 Use your hosting program opened in one window, Beside it open the hard drive window. Set the hosting program to sort by song name alphabetically. As you scroll down the list and find a dupe file...you look in the hard drive for that file and delete it.

#2 Use your song book program set the same way with the hard drive open.

Or use both methods (3 windows open) as a way to self check your work.

I just don't see the average "legal" KJ as having so many dupes they are a problem. This isn't 20 years ago when hard drive space was real expensive. They practically give big ones away now that could hold several hundred KJs songs.. :shock:
mnementh
Posts: 674
Joined: Tue Apr 28, 2009 5:41 am
Location: Dundee, Scotland

Post by mnementh »

Bigdog wrote:A few ways to manually delete these files...Which I can't really see as being a national disaster.. :roll: would be from either one of these methods which are basically the same.

#1 Use your hosting program opened in one window, Beside it open the hard drive window. Set the hosting program to sort by song name alphabetically. As you scroll down the list and find a dupe file...you look in the hard drive for that file and delete it.

#2 Use your song book program set the same way with the hard drive open.

Or use both methods (3 windows open) as a way to self check your work.

I just don't see the average "legal" KJ as having so many dupes they are a problem. This isn't 20 years ago when hard drive space was real expensive. They practically give big ones away now that could hold several hundred KJs songs.. :shock:
The problem with that method is the "fatigue" factor while you are trawling through possibly 1000's of files. It is far too easy to make a mistake and/or miss things.

This doesn't even allow for incorrect naming of files which would throw your "alphabetical" comparison completely awry. Your file manager and Hoster program might not have identical sorting methods.

The spreadsheet takes this source of error away and, apart from anything else, is phenomenally fast.

I can list and dupe check 2,000 files in less than 10 seconds.

Trust me on this, when I had 50,000+ files, I worked on them for a couple of hours every night for two weeks and I still didn't get all the dupes. Not by a long way. After 500 or so files, I found my eyes getting tired and concentration falling away.

I'm a great believer in making things as simple as possible and with a couple of clicks in Excel, I doubt if it could be made simpler.

As for drives holding songs, your drives must be very inefficient as I have 30,000+ tracks on about 90Gb of drive space.

As to legality issues, I'm not really interested. I'm simply trying to make life easy for those with loads of dupes to get rid of them

Sandy
wiseguy
Site Admin
Posts: 1906
Joined: Wed Aug 18, 2004 5:05 pm
Location: WV

Post by wiseguy »

This is only a problem because some people didn't have the common sense to foresee this situation when they began ripping their song tracks to a hard drive. Or they couldn't be bothered to take the time to do it right.

In this computer age people expect that there will be some program that will magically repair things for them. :roll: Not this time.

Sandy, what you are doing will help to some degree and everyone should appreciate your efforts. But with no standard song naming convention, misspellings, varying separators, varying arrangements, alternate artist naming, etc., the only way to get it right is to do it manually. And it should have been done starting with song #1 on disc #1.

The best thing that can come out of these posts is that maybe it will prevent someone from getting into this mess in the first place.
mnementh
Posts: 674
Joined: Tue Apr 28, 2009 5:41 am
Location: Dundee, Scotland

Post by mnementh »

wiseguy wrote:This is only a problem because some people didn't have the common sense to foresee this situation when they began ripping their song tracks to a hard drive. Or they couldn't be bothered to take the time to do it right.

Guilty as charged M'Lud.

Which is why I have written the code.


The best thing that can come out of these posts is that maybe it will prevent someone from getting into this mess in the first place.
Eventually, I hope to get the code to a form that will look at different separators that the user can enter.

Sandy
mnementh
Posts: 674
Joined: Tue Apr 28, 2009 5:41 am
Location: Dundee, Scotland

Post by mnementh »

Hi again,
Sorry, I wasn't completely clear in post #3 about copying the code into the VBE.

Copy everything from the FIRST "Sub" to the LAST "End Sub" into the VBE page of Excel.

Sandy
bigjohn1
Posts: 4
Joined: Sat Oct 02, 2010 1:05 am

Post by bigjohn1 »

Wow Wiseguy you must be new at being a kj or just do it at home.
First thing about legal files well I look at it like this If you have more then 10,000 track my bet is there not all from disks or files purchased and I think they ripped us off for to many years and like mp3 audio files thats why the music stores are gone our kids dont buy a dang song so why not the same for karaoke. Just a thought get yourself 50,000 song plus and try working with that. After all that KJFile manager works fine for most of what was above program is from Ladshaw
wiseguy
Site Admin
Posts: 1906
Joined: Wed Aug 18, 2004 5:05 pm
Location: WV

Post by wiseguy »

bigjohn1 wrote:Wow Wiseguy you must be new at being a kj or just do it at home.
First thing about legal files well I look at it like this If you have more then 10,000 track my bet is there not all from disks or files purchased and I think they ripped us off for to many years and like mp3 audio files thats why the music stores are gone our kids dont buy a dang song so why not the same for karaoke. Just a thought get yourself 50,000 song plus and try working with that. After all that KJFile manager works fine for most of what was above program is from Ladshaw
With your lack of punctuation skills it's hard to understand what you are trying to say although it is obvious that you are clueless on the topic of this thread.

Am I to understand that you feel it is fine to have a collection of illegally obtained karaoke songs?
DanG2006
Posts: 1498
Joined: Sun Jan 01, 2006 8:37 pm
Location: USA

Post by DanG2006 »

bigjohn1 wrote:Wow Wiseguy you must be new at being a kj or just do it at home.
First thing about legal files well I look at it like this If you have more then 10,000 track my bet is there not all from disks or files purchased and I think they ripped us off for to many years and like mp3 audio files thats why the music stores are gone our kids dont buy a dang song so why not the same for karaoke. Just a thought get yourself 50,000 song plus and try working with that. After all that KJFile manager works fine for most of what was above program is from Ladshaw
With the Manu's at war with pirates, I don't think it wise to get a loaded hard drive at all. Not unless you want to deal with a lawsuit over displaying the logo of said pirated track. They don't have to prove that you didn't buy the track. Just displaying their trademarked logo is enough to cost you thousands of dollars. I have over 10,000 songs does that make me a pirate? NOPE I own every song legally. The only violation I have committed is format shifting. After November 15th, I will have permission for those shifted songs.
mnementh
Posts: 674
Joined: Tue Apr 28, 2009 5:41 am
Location: Dundee, Scotland

Post by mnementh »

DanG2006 wrote:They don't have to prove that you didn't buy the track
Errrr! Yes I actually believe they do!

Or has the "Innocent until PROVEN guilty" rule of Law changed?

If you are charged with fraud, theft, etc. then the onus is on the prosecution to PROVE that you did, IN FACT, do the dirty deed.

What happens if you are unable, for whatever reason, to provide proof of purchase, or if the original disc has been lost?

Are you supposed to forego the use of the hard drive backup you have?

Sorry, if I've paid for something, as far as I'm concerned, I have the right to use that something in perpetuity.

I might be wrong (wouldn't be the first or last time). :oops:

Sandy
DanG2006
Posts: 1498
Joined: Sun Jan 01, 2006 8:37 pm
Location: USA

Post by DanG2006 »

This is civil court and even if they have the track they are still guilty as charged since they never had the permission to do so to begin with.
They are willing to drop the suit if you contact them ahead of the actual filing of the suit IF you have every disc that the offending track is on.
As long as you are 1:1 and contact them after the letter of intent to file suit then you have nothing to worry about. They've already dropped at least three of them based on an audit of the discs/ systems. You need to be 1:1 compliant on every system, meaning you need to have discs for all systems - 6 systems=6 sets of discs.
wiseguy
Site Admin
Posts: 1906
Joined: Wed Aug 18, 2004 5:05 pm
Location: WV

Post by wiseguy »

DanG2006 wrote:This is civil court and even if they have the track they are still guilty as charged since they never had the permission to do so to begin with.
They are willing to drop the suit if you contact them ahead of the actual filing of the suit IF you have every disc that the offending track is on.
As long as you are 1:1 and contact them after the letter of intent to file suit then you have nothing to worry about. They've already dropped at least three of them based on an audit of the discs/ systems. You need to be 1:1 compliant on every system, meaning you need to have discs for all systems - 6 systems=6 sets of discs.
By "they are still guilty as charged since they never had the permission to do so to begin with" I assume you mean permission to format shift.

Can you show us evidence of a single case where SC was successful in suing someone for format shifting when the person had proof of 1:1 compliance? Or are you just taking someone's word on this?
DanG2006
Posts: 1498
Joined: Sun Jan 01, 2006 8:37 pm
Location: USA

Post by DanG2006 »

None that I know of because the ones that were 1:1 did an audit before it got that far. All got permission to continue using their computers via a letter stating so.