Raw Data Processing: Identifying Entities

Raw Data Processing: Identifying Entities

NOTE: The following article is to help you format raw county data into a formatted Investment Dominator .csv import file. The good news is that using a formatted import file is just one of two ways of getting data into your CRM. If you do not want to create a formatted import file, please check out our optional Paid Data Service where we filter and format the data for you.

If the County saves both individual owner names as well as entities under a single Property Owner field then one of the data processing steps you will need to conduct is to programmatically identify these entities and move them to the ‘Company’ column within your list. You will also need to indicate under the ‘Type’ field that these records are companies by entering the value of ‘Company’ for these records.

Below is a list of common keywords that would indicate that the owner is not an individual but rather a company or entity:

‘
acquisition
aka
assoc
associa
association
bank
benefactor
beneficiary
business
c/o
care of
church
city
club
county
dept
develop
eastern
es
est
estate
et al
family
firm
foundation
holding
inc
incorporated
invest
ira
land
llc
lp
ltd
northern
of
owner
ownership
partner
property
school
service
southern
st
state
tenant
tr
trs
trust
univ
us
western

If you use Microsoft Excel to process your data (which is not recommended as there are more powerful data processing software out there for example Microsoft Access) finding the entities within a cell would look something like this…

Let’s say that the county puts the Owner Name in column A and this also includes the names of the entities, in column C you would put in a formula like this:

=IF(SUMPRODUCT(--ISNUMBER(MATCH(TRIM(MID(SUBSTITUTE(A2, " ", REPT(" ", LEN(A2))), (ROW($A$1:$A$50)-1)*LEN(A2)+1, LEN(A2))), {"incorporated","acquisition","aka","assoc","associa","association","bank","benefactor","beneficiary","business","c/o","care of","church","city","club","county","dept","develop","eastern","es","est","estate","et al","family","firm","foundation","holding","inc","invest","ira","land","llc","lp","ltd","northern","of","owner","ownership","partner","property","school","service","southern","st","state","tenant","tr","trs","trust","univ","us","western"}, 0))) > 0, A2, "")

If you copy this formula down then you notice that it moves the company names over to column C which is what you want:

You would simply rename column C to ‘Company’ which is the accepted header value for the Investment Dominator import and then you would copy/paste the values of the column only before saving your final export .csv file.

Prefixes and Suffixes

Raw data lists also contain multipart names or prefixes and suffixes that need to be identified and properly formatted into the ‘First Name’ and ‘Last Name’ columns in the case of ‘Individual’ records and should not be treated as companies simply because they contain multiple spaces. You can also use this list to distinguish between common multipart names and middle initials that the county may also include in the raw data and which should be removed.

Below is a list of common multipart names, prefixes, and suffixes that you should account for in your data processing scripts:

Atty
Da
De
Del
Della
Des
Di
Dr
Du
Esq
Et Al
I
II
III
IV
Jr
La
Le
Mac
Mc
MD
Mr
Mrs
O
PhD
Prof
Ret
Rev
Sir
Sr
Von

Below is an example macro that you can use (and adjust as needed) that identifies companies or multipart names based on keyword lists like the ones posted above. We use similar scripts to identify the Companies in a list and move the company name to the ‘Company’ column in our list.

First, create a workbook with 3 sheets:

    Set InfoWs = Sheets("info")
    Set Kw = Sheets("keywords")
    Set SmpleWs = Sheets("sample")

The “info” sheet must have a title in column “A”, where you paste the OWNERS on row 2. The “keywords” sheet starts in Column “A” row 1, and the data will be on the “sample” sheet row 2, so the titles for the “sample” will be “Type”, “First Name”, “Last Name” and “Company”.

Then add and run this macro to identify records based on the keywords listed in your “keywords” sheet:

Private Sub GetCompanies_Individuals()
    Dim wrdLRow As Long
    Dim wrdLp As Long
    Dim OwnersLrow As Long
    Dim OwnersLp As Long
    Dim fndWord As Long
    Dim InfoWs As Worksheet
    Dim Kw As Worksheet
    Dim SmpleWs As Worksheet
    Dim names() As String

    On Error Resume Next

    'Define worksheet that has data on it....
    Set InfoWs = Sheets("info")
    Set Kw = Sheets("keywords")
    Set SmpleWs = Sheets("sample")

    'Get last row for keywords based on column A
    wrdLRow = Kw.Cells(Rows.Count, "A").End(xlUp).Row
    
    'Get last row for owners based on column A
    OwnersLrow = InfoWs.Cells(Rows.Count, "A").End(xlUp).Row

    'Loop through lists and find matches....
    For OwnersLp = 2 To OwnersLrow
        fndWord = 0
        For wrdLp = 1 To wrdLRow

            'Look for word...
            fndWord = Application.WorksheetFunction.Search(Kw.Cells(wrdLp, "A"), InfoWs.Cells(OwnersLp, "A"))
            'If we found the word....then
            If fndWord > 0 Then
                    SmpleWs.Cells(OwnersLp, "A") = "Company"
                    SmpleWs.Cells(OwnersLp, "D") = WorksheetFunction.Proper(InfoWs.Cells(OwnersLp, "A"))
                    'fndWord = 0
                Exit For
            End If
        Next wrdLp
        
        If fndWord = 0 Then
            names = VBA.Split(InfoWs.Cells(OwnersLp, "A"), " ")
            SmpleWs.Cells(OwnersLp, "A") = "Individual"
            SmpleWs.Cells(OwnersLp, "B") = WorksheetFunction.Proper(names(0))
            SmpleWs.Cells(OwnersLp, "C") = WorksheetFunction.Proper(names(1))
        End If

    Next OwnersLp
End Sub

NOTE: If you are not familiar with creating macros, data processing software, data processing techniques, and best practices then we highly suggest you hire a professional data processor to filter and process your raw data lists to be used for import. This article explains how you can easily outsource this step to a highly qualified professional.

Related Training Modules