Excel: Extract text with FIND and MID
Data analysis begins with usable data, and that means arranging every item in a data set into its own field where we can count, sort and otherwise test it out.
But what if you get a spreadsheet where the pieces of data are all packed in one field? Say, something like this (which I cobbled together from Major League Baseball data in honor of the Nationals’ first playoff appearance):
NAME: Sean Burnett POS: RP AGE: 30 WT: 200 BORN: Dunedin, FL SALARY: 2350000 NAME: Tyler Clippard POS: RP AGE: 27 WT: 200 BORN: Lexington, KY SALARY: 1650000 NAME: Ross Detwiler POS: SP AGE: 26 WT: 174 BORN: St. Louis, MO SALARY: 485000 NAME: Christian Garcia POS: RP AGE: 27 WT: 215 BORN: Miami, FL SALARY: N/A NAME: Gio Gonzalez POS: SP AGE: 27 WT: 205 BORN: Hialeah, FL SALARY: 3335000 NAME: Mike Gonzalez POS: RP AGE: 34 WT: 215 BORN: Robstown, TX SALARY: N/A NAME: Ryan Mattheus POS: RP AGE: 28 WT: 215 BORN: Sacramento, CA SALARY: 481000 NAME: Craig Stammen POS: RP AGE: 28 WT: 200 BORN: Coldwater, OH SALARY: 485000 NAME: Drew Storen POS: RP AGE: 25 WT: 180 BORN: Indianapolis, IN SALARY: 498750 NAME: Jordan Zimmermann POS: SP AGE: 26 WT: 218 BORN: Auburndale, WI SALARY: 2300000
Let’s say you want to extract the city of each player’s birth into a separate column. The varying length of each player’s name means the birth place isn’t always in the same position in the string, so a typical text-to-columns operation won’t work. So, how to do it?
The answer lies in two very handy Excel functions: FIND and MID.
FIND locates characters you specify and returns its numeric place in the string.
MID returns X characters from a string beginning at a location X you specify.
For example, we can locate the position where each city name begins by using FIND to locate the string “BORN:” in each cell. The city name itself always starts six characters after the position of that string, so we add six to the result:
=FIND("BORN:",A2)+6
In the first row above, the functions returns 50. In the second row, 52. We’ll feed that value to the MID function as the starting point for our extraction.
MID takes three arguments: Text or cell, position to start, number of characters to return. So, we use the above FIND function as the second argument and, for now, extract 10 characters:
=MID(A2,FIND("BORN:",A2)+6,10)
That gets us part of the way there. We’re starting in the right spot, but 10 characters isn’t always the length of the city and state, so it leads to choppy results:
Dunedin, F Lexington, St. Louis, Miami, FL Hialeah, F Robstown, Sacramento Coldwater, Indianapol Auburndale
What we need to do is tell MID the exact number of characters to return each time even though the length of the city varies. We can figure this out using FIND again.
The city name is always followed by the word “SALARY”. So, if we search for the position of that word and subtract the position of “BORN,” we’ll get the length of what’s between the two. The ultimate formula looks like spaghetti but works just fine:
=MID(A2,FIND("BORN:",A2)+6,(FIND("SALARY",A2)-(FIND("BORN:",A2)+6)))
Used on the example text, it returns:
Dunedin, FL Lexington, KY St. Louis, MO Miami, FL Hialeah, FL Robstown, TX Sacramento, CA Coldwater, OH Indianapolis, IN Auburndale, WI
That’s it. Fairly handy and further proof that Excel is a versatile part of the data analyst’s tool kit.
Like data? My book Practical SQL: A Beginner’s Guide to Storytelling with Data from No Starch Press offers a step-by-step guide to working with databases. Learn how to wrangle the everyday data you encounter to gain meaningful insights.
When I have this problem I always use code to solve it. I prefer to parse every row with PHP or Python and then clean the data using the functions available on the language. I supposed everyone use the tool they feel better using it.
Agreed — if you’re a coder, that’s a good way to go. I like using regular expressions, for example, or string functions in Python.
For those people who don’t code but use Excel quite a bit, this is an approach they can take without too much pain.
This was super helpful! Thank you!
This was great..thank you!
Excellent! Now I can move on to the truly horrific part of my project.
Lyla,
I hope it’s not that bad!
very helpful
thank you
Anthony,
Thanks man! Perfect to what I need.
salomsalom,
May I ask why coders and programmers with years and decades of experience feel the need to hijack every one else’s blog posts? I am new to coding and programming. I searched for (and found) this post which does exactly what I need it to.
If you want me to know how you do it why not write your own blog post so beginners like me have a choice of ways to accomplish this? Otherwise your comment adds nothing regardless of how much you know.
Troy
Sean Burnett,RP,30,200,Dunedin, FL, 2350000.
What if the string is as shown above. How do we extract “Dunedin” using excel text functions??
Thanks in advance
Samuel,
I don’t have a ready answer, but whatever the answer is it’s going to be clunky. See this discussion:
https://groups.google.com/forum/#!topic/microsoft.public.excel/FAteST1C2Ks
Nevertheless, if that’s the string you’re given, then it’s simply a comma-delimited list that you can separate into columns using Excel’s text-to-columns feature. I’d go that route before trying to compete in Excel gymnastics.
This has been really useful, thank you!
this is exactly what i needed! thanks! excel for the win ๐
Very instructive, solved my issue.
Thanks!!
This has set me on the right path to solving my problem. I am taking specific pieces of data from a huge chunk of html pulled into the document.
I have a unique code on each piece of data to start the search at but it appears that if I used a FIND term which is not unique, like “The” or “0” which appear in the data multiple times… it just returns #VALUE!
However if I continue typing after “The” until it becomes a unique string it works… for that single piece of data alone. It is very frustrating.
Jeff,
Extracting data from big pieces of HTML might be a task better served by using Regular Expressions inside some type of scripting language, such as Python.
Jeff,
You need Python, and Beautiful Soup, and probably some love from the StackExchange community!
Very helpful information! Made task easy to accomplish!
Thank you very much!
Regards,
Maria P.
Nicely explained. Since im just learning excel, here a little question: Could i use LEFT/RIGHT function to do this? From what i understand, these functions require one less argument (i hope this is right, i mostly have my knowledge from this beginners site: http://www.excel-aid.com/the-excel-left-and-right-functions-2.html, i hope the information given there is correct). So i could maybe use FIND with one of these functions, then return the city name plus some other data that is still left (either to the left or the right, depending on the function you use). Then i have something like “NAME: Sean Burnett POS: RP AGE: 30 WT: 200 BORN: Dunedin, FL” which i could use the RIGHT function on in cunjunction with FIND, and that would give me the city name. Of course, this is impractical, and i would have to temporarily store unnecessary data (probably in a hidden column), but i was curious if i get the concept. Sorry for my bad english btw. ๐
Thank You Very Much for the detailed explanation of both the function. It really helped me.
Mary,
It’s possible, though I haven’t tried with this example.
Canada (CA) – Quebec – Bromont – 2 De L’Aeroport Blvd || Bromont PQ CA
Hi Anthony,
I have an above example where I want to extract the Country, State and City from in three seperate columns.
In the above example the First one is Country (Canada (CA)) and Second one after first “-” (Quebec)is State and the third one after second “-” (Bromont) is City
Can you please help me to make a formula to get the result for all three.
Purshottam,
You should be able to do it if you simply apply the techniques in the post. You’ll just have to write three separate formulas — one in each of three cells — to pull in the three values.
Hi Anthony,
How to extract numeric value from below text in excel :
CPG3_ROM_NETALIA_14378
Please help me to make a formula to get the result
please help me to calculate two different times and dates which are in one cell with some other data…how can isolate or seperate them from the others data.i believe that the calculation will be bettrt and easier when the calculation data are isolated.what do you mean,do anyone knows?
hello. what is the program(formula) for finding the third word in a sentence?
thanks Sir
its really very useful and wonderful
thanks this method worked a treat for me – although i’m dealing with messier data so have some more work to do and might need to use a bit of the old manual labor!
Alex,
Understood. I’ve resorted to the brute-force method many a time with data!
Hi Anthony.
With mid and find can you do an “OR” option.
Example- say the weight was either in kgs or lbs and I
want it to find WTKG or WTLB in the in the middle of the string.
NAME: Sean Burnett WTKG: 200 KGBORN: Dunedin, FL
NAME: Tyler Clippard WTLB: 200 LBS BORN: Lexington, KY
NAME: Ross Detwiler WTKG: 174 KG BORN: St. Louis, MO NAME: NAME: Christian Garcia WTLB: 215 BORN: Miami, FL
Jennifer,
Yes, if you nest them. See if this helps:
https://groups.google.com/forum/#!topic/excel-vba-help/C1bavzNYwl4
Anthony-
Thank you. Very helpful.
A4 Residual Hills F1
A5 Aluvial Plain #VALUE!
A6 Hills Residual F1
A7 harvest Hills F1
A8 Plain Water #VALUE!
A9 Structural Hills F1
A10 Land use #VALUE!
A11 Residual Plain #VALUE!
Please solve my problem =IF(FIND(“Hills”,A4),”F1″,IF(FIND(“Plain”,A4),”F2″,IF(FIND(“Land”,A4),”F3″””)))
in this wherever Hills are the that should be f1.
If you want to know more about “Searching for text in Excel”, check this link ……..
http://www.exceltip.com/excel-editing/searching-for-text-in-microsoft-excel.html
Hi,
Thanks for the help on FIND function!
I was glad to see the Nationals make it to the playoffs this year!
I went to the same college that Jordan Zimmerman went to, UWSP, and the small town he is from is about 20 minutes from Stevens
Point! I actually dated a gal from Auberndale! I was very proud
to see him get a no-hitter on the last day of the regular season!
Made Point prouder I am sure!
Service Request/Sales Order Handling/201295/4601491 QUART_POBLET_B__CARCER 201407018338-S1 SWAP/Ordering/Normal
Hi I want to extract 201407018338-S1 from the above string
It’s really helpful…
exactly what i want
Really good information for every student.
Thank you very much!!!
Finally somebody took the time to make an example easy to understand.
Hey,
I have data in following format
3):XYZ #2(3):224 #3(7):MIAMI #4(3):IND #5(15):14301-8909493-3 #6(1):8
I want extract XYZ in column 1 and 224 in Column 2 and 14301-8909493-3 in column 3 and 8 in column 4
Please help me on this.
Hey Anthony,
You have shared really useful tips about text extract in excel. It is very very useful for me as I have just started to learn excel.
Thanks
Aarti.
Hey Aarti,
Its useful for every one not only for you. Thanks Mr. Anthony to share this command. Please share more useful commands about Excel which helps to all blog readers.
Regards
Rajesh
Hi, I want to extract the Town and the State from the following, I’m trying to use a “,” as a separator but it doesn’t work. Any ideas how to do it? Thanks
355 Country Club Drive, Downingtown, PA 19335
“=MID(C4,FIND(“,”,C4)+1,11)
Mary,
There are probably a couple of ways to do this, and this is where Excel formulas start to get complicated. But still:
To get the town, try:
To get the state, use:
It’s worth taking some time, especially with the first formula, to understand the logic. We’re basically looking for the first comma, and then calculating the number of characters to extract using a nested find. Crazy, I know.
Thank you for taking the time to answer this! Great source!
Elaine
Sir – within one cell I have up to four pieces of information. Which looks like
Address of Installed: 23.Bridgeway.Ave
Correct Address : 24.Bridgeway.Ave
Address of Installed: 25.Bridgeway.Ave
Correct Address : 26.Bridgeway.Ave
Sometimes the cell has one of these sometimes it has 5. How do I extract Installed addresses into one cell, and the correct addresses into another cell without using VBA?
Thank you,
Joe
I want to extract the check number from this string of information (This is generic info that shows you what I’m looking at):
ORIG CO NAME:BCBS ILLINOIS ORIG ID:99999999 DESC DATE: CO ENTRY DESCR:HCCLAIMPMTSEC:CCD TRACE#:0101010101010 EED:199999 IND ID:C15272E54732230 IND NAME:CP20150929E547322300-1 TRN*1*C162547E85472617*1357954792*CP 20151104E224821515-1962644807\ PAYABLE TRN: 9574575158FB
The information that I need is after the TRN*1* until the next *. The C162547E85472617. The check numbers are not always the same length. All of the formulas that I have tried are not giving me what I’m looking for. Any suggestions?
Lots of help! Thanks! ๐
Hello, Iโm trying to build a formula that will extract all of the 5 digit zip codes that appear randomly in a test string from one cell. Hereโs an example of a test string: delete 96360 , 96361 x 1 , add 96361 x 2 can only have 1. The text strings are completely free text with little or no constants other than the length of the values Iโm trying to extract. If the formula would just extract each of the 5 digit zips, and return each with a space in between, that would be ideal. So the formula result for the above would be: 96360 96361 or 96360 96361 96361? -Thanks, Ron M.
Thank you man! Exactly what i was looking for! Keep contributing!
Thank you for posting this. Exactly what I needed.
Hi I need to extract digit(69111) after count: and before },{value here is my format .
[{“value”:”false”,”count”:98707},{“value”:”NULL”,”count”:69111},{“value”:”true”,”count”:634}]
Regards
Dwarika Mohanty
I am looking this information for a long time and finally got it. Thanks a lot Mr. Anthony to post this valuable knowledge.
This is what i am looking for, but any possibility in Vba macro code?
Hi, I need to extract the number value after “Total:” Which always appears last under 1 to 4 lines
“WIDGET (Amount: 950.00 USD, QUANTITY: 1, PLEASE SELECT: 10 x 20 LARGE)
EXTRA (Amount: 400.00 USD, PLEASE SELECT: 10 x 10 EXTRA, Quantity: 1)
POWER (120 outlet Free) (Amount: 200.00 USD, Quantity: 1,
PLEASE SELECT : OPTION ONE)
Total: 1,550.00”
Thanks Andy
How would you extract a number at the very end of a multi line text/number cell
Thank you so much for this!
Lets say you want to extract “or” from a string but it also contains words like for which you want to exclude, how would you do it?
No I worked it out use ” or”
This was perfect, thank you so much!!!
thanks
Anthony
Hi Anthony,
I need some help organizing some excel data. I need to associate the multiple items to a single reference number and list them in one raw. Each item has a unique number and a few of these items can be linked to a single Reference number. Here is an example of what my spreadsheet looks like.
So, I need to be able to list the Ref number only once and list in one raw all items with the same category one after the other, at the same time duplicate the Ref number for each new category. This is an excel spread sheet with 6500 records which will continue to grow exponentially. I’ve tried using the VLookup, HLookup, index and match and all I’ve manage to get to this point is a massive headache. I’ve also recreated it in Access but I have not used it for about 10 years so I can’t remember how to do it there either. Help!!!
Anthony Sir,
Thank you very much.
Your post was really helpful to solve my problem.
God Bless all.
Best post on the subject matter I’ve come upon. Many other posts tell you to make sure your text is formatted a particular way. Your information (sir) shows us how to do it with the text as is. You are the master!
Thank you very much!
Ha! Glad to help, Della.
This was a huge help on a project, saved me at least 2 days.
Thank you for posting and examples!
Ellen,
Glad it helped!
Hi Anthony:
I try to extract the all group of words such as Grad Candidate: Spring 2017 by using if statement. If it is true then extract Grad Candidate: Spring 2017, if is false then extract Freshmen Cohort Fall 2012 from a cell content as the following.
Adv Comm: Mathematics & Statistics, Advising Group – Senior w90, Freshmen Cohort Fall 2012, Grad Candidate: Spring 2017, Program: Undergraduate Degree.
Thanks in advance for your advise
Regards
Vivian
Just saved me hours of work. Thank you sir!