Convert long string with a pattern into array of Objects

I have a long function that takes a long string (an example is at the end) without any dots or commas, turns it into a 1D array, and then into a 2D array. From the 2D array I create an array of objects and return them as components in a React application.

However I’m facing the following issues, and currently do not know how to overcome them:

  • When I split using the countries regex, the country names get replaced (and so disappear). How can I avoid that and still split the endless line?
  • As you can see in the example, each list contains its Team row. I couldn’t figure out how to exclude it when I’m mapping and pushing into the array: arr:[...e]

I tried using a for loop to exclude them specifically, but it didn’t work, as I cannot assign a function to a key.

I sincerely ask for help, as I am stuck.

const TeamList = (props) => {
  let regex = /Team/gi

  let newExample = props.argument.replace(regex, '\enterTeam').split('\enter')

  let regx = /Ukraine|Grenada|Denmark|Switzerland|Pakistan|USA|India|Canada|Colombia|Venezuela|Brazil|United States/g
  let array2 = newExample.map(element => {
    return element.toString().replace(regx, 'asd').split('asd')
  });

  let array3 = [];
  array2.map((e, i) => { array3.push({title: e[0], arr: [...e]}) })

  return (
    <div>
      {array3.map((e, i) => {
        return (
          <div style={{width: '50vw', margin: '3vw auto 1vw auto'}}>
            <h3>{e.title}</h3>
            <ul>{e.arr.map((el, ind) => { return <li>{el}</li> })}</ul>
          </div>
        )
      })}
    </div>
  )
}

I had an idea:

  • Split the string on the 2nd occurrence of a comma, so I will have array1 = ['team 20, company x', 'Ukraine team members']

  • Split the second element of array1 on the first occurrence of a space, which will end up as array2 = ['Ukraine', 'team members'];

  • In my component I will then have: title: first element of array1 + first element of array2; description: second element of array2

The problem is that I cannot split at the index of the second comma: a separator can only be a regex or a string. If I slice instead, I have to write a whole new function inside the function, which is not good, I guess?

So can anyone help me with that, please?

let example = 'Team 51, Client Company: Calgary International Airport, Canada Heather Glenn, Florida Atlantic University, Instructor: Daniel Rottig, USA Karen Cerquera Molina, Universidad EAN, Instructor: Juan Manuel Gil, Colombia Maeva Magmui, Massey University, Instructor: Dirk Boehe, Venezuela Marcos Soares, Federal University of Parana, Instructor: Germano Glufke Reis, Brazil Maxwell Darko Addo, Kwame Nkrumah University of Science and Technology, Instructor: Samuel Yaw Akomea, Ghana Team 57, Client Company: OGG, Brazil Alexa Ralston, Nazareth College, Instructor: Jennifer SA Leigh, USA Joao Victor Poffo Oliveira, Universidade Regional de Blumenau, Instructor: Germano Adolfo Gehrke, Brazil Juan Felipe Acosta Munoz, EAFIT, Instructor: Daniel Gomez, Colombia Tyler Broughton-Ambrose, Johnson & Wales University, Instructor: Leilani Baumanis United States Team 112, Client Company: Lady Bay, Australia Ashleigh John, St. George’s University, Instructor: Reccia Charles, Grenada Cynthia Rivas, Georgia Gwinnett College, Instructor: Luis Torres, USA Morena T. Matshego, University of Botswana, Instructor: Tendy Matenge, Botswana Sara Maria Chinchilla Echeverri, EAFIT, Instructor: Daniela Acosta, Colombia Sara Yusty Salazar, EAFIT, Instructor: Carolina Garcia, Colombia'

With the caveats that (a) this shouldn’t be done on the frontend and (b) it should use a generator to stream the text and avoid multiple passes, using split/regex is easier to explain:

Set up an array (output array)

  1. Keep them grouped in teams (I assume you’re scraping (???) so they’re in different HTML elements anyway).
  2. First function should process a team and return an array of team members as objects.
  3. In the function, set up a set of variables for this group. Should be whatever you want from the title (so like team, country etc) + an array that you’re going to push the team members to.
  4. Split team on newlines
  5. Take the first element, this will always contain the title information
  6. Second function (runs inside first function) processes a title, returns the information you want. I’ll not describe how to do this, if the format of the titles is identical each time it is trivial.
  7. Run that function and assign to the variables you set up.
  8. Third function (also runs inside function one) processes the remaining array of strings (which, remember, is the whole string, split on newlines, with the first element removed)
  9. Split on commas. You should be able to try const [a, b, c] = line.split(","), but you’ll have problems if the data isn’t consistent. In that case you’ll probably need to split on the first comma, check what the second element starts with, and split again if you can then tell there’s going to be a third element. You need to use placeholders if the team member is missing info anyway.
  10. Once you have that three element array, map it to process each element, adding in the team info grabbed from before if you need that
  11. Take the return value of function three, turn that into an object and push it to the output array.
  12. Run function one for each team, and concat the results onto the final output array
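
A minimal sketch of the steps above, assuming each team arrives as its own newline-separated block (title line first, then one member per line) and that lines are comma-separated as in the example string. The names parseTitle, parseMember, parseTeam, and parseAllTeams, the output field names, and the 'unknown' placeholder are all made up for illustration:

```javascript
// Hypothetical helper: "Team 51, Client Company: X, Canada" -> title fields.
function parseTitle(line) {
  const [team = 'unknown', ...rest] = line.split(',').map((s) => s.trim());
  // Everything after the first comma is kept together as the client info.
  return { team, client: rest.join(', ') || 'unknown' };
}

// Hypothetical helper: one member line -> object, with placeholders for gaps.
function parseMember(line, title) {
  const [name = 'unknown', university = 'unknown', ...rest] = line
    .split(',')
    .map((s) => s.trim());
  return {
    team: title.team,
    name,
    university,
    // Instructor and country are left joined, since the data is inconsistent.
    details: rest.join(', ') || 'unknown',
  };
}

// "Function one": process a single team block and return its members.
function parseTeam(block) {
  const [titleLine, ...memberLines] = block
    .split('\n')
    .map((s) => s.trim())
    .filter(Boolean); // drop empty lines
  if (!titleLine) return []; // nothing to parse in an empty block
  const title = parseTitle(titleLine);
  return memberLines.map((line) => parseMember(line, title));
}

// Assumed input: an array holding one block of text per team.
function parseAllTeams(blocks) {
  return blocks.flatMap(parseTeam);
}
```

Calling parseAllTeams with an array of team blocks gives one flat array of member objects, each carrying its team label, which is the shape step 12 describes.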

Hi @DanCouper, thanks for the long reply. I will try to understand it. But no, I’m not scraping. I’m rebuilding an old website and do not have access to the admin panel or WordPress.

The issue is that I have 5 years, each year has 4 tables and each table contains 26-30 teams.

I know it’s a bad approach, but currently I’m uploading those 30 teams as a single string to a Firebase server.

If I can parse the original website and put the data straight into a good object, can you help me with recommendations? What software should I use, or how should I write it?

Ah, this changes things: if it’s a table, then it’s already done for you?? Just loop over the rows and columns
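
If the source really is an HTML table, looping over its rows and columns could look like the sketch below. It assumes the code runs in the browser against a table element; tableToRows is a name made up here, while rows, cells, and textContent are standard DOM properties.

```javascript
// Turn a <table> element into an array of rows, each an array of cell texts.
// `table` is assumed to be an HTMLTableElement (or anything with the same
// rows/cells/textContent shape).
function tableToRows(table) {
  return Array.from(table.rows, (row) =>
    Array.from(row.cells, (cell) => cell.textContent.trim())
  );
}

// Possible usage in the browser console on the page that holds the table:
// const data = tableToRows(document.querySelector('table'));
```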

@DanCouper
No, it’s not done. See the example above in the first message? That’s how it’s recorded in the database. I would love to do it quickly and in a more organized way, but I do not understand how.

You recommended scraping - can you advise some tools?
https://x-culture.org/2019-2-winners/ - original website.

I got your message, and sorry for the late reply, Tyroni. I’m way past my bedtime and was planning to go to sleep when I saw it. Told myself I was gonna check it tomorrow, but then your problem got me intrigued.

Have you thought about turning the data into JSON? If you know how to use a web scraper, you can find markers like <br> <p> Instructor: etc. and find what goes before or after them to extract your info. I learned how to use Cheerio from Brad Traversy on YouTube for a Discord bot. It’s pretty easy to learn. Otherwise, you can just copy-paste the HTML parts with all the teams into your IDE and use find and replace. For example, find Instructor: and replace with "Instructor": ". It’ll be a bit tedious but it can be done. You can take it 5 teams at a time if you wanted. For my barista app, I actually had to input all of those into MongoDB by hand since they were all on paper. :sweat:
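
As a rough illustration of the marker idea, the sketch below uses the Instructor: marker to pull the fields out of one member line and serialise them as JSON. The line shape (name, university, Instructor: instructor, country) is assumed from the example in the first post, and memberToJson and MEMBER_PATTERN are hypothetical names. Lines that don’t fit the pattern return null so they can be fixed by hand.

```javascript
// Assumed shape: "Name, University, Instructor: Instructor Name, Country"
const MEMBER_PATTERN =
  /^(?<name>[^,]+),\s*(?<university>.+),\s*Instructor:\s*(?<instructor>[^,]+),\s*(?<country>.+)$/;

function memberToJson(line) {
  const match = MEMBER_PATTERN.exec(line.trim());
  // No match, e.g. when the comma before the country is missing.
  if (!match) return null;
  const { name, university, instructor, country } = match.groups;
  return JSON.stringify({
    name: name.trim(),
    university: university.trim(),
    instructor: instructor.trim(),
    country: country.trim(),
  });
}
```

In the example data, the Tyler Broughton-Ambrose line has no comma before United States, so it would come back as null rather than being silently mis-parsed.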

Hi Cara, thank you for the reply and thanks for the suggestions! Damn, if you entered everything manually, that’s an incredible job!

I will definitely look at the proposed scraper and try it! :slight_smile:

Web scraper is pretty cool! It’ll really help with creating your arrays. I really recommend turning it into JSON data, however. So you can import it to Firebase and do more stuff with it in the future. :grin: If this is just a temporary project, I understand if it’s not necessary.

Good luck! Lmk if you have more questions. :stuck_out_tongue_winking_eye:

@CaraLagumen @DanCouper - thank you very much guys.
For the whole week I was thinking about and trying to solve the problem as a damn string. Then, facing a 50% failure rate and your thoughts about scrapers and objects, I decided to turn my brain on and change my approach and angle of view.

What I did:
I went to https://snipp.ru/tools/text-array, pasted the plain text from the web page there, and deleted the empty rows. Now I have an array.

Then I created a function that builds an array of objects. And problem solved :slight_smile:
I will put that function into the upload logic, so my DB will already hold an array of objects and I will just render them correctly.

Thank you very much! :slight_smile:

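// Paste the array of text lines here (as in image 1).
// The two sample rows below are taken from the example string in the first post.
let list = [
  'Team 51, Client Company: Calgary International Airport, Canada',
  'Heather Glenn, Florida Atlantic University, Instructor: Daniel Rottig, USA',
]
let object = []

list.forEach(element => {
  if (element.startsWith('Team')) {
    // A row starting with "Team" opens a new group
    object.push({ title: element, members: [] })
  } else if (object.length > 0) {
    // Any other row is a member of the most recent team
    object[object.length - 1].members.push(element)
  }
})

console.log(object)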

And the result is image 2! :slight_smile:

Hey, great job! I’m glad you were able to solve your problem!

Sorry I didn’t follow up on this – I didn’t mean that because it was in a table it was completely done, just that it was much easier to pull values out as it was already in a defined data structure – just got distracted by work and didn’t have time to get an explanation up. I’m glad you’ve got it working though!