Extract text from txt file and insert it in CSV column

hy9fesh · June 30, 2020, 10:36pm

Say I have a folder full of .txt files. Within each text file, there is a line that says the following:

Name: [name here]

So for example, the first three text files could contain Bob Smith, Joe Snow, or Mary Fields in the “Name” field (in the place of [name here]).

I’d like to extract the names into a CSV file that looks like this:

file_name, name
1.txt, Bob Smith
2.txt, Joe Snow
3.txt, Mary Fields

Further, I’d like to create a column called “text” that contains the full contents of each .txt file (e.g., the text column for 1.txt would say “Name: Bob Smith”). This is the JS solution that I’ve tried:

const fs = require('fs');

const files = fs.readdirSync('./').filter((file) => /.txt$/.test(file));
if (!files.length) process.exit(1);

const text = fs.readFileSync('./');
if (!files.length) process.exit(1);

fs.writeFileSync('names.csv', 'file_name, name\n, text');

files.forEach((file) => {
  const match = fs.readFileSync(file, { encoding: 'utf8' }).match(/^(.*?)[,\/] Name:/mi);
  if (match && match[1]) {
    fs.appendFileSync('names.csv', `${file}, ${match[1]}\n, ${text}`);
  }
});

Catalactics · July 1, 2020, 5:59pm

Hey there @hy9fesh,

I don’t know how to do this in Python, but I do know how to do it in JavaScript with Node.js. You can use Node.js’s filesystem module (fs) to read each file, use JavaScript string functions to turn the contents into comma-separated text, and then use the filesystem module again to write a .csv file. If you’re interested, feel free to ask me more.

hy9fesh · July 1, 2020, 9:27pm

I would be open to a JS solution as well!

snigo · July 2, 2020, 8:37pm

NodeJS solution:

const fs = require('fs');

const files = fs.readdirSync('./').filter((file) => /\.txt$/.test(file));
if (!files.length) process.exit(1);

fs.writeFileSync('names.csv', 'file_name, name\n');

files.forEach((file) => {
  const match = fs.readFileSync(file, { encoding: 'utf8' }).match(/Name:\s*(\b.+?)\s*$/);
  if (match && match[1]) {
    fs.appendFileSync('names.csv', `${file}, ${match[1]}\n`);
  }
});

Have fun!

hy9fesh · August 6, 2020, 6:43pm

It seems like when my file contains “Name:” on the first line, the code doesn’t output it into the CSV. Any idea why this could be happening?

hy9fesh · August 6, 2020, 7:18pm

a.txt does NOT print:

Name: John
abc

b.txt does print:

dcf

Name: Smith

snigo · August 6, 2020, 8:43pm

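You never said there would be multiline txt files.
Try adding the m flag to the regex inside the .match() method. Without it, $ only matches at the very end of the file and . never matches a newline, so the name is only found when the “Name:” line is the last line.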
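
To see what the m flag changes, here is a small sketch that runs both versions of the regex against the contents of a.txt and b.txt from the posts above, held as strings rather than read from disk. The helper name extract is made up for this illustration.

```javascript
// Contents of the two sample files from the thread, as plain strings.
const a = 'Name: John\nabc';    // "Name:" is on the first line
const b = 'dcf\n\nName: Smith'; // "Name:" is on the last line

const withoutM = /Name:\s*(\b.+?)\s*$/;  // $ matches only at the end of the string
const withM = /Name:\s*(\b.+?)\s*$/m;    // $ matches at the end of every line

// Hypothetical helper: returns the captured name, or null when nothing matches.
const extract = (re, text) => {
  const match = text.match(re);
  return match ? match[1] : null;
};

console.log(extract(withoutM, a)); // null
console.log(extract(withoutM, b)); // Smith
console.log(extract(withM, a));    // John
console.log(extract(withM, b));    // Smith
```

With the m flag, the position of the "Name:" line inside the file no longer matters.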

hy9fesh · August 6, 2020, 11:32pm

Here’s the answer:

const fs = require('fs');

const files = fs.readdirSync('./').filter((file) => /\.txt$/.test(file));
if (!files.length) process.exit(1);

fs.writeFileSync('names.csv', 'file_name, name\n');

files.forEach((file) => {
  const match = fs.readFileSync(file, { encoding: 'utf8' }).match(/Name:\s*(\b.+?)\s*$/m);
  if (match && match[1]) {
    fs.appendFileSync('names.csv', `${file}, ${match[1]}\n`);
  }
});
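
The original question also asked for a “text” column holding the full contents of each file, which the code above does not produce. What follows is only a sketch of one way to add it, not something posted in the thread. It assumes that cells are quoted in the usual CSV way (wrapped in double quotes, embedded quotes doubled) so that commas and line breaks in the file contents stay inside one cell. The names csvCell and buildCsv are made up for this example, and unlike the code above it keeps files without a “Name:” line, with an empty name.

```javascript
const fs = require('fs');
const path = require('path');

// Quote a value for CSV: wrap it in double quotes and double any embedded
// quotes, so commas and line breaks inside the value stay in a single cell.
const csvCell = (value) => `"${String(value).replace(/"/g, '""')}"`;

// Hypothetical helper: build the CSV text for every .txt file in `dir`.
function buildCsv(dir) {
  const files = fs.readdirSync(dir)
    .filter((file) => /\.txt$/.test(file))
    .sort(); // fixed order, so the output is the same on every run

  const rows = ['file_name,name,text'];
  for (const file of files) {
    const text = fs.readFileSync(path.join(dir, file), { encoding: 'utf8' });
    const match = text.match(/Name:\s*(\b.+?)\s*$/m);
    // Assumption: a file without a "Name:" line still gets a row, with an empty name.
    const name = match ? match[1] : '';
    rows.push([file, name, text].map(csvCell).join(','));
  }
  return rows.join('\n') + '\n';
}

// Usage: write the CSV for the current folder.
// fs.writeFileSync('names.csv', buildCsv('./'));
```

Building the whole CSV as one string and writing it once also avoids calling appendFileSync for every file.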