South Side Rob

Create my own Excel Conversion Template

9 posts in this topic

Is there a way to create my own template for converting my PDFs to Excel? The automated conversion does not work well: it puts data that should be on the same row on different rows, and it combines multiple pieces of data into the same cell. I know how to use the Data Extraction feature, which is nice, but I'd like to be able to create a template instead if possible. Please let me know. Thanks.


Hi,

 

Are you converting all the pages into one Excel sheet? If so, please try converting each page to a separate Excel sheet instead. After opening your PDF file in PDFelement, click File>Preferences and, in the Convert>Excel tab, choose the option "Each page into a single excel sheet". Then convert your PDF file again and check whether the result is better.

Image 7.png

 

If the result is still not good, could you please attach your PDF file here so that we can run further tests?

 

Thanks


Sure. I have read as much of the manual as I could. My PDF files are already populated, so I will not be creating my own PDF files; I'm trying to extract the data from them into an Excel workbook. The method that comes closest is PDF Converter rather than PDFelement; however, I still get data that "runs into each other" instead of each value landing in its own cell. I tried the Form > Data Extraction task in PDFelement, but my problem is that each PDF file has about 5-15 pages with a variable number of rows. Attached are two examples: one has 15 pages and the other has 9. With the two PDFs open, you will notice that some races have, say, 7 starters (one row of data for each horse), while page 1 of the other PDF, which is also race 1 for that track, has only 6 starters. None of my PDFs are fixed row-wise, but the fields across are. Thank you in advance for your help...

DED-2018-0105.PDF

GG-2018-0105.PDF


Hi South Side Rob,

 

We are sorry about the conversion issue. In this case, you can use the data extraction function to extract the data to a .csv file. Could you please tell us what goes wrong with the data extraction function? Does it extract each value into its own cell? With more details, we can provide further instructions.

 

Thanks


The data extraction piece works, but the problem is that each page and each file has a variable number of rows to extract. Taking the two files I attached as an example: the first race in DED-2018-0105.PDF (on page one) has 7 rows of data I need to extract, one row for each horse. On page two of the same file, there are 8 rows. The number of rows to extract is almost never the same from file to file or from page to page. For this reason, I don't think the data extraction piece can be effective... One workaround I've considered is saving each race as its own file and keeping a different data extraction template for each row count, but splitting out these PDF files would be too time-consuming. Right now, I'm using PDF Converter by Wondershare and writing custom VBA scripts that check each cell to see whether multiple values landed in it and, if so, split them out.
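For anyone following along, the cell-splitting workaround described above could look roughly like this. It is only a sketch, written in Python rather than VBA, and it assumes that values merged into one cell are separated by a line break or by two or more spaces. That separator rule, the function name, and the sample row are all made up for illustration; the real files may need a different rule.

```python
import re

# Assumption: merged values are separated by a newline or by 2+ whitespace
# characters. This rule is hypothetical and must be checked against real output.
MERGED_SEPARATOR = re.compile(r"\s{2,}|\n")


def split_merged_cells(row, sep=MERGED_SEPARATOR):
    """Return a new row in which every multi-value cell is split into
    one cell per value. Empty cells are kept as empty strings."""
    out = []
    for cell in row:
        text = "" if cell is None else str(cell).strip()
        parts = [p for p in sep.split(text) if p]
        # Keep a placeholder for empty cells so that columns are not
        # shifted more than the split itself requires.
        out.extend(parts if parts else [""])
    return out


if __name__ == "__main__":
    # Invented sample row: the second and third cells each hold two values.
    sample = ["1", "Horse A  5.20", "3\n4", ""]
    print(split_merged_cells(sample))
```

A single space is deliberately not treated as a separator, so that names containing spaces stay in one cell.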


Also, the data extraction function would be great if my PDFs had fixed locations. My PDFs are broken into, let's say, 4 sections. The first section is the race header, which is somewhat fixed in its location but can sometimes include more rows for races that have many additional payout rows. The second section is the pace line rows for each horse that ran in the race. This can be anywhere from 4 to 20 rows. The columns are fixed but the rows are variable, and the PDF does not reserve blank rows when there are fewer than 20 pace line rows. The third section is the top times run by the horses in the past. Most times this is only 10 rows (the columns are fixed), but sometimes there are fewer rows for horses that are just starting out. The last section is the Jockey/Trainer records, where I receive one row for each horse that ran in the race. As with the pace lines, there can be as few as 4 rows and as many as 20. For large races (10 or more horses), this data cannot fit on one page, so a 2nd page is needed to provide all the information. In summary, these PDFs have fixed positioning for the columns but a variable number of rows, and that is where I think the data extraction piece would fall short. As far as I can tell, the data extraction expects the same number of columns and the same number of rows...
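To illustrate the "fixed columns, variable rows" layout described above, here is a minimal Python sketch of how such a section could be read once the page text is available as plain lines. It assumes the text has already been extracted by some other tool, that each section ends at a blank line, and that the column positions are known character offsets. The offsets, the function name, and the sample data are hypothetical and not taken from the attached files.

```python
from typing import List, Sequence, Tuple

# Hypothetical column boundaries as (start, end) character offsets.
# Real offsets would have to be measured from the actual extracted text.
COLUMNS: List[Tuple[int, int]] = [(0, 4), (4, 20), (20, 28)]


def parse_section(
    lines: Sequence[str],
    start: int,
    columns: Sequence[Tuple[int, int]] = COLUMNS,
) -> Tuple[List[List[str]], int]:
    """Read rows from lines[start:] until a blank line or the end of input.

    Returns the parsed rows and the index of the first unread line, so the
    caller can continue with the next section however many rows this one had.
    """
    rows = []
    i = start
    while i < len(lines) and lines[i].strip():
        rows.append([lines[i][a:b].strip() for a, b in columns])
        i += 1
    return rows, i


if __name__ == "__main__":
    # Invented sample: two rows, a blank line, then the next section.
    sample = [
        f"{'1':<4}{'Fast Horse':<16}{'1:10.2'}",
        f"{'2':<4}{'Slow Horse':<16}{'1:12.8'}",
        "",
        "next section starts here",
    ]
    rows, next_index = parse_section(sample, 0)
    print(rows, next_index)
```

Because the loop stops at the section boundary instead of after a fixed count, the same code handles a 4-row race and a 20-row race without a separate template for each.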


Hi South Side Rob,

 

Thanks for the further details. Actually, the data extraction function works on the areas that you draw, so it does not need the same number of columns and rows. After opening your PDF file in our program, click the Form>Data Extraction button and choose the option "Extract data from marked PDF"; then use your mouse to draw the areas you want to extract data from. All the areas that you mark will be extracted. Here are more details about this function for your reference:

https://pdf.wondershare.com/pdfelement/user-guide.html#extract_data

 

Hope it helps.


Daphne, something tells me you did not open the PDF files in question. Just one of the PDF files has a little over 10,000 data points. If I custom draw all 10,000 data points, there is no guarantee that the next file will have 10,000 data points. It might have only 8,000, or even 12,000. Again, this comes down to having a variable number of rows in variable sections on each PDF page, and the number of pages varies as well.


Hey South Side Rob, 

 

Thank you for such detailed explanations and troubleshooting. I have taken a look at your sheets and yes, there is a lot of data on every page. I think the program has trouble reading the original and converting it into an Excel file that looks exactly the same, because the number of boxes varies from row to row and the columns don't necessarily line up every time. To address your original question about templates: as you pointed out in your later posts, it would be hard to make a template for something that is bound to change from race to race.

 

Was this chart originally created in Excel? And what are you planning to do with these documents? Perhaps we can find a solution from a different angle to suit your needs.

 

Best,

Rebecca
