I am using the terrific Python Requests library. I notice that the fine documentation has many examples of how to do something without explaining the why. For instance, both r.text and r.content are shown as examples of how to get the server response. But where is it explained what these properties do? For instance, when would I choose one over the other? I see that r.text returns a unicode object sometimes, and I suppose that there would be a difference for a non-text response. But where is all this documented? Note that the linked document does state:
You can also access the response body as bytes, for non-text requests:
But then it goes on to show an example of a text response! I can only suppose that the quote above means to say non-text responses instead of non-text requests, as a non-text request does not make sense in HTTP.
In short, where is the proper documentation of the library, as opposed to the (excellent) tutorial on the Python Requests site?
2 Answers
The developer interface has more details:
r.text is the content of the response in Unicode, and r.content is the content of the response in bytes.
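As a rough illustration of that distinction, the sketch below uses only the standard library to mimic the relationship. The names `raw_body` and `declared_encoding` are hypothetical stand-ins for `r.content` and `r.encoding`; they are not part of Requests. The assumption here is that `r.text` is essentially the bytes of `r.content` decoded with `r.encoding`.

```python
# Stand-in for r.content: the raw bytes the server sent.
raw_body = "café, naïve".encode("utf-8")

# Stand-in for r.encoding: the charset Requests would typically guess
# from the HTTP headers (assumed to be UTF-8 for this sketch).
declared_encoding = "utf-8"

# Roughly what r.text does: decode the bytes into a str.
text_body = raw_body.decode(declared_encoding, errors="replace")

# The byte count and the character count differ, because each
# accented letter takes two bytes in UTF-8.
print(type(raw_body).__name__, len(raw_body))
print(type(text_body).__name__, len(text_body))
```

So r.text is the convenient choice for HTML, JSON, or other textual bodies, while r.content is the safer choice whenever the exact bytes matter.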
Gary Kerr
It seems clear from the documentation that r.content is the response body as bytes. If you read further down the page, it addresses, for example, an image file.
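To see why bytes matter for something like an image, here is a small sketch with made-up data. The 8-byte PNG file signature stands in for the start of an image body (a hypothetical substitute for `r.content`). Pushing it through a text decode, which is roughly what reading `r.text` would do, does not round-trip.

```python
# The 8-byte PNG file signature, standing in for binary response data.
image_bytes = b"\x89PNG\r\n\x1a\n"

# Decoding as text replaces the invalid byte 0x89 with U+FFFD.
as_text = image_bytes.decode("utf-8", errors="replace")

# Re-encoding the text does not restore the original bytes.
round_tripped = as_text.encode("utf-8")

print(round_tripped == image_bytes)
```

This is why a non-text response such as an image should be read from r.content and written out in binary mode.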
PyNEwbie
Using PowerShell to Strip Content from PDF While Keeping PDF Format
My Task: I have been attempting to perform what would be a simple task if the documents were not in PDF format. I have a bunch of PDFs that have unwanted data before the bulk of usable data starts; that is, anything that comes before ‘%PDF’ in the documents. I needed a script that pulls all the desired data and exports it to a new file. That part was super easy.
The Problem: The data that is exported appears to be formatted correctly, except it doesn’t open as a PDF anymore. I can open it in Notepad++ and it looks identical to one that was cleaned manually and works. Examining the raw contents of the PowerShell-altered PDF, it appears that the ‘lines’ are much shorter than they should be.
I understand the PDF format doesn't really use lines, so that might be where the problem is being created. Either when the data is initially put into an array or when it’s written back out, the PDF format is probably being broken. Is there a way to retain the format of the PDF while it is modified and then saved? It’s probably the case that I’m missing something simple.
KVB
1 Answer
So I was about to start looking at iTextSharp and decided to give an older language a try first, Winbatch. (bleh!) I almost made a screen scraper to do the work but the shame of taking that route got the better of me. So, the function library was the next stop.
This is just a little blurb I spit out with no error checking or logging going on at this point. All that will be added in along with file searches later. All in all, it manages to clear all the unwanted extras in the PDF while keeping the exact format that PDFs require.
Now that I have an idea how this works, making a tool to do this in PS sounds more doable. There's a PS function out there in the wild called Get-HexDump that might be a good base to educate myself on bits and hex in PS. Since this works in Winbatch I assume there is some sort of equivalent in AutoIt and it could be reproduced in most basic languages.
There appear to be a lot of people out there trying to clear crud from before the header and after the end of their PDF docos. Hopefully this helps; I've got a half mill to hit with whatever script I morph this into. I might update with a PS version if I decide to go that route again, and if I remember.
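For anyone wanting the same byte-level approach outside Winbatch, here is a minimal sketch in Python (chosen for illustration; it is not the Winbatch script described above, nor a PowerShell port). The function names `strip_pdf_padding` and `clean_pdf_file` are hypothetical. It assumes the wanted data runs from the first `%PDF` to the last `%%EOF`, and it treats the file strictly as bytes so no line splitting or encoding conversion can alter it.

```python
def strip_pdf_padding(data: bytes) -> bytes:
    """Return only the bytes from '%PDF' through the last '%%EOF'."""
    start = data.find(b"%PDF")
    if start == -1:
        raise ValueError("no %PDF header found")
    end_marker = b"%%EOF"
    end = data.rfind(end_marker)
    if end == -1 or end < start:
        # No trailer marker found: keep everything after the header.
        return data[start:]
    return data[start:end + len(end_marker)]


def clean_pdf_file(src_path: str, dst_path: str) -> None:
    # Binary mode ("rb"/"wb") avoids any newline or encoding translation.
    with open(src_path, "rb") as src:
        cleaned = strip_pdf_padding(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(cleaned)


# Made-up sample: junk before the header and after the end marker.
sample = b"JUNK\r\n%PDF-1.4\n\x00\xff binary body\n%%EOF\ntrailing junk"
print(strip_pdf_padding(sample))
```

Reading the whole file into memory keeps the sketch short; for very large files or a large batch, error handling and logging would still need to be added.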
KVB
I have a PDF full of quotes:
I can extract the text in Python using the following code:
This returns all the quotes as one paragraph. Is it possible to 'split' the PDF by the horizontal separator and separate it into quotes that way?
user7692855
2 Answers
If you want to just extract the quotes from the PDF text, you can use regex to find all the quotes.
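A minimal sketch of the regex idea, under a loud assumption: each quote in the extracted text is wrapped in quotation marks, either straight or curly. The sample `text` below is made up, and the pattern would need adjusting to whatever the real PDF text looks like.

```python
import re

# Made-up stand-in for the text extracted from the PDF.
text = 'Intro “Stay hungry.” some filler "Stay foolish." the end'

# Match an opening quotation mark, capture everything up to the
# next quotation mark, then match the closing mark.
pattern = re.compile(r'[“"]([^”"]+)[”"]')

quotes = pattern.findall(text)
for quote in quotes:
    print(quote)
```

`findall` returns only the captured group, so the surrounding quotation marks are dropped from each result.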
or just
bhansa
I could not find a way to split it by the horizontal separator, but I managed to do it in another way:
Liam