mirror of https://github.com/johnwhitington/cpdf-source.git, synced 2025-06-05 22:09:39 +02:00
	more
cpdfjson.ml (18 lines changed)
@@ -31,13 +31,21 @@ Objects 1..n: The PDF's objects.
     In original (utf8=false) mode, the bytes of the string in PDF representation
     are converted into UTF8, rather than the string itself being converted. In
     UTF8 mode (utf8=true), instead:
-      - If a String contains only PDFDocEncoding characters, is is converted
+      1 If a String contains only PDFDocEncoding characters, is is converted
+        to UTF8, and stored as {"U" : "..."}.
+      2 If a String has a BOM and successfully converts to UTF8, it is converted
         to UTF8, and stored as {"U" : "..."}
-      - If a String has a BOM and successfully converts to UTF8, it is converted
-        to UTF8, and stored as {"V" : "..."}
-      - If a String has a BOM but fails to convert, or has no BOM, it is stored
+      3 If a String has a BOM but fails to convert, or has no BOM, it is stored
         in original mode, as an unmarked string.
-    In all cases, this process is still reversible.
+    In all cases, this process is still reversible:
+      1. We try to convert back from UTF8 to PDFDocEncoding - this will always work
+         on an unchanged string. If the string has changed, and we cannot convert to
+         PDFDocEncoding, we convert back to UTF16 with a BOM.
+      2. Same as (1) - if unaltered, will be UTF16, if altered, could be PDFDocEncoding
+         or UTF16
+      3. As in non-UTF-mode, reversible as we know.
+    We need to mark strings as {"U" : ...} or not to preseve the distinction between
+    PDFDocEncoding / UTF16BE on the one hand, and byte strings on the other.
 
 There are two subformats: parsing content streams or not.  Hello World in CPDF
 JSON without parsing content streams:
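The three-way decision that the new comment describes for utf8=true mode can be sketched roughly as below. This is an illustrative reading of the comment, not the code in cpdfjson.ml: the type encoded, the functions has_bom, utf8_of_utf16be and encode_string, and the parameters is_pdfdoc and utf8_of_pdfdoc are all hypothetical names. Checking for the BOM before the PDFDocEncoding test is also an assumption. The printable_ascii stand-in accepts only bytes 0x20 to 0x7E, where PDFDocEncoding agrees with ASCII; a real implementation would need the full PDFDocEncoding table.

```ocaml
(* Hypothetical sketch of the utf8=true decision; not taken from cpdfjson.ml. *)

type encoded =
  | Marked of string    (* would be written as {"U" : "..."}; payload is UTF-8 *)
  | Unmarked of string  (* raw bytes, to be written as in original mode *)

(* A UTF-16BE byte order mark is the two bytes FE FF. *)
let has_bom s =
  String.length s >= 2 && s.[0] = '\xFE' && s.[1] = '\xFF'

(* Decode a BOM-prefixed UTF-16BE byte string to UTF-8.
   Returns None on an odd trailing byte or an unpaired surrogate. *)
let utf8_of_utf16be s =
  let n = String.length s in
  let b = Buffer.create n in
  let unit_at i = (Char.code s.[i] lsl 8) lor Char.code s.[i + 1] in
  let rec go i =
    if i = n then Some (Buffer.contents b)
    else if i + 1 >= n then None
    else
      let u = unit_at i in
      if u < 0xD800 || u > 0xDFFF then add u (i + 2)
      else if u >= 0xDC00 || i + 3 >= n then None
      else
        let lo = unit_at (i + 2) in
        if lo < 0xDC00 || lo > 0xDFFF then None
        else add (0x10000 + ((u - 0xD800) lsl 10) + (lo - 0xDC00)) (i + 4)
  and add cp next =
    Buffer.add_utf_8_uchar b (Uchar.of_int cp);
    go next
  in
  go 2  (* skip the two BOM bytes *)

(* is_pdfdoc and utf8_of_pdfdoc are supplied by the caller (assumed helpers). *)
let encode_string ~is_pdfdoc ~utf8_of_pdfdoc s =
  if has_bom s then
    match utf8_of_utf16be s with
    | Some u -> Marked u      (* case 2: BOM, converts cleanly *)
    | None -> Unmarked s      (* case 3: BOM, but conversion fails *)
  else if is_pdfdoc s then Marked (utf8_of_pdfdoc s)  (* case 1 *)
  else Unmarked s             (* case 3: no BOM, not PDFDocEncoding *)

(* Stand-in predicate for illustration only: printable ASCII. *)
let printable_ascii s =
  let ok = ref true in
  String.iter (fun c -> if c < ' ' || c > '~' then ok := false) s;
  !ok

let encode =
  encode_string ~is_pdfdoc:printable_ascii ~utf8_of_pdfdoc:(fun s -> s)

let () =
  assert (encode "Hello" = Marked "Hello");
  assert (encode "\xFE\xFF\x00H\x00i" = Marked "Hi")
```

Keeping the raw bytes in the Unmarked case reflects the comment's point that the {"U" : ...} marker is what separates text strings from byte strings when reading the JSON back.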