
[lmi-commits] [lmi] master 7d88df5 2/3: Do not let files end in backslash-newline


From: Greg Chicares
Subject: [lmi-commits] [lmi] master 7d88df5 2/3: Do not let files end in backslash-newline
Date: Tue, 19 May 2020 11:41:47 -0400 (EDT)

branch: master
commit 7d88df5a637e856185cbb8fd01f617853beb17fe
Author: Gregory W. Chicares <address@hidden>
Commit: Gregory W. Chicares <address@hidden>

    Do not let files end in backslash-newline
    
    The 'read' documentation for POSIX (IEEE 1003.1-2017) says:
    
    | Although the standard input is required to be a text file, and
    | therefore will always end with a <newline> (unless it is an empty
    | file), the processing of continuation lines when the -r option is
    | not used can result in the input not ending with a <newline>.
    | This occurs if the last line of the input file ends with a
    | <backslash> <newline>.
    
    It seems best to follow the C rules and disallow <backslash> <newline>
    at the end of any file.
---
 objects.make              |  2 ++
 test_coding_rules.cpp     | 17 ++++++++++++-----
 test_coding_rules_test.sh | 11 +++++++++++
 3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/objects.make b/objects.make
index 1f58504..0b7e613 100644
--- a/objects.make
+++ b/objects.make
@@ -1208,3 +1208,5 @@ product_files$(EXEEXT): \
   my_rnd.o \
   my_tier.o \
   liblmi$(SHREXT) \
+
+# This file does not end in backslash-newline.
diff --git a/test_coding_rules.cpp b/test_coding_rules.cpp
index e851611..a300e8e 100644
--- a/test_coding_rules.cpp
+++ b/test_coding_rules.cpp
@@ -26,6 +26,7 @@
 #include "istream_to_string.hpp"
 #include "main_common.hpp"
 #include "miscellany.hpp"               // begins_with(), split_into_lines()
+#include "ssize_lmi.hpp"
 
 #include <boost/filesystem/convenience.hpp> // fs::extension()
 #include <boost/filesystem/fstream.hpp>
@@ -117,8 +118,9 @@ class file final
 /// an exception for empty files, but there's no reason for lmi to
 /// have any.
 ///
-/// Add a '\n' sentry at the beginning of the string for the reason
-/// explained in 'regex_test.cpp'.
+/// Add a newline at the beginning of the string, and require
+/// a newline at the end, so that "\n" can be used in regexen
+/// instead of '^' and '$' anchors--see 'regex_test.cpp'.
 ///
 /// Files
 ///   ChangeLog-2004-and-prior *.txt *.xpm
@@ -223,10 +225,15 @@ file::file(std::string const& file_path)
         }
 
     data_ = '\n' + data();
-    // The '\n' sentinel just added makes back() safe for 0-byte files:
-    if('\n' != data().back())
+
+    int const datasize = lmi::ssize(data());
+    if(1 <= datasize && '\n' != data_[datasize - 1])
+        {
+        throw std::runtime_error(R"(File does not end in newline.)");
+        }
+    if(2 <= datasize && '\\' == data_[datasize - 2])
         {
-        throw std::runtime_error(R"(File does not end in '\n'.)");
+        throw std::runtime_error(R"(File ends in backslash-newline.)");
         }
 }
 
diff --git a/test_coding_rules_test.sh b/test_coding_rules_test.sh
index 79c50f5..8b7c8b4 100755
--- a/test_coding_rules_test.sh
+++ b/test_coding_rules_test.sh
@@ -48,6 +48,13 @@ cat >eraseme_000 <<EOF
 $boilerplate
 EOF
 
+touch eraseme_0_bytes.touchstone
+
+printf '\n'   >eraseme_1_byte_good.touchstone
+printf ' '    >eraseme_1_byte_bad.touchstone
+printf 'z\n'  >eraseme_2_bytes_good.touchstone
+printf '\\\n' >eraseme_2_bytes_bad.touchstone
+
 # Files in general: copyright.
 
 cat >eraseme_copyright_000 <<EOF
@@ -389,6 +396,8 @@ Exception--file 'a_nonexistent_file': File not found.
 File 'an_expungible_file.bak' ignored as being expungible.
 Exception--file 'an_unexpected_file': File is unexpectedly uncategorizable.
 Exception--file 'another.unexpected.file': File is unexpectedly uncategorizable.
+Exception--file 'eraseme_1_byte_bad.touchstone': File does not end in newline.
+Exception--file 'eraseme_2_bytes_bad.touchstone': File ends in backslash-newline.
 File 'eraseme_copyright_001' lacks current copyright.
 File 'eraseme_copyright_001' breaks taboo '\(c\) *[0-9]'.
 File 'eraseme_copyright_003.html' lacks current copyright.
@@ -451,3 +460,5 @@ diff --unified=0 expected_eraseme observed_eraseme && rm --force \
   another.unexpected.file \
   eraseme* \
   ./*eraseme \
+
+# This file does not end in backslash-newline.
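
For readers who want to try the rule outside lmi, here is a minimal standalone sketch of the two end-of-file checks this commit describes. It is not lmi's code: check_file_ending() is a hypothetical helper invented for illustration, it takes the raw file contents without the leading '\n' sentinel that file::file() prepends, and it assumes that a 0-byte file is acceptable, since the test script above adds no expected error for 'eraseme_0_bytes.touchstone'.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not part of lmi): return an error message if
// 'contents' ends improperly, or std::nullopt if the ending is acceptable.
std::optional<std::string> check_file_ending(std::string const& contents)
{
    // Assumption: an empty file passes, matching the 0-byte test case.
    if(contents.empty())
        {
        return std::nullopt;
        }
    // A nonempty file must end in a newline.
    if('\n' != contents.back())
        {
        return "File does not end in newline.";
        }
    // The final newline must not be preceded by a backslash.
    if(2 <= contents.size() && '\\' == contents[contents.size() - 2])
        {
        return "File ends in backslash-newline.";
        }
    return std::nullopt;
}
```

Applied to the five touchstone files created in 'test_coding_rules_test.sh', this sketch accepts the 0-byte file, "\n", and "z\n", and rejects " " and "\\\n" with the two messages shown in the expected output.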


